[Theory] Static Program Analysis Reading Notes (2): Dataflow Analysis

5. Dataflow Analysis with Monotone Frameworks

Dataflow Analysis

  • Sign Analysis
  • Constant Propagation Analysis
  • Live Variables Analysis
  • Available Expressions Analysis
  • Very Busy Expressions Analysis
  • Reaching Definitions Analysis
  • Initialized Variables Analysis

Classical dataflow analysis starts with a CFG and a lattice with finite height. The lattice describes abstract information we wish to infer for the different CFG nodes.

The combination of a lattice and a space of monotone functions is called a monotone framework.

An analysis is sound if all solutions to the constraints correspond to correct information about the program. The solutions may be more or less imprecise, but computing the least solution will give the highest degree of precision possible.

  • naive fixed-point algorithm
  • round-robin algorithm
  • chaotic-iteration algorithm
  • work-list algorithm
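
Of these, the work-list algorithm is the one used in practice: instead of re-evaluating every node in each round, it only re-examines nodes whose inputs may have changed. A minimal sketch in Python, for a forward analysis over a powerset lattice ordered by inclusion (the CFG and transfer function below are invented for illustration):

```python
# A minimal work-list fixed-point solver for a forward dataflow analysis.
# The lattice is the powerset of a finite set, ordered by inclusion, so the
# join is set union and bottom is the empty set.

def worklist_solve(nodes, preds, succs, transfer):
    """Compute the least solution of x[v] = transfer(v, join of x[p] for p in preds[v])."""
    x = {v: set() for v in nodes}      # bottom element at every node
    work = list(nodes)                 # initially, every node must be processed
    while work:
        v = work.pop()
        joined = set().union(*(x[p] for p in preds[v])) if preds[v] else set()
        new = transfer(v, joined)
        if new != x[v]:                # the abstract state grew strictly
            x[v] = new
            work.extend(succs[v])      # only the successors can be affected
    return x

# Tiny example: each node "generates" its own id, mimicking reaching definitions.
nodes = [1, 2, 3]
preds = {1: [], 2: [1], 3: [2, 1]}
succs = {1: [2, 3], 2: [3], 3: []}
solution = worklist_solve(nodes, preds, succs, lambda v, s: s | {v})
```

Because transfer functions are monotone and the lattice has finite height, each `x[v]` can only grow a bounded number of times, so the loop terminates.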

A variable is live at a program point if there exists an execution where its value is read later in the execution without being written to in between. This property is clearly undecidable, but it can be approximated by a static analysis called live variables analysis (or liveness analysis).

The typical use of live variables analysis is optimization: there is no need to store the value of a variable that is not live. For this reason, we want the analysis to be conservative in the direction where the answer “not live” can be trusted and “live” is the safe but useless answer.
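
Liveness is a backward analysis: information flows from a node's successors to the node itself. A small hand-worked sketch, with a made-up three-statement program and its use/def sets:

```python
# Liveness for the (invented) program:
#   1: x = input        def {x}
#   2: y = x + 1        use {x}, def {y}
#   3: output y         use {y}
# in[v] = use[v] ∪ (out[v] − def[v]),  out[v] = union of in[s] over successors s.

use = {1: set(), 2: {"x"}, 3: {"y"}}
defs = {1: {"x"}, 2: {"y"}, 3: set()}
succs = {1: [2], 2: [3], 3: []}
nodes = [1, 2, 3]

live_in = {v: set() for v in nodes}
live_out = {v: set() for v in nodes}

changed = True
while changed:                    # round-robin iteration to a fixed point
    changed = False
    for v in reversed(nodes):     # backward analysis: visit in reverse order
        out = set().union(*(live_in[s] for s in succs[v])) if succs[v] else set()
        inn = use[v] | (out - defs[v])
        if inn != live_in[v] or out != live_out[v]:
            live_in[v], live_out[v] = inn, out
            changed = True
```

Note that nothing is live before node 1, so an optimizer could conclude that `x` need not be kept in a register across that point, which is exactly the "not live can be trusted" direction described above.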

  • parameterized lattice

The set of available expressions for all program points can be approximated using a dataflow analysis.

An expression is very busy if it will definitely be evaluated again before its value changes.

The reaching definitions for a given program point are those assignments that may have defined the current values of variables.
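
Available expressions illustrates the "must" side of these definitions: the join is set intersection, and every node except the entry starts at the full set of expressions rather than at the empty set. A sketch on an invented straight-line program:

```python
# Available expressions is a forward must analysis.
#   1: entry
#   2: a = b + c        gen {b+c}
#   3: b = 1            kill {b+c}
#   4: d = b + c        gen {b+c}
# out[v] = gen[v] ∪ (in[v] − kill[v]); in[v] is the intersection over predecessors.

all_exprs = {"b+c"}
gen = {1: set(), 2: {"b+c"}, 3: set(), 4: {"b+c"}}
kill = {1: set(), 2: set(), 3: {"b+c"}, 4: set()}
preds = {1: [], 2: [1], 3: [2], 4: [3]}
nodes = [1, 2, 3, 4]

avail = {v: set(all_exprs) for v in nodes}  # start at top for a must analysis
avail[1] = set()                            # nothing is available at the entry

changed = True
while changed:
    changed = False
    for v in nodes:
        if preds[v]:
            inn = set(all_exprs)
            for p in preds[v]:
                inn &= avail[p]             # must analysis: intersect
        else:
            inn = set()
        new = gen[v] | (inn - kill[v])
        if new != avail[v]:
            avail[v] = new
            changed = True
```

After node 3 the expression `b+c` is killed (its operand changed), so it is not available there; node 4 recomputes it, so it is available again afterwards.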

A dataflow analysis is specified by providing the lattice and the constraint rules.

A forward analysis is one that for each program point computes information about the past behavior. A backward analysis is one that for each program point computes information about the future behavior.

A may analysis is one that describes information that may possibly be true and, thus, computes an over-approximation. Conversely, a must analysis is one that describes information that must definitely be true and, thus, computes an under-approximation.

       Forward                  Backward
May    Reaching Definitions     Live Variables
Must   Available Expressions    Very Busy Expressions
  • transfer function
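
A transfer function maps the abstract state before a node to the abstract state after it. A sketch for sign analysis, using abstract values "bot", "-", "0", "+", "top" and the usual abstract addition table (the function names are illustrative):

```python
# Transfer function sketch for sign analysis.

def sign_add(a, b):
    """Abstract addition on the sign lattice."""
    if a == "bot" or b == "bot":
        return "bot"          # unreachable operand: result unreachable
    if a == "0":
        return b
    if b == "0":
        return a
    if a == b:                # "+" + "+" = "+",  "-" + "-" = "-"
        return a
    return "top"              # "+" + "-" can have any sign

def transfer_assign_add(state, target, lhs, rhs):
    """Transfer function for an assignment `target = lhs + rhs`."""
    new_state = dict(state)   # abstract states are maps from variables to signs
    new_state[target] = sign_add(state[lhs], state[rhs])
    return new_state

state = {"x": "+", "y": "+", "z": "top"}
after = transfer_assign_add(state, "z", "x", "y")   # z = x + y
```

Since `sign_add` is monotone in both arguments, the transfer function is monotone, which is exactly what the monotone framework requires.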

6. Widening

A central limitation of the monotone frameworks approach is the requirement that the lattices have finite height. We describe a technique called widening that overcomes that limitation (and a related technique called narrowing).

An interval analysis computes for every integer variable a lower and an upper bound for its possible values.

Intervals are interesting analysis results, since sound answers can be used for optimizations and bug detection related to array bounds checking, numerical overflows, and integer representations.

The interval analysis lattice has infinite height, so applying the naive fixed-point algorithms may never terminate.

To obtain convergence of the interval analysis we shall use a technique called widening.

Let f : L → L denote the function from the fixed-point theorem and the naive fixed-point algorithm. A particularly simple form of widening, which often suffices in practice, introduces a function ω: L → L so that the sequence

(\omega\circ f)^i(\perp) \quad \text{for } i = 0, 1, \cdots

is guaranteed to converge on a fixed-point that is larger than or equal to each approximant f^i(\perp) of the naive fixed-point algorithm and thus represents sound information about the program. To ensure this property, it suffices that ω is monotone and extensive, and that the image ω(L) = {y ∈ L | ∃x ∈ L: y = ω(x)} has finite height. The widening function ω will intuitively coarsen the information sufficiently to ensure termination.
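
For interval analysis, a standard choice of such an ω rounds every interval bound into a fixed finite set B of bounds, so that ω(L) has finite height. A sketch, assuming B = {-∞, -1, 0, 1, ∞} (the set B is a parameter one would tune per program):

```python
# A simple widening ω for intervals: round the lower bound down into B and
# the upper bound up into B, so only finitely many intervals can occur.

import math

B = [-math.inf, -1, 0, 1, math.inf]

def widen(interval):
    """ω on a single interval (lo, hi); None denotes bottom (the empty interval)."""
    if interval is None:
        return None
    lo, hi = interval
    wlo = max(b for b in B if b <= lo)   # round the lower bound down
    whi = min(b for b in B if b >= hi)   # round the upper bound up
    return (wlo, whi)

widen((2, 5))   # rounds to (1, inf): sound but coarser than (2, 5)
```

Since B always contains ±∞, the `max`/`min` are well defined, and ω is monotone and extensive as required.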

Widening generally shoots above the target, but a subsequent technique called narrowing may improve the result. If we define

fix=\bigsqcup_i f^i(\perp)
fix~\omega=\bigsqcup_i(\omega\circ f)^i(\perp)

then we have fix\sqsubseteq fix~\omega. However, we also have that fix\sqsubseteq f(fix~\omega)\sqsubseteq fix~\omega, which means that a subsequent application of f may improve our result and still produce sound information. This technique, called narrowing, may in fact be iterated arbitrarily many times.
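
A worked example on intervals, for the loop `x = 0; while (x < 10) x = x + 1;`. Here `f` is the hand-derived abstract effect on the interval of x at the loop head (an assumption made for this sketch), and widening rounds bounds into B = {-∞, 0, ∞}:

```python
# Widening overshoots to (0, inf); one extra application of f narrows to (0, 10).

import math

def f(iv):
    """Abstract loop-head effect: x is 0 on entry, or an old value <= 9 plus 1."""
    lo, hi = iv
    body_hi = min(hi, 9)                  # values entering the body satisfy x < 10
    if lo > body_hi:                      # body unreachable
        return (0, 0)
    return (min(0, lo + 1), max(0, body_hi + 1))

B = [-math.inf, 0, math.inf]

def widen(iv):
    lo, hi = iv
    return (max(b for b in B if b <= lo), min(b for b in B if b >= hi))

# Widened fixed-point sequence (starting, for brevity, from the first
# approximant (0, 0) rather than from bottom):
x = (0, 0)
while True:
    nxt = widen(f(x))
    if nxt == x:
        break
    x = nxt
# x is now (0, inf): sound, but widening shot above the target.

narrowed = f(x)   # narrowing: one more application of f gives (0, 10)
```

The narrowed result (0, 10) is still an over-approximation of the concrete range of x, but strictly more precise than the widened one, matching fix ⊑ f(fix ω) ⊑ fix ω.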

The simple kind of widening discussed above is sometimes unnecessarily aggressive: widening every interval in every abstract state in each iteration of the fixed-point algorithm is not necessary to ensure convergence. For this reason, traditional widening takes a more sophisticated approach that may lead to better analysis precision. It involves a binary operator,

\triangledown : L \times L \rightarrow L

The widening operator \triangledown must satisfy

x \sqsubseteq x \triangledown y \wedge y \sqsubseteq x \triangledown y ~~~~\forall x,y\in L

and that for any increasing sequence z_0 \sqsubseteq z_1 \sqsubseteq z_2 \sqsubseteq \cdots, the sequence y_0,y_1,y_2,\cdots defined by y_0 = z_0 and y_{i+1} = y_i\triangledown z_{i+1} for i=0,1,\cdots converges after a finite number of steps.

With such an operator, we can obtain a safe approximation of the least fixed-point of f by computing the following sequence

\left\{\begin{matrix}
x_0 = \perp\\ 
x_{i+1} = x_i \triangledown f(x_i)
\end{matrix}\right.

This sequence eventually converges, that is, for some k we have x_{k+1} = x_k. Furthermore, the result is a safe approximation of the ordinary fixed-point: fix \sqsubseteq x_k.
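
The classical binary widening on intervals keeps a bound that is stable and jumps an unstable bound to infinity. A sketch combining it with the same hand-derived loop effect `f` as in the narrowing example above (`x = 0; while (x < 10) x = x + 1;`):

```python
# x ∇ y on intervals: keep stable bounds, send unstable ones to ±infinity.

import math

def nabla(x, y):
    """x ∇ y for intervals; None denotes bottom."""
    if x is None:
        return y
    if y is None:
        return x
    (a, b), (c, d) = x, y
    lo = a if a <= c else -math.inf   # lower bound moved down: give up on it
    hi = b if b >= d else math.inf    # upper bound moved up: give up on it
    return (lo, hi)

def f(iv):
    """Hand-derived abstract loop effect on the interval of x (assumption)."""
    if iv is None:
        return (0, 0)
    lo, hi = iv
    body_hi = min(hi, 9)
    if lo > body_hi:
        return (0, 0)
    return (min(0, lo + 1), max(0, body_hi + 1))

x = None                     # x_0 = bottom
while True:
    nxt = nabla(x, f(x))     # x_{i+1} = x_i ∇ f(x_i)
    if nxt == x:
        break
    x = nxt
```

The sequence here is bottom, (0, 0), (0, ∞), and then it is stable, so convergence takes only a few steps even though the interval lattice has infinite height.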

7. Path Sensitivity and Relational Analysis

Until now, we have ignored the values of branch and loop conditions by simply treating if- and while-statements as a nondeterministic choice between the two branches, which is called control insensitive analysis. Such analyses are also path insensitive, because they do not distinguish different paths that lead to a given program point.

  • control sensitive (branch sensitive)
  • independent attribute analysis
  • relational analysis
  • path contexts
  • path sensitive
  • iterative refinement

The present analysis is also called an independent attribute analysis as the abstract value of the file is independent of the abstract value of the boolean flag. What we need is a relational analysis that can keep track of relations between variables. One approach to achieve this is by generalizing the analysis to maintain multiple abstract states per program point.
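
The contrast can be made concrete with the classic flag/file example (the program below is assumed: `if (cond) { open(); flag = 1; } else { flag = 0; } ... if (flag) close();`). The data structures are just illustrative Python dicts:

```python
# Independent attributes: one abstract value per variable. After joining the
# two branches of the first if, the correlation between flag and the file
# state is lost, so both go to top:
independent = {"flag": "top", "file": "top"}

# Relational: one abstract state per path context (here, keyed by the value
# of flag), so the correlation survives the join:
relational = {
    "flag != 0": {"file": "open"},
    "flag == 0": {"file": "closed"},
}

# At `if (flag) close();` the relational analysis consults only the
# "flag != 0" context and can prove that close() is applied to an open file:
state_in_branch = relational["flag != 0"]
```

With the independent attribute state, the same query would only yield "top" and the safety of `close()` could not be established.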

Often, Paths consists of combinations of predicates that appear in conditionals in the program. This quickly results in an exponential blow-up: for k predicates, each statement may need to be analyzed in 2^k different path contexts.

In practice, however, there is usually much redundancy in these analysis steps. Thus, in addition to the challenge of reasoning about the assert predicates relative to the lattice elements, it requires a considerable effort to avoid too many redundant computations in path sensitive analysis. One approach is iterative refinement where Paths is initially a single universal path context, which is then iteratively refined by adding relevant predicates until either the desired properties can be established or disproved or the analysis is unable to select relevant predicates and hence gives up.

8. Interprocedural Analysis

So far, we have only analyzed the body of individual functions, which is called intraprocedural analysis. We now consider interprocedural analysis of whole programs containing multiple functions and function calls.

We use the subset of the TIP language containing functions, but still ignore pointers and functions as values. As we shall see, the CFG for an entire program is then quite simple to obtain. It becomes more complicated when adding function values.

In the CFG, we represent each call to a function f using two nodes: a call node representing the connection from the caller to the entry of f, and an after-call node where execution resumes after returning from the exit of f. Next, we represent each return statement as an assignment to a special variable named result.

  • context insensitive
  • interprocedurally invalid paths
  • call contexts
  • call string approach
  • functional approach
  • parameter sensitivity
  • object sensitivity

Context sensitivity with the functional approach as presented here gives optimal precision, in the sense that it is as precise as if inlining all function calls (even recursive ones). This means that it completely avoids the problem with dataflow along interprocedurally invalid paths.

When analyzing object oriented programs, a popular choice is object sensitivity, which is essentially a variant of the functional approach that distinguishes calls not on the entire abstract states at function entries but only on the abstract values of the receiver objects.


Reference

Static Program Analysis
