[Theory] Static Program Analysis Reading Notes (2): Dataflow Analysis

5. Dataflow Analysis with Monotone Frameworks

Dataflow Analysis

  • Sign Analysis
  • Constant Propagation Analysis
  • Live Variables Analysis
  • Available Expressions Analysis
  • Very Busy Expressions Analysis
  • Reaching Definitions Analysis
  • Initialized Variables Analysis

Classical dataflow analysis starts with a CFG and a lattice with finite height. The lattice describes abstract information we wish to infer for the different CFG nodes.

The combination of a lattice and a space of monotone functions is called a monotone framework.

An analysis is sound if all solutions to the constraints correspond to correct information about the program. The solutions may be more or less imprecise, but computing the least solution will give the highest degree of precision possible.

  • naive fixed-point algorithm
  • round-robin algorithm
  • chaotic-iteration algorithm
  • work-list algorithm
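
Of these, the work-list algorithm is the one used in practice: instead of re-evaluating every node in each round, it only re-examines nodes whose inputs may have changed. A minimal sketch in Python, for a forward analysis over a powerset lattice ordered by inclusion (the CFG and transfer function below are invented for illustration):

```python
# A minimal work-list fixed-point solver for a forward dataflow analysis.
# The lattice is the powerset of a finite set, ordered by inclusion, so the
# join is set union and bottom is the empty set.

def worklist_solve(nodes, preds, succs, transfer):
    """Compute the least solution of x[v] = transfer(v, join of x[p] for p in preds[v])."""
    x = {v: set() for v in nodes}      # bottom element at every node
    work = list(nodes)                 # initially, every node must be processed
    while work:
        v = work.pop()
        joined = set().union(*(x[p] for p in preds[v])) if preds[v] else set()
        new = transfer(v, joined)
        if new != x[v]:                # the abstract state grew strictly
            x[v] = new
            work.extend(succs[v])      # only the successors can be affected
    return x

# Tiny example: each node "generates" its own id, mimicking reaching definitions.
nodes = [1, 2, 3]
preds = {1: [], 2: [1], 3: [2, 1]}
succs = {1: [2, 3], 2: [3], 3: []}
solution = worklist_solve(nodes, preds, succs, lambda v, s: s | {v})
```

Because transfer functions are monotone and the lattice has finite height, each `x[v]` can only grow a bounded number of times, so the loop terminates.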

A variable is live at a program point if there exists an execution where its value is read later in the execution without being written to in between. This property is clearly undecidable, but it can be approximated by a static analysis called live variables analysis (or liveness analysis).

The typical use of live variables analysis is optimization: there is no need to store the value of a variable that is not live. For this reason, we want the analysis to be conservative in the direction where the answer “not live” can be trusted and “live” is the safe but useless answer.
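
Liveness is a backward analysis: information flows from a node's successors to the node itself. A small hand-worked sketch, with a made-up three-statement program and its use/def sets:

```python
# Liveness for the (invented) program:
#   1: x = input        def {x}
#   2: y = x + 1        use {x}, def {y}
#   3: output y         use {y}
# in[v] = use[v] ∪ (out[v] − def[v]),  out[v] = union of in[s] over successors s.

use = {1: set(), 2: {"x"}, 3: {"y"}}
defs = {1: {"x"}, 2: {"y"}, 3: set()}
succs = {1: [2], 2: [3], 3: []}
nodes = [1, 2, 3]

live_in = {v: set() for v in nodes}
live_out = {v: set() for v in nodes}

changed = True
while changed:                    # round-robin iteration to a fixed point
    changed = False
    for v in reversed(nodes):     # backward analysis: visit in reverse order
        out = set().union(*(live_in[s] for s in succs[v])) if succs[v] else set()
        inn = use[v] | (out - defs[v])
        if inn != live_in[v] or out != live_out[v]:
            live_in[v], live_out[v] = inn, out
            changed = True
```

Note that nothing is live before node 1, so an optimizer could conclude that `x` need not be kept in a register across that point, which is exactly the "not live can be trusted" direction described above.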

  • parameterized lattice

The set of available expressions for all program points can be approximated using a dataflow analysis.

An expression is very busy if it will definitely be evaluated again before its value changes.

The reaching definitions for a given program point are those assignments that may have defined the current values of variables.
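
Available expressions illustrates the "must" side of these definitions: the join is set intersection, and every node except the entry starts at the full set of expressions rather than at the empty set. A sketch on an invented straight-line program:

```python
# Available expressions is a forward must analysis.
#   1: entry
#   2: a = b + c        gen {b+c}
#   3: b = 1            kill {b+c}
#   4: d = b + c        gen {b+c}
# out[v] = gen[v] ∪ (in[v] − kill[v]); in[v] is the intersection over predecessors.

all_exprs = {"b+c"}
gen = {1: set(), 2: {"b+c"}, 3: set(), 4: {"b+c"}}
kill = {1: set(), 2: set(), 3: {"b+c"}, 4: set()}
preds = {1: [], 2: [1], 3: [2], 4: [3]}
nodes = [1, 2, 3, 4]

avail = {v: set(all_exprs) for v in nodes}  # start at top for a must analysis
avail[1] = set()                            # nothing is available at the entry

changed = True
while changed:
    changed = False
    for v in nodes:
        if preds[v]:
            inn = set(all_exprs)
            for p in preds[v]:
                inn &= avail[p]             # must analysis: intersect
        else:
            inn = set()
        new = gen[v] | (inn - kill[v])
        if new != avail[v]:
            avail[v] = new
            changed = True
```

After node 3 the expression `b+c` is killed (its operand changed), so it is not available there; node 4 recomputes it, so it is available again afterwards.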

A dataflow analysis is specified by providing the lattice and the constraint rules.

A forward analysis is one that for each program point computes information about the past behavior. A backward analysis is one that for each program point computes information about the future behavior.

A may analysis is one that describes information that may possibly be true and, thus, computes an over-approximation. Conversely, a must analysis is one that describes information that must definitely be true and, thus, computes an under-approximation.

       Forward                  Backward
May    Reaching Definitions     Live Variables
Must   Available Expressions    Very Busy Expressions
  • transfer function
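
A transfer function maps the abstract state before a node to the abstract state after it. A sketch for sign analysis, using abstract values "bot", "-", "0", "+", "top" and the usual abstract addition table (the function names are illustrative):

```python
# Transfer function sketch for sign analysis.

def sign_add(a, b):
    """Abstract addition on the sign lattice."""
    if a == "bot" or b == "bot":
        return "bot"          # unreachable operand: result unreachable
    if a == "0":
        return b
    if b == "0":
        return a
    if a == b:                # "+" + "+" = "+",  "-" + "-" = "-"
        return a
    return "top"              # "+" + "-" can have any sign

def transfer_assign_add(state, target, lhs, rhs):
    """Transfer function for an assignment `target = lhs + rhs`."""
    new_state = dict(state)   # abstract states are maps from variables to signs
    new_state[target] = sign_add(state[lhs], state[rhs])
    return new_state

state = {"x": "+", "y": "+", "z": "top"}
after = transfer_assign_add(state, "z", "x", "y")   # z = x + y
```

Since `sign_add` is monotone in both arguments, the transfer function is monotone, which is exactly what the monotone framework requires.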

6. Widening

A central limitation of the monotone frameworks approach is the requirement that the lattices have finite height. We describe a technique called widening that overcomes that limitation (and a related technique called narrowing).

An interval analysis computes for every integer variable a lower and an upper bound for its possible values.

Intervals are interesting analysis results, since sound answers can be used for optimizations and bug detection related to array bounds checking, numerical overflows, and integer representations.

The interval analysis lattice has infinite height, so applying the naive fixed-point algorithms may never terminate.

To obtain convergence of the interval analysis we shall use a technique called widening.

Let f : L → L denote the function from the fixed-point theorem and the naive fixed-point algorithm. A particularly simple form of widening, which often suffices in practice, introduces a function ω: L → L so that the sequence

(\omega\circ f)^i(\perp) \quad \text{for } i = 0, 1, \cdots

is guaranteed to converge on a fixed-point that is larger than or equal to each approximant f^i(\perp) of the naive fixed-point algorithm and thus represents sound information about the program. To ensure this property, it suffices that ω is monotone and extensive, and that the image ω(L) = {y ∈ L | ∃x ∈ L: y = ω(x)} has finite height. The widening function ω will intuitively coarsen the information sufficiently to ensure termination.
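
For interval analysis, a standard choice of such an ω rounds every interval bound into a fixed finite set B of bounds, so that ω(L) has finite height. A sketch, assuming B = {-∞, -1, 0, 1, ∞} (the set B is a parameter one would tune per program):

```python
# A simple widening ω for intervals: round the lower bound down into B and
# the upper bound up into B, so only finitely many intervals can occur.

import math

B = [-math.inf, -1, 0, 1, math.inf]

def widen(interval):
    """ω on a single interval (lo, hi); None denotes bottom (the empty interval)."""
    if interval is None:
        return None
    lo, hi = interval
    wlo = max(b for b in B if b <= lo)   # round the lower bound down
    whi = min(b for b in B if b >= hi)   # round the upper bound up
    return (wlo, whi)

widen((2, 5))   # rounds to (1, inf): sound but coarser than (2, 5)
```

Since B always contains ±∞, the `max`/`min` are well defined, and ω is monotone and extensive as required.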

Widening generally shoots above the target, but a subsequent technique called narrowing may improve the result. If we define

fix=\bigsqcup_i f^i(\perp)
fix~\omega=\bigsqcup_i(\omega\circ f)^i(\perp)

then we have fix\sqsubseteq fix~\omega. However, we also have that fix\sqsubseteq f(fix~\omega)\sqsubseteq fix~\omega, which means that a subsequent application of f may improve our result and still produce sound information. This technique, called narrowing, may in fact be iterated arbitrarily many times.
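
A worked example on intervals, for the loop `x = 0; while (x < 10) x = x + 1;`. Here `f` is the hand-derived abstract effect on the interval of x at the loop head (an assumption made for this sketch), and widening rounds bounds into B = {-∞, 0, ∞}:

```python
# Widening overshoots to (0, inf); one extra application of f narrows to (0, 10).

import math

def f(iv):
    """Abstract loop-head effect: x is 0 on entry, or an old value <= 9 plus 1."""
    lo, hi = iv
    body_hi = min(hi, 9)                  # values entering the body satisfy x < 10
    if lo > body_hi:                      # body unreachable
        return (0, 0)
    return (min(0, lo + 1), max(0, body_hi + 1))

B = [-math.inf, 0, math.inf]

def widen(iv):
    lo, hi = iv
    return (max(b for b in B if b <= lo), min(b for b in B if b >= hi))

# Widened fixed-point sequence (starting, for brevity, from the first
# approximant (0, 0) rather than from bottom):
x = (0, 0)
while True:
    nxt = widen(f(x))
    if nxt == x:
        break
    x = nxt
# x is now (0, inf): sound, but widening shot above the target.

narrowed = f(x)   # narrowing: one more application of f gives (0, 10)
```

The narrowed result (0, 10) is still an over-approximation of the concrete range of x, but strictly more precise than the widened one, matching fix ⊑ f(fix ω) ⊑ fix ω.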

The simple kind of widening discussed above is sometimes unnecessarily aggressive: widening every interval in every abstract state in each iteration of the fixed-point algorithm is not necessary to ensure convergence. For this reason, traditional widening takes a more sophisticated approach that may lead to better analysis precision. It involves a binary operator,

\triangledown : L \times L \rightarrow L

The widening operator \triangledown must satisfy

x \sqsubseteq x \triangledown y \wedge y \sqsubseteq x \triangledown y ~~~~\forall x,y\in L

and that for any increasing sequence z_0 \sqsubseteq z_1 \sqsubseteq z_2 \sqsubseteq \cdots, the sequence y_0,y_1,y_2,\cdots defined by y_0 = z_0 and y_{i+1} = y_i\triangledown z_{i+1} for i=0,1,\cdots converges after a finite number of steps.

With such an operator, we can obtain a safe approximation of the least fixed-point of f by computing the following sequence

\left\{\begin{matrix}
x_0 = \perp\\ 
x_{i+1} = x_i \triangledown f(x_i)
\end{matrix}\right.

This sequence eventually converges, that is, for some k we have x_{k+1} = x_k. Furthermore, the result is a safe approximation of the ordinary fixed-point: fix \sqsubseteq x_k.
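
The classical binary widening on intervals keeps a bound that is stable and jumps an unstable bound to infinity. A sketch combining it with the same hand-derived loop effect `f` as in the narrowing example above (`x = 0; while (x < 10) x = x + 1;`):

```python
# x ∇ y on intervals: keep stable bounds, send unstable ones to ±infinity.

import math

def nabla(x, y):
    """x ∇ y for intervals; None denotes bottom."""
    if x is None:
        return y
    if y is None:
        return x
    (a, b), (c, d) = x, y
    lo = a if a <= c else -math.inf   # lower bound moved down: give up on it
    hi = b if b >= d else math.inf    # upper bound moved up: give up on it
    return (lo, hi)

def f(iv):
    """Hand-derived abstract loop effect on the interval of x (assumption)."""
    if iv is None:
        return (0, 0)
    lo, hi = iv
    body_hi = min(hi, 9)
    if lo > body_hi:
        return (0, 0)
    return (min(0, lo + 1), max(0, body_hi + 1))

x = None                     # x_0 = bottom
while True:
    nxt = nabla(x, f(x))     # x_{i+1} = x_i ∇ f(x_i)
    if nxt == x:
        break
    x = nxt
```

The sequence here is bottom, (0, 0), (0, ∞), and then it is stable, so convergence takes only a few steps even though the interval lattice has infinite height.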

7. Path Sensitivity and Relational Analysis

Until now, we have ignored the values of branch and loop conditions by simply treating if- and while-statements as a nondeterministic choice between the two branches, which is called control insensitive analysis. Such analyses are also path insensitive, because they do not distinguish different paths that lead to a given program point.

  • control sensitive (branch sensitive)
  • independent attribute analysis
  • relational analysis
  • path contexts
  • path sensitive
  • iterative refinement

The present analysis is also called an independent attribute analysis as the abstract value of the file is independent of the abstract value of the boolean flag. What we need is a relational analysis that can keep track of relations between variables. One approach to achieve this is by generalizing the analysis to maintain multiple abstract states per program point.
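
The contrast can be made concrete with the classic flag/file example (the program below is assumed: `if (cond) { open(); flag = 1; } else { flag = 0; } ... if (flag) close();`). The data structures are just illustrative Python dicts:

```python
# Independent attributes: one abstract value per variable. After joining the
# two branches of the first if, the correlation between flag and the file
# state is lost, so both go to top:
independent = {"flag": "top", "file": "top"}

# Relational: one abstract state per path context (here, keyed by the value
# of flag), so the correlation survives the join:
relational = {
    "flag != 0": {"file": "open"},
    "flag == 0": {"file": "closed"},
}

# At `if (flag) close();` the relational analysis consults only the
# "flag != 0" context and can prove that close() is applied to an open file:
state_in_branch = relational["flag != 0"]
```

With the independent attribute state, the same query would only yield "top" and the safety of `close()` could not be established.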

Often, Paths consists of combinations of predicates that appear in conditionals in the program. This quickly results in an exponential blow-up: for k predicates, each statement may need to be analyzed in 2^k different path contexts.

In practice, however, there is usually much redundancy in these analysis steps. Thus, in addition to the challenge of reasoning about the assert predicates relative to the lattice elements, it requires a considerable effort to avoid too many redundant computations in path sensitive analysis. One approach is iterative refinement where Paths is initially a single universal path context, which is then iteratively refined by adding relevant predicates until either the desired properties can be established or disproved or the analysis is unable to select relevant predicates and hence gives up.

8. Interprocedural Analysis

So far, we have only analyzed the body of individual functions, which is called intraprocedural analysis. We now consider interprocedural analysis of whole programs containing multiple functions and function calls.

We use the subset of the TIP language containing functions, but still ignore pointers and functions as values. As we shall see, the CFG for an entire program is then quite simple to obtain. It becomes more complicated when adding function values.

In the CFG, we represent each call to a function f using two nodes: a call node representing the connection from the caller to the entry of f, and an after-call node where execution resumes after returning from the exit of f. Next, we represent each return statement as an assignment to a special variable named result.

  • context insensitive
  • interprocedurally invalid paths
  • call contexts
  • call string approach
  • functional approach
  • parameter sensitivity
  • object sensitivity

Context sensitivity with the functional approach as presented here gives optimal precision, in the sense that it is as precise as if inlining all function calls (even recursive ones). This means that it completely avoids the problem with dataflow along interprocedurally invalid paths.

When analyzing object oriented programs, a popular choice is object sensitivity, which is essentially a variant of the functional approach that distinguishes calls not on the entire abstract states at function entries but only on the abstract values of the receiver objects.


Reference

Static Program Analysis
