[Theory] Static Program Analysis 读书笔记（三）：function, pointer & abstract interpretation

9. Control Flow Analysis

If we introduce functions as values (and thereby higher-order functions), or objects with methods, then control flow and dataflow suddenly become intertwined.

The task of control flow analysis is to conservatively approximate the interprocedural control flow, also called the call graph, for such programs.

closures
closure analysis

The constraints for closure analysis are an instance of a general class that can be solved in cubic time.

A language with functions as values must use the kind of control flow analysis to obtain a reasonably precise CFG. For common object-oriented languages, such as Java or C#, it is also useful, but the added structure provided by the class hierarchy and the type system permits some simpler alternatives.

Class Hierarchy Analysis (CHA)
Rapid Type Analysis (RTA)
Variable Type Analysis (VTA)

10. Pointer Analysis

Interprocedural Points-To Analysis
Null Pointer Analysis
Flow-Sensitive Points-To Analysis
Escape Analysis

The final extension of the TIP language introduces pointers and dynamic memory allocation.

allocation-site abstraction
points-to analysis
address taken
Andersen’s algorithm
Steensgaard’s Algorithm

In languages with both function values and pointers, functions may be stored in the heap, which makes it difficult to perform control flow analysis before points-to analysis. But it is also difficult to perform interprocedural points-to analysis without the information from a control flow analysis.

points-to graphs
escape analysis

11. Abstract Interpretation

If an analysis is sound, the properties it infers for a given program hold in all actual executions of the program.

The theory of abstract interpretation provides a solid mathematical foundation for what it means for an analysis to be sound, by relating the analysis specification to the formal semantics of the programming language.

Another use of abstract interpretation is for understanding whether an analysis design, or a part of an analysis design, is as precise as possible relative to a choice of analysis lattice and where imprecision may arise.

What matters is that the semantics captures the meaning of programs in ordinary executions, without any approximations. The semantics specifies how a concrete program execution works, whereas our analyses can be thought of as abstract interpreters.

collecting semantics
continuous

With these definitions and observations, we can define the semantics of a given program as the least solution to the generated constraints. (A solution to a constraint system is, as usual, a valuation of the constraint variables that satisfies all the constraints – in other words, a fixed-point of cf .)

As readers familiar with theory of programming language semantics know, the fixed-point theorem also holds for infinite-height lattices provided that f is continuous. This tells us that a unique least solution always exists – even though the solution, of course, generally cannot be computed using the naive fixed-point algorithm.

To clarify the connection between concrete information and abstract information for the sign analysis example, let us consider three different abstraction functions that tell us how each element from the semantic lattices is most precisely described by an element in the analysis lattices. The functions map sets of concrete values, sets of concrete states, and n-tuples of sets of concrete states to their abstract counterparts.

concretization functions

Furthermore, abstraction functions and concretization functions that arise naturally when developing program analyses are closely connected.

extensive
reductive

The pair of monotone functions, α and γ, is called a Galois connection if it satisfies these two properties.

The intuition of the first property is that abstraction may lose precision but must be safe. One way to interpret the second property is that the abstraction function should always give the most precise possible abstract description for any element in the semantic lattice.

We have argued that the Galois connection property is natural for any reasonable pair of an abstraction function and a concretization function, including those that appear in our sign analysis example. The following exercise tells us that it always suffices to specify either α or γ, then the other is uniquely determined if requiring that they together form a Galois connection.

Once the analysis designer has specified the collecting semantics and the analysis lattice and constraint rules, then the relation between the semantic domain and the analysis domain may be specified using an abstraction function α (resp. a concretization function γ), and then the associated concretization function γ (resp. abstraction function α) is uniquely determined – provided that one exists such that the two functions form a Galois connection.

This raises an interesting question: Under what conditions does α (resp. γ) have a corresponding γ (resp. α) such that α and γ form a Galois connection?

The following exercise demonstrates that the Galois connection property can be used as a “sanity check” when designing analysis lattices.

Soundness means that the analysis result over-approximates the abstraction of the semantics of the program.

We often use the term soundness of analyses without mentioning specific programs. An analysis is sound if it is sound for every program.

soundness theorem
Let L_1 and L_2 be lattices where L_2 has finite height, assume \alpha :L_1\rightarrow L_2 and \gamma :L_2\rightarrow L_1 form a Galois connection, cf:L_1 \rightarrow L_1 is continuous, and af:L_2 \rightarrow L_2 is monotone. If af is a sound abstraction of cf , then

\alpha (fix(cf)) \sqsubseteq fix(af)

To summarize, a general recipe for specifying and proving soundness of an analysis consists of the following steps:

Specify the analysis, i.e. the analysis lattice and the constraint generation rules, and check that all the analysis constraint functions are monotone.
Specify the collecting semantics, and check that the semantic constraint functions are continuous.
Establish the connection between the semantic lattice and the analysis lattice, either by an abstraction function or by a concretization function. Then check, for example using the property from Exercise 11.20, that the function pairs form Galois connections.
Show that each constituent of the analysis constraints is a sound abstraction of the corresponding constituent of the semantic constraints, for all programs.

The requirements that the analysis constraint functions are monotone, the semantic constraint functions are continuous, and the abstraction and concretization functions form Galois connections are rarely restrictive in practice but can be considered as “sanity checks” that the design is meaningful.

As we have seen in Exercises 11.21 and 11.23, it is possible to design sound analyses that do not have all these nice properties, but the price is usually less precision or more complicated soundness proofs. Another restriction of the soundness theorem above is that it requires L_2 to have finite height, however, the theorem and proof can easily be extended to analyses with infinite-height lattices and widenings.

soundness testing
optimal abstraction

As usual in logics, the dual of soundness is completeness.
In Section 11.3 we deﬁned soundness of an analysis for a program P as the property

\alpha(\{\[P\]\})\sqsubseteq \[ \[P\] \]

Consequently, it is natural to deﬁne that an analysis is complete for P if

\[ \[ P\] \] \sqsubseteq \alpha(\{\[P\]\})

An analysis is complete if it complete for all programs.

If an analysis is both sound and complete for P we have

\alpha(\{\[P\]\}) = \[ \[P\] \]

complete abstraction

For abstractions that are sound, completeness implies optimality (but not vice versa).

soundness and completeness theorem

Abstractions that are incomplete may be complete in some situations; for example, abstract addition in sign analysis is not complete in general, but it is complete in situations where, for example, both arguments are positive values. For this reason, even though few analyses are sound and complete for all programs, many analyses are sound and complete for some programs or program fragments.

reachable states collecting semantics
trace semantics

Reference

Static Program Analysis