Pointer Analysis Architecture
#
OverviewThe Pointer Analysis framework (under src/PointerAnalysis
) in OpenRace
computes a set of object (points-to set) that can be pointed by a pointer
in the program during the runtime.
It implements a context-sensitive and field-sensitive inclusion-based (Andersen)
Pointer Analysis algorithm.
#
High-Level ArchitectureThere are three major components in the pointer analysis framework and they interact with each other to compute the final result.
These components communicate with each other through a set of predefined APIs in
MemModelTrait
as well as in LangModelTrait
.
At high level,
Language Model abstracts the target program into the constraint graph,
the pointer analysis framework offers the flexibility for user-defined language model to
intercept any function call in the program and handle it differently;
Memory Model handles the creation of the static objects, different memory models handles
object creation differently. E.g., Field-Insensitive Memory Model (FIMemModel.h
) simply
creates one object node for the object, while Field-Sensitive Memory Model (FSMemModel.h
)
creates one object node for each field of the object.
Finally, Solver takes the constraint graph as the input and computes the final result.
#
Detailed Implementation- Language Model: Language Model transfers the program into the constraint graph/pointer assignment graph.
It creates pointer nodes (corresponding pointers in the program) and links nodes with different kinds of edges (corresponding to different operations performed in the program). There are five different kinds of edges in the constraint graph.
- load edge:
if ptr1 = *ptr2;, then ptr2 --load--> ptr1
(ptr2 is loaded into ptr1). - store edge:
if *ptr1 = ptr2;, then ptr2 --store--> ptr1
(ptr2 is stored into ptr1). - copy edge:
if ptr1 = ptr2;, then ptr2 --copy--> ptr1
(ptr2 is copied to ptr1). - addr_of edge:
if ptr1 = &ptr2; , then ptr2--addr_of--> ptr1
(ptr1 takes the address of ptr2). - offset edge:
if ptr1 = ptr2->f;, then ptr2--offset-->ptr1
(ptr1 add a offset to ptr2).
- Memory Model: Memory Model models the objects allocated in the program.
For example, it computes the size of each object, the memory layout of each object, and it creates object nodes in the Constraint Graph.
Currently, we have two different memory models:
- Field-insensitive memory model: Field-insensitive memory model is a simple implementation of the memory model. It is relatively fast but also inaccurate. The field-insensitive memory model does not distinguish different fields of the object. For the following code snippet:
- Field-sensitive memory model: Field-sensitive memory model (still in process) overcomes the limitations of the field-insensitive memory model. It distinguishes different fields of the same object so that the ptr1 and ptr2 in the above example are not considered as aliases. It computes the memory layout (the offset of each field) and provides APIs to index into the complicated object to access the right fields.
- Andersen Solver: After the language model builds the constraint graph and the memory model models the object, the pointer analysis uses the solver to compute the points-to set of the pointer. The detailed description of the algorithm can be found via http://compilers.cs.ucla.edu/fernando/publications/papers/CGO09.pdf.
#
Field-SensitivitySince the implementation of Field Sensitivity is complex, we provide some additional information in the documentation.
Field-sensitive pointer analysis is achieved by handling the offset constraints in the above table, at a high level;
the constraint v <- &s.field
is satisfied by first index the object s.field
and then add the address of s.field
to the pts(v)
.
SolverBase.h: processOffset
implements the functionality.
For efficiency, the processOffset function maintains a cache (handledGEPMap) for GEP instructions to only handled diffed object indexing that haven’t been handled before.
The diffed object indexing is implementation as below:
After the diffed points-to set is computed, it iterates over the set using the following loop.
The fieldObject is computed by statement LMT::indexObject(this->getLangModel(), objNode, gep)
which index the object.
The indexObject method is implemented differently when using different memory model, in FSMemModel, the code is available in FSMemModel::indexObject()
We implemented a different scheme than the traditional field-sensitive pointer analysis for C/C++ which uses logic field index to index the object.
Our implementation instead uses physical layout offset to index objects.
At a high level, FSMemModel computes a MemLayout for different types.
For example, suppose sizeof(double *) == 4
, the memory layout for the following structure is a bitvector and is computed as below:
FSMemModel::indexObject()
method indices the memory layout to test whether the bit is set in the memory layout and return the corresponding object at the offset as following
It first use accumulate the offset of the gep by gep->accumulateConstantOffset(DL, offset)
and then it index the object
using auto result = obj->memBlock->getObjectAt(obj->pOffset + offset.getSExtValue())
.
If the supplied physical offset is invalid (can not be indexed), it returns nullptr
.
#
HeuristicsWill update whenever heuristics are discovered.
We explain our heuristic code in our pointer analysis in this section. Some heuristic decisions are removed, some are left to users to specify the desired values, and others are reserved in our code base.
#
User-Specified HeuristicsINDIRECT_OPTION
: Whether and how to add the resolved indirect function calls to the callgraphThere are three options:
For the usage of
WITH_LIMIT
, the limit is defined by variableMax_Indirect_Target
.Default value:
INDIRECT_OPTION = IndirectResolveOption::WITH_LIMIT
andMax_Indirect_Target = 999
Specified by user since commit 5c766da289c3b8cc1ba2df56ab8b0207a4bfc74e
How to specify: for example, if choosing
SKIP
:if choosing 'WITH_LIMIT':
ANON_REC_LIMIT
: Set the upperbound of the total size of recursively-created anonymous objects in a program- Default value: 999
- Specified by user since commit cf77af58709374a7ff5d8237bc0f9361d4b270e0
- How to specify: for example,
ANON_REC_DEPTH_LIMIT
: Set the upperbound of the depth of types considered for a recursively-created anonymous object in a program- Default value: 10
- Specified by user since commit cf77af58709374a7ff5d8237bc0f9361d4b270e0
- How to specify: for example,
#
Reserved HeuristicsHASH_EDGE_LIMIT
: Defines the size ofrequiredEdges
(of typellvm::BitVector
) which indexes the newly added copy edges by using the hash values of the two nodes involved in a copy edge. The maximum value of the hash value is 4,294,967,295 (as indicated bysize_t
, a.k.a.,unsigned long int
). As shown here, in order to save memory and improve performance, and also the size of newly added copy edges is limited for every iteration, we useHASH_EDGE_LIMIT
as the size forrequiredEdges
instead of the max value.- Default value: 1000032953
- The array heap allocation types might be inferred using heuristics
The assumption here is:
if the last element of the allocated structure is a zero-sized array, it is a structure type with flexible array elements.
Mainly used in functions
isStructWithFlexibleArray()
andallocStructArrayObjImpl()
#
Removed HeuristicsPTS_SIZE_LIMIT
: Set the upper bound of the size of a points-to set- Default value: 999
- Removed since commit 0113a0adda3bf00f50c72470f95ee6c4a8feb2cb