Design Overview
The description of this program is split into two parts.
The Flow of Execution describes how the input is processed step by step to produce an output, following the logical flow of the program.
The Project Layout describes the project by its physical layout, through short summaries for each subdirectory in the src folder.
Flow of Execution#
The logical entry point for the race detection functionality is in src/RaceDetect/RaceDetect.h. This function takes an LLVM IR module as input, and produces a race report as output.
The intermediate steps to produce the race report are:
- Construct a static program trace
- Preprocessing
- Pointer Analysis
- Race IR
- Run analyses
- Combine analyses to do race detection
- Generate report
Program Trace#
First, a ProgramTrace is constructed from the input IR. This trace represents a simulated execution of the program.
During the construction of the program trace, a number of things happen.
- PreProcessing transformations are run to make analyses easier
- a whole program PointerAnalysis is run on the llvm module
- the LLVM IR is converted to a higher level IR for easier analysis
See the Trace section for more detail on how the ProgramTrace is built.
Analysis#
Next, a number of analyses are run. Each takes the program trace as input and can then be queried for some property in the program trace.
The core analyses are:
- SharedMemory analysis finds events that could potentially access the same location in memory.
- HappensBefore analysis determines if two events could potentially happen in parallel.
- LockSet tracks what locks are held by an event during execution, and can be used to find out if two events share a common lock.
See the Analysis section for more details.
Race Detection#
After constructing the program trace and analyses, the detectRaces function combines everything to detect any races.
A simple race detection algorithm could combine the core analyses like so:
See the RaceDetect section.
Generate Report#
Finally, a report needs to be generated so that any detected races can be displayed to the user or reported to some tool.
See the Reporter section.
Project Layout#
This section describes code by each subdirectory in src. To get an idea of the flow of execution through the program, see the Logical Overview above.
The most important directories are:
IRfor converting the input into a format we can analyzePointerAnalysisfor our custom pointer analysis implementationLanguageModelfor how language features are recognizedTracefor generating a program TraceAnalysisfor our different analysesThe top level
srcdirectory also contains the main race detection logic inRaceDetect.h/cpp, andmain.cppwhich is the entry point to theracedetectexecutable.
Analysis#
This directory contains all of our different analyses. Each of them takes a program trace as input and then allows for some specific property of the program trace to be queried.
Each of the analyses is constructed using the ProgramTrace to be analyzed, and then queried for specific data.
Demangler#
This directory contains some common code for demangling C++ function names.
IR#
This folder contains interfaces describing logical operations (IR.h), implementations of those interfaces (IRImpls.h), and a function that takes an LLVM Function and constructs a list of logical operations (Builder.h).
In the context of race detection, most of the LLVM IR instructions are not needed.
Given an LLVM IR function that looks like this
Race detection only really needs to know the following
So the code in the IR directory focuses on extracting only the LLVM instructions needed for race detection, and wrapping them in a common set of interfaces (read, write, fork, join) that can be consumed more easily by later analyses.
The logical operation interfaces are wrappers around LLVM Instructions, and return values in terms of LLVM IR. Logical operations will later be converted to program events, which answer questions in terms of static program execution.
As an example, in the context of race detection, the only information we really need to know about a read operation, is what value is being read.
So, the interface for a logical read operation looks something like the following.
Many instructions can act as logical reads. A vector access, some external functions call, or just the LLVM IR load instruction.
Now to treat different types of llvm instructions as reads, we just need to define a derived class implementing the read interface.
Lastly, the Builder file contains functions to recognize and construct these concrete implementations from LLVM IR.
There is a loop that looks something like
Each branch in the loop call a "recognizer" function defined in the LanguageModel directory.
When a match is found, the corresponding concrete impl is constructed and added to the list of logical operations.
LanguageModel#
The LanguageModel directory contains code related to recognizing certain language or framework features in LLVM IR.
Most of the files in this directory recognize specific API calls. For example, pthread.h contains a list of "recognizers".
These recognizers are used to determine if the function being called is a certain API call (e..g. pthread create or join). The majority of the code in LanguageModel are recognizers like this for various libraries or frameworks.
The only exceptions are RaceModel.* files.
RaceModel#
RaceModel tells PointerAnalysis how to interpret certain function calls.
Given this example
Pointer analysis needs to know that the call to pthread_create will cause the entry function to be executed, and that the arg passed to the entry function points to the memory location of val.
The RaceModel class is used to extend PointerAnalysis with the information necessary to build a correct call graph and link function arguments.
The two functions that control this are:
interceptFunctionwhich decides how each function is expanded in the call graphinterceptCallSitewhich links the pointer from the call site to the called function
As any example, to tell Pointer Analysis that a call to pthread_create(t, NULL, entry, &val) should result in the entry function being visited next in the call graph, interceptFunction can be modified to include the following code.
To also link the pointer passed to the entry function (e.g. link&val to void *arg), the interceptCallSite can be updated accordingly.
Logging#
This directory contains the code used throughout the project for logging.
PointerAnalysis#
This documentation is still in progress.
PreProcessing#
This directory contains a number of passes and transformations that make the LLVM IR easier for our tool to analyze.
Reporter#
This directory contains the code responsible for collecting detected races and formatting them into some output.
Eventually this directory should contain logic for reporting races in a JSON format, but the reporter is still under development.
Trace#
Trace contains the logic for simulating execution and creating a static program trace.
The important code in this directory are Event, ThreadTrace, and ProgramTrace.
The Event contains a list of interfaces, similar to the IR.h file (see the IR section). Each Event interface wraps an IR type. Where IR interfaces return values in the context of LLVM IR, events return values in the context of the simulated execution.
As an example, look at the following interface for a read at IR and Event levels.
The ReadIR interface returns the llvm::Value being accessed, but the Event interface returns a list of abstract locations in memory that could potentially be accessed.
This is the key difference between IR and Events. IR interfaces are just wrappers around LLVM IR instructions, but Events represent actual program events and their simulated execution.
Lastly, to construct a list of simulated events (e.g., a trace), a function in ThreadTrace takes an entry function as input, traverses the Race IR, and builds the simulated trace for a given thread.
The ProgramTrace class is just a collection of thread traces representing all threads that ran during simulated program execution.