Data races are a class of concurrency errors in which two threads access a shared memory location without proper synchronization. Data races are hard to detect and debug. This paper presents RaceHunter, a dynamic data race detection technique that monitors executions of shared-memory concurrent programs, discovers pairs of conflicting memory accesses, and systematically checks them for data races. RaceHunter does not report false data races when the target software exploits nonstandard synchronization primitives or unknown synchronization protocols, and it can also detect data races missed by other approaches. Dynamic data race detectors can either monitor continuous, e.g., real-life, program executions or verify relatively short program executions, e.g., organized by system tests. The latter is the primary use case for RaceHunter.
INTRODUCTION
This paper is devoted to verification of multithreaded programs, i.e., concurrent programs with shared memory. In addition to errors inherent in sequential programs, multithreaded programs suffer from errors of new types. A data race is one of them.
Definition 1 (Data race). A data race is a situation in a multithreaded program in which two conflicting memory accesses are performed simultaneously, i.e., the memory accesses are not properly synchronized and are therefore unordered in a program execution.
Definition 2 (Conflicting memory accesses). Two memory accesses conflict if:
(1) they are performed by different threads;
(2) the accessed memory locations overlap;
(3) one of the accesses is writing;
(4) one of the accesses is nonatomic.
Data races can lead to incorrect computations and exceptions, which makes data race detection an important problem.
Currently, there are several approaches to solving this problem. They can be divided into two types: static analysis [1–3] and dynamic analysis [4–7]. Static approaches inspect program code without its execution. Dynamic approaches detect data races at runtime.
Static and dynamic approaches have their pros and cons. The key advantage of static analysis is a more thorough search through potential executions of a concurrent program. However, static approaches may generate false positives. With a large number of false positives, the detection of true data races becomes significantly more difficult.
The data race detection technique proposed in this paper belongs to the dynamic class. Among dynamic approaches, three directions can be conventionally distinguished.
1. Happens-before [8]: runtime monitoring of a concurrent program with partial order checks on a set of instruction executions. This approach is implemented, e.g., in TSan [4].
2. Lock-set: monitoring of a concurrent program with the check for intersection of the sets of held locks. This approach is implemented in Eraser [5].
3. Breakpoint-watchpoint: direct detection of data races by using breakpoints and watchpoints. This approach is implemented in DataCollider [6] and KCSAN [7].
All these techniques have their advantages and disadvantages. The method described in this paper expands the breakpoint-watchpoint approach, the key advantage of which is that it does not track synchronization events to detect data races. Hence, it can be applied to system software, e.g., operating systems and virtual machines, which often use synchronizations that do not satisfy lock-unlock semantics and/or for which it is difficult to automatically construct a complete happens-before relationship. For instance, these are memory barriers, disabled thread switching, and disabled interrupts.
Motivation
The breakpoint-watchpoint approach was originally implemented in DataCollider. This tool periodically sets breakpoints on a certain number of memory access instructions. Each instruction is selected randomly from the set of memory access instructions in the program. The breakpoint handler:
(1) reads the data accessed by the instruction;
(2) sets a watchpoint on the memory location accessed by the instruction;
(3) waits for a short period of time;
(4) removes the watchpoint;
(5) reads the data accessed by the instruction a second time;
(6) reports a data race if the watchpoint is triggered or the data are modified;
(7) randomly selects a memory access instruction and sets a breakpoint on it.
It is assumed that the watchpoint is triggered when a memory location that overlaps with the monitored one is accessed.
It can be seen that, in the DataCollider approach, the memory accesses on which breakpoints are set are chosen randomly. As a result, the tool may never select the memory access instructions involved in a data race and, therefore, may miss data races.
This approach was further developed in KCSAN. The tool implements breakpoints and watchpoints in software, based on static instrumentation. Before every read or write instruction, a call of a function from the KCSAN runtime library is inserted. Every Nth call of this function activates the breakpoint handler. Depending on the configuration, the breakpoint handler can change N; a new value of N is chosen randomly from a fixed interval.
In the KCSAN approach, the memory access instructions to be checked for data races are also randomly selected. Therefore, this modification of the breakpoint-watchpoint approach may also miss data races.
Hypothesis 1. When memory accesses to be checked for data races are not randomly selected and only conflicting memory accesses (see Definition 2) are systematically checked, it is possible to improve the efficiency of data race detection.
Structure of the Paper
The goal of this study is formulated in Section 2. The main stages of the RaceHunter approach are sequentially described in Section 3. The worst-case scenario of memory usage for RaceHunter is discussed in Section 4. The practical application of RaceHunter is considered in Section 5. Section 6 compares RaceHunter with existing data race detection techniques. The conclusions based on the results of this study are drawn in Section 7. Section 8 outlines some directions for future research. Appendix provides auxiliary functions for algorithms and formulas from the main sections.
PROBLEM STATEMENT
It is required to develop a dynamic approach to data race detection in multithreaded programs that:
• systematically checks conflicting memory accesses for data races;
• does not generate false positives when the target software exploits nonstandard synchronization primitives or unknown synchronization protocols.
RACEHUNTER APPROACH
Figure 1 shows the general scheme of the RaceHunter approach to data race detection. The main stages of the approach are as follows.
Fig. 1. General scheme of the RaceHunter approach.
1. Instrumentation. The compiler automatically inserts function calls from the RaceHunter runtime library into certain locations of the program code. This makes it possible to monitor and handle events at runtime.
2. Monitoring. At runtime, events such as memory access, function call, function termination, thread spawning, etc. are monitored and recorded for each thread. The monitoring result is an event trace.
3. Trace analysis. The result of this stage is a set of targets for data race checks, where the target is a pair of descriptors of conflicting memory accesses.
4. Data race provocation. For each target, the program is reexecuted while setting breakpoints on two memory accesses that correspond to the descriptors from the target. If one of them occurs, then its breakpoint handler starts waiting for the other one. If the second access occurs, then a data race is reported.
To provoke a data race, the program must be re-executed. It is assumed that repeated executions are not random events: they are attempts to reproduce the execution recorded at the stage of monitoring. For this purpose, it is necessary (but not sufficient):
(1) to execute the program in the same initial state (to use RaceHunter, we need to transfer the program to its initial state or rerun it);
(2) to use the same input data for the program.
Due to the variable rate of thread execution by CPU cores, the order in which threads execute the instructions, as well as the points of thread switching by the scheduler in the re-executions of the program, may differ. Therefore, even if two necessary conditions considered above are satisfied, the reexecution of a multithreaded program may differ from its initial execution. In what follows, we explain how RaceHunter takes into account this nondeterminism inherent in multithreaded programs.
Additional sources of nondeterminism, e.g., the use of random numbers and the nondeterministic behavior of external systems with which the program interacts, should, if possible, be eliminated (i.e., made deterministic) for the duration of verification to improve the efficiency of data race detection.
Instrumentation
RaceHunter monitors events in the program and updates its state as they occur. For this purpose, the RaceHunter code is inserted into the tested program.
1. At compile time, static instrumentation of memory access instructions, as well as entry and exit points of functions, is automatically carried out, i.e., calls of the corresponding functions from the RaceHunter runtime library are inserted immediately before these instructions. The code of this library must be linked with the code of the verified program.
The instrumentation is implemented at the level of LLVM IR [9] of the Clang compiler [10]. In general, the instrumentation is similar to that implemented in TSan [11] and other tools [12]. The main advantage of the current version of instrumentation in RaceHunter is that the instructions of the program are not modified; only new instructions (calls of functions from the RaceHunter runtime library) are added.
2. As an optimization, memory functions, e.g., memset, memcpy, memmove [13], etc., are automatically replaced with functions from the RaceHunter runtime library. This allows multiple memory accesses from these functions to be simulated with a single access. This optimization can be disabled if the instructions of the program must not be affected.
3. The code of some system functions is instrumented by hand. In particular, we insert a call to the RaceHunter OnThreadCreateEvent function into the code of a system function that spawns a thread.
Monitoring
At this stage, RaceHunter monitors a number of events in the program and records them in an event trace.
Designation 1 (Set of nonnegative integers). ℕ₀ = {0, 1, 2, …}.
Definition 3 (Memory access). A memory access is a tuple <I, D, L, R, W, A>, where
• I ∈ ℕ₀ is the address of a memory access instruction;
• D ∈ ℕ₀ is the address of the beginning of the accessed data segment;
• L ∈ ℕ₀ is the length (in bytes) of the accessed data segment;
• R is reading (1) or not (0);
• W is writing (1) or not (0);
• A is an atomic access (1) or not (0).
Designation 2 (Set of all possible memory accesses). The set of memory accesses that conform with Definition 3 is denoted by Λ.
Definition 4 (Function call). A function call is a tuple <I, F>, where
• I ∈ ℕ₀ is the address of a function call instruction;
• F ∈ ℕ₀ is the address of a callee function.
Designation 3 (Set of all possible function calls). Γ = {<I, F> : I, F ∈ ℕ₀}.
Definition 5 (Event). An event is a tuple <K, P>, where K ∈ 1..4 is the event type and P is the event body: a memory access (by Definition 3) for K = 1 or a function call (by Definition 4) for K = 2. The semantics of values of K is as follows: 1 is memory access, 2 is function call, 3 is function termination, and 4 is thread spawning.
Designation 4 (Set of all possible events). The set of events that conform with Definition 5 is denoted by Σ.
Definition 6 (Thread trace). A thread trace is a tuple <U, E, C>, where
• U is a thread identifier (ID) by Definition 8;
• E is a finite sequence of events (by Definition 5) in a thread;
• C ∈ ℕ₀ is the number of spawned threads.
Definition 7 (Program trace). A program trace (Trace) is a finite set of thread traces by Definition 6.
Definitions 5 and 6 use the thread ID. A correct thread ID must satisfy the following requirements:
• there must not be two threads with the same ID in any execution of the program;
• in the re-executions of the program at the stage of data race provocation, the thread must have the same ID.
The RaceHunter runtime library provides an interface that can be used to instruct the tool on how to calculate the ID of the current thread. For instance, we can use the address of the thread start function as its ID if the program cannot have two threads with the same start function.
By default, RaceHunter uses its internal method for thread identification. This method assumes that the program is initially executed with one thread and any thread can spawn new threads.
Definition 8 (Default thread ID). A thread ID is a partial function U : ℕ₀ → ℕ₀ that simulates a finite sequence of natural numbers: Dom(U) = 0..|U| − 1, where |U| is the length of the sequence.
Definition 9 (Initial thread ID). The initial thread of the program has the empty ID: Dom(U₀) = ∅.
Algorithm 1 handles the event of spawning one thread by another thread immediately before the thread is actually spawned in the program. Handling of other events generally comes down to adding information about them into the thread trace in which these events occur.
Algorithm 1. Handling the thread spawning event at the monitoring stage.
Input: the ID u of the current thread;
a program trace Trace by Definition 7.
Output: an updated program trace. The program trace is accessed by all threads in parallel. The other variables of the algorithm are local.
1: t ← the thread trace from Trace with ID u {Trace of current thread}
2: u′ ← append(u, t.C) {ID generation for spawned thread}
3: t′ ← <u′, <>, 0> {Spawned thread does not yet have events and has not yet spawned other threads}
4: t″ ← <t.U, t.E, t.C + 1> {New trace of current thread}
5: Trace ← (Trace \ {t}) ∪ {t″, t′} {Program trace update}
Statement 1. Algorithm 1 generates a program trace in which all thread traces have different IDs.
Proof. The thread ID calculated by Algorithm 1 corresponds to a unique path in the genealogical tree from the initial thread to the spawned thread. A node in this tree is the ordinal number of a spawn of a thread by its parent.
Remark 1. In this study, we select and enumerate steps in concurrent algorithms. Suppose that an algorithm step is an atomic action that can change the state of the algorithm. For instance, Algorithm 1 has exactly five steps. Suppose also that the concurrent execution of algorithm steps by threads is sequentially consistent [14]. The semantics of concurrent algorithm execution that we use corresponds to that in the PlusCal language [15].
Trace Analysis
The goal of trace analysis is to detect conflicting pairs of memory accesses and make them targets of a data race check. To find conflicting memory accesses, RaceHunter uses Definition 2. Let us formalize it.
Definition 10 (Conflicting memory accesses). Memory access a1 from thread trace t1 and memory access a2 from thread trace t2 are in conflict if the conjunction of the following conditions holds:
(1) t1.U ≠ t2.U;
(2) a1.D < a2.D + a2.L ∧ a2.D < a1.D + a1.L;
(3) a1.W = 1 ∨ a2.W = 1;
(4) a1.A = 0 ∨ a2.A = 0.
However, the information on the memory access (by Definition 3) is not sufficient to identify the target memory access during program re-execution at the stage of data race provocation. For instance, the target memory access instruction (a1.I, a2.I) can be executed several times and by several threads. Moreover, the equality of the data addresses (a1.D, a2.D) across different executions is generally not guaranteed.
The detection of target memory accesses is additionally complicated by the fact that the re-execution of a multithreaded program may not be identical to what was observed during the monitoring stage. Hence, the descriptor of the target memory access must allow for variations in program execution.
Definition 11 (Memory access descriptor). A memory access descriptor is a tuple <U, B, I, S, N>, where
• U is a thread ID by Definition 8;
• B is 0 (false) if the target thread can have any thread ID or 1 (true) if the ID should be equal to U;
• I ∈ ℕ₀ is the address of a memory access instruction;
• S is the top of the function call stack during the memory access (the function called last is the first element of the stack); Dom(S) = ∅ means any function call stack;
• N ∈ ℕ₀ is the ordinal number of a memory access in a thread trace.
Definition 11 implies that N enumerates the memory accesses that satisfy the remaining conditions, i.e., the memory accesses from a thread with ID U that are performed by an instruction with address I when the thread’s stack is S.
Designation 5. The set of all possible memory access descriptors by Definition 11 is denoted by Ψ.
Definition 12 (Target of data race check). The target (Target) is a tuple <D1 ∈ Ψ, D2 ∈ Ψ>, which includes a pair of descriptors (by Definition 11) for memory accesses (by Definition 3) from a program trace (by Definition 7) that are in conflict (by Definition 10).
Algorithm 2. Construction of a memory access descriptor.
Input: the ID u (by Definition 8) of a thread that performs the memory access;
the ordinal number i of a memory access in the trace of the thread with ID u;
a program trace by Definition 7.
Output: a memory access descriptor d by Definition 11.
1: p ← the memory accesses in the program trace performed by instruction d.I {Accesses performed by d.I}
2: s ← the function call stacks at the accesses from p {Function call stacks for accesses from p}
3: d.S ← the shortest stack top from s that distinguishes access i from the other accesses in p {Stack top that distinguishes i from others}
4: d.N ← the number of memory accesses performed by instruction d.I (with the thread ID and stack top matching d) before access i {Memory accesses performed by instruction d.I before access i}
Algorithm 2 constructs a memory access descriptor by Definition 11. Its main idea is to add each next portion of data to the descriptor only when that is necessary to identify the target memory access; the only information that is always required is the address of the memory access instruction.
Remark 2. Algorithm 2 decomposes the problem of target memory access identification into two subproblems: identifying the thread that performed the target memory access and identifying the memory access within that thread. For certain applications, it is difficult to select IDs for threads that persist across re-executions. In that case, both Algorithm 2 itself and the memory access descriptor (Definition 11) can be modified.
The algorithm for constructing the resulting set of targets (the algorithm for finding all pairs by Definition 12) is not presented in this paper. Conceptually, this algorithm receives a program trace (by Definition 7) as an input, searches for pairs of memory accesses (K = 1 by Definition 5) that are in conflict (by Definition 10) in this trace, and constructs a descriptor (by Definition 11) for each access from the pair.
Data Race Provocation
For each target from the set of targets constructed based on the results of the trace analysis, the program is re-executed. The data race provocation consists in organizing simultaneous memory accesses specified in the target.
Definition 13 (Thread state). The state of a thread is a tuple <U, S, C, G>, where
• U is a thread ID by Definition 8;
• S is the call stack of the thread;
• C : Ψ → ℕ₀ is the number of detected memory accesses that correspond to a descriptor;
• G ∈ ℕ₀ is the number of spawned threads.
Definition 14 (Program state). The state of a program is a tuple <T, M, P, R, X>, where:
• T is a finite set of thread states by Definition 13;
• M : Ψ → Λ is a function that returns a memory access by its descriptor, i.e., a memory access detected during the data race provocation stage that matches the descriptor;
• P : Ψ → {false, true} indicates whether the memory access corresponding to the descriptor was performed (true) or not (false);
• R ∈ {0, 1, ⊥} is a data race decision, where 0 means no data race detected, 1 means a data race detected, and ⊥ means the decision has not yet been made;
• X is a timeout.
The stage of data race provocation is implemented by a set of event handlers. The memory access event handler is represented by Algorithm 4, while the handlers of other events are included in Algorithm 3. It is assumed that the handled events occur in concurrent threads. The semantics of the concurrent execution of the event handlers is formulated in Remark 1.
Algorithm 3. Event handling at the stage of data race provocation. | |
Input: the ID u of the current thread by Definition 8; | |
an event e by Definition 5; | |
a target by Definition 12; | |
a program state ps by Definition 14. | |
Output: a new program state ps. The program state is updated in parallel by all threads. The other variables of the algorithm are local. | |
1: | |
2: | if {Function call} then |
3: | {Adding call to call stack} |
4: | |
5: | else if {Function termination} then |
6: | {Removing last call from stack} |
7: | |
8: | else if {Thread spawning} then |
9: | {ID generation for spawned thread} |
10: | |
11: | {New state of current thread} |
12: | {Initial state of spawned thread} |
13: | |
14: | end if |
Algorithms 3 and 4 start in the initial state of the program ps:
• ps.T = {<U₀, <>, {(Target.D1, 0), (Target.D2, 0)}, 0>} is the initial state of the initial program thread (U₀ is the initial thread ID by Definition 9);
• ps.M = {(Target.D1, <0, 0, 0, 0, 0, 0>), (Target.D2, <0, 0, 0, 0, 0, 0>)};
• ps.P = {(Target.D1, false), (Target.D2, false)}; ps.R = ⊥;
• ps.X is a positive integer.
In Algorithm 4, the timeout is simulated by a loop that increments the counter until it reaches the value of ps.X (lines 13–15). In implementation, this could be something else, e.g., a time wait or an event wait.
On the one hand, ps.X must be large enough to wait for the second memory access. On the other hand, an excessive wait unnecessarily increases verification time. Suppose that we have a function that returns the current system time. Then, we can measure the duration of execution of any section of the code, including the monitoring stage and the loop in lines 13–15. Let m be the duration of the monitoring stage, and let l be the measured duration of the loop i = 0; while (i < c) {i++;}, where c is a big constant, e.g., 10^7. Then ps.X can be chosen as c · k · m / l, where k is a coefficient showing how much slower checking for a target at the provocation stage is, on average, than saving events at the monitoring stage, e.g., 1.5.
It should be noted that the correct calculation of ps.X implies taking into account the configuration of the computing system, e.g., disabled CPU caches during the measurement.
Algorithm 4. Memory access event handling at the stage of data race provocation. | |
Input: the ID u of the current thread by Definition 8; | |
an event e by Definition 5; | |
a target by Definition 12; | |
a program state ps by Definition 14. | |
Output: a new program state ps. The program state is updated in parallel by all threads. The other variables of the algorithm are local. | |
1: | |
2: | if {Memory access} then |
3: | for all {Target tuple is function } do |
4: | |
5: | if |
{Access corresponds to descriptor d1} then | |
6: | {Counter increment} |
7: | {New state of current thread} |
8: | {Program trace update} |
9: | if {Ordinal number of access matches } then |
10: | {First time target access detected} |
{Target access saved} | |
{Save that access is detected} | |
{Program trace update} | |
else | |
go to 3 {Check access for correspondence to or terminate} | |
end if | |
11: | {Data race} |
{Save in trace that data race is detected} | |
go to 20 {Termination: data race detected} | |
12: | {Counter of access wait loop for d2} |
13: | while |
) do | |
14: | |
15: | end while |
16: | {There is no data race} |
17: | end if |
18: | end if |
19: | end for |
20: | end if |
ESTIMATION OF THE PROGRAM TRACE SIZE IN THE WORST CASE
Suppose that b bits are sufficient to store an integer. We calculate only the size of the useful data without taking into account the memory required to organize the data structure.
By Definition 7, a program trace is a finite set of thread traces. A thread trace includes three elements: a thread ID by Definition 8, an integer counter of spawned threads, and a sequence of thread events.
Suppose that n threads are executed. The maximum amount of memory for storing their thread IDs is reached when each thread spawns exactly one thread, so that the IDs form a chain of lengths 0, 1, …, n − 1: b(0 + 1 + ⋯ + (n − 1)) = b·n(n − 1)/2 bits in total. Obviously, n·b bits are sufficient to store the counters of spawned threads.
By Definition 5, there are only four types of events. Only three of them need to be stored in the program trace: memory access (1), function call (2), and function termination (3). Thus, to store an event type with three possible values, two bits are sufficient.
Let us consider a memory access event by Definition 5. There is no need to store the type of an operation (read or write) and atomicity in the program trace for each memory access. This information can be retrieved at the instrumentation stage from LLVM IR, being associated with an instruction address in machine code and then with a memory access in the program trace.
The length of the accessed data segment has to be computed at runtime only in the case where multiple memory accesses from memset, memcpy, memmove, and other functions are simulated by one memory access. As this is an optimization, let us disregard it here. The operand size of most regular load and store instructions is known at the instrumentation stage. Therefore, the length of the accessed data segment need not be saved for a memory access event in the program trace.
Thus, to store a memory access event by Definition 5, it is sufficient to have 2b + 2 bits per instruction address, data address, and event type. To store a function call event, it is sufficient to have 2b + 2 bits per function call address, callee function address, and event type.
Assuming that a function termination event in a program trace is associated with the previous function call event, we can conclude that two bits (event type) are sufficient to store a function termination event.
Thus, in the worst case, b·n(n − 1)/2 + n·b + m(2b + 2) + k(2b + 4) bits are sufficient to store the trace of a program execution with n threads, m memory accesses, and k function calls (and terminations).
EXPERIMENTS WITH RACEHUNTER
A large-scale experimental study to confirm the effectiveness of RaceHunter, as well as the industrial application of the tool, lies ahead. The goal of the experiments carried out in this work is to verify, in practice, RaceHunter's ability to detect data races.
As the target software for the experiments, the real-time operating system [16] was chosen. This operating system implements the ARINC 653 standard, which specifies the requirements for the programming interface provided by the operating system to applications. In particular, this programming interface includes a set of functions for inter-thread communication. In the experiments, an error was introduced into the implementation of these functions to provoke a data race, and it was checked whether RaceHunter can detect this data race.
For each experiment, a test was developed. In the test, two threads with the same priority were concurrently executed by different CPU cores. One thread sent a message, while another thread waited indefinitely for the message to arrive. The tests were designed in such a way that two memory access instructions involved in a data race were actually executed.
Experiment 1. A global nonatomic integer variable is declared. In the implementation of the message sending function, a value is written to a variable. In the implementation of the message receiving function, the value of the variable is read. The variable is accessed without mutual exclusion. Result: Data race detected.
Experiment 2. In the implementation of the message receiving function, the operation of holding a synchronization primitive is relocated in such a way that one of the shared variables is read without mutual exclusion. Result: Data race detected.
Experiment 3. In the implementation of the message sending function, the operation of holding a synchronization primitive is relocated in such a way that one of the shared variables is read without mutual exclusion. Result: Data race detected.
Table 1 shows the main verification statistics for RaceHunter obtained in the experiments. The operating system was run in the qemu emulator [17]. A system-on-chip with four 100 MHz cores and 128 MB RAM was emulated. The average number of memory access and function call events per test is small because the test executions were very short and we did not instrument the OS kernel, targeting RaceHunter at data races in the user-space code.
Table 1. Some statistical indicators of verification (average values)
Parameter | Value |
|---|---|
Number of threads in a test | 4 |
Number of memory accesses | 696 |
Number of function calls | 129 |
Number of pairs of conflicting memory accesses | 50 |
Memory used by RaceHunter | ∼14 KB |
Test execution time without RaceHunter | ∼2 s |
Verification time with RaceHunter | ∼93 s |
It should be noted that the current version of RaceHunter was developed to test the new approach to data race detection. Thus, it can be considered a working prototype, which differs from an industrial version in its suboptimal memory usage and algorithms for which the most important criterion was ease of implementation.
COMPARISON WITH EXISTING APPROACHES
RaceHunter is a dynamic analysis tool developed to detect data races missed by the existing breakpoint-watchpoint approaches by systematically checking every conflicting pair of memory accesses. Moreover, RaceHunter is able to detect data races missed by the lock-set and happens-before approaches.
Statement 2. There are data races that the lock-set approach may miss, but RaceHunter detects.
Proof. To prove the statement, it is sufficient to present a program with the corresponding data race. The program in Listing 1 has a data race at the address of variable a.
Listing 1. Pseudocode of a program with a data race.

int a = 0, b = 0;
mutex ma, mb;

void t1(void) {
  lock(&ma);
  a = 1;
  unlock(&ma);
  lock(&mb);
  b = 1;
  unlock(&mb);
}

void t2(void) {
  lock(&mb);
  if (b == 1) {
    unlock(&mb);
    lock(&ma);
    a = 2;
    unlock(&ma);
  } else {
    unlock(&mb);
    a = 3;
  }
}
The lock-set approach may miss this data race in the sequential execution of threads t1, t2 because, in this case, t2 executes the branch where a is accessed while holding a lock.
RaceHunter detects this race in the sequential execution of t1, t2 because it detects the accesses to a from two threads at the monitoring stage and waits before b = 1 at the stage of data race provocation, which leads to the fact that the branch where a is accessed without holding a lock is executed in t2.
Statement 3. There are data races that the happens-before approach may miss, but RaceHunter detects.
Proof. To prove the statement, it is sufficient to present a program with the corresponding data race. The program in Listing 2 has a data race at the address of variable a.
Listing 2. Pseudocode of a program with a data race.

int a = 0;
atomic int b = 0;
atomic int c = 2;

void t1(void) {
  a = 1;
  b = 1;
}

void t2(void) {
  if (b != c)
    a = 2;
}
The happens-before approach may miss this data race in the sequential execution of threads t1, t2. In this execution, the pair (t1 : b = 1, t2 : b != c) belongs to happens-before. In addition, (t1 : a = 1, t1 : b = 1) ∈ happens-before and (t2 : b != c, t2 : a = 2) ∈ happens-before according to the program order. Hence, in the sequential execution of threads t1, t2, (t1 : a = 1, t2 : a = 2) ∈ happens-before by transitivity, and the race is not reported.
RaceHunter detects this race in the sequential execution of t1, t2 because it successfully waits for the access a = 2 before b = 1 is executed: while thread t1 is suspended at a = 1, thread t2 reads b = 0 ≠ 2 = c and performs the conflicting access a = 2.
RaceHunter’s improved ability to detect data races is due to the increase in verification time (mainly due to re-executions) and unlimited memory usage at the monitoring stage. Table 2 compares the key features of the dynamic approaches to data race detection, where RH is RaceHunter, HB is happens-before, LS is lock-set, and BW is breakpoint-watchpoint.
Table 2. RaceHunter versus other dynamic data race detection techniques
Feature | RH | HB | LS | BW |
|---|---|---|---|---|
Does not generate false positives due to unobserved synchronization | + | – | – | + |
Does not miss data races due to random memory access checks | + | + | + | – |
Uses a fixed amount of memory | – | + | + | + |
Does not waste time on program re-execution | – | + | + | + |
Detects data races missed by other approaches | + | + | + | + |
There is an analogy between the RaceHunter approach and fuzzing [18–20]:
• the fuzzer is initialized with some set of input data; in turn, RaceHunter is applied to a certain set of executions of a multithreaded program;
• the fuzzer mutates the input data; in turn, RaceHunter provokes new executions of a multithreaded program.
However, unlike RaceHunter, the fuzzer adds mutated inputs to the initial set of input data to produce new inputs based on them.
The idea of the two-stage dynamic data race detection technique, whereby possible data races are detected in the first execution and are confirmed by replay in subsequent executions, was previously formulated and implemented in [21]. The second-stage algorithm in the method [21]:
• organizes a sequential order of instruction execution in the program: a controllable thread scheduler randomly selects the thread that executes the next instruction;
• identifies memory accesses based only on the program statement that performs them; that is why it suspends each thread before executing either of the two target program statements (those on which a data race was detected at the first stage) until the data race is confirmed (a conflict with one of the previously suspended memory accesses is revealed) or until all threads terminate.
Random selection of the thread to execute the next instruction and the suspension of threads before each execution of the two target statements may interfere with reproducing the execution observed at the first stage and, therefore, may prevent the data race from being reproduced. On the other hand, checking all memory accesses performed by the target statements can be more efficient than the RaceHunter approach in cases where the memory access descriptor of Definition 11 (or some other definition) fails to identify the target memory access during the replay because of nondeterministic program execution.
In addition, the control over the thread scheduler at the second stage of the method [21] significantly impairs its scaling for long program executions or a large number of executions.
The method [21] was originally developed for shared memory concurrent Java programs. Later, in [22], the authors implemented a similar method for distributed shared memory concurrent programs.
In the method [22], memory accesses are also identified only by the program statement that executes them. The second-stage algorithm differs from the method [21]. Tasks on computational nodes are executed freely, except for the processing of two target memory access statements. The handler sequentially
1. performs a data race check with one of previously suspended tasks;
2. if the check does not confirm the data race, then the handler suspends the current task for a short period of time (more precisely, the task is suspended with a certain probability). By suspending a task, the handler reduces the probability of suspending subsequent tasks.
The second-stage algorithm in the method [22] may also not confirm data races detected at the first stage, because target memory accesses are not uniquely identified, delays are introduced that are not aimed at replaying the execution obtained at the first stage, and tasks are suspended on target statements with a certain probability (rather than always).
Like the base method [21], the method [22] can be more efficient than RaceHunter in cases where the memory access descriptor of Definition 11 fails to identify the target memory access at the data race provocation stage due to nondeterministic program execution. Meanwhile, reducing the probability of memory access checks at the second stage of the method [22] is aimed at improving its scalability.
CONCLUSIONS
In this paper, we have proposed RaceHunter, a new technique for dynamic detection of data races. RaceHunter extends the approach to direct data race detection using breakpoints and watchpoints, which was previously implemented in DataCollider [6] and KCSAN [7], with the idea of systematic identification and check of conflicting memory accesses.
The stage of checking pairs of conflicting memory accesses for data races (the stage of data race provocation) differs from the methods [21] and [22] in its unique way of identifying the target pair of memory accesses during replays, while taking into account the possible nondeterministic execution of a multithreaded program by using the memory access descriptor.
RaceHunter does not generate false positives when the target software exploits nonstandard synchronization primitives or unknown synchronization protocols, which is why it can be applied to software with any thread synchronization methods. This is especially important for system software, e.g., operating systems and virtual machines. In particular, RaceHunter was applied to the real-time operating system [16].
RaceHunter’s improved ability to detect data races is due, among other things, to program re-executions, which, however, increase verification time.
In addition, RaceHunter does not support fixed memory usage, which is why it is not suitable for very long (in the extreme case, infinite) monitoring of program execution. RaceHunter is more suitable for use in continuous integration and development systems in conjunction with tests that involve concurrent thread execution.
DIRECTIONS FOR FURTHER RESEARCH
It seems important to experimentally investigate the effectiveness of RaceHunter in comparison with existing dynamic approaches to data race detection, especially with [7] and [22].
In the current version of RaceHunter, pairs of conflicting memory accesses are detected by Definition 10. The use of a thread analysis similar to [23], a lock analysis similar to [5], etc. can potentially reduce the number of targets for the data race provocation stage and, therefore, speed up verification. For these types of analysis, it is reasonable to estimate potential verification speedup and investigate their impact on the efficiency of data race detection.
RaceHunter needs to re-execute the program during the data race provocation stage. It seems reasonable to investigate the compatibility of existing multithreaded execution replay methods [24–27] with RaceHunter.
FUNDING
This work was supported by ongoing institutional funding. No additional grants to carry out or direct this particular research were obtained.
CONFLICT OF INTEREST
The author of this work declares that he has no conflicts of interest.
It is assumed that the threads are executed freely, in parallel, which does not exclude that, in one of the executions, thread t1 ends before thread t2 starts.
Translated by Yu. Kornienko
REFERENCES
1 Naik, M., Aiken, A., and Whaley, J., Effective static race detection for Java, Proc. 27th ACM SIGPLAN Conf. Programming Language Design and Implementation, 2006, pp. 308–319.
2 Andrianov, P. and Mutilin, V., Scalable thread-modular approach for data race detection, Proc. Int. Workshop Frontiers in Software Engineering Education, 2019, pp. 371–385.
3 Pratikakis, P., Foster, J.S., and Hicks, M., Locksmith: Practical static race detection for C, ACM Trans. Program. Lang. Syst., 2011, vol. 33, pp. 1–55. https://doi.org/10.1145/1889997.1890000
4 Serebryany, K. and Iskhodzhanov, T., Threadsanitizer: Data race detection in practice, Proc. Workshop Binary Instrumentation and Applications, 2009, pp. 62–71. https://doi.org/10.1145/1791194.1791203
5 Savage, S., Burrows, M., Nelson, G., Sobalvarro, P., and Anderson, T., Eraser: A dynamic data race detector for multithreaded programs, ACM Trans. Comput. Syst., 1997, pp. 391–411. https://doi.org/10.1145/265924.265927
6 Erickson, J., Musuvathi, M., Burckhardt, S., and Olynyk, K., Effective data-race detection for the kernel, Proc. 9th USENIX Symp. Operating Systems Design and Implementation (OSDI), 2010.
7 Kernel Concurrency Sanitizer. https://docs.kernel.org/dev-tools/kcsan.html. Accessed October 26, 2023.
8 Lamport, L., Time, clocks, and the ordering of events in a distributed system, 2019. https://doi.org/10.1145/3335772.3335934
9 LLVM language reference manual. https://llvm.org/docs/LangRef.html. Accessed November 20, 2023.
10 Clang: A C language family frontend for LLVM. https://clang.llvm.org. Accessed November 20, 2023.
11 Serebryany, K., Potapenko, A., Iskhodzhanov, T., and Vyukov, D., Dynamic race detection with LLVM compiler: Compile-time instrumentation for ThreadSanitizer, Proc. Int. Conf. Runtime Verification, pp. 110–114.
12 Marino, D., Musuvathi, M., and Narayanasamy, S., LiteRace: Effective sampling for lightweight data-race detection, Proc. 30th ACM SIGPLAN Conf. Programming Language Design and Implementation, 2009, pp. 134–143.
13 Portable Operating System Interface (POSIX) Base Specifications, 2009.
14 Lamport, L., How to make a multiprocessor computer that correctly executes multiprocess programs, IEEE Trans. Comput., 1979, pp. 690–691. https://doi.org/10.1109/TC.1979.1675439
15 Lamport, L., The PlusCal algorithm language, 2009. https://doi.org/10.1007/978-3-642-03466-4_2
16 Cheptsov, V. and Khoroshilov, A., Robust resource partitioning approach for ARINC 653 RTOS.
17 Bellard, F., Qemu, a fast and portable dynamic translator, Proc. USENIX Annu. Technical Conf., California, USA, 2005, vol. 41, p. 46.
18 Holler, C., Herzig, K., and Zeller, A., Fuzzing with code fragments, Proc. 21st USENIX Security Symp., 2012, pp. 445–458.
19 libFuzzer – A library for coverage-guided fuzz testing. https://llvm.org/docs/LibFuzzer.html. Accessed October 26, 2023.
20 Sargsyan, S., Hakobyan, J., Mehrabyan, M., Mishechkin, M., Akozin, V., and Kurmangaleev, S., ISP-Fuzzer: Extendable fuzzing framework, Proc. Ivannikov Memorial Workshop (IVMEM), 2019, pp. 68–71. https://doi.org/10.1109/IVMEM.2019.00017
21 Sen, K., Race directed random testing of concurrent programs, Proc. 29th ACM SIGPLAN Conf. Programming Language Design and Implementation, 2008, pp. 11–21.
22 Park, C.-S., Sen, K., Hargrove, P., and Iancu, C., Efficient data race detection for distributed memory parallel programs, Proc. Int. Conf. High Performance Computing, Networking, Storage and Analysis, 2011, pp. 1–12.
23 Mellor-Crummey, J., On-the-fly detection of data races for programs with nested fork-join parallelism, Proc. ACM/IEEE Conf. Supercomputing, 1991, pp. 24–33.
24 LeBlanc, T.J., Debugging parallel programs with instant replay, IEEE Trans. Comput., 1987, pp. 471–482. https://doi.org/10.1109/TC.1987.1676929
25 Dovgalyuk, P., Deterministic replay of system’s execution with multi-target QEMU simulator for dynamic analysis and reverse debugging, 2012. https://doi.org/10.1109/CSMR.2012.74
26 Netzer, R.H., Optimal tracing and replay for debugging shared-memory parallel programs, Proc. ACM/ONR Workshop Parallel and Distributed Debugging, 1993, pp. 1–11.
27 Bacon, D.F. and Goldstein, S.C., Hardware-assisted replay of multiprocessor programs, ACM SIGPLAN Not., 1991, vol. 26, pp. 194–206. https://doi.org/10.1145/127695.122777
© Pleiades Publishing, Ltd. 2024. ISSN 0361-7688, Programming and Computer Software, 2024, Vol. 50, No. 6, pp. 467–481. © Pleiades Publishing, Ltd., 2024. Russian Text © The Author(s), 2024, published in Trudy ISP RAN/Proc. ISP RAS, 2023, Vol. 35, No. 6.