International Conference on Computer Aided Verification
CAV 2023: Computer Aided Verification, pp. 265–287
Model Checking Race-Freedom When “Sequential Consistency for Data-Race-Free Programs” is Guaranteed
- Wenhao Wu (ORCID: orcid.org/0000-0002-9087-4240),
- Jan Hückelheim (ORCID: orcid.org/0000-0003-3479-6361),
- Paul D. Hovland (ORCID: orcid.org/0000-0002-0907-2567),
- Ziqing Luo (ORCID: orcid.org/0000-0001-6557-3692) &
- Stephen F. Siegel (ORCID: orcid.org/0000-0001-9359-3332)
- Conference paper
- Open Access
- First Online: 18 July 2023
Part of the Lecture Notes in Computer Science book series (LNCS, volume 13965)
Many parallel programming models guarantee that if all sequentially consistent (SC) executions of a program are free of data races, then all executions of the program will appear to be sequentially consistent. This greatly simplifies reasoning about the program, but leaves open the question of how to verify that all SC executions are race-free. In this paper, we show that with a few simple modifications, model checking can be an effective tool for verifying race-freedom. We explore this technique on a suite of C programs parallelized with OpenMP.
Keywords: model checking

1 Introduction
Every multithreaded programming language requires a memory model to specify the values a thread may obtain when reading a variable. The simplest such model is sequential consistency [ 22 ]. In this model, an execution is an interleaved sequence of the execution steps from each thread. The value read at any point is the last value that was written to the variable in this sequence.
There is no known efficient way to implement a full sequentially consistent model. One reason for this is that many standard compiler optimizations are invalid under this model. Because of this, most multithreaded programming languages (including language extensions) impose a requirement that programs do not have data races . A data race occurs when two threads access the same variable without appropriate synchronization, and at least one access is a write. (The notion of appropriate synchronization depends on the specific language.) For data race-free programs, most standard compiler optimizations remain valid. The Pthreads library is a typical example, in that programs with data races have no defined behavior, but race-free programs are guaranteed to behave in a sequentially consistent manner [ 25 ].
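For instance, the following C/Pthreads program, which is our illustration rather than an example from the cited standards, has a data race: the two threads update the shared variable x without holding a common lock, so its behavior is undefined. Guarding both updates with a single mutex would make it race-free, and hence sequentially consistent.

#include <pthread.h>
#include <stdio.h>

int x = 0;                       /* shared variable */

void *inc(void *arg) {
  x++;                           /* unsynchronized read-modify-write:
                                    conflicts with the other thread */
  return NULL;
}

int main(void) {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, inc, NULL);
  pthread_create(&t2, NULL, inc, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  printf("%d\n", x);             /* racy: the behavior is undefined */
  return 0;
}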
Modern languages use more complex “relaxed” memory models. In these models, an execution is not a single sequence, but a set of events together with various relations on those events. These relations—e.g., sequenced before , modification order , synchronizes with , dependency-ordered before , happens before [ 21 ]—must satisfy a set of complex constraints spelled out in the language specification. The complexity of these models is such that only the most sophisticated users can be expected to understand and apply them correctly. Fortunately, these models usually provide an escape, in the form of a substantial and useful language subset which is guaranteed to behave sequentially consistently, as long as the program is race-free. Examples include Java [ 23 ], C and C++ since their 2011 versions (see [ 8 ] and [ 21 , §5.1.2.4 Note 19]), and OpenMP [ 26 , §1.4.6].
The “guarantee” mentioned above actually consists of two parts: (1) all executions of data race-free programs in the language subset are sequentially consistent, and (2) if a program in the language subset has a data race, then it has a sequentially consistent execution with a data race [ 8 ]. Putting these together, we have, for any program P in the language subset:
(SC4DRF) If all sequentially consistent executions of P are data race-free, then all executions of P are sequentially consistent.
The consequence of this is that the programmer need only understand sequentially consistent semantics, both when trying to ensure P is race-free, and when reasoning about other aspects of the correctness of P . This approach provides an effective compromise between usability and efficient implementation.
Still, it is the programmer’s responsibility to ensure that all sequentially consistent executions of the program are race-free. Unfortunately, this problem is undecidable [ 4 ], so no completely algorithmic solution exists. As a practical matter, detecting and eliminating races is considered one of the most challenging aspects of parallel program development. One source of difficulty is that compilers may “miscompile” racy programs, i.e., translate them in unintuitive, non-semantics-preserving ways [ 7 ]. After all, if the source program has a race, the language standard imposes no constraints, so any output from the compiler is technically correct.
Researchers have explored various techniques for race checking. Dynamic analysis tools (e.g., [ 18 ]) have experienced the most uptake. These techniques can analyze a single execution precisely, and report whether a race occurred, and sometimes can draw conclusions about closely related executions. But the behavior of many concurrent programs depends on the program input, or on specific thread interleavings, and dynamic techniques cannot explore all possible behaviors. Moreover, dynamic techniques necessarily analyze the behavior of the executable code that results from compilation. As explained above, racy programs may be miscompiled, even possibly removing the race, in which case a dynamic analysis is of limited use.
Approaches based on static analysis, in contrast, have the potential to verify race-freedom. This is extremely challenging, though some promising research prototypes have been developed (e.g., [ 10 ]). The most significant limitation is imprecision: a tool may report that race-free code has a possible race— a “false alarm”. Some static approaches are also not sound, i.e., they may fail to detect a race in a racy program; like dynamic tools, these approaches are used more as bug hunters than verifiers.
Finite-state model checking [ 15 ] offers an interesting compromise. This approach requires a finite-state model of the program, which is usually achieved by placing small bounds on the number of threads, the size of inputs, or other program parameters. The reachable states of the model can be explored through explicit enumeration or other means. This can be used to implement a sound and precise race analysis of the model. If a race is found, detailed information can be produced, such as a program trace highlighting the two conflicting memory accesses. Of course, if the analysis concludes the model is race-free, it is still possible that a race exists for larger parameter values. In this case, one can increase those values and re-run the analysis until time or computational resources are exhausted. If one accepts the “small scope hypothesis”—the claim that most defects manifest in small configurations of a system—then model checking can at least provide strong evidence for the absence of data races. In any case, the results provide specific information on the scope that is guaranteed to be race-free, which can be used to guide testing or further analysis.
The main limitation of model checking is state explosion, and one of the most effective techniques for limiting state explosion is partial order reduction (POR) [ 17 ]. A typical POR technique is based on the following observation: if, at a state s , a thread t is at a “local” statement—i.e., one which commutes with all statements from other threads—then it is often not necessary to explore all enabled transitions from s ; instead, the search can explore only the enabled transitions from t . Usually local statements are those that access only thread-local variables. But if the program is known to be race-free, shared variable accesses can also be considered “local” for POR. This is the essential observation at the heart of recent work on POR in the verification of Pthreads programs [ 29 ].
In this paper, we explore a new model checking technique that can be used to verify race-freedom, as well as other correctness properties, for programs in which threads synchronize through locks and barriers. The approach requires two simple modifications to the standard state reachability algorithm. First, each thread maintains a history of the memory locations accessed since its last synchronization operation. These sets are examined for races and emptied at specific synchronization points. Second, a novel POR is used in which only lock (release and acquire) operations are considered non-local. In Sect. 2 , we present a precise mathematical formulation of the technique and a theorem that it has the claimed properties, including that it is sound and precise for verification of race-freedom of finite-state models.
Using the CIVL symbolic execution and model checking platform [ 31 ], we have implemented a prototype tool, based on the new technique, for verifying race-freedom in C/OpenMP programs. OpenMP is an increasingly popular directive-based language for writing multithreaded programs in C, C++, or Fortran. A large sub-language of OpenMP has the SC4DRF guarantee. While the theoretical model deals with locks and barriers, it can be applied to many OpenMP constructs that can be modeled using those primitives, such as atomic operations and critical sections. This is explained in Sect. 3 , along with the results of some experiments applying our tool to a suite of C/OpenMP programs. In Sect. 4 , we discuss related work and Sect. 5 concludes.
2 Theory
We begin with a simple mathematical model of a multithreaded program that uses locks and barriers for synchronization.
Definition 1
Let \(\textsf {TID}\) be a finite set of positive integers. A multithreaded program with thread ID set \(\textsf {TID}\) comprises
a set \(\textsf {Lock}\) of locks
a set \(\textsf {Shared}\) of shared states
for each \(i\in \textsf {TID}\) :
a set \(\textsf {Local}_i\) , the local states of thread i , which is the union of five disjoint subsets, \(\textsf {Acquire}_i\) , \(\textsf {Release}_i\) , \(\textsf {Barrier}_i\) , \(\textsf {Nsync}_i\) , and \(\textsf {Term}_i\)
a set \(\textsf {Stmt}_i\) of statements , which includes the lock statements \(\textsf {acquire}_i(l)\) and \(\textsf {release}_i(l)\) (for \(l\in \textsf {Lock}\) ), and the barrier-exit statement \(\textsf {exit}_i\) ; all other statements are known as nsync (non-synchronization) statements
for each \(\sigma \in \textsf {Acquire}_i\cup \textsf {Release}_i\cup \textsf {Barrier}_i\) , a local state \(\textsf {next}(\sigma )\in \textsf {Local}_i\)
for each \(\sigma \in \textsf {Acquire}_i\cup \textsf {Release}_i\) , a lock \(\textsf {lock}(\sigma )\in \textsf {Lock}\)
for each \(\sigma \in \textsf {Nsync}_i\) , a nonempty set \(\textsf {stmts}(\sigma )\subseteq \textsf {Stmt}_i\) of nsync statements and a function \(\textsf {update}(\sigma ):\textsf {stmts}(\sigma )\times \textsf {Shared}\rightarrow \textsf {Local}_i\times \textsf {Shared}\)
All of the sets \(\textsf {Local}_i\) and \(\textsf {Stmt}_i\) ( \(i\in \textsf {TID}\) ) are pairwise disjoint. \(\square \)
Each thread has a unique thread ID number, an element of \(\textsf {TID}\) . A local state for thread i encodes the values of all thread-local variables, including the program counter. A shared state encodes the values of all shared variables. (Locks are not considered shared variables.) A thread at an acquire state \(\sigma \) is attempting to acquire the lock \(\textsf {lock}(\sigma )\) . At a release state, the thread is about to release a lock. At a barrier state, a thread is waiting inside a barrier. After executing one of the three operations, each thread moves to a unique next local state. A thread that reaches a terminal state has terminated. From an nsync state, any positive number of statements are enabled, and each of these statements may read and update the local state of the thread and/or the shared state.
For \(i\in \textsf {TID}\) , the local graph of thread i is the directed graph with nodes \(\textsf {Local}_i\) and an edge \(\sigma \rightarrow \sigma '\) if either (i) \(\sigma \in \textsf {Acquire}_i\cup \textsf {Release}_i\cup \textsf {Barrier}_i\) and \(\sigma '=\textsf {next}(\sigma )\) , or (ii) \(\sigma \in \textsf {Nsync}_i\) and there is some \(\zeta '\in \textsf {Shared}\) such that \((\sigma ',\zeta ')\) is in the image of \(\textsf {update}(\sigma )\) .
Fix a multithreaded program P and let

\( \textsf {LockState} = (\textsf {TID}\cup \{0\})^{\textsf {Lock}}, \qquad \textsf {State} = \Bigl (\prod _{i\in \textsf {TID}}\textsf {Local}_i\Bigr )\times \textsf {Shared}\times \textsf {LockState}\times 2^{\textsf {TID}}. \)
A lock state specifies the owner of each lock. The owner is a thread ID, or 0 if the lock is free. The elements of \(\textsf {State}\) are the (global) states of P . A state specifies a local state for each thread, a shared state, a lock state, and the set of threads that are currently blocked at a barrier.
Let \(i\in \textsf {TID}\) and \(L_i=\textsf {Local}_i \times \textsf {Shared}\times \textsf {LockState}\times 2^{\textsf {TID}}\) . Define \(\textsf {enabled}_i:L_i\rightarrow 2^{\textsf {Stmt}_i}\) by

\( \textsf {enabled}_i(\lambda ) = \begin{cases} \{\textsf {acquire}_i(\textsf {lock}(\sigma ))\} & \text {if } \sigma \in \textsf {Acquire}_i \text { and } \theta (\textsf {lock}(\sigma ))=0\\ \{\textsf {release}_i(\textsf {lock}(\sigma ))\} & \text {if } \sigma \in \textsf {Release}_i \text { and } \theta (\textsf {lock}(\sigma ))=i\\ \{\textsf {exit}_i\} & \text {if } \sigma \in \textsf {Barrier}_i \text { and } i\not \in w\\ \textsf {stmts}(\sigma ) & \text {if } \sigma \in \textsf {Nsync}_i\\ \emptyset & \text {otherwise} \end{cases} \)

where \(\lambda =(\sigma ,\zeta ,\theta ,w)\in L_i\) . This function returns the set of statements that are enabled in thread i at a given state. This function does not depend on the local states of threads other than i , which is why those are excluded from \(L_i\) . An acquire statement is enabled if the lock is free; a release is enabled if the calling thread owns the lock. A barrier exit is enabled if the thread is not currently in the barrier blocked set.
Execution of an enabled statement in thread i updates the state as follows:

where \(\lambda =(\sigma ,\zeta ,\theta ,w)\) ; in each case above, state components that are not mentioned are unchanged.
Note a thread arriving at a barrier will have its ID added to the barrier blocked set, unless it is the last thread to arrive, in which case all threads are released from the barrier.
At a given state, the set of enabled statements is the union over all threads of the enabled statements in that thread: \( \textsf {enabled}(s) = \bigcup _{i\in \textsf {TID}}\textsf {enabled}_i(\xi _i,\zeta ,\theta ,w). \) Execution of a statement updates the state as above, leaving the local states of other threads untouched:

\( \textsf {execute}(s,t) = \langle \xi [i\mapsto \sigma ], \zeta ', \theta ', w'\rangle , \)

where \(s=\langle \xi , \zeta , \theta , w \rangle \in \textsf {State}\) , \(t\in \textsf {enabled}(s)\) , \(i=\textsf {tid}(t)\) , \(\xi [i\mapsto \sigma ]\) denotes \(\xi \) with its i -th component replaced by \(\sigma \) , and \(\textsf {execute}_{i}(\xi _{i}, \zeta , \theta , w, t) = \langle \sigma , \zeta ', \theta ',w'\rangle \) .
Definition 2
A transition is a triple \(s{\mathop {\rightarrow }\limits ^{t}}s'\) , where \(s\in \textsf {State}\) , \(t\in \textsf {enabled}(s)\) , and \(s'=\textsf {execute}(s,t)\) . An execution \(\alpha \) of P is a (finite or infinite) chain of transitions \( s_0{\mathop {\rightarrow }\limits ^{t_1}}s_1{\mathop {\rightarrow }\limits ^{t_2}}\cdots \) . The length of \(\alpha \) , denoted \(|\alpha |\) , is the number of transitions in \(\alpha \) . \(\square \)
Note that an execution is completely determined by its initial state \(s_0\) and its statement sequence \(t_1t_2\cdots \) .
Having specified the semantics of the computational model, we now turn to the concept of the data race . The traditional definition requires the notion of “conflicting” accesses: two accesses to the same memory location conflict when at least one is a write. The following abstracts this notion:
Definition 3
A symmetric binary relation conflict on \(\textsf {Stmt}\) is a conflict relation for P if the following hold for all \(t_1,t_2\in \textsf {Stmt}\) :
if \((t_1,t_2)\in \textsf {conflict}\) then \(t_1\) and \(t_2\) are nsync statements from different threads
if \(t_1\) and \(t_2\) are nsync statements from different threads and \((t_1,t_2)\not \in \textsf {conflict}\) , then for all \(s\in \textsf {State}\) , if \(t_1,t_2\in \textsf {enabled}(s)\) then \( \textsf {execute}(\textsf {execute}(s,t_1),t_2) = \textsf {execute}(\textsf {execute}(s,t_2),t_1). \) \(\square \)
Fix a conflict relation for P for the remainder of this section.
The next ingredient in the definition of data race is the happens-before relation. This is a relation on the set of events generated by an execution. An event is an element of \(\textsf {Event}=\textsf {Stmt}\times \mathbb {N}\) .
Definition 4
Let \(\alpha = (s_0{\mathop {\rightarrow }\limits ^{t_1}}s_1{\mathop {\rightarrow }\limits ^{t_2}}\cdots )\) be an execution. The trace of \(\alpha \) is the sequence of events \(\textsf {tr}(\alpha )=\langle t_1,n_1\rangle \langle t_2,n_2\rangle \cdots \) , of length \(|\alpha |\) , where \(n_i\) is the number of \(j\in [1,i]\) for which \(\textsf {tid}(t_j)=\textsf {tid}(t_i)\) . We write \([\alpha ]\) for the set of events occurring in \(\textsf {tr}(\alpha )\) . \(\square \)
A trace labels the statements executed by a thread with consecutive integers starting from 1. Note the cardinality of \([\alpha ]\) is \(|\alpha |\) , as no two events in \(\textsf {tr}(\alpha )\) are equal. Also, \([\alpha ]\) is invariant under transposition of two adjacent commuting transitions from different threads.
Given an execution \(\alpha \) , the happens-before relation of \(\alpha \) , denoted \(\textsf {HB}(\alpha )\) , is a binary relation on \([\alpha ]\) . It is the transitive closure of the union of three relations:
the intra-thread order relation \( \{ (\langle t,n\rangle ,\langle t',n'\rangle )\in [\alpha ]\times [\alpha ] \mid \textsf {tid}(t)=\textsf {tid}(t') \;\text {and}\; n<n' \} \)
the release-acquire relation. Say \(\textsf {tr}(\alpha )=e_1e_2\ldots \) and \(e_i=\langle t_i,n_i\rangle \) . Then \((e_i,e_j)\) is in the release-acquire relation if there is some \(l\in \textsf {Lock}\) such that all of the following hold: (i) \(1\le i<j\le |\alpha |\) , (ii) \(t_i\) is a release statement on l , (iii) \(t_j\) is an acquire statement on l , and (iv) whenever \(i<k<j\) , \(t_k\) is not an acquire statement on l .
the barrier relation. For any \(e=\langle t,n\rangle \in [\alpha ]\) , let \(i=\textsf {tid}(t)\) and define \(\textsf {nexit}(e)\) to be the number of barrier exit events in thread i preceding or including e . The barrier relation is \( \{ (e,e')\in [\alpha ]\times [\alpha ] \mid \textsf {nexit}(e)<\textsf {nexit}(e') \}. \)
Two events “race” when they conflict but are not ordered by happens-before:
Definition 5
Let \(\alpha \) be an execution and \(e,e'\in [\alpha ]\) . Say \(e=\langle t,n\rangle \) and \(e'=\langle t',n'\rangle \) . We say e and \(e'\) race in \(\alpha \) if \((t,t')\in \textsf {conflict}\) and neither \((e,e')\) nor \((e',e)\) is in \(\textsf {HB}(\alpha )\) . The data race relation of \(\alpha \) is the symmetric binary relation on \([\alpha ]\) given by \( \textsf {DR}(\alpha ) = \{ (e,e')\in [\alpha ]\times [\alpha ]\mid e \; \text {and}\; e'\; \text {race in}\; \alpha \}. \) \(\square \)
Now we turn to the problem of detecting data races. Our approach is to explore a modified state space. The usual state space is a directed graph with node set \(\textsf {State}\) and transitions for edges. We make two modifications. First, we add some “history” to the state. Specifically, each thread records the nsync statements it has executed since its last lock event or barrier exit. This set is checked against those of other threads for conflicts, just before it is emptied after its next lock event or barrier exit. The second change is a reduction: any state that has an enabled statement that is not a lock statement will have outgoing edges from only one thread in the modified graph.
A well-known technical challenge with partial order reduction concerns cycles in the reduced state space. We deal with this challenge by assuming that P comes with some additional information. Specifically, for each i , we are given a set \(R_i\) , with \(\textsf {Release}_i\cup \textsf {Acquire}_i\subseteq R_i\subseteq \textsf {Local}_i\) , satisfying: any cycle in the local graph of thread i has at least one node in \(R_i\) . In general, the smaller \(R_i\) , the more effective the reduction. In many application domains, there are no cycles in the local graphs, so one can take \(R_i=\textsf {Release}_i\cup \textsf {Acquire}_i\) . For example, standard for loops in C, in which the loop variable is incremented by a fixed amount at each iteration, do not introduce cycles, because the loop variable will take on a new value at each iteration. For while loops, one may choose one node from the loop body to be in \(R_i\) . Goto statements may also introduce cycles and could require additions to \(R_i\) .
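For instance (our illustration, in C), the for loop below creates no cycle in the local graph, while the busy-wait loop does, so one node of its body must be placed in \(R_i\) :

int done = 0;                   /* shared flag, set by another thread */

void thread_body(void) {
  int a[10];
  for (int j = 0; j < 10; j++)  /* no cycle: each iteration yields a new local
                                   state, since j is part of the local state */
    a[j] = j;
  while (!done)                 /* revisits the same local state, creating a
                                   cycle: place one node of this loop in R_i */
    ;                           /* spin */
}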
Definition 6
The race-detecting state graph for P is the pair \(G=(V,E)\) , where

\( V = \textsf {State}\times \prod _{i\in \textsf {TID}}2^{\textsf {Stmt}_i} \)

and \(E\subseteq V\times \textsf {Stmt}\times V\) consists of all \((\langle s,\textbf{a}\rangle , t, \langle s',\textbf{a}'\rangle )\) such that, letting \(\sigma _i\) be the local state of thread i in s ,
\(s{\mathop {\rightarrow }\limits ^{t}}s'\) is a transition in P
\(\forall i\in \textsf {TID}\) , \( \textbf{a}'_i= \begin{cases} \textbf{a}_i\cup \{t\} & \text {if } t \text { is an nsync statement in thread } i\\ \emptyset & \text {if } t=\textsf {exit}_0 \text { or } (i=\textsf {tid}(t)\wedge \sigma _i\in R_i)\\ \textbf{a}_i & \text {otherwise} \end{cases} \)
if there is some \(i\in \textsf {TID}\) such that \(\sigma _i\not \in R_i\) and thread i has an enabled statement at s , then \(\textsf {tid}(t)\) is the minimal such i . \(\square \)
The race-detecting state graph may be thought of as a directed graph in which the nodes are V and edges are labeled by statements. Note that at a state where all threads are in the barrier, \(\textsf {exit}_0\) is the only enabled statement in the race-detecting state graph, and its execution results in emptying all the \(\textbf{a}_i\) . A lock event in thread i results in emptying \(\textbf{a}_i\) only.
Definition 7
Let P be a multithreaded program and \(G=(V,E)\) the race-detecting state graph for P .
Let \(u=\langle s,\textbf{a}\rangle \in V\) and \(i\in \textsf {TID}\) . We say thread i detects a race in u if there exist \(j\in \textsf {TID}\setminus \{i\}\) , \(t_1\in \textbf{a}_i\) , and \(t_2\in \textbf{a}_j\) such that \((t_1,t_2)\in \textsf {conflict}\) .
Let \(e=v{\mathop {\rightarrow }\limits ^{t}}v'\in E\) , \(i=\textsf {tid}(t)\) , \(\sigma \) the local state of thread i at v , and \(\sigma '\) the local state of thread i at \(v'\) . We say e detects a race if either (i) \(\sigma \in R_i\setminus \textsf {Acquire}_i\) and thread i detects a race in v , (ii) \(\sigma '\in \textsf {Acquire}_i\) and thread i detects a race in \(v'\) , or (iii) \(t=\textsf {exit}_0\) and any thread detects a race in v .
We say \(G\) detects a race from u if E contains an edge that is reachable from u and detects a race, or there is some \(v=\langle s,\textbf{a}\rangle \in V\) that is reachable from u , and \(i\in \textsf {TID}\) , such that \(\textsf {enabled}(s)=\emptyset \) and thread i detects a race in v . \(\square \)
Definition 7 suggests a method for detecting data races in a multithreaded program. The nodes and edges of the race-detecting state graph reachable from an initial node are explored. (The order in which they are explored is irrelevant.) When an edge from a thread at an \(R_i\setminus \textsf {Acquire}_i\) state is executed, the elements of \(\textbf{a}_i\) are compared with those in \(\textbf{a}_j\) for all \(j\in \textsf {TID}\setminus \{i\}\) to see if a conflict exists, and if so, a data race is reported. When an edge in thread i terminates at an \(\textsf {Acquire}_i\) state, a similar race check takes place. When an \(\textsf {exit}_0\) occurs, or a node with no outgoing edges is reached, \(\textbf{a}_i\) and \(\textbf{a}_j\) are compared for all \(i, j\in \textsf {TID}\) with \(i\ne j\) . This approach is sound and precise in the following sense:
Theorem 1
Let P be a multithreaded program, and \(G=(V,E)\) the race-detecting state graph for P . Let \(s_0\in \textsf {State}\) and let \(u_0=\langle s_0, \emptyset ^{\textsf {TID}}\rangle \in V\) . Assume the set of nodes reachable from \(u_0\) is finite. Then
P has an execution from \(s_0\) with a data race if, and only if, \(G\) detects a race from \(u_0\) .
If there is a data race-free execution of P from \(s_0\) to some state \(s_f\) with \(\textsf {enabled}(s_f)=\emptyset \) then there is a path in \(G\) from \(u_0\) to a node with state component \(s_f\) .
A proof of Theorem 1 is given in https://arxiv.org/abs/2305.18198 .
Consider a 2-threaded program of the following form, represented in pseudocode; the listing is our reconstruction from the discussion below, in which each thread writes a shared variable x while holding its own, distinct lock:
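Thread 1:             Thread 2:
  acquire(l1);          acquire(l2);
  x := 1;               x := 2;
  release(l1);          release(l2);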
where \(l_1\) and \(l_2\) are distinct locks. Let \(R_i=\textsf {Release}_i\cup \textsf {Acquire}_i\) ( \(i=1,2\) ). One path in the race-detecting state graph G executes thread 1 to completion and then thread 2 to completion. A data race occurs on this path, since the two assignments conflict but are not ordered by happens-before. The race is not detected, however: at each lock operation, the statement set of the other thread is empty. But there is another path in G , in which both assignments execute before thread 1 releases \(l_1\) , and on this path the race is detected at the release.
3 Implementation and Evaluation
We implemented a verification tool for C/OpenMP programs using the CIVL symbolic execution and model checking framework. This tool can be used to verify absence of data races within bounds on certain program parameters, such as input sizes and the number of threads. (Bounds are necessary so that the number of states is finite.) The tool accepts a C/OpenMP program and transforms it into CIVL-C, the intermediate verification language of CIVL. The CIVL-C program has a state space similar to the race-detecting state graph described in Sect. 2 . The standard CIVL verifier, which uses model checking and symbolic execution techniques, is applied to the transformed code and reports whether the given program has a data race, and, if so, provides precise information on the variable involved in the race and an execution leading to the race.
The approach is based on the theory of Sect. 2 , but differs in some implementation details. For example, in the theoretical approach, a thread records the set of non-synchronization statements executed since the thread’s last synchronization operation. This data is used only to determine whether a conflict took place between two threads. Any type of data that can answer this question would work equally well. In our implementation, each thread instead records the set of memory locations read, and the set of memory locations modified, since the last synchronization. A conflict occurs if the read or write set of one thread intersects the write set of another thread. As CIVL-C provides robust support for tracking memory accesses, this approach is relatively straightforward to implement by a program transformation.
In Sect. 3.1 , we summarize the basics of OpenMP. In Sect. 3.2 , we provide the necessary background on CIVL-C and the primitives used in the transformation. In Sect. 3.3 , we describe the transformation itself. In Sect. 3.4 , we report the results of experiments using this tool.
All software and other artifacts necessary to reproduce the experiments, as well as the full results, are included in a VirtualBox virtual machine available at https://doi.org/10.5281/zenodo.7978348 .
3.1 Background on OpenMP
OpenMP is a pragma-based language for parallelizing programs written in C, C++ and Fortran [ 13 ]. OpenMP was originally designed and is still most commonly used for shared-memory parallelization on CPUs, although the language is evolving and supports an increasing number of parallelization styles and hardware targets. We introduce here the OpenMP features that are currently supported by our implementation in CIVL. An example that uses many of these features is shown in Fig. 1 .

Fig. 1. OpenMP example.
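The figure itself is not reproduced here; the following small program (our illustration, in the same spirit) exercises several of the supported features: a parallel region, a worksharing loop with a reduction, a critical section, an atomic update, and a barrier.

#include <stdio.h>

int main(void) {
  int n = 100, sum = 0, max = 0, count = 0;
  int a[100];
  #pragma omp parallel shared(a, sum, max, count)
  {
    #pragma omp for reduction(+:sum)
    for (int i = 0; i < n; i++) {
      a[i] = i;                     /* distinct elements: no race */
      sum += a[i];                  /* reduction: each thread sums privately */
    }
    /* implicit barrier at the end of the for construct */
    #pragma omp critical
    if (a[n-1] > max) max = a[n-1]; /* mutual exclusion via critical */
    #pragma omp atomic
    count++;                        /* atomic update of a shared counter */
    #pragma omp barrier             /* explicit barrier */
  }
  printf("sum=%d max=%d count=%d\n", sum, max, count);
  return 0;
}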
3.2 Background on CIVL-C
The CIVL framework includes a front-end for preprocessing, parsing, and building an AST for a C program. It also provides an API for transforming the AST. We used this API to build a tool which consumes a C/OpenMP program and produces a CIVL-C “model” of the program. The CIVL-C language includes most of sequential C, including functions, recursion, pointers, structs, and dynamically allocated memory. It adds nested function definitions and primitives for concurrency and verification.
In CIVL-C, a thread is created by spawning a function: \({\texttt {\$spawn \,f(...);}}\) . There is no special syntax for shared or thread-local variables; any variable that is in scope for two threads is shared. CIVL-C uses an interleaving model of concurrency similar to the formal model of Sect. 2 . Simple statements, such as assignments, execute in one atomic step.
Threads can synchronize using guarded commands , which have the form \({\texttt {\$when (e)}\, S}\) . The first atomic substatement of S is guaranteed to execute only from a state in which e evaluates to true . For example, assume thread IDs are numbered from 0, and a lock value of \(-1\) indicates the lock is free. The acquire lock operation may be implemented as $when (l<0) l=tid; , where l is an integer shared variable and tid is the thread ID. A release is simply l=-1; .
A convenient way to spawn a set of threads is \(\texttt {\$parfor\, (int }\,i{:}d{)}\,S\) . This spawns one thread for each element of the 1d-domain d ; each thread executes S with i bound to one element of the domain. A 1d-domain is just a set of integers; e.g., if a and b are integer expressions, the domain expression a .. b represents the set \(\{a,a+1,\ldots , b\}\) . The thread that invokes the \(\texttt {\$parfor}\) is blocked until all of the spawned threads terminate, at which point the spawned threads are destroyed and the original thread proceeds.
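Putting these primitives together, a CIVL-C model in which n threads increment a shared counter under a lock might look as follows; this is our sketch based only on the primitives just described, not output of the translator:

int n = 4;                    /* number of threads */
int counter = 0;              /* shared: in scope for all spawned threads */
int l = -1;                   /* lock: -1 means free, otherwise the owner's ID */

void run(int tid) {
  $when (l < 0) l = tid;      /* acquire the lock */
  counter++;                  /* protected access to the shared variable */
  l = -1;                     /* release the lock */
}

int main(void) {
  $parfor (int i : 0 .. n-1)  /* one thread per element of the 1d-domain */
    run(i);
  /* all spawned threads have terminated here; counter == n */
  return 0;
}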
CIVL-C provides primitives to constrain the interleaving semantics of a program. The program state has a single atomic lock, initially free. At any state, if there is a thread t that owns the atomic lock, only t is enabled. When the atomic lock is free, if there is some thread at a \(\texttt {\$local\texttt {\_}{}start}\) statement, and the first statement following \(\texttt {\$local\texttt {\_}{}start}\) is enabled, then among such threads, the thread with lowest ID is the only enabled thread; that thread executes \(\texttt {\$local\texttt {\_}{}start}\) and obtains the lock. When t invokes \(\texttt {\$local\texttt {\_}{}end}\) , t relinquishes the atomic lock. Intuitively, this specifies a block of code to be executed atomically by one thread, and also declares that the block should be treated as a local statement, in the sense that it is not necessary to explore all interleavings from the state where the local is enabled.
Local blocks can also be broken up at specified points using function \(\texttt {\$yield}\) . If t owns the atomic lock and calls \(\texttt {\$yield}\) , then t relinquishes the lock and does not immediately return from the call. When the atomic lock is free, there is no thread at a \(\texttt {\$local\texttt {\_}{}start}\) , a thread t is in a \(\texttt {\$yield}\) , and the first statement following the \(\texttt {\$yield}\) is enabled, then t may return from the \(\texttt {\$yield}\) call and re-obtain the atomic lock. This mechanism can be used to implement the race-detecting state graph: thread i begins with \(\texttt {\$local\texttt {\_}{}start}\) , yields at each \(R_i\) node, and ends with \(\texttt {\$local\texttt {\_}{}end}\) .
CIVL’s standard library provides a number of additional primitives. For example, the concurrency library provides a barrier implementation through a type \(\texttt {\$barrier}\) , and functions to initialize, destroy, and invoke the barrier.
The mem library provides primitives for tracking the sets of memory locations (a variable, an element of an array, field of a struct, etc.) read or modified through a region of code. The type \(\texttt {\$mem}\) is an abstraction representing a set of memory locations, or mem-set . The state of a CIVL-C thread includes a stack of mem-sets for writes and a stack for reads. Both stacks are initially empty. The function \(\texttt {\$write\texttt {\_}{}set\texttt {\_}{}push}\) pushes a new empty mem-set onto the write stack. At any point when a memory location is modified, the location is added to the top entry on the write stack. Function \(\texttt {\$write\texttt {\_}{}set\texttt {\_}{}pop}\) pops the write stack, returning the top mem-set. The corresponding functions for the read stack are \(\texttt {\$read\texttt {\_}{}set\texttt {\_}{}push}\) and \(\texttt {\$read\texttt {\_}{}set\texttt {\_}{}pop}\) . The library also provides various operations on mem-sets, such as \(\texttt {\$mem\texttt {\_}{}disjoint}\) , which consumes two mem-sets and returns true if the intersection of the two mem-sets is empty.
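For example, the following fragment is a sketch of these primitives in use (we assume CIVL-C's $assert primitive here); it checks that two blocks of code write disjoint locations:

void example(void) {
  int x = 0, y = 0;
  $mem w1, w2;

  $write_set_push();              /* start tracking writes */
  x = 1;                          /* x is added to the top write mem-set */
  w1 = $write_set_pop();          /* w1 is the mem-set { x } */

  $write_set_push();
  y = 2;
  w2 = $write_set_pop();          /* w2 is the mem-set { y } */

  $assert($mem_disjoint(w1, w2)); /* holds: the blocks wrote disjoint sets */
}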

Fig. 2. Translation of #pragma omp parallel S.
3.3 Transformation for Data Race Detection
The basic structure for the transformation of a parallel construct is shown in Fig. 2 . The user specifies on the command line the default number of threads to use in a parallel region. After this, two shared arrays are allocated, one to record the read set for each thread, and the other the write set. Rather than updating these arrays immediately with each read and write event, a thread updates them only at specific points, in such a way that the shared sets are current whenever a data race check is performed.
The auxiliary function check_conflict asserts no read-write or write-write conflict exists between threads i and j . Function check_and_clear_all checks that no conflict exists between any two threads and clears the shared mem-sets.
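A sketch of the generated structure, reconstructed from this description, follows; the names match those used in this section, while the $input declaration and the loop that clears the sets are our assumptions about details not spelled out above:

$input int nthreads;                      /* default thread count, from the command line */
$mem reads[nthreads], writes[nthreads];   /* shared per-thread access sets */

void check_conflict(int i, int j) {       /* no read-write or write-write conflict */
  $assert($mem_disjoint(reads[i], writes[j]));
  $assert($mem_disjoint(writes[i], writes[j]));
  $assert($mem_disjoint(writes[i], reads[j]));
}

void check_and_clear_all(void) {
  for (int i = 0; i < nthreads; i++)
    for (int j = i + 1; j < nthreads; j++)
      check_conflict(i, j);
  /* ...reset every reads[i] and writes[i] to the empty mem-set... */
}

void run(int tid) {
  /* local copies of private variables declared (and initialized) here */
  $local_start();                         /* body is a local region for POR */
  $read_set_push();                       /* turn on access tracking */
  $write_set_push();
  /* ...transformed body S... */
  $read_set_pop();
  $write_set_pop();
  $local_end();
}

/* at the site of "#pragma omp parallel S": */
$parfor (int i : 0 .. nthreads-1)
  run(i);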
Each thread executes function run . A local copy of each private variable is declared (and, for firstprivate variables, initialized) here. The body of this function is enclosed in a local region. The thread begins by pushing new entries onto its read and write stacks. As explained in Sect. 3.2 , this turns on memory access tracking. The body S is transformed in several ways. First, references to the private variable are replaced by references to the local copy. Other OpenMP constructs are translated as follows.
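For locks, an acquire of a lock l (encoded as in Sect. 3.2 , with tid the executing thread's ID) becomes a sequence along the following lines; the listing is our reconstruction from the description in the next paragraph:

reads[tid] = $read_set_pop();             /* update this thread's shared mem-sets */
writes[tid] = $write_set_pop();
for (int j = 0; j < nthreads; j++)        /* race check on arrival at the acquire */
  if (j != tid) check_conflict(tid, j);
$yield();                                 /* let other threads run while waiting */
$when (l < 0) l = tid;                    /* acquire lock l */
$read_set_push();                         /* resume tracking with empty sets */
$write_set_push();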

The thread first pops its stacks, updating its shared mem-sets. At this point, the shared structures are up-to-date, and the thread uses them to check for conflicts with other threads. This conforms with Definition 7 (2), that a race check occur upon arrival at an acquire location. It then yields to other threads as it attempts to acquire lock \(l\) . Once acquired, it pushes new empty entries onto its stack and resumes tracking. A release statement is translated similarly; the following is again our reconstruction:
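reads[tid] = $read_set_pop();
writes[tid] = $write_set_pop();
l = -1;                                   /* release lock l */
$yield();
for (int j = 0; j < nthreads; j++)        /* race check upon leaving the release */
  if (j != tid) check_conflict(tid, j);
$read_set_push();
$write_set_push();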

It is similar to the acquire case, except that the check occurs upon leaving the release location, i.e., after the yield. A similar sequence is inserted in any loop (e.g., a while loop or a for loop not in standard form) that may create a cycle in the local space, only without the release statement.
Barriers. An explicit or implicit barrier in S is translated as follows; the listing is our reconstruction, with b denoting the \(\texttt {\$barrier}\) object introduced by the transformation:
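reads[tid] = $read_set_pop();             /* publish this thread's accesses */
writes[tid] = $write_set_pop();
$local_end();                             /* $barrier_call may block */
$barrier_call(b);                         /* wait for all threads to arrive */
if (tid == 0)
  check_and_clear_all();                  /* check all pairs, clear all sets */
$barrier_call(b);                         /* no thread races ahead of the check */
$local_start();
$read_set_push();
$write_set_push();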

The CIVL-C \(\texttt {\$barrier\texttt {\_}{}call}\) function must be invoked outside of a local region, as it may block. Once all threads are in the barrier, a single thread (0) checks for conflicts and clears all the shared mem-sets. A second barrier call is used to prevent other threads from racing ahead before this check and clear is complete. This protocol mimics the events that take place atomically with an \(\textsf {exit}_0\) transition in Sect. 2 .
Atomic and Critical Sections. An OpenMP atomic construct is modeled by introducing a global “atomic lock” which is acquired before executing the atomic statement and then released. The acquire and release actions are then transformed as described above. Similarly, a lock is introduced for each critical section name (and the anonymous critical section); this lock is acquired before entering a critical section with that name and released when departing.
Worksharing Constructs. Upon arriving at a for construct, a thread invokes a function that returns the set of iterations for which the thread is responsible. The partitioning of the iteration space among the threads is controlled by the construct clauses and various command line options. If the construct specifies the distribution strategy precisely, then the model uses only that distribution. If the construct does not specify the distribution, then the decisions are based on command line options. One option is to explore all possible distributions. In this case, when the first thread arrives, a series of nondeterministic choices is made to construct an arbitrary distribution. The verifier explores all possible choices, and therefore all possible distributions. This enables a complete analysis of the loop’s execution space, but at the expense of a combinatorial explosion with the number of threads or iterations. A different command line option allows the user to specify a particular default distribution strategy, such as cyclic . These options give the user some control over the completeness-tractability tradeoff. For sections , only cyclic distribution is currently supported, and a single construct is executed by the first thread to arrive at the construct.
3.4 Evaluation
We applied our verifier to a suite comprising benchmarks from DataRaceBench (DRB) version 1.3.2 [ 35 ] and some examples written by us that use different concurrency patterns. As a basis for comparison, we applied a state-of-the-art static analyzer for OpenMP race detection, LLOV v.0.3 [ 10 ], to the same suite.
LLOV v.0.3 implements two static analyses. The first uses polyhedral analysis to identify data races due to loop-carried dependencies within OpenMP parallel loops [ 9 ]. It is unable to identify data races involving critical sections, atomic operations, master or single directives, or barriers. The second is a phase interval analysis to identify statements or basic blocks (and consequently memory accesses within those blocks) that may happen in parallel [ 10 ]. Phases are separated by explicit or implicit barriers and the minimum and maximum phase in which a statement or basic block may execute define the phase interval. The phase interval analysis errs in favor of reporting accesses as potentially happening in parallel whenever it cannot prove that they do not; consequently, it may produce false alarms.
The DRB suite exercises a wide array of OpenMP language features. Of the 172 benchmarks, 88 use only the language primitives supported by our CIVL OpenMP transformer (see Sect. 3.1 ). The main reasons for excluding the others were the use of C++, of the simd and task directives, and of directives for GPU programming. All 88 programs also use only features supported by LLOV. Of the 88, 47 have data races and 41 are labeled race-free.
We executed CIVL on the 88 programs, with the default number of OpenMP threads for a parallel region bounded by 8 (with a few exceptions, described below). We chose cyclic distribution as the default for OpenMP for loops. Many of the programs consume positive integer inputs or have clear hard-coded integer parameters. We manually instrumented 68 of the 88, inserting a few lines of CIVL-C code, protected by a preprocessor macro that is defined only when the program is verified by CIVL. This code allows each parameter to be specified on the CIVL command line, either as a single value or by specifying a range. In a few cases (e.g., DRB055), “magic numbers” such as 500 appear in multiple places, which we replaced with an input parameter controlled by CIVL. These modifications are consistent with the “small scope” approach to verification, which requires some manual effort to properly parameterize the program so that the “scope” can be controlled.
We used the range 1..10 for inputs, again with a few exceptions. In three cases, verification did not complete within 3 min and we lowered these bounds as follows: for DRB043, thread bound 8 and input bound 4; for the Jacobi iteration kernel DRB058, thread bound 4 and bound of 5 on both the matrix size and number of iterations; for DRB062, thread bound 4 and input bound 5.
CIVL correctly identified 40 of the 41 data-race-free programs, failing only on DRB139 due to nested parallel regions. It correctly reported a data race for 45 of the 47 programs with data races, missing only DRB014 (Fig. 3 , middle) and DRB015. In both cases, CIVL reports an out-of-bounds access to b[i][j-1] when \(\texttt {i}>0\) and \(\texttt {j}=0\) , but fails to report a data race, even when bound checking is disabled.
LLOV correctly identified 46 of the 47 programs with data races, failing to report a data race for DRB140 (Fig. 3 , left). The semantics for reduction specify that the loop behaves as if each thread creates a private copy, initially 0, of the shared variable a , and updates this private copy in the loop body. At the end of the loop, the thread adds its local copy onto the original shared variable. These final additions are guaranteed to not race with each other. In CIVL, this is modeled using a lock. However, there is no guarantee that these updates do not race with other code. In this example, thread 0 could be executing the assignment a=0 while another thread is adding its local result to a —a data race. This race issue can be resolved by isolating the reduction loop with barriers.
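The excerpt itself is not reproduced here; the pattern is essentially the following (our paraphrase of the benchmark, not its verbatim text). Without a barrier between the initialization and the reduction loop, a thread may finish its loop iterations and add its local sum to a while thread 0 is still executing a = 0:

int main(void) {
  int a = 0;
  #pragma omp parallel shared(a)
  {
    #pragma omp master
    a = 0;                   /* thread 0 initializes; master implies no barrier */
    /* a "#pragma omp barrier" here would eliminate the race */
    #pragma omp for reduction(+:a)
    for (int i = 0; i < 100; i++)
      a += i;                /* threads accumulate private copies; the final
                                additions to a may race with "a = 0" above */
  }
  return 0;
}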
LLOV correctly identified 38 out of 41 data-race-free programs. It reported false alarms for DRB052 (no support for indirect addressing), DRB054 (failure to propagate array dimensions and loop bounds from a variable assignment), and DRB069 (failure to properly model OpenMP lock behavior).

Fig. 3. Excerpts from three benchmarks with data races: two from DataRaceBench (left and middle) and an erroneous 1d-diffusion (right).

Fig. 4. Code for synchronization using an atomic variable (left) and a 2-thread barrier using locks (right).
The DRB suite contains few examples with interesting interleaving dependencies or pointer alias issues. To complement the suite, we wrote 10 additional C/OpenMP programs based on widely-used concurrency patterns (cf. [ 1 ]):
3 implementations of a synchronization signal sent from one thread to another, using locks or busy-wait loops with critical sections or atomics;
3 implementations of a 2-thread barrier, using busy-wait loops or locks;
2 implementations of a 1d-diffusion simulation, one in which two copies of the main array are created by two separate malloc calls; one in which they are inside a single malloced object; and
an instance of a single-producer, single-consumer pattern; and a multiple-producer, multiple-consumer version, both using critical sections.
For each program, we created an erroneous version with a data race, for a total of 20 tests. These codes are included in the experimental archive, and two are excerpted in Fig. 4 .
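For instance, the synchronization signal using an atomic variable (Fig. 4 , left) follows essentially this pattern; the listing is our sketch, not the verbatim benchmark:

#include <omp.h>

int flag = 0;                       /* shared signal variable */

int main(void) {
  #pragma omp parallel num_threads(2)
  {
    if (omp_get_thread_num() == 0) {
      /* ...work whose completion is being signaled... */
      #pragma omp atomic write
      flag = 1;                     /* signal the other thread */
    } else {
      int seen = 0;
      while (!seen) {               /* busy-wait for the signal */
        #pragma omp atomic read
        seen = flag;
      }
      /* ...work that must follow the signaled work... */
    }
  }
  return 0;
}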
CIVL obtains the expected result in all 20. While we wrote these additional examples to verify that CIVL can reason correctly about programs with complex interleaving semantics or alias issues, for completeness we also evaluated them with LLOV. It should be noted, however, that the authors of LLOV warn that it “...does not provide support for the OpenMP constructs for synchronization...” and “...can produce False Positives for programs with explicit synchronizations with barriers and locks.” [ 9 ] It is therefore unsurprising that the results were somewhat mixed: LLOV produced no output for 6 of our examples (the racy and race-free versions of diffusion2 and the two producer-consumer codes) and produced the correct answer on 7 of the remaining 14. On these problems, LLOV reported a race for both the racy and race-free version, with the exception of diffusion1 (Fig. 3 , right), where a failure to detect the alias between u and v leads it to report both versions as race-free.
CIVL’s verification time is significantly longer than LLOV’s. On the DRB benchmarks, total CIVL time for the 88 tests was 27 min. Individual times ranged from 1 to 150 seconds: 66 took less than 5s, 80 took less than 30s, and 82 took less than 1 min. (All CIVL runs used an M1 MacBook Pro with 16GB memory.) Total CIVL runtime on the 20 extra tests was 210s. LLOV analyzes all 88 DRB problems in less than 15 s (on a standard Linux machine).
4 Related Work
By Theorem 1 , if barriers are the only form of synchronization used in a program, only a single interleaving will be explored, and this suffices to verify race-freedom or to find all states at the end of each barrier epoch. This is well known in other contexts, such as GPU kernel verification (cf. [ 5 ]).
Prior work involving model checking and data races for unstructured concurrency includes Schemmel et al. [ 29 ]. This work describes a technique, using symbolic execution and POR, to detect defects in Pthreads programs. The approach involves intricate algorithms for enumerating configurations of prime event structures, each representing a set of executions. The completeness results deal with the detection of defects under the assumption that the program is race-free. While the implementation does check for data races, it is not clear that the theoretical results guarantee a race will be found if one exists.
Earlier work of Elmas et al. describes a sound and precise technique for verifying race-freedom in finite-state lock-based programs [ 16 ]. It uses a bespoke POR-based model checking algorithm that associates significant and complex information with the state, including, for each shared memory location, a set of locks a thread should hold when accessing that location, and a reference to the node in the depth first search stack from which the last access to that location was performed.
Both of these model checking approaches are considerably more complex than the approach of this paper. We have defined a simple state-transition system and shown that a program has a data race if and only if a state or edge satisfying a certain condition is reachable in that system. Our approach is agnostic to the choice of algorithm used to check reachability. The earlier approaches are also path-precise for race detection, i.e., for each execution path, a race is detected if and only if one exists on that path. As we saw in the example following Theorem 1 , our approach is not path-precise, nor does it have to be: to verify race-freedom, it is only necessary to find one race in one execution, if one exists. This partly explains the relative simplicity of our approach.
A common approach for verifying race-freedom is to establish consistent correlation : for each shared memory location, there is some lock that is held whenever that location is accessed. Locksmith [ 27 ] is a static analysis tool for multithreaded C programs that takes this approach. The approach should never report that a racy program is race-free, but can generate false alarms, since there are race-free programs that are not consistently correlated. False alarms can also arise from imprecise approximations of the set of shared variables, alias analysis, and so on. Nevertheless, the technique appears very effective in practice.
Static analysis-based race-detection tools for OpenMP include OMPRacer [ 33 ]. OMPRacer constructs a static graph representation of the happens-before relation of a program and analyzes this graph, together with a novel whole-program pointer analysis and a lockset analysis, to detect races. It may miss violations as a consequence of unsound decisions that aim to improve performance on real applications. The tool is not open source. The authors subsequently released OpenRace [ 34 ], designed to be extensible to other parallelism dialects; similar to OMPRacer, OpenRace may miss violations. Prior papers by the authors present details of static methods for race detection, without a tool that implements these methods [ 32 ].
PolyOMP [ 12 ] is a static tool that uses a polyhedral model adapted for a subset of OpenMP. Like most polyhedral approaches, it works best for affine loops and is precise in such cases. The tool additionally supports may-write access relations for non-affine loops, but may report false alarms in that case. DRACO [ 36 ] also uses a polyhedral model and has similar drawbacks.
Hybrid static and dynamic tools include Dynamatic [ 14 ], which is based on LLVM. It combines a static analysis that finds candidate races with a dynamic tool that subsequently confirms them. Dynamatic may report false alarms and miss violations.
ARCHER [ 2 ] is a tool that statically determines many sequential or provably non-racy code sections and excludes them from dynamic analysis, then uses TSan [ 30 ] for dynamic race detection. To avoid false alarms, ARCHER also encodes information about OpenMP barriers that are otherwise not understood by TSan. A follow-up paper discusses the use of the OMPT interface to aid dynamic race detection tools in correctly identifying issues in OpenMP programs [ 28 ], as well as SWORD [ 3 ], a dynamic tool that can stay within user-defined memory bounds when tracking races, by capturing a summary on disk for later analysis.
ROMP [ 18 ] is a dynamic/static tool that instruments executables using the DynInst library to add checks for each memory access and uses the OMPT interface at runtime. It claims to support all of OpenMP except target and simd constructs, and models “logical” races even if they are not triggered because the conflicting accesses happen to be scheduled on the same thread. Other approaches for dynamic race detection and tricks for memory and run-time efficient race bookkeeping during execution are described in [ 11 , 19 , 20 , 24 ].
Deductive verification approaches have also been applied to OpenMP programs. An example is [ 6 ], which introduces an intermediate parallel language and a specification language based on permission-based separation logic. C programs that use a subset of OpenMP are manually annotated with “iteration contracts” and then automatically translated into the intermediate form and verified using VerCors and Viper. Successfully verified programs are guaranteed to be race-free. While these approaches require more work from the user, they do not require bounding the number of threads or other parameters.
5 Conclusion
In this paper, we introduced a simple model-checking technique to verify that a program is free from data races. The essential ideas are (1) each thread “remembers” the accesses it performed since its last synchronization operation, (2) a partial order reduction scheme is used that treats all memory accesses as local, and (3) checks for conflicting accesses are performed around synchronizations. We proved our technique is sound and precise for finite-state models, using a simple mathematical model for multithreaded programs with locks and barriers. We implemented our technique in a prototype tool based on the CIVL symbolic execution and model checking platform and applied it to a suite of C/OpenMP programs from DataRaceBench. Although based on completely different techniques, our tool achieved performance comparable to that of the state-of-the-art static analysis tool, LLOV v.0.3.
Limitations of our tool include incomplete coverage of the OpenMP specification (e.g., target , simd , and task directives are not supported); the need for some manual instrumentation; the potential for state explosion necessitating small scopes; and a combinatorial explosion in the mappings of threads to loop iterations, OpenMP sections, or single constructs. In the last case, we have compromised soundness by selecting one mapping, but in future work we will explore ways to efficiently cover this space. On the other hand, in contrast to LLOV and because of the reliance on model checking and symbolic execution, we were able to verify the presence or absence of data races even for programs using unstructured synchronization with locks, critical sections, and atomics, including barrier algorithms and producer-consumer code.
While there are a number of effective dynamic race detectors, the goal of those tools is to detect races on a particular execution. Our goal is more aligned with that of static analyzers: to cover as many executions as possible, including for different inputs, number of threads, and thread interleavings.
Andrews, G.R.: Foundations of Multithreaded, Parallel, and Distributed Programming. Addison-Wesley (2000). https://www.pearson.ch/HigherEducation/Pearson/EAN/9780201357523/Foundations-of-Multithreaded-Parallel-and-Distributed-Programming
Atzeni, S., et al.: ARCHER: Effectively spotting data races in large OpenMP applications. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 53–62 (2016). https://doi.org/10.1109/IPDPS.2016.68
Atzeni, S., Gopalakrishnan, G., Rakamaric, Z., Laguna, I., Lee, G.L., Ahn, D.H.: SWORD: A bounded memory-overhead detector of OpenMP data races in production runs. In: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 845–854 (2018). https://doi.org/10.1109/IPDPS.2018.00094
Bernstein, A.J.: Analysis of programs for parallel processing. IEEE Trans. Electronic Comput. EC 15 (5), 757–763 (1966). https://doi.org/10.1109/PGEC.1966.264565
Betts, A., et al.: The design and implementation of a verification technique for GPU kernels. ACM Trans. Program. Lang. Syst. 37 (3) (2015). https://doi.org/10.1145/2743017
Blom, S., Darabi, S., Huisman, M., Safari, M.: Correct program parallelisations. Int. J. Softw. Tools Technol. Trans. 23 (5), 741–763 (2021). https://doi.org/10.1007/s10009-020-00601-z
Boehm, H.J.: How to miscompile programs with "benign" data races. In: Proceedings of the 3rd USENIX Conference on Hot Topic in Parallelism, HotPar 2011, pp. 1–6. USENIX Association, Berkeley, CA, USA (2011). http://dl.acm.org/citation.cfm?id=2001252.2001255
Boehm, H.J., Adve, S.V.: Foundations of the C++ concurrency memory model. In: Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 68–78. PLDI ’08, Association for Computing Machinery, New York (2008). https://doi.org/10.1145/1375581.1375591
Bora, U., Das, S., Kukreja, P., Joshi, S., Upadrasta, R., Rajopadhye, S.: LLOV: A fast static data-race checker for OpenMP programs. ACM Trans. Archit. Code Optimiz. (TACO) 17 (4), 1–26 (2020). https://doi.org/10.1145/3418597
Bora, U., Vaishay, S., Joshi, S., Upadrasta, R.: OpenMP aware MHP analysis for improved static data-race detection. In: 2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC). pp. 1–11 (2021). https://doi.org/10.1109/LLVMHPC54804.2021.00006
Boushehrinejadmoradi, N., Yoga, A., Nagarakatte, S.: On-the-fly data race detection with the enhanced OpenMP series-parallel graph. In: Milfeld, K., de Supinski, B.R., Koesterke, L., Klinkenberg, J. (eds.) IWOMP 2020. LNCS, vol. 12295, pp. 149–164. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58144-2_10
Chatarasi, P., Shirako, J., Kong, M., Sarkar, V.: An extended polyhedral model for spmd programs and its use in static data race detection. In: Ding, C., Criswell, J., Wu, P. (eds.) LCPC 2016. LNCS, vol. 10136, pp. 106–120. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52709-3_10
Dagum, L., Menon, R.: OpenMP: an industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5 (1), 46–55 (1998). https://doi.org/10.1109/99.660313
Davis, M.J.: Dynamatic: An OpenMP Race Detection Tool Combining Static and Dynamic Analysis. Undergraduate research scholars thesis, Texas A &M University (2021). https://oaktrust.library.tamu.edu/handle/1969.1/194411
Edmund M. Clarke, J., Grumberg, O., Kroening, D., Peled, D., Veith, H.: Model Checking, 2 edn. MIT press, Cambridge, MA, USA (2018). https://mitpress.mit.edu/books/model-checking-second-edition
Elmas, T., Qadeer, S., Tasiran, S.: Precise race detection and efficient model checking using locksets. Tech. Rep. MSR-TR-2005-118, Microsoft Research (2006). https://www.microsoft.com/en-us/research/publication/precise-race-detection-and-efficient-model-checking-using-locksets/
Godefroid, P. (ed.): Partial-Order Methods for the Verification of Concurrent Systems. LNCS, vol. 1032. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-60761-7
Gu, Y., Mellor-Crummey, J.: Dynamic data race detection for OpenMP programs. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis (2018). https://doi.org/10.1109/SC.2018.00064
Ha, O.-K., Jun, Y.-K.: Efficient thread labeling for on-the-fly race detection of programs with nested parallelism. In: Kim, T., Adeli, H., Kim, H., Kang, H., Kim, K.J., Kiumi, A., Kang, B.-H. (eds.) ASEA 2011. CCIS, vol. 257, pp. 424–436. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-27207-3_47
Ha, O.K., Kuh, I.B., Tchamgoue, G.M., Jun, Y.K.: On-the-fly detection of data races in OpenMP programs. In: Proceedings of the 2012 Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, pp. 1–10. PADTAD 2012, Association for Computing Machinery, New York (2012). https://doi.org/10.1145/2338967.2336808
International Organization for Standardization: ISO/IEC 9899:2018. Information technology – Programming languages – C (2018). https://www.iso.org/standard/74528.html
Lamport, L.: How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput. C-28 (9), 690–691 (1979). https://doi.org/10.1109/TC.1979.1675439
Manson, J., Pugh, W., Adve, S.V.: The Java memory model. In: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 378–391. POPL ’05, Association for Computing Machinery, New York (2005). https://doi.org/10.1145/1040305.1040336
Mellor-Crummey, J.: On-the-fly detection of data races for programs with nested fork-join parallelism. In: Supercomputing 1991: Proceedings of the 1991 ACM/IEEE Conference On Supercomputing, pp. 24–33. IEEE (1991). https://doi.org/10.1145/125826.125861
Open Group: IEEE Std 1003.1: Standard for information technology–Portable Operating System Interface (POSIX(R)) base specifications, issue 7: General concepts: Memory synchronization (2018). https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_12
OpenMP Architecture Review Board: OpenMP Application Programming Interface (Nov 2021). https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf, version 5.2
Pratikakis, P., Foster, J.S., Hicks, M.: LOCKSMITH: Practical static race detection for C. ACM Trans. Program. Lang. Syst. 33 , 3:1–3:55 (2011). https://doi.org/10.1145/1889997.1890000
Protze, J., Hahnfeld, J., Ahn, D.H., Schulz, M., Müller, M.S.: OpenMP tools interface: synchronization information for data race detection. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 249–265. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_17
Schemmel, D., BĂĽning, J., RodrĂguez, C., Laprell, D., Wehrle, K.: Symbolic partial-order execution for testing multi-threaded programs. In: Lahiri, S.K., Wang, C. (eds.) CAV 2020. LNCS, vol. 12224, pp. 376–400. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-53288-8_18
Serebryany, K., Iskhodzhanov, T.: ThreadSanitizer: Data race detection in practice. In: Proceedings of the Workshop on Binary Instrumentation and Applications, pp. 62–71. WBIA 2009. Association for Computing Machinery, New York (2009). https://doi.org/10.1145/1791194.1791203
Siegel, S.F., et al.: CIVL: The concurrency intermediate verification language. In: SC15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, New York (Nov 2015). https://doi.org/10.1145/2807591.2807635 , article no. 61, pages 1-12
Swain, B., Huang, J.: Towards incremental static race detection in OpenMP programs. In: 2018 IEEE/ACM 2nd International Workshop on Software Correctness for HPC Applications (Correctness), pp. 33–41. IEEE (2018). https://doi.org/10.1109/Correctness.2018.00009
Swain, B., Li, Y., Liu, P., Laguna, I., Georgakoudis, G., Huang, J.: OMPRacer: A scalable and precise static race detector for OpenMP programs. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–14. IEEE (2020). https://doi.org/10.1109/SC41405.2020.00058
Swain, B., Liu, B., Liu, P., Li, Y., Crump, A., Khera, R., Huang, J.: OpenRace: An open source framework for statically detecting data races. In: 2021 IEEE/ACM 5th International Workshop on Software Correctness for HPC Applications (Correctness), pp. 25–32. IEEE (2021). https://doi.org/10.1109/Correctness54621.2021.00009
Verma, G., Shi, Y., Liao, C., Chapman, B., Yan, Y.: Enhancing DataRaceBench for evaluating data race detection tools. In: 2020 IEEE/ACM 4th International Workshop on Software Correctness for HPC Applications (Correctness), pp. 20–30 (2020). https://doi.org/10.1109/Correctness51934.2020.00008
Ye, F., Schordan, M., Liao, C., Lin, P.H., Karlin, I., Sarkar, V.: Using polyhedral analysis to verify OpenMP applications are data race free. In: 2018 IEEE/ACM 2nd International Workshop on Software Correctness for HPC Applications (Correctness), pp. 42–50. IEEE (2018). https://doi.org/10.1109/Correctness.2018.00010
Acknowledgements
This material is based upon work by the RAPIDS Institute, supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program, under contract DE-AC02-06CH11357 and award DE-SC0021162. Support was also provided by U.S. National Science Foundation awards CCF-1955852 and CCF-2019309.
Data Race Detector
Introduction
Data races are among the most common and hardest to debug types of bugs in concurrent systems. A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write. See The Go Memory Model for details.
Here is an example of a data race that can lead to crashes and memory corruption:
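The code for this example is not reproduced here; the following minimal sketch, in the spirit of the documentation's example, races two unsynchronized writes to a map, which can corrupt the map's internal state:

```go
package main

import "fmt"

func main() {
	c := make(chan bool)
	m := make(map[string]string)
	go func() {
		m["1"] = "a" // First conflicting access.
		c <- true
	}()
	m["2"] = "b" // Second conflicting access.
	<-c
	for k, v := range m {
		fmt.Println(k, v)
	}
}
```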
To help diagnose such bugs, Go includes a built-in data race detector. To use it, add the -race flag to the go command:
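The flag is accepted by the usual go subcommands, for example:

```
$ go test -race mypkg    # test the package
$ go run -race mysrc.go  # compile and run the program
$ go build -race mycmd   # build the command
$ go install -race mypkg # install the package
```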
Report Format
When the race detector finds a data race in the program, it prints a report. The report contains stack traces for conflicting accesses, as well as stacks where the involved goroutines were created. Here is an example:
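The original report is omitted here; an abbreviated report of the typical shape (function names, file positions, and offsets are illustrative) looks like this:

```
WARNING: DATA RACE
Read by goroutine 185:
  net.(*pollServer).AddFD()
      src/net/fd_unix.go:89 +0x398
  ...

Previous write by goroutine 184:
  net.setWriteDeadline()
      src/net/sockopt_posix.go:135 +0xdf
  ...

Goroutine 185 (running) created at:
  net.(*pollServer).AddFD()
      src/net/fd_unix.go:93 +0x21c
  ...
```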
The GORACE environment variable sets race detector options. The format is:
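```
GORACE="option1=val1 option2=val2"
```

For example: GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go test -race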
The options are:
- log_path (default stderr): The race detector writes its report to a file named log_path.pid. The special names stdout and stderr cause reports to be written to standard output and standard error, respectively.
- exitcode (default 66): The exit status to use when exiting after a detected race.
- strip_path_prefix (default ""): Strip this prefix from all reported file paths, to make reports more concise.
- history_size (default 1): The per-goroutine memory access history is 32K * 2**history_size elements. Increasing this value can avoid a "failed to restore the stack" error in reports, at the cost of increased memory usage.
- halt_on_error (default 0): Controls whether the program exits after reporting the first data race.
- atexit_sleep_ms (default 1000): Number of milliseconds to sleep in the main goroutine before exiting.
Excluding Tests
When you build with the -race flag, the go command defines an additional build tag, race. You can use the tag to exclude some code and tests when running the race detector. Some examples:
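For instance, a test file guarded by a build constraint is compiled only when the race detector is not in use (the older // +build !race form works as well):

```go
//go:build !race

package foo

import "testing"

// The test contains a data race. See issue 123.
func TestFoo(t *testing.T) {
	// ...
}
```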
To start, run your tests using the race detector (go test -race). The race detector only finds races that happen at runtime, so it can't find races in code paths that are not executed. If your tests have incomplete coverage, you may find more races by running a binary built with -race under a realistic workload.
Typical Data Races
Here are some typical data races. All of them can be detected with the race detector.
Race on loop counter
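The program in question is omitted here; a minimal sketch consistent with the description (under pre-Go 1.22 loop-variable scoping, where all iterations share one i) is:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	for i := 0; i < 5; i++ {
		go func() {
			fmt.Println(i) // Races with the loop increment below.
			wg.Done()
		}()
	}
	wg.Wait()
}
```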
The variable i in the function literal is the same variable used by the loop, so the read in the goroutine races with the loop increment. (This program typically prints 55555, not 01234.) The program can be fixed by making a copy of the variable:
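A sketch of the fix, passing the counter as an argument so each goroutine reads its own copy; the loop body becomes:

```go
for i := 0; i < 5; i++ {
	go func(j int) {
		fmt.Println(j) // Reads a goroutine-local copy of the counter.
		wg.Done()
	}(i)
}
```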

Accidentally shared variable
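The example is omitted here; a sketch of the pattern, writing two files in parallel while accidentally sharing one err variable (assuming os is imported; file names are illustrative):

```go
// ParallelWrite writes data to file1 and file2 concurrently,
// reporting errors on the returned channel.
func ParallelWrite(data []byte) chan error {
	res := make(chan error, 2)
	f1, err := os.Create("file1")
	if err != nil {
		res <- err
	} else {
		go func() {
			// err is shared with the enclosing function, so this
			// write races with the assignment to err below.
			_, err = f1.Write(data)
			res <- err
			f1.Close()
		}()
	}
	f2, err := os.Create("file2") // Second conflicting write to err.
	if err != nil {
		res <- err
	} else {
		go func() {
			_, err = f2.Write(data)
			res <- err
			f2.Close()
		}()
	}
	return res
}
```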
The fix is to introduce new variables in the goroutines (note the use of := ):
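A sketch of the fix inside each goroutine:

```go
go func() {
	_, err := f1.Write(data) // := declares a new, goroutine-local err
	res <- err
	f1.Close()
}()
```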
Unprotected global variable
If the following code is called from several goroutines, it leads to races on the service map. Concurrent reads and writes of the same map are not safe:
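A sketch of such code (assuming net is imported):

```go
var service map[string]net.Addr

func RegisterService(name string, addr net.Addr) {
	service[name] = addr // unsynchronized write
}

func LookupService(name string) net.Addr {
	return service[name] // unsynchronized read
}
```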
To make the code safe, protect the accesses with a mutex:
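A sketch of the mutex-protected version:

```go
var (
	service   map[string]net.Addr
	serviceMu sync.Mutex
)

func RegisterService(name string, addr net.Addr) {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	service[name] = addr
}

func LookupService(name string) net.Addr {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	return service[name]
}
```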
Primitive unprotected variable
Data races can happen on variables of primitive types as well ( bool , int , int64 , etc.), as in this example:
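A sketch of this kind of race on an int64 field, in the spirit of the documentation's watchdog example (assuming fmt, os, and time are imported):

```go
type Watchdog struct{ last int64 }

func (w *Watchdog) KeepAlive() {
	w.last = time.Now().UnixNano() // First conflicting access.
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			// Second conflicting access.
			if w.last < time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}
```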
Even such "innocent" data races can lead to hard-to-debug problems caused by non-atomicity of the memory accesses, interference with compiler optimizations, or reordering issues accessing processor memory.
A typical fix for this race is to use a channel or a mutex. To preserve the lock-free behavior, one can also use the sync/atomic package.
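A sketch of the lock-free fix using sync/atomic:

```go
func (w *Watchdog) KeepAlive() {
	atomic.StoreInt64(&w.last, time.Now().UnixNano())
}

// ...and in the monitoring goroutine:
// if atomic.LoadInt64(&w.last) < time.Now().Add(-10*time.Second).UnixNano() { ... }
```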
Unsynchronized send and close operations
As this example demonstrates, unsynchronized send and close operations on the same channel can also be a race condition:
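A sketch of the racy pattern:

```go
func main() {
	c := make(chan struct{})
	go func() { c <- struct{}{} }() // The send races with the close below.
	close(c)
}
```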
According to the Go memory model, a send on a channel happens before the corresponding receive from that channel completes. To synchronize send and close operations, use a receive operation that guarantees the send is done before the close:
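A sketch of the synchronized version:

```go
func main() {
	c := make(chan struct{})
	go func() { c <- struct{}{} }()
	<-c // The receive guarantees the send has completed.
	close(c)
}
```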
Requirements
The race detector requires cgo to be enabled, and on non-Darwin systems requires an installed C compiler. The race detector supports linux/amd64 , linux/ppc64le , linux/arm64 , freebsd/amd64 , netbsd/amd64 , darwin/amd64 , darwin/arm64 , and windows/amd64 .
On Windows, the race detector runtime is sensitive to the version of the C compiler installed; as of Go 1.21, building a program with -race requires a C compiler that incorporates version 8 or later of the mingw-w64 runtime libraries. You can test your C compiler by invoking it with the arguments --print-file-name libsynchronization.a . A newer compliant C compiler will print a full path for this library, whereas older C compilers will just echo the argument.
Runtime Overhead
The cost of race detection varies by program, but for a typical program, memory usage may increase by 5-10x and execution time by 2-20x.
The race detector currently allocates an extra 8 bytes per defer and recover statement. Those extra allocations are not recovered until the goroutine exits. This means that if you have a long-running goroutine that is periodically issuing defer and recover calls, the program memory usage may grow without bound. These memory allocations will not show up in the output of runtime.ReadMemStats or runtime/pprof.
Data Race Patterns in Go

Uber has adopted Golang (Go for short) as a primary programming language for developing microservices. Our Go monorepo consists of about 50 million lines of code (and growing) and contains approximately 2,100 unique Go services (and growing).
Go makes concurrency a first-class citizen; prefixing function calls with the go keyword runs the call asynchronously. These asynchronous function calls in Go are called goroutines. Developers hide latency (e.g., IO or RPC calls to other services) by creating goroutines. Two or more goroutines can communicate data either via message passing ( channels ) or shared memory . Shared memory happens to be the most commonly used means of data communication in Go.
Goroutines are considered "lightweight" and, since they are easy to create, Go programmers use goroutines liberally. As a result, we notice that programs written in Go typically expose significantly more concurrency than programs written in other languages. For example, by scanning hundreds of thousands of microservice instances running in our data centers, we found that Go microservices expose ~8x more concurrency compared to Java microservices. Higher concurrency also means potential for more concurrency bugs. A data race is a concurrency bug that occurs when two or more goroutines access the same datum, at least one of them is a write, and there is no ordering between them. Data races are insidious bugs and must be avoided at all costs.
We developed a system to detect data races at Uber using a dynamic data race detection technique. This system, over a period of six months, detected about 2,000 data races in our Go code base, of which our developers already fixed ~1,100 data races.
In this blog, we will show various data race patterns we found in our Go programs. This study was conducted by analyzing over 1,100 data races fixed by 210 unique developers over a six-month period. Overall, we noticed that Go makes it easier to introduce data races, due to certain language design choices. There is a complicated interplay between the language features and data races.
We investigated each of the ~1,100 data races fixed by our developers and bucketed them into different categories. Our study of these data races showed some common patterns and some arcane reasons that cause data races in Go:
1. Go’s design choice to transparently capture free variables by reference in goroutines is a recipe for data races
Nested functions (a.k.a. closures) in Go transparently capture all free variables by reference. The programmer does not explicitly specify which free variables are captured in the closure syntax.

This mode of usage is different from Java and C++. Java lambdas only capture by value, a design choice consciously made to avoid concurrency bugs [1, 2]. C++ requires developers to explicitly specify capture by value or by reference.
Developers are quite often unaware that a variable used inside a closure is a free variable and captured by reference, especially when the closure is large. More often than not, Go developers use closures as goroutines. As a result of capture-by-reference and goroutine concurrency, Go programs end up potentially having unordered accesses to free variables unless explicit synchronization is performed. We demonstrate this with the following three examples:
Example 1: Data race due to loop index variable capture
The code in Figure 1A shows iterating over a Go slice jobs and processing each element job via the ProcessJob function.

Here, the developer wraps the expensive ProcessJob in an anonymous goroutine that is launched once per item. However, the loop index variable job is captured by reference inside the goroutine. When the goroutine launched for the first loop iteration is accessing the job variable, the for loop in the parent goroutine advances through the slice and updates the same loop-index variable job to point to the second element in the slice, causing a data race. This type of data race happens for value and reference types; slices, arrays, and maps; and read-only and write accesses in the loop body. Go recommends a coding idiom to hide and privatize the loop index variable in the loop body, which, unfortunately, is not always followed by developers.
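Figure 1A is not reproduced here; a minimal sketch of the pattern, assuming hypothetical Job and ProcessJob definitions (and pre-Go 1.22 loop-variable scoping):

```go
func ProcessAllJobs(jobs []Job) {
	var wg sync.WaitGroup
	for _, job := range jobs {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ProcessJob(job) // job is captured by reference: data race
		}()
	}
	wg.Wait()
}
```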
Example 2: Data race due to idiomatic err variable capture.

Go advocates multiple return values from functions. It is common to return the actual return value(s) and an error object to indicate if there was an error, as shown in Figure 1B. The actual return value is considered to be meaningful if and only if the error value is nil. It is a common practice to assign the returned error object to a variable named err followed by checking for its nilness. However, since multiple error-returning functions can be called inside a function body, there will be several assignments to the err variable followed by the nilness check each time. When developers mix this idiom with a goroutine, the err variable gets captured by reference in the closure. As a result, the accesses (both read and write) to err in the goroutine run concurrently with subsequent reads and writes to the same err variable in the enclosing function (or multiple instances of the goroutine), which causes a data race.
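Figure 1B is not reproduced here; a minimal sketch of the idiom, with hypothetical helpers fetchData, saveData, and fetchMetadata:

```go
func fetchAndSave() error {
	data, err := fetchData()
	if err != nil {
		return err
	}
	go func() {
		// err is captured by reference: this write races with the
		// assignment and nilness check below in the enclosing function.
		err = saveData(data)
	}()
	meta, err := fetchMetadata() // another assignment to the same err
	if err != nil {
		return err
	}
	_ = meta
	return nil
}
```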
Example 3: Data race due to named return variable capture.

Go introduces a syntactic sugar called named return values. The named return variables are treated as variables defined at the top of the function, whose scope outlives the body of the function. A return statement without arguments, known as a "naked" return, returns the named return values. In the presence of a closure, mixing normal (non-naked) returns with named returns, or using a deferred return in a function with a named return, is risky, as it can introduce a data race.
The function NamedReturnCallee in Figure 1C returns an integer, and the return variable is named as result . The rest of the function body can read and write to result without having to declare it because of this syntax. If the function returns at line 4, which is a naked return, as a result of the assignment result=10 on line 2, the caller at line 13 would see the return value of 10. The compiler arranges for copying result to retVal . A named return function can also use the standard return syntax, as shown on line 9. This syntax makes the compiler copy the return value, 20, in the return statement to be assigned to the named return variable result. Line 6 creates a goroutine, which captures the named return variable result . In setting up this goroutine, even a concurrency expert might believe that the read from result on line 7 is safe because there is no other write to the same variable; the statement return 20 on line 9 is, after all, a constant return, and does not seem to touch the named return variable, result . The code generation, however, turns the return 20 statement into a write to result , as previously mentioned. Now we suddenly have a concurrent read and a write to the shared result variable, a case of a data race.
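Figure 1C is not reproduced here; a sketch reconstructed from the description (the line numbers in the text correspond roughly to the positions below):

```go
func NamedReturnCallee(useNakedReturn bool) (result int) {
	result = 10
	if useNakedReturn {
		return // naked return: the caller sees 10
	}
	go func() {
		_ = result // concurrent read of the named return variable
	}()
	return 20 // compiled as: result = 20; return (a racing write to result)
}

func caller() int {
	retVal := NamedReturnCallee(false)
	return retVal
}
```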
2. Slices are confusing types that create subtle and hard-to-diagnose data races
Slices are dynamic arrays and reference types. Internally, a slice contains a pointer to the underlying array, its current length, and the maximum capacity to which the underlying array can expand. For ease of discussion, we refer to these variables as meta fields of a slice. A common operation on a slice is to grow it via the append operation. When the size reaches the capacity, a new allocation (e.g., double the current size) is made, and the meta fields are updated. When a slice is concurrently accessed by goroutines, it is natural to protect accesses to it by a mutex.

In Figure 2, the developer thinks that lock-protecting the slice append on line 6 is sufficient to prevent a data race. However, a data race happens when the slice is passed as an argument to the goroutine on line 14, which is not lock-protected. The invocation of the goroutine causes the meta fields of the slice to be copied from the call site (line 14) to the callee (line 11). Because a slice is a reference type, one might assume the race is on the shared underlying array; however, a slice is not the same as a pointer type: the meta fields are copied by value, and it is this unprotected copy of the meta fields, racing with their update inside the lock-protected append, that causes the subtle data race.
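Figure 2 is not reproduced here; a minimal sketch of the pattern, with a hypothetical consume function:

```go
type JobQueue struct {
	mu   sync.Mutex
	list []int
}

func (q *JobQueue) Add(x int) {
	q.mu.Lock()
	q.list = append(q.list, x) // writes the slice meta fields (ptr, len, cap)
	q.mu.Unlock()
}

func (q *JobQueue) StartConsumer() {
	// Passing q.list copies the meta fields without holding the lock,
	// racing with the append above.
	go consume(q.list)
}
```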
3. Concurrent accesses to Go’s built-in, thread-unsafe maps cause frequent data races

Hash table ( map ) is a built-in language feature in Go and is not thread-safe. If multiple goroutines simultaneously access the same hash table with at least one of them trying to modify the hash table (insert or delete an item), data race ensues. We observe that developers make a fundamental, but common assumption (and a mistake) that the different entries in the hash table can be concurrently accessed. This assumption stems from developers viewing the table[key] syntax used for map accesses and misinterpreting them to be accessing disjoint elements. However, a map (hash table), unlike an array or a slice, is a sparse data structure, and accessing one element might result in accessing another element; if during the same process another insertion/deletion happens, it will modify the sparse data structure and cause a data race. We even found more complex concurrent map access data races resulting from the same hash table being passed to deep call paths and developers losing track of the fact that these call paths mutate the hash table via asynchronous goroutines. Figure 3 shows an example of this type of data race.
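Figure 3 is not reproduced here; a minimal sketch of concurrent map accesses that race even though they touch different keys:

```go
func updateAndRead() {
	scores := make(map[string]int)
	go func() { scores["alice"] = 1 }() // an insert may rehash the table
	go func() { _ = scores["bob"] }()   // concurrent lookup: data race
}
```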
While hash tables leading to data races are not unique to Go, the following reasons make Go more prone to them:
- Go developers use maps more frequently than the developers in other languages because the map is a built-in language construct. For example, in our Java repository, we found 4,389 map constructs per MLoC, whereas the same for Go is 5,950 per MLoC, which is 1.34x higher.
- The hash table access syntax is just like array-access syntax (unlike Java’s get/put APIs), making it easy to use and hence easy to mistake for a random-access data structure. In Go, a non-existent map element can be queried simply with the table[key] syntax, which returns the default value without producing any error. This error tolerance can make developers complacent when using Go maps.
4. Go developers often err on the side of pass-by-value (or methods over values), which can cause non-trivial data races
Pass-by-value semantics are recommended in Go because they simplify escape analysis and give variables a better chance to be allocated on the stack, which reduces pressure on the garbage collector.
Unlike Java, where all objects are reference types, in Go an object can be a value type (struct) or a reference type (interface). There is no syntactic difference, and this leads to incorrect use of synchronization constructs such as sync.Mutex and sync.RWMutex, which are value types (structures) in Go. If a function creates a mutex structure and passes it by value to multiple goroutine invocations, those concurrent executions of the goroutines operate on distinct mutex objects, which share no internal state. This defeats mutually exclusive access to the guarded shared memory region, as exemplified in Figure 4 below.

Since Go syntax is the same for invoking a method on a pointer vs. on a value, the developer pays less attention to the fact that m.Lock() is working on a copy of the mutex instead of a pointer. The caller can still invoke these APIs on a mutex value, and the compiler transparently arranges to pass the address of the value. Had this transparency not been there, the bug could have been detected as a compiler type-mismatch error.
A converse of this situation happens when developers accidentally implement a method where the receiver is a pointer to the structure instead of a value/copy of the structure. In these situations, multiple goroutines invoking the method end up accidentally sharing the same internal state of the structure, whereas the developer intended otherwise. Here, also, the caller is unaware that the value type was transparently converted to a pointer type at the receiver.
5. Mixed use of message passing (channels) and shared memory makes code complex and susceptible to data races

Figure 5 shows an example of a generic future implementation by a developer using a channel for signaling and waiting. The future can be started by calling the Start() method, and one can block for the future’s completion by calling the Wait() method on the Future. The Start() method creates a goroutine, which executes a function registered with the Future and records its return values (response and err). The goroutine signals the completion of the future to the Wait() method by sending a message on the channel ch, as shown on line 6. Symmetrically, the Wait() method blocks to fetch the message from the channel (line 11).
Contexts in Go carry deadlines, cancelation signals, and other request-scoped values across API boundaries and between processes. This is a common pattern in microservices where timelines are set for tasks. Hence, Wait() blocks either on the context being canceled (line 13) or the future to have completed (line 11). Furthermore, the Wait() is wrapped inside a select statement (line 10), which blocks until at least one of the select arms is ready.
If the context times out, the corresponding case records the err field of Future as ErrCancelled on line 14. This write to err races with the write to the same variable in the future on line 5.
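Figure 5 is not reproduced here; a sketch reconstructed from the description, with ErrCancelled as an assumed error value (context and errors imported):

```go
var ErrCancelled = errors.New("cancelled")

type Future struct {
	ch       chan struct{}
	fn       func() (string, error)
	response string
	err      error
}

func (f *Future) Start() {
	go func() {
		f.response, f.err = f.fn() // write to f.err
		f.ch <- struct{}{}         // signal completion
	}()
}

func (f *Future) Wait(ctx context.Context) {
	select {
	case <-f.ch: // future completed
	case <-ctx.Done():
		f.err = ErrCancelled // races with the write in Start's goroutine
	}
}
```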
6. Go offers more leeway in its group synchronization construct sync.WaitGroup, but the incorrect placement of Add/Done methods leads to data races

The sync.WaitGroup structure is a group synchronization construct in Go. Unlike C++ barrier, pthreads, or Java barrier or latch constructs, the number of participants in a WaitGroup is not determined at the time of construction but is updated dynamically. Three operations are allowed on a WaitGroup object: Add(int), Done(), and Wait(). Add() increments the count of participants, and Wait() blocks until Done() is called count number of times (typically once by each participant). WaitGroup is extensively used in Go. As shown previously in Table 1, group synchronization is 1.9x higher in Go than in Java.
In Figure 6, the developer intends to create as many goroutines as there are elements in the slice itemIds and process the items concurrently. Each goroutine records its success or failure status in the results slice at a different index, and the parent function blocks at line 12 until all goroutines finish. It then accesses all elements of results serially to count the number of successful processings.

For this code to work correctly, when Wait() is invoked on line 12, the number of registered participants must already equal the length of itemIds. This is possible only if wg.Add(1) is executed as many times as the length of itemIds prior to invoking wg.Wait(), which means wg.Add(1) should have been placed on line 5, prior to each goroutine invocation. However, the developer incorrectly places wg.Add(1) inside the body of the goroutines on line 7, which is not guaranteed to have executed by the time the outer function WaitGrpExample invokes Wait(). As a result, fewer participants than the length of itemIds may be registered with the WaitGroup when Wait() is invoked. For that reason, Wait() can unblock prematurely, and the WaitGrpExample function can start to read from the slice results (line 13) while some goroutines are concurrently writing to the same slice.
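Figure 6 is not reproduced here; a sketch reconstructed from the description, with a hypothetical processItem helper:

```go
func WaitGrpExample(itemIds []int) int {
	var wg sync.WaitGroup
	results := make([]bool, len(itemIds))
	for i, id := range itemIds {
		go func(i, id int) {
			wg.Add(1) // BUG: may run after Wait() below has already returned
			defer wg.Done()
			results[i] = processItem(id)
		}(i, id)
	}
	wg.Wait() // can unblock before all goroutines have registered
	count := 0
	for _, ok := range results {
		if ok {
			count++ // reads may race with writes to results
		}
	}
	return count
}
```

Moving wg.Add(1) before the go statement restores the intended synchronization.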
We also found data races arising from premature placement of the wg.Done() call on a WaitGroup. A subtle version of premature wg.Done() is shown below in Figure B, which results from its interaction with Go’s defer statement. When multiple defer statements are encountered, they are executed in last-in-first-out order. Here, wg.Wait() on line 9 finishes before doCleanup() runs. The parent goroutine accesses locationErr on line 10 while the child goroutine may still be writing to locationErr inside the deferred doCleanup() function (not shown for brevity).
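Figure B is not reproduced here; a sketch of the deferred-Done ordering problem, with a hypothetical doCleanup that writes to locationErr:

```go
func locate() {
	var locationErr error
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer doCleanup(&locationErr) // registered first, so it runs last (LIFO)
		defer wg.Done()               // runs first, unblocking Wait below
		// ... work that may set locationErr ...
	}()
	wg.Wait()
	if locationErr != nil { // may race with doCleanup's write
		// handle the error
	}
}
```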

7. Running tests in parallel for Go’s table-driven test suite idiom can often cause data races, either in the product or test code
Testing is a built-in feature in Go. Any function with the prefix Test in a file with the suffix _test.go can be run as a test via the Go build system. If the test code calls the API testing.T.Parallel(), it will run concurrently with other such tests. We found that a large class of data races happen due to such concurrent test executions. The root causes of these data races were sometimes in the test code and sometimes in the product code. Additionally, within a single Test-prefixed function, Go developers often write many subtests and execute them via the Go-provided suite package. Go recommends a table-driven test suite idiom to write and run a test suite. Our developers extensively write tens or hundreds of subtests in a test, which our systems run in parallel. This idiom becomes a source of problems for a test suite where the developer either assumed serial test execution or lost track of using shared objects in a large, complex test suite. Problems also arise when the product API(s) were written without thread safety (perhaps because it was not needed) but were invoked in parallel, violating the assumption.
Summary of Findings
We analyzed the fixed data races to classify the reasons behind them. These issues are tabulated as follows. The labels are not mutually exclusive.

An example for each pattern of a data race is available from this link.
In summary, based on observed (including fixed) data races, we elaborated on the Go language paradigms that make it easy to introduce races in Go programs. We hope that our experiences with data races in Go will help Go developers pay more attention to the subtleties of writing concurrent code. Future programming language designers should carefully weigh different language features and coding idioms against their potential to create common or arcane concurrency bugs.
Threats to Validity
The discussion herein is based on our experiences with data races in Uber’s Go monorepo and may have missed additional patterns of data races that may happen elsewhere. Also, as dynamic race detection does not detect all possible races due to code and interleaving coverage, our discussion may have missed a few patterns of races. Despite these threats to the universality of our results, the discussion on the patterns in this paper and the deployment experiences hold independently.
This is the second of a two-part blog post series on our experiences with data races in Go code. An elaborated version of our experiences will appear in ACM SIGPLAN Programming Language Design and Implementation (PLDI) 2022. In the first part of the blog series, we discuss our learnings pertaining to deploying dynamic race detection at scale for Go code.
Main Image Credit: https://creativecommons.org/licenses/by/2.0/

Milind Chabbi
Milind Chabbi is a Staff Researcher in the Programming Systems Research team at Uber. He leads research initiatives across Uber in the areas of compiler optimizations, high-performance parallel computing, synchronization techniques, and performance analysis tools to make large, complex computing systems reliable and efficient.

Murali Krishna Ramanathan
Murali Krishna Ramanathan is a Senior Staff Software Engineer and leads multiple code quality initiatives across Uber engineering. He is the architect of Piranha, a refactoring tool to automatically delete code due to stale feature flags. His interests are building tooling to address software development challenges with feature flagging, automated code refactoring and developer workflows, and automated test generation for improving software quality.
Posted by Milind Chabbi, Murali Krishna Ramanathan
7. Helgrind: a thread error detector
To use this tool, you must specify --tool=helgrind on the Valgrind command line.
7.1. Overview
Helgrind is a Valgrind tool for detecting synchronisation errors in C, C++ and Fortran programs that use the POSIX pthreads threading primitives.
The main abstractions in POSIX pthreads are: a set of threads sharing a common address space, thread creation, thread joining, thread exit, mutexes (locks), condition variables (inter-thread event notifications), reader-writer locks, spinlocks, semaphores and barriers.
Helgrind can detect three classes of errors, which are discussed in detail in the next three sections:
Misuses of the POSIX pthreads API.
Potential deadlocks arising from lock ordering problems.
Data races -- accessing memory without adequate locking or synchronisation.
Problems like these often result in unreproducible, timing-dependent crashes, deadlocks and other misbehaviour, and can be difficult to find by other means.
Helgrind is aware of all the pthread abstractions and tracks their effects as accurately as it can. On x86 and amd64 platforms, it understands and partially handles implicit locking arising from the use of the LOCK instruction prefix. On PowerPC/POWER and ARM platforms, it partially handles implicit locking arising from load-linked and store-conditional instruction pairs.
Helgrind works best when your application uses only the POSIX pthreads API. However, if you want to use custom threading primitives, you can describe their behaviour to Helgrind using the ANNOTATE_* macros defined in helgrind.h .
Helgrind also provides Execution Trees memory profiling using the command line option --xtree-memory and the monitor command xtmemory .
Following those is a section containing hints and tips on how to get the best out of Helgrind.
Then there is a summary of command-line options.
Finally, there is a brief summary of areas in which Helgrind could be improved.
7.2. Detected errors: Misuses of the POSIX pthreads API
Helgrind intercepts calls to many POSIX pthreads functions, and is therefore able to report on various common problems. Although these are unglamourous errors, their presence can lead to undefined program behaviour and hard-to-find bugs later on. The detected errors are:
unlocking an invalid mutex
unlocking a not-locked mutex
unlocking a mutex held by a different thread
destroying an invalid or a locked mutex
recursively locking a non-recursive mutex
deallocation of memory that contains a locked mutex
passing mutex arguments to functions expecting reader-writer lock arguments, and vice versa
when a POSIX pthread function fails with an error code that must be handled
when a thread exits whilst still holding locked locks
calling pthread_cond_wait with a not-locked mutex, an invalid mutex, or one locked by a different thread
inconsistent bindings between condition variables and their associated mutexes
invalid or duplicate initialisation of a pthread barrier
initialisation of a pthread barrier on which threads are still waiting
destruction of a pthread barrier object which was never initialised, or on which threads are still waiting
waiting on an uninitialised pthread barrier
for all of the pthreads functions that Helgrind intercepts, an error is reported, along with a stack trace, if the system threading library routine returns an error code, even if Helgrind itself detected no error
Checks pertaining to the validity of mutexes are generally also performed for reader-writer locks.
Various kinds of this-can't-possibly-happen events are also reported. These usually indicate bugs in the system threading library.
Reported errors always contain a primary stack trace indicating where the error was detected. They may also contain auxiliary stack traces giving additional information. In particular, most errors relating to mutexes will also tell you where that mutex first came to Helgrind's attention (the " was first observed at " part), so you have a chance of figuring out which mutex it is referring to. For example:
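The example report is omitted here; an abbreviated report of the typical shape (addresses, line numbers, and file names are illustrative) might read:

```
Thread #1 unlocked a not-locked lock at 0x7FEFFFA90
   at 0x4C2408D: pthread_mutex_unlock (hg_intercepts.c:492)
   by 0x40073A: nearly_main (tc09_bad_unlock.c:27)
   by 0x40079B: main (tc09_bad_unlock.c:50)
  Lock at 0x7FEFFFA90 was first observed
   at 0x4C25D01: pthread_mutex_init (hg_intercepts.c:326)
   by 0x40071F: nearly_main (tc09_bad_unlock.c:23)
   by 0x40079B: main (tc09_bad_unlock.c:50)
```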
Helgrind has a way of summarising thread identities, as you see here with the text " Thread #1 ". This is so that it can speak about threads and sets of threads without overwhelming you with details. See below for more information on interpreting error messages.
7.3. Detected errors: Inconsistent Lock Orderings
In this section, and in general, to "acquire" a lock simply means to lock that lock, and to "release" a lock means to unlock it.
Helgrind monitors the order in which threads acquire locks. This allows it to detect potential deadlocks which could arise from the formation of cycles of locks. Detecting such inconsistencies is useful because, whilst actual deadlocks are fairly obvious, potential deadlocks may never be discovered during testing and could later lead to hard-to-diagnose in-service failures.
The simplest example of such a problem is as follows.
Imagine some shared resource R, which, for whatever reason, is guarded by two locks, L1 and L2, which must both be held when R is accessed.
Suppose a thread acquires L1, then L2, and proceeds to access R. The implication of this is that all threads in the program must acquire the two locks in the order first L1 then L2. Not doing so risks deadlock.
The deadlock could happen if two threads -- call them T1 and T2 -- both want to access R. Suppose T1 acquires L1 first, and T2 acquires L2 first. Then T1 tries to acquire L2, and T2 tries to acquire L1, but those locks are both already held. So T1 and T2 become deadlocked.
Helgrind builds a directed graph indicating the order in which locks have been acquired in the past. When a thread acquires a new lock, the graph is updated, and then checked to see if it now contains a cycle. The presence of a cycle indicates a potential deadlock involving the locks in the cycle.
In general, Helgrind will choose two locks involved in the cycle and show you how their acquisition ordering has become inconsistent. It does this by showing the program points that first defined the ordering, and the program points which later violated it. Here is a simple example involving just two locks:
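The example output is omitted here; an illustrative, abbreviated report of this kind (addresses and source positions invented for the sketch):

```
Thread #1: lock order "0x7FF0006D0 before 0x7FF0006A0" violated

Observed (incorrect) order is: acquisition of lock at 0x7FF0006A0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x400825: main (tc13_laog1.c:23)

 followed by a later acquisition of lock at 0x7FF0006D0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x400853: main (tc13_laog1.c:24)

Required order was established by acquisition of lock at 0x7FF0006D0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x40076D: main (tc13_laog1.c:17)

 followed by a later acquisition of lock at 0x7FF0006A0
   at 0x4C2BC62: pthread_mutex_lock (hg_intercepts.c:494)
   by 0x40079B: main (tc13_laog1.c:18)
```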
When there are more than two locks in the cycle, the error is equally serious. However, at present Helgrind does not show the locks involved, sometimes because that information is not available, but also so as to avoid flooding you with information. For example, a naive implementation of the famous Dining Philosophers problem involves a cycle of five locks (see helgrind/tests/tc14_laog_dinphils.c ). In this case Helgrind has detected that all 5 philosophers could simultaneously pick up their left fork and then deadlock whilst waiting to pick up their right forks.
7.4. Detected errors: Data Races
A data race happens, or could happen, when two threads access a shared memory location without using suitable locks or other synchronisation to ensure single-threaded access. Such missing locking can cause obscure timing dependent bugs. Ensuring programs are race-free is one of the central difficulties of threaded programming.
Reliably detecting races is a difficult problem, and most of Helgrind's internals are devoted to dealing with it. We begin with a simple example.
7.4.1. A Simple Data Race
About the simplest possible example of a race is as follows. In this program, it is impossible to know what the value of var is at the end of the program. Is it 2? Or 1?
The problem is there is nothing to stop var being updated simultaneously by both threads. A correct program would protect var with a lock of type pthread_mutex_t , which is acquired before each access and released afterwards. Helgrind's output for this program is:
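The program and full report are omitted here; reconstructed from the description that follows, the key part of the output reads approximately:

```
Possible data race during read of size 4 at 0x601038 by thread #1
Locks held: none
   at 0x401584: main (simple_race.c:13)

This conflicts with a previous write of size 4 by thread #2
Locks held: none
   at 0x401544: child_fn (simple_race.c:10)

Location 0x601038 is 0 bytes inside global var "var"
declared at simple_race.c:3
```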
This is quite a lot of detail for an apparently simple error. The last clause is the main error message. It says there is a race as a result of a read of size 4 (bytes), at 0x601038, which is the address of var , happening in function main at line 13 in the program.
Two important parts of the message are:
Helgrind shows two stack traces for the error, not one. By definition, a race involves two different threads accessing the same location in such a way that the result depends on the relative speeds of the two threads.
The first stack trace follows the text " Possible data race during read of size 4 ... " and the second trace follows the text " This conflicts with a previous write of size 4 ... ". Helgrind is usually able to show both accesses involved in a race. At least one of these will be a write (since two concurrent, unsynchronised reads are harmless), and they will of course be from different threads.
By examining your program at the two locations, you should be able to get at least some idea of what the root cause of the problem is. For each location, Helgrind shows the set of locks held at the time of the access. This often makes it clear which thread, if any, failed to take a required lock. In this example neither thread holds a lock during the access.
For races which occur on global or stack variables, Helgrind tries to identify the name and defining point of the variable. Hence the text " Location 0x601038 is 0 bytes inside global var "var" declared at simple_race.c:3 ".
Showing names of stack and global variables carries no run-time overhead once Helgrind has your program up and running. However, it does require Helgrind to spend considerable extra time and memory at program startup to read the relevant debug info. Hence this facility is disabled by default. To enable it, you need to give the --read-var-info=yes option to Helgrind.
The following section explains Helgrind's race detection algorithm in more detail.
7.4.2. Helgrind's Race Detection Algorithm
Most programmers think about threaded programming in terms of the basic functionality provided by the threading library (POSIX Pthreads): thread creation, thread joining, locks, condition variables, semaphores and barriers.
The effect of using these functions is to impose constraints upon the order in which memory accesses can happen. This implied ordering is generally known as the "happens-before relation". Once you understand the happens-before relation, it is easy to see how Helgrind finds races in your code. Fortunately, the happens-before relation is itself easy to understand, and is by itself a useful tool for reasoning about the behaviour of parallel programs. We now introduce it using a simple example.
Consider first the following buggy program:
The parent thread creates a child. Both then write different values to some variable var , and the parent then waits for the child to exit.
What is the value of var at the end of the program, 10 or 20? We don't know. The program is considered buggy (it has a race) because the final value of var depends on the relative rates of progress of the parent and child threads. If the parent is fast and the child is slow, then the child's assignment may happen later, so the final value will be 10; and vice versa if the child is faster than the parent.
The relative rates of progress of parent vs child is not something the programmer can control, and will often change from run to run. It depends on factors such as the load on the machine, what else is running, the kernel's scheduling strategy, and many other factors.
The obvious fix is to use a lock to protect var . It is however instructive to consider a somewhat more abstract solution, which is to send a message from one thread to the other:
Now the program reliably prints "10", regardless of the speed of the threads. Why? Because the child's assignment cannot happen until after it receives the message. And the message is not sent until after the parent's assignment is done.
The message transmission creates a "happens-before" dependency between the two assignments: var = 20; must now happen-before var = 10; . And so there is no longer a race on var .
Note that it's not significant that the parent sends a message to the child. Sending a message from the child (after its assignment) to the parent (before its assignment) would also fix the problem, causing the program to reliably print "20".
Helgrind's algorithm is (conceptually) very simple. It monitors all accesses to memory locations. If a location -- in this example, var , is accessed by two different threads, Helgrind checks to see if the two accesses are ordered by the happens-before relation. If so, that's fine; if not, it reports a race.
It is important to understand that the happens-before relation creates only a partial ordering, not a total ordering. An example of a total ordering is comparison of numbers: for any two numbers x and y , either x is less than, equal to, or greater than y . A partial ordering is like a total ordering, but it can also express the concept that two elements are neither equal, less or greater, but merely unordered with respect to each other.
In the fixed example above, we say that var = 20; "happens-before" var = 10; . But in the original version, they are unordered: we cannot say that either happens-before the other.
What does it mean to say that two accesses from different threads are ordered by the happens-before relation? It means that there is some chain of inter-thread synchronisation operations which cause those accesses to happen in a particular order, irrespective of the actual rates of progress of the individual threads. This is a required property for a reliable threaded program, which is why Helgrind checks for it.
The happens-before relations created by standard threading primitives are as follows:
When a mutex is unlocked by thread T1 and later (or immediately) locked by thread T2, then the memory accesses in T1 prior to the unlock must happen-before those in T2 after it acquires the lock.
The same idea applies to reader-writer locks, although with some complication so as to allow correct handling of reads vs writes.
When a condition variable (CV) is signalled on by thread T1 and some other thread T2 is thereby released from a wait on the same CV, then the memory accesses in T1 prior to the signalling must happen-before those in T2 after it returns from the wait. If no thread was waiting on the CV then there is no effect.
If instead T1 broadcasts on a CV, then all of the waiting threads, rather than just one of them, acquire a happens-before dependency on the broadcasting thread at the point it did the broadcast.
A thread T2 that continues after completing sem_wait on a semaphore that thread T1 posts on, acquires a happens-before dependence on the posting thread, a bit like dependencies caused by mutex unlock-lock pairs. However, since a semaphore can be posted on many times, it is unspecified from which of the post calls the wait call gets its happens-before dependency.
For a group of threads T1 .. Tn which arrive at a barrier and then move on, each thread after the call has a happens-after dependency from all threads before the barrier.
A newly-created child thread acquires an initial happens-after dependency on the point where its parent created it. That is, all memory accesses performed by the parent prior to creating the child are regarded as happening-before all the accesses of the child.
Similarly, when an exiting thread is reaped via a call to pthread_join , once the call returns, the reaping thread acquires a happens-after dependency relative to all memory accesses made by the exiting thread.
In summary: Helgrind intercepts the above listed events, and builds a directed acyclic graph representing the collective happens-before dependencies. It also monitors all memory accesses.
If a location is accessed by two different threads, but Helgrind cannot find any path through the happens-before graph from one access to the other, then it reports a race.
There are a couple of caveats:
Helgrind doesn't check for a race in the case where both accesses are reads. That would be silly, since concurrent reads are harmless.
Two accesses are considered to be ordered by the happens-before dependency even through arbitrarily long chains of synchronisation events. For example, if T1 accesses some location L, and then pthread_cond_signals T2, which later pthread_cond_signals T3, which then accesses L, then a suitable happens-before dependency exists between the first and second accesses, even though it involves two different inter-thread synchronisation events.
7.4.3. Interpreting Race Error Messages
Helgrind's race detection algorithm collects a lot of information, and tries to present it in a helpful way when a race is detected. Here's an example:
Helgrind first announces the creation points of any threads referenced in the error message. This is so it can speak concisely about threads without repeatedly printing their creation point call stacks. Each thread is only ever announced once, the first time it appears in any Helgrind error message.
The main error message begins at the text " Possible data race during read ". At the start is information you would expect to see -- address and size of the racing access, whether a read or a write, and the call stack at the point it was detected.
A second call stack is presented starting at the text " This conflicts with a previous write ". This shows a previous access which also accessed the stated address, and which is believed to be racing against the access in the first call stack. Note that this second call stack is limited to a maximum of --history-backtrace-size entries with a default value of 8 to limit the memory usage.
Finally, Helgrind may attempt to give a description of the raced-on address in source level terms. In this example, it identifies it as a local variable, shows its name, declaration point, and in which frame (of the first call stack) it lives. Note that this information is only shown when --read-var-info=yes is specified on the command line. That's because reading the DWARF3 debug information in enough detail to capture variable type and location information makes Helgrind much slower at startup, and also requires considerable amounts of memory, for large programs.
Once you have your two call stacks, how do you find the root cause of the race?
The first thing to do is examine the source locations referred to by each call stack. They should both show an access to the same location, or variable.
Now figure out how that location should have been made thread-safe:
Perhaps the location was intended to be protected by a mutex? If so, you need to lock and unlock the mutex at both access points, even if one of the accesses is reported to be a read. Did you perhaps forget the locking at one or other of the accesses? To help you do this, Helgrind shows the set of locks held by each thread at the time it accessed the raced-on location.
Alternatively, perhaps you intended to use some other scheme to make it safe, such as signalling on a condition variable. In all such cases, try to find a synchronisation event (or a chain thereof) which separates the earlier-observed access (as shown in the second call stack) from the later-observed access (as shown in the first call stack). In other words, try to find evidence that the earlier access "happens-before" the later access. See the previous subsection for an explanation of the happens-before relation.
The fact that Helgrind is reporting a race means it did not observe any happens-before relation between the two accesses. If Helgrind is working correctly, it should also be the case that you also cannot find any such relation, even on detailed inspection of the source code. Hopefully, though, your inspection of the code will show where the missing synchronisation operation(s) should have been.
7.5. Hints and Tips for Effective Use of Helgrind
Helgrind can be very helpful in finding and resolving threading-related problems. Like all sophisticated tools, it is most effective when you understand how to play to its strengths.
Helgrind will be less effective when you merely throw an existing threaded program at it and try to make sense of any reported errors. It will be more effective if you design threaded programs from the start in a way that helps Helgrind verify correctness. The same is true for finding memory errors with Memcheck, but applies more here, because thread checking is a harder problem. Consequently it is much easier to write a correct program for which Helgrind falsely reports (threading) errors than it is to write a correct program for which Memcheck falsely reports (memory) errors.
With that in mind, here are some tips, listed most important first, for getting reliable results and avoiding false errors. The first two are critical. Any violations of them will swamp you with huge numbers of false data-race errors.
Make sure your application, and all the libraries it uses, use the POSIX threading primitives. Helgrind needs to be able to see all events pertaining to thread creation, exit, locking and other synchronisation events. To do so it intercepts many POSIX pthreads functions.
Do not roll your own threading primitives (mutexes, etc) from combinations of the Linux futex syscall, atomic counters, etc. These throw Helgrind's internal what's-going-on models way off course and will give bogus results.
Also, do not reimplement existing POSIX abstractions using other POSIX abstractions. For example, don't build your own semaphore routines or reader-writer locks from POSIX mutexes and condition variables. Instead use POSIX reader-writer locks and semaphores directly, since Helgrind supports them directly.
Helgrind directly supports the following POSIX threading abstractions: mutexes, reader-writer locks, condition variables (but see below), semaphores and barriers. Currently spinlocks are not supported, although they could be in future.
At the time of writing, the following popular Linux packages are known to implement their own threading primitives:
Qt version 4.X. Qt 3.X is harmless in that it only uses POSIX pthreads primitives. Unfortunately Qt 4.X has its own implementation of mutexes (QMutex) and thread reaping. Helgrind 3.4.x contains direct support for Qt 4.X threading, which is experimental but is believed to work fairly well. A side effect of supporting Qt 4 directly is that Helgrind can be used to debug KDE4 applications. As this is an experimental feature, we would particularly appreciate feedback from folks who have used Helgrind to successfully debug Qt 4 and/or KDE4 applications.
Runtime support library for GNU OpenMP (part of GCC), at least for GCC versions 4.2 and 4.3. The GNU OpenMP runtime library ( libgomp.so ) constructs its own synchronisation primitives using combinations of atomic memory instructions and the futex syscall, which causes total chaos in Helgrind, since it cannot "see" those.
Fortunately, this can be solved using a configuration-time option (for GCC). Rebuild GCC from source, and configure using --disable-linux-futex . This makes libgomp.so use the standard POSIX threading primitives instead. Note that this was tested using GCC 4.2.3 and has not been re-tested using more recent GCC versions. We would appreciate hearing about any successes or failures with more recent versions.
If you must implement your own threading primitives, there is a set of client request macros in helgrind.h to help you describe your primitives to Helgrind. You should be able to mark up mutexes, condition variables, etc, without difficulty.
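For instance, a hand-rolled spinlock might be described to Helgrind with the ANNOTATE_RWLOCK_* macros listed later in this chapter. A minimal sketch, assuming helgrind.h is on the include path (the spinlock itself is a toy, not a recommended design):

    #include <atomic>
    #include "helgrind.h"

    /* A toy spinlock, marked up so Helgrind treats it like a lock. */
    struct my_spinlock {
        std::atomic_flag busy = ATOMIC_FLAG_INIT;
        my_spinlock()  { ANNOTATE_RWLOCK_CREATE(this); }
        ~my_spinlock() { ANNOTATE_RWLOCK_DESTROY(this); }
        void lock() {
            while (busy.test_and_set(std::memory_order_acquire)) { /* spin */ }
            ANNOTATE_RWLOCK_ACQUIRED(this, 1);   /* 1 = write (exclusive) mode */
        }
        void unlock() {
            ANNOTATE_RWLOCK_RELEASED(this, 1);
            busy.clear(std::memory_order_release);
        }
    };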
It is also possible to mark up the effects of thread-safe reference counting using the ANNOTATE_HAPPENS_BEFORE , ANNOTATE_HAPPENS_AFTER and ANNOTATE_HAPPENS_BEFORE_FORGET_ALL macros. Thread-safe reference counting using an atomically incremented/decremented refcount variable causes Helgrind problems because a one-to-zero transition of the reference count means the accessing thread has exclusive ownership of the associated resource (normally, a C++ object) and can therefore access it (normally, to run its destructor) without locking. Helgrind doesn't understand this, and markup is essential to avoid false positives.
Here are recommended guidelines for marking up thread safe reference counting in C++. You only need to mark up your release methods -- the ones which decrement the reference count. Given a class like this:
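(The code block is reconstructed here as a sketch; atomic_decrement is a stand-in for whatever atomic-decrement primitive the class actually uses.)

    class MyClass {
        unsigned int mRefCount;   /* atomically updated reference count */

        void Release() {
            /* atomic_decrement: hypothetical atomic decrement
               returning the new value. */
            unsigned int newCount = atomic_decrement(&mRefCount);
            if (newCount == 0) {
                delete this;      /* last ref: destroy without locking */
            }
        }
    };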
the release method should be marked up as follows:
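(Again a sketch, reusing the hypothetical atomic_decrement; the ANNOTATE_* macros come from helgrind.h.)

    void Release() {
        unsigned int newCount = atomic_decrement(&mRefCount);
        if (newCount == 0) {
            /* One-to-zero transition: tell Helgrind that every earlier
               Release happens-before this point. */
            ANNOTATE_HAPPENS_AFTER(&mRefCount);
            ANNOTATE_HAPPENS_BEFORE_FORGET_ALL(&mRefCount);
            delete this;
        } else {
            /* Publish this thread's accesses to whichever thread
               eventually drops the last reference. */
            ANNOTATE_HAPPENS_BEFORE(&mRefCount);
        }
    }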
There are a number of complex, mostly-theoretical objections to this scheme. From a theoretical standpoint it appears to be impossible to devise a markup scheme which is completely correct in the sense of guaranteeing to remove all false races. The proposed scheme however works well in practice.
Avoid memory recycling. If you can't avoid it, you must tell Helgrind what is going on via the VALGRIND_HG_CLEAN_MEMORY client request (in helgrind.h ).
Helgrind is aware of standard heap memory allocation and deallocation that occurs via malloc / free / new / delete and from entry and exit of stack frames. In particular, when memory is deallocated via free , delete , or function exit, Helgrind considers that memory clean, so when it is eventually reallocated, its history is irrelevant.
However, it is common practice to implement memory recycling schemes. In these, memory to be freed is not handed to free / delete , but instead put into a pool of free buffers to be handed out again as required. The problem is that Helgrind has no way to know that such memory is logically no longer in use, and its history is irrelevant. Hence you must make that explicit, using the VALGRIND_HG_CLEAN_MEMORY client request to specify the relevant address ranges. It's easiest to put these requests into the pool manager code, and use them either when memory is returned to the pool, or is allocated from it.
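For instance, a pool manager might issue the request when a buffer is returned to the pool. A sketch only: the pool type and free-list function are hypothetical, but VALGRIND_HG_CLEAN_MEMORY is the real client request:

    #include <stddef.h>
    #include "helgrind.h"

    struct pool;                                   /* hypothetical pool type */
    void pool_push_free(struct pool *, void *);    /* hypothetical free list */

    void pool_release(struct pool *p, void *buf, size_t len)
    {
        /* The range is logically dead: drop its access history so it is
           not carried over to the buffer's next user. */
        VALGRIND_HG_CLEAN_MEMORY(buf, len);
        pool_push_free(p, buf);
    }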
Avoid POSIX condition variables. If you can, use POSIX semaphores ( sem_t , sem_post , sem_wait ) to do inter-thread event signalling. Semaphores with an initial value of zero are particularly useful for this.
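A minimal sketch of that pattern (the payload variable is illustrative):

    #include <semaphore.h>

    sem_t ready;            /* initialised with sem_init(&ready, 0, 0) */
    int payload;            /* illustrative shared data */

    void *producer(void *arg)
    {
        payload = 42;       /* write strictly before the post... */
        sem_post(&ready);   /* ...so the post happens-after it */
        return nullptr;
    }

    void *consumer(void *arg)
    {
        sem_wait(&ready);   /* Helgrind sees post happens-before wait */
        int v = payload;    /* hence this read is race-free */
        (void)v;
        return nullptr;
    }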
Helgrind only partially correctly handles POSIX condition variables. This is because Helgrind can see inter-thread dependencies between a pthread_cond_wait call and a pthread_cond_signal / pthread_cond_broadcast call only if the waiting thread actually gets to the rendezvous first (so that it actually calls pthread_cond_wait ). It can't see dependencies between the threads if the signaller arrives first. In the latter case, POSIX guidelines imply that the associated boolean condition still provides an inter-thread synchronisation event, but one which is invisible to Helgrind.
The result of Helgrind missing some inter-thread synchronisation events is to cause it to report false positives.
The root cause of this synchronisation lossage is particularly hard to understand, so an example is helpful. It was discussed at length by Arndt Muehlenfeld ("Runtime Race Detection in Multi-Threaded Programs", Dissertation, TU Graz, Austria). The canonical POSIX-recommended usage scheme for condition variables is as follows:
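Reconstructed here as a sketch (mx, cv and the boolean b are the conventional names):

    #include <pthread.h>

    pthread_mutex_t mx = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
    bool b = false;                        /* the associated condition */

    void waiter(void)
    {
        pthread_mutex_lock(&mx);
        while (!b)
            pthread_cond_wait(&cv, &mx);   /* atomically unlocks mx, waits */
        pthread_mutex_unlock(&mx);
        /* ... proceed: b is known to be True here ... */
    }

    void signaller(void)
    {
        pthread_mutex_lock(&mx);
        b = true;                          /* make the condition True... */
        pthread_cond_signal(&cv);          /* ...and signal the waiter */
        pthread_mutex_unlock(&mx);
    }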
Assume b is False most of the time. If the waiter arrives at the rendezvous first, it enters its while-loop, waits for the signaller to signal, and eventually proceeds. Helgrind sees the signal, notes the dependency, and all is well.
If the signaller arrives first, b is set to True, and the signal disappears into nowhere. When the waiter later arrives, it does not enter its while-loop and simply carries on. But even in this case, the waiter code following the while-loop cannot execute until the signaller sets b to True. Hence there is still the same inter-thread dependency, but this time it runs through an arbitrary in-memory condition, and Helgrind cannot see it.
By comparison, Helgrind's detection of inter-thread dependencies caused by semaphore operations is believed to be exactly correct.
As far as I know, a solution to this problem that does not require source-level annotation of condition-variable wait loops is beyond the current state of the art.
Make sure you are using a supported Linux distribution. At present, Helgrind only properly supports glibc-2.3 or later. This in turn means we only support glibc's NPTL threading implementation. The old LinuxThreads implementation is not supported.
If your application uses thread-local variables, Helgrind might report false positive race conditions on these variables, even though they are very probably race-free. On Linux, you can use --sim-hints=deactivate-pthread-stack-cache-via-hack to avoid such false positive error messages (see --sim-hints ).
Round up all finished threads using pthread_join . Avoid detaching threads: don't create threads in the detached state, and don't call pthread_detach on existing threads.
Using pthread_join to round up finished threads provides a clear synchronisation point that both Helgrind and programmers can see. If you don't call pthread_join on a thread, Helgrind has no way to know when it finishes, relative to any significant synchronisation points for other threads in the program. So it assumes that the thread lingers indefinitely and can potentially interfere indefinitely with the memory state of the program. It has every right to assume that -- after all, it might really be the case that, for scheduling reasons, the exiting thread did run very slowly in the last stages of its life.
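A minimal sketch of the recommended pattern:

    #include <pthread.h>

    enum { NWORKERS = 4 };

    void run_and_reap(void *(*work)(void *))
    {
        pthread_t tid[NWORKERS];
        for (int i = 0; i < NWORKERS; i++)
            pthread_create(&tid[i], nullptr, work, nullptr);  /* joinable */
        for (int i = 0; i < NWORKERS; i++)
            pthread_join(tid[i], nullptr);  /* sync point Helgrind can see */
    }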
Perform thread debugging (with Helgrind) and memory debugging (with Memcheck) together.
Helgrind tracks the state of memory in detail, and memory management bugs in the application are liable to cause confusion. In extreme cases, applications which do many invalid reads and writes (particularly to freed memory) have been known to crash Helgrind. So, ideally, you should make your application Memcheck-clean before using Helgrind.
It may be impossible to make your application Memcheck-clean unless you first remove threading bugs. In particular, it may be difficult to remove all reads and writes to freed memory in multithreaded C++ destructor sequences at program termination. So, ideally, you should make your application Helgrind-clean before using Memcheck.
Since this circularity is obviously unresolvable, at least bear in mind that Memcheck and Helgrind are to some extent complementary, and you may need to use them together.
POSIX requires that implementations of standard I/O ( printf , fprintf , fwrite , fread , etc) are thread safe. Unfortunately GNU libc implements this by using internal locking primitives that Helgrind is unable to intercept. Consequently Helgrind generates many false race reports when you use these functions.
Helgrind attempts to hide these errors using the standard Valgrind error-suppression mechanism. So, at least for simple test cases, you don't see any. Nevertheless, some may slip through. Just something to be aware of.
Helgrind's error checks do not work properly inside the system threading library itself ( libpthread.so ), and it usually observes large numbers of (false) errors in there. Valgrind's suppression system then filters these out, so you should not see them.
If you see any race errors reported where libpthread.so or ld.so is the object associated with the innermost stack frame, please file a bug report at http://www.valgrind.org/ .
7.6. Helgrind Command-line Options
The following end-user options are available:
--free-is-write=no|yes [default: no]
When enabled (not the default), Helgrind treats freeing of heap memory as if the memory was written immediately before the free. This exposes races where memory is referenced by one thread, and freed by another, but there is no observable synchronisation event to ensure that the reference happens before the free.
This functionality is new in Valgrind 3.7.0, and is regarded as experimental. It is not enabled by default because its interaction with custom memory allocators is not well understood at present. User feedback is welcomed.
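The kind of bug this option targets looks like the following hypothetical fragment, in which nothing orders the read against the free:

    #include <cstdlib>

    /* Thread 1: reads through p. */
    int observe(const int *p) { return *p; }

    /* Thread 2: frees the same block, with no intervening sync. */
    void discard(int *p) { std::free(p); }

    /* With --free-is-write=yes, the free() counts as a write to the
       block, so the unordered read above is reported as a race. */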
--track-lockorders=no|yes [default: yes]
When enabled (the default), Helgrind performs lock order consistency checking. For some buggy programs, the large number of lock order errors reported can become annoying, particularly if you're only interested in race errors. You may therefore find it helpful to disable lock order checking.
--history-level=none|approx|full [default: full]
--history-level=full (the default) causes Helgrind to collect enough information about "old" accesses that it can produce two stack traces in a race report -- both the stack trace for the current access, and the trace for the older, conflicting access. To limit memory usage, the stack traces for "old" accesses are limited to a maximum of --history-backtrace-size entries (default 8), or to the --num-callers value if that is smaller.
Collecting such information is expensive in both speed and memory, particularly for programs that do many inter-thread synchronisation events (locks, unlocks, etc). Without such information, it is more difficult to track down the root causes of races. Nonetheless, you may not need it in situations where you just want to check for the presence or absence of races, for example, when doing regression testing of a previously race-free program.
--history-level=none is the opposite extreme. It causes Helgrind not to collect any information about previous accesses. This can be dramatically faster than --history-level=full .
--history-level=approx provides a compromise between these two extremes. It causes Helgrind to show a full trace for the later access, and approximate information regarding the earlier access. This approximate information consists of two stacks, and the earlier access is guaranteed to have occurred somewhere between program points denoted by the two stacks. This is not as useful as showing the exact stack for the previous access (as --history-level=full does), but it is better than nothing, and it is almost as fast as --history-level=none .
--history-backtrace-size=<number> [default: 8]
When --history-level=full is selected, --history-backtrace-size=<number> indicates how many entries to record in the stack traces of "old" accesses. This flag has no effect unless --history-level=full is selected.
--delta-stacktrace configures the way Helgrind captures the stacktraces for the option --history-level=full . Such a stacktrace is typically needed each time a new piece of memory is read or written in a basic block of instructions.
--delta-stacktrace=no causes Helgrind to compute a full history stacktrace from the unwind info each time a stacktrace is needed.
--delta-stacktrace=yes indicates to Helgrind to derive a new stacktrace from the previous stacktrace, as long as there was no call instruction, no return instruction, or any other instruction changing the call stack since the previous stacktrace was captured. If no such instruction was executed, the new stacktrace can be derived from the previous stacktrace by just changing the top frame to the current program counter. This option can speed up Helgrind by 25% when using --history-level=full .
The following aspects have to be considered when using --delta-stacktrace=yes :
In some cases (for example in a function prologue), the valgrind unwinder might not properly unwind the stack, due to some limitations and/or due to wrong unwind info. When using --delta-stacktrace=yes, the wrong stack trace captured in the function prologue will be kept till the next call or return.
On the other hand, --delta-stacktrace=yes sometimes helps to obtain a correct stacktrace, for example when the unwind info allows a correct stacktrace to be done in the beginning of the sequence, but not later on in the instruction sequence.
Determining which instructions are changing the callstack is partially based on platform dependent heuristics, which have to be tuned/validated specifically for the platform. Also, unwinding in a function prologue must be good enough to allow using --delta-stacktrace=yes. Currently, the option --delta-stacktrace=yes has been reasonably validated only on linux x86 32 bits and linux amd64 64 bits. For more details about how to validate --delta-stacktrace=yes, see debug option --hg-sanity-flags and the function check_cached_rcec_ok in libhb_core.c.
--conflict-cache-size=N [default: 1000000]
Information about "old" conflicting accesses is stored in a cache of limited size, with LRU-style management. This is necessary because it isn't practical to store a stack trace for every single memory access made by the program. Historical information on not-recently-accessed locations is periodically discarded, to free up space in the cache.
This option controls the size of the cache, in terms of the number of different memory addresses for which conflicting access information is stored. If you find that Helgrind is showing race errors with only one stack instead of the expected two stacks, try increasing this value.
The minimum value is 10,000 and the maximum is 30,000,000 (thirty times the default value). Increasing the value by 1 increases Helgrind's memory requirement by very roughly 100 bytes, so the maximum value will easily eat up three extra gigabytes or so of memory.
--check-stack-refs=no|yes [default: yes]
By default Helgrind checks all data memory accesses made by your program. This flag enables you to skip checking for accesses to thread stacks (local variables). This can improve performance, but comes at the cost of missing races on stack-allocated data.
--ignore-thread-creation=<yes|no> [default: no]
Controls whether all activities during thread creation should be ignored. By default enabled only on Solaris. Solaris provides higher throughput, parallelism and scalability than other operating systems, at the cost of more fine-grained locking activity. This means for example that when a thread is created under glibc, just one big lock is used for all thread setup. Solaris libc uses several fine-grained locks, and the creator thread resumes its activities as soon as possible, leaving, for example, the stack and TLS setup sequence to the created thread. This situation confuses Helgrind, as it assumes there is some false ordering in place between creator and created thread, and therefore many types of race conditions in the application would not be reported. To prevent such false ordering, this command line option is set to yes by default on Solaris. All activity (loads, stores, client requests) is therefore ignored during:
pthread_create() call in the creator thread
thread creation phase (stack and TLS setup) in the created thread
Also, new memory allocated during thread creation is untracked, that is, race reporting is suppressed there. DRD does the same thing implicitly. This is necessary because Solaris libc caches many objects and reuses them for different threads, and that confuses Helgrind.
7.7. Helgrind Monitor Commands
The Helgrind tool provides monitor commands handled by Valgrind's built-in gdbserver (see Monitor command handling by the Valgrind gdbserver ). Valgrind's Python code provides GDB front end commands that give easier usage of the helgrind monitor commands (see GDB front end commands for Valgrind gdbserver monitor commands ). To launch a helgrind monitor command via its GDB front end command, instead of prefixing the command with "monitor", you must use the GDB helgrind command (or the shorter alias hg ). Using the helgrind GDB front end command provides more flexible usage, such as evaluation of address and length arguments by GDB. In GDB, you can use help helgrind to get help about the helgrind front end monitor commands, and you can use apropos helgrind to get all the commands mentioning the word "helgrind" in their name or online help.
info locks [lock_addr] shows the list of locks and their status. If lock_addr is given, only shows the lock located at this address.
In the following example, helgrind knows about one lock. This lock is located at the guest address ga 0x8049a20 . The lock kind is rdwr , indicating a reader-writer lock. Other possible lock kinds are nonRec (simple mutex, non recursive) and mbRec (simple mutex, possibly recursive). The lock kind is followed by the list of threads holding the lock. In the example, R1:thread #6 tid 3 indicates that the helgrind thread #6 has acquired the lock in read mode once (the counter following the letter R is one). The helgrind thread number is incremented for each started thread. The presence of 'tid 3' indicates that thread #6 has not exited yet and is valgrind tid 3. If a thread has terminated, this is indicated with 'tid (exited)'.
If you give the option --read-var-info=yes , then more information will be provided about the lock location, such as the global variable or the heap block that contains the lock:
The GDB equivalent helgrind front end command helgrind info locks [ADDR] accepts any address expression for its first ADDR argument.
accesshistory <addr> [<len>] shows the access history recorded for <len> (default 1) bytes starting at <addr>. For each recorded access that overlaps with the given range, accesshistory shows the operation type (read or write), the address and size read or written, the helgrind thread nr/valgrind tid number that did the operation and the locks held by the thread at the time of the operation. The oldest access is shown first, the most recent access is shown last.
In the following example, we first see a recorded write of 4 bytes by thread #7 that modified the given 2-byte range. The second recorded write is the most recent: thread #9 modified the same 2 bytes as part of a 4-byte write operation. The locks held by each thread at the time of its write are also shown.
The GDB equivalent helgrind front end command helgrind accesshistory ADDR [LEN] accepts any address expression for its first ADDR argument. The second optional argument is any integer expression. Note that these two arguments must be separated by a space, as in the following example:
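(A hypothetical invocation, where mx is assumed to be a variable in the program being debugged; it follows the ADDR [LEN] syntax described above.)

    (gdb) helgrind accesshistory &mx sizeof(mx)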
xtmemory [<filename> default xtmemory.kcg.%p.%n] requests the Helgrind tool to produce an xtree heap memory report. See Execution Trees for a detailed explanation of execution trees.
7.8. Helgrind Client Requests
The following client requests are defined in helgrind.h . See that file for exact details of their arguments.
VALGRIND_HG_CLEAN_MEMORY
This makes Helgrind forget everything it knows about a specified memory range. This is particularly useful for memory allocators that wish to recycle memory.
ANNOTATE_HAPPENS_BEFORE
ANNOTATE_HAPPENS_AFTER
ANNOTATE_NEW_MEMORY
ANNOTATE_RWLOCK_CREATE
ANNOTATE_RWLOCK_DESTROY
ANNOTATE_RWLOCK_ACQUIRED
ANNOTATE_RWLOCK_RELEASED
These are used to describe to Helgrind the behaviour of custom (non-POSIX) synchronisation primitives, which it otherwise has no way to understand. See the comments in helgrind.h for further documentation.
7.9. A To-Do List for Helgrind
The following is a list of loose ends which should be tidied up some time.
For lock order errors, print the complete lock cycle, rather than doing so only for size-2 cycles as at present.
The conflicting access mechanism sometimes mysteriously fails to show the conflicting access' stack, even when provided with unbounded storage for conflicting access info. This should be investigated.
Document races caused by GCC's thread-unsafe code generation for speculative stores. In the interim see http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html and http://lkml.org/lkml/2007/10/24/673 .
Don't update the lock-order graph, and don't check for errors, when a "try"-style lock operation happens (e.g. pthread_mutex_trylock ). Such calls do not add any real restrictions to the locking order, since they can always fail to acquire the lock, resulting in the caller going off and doing Plan B (presumably it will have a Plan B). Doing such checks could generate false lock-order errors and confuse users.
Performance can be very poor. Slowdowns on the order of 100:1 are not unusual. There is limited scope for performance improvements.
