Content area
Issue Title: VMCAI 2009
This paper concerns automatically verifying safety properties of concurrent programs. In our work, the safety property of interest is to check for multi-location data races in concurrent Java programs, where a multi-location data race arises when a program is supposed to maintain an invariant over multiple data locations, but accesses/updates are not protected correctly by locks. The main technical challenge that we address is how to generate a program model that retains (at least some of) the synchronization operations of the concrete program, when the concrete program uses dynamic memory allocation. Static analysis of programs typically begins with an abstraction step that generates an abstract program that operates on a finite set of abstract objects. In the presence of dynamic memory allocation, the finite number of abstract objects of the abstract program must represent the unbounded number of concrete objects that the concrete program may allocate, and thus by the pigeon-hole principle some of the abstract objects must be summary objects--they represent more than one concrete object. Because abstract summary objects represent multiple concrete objects, the program analyzer typically must perform weak updates on the abstract state of a summary object, where a weak update accumulates information. Because weak updates accumulate rather than overwrite, the analyzer is only able to determine weak judgements on the abstract state, i.e., that some property possibly holds, and not that it definitely holds. The problem with weak judgements is that determining whether an interleaved execution respects program synchronization requires the ability to determine strong judgements, i.e., that some lock is definitely held, and thus the analyzer needs to be able to perform strong updates--an overwrite of the abstract state--to enable strong judgements. We present the random-isolation abstraction as a new principle for enabling strong updates of special abstract objects. The idea is to associate with a program allocation site two abstract objects, r ♯ and o ♯ , where r ♯ is a non-summary object and o ♯ is a summary object. Abstract object r ♯ models a distinguished concrete object that is chosen at random in each program execution. Because r ♯ is a non-summary object--i.e, it models only one concrete object--strong updates are able to be performed on its abstract state. Because which concrete object r ♯ models is chosen randomly, a proof that a safety property holds for r ♯ generalizes to all objects modeled by o ♯ . We implemented the random isolation abstraction in a tool called Empire, which verifies atomic-set serializability of concurrent Java programs (atomic-set serializability is one notion of multi-location data-race freedom). Random isolation allows Empire to track lock states in ways that would not otherwise have been possible with conventional approaches.[PUBLICATION ABSTRACT]
Int J Softw Tools Technol Transfer (2011) 13:495518 DOI 10.1007/s10009-011-0197-7
VMCAI 2009
Finding concurrency-related bugs using random isolation
Nicholas Kidd Thomas Reps Julian Dolby
Mandana Vaziri
Published online: 23 April 2011 Springer-Verlag 2011
Abstract This paper concerns automatically verifying safety properties of concurrent programs. In our work, the safety property of interest is to check for multi-location data races in concurrent Java programs, where a multi-location data race arises when a program is supposed to maintain an invariant over multiple data locations, but accesses/updates are not protected correctly by locks. The main technical challenge that we address is how to generate a program model that retains (at least some of) the synchronization operations of the concrete program, when the concrete program uses dynamic memory allocation. Static analysis of programs typically begins with an abstraction step that generates an abstract program that operates on a nite set of abstract objects. In the presence of dynamic memory allocation, the nite number of abstract objects of the abstract
Supported by NSF under grants CCF-0540955, CCF-0810053, and CCF-0904371, by ONR under grants N00014-09-1-0510 and N00014-09-1-0776, by ARL under grant W911NF-09-1-0413, and by AFRL under grants FA8750-06-C-0249 and FA9550-09-1-0279.
N. Kidd (B) T. Reps
University of Wisconsin, Madison, WI, USA e-mail: [email protected]; [email protected]
T. RepsGrammaTech, Inc., Ithaca, NY, USA e-mail: [email protected]
J. Dolby M. Vaziri
IBM T.J. Watson Research Center, Hawthorne, NY, USA e-mail: [email protected]
M. Vazirie-mail: [email protected]
Present Address:N. KiddGoogle Inc., Mountain View, USA
program must represent the unbounded number of concrete objects that the concrete program may allocate, and thus by the pigeon-hole principle some of the abstract objects must be summary objectsthey represent more than one concrete object. Because abstract summary objects represent multiple concrete objects, the program analyzer typically must perform weak updates on the abstract state of a summary object, where a weak update accumulates information. Because weak updates accumulate rather than overwrite, the analyzer is only able to determine weak judgements on the abstract state, i.e., that some property possibly holds, and not that it definitely holds. The problem with weak judgements is that determining whether an interleaved execution respects program synchronization requires the ability to determine strong judgements, i.e., that some lock is definitely held, and thus the analyzer needs to be able to perform strong updatesan overwrite of the abstract stateto enable strong judgements. We present the random-isolation abstraction as a new principle for enabling strong updates of special abstract objects. The idea is to associate with a program allocation site two abstract objects, r and o , where r is a non-summary object and o is a summary object. Abstract objectr models a distinguished concrete object that is chosen at random in each program execution. Because r is a non-summary objecti.e, it models only one concrete objectstrong updates are able to be performed on its abstract state. Because which concrete object r models is chosen randomly, a proof that a safety property holds forr generalizes to all objects modeled by o . We implemented the random isolation abstraction in a tool called Empire, which veries atomic-set serializability of concurrent Java programs (atomic-set serializability is one notion of multi-location data-race freedom). Random isolation allows Empire to track lock states in ways that would not otherwise have been possible with conventional approaches.
123
496 N. Kidd et al.
Keywords Random-isolation abstraction Atomic sets
Model checking Verication Pushdown systems
Atomicity
1 Introduction
This paper concerns automatically verifying safety properties of concurrent programs. Although the problem is undecidable in general, in some cases undecidability can be side-stepped using the following scheme for obtaining approximate answers:
Step 1: Construct an abstraction of the program of interest using a modeling language for which safety checking is decidable.
Step 2: Invoke the decision procedure to determine whether safety holds.
Step 3: If the answer obtained in Step 2 is yes (safety holds), report safe; otherwise, report unknown.
A safety-checking problem may be theoretically intractable from the standpoint of worst-case running time; however, in many cases answers can be obtained in a reasonable amount of time by incorporating fast algorithms and data structures for key steps of the decision procedure, or for key heuristics that it uses. Other work that adopts this approach includes symbolic model checking [8], SLAM [6], Moped [53], Lammichs work on the analysis of multi-threaded programs [35], and the recent work on analyzing linked lists by [34], among others.
In our work, the safety property of interest is to check for multi-location data races in concurrent Java programs, where a multi-location data race arises when a program is supposed to maintain an invariant over multiple data locations, but accesses/updates are not protected correctly by locks. (The precise definition of multi-location data races used by our checking tool Empire is presented in Sect. 2.) Multi-location data races generalize the traditional notion of data races (i.e., a single-location data race results from inconsistent coordination of accesses on a single memory location).The importance of checking for multi-location data races is underscored by a study conducted by [37], which showed that roughly 30% of the concurrency bugs found in a set of concurrent open-source applications were due to multiple-memory-location problems. (Our approach can nd classical single-location data races as well as multiple-location races.)
An example of a multi-location data race is found in one of the Vector constructors in Suns JDK 1.4.2, which is illustrated by the code fragment show in Listing 1.1 The class contains two elds elementCount and
1 The snippet is a simplied version of [54, Fig. 2].
Listing 1 java.util.Vector snippet.
public class Vector {Object[] elementData;int elementCount;
Vector(Collection c) {elementCount = c.size();elementData = new
Object[elementCount * 110 / 100];c.toArray(elementData);}}
elementData, where elementData is an array of objects and elementCount is an integer eld whose value species the number of valid components of a Vector object (i.e., the logical contents of a Vectorobject are elementData[0]elementData[element
Count 1]).
When a shared Vector object is accessed by concurrently executing threads T1 and T2, it is clearly desirable for T1 to not be able to observe intermittent writes of T2 to the two elds. Without proper synchronization, the invariant that elementCount species the logical boundaries of elementData could be violated. With an unfortunate scheduling of thread executions, such a violation could occur in the Vector constructor shown in Listing 1. In particular, the following interleaved execution would result in the invariant being violated:
1. T1 begins executing the Vector constructor shown in Listing 1;
2. T1 initializes elementCount to be c.size();3. T2 removes all elements from the shared Vector referenced by the parameter Collection c;
4. T1 initializes elementData with Collection c as modied by T2.
At this point, if c was initially a non-empty collection, elementCount will now be non-zero, but the array elementData will have at each index position from 0to elementCount 1 the null reference.2
Our goal of checking for multi-location data races (such as the one just described) directly affects the choice of a modeling language. Specifically, the modeling language should support multiple threads of execution that communicate via shared memory, and that protect shared-memory accesses with locks. However, if we are not careful, the combination of the rst two features alone could result in a modeling language for which safety checking is undecidable [47].
In our work, decidability is maintained by carefully choosing which aspects of a concurrent Java program will be
2 The bug was rst reported by [57].
123
Finding concurrency-related bugs 497
modeled precisely and which are abstracted away. The problems that were necessary for us to address can be summarized as follows: (1) How are data values modeled? (2) How are program locks modeled? (3) How are Javas reentrant locks modeled? (4) What is the algorithm for analyzing the resulting program model? Below we discuss each issue in turn.
Abstracting values The rst insight is that multi-location data races are concerned with reads from and writes to elds of shared objects, and not with the values of those elds.Thus, a safe approximation abstracts away the values of all variables. Once the actual data values have been abstracted away, the reads and writes of an individual thread can no longer be affected by the reads and writes of other threads.Thus, the tight coupling between threads that typically results in analysis of concurrent programs (or program models) being undecidable has been removed. As in many program analyses, the price one pays is that the set of behaviors of the program model is necessarily an over-approximation of the set of behaviors of the (concrete) program, and hence one-sided answers are obtained: safe/possibly unsafe rather than safe/definitely unsafe.
Abstracting locks Once data values have been abstracted away, the resulting program contains no synchronization, which is obviously not adequate for checking properties that deal with program synchronization. In general, one cannot simply decide to model the lock state of every Java object oi.e., whether the lock associated with o is in the locked or unlocked statebecause Java programs contain dynamic memory allocation, which results in an a priori unbounded number of locks. To maintain decidability, we require a mechanism that is able to model the unbounded number of concrete locksi.e., the set of all locks that are allocated by all dynamic concrete runs of the programwith a nite number of abstract locks. Unfortunately, the typical approach of using summary objects does not provide the precision necessary to reason about lock-based synchronization.
For summary-based analyses, an unbounded set of concrete objects are abstracted into a nite set of abstract objects.One common approach is to use the allocation-site abstraction of [27], where an abstract object o is dened for each static allocation site of the program (i.e., a new statement in Java), and the abstract state of o is a summary of all possible concrete objects that could be allocated from that site. It follows from the pigeon-hole principle that there must be at least one abstract object that represents more than one concrete object.
The difculty is that when abstraction techniques summarize multiple objects into a single summary object, the analyzer is unable to perform strong updates (overwrites) when interpreting a lock or unlock operation. A strong update of an abstract object corresponds to a group kill: it represents a definite change in value to all concrete objects that
the abstract object represents. Strong updates cannot generally be performed on summary objects because a (concrete) update only affects one of the summarized concrete objects. To be sound, an analyzer has to use weak updates (i.e., it has to accumulate lock states). In particular, it has to use an abstract lock state that represents {locked, unlocked} and such
an approach generally leads to significant loss of precision: an analysis will typically end up showing that a given summary lock is in the state {locked, unlocked}, which provides
no information whatsoever.
Our second insight, which is the main technical contribution of the paper, is a novel abstraction technique called random isolation. The idea that we introduce is to analyze a transformed version of the program in which a set of user-specied allocation sites are transformed so that, during each concrete run of the program, at most one object r will be randomly tagged as being distinguished for that allocation site. By this means, an analyzer can keep the abstract representative for r (say r ) separate from a summary object that represents the rest of the objects allocated at the spec-ied allocation site.3 The advantage of random isolation is that because r is selected randomly, properties proved about its abstract representative r can be generalized to all objects that are potentially allocated at the same allocation site. Moreover, because each concrete run can have only one instance of r, r is not a summary object, which means that strong updates can be performed on the abstract state of r
and hence its lock state can be tracked precisely. In practice, the analyzer will typically require an additional mechanism to ensure that a reference-type variable (e.g., this in Java) refers solely to r , which can be difcult due to aliasing. We address this problem via a source-to-source transformation that injects expressions that perform a case analysis on what the reference variable points to (Sect. 6.1).
Reentrant locks Empire checks concurrent Java programs that use reentrant locks (obtained via synchronizedmethods and synchronized blocks). Our third insight is that when modeling the locking structure of a program, one can (in some sense) ignore nested manipulations of the same lock. That is, only outermost lock/unlock pairs are truly important.
The straightforward way to encode a reentrant lock is to model it with a pushdown automaton (context-free language) that tracks the number of successive lock acquisitions on the stack. Instead, we apply a transformation that replaces the context-free language that describes a reentrant lock by a regular language that describes a non-reentrant lock. We call this transformation language-strength reduction, and it
3 Our convention is to use r for a randomly tagged concrete object, o for an arbitrary concrete object, and r resp. o for their abstract counterparts. When an allocation-site Site is specied, we use the subscripted variants rSite, oSite, r Site, and o Site.
123
498 N. Kidd et al.
is performed in a completely accurate way (i.e., the models behavior is preserved). Languagestrength reduction is further discussed in Sect.8.3.
Analyzing abstract programs After program transformation and abstraction, the resultant program model contains a nite number of threads, shared memory locations, and shared locks. However, one still is left with the task of analyzing such a model. The model-checking engine on which Empire was originally based employed a semi-decision procedure [32].The reason was that with (i) reentrant locks, (ii) an unbounded number of context switches, and (iii) an unbounded number of lock acquisitions, the problem appeared to be undecidable. The actions of each process are modeled by a context-free language; any model checker for multiple processes is dangerously close to the undecidable problem of determining whether the intersection of two context-free languages is empty. Surprisingly, after the problem has been transformed by the random-isolation and language-strength-reduction transformations, a decision procedure exists for the resulting simplied problem.
In essence, the couplings between different processes turned out to be sufciently weak that the undecidability of checking emptiness of the intersection of two context-free languages did not come into play; however, that fact was obfuscated in a non-trivial way in the conditions of the original problem. We return to this discussion in Sect. 8 where an overview of the decision procedure is presented. (A full description of the decision procedure is available elsewhere [31].)
Summary The above sequence of ideas allowed us to create a tool that (i) can model concurrent Java programs quite faithfully, and (ii) check them for important kinds of concurrency problems: both multi-location and single-location data races. When we compared the performance of the decision procedure against the earlier semi-decision procedure (which represented the current state-of-the-art), we found that the decision procedure was 34 times faster overall (i.e., comparing the sum of the running times on all queries, with a 300-second timeout). Moreover, with the 300-second timeout threshold, for each query where the semi-decision procedure timed out (roughly 49% of the queries), the decision procedure succeeded within the allotted time.
Outline The remainder of this paper is organized as follows: Sect.2 provides background material on atomic sets and AS-serializability, the notion of multi-location data race used in the paper. Section3 presents an overview of Empire, our tool for checking for multi-location and single-location data races. Section4 presents EML, the Empire modeling language. Section5 motivates the need for a new abstraction by discussing how the commonly used allocation-site abstraction [27] is insufcient when checking for multi-
Listing 2 Concurrent Stack program.
1
class Stack {
public static final int MAX=10;
3
2
@atomic(S) Object[] data = new Object[MAX];
4
@atomic(S) int count = -1;
5
6
@atomic sync Object pop(){
7
Object res = data[count];
8
data[count--] = null;
return res;
9
}
10
@atomic sync void push(Object o) {
12
11
data[++count] = o;
}
13
@atomic sync int size() {
15
14
return count+1;
}
16
@atomic sync replaceTop(Object o) {
18
17
pop();
push(o);
19
}
20
static Stack makeStack() {
22
21
return newSite Stack();
}
23
}
24
class SafeWrap {
25
@atomic sync Object popwrap(@atomic Stack s)
27
26
{ return (s.size() > 0) ? s.pop() : null;
28
}
static SafeWrap makeSafeWrap() {
30
29
return new SafeWrap();
}
31
static void main(String[] args){
34
32 33
Stack stack = Stack.makeStack();
35
stack.push(new Integer(1));
36
new Thread("1") {
makeSafeWrap().popwrap(stack);
38
37
}
new Thread("2") {
39
makeSafeWrap().popwrap(stack);
41
40
}
}
42
}
location data races. Section6 presents the random-isolation abstraction and describes an implementation of it. Section7 presents a sound translation from a concurrent Java program abstracted using random isolation to an EML program. Section8 describes the model-checking technique used to verify AS-serializability for an EML program. Section9 presents the experimental evaluation. Section10 discusses related work.
2 Atomic-set serializability
This section presents atomic setsthe mechanism to specify the elds over which an invariant holdsand atomic-set serializabilitya data-centric correctness criterion that is
43
123
Finding concurrency-related bugs 499
used to dene both single-location and multi-location data races. We present both atomic sets and atomic-set serializability (AS-serializability) via the example program shown in Listing 2. (In Listing 2, assume that all methods are public. The Java keyword public has been omitted to save space. Moreover, the Java keyword synchronized has been abbreviated as sync.)
An atomic set is dened with respect to a Java class C and species that an invariant holds over a subset of the elds dened by C. C may dene as many atomic sets as elds, however, each eld is allowed to be a member of at most one atomic set (i.e., the atomic sets are disjoint).
Example (Atomic sets) Listing 2 presents a simple Java program that denes a class Stack that implements a stack data structure using an array Stack.data for storage and an integer Stack.count whose current value is the top-of-stack index into the array Stack.data. (For now, ignore the class SafeWrap.) The annotations @atomic(S) on elds Stack.data and Stack.count indicate that those elds are members of the atomic set S. The (unspecied) invariant over data and count is that the current value of countis the index of the position in data where the top-of-stack item is stored. The existence of this relationship is reected by both elds being members of the atomic set S.
Associated with atomic sets are units of work; a unit-of-work annotation is a programmer assertion that a method guarantees to maintain the invariant of an atomic set when executed serially (i.e., the unit-of-work is executed to completion and all other threads wait for it to complete). We call a unit of work that writes to all members of an atomic set a write-complete unit of work.
Example (Units of work) In Listing 2, the units of work are methods that have the @atomic annotation (e.g., the method Stack.pop). The write-complete units of work are the methods Stack.pop, Stack.push, and Stack.replaceTop.
The correctness criterion associated with atomic sets is atomic-set serializability (AS-serializability). AS-serializability is a generalization of serializability. In general, a code region that should appear to execute atomically forms a transaction (i.e., represents a logically atomic operation).An interleaved program execution is then serializable if it is equivalent to an execution in which the transactions are executed in some serial order. For AS-serializability, the atomic-code regions (i.e., transactions) are the unit-of-work methods.An execution is said to be AS-serializable if its projection on each atomic set is serializable, where the projection of an execution with respect to an atomic set S removes all reads and writes to memory locations that are not members of S.AS-serializability is a generalization of serializability beca-
use serializability can be obtained by dening all of memory to be in one atomic set.4
Example (AS-serializable) Consider the method Stack.replaceTop, whose body merely invokes the methods Stack.pop and Stack.push. Each of the three methods are synchronized methods, and the atomic set with respect to which they must execute atomically is S. Because the methods are synchronized, no other thread can access the elds that are members of S until Stack.replaceTop completes execution. Thus, all executions of Stack.replaceTop must be AS-serializable.
Atomic sets can be dynamically extended to include the atomic sets of method parameters, which are referred to as unit for parameters. This is specied by an @atomic annotation on a method parameter.5 Dynamic extension of an atomic set captures the fact that if a unit of work is dependent on a method parameter, execution of that unit of work must (appear to) be atomic for the atomic sets of the parameter as well.
Example (Unit for parameters) Class SafeWrap does not dene an atomic set; however, method popwrap is dened as a unit of work and parameter Stack s as a unit for parameter. Thus, popwrap should execute atomically with respect to the units of work of class Stack, namely the atomic set S.
2.1 AS-serializability violations
An execution that is not AS-serializable is said to have an AS-serializability violation. A result from [54] is that if all units of work are write-complete, then AS-serializability violations can be completely characterized by a set of fourteen problematic access patterns. Table1 presents the 14 problematic access patterns.6 Each pattern consists of a sequence of reads (R) and writes (W) to an atomic-set member l, or to two atomic-set members l1 and l2. Problematic access patterns 15 capture single-location data-consistency errors, while problematic access patterns 614 capture multi-location data-consistency errors. Each memory access occurs during a unit of work u; the subscript on the memory access denotes which unit of work it belongs to. For example, problematic access pattern 1 is dened as Ru(l) Wu (l) Wu(l), which describes an AS-serializability violation that involves an atomic-set member l. The rst read and last write belong to unit of work u. The intervening write belongs to a unit of work u that is executed by another thread.
4 Dening serializability in terms of AS-serializability requires the ability for atomic sets to span class definitions.
5 Vaziri et al. [54] use the annotation unitfor to mark method parameters that should be dynamically absorbed into an atomic set.
6 The patterns in Table1 appear in [22] and [54].
123
500 N. Kidd et al.
Table 1 14 problematic access patterns
Id Problematic access pattern Description
1 Ru(l) Wu (l) Wu(l) Value read is stale by the time an update is made in u
2 Ru(l) Wu (l) Ru(l) Two reads of the same location can yield different values in u
3 Wu(l) Ru (l) Wu(l) An intermediate state is observed by u
4 Wu(l) Wu (l) Ru(l) Value read may not be the same as the one written last in u
5 Wu(l) Wu (l) Wu(l) Value written by u can be lost
6 Wu(l1) Wu (l1) Wu (l2) Wu(l2) Memory can be left in an inconsistent state
7 Wu(l1) Wu (l2) Wu (l1) Wu(l2) Same as above
8 Wu(l1) Wu (l2) Wu(l2) Wu (l1) Same as above
9 Wu(l1) Ru (l1) Ru (l2) Wu(l2) State observed can be inconsistent10 Wu(l1) Ru (l2) Ru (l1) Wu(l2) Same as above
11 Ru(l1) Wu (l1) Wu (l2) Ru(l2) Same as above
12 Ru(l1) Wu (l2) Wu (l1) Ru(l2) Same as above
13 Ru(l1) Wu (l2) Ru(l2) Wu (l1) Same as above
14 Wu(l1) Ru (l2) Wu(l2) Ru (l1) Same as above
Patterns 15 involve a single memory location; patterns 614 involve a pair of memory locations
To give a concrete illustration of an AS-serializability violation, let us return to the program in Listing 2. The method Stack.pop performs no safety checking. That is, Stack.pop does not verify that the stack is nonempty before attempting to remove the top-of-stack item.If Stack.pop is invoked on an empty stack, then the eld Stack.count will be 1. The array access to
the eld Stack.data will result in an ArrayIndexOutOfBoundsException being raised because Stack.count has the value 1.
The class SafeWrap dened below Stack in Listing 2 addresses this oversight by dening the method SafeWrap.popwrap, which rst checks that the parameter Stack s is not empty before invoking the method Stack.pop. For the class SafeWrap, the method SafeWrap.popwrap is a unit of work, indicated by the @atomic annotation on the method, and the parameter Stack s is a unitfor parameter, indicated by the @atomic annotation on parameter s. (Note that the @atomic annotation provides a specication of the desired behavior, and not an implementation of it.) Recall that the atomic sets of a unitfor parameter are dynamically incorporated into the atomic sets of the invoking object, i.e., the thisparameter, for the duration of a unit of work. Because class SafeWrap does not dene any atomic sets, the atomic set with respect to which SafeWrap.popwrap must execute atomically is the atomic set S dened by class Stack.
Unfortunately, the attempt to harden the code by adding error checking has introduced an AS-serializability violation.The problem is that the program synchronizes on the wrong object. In this case, the method SafeWrap.popwrap has the synchronized annotation, whereas the method should
have been written to synchronize on the parameter Stack s. Thus, the interleaved execution shown in Fig. 1 is valid.
The interleaved execution shown in Fig. 1 contains problematic access pattern 12. The data accesses that are involved in the pattern are underlined in Fig. 1. Because of the AS-serializability violation, the method SafeWrap.popwrap can raise an ArrayIndexOutOfBoundsException even when it is invoked on a non-empty stack. The interleaved execution shown in Fig. 1 illustrates how that can happen. Initially, the stack contains one item. Thread T1 begins execution and checks that the stack is non-empty by invoking Stack.size. The check succeeds, and so T1s next action is to invoke Stack.pop. Before doing so, thread T2 successfully executes SafeWrap.popwrap, which removes the item from the stack, leaving it empty. When T1 resumes execution, it invokes Stack.pop on an empty stack, which raises an exception. The point at which the exception is raised is denoted by the symbol [maltesecross] at the end of thread T1s execution sequence.
3 EMPIRE
Empire is a tool to verify that errors such as the one described at the end of Sect.2 do not occur in a concurrent Java program Prog. Empire returns answers of the form the problematic access pattern p is definitely not present or the problematic access pattern p may be present, where p is an integer in the range 114 that species the particular problematic access pattern of interest (cf. Table1). Empire uses abstraction to generate an abstract program Prog such that the set of behaviors of Prog is a sound over-approximation of the set
123
Finding concurrency-related bugs 501
Fig. 1 An interleaved execution of thread T1 and T2 that contains an AS-serializability violation. R and W denote a read and write access, respectively. c and d denote elds count and data, respectively. [ and ] denote the beginning and end, respectively, of a unit of
work. The subscripts 1 and 2 are thread ids. (s and )s denote the acquire and release operations, respectively, of the lock of Stack sthat is the input parameter to SafeWrap.popwrap
of behaviors of Prog. The challenge is to dene a nite-data abstraction such that Prog is able to disallow certain thread interleavings by modeling the synchronization of Progs processes, and yet analysis of Prog remains decidable. These two properties are closely intertwined: if the program model becomes more precise it likely implies that a more powerful decision procedure will be required.
The Empire toolchain is shown in Fig. 2. The input to Empire is a concurrent Java program Prog along with a set of user-specied allocation sites. The allocation sites direct Empire to (attempt to) verify that all executions are AS-serializable with respect to the objects that can be allocated from those sites. We will generally refer to a specied allocation site by Site.
Example (Specied allocation site) In Listing 2, the allocation site of interest is signied by the subscripted newSitestatement on line 22.
Empires abstraction process proceeds in two stages. First, a source-to-source program transformation (abbreviated as Src-to-Src in Fig. 2) bootstraps the implementation of the random-isolation abstraction. Second, the transformed program is abstracted via random-isolation abstraction (RIA in Fig. 2) to generate an EML program. From one EML program, multiple queries are generated for the Ipamc model checker that implements our decision procedure. The input to Ipamc is a multi-pushdown system and an indexed phase automaton that species the safety property. We defer the definitions of these last two entities until Sect.8.
Before getting to random isolation, we begin by dening the Empire Modeling Language (EML). Dening EML rst serves to motivate the need for a new abstraction technique.
4 EML
An EML program EProg consists of (i) a nite set of shared-memory locations SMem; (ii) a nite set of reentrant locks
SLocks; and (iii) a nite number of concurrently executing processes SProcs.
Memory An EML shared-memory location m is an abstract memory location: abstract reads and writes can be made on m; however, EML does not have a notion of a value held by m. The lack of values stored at shared-memory locations is an artifact of Empires goal of verifying AS-serializability, which is a property of the order of interleaved reads and writes of an application, and not of the values read and written. Moreover, abstracting away values is required to ensure that the analysis of an EML program remains decidable.
Locks An EML lock is reentrant, meaning that the lock can be reacquired by the EML process that currently owns the lock, and also that the lock must be released the same number of times as it was acquired to become free. EML restricts the acquisition and release of an EML lock to occur within the body of a function, i.e., an EML lock cannot be acquired in a function f and released in another function f . In addition, the acquisition of multiple EML locks by an EML process must be properly nested: an EML process must release a set of held locks in the order opposite to their acquisition order. The two restrictions are naturally fullled when EML is used to model synchronized methods and blocks of a Java thread.7
An EML lock l can begin in an unallocated state and will remain unallocated until an EML process makes a transition on an edge alloc l (see Table2). Lock allocation is used to signify whether or not the randomly isolated object that is associated with l has been allocated. In Sect.6.1, which discusses the implementation of random-isolation, alloc lSiteRi is used to implement (in EML) the setting of the global ag FSiteRi.
EML does not have objects and object allocation does not occurthere is a nite set of shared memory locations SMem.
7 Java programs that use the synchronization primitives provided by the java.util.concurrent library need not satisfy the restriction that lock usage is properly nested. To analyze such programs, one would need to bypass EML and generate an Ipamc query (Sect.8) directly the caveat being that any results obtained from analyzing an Ipamc query that does not use properly nested locks is meaningless because the input query lies outside the domain of allowable queries.
123
502 N. Kidd et al.
...
Prog'
Prog EML
Multi-PDS
Multi-PDS
Multi-PDS
IPA
IPA
IPA
IPAMC
Src-to-Src RIA IPAMC
eml2pds
... ...
SAFE/ UNSAFE?
...
IPAMC
Fig. 2 The Empire toolchain. EML stands for Empire Modeling Language (Sect.4); RIA stands for random-isolation abstraction (Sect.6); Multi-PDS stands for multi-pushdown
system (Sect.8); IPA stands for indexed-phase automaton (Sect.8); and IPAMC is our model checker for multi-PDSs [31]
Table 2 The edge labels Labels of an EML ow graph that represents an EML function and their corresponding semantics
Labels Semantics
call f Invoke function f
read m Read from memory location m SMem write m Write to memory location m SMem alloc l Allocate the EML lock l SLocks
lock l Acquire the EML lock l SLocks
unlock l Release the EML lock l SLocks
unit begin Begin a unit of work
unit end End a unit of work
start Proc Start EML process Proc
skip A statement whose semantic action is not modeled
The choice of the word alloc reects the fact that, in the concrete Java program, a randomly isolated object needs to be allocated before its lock can be acquired. Moreover, it can be allocated at most one time. To summarize, alloc l is a semantic action in the EML program; it denotes that the EML execution models a path in the concrete Java program that allocates a randomly isolated (concrete) object.
Processes An EML process Proc is dened by a set of (possibly) recursive functions, one of which is designated as the main function of the process. An EML function f is dened by a labeled-ow graph G f . A labeled-ow graph is a tuple
G f = (Nodes f , Labels, Edges f , nentry, nexit), where:
Nodes f is a set of nodes; Labels is dened with respect to the set of memory locations SMem and set of locks SLocks as shown in Table2;
Edges f Nodes f Labels Nodes f is a set of
edges; nentry Nodes f is the distinct entry node; and
nexit Nodes f is the distinct exit node.
An edge e = (n, a, n ) Edges f models the ow of control
from node n to node n . Non-determinism is introduced by
having multiple outgoing edges from the same node. (EML programs have only non-deterministic branches.) The label a Labels represents a semantic action that the EML pro
cess performs when transferring control from node n to node n . EML supports the labels listed in Table2.
An edge labeled start Proc starts the EML process named Proc. This is used to model the fact that when a Java program begins, only one thread is executing its mainmethod, and all other threads cannot begin execution until they have been started by an already executing thread.
The design of EML is strictly motivated by the desire to detect AS-serializability violations, and thus we do not give a formal specication of its semantics. Instead, the meaning of an EML program is dened by its translation into a set of Ipamc queries. In some sense, Ipamc is a virtual machine for running an EML program and monitoring its behavior with respect to indexed phase automata (IPAs, which will be dened in Sect.8.1). The translation from EML to a multi-PDS (Sect.8) is viewed as compilation, and IPAs for the set of fourteen problematic access patterns dene a language of disallowed program behaviors. An EML program is compiled into a multi-PDS, and, in effect, the decision procedure implemented by Ipamc exhaustively runs the EML program on all possible inputs and checks whether any of the runs is in the language of problematic access paths. We defer discussion of the details of this process until Sect.8, and explain in Sects.57 how an EML program is generated. Ipamc itself is structured as a tool for checking an EML program with respect to an arbitrary IPA, and hence could be applied to problems other than detecting AS-serializability violations.
5 The allocation-site abstraction
The definition of EML in Sect.4 makes explicit the constraints that the result of program abstraction should satisfy: there must be a nite number of shared memory locations, a nite number of threads, and a nite number of reentrant locks that are acquired and released in a properly nested
123
Finding concurrency-related bugs 503
fashion. A concurrent Java program, however, will typically only satisfy the requirement that locks are acquired and released in a properly nested fashion (synchronized methods naturally satisfy this requirement).8
To address the incompatibilities between a concrete Java program and EML, we use program abstraction, and, in particular, the random-isolation abstraction. Before presenting the random-isolation abstraction, we motivate the need for a new abstraction by rst discussing why the allocation-site abstraction [27]a commonly used technique for modeling the unbounded set of dynamically allocated concrete objects with a nite set of abstract objectsis unsatisfactory. Specifically, we highlight the allocation-site abstractions limitations when analyzing a concurrent program that uses an unbounded number of dynamically allocated locks. Intuitively, the reason why locks pose such a problem is that modeling synchronization in the abstract program requires definite (must) informationi.e., information that an abstract lock, and hence all concrete locks represented by the abstract lock, is definitely in the locked state. When using the allocation-site abstraction, an analysis typically only has indefinite (may) information available. We make these concepts more concrete in the following discussion.
Given an allocation site Site for class T , let Conc(Site) denote the set of all concrete objects of class T that can be allocated at Site. The allocation-site abstraction uses a single abstract object o Site to summarize all of the concrete
objects in Conc(Site). When the size of Conc(Site), denoted by |Conc(Site)|, is greater than 1, the abstract object o Site
is referred to as a summary object. Thus, for each eld f dened by T , eld o Site. f is a summary eld for the set of
elds {o. f | o Conc(Site)}. Because the program has
a nite number of program points, and each class denes a nite number of elds, this results in a nite-data abstraction.
Example (Allocation-site abstraction) There are ve allocation sites in Listing 2: line 22 allocates a Stack object; line 30 allocates a SafeWrap object; line 35 allocates an Integer object; and lines 37 and 40 allocate Threadobjects T1 and T2, respectively. All allocation sites except the one on line 30 are executed exactly one time for all executions of the program shown in Listing 2. The allocation site on line 30 allocates two concrete objects because the method SafeWrap.makeSafeWrap is invoked once by thread T1 and once by thread T2.
The allocation-site abstraction would dene the ve abstract objects o 22, o 30, o 35, o 37, and o 40, where abstract object o i represents all concrete objects that could be allocated at the allocation site on line i. For the program in
8 We assume that an analyzed Java program is restricted to use synchronized methods and blocks and not other synchronization mechanisms. In particular, the non-syntactically restricted locks provided by the java.util.concurrent package are not used.
Listing 2, the abstract objects o 22, o 30, o 35, and o 37 each represent a singleton set because the program only executes the associated allocation statement once (as discussed above). However, the abstract object o 30 represents two concrete objects because the method SafeWrap.makeSafeWrapis invoked twice, once by threads T1 and T2, respectively.
For the allocation-site abstraction to be sound, an analysis generally has to perform weak updates on each summary object. That is, information for the summary object must be accumulated rather than overwritten. In contrast, a strong updatethe alternative to a weak updateoverwrites an abstract objects abstract state. Such an overwrite corresponds to a group kill when the abstract object is a summary object. Thus, a strong update of the abstract state generally can only be performed when the analysis can prove that there is at most one object allocated at Site, i.e., |Conc(Site)| = 1. For the program
in Listing 2, strong updates could be performed on all abstract objects except o 30 because abstract object o 30 represents two concrete SafeWrap objects. Note that an interprocedural analysis would be required to determine that the abstract object o 22the abstract object representing objects allocated by the Stack.makeStack methodrepresents a singleton set. The problem is that o 22 is used to allocate all Stack objects, and thus an interprocedural analysis would be required to prove that the method Stack.makeStack is invoked exactly one time for all executions of the program. Because of this difculty, analyses that make use of the allocation-site abstraction typically resort to the assumption that every abstract object is a summary object.
An AS-serializability violation is dened in terms of reads and writes to the elds of the T object allocated at a specied allocation site Site. Thus, to detect an AS-serializability violation, Empire must track reads and writes to these elds. In Listing 2, the allocation site of interest is on line 22. This is denoted by the subscripted newSitestatement. The allocation-site abstraction is a sound over-approximation for modeling reads and writes because a read from (write to) the abstract eld o Site. f corresponds to a
possible read from (write to) o. f , for one concrete object o in Conc(Site). For the progam shown in Listing 2, the reads and writes of interest are to the elds Stack.dataand Stack.count. The @atomic(S) annotation on the two elds species that each eld is a member of the atomic set S.
Checking for AS-serializability violations also requires Empire to model program synchronization. If Empire did not model synchronization operations, it would likely be useless because the number of abstract executions would grossly over-approximate the number of valid concrete executions, which would typically result in a large number of false
123
504 N. Kidd et al.
positives. Modeling Progs synchronization is accomplished by dening EML locks. There are two possibilities for dening the semantics of an EML lock:
1. The rst possibility is to interpret a lock acquire as a strong update, i.e., the program has definitely acquired a particular lock. This would correspond to acquiring the locks of all possible instances in Conc(Site), which in most circumstancesincluding the one herewould be unsound. In the example of Listing 2, this interpretation of locking combined with the allocation-site abstraction would preclude the interleaved program execution shown in Fig. 1 that contains the bug, because the two SafeWrap objects would effectively get the same lock, and the two SafeWrap.popwrap methods would execute without interleaving.
2. The second possibility for dening the semantics of EML locks is to interpret a lock acquire as a weak update, i.e., the program may have acquired a particular lock. A weak update would leave the lock in a possibly held state.When an EML process Proc attempts to acquire a lock that is in the possibly held state, two cases must be considered.
(a) The lock is actually held by another EML process Proc and thus Proc must block until Proc releases the lock.
(b) The lock is not held by another EML process Proc and thus Proc may acquire the lock.
This semantics is sound because all possible cases are considered. However, the overall effect of this semantics is, in essence, equivalent to an EML program without locks because the program can never reason denitively whether or not a lock is actually held. That is, in the abstract program a thread is always able to acquire a lock, which in essence means that the abstract program operates as if there are no synchronization constraints. In general, this possibility would greatly increase the number of false positives. For instance, in the example of Listing 2, if we were to x the code by adding an additional synchronization block on the parameter Stack sinside the body of SafeWrap.popwrap, the analysis would still report a bug because locking behavior was modeled imprecisely.
In summary, neither of the two possible semantics for locks with the allocation-site abstraction is satisfactory: the rst leads to an unsound analysis and the second leads to an analysis that would be useless in practice. Thus, a new abstraction mechanism is required, one that is sound, but allows strong updates to be performed on at least some of the abstract locks (so that the analysis can (partially) reason about program synchronization).
6 Random-isolation abstraction
To address the deciencies of using the allocation-site abstraction, we developed a new abstraction technique, random-isolation, which is a novel extension of allocation-site abstraction. The random-isolation abstraction is motivated by the following observation:
Observation 1 The concrete objects that can be allocated at a given allocation site Site, Conc(Site), cannot be distinguished by the allocation-site abstraction.
Observation 1 states that if one chooses to isolate a random concrete object rSite from Conc(Site), the allocation-site abstraction would not be able to distinguish the randomly chosen concrete object from any of the other concrete objects that are represented by o Site.
The random-isolation abstraction leverages Observation 1 by randomly isolating one of the concrete objects allocated at allocation site Site and tracking it specially in the abstraction. Whereas allocation-site abstraction would use one summary object o Site to represent all concrete
objects Conc(Site) that can be allocated at Site, random isolation uses two objects: one summary, o Site, and one non-summary, r Site. The object r Site is a non-sum
mary object because it alone represents the randomly isolated concrete object rSite. Because r Site is a non-sum
mary object, it is safe to perform strong updates to its (abstract) state, which gives us Random-isolation principle 1.
Random-isolation principle 1 (Updates) Let rSite Conc
(Site) be a randomly isolated concrete object. Because rSite is modeled by a special abstract object r Site, the random-
isolation abstraction enables an analysis to perform strong updates on the abstract state of r Site.
Random isolation also provides a powerful methodology for proving properties of a program, which is captured by following Proofs Principle. In plain English, to establish that a universally quantied safety property holds for all objects oSite Conc(Site), it sufces to estab
lish the property for only the randomly isolated object r
Conc(Site).
Random-isolation principle 2 (Proofs) Given allocation site Site and safety property over concrete objects, to establish that o : Conc(Site).(o) holds, it is sufcient
to establish that the augmented formula o : Conc(Site).
is_ri(o) (o) holds, where the predicate is_ri is true for
the randomly isolated object (allocated at site Site).
Proof The proof is by contradiction. Let Prog be a (concrete) concurrent program, Prog be an (abstract) concurrent program obtained via the random-isolation abstraction, where
123
Finding concurrency-related bugs 505
the set of behaviors of Prog over-approximates the set of behaviors of Prog, i.e., Prog is a sound abstraction of Prog.
Consider the case where the abstract program analyzeri.e., the software model checkerhas veried that holds for the randomly isolated object in Prog , and also that there exists a concrete trace where for some object oSite
Conc(Site), (oSite) does not hold. Because of random isolation, the randomly isolated object rSite is just as likely to be oSite as it is to be any other concrete object. Thus, the model checker must consider the case that rSite is oSite. Because the property holds for r Site, and because r Site represents oSite
in the trace under consideration, then the property must also hold for oSite, which is a contradiction under the above listed assumptions.
Random-isolation principle 2 is crucial for applying software model checking to the analysis of concurrent programs.In particular, it allows the model checker to consider only nite-data abstract programs yet still be able to prove safety properties of innite-data concrete programs.9 (Sect.10 discusses related techiques for proving safety properties of innite-data programs.)
Random isolation in Empire Before describing the technical details of how random isolation is implemented, we highlight the benets of random isolation as used in Empire.
Empire attempts to establish that all executions of a concurrent Java program Prog are AS-serializable with respect to the objects allocated at a user-specied allocation site Site.Formally, let (oSite) be a formula stating that an execution is AS-serializable with respect to the concrete object oSite.
Empires goal is then to establish that for all executions,
1 = x Conc(Site) . (x).
Applying random-isolation abstraction allows us, via Random Isolation Principle 2, to change the goal to that of establishing the property:
2 = x Conc(Site) . is_ri(x) (x).
Eliminating the implication, we rewrite 2 as follows:
3 = x : Conc(Site) . is_ri(x) (x).
Finally, to establish 3 via a software model checker, we perform the following steps:
1. An abstract program Prog is generated via the random-isolation abstraction.
9 We say nite-data and not nite-state because each thread has an unbounded-sized stack, giving rise to an innite-state space.
2. Formula 3 is reinterpreted in the abstraction:
3 = x {r Site, o Site} . is_ri(x ) (x ).
The universal quantication is now over the two abstract objects dened by the abstract program.10
3. Model checker Ipamc is used to check if there exists an execution of the abstract program Prog that satises the negation of the formula. If Ipamc reports that no such execution exists, then the original formula 3 holds. The negation is dened as:
3 = x {r Site, o Site} . is_ri(x ) (x ).
To use Ipamc to determine if an execution satises 3,
Empire must actually make fourteen separate queries. The reason being that the negated formula, (x ),
is precisely dened by the fourteen problematic access patterns listed in Table1.
The motivation for random isolation is to allow strong updates on the abstract lock of an abstract object, where a strong update corresponds to a definite lock acquire or definite lock release. Strong updates are enabled in EML by dening an EML lock for each non-summary object. Because of random isolation, the state of the Java lock that is associated with the randomly isolated instance rSite can be modeled precisely by the state of the special abstract object r Site.
That is, the acquiring and releasing of the lock for rSite by a thread of execution can be modeled by a strong update on the state of r Site, thus allowing the analyzer to disallow certain
thread interleavings when performing state-space exploration on the generated EML program, and thereby improving the precision of the analysis.
In contrast, because sound tracking of the lock state for a summary object generally would result in the possibly held state, EML programs have no locks for summary objects: their modeled behaviors are not restricted by synchronization primitives. This approach provides a sound, nite model of the locking behavior of Prog . (It is an over-approximation because the absence of locks on summary objects can cause the abstract program to have thread interleavings not found in the concrete Java program.)
6.1 Implementing random isolation
Random isolation can be implemented via a source-to-source transformation of the concrete Java program. For
10 The concrete and abstract formulas, 3 resp. 3, are related via the Embedding Theorem [50]. The reader is referred to [23, Lemma 1 and
Theorem 1] for correctness proofs in the context of three-valued logic and universal quantication in abstractions of logical structures.
123
506 N. Kidd et al.
expository purposes, we break down the discussion of the source-to-source transformation into three separate rewriting steps and a step in which the abstract program is generated:
1. Random tagging The rst transformation produces a residual program that for any execution of the original program Prog, at most one concrete object is randomly chosen to be tagged as special, i.e., at most one concrete object r is randomly isolated.
2. Synchronized-method elimination The second transformation makes explicit the synchronization operation for a Java synchronized method. Recall that a synchronized method implicitly acquires the lock associated with the implicit this parameter. This transformation makes the synchronization explicit by erasing synchronized from method declarations and introducing their semantically equivalent synchronizedblocks.
3. Synchronized-block case analysis Finally, synchronized blocks are the only source of lock and unlock operations.When analyzing the program model, the lock-state of a lock can generally only be determined precisely for randomly isolated objects, i.e., those introduced by the rst transformation above. However, due to aliasing it might not be possible to determine statically whether a reference uniquely refers to a randomly isolated object or not. The nal transformation introduces case-analysis expressions to determine whether an object reference ref in a Java expression synchronized(ref){...}
solely refers to a randomly isolated object or not.4. Dening the abstract program After the source-to-source transformation steps, an abstract program Prog is generated from the transformed concrete Java program Prog.
We now discuss each transformation step in turn. To assist the reader, the transformation steps are illustrated for the example program shown in Listing 2.
Random tagging As explained above, the purpose of the rst source-to-source transformation is to modify the program so that at most one concrete object is randomly tagged on any given run of a program. Because we do not wish to modify class definitions (e.g., by instrumenting each class to have an additional tag eld), our approach to random tagging is to (i) generate a fresh allocation site; and (ii) ensure that at most one object can be allocated from the fresh allocation site for any given run of the program. Thus, random tagging corresponds to being allocated from the new allocation site. An additional benet of this transformation is that it plays nicely with the allocation-site abstraction, the basis for the random-isolation abstraction. Specifically, by definition of the allocation-site abstraction, the introduction
of a new allocation site in the programs source code causes the allocation-site abstraction to have an additional abstract object. In essence, our implementation of the random-isolation abstraction extends the allocation-site abstraction with special semantics to abstract objects that correspond to the newly introduced allocation sitesnamely, the allocation-site abstraction is extended with the knowledge that the corresponding abstract objects are non-summary objects.
Concretely, let us consider the random-tagging transformation in the context of the example program in Listing 2, where the allocation site of interest is on line 22 and is repeated below for convenience.
return newSite Stack(); (1)
The random-tagging transformation involves transforming the newSite statement into
(rand() && FSiteRi.compareAndSet(false, true))
? newSiteRi Stack() : newSite Stack();(2)
Site Site from code fragment (1) is transformed into a conditional-allocation site, where the conditional performs a compareAndSet of a newly introduced global ag FSiteRi.11
The global ag FSiteRi can be set to true at most once, thus
ensuring that at most one concrete object rSite can ever be allocated at the generated site SiteRi. That is, FSiteRi guar
antees that |Conc(SiteRi)| 1 for all possible executions
of the program; consequently, the size of the set that is the concretization of r Site must also be less than or equal to one.
The observant reader might be concerned that an AS-serializability violation could occur on an object allocated from Site that is not tagged. For example, an execution that contains an AS-serializability violation but never allocates an object from the fresh allocation site SiteRi would satisfy this criteria. Fortunately, this is not a problem because our analysis considers all executions of the abstract program. The following corollary makes formal why such executionsthose where the fresh allocation site is never executedneed not be considered:
Corollary 1 For performing AS-serializability violation detection, we only have to be concerned with executions in which rSite is eventually allocated.
Proof The claimed property is a consequence of the use of randomness in code fragment 2. If there were a trace in which
11 b.compareAndSet(expect,update) is supported by class java.util.concurrent.atomic.AtomicBoolean. It atomically sets Boolean value b to update if the current value of b is the expected value expect. It is important to use an atomic operation in the test in (2); if FSiteRi were not set atomically, the source-to-source trans
formation would introduce a race condition that would allow multiple objects to be allocated at the newly introduced allocation site SiteRi.
123
Finding concurrency-related bugs 507
rSite was not allocated but the trace did in fact contain an AS-serializability violation, then because of the use of randomness, there must exist a similar trace in which the rSite object was allocated and the violation occurred on that object.
That is, the need to only consider traces in which rSite is (eventually) allocated is a corollary of Random-isolation principle 2. All such traces have the property that the global ag FSiteRi must eventually be set to true (see 3 in item
3 on page 505).
Synchronized-method elimination We now discuss the second transformation, which makes explicit all lock acquisitions and releases in the Java program Prog. It is desirable to have explicit locking operations because the analysis requires the ability to distinguish between locking operations performed on randomly tagged objectsi.e., concrete objects whose abstract counterpart is guaranteed to be a non-summary objectversus all other objects. Making the locking operations explicit allows uniform treatment of the case analysis, which is the subject of the third transformation. Because this transformation is trivial, we describe it by example.
Consider the method Stack.size() in Listing 2, repeated below for convenience:
@atomic sync int size() { return count + 1; }
The lock associated with the this object is implicitly acquired when size() is invoked, and released upon return.To make the acquisition and release explicit, the method is transformed as follows:
@atomic int size() { sync(this) { return count + 1; }}
Observe that the synchronized keyword has been removed from the method declaration, and the entire body of the method has been enclosed in a synchronized block that acquires the lock associated with the (implicit) this parameter. (For static synchronized methods, i.e., for synchronized class methods, the this parameter to the synchronized block is replaced with the name of the Java class in which the method declaration occurs.)
Synchronized-block case analysis The last transformation performs a case analysis on the points-to information associated with the reference parameter to a synchronized block.Consider the body of the transformed size method just described:
sync(this) { return count + 1; }in the context of the program with a transformed allocation site as described in the Random-Tagging transformation.
(rand() && testAndSet(FSiteRi))? newSiteRi Stack() : newSite Stack();
The size method could be invoked on either the randomly tagged object rSiteRi or an object allocated from the original allocation site oSite. When analyzing the program under random-isolation semantics, the analysis can generally only prove that the abstract object r SiteRi corresponding to the ran
domly tagged object rSiteRi is a non-summary object. Thus, locking operations, which constitute strong updates to the abstract state of a lock object, can generally only be performed on the object referred to by a Java reference if the analysis can prove that the reference only refers to a randomly tagged object. In points-to analysis terms, where the set of abstract objects that a reference (e.g, this) may point to is denoted by Pts(this), acquiring and releasing a lock is generally only possible when the analysis can prove that Pts(this) = {r SiteRi}.
Our nal transformation embeds a case analysis that enables such a distinction to be made. The key observation is that the desired casethat Pts(this) = {r SiteRi}can be
precisely checked for dynamically. Specifically, the dynamic check tests whether or not the reference refers to the randomly tagged (concrete) object rSiteRi. We call the checking method is_ri because it is a test whether a reference is a reference to a randomly isolated object. The concrete semantics of is_ri is as follows:
is_ri(t) =def FSiteRi t = rSiteRi
The nal transformation for a synchronized block on reference t is as follows:
//is_ri(t) = true implies Pts(t) = {rSiteRi} if (is_ri(t)) { sync(t) { body } }
//is_ri(t) = f alse implies Pts(t) rSiteRi
else { sync(t) { body } }
Observe that the true and false branches of the is_ri test perform the same concrete actions. This is crucial because the transformation must not alter the concrete semantics of the program. That is, one should view is_ri as performing a metadata test, where the metadata of interest is whether or not the referred to object has been randomly tagged. During EML generation (discussed in Sect.7), the metadata test determines whether or not the generated EML program will contain an action that changes the state of an abstract lock.
During points-to analysis, the interpretation of the call on is_ri performs a case analysis on Pts(t). Specifically, the abstract interpretation of is_ri performs the abstract test FSiteRi t = r SiteRi, which is just the correspond
ing abstract semantics of the concrete semantics of is_rigiven above. We extend the points-to analysis to interpret the is_ri method call as follows: when following the true branch of the condition, the points-to analysis performs an assume Pts(t) = {r SiteRi}; and when following the false
123
508 N. Kidd et al.
branch of the condition, the points-to analysis performs an assume Pts(t) = Pts(t) \ {r SiteRi}.
Remark 2 The nal transformation can be avoided, i.e., the transformation that implements a case analysis via is_ri, when the analysis can already prove that Pts(t) = {r SiteRi}.
Such a property can often be established by means of an object-sensitive interprocedural control-ow graph (ICFG).For readers familiar with object-sensitive analysis in the style of [40], and its corresponding object-sensitive call graph, the technique we use is similar to [40]. Briey, an object-sensitive call graph CG models the interprocedural control ow of a program, qualied by the identities of the receiver objects of methods: there is a node in CG for each method of the program for each contexti.e., each abstract object that can act as the receiver of the invocationin which it can be invoked [40]. An object-sensitive points-to analysis associates points-to facts with the nodes of CG, thus computing different points-to facts for different object contexts of the same method. Because these two analysis artifacts are object-sensitive, their respective dataow facts make a distinction between objects allocated at SiteRi and objects allocated at Site. For example, referring back to the program in Listing 2, there would be two copies of the control-ow graph (CFG) for the method Stack.size, one for object context o Site and one for object context r SiteRi. Thus, inside of the CFG for Stack.size with object context r SiteRi, an analysis is
able to take advantage of the fact that the special Java thisvariable refers solely to the non-summary object r SiteRi.
Dening the abstract program After the source-to-source transformations have been performed, the resulting program is ready for abstraction; the corresponding abstract program Prog will operate over a set of abstract objects consisting of o Site for each allocation site Site in the program (including the special abstract object r SiteRi for the generated allocation
site SiteRi). Because threads in Java are objects themselves, Prog now has a nite number of abstract threads, where each thread is associated with an allocation site. (We can, of course, generate multiple copies of each thread as needed.That is, the number of threads is actually a parameter of Prog .)
6.2 Summary
We conclude with three points about random isolation and its implementation via source-to-source transformations:
1. Observe that if the randomly isolated abstract object r SiteRi is folded into the summary object o Site, then the
abstraction reduces to the allocation-site abstraction.2. From the above observation, we can conclude that random isolation can be used to improve the precision
of any analysis that must reason about an unbounded set of indistinguishable objects, especially analyses that already use the allocation-site abstraction. Moreover, random isolation is particularly effective when an analysis needs the ability to perform strong updates on the abstract state of an object.3. Even though the transformations have been presented in the context of AS-serializability violation detection, random isolation is a generic approach that can be applied wherever an analysis needs to distinguish between r SiteRi
and o Site to perform a strong update. In particular, the
splitting of allocation sites to distinguish a random individual and the injection of case-analysis expressions to isolate the actions performed on that individual has applications outside of AS-serializability violation detection.
7 EML generation
Once the abstract program Prog has been generated from a concurrent Java program Prog, the next step in Empire is to generate an EML program EProg. We now discuss the EML-generation process.
To model the randomly isolated abstract object r Site, EProg denes a shared memory location m f for each eld f of the class T , and also an EML lock lSiteRi to model the lock associated with r Site. The status of the global ag
FSiteRi is modeled by the EML lock lSiteRi being allocated or not. As shown in Sect.6.1 (see Corollary 1), we are only concerned with executions in which lSiteRi is eventually allocated. Hence we need only be concerned with traces of the
EML program in which alloc lSiteRi appears somewhere.
Let Threads be the set of all subclasses of java.lang. Thread. For each Threads, and for each allocation site
Site that allocates an instance of , EProg denes an EML process ProcSite that models the behavior of one instance of that is allocated at Site. Also, EProg denes an EML process Procmain that models the Java thread that begins execution of the main method.
The functions of an EML process Proc correspond one-toone with the methods of the Java program, i.e., for each Java method m there is an EML function fm. The labeled-ow graph G fm is dened from the control-ow graph CFGm that
is associated with m.12 The control-ow graph CFGm consists of a set of Java statements Stmts, a successor relation Succ StmtsStmts, and has distinct entry and exit state
ments. The translation from CFGm to G fm is straightforward.
There is a node nstmt Nodes fm for each stmt Stmts.
There is a labeled edge (nstmt, astmt, nstmt ) for each pair of
12 When using an object-sensitive analysis as discussed in Remark 2, there will be multiple labeled-ow graphs for a Java method m, one for each object context in which it could be invoked.
123
Finding concurrency-related bugs 509
Table 3 Java statement types for CFGm, their corresponding
EML labels, and the condition necessary to generate the EML label
The nal row is a catch all for the Java statements that are either not modeled in EML or that do not satisfy the listed condition
stmt astmt Condition
o.m() call m
entry unitbegin this= r Site, CFGm is a unit of work exit unitend this= r Site, CFGm is a unit of work x = o.f, x = o.f[i] read m f r Site Pts(o)
o.f = x, o.f[i] = x write m f r Site Pts(o) newSiteRi T alloc lr
Site
monitorenter o lock lr
Site Pts(o) = {r Site} monitorexit o unlock lr
Site Pts(o) = {r Site} o.start() start ProcSite o Site Pts(o)
* skip
statements (stmt, stmt ) Succ. The label astmt models the
execution of the Java statement stmt. Table3 shows the label astmt that is generated for a Java statement stmt. Note that(i) synchronized blocks have been compiled down to the lower-level Java bytecode statements monitorenter and monitorexit; (ii) label generation must know if a method is a unit of work; and (iii) a read or write access to the contents of array eld f of T is treated as a read or write access to the eld f .13
8 Verifying AS-serializability of EML programs
An EML program has a set of shared-memory locations, SMem, a set of EML locks, SLocks, and a set of EML processes, SProcs. To verify AS-serializability of an EML program, Empire generates a number of queries that are then answered by Ipamc [31,33]a model checker for multi-PDSs with reentrant locks, whose decision procedure can answer the types of queries required by Empire.
Remark 3 Even though many analyses for multi-PDSs are undecidable [28] showed that property checking for certain fragments of temporal logic is decidable. For a detailed account of the decision procedure of [31] and the characteristics of the problem that render it decidable, the reader is referred to [29].
A query is generated for each pair (m, m ) SMemSMem
for the fourteen interleaving scenarios. Pairs are used because the interleaving scenarios are dened in terms of at most two locations from an atomic set (cf. Table1). Moreover, AS-serializability violation detection is asymmetric in that
13 Our decision to treat an array and its contents as one eld is not strictly necessary. Using a more-rened analysis that can reason about arrays [13,21,50], a more precise model could be generated.
Table 4 The encoding of a call graphs and CFGs edges as PDS rules
Rule Control ow modeled
p, n1
s1
p, n2 Intraprocedural edge n1 n2
p, nc
sc
p, e f rc Call to f , with entry e f ,
from nc that returns to rc
p, x f
s f
p, Return from f at exit x f
The action a denotes the abstract behavior of executing that edge
it is performed with respect to an individual EML process Proc. In total, Empire generates O(|SProcs|14(|SMem|2))
queries for an EML program.
A generated query consists of a (i) pushdown system (PDS) for each EML process (Sect.8.2), and (ii) an indexed phase automaton (IPA)a restricted kind of nondeterministic nite automaton (NFA)that recognizes EML execution traces containing an AS-serializability violation involving one or two memory locations (Sect.8.4). The remainder of this section proceeds as follows: Sect.8.1 formally denes PDSs, multi-PDSs, and IPAs; Sect.8.2 denes the translation from an EML process to a PDS; Sect.8.3 describes how Ipamc handles reentrant locks; and Sect.8.4 denes the IPA for an atomic-set/problematic-access-pattern pair.
8.1 Definitions
A pushdown system (PDS) naturally models the interprocedural control ow of an EML process (see Table 4). In essence, a PDS is an NFA equipped with a stack. The control states of the NFA portion of a PDS models the global state of an EML process, while the stack is used to ensure that only interprocedurally valid paths are modeled.
123
510 N. Kidd et al.
Denition 1 A pushdown system (PDS) is a tuple P =
(P, , Lab, , c0), where P is a nite set of control locations; is a nite set of stack symbols; Lab is a nite set of labels (or actions); (P ) Lab (P ) is a
nite set of labeled-transition rules, where a rule is denoted by r = p,
a
p , u ; and c0 = p0, 0 is the
initial conguration of P. A conguration c of P is a pair p P, u . For a rule r = p,
a
c j,M[i] = (, 0),
and g = (c1, . . . , c j, . . . , cn,M[i ( j, 1)]).
3. cj
p , u , we
use lab(r) to denote rs label a. Without loss of generality, we restrict there to be at most two right-hand-side stack symbols of a rule. A rule with zero, one, and two right-hand-side stack symbols is called a pop, step, and push rule, respectively.
A concurrent program that synchronizes via locks, i.e., an EML program, is modeled by a multi-PDS. Informally, a multi-PDS consists of a nite set of PDSs and a nite set of locks. The intention is that each PDS models a thread, and that the PDSs acquire and release locks to perform global synchronization. We assume that locks are acquired and released in a well-nested fashionlocks are released in the opposite order in which they are acquiredand in synchrony with a PDSs push and pop rules. In fact, the latter assumption is all that is necessary as it implies the former.
Denition 2 A multi-PDS consists of a nite number of PDSs P1, . . . , Pn, where each PDS Pj = (Pj, j , Labj ,
j , c j0), that synchronize via a nite set of locks SLocks = {l1, . . . , l|SLocks|}. The actions Lab of each PDS consist of
lock-acquires ((i) and releases ()i) for 1 i |SLocks|,
plus symbols from , a nite alphabet of non-parenthesis symbols. A global conguration (c1, . . . , cn,M) is a tuple
consisting of:
a local conguration cj for each PDS Pi, 1 j n;
and an ownership arrayM of length |SLocks|, in which each
entry indicates the owner of a given lock: for each 1
i |SLocks|,M[i] ({, 1, . . . , n} N ) is a pair where
the rst component indicates the identity j of the PDS
Pj that holds lock li ( signies that li is currently not
held by any PDS), and the second component is a nonnegative number that indicates the number of times that a PDS has (re)acquired a lock.
The initial global conguration g0 = (c10, . . . , cn0,M0), where
ci0 is the initial conguration of PDS Pi, 1 i n, andM0
is the initial ownership array that maps each entryM[i], 1
i |SLocks|, to the ownership pair (, 0). For an ownership
arrayM, an update at position i for lock li to a new ownership
pair p is denoted byM[i p].
Denition 3 For multi-PDS = (P1, . . . , Pn, SLocks, ),
the Reentrant Semantics allows for a lock l SLocks to be
reacquired by the PDS that owns the lock. In particular, two global congurations g and g are in the relation [squiggleright], denoted by g [squiggleright] g , iff g = (c1, . . . , cj , . . . , cn,M) and one of the
following holds:
1. cj a
c j, a , and g = (c1 . . . , c j, . . . , cn,M).2. cj
(i
c j,M[i] = ( j, z),
and g = (c1, . . . , c j, . . . , cn,M[i ( j, z + 1)]).
4. cj
(i
c j,M[i] = ( j, z), z > 1,
and g = (c1, . . . , c j, . . . , cn,M[i ( j, z 1)]).
5. cj
)i
c j,M[i] = ( j, 1),
and g = (c1, . . . , c j, . . . , cn,M[i (, 0)]).
The reexive transitive closure of [squiggleright] is denoted by g [squiggleright] g . A multi-PDS execution is a nite sequence steps g0 [squiggleright] g1 [squiggleright] [squiggleright] gk. The word w assocated with is
the sequence of actions from locking operations are not exposedthat correspond to each transition step by item 1 above.
Given a multi-PDS, Ipamc is a model checker that determines whether there exists an interleaved execution of a multi-PDS such that the word w associated with is recognized by an indexed phase automaton (IPA).
Denition 4 An indexed phase automaton (IPA) is a tuple (Q, Id, , ), where Q is a nite, totally ordered set of states
{q1, . . . , q|Q|}, Id is a nite set of thread identiers, is a
nite alphabet, and Q Id Q is a transition rela
tion. The transition relation is restricted to respect the order on states: for each transition (qx, i, a, qy) , either y = x
or y = x +1. We call a transition of the form (qx, i, a, qx+1)
a phase transition. The initial state is q1, and the nal state is q|Q|.
The restriction on in Definition4 ensures that the only loops in an IPA are self-loops on states, which is crucial for keeping the checking problem decidable [29]. We assume that for every x, 1 x < |Q|, there is only one phase transi
tion of the form (qx, i, a, qx+1) . (An IPA that has multiple
such transitions can be factored into a set of IPAs, each of which satisfy this property.) Finally, we only consider IPAs that recognize a non-empty language, which means that an IPA must have exactly (|Q| 1) phase transitions, i.e., any
accepting run of an IPA will contain only a bounded number of phase transitions.
)i
123
Finding concurrency-related bugs 511
8.2 Modeling an EML process
We now discuss the generation of a PDS for an EML process Proc. For expository purposes, we break the translation into two steps.
Step 1: a PDS P1 is generated that models the interpro
cedural control ow of Proc. In this stage, all locking operationsi.e., inter-process communicationare not modeled by P1.
Step 2: from PDS P1 a PDS P2 is generated that accounts
for locking operations and thus implements the necessary interprocess communication that restricts the set of allowable schedules.
Step 1: The rst step is straightforward: a single-state PDS P1 = (P1 = {p}, Lab1, 1, 1, p, emain ) is gen
erated using the rule templates depicted in Table 4, with Lab1 being the set of all distinct EML statements used by
Proc prexed with the EML processs name. For example, if the EML statement is read m and the EML processs name is Proc, then Lab1 will contain Proc.read m. Including the EML processs name in the PDS action enables the violation monitor and locks to know which EML process performs an action (cf. the PDS rules for an EML lock above).
P1 captures the interprocedural control ow of Proc. There is one exception, the EML statement start ProcSite does not include Proc as a prex because a thread can be started by any other thread (but only started one time).
Step 2: PDS P2 is PDS P1 augmented to account for
lock allocation. (Recall that a lock must be allocated before it can be used.) The set of control locations P1 of P1 is expanded
to include a boolean ag for each lock l. If the ag is true then l has been allocated, otherwise an EML process has yet to allocate l.
From Procs point of view, there are two ways that a lock can be allocated: either Proc allocates a lock or another EML process Proc allocates the lock. If Proc allocates the lock l,
then there will be a PDS rule of the form p,
Formally, PDS P2 = (P2, Lab2, 2, 2, , emain ),
where
P2 = 2S
Locks : a control location is a set of ags s denoting which locks have been allocated. The set of control locations P1 is not used because it is the singleton set {p}.
Lab2 = Lab1 {P .alloc l | P (SProcs \ {Proc}), l
SLocks}: Lab2 is Lab1 augmented to include actions that
allow for P2 to guess when another EML process Proc
allocates a lock. 2 = 1 {guess}: 2 includes the stack symbol guess
that implements the guessing procedure. 2 is dened from 1 as shown in Table5. Row 2 ensures that no lock is allocated more than once; row 3 ensures that a lock is not used before being allocated; and rows 4 and 5 ensure that the shared-memory locations are not accessed before r Site has been allocated. Row 6 denes
rules that invoke the guessing procedure for each conguration of P2. Guessing is necessary because an EML
process cannot know when another EML process allocates a lock. Row 7 denes rules that implement the guessing procedure: from control location s, s SLocks, guess
that EML process P (SProcs \ {P}) allocates a lock
l (SLocks \s), and return back to the caller in the control
location s {l}. The guessing rule is then labeled with
action P .alloc l.
The multi-PDS that models an EML program then has, for each EML process, a PDS P dened as just described,
and the set of locks is merely the set of locks SLocks of the EML program.
8.3 Modeling an EML Lock
The translation from an EML lock to a multi-PDS lock that is used by Ipamc is straightforward: there is a one-to-one correspondence between EML locks and the set of multi-PDS locks. The use of the notation SLocks to represent both the set of EML locks and multi-PDS locks emphasizes this correspondence.
We next briey describe how Ipamc replaces reentrant locks with non-reentrant locks via a transformation called language-strength reduction [30]. The transformation relies on the fact that lock acquisitions and releases are synchronized with a PDSs stack. Briey, the technique involves transforming a PDS so that it pushes a special marker onto the stack the rst time a lock is acquired, and also records the fact that the PDS acquired the lock in the PDSs state space. All subsequent lock acquisitions and and their matching releases do not change the state of the PDS. When the special marker is seen again on the stack, the PDS is about to execute the matching release for the initial lock acquisition, and thus actually release the lock. In essence, for PDS
Proc.alloc l
p , u . The corresponding rule in PDS P2s rule set 2
must ensure that the control location on the left-hand-side has the ag for l set to false, and the control location on the right-hand-side has the ag for l set to true. Otherwise Proc allocates l. In this case, Proc has no way of knowing when Proc allocates l, and therefore must guess when the allocation occurred. Guessing is modeled by non-deterministically invoking the guess method, which simply guesses that another EML process allocates l. Because of non-determinism, the guess method causes Proc to consider all possibilities of lock allocation (i.e., the cross product of SProcs
and SLocks).
123
512 N. Kidd et al.
Table 5 Each row denes a set of PDS rules in 2 from a rule in 1
The control location p from a rule in 1 is not repeated because all rules in 1 are single-control-location rules.The condition for generating a rule reects that certain actions can only occur when a lock has been allocated, e.g., acquiring a lock l can only occur after l has been allocated (see Sect.8.2)
Action a Rule p,
a
p, w 1
, start P { s,
a
s, w | s 2S
Locks
}
alloc l { s,
P.a
s {l}, w | s 2S
Locks
l /
s }
lock/unlock l { s,
P.a
s, w | s 2S
Locks
l s }
read/write m { s,
P.a
s, w | s 2S
Locks
lSiteRi s }
ubegin/uend { s,
P.a
s, w | s 2S
Locks
lSiteRi s }
{ s,
s, guess | s 2S
Locks
}
{ s, guess
P .alloc l
s {l}, | s 2S
Locks
l /
s P (SProcs \ {P}) }
P = (P, Lab, , , c0), the transformation denes a new
PDS P = (P , Lab, , , c 0), where
P = P 2S
c j,M[i] = ,
and g = (c1, . . . , c j, . . . , cn,M[i j]).
3. cj
(i
Locks , where a member (p, s) of P includes the original control state p of P and a set s that records the set of locks that P currently holds.
= ( SLocks) consists of the original stack
alphabet , and, in addition, a new set of symbols that enables P to record on the stack the rst time that a lock
l SLocks has been acquired.
contains, for each rule r , a set of rules that main
tain and update the set of held locks s for a control state (p, s), and record on the stack via a symbol (, l) that a lock has been acquired for the rst time. The exact definition of is beyond the scope of this paper, and we refer the reader to [30] for further details.
c 0 = (p0, ), 0 is the initial conguration paired with
to indicate that P does not hold any locks.
After transforming P to be P , all nested lock acquisi
tions and releases have been removed. Hence, we can rene the Reentrant Semantics to disallow congurations where an ownership arrayM contains at position i for lock li an
ownership pair ( j, z) such that z > 1, which gives the Non-Reentrant Semantics. To distinguish between the Non-Reentrant and Reentrant Semantics, we will use j and to
denote the ownership pair ( j, 1) and (, 0), respectively. The
shorthand is sound because there can be no ambiguity, i.e., ( j, z) where z > 1 is not allowed. Moreover, to distinguish the Non-Reentrant Semantics, we will use instead of [squiggleright] to
denote the transition relation between global congurations.
Denition 5 The Non-Reentrant Semantics of a multi-PDS C P is dened by a transition relation on global congura
tions. Two global congurations g and g are in , denoted
by g g , iff g = (c1, . . . , cj , . . . , cn,M) and one of the
following holds:
1. cj a
c j, a /
{ (i, )i },
and g = (c1 . . . , c j, . . . , cn,M).
2. cj
c j,M[i] = j,
and g = (c1, . . . , c j, . . . , cn,M[i ]).
Note that items 1,2, and 3 correspond to items 1,2, and 5 of the Reentrant Semantics. The word w that for an execution is dened the same as in the Reentrant Semantics.
8.4 Violation monitor
The IPA Amon acts as a violation monitor that detects when
one of the problematic access patterns occurs during a unit of work for a specic EML process Proc. To do so, it must track (i) the reads and writes to the shared-memory locations SMem by each EML process, and (ii) whether or not the target
EML process Proc is executing a unit of work.
Tracking the reads and writes of EML processes requires only a nite amount of state, i.e., state to track which reads and writes of interest have been seen. Recall that units of work are reentrant because unit-of-work methods may be recursive or invoke other unit-of-work methods. Thus, tracking the unit-of-work status of Proc requires an innite amount of state. Fortunately, the stack of the PDS for Proc is innite-state, and the language-strength-reduction transformation [30] can be used to fuse the unit-of-work depth count with the PDS stack. The transformation is analogous to the one discussed in Sect.8.3. After applying the transformation, tracking the unit-of-work status for Proc requires only a nite amount of state, namely a Boolean ag, and thus tracking both the reads and writes of interest and the unit-of-work status of Proc can be performed by an IPA Amon.
The IPA Amon accepts interleaved executions (traces) that
contain the memory accesses specied by the problematic access pattern of interest. To make the discussion more
)i
123
Finding concurrency-related bugs 513
Fig. 3 The NFA A12 that recognizes traces of interleaved read and
write memory accesses containing problematic access pattern 12 for the program shown in Listing 2. The edge labeled alloc denotes allocating the randomly isolated object. An edge labeled R1(c) (W2(c))
denotes a read from (write to) the eld Stack.count by thread T1 (T2). Similarly, edges labeled R1(d) and W2(d) denote accesses to the
eld Stack.data. The symbols [ and ] denote Proc beginning and
ending a unit of work, respectively. denotes the input alphabet of
A12, and is dened as \ {]}. Once A12 guesses that a violation
will occur by making a transition to state q3, it must observe a violation before the unit-of-work end symbol ] appears in a trace. Otherwise,
it will become stuck in a state q36
concrete, we focus on the IPA A12 shown in Fig. 3, which
tracks the problematic access pattern R1(c); W2(d); W2(c);
R1(d) for the AS-serializability violation from the program shown in Listing 2. For a generic problematic access pattern p, the definition of the NFA Amon that recognizes traces
containing p follows naturally.Figure3 gives a graphical depiction of A12. The initial
state is q1 and the nal state is q7. For a trace to be accepted by A12, it must make a transition through each state q17. That
is, the states q17 track the memory accesses that make up problematic access pattern 12. The transition (q1, alloc, q2) models the allocation of the randomly isolated object. Once the randomly isolated object has been allocated, A12 ignores
reads and writes by following the self-loop on state q2 until the target EML process Proc begins a unit of workmodeled by the transition (q2, [, q3).
A12 makes use of non-determinism. For each state qi, 3 i 6, there is a self-loop labeled with [Ri Wi. The sym
bol [ models reentrant calls to units of work. The state does
not change because Proc must be executing a unit of work for A12 to be in state qi. The symbols Ri and Wi denote read
and write accesses to any memory location by either thread. The use of non-determinism enables A12 to guess which
memory accesses are actually involved in problematic access pattern 12. For example, if A12 is in state q3 and observes the
action R1(c), it can make a transition to state q4-the action is part of problematic access pattern 12or it can follow the self-loop and remain in state q3the action is not part of problematic access pattern 12. Non-determinism is required because a thread may perform multiple memory accesses during a unit of work.
8.5 Queries
Once a multi-PDS and IPA Amon have been generated
for an EML program EProg, a query is given to the model checker Ipamc to determine if an interleaved execution of is recognized by Amon. If Ipamc returns true, meaning that
there is an interleaved execution of recognized by Amon,
then the EML program has an AS-serializability violation.
Otherwise, Ipamc returns false, and the EML program does not have an AS-serializability violation for the given scenario dened over a single or double-memory location for scenarios 15 and 614, respectively, and the target EML process Proc.
9 Experiments
Empire is implemented using the WALA [25] program-analysis framework. Random isolation is implemented using WALAs facilities for rewriting the abstract-syntax tree (AST) of a Java program. The default object-sensitive call graph construction and points-to analyses are modied to implement the semantic reinterpretation of is_ri, as described in Sect.6.1. After rewriting the ASTs, Empire emits an EML program from the input Java program. The EML program is then translated into multiple multi-PDS/IPA pairs, for which reachability queries are answered using Ipamc [31,33]. All experiments were run on a dual-core 3GHz Pentium Xeon processor with 4GB of memory.
The goal of the experiments was to determine whether the techniques developed and implemented in Empire could detect both single- and multi-location AS-serializability violations. We evaluated Empire on eight programs from the ConTest suite [15], which is a set of small benchmarks with known concurrency bugs. Empire requires that the allocation site of interest be annotated in the source program. We annotated eleven of the twenty-seven programs that Con-Test documentation identies as having non-atomic bugs. Our front-end currently handles eight of the eleven (the AST rewriting currently does not support certain Java constructs). When analyzing a program with the user-specied allocation site Site that allocates an object of type T , we used the default assumptions that (i) all elds declared by T are in one atomic set, and (ii) each public method dened by T is a unit of work.
To reduce the size of the generated models, we made minor modications to the benchmark programs. For the programs analyzed, le I/O is used to output debugging and scheduling
123
514 N. Kidd et al.
information, and to receive input that species the number of threads. We removed these operations, and manually unrolled loops that allocate Thread objects 2 times. When a benchmark used a shared object of type java.lang.Objectas a lock, the type was changed to java.lang.Integerbecause our implementation uses selective object-sensitivity, for which the use of java.lang.Object as a shared lock removes all selectivity and severely degrades performance.14
The programs SoftwareVerificationHW and shopdene each threads run() method to consist of a loop that repeatedly executes one unit of work. For these programs, the code in the body of the loop was extracted out into its own method so that the default unit-of-work assumptions would be correct. Each modication had no impact on the AS-serializability violations that could occur in a benchmark.The difculty in automating these transformations is determining programmer intent. For example, removing debugging output is easy for an analyst who can use comments, variable names, and generated output to determine that certain output statements are not relevant to program correctness. The use of Java annotations to decorate such statements would greatly simplify that task. That is, given annotations such as @debug or @lock on program statements and elds, respectively, manual transformations would no longer be necessary.
In total, Empire generated 4583 Ipamc queries. Each query was analyzed using a 300-second timeout, and completed successfully within that threshold. Table6 presents a summary of the analysis results. The dividing line that separates the rst four benchmarks from the latter four benchmarks pertains to lock usage. Each generated EML program consists of a single lock, namely, the lock for the randomly isolated object.15 However, for the rst four benchmarks, the code does not contain synchronized methods, and hence does not acquire and release the lock associated with the randomly isolated object. For the latter four benchmarks, the code contains synchronized methods, and the generated EML programs for these four contain lock and unlock operations. In total, Empire:
1. found 212 (possible) AS-serializability violations (col. # Violations Found); and
2. determined that 4371 queries did not contain an AS-serializability violation (col. # Queries Veried).
14 By selective object-sensitivity, we mean that a full object-sensitive call graph is not constructed because doing so exhausts memory resources. Instead, the call graph is only object sensitive with respect to a few key object types: the type T of the specied allocation site, the types of all of T s elds, and the type of all subclasses of threads.
15 Empire can generate multi-lock EML programs when the Java program makes use of global locks that are guaranteed to be unique, such as the lock that is associated with a Class object.
Table7 presents a breakdown of the problematic-access-pattern numbers of the AS-serializability violations that were found for each benchmark program. For 7 of the 8 benchmarks listed in Table7, Empire found multiple actual AS-serializability violations (indicated by ), where by
actual AS-serializability violation we mean a true AS-serializability violation in the concrete concurrent Java program listed. The false positives (denoted by ) reported for
PingPong are due to an over-approximation of a threads control owexceptional-control-ow paths are allowed in the model that cannot occur during a real execution of the program. Table7 entries marked by ? indicate problematic access patterns for which there is an AS-serializability violation in the EML program, but has not been veried in the original Java program.16
Overall, the experiments show that Empire is able to detect both single-and multi-location AS-serializability violations. It is interesting to note that for benchmarks where a multi-location AS-serializability violation was found, a single-location AS-serializability violation also occurred. This can be explained by the fact that Java methods typically read and write to a eld multiple times, which allows for both single- and multi-location AS-serializability violations to occur.
10 Related work
Strong updates on an isolated non-summary object The idea of identifying non-summary objects so that strong updates can be performed on them has a long history in shape-analysis algorithms. Originally, some non-summary objects could be identied as a fortuitous byproduct of the abstraction in use. For instance, with approaches based on k-limiting [24,26] non-summary nodes are maintained along selector-edge paths of length k from program variables. The shape
abstraction of Chase et al. [11] tracked which node merges in an abstract memory conguration preserved the property that the resulting abstract node represents a single concrete node in each store that the abstract memory conguration represents.
Plevyak et al. [44] introduced the idea of deliberately isolating a non-summary node that represents only the memory location that will be updated during a transition. They called this operation deconstruction. Deconstruction has turned out to be essential, both for developing analyses that are tolerant
16 Due to a limitation of our implementation in not producing witness traces for failed queries (sometimes called counterexamples), we are not able to check whether the AS-serializability violations found using Ipamc but not by our previous approach [32] using the model checker Cpdsmc [10] are actual bugs or false positives. The lack of witness traces is not a fundamental limitation of the approach used in Ipamc, just of our current implementation; in principle, it is possible to extend Ipamc to produce witness traces.
123
Finding concurrency-related bugs 515
Table 6 For each benchmark, column # Queries gives the number of queries that were generated (i.e., the number of IPAs generated for a benchmarks model )
Benchmark Number of queries Number of violations found Number of queries veried
Account 642 38 604
AirlineTickets 900 18 882
PingPong 384 35 349
ProducerConsumer 512 72 440
SoftwareVerication HW 15 6 9
BugTester 615 0 615
BuggyProgram 615 16 599
Shop 900 27 873
Totals 4,583 212 4,371
Columns # Violations Found and # Queries Veried give the breakdown of query satisfaction (i.e., an execution of is and is not, respectively, recognized by a generated IPA). The horizontal line after row 4 separates the benchmarks that did not contain any synchronization operations after abstraction from those that still did contain synchronization operations
Table 7 Marked entries denote violations reported by Empire, with being a true positive;
a false positive; and ? being a reported violation that we have not determined whether or not it is a true positive
Program 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Account ? ? ? ?
AirlineTickets ? ? ?
PingPong
ProducerConsumer ? ? ? ? ? ?
SoftwareVerication HW ? ? ? ?
BugTester
BuggyProgram
Shop ? ? ? ? ? ? ? ? ?
of temporary violations of data-structure invariants, as well as for developing analyses that are capable of synthesizing data-structure invariants while analyzing loops that traverse linked data structures. Ordinarily, deconstruction is used to isolate the current node m being operated on, which is often the only node that does not satisfy some node-centric property (n) that holds for all other nodes ni of the data structure (often a local connectivity property, but sometimes a non-local reachability property). Once surgery has been completed on node m to establish (m), the loop moves on to another node, and m is often folded into an existing summary nodeand thus () continues to hold for all concrete
nodes represented by the summary node. By this means, the xed-point-nding loop of an abstract-interpretation solver can perform what amounts to an inductive argument that the loop (re-)establishes () throughout the data structure.
The work of Plevyak et al. [44], and related work by Sagiv et al. [49], exploited deconstruction in analyses tailored for specic data structuresdoubly linked lists in the case of Plevyak et al., and singly linked lists in the case of Sagiv et al. Later work by Sagiv et al. [50] gave an account of
deconstruction in terms of logic, and showed that the essence of deconstruction involves a step, called focus, which forces the values of certain formulas (e.g., one that species which memory location will be updated during a transition) from an indefinite value (unknown) to a definite value (true or false). Focus is a semantic reduction [12]: it has the effect of converting an abstract memory conguration into one or more abstract memory congurations that are more precise than the original one. Moreover, the approach taken by Sagiv et al. [50] provided the ability to apply deconstruction in every abstraction speciable using their parametric framework for shape analysis (which is implemented in the TVLA system [36]).
Gopan et al. [19,20] also used the idea of working with a non-summary element so that strong updates could be performed, and showed how it could be applied to the analysis of programs that manipulate arrays.
Work in the type-theory community that uses similar ideas includes that of Walker and Morrisett [55], Foster et al. [18], Fahndrich and DeLine [16], Aiken et al. [2], Ahmed et al. [1], and Rondon et al. [48].
Scenarios 616 involve two memory locations
123
516 N. Kidd et al.
When the analysis methods discussed above also employ the allocation-site abstraction [27], each abstract memory conguration will have some bounded number of abstract nodes per allocation site.
Like random-isolation abstraction, recency abstraction [5] uses no more than two abstract blocks per allocation site Site: a non-summary block MRAB[Site], which
represents the most-recently allocated block allocated at Site, and a summary block NMRAB[Site], which repre
sents the non-most-recently allocated blocks allocated at Site. As the names indicate, recency abstraction is based on tracking a temporal property of a block b: the is-the-most-recent-block-from-Site(b) property, which serves to isolate MRAB[b] from the blocks represented by NMRAB[b].
With counter abstraction [39,45,58], numeric information is attached to summary objects to characterize the number of concrete objects represented. The information on summary object u of abstract conguration S describes the number of concrete objects that are mapped to u in any concrete conguration that S represents. Counter abstraction has been used in the analysis of innite-state systems [39,45], as well as in shape analysis [58].
In contrast to all of the aforementioned work, random-isolation abstraction is based on isolating a random individual (so that strong updates can be performed on its abstract counterpart), tracking its properties, and generalizing from the properties of the randomly chosen individual to all of the individuals allocated at the same allocation site, according to Random-isolation principle 2. The isolation can be performed either via (i) a source-to-source transformation of the original program (as explained in Sect.6.1); (ii) in the concrete operational semantics; or (iii) in the abstract semantics. Method (i) facilitates the use of random isolation in conjunction with existing analyses as a way to strengthen them. In addition to the work described in the present paper, random isolation has been used to model check information-ow properties of decentralized information ow control (DIFC) systems, which manipulate potentially unbounded sets of processes, principals, and communication channels [23].
The work most closely related to random isolation is the technique of temporal case splitting [14], which was developed independently and contemporaneously with random isolation [32]. Temporal case splitting also provides a way to reduce analysis problems that involve an unbounded number of entities and resources to a nite abstraction. In temporal case splitting, manually introduced Skolem variables serve to name a single, arbitrary individual among a given class of individuals. Emmi et al. [14] used temporal case splitting as one of several techniques in a tool to verify programs that use reference counting to track resources and schedule their deallocation.
Detection of concurrency-related bugs. Traditional work on error detection for concurrent programs has focused on classical data races. Static approaches for detecting data races include type systems, where the programmer indicates proper synchronization via type annotations (e.g., [7]), model checking (e.g., [46]), and static analysis (e.g., [42]). Dynamic analyses for detecting data races include those based on the lockset algorithm [52], on the happens-before relation [41], or on a combination of the two [43]. A data race is a heuristic indication that a concurrency bug may exist, and does not directly correspond to a notion of program correctness. In our approach, we treat atomic-set, unit-of-work, and unit-for annotations as a specication language that allow a programmer to state his intensions more precisely. Moreover, AS-serializability accounts for multi-location consistency errors that the aforementioned techniques would not be able to nd.
High-level data races may take the form of view inconsistency [4], where memory is read inconsistently, as well as stale-value errors [3,9], where a value read from a shared variable is used beyond the synchronization scope in which it was acquired. Our problematic interleaving scenarios capture these forms of high-level data races, as well as several others, in one framework.
Several notions of serializability and associated detection tools have been presented, including [17,38,51,56]. These correctness criteria ignore relationships that may exist between shared memory locations, and treat all locations as forming one atomic set. Therefore, they may not accurately reect the intentions of the programmer for correct behavior. Atomic-set serializability takes such relationships into account and provides a ner-grained correctness criterion for concurrent systems. For a detailed discussion and comparison of different notions of serializability, see [22].
AS-serializability was proposed by [54]. That work focused on inference of locks. A dynamic violation-detection tool was proposed in [22] to nd errors in legacy code. Hammer et al. also analyzed programs from the Con-Testbenchmark suite, and their dynamic tool found AS-serializability violations in the programs Account, AirlineTcks, BugTester, PingPong, and SoftwareVerificationHW. Empire is a static counterpart of their tool, with the benet that Empire considers multiple executions of a program (symbolically), instead of just one execution like the dynamic tool of Hammer et al. Even though Empire did not nd the AS-serializability violation in BugTester because Empire exhausted its available resources, it did nd AS-serializability violations in shop, which the tool of Hammer et al. did not nd. The remaining two benchmarks were not analyzed by their tool.
Acknowledgments The authors would like to thank the anonymous reviewers for their valuable feedback.
123
Finding concurrency-related bugs 517
References
1. Ahmed, A., Fluet, M., Morrisett, G.: L3: a linear language with locations. Fundam. Inf. 77(4), 397449 (2007)
2. Aiken, A., Foster, J., Kodumal, J., Terauchi, T.: Checking and inferring local non-aliasing. In: PLDI. pp. 129140 (2003)
3. Artho, C., Biere, A., Havelund, K.: Using block-local atomicity to detect stale-value concurrency errors. In: Proceedings of Second International Symposium on Automated Technology for Verication and Analysis (ATVA 2004), Taipei, Taiwan. LNCS, vol. 3299, pp. 150164. Springer, Berlin (2004)
4. Artho, C., Havelund, K., Biere, A.: High-level data races. J. Softw.
Test. Verif. Reliab. (STVR) 13(4), 207227 (2003)
5. Balakrishnan, G., Reps, T.: Recency-abstraction for heap-allocated storage. In: SAS, pp. 221239 (2006)
6. Ball, T., Rajamani, S.: Automatically validating temporal safety properties of interfaces. In: SPIN, pp. 103122 (2001)
7. Boyapati, C., Lee, R., Rinard, M.: Ownership types for safe programming: preventing data races and deadlocks. In OOPSLA, pp. 211230 (2002)
8. Burch, J.R., Clarke, E.M., McMillan, K.L., Dill, D.L., Hwang, L.J.: Symbolic model checking: 1020 states and beyond. In LICS, pages 428239 (1990)
9. Burrows, M., Leino, K.R.M.: Finding stale-value errors in concurrent programs. Conc. Comp. Prac. Exp. 16(12), 11611172 (2004)
10. Chaki, S., Clarke, E., Kidd, N., Reps, T., Touili, T.: Verifying concurrent message-passing C programs with recursive calls. In: TACAS, pp. 334349 (2006)
11. Chase, D., Wegman, M., Zadeck, F.: Analysis of pointers and structures. In: PLDI, pp. 296310 (1990)
12. Cousot, P., Cousot, R.: Systematic design of program analysis frameworks. In: POPL, pp. 269282 (1979)
13. Dillig, I., Dillig, T., Aiken, A.: Fluid updates: beyond strong versus weak updates. In: ESOP, pp. 246266 (2010)
14. Emmi, M., Jhala, R., Kohler, E., Majumdar, R.: Verifying reference counting implementations. In: TACAS, pp. 352367 (2009)
15. Eytani, Y., Havelund, K., Stoller, S.D., Ur, S.: Towards a framework and a benchmark for testing tools for multi-threaded programs. Conc. Comp. Prac. Exp. 19(3), 267279 (2007)
16. Fahndrich, M., DeLine, R.: Adoption and focus: practical linear types for imperative programming. In: PLDI, pp. 1324 (2002)
17. Flanagan, C., Freund, S.N.: Atomizer: a dynamic atomicity checker for multithreaded programs. In: POPL, pp. 256267 (2004)
18. Foster, J., Terauchi, T., Aiken, A.: Flow-sensitive type qualiers.
In: PLDI, pp. 112 (2002)
19. Gopan, D., DiMaio, F., Dor, N., Reps, T.W., Sagiv, S.: Numeric domains with summarized dimensions. In: TACAS, pp. 512529 (2004)
20. Gopan, D., Reps, T.W., Sagiv, S.: A framework for numeric analysis of array operations. In: POPL, pp. 338350 (2005)
21. Gulwani, S., McCloskey, B., Tiwari, A.: Lifting abstract interpreters to quantied logical domains. In: POPL, pp. 235246 (2008)
22. Hammer, C., Dolby, J., Vaziri, M., Tip, F.: Dynamic detection of atomic-set serializability violations. In: ICSE, pp. 231240 (2008)
23. Harris, W., Kidd, N., Chaki, S., Jha, S., Reps, T.: Verifying information ow control over unbounded processes. In: FM, pp. 773789 (2009)
24. Horwitz, S., Pfeiffer, P., Reps, T.: Dependence analysis for pointer variables. In: PLDI, pp. 2840 (1989)
25. IBM: Watson Libraries for Analysis (WALA). http://wala.sourceforge.net/wiki/index.php
Web End =http://wala. http://wala.sourceforge.net/wiki/index.php
Web End =sourceforge.net/wiki/index.php (2009)
26. Jones, N., Muchnick, S.: Flow analysis and optimization of Lisp-like structures. In: Program Flow Analysis: Theory and Applications, pp. 6674. Prentice-Hall, New Jersey (1981)
27. Jones, N., Muchnick, S.: A exible approach to interprocedural data ow analysis and programs with recursive data structures. In: POPL (1982)
28. Kahlon, V., Gupta A.: On the analysis of interacting pushdown systems. In: POPL, pp. 303314 (2007)
29. Kidd, N.: Static Verication of Data-Consistency Properties. PhD thesis, University of Wisconsin-Madison (2009)
30. Kidd, N., Lal, A., Reps, T.: Language strength reduction. In:SAS, pp. 283298 (2008)
31. Kidd, N., Lammich, P., Touili, T., Reps, T.: A decision procedure for detecting atomicity violations for communicating processes with locks. Int. J. Softw. Tools Tech. Transfer 13(1), 3760 (2011)
32. Kidd, N., Reps, T., Dolby, J., Vaziri, M.: Finding concurrency-related bugs using random isolation. In: VMCAI, pp. 198213 (2009a)
33. Kidd, N. A., Lammich, P., Touili, T., Reps, T.: A decision procedure for detecting atomicity violations for communicating processes with locks. In: SPIN, pp. 125142 (2009b)
34. Lahiri, S.K., Qadeer, S.: Verifying properties of well-founded linked lists, pp. 115126 (2006)
35. Lammich, P., Mller-Olm, M.: Conict analysis of programs with procedures, dynamic thread creation, and monitors. In: SAS, pp. 205220 (2008)
36. Lev-Ami, T., Sagiv, M.: TVLA: A system for implementing static analyses. In: SAS, pp. 280301 (2000)
37. Lu, S., Park, S., Seo, E., Zhou, Y.: Learning from mistakesa comprehensive study on real world concurrency bug characteristics. In: ASPLOS, pp. 329339 (2008)
38. Lu, S., Tucek, J., Qin, F., Zhou, Y.: AVIO: Detecting atomicity violations via access-interleaving invariants. In: ASPLOS, pp. 3748 (2006)
39. McMillan, K.: Verication of innite state systems by compositional model checking. In: CHARME, pp. 219234 (1999)
40. Milanova, A., Rountev, A., Ryder, B.G.: Parameterized object sensitivity for points-to analysis for Java. TOSEM 14(1), 141 (2005)
41. Min, S.L., Choi, J.-D.: An efcient cache-based access anomaly detection scheme. In: ASPLOS, pp. 235244 (1991)
42. Naik, M., Aiken, A.: Conditional must not aliasing for static race detection. In: POPL, pp. 327338 (2007)
43. OCallahan, R., Choi, J.-D.: Hybrid dynamic data race detection. In: PPoPP, pp. 167178 (2003)
44. Plevyak, J., Karamcheti, V., Chien, A.: Analysis of dynamic structures for efcient parallel execution. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds.) In: Workshop on Languages and Compilers for Parallel Computing, Portland, OR. Lecture Notes in Computer Science, vol. 768, pp. 3757. Springer, Berlin (1993)
45. Pnueli, A., Xu, J., Zuck, L.: Liveness with (0, 1, )-counter
abstraction. In: CAV, pp. 107122 (2002)46. Qadeer, S., Wu, D.: KISS: keep it simple and sequential. In: PLDI, pp. 1424 (2004)
47. Ramalingam, G.: Context-sensitive synchronization-sensitive analysis is undecidable. ACM Trans. Program. Lang. Syst. 22, 416430 (2000)
48. Rondon, P., Kawaguchi, M., Jhala, R.: Low-level liquid types. In: POPL, pp. 131144 (2010)
49. Sagiv, M., Reps, T., Wilhelm, R.: Solving shape-analysis problems in languages with destructive updating. ACM Trans. Program. Lang. Syst. 20(1), 150 (1998)
50. Sagiv, M., Reps, T., Wilhelm, R.: Parametric shape analysis via 3-valued logic. ACM Trans. Program. Lang. Syst. 24(3), 217298 (2002)
51. Sasturkar, A., Agarwal, R., Wang, L., Stoller, S.D.: Automated type-based analysis of data races and atomicity. In: PPoPP, pp. 8394 (2005)
123
518 N. Kidd et al.
52. Savage, S., Burrows, M., Nelson, G., Sobalvarro, P., Anderson, T.E.: Eraser: A dynamic data race detector for multithreaded programs. Theor. Comput. Sci. 15(4), 391411 (1997)
53. Schwoon, S.: Model-Checking Pushdown Systems. PhD thesis, TUM (2002)
54. Vaziri, M., Tip, F., Dolby, J.: Associating synchronization constraints with data in an object-oriented language. In: POPL, pp. 334345 (2006)
55. Walker, D., Morrisett, J.: Alias types for recursive data structures.
In: Types in Compilation, pp. 177206 (2000)
56. Wang, L., Stoller, S.D.: Accurate and efcient runtime detection of atomicity errors in concurrent programs. In PPoPP, pp. 137146 (2006a)
57. Wang, L., Stoller, S.D.: Runtime analysis for atomicity of multi-threaded programs. IEEE Trans. Softw. Eng. 32, 93110 (2006)
58. Yavuz-Kahveci T., Bultan T.: Automated verication of concurrent linked lists with counters. In: SAS, pp. 6984 (2002)
123
Springer-Verlag 2011