US8832659B2 - Systems and methods for finding concurrency errors - Google Patents

Systems and methods for finding concurrency errors

Info

Publication number
US8832659B2
Authority
US
United States
Prior art keywords
communication
context
failed
computing device
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US13/312,844
Other versions
US20120144372A1 (en)
Inventor
Luis Ceze
Brandon Lucia
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Washington Center for Commercialization
Original Assignee
University of Washington Center for Commercialization
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Washington Center for Commercialization filed Critical University of Washington Center for Commercialization
Priority to US13/312,844
Assigned to UNIVERSITY OF WASHINGTON THROUGH ITS CENTER FOR COMMERCIALIZATION. Assignors: LUCIA, BRANDON; CEZE, LUIS
Publication of US20120144372A1
Assigned to NATIONAL SCIENCE FOUNDATION (confirmatory license). Assignors: University of Washington Center for Commercialization
Priority to US14/464,673
Application granted
Publication of US8832659B2
Assigned to NATIONAL SCIENCE FOUNDATION (confirmatory license). Assignors: UNIVERSITY OF WASHINGTON
Status: Expired - Fee Related
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/70 Software maintenance or management
    • G06F 8/75 Structural analysis for program understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3668 Software testing
    • G06F 11/3672 Test management
    • G06F 11/3688 Test management for test execution, e.g. scheduling of test suites
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/30 Creation or generation of source code
    • G06F 8/31 Programming languages or programming paradigms
    • G06F 8/314 Parallel programming languages

Definitions

  • Concurrency errors are difficult problems to solve for developers writing multi-threaded applications. Even expert programmers have difficulty predicting the complicated behaviors that result from the unexpected interaction of operations in different threads.
  • Three exemplary types of concurrency errors are data races, atomicity violations, and ordering violations.
  • Data races occur when two or more memory operations in different threads, at least one of which is a write, access the same memory location and are not properly synchronized.
  • Atomicity violations happen when memory operations assumed to be executed atomically are not enclosed inside a single critical section.
  • Ordering violations happen when memory accesses in different threads happen in an unexpected order.
  • Some particularly difficult concurrency errors to resolve involve multiple variables. Though some efforts have been made to individually detect data races, locking discipline violations, and atomicity violations, what is needed are automated systems and methods for finding general concurrency errors, including multivariable errors and ordering violations.
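  • As a brief illustration of these categories (the listing discussed for FIG. 1A below exhibits an atomicity violation), the following hypothetical C++ fragment is a sketch only; its names and structure are not taken from the figures. It contains both a data race and an ordering violation:

        #include <thread>

        int  counter = 0;        // shared counter, no synchronization
        int* config  = nullptr;  // shared pointer, set only by the init thread

        void init()   { config = new int(0); }

        void worker() {
            counter++;      // data race: two unsynchronized read-modify-writes may interleave
            *config += 1;   // ordering violation: assumes init() has already run
        }

        int main() {
            std::thread t1(init), t2(worker), t3(worker);
            t1.join(); t2.join(); t3.join();
        }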
  • a non-transitory computer-readable medium having computer-executable instructions stored thereon is provided. If executed by one or more processors of a computing device, the instructions cause the computing device to perform actions to analyze a set of context-aware communication graphs for debugging.
  • the actions comprise creating a set of aggregate reconstructions based on edges of the set of communication graphs, ranking the aggregate reconstructions in order of likelihood of being associated with a failed execution, and presenting one or more highly ranked aggregate reconstructions.
  • a computer-implemented method of building a context-aware communication graph comprises detecting an access of a memory location by a first instruction of a first thread; updating a context associated with the first thread; and, in response to determining that a second instruction of a second thread different from the first thread was a last thread to write to the memory location, adding an edge to the context-aware communication graph, the edge including the context associated with the first thread, a sink identifying the first instruction, a source identifying the second instruction, and a context associated with the second thread.
  • a computing device for detecting concurrency bugs comprises at least two processing cores, at least two cache memories, a coherence interconnect, and a communication graph data store.
  • Each cache memory is associated with at least one processing core, and is associated with coherence logic.
  • the coherence interconnect is communicatively coupled to each of the cache memories.
  • the coherence logic is configured to add edges to a communication graph stored in the communication graph data store based on coherence messages transmitted on the coherence interconnect.
  • FIG. 1A illustrates an exemplary pseudocode listing that may exhibit concurrency errors if executed by multiple concurrent threads
  • FIG. 1B illustrates memory accesses and other operations during an exemplary multiple-threaded execution of the pseudocode listing of FIG. 1A ;
  • FIG. 2 illustrates an exemplary context-aware communication graph according to various aspects of the present disclosure
  • FIG. 3 is a block diagram that illustrates an exemplary computing device suitable for being updated to collect and analyze communication graphs according to various aspects of the present disclosure
  • FIG. 4 is a block diagram that illustrates one embodiment of a software-instrumented computing device according to various aspects of the present disclosure
  • FIG. 5 is a table that illustrates abbreviations introduced for the ease of discussion
  • FIGS. 6A-6C are tables that illustrate an exemplary embodiment of data stored within a memory location metadata data store according to various aspects of the present disclosure
  • FIG. 7 is a block diagram that illustrates one embodiment of a hardware-instrumented computing device according to various aspects of the present disclosure
  • FIG. 8 is a state diagram that illustrates state changes in an MESI coherence protocol suitable for use with embodiments of the present disclosure
  • FIGS. 9A-9D illustrate various embodiments of metadata stored in association with cache lines according to various aspects of the present disclosure
  • FIG. 10 is a table that illustrates an exemplary communication graph suitable for storage in a communication graph data store according to various aspects of the present disclosure
  • FIG. 11 illustrates an exemplary reconstruction constructed from the communication graph of FIG. 10 ;
  • FIG. 12 illustrates the creation of an aggregate reconstruction according to various aspects of the present disclosure
  • FIG. 13 illustrates one embodiment of a method of finding possible causes of concurrency errors using context-aware communication graphs according to various aspects of the present disclosure
  • FIG. 14 illustrates one embodiment of a procedure for collecting context-aware communication graphs for a set of failed executions and a set of correct executions according to various aspects of the present disclosure
  • FIG. 15 illustrates one embodiment of a procedure for selecting a set of edges correlated with failed behavior according to various aspects of the present disclosure
  • FIG. 16 illustrates one embodiment of a procedure for determining an aggregate reconstruction for each selected edge according to various aspects of the present disclosure
  • FIG. 17 illustrates one embodiment of a procedure for determining a context variation ratio for each aggregate reconstruction according to various aspects of the present disclosure.
  • FIG. 18 illustrates one embodiment of a method of detecting possible causes of concurrency errors using unlabeled executions according to various aspects of the present disclosure.
  • FIG. 1A illustrates an exemplary pseudocode listing that may exhibit concurrency errors if executed by multiple concurrent threads.
  • the figure includes a set of instruction numbers 102 and a C++-like pseudocode listing 104 .
  • the instruction numbers 102 have been provided as letters for sake of discussion in order to disambiguate from numbers used later to indicate timestamps. Certain details have been elided from the pseudocode, such as the details of Instruction A and Instruction H, and the details of the Add( ) function. It may be assumed that the details of Instruction A and Instruction H do not have any effect on inter-thread communication, and that the Add( ) function includes a single memory write operation to the memory location referred to by the “items” variable. Further, it may be assumed for sake of discussion that each line of pseudocode involves at most a single instruction that affects a memory location, though in some embodiments of actual programming languages, many instructions that affect many memory locations may reside in a single line of code.
  • the Spider class includes a concurrency error. Specifically, there is an implicit assumption that Instruction K and Instruction M are included in a single atomic operation. Since there is no protection mechanism in place, multiple threads concurrently executing this code may sometimes experience an attempt to access a null pointer in Instruction N.
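  • Because the actual listing of FIG. 1A is not reproduced here, the following C++ sketch only approximates the pattern described above; the Enqueue( ) and Consume( ) names, the member layout, and the mapping of instruction letters are assumptions made for illustration:

        #include <vector>

        struct Item { void GetD() {} };

        class Queue {
            std::vector<Item*> items;   // the "items" storage
            int qsize = 0;              // the "qsize" counter
        public:
            void Enqueue(Item* i) {
                items.push_back(i);     // single write to "items" (Instruction C)
                qsize = qsize + 1;      // increment "qsize" (Instruction D)
            }
            Item* Dequeue() {
                if (qsize == 0)         // read "qsize" (Instruction E)
                    return nullptr;     // return null when empty (Instruction F)
                qsize = qsize - 1;      // decrement "qsize" (Instruction G)
                Item* it = items.back();
                items.pop_back();
                return it;
            }
            int Size() const { return qsize; }   // read "qsize" (Instruction I)
        };

        // The consumer assumes the size check and the dequeue execute atomically.
        void Consume(Queue& q) {
            while (true) {                        // (Instruction J)
                if (q.Size() != 0) {              // (Instruction K)
                    Item* item = q.Dequeue();     // (Instruction M)
                    item->GetD();                 // (Instruction N) null dereference if another
                }                                 // thread dequeued between K and M
            }
        }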
  • FIG. 1B illustrates memory accesses and other operations during an exemplary multiple-threaded execution of the pseudocode listing of FIG. 1A .
  • the parenthesized letters correspond to the set of instruction numbers 102 in FIG. 1A , and the sequence of execution proceeds from the top of the illustration to the bottom of the illustration.
  • the list of numbers 115 illustrates an exemplary timestamp for the execution of each instruction. Low integers are used for timestamps for ease of discussion only, and in other embodiments, other types of data may be used for timestamp values.
  • a return value of an RDTSC x86 instruction, a system time value, and/or the like may be used as the timestamp.
  • Thread one 110 begins by executing Instruction A and Instruction B to initialize the “items” variable and to set the “qsize” variable to “0.” Next, thread one 110 executes Instruction C to add the value “i” to the “items” variable, and executes Instruction D to increment the value of the “qsize” variable from “0” to “1.” Thread two 112 enters the “while” loop at Instruction J, and executes the check at Instruction K to determine whether the size of the Queue object is “0.” At Instruction I, thread two 112 accesses the “qsize” variable, which was last incremented to “1” by thread one 110 . Thread two 112 will then proceed to Instruction M, because the value retrieved from the “qsize” variable was not “0.”
  • thread three 114 proceeds to begin to dequeue the single item from the Queue object.
  • Thread three 114 reads the “qsize” variable, and determines that it may proceed to dequeue an object. Assuming the execution of thread three 114 next proceeds to Instruction G, thread three 114 writes to the “qsize” variable, decrementing it to “0.”
  • execution returns to thread two 112 .
  • thread two 112 calls the Dequeue( ) function, which proceeds to Instruction E.
  • Thread two 112 accesses the “qsize” variable, and determines that it is now “0” (as updated by thread three 114 ).
  • the Dequeue( ) function returns “null” in response to the value of the “qsize” variable, and so the value of “item” in Instruction M is set to “null.”
  • thread two 112 attempts to call the function GetD( ) on a pointer set to “null,” which causes an exception, a system crash, or some other undefined failure depending on the operating environment.
  • a communication graph may be used to represent communication between threads in a multi-threaded environment.
  • a communication graph includes one or more edges that represent communication events. Each edge includes a source node and a sink (or destination) node.
  • the source node of an edge represents a write instruction.
  • the sink node of an edge represents a read instruction or a write instruction that accessed the memory location written by the write instruction of the source node.
  • the communication graph may also include a source node for uninitialized states, thus allowing edges to be created when an instruction first accesses an otherwise uninitialized memory location.
  • Communication graphs may be context-oblivious or context-aware.
  • concurrency errors may lead to edges that are only present in graphs of buggy executions, and so may be useful for detecting some concurrency errors.
  • a context-oblivious communication graph may not include enough information to detect the error.
  • each edge may include information representing a relative order of communication events.
  • One example of a context-aware communication graph is illustrated in FIG. 2 .
  • the communication graph 200 illustrates communication events that occur during the pseudocode execution illustrated in FIG. 1B , using instruction numbers and code fragments from the code listing in FIG. 1A .
  • the communication graph 200 includes a set of nodes and a set of edges. Each node includes an associated instruction address (illustrated in the top half of each node) and a context (illustrated in the bottom half of each node). Each node is unique, in that no two nodes will represent the same instruction address and context.
  • Each edge is labeled in the figure by an edge number for ease of discussion only, and extends from a source node to a sink node.
  • Each node in the communication graph 200 may be a sink node or a source node for any number of edges. In some embodiments, some nodes stored in the communication graph 200 may not be associated with any edges, such as, for example, when multiple consecutive memory accesses occur within a single thread.
  • the context stored in each node represents a relative order of communication events, and may be any suitable type of information for storing such information.
  • context information may include information uniquely identifying every dynamic memory operation. However, since the size of such a graph would continue to grow over time, it may be desirable to store a smaller set of context information that nonetheless represents sufficient detail to allow for the detection of concurrency bugs.
  • the context information may include a sequence of communication events observed by a thread immediately prior to the execution of a memory instruction regardless of the memory location involved.
  • the communication events may be stored in a FIFO queue of a predetermined length, such that once the queue is full, an oldest entry is discarded before adding a new entry.
  • the predetermined length of the FIFO queue may be any length, such as five elements, more than five elements, or less than five elements. In the embodiment illustrated in FIG. 2 , the predetermined length of the context FIFO queue is five elements.
  • a local read (“LocRd”) is a read of a memory location last written by a remote thread.
  • a local write (“LocWr”) is a write to a memory location last written by a remote thread.
  • a remote read (“RemRd”) is a read of a memory location by a remote thread that was last written by the local thread.
  • a remote write (“RemWr”) is a write to a memory location by a remote thread that was last written by the local thread.
  • the type of event is what is stored in the context FIFO, without the memory location associated with the event.
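  • A minimal sketch of how such a per-thread context might be represented in software, assuming a five-element bounded FIFO as in the embodiment of FIG. 2 (the type and field names are illustrative, not taken from any particular implementation):

        #include <deque>

        enum class Event { LocRd, LocWr, RemRd, RemWr };   // the four context values

        struct ThreadContext {
            static const size_t kMaxLen = 5;   // predetermined FIFO length
            std::deque<Event> fifo;

            void push(Event e) {
                if (fifo.size() == kMaxLen) fifo.pop_front();  // discard the oldest entry
                fifo.push_back(e);                             // newest entry at the back
            }
        };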
  • In FIG. 2 , nine nodes corresponding to the execution trace of FIG. 1B are illustrated.
  • An uninitialized state node 202 is included in the graph 200 before execution begins to serve as a source node for edges that represent initial accesses to memory locations.
  • a first node 204 refers to the first memory access in the execution trace, where thread one 110 executes Instruction A to initialize the “items” memory location.
  • the first node 204 stores the instruction location (Instruction A) and a context, which is currently empty because there were no previous memory accesses.
  • An edge (“Edge 1 ”) is created between the uninitialized state node 202 and the first node 204 .
  • a second node 206 refers to the second memory access in the execution trace, where thread one 110 executes Instruction B to initialize the “qsize” memory location.
  • the second node 206 stores the instruction location (Instruction B) and a context, which currently contains a single element, "LocWr," representing the local write to the "items" memory location at Instruction A.
  • An edge ("Edge 2 ") is created between the uninitialized state node 202 and the second node 206 .
  • a third node 208 and a fourth node 210 are added when thread one 110 executes Instruction C and Instruction D to update the “items” memory location and the “qsize” memory location, respectively.
  • the context for the third node 208 is "LocWr, LocWr," as the memory writes in Instruction A and Instruction B caused two LocWr states to be pushed onto the context FIFO queue for thread one 110 .
  • the context for the fourth node 210 is “LocWr, LocWr, LocWr,” as the memory write in Instruction C caused another LocWr state to be pushed onto the context FIFO queue for thread one 110 .
  • No edges are created with the third node 208 or the fourth node 210 as a sink, because the last thread to write to the memory location in each case was the local thread, so there was no thread-to-thread communication.
  • a fifth node 212 is created when thread two 112 reads the “qsize” memory location at Instruction I.
  • the context for thread two 112 contains “RemWr, RemWr, RemWr, RemWr,” representing the four remote write operations performed by thread one 110 .
  • An edge (“Edge 3 ”) is created having the fourth node 210 as the source node and the fifth node 212 as the sink node, because the fourth node 210 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread two 112 , the thread currently accessing the “qsize” memory location.
  • a sixth node 214 is created when thread three 114 reads the “qsize” memory location at Instruction I.
  • a remote read event was pushed onto the context FIFO for thread three 114 when thread two 112 read the “qsize” memory location, and so the context stored for the sixth node 214 is “RemRd, RemWr, RemWr, RemWr, RemWr.”
  • An edge (“Edge 4 ”) is created having the fourth node 210 as the source node and the sixth node 214 as the sink node, because the fourth node 210 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread three 114 , the thread currently accessing the “qsize” memory location.
  • a seventh node 216 is created when thread three 114 writes to the “qsize” memory location at Instruction G.
  • a local read event was pushed onto the context FIFO for thread three 114 when it read the “qsize” memory location.
  • the context stored for the seventh node 216 is “LocRd, RemRd, RemWr, RemWr, RemWr.”
  • An edge (“Edge 5 ”) is created having the fourth node 210 as the source node and the seventh node 216 as the sink node, because the fourth node 210 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread three 114 , the thread currently accessing the “qsize” memory location.
  • An eighth node 218 is created when thread two 112 reads from the “qsize” memory location at Instruction E.
  • a remote read event was pushed onto the context FIFO for thread two 112 when thread three 114 read the “qsize” memory location, and a remote write event was pushed onto the context FIFO for thread two when thread three 114 wrote to the “qsize” memory location.
  • Edge 6 is created having the seventh node 216 as the source node and the eighth node 218 as the sink node, because the seventh node 216 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread two 112 , the thread currently accessing the “qsize” memory location.
  • Edge 6 is illustrated as a dashed line, because it is this inter-thread communication that occurs in failed executions.
  • FIG. 3 is a block diagram that illustrates an exemplary computing device suitable for being updated to collect and analyze communication graphs according to various aspects of the present disclosure.
  • the computing device 300 illustrated in FIG. 3 is not configured to collect or analyze communication graphs, but is instead included herein for the sake of further discussion below concerning how to configure a computing device 300 for collecting and/or analyzing communication graphs.
  • the computing device 300 includes main memory 302 , a coherence interconnect 304 , a set of cache memories 312 , 316 , 320 , and a set of processor cores 306 , 308 , 310 .
  • Each processor core 306 , 308 , 310 is associated with one of the cache memories 312 , 316 , 320 .
  • a processor core checks if a valid copy of the data from the memory location is present in its associated cache. If so, the processor core uses the cached copy of the data. If not, the coherence interconnect 304 obtains data from the memory location either from another cache which has a valid copy of the data or from main memory 302 .
  • the coherence interconnect 304 may be a coherence bus, a scalable coherence interface, or any other suitable coherence interconnect technology.
  • the main memory 302 may be any suitable computer-readable medium, such as SRAM, DRAM, flash memory, a magnetic storage medium, and/or the like.
  • each of the cache memories 312 , 316 , 320 includes coherence logic 314 , 318 , 322 that interacts with the coherence interconnect 304 to synchronize the contents of the cache memories.
  • each processor core 306 , 308 , 310 may be located in a separate physical processor, or may be separate processing cores in a single physical processor. Further, one of ordinary skill in the art will also recognize that three processor cores and three cache memories have been illustrated herein for ease of discussion, and that in some embodiments, more or fewer processor cores, and/or more or fewer cache memories, may be used. In addition, in some embodiments, additional levels of cache memory between the illustrated cache and the main memory, or between the illustrated cache and the associated processor core, may be used, multiple processor cores may be associated with a single cache memory, and/or multiple cache memories may be associated with a single processor core. In some embodiments, the computing device 300 may be a desktop computer, a laptop computer, a tablet computing device, a mobile computing device, a server computer, and/or any other suitable computing device having at least one processor that executes more than one thread.
  • FIG. 4 is a block diagram that illustrates one embodiment of a software-instrumented computing device 400 according to various aspects of the present disclosure.
  • the software-instrumented computing device 400 is similar to the computing device 300 illustrated in FIG. 3 , and includes three processor cores 406 , 408 , 410 , three caches 412 , 416 , 420 that each include coherence logic 414 , 418 , 422 , a coherence interconnect 404 , and a main memory 402 .
  • the software-instrumented computing device 400 has been configured with one or more components 454 for collecting context-aware communication graphs.
  • the components 454 include a graph analysis engine 456 , a memory location metadata data store 458 , a thread context data store 460 , and a communication graph data store 462 .
  • the thread context data store 460 is configured to store a context FIFO queue for each thread executed by the computing device 400 .
  • the memory location metadata data store 458 is configured to store metadata for each memory location identifying at least an instruction and thread that last wrote to the memory location.
  • the communication graph data store 462 is configured to store one or more communication graphs built using the information stored in the thread context data store 460 and the memory location metadata data store 458 .
  • the communication graph data store 462 may also store an indication of whether each communication graph is associated with correct behavior or failed behavior.
  • the graph analysis engine 456 is configured to analyze a stored communication graph to find edges to be inspected for errors, as discussed further below.
  • the executable program is instrumented to monitor memory accesses.
  • a binary may be instrumented using the Pin dynamic instrumentation tool by Intel Corporation.
  • Java code may be instrumented using the RoadRunner dynamic analysis framework developed by Cormac Flanagan and Stephen N. Freund.
  • the instrumentation tracks thread contexts and memory location metadata while the program is executing, and builds the communication graph for storage in the communication graph data store 462 . After collection, the graph analysis engine 456 may be used to analyze the communication graphs.
  • a “data store” may include any suitable device configured to store data for access by a computing device.
  • Each data store may include a relational database, a structured flat file, and/or any other suitable data storage format.
  • the memory location metadata data store 458 may include a fixed-size hash table. To find metadata associated with a particular memory location, the memory location address modulo the hash table size may be used as an index into the hash table. In such an embodiment, a lossy collision resolution policy in which an access may read or overwrite a colliding location's metadata may be tolerated without unduly sacrificing performance if the fixed size of the hash table is large enough, such as having at least 32 million entries.
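  • One way such a fixed-size, lossy table might be sketched (field names are illustrative; the actual layout of the memory location metadata data store 458 may differ):

        #include <cstdint>
        #include <vector>

        struct Metadata {
            uintptr_t addr = 0;          // memory location this slot currently describes
            int       writerThread = -1; // last writer thread ID
            uintptr_t writerInstr = 0;   // last writer instruction address
            uint32_t  writerContext = 0; // packed context of the write
            uint64_t  timestamp = 0;     // time of the write
        };

        class MetadataTable {
            std::vector<Metadata> table;
        public:
            // e.g. MetadataTable meta(32u * 1024 * 1024); // "at least 32 million entries"
            explicit MetadataTable(size_t entries) : table(entries) {}

            // Lossy collision policy: a colliding address simply reads or overwrites
            // whatever metadata currently occupies its slot.
            Metadata&       slot(uintptr_t addr)       { return table[addr % table.size()]; }
            const Metadata& slot(uintptr_t addr) const { return table[addr % table.size()]; }
        };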
  • the memory location metadata data store 458 may use a shadow memory feature of an instrumentation utility such as RoadRunner and/or the like to implement a distributed metadata table. Unique identifiers of memory access instructions in the bytecode may be used instead of instruction addresses. Contexts may be stored as integers using bit fields.
  • a communication graph data store 462 may include a chaining hash table. To access the chaining hash table, a hash function may separately sum the entries in the source node context and the sink node context. Each node's sum may then be XORed with the instruction address of the node. The hash key may then be generated by XORing the result of the computation for the source node with the result of the computation for the sink node.
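  • A sketch of the hash-key computation just described, assuming contexts are stored as small integer arrays; the function and parameter names are illustrative:

        #include <cstdint>
        #include <vector>

        // Hash key for an edge: sum each node's context entries, XOR each sum with
        // that node's instruction address, then XOR the two per-node results together.
        uint64_t edgeHash(uintptr_t srcInstr, const std::vector<int>& srcCtx,
                          uintptr_t sinkInstr, const std::vector<int>& sinkCtx) {
            uint64_t srcSum = 0, sinkSum = 0;
            for (int e : srcCtx)  srcSum  += e;
            for (int e : sinkCtx) sinkSum += e;
            uint64_t srcPart  = srcSum  ^ static_cast<uint64_t>(srcInstr);
            uint64_t sinkPart = sinkSum ^ static_cast<uint64_t>(sinkInstr);
            return srcPart ^ sinkPart;
        }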
  • a communication graph data store 462 may include an adjacency list and may use hash sets. In such an embodiment, nodes may be indexed by instruction address/context pairs. In some embodiments, other methods or data structures may be used within the communication graph data store 462 , the memory location metadata data store 458 , or any other data store described herein.
  • Each data store may include one or more non-volatile computer-readable storage media, such as a magnetic drive, optical drive, flash drive, and/or the like, and/or may include one or more volatile computer-readable storage media, such as DRAM, SRAM, and/or the like.
  • Each data store may be accessible locally by the computing device, or may be accessible over some type of network.
  • One of ordinary skill in the art will recognize that separate data stores described herein may be combined into a single data store, and/or a single data store described herein may be separated into multiple data stores, without departing from the scope of the present disclosure.
  • partial communication graphs may be stored in separate communication graph data stores 462 that are local to each thread. In such an embodiment, performance may be improved by making addition of edges to the graph a thread-local operation. When such a thread ends, the partial communication graph may be merged into a global communication graph stored in a master communication graph data store 462 .
  • As used herein, "engine" refers to logic embodied in hardware or software instructions, which may be written in a programming language, such as C, C++, COBOL, JAVA™, PHP, Perl, C#, and/or the like. An engine may be compiled into executable programs or written in interpreted programming languages. Software engines may be callable from other engines, or from themselves. Generally, the engines described herein refer to logical modules that may be merged with other engines or applications, or may be divided into sub-engines. The engines may be stored on any type of computer-readable medium or computer storage device and executed by one or more general-purpose computing devices, thus creating a special-purpose computing device configured to provide the engine.
  • FIG. 5 is a table that illustrates abbreviations introduced for the ease of discussion.
  • Four context values tracked by some embodiments of the present disclosure are Local Read (“LocRd”), Local Write (“LocWr”), Remote Read (“RemRd”), and Remote Write (“RemWr”).
  • these values may be represented by the integers 0, 1, 2, and 3, respectively, as indicated in the table in FIG. 5 .
  • a context having a fixed length of five elements may be represented by an array of anywhere from zero to five integers. This notation is used below interchangeably with the abbreviated notation for brevity and clarity.
  • the integer values map to the context values listed in FIG. 5 .
  • any other suitable representation may be used for individual context values and/or the elements of a context FIFO queue.
  • a single integer may be used to represent all possible combinations of elements in a context FIFO queue.
  • the integers between 0 and 1023, inclusive, may be used to represent every possible context FIFO queue.
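  • One plausible encoding along these lines, assuming the integer mapping of FIG. 5 and two bits per event, is sketched below; a full five-element queue then packs into the range 0 through 1023:

        #include <cstdint>
        #include <deque>

        // Pack up to five 2-bit event values (0=LocRd, 1=LocWr, 2=RemRd, 3=RemWr)
        // into a single integer; a full five-element FIFO uses the values 0..1023.
        uint32_t packContext(const std::deque<int>& fifo) {
            uint32_t packed = 0;
            for (int e : fifo)
                packed = (packed << 2) | static_cast<uint32_t>(e & 0x3);
            return packed;
        }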
  • FIGS. 6A-6C are tables that illustrate an exemplary embodiment of data stored within a memory location metadata data store 458 according to various aspects of the present disclosure.
  • the information stored within a memory location metadata data store 458 represents a previous instruction that wrote to each memory location.
  • the tables store values for a memory location, a last writer thread ID, a last writer instruction, a context, and a timestamp.
  • more or less information than that shown in the figures may be stored in the memory location metadata data store 458 .
  • the timestamp value may not be collected, or more or less context information may be collected.
  • FIGS. 6A-6C contain exemplary information that may be collected during the execution trace illustrated in FIG. 1B .
  • FIG. 6A illustrates a table 602 in the memory location metadata data store 458 after thread one 110 has executed Instruction B at time 2 .
  • the table 602 includes two entries for the two memory locations that were accessed: the “items” location and the “qsize” location. Both locations were last written by thread one 110 .
  • the “items” location was last written by Instruction A at timestamp 1
  • the “qsize” location was last written by Instruction B at timestamp 2 .
  • the context for the write to the “items” memory location was empty, and the context for the write to the “qsize” memory location was “1” (“LocWr,” using the shorthand illustrated in FIG. 5 ).
  • FIG. 6B illustrates the table 602 after thread one 110 has executed Instruction D at time 4 .
  • the entry for the “items” location has been updated to show that the last writer instruction was Instruction C, and that the write occurred at timestamp 3 with a context of “1, 1” (corresponding to “LocWr, LocWr”).
  • the entry for the “qsize” location has been updated to show that the last writer instruction was Instruction D, and that the write occurred at timestamp 4 with a context of “1, 1, 1” (corresponding to “LocWr, LocWr, LocWr”). Since both writes occurred in thread one 110 , the last writer thread ID values for both entries remained the same.
  • FIG. 6C illustrates the table 602 after thread three 114 has executed Instruction G at time 7 .
  • the entry for the “qsize” location has been updated to show that the last writer thread was thread three 114 instead of thread one 110 , that the last writer instruction was Instruction G, and that the write occurred at timestamp 7 with a context of “0, 2, 3, 3, 3” (corresponding to “LocRd, RemRd, RemWr, RemWr, RemWr”).
  • the information in the memory location metadata data store 458 may be consulted to determine whether an edge should be added to a communication graph, and then may be updated if the memory access is a write. For example, upon detecting the read of the “qsize” location by Instruction I at time 5 in thread two 112 , the entry for the “qsize” location is checked, and it is determined that the last writer thread was not thread two 112 (see FIG. 6B ).
  • an edge (Edge 3 ) is added to the communication graph having a source node indicating at least Instruction D and context “1, 1, 1,” and a sink node indicating at least Instruction I and context “3, 3, 3, 3.”
  • the timestamp information for both the source node and the sink node may also be included in the edge.
  • the rest of the communication graph may be similarly constructed during the execution of the code listing.
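  • Tying the above together, the following C++ sketch shows what a per-access instrumentation callback might do to build the graph. It is not the Pin or RoadRunner API, all names are illustrative, and, following the worked example of FIG. 2 , it assumes every thread observes every other thread's accesses as remote events:

        #include <cstdint>
        #include <deque>
        #include <unordered_map>
        #include <vector>

        enum class Event { LocRd, LocWr, RemRd, RemWr };
        using Context = std::deque<Event>;

        struct Node { uintptr_t instr; Context ctx; uint64_t time; };
        struct Edge { Node source, sink; };

        struct GraphBuilder {
            std::vector<Context> threadCtx;                  // one context FIFO per thread
            std::unordered_map<uintptr_t, Node> lastWriter;  // per-location metadata (FIGS. 6A-6C)
            std::unordered_map<uintptr_t, int>  lastWriterTid;
            std::vector<Edge> graph;

            explicit GraphBuilder(int nThreads) : threadCtx(nThreads) {}

            void push(int tid, Event e) {
                Context& f = threadCtx[tid];
                if (f.size() == 5) f.pop_front();            // bounded FIFO, length five
                f.push_back(e);
            }

            void onAccess(int tid, uintptr_t instr, uintptr_t addr,
                          bool isWrite, uint64_t now) {
                Node sink{instr, threadCtx[tid], now};       // context observed before this access
                auto it = lastWriterTid.find(addr);
                if (it != lastWriterTid.end() && it->second != tid)
                    graph.push_back({lastWriter[addr], sink});   // inter-thread communication edge
                // Accessing thread records a local event; all other threads record a remote event.
                for (int t = 0; t < (int)threadCtx.size(); ++t)
                    push(t, t == tid ? (isWrite ? Event::LocWr : Event::LocRd)
                                     : (isWrite ? Event::RemWr : Event::RemRd));
                if (isWrite) { lastWriter[addr] = sink; lastWriterTid[addr] = tid; }
            }
        };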
  • FIG. 7 illustrates one embodiment of a hardware-instrumented computing device 700 according to various aspects of the present disclosure.
  • the hardware-instrumented computing device 700 includes a main memory 702 , a coherence interconnect 704 , three processor cores 706 , 708 , 710 , and three cache memories 712 , 716 , 720 .
  • Each of these components includes similar structure and function to the like components discussed above, except as outlined below.
  • the hardware-instrumented computing device 700 also includes components 754 for storing and analyzing context-aware communication graphs.
  • the components 754 include a graph analysis engine 756 and a communication graph data store 762 that may be similar to the graph analysis engine 456 and communication graph data store 462 discussed above.
  • One difference between the communication graph data store 462 and the communication graph data store 762 is that the communication graph data store 762 may be populated by the additional hardware components discussed below instead of by instrumented software code.
  • Each processor core 706 , 708 , 710 is augmented with a context register 707 , 709 , 711 .
  • the context register 707 , 709 , 711 is configured to store a context FIFO queue, as described above, for a thread currently being executed by the associated processor core 706 , 708 , 710 .
  • each cache line in each cache memory 712 , 716 , 720 is augmented with metadata 713 , 717 , 721 that describes the last instruction to write to the cache line. Details of the cache lines, including the metadata 713 , 717 , 721 , are discussed further below with respect to FIGS. 9A-9D .
  • the cache memories 712 , 716 , 720 in the hardware-instrumented computing device 700 include modified coherence logic 715 , 719 , 723 .
  • the modified coherence logic 715 , 719 , 723 monitors coherence messages sent via the coherence interconnect 704 , and updates the metadata 713 , 717 , 721 , and the context registers 707 , 709 , 711 accordingly.
  • only thread-to-thread communication that results in cache-to-cache transfers or memory-to-cache transfers may be considered for addition to communication graphs.
  • the modified coherence logic 715 , 719 , 723 also adds edges to communication graphs stored in a communication graph data store 762 based on at least the context registers 707 , 709 , 711 and the metadata 713 , 717 , 721 .
  • the modified coherence logic 715 , 719 , 723 is based on a modified MESI coherence protocol.
  • Standard MESI coherence protocols are generally known in the art, and so are not discussed herein at length.
  • FIG. 8 is a state diagram that illustrates state changes in an MESI coherence protocol suitable for use with embodiments of the present disclosure.
  • the states of the MESI coherence protocol describe the status of a cache line, and may be Modified, Exclusive, Shared, or Invalid.
  • the numbers on each edge of the state diagram are associated with the legend at the bottom, which describes the type of operation that causes the change from one state to another.
  • Solid lines represent edges that result from an action taken by a local processor associated with the cache, and dashed lines represent edges that result from a message received via the coherence interconnect 704 indicating an action taken by a remote processor.
  • the modified coherence logic 715 , 719 , 723 may adhere to a normal MESI coherence protocol, but may augment some coherence messages to share information about the instructions involved with the communication. For example, when a read reply is transmitted, the modified coherence logic 715 , 719 , 723 may include the metadata 713 , 717 , 721 of the corresponding cache line to provide information for read-after-write (RAW) communication. As another example, when an invalidate reply or acknowledgement is transmitted, the modified coherence logic 715 , 719 , 723 may include the metadata 713 , 717 , 721 of the cache line that was invalidated to provide information for write-after-write (WAW) communication.
  • the modified coherence logic 715 , 719 , 723 monitors traffic on the coherence interconnect 704 , and pushes context events into the context register 707 , 709 , 711 of the associated processor core 706 , 708 , 710 when appropriate.
  • the modified coherence logic 715 , 719 , 723 may push a local read event into the context register 707 , 709 , 711 upon detecting a local read miss, a local write event upon detecting a local write miss or upgrade miss, a remote write event upon detecting an incoming invalidate request, and a remote read event upon detecting an incoming read request.
  • the modified coherence logic 715 , 719 , 723 also updates the communication graph.
  • the modified coherence logic 715 , 719 , 723 may add an edge to the communication graph upon detecting a read reply, an invalidate reply, or a read miss serviced from memory 702 .
  • Upon detecting a read reply, an edge is added having a source node including information from the metadata included in the read reply, and a sink node including information relating to the local instruction that caused the miss and the context in which the miss happened.
  • Upon detecting an invalidate reply, an edge is added having a source node including information from the metadata for the cache line that was invalidated, and a sink node including information relating to the local instruction that caused the invalidate request and the context in which the request originated.
  • Upon detecting a read miss serviced from memory 702 , an edge is added with a source node set to a null value and a sink node including information relating to the local instruction that caused the miss and the context in which the miss happened, to indicate that an otherwise uninitialized memory location was accessed.
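  • A compact sketch, written as ordinary C++ rather than hardware, of the event mapping just described; the enumerator names are illustrative:

        enum class CoherenceEvent {
            LocalReadMiss, LocalWriteMiss, UpgradeMiss,      // caused by the local core
            IncomingReadRequest, IncomingInvalidateRequest,  // caused by a remote core
            ReadReply, InvalidateReply, ReadMissFromMemory   // replies carrying metadata
        };

        enum class ContextEvent { LocRd, LocWr, RemRd, RemWr };

        // What the modified coherence logic pushes into the local context register.
        bool toContextEvent(CoherenceEvent e, ContextEvent& out) {
            switch (e) {
                case CoherenceEvent::LocalReadMiss:             out = ContextEvent::LocRd; return true;
                case CoherenceEvent::LocalWriteMiss:
                case CoherenceEvent::UpgradeMiss:               out = ContextEvent::LocWr; return true;
                case CoherenceEvent::IncomingInvalidateRequest: out = ContextEvent::RemWr; return true;
                case CoherenceEvent::IncomingReadRequest:       out = ContextEvent::RemRd; return true;
                default: return false;   // replies update the graph instead of the context
            }
        }

        // Replies carry last-writer metadata and trigger edge insertion:
        //   ReadReply          -> edge from the replying line's writer (RAW communication)
        //   InvalidateReply    -> edge from the invalidated line's writer (WAW communication)
        //   ReadMissFromMemory -> edge with a null source (uninitialized memory location)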
  • FIGS. 9A-9D illustrate various embodiments of metadata 713 , 717 , 721 stored in association with cache lines according to various aspects of the present disclosure.
  • FIG. 9A illustrates a standard cache line that does not have any metadata 713 , 717 , 721 added. Fields are included for a tag indicating a state in the MESI diagram and for the data itself.
  • FIG. 9B illustrates a modified cache line, in which a metadata field has been added to associate a last writer instruction address with the cache line.
  • FIG. 9C adds a writer context field to the modified cache line of FIG. 9B
  • FIG. 9D adds a timestamp field to the modified cache line of FIG. 9C .
  • the writer context field and/or the timestamp may be optional, though the available graph analysis functionality may change.
  • the information stored in the metadata 713 , 717 , 721 in aggregate, may be similar to the information stored in the memory location metadata data store 458 in the software-instrumented computing device 400 described above, and may be used for similar purposes.
  • the metadata 713 , 717 , 721 or another portion of the associated cache line may include additional information not illustrated here, such as a writer thread ID or any other information, without departing from the scope of the disclosed subject matter.
  • Context-aware communication graphs may be analyzed to determine instructions that are likely associated with failed program behavior. However, since concurrency bugs are difficult to diagnose, it would be helpful if a representation of the behavior of all threads around the instruction could be presented for debugging, and not just the single instruction or the single thread that failed. By adding timestamp data to the nodes of a context-aware communication graph, behavior likely to occur before, during, and after an instruction may be presented for debugging purposes.
  • a reconstruction presents communication nodes that occur before, during, and after an identified edge from a communication graph.
  • FIG. 10 is a table that illustrates an exemplary communication graph suitable for storage in the communication graph data store 762 .
  • Each edge in the communication graph includes a source instruction, a source context, and a source timestamp, as well as a sink instruction, a sink context, and a sink timestamp.
  • the communication graph illustrated in tabular format in FIG. 10 is similar to the communication graph illustrated and described in FIG. 2 , but has added timestamp information for the source node and the sink node for each edge, when available.
  • FIG. 11 illustrates an exemplary reconstruction 1100 constructed from the communication graph of FIG. 10 .
  • the reconstruction 1100 is based on the edge from source node 1102 to sink node 1104 .
  • a prefix section 1106 , a body 1108 , and a suffix section 1110 are provided to present communication nodes that occurred before, during, and after the communication represented by the edge.
  • the timestamps of nodes in the communication graph are inspected to determine nodes that are appropriate for the prefix section 1106 , body 1108 , and suffix section 1110 of the reconstruction 1100 .
  • the prefix section 1106 and suffix section 1110 may include any number of nodes.
  • the prefix section 1106 and/or suffix section 1110 include at most a predetermined number of nodes. In some embodiments, the predetermined number of nodes may be less than or equal to a maximum length of the context FIFO queues used in the communication graph.
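  • A sketch of the timestamp-based selection described above, assuming each node carries a timestamp and the nodes of a graph are available as a flat list; the limit of five prefix and suffix nodes mirrors the context FIFO length but is an assumption:

        #include <algorithm>
        #include <cstdint>
        #include <vector>

        struct TimedNode { uintptr_t instr; uint32_t ctx; uint64_t time; };

        struct Reconstruction {
            std::vector<TimedNode> prefix, body, suffix;
        };

        // Build a reconstruction around one edge: the body holds nodes whose timestamps
        // fall between the source and sink timestamps, and the prefix/suffix hold up to
        // maxLen nodes immediately before and after that window.
        Reconstruction reconstruct(std::vector<TimedNode> nodes,
                                   uint64_t srcTime, uint64_t sinkTime,
                                   size_t maxLen = 5) {
            std::sort(nodes.begin(), nodes.end(),
                      [](const TimedNode& a, const TimedNode& b) { return a.time < b.time; });
            Reconstruction r;
            for (const TimedNode& n : nodes) {
                if (n.time < srcTime) {
                    r.prefix.push_back(n);
                    if (r.prefix.size() > maxLen) r.prefix.erase(r.prefix.begin());
                } else if (n.time <= sinkTime) {
                    r.body.push_back(n);
                } else if (r.suffix.size() < maxLen) {
                    r.suffix.push_back(n);
                }
            }
            return r;
        }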
  • FIG. 12 illustrates the creation of an aggregate reconstruction 1210 , which combines each node that appears in the prefix, body, or suffix of more than one execution into a single structure.
  • FIG. 12 is based on a hypothetical different code listing and communication graph than those discussed in the rest of the present disclosure, having nodes labeled from the letter S to the letter Z. The code listing is not illustrated, but the details of the code listing are not necessary to understand the formation of an aggregate reconstruction.
  • the code was executed a plurality of times, and communication graphs were created for each execution. In those executions, four executions were identified that had a particular edge having node Y as the source node and node Z as the sink node. For each execution, a reconstruction 1202 , 1204 , 1206 , 1208 was calculated based on the timestamps of the nodes in the communication graph around node Y and node Z. The reconstructions 1202 , 1204 , 1206 , 1208 are slightly different in each case, reflecting the indeterminate nature of the execution.
  • the prefix, body, and suffix of each reconstruction 1202 , 1204 , 1206 , 1208 are unioned together to form an aggregate prefix, an aggregate body, and an aggregate suffix.
  • Nodes may appear in more than one portion of the aggregate reconstruction, because in some executions, a given node may occur before the sink node or source node, and in other executions, the given node may occur after the sink node or source node.
  • Each node in the aggregate reconstruction 1210 is then assigned a confidence value, which indicates a proportion of executions for which the given node appeared in the given portion of the reconstruction.
  • node U in the body of the aggregate reconstruction 1210 is assigned a confidence value 1212 of 100%, because node U was present in the body of every reconstruction.
  • node S is assigned a confidence value 1214 of 50% in the prefix, and a confidence value 1216 of 50% in the body, because node S appeared in each portion of the reconstructions twice for the four executions.
  • the other confidence values were similarly derived.
  • the nodes in the aggregate reconstruction 1210 are not ordered other than being segregated into prefix, body, and suffix portions, as the timestamps may not be comparable from one execution to another. The use of aggregate reconstructions and confidence values to find likely reconstructions that show failures will be discussed further below.
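  • A sketch of the aggregation and confidence computation described for FIG. 12 , treating nodes as opaque identifiers; the representation is illustrative:

        #include <map>
        #include <set>
        #include <string>
        #include <vector>

        // One reconstruction: the set of node identifiers seen in each region.
        struct Regions { std::set<std::string> prefix, body, suffix; };

        // Aggregate reconstruction: for each region, a node's confidence is the
        // fraction of executions in which that node appeared in that region.
        struct Aggregate { std::map<std::string, double> prefix, body, suffix; };

        Aggregate aggregate(const std::vector<Regions>& runs) {
            Aggregate agg;
            for (const Regions& r : runs) {
                for (const std::string& n : r.prefix) agg.prefix[n] += 1.0;
                for (const std::string& n : r.body)   agg.body[n]   += 1.0;
                for (const std::string& n : r.suffix) agg.suffix[n] += 1.0;
            }
            const double total = static_cast<double>(runs.size());
            for (auto* region : {&agg.prefix, &agg.body, &agg.suffix})
                for (auto& kv : *region) kv.second /= total;   // confidence in [0, 1]
            return agg;
        }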
  • FIG. 13 illustrates one embodiment of a method 1300 of finding possible causes of concurrency errors using context-aware communication graphs.
  • the illustrated method 1300 includes several procedures that are illustrated and described in further detail below.
  • a computing device is configured to collect context-aware communication graph information.
  • the computing device may be a software-instrumented computing device 400 , a hardware-instrumented computing device 700 , or any other suitable computing device configured for collecting context-aware communication graph information, and may be configured as described above.
  • a procedure is performed wherein the computing device collects context-aware communication graphs for a set of failed executions and a set of correct executions.
  • FIG. 14 illustrates one embodiment of a procedure 1400 executed at block 1304 of FIG. 13 for collecting context-aware communication graphs for a set of failed executions and a set of correct executions according to various aspects of the present disclosure.
  • the procedure 1400 proceeds to block 1402 , where a test case experiencing intermittent failures is identified.
  • a software developer may receive reports from users or other testers indicating a particular crash, exception, or other error occurs intermittently during a particular usage scenario. The software developer may then determine one or more reproduction steps to create a test case usable to attempt to recreate the reported error.
  • the software developer may execute generic functionality test cases, such as unit tests, load tests, or performance tests, in an attempt to reproduce the error.
  • the test case or generic functionality test cases may be executed by an automated testing framework, or may be executed by a test user performing a set of reproduction steps.
  • the procedure 1400 then proceeds to a for loop between a start block 1404 and an end block 1410 , wherein the test case is executed and a test case result is determined.
  • the for loop between blocks 1404 and 1410 is executed a predetermined number of times. In other embodiments, the for loop between blocks 1404 and 1410 may be executed until a predetermined number of failed test case results are collected, and/or any other suitable number of times.
  • the procedure 1400 proceeds to block 1406 , where the computing device collects and stores a communication graph during execution of the test case.
  • the computing device may collect and store the communication graph via a suitable technique as described above.
  • the computing device associates the communication graph with a test case result.
  • an automated testing framework may store a failed test case result with the communication graph upon detecting that an error occurred or an expected result was not obtained, and may store a correct test case result with the communication graph upon detecting that an expected result was obtained without any errors.
  • a test user may analyze the results of the test case, and may indicate whether a correct test case result or a failed test case result should be stored with the communication graph.
  • the procedure 1400 proceeds to the for loop end block 1410 and determines whether the for loop should be executed again. If so, the procedure 1400 returns to the for loop start block 1404 . If not, the procedure 1400 proceeds to block 1412 , where the computing device creates a set of failed communication graphs based on the communication graphs having failed test case results. At block 1414 , the computing device creates a set of correct communication graphs based on the communication graphs having correct test case results. In some embodiments, the computing device may store the set of failed communication graphs and the set of correct communication graphs in the communication graph data store 762 or 462 , while in other embodiments, the computing device may store the set of failed communication graphs and the set of correct communication graphs in a separate data store for future processing. The procedure 1400 then proceeds to an end block and terminates.
  • the method 1300 proceeds from block 1304 to block 1306 , where a procedure is performed wherein a graph analysis engine, such as graph analysis engine 456 or 756 , selects a set of edges correlated with failed behavior based on a failed frequency ratio calculated for each edge.
  • FIG. 15 illustrates one embodiment of a procedure 1500 executed at block 1306 of FIG. 13 for selecting a set of edges correlated with failed behavior according to various aspects of the present disclosure.
  • the procedure 1500 selects edges from the communication graphs that appear more often in failed communication graphs than in correct communication graphs. From a start block, the procedure 1500 proceeds to block 1502 , where the graph analysis engine determines a failed execution fraction for the edges of the communication graphs based on a number of occurrences of the edges in the set of failed communication graphs and a total number of failed executions.
  • the failed execution fraction for a given edge may be expressed by the following equation, wherein Frac_f is the failed execution fraction for the edge, EdgeFreq_f is the number of failed communication graphs in which the edge appears, and #Runs_f is the total number of failed communication graphs:
  • $\mathrm{Frac}_f = \dfrac{\mathrm{EdgeFreq}_f}{\#\mathrm{Runs}_f} \qquad (1)$
  • the graph analysis engine determines a correct execution fraction for the edges of the communication graphs based on a number of occurrences of the edges in the set of correct communication graphs and a total number of correct communication graphs.
  • the correct execution fraction for a given edge may be expressed by the following equation, wherein Frac_c is the correct execution fraction for the edge, EdgeFreq_c is the number of correct communication graphs in which the edge appears, and #Runs_c is the total number of correct communication graphs:
  • $\mathrm{Frac}_c = \dfrac{\mathrm{EdgeFreq}_c}{\#\mathrm{Runs}_c} \qquad (2)$
  • the graph analysis engine determines a failed frequency ratio for the edges of the communication graphs based on the failed execution fraction and the correct execution fraction.
  • the failed frequency ratio for a given edge may be expressed by the following equation, wherein F is the failed frequency ratio:
  • $F = \dfrac{\mathrm{Frac}_f}{\mathrm{Frac}_c} \qquad (3)$
  • edges having a Frac_c of zero may be particularly likely to be associated with failures, but would cause Equation (3) above to be undefined.
  • the Frac_c value may be replaced by a value that yields a large value for F.
  • a Frac_c of zero may be replaced by the following value:
  • the procedure 1500 then proceeds to block 1508 , where the graph analysis engine selects a set of edges for further analysis based on the failed frequency ratios. In some embodiments, the graph analysis engine may select a predetermined number of edges having the highest failed frequency ratios. In some embodiments, the graph analysis engine may select edges having a failed frequency ratio greater than a threshold value. The procedure 1500 then proceeds to an end block and terminates.
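  • A sketch of the edge-ranking computation of blocks 1502 - 1506 ; because the substitute value for a zero Frac_c is not reproduced above, a small placeholder denominator is used here and should not be read as the patent's actual value:

        #include <cstddef>
        #include <map>
        #include <set>
        #include <string>
        #include <vector>

        using EdgeId = std::string;          // opaque edge identifier
        using Graph  = std::set<EdgeId>;     // edges present in one execution

        // Count how many graphs in a set contain each edge.
        std::map<EdgeId, size_t> edgeFrequency(const std::vector<Graph>& runs) {
            std::map<EdgeId, size_t> freq;
            for (const Graph& g : runs)
                for (const EdgeId& e : g) freq[e]++;
            return freq;
        }

        // Failed frequency ratio F = Frac_f / Frac_c for every edge seen in a failed run.
        std::map<EdgeId, double> failedFrequencyRatio(const std::vector<Graph>& failed,
                                                      const std::vector<Graph>& correct) {
            auto freqF = edgeFrequency(failed);
            auto freqC = edgeFrequency(correct);
            std::map<EdgeId, double> F;
            for (const auto& kv : freqF) {
                double fracF = double(kv.second) / double(failed.size());
                double fracC = double(freqC[kv.first]) / double(correct.size());
                if (fracC == 0.0)
                    fracC = 1.0 / (10.0 * double(correct.size()));  // placeholder substitute only
                F[kv.first] = fracF / fracC;
            }
            return F;
        }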
  • FIG. 16 illustrates one embodiment of a procedure 1600 executed at block 1308 of FIG. 13 for determining an aggregate reconstruction for each selected edge according to various aspects of the present disclosure. From a start block, the procedure 1600 proceeds to a for loop between a for loop start block 1602 and a for loop end block 1620 , wherein the for loop executes once for each selected edge to create an aggregate reconstruction for each selected edge.
  • the procedure 1600 proceeds to another for loop between a for loop start block 1604 and a for loop end block 1608 , wherein the for loop executes once for each failed communication graph containing the selected edge to create reconstructions for the selected edge for each failed communication graph.
  • the procedure 1600 proceeds to block 1606 , where the graph analysis engine creates a failed reconstruction based on timestamps of the source node and the sink node of the selected edge in the failed communication graph, as well as timestamps of neighboring nodes in the failed communication graph.
  • As discussed above, the failed reconstruction may be built by selecting nodes having timestamps between the timestamp of the source node and sink node of the edge, a predetermined number of nodes having timestamps before the timestamp of the source node, and a predetermined number of nodes having timestamps after the timestamp of the sink node.
  • the procedure 1600 then proceeds to the for loop end block 1608 and determines whether the for loop should be executed again. If so, the procedure 1600 returns to the for loop start block 1604 and calculates a failed reconstruction for another failed communication graph. If not, the procedure 1600 proceeds to block 1610 , where the graph analysis engine creates an aggregate failed reconstruction for the selected edge based on frequencies of nodes in the prefix, body, and suffix of the created failed reconstructions. In some embodiments, the aggregate failed reconstruction for the selected edge may be built using a method similar to the construction of the aggregate reconstruction illustrated and described in FIG. 12 .
  • the procedure 1600 then proceeds to the for loop end block 1620 and determines whether the for loop should be executed again. If so, the procedure 1600 returns to the for loop start block 1602 and calculates an aggregate reconstruction for the next selected edge. If not, the procedure 1600 proceeds to an end block and terminates.
  • a reconstruction consistency represents a combined confidence value over all nodes in an aggregate reconstruction.
  • nodes having high confidence values occur consistently in the same region of the reconstructions, and are therefore likely to be related to the failed behavior.
  • reconstructions containing many high confidence nodes may reflect a correlation between the co-occurrence of the instructions contained in the nodes in the order shown by the reconstruction and the occurrence of failures.
  • a reconstruction consistency may be determined by combining total average confidence values for the nodes in each reconstruction region. For example, a reconstruction consistency R for a reconstruction having a prefix region P, a body B, and a suffix S, may be represented by the following equation, wherein V(n,r) is the confidence value of node n in region r.
  • a procedure is performed wherein the graph analysis engine determines a difference in interleaving around the edge in failed communication graphs versus correct communication graphs.
  • the difference in interleaving may be represented by a context variation ratio, which is based on a comparison of a number of contexts in which either the source instruction or the sink instruction communicate in failed communication graphs versus correct communication graphs. Large differences between the number of contexts in correct communication graphs compared to failed communication graphs may be correlated with failures.
  • FIG. 17 illustrates one embodiment of a procedure 1700 executed at block 1312 of FIG. 13 for determining a context variation ratio for each aggregate reconstruction.
  • the procedure 1700 proceeds to block 1702 , where the graph analysis engine determines a source instruction and a sink instruction associated with the edge used to create the aggregate reconstruction.
  • the graph analysis engine determines a number of failed source contexts based on a number of nodes in the failed communication graphs that include the source instruction.
  • the failed source contexts may include contexts from any node wherein the source instruction appears, whether the node is a source node or a sink node.
  • the procedure 1700 proceeds to block 1706 , where the graph analysis engine determines a number of failed sink contexts based on a number of nodes in the failed communication graphs that include the sink instruction.
  • the failed sink contexts may include contexts from any node wherein the sink instruction appears.
  • the graph analysis engine adds the number of failed source contexts and the number of failed sink contexts to obtain a number of failed contexts.
  • the number of failed contexts represents a count of the contexts in which either the source instruction or the sink instruction communicates as represented by the failed communication graphs.
  • the procedure 1700 proceeds to block 1710 , where the graph analysis engine determines a number of correct source contexts based on a number of nodes in the correct communication graphs that include the source instruction.
  • the graph analysis engine determines a number of correct sink contexts based on a number of nodes in the correct communication graphs that include the sink instruction.
  • the correct source contexts and correct sink contexts may include contexts from any node in which the source instruction or the sink instruction, respectively, appears, whether that node is a source node or a sink node.
  • the procedure 1700 proceeds to block 1714 , where the graph analysis engine adds the number of correct source contexts and the number of correct sink contexts to obtain a number of correct contexts.
  • the graph analysis engine determines a context variation ratio based on the number of failed contexts and the number of correct contexts.
  • the procedure 1700 then proceeds to an end block and terminates.
  • the context variation ratio C may be represented by the following equation, wherein #Ctx f is the number of failed contexts and #Ctx c is the number of correct contexts.
  • the method 1300 proceeds to block 1314 , where the graph analysis engine ranks each aggregate reconstruction based on one or more of the reconstruction consistency, the context variation ratio, and the failed frequency ratio.
  • the reconstruction consistency, the context variation ratio, and the failed frequency ratio may each be useful individually for ranking aggregate reconstructions to find those that accurately represent failed executions.
  • two or more of the reconstruction consistency, the context variation ratio, and the failed frequency ratio may be combined to rank each aggregate reconstruction to allow the strengths of each score to complement each other.
  • the reconstruction consistency, the context variation ratio, and the failed frequency ratio may be multiplied together to produce a score for ranking each aggregate reconstruction; one possible form of this combined scoring is sketched in the code example following this list.
  • the graph analysis engine presents one or more highly ranked aggregate reconstructions for debugging.
  • the top ranked aggregate reconstructions are likely to accurately represent failed executions, and so a developer presented with them should be able to diagnose the error more readily.
  • the method 1300 then proceeds to an end block and terminates.
  • FIG. 18 illustrates one embodiment of a method 1800 of detecting possible causes of concurrency errors using such unlabeled executions. From a start block, the method 1800 proceeds to block 1802 , where a computing device for collecting context-aware communication graph information is configured. As discussed above, the computing device may be a hardware-instrumented computing device 700 , a software-instrumented computing device 400 , or any other suitably configured computing device.
  • the computing device collects context-aware communication graphs for a set of executions. Unlike the method 1300 discussed above, the executions or communication graphs are not labeled as correct or failed.
  • for each instruction in the collected communication graphs, a graph analysis engine calculates an instruction rank that reflects the rarity of the contexts in which that instruction executed.
  • the instruction rank for each instruction may be represented by the following equation, wherein X_i is the set of contexts in which the instruction executed, F_{i,x} is a number of runs in which the instruction i executed in context x, and F_{i,*} is a total number of times the instruction i executed regardless of context across all runs.
  • rank_i = Σ_{x ∈ X_i} ( F_{i,x} / F_{i,*} )    (7)
  • the equation functions to rank instructions that were executed in rare contexts higher, reflecting their increased likelihood of being associated with failed behavior; one possible reading of this ranking is sketched in code following this list.
  • the graph analysis engine ranks the instructions based on the associated instruction ranks to identify one or more instructions to present for debugging.
  • reconstructions and/or aggregate reconstructions may be built as described above based on the highly ranked instruction and/or one or more edges associated with the highly ranked instruction to make debugging easier.
  • the method 1800 then proceeds to an end block and terminates.
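
The scoring just described can be summarized in a minimal Python sketch. It is only one plausible reading of the text: the failed frequency ratio follows Equation 3 above, with an assumed small substitute denominator when Frac_c is zero; the equations for the reconstruction consistency and the context variation ratio are not reproduced in this text, so the forms below (a sum of per-region average confidence values, and a simple ratio of failed to correct context counts) are assumptions, as are all function names and example numbers.

```python
# Illustrative sketch only: the exact scoring equations are not all
# reproduced in the text above, so the formulas below are assumptions.

def failed_frequency_ratio(frac_f, frac_c, zero_substitute=1e-6):
    """F = Frac_f / Frac_c; a Frac_c of zero is replaced by a small
    substitute value so that F becomes large (substitute value assumed)."""
    return frac_f / (frac_c if frac_c > 0 else zero_substitute)

def reconstruction_consistency(prefix, body, suffix):
    """R combines the confidence values V(n, r) of the nodes in each region;
    here the per-region averages are summed (the combination is assumed)."""
    def region_average(region):
        return sum(region.values()) / len(region) if region else 0.0
    return region_average(prefix) + region_average(body) + region_average(suffix)

def context_variation_ratio(num_failed_contexts, num_correct_contexts):
    """C compares the number of contexts in which the source or sink
    instruction communicates in failed versus correct graphs; a simple
    ratio is assumed here."""
    return num_failed_contexts / max(num_correct_contexts, 1)

# Example: an edge present in 90% of failed runs and 5% of correct runs,
# with moderately consistent reconstructions and twice as many failed
# contexts as correct contexts (all numbers hypothetical).
F = failed_frequency_ratio(0.90, 0.05)
R = reconstruction_consistency(prefix={"S": 0.5, "T": 0.75},
                               body={"U": 1.0, "Y": 1.0, "Z": 1.0},
                               suffix={"W": 0.5})
C = context_variation_ratio(num_failed_contexts=8, num_correct_contexts=4)
print(F * R * C)  # the three scores may be multiplied into one ranking score
```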

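The instruction ranking for unlabeled executions can be sketched in the same spirit. The summation in Equation (7) is itself a reconstruction of a garbled formula, so the combining operator, the data layout, and all names in the sketch below are assumptions rather than the patent's definitions.

```python
from collections import defaultdict

def instruction_ranks(runs):
    """Hedged sketch of Equation (7): rank_i sums F_{i,x} / F_{i,*} over the
    contexts x in X_i, where F_{i,x} counts runs in which instruction i
    executed in context x and F_{i,*} counts all executions of i across runs.
    `runs` is a list of executions; each execution maps an instruction name
    to the list of (integer-encoded) contexts in which it executed."""
    runs_with_context = defaultdict(lambda: defaultdict(int))  # F_{i,x}
    total_executions = defaultdict(int)                        # F_{i,*}
    for run in runs:
        for instr, contexts in run.items():
            total_executions[instr] += len(contexts)
            for ctx in set(contexts):
                runs_with_context[instr][ctx] += 1
    return {instr: sum(n / total_executions[instr]
                       for n in runs_with_context[instr].values())
            for instr in total_executions}

# Hypothetical example: "G" always communicates in the same context, while
# "I" communicates in varied, rarely repeated contexts and ranks higher.
runs = [
    {"G": [7, 7, 7, 7], "I": [5, 9]},
    {"G": [7, 7, 7, 7], "I": [5, 12]},
    {"G": [7, 7, 7, 7], "I": [3, 9]},
]
ranks = instruction_ranks(runs)
for instr in sorted(ranks, key=ranks.get, reverse=True):
    print(instr, round(ranks[instr], 3))  # I 1.0, G 0.25
```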
Abstract

Systems and methods for detecting concurrency bugs are provided. In some embodiments, context-aware communication graphs that represent inter-thread communication are collected during test runs, and may be labeled according to whether the test run was correct or failed. Graph edges that are likely to be associated with failed behavior are determined, and probable reconstructions of failed behavior are constructed to assist in debugging. In some embodiments, software instrumentation is used to collect the communication graphs. In some embodiments, hardware configured to collect the communication graphs is provided.

Description

CROSS-REFERENCE TO RELATED APPLICATION
This application claims the benefit of U.S. Provisional Application No. 61/420,185, filed Dec. 6, 2010, which is incorporated herein by reference in its entirety for all purposes.
STATEMENT OF GOVERNMENT LICENSE RIGHTS
This invention was made with government support under CNS-0720593 and CCF-0930512, awarded by the National Science Foundation. The government has certain rights in the invention.
BACKGROUND
Concurrency errors are difficult problems for developers writing multi-threaded applications to solve. Even expert programmers have difficulty predicting complicated behaviors resulting from the unexpected interaction of operations in different threads. Three exemplary types of concurrency errors are data races, atomicity violations, and ordering violations. Data races occur when two or more memory operations in different threads, at least one of which is a write, access the same memory location and are not properly synchronized. Atomicity violations happen when memory operations assumed to be executed atomically are not enclosed inside a single critical section. Ordering violations happen when memory accesses in different threads happen in an unexpected order. Some particularly difficult concurrency errors to resolve involve multiple variables. Though some efforts have been made to individually detect data races, locking discipline violations, and atomicity violations, what is needed are automated systems and methods for finding general concurrency errors, including multivariable errors and ordering violations.
SUMMARY
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In some embodiments, a nontransitory computer-readable medium having computer-executable instructions stored thereon is provided. If executed by one or more processors of a computing device, the instructions cause the computing device to perform actions to analyze a set of context-aware communication graphs for debugging. The actions comprise creating a set of aggregate reconstructions based on edges of the set of communication graphs, ranking the aggregate reconstructions in order of likelihood of being associated with a failed execution, and presenting one or more highly ranked aggregate reconstructions.
In some embodiments, a computer-implemented method of building a context-aware communication graph is provided. The method comprises detecting an access of a memory location by a first instruction of a first thread; updating a context associated with the first thread; and, in response to determining that a second instruction of a second thread different from the first thread was the last to write to the memory location, adding an edge to the context-aware communication graph, the edge including the context associated with the first thread, a sink identifying the first instruction, a source identifying the second instruction, and a context associated with the second thread.
In some embodiments, a computing device for detecting concurrency bugs is provided. The device comprises at least two processing cores, at least two cache memories, a coherence interconnect, and a communication graph data store. Each cache memory is associated with at least one processing core, and is associated with coherence logic. The coherence interconnect is communicatively coupled to each of the cache memories. The coherence logic is configured to add edges to a communication graph stored in the communication graph data store based on coherence messages transmitted on the coherence interconnect.
DESCRIPTION OF THE DRAWINGS
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
FIG. 1A illustrates an exemplary pseudocode listing that may exhibit concurrency errors if executed by multiple concurrent threads;
FIG. 1B illustrates memory accesses and other operations during an exemplary multiple-threaded execution of the pseudocode listing of FIG. 1A;
FIG. 2 illustrates an exemplary context-aware communication graph according to various aspects of the present disclosure;
FIG. 3 is a block diagram that illustrates an exemplary computing device suitable for being updated to collect and analyze communication graphs according to various aspects of the present disclosure;
FIG. 4 is a block diagram that illustrates one embodiment of a software-instrumented computing device according to various aspects of the present disclosure;
FIG. 5 is a table that illustrates abbreviations introduced for the ease of discussion;
FIGS. 6A-6C are tables that illustrate an exemplary embodiment of data stored within a memory location metadata data store according to various aspects of the present disclosure;
FIG. 7 is a block diagram that illustrates one embodiment of a hardware-instrumented computing device according to various aspects of the present disclosure;
FIG. 8 is a state diagram that illustrates state changes in an MESI coherence protocol suitable for use with embodiments of the present disclosure;
FIGS. 9A-9D illustrate various embodiments of metadata stored in association with cache lines according to various aspects of the present disclosure;
FIG. 10 is a table that illustrates an exemplary communication graph suitable for storage in a communication graph data store according to various aspects of the present disclosure;
FIG. 11 illustrates an exemplary reconstruction constructed from the communication graph of FIG. 10;
FIG. 12 illustrates the creation of an aggregate reconstruction according to various aspects of the present disclosure;
FIG. 13 illustrates one embodiment of a method of finding possible causes of concurrency errors using context-aware communication graphs according to various aspects of the present disclosure;
FIG. 14 illustrates one embodiment of a procedure for collecting context-aware communication graphs for a set of failed executions and a set of correct executions according to various aspects of the present disclosure;
FIG. 15 illustrates one embodiment of a procedure for selecting a set of edges correlated with failed behavior according to various aspects of the present disclosure;
FIG. 16 illustrates one embodiment of a procedure for determining an aggregate reconstruction for each selected edge according to various aspects of the present disclosure;
FIG. 17 illustrates one embodiment of a procedure for determining a context variation ratio for each aggregate reconstruction according to various aspects of the present disclosure; and
FIG. 18 illustrates one embodiment of a method of detecting possible causes of concurrency errors using unlabeled executions according to various aspects of the present disclosure.
DETAILED DESCRIPTION
FIG. 1A illustrates an exemplary pseudocode listing that may exhibit concurrency errors if executed by multiple concurrent threads. The figure includes a set of instruction numbers 102 and a C++-like pseudocode listing 104. The instruction numbers 102 have been provided as letters for sake of discussion in order to disambiguate from numbers used later to indicate timestamps. Certain details have been elided from the pseudocode, such as the details of Instruction A and Instruction H, and the details of the Add( ) function. It may be assumed that the details of Instruction A and Instruction H do not have any effect on inter-thread communication, and that the Add( ) function includes a single memory write operation to the memory location referred to by the “items” variable. Further, it may be assumed for sake of discussion that each line of pseudocode involves at most a single instruction that affects a memory location, though in some embodiments of actual programming languages, many instructions that affect many memory locations may reside in a single line of code.
Though it may be difficult to find through a mere inspection of the code listing 104, the Spider class includes a concurrency error. Specifically, there is an implicit assumption that Instruction K and Instruction M are included in a single atomic operation. Since there is no protection mechanism in place, multiple threads concurrently executing this code may sometimes experience an attempt to access a null pointer in Instruction N.
FIG. 1B illustrates memory accesses and other operations during an exemplary multiple-threaded execution of the pseudocode listing of FIG. 1A. Three threads—thread one 110, thread two 112, and thread three 114—are used to execute the pseudocode listing 104. The parenthesized letters correspond to the set of instruction numbers 102 in FIG. 1A, and the sequence of execution proceeds from the top of the illustration to the bottom of the illustration. The list of numbers 115 illustrates an exemplary timestamp for the execution of each instruction. Low integers are used for timestamps for ease of discussion only, and in other embodiments, other types of data may be used for timestamp values. For example, in some embodiments, a return value of an RDTSC x86 instruction, a system time value, and/or the like may be used as the timestamp. As with FIG. 1A, certain details that do not have an effect on the concurrency error or the memory accesses have been elided from the figure for ease of discussion.
Thread one 110 begins by executing Instruction A and Instruction B to initialize the “items” variable and to set the “qsize” variable to “0.” Next, thread one 110 executes Instruction C to add the value “i” to the “items” variable, and executes Instruction D to increment the value of the “qsize” variable from “0” to “1.” Thread two 112 enters the “while” loop at Instruction J, and executes the check at Instruction K to determine whether the size of the Queue object is “0.” At Instruction I, thread two 112 accesses the “qsize” variable, which was last incremented to “1” by thread one 110. Thread two 112 will then proceed to Instruction M, because the value retrieved from the “qsize” variable was not “0.”
Next, thread three 114 proceeds to begin to dequeue the single item from the Queue object. At Instruction I, thread three 114 reads the “qsize” variable, and determines that it may proceed to dequeue an object. Assuming the execution of thread three 114 next proceeds to Instruction G, thread three 114 writes to the “qsize” variable, decrementing it to “0.”
Next, execution returns to thread two 112. At Instruction M, thread two 112 calls the Dequeue( ) function, which proceeds to Instruction E. At Instruction E, thread two 112 accesses the “qsize” variable, and determines that it is now “0” (as updated by thread three 114). At Instruction F, the Dequeue( ) function returns “null” in response to the value of the “qsize” variable, and so the value of “item” in Instruction M is set to “null.” At Instruction N, thread two 112 attempts to call the function GetD( ) on a pointer set to “null,” which causes an exception, a system crash, or some other undefined failure depending on the operating environment.
Communication Graphs
A communication graph may be used to represent communication between threads in a multi-threaded environment. In some embodiments, a communication graph includes one or more edges that represent communication events. Each edge includes a source node and a sink (or destination) node. The source node of an edge represents a write instruction. The sink node of an edge represents a read instruction or a write instruction that accessed the memory location written by the write instruction of the source node. In some embodiments, the communication graph may also include a source node for uninitialized states, thus allowing edges to be created when otherwise uninitialized memory locations are first accessed.
Communication graphs may be context-oblivious or context-aware. In a context-oblivious communication graph, concurrency errors may lead to edges that are only present in graphs of buggy executions, and so may be useful for detecting some concurrency errors. However, if a given edge may be present in both failed executions and correct executions, such as in an interleaving error affecting multiple variables, a context-oblivious communication graph may not include enough information to detect the error.
In a context-aware communication graph, each edge may include information representing a relative order of communication events. One example of a context-aware communication graph is illustrated in FIG. 2. The communication graph 200 illustrates communication events that occur during the pseudocode execution illustrated in FIG. 1B, using instruction numbers and code fragments from the code listing in FIG. 1A. The communication graph 200 includes a set of nodes and a set of edges. Each node includes an associated instruction address (illustrated in the top half of each node) and a context (illustrated in the bottom half of each node). Each node is unique, in that no two nodes will represent the same instruction address and context. Each edge is labeled in the figure by an edge number for ease of discussion only, and extends from a source node to a sink node. Each node in the communication graph 200 may be a sink node or a source node for any number of edges. In some embodiments, some nodes stored in the communication graph 200 may not be associated with any edges, such as, for example, when multiple consecutive memory accesses occur within a single thread.
For ease of discussion, the description herein only analyzes the memory locations denoted by the variables “qsize” and “items,” so that each line of pseudocode may be considered to include a single instruction that affects a single memory location. Also, the description treats the variable “items” and the Add( ) function that affects it as affecting a single memory location. One of ordinary skill in the art will understand that, in some embodiments, context-aware communication graphs may describe every memory access separately, including multiple memory accesses for a single line of code.
The context stored in each node represents a relative order of communication events, and may use any suitable representation for that information. In some embodiments, context information may include information uniquely identifying every dynamic memory operation. However, since the size of such a graph would continue to grow over time, it may be desirable to store a smaller set of context information that nonetheless represents sufficient detail to allow for the detection of concurrency bugs.
In some embodiments, the context information may include a sequence of communication events observed by a thread immediately prior to the execution of a memory instruction regardless of the memory location involved. The communication events may be stored in a FIFO queue of a predetermined length, such that once the queue is full, an oldest entry is discarded before adding a new entry. In some embodiments, the predetermined length of the FIFO queue may be any length, such as five elements, more than five elements, or less than five elements. In the embodiment illustrated in FIG. 2, the predetermined length of the context FIFO queue is five elements.
In some embodiments, four types of communication events may be observed by a local thread. A local read (“LocRd”) is a read of a memory location last written by a remote thread. A local write (“LocWr”) is a write to a memory location last written by a remote thread. A remote read (“RemRd”) is a read of a memory location by a remote thread that was last written by the local thread. A remote write (“RemWr”) is a write to a memory location by a remote thread that was last written by the local thread. The type of event is what is stored in the context FIFO, without the memory location associated with the event.
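As a minimal sketch of this bookkeeping, the following Python code maintains a bounded per-thread context FIFO and classifies accesses into the four event types using the literal definitions above. The worked example of FIG. 2 records events somewhat more broadly, so this is only an approximation, and all names are illustrative.

```python
from collections import deque

# The four communication event types described above.
LOC_RD, LOC_WR, REM_RD, REM_WR = "LocRd", "LocWr", "RemRd", "RemWr"

class ThreadContext:
    """Per-thread FIFO of recent communication events.  Only the event type
    is stored, not the memory location involved."""
    def __init__(self, length=5):           # predetermined length of five
        self.events = deque(maxlen=length)  # oldest entry dropped when full
    def observe(self, event_type):
        if event_type is not None:
            self.events.append(event_type)
    def snapshot(self):
        return tuple(reversed(self.events))  # newest first, as in FIG. 2

def classify(is_write, accessing_thread, last_writer_thread, local_thread):
    """Classify an access from local_thread's point of view, using the
    literal definitions above (local events touch remotely written data,
    remote events touch locally written data)."""
    if accessing_thread == local_thread and last_writer_thread != local_thread:
        return LOC_WR if is_write else LOC_RD
    if accessing_thread != local_thread and last_writer_thread == local_thread:
        return REM_WR if is_write else REM_RD
    return None

# Example: thread T1 writes a location last written by T2, as observed by T1.
ctx = ThreadContext()
ctx.observe(classify(True, "T1", "T2", "T1"))
print(ctx.snapshot())  # ('LocWr',)
```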
In FIG. 2, nine nodes corresponding to the execution trace of FIG. 1B are illustrated. An uninitialized state node 202 is included in the graph 200 before execution begins to serve as a source node for edges that represent initial accesses to memory locations. A first node 204 refers to the first memory access in the execution trace, where thread one 110 executes Instruction A to initialize the “items” memory location. The first node 204 stores the instruction location (Instruction A) and a context, which is currently empty because there were no previous memory accesses. An edge (“Edge 1”) is created between the uninitialized state node 202 and the first node 204.
A second node 206 refers to the second memory access in the execution trace, where thread one 110 executes Instruction B to initialize the “qsize” memory location. The second node 206 stores the instruction location (Instruction B) and a context, which currently contains a single element, “LocWr,” representing the local write to the “items” memory location at Instruction A. An edge (“Edge 2”) is created between the uninitialized state node 202 and the second node 206.
Two more nodes, a third node 208 and a fourth node 210, are added when thread one 110 executes Instruction C and Instruction D to update the “items” memory location and the “qsize” memory location, respectively. The context for the third node 208 is “LocWr, LocWr,” as the memory writes in Instruction A and Instruction B caused two LocWr states to be pushed onto the context FIFO queue for thread one 110, and the context for the fourth node 210 is “LocWr, LocWr, LocWr,” as the memory write in Instruction C caused another LocWr state to be pushed onto the context FIFO queue for thread one 110. No edges are created with the third node 208 or the fourth node 210 as a sink, because the last thread to write to the memory location in each case was the local thread, so there was no thread-to-thread communication.
A fifth node 212 is created when thread two 112 reads the “qsize” memory location at Instruction I. The context for thread two 112 contains “RemWr, RemWr, RemWr, RemWr,” representing the four remote write operations performed by thread one 110. An edge (“Edge 3”) is created having the fourth node 210 as the source node and the fifth node 212 as the sink node, because the fourth node 210 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread two 112, the thread currently accessing the “qsize” memory location.
A sixth node 214 is created when thread three 114 reads the “qsize” memory location at Instruction I. A remote read event was pushed onto the context FIFO for thread three 114 when thread two 112 read the “qsize” memory location, and so the context stored for the sixth node 214 is “RemRd, RemWr, RemWr, RemWr, RemWr.” An edge (“Edge 4”) is created having the fourth node 210 as the source node and the sixth node 214 as the sink node, because the fourth node 210 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread three 114, the thread currently accessing the “qsize” memory location. One should note that, in a context-oblivious communication graph, the interleaving between thread one 110 and thread two 112 and between thread one 110 and thread three 114 would be lost, because both memory reads would be represented by a single edge and would not be distinguishable by context.
A seventh node 216 is created when thread three 114 writes to the “qsize” memory location at Instruction G. A local read event was pushed onto the context FIFO for thread three 114 when it read the “qsize” memory location. The oldest element in the context FIFO, the remote write event added when thread one 110 executed Instruction A, was dropped from the context FIFO because the context FIFO was full before the local read event was pushed onto the context FIFO. Hence, the context stored for the seventh node 216 is “LocRd, RemRd, RemWr, RemWr, RemWr.” An edge (“Edge 5”) is created having the fourth node 210 as the source node and the seventh node 216 as the sink node, because the fourth node 210 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread three 114, the thread currently accessing the “qsize” memory location.
An eighth node 218 is created when thread two 112 reads from the “qsize” memory location at Instruction E. A remote read event was pushed onto the context FIFO for thread two 112 when thread three 114 read the “qsize” memory location, and a remote write event was pushed onto the context FIFO for thread two when thread three 114 wrote to the “qsize” memory location. The two oldest elements were removed from the full context FIFO, and so the context stored in the eighth node 218 is “RemWr, RemRd, LocRd, RemWr, RemWr.” An edge (“Edge 6”) is created having the seventh node 216 as the source node and the eighth node 218 as the sink node, because the seventh node 216 represents the last write operation to the “qsize” memory location, and because the last thread to write to the “qsize” memory location was not thread two 112, the thread currently accessing the “qsize” memory location. Edge 6 is illustrated as a dashed line, because it is this inter-thread communication that occurs in failed executions. Systems and methods for determining that Edge 6 is identified as being associated with a concurrency error are discussed in further detail below.
Collecting Communication Graphs
FIG. 3 is a block diagram that illustrates an exemplary computing device suitable for being updated to collect and analyze communication graphs according to various aspects of the present disclosure. The computing device 300 illustrated in FIG. 3 is not configured to collect or analyze communication graphs, but is instead included herein for the sake of further discussion below concerning how to configure a computing device 300 for collecting and/or analyzing communication graphs. The computing device 300 includes main memory 302, a coherence interconnect 304, a set of cache memories 312, 316, 320, and a set of processor cores 306, 308, 310. Each processor core 306, 308, 310 is associated with one of the cache memories 312, 316, 320.
One of ordinary skill in the art will recognize that, in general, to access data from a memory location in main memory 302, a processor core checks if a valid copy of the data from the memory location is present in its associated cache. If so, the processor core uses the cached copy of the data. If not, the coherence interconnect 304 obtains data from the memory location either from another cache which has a valid copy of the data or from main memory 302. In some embodiments, the coherence interconnect 304 may be a coherence bus, a scalable coherence interface, or any other suitable coherence interconnect technology. In some embodiments, the main memory 302 may be any suitable computer-readable medium, such as SRAM, DRAM, flash memory, a magnetic storage medium, and/or the like. In some embodiments, each of the cache memories 312, 316, 320 includes coherence logic 314, 318, 322 that interacts with the coherence interconnect 304 to synchronize the contents of the cache memories.
One of ordinary skill in the art will recognize that each processor core 306, 308, 310 may be located in a separate physical processor, or may be separate processing cores in a single physical processor. Further, one of ordinary skill in the art will also recognize that three processor cores and three cache memories have been illustrated herein for ease of discussion, and that in some embodiments, more or fewer processor cores, and/or more or fewer cache memories, may be used. In addition, in some embodiments, additional levels of cache memory between the illustrated cache and the main memory, or between the illustrated cache and the associated processor core, may be used, multiple processor cores may be associated with a single cache memory, and/or multiple cache memories may be associated with a single processor core. In some embodiments, the computing device 300 may be a desktop computer, a laptop computer, a tablet computing device, a mobile computing device, a server computer, and/or any other suitable computing device having at least one processor that executes more than one thread.
Two ways of collecting context-aware communication graphs include adding software-based instrumentation that monitors memory accesses within the executable program to be studied, and adding hardware-based features that monitor memory accesses within an uninstrumented executable program. FIG. 4 is a block diagram that illustrates one embodiment of a software-instrumented computing device 400 according to various aspects of the present disclosure. The software-instrumented computing device 400 is similar to the computing device 300 illustrated in FIG. 3, and includes three processor cores 406, 408, 410, three caches 412, 416, 420 that each include coherence logic 414, 418, 422, a coherence interconnect 404, and a main memory 402. However, the software-instrumented computing device 400 has been configured with one or more components 454 for collecting context-aware communication graphs.
In some embodiments, the components 454 include a graph analysis engine 456, a memory location metadata data store 458, a thread context data store 460, and a communication graph data store 462. The thread context data store 460 is configured to store a context FIFO queue for each thread executed by the computing device 400. The memory location metadata data store 458 is configured to store metadata for each memory location identifying at least an instruction and thread that last wrote to the memory location. The communication graph data store 462 is configured to store one or more communication graphs built using the information stored in the thread context data store 460 and the memory location metadata data store 458. The communication graph data store 462 may also store an indication of whether each communication graph is associated with correct behavior or failed behavior. The graph analysis engine 456 is configured to analyze a stored communication graph to find edges to be inspected for errors, as discussed further below.
In some embodiments, to analyze an executable program using the computing device 300, the executable program is instrumented to monitor memory accesses. For example, in some embodiments, a binary may be instrumented using the Pin dynamic instrumentation tool by Intel Corporation. As another example, in some embodiments, Java code may be instrumented using the RoadRunner dynamic analysis framework developed by Cormac Flanagan and Stephen N. Freund. The instrumentation tracks thread contexts and memory location metadata while the program is executing, and builds the communication graph for storage in the communication graph data store 462. After collection, the graph analysis engine 456 may be used to analyze the communication graphs.
As understood by one of ordinary skill in the art, a “data store” may include any suitable device configured to store data for access by a computing device. Each data store may include a relational database, a structured flat file, and/or any other suitable data storage format.
For example, in some embodiments, the memory location metadata data store 458 may include a fixed-size hash table. To find metadata associated with a particular memory location, the memory location address modulo the hash table size may be used as an index into the hash table. In such an embodiment, a lossy collision resolution policy in which an access may read or overwrite a colliding location's metadata may be tolerated without unduly sacrificing performance if the fixed size of the hash table is large enough, such as having at least 32 million entries. As another example, in some embodiments that use a language such as Java and/or the like, the memory location metadata data store 458 may use a shadow memory feature of an instrumentation utility such as RoadRunner and/or the like to implement a distributed metadata table. Unique identifiers of memory access instructions in the bytecode may be used instead of instruction addresses. Contexts may be stored as integers using bit fields.
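A minimal sketch of the fixed-size, lossy metadata table just described follows. The slot contents mirror the fields shown in FIGS. 6A-6C; the default size is kept small for illustration (the text suggests at least 32 million entries in practice), and all names are illustrative.

```python
class MemoryLocationMetadataTable:
    """Fixed-size metadata table indexed by address modulo the table size.
    Colliding addresses may read or overwrite each other's metadata, which
    the lossy collision-resolution policy described above tolerates."""

    def __init__(self, size=1 << 20):  # kept small here; the text suggests >= 32M entries
        self.size = size
        self.slots = [None] * size

    def lookup(self, address):
        # May return a colliding location's metadata under the lossy policy.
        return self.slots[address % self.size]

    def record_write(self, address, thread_id, instruction, context, timestamp):
        self.slots[address % self.size] = {
            "last_writer_thread": thread_id,
            "last_writer_instruction": instruction,
            "context": context,
            "timestamp": timestamp,
        }

# Example mirroring FIG. 6A: thread one writes "qsize" at Instruction B, time 2.
table = MemoryLocationMetadataTable()
table.record_write(0x2000, "thread one", "B", (1,), 2)
print(table.lookup(0x2000)["last_writer_instruction"])  # B
```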
As yet another example, in some embodiments, a communication graph data store 462 may include a chaining hash table. To access the chaining hash table, a hash function may separately sum the entries in the source node context and the sink node context. Each node's sum may then be XORed with the instruction address of the node. The hash key may then be generated by XORing the result of the computation for the source node with the result of the computation for the sink node. As still another example, in some embodiments, a communication graph data store 462 may include an adjacency list and may use hash sets. In such an embodiment, nodes may be indexed by instruction address/context pairs. In some embodiments, other methods or data structures may be used within the communication graph data store 462, the memory location metadata data store 458, or any other data store described herein.
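The hash-key computation for the chaining hash table might look like the following sketch, assuming integer-encoded contexts and integer instruction addresses; the example values are hypothetical.

```python
def node_hash(instruction_address, context):
    # Sum the context entries, then XOR the sum with the instruction address.
    return sum(context) ^ instruction_address

def edge_hash_key(src_instruction, src_context, sink_instruction, sink_context):
    # XOR the source-node result with the sink-node result to form the key.
    return node_hash(src_instruction, src_context) ^ node_hash(sink_instruction, sink_context)

# Hypothetical instruction addresses; contexts use the integer codes of FIG. 5.
print(hex(edge_hash_key(0x4004D0, (1, 1, 1), 0x400510, (3, 3, 3, 3))))
```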
Each data store may include one or more non-volatile computer-readable storage media, such as a magnetic drive, optical drive, flash drive, and/or the like, and/or may include one or more volatile computer-readable storage media, such as DRAM, SRAM, and/or the like. Each data store may be accessible locally by the computing device, or may be accessible over some type of network. One of ordinary skill in the art will recognize that separate data stores described herein may be combined into a single data store, and/or a single data store described herein may be separated into multiple data stores, without departing from the scope of the present disclosure. For example, in some embodiments, partial communication graphs may be stored in separate communication graph data stores 462 that are local to each thread. In such an embodiment, performance may be improved by making addition of edges to the graph a thread-local operation. When such a thread ends, the partial communication graph may be merged into a global communication graph stored in a master communication graph data store 462.
As understood by one of ordinary skill in the art, the term “engine” as used herein refers to logic embodied in hardware or software instructions, which may be written in a programming language, such as C, C++, COBOL, JAVA™, PHP, Perl, C#, and/or the like. An engine may be compiled into executable programs or written in interpreted programming languages. Software engines may be callable from other engines, or from themselves. Generally, the engines described herein refer to logical modules that may be merged with other engines or applications, or may be divided into sub-engines. The engines may be stored on any type of computer-readable medium or computer storage device and be stored on and executed by one or more general purpose computing devices, thus creating a special purpose computing device configured to provide the engine.
FIG. 5 is a table that illustrates abbreviations introduced for the ease of discussion. Four context values tracked by some embodiments of the present disclosure are Local Read (“LocRd”), Local Write (“LocWr”), Remote Read (“RemRd”), and Remote Write (“RemWr”). In the embodiments illustrated herein, these values may be represented by the integers 0, 1, 2, and 3, respectively, as indicated in the table in FIG. 5. Accordingly, a context having a fixed length of five elements may be represented by an array of anywhere from zero to five integers. This notation is used below interchangeably with the abbreviated notation for brevity and clarity. One of ordinary skill in the art will understand that the integer values map to the context values listed in FIG. 5. One of ordinary skill in the art will also understand that, in some embodiments, any other suitable representation may be used for individual context values and/or the elements of a context FIFO queue. For example, in some embodiments, a single integer may be used to represent all possible combinations of elements in a context FIFO queue. In an embodiment having four context values and a queue length of five, the integers between 0 and 1023, inclusive, may be used to represent every possible context FIFO queue.
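For instance, a full five-element context over the four event values of FIG. 5 can be packed into a single integer between 0 and 1023 using two bits per element, as in the sketch below. Handling partially filled queues would require an additional convention that this sketch does not attempt.

```python
EVENT_CODES = {"LocRd": 0, "LocWr": 1, "RemRd": 2, "RemWr": 3}  # per FIG. 5
CODE_EVENTS = {code: name for name, code in EVENT_CODES.items()}

def pack_context(events):
    """Pack a full five-element context into one integer in [0, 1023],
    two bits per entry.  Shorter contexts would need a separate convention."""
    assert len(events) == 5
    packed = 0
    for event in events:
        packed = (packed << 2) | EVENT_CODES[event]
    return packed

def unpack_context(packed):
    events = [CODE_EVENTS[(packed >> (2 * i)) & 0b11] for i in range(5)]
    return list(reversed(events))

ctx = ["LocRd", "RemRd", "RemWr", "RemWr", "RemWr"]  # context of node 216 in FIG. 2
packed = pack_context(ctx)
print(packed, unpack_context(packed) == ctx)  # 191 True
```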
FIGS. 6A-6C are tables that illustrate an exemplary embodiment of data stored within a memory location metadata data store 458 according to various aspects of the present disclosure. The information stored within a memory location metadata data store 458 represents a previous instruction that wrote to each memory location. In the illustrated embodiment, the tables store values for a memory location, a last writer thread ID, a last writer instruction, a context, and a timestamp. In some embodiments, more or less information than that shown in the figures may be stored in the memory location metadata data store 458. For example, in some embodiments, the timestamp value may not be collected, or more or less context information may be collected.
FIGS. 6A-6C contain exemplary information that may be collected during the execution trace illustrated in FIG. 1B. FIG. 6A illustrates a table 602 in the memory location metadata data store 458 after thread one 110 has executed Instruction B at time 2. The table 602 includes two entries for the two memory locations that were accessed: the “items” location and the “qsize” location. Both locations were last written by thread one 110. The “items” location was last written by Instruction A at timestamp 1, and the “qsize” location was last written by Instruction B at timestamp 2. As discussed above, the context for the write to the “items” memory location was empty, and the context for the write to the “qsize” memory location was “1” (“LocWr,” using the shorthand illustrated in FIG. 5).
FIG. 6B illustrates the table 602 after thread one 110 has executed Instruction D at time 4. The entry for the “items” location has been updated to show that the last writer instruction was Instruction C, and that the write occurred at timestamp 3 with a context of “1, 1” (corresponding to “LocWr, LocWr”). The entry for the “qsize” location has been updated to show that the last writer instruction was Instruction D, and that the write occurred at timestamp 4 with a context of “1, 1, 1” (corresponding to “LocWr, LocWr, LocWr”). Since both writes occurred in thread one 110, the last writer thread ID values for both entries remained the same.
FIG. 6C illustrates the table 602 after thread three 114 has executed Instruction G at time 7. The entry for the “qsize” location has been updated to show that the last writer thread was thread three 114 instead of thread one 110, that the last writer instruction was Instruction G, and that the write occurred at timestamp 7 with a context of “0, 2, 3, 3, 3” (corresponding to “LocRd, RemRd, RemWr, RemWr, RemWr”).
Upon detecting a memory access, the information in the memory location metadata data store 458 may be consulted to determine whether an edge should be added to a communication graph, and then may be updated if the memory access is a write. For example, upon detecting the read of the “qsize” location by Instruction I at time 5 in thread two 112, the entry for the “qsize” location is checked, and it is determined that the last writer thread was not thread two 112 (see FIG. 6B). Accordingly, an edge (Edge 3) is added to the communication graph having a source node indicating at least Instruction D and context “1, 1, 1,” and a sink node indicating at least Instruction I and context “3, 3, 3, 3.” In some embodiments, the timestamp information for both the source node and the sink node may also be included in the edge. One of ordinary skill in the art will understand that the rest of the communication graph may be similarly constructed during the execution of the code listing.
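Pulling these pieces together, a software instrumentation hook might behave as in the sketch below: consult the last-writer metadata on each monitored access, add an edge when the last writer was a different thread, and update the metadata on writes. The structures and names are illustrative, not the patent's implementation, and a tool such as Pin or RoadRunner would invoke something like the hypothetical on_access at each instrumented memory operation.

```python
# Hedged sketch of the per-access logic described above.
metadata = {}          # address -> (thread_id, instruction, context, timestamp)
graph_edges = set()    # edges keyed by ((instr, context), (instr, context))
thread_contexts = {}   # thread_id -> tuple of recent events (newest first)

def on_access(thread_id, instruction, address, is_write, timestamp):
    local_context = thread_contexts.get(thread_id, ())
    last_write = metadata.get(address)
    if last_write is not None and last_write[0] != thread_id:
        writer_thread, writer_instr, writer_ctx, _writer_ts = last_write
        # Source node: last writer instruction plus its context.
        # Sink node: current instruction plus the current thread's context.
        graph_edges.add(((writer_instr, writer_ctx), (instruction, local_context)))
    if is_write:
        metadata[address] = (thread_id, instruction, local_context, timestamp)

# Replaying part of the trace of FIG. 1B (contexts abbreviated per FIG. 5):
thread_contexts["T1"] = (1, 1, 1)
on_access("T1", "D", 0x1000, is_write=True, timestamp=4)   # write to qsize
thread_contexts["T2"] = (3, 3, 3, 3)
on_access("T2", "I", 0x1000, is_write=False, timestamp=5)  # read of qsize -> Edge 3
print(graph_edges)
```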
FIG. 7 illustrates one embodiment of a hardware-instrumented computing device 700 according to various aspects of the present disclosure. As with the other computing devices illustrated and discussed above with respect to FIGS. 3 and 4, the hardware-instrumented computing device 700 includes a main memory 702, a coherence interconnect 704, three processor cores 706, 708, 710, and three cache memories 712, 716, 720. Each of these components includes similar structure and function to the like components discussed above, except as outlined below. The hardware-instrumented computing device 700 also includes components 754 for storing and analyzing context-aware communication graphs. The components 754 include a graph analysis engine 756 and a communication graph data store 762 that may be similar to the graph analysis engine 456 and communication graph data store 462 discussed above. One difference between the communication graph data store 462 and the communication graph data store 762 is that the communication graph data store 762 may be populated by the additional hardware components discussed below instead of by instrumented software code.
Each processor core 706, 708, 710 is augmented with a context register 707, 709, 711. The context register 707, 709, 711 is configured to store a context FIFO queue, as described above, for a thread currently being executed by the associated processor core 706, 708, 710. Further, each cache line in each cache memory 712, 716, 720 is augmented with metadata 713, 717, 721 that describes the last instruction to write to the cache line. Details of the cache lines, including the metadata 713, 717, 721, are discussed further below with respect to FIGS. 9A-9D.
Whereas the cache memories illustrated in FIGS. 3 and 4 included unaltered coherence logic, the cache memories 712, 716, 720 in the hardware-instrumented computing device 700 include modified coherence logic 715, 719, 723. The modified coherence logic 715, 719, 723 monitors coherence messages sent via the coherence interconnect 704, and updates the metadata 713, 717, 721, and the context registers 707, 709, 711 accordingly. In some embodiments, only thread-to-thread communication that results in cache-to-cache transfers or memory-to-cache transfers may be considered for addition to communication graphs. While some thread-to-thread communication that happens between multiple threads on the same processor core may not be monitored in these embodiments, the distribution of threads among processing cores usually provides effective results. The modified coherence logic 715, 719, 723 also adds edges to communication graphs stored in a communication graph data store 762 based on at least the context registers 707, 709, 711 and the metadata 713, 717, 721.
In some embodiments, the modified coherence logic 715, 719, 723 is based on a modified MESI coherence protocol. Standard MESI coherence protocols are generally known in the art, and so are not discussed herein at length. However, FIG. 8 is a state diagram that illustrates state changes in an MESI coherence protocol suitable for use with embodiments of the present disclosure. As known to one of ordinary skill in the art, the states of the MESI coherence protocol describe the status of a cache line, and may be Modified, Exclusive, Shared, or Invalid. The numbers on each edge of the state diagram are associated with the legend at the bottom, which describes the type of operation that causes the change from one state to another. Solid lines represent edges that result from an action taken by a local processor associated with the cache, and dashed lines represent edges that result from a message received via the coherence interconnect 704 indicating an action taken by a remote processor.
The modified coherence logic 715, 719, 723 may adhere to a normal MESI coherence protocol, but may augment some coherence messages to share information about the instructions involved with the communication. For example, when a read reply is transmitted, the modified coherence logic 715, 719, 723 may include the metadata 713, 717, 721 of the corresponding cache line to provide information for read-after-write (RAW) communication. As another example, when an invalidate reply or acknowledgement is transmitted, the modified coherence logic 715, 719, 723 may include the metadata 713, 717, 721 of the cache line that was invalidated to provide information for write-after-write (WAW) communication.
The modified coherence logic 715, 719, 723 monitors traffic on the coherence interconnect 704, and pushes context events into the context register 707, 709, 711 of the associated processor core 706, 708, 710 when appropriate. For example, the modified coherence logic 715, 719, 723 may push a local read event into the context register 707, 709, 711 upon detecting a local read miss, a local write event upon detecting a local write miss or upgrade miss, a remote write event upon detecting an incoming invalidate request, and a remote read event upon detecting an incoming read request.
When appropriate, the modified coherence logic 715, 719, 723 also updates the communication graph. For example, the modified coherence logic 715, 719, 723 may add an edge to the communication graph upon detecting a read reply, an invalidate reply, or a read miss serviced from memory 702. Upon detecting a read reply, an edge is added having a source node including information from the metadata included in the read reply, and a sink node including information relating to the local instruction that caused the miss and the context in which the miss happened. Upon detecting an invalidate reply, an edge is added having a source node including information from the metadata for the cache line that was invalidated, and a sink node including information relating to the local instruction that caused the invalidate request and the context in which the request originated. Upon detecting a read miss serviced from memory 702, an edge is added with a source node set to a null value and a sink node including information relating to the local instruction that caused the miss and the context in which the miss happened, to indicate that an otherwise uninitialized memory location was accessed.
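These bookkeeping rules might be summarized, purely as a software sketch (the actual logic resides in the modified coherence hardware), as follows; the event names and metadata fields are illustrative assumptions.

```python
# Software sketch of the hardware bookkeeping described above.
CONTEXT_EVENT_FOR = {                      # coherence observation -> context event
    "local_read_miss": "LocRd",
    "local_write_miss": "LocWr",
    "upgrade_miss": "LocWr",
    "incoming_invalidate_request": "RemWr",
    "incoming_read_request": "RemRd",
}

def edge_for(event, local_instruction, local_context, line_metadata=None):
    """Return the (source, sink) edge added for edge-producing events.
    Read replies and invalidate replies carry the writer's cache-line
    metadata; a read miss serviced from memory produces a null source."""
    sink = (local_instruction, local_context)
    if event in ("read_reply", "invalidate_reply") and line_metadata is not None:
        return ((line_metadata["writer_instruction"], line_metadata["writer_context"]), sink)
    if event == "read_miss_from_memory":
        return (None, sink)
    return None

# Example: a read reply carrying metadata for the last writer (Instruction D).
print(edge_for("read_reply", "I", (3, 3, 3, 3),
               {"writer_instruction": "D", "writer_context": (1, 1, 1)}))
```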
FIGS. 9A-9D illustrate various embodiments of metadata 713, 717, 721 stored in association with cache lines according to various aspects of the present disclosure. FIG. 9A illustrates a standard cache line that does not have any metadata 713, 717, 721 added. Fields are included for a tag indicating a state in the MESI diagram and for the data itself. FIG. 9B illustrates a modified cache line, in which a metadata field has been added to associate a last writer instruction address with the cache line. FIG. 9C adds a writer context field to the modified cache line of FIG. 9B, and FIG. 9D adds a timestamp field to the modified cache line of FIG. 9C. In some embodiments, the writer context field and/or the timestamp may be optional, though the available graph analysis functionality may change. One of ordinary skill in the art will recognize that the information stored in the metadata 713, 717, 721, in aggregate, may be similar to the information stored in the memory location metadata data store 458 in the software-instrumented computing device 400 described above, and may be used for similar purposes. One of ordinary skill in the art will also recognize that the metadata 713, 717, 721 or another portion of the associated cache line may include additional information not illustrated here, such as a writer thread ID or any other information, without departing from the scope of the disclosed subject matter.
Reconstructions
Context-aware communication graphs may be analyzed to determine instructions that are likely associated with failed program behavior. However, since concurrency bugs are difficult to diagnose, it would be helpful if a representation of the behavior of all threads around the instruction could be presented for debugging, and not just the single instruction or the single thread that failed. By adding timestamp data to the nodes of a context-aware communication graph, behavior likely to occur before, during, and after an instruction may be presented for debugging purposes. A reconstruction, according to various aspects of the present disclosure, presents communication nodes that occur before, during, and after an identified edge from a communication graph.
FIG. 10 is a table that illustrates an exemplary communication graph suitable for storage in the communication graph data store 762. Each edge in the communication graph includes a source instruction, a source context, and a source timestamp, as well as a sink instruction, a sink context, and a sink timestamp. One of ordinary skill in the art will recognize that the communication graph illustrated in tabular format in FIG. 10 is similar to the communication graph illustrated and described in FIG. 2, but has added timestamp information for the source node and the sink node for each edge, when available.
FIG. 11 illustrates an exemplary reconstruction 1100 constructed from the communication graph of FIG. 10. The reconstruction 1100 is based on the edge from source node 1102 to sink node 1104. A prefix section 1106, a body 1108, and a suffix section 1110 are provided to present communication nodes that occurred before, during, and after the communication represented by the edge. In some embodiments, the timestamps of nodes in the communication graph are inspected to determine nodes that are appropriate for the prefix section 1106, body 1108, and suffix section 1110 of the reconstruction 1100. The prefix section 1106 and suffix section 1110 may include any number of nodes. In some embodiments, the prefix section 1106 and/or suffix section 1110 include at most a predetermined number of nodes. In some embodiments, the predetermined number of nodes may be less than or equal to a maximum length of the context FIFO queues used in the communication graph.
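One way to select the prefix, body, and suffix by timestamp is sketched below. The node tuples and the window size are illustrative (the window here matches the five-element context FIFO length), and the example loosely follows the nodes and timestamps of FIGS. 1B and 2.

```python
def build_reconstruction(nodes, source_ts, sink_ts, window=5):
    """Split a communication graph's nodes into prefix, body, and suffix
    relative to an edge's source and sink timestamps.  `nodes` is an
    iterable of (instruction, context, timestamp) tuples; `window` is the
    predetermined number of prefix/suffix nodes."""
    ordered = sorted(nodes, key=lambda n: n[2])
    prefix = [n for n in ordered if n[2] < source_ts][-window:]
    body   = [n for n in ordered if source_ts <= n[2] <= sink_ts]
    suffix = [n for n in ordered if n[2] > sink_ts][:window]
    return prefix, body, suffix

# Example nodes in the style of FIG. 1B; "I'" is an illustrative label for the
# second node associated with Instruction I.
nodes = [("A", (), 1), ("B", (1,), 2), ("C", (1, 1), 3), ("D", (1, 1, 1), 4),
         ("I", (3, 3, 3, 3), 5), ("I'", (2, 3, 3, 3, 3), 6),
         ("G", (0, 2, 3, 3, 3), 7), ("E", (3, 2, 0, 3, 3), 8)]
prefix, body, suffix = build_reconstruction(nodes, source_ts=7, sink_ts=8)
print([n[0] for n in prefix], [n[0] for n in body], [n[0] for n in suffix])
```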
Though a reconstruction based on a single execution may be useful for understanding what occurred around a given edge, combining results from multiple executions may give a more complete picture of the behavior that is causing problems. However, since the problems represented are multi-threaded and indeterminate in nature, it is likely that even if an edge is repeated in multiple executions the associated reconstructions will not be the same. FIG. 12 illustrates the creation of an aggregate reconstruction 1210, which combines each node that appears in the prefix, body, or suffix of more than one execution into a single structure. FIG. 12 is based on a hypothetical code listing and communication graph, different from those discussed in the rest of the present disclosure, having nodes labeled from the letter S to the letter Z. The code listing is not illustrated, but its details are not necessary to understand the formation of an aggregate reconstruction.
The code was executed a plurality of times, and communication graphs were created for each execution. Among those executions, four were identified that included a particular edge having node Y as the source node and node Z as the sink node. For each of those executions, a reconstruction 1202, 1204, 1206, 1208 was calculated based on the timestamps of the nodes in the communication graph around node Y and node Z. The reconstructions 1202, 1204, 1206, 1208 are slightly different in each case, reflecting the indeterminate nature of the execution.
To form the aggregate reconstruction 1210, the prefixes, bodies, and suffixes of each reconstruction 1202, 1204, 1206, 1208 are unioned together to form an aggregate prefix, an aggregate body, and an aggregate suffix. Nodes may appear in more than one portion of the aggregate reconstruction, because in some executions, a given node may occur before the sink node or source node, and in other executions, the given node may occur after the sink node or source node. Each node in the aggregate reconstruction 1210 is then assigned a confidence value, which indicates a proportion of executions for which the given node appeared in the given portion of the reconstruction. For example, node U in the body of the aggregate reconstruction 1210 is assigned a confidence value 1212 of 100%, because node U was present in the body of every reconstruction. Meanwhile, node S is assigned a confidence value 1214 of 50% in the prefix, and a confidence value 1216 of 50% in the body, because node S appeared in each portion of the reconstructions twice for the four executions. One of ordinary skill in the art will recognize that the other confidence values were similarly derived. In some embodiments, the nodes in the aggregate reconstruction 1210 are not ordered other than being segregated into prefix, body, and suffix portions, as the timestamps may not be comparable from one execution to another. The use of aggregate reconstructions and confidence values to find likely reconstructions that show failures will be discussed further below.
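The aggregation and confidence computation might be sketched as follows. The per-execution reconstruction contents are hypothetical, chosen only so that the resulting confidence values match those mentioned for nodes U and S in FIG. 12.

```python
from collections import Counter

def aggregate_reconstruction(reconstructions):
    """`reconstructions` is a list of (prefix, body, suffix) tuples, each a
    set of node labels from one execution.  Each node in the aggregate gets
    a confidence value: the proportion of executions in which it appeared
    in that region, as in FIG. 12."""
    total = len(reconstructions)
    aggregate = {}
    for region_index, region_name in enumerate(("prefix", "body", "suffix")):
        counts = Counter()
        for recon in reconstructions:
            counts.update(set(recon[region_index]))
        aggregate[region_name] = {node: count / total
                                  for node, count in counts.items()}
    return aggregate

# Four hypothetical executions of the FIG. 12 example (nodes S..Z).
recons = [
    ({"S", "T"}, {"U", "Y", "Z"}, {"W"}),
    ({"T"},      {"S", "U", "Y", "Z"}, {"W", "X"}),
    ({"S", "T"}, {"U", "V", "Y", "Z"}, {"X"}),
    ({"T", "V"}, {"S", "U", "Y", "Z"}, {"W"}),
]
agg = aggregate_reconstruction(recons)
print(agg["body"]["U"])                      # 1.0 -- U appears in every body
print(agg["prefix"]["S"], agg["body"]["S"])  # 0.5 0.5 -- S splits evenly
```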
Using Context-Aware Communication Graphs for Debugging
Once collected, the context-aware communication graphs and reconstructions described above may be used to find concurrency errors. FIG. 13 illustrates one embodiment of a method 1300 of finding possible causes of concurrency errors using context-aware communication graphs. The method 1300 includes several procedures, each of which is illustrated and described in further detail below.
From a start block, the method 1300 proceeds to block 1302, where a computing device is configured to collect context-aware communication graph information. The computing device may be a software-instrumented computing device 300, a hardware-instrumented computing device 700, or any other suitable computing device configured for collecting context-aware communication graph information, and may be configured as described above. Next, at block 1304, a procedure is performed wherein the computing device collects context-aware communication graphs for a set of failed executions and a set of correct executions.
FIG. 14 illustrates one embodiment of a procedure 1400 executed at block 1304 of FIG. 13 for collecting context-aware communication graphs for a set of failed executions and a set of correct executions according to various aspects of the present disclosure. From a start block, the procedure 1400 proceeds to block 1402, where a test case experiencing intermittent failures is identified. For example, a software developer may receive reports from users or other testers indicating that a particular crash, exception, or other error occurs intermittently during a particular usage scenario. The software developer may then determine one or more reproduction steps to create a test case usable to attempt to recreate the reported error. In other cases, the software developer may execute generic functionality test cases, such as unit tests, load tests, or performance tests, in an attempt to reproduce the error. In some embodiments, the test case or generic functionality test cases may be executed by an automated testing framework, or may be executed by a test user performing a set of reproduction steps.
The procedure 1400 then proceeds to a for loop between a start block 1404 and an end block 1410, wherein the test case is executed and a test case result is determined. In some embodiments, the for loop between blocks 1404 and 1410 is executed a predetermined number of times. In other embodiments, the for loop between blocks 1404 and 1410 may be executed until a predetermined number of failed test case results are collected, and/or any other suitable number of times. From the for loop start block 1404, the procedure 1400 proceeds to block 1406, where the computing device collects and stores a communication graph during execution of the test case. The computing device may collect and store the communication graph via a suitable technique as described above. At block 1408, the computing device associates the communication graph with a test case result. For example, an automated testing framework may store a failed test case result with the communication graph upon detecting that an error occurred or an expected result was not obtained, and may store a correct test case result with the communication graph upon detecting that an expected result was obtained without any errors. As another example, a test user may analyze the results of the test case, and may indicate whether a correct test case result or a failed test case result should be stored with the communication graph.
The procedure 1400 proceeds to the for loop end block 1410 and determines whether the for loop should be executed again. If so, the procedure 1400 returns to the for loop start block 1404. If not, the procedure 1400 proceeds to block 1412, where the computing device creates a set of failed communication graphs based on the communication graphs having failed test case results. At block 1414, the computing device creates a set of correct communication graphs based on the communication graphs having correct test case results. In some embodiments, the computing device may store the set of failed communication graphs and the set of correct communication graphs in the communication graph data store 762 or 462, while in other embodiments, the computing device may store the set of failed communication graphs and the set of correct communication graphs in a separate data store for future processing. The procedure 1400 then proceeds to an end block and terminates.
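As an illustrative sketch only, the repeated execution and labeling described in procedure 1400 might be driven by a harness along the following lines; the hook execute_instrumented_test and its return convention are hypothetical and stand in for whichever instrumentation and test framework is used.

```python
def collect_labeled_graphs(execute_instrumented_test, num_runs):
    """Run the test case num_runs times and partition the resulting
    communication graphs into failed and correct sets.

    execute_instrumented_test() is assumed to run one test case under
    instrumentation and return (communication_graph, passed).
    """
    failed_graphs, correct_graphs = [], []
    for _ in range(num_runs):
        graph, passed = execute_instrumented_test()
        (correct_graphs if passed else failed_graphs).append(graph)
    return failed_graphs, correct_graphs
```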
Returning now to FIG. 13, the method 1300 proceeds from block 1304 to block 1306, where a procedure is performed wherein a graph analysis engine, such as graph analysis engine 456 or 756, selects a set of edges correlated with failed behavior based on a failed frequency ratio calculated for each edge.
FIG. 15 illustrates one embodiment of a procedure 1500 executed at block 1306 of FIG. 13 for selecting a set of edges correlated with failed behavior according to various aspects of the present disclosure. Briefly stated, the procedure 1500 selects edges from the communication graphs that appear more often in failed communication graphs than in correct communication graphs. From a start block, the procedure 1500 proceeds to block 1502, where the graph analysis engine determines a failed execution fraction for the edges of the communication graphs based on a number of occurrences of the edges in the set of failed communication graphs and a total number of failed executions. In some embodiments, the failed execution fraction for a given edge may be expressed by the following equation, wherein Frac_f is the failed execution fraction for the edge, EdgeFreq_f is the number of failed communication graphs in which the edge appears, and #Runs_f is the total number of failed communication graphs.
$\mathrm{Frac}_f = \dfrac{\mathrm{EdgeFreq}_f}{\#\mathrm{Runs}_f}$ (1)
At block 1504, the graph analysis engine determines a correct execution fraction for the edges of the communication graphs based on a number of occurrences of the edges in the set of correct communication graphs and a total number of correct communication graphs. In some embodiments, the correct execution fraction for a given edge may be expressed by the following equation, wherein Frac_c is the correct execution fraction for the edge, EdgeFreq_c is the number of correct communication graphs in which the edge appears, and #Runs_c is the total number of correct communication graphs.
$\mathrm{Frac}_c = \dfrac{\mathrm{EdgeFreq}_c}{\#\mathrm{Runs}_c}$ (2)
Next, at block 1506, the graph analysis engine determines a failed frequency ratio for the edges of the communication graphs based on the failed execution fraction and the correct execution fraction. In some embodiments, the failed frequency ratio for a given edge may be expressed by the following equation, wherein F is the failed frequency ratio:
$F = \dfrac{\mathrm{Frac}_f}{\mathrm{Frac}_c}$ (3)
In some embodiments, edges having a Frac_c of zero may be particularly likely to be associated with failures, but would cause Equation (3) above to be undefined. In such cases, the Frac_c value may be replaced by a value that yields a large value for F. For example, in some embodiments, a Frac_c of zero may be replaced by the following value:
$\mathrm{Frac}_c = \dfrac{1}{\#\mathrm{Runs}_c + 1}$ (4)
The procedure 1500 then proceeds to block 1508, where the graph analysis engine selects a set of edges for further analysis based on the failed frequency ratios. In some embodiments, the graph analysis engine may select a predetermined number of edges having the highest failed frequency ratios. In some embodiments, the graph analysis engine may select edges having a failed frequency ratio greater than a threshold value. The procedure 1500 then proceeds to an end block and terminates.
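For illustration only, the failed frequency ratio of Equations (1)-(4) and the edge selection of procedure 1500 could be sketched in Python as follows, under the assumption that each communication graph is represented as a set of hashable edge identifiers; the function names and the top_n cutoff are assumptions made for this example.

```python
def failed_frequency_ratio(edge, failed_graphs, correct_graphs):
    """Compute F = Frac_f / Frac_c for one edge (Equations (1)-(3)),
    substituting Frac_c = 1 / (#Runs_c + 1) when the edge never
    appears in a correct run (Equation (4))."""
    runs_f, runs_c = len(failed_graphs), len(correct_graphs)
    frac_f = sum(edge in g for g in failed_graphs) / runs_f      # Equation (1)
    frac_c = sum(edge in g for g in correct_graphs) / runs_c     # Equation (2)
    if frac_c == 0:
        frac_c = 1 / (runs_c + 1)                                # Equation (4)
    return frac_f / frac_c                                       # Equation (3)

def select_suspicious_edges(failed_graphs, correct_graphs, top_n=10):
    """Score every edge observed in any failed run and keep the edges
    with the highest failed frequency ratios."""
    candidates = set().union(*failed_graphs)
    scores = {e: failed_frequency_ratio(e, failed_graphs, correct_graphs)
              for e in candidates}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```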
Returning now to FIG. 13, the method 1300 proceeds to block 1308, where a procedure is performed wherein the graph analysis engine determines an aggregate reconstruction for each selected edge. In some embodiments, the aggregate reconstruction may be calculated for failed communication graphs in order to determine sets of nodes that are likely to co-occur with failed behavior. FIG. 16 illustrates one embodiment of a procedure 1600 executed at block 1308 of FIG. 13 for determining an aggregate reconstruction for each selected edge according to various aspects of the present disclosure. From a start block, the procedure 1600 proceeds to a for loop between a for loop start block 1602 and a for loop end block 1620, wherein the for loop executes once for each selected edge to create an aggregate reconstruction for each selected edge.
From the for loop start block 1602, the procedure 1600 proceeds to another for loop between a for loop start block 1604 and a for loop end block 1608, wherein the for loop executes once for each failed communication graph containing the selected edge to create reconstructions for the selected edge for each failed communication graph. From the for loop start block 1604, the procedure 1600 proceeds to block 1606, where the graph analysis engine creates a failed reconstruction based on timestamps of the source node and the sink node of the selected edge in the failed communication graph, as well as timestamps of neighboring nodes in the failed communication graph. As discussed above with respect to FIG. 11, the failed reconstruction may be built by selecting nodes having timestamps between the timestamp of the source node and sink node of the edge, a predetermined number of nodes having timestamps before the timestamp of the source node, and a predetermined number of nodes having timestamps after the timestamp of the sink node.
The procedure 1600 then proceeds to the for loop end block 1608 and determines whether the for loop should be executed again. If so, the procedure 1600 returns to the for loop start block 1604 and calculates a failed reconstruction for another failed communication graph. If not, the procedure 1600 proceeds to block 1610, where the graph analysis engine creates an aggregate failed reconstruction for the selected edge based on frequencies of nodes in the prefix, body, and suffix of the created failed reconstructions. In some embodiments, the aggregate failed reconstruction for the selected edge may be built using a method similar to the construction of the aggregate reconstruction illustrated and described in FIG. 12.
The procedure 1600 then proceeds to the for loop end block 1620 and determines whether the for loop should be executed again. If so, the procedure 1600 returns to the for loop start block 1602 and calculates an aggregate reconstruction for the next selected edge. If not, the procedure 1600 proceeds to an end block and terminates.
Returning now to FIG. 13, the method 1300 proceeds to block 1310, where the graph analysis engine determines a reconstruction consistency for each aggregate reconstruction. In some embodiments, a reconstruction consistency represents a combined confidence value over all nodes in an aggregate reconstruction. In an aggregate reconstruction produced from a set of failed communication graphs, nodes having high confidence values occur consistently in the same region of the reconstructions, and are therefore likely to be related to the failed behavior. Hence, reconstructions containing many high confidence nodes may reflect a correlation between the occurrence of failures and the co-occurrence, in the order shown by the reconstruction, of the instructions contained in the nodes. In some embodiments, a reconstruction consistency may be determined by averaging the confidence values of the nodes across all reconstruction regions. For example, a reconstruction consistency R for a reconstruction having a prefix region P, a body B, and a suffix S may be represented by the following equation, wherein V(n,r) is the confidence value of node n in region r.
$R = \dfrac{\sum_{p \in P} V(p, P) + \sum_{b \in B} V(b, B) + \sum_{s \in S} V(s, S)}{|P| + |B| + |S|}$ (5)
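For illustration only, Equation (5) could be computed over an aggregate reconstruction of the form produced by the aggregation sketch above; that dictionary layout is an assumption of these examples rather than part of the described embodiments.

```python
def reconstruction_consistency(aggregate):
    """Equation (5): the sum of the confidence values of all prefix, body,
    and suffix nodes, divided by the total number of nodes in the
    aggregate reconstruction."""
    confidences = [v for region in aggregate.values() for v in region.values()]
    return sum(confidences) / len(confidences) if confidences else 0.0
```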
At block 1312, a procedure is performed wherein the graph analysis engine determines a difference in interleaving around the edge in failed communication graphs versus correct communication graphs. In some embodiments, the difference in interleaving may be represented by a context variation ratio, which is based on a comparison of the number of contexts in which either the source instruction or the sink instruction communicates in failed communication graphs versus correct communication graphs. A large difference between the number of contexts in correct communication graphs and the number of contexts in failed communication graphs may be correlated with failures. FIG. 17 illustrates one embodiment of a procedure 1700 executed at block 1312 of FIG. 13 for determining a context variation ratio for each aggregate reconstruction.
From a start block, the procedure 1700 proceeds to block 1702, where the graph analysis engine determines a source instruction and a sink instruction associated with the edge used to create the aggregate reconstruction. Next, at block 1704, the graph analysis engine determines a number of failed source contexts based on a number of nodes in the failed communication graphs that include the source instruction. The failed source contexts may include contexts from any node wherein the source instruction appears, whether the node is a source node or a sink node. The procedure 1700 proceeds to block 1706, where the graph analysis engine determines a number of failed sink contexts based on a number of nodes in the failed communication graphs that include the sink instruction. Again, the failed sink contexts may include contexts from any node wherein the sink instruction appears. Next, at block 1708, the graph analysis engine adds the number of failed source contexts and the number of failed sink contexts to obtain a number of failed contexts. The number of failed contexts represents a count of the contexts in which either the source instruction or the sink instruction communicates as represented by the failed communication graphs.
The procedure 1700 proceeds to block 1710, where the graph analysis engine determines a number of correct source contexts based on a number of nodes in the correct communication graphs that include the source instruction. At block 1712, the graph analysis engine determines a number of correct sink contexts based on a number of nodes in the correct communication graphs that include the sink instruction. As discussed above, the source contexts and sink contexts include nodes wherein the source instruction or sink instruction, respectively, are present in either a source node or sink node. The procedure 1700 proceeds to block 1714, where the graph analysis engine adds the number of correct source contexts and the number of correct sink contexts to obtain a number of correct contexts.
At block 1716, the graph analysis engine determines a context variation ratio based on the number of failed contexts and the number of correct contexts. The procedure 1700 then proceeds to an end block and terminates. In some embodiments, the context variation ratio C may be represented by the following equation, wherein #Ctx_f is the number of failed contexts and #Ctx_c is the number of correct contexts.
$C = \dfrac{\#\mathrm{Ctx}_f - \#\mathrm{Ctx}_c}{\#\mathrm{Ctx}_f + \#\mathrm{Ctx}_c}$ (6)
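For illustration only, the context variation ratio of Equation (6) might be computed as follows, assuming each communication graph exposes an iterable of (instruction, context) node pairs; that accessor, like the function name, is an assumption of this example.

```python
def context_variation_ratio(source_instr, sink_instr, failed_graphs, correct_graphs):
    """Equation (6): compare the number of distinct contexts in which the
    source or sink instruction communicates in failed runs versus correct runs."""
    def count_contexts(graphs):
        contexts = set()
        for graph in graphs:
            for instruction, context in graph:   # nodes as (instruction, context)
                if instruction in (source_instr, sink_instr):
                    contexts.add((instruction, context))
        return len(contexts)

    ctx_f = count_contexts(failed_graphs)
    ctx_c = count_contexts(correct_graphs)
    total = ctx_f + ctx_c
    return (ctx_f - ctx_c) / total if total else 0.0
```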
Returning now to FIG. 13, the method 1300 proceeds to block 1314, where the graph analysis engine ranks each aggregate reconstruction based on one or more of the reconstruction consistency, the context variation ratio, and the failed frequency ratio. In some embodiments, any one of the reconstruction consistency, the context variation ratio, and the failed frequency ratio may be used individually to rank the aggregate reconstructions and identify those that accurately represent failed executions. In some embodiments, two or more of the reconstruction consistency, the context variation ratio, and the failed frequency ratio may be combined to rank each aggregate reconstruction, allowing the strengths of each score to complement the others. In some embodiments, the reconstruction consistency, the context variation ratio, and the failed frequency ratio may be multiplied together to produce a score for ranking each aggregate reconstruction. At block 1316, the graph analysis engine presents one or more highly ranked aggregate reconstructions for debugging. The top ranked aggregate reconstructions are likely to accurately represent failed executions, so a developer presented with them may more readily diagnose the error. The method 1300 then proceeds to an end block and terminates.
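As a final illustrative step, the combination suggested above (multiplying the three scores) might look like the following sketch; the field names are assumptions carried over from the earlier examples, not part of the described embodiments.

```python
def rank_candidates(candidates):
    """Order candidate aggregate reconstructions by the product of their
    reconstruction consistency, context variation ratio, and failed
    frequency ratio, highest score first."""
    def score(c):
        return c["consistency"] * c["context_variation"] * c["failed_frequency"]
    return sorted(candidates, key=score, reverse=True)
```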
The method 1300 illustrated and discussed above relates to cases in which failed executions are distinguished from correct executions. However, similar techniques for analyzing context-aware communication graphs to find possible causes of concurrency errors using executions which are not known to be failed or correct may also be useful. FIG. 18 illustrates one embodiment of a method 1800 of detecting possible causes of concurrency errors using such unlabeled executions. From a start block, the method 1800 proceeds to block 1802, where a computing device for collecting context-aware communication graph information is configured. As discussed above, the computing device may be a hardware-instrumented computing device 700, a software-instrumented computing device 400, or any other suitably configured computing device. At block 1804, the computing device collects context-aware communication graphs for a set of executions. Unlike the method 1300 discussed above, the executions or communication graphs are not labeled as correct or failed. Next, at block 1806, for each instruction in the communication graphs, a graph analysis engine calculates an instruction rank that reflects the rarity of contexts in which the instruction executed. In some embodiments, the instruction rank for each instruction may be represented by the following equation, wherein X_i is the set of contexts in which the instruction executed, F_{i,x} is the number of runs in which the instruction i executed in context x, and F_{i,*} is the total number of times the instruction i executed, regardless of context, across all runs.
$\mathrm{rank}_i = \sum_{x \in X_i} \dfrac{F_{i,x}}{F_{i,*}}$ (7)
The equation ranks instructions that executed in rare contexts more highly, reflecting their increased likelihood of being associated with failed behavior. At block 1808, the graph analysis engine ranks the instructions based on the associated instruction ranks to identify one or more instructions to present for debugging. In some embodiments, reconstructions and/or aggregate reconstructions may be built as described above based on a highly ranked instruction and/or one or more edges associated with the highly ranked instruction to make debugging easier. The method 1800 then proceeds to an end block and terminates.
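For illustration only, Equation (7) might be computed as follows, again assuming each communication graph is an iterable of (instruction, context) node pairs and treating each node occurrence as one execution of the instruction; these representational choices are assumptions of the example.

```python
from collections import Counter, defaultdict

def instruction_ranks(graphs):
    """Equation (7): for each instruction i, sum F_{i,x} / F_{i,*} over the
    contexts x in which i executed, where F_{i,x} is the number of runs in
    which i executed in context x and F_{i,*} is the total number of times
    i executed across all runs."""
    runs_in_context = defaultdict(Counter)   # instruction -> {context: run count}
    total_executions = Counter()              # instruction -> F_{i,*}
    for graph in graphs:
        seen = set()
        for instruction, context in graph:
            total_executions[instruction] += 1
            seen.add((instruction, context))
        for instruction, context in seen:
            runs_in_context[instruction][context] += 1
    return {i: sum(f_ix / total_executions[i] for f_ix in ctxs.values())
            for i, ctxs in runs_in_context.items()}
```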
One of ordinary skill in the art will recognize that the pseudocode, execution listings, and communication graphs illustrated and discussed above are exemplary only, and that actual embodiments of the present disclosure may be used to find other concurrency errors, for any suitable code listings and/or communication graphs. In some embodiments, other types of errors, such as performance bottlenecks and/or the like, may also be detected using similar systems and/or methods.
While illustrative embodiments have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the claimed subject matter.

Claims (17)

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. A nontransitory computer-readable medium having computer-executable instructions stored thereon that, if executed by one or more processors of a computing device, cause the computing device to perform actions to analyze a set of context-aware communication graphs for debugging, the actions comprising:
creating, by the computing device, a set of aggregate reconstructions based on edges of the set of communication graphs;
ranking, by the computing device, the aggregate reconstructions in order of likelihood of being associated with a failed execution; and
presenting, by the computing device, one or more highly ranked aggregate reconstructions;
wherein edges of the set of communication graphs represent communication events between threads;
wherein nodes of the set of communication graphs each include an instruction address and a context; and
wherein the context represents a sequence of communication events observed by a thread prior to the execution of an instruction at the instruction address regardless of the memory location involved in the sequence of communication events.
2. The computer-readable medium of claim 1, wherein the actions further comprise:
selecting edges of the set of communication graphs for creating aggregate reconstructions based on a correlation of edges of the set of communication graphs with failed executions.
3. The computer-readable medium of claim 2, wherein selecting edges includes determining a correlation for one or more edges of the set of communication graphs with failed executions.
4. The computer-readable medium of claim 3, wherein determining the correlation for an edge of the set of communication graphs with failed executions comprises:
determining a failed execution fraction for the edge;
determining a correct execution fraction for the edge; and
determining a failed frequency ratio based on the failed execution fraction and the correct execution fraction.
5. The computer-readable medium of claim 1, wherein each aggregate reconstruction includes an edge, wherein ranking the aggregate reconstructions includes calculating a score for each aggregate reconstruction, and wherein the score is based on at least one of:
a correlation of the edge of the aggregate reconstruction with failed executions;
a difference in interleaving around the edge between failed executions and correct executions; and
a level of consistency for the aggregate reconstruction.
6. The computer-readable medium of claim 5, wherein the difference in interleaving around the edge between failed executions and correct executions is calculated by:
calculating a number of failed contexts associated with a source node of the edge and a sink node of the edge from failed executions;
calculating a number of correct contexts associated with the source node and the sink node from correct executions; and
calculating a context variation ratio based on the number of failed contexts and the number of correct contexts.
7. The computer-readable medium of claim 5, wherein the level of consistency for the aggregate reconstruction is calculated by:
calculating a first total of confidence values for each prefix node in the aggregate reconstruction;
calculating a second total of confidence values for each body node in the aggregate reconstruction;
calculating a third total of confidence values for each suffix node in the aggregate reconstruction;
calculating a sum of the first, second and third total confidence values; and
dividing the sum by a sum of a total number of prefix nodes, a total number of body nodes, and a total number of suffix nodes.
8. A computing device for detecting concurrency bugs, the device comprising:
at least two processing cores;
at least two cache memories, wherein each cache memory is associated with at least one processing core, and wherein each cache memory is associated with coherence logic;
a coherence interconnect communicatively coupled to each of the cache memories; and
a communication graph data store;
wherein the coherence logic is configured to add edges to a communication graph stored in the communication graph data store based on coherence messages transmitted on the coherence interconnect;
wherein edges of the communication graph represent communication events between threads;
wherein nodes of the communication graph each include an instruction address and a context; and
wherein the context represents a sequence of communication events observed by a thread prior to the execution of an instruction at the instruction address regardless of the memory location involved in the sequence of communication events.
9. The computing device of claim 8, wherein each cache memory includes a plurality of cache lines, each cache line including metadata associated with a last write to the cache line.
10. The computing device of claim 9, wherein the metadata includes a writer instruction address.
11. The computing device of claim 10, wherein the metadata includes a writer context.
12. The computing device of claim 11, wherein the metadata further includes a timestamp.
13. The computing device of claim 8, wherein each processing core includes a context register.
14. The computing device of claim 8, wherein the coherence logic is configured according to an MESI cache coherence protocol.
15. The computing device of claim 14, wherein the MESI cache coherence protocol includes:
a read reply that includes a writer context and a writer instruction address of an associated cache line; and
an invalidate reply that includes a writer context and a writer instruction address of an associated cache line.
16. The computing device of claim 15,
wherein the coherence logic is configured to add an edge to a communication graph stored in the communication graph data store upon detecting a read reply;
wherein the edge includes a source node and a sink node;
wherein the source node includes the writer context and the writer instruction address of the read reply; and
wherein the sink node includes a reader instruction and a context of a thread that caused a cache miss associated with the read reply.
17. The computing device of claim 15,
wherein the coherence logic is configured to add an edge to a communication graph stored in the communication graph data store upon detecting an invalidate reply;
wherein the edge includes a source node and a sink node;
wherein the source node includes the writer context and the writer instruction address of the invalidate reply; and
wherein the sink node includes a writer instruction and a context of a thread that caused the invalidate request to be generated.
US13/312,844 2010-12-06 2011-12-06 Systems and methods for finding concurrency errors Expired - Fee Related US8832659B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US13/312,844 US8832659B2 (en) 2010-12-06 2011-12-06 Systems and methods for finding concurrency errors
US14/464,673 US9146737B2 (en) 2010-12-06 2014-08-20 Systems and methods for finding concurrency errors

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US42018510P 2010-12-06 2010-12-06
US13/312,844 US8832659B2 (en) 2010-12-06 2011-12-06 Systems and methods for finding concurrency errors

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/464,673 Division US9146737B2 (en) 2010-12-06 2014-08-20 Systems and methods for finding concurrency errors

Publications (2)

Publication Number Publication Date
US20120144372A1 US20120144372A1 (en) 2012-06-07
US8832659B2 true US8832659B2 (en) 2014-09-09

Family

ID=46163492

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/312,844 Expired - Fee Related US8832659B2 (en) 2010-12-06 2011-12-06 Systems and methods for finding concurrency errors
US14/464,673 Expired - Fee Related US9146737B2 (en) 2010-12-06 2014-08-20 Systems and methods for finding concurrency errors

Family Applications After (1)

Application Number Title Priority Date Filing Date
US14/464,673 Expired - Fee Related US9146737B2 (en) 2010-12-06 2014-08-20 Systems and methods for finding concurrency errors

Country Status (1)

Country Link
US (2) US8832659B2 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9626283B1 (en) * 2013-03-06 2017-04-18 Ca, Inc. System and method for automatically assigning a defect to a responsible party
US10089661B1 (en) * 2016-12-15 2018-10-02 Amazon Technologies, Inc. Identifying software products to test
US10268558B2 (en) * 2017-01-13 2019-04-23 Microsoft Technology Licensing, Llc Efficient breakpoint detection via caches
US10740220B2 (en) 2018-06-27 2020-08-11 Microsoft Technology Licensing, Llc Cache-based trace replay breakpoints using reserved tag field bits
US10747643B2 (en) * 2018-05-23 2020-08-18 Dropbox, Inc. System for debugging a client synchronization service
US10970193B2 (en) 2018-05-23 2021-04-06 Dropbox, Inc. Debugging a client synchronization service

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2795466A4 (en) * 2011-12-21 2015-09-23 Intel Corp Methods and systems to identify and reproduce concurrency violations in multi-threaded programs
US9098941B2 (en) 2012-01-23 2015-08-04 Ayasdi, Inc. Systems and methods for graphical layout
US8830254B2 (en) * 2012-01-24 2014-09-09 Ayasdi, Inc. Systems and methods for graph rendering
US9135139B2 (en) * 2012-06-27 2015-09-15 Intel Corporation Methods and systems to identify and reproduce concurrency violations in multi-threaded programs using expressions
US9690737B2 (en) * 2012-07-31 2017-06-27 Hewlett Packard Enterprise Development Lp Systems and methods for controlling access to a shared data structure with reader-writer locks using multiple sub-locks
US9117021B2 (en) 2013-03-14 2015-08-25 Intel Corporation Methods and apparatus to manage concurrent predicate expressions
US9158640B2 (en) 2013-11-01 2015-10-13 International Business Machines Corporation Tightly-coupled context-aware irritator thread creation for verification of microprocessors
US10212116B2 (en) * 2015-09-29 2019-02-19 International Business Machines Corporation Intelligently condensing transcript thread history into a single common reduced instance
US9830270B2 (en) * 2015-11-25 2017-11-28 GM Global Technology Operations LLC Optimized memory layout through data mining
US10599551B2 (en) * 2016-08-12 2020-03-24 The University Of Chicago Automatically detecting distributed concurrency errors in cloud systems
US10599552B2 (en) * 2018-04-25 2020-03-24 Futurewei Technologies, Inc. Model checker for finding distributed concurrency bugs
CN110955603B (en) * 2019-12-03 2023-06-06 望海康信(北京)科技股份公司 Automated testing method, apparatus, electronic device and computer readable storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7028119B2 (en) * 2001-04-04 2006-04-11 Wind River Systems, Inc. Automated tool for detection of potential race condition
US6851075B2 (en) * 2002-01-04 2005-02-01 International Business Machines Corporation Race detection for parallel software
US7516446B2 (en) * 2002-06-25 2009-04-07 International Business Machines Corporation Method and apparatus for efficient and precise datarace detection for multithreaded object-oriented programs
US7366956B2 (en) * 2004-06-16 2008-04-29 Hewlett-Packard Development Company, L.P. Detecting data races in multithreaded computer programs
US7716645B2 (en) * 2005-06-10 2010-05-11 International Business Machines Corporation Using atomic sets of memory locations
US9268666B2 (en) * 2005-10-21 2016-02-23 Undo Ltd. System and method for debugging of computer programs
US7926046B2 (en) * 2005-12-13 2011-04-12 Soorgoli Ashok Halambi Compiler method for extracting and accelerator template program
GB2443277B (en) * 2006-10-24 2011-05-18 Advanced Risc Mach Ltd Performing diagnostics operations upon an asymmetric multiprocessor apparatus
US9086969B2 (en) * 2007-12-12 2015-07-21 F5 Networks, Inc. Establishing a useful debugging state for multithreaded computer program
US9454460B2 (en) * 2010-07-23 2016-09-27 The Trustees Of Columbia University In The City Of New York Methods, systems, and media for providing determinism in multithreaded programs
US8813038B2 (en) * 2011-02-09 2014-08-19 Microsoft Corporation Data race detection

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050183094A1 (en) * 1998-10-02 2005-08-18 Microsoft Corporation Tools and techniques for instrumenting interfaces of units of a software program
US7984429B2 (en) * 1998-10-02 2011-07-19 Microsoft Corporation Tools and techniques for instrumenting interfaces of units of a software program
US7346486B2 (en) * 2004-01-22 2008-03-18 Nec Laboratories America, Inc. System and method for modeling, abstraction, and analysis of software
US7784035B2 (en) * 2005-07-05 2010-08-24 Nec Laboratories America, Inc. Method for the static analysis of concurrent multi-threaded software
US8201142B2 (en) * 2006-09-29 2012-06-12 Microsoft Corporation Description language for structured graphs
US7844959B2 (en) * 2006-09-29 2010-11-30 Microsoft Corporation Runtime optimization of distributed execution graph
US20080097941A1 (en) * 2006-10-19 2008-04-24 Shivani Agarwal Learning algorithm for ranking on graph data
US7865778B2 (en) * 2007-02-20 2011-01-04 International Business Machines Corporation Method and system for detecting synchronization errors in programs
US20080201629A1 (en) * 2007-02-20 2008-08-21 International Business Machines Corporation Method and system for detecting synchronization errors in programs
US20090113399A1 (en) * 2007-10-24 2009-04-30 Rachel Tzoref Device, System and Method of Debugging Computer Programs
US8356287B2 (en) * 2007-10-24 2013-01-15 International Business Machines Corporation Device, system and method of debugging computer programs
US20090125887A1 (en) * 2007-11-14 2009-05-14 Nec Laboratories America, Inc. System and method for generating error traces for concurrency bugs
US8527976B2 (en) * 2007-11-14 2013-09-03 Nec Laboratories America, Inc. System and method for generating error traces for concurrency bugs
US20090217303A1 (en) * 2008-02-27 2009-08-27 Accenture Global Services Gmbh Test script transformation analyzer with change guide engine
US20130014090A1 (en) * 2009-01-15 2013-01-10 International Business Machines Corporation Weighted Code Coverage Tool
US8578342B2 (en) * 2009-07-14 2013-11-05 International Business Machines Corporation Fault detection and localization in dynamic software applications requiring user inputs and persistent states
US20110083123A1 (en) * 2009-10-05 2011-04-07 Microsoft Corporation Automatically localizing root error through log analysis

Non-Patent Citations (15)

* Cited by examiner, † Cited by third party
Title
Burckhardt, S., et al., "A Randomized Scheduler With Probabilistic Guarantees of Finding Bugs," Proceedings of the 15th Annual Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '10), Pittsburgh, Pa., Mar. 13-17, 2010, 12 pages.
Flanagan, C., and S.N. Freund, "The RoadRunner Dynamic Analysis Framework for Concurrent Programs," Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE '10), Toronto, Jun. 5-6, 2010, 8 pages.
Flanagan, C., et al., "Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs," Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '08), Tucson, Ariz., Jun. 7-13, 2008, 11 pages.
Hammer, C., et al., "Dynamic Detection of Atomic-Set-Serializability Violations," 30th International Conference on Software Engineering (ICSE '08), Leipzig, Germany, May 10-18, 2008, 10 pages.
Kononenko, I., "Estimating Attributes: Analysis and Extensions of RELIEF," Proceedings of the European Conference on Machine Learning (ECML-94), Catania, Italy, Apr. 6-8, 1994, 12 pages.
Liblit, B.R., "Cooperative Bug Isolation," doctoral dissertation, University of California, Berkeley, Fall 2004, 172 pages.
Lu, S., et al., "AVIO: Detecting Atomicity Violations via Access Interleaving Invariants," Proceedings of the 12th International Conference of Architectural Support for Programming Languages and Operating Systems (ASPLOS '06), San Jose, Calif., Oct. 21-25, 2006, 12 pages.
Lucia, B., and L. Ceze, "Finding Concurrency Bugs with Context-Aware Communication Graphs," Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '09), New York, Dec. 12-16, 2009, 11 pages.
Luk, C-K., et al., "Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation," Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '05), Chicago, Jun. 11-15, 2005, 11 pages.
Musuvathi, M., et al., "Finding and Reproducing Heisenbugs in Concurrent Programs," Proceedings of the 8th USENIX Conference on Operating Systems Designs and Implementation (OSDI '08), San Diego, Dec. 8-10, 2008, pp. 267-280.
Park, S., et al., "CTrigger: Exposing Atomicity Violation Bugs from Their Hiding Places," Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '09), Washington, D.C., Mar. 7-11, 2009, 12 pages.
Park, S., et al., "Falcon: Fault Localization in Concurrent Programs," Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE '10), Cape Town, South Africa, May 2-8, 2010, vol. 1, pp. 245-254.
Shi, Y., et al., "Do I Use the Wrong Definition? DeFuse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs," Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA/SPLASH '10), Reno, Nev., Oct. 17-21, 2010, 15 pages.
Xu, M., et al., "A Serializability Violation Detector for Shared-Memory Server Programs," Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '05), Chicago, Jun. 11-15, 2005, 14 pages.
Yu, J., and S. Narayanasamy, "A Case for an Interleaving Constrained Shared-Memory Multi-Processor," Proceedings of the 36th Annual International Symposium on Computer Architecture (ISCA '09), Austin, Tex., Jun. 20-24, 2009, 11 pages.

Also Published As

Publication number Publication date
US20140359577A1 (en) 2014-12-04
US9146737B2 (en) 2015-09-29
US20120144372A1 (en) 2012-06-07

Similar Documents

Publication Publication Date Title
US9146737B2 (en) Systems and methods for finding concurrency errors
Xu et al. A serializability violation detector for shared-memory server programs
US9092253B2 (en) Instrumentation of hardware assisted transactional memory system
US8949671B2 (en) Fault detection, diagnosis, and prevention for complex computing systems
Park et al. Falcon: fault localization in concurrent programs
Lucia et al. Isolating and understanding concurrency errors using reconstructed execution fragments
Lucia et al. Atom-aid: Detecting and surviving atomicity violations
Perkovic et al. Online data-race detection via coherency guarantees
US20070250820A1 (en) Instruction level execution analysis for debugging software
Lucia et al. Finding concurrency bugs with context-aware communication graphs
Vlachos et al. ParaLog: Enabling and accelerating online parallel monitoring of multithreaded applications
US20090077540A1 (en) Atomicity Violation Detection Using Access Interleaving Invariants
US9081628B2 (en) Detecting potential access errors in a multi-threaded application
US20110029819A1 (en) System and method for providing program tracking information
US20110320745A1 (en) Data-scoped dynamic data race detection
Atachiants et al. Parallel performance problems on shared-memory multicore systems: taxonomy and observation
CN110431536B (en) Implementing breakpoints across an entire data structure
Schmitz et al. DataRaceOnAccelerator–a micro-benchmark suite for evaluating correctness tools targeting accelerators
Pereira et al. Testing for race conditions in distributed systems via smt solving
Yang et al. Histlock+: precise memory access maintenance without lockset comparison for complete hybrid data race detection
Jeffrey et al. Understanding bloom filter intersection for lazy address-set disambiguation
Dolz et al. Enabling semantics to improve detection of data races and misuses of lock‐free data structures
Jiang et al. Optimistic shared memory dependence tracing (T)
Metzger et al. User-guided dynamic data race detection
Taheri ANALYSIS AND DEBUGGING TOOLS FOR CONCURRENT PROGRAMS

Legal Events

Date Code Title Description
AS Assignment

Owner name: UNIVERSITY OF WASHINGTON THROUGH ITS CENTER FOR CO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CEZE, LUIS;LUCIA, BRANDON;SIGNING DATES FROM 20120207 TO 20120208;REEL/FRAME:027703/0860

AS Assignment

Owner name: NATIONAL SCIENCE FOUNDATION, VIRGINIA

Free format text: CONFIRMATORY LICENSE;ASSIGNOR:UNIVERSITY OF WASHINGTON CENTER FOR COMMERCIALIZATION;REEL/FRAME:028521/0117

Effective date: 20120217

AS Assignment

Owner name: NATIONAL SCIENCE FOUNDATION, VIRGINIA

Free format text: CONFIRMATORY LICENSE;ASSIGNOR:UNIVERSITY OF WASHINGTON;REEL/FRAME:034914/0957

Effective date: 20120217

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.)

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20180909