US20020174415A1 - System and method for debugging distributed software environments - Google Patents

System and method for debugging distributed software environments

Info

Publication number
US20020174415A1
US20020174415A1 (application US09/885,456)
Authority
US
United States
Prior art keywords
event
debugging
coordination
component
coordinator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US09/885,456
Inventor
Kenneth Hines
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
CONSYSTANT DESIGN TECHNOLOGIES Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CONSYSTANT DESIGN TECHNOLOGIES Inc
Priority to US09/885,456
Assigned to CONSYSTANT DESIGN TECHNOLOGIES, INC. (Assignors: HINES, KENNETH J.)
Publication of US20020174415A1
Assigned to INTEL CORPORATION (Assignors: CONSYSTANT DESIGN TECHNOLOGIES, INC.)
Legal status: Abandoned

Classifications

    • G — PHYSICS; G06F — Electric digital data processing
    • G06F 8/36 — Software reuse (under G06F 8/30, creation or generation of source code; G06F 8/00, arrangements for software engineering)
    • G06F 11/3608 — Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G06F 11/3612 — Software analysis for verifying properties of programs by runtime analysis
    • G06F 11/3632 — Software debugging of specific synchronisation aspects
    • G06F 11/3636 — Software debugging by tracing the execution of the program
    • G06F 11/3664 — Environments for testing or debugging software (under G06F 11/36, preventing errors by testing or debugging software; G06F 11/00, error detection, error correction, monitoring)
    • G06F 9/4488 — Object-oriented execution paradigms
    • G06F 9/465 — Distributed object oriented systems
    • G06F 9/54 — Interprogram communication
    • G06F 9/547 — Remote procedure calls [RPC]; Web services (under G06F 9/46, multiprogramming arrangements; G06F 9/00, arrangements for program control)

Definitions

  • The present invention relates to a system and method for debugging distributed software environments and, in particular, to a coordination-centric approach in which a distributed software environment produces event traces to be analyzed by a debugging host.
  • A system design and programming methodology is most effective when it is tightly integrated with its corresponding debugging techniques.
  • the relationship between debugging approaches and design methodologies has traditionally been one-sided in favor of the design and programming methodologies.
  • Design and programming methodologies are typically developed without any consideration for the debugging techniques that will later be applied to the resulting software systems. While typical debugging approaches attempt to exploit features provided by the design and programming methodologies, the debugging techniques normally have little or no influence on those features in the first place.
  • This lack of input from debugging approaches to design and programming methodologies serves to maintain the role of debugging as an afterthought, even though in a typical system design, debugging consumes a majority of the design time.
  • the need remains for a design and programming methodology that reflects input from, and consideration of, potential debugging approaches in order to enhance the design and reduce the implementation time of software systems.
  • Packaging refers to the set of interfaces a software element presents to other elements in a system.
  • Software packaging has many forms in modern methodologies. Some examples are programming language procedure call interfaces (as with libraries), TCP/IP socket interfaces with scripting languages (as with mail and Web servers), and file formats.
  • One common packaging style is based on object-oriented programming languages and provides procedure-based (method-based) packaging for software elements (objects within this framework). These procedure-based packages allow polymorphism (in which several types of objects can have identical interfaces) through subtyping, and code sharing through inheritance (deriving a new class of objects from an already existing class of objects).
  • an object's interface is defined by the object's methods.
  • Object-oriented approaches are useful in designing concurrent systems (systems with task-level parallelism and multiple processing resources) because of the availability of active objects (objects with a thread of control). Common concurrent object-oriented approaches include actor languages and concurrent Eiffel.
  • CORBA — common object request broker architecture
  • IDL — interface description language
  • RPC — remote procedure call
  • An element's package is implemented as a set of co-routines that can be adapted for use with applications through adapters with interfaces complementary to the interface for the software element. These adapters can be application-specific, used only when the elements are composed into a system.
  • Co-routines let a designer specify transactions, or sequences of events, as part of an interface, rather than just atomic events.
  • Co-routines must be executed in lock-step, meaning a transition in one co-routine corresponds to a transition in the other. If there is an error in one, or if an expected event is lost, the interface fails: its context is wrong for recovering from the lost event, and the co-routines fall out of sync.
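The lock-step requirement can be illustrated with a small sketch using Python generators as co-routines; the event names and the driver function are illustrative, not from the patent:

```python
def sender():
    """Co-routine that emits a two-event transaction."""
    yield "request"
    yield "data"

def receiver():
    """Complementary co-routine that expects the same sequence; it fails
    if an event arrives out of order or is lost."""
    for expected in ("request", "data"):
        event = yield
        assert event == expected, f"out of sync: wanted {expected}, got {event}"

def run_lockstep(tx, rx):
    """Drive both co-routines in lock-step: every transition in tx
    corresponds to exactly one transition in rx."""
    next(rx)                    # prime the receiving co-routine
    delivered = []
    try:
        for event in tx:
            delivered.append(event)
            rx.send(event)
    except StopIteration:       # rx completed its transaction
        pass
    return delivered

assert run_lockstep(sender(), receiver()) == ["request", "data"]
```

If the first event were dropped, every later `send` would find the receiver in the wrong context, mirroring the out-of-sync failure mode described above.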
  • Coordination within the context of this application, means the predetermined ways through which software components interact. In a broader sense, coordination refers to a methodology for composing concurrent components into a complete system. This use of the term coordination differs slightly from the use of the term in the parallelizing compiler literature, in which coordination refers to a technique for maintaining program-wide semantics for a sequential program decomposed into parallel subprograms.
  • Coordination languages are usually a class of tuple-space programming languages, such as Linda.
  • a tuple is a data object containing two or more types of data that are identified by their tags and parameter lists.
  • coordination occurs through the use of tuple spaces, which are global multisets of tagged tuples stored in shared memory.
  • Tuple-space languages extend existing programming languages by adding six operators: out, in, read, eval, inp, and readp. The out operator places a tuple into tuple space, in fetches and removes a matching tuple, and read fetches a matching tuple without removing it. Each of these three operators blocks until its operation is complete. The out operator creates tuples containing a tag and several arguments. Procedure calls can be included in the arguments, but since out blocks, the calls must be performed and their results stored in the tuple before the operator can return.
  • eval, inp, and readp are nonblocking versions of out, in, and read, respectively. They increase the expressive power of tuple-space languages.
  • eval is the nonblocking version of out. Instead of evaluating all arguments of the tuple before returning, it spawns a thread to evaluate them, creating, in effect, an active tuple (whereas tuples created by out are passive).
  • the results are stored in a passive tuple and left in tuple space.
  • the eval call returns immediately, so that several active tuples can be left outstanding.
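The operators above can be sketched in a few dozen lines. This is a minimal, illustrative model only: it matches tuples by tag alone, whereas real Linda systems match full templates of tags and argument types.

```python
import threading

class TupleSpace:
    """Minimal tuple-space sketch: a shared multiset of tagged tuples."""
    def __init__(self):
        self._tuples = []
        self._cv = threading.Condition()

    def out(self, tag, *args):
        """Place a passive tuple; arguments are already evaluated."""
        with self._cv:
            self._tuples.append((tag,) + args)
            self._cv.notify_all()

    def _find(self, tag):
        for t in self._tuples:
            if t[0] == tag:
                return t
        return None

    def in_(self, tag):
        """Blocking fetch-and-remove ('in' is a Python keyword)."""
        with self._cv:
            while (t := self._find(tag)) is None:
                self._cv.wait()
            self._tuples.remove(t)
            return t

    def read(self, tag):
        """Blocking fetch without removal."""
        with self._cv:
            while (t := self._find(tag)) is None:
                self._cv.wait()
            return t

    def inp(self, tag):
        """Nonblocking in: returns None when no tuple matches."""
        with self._cv:
            t = self._find(tag)
            if t is not None:
                self._tuples.remove(t)
            return t

    def readp(self, tag):
        """Nonblocking read."""
        with self._cv:
            return self._find(tag)

    def eval(self, tag, fn, *args):
        """Spawn a thread to evaluate fn, creating an active tuple; the
        result is stored as a passive tuple when the thread finishes."""
        threading.Thread(target=lambda: self.out(tag, fn(*args))).start()

ts = TupleSpace()
ts.out("count", 42)
assert ts.readp("count") == ("count", 42)
ts.eval("square", lambda x: x * x, 7)
assert ts.in_("square") == ("square", 49)  # blocks until the spawned thread stores the result
```

Note how eval returns immediately while its thread runs, so several active tuples can be left outstanding, exactly the behavior described above.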
  • Tuple-space coordination can be used in concise implementations of many common interaction protocols. Unfortunately, tuple-space languages do not separate coordination issues from programming issues. Consider the annotated Linda implementation of RPC in Listing 1.
  • a tuple space can require large quantities of dynamically allocated memory.
  • Most systems, and especially embedded systems, must operate within predictable, and sometimes small, memory budgets.
  • Tuple-space systems are usually not suitable for coordination in systems that must operate within small predictable memory requirements because once a tuple has been generated, it remains in tuple space until it is explicitly removed or the software element that created it terminates. Maintaining a global tuple space can be very expensive in terms of overall system performance. Although much work has gone into improving the efficiency of tuple-space languages, system performance remains worse with tuple-space languages than with message-passing techniques.
  • This type of formalism can be provided by fixed coordination models in which the coordination style is embodied in an entity and separated from computational concerns. Synchronous coordination models coordinate activity through relative schedules. Typically, these approaches require the coordination protocol to be manually constructed in advance. In addition, computational elements must be tailored to the coordination style used for a particular system (which may require intrusive modification of the software elements).
  • the present invention provides a coordination-centric debugging approach and programming methodology to facilitate the debugging of distributed software environments.
  • This approach includes using “cooperative execution,” in which the software produces event traces to be analyzed by a debugging host.
  • a distributed software environment contains multiple processing elements, which are loaded with software, as well as sensors, displays, and other hardware.
  • these processing elements contain software programs that generate corresponding event records in response to selected events.
  • a software program includes components, runtime systems, coordinators, interfaces, and runtime debugging architectures.
  • the coordinator manages control and data flow interactions between components, and the interface between the coordinator and a component facilitates the exposure of an event.
  • the runtime system of a processing element collects the event record and transfers it to the runtime debugging architecture.
  • the runtime debugging architecture facilitates the transfer of the event record from the processing element to a debugging host, along a communication channel. The communication channel facilitates either a direct or an indirect connection between the processing element and the debugging host.
  • the runtime system of a processing element sends an event record to the runtime debugging architecture, which in turn connects to the debugging host and transfers the event record to the debugging host, which queues all event records and writes them to disk.
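The collect-and-forward path can be sketched as follows. The class names, record fields, and trace-file format here are illustrative assumptions, not the patent's actual architecture; a real runtime debugging architecture would transfer records over a direct or indirect communication channel rather than an in-process call.

```python
import json, queue

class DebuggingHost:
    """Host side: queue incoming event records and write them to disk."""
    def __init__(self, path):
        self.path = path
        self.records = queue.Queue()

    def receive(self, record):
        self.records.put(record)

    def flush_to_disk(self):
        with open(self.path, "a") as f:
            while not self.records.empty():
                f.write(json.dumps(self.records.get()) + "\n")

class RuntimeDebuggingArchitecture:
    """Forwards event records from the processing element to the host."""
    def __init__(self, host):
        self.host = host

    def transfer(self, record):
        self.host.receive(record)

class RuntimeSystem:
    """Per-processing-element runtime: collects each event record and
    hands it to the runtime debugging architecture."""
    def __init__(self, element_id, rda):
        self.element_id = element_id
        self.rda = rda

    def on_event(self, event):
        self.rda.transfer({"element": self.element_id, "event": event})

host = DebuggingHost("trace.log")
rt = RuntimeSystem("pe0", RuntimeDebuggingArchitecture(host))
rt.on_event("mode_change")
host.flush_to_disk()
```

Queuing on the host before writing decouples record arrival from disk latency, which matters when many processing elements report concurrently.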
  • an event record is routed from a primary runtime debugging architecture to an intermediate processing element that collects the event record and transfers it to the debugging host.
  • a processing element may include a flash driver that interfaces with a flash memory.
  • the processing element's runtime system sends the event records to a flash driver that facilitates the collection and storage of event records in a flash memory for subsequent distributed debugging (i.e., “post-mortem” debugging).
  • Each event record is time-stamped and causality-stamped by the runtime debugging architecture before it is transferred to the debugging host, the intermediate processing element, or flash memory.
  • Time-stamping and causality-stamping facilitate coordination-centric debugging by stamping each event with a time in which the event was generated and with an identification of the cause of the event, allowing designers to debug a complex distributed software environment without having to guess at the proper event ordering.
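Causality-stamping of this kind is commonly realized with vector clocks, as in the vector timestamps of FIG. 32. The sketch below is a standard vector-clock implementation, offered as an illustration of how such stamps determine event ordering; the patent does not prescribe this exact scheme.

```python
class VectorClock:
    """Vector clock for one process in an n-process system."""
    def __init__(self, pid, n):
        self.pid = pid
        self.clock = [0] * n

    def local_event(self):
        """Tick the local component; return a stamp for the event record."""
        self.clock[self.pid] += 1
        return list(self.clock)

    def on_receive(self, sender_stamp):
        """Merge the sender's stamp (component-wise max), then tick."""
        self.clock = [max(a, b) for a, b in zip(self.clock, sender_stamp)]
        self.clock[self.pid] += 1
        return list(self.clock)

def happened_before(a, b):
    """True when stamp a causally precedes stamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
send_stamp = p0.local_event()          # [1, 0]
recv_stamp = p1.on_receive(send_stamp) # [1, 1]
assert happened_before(send_stamp, recv_stamp)
```

When neither `happened_before(a, b)` nor `happened_before(b, a)` holds, the two events are concurrent, which is precisely the distinction a designer needs to avoid guessing at event ordering.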
  • any event made visible to the runtime system of a distributed software environment is a candidate for recording.
  • a distributed software environment contains instrumented distributed code, which produces the events to be captured by the runtime system.
  • The code includes information about how different software components interact. Instrumentation may consist of inserting event recording calls at each significant source line, which can cause a probe effect in the software; this effect can be minimized by selectively focusing the instrumentation. Also, events often occur in specific, predetermined, partially ordered sequences; in some instances, a token representing the sequence may be recorded, rather than every single event in the sequence, leaving the debugging host to expand the token.
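The token scheme can be sketched as a simple lookup shared between the instrumented target and the debugging host. The token name and event names below are invented for illustration:

```python
# Known event sequences, keyed by a short token; both the instrumented
# target and the debugging host hold this table.
KNOWN_SEQUENCES = {
    "RPC_CALL": ["marshal_args", "send_request", "recv_reply", "unmarshal"],
}

def instrumented_emit(trace, sequence_token):
    """Target side: record one token instead of every event in the
    sequence, reducing the probe effect of instrumentation."""
    trace.append(("token", sequence_token))

def host_expand(trace):
    """Debugging-host side: expand tokens back into full event sequences,
    passing ordinary events through unchanged."""
    events = []
    for kind, value in trace:
        if kind == "token":
            events.extend(KNOWN_SEQUENCES[value])
        else:
            events.append(value)
    return events

trace = []
instrumented_emit(trace, "RPC_CALL")
assert host_expand(trace) == ["marshal_args", "send_request", "recv_reply", "unmarshal"]
```

The target thus pays for one record where it would otherwise pay for four, while the host recovers the full sequence offline.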
  • the coordination-centric debugging approach makes complex distributed software environments more debuggable.
  • The debugging approach of the present invention facilitates the debugging of complex distributed systems and lets a designer isolate the cause, or causes, of unexpected system performance without having to look at or modify any of the underlying source code, avoiding the time-consuming and error-prone process of source-level debugging.
  • FIG. 1 is a component in accordance with the present invention.
  • FIG. 2 is the component of FIG. 1 further having a set of coordination interfaces.
  • FIG. 3A is a prior art round-robin resource allocation protocol with a centralized controller.
  • FIG. 3B is a prior art round-robin resource allocation protocol implementing a token-passing scheme.
  • FIG. 4A is a detailed view of a component and a coordination interface connected to the component for use in round-robin resource allocation in accordance with the present invention.
  • FIG. 4B depicts a round-robin coordinator in accordance with the present invention.
  • FIG. 5 shows several typical ports for use in a coordination interface in accordance with the present invention.
  • FIG. 6A is a unidirectional data transfer coordinator in accordance with the present invention.
  • FIG. 6B is a bidirectional data transfer coordinator in accordance with the present invention.
  • FIG. 6C is a state unification coordinator in accordance with the present invention.
  • FIG. 6D is a control state mutex coordinator in accordance with the present invention.
  • FIG. 7 is a system for implementing subsumption resource allocation having components, a shared resource, and a subsumption coordinator.
  • FIG. 8 is a barrier synchronization coordinator in accordance with the present invention.
  • FIG. 9 is a rendezvous coordinator in accordance with the present invention.
  • FIG. 10 depicts a dedicated RPC system having a client, a server, and a dedicated RPC coordinator coordinating the activities of the client and the server.
  • FIG. 11 is a compound coordinator with both preemption and round-robin coordination for controlling the access of a set of components to a shared resource.
  • FIG. 12A is a software system with two data transfer coordinators, each having constant message consumption and generation rules, each connected to a separate data-generating component, and both connected to the same data-receiving component.
  • FIG. 12B is the software system of FIG. 12A in which the two data transfer coordinators have been replaced with a merged data transfer coordinator.
  • FIG. 13 is a system implementing a first come, first served resource allocation protocol in accordance with the present invention.
  • FIG. 14 is a system implementing a multiclient RPC coordination protocol formed by combining the first come, first served protocol of FIG. 13 with the dedicated RPC coordinator of FIG. 10.
  • FIG. 15 depicts a large system in which the coordination-centric design methodology can be employed having a wireless device interacting with a cellular network.
  • FIG. 16 shows a top-level view of the behavior and components of a system for a cell phone.
  • FIG. 17A is a detailed view of a GUI component of the cell phone of FIG. 16.
  • FIG. 17B is a detailed view of a call log component of the cell phone of FIG. 16.
  • FIG. 18A is a detailed view of a voice subsystem component of the cell phone of FIG. 16.
  • FIG. 18B is a detailed view of a connection component of the cell phone of FIG. 16.
  • FIG. 19 depicts the coordination layers between a wireless device and a base station, and between the base station and a switching center, of FIG. 15.
  • FIG. 20 depicts a cell phone call management component, a master switching center call management component, and a call management coordinator connecting the respective call management components.
  • FIG. 21A is a detailed view of a transport component of the connection component of FIG. 18B.
  • FIG. 21B is a CDMA data modulator of the transport component of FIG. 18B.
  • FIG. 22 is a detailed view of a typical TDMA and a typical CDMA signal for the cell phone of FIG. 16.
  • FIG. 23A is a LCD touch screen component for a Web browser GUI for a wireless device.
  • FIG. 23B is a Web page formatter component for the Web browser GUI for the wireless device.
  • FIG. 24A is a completed GUI system for a handheld Web browser.
  • FIG. 24B shows the GUI system for the handheld Web browser combined with the connection subsystem of FIG. 18B in order to access the cellular network of FIG. 15.
  • FIG. 25 is a typical space/time diagram with space represented on a vertical axis and time represented on a horizontal axis.
  • FIG. 26 is a space/time diagram depicting a set of system events and two different observations of those system events.
  • FIG. 27 is a space/time diagram depicting a set of system events and an ideal observation of the events taken by a real-time observer.
  • FIG. 28 is a space/time diagram depicting two different yet valid observations of a system execution.
  • FIG. 29 is a space/time diagram depicting a system execution and an observation of that execution taken by a discrete Lamport observer.
  • FIG. 30 is a space/time diagram depicting a set of events that each include a Lamport timestamp.
  • FIG. 31 is a space/time diagram illustrating the insufficiency of scalar timestamps to characterize causality between events.
  • FIG. 32 is a space/time diagram depicting a set of system events that each include a vector time stamp.
  • FIG. 33 depicts a display from a partial order event tracer (POET).
  • POET — partial order event tracer
  • FIG. 34 is a space/time diagram depicting two compound events that are neither causal nor concurrent.
  • FIG. 35 is a POET display of two convex event clusters.
  • FIG. 36 is a basis for distributed event environments (BEE) abstraction facility for a single client.
  • BEE — basis for distributed event environments
  • FIG. 37 is a hierarchical tree construction of process clusters.
  • FIG. 38A depicts a qualitative measure of cohesion and coupling between a set of process clusters that have heavy communication or are instantiated from the same source code.
  • FIG. 38B depicts a qualitative measure of cohesion and coupling between a set of process clusters that do not have heavy communication or are not instances of the same source code.
  • FIG. 38C depicts a qualitative measure of cohesion and coupling between an alternative set of process clusters that have heavy communication or are instantiated from the same source code.
  • FIG. 39 depicts a consistent and an inconsistent cut of a system execution on a space/time diagram.
  • FIG. 40A is a space/time diagram depicting a system execution.
  • FIG. 40B is a lattice representing all possible consistent cuts of the space/time diagram of FIG. 40A.
  • FIG. 40C is a graphical representation of the possible consistent cuts of FIG. 40B.
  • FIG. 41A is a space/time diagram depicting a system execution.
  • FIG. 41B is the space/time diagram of FIG. 41A after performing a global-step.
  • FIG. 41C is the space/time diagram of FIG. 41A after performing a step-over.
  • FIG. 41D is the space/time diagram of FIG. 41A after performing a step-in.
  • FIG. 42 is a space/time diagram depicting a system that is subject to a domino effect whenever the system is rolled back in time to a checkpoint.
  • FIG. 43 depicts the coordination-centric debugging approach in accordance with the present invention.
  • FIG. 44 is a detailed view of a direct connection between the primary processing element and the debugging host in accordance with the present invention.
  • FIG. 45 is a detailed view of an indirect connection between the primary processing element and the debugging host in accordance with the present invention.
  • FIG. 46 depicts capturing event records in flash memory for post-mortem distributed debugging in accordance with the present invention.
  • FIG. 47 shows how flash memory may be allocated.
  • FIG. 48 shows a distributed software environment being executed on a hardware platform and placement of a probe for monitoring bus traces on the platform and for generating event records.
  • FIG. 1 is an example of a component 100 , which is the basic software element within the coordination-centric design framework, in accordance with the present invention.
  • component 100 contains a set of modes 102 .
  • Each mode 102 corresponds to a specific behavior associated with component 100 .
  • Each mode 102 can either be active or inactive, respectively enabling or disabling the behavior corresponding to that mode 102 .
  • Modes 102 can make the conditional aspects of the behavior of component 100 explicit.
  • the behavior of component 100 is encapsulated in a set of actions 104 , which are discrete, event-triggered behavioral elements within the coordination-centric design methodology.
  • Component 100 can be copied and the copies of component 100 can be modified, providing the code-sharing benefits of inheritance.
  • Actions 104 are enabled and disabled by modes 102 , and hence can be thought of as effectively being properties of modes 102 .
  • An event (not shown) is an instantaneous condition, such as a timer tick, a data departure or arrival, or a mode change.
  • Actions 104 can activate and deactivate modes 102 , thereby selecting the future behavior of component 100 . This is similar to actor languages, in which methods are allowed to replace an object's behavior.
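The mode/action relationship described above can be sketched as a small event dispatcher. The method names and the timer example are illustrative, not the patent's API:

```python
class Component:
    """Sketch of a coordination-centric component: modes gate actions,
    and actions may activate or deactivate modes, selecting the
    component's future behavior."""
    def __init__(self):
        self.modes = set()        # currently active modes
        self.actions = {}         # event -> (enabling mode, handler)

    def on(self, event, mode, handler):
        """Register an action, enabled only while its mode is active."""
        self.actions[event] = (mode, handler)

    def dispatch(self, event):
        """Run the action for an event only when its enabling mode is active."""
        mode, handler = self.actions.get(event, (None, None))
        if handler and mode in self.modes:
            handler(self)

# Example: timer ticks only count while the 'running' mode is active;
# a 'stop' event deactivates that mode, disabling the tick action.
c = Component()
c.count = 0
c.modes.add("running")
c.on("tick", "running", lambda self: setattr(self, "count", self.count + 1))
c.on("stop", "running", lambda self: self.modes.discard("running"))
c.dispatch("tick")
c.dispatch("stop")
c.dispatch("tick")        # ignored: 'running' is no longer active
assert c.count == 1
```

The `stop` handler replacing the component's future behavior by deactivating a mode parallels the actor-language behavior replacement mentioned above.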
  • FIG. 2 is component 100 further including a first coordination interface 200 , a second coordination interface 202 , and a third coordination interface 204 .
  • Coordination-centric design's components 100 provide the code-sharing capability of object-oriented inheritance through copying. Another aspect of object-oriented inheritance is polymorphism through shared interfaces. In object-oriented languages, an object's interface is defined by its methods. Although coordination-centric design's actions 104 are similar to methods in object-oriented languages, they do not define the interface for component 100 . Components interact through explicit and separate coordination interfaces, in this figure coordination interfaces 200 , 202 , and 204 .
  • the shape of coordination interfaces 200 , 202 , and 204 determines the ways in which component 100 may be connected within a software system.
  • the way coordination interfaces 200 , 202 , and 204 are connected to modes 102 and actions 104 within component 100 determines how the behavior of component 100 can be managed within a system. Systemwide behavior is managed through coordinators (see FIG. 4B and subsequent).
  • coordination refers to the predetermined ways by which components interact.
  • One simple protocol for resource allocation is round-robin: participants are lined up, and the resource is given to each participant in turn. After the last participant is served, the resource is given back to the first. There is a resource-scheduling period during which each participant gets the resource exactly once, whether or not it is needed.
  • FIG. 3A is a prior art round-robin resource allocation protocol with a centralized controller 300, which keeps track of the shared resource (not shown) and distributes it to each of software elements 302, 304, 306, 308, and 310 in turn.
  • controller 300 alone determines which software element 302 , 304 , 306 , 308 , or 310 is currently allowed to use the resource and which has it next.
  • This implementation of a round-robin protocol permits software elements 302 , 304 , 306 , 308 , and 310 to be modular, because only controller 300 keeps track of the software elements.
  • When this implementation runs on a distributed architecture (not shown), controller 300 must typically be placed on a single processing element (not shown). As a result, all coordination requests must go through that processing element, which can cause a communication performance bottleneck. For example, consider the situation in which software elements 304 and 306 are implemented on a first processing element (not shown) and controller 300 is implemented on a second processing element. Software element 304 releases the shared resource and must send a message indicating this to controller 300. Controller 300 must then send a message to software element 306 to inform it that it now has the right to the shared resource.
  • the shared resource must remain idle, even though both the current resource holder and the next resource holder (software elements 304 and 306 respectively) are implemented on the first processing element (not shown).
  • the shared resource must typically remain idle until communication can take place and controller 300 can respond. This is an inefficient way to control access to a shared resource.
  • FIG. 3B is a prior art round-robin resource allocation protocol implementing a token passing scheme.
  • this system consists of a shared resource 311 and a set of software elements 312 , 314 , 316 , 318 , 320 , and 322 .
  • a logical token 324 symbolizes the right to access resource 311 , i.e., when a software element holds token 324 , it has the right to access resource 311 .
  • When one of software elements 312, 314, 316, 318, 320, or 322 finishes with resource 311, it passes token 324, and with it the access right, to a successor.
  • This implementation can be distributed without a centralized controller, but as shown in FIG. 3B, this is less modular, because it requires each software element in the set to keep track of a successor.
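The modularity cost of the prior-art token-passing scheme is that every element must hold a reference to its successor, as this sketch makes explicit (names are illustrative):

```python
class Element:
    """Token-passing round-robin (the prior-art scheme of FIG. 3B):
    each element must keep track of its own successor."""
    def __init__(self, name):
        self.name = name
        self.successor = None
        self.has_token = False

    def use_and_pass(self, resource_log):
        assert self.has_token, "may access the resource only while holding the token"
        resource_log.append(self.name)     # access the shared resource
        self.has_token = False
        self.successor.has_token = True    # pass the access right on

elems = [Element(n) for n in ("A", "B", "C")]
for e, succ in zip(elems, elems[1:] + elems[:1]):
    e.successor = succ                     # every element tracks a successor
elems[0].has_token = True

log = []
for _ in range(2):                         # two full scheduling periods
    for e in elems:
        e.use_and_pass(log)
assert log == ["A", "B", "C", "A", "B", "C"]
```

Because the successor references live inside the elements themselves, changing the ring topology means modifying the elements, which is exactly what the coordinator-based approach below avoids.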
  • the coordination-centric design methodology provides an encapsulating formalism for coordination.
  • Components such as component 100 interact using coordination interfaces, such as first, second, and third coordination interfaces 200 , 202 , and 204 , respectively.
  • Coordination interfaces preserve component modularity while exposing any parts of a component that participate in coordination. This technique of connecting components provides polymorphism in a similar fashion to subtyping in object-oriented languages.
  • FIG. 4A is a detailed view of a component 400 and a resource access coordination interface 402 connected to component 400 for use in a round-robin coordination protocol in accordance with the present invention.
  • resource access coordination interface 402 facilitates implementation of a round-robin protocol that is similar to the token-passing round-robin protocol described above.
  • Resource access coordination interface 402 has a single bit of control state, called access, which is shown as an arbitrated control port 404 that indicates whether or not component 400 is holding a virtual token (not shown).
  • Component 400 can only use a send message port 406 on access coordination interface 402 when arbitrated control port 404 is true.
  • Access coordination interface 402 further has a receive message port 408 .
  • FIG. 4B shows a round-robin coordinator 410 in accordance with the present invention.
  • round-robin coordinator 410 has a set of coordinator coordination interfaces 412 for connecting to a set of components 400 .
  • Each component 400 includes a resource access coordination interface 402 .
  • Each coordinator coordination interface 412 has a coordinator arbitrated control port 414 , an incoming send message port 416 and an outgoing receive message port 418 .
  • Coordinator coordination interface 412 is complementary to resource access coordination interface 402 , and vice versa, because the ports on the two interfaces are compatible and can function to transfer information between the two interfaces.
  • The round-robin protocol requires round-robin coordinator 410 to manage the coordination topology.
  • Round-robin coordinator 410 is an instance of more general abstractions called coordination classes, in which coordination classes define specific coordination protocols and a coordinator is a specific implementation of the coordination class.
  • Round-robin coordinator 410 contains all information about how components 400 are supposed to coordinate. Although round-robin coordinator 410 can have a distributed implementation, no component 400 is required to keep references to any other component 400 (unlike the distributed round-robin implementation shown in FIG. 3B). All required references are maintained by round-robin coordinator 410 itself, and components 400 do not even need to know that they are coordinating through round-robin.
  • Resource access coordination interface 402 can be used with any coordinator that provides the appropriate complementary interface.
  • A coordinator's design is independent of whether it is implemented on a distributed platform or on a monolithic single-processor platform.
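Purely as an illustration of the centralized token management described above, the following Python sketch keeps the ring order inside the coordinator alone, so that no component holds a reference to any other. The class and method names are assumptions, not part of the specification.

```python
# Hypothetical sketch of a round-robin coordinator: the coordinator alone
# tracks the ring order, so components hold no references to one another.
class RoundRobinCoordinator:
    def __init__(self, component_ids):
        self.ring = list(component_ids)   # coordinator-owned topology
        self.holder = 0                   # index of the current token holder

    def has_access(self, component_id):
        # Mirrors the single bit of "access" control state per interface.
        return self.ring[self.holder] == component_id

    def release(self, component_id):
        # When the holder finishes, the coordinator grants the successor.
        if not self.has_access(component_id):
            raise RuntimeError("component does not hold the token")
        self.holder = (self.holder + 1) % len(self.ring)

coord = RoundRobinCoordinator(["c1", "c2", "c3"])
coord.release("c1")   # token passes from c1 to c2
```

Note that, as in the text, a component only ever interacts with the coordinator; the successor relationship lives entirely inside the coordinator.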
  • Coordination interfaces are used to connect components to coordinators. They are also key to a variety of useful runtime debugging techniques. Coordination interfaces support component modularity by exposing all parts of the component that participate in the coordination protocol. Ports are elements of coordination interfaces, as are guarantees and requirements, each of which will be described in turn.
  • A port is a primitive connection point for interconnecting components.
  • Each port is a five-tuple (T; A; Q; D; R) in which:
  • T represents the data type of the port.
  • T can be one of int, boolean, char, byte, float, double, or cluster, in which cluster represents a cluster of data types (e.g., an int followed by a float followed by two bytes).
  • A is a boolean value that is true if the port is arbitrated and false otherwise.
  • Q is an integer greater than zero that represents logical queue depth for a port.
  • D is one of in, out, inout, or custom and represents the direction data flows with respect to the port.
  • R is one of discard-on-read, discard-on-transfer, or hold and represents the policy for data removal on the port.
  • Discard-on-read indicates that data is removed immediately after it is read (and any data in the logical queue are shifted).
  • Discard-on-transfer indicates that data is removed from a port immediately after being transferred to another port.
  • Hold indicates that data should be held until it is overwritten by another value. Hold is subject to arbitration.
  • Custom directionality allows designers to specify ports that accept or generate only certain specific values. For example, a designer may want a port that allows other components to activate, but not deactivate, a mode. While many combinations of port attributes are possible, we normally encounter only a few. The three most common are message ports (output or input), state ports (output, input, or both; sometimes arbitrated), and control ports (a type of state port).
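For illustration, the five-tuple port definition (T; A; Q; D; R) can be sketched as a Python record. The Port type and the two helper constructors for the message-port flavors described later are hypothetical, not part of the specification.

```python
# A minimal sketch of the five-tuple port (T; A; Q; D; R); field names
# follow the text, while the Port type itself is an assumption.
from collections import namedtuple

Port = namedtuple("Port", ["T", "A", "Q", "D", "R"])

def send_port(dtype):
    # Send message port: (T; false; 1; out; discard-on-transfer)
    return Port(T=dtype, A=False, Q=1, D="out", R="discard-on-transfer")

def receive_port(dtype, depth):
    # Receive message port: (T; false; Q; in; discard-on-read)
    return Port(T=dtype, A=False, Q=depth, D="in", R="discard-on-read")

p = receive_port("int", 4)   # a receive port with a queue of depth 4
```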
  • FIG. 5 illustrates the visual syntax used for several common ports throughout this application. With reference to FIG. 5, this figure depicts an exported state port 502 , an imported state port 504 , an arbitrated state port 506 , an output data port 508 , and an input data port 510 .
  • Message ports (output and input data ports 508 and 510 , respectively) are either send (T; false; 1; out; discard-on-transfer) or receive (T; false; Q; in; discard-on-read). Their function is to transfer data between components. Data passed to a send port is transferred immediately to the corresponding receive port, thus it cannot be retrieved from the send port later. Receive data ports can have queues of various depths. Data arrivals on these ports are frequently used to trigger and pass data parameters into actions. Values remain on receive ports until they are read.
  • State ports take one of three forms:
  • State ports such as exported state port 502 , imported state port 504 , and arbitrated state port 506 , hold persistent values, and the value assigned to a state port may be arbitrated. This means that, unlike message ports, values remain on the state ports until changed.
  • (An arbitration coordinator is not shown.)
  • Control ports are similar to state ports, but a control port is limited to having the boolean data type. Control ports are typically bound to modes. Actions interact with a control port indirectly, by setting and responding to the values of a mode that is bound to the control port.
  • Arbitrated control port 404 shown in FIG. 4A is a control port that can be bound to a mode (not shown) containing all actions that send data on a shared channel.
  • When arbitrated control port 404 is false, the mode is inactive, disabling all actions that send data on the channel.
  • Guarantees are formal declarations of invariant properties of a coordination interface. There can be several types of guarantees, such as timing guarantees between events, guarantees between control state (e.g., state A and state B are guaranteed to be mutually exclusive), etc.
  • Although a coordination interface's guarantees reflect properties of the component to which the coordination interface is connected, the guarantees are not physically bound to any internal portions of the component. Guarantees can often be certified through static analysis of the software system. Guarantees are meant to cache various properties that are inherent in a component or a coordinator in order to simplify static analysis of the software system.
  • A guarantee is a promise provided by a coordination interface.
  • The guarantee takes the form of a predicate promised to be invariant.
  • Guarantees can include any type of predicate (e.g., x > 3, in which x is an integer-valued state port, or t_ea − t_eb < 2 ms).
  • Typically, guarantees will be only event-ordering guarantees (guarantees that specify acceptable orders of events) or control-relationship guarantees (guarantees pertaining to acceptable relative component behaviors).
  • A requirement is a formal declaration of the properties necessary for correct software system functionality.
  • An example of a requirement is a required response time for a coordination interface, or the number of messages that must have arrived at the coordination interface before the coordination interface can transmit, or fire, the messages.
  • When two coordination interfaces are connected, the requirements of the first coordination interface must be conservatively matched by the guarantees of the second coordination interface (e.g., x < 7 as a guarantee conservatively matches x < 8 as a requirement).
  • Guarantees are not physically bound to anything within the component itself. Guarantees can often be verified to be sufficient for the correct operation of the software system in which the component is used.
  • A requirement is a predicate on a first coordination interface that must be conservatively matched with a guarantee on a complementary second coordination interface.
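The conservative-matching rule can be illustrated under the simplifying assumption that both predicates are upper bounds of the form x < c, as in the example above; the function name and representation are hypothetical.

```python
# Illustrative sketch of conservative matching for upper-bound predicates
# of the form x < c: a guarantee x < g conservatively matches a
# requirement x < r whenever g <= r, because everything the guarantee
# permits also satisfies the requirement.
def conservatively_matches(guarantee_bound, requirement_bound):
    return guarantee_bound <= requirement_bound

ok = conservatively_matches(7, 8)    # guarantee x < 7 vs requirement x < 8
bad = conservatively_matches(9, 8)   # guarantee x < 9 is too weak for x < 8
```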
  • A coordination interface is a four-tuple (P; G; R; I) in which:
  • P is a set of named ports.
  • G is a set of named guarantees provided by the interface.
  • R is a set of named requirements that must be matched by guarantees of connected interfaces.
  • I is a set of named coordination interfaces.
  • Coordinator coordination interface 412 , shown in FIG. 4B and used for round-robin coordination, is called AccessInterface and is defined in Table 1.
  • A recursive coordination interface descriptor is a five-tuple (Pa; Ga; Ra; Id; Nd) in which:
  • Pa is a set of abstract ports, which are ports that may be incomplete in their attributes (i.e., they do not yet have a datatype).
  • Ga is a set of abstract guarantees, which are guarantees between abstract ports.
  • Ra is a set of abstract requirements, which are requirements between abstract ports.
  • Id is a set of coordination interface descriptors.
  • Allowing coordination interfaces to contain other coordination interfaces is a powerful feature. It lets designers use common coordination interfaces as complex ports within other coordination interfaces. For example, the basic message ports described above are nonblocking, but we can build a blocking coordination interface (not shown) that serves as a blocking port by combining a wait state port with a message port.
  • A coordinator provides the concrete representations of intercomponent aspects of a coordination protocol. Coordinators allow a variety of static analysis debugging methodologies for software systems created with the coordination-centric design methodology.
  • A coordinator contains a set of coordination interfaces and defines the relationships among the coordination interfaces. The coordination interfaces complement the component coordination interfaces provided by components operating within the protocol. Through matched interface pairs, coordinators effectively describe connections between message ports, correlations between control states, and transactions between components.
  • Round-robin coordinator 410 , shown in FIG. 4B, must ensure that only one component 400 has its component control port 404 's value, or its access bit, set to true. Round-robin coordinator 410 must further ensure that the correct component 400 has its component control port 404 set to true for the chosen sequence.
  • This section presents formal definitions of the parts that make up coordinators: modes, actions, bindings, action triples, and constraints. These definitions culminate in a formal definition of coordinators.
  • A mode is a boolean value that can be used as a guard on an action.
  • The mode is most often bound to a control port in a coordination interface for the coordinator.
  • The modes of concern are bound to a coordinator control port 414 of each coordinator coordination interface 412 .
  • An action is a primitive behavioral element that can:
  • Actions can range in complexity from simple operations up to complicated pieces of source code.
  • An action in a coordinator is called a transparent action because the effects of the action can be precomputed and the internals of the action are completely exposed to the coordination-centric design tools.
  • Bindings connect input ports to output ports, control ports to modes, state ports to variables, and message ports to events. Bindings are transparent and passive. Bindings are simply conduits for event notification and data transfer. When used for event notification, bindings are called triggers.
  • An action must be enabled by a mode and triggered by an event.
  • The combination of a mode, trigger, and action is referred to as an action triple, which is a triple (m; t; a) in which:
  • m is a mode.
  • t is a trigger.
  • a is an action.
  • The trigger is a reference to an event type, but it can be used to pass data into the action.
  • Action triples are written as mode : trigger : action.
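As a sketch only, the enable-and-trigger semantics of action triples can be illustrated in Python: an action fires only when its enabling mode is active and its trigger event arrives. The tuple representation and function name are assumptions.

```python
# Hypothetical evaluator for action triples (mode; trigger; action):
# an action runs only if its mode is active and its trigger matches
# the incoming event.
def run_triples(triples, modes, event):
    fired = []
    for mode, trigger, action in triples:
        if modes.get(mode) and trigger == event:
            fired.append(action())
    return fired

modes = {"access": True}          # only the "access" mode is active
triples = [
    ("access", "tick", lambda: "send"),
    ("idle",   "tick", lambda: "wait"),   # not enabled: mode inactive
]
result = run_triples(triples, modes, "tick")
```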
  • A coordinator's actions are usually either pure control, in which both the trigger and the action performed affect only control state, or pure data, in which both the trigger and the action performed occur in the data domain.
  • Constraints are boolean relationships between control ports. They take the form:
  • A constraint differs from a guarantee in that the guarantee is limited to communicating invariant relationships between components without providing a way to enforce the invariant relationship.
  • The constraint, in contrast, is a set of instructions to the runtime system dealing with how to enforce certain relationships between components.
  • When a constraint is violated, two corrective actions are available to the system: (1) modify the values on the left-hand side to make the left-hand expression evaluate as false (an effect termed backpressure) or (2) alter the right-hand side to make it true.
  • These corrective actions are termed left-hand modify (LHM) and right-hand modify (RHM), respectively.
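The two corrective policies can be sketched for a simple implication constraint between two control ports; the dictionary state representation and function name are assumptions, not the specification's notation.

```python
# Sketch of enforcing an implication constraint lhs -> rhs between two
# control ports, with the two corrective policies named in the text:
# left-hand modify (LHM, "backpressure") and right-hand modify (RHM).
def enforce(state, lhs, rhs, policy):
    if state[lhs] and not state[rhs]:     # constraint violated
        if policy == "LHM":
            state[lhs] = False            # backpressure: falsify the LHS
        elif policy == "RHM":
            state[rhs] = True             # alter the RHS to make it true
    return state

s1 = enforce({"c": True, "d": False}, "c", "d", "LHM")
s2 = enforce({"c": True, "d": False}, "c", "d", "RHM")
```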
  • Round-robin coordinator 410 has a set of safety constraints to ensure that there is never more than one token in the system:
  • Coordinators can be hierarchically composed.
  • A coordinator is a six-tuple (I; M; B; N; A; X) in which:
  • I is a set of coordination interfaces.
  • M is a set of modes.
  • B is a set of bindings between interface elements (e.g., control ports and message ports) and internal elements (e.g., modes and triggers).
  • N is a set of constraints between interface elements.
  • A is a set of action triples for the coordinator.
  • X is a set of subcoordinators.
  • FIGS. 6A, 6B, 6C, and 6D show a few simple coordinators highlighting the bindings and constraints of the respective coordinators.
  • A unidirectional data transfer coordinator 600 transfers data in one direction between two components (not shown) by connecting incoming receive message port 408 to outgoing receive message port 418 with a binding 602 .
  • A bidirectional data transfer coordinator 604 transfers data back and forth between two components (not shown) by connecting incoming receive message port 408 to outgoing receive message port 418 with binding 602 and connecting send message port 406 to incoming send message port 416 with a second binding 602 .
  • Unidirectional data transfer coordinator 600 and bidirectional data transfer coordinator 604 simply move data from one message port to another. Thus, each coordinator consists of bindings between corresponding ports on separate coordination interfaces.
  • A state unification coordinator 606 ensures that a state port a 608 and a state port b 610 are always set to the same value. State unification coordinator 606 connects state port a 608 to state port b 610 with binding 602 .
  • A control state mutex coordinator 612 has a first constraint 618 and a second constraint 620 as follows:
  • Constraints 618 and 620 can be restated as follows:
  • State port c 614 having a true value implies that state port d 616 has a false value.
  • State port d 616 having a true value implies that state port c 614 has a false value.
  • A coordinator has two types of coordination interfaces: up interfaces, which connect the coordinator to a second coordinator at a higher level of the design hierarchy, and down interfaces, which connect the coordinator either to a component or to a third coordinator at a lower level of the design hierarchy. Down interfaces have names preceded with “ ⁇ ”.
  • Round-robin coordinator 410 has six down coordination interfaces (previously referred to as coordinator coordination interface 412 ), with constraints that make the turning off of any coordinator control port 414 (also referred to as access control port) turn on the coordinator control port 414 of the next coordinator coordination interface 412 in line. Table 2 presents all constituents of the round-robin coordinator.
  • This tuple describes an implementation of a round-robin coordination protocol for a particular system with six components, as shown in round-robin coordinator 410 .
  • The coordination class is a six-tuple (Ic; Mc; Bc; Nc; Ac; Xc) in which:
  • Ic is a set of coordination interface descriptors in which each descriptor provides a type of coordination interface and specifies the number of such interfaces allowed within the coordination class.
  • Mc is a set of abstract modes that supplies appropriate modes when a coordination class is instantiated with a fixed number of coordinator coordination interfaces.
  • Bc is a set of abstract bindings that forms appropriate bindings between elements when the coordination class is instantiated.
  • Nc is a set of abstract constraints that ensures appropriate constraints between coordination interface elements are in place as specified at instantiation.
  • Xc is a set of coordination classes (hierarchy).
  • Because a coordinator describes a coordination protocol for a particular application, it requires many aspects, such as the number of coordination interfaces and datatypes, to be fixed.
  • Coordination classes describe protocols across many applications.
  • The use of coordination interface descriptors instead of coordination interfaces lets coordination classes keep the number of interfaces and datatypes undetermined until a particular coordinator is instantiated.
  • A round-robin coordinator contains a fixed number of coordinator coordination interfaces with specific bindings and constraints between the message and state ports on the fixed number of coordinator coordination interfaces.
  • A round-robin coordination class, in contrast, contains descriptors for the coordinator coordination interface type, without stating how many coordinator coordination interfaces, and instructions for building bindings and constraints between ports on the coordinator coordination interfaces when a particular round-robin coordinator is created.
  • A component is a six-tuple (I; A; M; V; S; X) in which:
  • I is a set of coordination interfaces.
  • A is a set of action triples.
  • M is a set of modes.
  • V is a set of typed variables.
  • S is a set of subcomponents.
  • X is a set of coordinators used to connect the subcomponents to each other and to the coordination interfaces.
  • Actions within a coordinator are fairly regular, and hence a large number of actions can be described with a few simple expressions. However, actions within a component are frequently diverse and can require distinct definitions for each individual action. Typically, a component's action triples are represented with a table that has three columns: one for the mode, one for the trigger, and one for the action code. Table 3 shows some example actions from a component that can use round-robin coordination.
  • Mode     Trigger   Action
    access   tick      AccessInterface.s.send(“Test message”); -access;
    access   tick      waitCount++;
  • A component resembles a coordinator in several ways (for example, the modes and coordination interfaces in each are virtually the same).
  • Components can have internal coordinators, and because of the internal coordinators, components do not always require either bindings or constraints.
  • Below, aspects of components are described in greater detail. These aspects of components include variable scope, action transparency, and execution semantics for systems of actions.
  • An action within a component can be either a transparent action or an opaque action.
  • Transparent and opaque actions each have different invocation semantics.
  • The internal properties, i.e., control structures, variables, changes in state, operators, etc., of transparent actions are visible to all coordination-centric design tools.
  • The design tools can separate, observe, and analyze all the internal properties of transparent actions.
  • Opaque actions, in contrast, are source code. Opaque actions must be executed directly, and looking at the internal properties of opaque actions can be accomplished only through traditional, source-level debugging techniques.
  • An opaque action must explicitly declare any mode changes and coordination interfaces that the opaque action may directly affect.
  • An action is triggered by an event, such as data arriving or departing a message port, or changes in value being applied to a state port.
  • An action can change the value of a state port, generate an event, and provide a way for the software system to interact with low-level device drivers. Since actions typically produce events, a single trigger can be propagated through a sequence of actions.
  • A subsumption protocol is a priority-based, preemptive resource allocation protocol commonly used in building small, autonomous robots, in which the shared resource is the robot itself.
  • FIG. 7 shows a set of coordination interfaces and a coordinator for implementing the subsumption protocol.
  • A subsumption coordinator 700 has a set of subsumption coordinator coordination interfaces 702 , which have a subsume arbitrated coordinator control port 704 and an incoming subsume message port 706 .
  • Each subsume component 708 has a subsume component coordination interface 710 .
  • Subsume component coordination interface 710 has a subsume arbitrated component control port 712 and an outgoing subsume message port 714 .
  • Subsumption coordinator 700 and each subsume component 708 are connected by their respective coordination interfaces, 702 and 710 .
  • Each subsumption coordinator coordination interface 702 in subsumption coordinator 700 is associated with a priority.
  • Each subsume component 708 has a behavior that can be applied to a robot (not shown). At any time, any subsume component 708 can attempt to assert its behavior on the robot. The asserted behavior coming from the subsume component 708 connected to the subsumption coordinator coordination interface 702 with the highest priority is the asserted behavior that will actually be performed by the robot.
  • Subsume components 708 need not know anything about other components in the system. In fact, each subsume component 708 is designed to perform independently of whether their asserted behavior is performed or ignored.
  • Subsumption coordinator 700 further has a slave coordinator coordination interface 716 , which has an outgoing slave message port 718 .
  • Outgoing slave message port 718 is connected to an incoming slave message port 720 .
  • Incoming slave message port 720 is part of a slave coordination interface 722 , which is connected to a slave 730 .
  • Slave 730 typically controls the robot.
  • A lower-priority subsume component 708 already using the resource must surrender the resource whenever a higher-priority subsume component 708 tries to access the resource.
  • Subsumption coordination uses preemptive release semantics, whereby each subsume component 708 must always be prepared to relinquish the resource.
  • Table 4 presents the complete tuple for the subsumption coordinator.
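Purely as an illustration of the priority-based, preemptive semantics described above, the following Python sketch forwards only the highest-priority asserted behavior; the class, priorities, and behavior names are hypothetical.

```python
# Hypothetical sketch of subsumption coordination: among all components
# currently asserting a behavior, the coordinator forwards only the one
# on the highest-priority interface (e.g., to the slave driving the robot).
class SubsumptionCoordinator:
    def __init__(self, priorities):
        # priorities: component id -> interface priority (higher wins)
        self.priorities = priorities
        self.asserted = {}

    def assert_behavior(self, component_id, behavior):
        # Preemptive semantics: a higher-priority assertion displaces
        # lower-priority ones immediately.
        self.asserted[component_id] = behavior
        return self.winner()

    def winner(self):
        if not self.asserted:
            return None
        top = max(self.asserted, key=lambda c: self.priorities[c])
        return self.asserted[top]

coord = SubsumptionCoordinator({"wander": 1, "avoid": 2})
coord.assert_behavior("wander", "go-forward")
active = coord.assert_behavior("avoid", "turn-left")
```

As in the text, each component asserts its behavior without knowing about the others; the coordinator alone decides which assertion reaches the resource.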
  • FIG. 8 depicts a barrier synchronization coordinator 800 .
  • Barrier synchronization coordinator 800 has a set of barrier synchronization coordination interfaces 802 , each of which has a coordinator arbitrated state port 804 , named wait.
  • Coordinator arbitrated state port 804 is connected to a component arbitrated state port 806 , which is part of a component coordination interface 808 .
  • Component coordination interface 808 is connected to a component 810 . When all components 810 reach their respective synchronization points, they are all released from waiting.
  • The actions for a barrier synchronization coordinator with n interfaces are:

    ∧_{0≤i<n} wait_i : : ∧_{0≤j<n} −wait_j
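The barrier action above (when every wait port is active, a single action deactivates them all) can be sketched as follows; the list-of-booleans representation of the wait ports is an assumption.

```python
# Sketch of the barrier action: when every wait port is active, one
# action deactivates all of them, releasing every component at once.
def barrier_step(waits):
    # waits: one boolean per coordination interface
    if all(waits):
        return [False] * len(waits)   # all components released together
    return waits                      # someone has not arrived yet

partial = barrier_step([True, False, True])    # barrier not yet full
released = barrier_step([True, True, True])    # everyone released
```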
  • FIG. 9 depicts a rendezvous coordinator 900 in accordance with the present invention.
  • Rendezvous coordinator 900 has a rendezvous coordination interface 902 , which has a rendezvous arbitrated state port 904 .
  • Each of a set of rendezvous components 906 , which may perform different functions or have vastly different actions and modes, has a rendezvous component coordination interface 908 , which includes a component arbitrated state port 910 .
  • Rendezvous components 906 connect to rendezvous coordinator 900 through their respective coordination interfaces, 908 and 902 .
  • Rendezvous coordinator 900 further has a rendezvous resource coordination interface 912 , which has a rendezvous resource arbitrated state port 914 , also called available.
  • A resource 916 has a resource coordination interface 918 , which has a resource arbitrated state port 920 .
  • Resource 916 is connected to rendezvous coordinator 900 by their complementary coordination interfaces, 918 and 912 respectively.
  • In rendezvous-style coordination there are two types of participants: resource 916 and several resource users, here rendezvous components 906 .
  • When resource 916 is available, it activates its resource arbitrated state port 920 , also referred to as its available control port. If there are any waiting rendezvous components 906 , one will be matched with the resource; both participants are then released. This differs from subsumption and round-robin in that resource 916 plays an active role in the protocol by activating its available control port 920 .
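The matching step described above can be sketched as follows; the queue representation of waiting components and the function name are assumptions.

```python
# Hypothetical sketch of rendezvous coordination: waiting components
# queue up, and when the resource raises its "available" control bit,
# one waiter is matched with it and both participants are released.
def rendezvous(waiting, available):
    # waiting: list of component ids; available: resource's control bit
    if available and waiting:
        matched = waiting.pop(0)
        return matched, False      # both participants released
    return None, available         # nothing to match yet

queue = ["c1", "c2"]
matched, avail = rendezvous(queue, True)
```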
  • The actions for rendezvous coordinator 900 are:
  • FIG. 10 depicts a dedicated RPC system.
  • a dedicated RPC coordinator 1000 has an RPC server coordination interface 1002 , which includes an RPC server imported state port 1004 , an RPC server output message port 1006 , and an RPC server input message port 1008 .
  • Dedicated RPC coordinator 1000 is connected to a server 1010 .
  • Server 1010 has a server coordination interface 1012 , which has a server exported state port 1014 , a server input data port 1016 , and a server output data port 1018 .
  • Dedicated RPC coordinator 1000 is connected to server 1010 through their complementary coordination interfaces, 1002 and 1012 respectively.
  • Dedicated RPC coordinator 1000 further has an RPC client coordination interface 1020 , which includes an RPC client imported state port 1022 , an RPC client input message port 1024 , and an RPC client output message port 1026 .
  • Dedicated RPC coordinator 1000 is connected to a client 1028 by connecting RPC client coordination interface 1020 to a complementary client coordination interface 1030 .
  • Client coordination interface 1030 has a client exported state port 1032 , a client output message port 1034 , and a client input message port 1036 .
  • The dedicated RPC protocol is a client/server protocol in which server 1010 is dedicated to a single client, in this case client 1028 .
  • The temporal behavior of this protocol is the most important factor in defining it.
  • The following transaction listing describes this temporal behavior:
  • Client 1028 enters blocked mode by changing the value stored at client exported state port 1032 to true.
  • Client 1028 transmits an argument data message to server 1010 via client output message port 1034 .
  • Server 1010 receives the argument (labeled “a”) data message via server input data port 1016 and enters serving mode by changing the value stored in server exported state port 1014 to true.
  • Server 1010 computes return value.
  • Server 1010 transmits a return (labeled “r”) message to client 1028 via server output data port 1018 and exits serving mode by changing the value stored in server exported state port 1014 to false.
  • Client 1028 receives the return data message via client input message port 1036 and exits blocked mode by changing the value stored at client exported state port 1032 to false.
  • T_RPC ≡ +client.blocked → client.transmits → +server.serving → server.transmits → (−server.serving ∧ client.receives) → −client.blocked
  • The r in server.r.output refers to server output data port 1018 , also labeled as the r event port on the server, and the a in serving.a.input refers to server input data port 1016 , also labeled as the a port on the server (see FIG. 10).
  • Client 1028 must be in blocked mode whenever it sends an argument message.
  • The first predicate takes the same form as a constraint; however, since dedicated RPC coordinator 1000 only imports the client:blocked and server:serving modes (i.e., through RPC client imported state port 1022 and RPC server imported state port 1004 , respectively), dedicated RPC coordinator 1000 is not allowed to alter these values to comply. In fact, none of these predicates is explicitly enforced by a runtime system. However, the last two can be used as requirements and guarantees for interface type-checking.
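The six-step dedicated RPC transaction can be sketched as a trace-producing function; the trace strings follow the +/− mode notation used above, while the function itself and its argument names are hypothetical.

```python
# Sketch of the dedicated RPC transaction, recording the mode changes in
# order so the temporal behavior can be checked against the listing above.
def rpc_call(compute, argument):
    trace = ["+client.blocked"]       # 1. client enters blocked mode
    trace.append("client.transmits")  # 2. client sends the argument
    trace.append("+server.serving")   # 3. server receives, enters serving
    result = compute(argument)        # 4. server computes the return value
    trace.append("server.transmits")  # 5. server sends the return message
    trace.append("-server.serving")   #    ...and exits serving mode
    trace.append("-client.blocked")   # 6. client receives, exits blocked
    return result, trace

value, trace = rpc_call(lambda x: x * 2, 21)
```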
  • Coordination-centric design methodology lets system specifications be executed directly, according to the semantics described above. When components and coordinators are composed into higher-order structures, however, it becomes essential to consider hazards that can affect system behavior. Examples include conflicting constraints, in which local resolution semantics may either leave the system in an inconsistent state or make it cycle forever, and conflicting actions that undo one another's behavior. In the remainder of this section, the effect of composition issues on system-level executions is explained.
  • A configuration is the combined control state of a system, basically the set of active modes at a point in time.
  • a configuration in coordination-centric design is a bit vector containing one bit for each mode in the system. The bit representing a control state is true when the control state is active and false when the control state is inactive. Configurations representing the complete system control state facilitate reasoning on system properties and enable several forms of static analysis of system behavior.
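The bit-vector encoding of a configuration can be sketched as follows; the mode names and helper functions are assumptions, while the one-bit-per-mode encoding follows the text.

```python
# Sketch of a configuration as a bit vector over the system's modes:
# one bit per mode, true when the mode (control state) is active.
MODES = ["access", "blocked", "serving"]   # hypothetical system modes

def configuration(active_modes):
    return [m in active_modes for m in MODES]

def is_active(config, mode):
    return config[MODES.index(mode)]

cfg = configuration({"access", "serving"})
```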
  • Triggers are formal parameters for events. As mentioned earlier, there are two types of triggers: (1) control triggers, invoked by control events such as mode change requests, and (2) data flow triggers, invoked by data events such as message arrivals or departures. Components and coordinators can both request mode changes (on the modes visible to them) and generate new messages (on the message ports visible to them). Using actions, these events can be propagated through the components and coordinators in the system, causing a cascade of data transmissions and mode change requests, some of which can cancel other requests. When the requests, and secondary requests implied by them, are all propagated through the system, any requests that have not been canceled are confirmed and made part of the system's new configuration.
  • Triggers can be immediately propagated through their respective actions or delayed by a scheduling step. Recall that component actions can be either transparent or opaque. Transparent actions typically propagate their triggers immediately, although it is not absolutely necessary that they do so. Opaque actions must always delay propagation.
  • Some triggers can be immediately propagated through actions, but only through certain types of transparent actions. Immediate propagation can often involve static precomputation of the effect of changes, which means that certain actions may never actually be performed. For example, consider a system with a coordinator that has an action that activates mode A and a coordinator with an action that deactivates mode B whenever A is activated. Static analysis can be used to determine in advance that any event that activates A will also deactivate B; therefore, this effect can be executed immediately without actually propagating it through A.
  • Trigger propagation through opaque actions must typically be delayed, since the system cannot look into opaque actions to precompute their results. Propagation may be delayed for other reasons, such as system efficiency. For example, immediate propagation requires tight synchronization among software components. If functionality is spread among a number of architectural components, immediate propagation is impractical.
  • FIG. 11 shows a combined coordinator 1100 with both preemption and round-robin coordination for controlling access to a resource, as discussed above.
  • Components 1102, 1104, 1106, 1108, and 1110 primarily use round-robin coordination, and each includes a component coordination interface 1112, which has a component arbitrated control port 1114 and a component output message port 1116.
  • A preemptor component 1120 is allowed to grab the resource immediately.
  • Preemptor component 1120 has a preemptor component coordination interface 1122 .
  • Preemptor component coordination interface 1122 has a preemptor arbitrated state port 1124 , a preemptor output message port 1126 , and a preemptor input message port 1128 .
  • All component coordination interfaces 1112 and preemptor component coordination interface 1122 are connected to a complementary combined coordinator coordination interface 1130 , which has a coordinator arbitrated state port 1132 , a coordinator input message port 1134 , and a coordinator output message port 1136 .
  • Combined coordinator 1100 is a hierarchical coordinator and internally has a round-robin coordinator (not shown) and a preemption coordinator (not shown).
  • Combined coordinator coordination interface 1130 is connected to a coordination interface to round-robin 1138 and a coordination interface to preempt 1140 .
  • Coordinator arbitrated state port 1132 is bound to both a token arbitrated control port 1142 , which is part of coordination interface to round-robin 1138 , and to a preempt arbitrated control port 1144 , which is part of coordination interface to preempt 1140 .
  • Coordinator input message port 1134 is bound to an interface to round-robin output message port 1146, and coordinator output message port 1136 is bound to an interface to round-robin input message port 1148.
  • Preemption interferes with the normal round-robin ordering of access to the resource.
  • After preemption, the resource moves to the component that, in round-robin-ordered access, would be the successor to preemptor component 1120. If the resource is preempted too frequently, some components may starve.
  • Because triggers can be control-based, data-based, or both, and actions can produce both control and data events, the control and data flow aspects of a system are coupled through actions. Through combinations of actions, designers can effectively employ modal data flow, in which relative schedules are switched on and off based on the system configuration.
  • Relative scheduling is a form of coordination. Recognizing this and understanding how it affects a design can allow a powerful class of optimizations. Many data-centric systems (or subsystems) use conjunctive firing, which means that a component buffers messages until a firing rule is matched. When matching occurs, the component fires, consuming the messages in its buffer that caused it to fire and generating a message or messages of its own. Synchronous data flow systems are those in which all components have only firing rules with constant message consumption and generation.
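  • Conjunctive firing as described above can be sketched as a small model. The class and port names here are illustrative assumptions, not constructs from the specification; the firing rule and production counts are those of component N3 in FIG. 12A.

```python
# Illustrative model of conjunctive firing: a component buffers messages
# and fires only when every input port has accumulated the number of
# messages its firing rule requires.

class Component:
    def __init__(self, firing_rule, production):
        self.rule = firing_rule        # input port -> messages consumed per firing
        self.production = production   # output port -> messages produced per firing
        self.buffers = {port: [] for port in firing_rule}

    def receive(self, port, msg):
        """Buffer a message; fire if the firing rule is now matched."""
        self.buffers[port].append(msg)
        if all(len(self.buffers[p]) >= n for p, n in self.rule.items()):
            for p, n in self.rule.items():              # consume matched messages
                del self.buffers[p][:n]
            return {p: [f"out{i}" for i in range(n)]    # constant production (SDF)
                    for p, n in self.production.items()}
        return None                                     # not enough messages yet

# N3 from FIG. 12A: fires on three messages at port c and two at port d,
# producing two messages on port o.
n3 = Component({"c": 3, "d": 2}, {"o": 2})
for port, msg in [("c", "m1"), ("c", "m2"), ("d", "m3"), ("d", "m4")]:
    n3.receive(port, msg)
print(n3.receive("c", "m5"))   # → {'o': ['out0', 'out1']}
```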
  • FIG. 12A shows a system in which a component N1 1200 is connected to a component N3 1202 by a data transfer coordinator 1204 and a component N2 1206 is connected to component N3 1202 by a second data transfer coordinator 1208 .
  • Component N3 1202 fires when it accumulates three messages on a port c 1210 and two messages on a port d 1212 .
  • When it fires, component N3 1202 produces two messages on a port o 1214.
  • Coordination control state tracks the logical buffer depth for these components. This is shown with numbers representing the logical queue depth of each port in FIG. 12.
  • FIG. 12B shows the system of FIG. 12A in which data transfer coordinator 1204 and second data transfer coordinator 1208 have been merged to form a merged data transfer coordinator 1216 .
  • Merging the coordinators in this example provides an efficient static schedule for component firing.
  • Merged data transfer coordinator 1216 fires component N1 1200 three times and component N2 1206 twice.
  • Merged data transfer coordinator 1216 then fires component N3 1202 twice (to consume all messages produced by component N1 1200 and component N2 1206 ).
  • Message rates can vary based on mode. For example, a component may consume two messages each time it fires in one mode and four each time it fires in a second mode. For a component like this, it is often possible to merge schedules on a configuration basis, in which each configuration has static consumption and production rates for all affected components.
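  • For the system of FIG. 12, the merged static schedule can be derived from synchronous-data-flow balance equations. The sketch below is illustrative; the per-firing production rates of N1 and N2 (two messages each) are assumptions chosen to be consistent with the firing counts stated above (N1 three times, N2 twice, N3 twice).

```python
from fractions import Fraction
from math import lcm

# Illustrative balance-equation computation for the system of FIG. 12.
# edges: (producer, tokens produced per firing, consumer, tokens consumed per firing)
edges = [("N1", 2, "N3", 3),   # port c: N3 consumes 3 per firing
         ("N2", 2, "N3", 2)]   # port d: N3 consumes 2 per firing

# Solve rate(producer) * produced = rate(consumer) * consumed, anchored at N3 = 1.
rates = {"N3": Fraction(1)}
changed = True
while changed:
    changed = False
    for prod, p, cons, c in edges:
        if cons in rates and prod not in rates:
            rates[prod] = rates[cons] * c / p
            changed = True
        elif prod in rates and cons not in rates:
            rates[cons] = rates[prod] * p / c
            changed = True

# Scale to the smallest integer repetition vector.
scale = lcm(*(r.denominator for r in rates.values()))
repetitions = {n: int(r * scale) for n, r in rates.items()}
print(repetitions)   # → {'N3': 2, 'N1': 3, 'N2': 2}
```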
  • Control and communication synthesis can be employed to automatically transform user-specified coordination to a selected set of standard protocols.
  • Designers may have to manually produce transformations for nonstandard protocols.
  • RPC-style coordination often has multiple clients for individual servers. Here, there is no apparent connection between client and server until one is forged for a transaction. After the connection is forged, however, the coordination proceeds in the same fashion as dedicated RPC.
  • FIG. 13 illustrates a first come/first serve (FCFS) resource allocation protocol, which is a protocol that allocates a shared resource to the requester that has waited longest.
  • An FCFS component interface 1300 for this protocol has a request control port 1302, an access control port 1304, and a component outgoing message port 1306.
  • An FCFS coordinator 1308 for this protocol has a set of FCFS interfaces 1310 that are complementary to FCFS component interfaces 1300, each having an FCFS coordinator request control port 1312, an FCFS coordinator access port 1314, and an FCFS coordinator input message port 1316.
  • FCFS coordinator 1308 asserts the appropriate FCFS coordinator access port 1314 , releasing FCFS coordinator request control port 1312 .
  • FCFS coordinator 1308 uses a rendezvous coordinator and two round-robin coordinators. One round-robin coordinator maintains a list of empty slots in which a component may be enqueued, and the other round-robin coordinator maintains a list showing the next component to be granted access.
  • When FCFS coordinator request control port 1312 becomes active, FCFS coordinator 1308 begins a rendezvous access to a binder action. When activated, this action maps the appropriate component 1318 to a position in the round-robin queues. A separate action cycles through one of the queues and selects the next component to access the server.
  • FCFS coordinator 1308 attempts to grant access to resource 1320 to the earliest component 1318 having requested resource 1320 , with concurrent requests determined based on the order in the rendezvous coordinator of the respective components 1318 .
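  • The allocation policy itself reduces to a queue discipline: grant the resource to the longest-waiting requester, breaking concurrent requests by arrival order at the coordinator. A minimal sketch with illustrative names (the actual coordinator is built from rendezvous and round-robin coordinators as described above):

```python
from collections import deque

# Illustrative first come/first serve allocator: the holder of the
# shared resource is always the longest-waiting requester.

class FCFSCoordinator:
    def __init__(self):
        self.queue = deque()     # waiting components, oldest first
        self.holder = None       # component currently granted access

    def request(self, component):
        self.queue.append(component)
        self._grant()

    def release(self):
        self.holder = None
        self._grant()

    def _grant(self):
        if self.holder is None and self.queue:
            self.holder = self.queue.popleft()   # assert the access port

coord = FCFSCoordinator()
coord.request("c1"); coord.request("c2"); coord.request("c3")
print(coord.holder)    # → c1
coord.release()
print(coord.holder)    # → c2
```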
  • FIG. 14 depicts a multiclient RPC coordinator 1400 formed by combining FCFS coordinator 1308 with dedicated RPC coordinator 1000 .
  • A set of clients 1402 have a set of client coordination interfaces 1030, as shown in FIG. 10.
  • Multiclient RPC coordinator 1400 has a set of RPC client coordination interfaces 1020, as shown in FIG. 10.
  • Each RPC client coordination interface 1020 includes an RPC client input message port 1024.
  • Message transfer action 1403 serves to transfer messages between RPC client input message port 1024 and component outgoing message port 1306 .
  • Multiclient RPC coordinator 1400 must negotiate accesses to a server 1404 and keep track of the values returned by server 1404.
  • Monitor modes are modes that exclude all but a selected set of actions called continuations, which are actions that continue a behavior started by another action.
  • Monitor mode entry must be immediate (at least locally), so that no unexpected actions can execute before they are blocked by such a mode.
  • Each monitor mode has a list of actions that cannot be executed when it is entered.
  • The allowed (unlisted) actions are either irrelevant or are continuations of the action that caused entry into this mode. There are other conditions, as well.
  • This mode requires an exception action if forced to exit. However, this exception action is not executed if the monitor mode is turned off locally.
  • Exception actions are a type of continuation. When in a monitor mode, exception actions respond to unexpected events or events that signal error conditions. For example, multiclient RPC coordinator 1400 can bind client.blocked to a monitor mode and set an exception action on +server.serving. This will signal an error whenever the server begins to work when the client is not blocked for a response.
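  • The blocking behavior of a monitor mode and its exception actions can be sketched as follows. The class, the event encoding, and the client.blocked / +server.serving binding are all illustrative assumptions in the spirit of the example above, not constructs from the specification.

```python
# Illustrative model of a monitor mode: while active, all actions except
# a selected set of continuations are blocked, and designated exception
# actions respond to unexpected events.

class MonitorMode:
    def __init__(self, continuations, exceptions):
        self.active = False
        self.continuations = set(continuations)  # actions still allowed
        self.exceptions = dict(exceptions)       # event -> exception action

    def dispatch(self, event, action):
        if self.active and event in self.exceptions:
            return self.exceptions[event](event)         # unexpected event
        if self.active and action.__name__ not in self.continuations:
            return None                                  # blocked by the mode
        return action()                                  # allowed to execute

def handle_reply():
    """A continuation: finishes the behavior that entered the mode."""
    return "reply"

# Sketch of binding client.blocked to a monitor mode with an exception
# action on +server.serving:
mode = MonitorMode(
    continuations={"handle_reply"},
    exceptions={"+server.serving":
                lambda e: f"error: {e} while client not blocked"})
```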
  • FIG. 15 depicts a large-scale example system under the coordination-centric design methodology.
  • The large-scale system is a bimodal digital cellular network 1500.
  • Network 1500 is for the most part a simplified version of a GSM (global system for mobile communications) cellular network. This example shows in greater detail how the parts of coordination-centric design work together and demonstrates a practical application of the methodology.
  • Network 1500 has two different types of cells, a surface cell 1502 (also referred to as a base station 1502 ) and a satellite cell 1504 . These cells are not only differentiated by physical position, but by the technologies they use to share network 1500 .
  • Satellite cells 1504 use a code division multiple access (CDMA) technology, and surface cells 1502 use a time division multiple access (TDMA) technology.
  • Mobile units 1506, or wireless devices, can move between surface cells 1502, requiring horizontal handoffs between surface cells 1502.
  • Several surface cells 1502 are typically connected to a switching center 1508 .
  • Switching center 1508 is typically connected to a telephone network or the Internet 1512 .
  • In addition to handoffs between surface cells 1502, the network must be able to hand off between switching centers 1508.
  • When mobile units 1506 leave the TDMA region, they remain covered by satellite cells 1504 via vertical handoffs between cells. Since vertical handoffs require changing protocols as well as changing base stations and switching centers, they can be complicated in terms of control.
  • Numerous embedded systems comprise the overall system.
  • Switching centers 1508 and base stations (surface cells 1502) are required as part of the network infrastructure, but cellular phones, handheld Web browsers, and other mobile units 1506 may be supported for access through network 1500.
  • This section concentrates on the software systems for two particular mobile units 1506 : a simple digital cellular phone (shown in FIG. 16) and a handheld Web browser (shown in FIG. 24).
  • These examples require a wide variety of coordinators and reusable components.
  • Layered coordination is a feature in each system, because a function of many subsystems is to perform a layered protocol.
  • This example shows how hierarchically constructed components can be applied in a realistic system to help manage the complexity of the overall design.
  • FIG. 16 shows a top-level coordination diagram of the behavior of a cell phone 1600 .
  • cell phone 1600 supports digital encoding of voice streams. Before it can be used, it must be authenticated with a home master switching center (not shown). This authentication occurs through a registered master switch for each phone and an authentication number from the phone itself. There are various authentication statuses, such as full access, grey-listed, or blacklisted. For cell phone 1600 , real-time performance is more important than reliability. A dropped packet is not retransmitted, and a late packet is dropped since its omission degrades the signal less than its late incorporation.
  • Each component of cell phone 1600 is hierarchical.
  • A GUI 1602 lets users enter phone numbers while displaying them and query an address book 1604 and a logs component 1606.
  • Address book 1604 is a database that can map names to phone numbers and vice versa.
  • GUI 1602 uses address book 1604 to help identify callers and to look up phone numbers to be dialed.
  • Logs 1606 track both incoming and outgoing calls as they are dialed.
  • A voice component 1608 digitally encodes and decodes, and compresses and decompresses, an audio signal.
  • A connection component 1610 multiplexes, transmits, receives, and demultiplexes the radio signal and separates out the voice stream and caller identification information.
  • Between connection component 1610 and a clock 1612, and between logs 1606 and connection component 1610, are unidirectional data transfer coordinators 600 as described with reference to FIG. 6A.
  • Between voice component 1608 and connection component 1610, and between GUI 1602 and connection component 1610, are bidirectional data transfer coordinators 604, as described with reference to FIG. 6B.
  • A state unification coordinator 606 is between clock 1612 and GUI 1602.
  • Between GUI 1602 and address book 1604 is a dedicated RPC coordinator 1000 as described with reference to FIG. 10, in which address book 1604 has client 1028 and GUI 1602 has server 1010.
  • There is also a custom GUI/log coordinator 1614 between logs 1606 and GUI 1602.
  • GUI/log coordinator 1614 lets GUI 1602 transfer new logged information through an r output message port 1616 on a GUI coordination interface 1618 to an r input message port 1620 on a log coordination interface 1622 .
  • GUI/log coordinator 1614 also lets GUI 1602 choose current log entries through a pair of c output message ports 1624 on GUI coordination interface 1618 and a pair of c input message ports 1626 on log coordination interface 1622 .
  • Logs 1606 continuously display one entry each for incoming and outgoing calls.
  • FIG. 17A is a detailed view of GUI component 1602 , of FIG. 16.
  • GUI component 1602 has two inner components, a keypad 1700 and a text-based liquid crystal display 1702 , as well as several functions of its own (not shown). Each time a key press occurs, it triggers an action that interprets the press, depending on the mode of the system. Numeric presses enter values into a shared dialing buffer. When a complete number is entered, the contents of this buffer are used to establish a new connection through connection component 1610 . Table 5 shows the action triples for GUI 1602 .
  • Mode        Trigger     Action
    Idle        keypress    numBuffer.append(keypress.val)
                Send        radio.send(numBuffer.val), +outgoingCall
                Disconnect  Nil
                Leftarrow   AddressBook.forward(), +lookupMode
                Rightarrow  log.lastcall(), +outlog
    LookupMode  Leftarrow   AddressBook.forward()
                Rightarrow  AddressBook.backward()
  • An “Addr Coord” coordinator 1704 includes an address book mode (not shown) in which arrow key presses are transformed into RPC calls.
  • FIG. 17B is a detailed view of logs component 1606 , which tracks all incoming and outgoing calls.
  • Both GUI component 1602 and connection component 1610 must communicate with logs component 1606 through specific message ports.
  • Those specific message ports include a transmitted number message port 1720, a received number message port 1722, a change current received message port 1724, a change current transmitted message port 1726, and two state ports 1728 and 1729 for presenting the current received and current transmitted values, respectively.
  • Logs component 1606 contains two identical single-log components: a send log 1730 for outgoing calls and a receive log 1740 for incoming calls.
  • The interface of logs component 1606 is connected to the individual log components by a pair of adapter coordinators, Adap1 1750 and Adap2 1752.
  • Adap1 1750 has an adapter receive interface 1754 , which has a receive imported state port 1756 and a receive output message port 1758 .
  • Adap1 1750 further has an adapter send interface 1760 , which has a send imported state port 1762 and a send output message port 1764 .
  • State port 1728 is bound to receive imported state port 1756.
  • Change current received message port 1724 is bound to receive output message port 1758.
  • Received number message port 1722 is bound to a received interface output message port 1766 on a received number coordination interface 1768.
  • Change current transmitted message port 1726 is bound to send output message port 1764.
  • State port 1729 is bound to send imported state port 1762.
  • FIG. 18A is a detailed view of voice component 1608 of FIG. 16.
  • Voice component 1608 has a compression component 1800 for compressing digitized voice signals before transmission, a decompression component 1802 for decompressing received digitized voice signals, and interfaces 1804 and 1806 to analog transducers (not shown) for digitizing sound to be transmitted and for converting received transmissions into sound.
  • Voice component 1608 is a pure data flow component. It contains a sound generator 1808, which functions as both a white-noise generator and a ring tone generator and has a separate port for each on sound generator interface 1810, as well as voice compression functionality in the form of compression component 1800 and decompression component 1802.
  • FIG. 18B is a detailed view of connection component 1610 of FIG. 16.
  • Connection component 1610 coordinates with voice component 1608, logs component 1606, clock 1612, and GUI 1602.
  • Connection component 1610 is responsible for coordinating the behavior of cell phone 1600 with a base station that owns the surface cell 1502 (shown in FIG. 15), a switching center 1508 (shown in FIG. 15), and all other phones (not shown) within surface cell 1502.
  • Connection component 1610 must authenticate users, establish connections, and perform handoffs as needed—including appropriate changes in any low-level protocols (such as a switch from TDMA to CDMA).
  • FIG. 19 depicts a set of communication layers between connection component 1610 of cell phone 1600 and base station 1502 or switching center 1508 .
  • Connection component 1610 has several subcomponents, or lower-level components, each of which coordinates with an equivalent, or peer, layer on either base station 1502 or switching center 1508.
  • The subcomponents of connection component 1610 include a cell phone call manager 1900, a cell phone mobility manager 1902, a cell phone radio resource manager 1904, a cell phone link protocol manager 1906, and a cell phone transport manager 1908, which is responsible for coordinating access to and transferring data through the shared airwaves using TDMA and CDMA coordination.
  • Each subcomponent will be described in detail including how each fits into the complete system.
  • Base station 1502 has a call management coordinator 1910, a mobility management coordinator 1912, a radio resource coordinator 1914 (BSSMAP 1915), a link protocol coordinator 1916 (SCCP 1917), and a transport coordinator 1918 (MTP 1919).
  • Switching center 1508 has a switching center call manager 1920 , a switching center mobility manager 1922 , a BSSMAP 1924 , a SCCP 1926 , and an MTP 1928 .
  • FIG. 20 is a detailed view of a call management layer 2000 consisting of cell phone call manager 1900 , which is connected to switching center call manager 1920 by call management coordinator 1910 .
  • call management layer 2000 coordinates the connection between cell phone 1600 and switching center 1508 .
  • Call management layer 2000 is responsible for dialing, paging, and talking.
  • Call management layer 2000 is always present in cell phone 1600 , though not necessarily in Internet appliances (discussed later).
  • Cell phone call manager 1900 includes a set of modes (not shown) for call management coordination that consists of the following modes:
  • Cell phone call manager 1900 has a cell phone call manager interface 2002 .
  • Cell phone call manager interface 2002 has a port corresponding to each of the above modes.
  • The standby mode is bound to a standby exported state port 2010.
  • The dialing mode is bound to a dialing exported state port 2012.
  • The RingingRemote mode is bound to a RingingRemote imported state port 2014.
  • The Ringing mode is bound to a ringing imported state port 2016.
  • The CallInProgress mode is bound to a CallInProgress arbitrated state port 2018.
  • Switching center call manager 1920 includes the following modes (not shown) for call management coordination at the switching center:
  • Switching center call manager 1920 has a switching center call manager coordination interface 2040 , which includes a port for each of the above modes within switching center call manager 1920 .
  • Switching center 1508 creates a new switching center call manager and establishes a call management coordinator 1910 between cell phone 1600 and switching center call manager 1920.
  • A mobility management layer authenticates mobile unit 1506 or cell phone 1600.
  • Mobility manager 1902 contacts the switching center 1508 for surface cell 1502 and transfers a mobile unit identifier (not shown) for mobile unit 1506 to switching center 1508.
  • Switching center 1508 looks up a home master switching center for mobile unit 1506 and establishes a set of permissions assigned to mobile unit 1506.
  • This layer also acts as a conduit for the call management layer.
  • The mobility management layer performs handoffs between base stations 1502 and switching centers 1508 based on information received from the radio resource layer.
  • radio resource manager 1904 chooses the target base station 1502 and tracks changes in frequencies, time slices, and CDMA codes. Cell phones may negotiate with up to 16 base stations simultaneously. This layer also identifies when handoffs are necessary.
  • The link layer manages a connection between cell phone 1600 and base station 1502.
  • Link protocol manager 1906 packages data for transfer to base station 1502 from cell phone 1600.
  • FIG. 21A is a detailed view of transport component 1908 of connection component 1610 .
  • Transport component 1908 has two subcomponents, a receive component 2100 for receiving data and a transmit component 2102 for transmitting data.
  • Each of these subcomponents has two parallel data paths: a CDMA path 2104 and a TDMA/FDMA path 2106, for communicating in the respective network protocols.
  • FIG. 21B is a detailed view of a CDMA modulator 2150 , which implements a synchronous data flow data path.
  • CDMA modulator 2150 takes the dot-product of an incoming data signal along path 2152 and a stored modulation code for cell phone 1600 along path 2154. The modulation code is a sequence of chips, which are measured time signals having a value of −1 or +1.
  • Transport component 1908 uses CDMA and TDMA technologies to coordinate access to a resource shared among several cell phones 1600 , i.e., the airwaves.
  • The CDMA and TDMA technologies of transport component 1908 supersede the FDMA technologies (e.g., AM and FM) used for analog cellular phones and for radio and television broadcasts.
  • In FDMA, a signal is encoded for transmission by modulating it with a carrier frequency.
  • A signal is decoded by demodulation after being passed through a band-pass filter to remove other carrier frequencies.
  • Each base station 1502 has a set of frequencies chosen to minimize interference between adjacent cells. (The area covered by a cell may be much smaller than the net range of the transmitters within it.)
  • TDMA coordinates access to the airwaves through time slicing.
  • Each cell phone 1600 on the network is assigned a small time slice, during which it has exclusive access to the media. Outside of its time slice, cell phone 1600 must remain silent. Decoding is performed by filtering out all signals outside of the time slice. The control for this access must be distributed; as such, each component involved must be synchronized to observe the start and end of the time slice at the same instant.
  • Most TDMA systems also employ FDMA, so that instead of sharing a single frequency channel, cell phones 1600 share several channels.
  • The band allocated to TDMA is broken into frequency channels, each with a carrier frequency and a reasonable separation between channels.
  • User channels for the most common implementations of TDMA can be represented as a two-dimensional array, in which the rows represent frequency channels and the columns represent time slices.
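  • A minimal sketch of that two-dimensional channel array, with illustrative dimensions and a simple first-free assignment policy (both assumptions, not details from the specification):

```python
# Illustrative TDMA/FDMA channelization: user channels as a 2-D array
# with rows as frequency channels and columns as time slices.

N_FREQ, N_SLOTS = 4, 8                     # illustrative sizes
channels = [[None] * N_SLOTS for _ in range(N_FREQ)]

def assign(phone_id):
    """Give the phone the first free (frequency channel, time slice) pair."""
    for f in range(N_FREQ):
        for t in range(N_SLOTS):
            if channels[f][t] is None:
                channels[f][t] = phone_id
                return f, t
    return None                            # cell is full

print(assign("phone-1"))   # → (0, 0)
print(assign("phone-2"))   # → (0, 1)
```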
  • CDMA is based on vector arithmetic. In a sense, CDMA performs inter-cell-phone coordination using data flow. Instead of breaking up the band into frequency channels and time slicing these, CDMA regards the entire band as an n-dimensional vector space. Each channel is a code that represents a basis vector in this space. Bits in the signal are represented as either 1 or −1, and the modulation is the inner product of this signal and a basis vector of mobile unit 1506 or cell phone 1600. This process is called spreading, since it effectively takes a narrowband signal and converts it into a broadband signal.
  • Demultiplexing is simply a matter of taking the dot-product of the received signal with the appropriate basis vector, obtaining the original 1 or −1. With fast computation and the appropriate codes or basis vectors, the signal can be modulated without a carrier frequency. If this is not the case, a carrier and analog techniques can be used to fill in where computation fails. If a carrier is used, however, all units use the same carrier in all cells.
  • FIG. 22 shows TDMA and CDMA signals for four cell phones 1600 .
  • Under TDMA, each cell phone 1600 is assigned a time slice during which it can transmit: cell phone 1 is assigned time slice t0, cell phone 2 time slice t1, cell phone 3 time slice t2, and cell phone 4 time slice t3.
  • Under CDMA, each cell phone 1600 is assigned a basis vector that it multiplies with its signal. Cell phone 1 is assigned the vector (−1, 1, −1, 1); cell phone 2, the vector (1, −1, 1, −1); cell phone 3, the vector (1, 1, −1, −1); and cell phone 4, the vector (−1, −1, 1, 1).
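  • A minimal sketch of spreading and despreading with these codes. Cell phones 1 and 3 are used here because their basis vectors are mutually orthogonal, so their simultaneous transmissions separate cleanly under the dot-product; the function names are illustrative.

```python
# Illustrative CDMA spreading/despreading with two of the basis
# vectors above (cell phones 1 and 3, whose codes are orthogonal).

c1 = [-1, 1, -1, 1]           # cell phone 1
c3 = [1, 1, -1, -1]           # cell phone 3

def spread(bit, code):
    """Multiply a +1/-1 bit by every chip of the code."""
    return [bit * chip for chip in code]

def despread(signal, code):
    """Dot-product with the code, normalized by the code length."""
    return sum(s * c for s, c in zip(signal, code)) // len(code)

# Phone 1 sends bit +1, phone 3 sends bit -1; the airwaves carry the sum.
airwaves = [a + b for a, b in zip(spread(1, c1), spread(-1, c3))]
print(despread(airwaves, c1))   # → 1
print(despread(airwaves, c3))   # → -1
```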
  • FIG. 23A shows an LCD touch screen component 2300 for a Web browser GUI (shown in FIG. 24A) for a wireless device 1506.
  • LCD touch screen component 2300 has an LCD screen 2302 and a touch pad 2304.
  • FIG. 23B shows a Web page access component 2350 for fetching and formatting Web pages.
  • Web access component 2350 has a page fetch subcomponent 2352 and a page format subcomponent 2354.
  • Web access component 2350 reads hypertext markup language (HTML) from a connection interface 2356 , sends word placement requests to a display interface 2358 , and sends image requests to the connection interface 2356 .
  • Web access component 2350 also has a character input interface to allow users to enter page requests directly and to fill out forms on pages that have forms.
  • FIG. 24A shows a completed handheld Web browser GUI 2400 .
  • Handheld Web browser GUI 2400 has LCD touch screen component 2300, web access component 2350, and a pen stroke recognition component 2402 that translates pen strokes entered on touch pad 2304 into characters.
  • FIG. 24B shows the complete component view of a handheld Web browser 2450 .
  • Handheld Web browser 2450 is formed by connecting handheld Web browser GUI 2400 to connection component 1610 of cell phone 1600 (described with reference to FIG. 16) with bi-directional data transfer coordinator 604 (described with reference to FIG. 6B).
  • Handheld Web browser 2450 is an example of mobile unit 1506 , and connects to the Internet through the cellular infrastructure described above.
  • Handheld Web browser 2450 has different access requirements than does cell phone 1600.
  • Reliability is more important than real-time delivery. Dropped packets usually require retransmission, so it is better to deliver a packet late than to drop it. Real-time issues primarily affect download time and are therefore secondary.
  • Handheld Web browser 2450 must coordinate media access with cell phones 1600, and so it must use the same protocol as cell phones 1600 to connect to the network. For that reason, handheld Web browser 2450 can reuse connection component 1610 from cell phone 1600.
  • In theory, debugging is a simple process: a designer locates the cause of undesired behavior in a system and fixes the cause. In practice, even debugging sequential software remains difficult. Embedded systems are considerably more complicated to debug than sequential software, due to factors such as concurrency, distributed architectures, and real-time concerns. Issues taken for granted in sequential software, such as a schedule that determines the order of all events (the program), are nonexistent in a typical distributed system. Locating and fixing bugs in these complex systems requires many things, including an understanding of the thought processes underpinning the design.
  • Two classes of debugging techniques are event-based debugging and state-based debugging.
  • Most debugging techniques for general-purpose distributed systems are event-based.
  • Event-based debugging techniques operate by collecting event traces from individual system components and causally relating those event traces. These techniques require an ability to determine efficiently the causal ordering among any given pair of events. Determining the causal order can be difficult and costly.
  • Events may be primitive, or they may be hierarchical clusters of other events. Primitive events are abstractions for individual local occurrences that might be important to a debugger. Examples of primitive events in sequential programs are variable assignments and subroutine entry or returns. Primitive events for distributed systems include message send and receive events.
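  • Causal ordering between such events is conventionally determined with vector clocks, a standard mechanism in event-based distributed debugging (not something specific to this specification). A minimal sketch, with illustrative timestamps for a three-process system:

```python
# Illustrative vector-clock comparison for causally relating events.

def happened_before(v1, v2):
    """True if the event stamped v1 causally precedes the one stamped v2."""
    return all(a <= b for a, b in zip(v1, v2)) and v1 != v2

def concurrent(v1, v2):
    """True if neither event causally precedes the other."""
    return not happened_before(v1, v2) and not happened_before(v2, v1)

# A send on process P0 followed by the matching receive on P1, plus an
# unrelated event on P2 (timestamps are illustrative):
send_stamp = (1, 0, 0)
recv_stamp = (1, 1, 0)
other      = (0, 0, 1)

print(happened_before(send_stamp, recv_stamp))   # → True
print(concurrent(send_stamp, other))             # → True
```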
  • State-based debugging techniques are less commonly used in debugging distributed systems. State-based debugging techniques typically operate by presenting designers with views or snapshots of process state. Distributed systems are not tightly synchronized, and so these techniques traditionally involve only the state of individual processes. However, state-based debugging techniques can be applied more generally by relaxing the concept of an “instant in time” so that it can be effectively applied to asynchronous processes.
  • An observation is an event record collected by an observer.
  • An observer is an entity that watches the progress of an execution and records events, but does not interfere with the system. To determine the order in which two events occur, an observer must measure them both against a common reference.
  • FIG. 25 shows a typical space/time diagram 2500 , with space represented on a vertical axis 2502 and time represented on a horizontal axis 2504 .
  • space/time diagram 2500 provides a starting point for discussing executions in distributed systems.
  • Space/time diagram 2500 gives us a visual representation for discussing event ordering and for comparing various styles of observation.
  • A set of horizontal world lines 2506, 2508, and 2510 each represents an entity that is stationary in space.
  • The entities represented by horizontal world lines 2506, 2508, and 2510 are called processes and typically represent software processes in the subject system.
  • However, a world line can also represent any entity that generates events in a sequential fashion.
  • The spatial separation in the diagram, along vertical axis 2502, represents a virtual space, since several processes might execute on the same physical hardware.
  • A diagonal world line 2512 is called a message and represents a discrete communication that passes between two processes.
  • A sphere 2514 represents an event.
  • Vertical axis 2502 and horizontal axis 2504 are omitted from subsequent space/time diagrams, unless they provide additional clarity to a particular figure.
  • FIG. 26 shows a space/time diagram 2600 of two different observations of a single system execution, taken by a first observer 2602 and a second observer 2604 .
  • first observer 2602 and second observer 2604 are entities that record event occurrence.
  • First observer 2602 and second observer 2604 must each receive distinct notifications of each event that occurs and each must record the events in some total order.
  • First observer 2602 and second observer 2604 are represented in space/time diagram 2600 as additional processes, or horizontal world lines. Each event recorded requires a signal from its respective process to both first observer 2602 and second observer 2604 .
  • the signals from an event x 2606 on a process 2608 to both first observer 2602 and second observer 2604 are embodied in messages 2610 and 2612 , respectively.
  • First observer 2602 records event x 2606 as preceding an event y 2614 .
  • second observer 2604 records event y 2614 as preceding event x 2606 .
  • Such effects may be caused by nonuniform latencies within the system.
  • the observations taken by first observer 2602 and second observer 2604 are not equally valid.
  • a valid observation is typically an observation that preserves the order of events that depend on each other.
  • Second observer 2604 records the receipt of a message 2616 before that message is transmitted. Thus the observation from second observer 2604 is not valid.
  • FIG. 27 shows a space/time diagram 2700 for a special, ideal observer, called the real-time observer (RTO) 2702 .
  • RTO 2702 can view each event immediately as it occurs. Due to the limitations of physical clocks, and efficiency issues in employing them, it is usually not practical to implement RTO 2702 . However, RTO 2702 represents an upper bound on precision in event-order determination.
  • FIG. 28 shows a space/time graph 2800 showing two valid observations of a system taken by two separate observers: RTO 2702 and a third observer 2802 .
  • there is nothing special about the ordering of the observation taken by RTO 2702 .
  • Events d 2804 , e 2806 , and f 2808 are all independent events in this execution. Therefore, the observation produced by RTO 2702 and the observation produced by third observer 2802 can each be used to reproduce equivalent executions of the system. Any observation in which event dependencies are preserved is typically equal in value to an observation by RTO 2702 .
  • real-time distributed systems may need additional processes to emulate timing constraints.
  • FIG. 29 is a space/time diagram 2900 of a methodological observer, called the discrete Lamport Observer (DLO) 2902 , that records each event in a set of ordered bins.
  • DLO 2902 records an event 2904 in an ordered bin 2906 based on the following rule: each event is recorded in the leftmost bin that follows all events on which it depends.
  • DLO 2902 views events discretely and does not need a clock.
  • DLO 2902 does however require explicit knowledge of event dependency. To determine the bin in which each event must be placed, DLO 2902 needs to know the bins of the immediately preceding events.
  • the observation produced by DLO 2902 is also referred to as a topological sort of the system execution's event graph.
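The DLO's binning rule reduces to a simple computation over events taken in a dependency-consistent (topological) order. The sketch below is illustrative, with assumed names and data shapes, rather than an implementation from the patent: each event lands in the leftmost bin that follows the bins of all of its immediate predecessors.

```python
def place_in_bins(events, predecessors):
    """events: event ids in an order consistent with dependency (a
    topological order); predecessors: dict mapping event -> list of
    immediate predecessor events."""
    bin_of = {}
    bins = []
    for e in events:
        preds = predecessors.get(e, [])
        # leftmost bin strictly after every predecessor's bin
        idx = max((bin_of[p] + 1 for p in preds), default=0)
        bin_of[e] = idx
        while len(bins) <= idx:
            bins.append([])
        bins[idx].append(e)
    return bins, bin_of
```

For a diamond-shaped dependency graph (a before b and c, which are both before d), a and d get their own bins while the independent events b and c share one.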
  • E is the set of all events in an execution.
  • the immediate predecessor relation, ≺ ⊆ E × E, includes all pairs (e a , e b ) such that e a immediately precedes e b : e a is either the previous local event on e b 's process or the transmission of a message that e b receives.
  • e a is called the immediate predecessor of e b .
  • Each event has at most two immediate predecessors. Therefore, DLO 2902 need only find the bins of at most two records before each placement.
  • the transitive closure of the immediate predecessor relation forms a causal relation.
  • the causal relation, → ⊆ E × E, is the smallest transitive relation such that e i ≺ e j implies e i → e j .
  • a valid observation is an ordered record of events from a given execution, i.e., (R, <), where for every e ∈ E, record(e) ∈ R, and < is an ordering operator on the records.
  • a valid observation has the property that e a → e b implies record(e a ) < record(e b ).
  • the dual of the causal relation is a concurrence relation.
  • the concurrence relation, ∥ ⊆ E × E, includes all pairs (e a , e b ) such that neither e a → e b nor e b → e a . While the causal relation is transitive, the concurrence relation is not.
  • the concurrence relation is symmetric, where the causal relation is not.
  • FIG. 30, depicts a space/time graph 3000 with each event having a label 3002 .
  • DLO 2902 can accurately place event records in their proper bins—even if received out of order—as long as it knows the bins of the immediate predecessors. If we know the bins in which events are recorded, we can determine something about their causality. Fortunately, it is easy to label each event with the number of its intended bin. Labels 3002 are analogous to time and are typically called Lamport timestamps.
  • a Lamport timestamp is an integer t(e i ) associated with an event e i such that if e i → e j , then t(e i ) < t(e j ).
  • a labeling mechanism is said to characterize the causal relation if, based on their labels alone, it can be determined whether two events are causal or concurrent. Although Lamport timestamps are consistent with causality (if e i → e j , then t(e i ) < t(e j )), they do not characterize the causal relation.
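The Lamport-timestamp behavior just described can be sketched as a small clock class. The class name and interface are illustrative assumptions, but the update rules follow the text: local and send events increment the counter, and a receive event advances past both the local history and the timestamp carried by the message.

```python
class LamportClock:
    """Illustrative Lamport clock for one process."""
    def __init__(self):
        self.t = 0

    def local_event(self):
        self.t += 1
        return self.t

    def send(self):
        # the send event is stamped and its timestamp travels with the message
        self.t += 1
        return self.t

    def receive(self, msg_t):
        # the receive event must follow both the send and the local history
        self.t = max(self.t, msg_t) + 1
        return self.t
```

This guarantees consistency with causality (a cause always gets a smaller timestamp than its effect) but, as the text notes, concurrent events on different processes may receive timestamps in either order.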
  • FIG. 31 is a space/time graph 3100 that demonstrates the inability of scalar time stamps to characterize causality between events.
  • space/time graph 3100 shows event e 1 3102 , event e 2 3104 , and event e 3 3106 .
  • suppose e 1 3102 → e 2 3104 and also (e 1 3102 ∥ e 3 3106) ∧ (e 2 3104 ∥ e 3 3106), with scalar timestamps, label(e i ), that characterize causality. Because scalar labels are totally ordered, e 3 3106 cannot be labeled so that it appears concurrent with both e 1 3102 and e 2 3104 while e 1 3102 → e 2 3104 ; hence scalar timestamps cannot characterize the causal relation.
  • Event causality can be tracked completely using explicit event dependence graphs, with directed edges from each event to its immediate predecessors. Unfortunately, this method cannot store enough information with each record to determine whether two arbitrarily chosen events are causally related without traversing the dependence graph.
  • the event history can be projected against specific processes. For a process P j : the P j history projection of H(e i ), H Pj (e i ), is the intersection of H(e i ) and the set of events local to P j .
  • the event graph represented by a space/time diagram can be partitioned into equivalence classes, with one class for each process. The set of events local to P j is just the P j equivalence class.
  • the cardinality of H Pj (e i ), |H Pj (e i )|, is thus the number of events local to P j that causally precede e i , together with e i itself. Since local events always occur in sequence, we can uniquely identify an event by its process and the cardinality of its local history.
  • FIG. 32 shows a space/time diagram 3200 with vector timestamped events.
  • a vector timestamp 3202 is a vector label, t̂ e , assigned to each event e ∈ E, such that the i-th element represents |H Pi (e)|.
  • vector timestamps both fully characterize causality and uniquely identify each event in an execution.
  • Each process P s contains a vector clock t̂ Ps with elements for every process in the system, where t̂ Ps [s] always equals the number of events local to P s . Snapshots of this vector counter are used to label each event, and a snapshot is transmitted with each message.
  • the recipient of a message with a vector snapshot can update its own vector counter t̂ Pr by replacing it with sup(t̂ Ps , t̂ Pr ), the element-wise maximum of t̂ Ps and t̂ Pr .
  • This technique places enough information with each message to determine message ordering. It is performed by comparing snapshots attached to each message. However, transmission of entire snapshots is usually not practical, especially if the system contains a large number of processes.
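As a sketch of the vector-clock mechanism just described (class and function names are illustrative assumptions), each process increments its own element for local events, a snapshot travels with each message, and the recipient takes the element-wise maximum (the sup above) before counting the receive event itself. Comparing two snapshots then characterizes causality.

```python
class VectorClock:
    """Illustrative vector clock for process `pid` in a system of n processes."""
    def __init__(self, pid, n):
        self.pid, self.v = pid, [0] * n

    def local_event(self):
        self.v[self.pid] += 1
        return tuple(self.v)        # snapshot used as the event's label

    def send(self):
        return self.local_event()   # the message carries this snapshot

    def receive(self, snapshot):
        # sup: element-wise maximum, then count the receive event
        self.v = [max(a, b) for a, b in zip(self.v, snapshot)]
        return self.local_event()

def causally_precedes(ta, tb):
    # vector timestamps characterize causality: ta -> tb iff
    # ta <= tb element-wise and ta != tb
    return all(a <= b for a, b in zip(ta, tb)) and ta != tb
```

Two events are concurrent exactly when `causally_precedes` is false in both directions, which is the characterization property scalar timestamps lack.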
  • Vector clocks can however be maintained without transmitting complete snapshots.
  • a transmitting process, P s can send a list that includes only those vector clock values that have changed since its last message.
  • a recipient, P r then compares the change list to its current elements and updates those that are smaller. This requires each process to maintain several vectors: one for itself, and one for each process to which it has sent messages. However, change lists do not contain enough information to independently track message order.
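The change-list optimization above might be sketched as follows (function names are illustrative assumptions): the sender transmits only the vector elements that changed since its last message to that recipient, and the recipient raises any of its own elements that are smaller.

```python
def make_change_list(current, last_sent):
    """Return {index: value} for vector elements that changed since last_sent."""
    return {i: v for i, (v, old) in enumerate(zip(current, last_sent))
            if v != old}

def apply_change_list(clock, changes):
    """Raise any clock elements that are smaller than the received changes."""
    for i, v in changes.items():
        if v > clock[i]:
            clock[i] = v
    return clock
```

This keeps messages short, but, as the text notes, change lists alone do not carry enough information to independently track message order.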
  • in the worst case, interval timestamps are not much better than Lamport timestamps; in cases with large numbers of messages, however, interval timestamps can yield better results.
  • Space/time diagrams have typically proven useful in discussing event causality and concurrence. Space/time diagrams are also often employed as the user display in concurrent program debugging tools.
  • the Los Alamos parallel debugging system uses a text-based time-process display, and Idd uses a graphic display. Both of these, however, rely on an accurate global real-time clock, which is impractical in most systems.
  • FIG. 33 shows a partial order event tracer (POET) display 3300.
  • the partial order event tracer (POET) system supports several different languages and run-time environments, including Hermes, a high-level interpreted language for distributed systems, and Java.
  • POET display 3300 distinguishes among several types of events by shapes, shading, and alignment of corresponding message lines.
  • a Distributed Program Debugger (DPD) is based on a Remote Execution Manager (REM) framework.
  • the REM framework is a set of servers on interconnected Unix machines, where each server is a Unix user-level process. Processes executing in this framework can create and communicate with processes elsewhere in the network as if they were all on the same machine. DPD uses space/time displays for debugging communication only, and it relies on separate source-level debuggers for individual processes.
  • Simple space/time displays can be used to present programmers with a wealth of information about distributed executions. Typically however, space/time diagrams are too abstract to be an ultimate debugging solution. Space/time diagrams show high-level events and message traffic, but they do not support designer interaction with the source code. On the other hand, simple space/time diagrams may sometimes have too much detail. Space/time diagrams display each distinct low-level message that contributes to a high-level transaction without support for abstracting the transaction.
  • FIG. 34 is a space/time diagram 3400 having a first compound event 3402 and a second compound event 3404 .
  • first and second compound events 3402 and 3404 might be neither causally related nor concurrent. Abstraction is typically applied across two dimensions, events and processes, to aid in the task of debugging distributed software.
  • Event abstraction represents sequences of events as single entities. A group of events may occasionally have a specific semantic meaning that is difficult to recognize, much as streams of characters can have a meaning that is difficult to interpret without proper spacing and punctuation. Event abstraction can in some circumstances complicate the relationships between events.
  • Event abstraction can be applied in one of three ways: filtering, clustering, and interpretation.
  • in event filtering, a programmer describes event types that the debugger should ignore, which are then hidden from view.
  • in clustering, the debugger collects a number of events and presents the group as a single event.
  • in interpretation, the debugger parses the event stream for event sequences with specific semantic meaning and presents them to a programmer.
  • Process abstraction is usually applied only as hierarchical clustering. The remainder of this section discusses these specific event and process abstraction approaches.
  • Event filtering and clustering are techniques used to hide events from a designer and thereby reduce clutter. Event filters exclude selected events from being tracked in event-based debugging techniques. In most cases, this filtering is implicit and cannot be modified without changing the source code, because the source code being debugged is designed to report only certain events to the debugger. When deployed, the code will report all such events to the tool. This approach is employed in both DPD and POET, although some events may be filtered from the display at a later time.
  • An event cluster is a group of events represented as a single event.
  • the placement of an event in a cluster is based on simple parameters, such as virtual time bounds and process groups.
  • Event clusters can have causal ambiguities. For example, one cluster may contain events that causally precede events in a second cluster, while other events causally follow certain events in the second cluster.
  • FIG. 35 shows a POET display 3500 involving a first convex event cluster 3502 and a second convex event cluster 3504 .
  • POET uses a virtual-time-based clustering technique that represents convex event clusters as single abstract events.
  • a convex event cluster is a set of event instances, C, such that for events e a , e c ∈ C and any event e b , if e a → e b and e b → e c , then e b ∈ C.
  • the third technique for applying event abstraction is interpretation, also referred to as behavioral abstraction. Both terms describe techniques that use debugging tools to interpret the behavior represented by sequences of events and present results to a designer. Most approaches to behavioral abstraction let a designer describe sequences of events using expressions, and the tools recognize the sequence of events through a combination of customized finite automata followed by explicit checks. Typically, matched expressions generate new events.
  • Chain expressions, used in the Ariadne parallel debugger, are an alternative way to describe distributed behavior patterns that have both causality and concurrence. These behavioral descriptions are based on chains of events (abstract sequences not bound to processes), p-chains (chains bound to processes), and pt-chains (composed p-chains).
  • the syntax for describing chain expressions is fairly simple, with <a b> representing two causally related events.
  • the recognition algorithm has two functions: first, recognizing the appropriate event sequence from a linear stream (using an NFA); and second, checking the relationships between specific events.
  • the behavioral abstraction techniques described so far rely on centralized abstraction facilities. These facilities can be distributed, as well.
  • the BEE (Basis for distributed Event Environments) project is a distributed, hierarchical, event-collection system, with debugging clients located with each process.
  • FIG. 36 shows a Basis for distributed Event Environments (BEE) abstraction facility 3600 for a single client.
  • event interpretation is performed at several levels. The first is an event sensor 3602 , inserted into the source of the program under test and invoked whenever a primitive event occurs during execution. The next level is an event generator 3604 , where information, including timestamps and process identifiers, is attached to each event. Event generator 3604 uses an event table 3606 to determine whether events should be passed to an event handler 3608 or simply dropped. Event handler 3608 manages event table 3606 within event generator 3604 . Event handler 3608 filters and collects events and routes them to appropriate event interpreters (not shown).
  • Event interpreters gather events from a number of clients (not shown) and aggregate them for presentation to a programmer. Clients and their related event interpreters are placed together in groups managed by an event manager (not shown).
  • Clusters can then serve as participants in interaction patterns to be further clustered.
  • These cluster hierarchies are strictly trees, as shown in FIG. 37, which depicts a hierarchical construction of process clusters 3700 .
  • a square node 3702 represents a process (not shown)
  • a round node 3704 represents a process cluster (not shown).
  • Cohesion(P) = [ Σ_{i<j} Sim_f (p i , p j ) ] / [ m(m-1)/2 ]
  • <â, b̂> denotes the scalar product of vectors â and b̂, and ‖â‖ denotes the magnitude of vector â.
  • C P1 and C P2 are process characteristic vectors—in them, each element contains a value between 0 and 1 that indicates how strongly a particular characteristic manifests itself in each process. Characteristics can include keywords, type names, function references, etc. A is a value that equals 1 if any of the following apply:
  • P 1 and P 2 are unique instantiations of their own source.
  • P 1 and P 2 communicate with each other.
  • Coupling(P) = [ Σ_{i,j} Sim_f (p i , q j ) ] / (mn)
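Taking Sim_f to be the cosine of the angle between two characteristic vectors, as the scalar-product and magnitude notation above suggests, the cohesion and coupling measures can be sketched as follows. This is an illustrative simplification: the factor A from the text is omitted, and the function names are assumptions.

```python
from math import sqrt
from itertools import combinations

def cosine_sim(a, b):
    """Cosine similarity of two characteristic vectors (stand-in for Sim_f)."""
    dot = sum(x * y for x, y in zip(a, b))
    mag = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / mag if mag else 0.0

def cohesion(cluster):
    """Average pairwise similarity within a cluster of m processes."""
    m = len(cluster)
    pairs = combinations(cluster, 2)
    return sum(cosine_sim(a, b) for a, b in pairs) / (m * (m - 1) / 2)

def coupling(cluster, others):
    """Average similarity between a cluster (size m) and outside processes (size n)."""
    m, n = len(cluster), len(others)
    return sum(cosine_sim(p, q) for p in cluster for q in others) / (m * n)
```

A good clustering would have high cohesion within each cluster and low coupling between clusters.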
  • State-based debugging techniques focus on the state of the system and the state changes caused by events, rather than on events themselves.
  • the familiar source-level debugger for sequential program debugging is state-based. It lets designers set breakpoints in the execution of a program, enabling them to investigate the state left by the execution to that point. It also lets programmers step through a program's execution and view the changes in state caused by each step.
  • a consistent cut is a cut of an event dependency graph representing an execution that: a) intersects each process exactly once; and b) points all dependencies crossing the cut in the same direction.
  • consistent cuts have both a past and a future. These are the subgraphs on each side of the cut.
  • FIG. 39 shows that a consistent cut can be represented as a jagged line across the space/time diagram that meets the above requirements.
  • a space/time graph 3900 is shown having a first cut 3902 and a second cut 3904 . All events to the left of either first cut 3902 or second cut 3904 are in the past of each cut, and all events to the right are in the future of each cut, respectively.
  • First cut 3902 is a consistent cut because no messages travel from the future to the past.
  • Second cut 3904 , however, is not consistent because a message 3906 travels from the future to the past.
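The consistency test can be checked mechanically: no message may be received in the past of the cut but sent in its future. A minimal sketch, under an assumed representation of a cut as the count of past events on each process:

```python
def is_consistent(cut, messages):
    """cut: dict process -> number of events in that process's past;
    messages: list of ((send_proc, send_idx), (recv_proc, recv_idx)),
    with event indices counted from 0 in local order."""
    for (sp, si), (rp, ri) in messages:
        sent_in_future = si >= cut[sp]
        received_in_past = ri < cut[rp]
        if sent_in_future and received_in_past:
            return False   # message travels from the future to the past
    return True
```

In the figure's terms, the second cut fails this test because message 3906 crosses the cut line backwards.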
  • FIGS. 40A, B, and C show that a distributed execution shown in a space/time diagram 4000 can be represented by a lattice of consistent cuts 4002 , where ⊤ is the start of the execution, and ⊥ is system termination.
  • lattice of consistent cuts 4002 represents the global statespace traversed by a single execution. Since lattice of consistent cuts 4002 's size is on the order of
  • one relation relates cuts such that one immediately precedes the other, and another relates cuts between which there is a path.
  • Controlled stepping, or single stepping, through regions of an execution can help with an analysis of system behavior.
  • the programmer can examine changes in state at the completion of each step to get a better understanding of system control flow.
  • Coherent single stepping for a distributed system requires steps to align with a path through a normal execution's consistent cut lattice.
  • DPD works with standard single-process debuggers (called client debuggers), such as DBX, GDB, etc.
  • Programmers can use these tools to set source-level break-points and single step through individual process executions. However, doing so leaves the other processes executing during each step, which can yield unrealistic executions.
  • Zernik gives a simple procedure for single stepping using a post-mortem traversal of a consistent cut lattice.
  • the debugger chooses an event, e i , from the future such that any events it depends on are already in the past, i.e., there are no future events, e f , such that e f → e i . This ensures that the step proceeds between two consistent cuts related by immediate precedence.
  • the debugger moves this single event to the past, performing any necessary actions.
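The two parts of this stepping procedure can be sketched as a single function (names are illustrative assumptions): pick any future event whose dependencies all lie in the past, and move it across the cut.

```python
def single_step(past, future, depends_on):
    """Move one enabled event from the future set into the past set.
    past, future: sets of event ids; depends_on: dict event -> list of
    events it depends on. Returns the executed event, or None."""
    for e in sorted(future):                 # any deterministic choice works
        if all(d in past for d in depends_on.get(e, [])):
            future.remove(e)
            past.add(e)
            return e                         # the event just stepped over
    return None                              # no enabled event remains
```

Because the chosen event never has a dependency still in the future, each step moves between two consistent cuts.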
  • POET's support for single stepping uses three disjoint sets: executed, ready, and non-ready.
  • the executed set is identical to the past set in “Using Visualization Tools to Understand Concurrency,” by D. Zernik, M. Snir, and D. Malki, IEEE Software 9, 3 (1992), pp. 87-92.
  • the ready set contains all events whose enabling events are all in the executed set, and the contents of the non-ready set have some enabling events in either the ready or non-ready sets.
  • Global-step and step-over may progress between two consistent cuts not related by immediate precedence.
  • a global-step is performed by moving all events from the ready set into the past. Afterwards, the debugger must move to the ready set all events in the non-ready set whose dependencies are in the executed set.
  • a global-step is useful when the programmer wants information about a system execution without having to look at any process in detail.
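A global-step over the three sets might be sketched as follows (the set names follow the text; the function itself is an illustrative assumption): every ready event is executed, and then any non-ready events whose dependencies are now all executed are promoted.

```python
def global_step(executed, ready, non_ready, depends_on):
    """Advance one global step. executed/ready/non_ready are disjoint
    sets of event ids; depends_on maps event -> list of enabling events."""
    executed |= ready            # execute everything that was ready
    ready.clear()
    promoted = {e for e in non_ready
                if all(d in executed for d in depends_on.get(e, []))}
    non_ready -= promoted        # newly enabled events become ready
    ready |= promoted
```

Repeating this until the ready set stays empty walks the whole execution one "wavefront" of independent events at a time.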
  • the step-over procedure considers a local, or single-process, projection of the ready and non-ready sets. To perform a step, it moves the earliest event from the local projections into the executed set and executes through events on the other processes until the next event in the projection is ready. This ensures that the process in focus will always have an event ready to execute in the step that follows.
  • Step-in is another type of local step. Unlike step-over, step-in does not advance the system at the completion of the step; instead, the system advance is considered to be a second step.
  • FIGS. 41A, B, C, and D show a space/time diagram before a step 4100 and a resulting space/time diagram after performing a global-step 4102 , a step-over 4104 , and a step-in 4106 .
  • to take a cut, each process performs some type of cut action (e.g., state saving).
  • a simple approach is barrier synchronization, which erects a temporal barrier that no process can pass until all processes arrive. Any cut taken immediately before, or immediately after, the barrier is consistent.
  • with barrier synchronization, some processes may have a long wait before the final process arrives.
  • a more proactive technique is to use a process called the cut initiator to send perform-cut messages to all other system processes.
  • upon receiving a perform-cut message, a process performs its cut action, sends a cut-finished message to the initiator, and then suspends itself. After the cut initiator receives cut-finished messages from all processes, it sends each of them a message to resume computation.
  • the Chandy-Lamport algorithm does not require the system to be stopped.
  • the cut starts when a cut initiator sends perform-cut messages to all of the processes.
  • upon receiving a perform-cut message, a process stops all work, performs its cut action, and then sends a mark on each of its outgoing channels; a mark is a special message that tells its recipient to perform a cut action before reading the next message from the channel.
  • after sending its marks, the process is free to continue computation. If the recipient has already performed the cut action when it receives a mark, it can continue working as normal.
  • Each cut request and each mark associated with a particular cut are labeled with a cut identifier, such as the process ID of the cut initiator and an integer. This lets a process distinguish between marks for cuts it has already performed and marks for cuts it has yet to perform.
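The per-process logic of this mark-based algorithm might be sketched as follows. The class and message names are illustrative assumptions, and channel plumbing is abstracted to simple lists; the point is the rule that a cut is performed at most once per cut identifier, with marks propagated on all outgoing channels.

```python
class Process:
    """Illustrative participant in a mark-based consistent-cut algorithm."""
    def __init__(self, pid, out_channels):
        self.pid, self.out = pid, out_channels
        self.cuts_done = set()              # cut identifiers already performed

    def cut_action(self, cut_id):
        self.cuts_done.add(cut_id)          # e.g., save local state here

    def on_perform_cut(self, cut_id):
        if cut_id not in self.cuts_done:    # perform each cut only once
            self.cut_action(cut_id)
            for ch in self.out:             # mark every outgoing channel
                ch.append(("MARK", cut_id))

    def on_message(self, msg):
        kind, payload = msg
        if kind == "MARK":                  # a mark acts as a cut request
            self.on_perform_cut(payload)
```

The cut identifier carried with each mark is what lets a process distinguish marks for cuts it has already performed from marks for new cuts.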
  • the first is called the two-color, or red-white, algorithm.
  • in this algorithm, information about the cut state is transferred with each message.
  • Each process in the system has a color.
  • Processes not currently involved in a consistent cut are white, and all messages transmitted are given a white tag.
  • as before, there is a cut initiator that sends perform-cut messages to all system processes. When a process receives this request, it halts, performs the cut action, and changes its color to red. From this point on, all messages it transmits are tagged red to inform the recipients that a cut has occurred.
  • Any process can accept a white message without consequence, but when a white process receives a red message, it must perform its cut action before accepting the message. Essentially, white processes treat red messages as cut requests. Red processes can accept red messages at any time, without consequence.
  • a disadvantage of the two-color algorithm is that the system must reset all of the processes back to white after they have completed their cut action. After switching back, each process must treat red messages as if they were white until they are all flushed from the previous cut. After this, each process knows that the next red message it receives signals the next consistent cut.
  • messages-in-flight are simply white messages received by red processes. These can all be recorded locally, or the recipient can report them to the cut initiator. In the latter case, each red process simply sends the initiator a record of any white messages received.
  • DPD makes use of the UNIX fork system call to perform checkpointing for later rollback.
  • When fork is called, it makes an exact copy of the calling process, including all current state.
  • the newly forked process is suspended and indexed. Rollback suspends the active process and resumes an indexed process.
  • the problem with this approach is that it can quickly consume all system memory, especially if checkpointing occurs too frequently.
  • DPD's solution is to let the programmer choose the checkpoint frequency through use of a slider in its GUI.
  • Processes must sometimes be returned to states that were not specifically saved. In this case, the debugger must do additional work to advance the system to the desired point. This is called replay and is performed using event trace information to guide an execution of the system.
  • during replay, the debugger chooses an enabled process (i.e., one whose next event has no pending causal requirements) and executes it, using the event trace to determine where the process needs to block for a message that may have arrived asynchronously in the original execution. When the process blocks, the debugger chooses the next enabled process and continues from there. In this way, a replay is causally identical to the original execution.
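Trace-driven replay can be sketched as a scheduler over per-process event lists (all names here are illustrative assumptions): a process runs until its next traced event has an unsatisfied dependency, at which point another enabled process is chosen.

```python
def replay(traces, depends_on):
    """traces: dict process -> list of its events in local order;
    depends_on: dict event -> list of events it causally requires.
    Returns one total order that is causally identical to the trace."""
    done, order = set(), []
    pos = {p: 0 for p in traces}
    while any(pos[p] < len(traces[p]) for p in traces):
        progressed = False
        for p in sorted(traces):
            while pos[p] < len(traces[p]):
                e = traces[p][pos[p]]
                if all(d in done for d in depends_on.get(e, [])):
                    done.add(e)
                    order.append(e)
                    pos[p] += 1
                    progressed = True
                else:
                    break       # process blocks awaiting a message
        if not progressed:
            raise RuntimeError("causal cycle in trace")
    return order
```

Different schedulers may interleave concurrent events differently, but every replay produced this way preserves the traced causal order.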
  • FIG. 42 shows a space time diagram 4200 for a system that is subject to the domino effect during rollback.
  • rolling P 3 back to checkpoint c 1 unrolls a message receipt, which requires a roll back to c 2 4206 on P 2 , which requires a roll back to c 2 4208 on P 1 , which requires a roll back to c 2 4210 on P 3 , which requires a roll back to c 1 4212 on P 2 , which requires a final roll back to c 1 4214 on P 1 4216 .
  • The problem is caused by causal overlaps between message transfers and checkpoints. Performing checkpoints only at consistent cuts avoids a domino effect.
  • Branching time temporal logic is predicate logic with temporal quantifiers P, F, G, and H, and path quantifiers A (along all possible futures) and E (along some possible future).
  • Pφ is true in the present if φ was true at some point in the past;
  • Fφ is true in the present if φ will be true at some point in the future;
  • Gφ is true in the present if φ will be true at every moment in the future;
  • Hφ is true in the present if φ was true at every moment of the past.
  • Gφ is the same as ¬F¬φ,
  • and Hφ is the same as ¬P¬φ.
  • a monotonic global predicate is a predicate φ such that C a ⊨ φ implies C a ⊨ AGφ.
  • a monotonic global predicate is one that remains true after becoming true.
  • An unstable global predicate is a predicate φ for which C a ⊨ φ implies only C a ⊨ EGφ: φ continues to hold along some possible futures but may become false along others.
  • An unstable global predicate is one that may become false after becoming true.
  • Monotonic predicates can be detected any time after becoming true.
  • One algorithm is to occasionally take consistent cuts and evaluate the predicate at each. In fact, it is not necessary to use consistent cuts, since any transverse cut whose future is a subset of the future of the consistent cut in which the predicate first became true will also show the predicate true.
  • the detection of possibly(φ) is tractable for weak conjunctive predicates, or global predicates that can be expressed as conjunctions of local predicates.
  • the algorithm for this is to walk a path through the consistent cut lattice that aligns with a single process, P t , until either: (1) the process's component of φ is true, or (2) there is no way to proceed without diverging from P t . In either case, the target process is switched and the walk continued. This algorithm continues until it reaches a state where all components of the predicate are true, or until it reaches ⊥ (system termination). In this way, if there are any consistent cuts where all parts of the predicate simultaneously hold, the algorithm will encounter at least one.
  • Debugging heterogeneous embedded systems is complicated by designs composed of concurrent and distributed processes. Most of the difficulty in debugging distributed systems results from concurrent processes with globally unscheduled and frequently asynchronous interactions. Multiple executions of a system can produce wildly varying results, even if they are based on identical inputs.
  • the two main debugging approaches for these systems are event based and state based.
  • Event-based approaches are monitoring approaches. Events are presented to a designer in partially ordered event displays, called space/time displays. These are particularly good at showing inter-process communication over time. They can provide a designer with large amounts of information in a relatively small amount of space.
  • FIG. 43 is an example of a coordination-centric approach to the debugging of a distributed software environment 4300 , in accordance with the present invention.
  • distributed software environment 4300 , containing several processing elements 4302 connected so that information can be exchanged, is connected through a communication channel 4304 to a debugging host 4306 .
  • Distributed software environment 4300 can be connected either directly or indirectly to debugging host 4306 .
  • Cooperative execution refers to simultaneously executing a distributed software environment 4300 and simulating distributed software environment 4300 on debugging host 4306 based on event traces from distributed software environment 4300 .
  • Debugging host 4306 may be a general-purpose workstation. Distributed software environments are likely to have several processing elements 4302 , but only a few of these have resources that let them connect directly to debugging host 4306 . Those that do not have these resources, either have an indirect connection to the debugging host or are opaque to debugging.
  • FIG. 44 is a detailed view of a direct connection between a primary processing element 4400 of distributed software environment 4300 and debugging host 4306 .
  • primary processing element 4400 contains a software program 4401 , which has at least two software components 4402 , a runtime system 4404 , a coordinator 4406 , an interface (not shown) having a port (not shown), and a primary runtime debugging architecture 4408 .
  • Software program 4401 generates an event record in response to a selected event.
  • Runtime system 4404 collects the events from software components 4402 and transfers them to primary runtime debugging architecture 4408 .
  • Primary runtime debugging architecture 4408 contains a time stamper 4410 , a causality stamper 4412 , a primary uplink component 4414 , and a primary transfer component 4416 coupled to primary uplink component 4414 .
  • Time stamper 4410 applies a time stamp to the event record generated by software program 4401.
  • Causality stamper 4412 provides an identification of a cause of the event associated with the corresponding event record.
  • Primary uplink component 4414 of primary runtime debugging architecture 4408 facilitates communication through communication channel 4304 between primary processing element 4400 and debugging host 4306 , while primary transfer component 4416 collects and facilitates the transfer of the time-stamped and causality-stamped event record from primary processing element 4400 to an event queue 4418 on debugging host 4306 .
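  • The internal layout of an event record is not detailed in this excerpt. As a minimal sketch, with hypothetical field names chosen only for illustration, a record stamped by time stamper 4410 and causality stamper 4412 might carry fields like these:

```c
#include <stdint.h>

/* Hypothetical event record, sketching the fields implied by the time
 * stamper (4410) and causality stamper (4412): each record carries the
 * time at which the event occurred and the identity of the event that
 * caused it. Field names are illustrative, not the patent's format. */
typedef struct event_record {
    uint32_t event_id;      /* identifies this event */
    uint32_t component_id;  /* software component that produced it */
    uint64_t timestamp;     /* applied by the time stamper */
    uint32_t cause_id;      /* applied by the causality stamper:
                               event_id of the causing event */
} event_record_t;

/* Stamp a record before the transfer component forwards it to the
 * event queue on the debugging host. */
void stamp_event(event_record_t *rec, uint64_t now, uint32_t cause)
{
    rec->timestamp = now;
    rec->cause_id = cause;
}
```

A record stamped this way gives the debugging host both orderings it needs: temporal (the timestamp) and causal (the cause pointer).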
  • Debugging host 4306 operates on the events, simulating the activity of distributed software environment 4300 and letting the designer navigate the execution. The bulk of debugging support thus resides on debugging host 4306, which reduces the probe effect in distributed software environment 4300. All debugging is based on a combination of system state and event records, which must be collected and properly annotated to record causality.
  • The primary processing element's runtime system 4404 asynchronously dumps all events to debugging host 4306.
  • Debugging host 4306 queues all events and writes them to disk. All debugging activity is based on post-processing of these events.
  • FIG. 45 is a detailed view of an indirect connection from primary processing element 4400 to debugging host 4306 .
  • Primary runtime debugging architecture 4408 contains time stamper 4410, causality stamper 4412, and primary uplink component 4414.
  • An intermediate processing element 4500 contains an intermediate runtime debugging architecture 4502, having an intermediate transfer component 4504 coupled to an intermediate uplink component 4506.
  • Event records are routed to intermediate processing element 4500, which collects and forwards them, along with some event records of its own, to debugging host 4306.
  • Primary uplink component 4414 facilitates communication, along communication channel 4304 , between primary processing element 4400 and intermediate processing element 4500 .
  • Intermediate uplink component 4506 facilitates communication, along communication channel 4304 , between intermediate processing element 4500 and debugging host 4306 , while intermediate transfer component 4504 collects and facilitates the transfer of the event record from intermediate processing element 4500 to event queue 4418 in debugging host 4306 .
  • FIG. 46 depicts capturing event records in a flash memory for post-mortem distributed debugging.
  • Runtime system 4404 collects and transfers the events to primary runtime debugging architecture 4408, which includes time stamper 4410, causality stamper 4412, and a flash driver 4600. After the events have been time-stamped and causality-stamped, flash driver 4600 sends the events to a flash memory 4602 for storage and post-mortem debugging.
  • FIG. 47 shows how flash memory 4602 can be allocated to ensure that the entire context of a system crash can be reconstructed.
  • The runtime system of a processing element records events from the execution, applies timestamps and causality pointers, and forwards the records to a communication channel. Any event made visible to the runtime system, either by being identified as an explicit event in the model or by eventRecord calls at runtime, is a candidate for recording.
  • the eventRecord function can be implemented either to transfer event records to a runtime simulation host or to record them into storage.
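  • One way to keep both eventRecord implementations (uplink to the simulation host, or recording into storage) behind a single call is a pluggable function-pointer sink. The names below are assumptions for illustration, not the patent's actual API:

```c
#include <stddef.h>

/* A sink receives a finished event record. In this sketch, one sink
 * would uplink records to the debugging host, another would write them
 * to flash for post-mortem debugging. */
typedef void (*event_sink_fn)(const void *record, size_t len);

event_sink_fn current_sink = NULL;

void eventRecordSetSink(event_sink_fn sink) { current_sink = sink; }

/* eventRecord: invoked from instrumented source lines; forwards the
 * record to whichever sink is currently configured. */
void eventRecord(const void *record, size_t len)
{
    if (current_sink)
        current_sink(record, len);
}

/* Demonstration sink that merely counts records. */
size_t records_seen = 0;
void counting_sink(const void *record, size_t len)
{
    (void)record;
    (void)len;
    records_seen++;
}
```

Instrumented code then calls eventRecord unconditionally, and the build or runtime configuration decides whether records go to the host or to flash.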
  • Runtime debugging support in the physical distributed software environment should be as lightweight as possible to avoid incurring any further probe effect.
  • One of the goals in debugging a physical system is to ensure that as little of the debugging support as possible is placed on the systems themselves.
  • FIG. 48 shows an example of low-level behavioral recognition for a simple I2C protocol. At the lowest level are signal transitions. These are translated into sub-low-level events, which are then fed into a language recognizer to generate low-level debugging events.
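  • As a rough illustration of the lowest layer of FIG. 48, turning raw SDA/SCL transitions into sub-low-level events, consider the sketch below. The event names are assumptions; a real I2C recognizer would also handle addressing, acknowledge bits, and clock stretching:

```c
/* Sub-low-level events derived from SDA/SCL signal transitions.
 * On an I2C bus: START is SDA falling while SCL is high, STOP is SDA
 * rising while SCL is high, and data bits are sampled on the rising
 * edge of SCL. */
typedef enum { EV_NONE, EV_START, EV_STOP, EV_BIT } i2c_event_t;

typedef struct { int sda, scl; } i2c_state_t;   /* previous line levels */

/* Feed one sampled (sda, scl) pair; returns the sub-low-level event,
 * if any, implied by the transition from the previous sample. */
i2c_event_t i2c_step(i2c_state_t *s, int sda, int scl)
{
    i2c_event_t ev = EV_NONE;
    if (scl && s->scl) {            /* SCL held high: SDA edges frame the transfer */
        if (s->sda && !sda)
            ev = EV_START;
        else if (!s->sda && sda)
            ev = EV_STOP;
    } else if (scl && !s->scl) {    /* rising SCL edge: sample a data bit */
        ev = EV_BIT;
    }
    s->sda = sda;
    s->scl = scl;
    return ev;
}
```

The stream of EV_START/EV_BIT/EV_STOP events produced here is what a language recognizer, as in FIG. 48, would consume to emit low-level debugging events.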

Abstract

A software system and method, using a coordination-centric approach, for debugging distributed software environments is described, wherein the distributed software environment produces event traces to be analyzed by a debugging host. Distributed software environments are connected to debugging hosts either directly or indirectly. In a direct connection, a processing element's runtime system collects event records and sends them to a primary runtime debugging architecture, where the event records are time-stamped and causality-stamped and transferred to an event queue on the debugging host. An indirect connection uses an intermediate runtime debugging architecture, which facilitates the transfer of event records from the processing element to the event queue. Event records also may be collected and stored on a flash memory for post-mortem distributed debugging. Event traces are made visible to the runtime system by inserting event recording calls at significant source lines in the distributed software environment.

Description

    RELATED APPLICATIONS
  • This application is a continuation of U.S. Provisional Application No. 60/213,496, filed Jun. 23, 2000, incorporated herein by reference.[0001]
  • TECHNICAL FIELD
  • The present invention relates to a system and method for debugging distributed software environments and, in particular to a coordination-centric approach in which a distributed software environment produces event traces to be analyzed by a debugging host. [0002]
  • BACKGROUND OF THE INVENTION
  • A system design and programming methodology is most effective when it is closely integrated and coheres tightly with its corresponding debugging techniques. In distributed and embedded system methodologies, the relationship between debugging approaches and design methodologies has traditionally been one-sided in favor of the design and programming methodologies. Design and programming methodologies are typically developed without any consideration for the debugging techniques that will later be applied to software systems designed using that design and programming methodology. While these typical debugging approaches attempt to exploit features provided by the design and programming methodologies, the debugging techniques will normally have little or no impact on what the design and programming features are in the first place. This lack of input from debugging approaches to design and programming methodologies serves to maintain the role of debugging as an afterthought, even though in a typical system design, debugging consumes a majority of the design time. The need remains for a design and programming methodology that reflects input from, and consideration of, potential debugging approaches in order to enhance the design and reduce the implementation time of software systems. [0003]
  • 1. Packaging of Software Elements [0004]
  • Packaging refers to the set of interfaces a software element presents to other elements in a system. Software packaging has many forms in modern methodologies. Some examples are programming language procedure call interfaces (as with libraries), TCP/IP socket interfaces with scripting languages (as with mail and Web servers), and file formats. Several typical prior art packaging styles are described below, beginning with packaging techniques used in object-oriented programming languages and continuing with a description of more generalized approaches to packaging. [0005]
  • A. Object-oriented Approaches to Packaging [0006]
  • One common packaging style is based on object-oriented programming languages and provides procedure-based (method-based) packaging for software elements (objects within this framework). These procedure-based packages allow polymorphism (in which several types of objects can have identical interfaces) through subtyping, and code sharing through inheritance (deriving a new class of objects from an already existing class of objects). In a typical object-oriented programming language, an object's interface is defined by the object's methods. [0007]
  • Object-oriented approaches are useful in designing concurrent systems (systems with task-level parallelism and multiple processing resources) because of the availability of active objects (objects with a thread of control). Some common concurrent object-oriented approaches appear in actor languages and in concurrent Eiffel. [0008]
  • Early object-oriented approaches featured anonymity of objects through dynamic typechecking. This anonymity meant that a first object did not need to know anything about a second object in order to send it a message. One unfortunate result was that the second object could unexpectedly respond that the sent message was not understood, disrupting system execution and making systems designed with this object-oriented approach unpredictable. [0009]
  • Most modern object-oriented approaches opt to sacrifice the benefits flowing from anonymity of objects in order to facilitate stronger static typing (checking to ensure that objects will properly communicate with one another before actually executing the software system). The main result of stronger static typing is improved system predictability. However, an unfortunate result of sacrificing the anonymity of objects is a tighter coupling between those objects, whereby each object must explicitly classify, and include knowledge about, other objects to which it sends messages. In modern object-oriented approaches the package (interface) has become indistinguishable from the object and the system of which the object is a part. [0010]
  • The need remains for a design and programming methodology that combines the benefits of anonymity for the software elements with the benefits derived from strong static typing of system designs. [0011]
  • B. Other Approaches to Packaging [0012]
  • Other packaging approaches provide higher degrees of separation between software elements and their respective packages than does the packaging in object-oriented systems. For example, the packages in event-based frameworks are interfaces with ports for transmitting and receiving events. These provide loose coupling for inter-element communication. However, in an event-based framework, a software designer must explicitly implement inter-element state coherence between software elements as communication between those software elements. This means that a programmer must perform the error-prone task of designing, optimizing, implementing, and debugging a specialized communication protocol for each state coherence requirement in a particular software system. [0013]
  • The common object request broker architecture (CORBA) provides an interface description language (IDL) for building packages around software elements written in a variety of languages. These packages are remote procedure call (RPC) based and provide no support for coordinating state between elements. With flexible packaging, an element's package is implemented as a set of co-routines that can be adapted for use with applications through use of adapters with interfaces complementary to the interface for the software element. These adapters can be application-specific, used only when the elements are composed into a system. [0014]
  • The use of co-routines lets a designer specify transactions or sequences of events as part of an interface, rather than just atomic events. Unfortunately, co-routines must be executed in lock-step, meaning a transition in one co-routine corresponds to a transition in the other. If there is an error in one, or if an expected event is lost, the interface will fail: the co-routines fall out of sync, and the interface lacks the context needed to recover from the lost event. [0015]
  • The need remains for a design and programming methodology that provides software packaging that supports the implementation of state coherence in distributed concurrent systems without packaging or interface failure when an error or an unexpected event occurs. [0016]
  • 2. Approaches to Coordination [0017]
  • Coordination, within the context of this application, means the predetermined ways through which software components interact. In a broader sense, coordination refers to a methodology for composing concurrent components into a complete system. This use of the term coordination differs slightly from the use of the term in the parallelizing compiler literature, in which coordination refers to a technique for maintaining program-wide semantics for a sequential program decomposed into parallel subprograms. [0018]
  • A. Coordination Languages [0019]
  • Coordination languages are usually a class of tuple-space programming languages, such as Linda. A tuple is a data object containing two or more types of data that are identified by their tags and parameter lists. In tuple-space languages, coordination occurs through the use of tuple spaces, which are global multisets of tagged tuples stored in shared memory. Tuple-space languages extend existing programming languages by adding six operators: out, in, read, eval, inp, and readp. The out, in, and read operators respectively place tuples into tuple space, fetch and remove tuples, and fetch tuples without removing them. Each of these three operators blocks until its operation is complete. The out operator creates tuples containing a tag and several arguments. Procedure calls can be included in the arguments, but since out blocks, the calls must be performed and the results stored in the tuple before the operator can return. [0020]
  • The operators eval, inp, and readp are nonblocking versions of out, in, and read, respectively. They increase the expressive power of tuple-space languages. Consider the case of eval, the nonblocking version of out. Instead of evaluating all arguments of the tuple before returning, it spawns a thread to evaluate them, creating, in effect, an active tuple (whereas tuples created by out are passive). As with out, when the computation is finished, the results are stored in a passive tuple and left in tuple space. Unlike out, however, the eval call returns immediately, so that several active tuples can be left outstanding. [0021]
  • Tuple-space coordination can be used in concise implementations of many common interaction protocols. Unfortunately, tuple-space languages do not separate coordination issues from programming issues. Consider the annotated Linda implementation of RPC in Listing 1. [0022]
  • Listing 1: Linda Used to Emulate RPC: [0023]
    rpcCall(args) { /* C */
    out("RPCToServer", "Client", args...);
    in("Client", "ReturnFromServer", &returnValue);
    return returnValue; /* C */
    } /* C */
    Server:
    ...
    while(true) { /* C */
    in("RPCToServer", &returnAddress, args...);
    returnValue = functionCall(args); /* C */
    out(returnAddress, "ReturnFromServer", returnValue);
    } /* C */
  • Although the implementation depicted in Listing 1 is a compact representation of an RPC protocol, the implementation still depends heavily on an accompanying programming language (in this case, C). This dependency prevents designers from creating a new Linda RPC operator for arbitrary applications of RPC. Therefore, every time a designer uses Linda for RPC, they must copy the source code for RPC or make a C macro. This causes tight coupling, because the client must know the name of the RPC server. If the server name is passed in as a parameter, flexibility increases; however, this requires a binding phase in which the name is obtained and applied outside of the Linda framework. [0024]
  • The need remains for a design and programming methodology that allows implementation of communication protocols without tight coupling between the protocol implementation and the software elements with which the protocol implementation works. [0025]
  • A tuple space can require large quantities of dynamically allocated memory. However, most systems, and especially embedded systems, must operate within predictable and sometimes small memory requirements. Tuple-space systems are usually not suitable for coordination in systems that must operate within small predictable memory requirements because once a tuple has been generated, it remains in tuple space until it is explicitly removed or the software element that created it terminates. Maintaining a global tuple space can be very expensive in terms of overall system performance. Although much work has gone into improving the efficiency of tuple-space languages, system performance remains worse with tuple-space languages than with message-passing techniques. [0026]
  • The need remains for a design and programming methodology that can effectively coordinate between software elements while respecting performance and predictable memory requirements. [0027]
  • B. Fixed Coordination Models [0028]
  • In tuple-space languages, much of the complexity of coordination remains entangled with the functionality of computational elements. An encapsulating coordination formalism decouples inter-component interactions from the computational elements. [0029]
  • This type of formalism can be provided by fixed coordination models in which the coordination style is embodied in an entity and separated from computational concerns. Synchronous coordination models coordinate activity through relative schedules. Typically, these approaches require the coordination protocol to be manually constructed in advance. In addition, computational elements must be tailored to the coordination style used for a particular system (which may require intrusive modification of the software elements). [0030]
  • The need remains for a design and programming methodology that allows for coordination between software elements without tailoring the software elements to the specific coordination style used in a particular software system, while allowing for interactions between software elements in a way that facilitates debugging complex systems. [0031]
  • SUMMARY OF THE INVENTION
  • The present invention provides a coordination-centric debugging approach and programming methodology to facilitate the debugging of distributed software environments. This approach includes using “cooperative execution,” in which the software produces event traces to be analyzed by a debugging host. [0032]
  • A distributed software environment contains multiple processing elements, which are loaded with software, as well as sensors, displays, and other hardware. In accordance with the present invention, these processing elements contain software programs that generate corresponding event records in response to selected events. A software program includes components, runtime systems, coordinators, interfaces, and runtime debugging architectures. The coordinator manages control and data flow interactions between components, and the interface between the coordinator and a component facilitates the exposure of an event. The runtime system of a processing element collects the event record and transfers it to the runtime debugging architecture. The runtime debugging architecture facilitates the transfer of the event record from the processing element to a debugging host, along a communication channel. The communication channel facilitates either a direct or an indirect connection between the processing element and the debugging host. [0033]
  • In a direct connection, the runtime system of a processing element sends an event record to the runtime debugging architecture, which in turn connects to the debugging host and transfers the event record to the debugging host, which queues all event records and writes them to disk. In an indirect connection, an event record is routed from a primary runtime debugging architecture to an intermediate processing element that collects the event record and transfers it to the debugging host. [0034]
  • When it is not possible or practical to connect directly or indirectly to a debugging host, event records may still be collected and debugged at a later time. A processing element may include a flash driver that interfaces with a flash memory. The processing element's runtime system sends the event records to a flash driver that facilitates the collection and storage of event records in a flash memory for subsequent distributed debugging (i.e., “post-mortem” debugging). [0035]
  • Each event record, whether for direct or indirect connections or for post-mortem debugging, is time-stamped and causality-stamped by the runtime debugging architecture before it is transferred to the debugging host, the intermediate processing element, or flash memory. Time-stamping and causality-stamping facilitate coordination-centric debugging by stamping each event with a time in which the event was generated and with an identification of the cause of the event, allowing designers to debug a complex distributed software environment without having to guess at the proper event ordering. [0036]
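  • One simple, well-known way to realize such time stamps is a Lamport logical clock, in which each processing element increments a counter on every local event and advances it past any timestamp it receives. This is generic background offered as a sketch, not necessarily the stamping scheme of the present invention:

```c
#include <stdint.h>

/* Lamport logical clock: guarantees that if event a causally precedes
 * event b, then stamp(a) < stamp(b). The converse does not hold, which
 * is why scalar timestamps alone cannot fully characterize causality
 * (cf. FIG. 31); vector timestamps (cf. FIG. 32) recover it. */
typedef struct { uint64_t counter; } lamport_clock_t;

/* Local event (including sending a message): tick, then stamp. */
uint64_t lamport_local(lamport_clock_t *c)
{
    return ++c->counter;
}

/* Receiving a message stamped `remote`: jump past it, then tick, so the
 * receive event is ordered after the send event. */
uint64_t lamport_receive(lamport_clock_t *c, uint64_t remote)
{
    if (remote > c->counter)
        c->counter = remote;
    return ++c->counter;
}
```

With stamps like these attached to each event record, the debugging host can order the records without guessing at the proper event ordering.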
  • In accordance with the present invention, any event made visible to the runtime system of a distributed software environment, either by being identified as an explicit event in the software or by event recording calls at runtime, is a candidate for recording. A distributed software environment contains instrumented distributed code, which produces the events to be captured by the runtime system. The code includes information about how different software components interact. Instrumentation may consist of inserting event recording calls at each significant source line, which can cause a probe effect in the software, an effect that can be minimized by selectively focusing the instrumentation. Also, events often occur in specific, predetermined, partially ordered sequences; in some instances, a token representing the sequence may be recorded, rather than every single event in the sequence, leaving the debugging host to expand the token. [0037]
  • In accordance with the present invention, the coordination-centric debugging approach makes complex distributed software environments more debuggable. The debugging approach of the present invention facilitates the debugging of complex distributed systems and lets a designer isolate the cause, or causes, of unexpected system performance without looking at, or modifying, any of the underlying source code, avoiding the time-consuming and error-prone process of source-level debugging. [0038]
  • Additional aspects and advantages of this invention will be apparent from the following detailed description of preferred embodiments thereof, which proceeds with reference to the accompanying drawings.[0039]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a component in accordance with the present invention. [0040]
  • FIG. 2 is the component of FIG. 1 further having a set of coordination interfaces. [0041]
  • FIG. 3A is a prior art round-robin resource allocation protocol with a centralized controller. [0042]
  • FIG. 3B is a prior art round-robin resource allocation protocol implementing a token-passing scheme. [0043]
  • FIG. 4A is a detailed view of a component and a coordination interface connected to the component for use in round-robin resource allocation in accordance with the present invention. [0044]
  • FIG. 4B depicts a round-robin coordinator in accordance with the present invention. [0045]
  • FIG. 5 shows several typical ports for use in a coordination interface in accordance with the present invention. [0046]
  • FIG. 6A is a unidirectional data transfer coordinator in accordance with the present invention. [0047]
  • FIG. 6B is a bidirectional data transfer coordinator in accordance with the present invention. [0048]
  • FIG. 6C is a state unification coordinator in accordance with the present invention. [0049]
  • FIG. 6D is a control state mutex coordinator in accordance with the present invention. [0050]
  • FIG. 7 is a system for implementing subsumption resource allocation having components, a shared resource, and a subsumption coordinator. [0051]
  • FIG. 8 is a barrier synchronization coordinator in accordance with the present invention. [0052]
  • FIG. 9 is a rendezvous coordinator in accordance with the present invention. [0053]
  • FIG. 10 depicts a dedicated RPC system having a client, a server, and a dedicated RPC coordinator coordinating the activities of the client and the server. [0054]
  • FIG. 11 is a compound coordinator with both preemption and round-robin coordination for controlling the access of a set of components to a shared resource. [0055]
  • FIG. 12A is software system with two data transfer coordinators, each having constant message consumption and generation rules and each connected to a separate data-generating component and connected to the same data-receiving component. [0056]
  • FIG. 12B is the software system of FIG. 12A in which the two data transfer coordinators have been replaced with a merged data transfer coordinator. [0057]
  • FIG. 13 is a system implementing a first come, first served resource allocation protocol in accordance with the present invention. [0058]
  • FIG. 14 is a system implementing a multiclient RPC coordination protocol formed by combining the first come, first served protocol of FIG. 13 with the dedicated RPC coordinator of FIG. 10. [0059]
  • FIG. 15 depicts a large system in which the coordination-centric design methodology can be employed having a wireless device interacting with a cellular network. [0060]
  • FIG. 16 shows a top-level view of the behavior and components of a system for a cell phone. [0061]
  • FIG. 17A is a detailed view of a GUI component of the cell phone of FIG. 16. [0062]
  • FIG. 17B is a detailed view of a call log component of the cell phone of FIG. 16. [0063]
  • FIG. 18A is a detailed view of a voice subsystem component of the cell phone of FIG. 16. [0064]
  • FIG. 18B is a detailed view of a connection component of the cell phone of FIG. 16. [0065]
  • FIG. 19 depicts the coordination layers between a wireless device and a base station, and between the base station and a switching center, of FIG. 15. [0066]
  • FIG. 20 depicts a cell phone call management component, a master switching center call management component, and a call management coordinator connecting the respective call management components. [0067]
  • FIG. 21A is a detailed view of a transport component of the connection component of FIG. 18B. [0068]
  • FIG. 21B is a CDMA data modulator of the transport component of FIG. 18B. [0069]
  • FIG. 22 is a detailed view of a typical TDMA and a typical CDMA signal for the cell phone of FIG. 16. [0070]
  • FIG. 23A is a LCD touch screen component for a Web browser GUI for a wireless device. [0071]
  • FIG. 23B is a Web page formatter component for the Web browser GUI for the wireless device. [0072]
  • FIG. 24A is a completed GUI system for a handheld Web browser. [0073]
  • FIG. 24B shows the GUI system for the handheld Web browser combined with the connection subsystem of FIG. 18B in order to access the cellular network of FIG. 15. [0074]
  • FIG. 25 is a typical space/time diagram with space represented on a vertical axis and time represented on a horizontal axis. [0075]
  • FIG. 26 is a space/time diagram depicting a set of system events and two different observations of those system events. [0076]
  • FIG. 27 is a space/time diagram depicting a set of system events and an ideal observation of the events taken by a real-time observer. [0077]
  • FIG. 28 is a space/time diagram depicting two different yet valid observations of a system execution. [0078]
  • FIG. 29 is a space/time diagram depicting a system execution and an observation of that execution taken by a discrete Lamport observer. [0079]
  • FIG. 30 is a space/time diagram depicting a set of events that each include a Lamport time stamp. [0080]
  • FIG. 31 is a space/time diagram illustrating the insufficiency of scalar timestamps to characterize causality between events. [0081]
  • FIG. 32 is a space/time diagram depicting a set of system events that each include a vector time stamp. [0082]
  • FIG. 33 depicts a display from a partial order event tracer (POET). [0083]
  • FIG. 34 is a space/time diagram depicting two compound events that are neither causal nor concurrent. [0084]
  • FIG. 35 is a POET display of two convex event clusters. [0085]
  • FIG. 36 is a basis for distributed event environments (BEE) abstraction facility for a single client. [0086]
  • FIG. 37 is a hierarchical tree construction of process clusters. [0087]
  • FIG. 38A depicts a qualitative measure of cohesion and coupling between a set of process clusters that have heavy communication or are instantiated from the same source code. [0088]
  • FIG. 38B depicts a qualitative measure of cohesion and coupling between a set of process clusters that do not have heavy communication or are not instances of the same source code. [0089]
  • FIG. 38C depicts a qualitative measure of cohesion and coupling between an alternative set of process clusters that have heavy communication or are instantiated from the same source code. [0090]
  • FIG. 39 depicts a consistent and an inconsistent cut of a system execution on a space/time diagram. [0091]
  • FIG. 40A is a space/time diagram depicting a system execution. [0092]
  • FIG. 40B is a lattice representing all possible consistent cuts of the space/time diagram of FIG. 40A. [0093]
  • FIG. 40C is a graphical representation of the possible consistent cuts of FIG. 40B. [0094]
  • FIG. 41A is a space/time diagram depicting a system execution. [0095]
  • FIG. 41B is the space/time diagram of FIG. 41A after performing a global-step. [0096]
  • FIG. 41C is the space/time diagram of FIG. 41A after performing a step-over. [0097]
  • FIG. 41D is the space/time diagram of FIG. 41A after performing a step-in. [0098]
  • FIG. 42 is a space/time diagram depicting a system that is subject to a domino effect whenever the system is rolled back in time to a checkpoint. [0099]
  • FIG. 43 depicts the coordination-centric debugging approach in accordance with the present invention. [0100]
  • FIG. 44 is a detailed view of a direct connection between the primary processing element and the debugging host in accordance with the present invention. [0101]
  • FIG. 45 is a detailed view of an indirect connection between the primary processing element and the debugging host in accordance with the present invention. [0102]
  • FIG. 46 depicts capturing event records in flash memory for post-mortem distributed debugging in accordance with the present invention. [0103]
  • FIG. 47 shows how flash memory may be allocated. [0104]
  • FIG. 48 shows a distributed software environment being executed on a hardware platform and placement of a probe for monitoring bus traces on the platform and for generating event records.[0105]
  • DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
  • Coordination-centric Software Design [0106]
  • FIG. 1 is an example of a component 100, which is the basic software element within the coordination-centric design framework, in accordance with the present invention. With reference to FIG. 1, component 100 contains a set of modes 102. Each mode 102 corresponds to a specific behavior associated with component 100. Each mode 102 can either be active or inactive, respectively enabling or disabling the behavior corresponding to that mode 102. Modes 102 can make the conditional aspects of the behavior of component 100 explicit. The behavior of component 100 is encapsulated in a set of actions 104, which are discrete, event-triggered behavioral elements within the coordination-centric design methodology. Component 100 can be copied and the copies of component 100 can be modified, providing the code-sharing benefits of inheritance. [0107]
  • [0108] Actions 104 are enabled and disabled by modes 102, and hence can be thought of as effectively being properties of modes 102. An event (not shown) is an instantaneous condition, such as a timer tick, a data departure or arrival, or a mode change. Actions 104 can activate and deactivate modes 102, thereby selecting the future behavior of component 100. This is similar to actor languages, in which methods are allowed to replace an object's behavior.
  • In coordination-centric design, however, all possible behaviors must be identified and encapsulated before runtime. For example, a designer building a user interface component for a cell phone might define one mode for looking up numbers in an address book (in which the user interface behavior is to display complete address book entries in formatted text) and another mode for displaying the status of the phone (in which the user interface behavior is to graphically display the signal power and the battery levels of the phone). The designer must define both the modes and the actions for the given behaviors well before the component can be executed. [0109]
  • FIG. 2 shows [0110] component 100 further including a first coordination interface 200, a second coordination interface 202, and a third coordination interface 204. Coordination-centric design's components 100 provide the code-sharing capability of object-oriented inheritance through copying. Another aspect of object-oriented inheritance is polymorphism through shared interfaces. In object-oriented languages, an object's interface is defined by its methods. Although coordination-centric design's actions 104 are similar to methods in object-oriented languages, they do not define the interface for component 100. Components interact through explicit and separate coordination interfaces, in this figure coordination interfaces 200, 202, and 204. The shape of coordination interfaces 200, 202, and 204 determines the ways in which component 100 may be connected within a software system. The way coordination interfaces 200, 202, and 204 are connected to modes 102 and actions 104 within component 100 determines how the behavior of component 100 can be managed within a system. Systemwide behavior is managed through coordinators (see FIG. 4B and subsequent).
  • For our approach to be effective, several factors in the design of software elements must coincide: packaging, internal organization, and how elements coordinate their behavior. Although these are often treated as independent issues, conflicts among them can exacerbate debugging. We handle them in a unified framework that separates the internal activity from the external relationship of [0111] component 100. This lets designers build more modular components and encourages them to specify distributable versions of coordination protocols. Components can be reused in a variety of contexts, both distributed and single-processor.
  • 1. Introduction to Coordination [0112]
  • Within this application, coordination refers to the predetermined ways by which components interact. Consider a common coordination activity: resource allocation. One simple protocol for this is round-robin: participants are lined up, and the resource is given to each participant in turn. After the last participant is served, the resource is given back to the first. There is a resource-scheduling period during which each participant gets the resource exactly once, whether or not it is needed. [0113]
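The scheduling period described above can be sketched in a few lines of Python. This is an illustrative sketch only; the function name and list-based schedule are assumptions for exposition, not part of the described system.

```python
# Illustrative sketch of the round-robin period described above: participants
# are lined up, the resource is granted to each in turn, and after the last
# participant is served the turn wraps back to the first.

def round_robin_schedule(participants, periods=1):
    """Return the sequence of resource holders over the given number of
    scheduling periods; each participant holds the resource exactly once
    per period, whether or not it is needed."""
    schedule = []
    for _ in range(periods):
        for p in participants:
            schedule.append(p)
    return schedule

holders = round_robin_schedule(["A", "B", "C"], periods=2)
```

Note that each participant appears exactly once per period, which is the defining property of the protocol.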
  • FIG. 3A is a prior art round-robin resource allocation protocol with a [0114] centralized controller 300, which keeps track of and distributes the shared resource (not shown) to each of software elements 302, 304, 306, 308, and 310 in turn. With reference to FIG. 3A, controller 300 alone determines which software element 302, 304, 306, 308, or 310 is currently allowed to use the resource and which has it next. This implementation of a round-robin protocol permits software elements 302, 304, 306, 308, and 310 to be modular, because only controller 300 keeps track of the software elements. Unfortunately, when this protocol is implemented on a distributed architecture (not shown), controller 300 must typically be placed on a single processing element (not shown). As a result, all coordination requests must go through that processing element, which can cause a communication performance bottleneck. For example, consider the situation in which software elements 304 and 306 are implemented on a first processing element (not shown) and controller 300 is implemented on a second processing element. Software element 304 releases the shared resource and must send a message indicating this to controller 300. Controller 300 must then send a message to software element 306 to inform software element 306 that it now has the right to the shared resource. If the communication channel between the first processing element and the second processing element is in use or the second processing element is busy, then the shared resource must remain idle, even though both the current resource holder and the next resource holder (software elements 304 and 306, respectively) are implemented on the first processing element. The shared resource must typically remain idle until communication can take place and controller 300 can respond. This is an inefficient way to control access to a shared resource.
  • FIG. 3B is a prior art round-robin resource allocation protocol implementing a token passing scheme. With reference to FIG. 3B, this system consists of a shared resource [0115] 311 and a set of software elements 312, 314, 316, 318, 320, and 322. In this system a logical token 324 symbolizes the right to access resource 311, i.e., when a software element holds token 324, it has the right to access resource 311. When one of software elements 312, 314, 316, 318, 320, or 322 finishes with resource 311, it passes token 324, and with token 324 the access right, to a successor. This implementation can be distributed without a centralized controller, but as shown in FIG. 3B, this is less modular, because it requires each software element in the set to keep track of a successor.
  • Not only must [0116] software elements 312, 314, 316, 318, 320, and 322 keep track of successors, but each must implement a potentially complicated and error-prone protocol for transferring token 324 to its successor. Bugs can cause token 324 to be lost or introduce multiple tokens 324. Since there is no formal connection between the physical system and complete topology maps (diagrams that show how each software element is connected to others within the system), some software elements might erroneously be serviced more than once per cycle, while others are completely neglected. Moreover, these bugs can be extremely difficult to track after the system is completed. The protocol is entangled with the functionality of each software element, and it is difficult to separate the two for debugging purposes. Furthermore, if a few of the software elements are located on the same machine, performance of the implementation can be poor. The entangling of computation and coordination requires intrusive modification to optimize the system.
  • 2. Coordination-centric Design's Approach to Coordination [0117]
  • The coordination-centric design methodology provides an encapsulating formalism for coordination. Components such as [0118] component 100 interact using coordination interfaces, such as first, second, and third coordination interfaces 200, 202, and 204, respectively. Coordination interfaces preserve component modularity while exposing any parts of a component that participate in coordination. This technique of connecting components provides polymorphism in a similar fashion to subtyping in object-oriented languages.
  • FIG. 4A is a detailed view of a [0119] component 400 and a resource access coordination interface 402 connected to component 400 for use in a round-robin coordination protocol in accordance with the present invention. With reference to FIG. 4A, resource access coordination interface 402 facilitates implementation of a round-robin protocol that is similar to the token-passing round-robin protocol described above. Resource access coordination interface 402 has a single bit of control state, called access, which is shown as an arbitrated control port 404 that indicates whether or not component 400 is holding a virtual token (not shown). Component 400 can only use a send message port 406 on access coordination interface 402 when arbitrated control port 404 is true. Access coordination interface 402 further has a receive message port 408.
  • FIG. 4B shows a round-[0120] robin coordinator 410 in accordance with the present invention. With reference to FIG. 4B, round-robin coordinator 410 has a set of coordinator coordination interfaces 412 for connecting to a set of components 400. Each component 400 includes a resource access coordination interface 402. Each coordinator coordination interface 412 has a coordinator arbitrated control port 414, an incoming send message port 416 and an outgoing receive message port 418. Coordinator coordination interface 412 is complementary to resource access coordination interface 402, and vice versa, because the ports on the two interfaces are compatible and can function to transfer information between the two interfaces.
  • The round-robin protocol requires round-[0121] robin coordinator 410 to manage the coordination topology. Round-robin coordinator 410 is an instance of a more general abstraction called a coordination class, in which a coordination class defines a specific coordination protocol and a coordinator is a specific implementation of that coordination class. Round-robin coordinator 410 contains all information about how components 400 are supposed to coordinate. Although round-robin coordinator 410 can have a distributed implementation, no component 400 is required to keep references to any other component 400 (unlike the distributed round-robin implementation shown in FIG. 3B). All required references are maintained by round-robin coordinator 410 itself, and components 400 do not even need to know that they are coordinating through round-robin. Resource access coordination interface 402 can be used with any coordinator that provides the appropriate complementary interface. A coordinator's design is independent of whether it is implemented on a distributed platform or on a monolithic single processor platform.
  • 3. Coordination Interfaces [0122]
  • Coordination interfaces are used to connect components to coordinators. They are also the principal key to a variety of useful runtime debugging techniques. Coordination interfaces support component modularity by exposing all parts of the component that participate in the coordination protocol. Ports are elements of coordination interfaces, as are guarantees and requirements, each of which will be described in turn. [0123]
  • A. Ports [0124]
  • A port is a primitive connection point for interconnecting components. Each port is a five-tuple (T; A; Q; D; R) in which: [0125]
  • T represents the data type of the port. T can be one of int, boolean, char, byte, float, double, or cluster, in which cluster represents a cluster of data types (e.g., an int followed by a float followed by two bytes). [0126]
  • A is a boolean value that is true if the port is arbitrated and false otherwise. [0127]
  • Q is an integer greater than zero that represents logical queue depth for a port. [0128]
  • D is one of in, out, inout, or custom and represents the direction data flows with respect to the port. [0129]
  • R is one of discard-on-read, discard-on-transfer, or hold and represents the policy for data removal on the port. Discard-on-read indicates that data is removed immediately after it is read (and any data in the logical queue are shifted), discard-on-transfer indicates that data is removed from a port immediately after being transferred to another port, and hold indicates that data should be held until it is overwritten by another value. Hold is subject to arbitration. [0130]
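The five-tuple above can be encoded directly as a small record type. The following Python sketch is illustrative only; the class name, field names, and validation checks are assumptions of this sketch, not the patent's representation.

```python
from dataclasses import dataclass

# Illustrative encoding of the port five-tuple (T; A; Q; D; R) defined above.

@dataclass(frozen=True)
class Port:
    T: str   # data type: int, boolean, char, byte, float, double, or cluster
    A: bool  # True if the port is arbitrated
    Q: int   # logical queue depth, an integer greater than zero
    D: str   # direction: in, out, inout, or custom
    R: str   # removal policy: discard-on-read, discard-on-transfer, or hold

    def __post_init__(self):
        assert self.T in {"int", "boolean", "char", "byte",
                          "float", "double", "cluster"}
        assert self.Q > 0
        assert self.D in {"in", "out", "inout", "custom"}
        assert self.R in {"discard-on-read", "discard-on-transfer", "hold"}

# A send message port, as defined in the Message Ports subsection:
# (T; false; 1; out; discard-on-transfer).
send_port = Port("int", False, 1, "out", "discard-on-transfer")
```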
  • Custom directionality allows designers to specify ports that accept or generate only certain specific values. For example, a designer may want a port that allows other components to activate, but not deactivate, a mode. While many combinations of port attributes are possible, we normally encounter only a few. The three most common are message ports (output or input), state ports (output, input, or both; sometimes arbitrated), and control ports (a type of state port). FIG. 5 illustrates the visual syntax used for several common ports throughout this application. With reference to FIG. 5, this figure depicts an exported [0131] state port 502, an imported state port 504, an arbitrated state port 506, an output data port 508, and an input data port 510.
  • 1. Message Ports [0132]
  • Message ports (output and input [0133] data ports 508 and 510, respectively) are either send (T; false; 1; out; discard-on-transfer) or receive (T; false; Q; in; discard-on-read). Their function is to transfer data between components. Data passed to a send port is transferred immediately to the corresponding receive port, thus it cannot be retrieved from the send port later. Receive data ports can have queues of various depths. Data arrivals on these ports are frequently used to trigger and pass data parameters into actions. Values remain on receive ports until they are read.
  • 2. State Ports [0134]
  • State ports take one of three forms: [0135]
  • 1. (T; false; 1; out; hold) [0136]
  • 2. (T; false; 1; in; hold) [0137]
  • 3. (T; true; 1; inout; hold) [0138]
  • State ports, such as exported [0139] state port 502, imported state port 504, and arbitrated state port 506, hold persistent values, and the value assigned to a state port may be arbitrated. This means that, unlike message ports, values remain on the state ports until changed. When multiple software elements simultaneously attempt to alter the value of arbitrated state port 506, the final value is determined based on arbitration rules provided by the designer through an arbitration coordinator (not shown).
  • State ports transfer variable values between scopes. In coordination-centric design, all variables referenced by a component are local to that component, and these variables must be explicitly declared in the component's scope. Variables can, however, be bound to state ports that are connected to other components. In this way, a variable value can be transferred between components and the variable value achieves the system-level effect of a multivariable. [0140]
  • 3. Control Ports [0141]
  • Control ports are similar to state ports, but a control port is limited to having the boolean data type. Control ports are typically bound to modes. Actions interact with a control port indirectly, by setting and responding to the values of a mode that is bound to the control port. [0142]
  • For example, arbitrated [0143] control port 404 shown in FIG. 4A is a control port that can be bound to a mode (not shown) containing all actions that send data on a shared channel. When arbitrated control port 404 is false, the mode is inactive, disabling all actions that send data on the channel.
  • B. Guarantees [0144]
  • Guarantees are formal declarations of invariant properties of a coordination interface. There can be several types of guarantees, such as timing guarantees between events, guarantees between control state (e.g., state A and state B are guaranteed to be mutually exclusive), etc. Although a coordination interface's guarantees reflect properties of the component to which the coordination interface is connected, the guarantees are not physically bound to any internal portions of the component. Guarantees can often be certified through static analysis of the software system. Guarantees are meant to cache various properties that are inherent in a component or a coordinator in order to simplify static analysis of the software system. [0145]
  • A guarantee is a promise provided by a coordination interface. The guarantee takes the form of a predicate promised to be invariant. In principle, guarantees can include any type of predicate (e.g., x > 3, in which x is an integer-valued state port, or tea − teb < 2 ms). [0146] Throughout the remainder of this application, guarantees will be only event-ordering guarantees (guarantees that specify acceptable orders of events) or control-relationship guarantees (guarantees pertaining to acceptable relative component behaviors).
  • C. Requirements [0147]
  • A requirement is a formal declaration of the properties necessary for correct software system functionality. An example of a requirement is a required response time for a coordination interface: the number of messages that must have arrived at the coordination interface before the coordination interface can transmit, or fire, the messages. When two coordination interfaces are bound together, the requirements of the first coordination interface must be conservatively matched by the guarantees of the second coordination interface (e.g., x<7 as a guarantee conservatively matches x<8 as a requirement). As with guarantees, requirements are not physically bound to anything within the component itself. Guarantees can often be verified to be sufficient for the correct operation of the software system in which the component is used. In sum, a requirement is a predicate on a first coordination interface that must be conservatively matched with a guarantee on a complementary second coordination interface. [0148]
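The conservative-matching relation between a guarantee and a requirement (as in the x<7 / x<8 example above) can be sketched as an implication check: every value the guarantee permits must also satisfy the requirement. Brute-forcing a small integer domain, as done below, is purely an assumption of this sketch; the patent does not prescribe how matching is verified.

```python
# Illustrative check that a guarantee conservatively matches a requirement:
# every value satisfying the guarantee must also satisfy the requirement.

def conservatively_matches(guarantee, requirement, domain=range(-100, 101)):
    return all(requirement(x) for x in domain if guarantee(x))

# x < 7 as a guarantee conservatively matches x < 8 as a requirement,
# but the converse does not hold (x = 7 satisfies x < 8 but not x < 7).
ok = conservatively_matches(lambda x: x < 7, lambda x: x < 8)
not_ok = conservatively_matches(lambda x: x < 8, lambda x: x < 7)
```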
  • D. Conclusion Regarding Coordination Interfaces [0149]
  • A coordination interface is a four-tuple (P; G; R; I) in which: [0150]
  • P is a set of named ports. [0151]
  • G is a set of named guarantees provided by the interface. [0152]
  • R is a set of named requirements that must be matched by guarantees of connected interfaces. [0153]
  • I is a set of named coordination interfaces. [0154]
  • As this definition shows, coordination interfaces are recursive. [0155] Coordinator coordination interface 412, shown in FIG. 4B, used for round-robin coordination is called AccessInterface and is defined in Table 1.
    Constituent    Value
    ports          P = { access:StatePort, s:outMessagePort, r:inMessagePort }
    guarantees     G = { ¬access ⇒ ¬s.gen }
    requirements   R = Ø
    interfaces     I = Ø
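The four-tuple (P; G; R; I) can be encoded as a record whose last field holds further interfaces, making the structure recursive. The Python sketch below is illustrative; the field names and the use of dicts for named sets are assumptions of this sketch.

```python
from dataclasses import dataclass, field

# Illustrative encoding of the coordination interface four-tuple (P; G; R; I).

@dataclass
class CoordinationInterface:
    ports: dict                 # P: named ports
    guarantees: dict            # G: named guarantees
    requirements: dict          # R: named requirements (matched by guarantees
                                #    of connected interfaces)
    interfaces: dict = field(default_factory=dict)  # I: nested interfaces

# AccessInterface from Table 1: three ports, one guarantee
# (not access implies not s.gen), and empty requirement and interface sets.
access_interface = CoordinationInterface(
    ports={"access": "StatePort", "s": "outMessagePort", "r": "inMessagePort"},
    guarantees={"g0": "not access => not s.gen"},
    requirements={},
)
```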
  • Related to coordination interfaces is a recursive coordination interface descriptor, which is a five-tuple (Pa; Ga; Ra; Id; Nd) in which: [0156]
  • Pa is a set of abstract ports, which are ports that may be incomplete in their attributes (i.e., they do not yet have a datatype). [0157]
  • Ga is a set of abstract guarantees, which are guarantees between abstract ports. [0158]
  • Ra is a set of abstract requirements, which are requirements between abstract ports. [0159]
  • Id is a set of coordination interface descriptors. [0160]
  • Nd is an element of Q×Q, where Q={∞}∪Z+ and Z+ denotes the set of positive integers. Nd indicates the number or range of numbers of permissible interfaces. [0161]
  • Allowing coordination interfaces to contain other coordination interfaces is a powerful feature. It lets designers use common coordination interfaces as complex ports within other coordination interfaces. For example, the basic message ports described above are nonblocking, but we can build a blocking coordination interface (not shown) that serves as a blocking port by combining a wait state port with a message port. [0162]
  • 4. Coordinators [0163]
  • A coordinator provides the concrete representations of intercomponent aspects of a coordination protocol. Coordinators allow a variety of static analysis debugging methodologies for software systems created with the coordination-centric design methodology. A coordinator contains a set of coordination interfaces and defines the relationships among the coordination interfaces. The coordination interfaces complement the component coordination interfaces provided by components operating within the protocol. Through matched interface pairs, coordinators effectively describe connections between message ports, correlations between control states, and transactions between components. [0164]
  • For example, round-[0165] robin coordinator 410, shown in FIG. 4B, must ensure that only one component 400 has its component control port 404's value, or its access bit, set to true. Round-robin coordinator 410 must further ensure that the correct component 400 has its component control port 404 set to true for the chosen sequence. This section presents formal definitions of the parts that comprise coordinators: modes, actions, bindings, action triples, and constraints. These definitions culminate in a formal definition of coordinators.
  • A. Modes [0166]
  • A mode is a boolean value that can be used as a guard on an action. In a coordinator, the mode is most often bound to a control port in a coordination interface for the coordinator. For example, in round-[0167] robin coordinator 410, the modes of concern are bound to a coordinator control port 414 of each coordinator coordination interface 412.
  • B. Actions [0168]
  • An action is a primitive behavioral element that can: [0169]
  • Respond to events. [0170]
  • Generate events. [0171]
  • Change modes. [0172]
  • Actions can range in complexity from simple operations up to complicated pieces of source code. An action in a coordinator is called a transparent action because the effects of the action can be precomputed and the internals of the action are completely exposed to the coordination-centric design tools. [0173]
  • C. Bindings [0174]
  • Bindings connect input ports to output ports, control ports to modes, state ports to variables, and message ports to events. Bindings are transparent and passive. Bindings are simply conduits for event notification and data transfer. When used for event notification, bindings are called triggers. [0175]
  • D. Action Triples [0176]
  • To be executed, an action must be enabled by a mode and triggered by an event. The combination of a mode, trigger, and action is referred to as an action triple, which is a triple (m; t; a) in which: [0177]
  • m is a mode. [0178]
  • t is a trigger. [0179]
  • a is an action. [0180]
  • The trigger is a reference to an event type, but it can be used to pass data into the action. Action triples are written: mode: trigger: action [0181]
  • A coordinator's actions are usually either pure control, in which both the trigger and action performed affect only control state, or pure data, in which both the trigger and action performed occur in the data domain. In the case of round-[0182] robin coordinator 410, the following set of actions is responsible for maintaining the appropriate state:
  • accessi : −accessi : +access(i+1) mod n [0183]
  • The symbol “+” signifies a mode's activation edge (i.e., the event associated with the mode becoming true), and the symbol “−” signifies its deactivation edge. When any [0184] coordinator coordination interface 412 deactivates its arbitrated control port 404's access bit, the access bit of the next coordinator coordination interface 412 is automatically activated.
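The rotation action can be read as a small state update over the access bits of n interfaces. The Python sketch below is an illustrative reading of the action triple, not the coordinator's actual implementation; the function and variable names are assumptions.

```python
# Illustrative sketch of the action accessi : -accessi : +access(i+1) mod n:
# deactivating interface i's access bit automatically activates the access
# bit of interface (i+1) mod n.

def release_access(access, i):
    n = len(access)
    assert access[i], "only the current token holder may release"
    access[i] = False            # deactivation edge: -access_i
    access[(i + 1) % n] = True   # activation edge: +access_((i+1) mod n)
    return access

bits = [True, False, False]
release_access(bits, 0)  # the virtual token moves from interface 0 to 1
release_access(bits, 1)  # ... and on to interface 2
```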
  • E. Constraints [0185]
  • In this application, constraints are boolean relationships between control ports. They take the form: [0186]
  • Condition ⇒ Effect [0187]
  • A constraint differs from a guarantee in that the guarantee is limited to communicating invariant relationships between components without providing a way to enforce the invariant relationship. The constraint, on the other hand, is a set of instructions to the runtime system dealing with how to enforce certain relationships between components. When a constraint is violated, two corrective actions are available to the system: (1) modify the values on the left-hand side to make the left-hand expression evaluate as false (an effect termed backpressure) or (2) alter the right-hand side to make it true. We refer to these techniques as LHM (left-hand modify) and RHM (right-hand modify). For example, given the constraint x ⇒ ¬y and the value x ∧ y, with RHM semantics the runtime system must respond by disabling y or setting y to false. Thus, the value of ¬y is set to true. [0189]
  • The decision of whether to use LHM, to use RHM, or even to suspend enforcement of a constraint in certain situations can dramatically affect the efficiency and predictability of the software system. Coordination-centric design does not attempt to solve simultaneous constraints at runtime. Rather, runtime algorithms use local ordered constraint solutions. This, however, can result in some constraints being violated and is discussed further below. [0190]
  • Round-[0191] robin coordinator 410 has a set of safety constraints to ensure that there is never more than one token in the system:
  • accessi ⇒ ∀j≠i ¬accessj [0192]
  • The above equation translates roughly as [0193] accessi implies not accessj for the set of all accessj where j is not equal to i. Even this simple constraint system can cause problems with local resolution semantics (such as LHM and RHM). If the runtime system attempted to fix all constraints simultaneously, all access modes would be shut down. If they were fixed one at a time, however, any duplicate tokens would be erased on the first pass, satisfying all other constraints and leaving a single token in the system.
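The contrast between fixing all constraints simultaneously and fixing them one at a time can be sketched with RHM semantics over a list of access bits. Both functions below are illustrative assumptions, not the runtime system's algorithm.

```python
# Illustrative contrast between simultaneous and one-at-a-time (local)
# resolution of the safety constraint accessi => (for all j != i) not accessj,
# using RHM semantics.

def resolve_simultaneously(access):
    # Every mode involved in some violated constraint is cleared at once,
    # so duplicate tokens cause all access modes to shut down.
    if sum(access) > 1:
        return [False] * len(access)
    return access

def resolve_one_at_a_time(access):
    # Constraints are fixed sequentially: the first pass erases duplicate
    # tokens, leaving exactly one token in the system.
    for i in range(len(access)):
        if access[i]:
            for j in range(len(access)):
                if j != i:
                    access[j] = False  # RHM: make "not access_j" true
    return access
```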
  • Since high-level protocols can be built from combinations of lower-level protocols, coordinators can be hierarchically composed. A coordinator is a six-tuple (I; M; B; N; A; X) in which: [0194]
  • I is a set of coordination interfaces. [0195]
  • M is a set of modes. [0196]
  • B is a set of bindings between interface elements (e.g., control ports and message ports) and internal elements (e.g., modes and triggers). [0197]
  • N is a set of constraints between interface elements. [0198]
  • A is a set of action triples for the coordinator. [0199]
  • X is a set of subcoordinators. [0200]
  • FIGS. 6A, 6B, [0201] 6C, and 6D show a few simple coordinators highlighting the bindings and constraints of the respective coordinators. With reference to FIG. 6A, a unidirectional data transfer coordinator 600 transfers data in one direction between two components (not shown) by connecting incoming receive message port 408 to outgoing receive message port 418 with a binding 602. With reference to FIG. 6B, bidirectional data transfer coordinator 604 transfers data back and forth between two components (not shown) by connecting incoming receive message port 408 to outgoing receive message port 418 with binding 602 and connecting send message port 406 to incoming send message port 416 with a second binding 602. Unidirectional data transfer coordinator 600 and bidirectional data transfer coordinator 604 simply move data from one message port to another. Thus, each coordinator consists of bindings between corresponding ports on separate coordination interfaces.
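A binding of the kind used by these data transfer coordinators can be sketched as a passive conduit between a send port and a receive port. The class names, the queue discipline, and the bind helper below are assumptions of this sketch, not the patent's API.

```python
from collections import deque

# Illustrative sketch of a binding as a passive conduit between message
# ports, in the spirit of unidirectional data transfer coordinator 600.

class ReceivePort:
    def __init__(self):
        self.queue = deque()         # logical queue of pending values

    def read(self):
        return self.queue.popleft()  # discard-on-read

class SendPort:
    def __init__(self):
        self.target = None           # set when a binding connects the ports

    def send(self, value):
        # discard-on-transfer: the value moves immediately to the bound
        # receive port and cannot be retrieved from the send port later.
        self.target.queue.append(value)

def bind(send_port, receive_port):
    """A binding is transparent and passive: it merely wires the ports."""
    send_port.target = receive_port

s, r = SendPort(), ReceivePort()
bind(s, r)
s.send("hello")
```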
  • With reference to FIG. 6C, [0202] state unification coordinator 606 ensures that a state port a 608 and a state port b 610 are always set to the same value. State unification coordinator 606 connects state port a 608 to state port b 610 with binding 602. With reference to FIG. 6D, control state mutex coordinator 612 has a first constraint 618 and a second constraint 620 as follows:
  • (1) c ⇒ ¬d, and [0203]
  • (2) d ⇒ ¬c. [0204]
  • [0205] Constraints 618 and 620 can be restated as follows:
  • (1) A [0206] state port c 614 having a true value implies that a state port d 616 has a false value, and
  • (2) [0207] State port d 616 having a true value implies that state port c 614 has a false value.
  • A coordinator has two types of coordination interfaces: up interfaces that connect the coordinator to a second coordinator, which is at a higher level of design hierarchy, and down interfaces that connect the coordinator either to a component or to a third coordinator, which is at a lower level of design hierarchy. Down interfaces have names preceded with [0208] “˜”. Round-robin coordinator 410 has six down coordination interfaces (previously referred to as coordinator coordination interface 412), with constraints that make the turning off of any coordinator control port 414 (also referred to as access control port) turn on the coordinator control port 414 of the next coordinator coordination interface 412 in line. Table 2 presents all constituents of the round-robin coordinator.
    Constituent               Value
    coordination interfaces   I = AccessInterface1-6
    modes                     M = access1-6
    bindings                  B = ∀1≦i≦6 (˜AccessInterfacei.access, accessi)
    constraints               N = ∀1≦i≦6 ∀(1≦j≦6)∧(i≠j) accessi ⇒ ¬accessj
    actions                   A = ∀1≦i≦6 accessi : −accessi : +access(i+1) mod 6
    subcoordinators           X = Ø
  • This tuple describes an implementation of a round-robin coordination protocol for a particular system with six components, as shown in round-[0209] robin coordinator 410. We use a coordination class to describe a general coordination protocol that may not have a fixed number of coordinator coordination interfaces. The coordination class is a six-tuple (Ic; Mc; Bc; Nc; Ac; Xc) in which:
  • Ic is a set of coordination interface descriptors in which each descriptor provides a type of coordination interface and specifies the number of such interfaces allowed within the coordination class. [0210]
  • Mc is a set of abstract modes that supplies appropriate modes when a coordination class is instantiated with a fixed number of coordinator coordination interfaces. [0211]
  • Bc is a set of abstract bindings that forms appropriate bindings between elements when the coordination class is instantiated. [0212]
  • Nc is a set of abstract constraints that ensures appropriate constraints between coordination interface elements are in place as specified at instantiation. [0213]
  • Ac is a set of abstract action triples for the coordinator. [0214]
  • Xc is a set of coordination classes (hierarchy). [0215]
  • While a coordinator describes coordination protocol for a particular application, it requires many aspects, such as the number of coordination interfaces and datatypes, to be fixed. Coordination classes describe protocols across many applications. The use of the coordination interface descriptors instead of coordination interfaces lets coordination classes keep the number of interfaces and datatypes undetermined until a particular coordinator is instantiated. For example, a round-robin coordinator contains a fixed number of coordinator coordination interfaces with specific bindings and constraints between the message and state ports on the fixed number of coordinator coordination interfaces. A round-robin coordination class contains descriptors for the coordinator coordination interface type, without stating how many coordinator coordination interfaces, and instructions for building bindings and constraints between ports on the coordinator coordination interfaces when a particular round-robin coordinator is created. [0216]
  • 5. Components [0217]
  • A component is a six-tuple (I; A; M; V; S; X) in which: [0218]
  • I is a set of coordination interfaces. [0219]
  • A is a set of action triples. [0220]
  • M is a set of modes. [0221]
  • V is a set of typed variables. [0222]
  • S is a set of subcomponents. [0223]
  • X is a set of coordinators used to connect the subcomponents to each other and to the coordination interfaces. [0224]
  • Actions within a coordinator are fairly regular, and hence a large number of actions can be described with a few simple expressions. However, actions within a component are frequently diverse and can require distinct definitions for each individual action. Typically, a component's action triples are represented with a table that has three columns: one for the mode, one for the trigger, and one for the action code. Table 3 shows some example actions from a component that can use round-robin coordination. [0225]
    Mode      Trigger   Action
    access    tick      AccessInterface.s.send(“Test message”); -access;
    ¬access   tick      waitCount++;
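The action triples in Table 3 can be rendered as a small sketch. The component class and method names are hypothetical; only the mode/trigger/action behavior comes from the table.

```python
# Illustrative sketch of Table 3's action triples (mode, trigger, action):
# in access mode a tick sends a test message and leaves access mode;
# otherwise a tick increments a wait counter.

class RoundRobinComponent:
    def __init__(self):
        self.access = False
        self.wait_count = 0
        self.sent = []

    def on_tick(self):
        if self.access:                       # mode: access, trigger: tick
            self.sent.append("Test message")  # AccessInterface.s.send("Test message")
            self.access = False               # -access
        else:                                 # mode: ¬access, trigger: tick
            self.wait_count += 1              # waitCount++
```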
  • A component resembles a coordinator in several ways (for example, the modes and coordination interfaces in each are virtually the same). Components, however, can have internal coordinators, and because of these internal coordinators, components do not always require either bindings or constraints. In the following subsections, various aspects of components are described in greater detail. These aspects include variable scope, action transparency, and execution semantics for systems of actions. [0226]
  • A. Variable Scope [0227]
  • To enhance a component's modularity, all variables accessed by an action within the component are either local to the action, local to the immediate parent component of the action, or accessed by the immediate parent component of the action via state ports in one of the parent component's coordination interfaces. For a component's variables to be available to a hierarchical child component, they must be exported by the component and then imported by the child of the component. [0228]
  • B. Action Transparency [0229]
  • An action within a component can be either a transparent action or an opaque action. Transparent and opaque actions have different invocation semantics. The internal properties of transparent actions, i.e., control structures, variables, changes in state, operators, etc., are visible to all coordination-centric design tools; the design tools can separate, observe, and analyze all of these internal properties. Opaque actions, by contrast, are source code. Opaque actions must be executed directly, and examining their internal properties can be accomplished only through traditional, source-level debugging techniques. An opaque action must explicitly declare any mode changes and coordination interfaces that the opaque action may directly affect. [0230]
  • C. Action Execution [0231]
  • An action is triggered by an event, such as data arriving or departing a message port, or changes in value being applied to a state port. An action can change the value of a state port, generate an event, and provide a way for the software system to interact with low-level device drivers. Since actions typically produce events, a single trigger can be propagated through a sequence of actions. [0232]
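Because actions produce events, a single trigger can cascade through several actions. The following minimal sketch illustrates that propagation; the function and data names are hypothetical, and a real system would dispatch on modes as well as triggers.

```python
# Sketch of trigger propagation: each fired action may emit a new event,
# which is fed back into the dispatch loop, producing a cascade.

def propagate(event, actions, effects):
    """Fire every action whose trigger matches, feeding produced events back in.

    actions: list of (trigger, action_name) pairs.
    effects: maps action_name -> event it produces (or None).
    Returns the sequence of action names fired.
    """
    pending = [event]
    fired = []
    while pending:
        ev = pending.pop(0)
        for trigger, action in actions:
            if trigger == ev:
                fired.append(action)
                produced = effects.get(action)
                if produced is not None:
                    pending.append(produced)
    return fired

actions = [("tick", "a1"), ("msg_arrived", "a2")]
effects = {"a1": "msg_arrived", "a2": None}   # a1 generates a message event
```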
  • 6. Protocols Implemented with Coordination Classes [0233]
  • In this section, we describe several coordinators that individually implement some common protocols: subsumption, barrier synchronization, rendezvous, and dedicated RPC. [0234]
  • A. Subsumption Protocol [0235]
  • A subsumption protocol is a priority-based, preemptive resource allocation protocol commonly used in building small, autonomous robots, in which the shared resource is the robot itself. [0236]
  • FIG. 7 shows a set of coordination interfaces and a coordinator for implementing the subsumption protocol. With reference to FIG. 7, a [0237] subsumption coordinator 700 has a set of subsumption coordinator coordination interfaces 702, which have a subsume arbitrated coordinator control port 704 and an incoming subsume message port 706. Each subsume component 708 has a subsume component coordination interface 710. Subsume component coordination interface 710 has a subsume arbitrated component control port 712 and an outgoing subsume message port 714. Subsumption coordinator 700 and each subsume component 708 are connected by their respective coordination interfaces, 702 and 710. Each subsumption coordinator coordination interface 702 in subsumption coordinator 700 is associated with a priority. Each subsume component 708 has a behavior that can be applied to a robot (not shown). At any time, any subsume component 708 can attempt to assert its behavior on the robot. The asserted behavior coming from the subsume component 708 connected to the subsumption coordinator coordination interface 702 with the highest priority is the asserted behavior that will actually be performed by the robot. Subsume components 708 need not know anything about other components in the system. In fact, each subsume component 708 is designed to perform independently of whether its asserted behavior is performed or ignored.
  • [0238] Subsumption coordinator 700 further has a slave coordinator coordination interface 716, which has an outgoing slave message port 718. Outgoing slave message port 718 is connected to an incoming slave message port 720. Incoming slave message port 720 is part of a slave coordination interface 722, which is connected to a slave 730. When a subsume component 708 asserts a behavior and that component has the highest priority, subsumption coordinator 700 will control slave 730 (which typically controls the robot) based on the asserted behavior.
  • The following constraint describes the basis of the [0239] subsumption coordinator 700's behavior:

      subsume_p ⇒ ⋀_{1≤i≤p−1} ¬subsume_i
  • This means that if any [0240] subsume component 708 has a subsume arbitrated component control port 712 that has a value of true, then all lower-priority subsume arbitrated component control ports 712 are set to false. An important difference between round-robin and subsumption is that in round-robin, the resource access right is transferred only when surrendered. Therefore, round-robin coordination has cooperative release semantics. However, in subsumption coordination, a subsume component 708 tries to obtain the resource whenever it needs to and succeeds only when it has higher priority than any other subsume component 708 that needs the resource at the same time. A lower-priority subsume component 708 already using the resource must surrender the resource whenever a higher-priority subsume component 708 tries to access the resource. Subsumption coordination uses preemptive release semantics, whereby each subsume component 708 must always be prepared to relinquish the resource.
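The preemptive arbitration described above can be sketched as a single function. This is an illustrative reduction, with interface index standing in for priority (higher index, higher priority).

```python
# Sketch of the subsumption constraint: when several components assert,
# only the highest-priority assertion survives; every lower-priority
# subsume port is forced false (preemptive release semantics).

def arbitrate(subsume):
    """subsume[i] is True if interface i asserts; higher i means higher priority.

    Returns the post-constraint port values: at most one True.
    """
    winner = max((i for i, s in enumerate(subsume) if s), default=None)
    return [s and i == winner for i, s in enumerate(subsume)]
```

A component already holding the resource at a lower index is preempted the moment a higher-index component asserts, which is exactly the difference from round-robin's cooperative release.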
  • Table 4 presents the complete tuple for the subsumption coordinator. [0241]
    Constituent              Value
    coordination interfaces  I = (Subsume1-n) ∪ (Output)
    modes                    M = subsume1-n
    bindings                 B = ∀1≦i≦n (Subsumei.subsume, subsumei) ∪ …
    constraints              N = ∀1≦i≦n (∀1≦j<i : subsumei ⇒ ¬subsumej)
    actions                  A = Ø
    subcoordinators          X = Ø
  • B. Barrier Synchronization Protocol [0242]
  • Other simple types of coordination that components might engage in enforce synchronization of activities. An example is barrier synchronization, in which each component reaches a synchronization point independently and waits. FIG. 8 depicts a [0243] barrier synchronization coordinator 800. With reference to FIG. 8, barrier synchronization coordinator 800 has a set of barrier synchronization coordination interfaces 802, each of which has a coordinator arbitrated state port 804, named wait. Coordinator arbitrated state port 804 is connected to a component arbitrated state port 806, which is part of a component coordination interface 808. Component coordination interface 808 is connected to a component 810. When all components 810 reach their respective synchronization points, they are all released from waiting. The actions for a barrier synchronization coordinator with n interfaces are:

      ⋀_{0≤i<n} wait_i : : ∀ 0≤j<n : −wait_j
  • In other words, when all wait modes (not shown) become active, each one is released. The blank between the two colons indicates that the trigger event is the guard condition becoming true. [0244]
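A minimal sketch of this barrier action, with the wait modes modeled as booleans (the function name is illustrative):

```python
# Sketch of barrier synchronization: the guard becomes true only when every
# wait mode is active, and the action then deactivates all of them at once.

def barrier_step(wait):
    """wait[i] is True once component i has reached its synchronization point."""
    if all(wait):                     # guard: all wait modes active
        return [False] * len(wait)    # action: -wait_j for every j (release all)
    return wait                       # guard false: nothing fires
```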
  • C. Rendezvous Protocol [0245]
  • A resource allocation protocol similar to barrier synchronization is called rendezvous. FIG. 9 depicts a [0246] rendezvous coordinator 900 in accordance with the present invention. With reference to FIG. 9, rendezvous coordinator 900 has a rendezvous coordination interface 902, which has a rendezvous arbitrated state port 904. A set of rendezvous components 906, each of which may perform different functions or have vastly different actions and modes, has a rendezvous component coordination interface 908, which includes a component arbitrated state port 910. Rendezvous components 906 connect to rendezvous coordinator 900 through their respective coordination interfaces, 908 and 902. Rendezvous coordinator 900 further has a rendezvous resource coordination interface 912, which has a rendezvous resource arbitrated state port 914, also called available. A resource 916 has a resource coordination interface 918, which has a resource arbitrated state port 920. Resource 916 is connected to rendezvous coordinator 900 by their complementary coordination interfaces, 918 and 912 respectively.
  • With rendezvous-style coordination, there are two types of participants: [0247] resource 916 and several resource users, here rendezvous components 906. When resource 916 is available, it activates its resource arbitrated state port 920, also referred to as its available control port. If there are any waiting rendezvous components 906, one will be matched with the resource; both participants are then released. This differs from subsumption and round-robin in that resource 916 plays an active role in the protocol by activating its available control port 920.
  • The actions for [0248] rendezvous coordinator 900 are:
  • available_i ∧ wait_j : : −available_i, −wait_j [0249]
  • This could also be accompanied by other modes that indicate the status after the rendezvous. With rendezvous coordination, it is important that only one component at a time be released from wait mode. [0250]
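The matching step can be sketched as follows. This is an illustrative reduction; it picks the first waiting component, whereas a real coordinator could use any policy that releases exactly one pair.

```python
# Sketch of the rendezvous action: when the resource's available port is
# active and at least one component waits, exactly one pair is released
# (-available_i, -wait_j); everyone else keeps waiting.

def rendezvous_step(available, waiting):
    """Return (available', waiting', matched_index or None)."""
    if available:
        for j, w in enumerate(waiting):
            if w:
                released = list(waiting)
                released[j] = False           # -wait_j for the matched component
                return False, released, j     # -available_i for the resource
    return available, list(waiting), None     # no match possible yet
```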
  • D. Dedicated RPC Protocol [0251]
  • A coordination class that differs from those described above is dedicated RPC. FIG. 10 depicts a dedicated RPC system. With reference to FIG. 10, a [0252] dedicated RPC coordinator 1000 has an RPC server coordination interface 1002, which includes an RPC server imported state port 1004, an RPC server output message port 1006, and an RPC server input message port 1008. Dedicated RPC coordinator 1000 is connected to a server 1010. Server 1010 has a server coordination interface 1012, which has a server exported state port 1014, a server input data port 1016, and a server output data port 1018. Dedicated RPC coordinator 1000 is connected to server 1010 through their complementary coordination interfaces, 1002 and 1012 respectively. Dedicated RPC coordinator 1000 further has an RPC client coordination interface 1020, which includes an RPC client imported state port 1022, an RPC client input message port 1024, and an RPC client output message port 1026. Dedicated RPC coordinator 1000 is connected to a client 1028 by connecting RPC client coordination interface 1020 to a complementary client coordination interface 1030. Client coordination interface 1030 has a client exported state port 1032, a client output message port 1034, and a client input message port 1036.
  • The dedicated RPC protocol has a client/server protocol in which [0253] server 1010 is dedicated to a single client, in this case client 1028. Unlike the resource allocation protocol examples, the temporal behavior of this protocol is the most important factor in defining it. The following transaction listing describes this temporal behavior:
  • [0254] Client 1028 enters blocked mode by changing the value stored at client exported state port 1032 to true.
  • [0255] Client 1028 transmits an argument data message to server 1010 via client output message port 1034.
  • [0256] Server 1010 receives the argument (labeled “a”) data message via server input data port 1016 and enters serving mode by changing the value stored in server exported state port 1014 to true.
  • [0257] Server 1010 computes return value.
  • [0258] Server 1010 transmits a return (labeled “r”) message to client 1020 via server output data port 1018 and exits serving mode by changing the value stored in server exported state port 1014 to false.
  • [0259] Client 1028 receives the return data message via client input message port 1036 and exits blocked mode by changing the value stored at client exported state port 1032 to false.
  • This can be presented more concisely with an expression describing causal relationships: [0260]

      T_RPC = +client.blocked → client.transmits → +server.serving → server.transmits → (−server.serving ∧ client.receives) → −client.blocked
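The six-step transaction listing can be sketched as a straight-line trace. The function and event labels are illustrative; only the ordering of mode changes and transmissions comes from the listing above.

```python
# Sketch of the dedicated RPC transaction: the client blocks, sends its
# argument, the server enters serving mode, computes and returns a value,
# and both sides exit their modes in the order the listing prescribes.

def rpc_call(server_fn, arg, trace):
    client_blocked = True                    # client exported state port -> true
    trace.append("+client.blocked")
    trace.append("client.transmits")         # argument message via output port
    server_serving = True                    # server exported state port -> true
    trace.append("+server.serving")
    result = server_fn(arg)                  # server computes return value
    trace.append("server.transmits")         # return message via output port
    server_serving = False
    trace.append("-server.serving")
    trace.append("client.receives")          # return message via input port
    client_blocked = False
    trace.append("-client.blocked")
    return result
```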
  • The transactions above describe what is supposed to happen. Other properties of this protocol must be described with temporal logic predicates. [0261]
  • server.serving ⇒ client.blocked [0262]
  • server.serving ⇒ F(server.r.output) [0263]
  • server.a.input ⇒ F(server.serving) [0264]
  • The r in server.r.output refers to the server [0265] output data port 1018, also labeled as the r event port on the server, and the a in server.a.input refers to server input data port 1016, also labeled as the a port on the server (see FIG. 10).
  • Together, these predicates indicate that (1) it is an error for [0266] server 1010 to be in serving mode if client 1028 is not blocked; (2) after server 1010 enters serving mode, a response message is sent or else an error occurs; and (3) server 1010 receiving a message means that server 1010 must enter serving mode. Relationships between control state and data paths must also be considered, such as:
  • (client.a ⇒ client.blocked) [0267]
  • In other words, [0268] client 1028 must be in blocked mode whenever it sends an argument message.
  • The first predicate takes the same form as a constraint; however, since [0269] dedicated RPC coordinator 1000 only imports the client.blocked and server.serving modes (i.e., through RPC client imported state port 1022 and RPC server imported state port 1004 respectively), dedicated RPC coordinator 1000 is not allowed to alter these values to comply. In fact, none of these predicates is explicitly enforced by a runtime system. However, the last two can be used as requirements and guarantees for interface type-checking.
  • [0270] 7. System-level Execution
  • Coordination-centric design methodology lets system specifications be executed directly, according to the semantics described above. When components and coordinators are composed into higher-order structures, however, it becomes essential to consider hazards that can affect system behavior. Examples include conflicting constraints, in which local resolution semantics may either leave the system in an inconsistent state or make it cycle forever, and conflicting actions that undo one another's behavior. In the remainder of this section, the effect of composition issues on system-level executions is explained. [0271]
  • A. System Control Configurations [0272]
  • A configuration is the combined control state of a system: basically, the set of active modes at a point in time. In other words, a configuration in coordination-centric design is a bit vector containing one bit for each mode in the system. The bit representing a control state is true when the control state is active and false when the control state is inactive. Configurations representing the complete system control state facilitate reasoning about system properties and enable several forms of static analysis of system behavior. [0273]
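A configuration as described above can be sketched directly as a bit vector keyed by mode name. The class and method names are illustrative.

```python
# Sketch of a system configuration: one bit per mode, true when the mode
# is active, packed into a single integer for cheap comparison/analysis.

class Configuration:
    def __init__(self, mode_names):
        self.modes = list(mode_names)   # fixed bit position per mode
        self.bits = 0

    def activate(self, name):
        self.bits |= 1 << self.modes.index(name)

    def deactivate(self, name):
        self.bits &= ~(1 << self.modes.index(name))

    def is_active(self, name):
        return bool((self.bits >> self.modes.index(name)) & 1)
```

Because the whole control state is a single integer, static analyses can enumerate or compare configurations without walking the component hierarchy.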
  • B. Action-trigger Propagation [0274]
  • Triggers are formal parameters for events. As mentioned earlier, there are two types of triggers: (1) control triggers, invoked by control events such as mode change requests, and (2) data flow triggers, invoked by data events such as message arrivals or departures. Components and coordinators can both request mode changes (on the modes visible to them) and generate new messages (on the message ports visible to them). Using actions, these events can be propagated through the components and coordinators in the system, causing a cascade of data transmissions and mode change requests, some of which can cancel other requests. When the requests, and secondary requests implied by them, are all propagated through the system, any requests that have not been canceled are confirmed and made part of the system's new configuration. [0275]
  • Triggers can be propagated immediately through their respective actions or delayed by a scheduling step. Recall that component actions can be either transparent or opaque. Transparent actions typically propagate their triggers immediately, although it is not absolutely necessary that they do so. Opaque actions must always delay propagation. [0276]
  • 1. Immediate Propagation [0277]
  • Some triggers can be propagated immediately, but only through certain types of transparent actions. Immediate propagation can often involve static precomputation of the effect of changes, which means that certain actions may never actually be performed. For example, consider a system with a coordinator that has an action that activates mode A and a second coordinator with an action that deactivates mode B whenever A is activated. Static analysis can determine in advance that any event that activates A will also deactivate B; therefore, this effect can be executed immediately without actually propagating it through A. [0278]
  • 2. Delayed Propagation [0279]
  • Trigger propagation through opaque actions must typically be delayed, since the system cannot look into opaque actions to precompute their results. Propagation may be delayed for other reasons, such as system efficiency. For example, immediate propagation requires tight synchronization among software components. If functionality is spread among a number of architectural components, immediate propagation is impractical. [0280]
  • C. A Protocol Implemented with a Compound Coordinator [0281]
  • Multiple coordinators are typically needed in the design of a system. The multiple coordinators can be used together for a single, unified behavior. Unfortunately, one coordinator may interfere with another's behavior. [0282]
  • FIG. 11 shows a combined [0283] coordinator 1100 with both preemption and round-robin coordination for controlling access to a resource, as discussed above. With reference to FIG. 11, components 1102, 1104, 1106, 1108, and 1110 primarily use round-robin coordination, and each includes a component coordination interface 1112, which has a component arbitrated control port 1114 and a component output message port 1116. However, when a preemptor component 1120 needs the resource, preemptor component 1120 is allowed to grab the resource immediately. Preemptor component 1120 has a preemptor component coordination interface 1122. Preemptor component coordination interface 1122 has a preemptor arbitrated state port 1124, a preemptor output message port 1126, and a preemptor input message port 1128.
  • All [0284] component coordination interfaces 1112 and preemptor component coordination interface 1122 are connected to a complementary combined coordinator coordination interface 1130, which has a coordinator arbitrated state port 1132, a coordinator input message port 1134, and a coordinator output message port 1136. Combined coordinator 1100 is a hierarchical coordinator and internally has a round-robin coordinator (not shown) and a preemption coordinator (not shown). Combined coordinator coordination interface 1130 is connected to a coordination interface to round-robin 1138 and a coordination interface to preempt 1140. Coordinator arbitrated state port 1132 is bound to both a token arbitrated control port 1142, which is part of coordination interface to round-robin 1138, and to a preempt arbitrated control port 1144, which is part of coordination interface to preempt 1140. Coordinator input message port 1134 is bound to an interface to a round-robin output message port 1146, and coordinator output message port 1136 is bound to an interface to round-robin input message port 1148.
  • Thus, preemption interferes with the normal round-robin ordering of access to the resource. After a preemption-based access, the resource moves to the component that in round-robin-ordered access would be the successor to [0285] preemptor component 1120. If the resource is preempted too frequently, some components may starve.
  • D. Mixing Control and Data in Coordinators [0286]
  • Since triggers can be control-based, data-based, or both, and actions can produce both control and data events, control and dataflow aspects of a system are coupled through actions. Through combinations of actions, designers can effectively employ modal data flow, in which relative schedules are switched on and off based on the system configuration. [0287]
  • Relative scheduling is a form of coordination. Recognizing this and understanding how it affects a design can allow a powerful class of optimizations. Many data-centric systems (or subsystems) use conjunctive firing, which means that a component buffers messages until a firing rule is matched. When matching occurs, the component fires, consuming the messages in its buffer that caused it to fire and generating a message or messages of its own. Synchronous data flow systems are those in which all components have only firing rules with constant message consumption and generation. [0288]
  • FIG. 12A shows a system in which a [0289] component N1 1200 is connected to a component N3 1202 by a data transfer coordinator 1204 and a component N2 1206 is connected to component N3 1202 by a second data transfer coordinator 1208. Component N3 1202 fires when it accumulates three messages on a port c 1210 and two messages on a port d 1212. On firing, component N3 1202 produces two messages on a port o 1214. Coordination control state tracks the logical buffer depth for these components. This is shown with numbers representing the logical queue depth of each port in FIG. 12.
  • FIG. 12B shows the system of FIG. 12A in which [0290] data transfer coordinator 1204 and second data transfer coordinator 1208 have been merged to form a merged data transfer coordinator 1216. Merging the coordinators in this example provides an efficient static schedule for component firing. Merged data transfer coordinator 1216 fires component N1 1200 three times and component N2 1206 twice. Merged data transfer coordinator 1216 then fires component N3 1202 twice (to consume all messages produced by component N1 1200 and component N2 1206).
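The merged static schedule can be sketched with logical queue depths. The production rates assumed for N1 and N2 (two messages per firing each) are illustrative choices that make the buffers balance; the text fixes only the firing counts and N3's consumption and production rates.

```python
# Sketch of the merged coordinator's static schedule: fire N1 three times
# and N2 twice, then fire N3 until its firing rule (3 on c, 2 on d) no
# longer matches, consuming everything produced.

def run_schedule():
    c = d = o = 0                 # logical queue depths per port
    for _ in range(3):            # fire N1 three times
        c += 2                    # assumed rate: 2 messages on c per firing
    for _ in range(2):            # fire N2 twice
        d += 2                    # assumed rate: 2 messages on d per firing
    fired = 0
    while c >= 3 and d >= 2:      # N3's conjunctive firing rule
        c -= 3
        d -= 2
        o += 2                    # N3 produces two messages on port o
        fired += 1
    return c, d, o, fired         # buffers drain exactly: N3 fires twice
```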
  • Message rates can vary based on mode. For example, a component may consume two messages each time it fires in one mode and four each time it fires in a second mode. For a component like this, it is often possible to merge schedules on a configuration basis, in which each configuration has static consumption and production rates for all affected components. [0291]
  • E. Coordination Transformations [0292]
  • In specifying complete systems, designers must often specify not only the coordination between two objects, but also the intermediate mechanism they must use to implement this coordination. While this intermediate mechanism can be as simple as shared memory, it can also be another coordinator; hence coordination may be, and often is, layered. For example, RPC coordination often sits on top of a TCP/IP stack or on an IrDA stack, in which each layer coordinates with peer layers on other processing elements using unique coordination protocols. Here, each layer provides certain capabilities to the layer directly above it, and the upper layer must be implemented in terms of them. [0293]
  • In many cases, control and communication synthesis can be employed to automatically transform user-specified coordination to a selected set of standard protocols. Designers may have to manually produce transformations for nonstandard protocols. [0294]
  • F. Dynamic Behavior with Compound Coordinators [0295]
  • Even in statically bound systems, components may need to interact in a fashion that appears dynamic. For example, RPC-style coordination often has multiple clients for individual servers. Here, there is no apparent connection between client and server until one is forged for a transaction. After the connection is forged, however, the coordination proceeds in the same fashion as dedicated RPC. [0296]
  • Our approach to this is to treat the RPC server as a shared resource, requiring resource allocation protocols to control access. However, none of the resource allocation protocols described thus far would work efficiently under these circumstances. In the following subsections, we describe an appropriate protocol for treating the RPC server as a shared resource and discuss how that protocol can be used as part of a complete multiclient RPC coordination class, one that uses the same RPC coordination interfaces described earlier. [0297]
  • 1. First Come/First Serve protocol (FCFS) [0298]
  • FIG. 13 illustrates a first come/first serve (FCFS) resource allocation protocol, which is a protocol that allocates a shared resource to the requester that has waited longest. With reference to FIG. 13, a [0299] FCFS component interface 1300 for this protocol has a request control port 1302, an access control port 1304 and a component outgoing message port 1306. A FCFS coordinator 1308 for this protocol has a set of FCFS interfaces 1310 that are complementary to FCFS component interfaces 1300, having a FCFS coordinator request control port 1312, a FCFS coordinator access port 1314, and a FCFS coordinator input message port 1316. When a component 1318 needs to access a resource 1320, it asserts request control port 1302. When granted access, FCFS coordinator 1308 asserts the appropriate FCFS coordinator access port 1314, releasing FCFS coordinator request control port 1312.
  • To do this, [0300] FCFS coordinator 1308 uses a rendezvous coordinator and two round-robin coordinators. One round-robin coordinator maintains a list of empty slots in which a component may be enqueued, and the other round-robin coordinator maintains a list showing the next component to be granted access. When an FCFS coordinator request control port 1312 becomes active, FCFS coordinator 1308 begins a rendezvous access to a binder action. When activated, this action maps the appropriate component 1318 to a position in the round-robin queues. A separate action cycles through one of the queues and selects the next component to access the server. As much as possible, FCFS coordinator 1308 attempts to grant access to resource 1320 to the earliest component 1318 having requested resource 1320, with concurrent requests determined based on the order in the rendezvous coordinator of the respective components 1318.
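The allocation policy itself can be sketched with a single queue standing in for the internal rendezvous and round-robin coordinators. The class is hypothetical; it preserves only the grant-to-longest-waiter behavior.

```python
from collections import deque

# Sketch of first come/first serve allocation: the resource goes to the
# requester that has waited longest; a single FIFO queue stands in for the
# binder's round-robin slot bookkeeping.

class FCFS:
    def __init__(self):
        self.queue = deque()      # waiting components, oldest first
        self.holder = None        # component currently granted access

    def request(self, component):
        """Component asserts its request port; returns the current holder."""
        self.queue.append(component)
        if self.holder is None:
            self.holder = self.queue.popleft()
        return self.holder

    def release(self):
        """Holder surrenders; access passes to the longest-waiting requester."""
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder
```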
  • 2. Multiclient RPC [0301]
  • FIG. 14 depicts a [0302] multiclient RPC coordinator 1400 formed by combining FCFS coordinator 1308 with dedicated RPC coordinator 1000. With reference to FIG. 14, a set of clients 1402 have a set of client coordination interfaces 1030, as shown in FIG. 10. In addition, multiclient RPC coordinator 1400 has a set of RPC client coordination interfaces 1020, as shown in FIG. 10. For each RPC client coordination interface 1020, RPC client input message port 1024, of RPC client coordination interface 1020, is bound to the component outgoing message port 1306 of FCFS coordinator 1308. Message transfer action 1403 serves to transfer messages between RPC client input message port 1024 and component outgoing message port 1306. For coordinating the actions of multiple clients 1402, multiclient RPC coordinator 1400 must negotiate accesses to a server 1404 and keep track of the values returned by server 1404.
  • G. Monitor Modes and Continuations [0303]
  • Features such as blocking behavior and exceptions can be implemented in the coordination-centric design methodology with the aid of monitor modes. Monitor modes are modes that exclude all but a selected set of actions called continuations, which are actions that continue a behavior started by another action. [0304]
  • 1. Blocking Behavior [0305]
  • With blocking behavior, one action releases control while entering a monitor mode, and a continuation resumes execution after the anticipated response event. Monitor mode entry must be immediate (at least locally), so that no unexpected actions can execute before they are blocked by such a mode. [0306]
  • Each monitor mode has a list of actions that cannot be executed when it is entered. The allowed (unlisted) actions are either irrelevant or are continuations of the action that caused entry into this mode. There are other conditions, as well. This mode requires an exception action if forced to exit. However, this exception action is not executed if the monitor mode is turned off locally. [0307]
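The exclusion behavior of a monitor mode can be sketched as a simple gate. The class and the action names are illustrative; a real implementation would also carry the exception action described above.

```python
# Sketch of a monitor mode: while active, it blocks every action except its
# registered continuations (the actions that finish the blocked behavior).

class MonitorMode:
    def __init__(self, continuations):
        self.active = False
        self.continuations = set(continuations)

    def allows(self, action):
        """An action may execute if the mode is off or the action continues it."""
        return (not self.active) or action in self.continuations

# e.g. a blocked-RPC monitor mode whose only continuation handles the reply
blocked = MonitorMode(continuations={"on_reply"})
```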
  • When components are distributed over a number of processing elements, it is not practical to assume complete synchronization of the control state. In fact, there are a number of synchronization options available, as detailed in Chou, P., “Control Composition and Synthesis of Distributed Real-Time Embedded Systems,” Ph.D. dissertation, University of Washington, 1998. [0308]
  • 2. Exception Handling [0309]
  • Exception actions are a type of continuation. When in a monitor mode, exception actions respond to unexpected events or events that signal error conditions. For example, [0310] multiclient RPC coordinator 1400 can bind ¬client.blocked to a monitor mode and set an exception action on +server.serving. This will signal an error whenever the server begins to work when the client is not blocked for a response.
  • 8. A Complete System Example [0311]
  • FIG. 15 depicts a large-scale example system under the coordination-centric design methodology. With reference to FIG. 15, the large scale system is a bimodal digital [0312] cellular network 1500. Network 1500 is for the most part a simplified version of a GSM (global system for mobile communications) cellular network. This example shows in greater detail how the parts of coordination-centric design work together and demonstrates a practical application of the methodology. Network 1500 has two different types of cells, a surface cell 1502 (also referred to as a base station 1502) and a satellite cell 1504. These cells are differentiated not only by physical position but also by the technologies they use to share network 1500. Satellite cells 1504 use a code division multiple access (CDMA) technology, and surface cells 1502 use a time division multiple access (TDMA) technology. Typically, there are seven frequency bands reserved for TDMA and one band reserved for CDMA. The goal is for as much communication as possible to be conducted through the smaller TDMA cells, here surface cells 1502, because power requirements for a CDMA cell, here satellite cell 1504, increase with the number of users in the CDMA cell. Mobile units 1506, or wireless devices, can move between surface cells 1502, requiring horizontal handoffs between surface cells 1502. Several surface cells 1502 are typically connected to a switching center 1508. Switching center 1508 is typically connected to a telephone network or the Internet 1512. In addition to handoffs between surface cells 1502, the network must be able to hand off between switching centers 1508. When mobile units 1506 leave the TDMA region, they remain covered by satellite cells 1504 via vertical handoffs between cells. Since vertical handoffs require changing protocols as well as changing base stations and switching centers, they can be complicated in terms of control.
  • Numerous embedded systems make up the overall system. For example, [0313] switching center 1508 and base stations, surface cells 1502, are required as part of the network infrastructure, but cellular phones, handheld Web browsers, and other mobile units 1506 may be supported for access through network 1500. This section concentrates on the software systems for two particular mobile units 1506: a simple digital cellular phone (shown in FIG. 16) and a handheld Web browser (shown in FIG. 24). These examples require a wide variety of coordinators and reusable components. Layered coordination is a feature in each system, because a function of many subsystems is to perform a layered protocol. Furthermore, this example displays how the hierarchically constructed components can be applied in a realistic system to help manage the complexity of the overall design.
  • To begin this discussion, we describe the cellular phone in detail, focusing on its functional components and the formalization of their interaction protocols. We then discuss the handheld Web browser in less detail but highlight the main ways in which its functionality and coordination differ from those of the cellular phone. In describing the cellular phone, we use a top-down approach to show how a coherent system organization is preserved, even at a high level. In describing the handheld Web browser, we use a bottom-up approach to illustrate component reuse and bottom-up design. [0314]
  • A. Cellular Phone [0315]
  • FIG. 16 shows a top-level coordination diagram of the behavior of a [0316] cell phone 1600. Rather than using a single coordinator that integrates the components under a single protocol, we use several coordinators in concert. Interactions between coordinators occur mainly within the components to which they connect.
  • With reference to FIG. 16, [0317] cell phone 1600 supports digital encoding of voice streams. Before it can be used, it must be authenticated with a home master switching center (not shown). This authentication occurs through a registered master switch for each phone and an authentication number from the phone itself. There are various authentication statuses, such as full access, grey-listed, or blacklisted. For cell phone 1600, real-time performance is more important than reliability. A dropped packet is not retransmitted, and a late packet is dropped since its omission degrades the signal less than its late incorporation.
  • Each component of [0318] cell phone 1600 is hierarchical. A GUI 1602 lets users enter phone numbers while displaying them and query an address book 1604 and a logs component 1606. Address book 1604 is a database that can map names to phone numbers and vice versa. GUI 1602 uses address book 1604 to help identify callers and to look up phone numbers to be dialed. Logs 1606 track both incoming and outgoing calls as they are dialed. A voice component 1608 digitally encodes and decodes, and compresses and decompresses, an audio signal. A connection component 1610 multiplexes, transmits, receives, and demultiplexes the radio signal and separates out the voice stream and caller identification information.
  • Coordination among the above components makes use of several of the coordinators discussed above. Between [0319] connection component 1610 and a clock 1612, and between logs 1606 and connection component 1610, are unidirectional data transfer coordinators 600 as described with reference to FIG. 6A. Between voice component 1608 and connection component 1610, and between GUI 1602 and connection component 1610, are bidirectional data transfer coordinators 604, as described with reference to FIG. 6B. Between clock 1612 and GUI 1602 is a state unification coordinator 606, as described with reference to FIG. 6C. Between GUI 1602 and address book 1604 is a dedicated RPC coordinator 1000 as described with reference to FIG. 10, in which address book 1604 has client 1028 and GUI 1602 has server 1010.
  • There is also a custom GUI/[0320] log coordinator 1614 between logs 1606 and GUI 1602. GUI/log coordinator 1614 lets GUI 1602 transfer new logged information through an r output message port 1616 on a GUI coordination interface 1618 to an r input message port 1620 on a log coordination interface 1622. GUI/log coordinator 1614 also lets GUI 1602 choose current log entries through a pair of c output message ports 1624 on GUI coordination interface 1618 and a pair of c input message ports 1626 on log coordination interface 1622. Logs 1606 continuously display one entry each for incoming and outgoing calls.
  • 1. GUI Component [0321]
  • FIG. 17A is a detailed view of [0322] GUI component 1602, of FIG. 16. With reference to FIG. 17A, GUI component 1602 has two inner components, a keypad 1700 and a text-based liquid crystal display 1702, as well as several functions of its own (not shown). Each time a key press occurs, it triggers an action that interprets the press, depending on the mode of the system. Numeric presses enter values into a shared dialing buffer. When a complete number is entered, the contents of this buffer are used to establish a new connection through connection component 1610. Table 5 shows the action triples for GUI 1602.
    TABLE 5
    Mode        Trigger      Action
    Idle        numeric key  numBuffer.append(keypress.val)
                Send         radio.send(numBuffer.val); +outgoingCall
                Disconnect   Nil
                Leftarrow    AddressBook.forward(); +lookupMode
                Rightarrow   log.lastcall(); +outlog
    LookupMode  Leftarrow    AddressBook.forward()
                Rightarrow   AddressBook.backward()
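The mode/trigger/action triples in Table 5 amount to a dispatch table keyed on the current mode and the incoming trigger. The following Python sketch illustrates the idea; the class, method names, and returned action tuples are illustrative stand-ins, not the patent's implementation:

```python
# Hypothetical sketch of a mode/trigger/action dispatch table for the GUI.
# Component names (num_buffer, radio.send, AddressBook) are illustrative.
class PhoneGUI:
    def __init__(self):
        self.mode = "Idle"
        self.num_buffer = []

    def handle(self, trigger, value=None):
        # Each (mode, trigger) pair selects an action; some actions also
        # switch modes (the "+mode" entries in Table 5).
        if self.mode == "Idle":
            if trigger == "digit":
                self.num_buffer.append(value)
            elif trigger == "Send":
                number = "".join(self.num_buffer)
                self.mode = "OutgoingCall"
                return ("radio.send", number)
            elif trigger == "Leftarrow":
                self.mode = "LookupMode"
                return ("AddressBook.forward", None)
        elif self.mode == "LookupMode":
            if trigger == "Leftarrow":
                return ("AddressBook.forward", None)
            elif trigger == "Rightarrow":
                return ("AddressBook.backward", None)
        return None

gui = PhoneGUI()
for d in "555":
    gui.handle("digit", d)
print(gui.handle("Send"))   # ('radio.send', '555')
print(gui.mode)             # OutgoingCall
```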
  • An “Addr Coord” coordinator [0323] 1704 includes an address book mode (not shown) in which arrow key presses are transformed into RPC calls.
  • 2. Logs Component [0324]
  • FIG. 17B is a detailed view of [0325] logs component 1606, which tracks all incoming and outgoing calls. With reference to FIG. 17B, both GUI component 1602 and connection component 1610 must communicate with logs component 1606 through specific message ports. Those specific message ports include a transmitted number message port 1720, a received number message port 1722, a change current received message port 1724, a change current transmitted message port 1726, and two state ports 1728 and 1729 for presenting the current received and current transmitted values, respectively.
  • [0326] Logs component 1606 contains two identical single-log components: a send log 1730 for outgoing calls and a receive log 1740 for incoming calls. The interface of logs component 1606 is connected to the individual log components by a pair of adapter coordinators, Adap1 1750 and Adap2 1752. Adap1 1750 has an adapter receive interface 1754, which has a receive imported state port 1756 and a receive output message port 1758. Adap1 1750 further has an adapter send interface 1760, which has a send imported state port 1762 and a send output message port 1764. Within Adap1, state port 1728 is bound to receive imported state port 1756, change current received message port 1724 is bound to receive output message port 1758, received number message port 1722 is bound to a received interface output message port 1766 on a received number coordination interface 1768, change current transmitted message port 1726 is bound to send output message port 1764, and state port 1729 is bound to send imported state port 1762.
  • 3. Voice Component [0327]
  • FIG. 18A is a detailed view of [0328] voice component 1608 of FIG. 16. Voice component 1608 has a compression component 1800 for compressing digitized voice signals before transmission, a decompression component 1802 for decompressing received digitized voice signals, and interfaces 1804 and 1806 to analog transducers (not shown) for digitizing sound to be transmitted and for converting received transmissions into sound. Voice component 1608 is a pure data flow component. It contains a sound generator 1808, which functions as both a white-noise generator and a ring tone generator and has a separate port for each on sound generator interface 1810, and voice compression functionality in the form of compression component 1800 and decompression component 1802.
  • 4. Connection Component [0329]
  • FIG. 18B is a detailed view of [0330] connection component 1610 of FIG. 16. With reference to FIG. 18B, connection component 1610 coordinates with voice component 1608, logs component 1606, clock 1612, and GUI 1602. In addition, connection component 1610 is responsible for coordinating the behavior of cell phone 1600 with a base station that owns the surface cell 1502 (shown in FIG. 15), a switching center 1508 (shown in FIG. 15), and all other phones (not shown) within surface cell 1502. Connection component 1610 must authenticate users, establish connections, and perform handoffs as needed—including appropriate changes in any low-level protocols (such as a switch from TDMA to CDMA).
  • FIG. 19 depicts a set of communication layers between [0331] connection component 1610 of cell phone 1600 and base station 1502 or switching center 1508. With reference to FIG. 19, connection component 1610 has several subcomponents, or lower-level components, each of which coordinates with an equivalent, or peer, layer on either base station 1502 or switching center 1508. The subcomponents of connection component 1610 include a cell phone call manager 1900, a cell phone mobility manager 1902, a cell phone radio resource manager 1904, a cell phone link protocol manager 1906, and a cell phone transport manager 1908, which is responsible for coordinating access to, and transferring data through, the shared airwaves using TDMA and CDMA coordination. Each subcomponent is described in detail below, including how each fits into the complete system.
  • [0332] Base station 1502 has a call management coordinator 1910, a mobility management coordinator 1912, a radio resource coordinator 1914 (BSSMAP 1915), a link protocol coordinator 1916 (SCCP 1917), and a transport coordinator 1918 (MTP 1919). Switching center 1508 has a switching center call manager 1920, a switching center mobility manager 1922, a BSSMAP 1924, a SCCP 1926, and an MTP 1928.
  • a. Call Management [0333]
  • FIG. 20 is a detailed view of a [0334] call management layer 2000 consisting of cell phone call manager 1900, which is connected to switching center call manager 1920 by call management coordinator 1910. With reference to FIG. 20, call management layer 2000 coordinates the connection between cell phone 1600 and switching center 1508. Call management layer 2000 is responsible for dialing, paging, and talking. Call management layer 2000 is always present in cell phone 1600, though not necessarily in Internet appliances (discussed later). Cell phone call manager 1900 includes a set of modes (not shown) for call management coordination that consists of the following modes:
  • Standby [0335]
  • Dialing [0336]
  • RingingRemote [0337]
  • Ringing [0338]
  • CallInProgress [0339]
  • Cell [0340] phone call manager 1900 has a cell phone call manager interface 2002. Cell phone call manager interface 2002 has a port corresponding to each of the above modes. The standby mode is bound to a standby exported state port 2010. The dialing mode is bound to a dialing exported state port 2012. The RingingRemote mode is bound to a RingingRemote imported state port 2014. The Ringing mode is bound to a ringing imported state port 2016. The CallInProgress mode is bound to a CallInProgress arbitrated state port 2018.
  • Switching [0341] center call manager 1920 includes the following modes (not shown) for call management coordination at the switching center:
  • Dialing [0342]
  • RingingRemote [0343]
  • Paging [0344]
  • CallInProgress [0345]
  • Switching [0346] center call manager 1920 has a switching center call manager coordination interface 2040, which includes a port for each of the above modes within switching center call manager 1920.
  • When [0347] cell phone 1600 requests a connection, switching center 1508 creates a new switching center call manager and establishes a call management coordinator 1910 between cell phone 1600 and switching center call manager 1920.
  • b. Mobility Management [0348]
  • A mobility management layer authenticates [0349] mobile unit 1506 or cell phone 1600. When there is a surface cell 1502 available, mobility manager 1902 contacts the switching center 1508 for surface cell 1502 and transfers a mobile unit identifier (not shown) for mobile unit 1506 to switching center 1508. Switching center 1508 then looks up a home master switching center for mobile unit 1506 and establishes a set of permissions assigned to mobile unit 1506. This layer also acts as a conduit for the call management layer. In addition, the mobility management layer performs handoffs between base stations 1502 and switching centers 1508 based on information received from the radio resource layer.
  • c. Radio Resource [0350]
  • In the radio resource layer, [0351] radio resource manager 1904 chooses the target base station 1502 and tracks changes in frequencies, time slices, and CDMA codes. Cell phones may negotiate with up to 16 base stations simultaneously. This layer also identifies when handoffs are necessary.
  • d. Link Protocol [0352]
  • The link layer manages a connection between [0353] cell phone 1600 and base station 1502. In this layer, link protocol manager 1906 packages data for transfer to base station 1502 from cell phone 1600.
  • e. Transport [0354]
  • FIG. 21A is a detailed view of [0355] transport component 1908 of connection component 1610. Transport component 1908 has two subcomponents, a receive component 2100 for receiving data and a transmit component 2102 for transmitting data. Each of these subcomponents has two parallel data paths, a CDMA path 2104 and a TDMA/FDMA path 2106, for communicating in the respective network protocols.
  • FIG. 21B is a detailed view of a [0356] CDMA modulator 2150, which implements a synchronous data flow data path. CDMA modulator 2150 takes the dot-product of an incoming data signal along path 2152 and a stored modulation code for cell phone 1600 along path 2154. The modulation code is a sequence of chips: measured time signals each having a value of −1 or +1.
  • [0357] Transport component 1908 uses CDMA and TDMA technologies to coordinate access to a resource shared among several cell phones 1600, i.e., the airwaves. These technologies supersede the FDMA technologies (e.g., AM and FM) used for analog cellular phones and for radio and television broadcasts. In FDMA, a signal is encoded for transmission by modulating it with a carrier frequency. A signal is decoded by demodulation after being passed through a band-pass filter to remove other carrier frequencies. Each base station 1502 has a set of frequencies chosen to minimize interference between adjacent cells. (The area covered by a cell may be much smaller than the net range of the transmitters within it.)
  • TDMA, on the other hand, coordinates access to the airwaves through time slicing. [0358] Cell phone 1600 on the network is assigned a small time slice, during which it has exclusive access to the media. Outside of the small time slice, cell phone 1600 must remain silent. Decoding is performed by filtering out all signals outside of the small time slice. The control for this access must be distributed. As such, each component involved must be synchronized to observe the start and end of the small time slice at the same instant.
  • Most TDMA systems also employ FDMA, so that instead of sharing a single frequency channel, [0359] cell phones 1600 share several channels. The band allocated to TDMA is broken into frequency channels, each with a carrier frequency and a reasonable separation between channels. Thus user channels for the most common implementations of TDMA can be represented as a two-dimensional array, in which the rows represent frequency channels and the columns represent time slices.
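The two-dimensional channel array described above maps directly to code. A minimal sketch follows; the channel and slot counts are illustrative, not taken from any particular TDMA standard:

```python
# Illustrative TDMA/FDMA channel grid: rows are frequency channels,
# columns are time slices. Each user channel is one (frequency, slot) cell.
N_FREQ_CHANNELS = 7   # illustrative number of frequency channels
N_TIME_SLICES = 8     # illustrative number of time slices per frame

def user_channel(index):
    """Map a flat user-channel index to its (frequency, slot) cell."""
    freq = index // N_TIME_SLICES
    slot = index % N_TIME_SLICES
    return freq, slot

# Channel 0 is the first slot on the first carrier; channel 9 is
# the second slot on the second carrier.
print(user_channel(0))   # (0, 0)
print(user_channel(9))   # (1, 1)
```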
  • CDMA is based on vector arithmetic. In a sense, CDMA performs inter-cell-phone coordination using data flow. Instead of breaking up the band into frequency channels and time slicing these, CDMA regards the entire band as an n-dimensional vector space. Each channel is a code that represents a basis vector in this space. Bits in the signal are represented as either 1 or −1, and the modulation is the inner product of this signal and a basis vector of [0360] mobile unit 1506 or cell phone 1600. This process is called spreading, since it effectively takes a narrowband signal and converts it into a broadband signal.
  • Demultiplexing is simply a matter of taking the dot-product of the received signal with the appropriate basis vector, obtaining the original 1 or −1. With fast computation and the appropriate codes or basis vectors, the signal can be modulated without a carrier frequency. If this is not the case, a carrier and analog techniques can be used to fill in where computation fails. If a carrier is used, however, all units use the same carrier in all cells. [0361]
  • FIG. 22 shows TDMA and CDMA signals for four [0362] cell phones 1600. With reference to FIG. 22, for TDMA, each cell phone 1600 is assigned a time slice during which it can transmit. Cell phone 1 is assigned time slice t0, cell phone 2 is assigned time slice t1, cell phone 3 is assigned time slice t2, and cell phone 4 is assigned time slice t3. For CDMA, each cell phone 1600 is assigned a basis vector that it multiplies with its signal. Cell phone 1 is assigned the vector (−1, 1, −1, 1).
  • [0363] Cell phone 2 is assigned the vector (1, −1, 1, −1).
  • [0364] Cell phone 3 is assigned the vector (1, 1, −1, −1).
  • [0365] Cell phone 4 is assigned the vector (−1, −1, 1, 1).
  • Notice that these vectors form an orthogonal basis. [0366]
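CDMA spreading and despreading as described above can be sketched with simple vector arithmetic. The code below uses a standard length-4 Walsh code set as an illustrative orthogonal basis (not the exact vectors listed above): each phone spreads its +1/−1 bit into chips, the chips sum on the air, and a dot-product with a phone's code recovers that phone's bit.

```python
# CDMA spreading/despreading sketch using length-4 Walsh codes
# (an illustrative orthogonal basis).
CODES = [
    [ 1,  1,  1,  1],
    [ 1, -1,  1, -1],
    [ 1,  1, -1, -1],
    [ 1, -1, -1,  1],
]

def spread(bit, code):
    # A bit (+1 or -1) is spread into a chip sequence: bit * code.
    return [bit * c for c in code]

def despread(signal, code):
    # Dot-product with the basis vector, normalized by its length,
    # recovers the original +1/-1 bit.
    return sum(s * c for s, c in zip(signal, code)) // len(code)

# Four phones transmit simultaneously; the airwaves sum their chips.
bits = [1, -1, -1, 1]
on_air = [sum(chips) for chips in zip(*(spread(b, c) for b, c in zip(bits, CODES)))]

recovered = [despread(on_air, c) for c in CODES]
print(recovered)  # [1, -1, -1, 1]
```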
  • B. Handheld Web Browser [0367]
  • In the previous subsection, we demonstrated our methodology on a cell phone with a top-down design approach. In this subsection, we demonstrate our methodology with a bottom-up approach in building a handheld Web browser. [0368]
  • FIG. 23A shows an LCD [0369] touch screen component 2300 for a Web browser GUI (shown in FIG. 24A) for a wireless device 1506. With reference to FIG. 23A, LCD touch screen component 2300 has an LCD screen 2302 and a touch pad 2304.
  • FIG. 23B shows a Web [0370] page access component 2350 for fetching and formatting Web pages. With reference to FIG. 23B, Web access component 2350 has a page fetch subcomponent 2352 and a page format subcomponent 2354. Web access component 2350 reads hypertext markup language (HTML) from a connection interface 2356, sends word placement requests to a display interface 2358, and sends image requests to connection interface 2356. Web access component 2350 also has a character input interface to allow users to enter page requests directly and to fill out forms on pages.
  • FIG. 24A shows a completed handheld [0371] Web browser GUI 2400. With reference to FIG. 24A, handheld Web browser GUI 2400 has LCD touch screen component 2300, Web access component 2350, and a pen stroke recognition component 2402 that translates pen strokes entered on touch pad 2304 into characters.
  • FIG. 24B shows the complete component view of a [0372] handheld Web browser 2450. With reference to FIG. 24B, handheld Web browser 2450 is formed by connecting handheld Web browser GUI 2400 to connection component 1610 of cell phone 1600 (described with reference to FIG. 16) with bi-directional data transfer coordinator 604 (described with reference to FIG. 6B). Handheld Web browser 2450 is an example of mobile unit 1506, and connects to the Internet through the cellular infrastructure described above. However, handheld Web browser 2450 has different access requirements than does cell phone 1600. For handheld Web browser 2450, reliability is more important than real-time delivery. Dropped packets usually require retransmission, so it is better to deliver a packet late than to drop it. Real-time issues primarily affect download time and are therefore secondary. Despite this, handheld Web browser 2450 must coordinate media access with cell phones 1600, and so it must use the same protocol as cell phones 1600 to connect to the network. For that reason, handheld Web browser 2450 can reuse connection component 1610 from cell phone 1600.
  • Debugging Techniques [0373]
  • In concept, debugging is a simple process. A designer locates the cause of undesired behavior in a system and fixes the cause. In practice, debugging—even of sequential software—remains difficult. Embedded systems are considerably more complicated to debug than sequential software, due to factors such as concurrence, distributed architectures, and real-time concerns. Issues taken for granted in sequential software, like a schedule that determines the order of all events (the program), are nonexistent in a typical distributed system. Locating and fixing bugs in these complex systems requires, among other things, an understanding of the thought processes underpinning the design. [0374]
  • Prior art research into debugging distributed systems is diverse, eclectic, and lacks any standard notations. This application uses a standardized notation to describe both the prior art and the present invention. Consequently, the principles attributed to the prior art follow those published in the referenced works, although the specific notation, theorems, and the like may differ. [0375]
  • The two general classes of debugging techniques are event-based debugging and state-based debugging. Most debugging techniques for general-purpose distributed systems are event based. Event-based debugging techniques operate by collecting event traces from individual system components and causally relating those event traces. These techniques require an ability to determine efficiently the causal ordering between any given pair of events. Determining the causal order can be difficult and costly. [0376]
  • Events may be primitive, or they may be hierarchical clusters of other events. Primitive events are abstractions for individual local occurrences that might be important to a debugger. Examples of primitive events in sequential programs are variable assignments and subroutine entry or returns. Primitive events for distributed systems include message send and receive events. [0377]
  • State-based debugging techniques are less commonly used in debugging distributed systems. State-based debugging techniques typically operate by presenting designers with views or snapshots of process state. Distributed systems are not tightly synchronized, and so these techniques traditionally involve only the state of individual processes. However, state-based debugging techniques can be applied more generally by relaxing the concept of an “instant in time” so that it can be effectively applied to asynchronous processes. [0378]
  • 1. Event-based Debugging [0379]
  • This section describes prior art systems for finding and tracking meaningful event orderings despite limits in observation. It then describes typical ways in which event orderings are used in visualization tools, such as automated space/time diagrams. [0380]
  • A. Event Order Determination and Observation [0381]
  • The behavior of a software system is determined by the events that occur and the order in which they occur. For sequential systems, this seems almost too trivial to mention; of course, a given set of events, such as [0382]
  • {x:=2, x:=x*2, x=5, y:=x}, [0383]
  • arranged in two different ways may describe two completely different behaviors. However, since a sequential program is essentially a complete schedule of events, ordering is explicit. Sequential debugging tools depend on the invariance of this event schedule to let programmers reproduce failures by simply using the same inputs. In distributed systems, as in any concurrent system, it is neither practical nor efficient to completely schedule all events. Concurrent systems typically must be designed with flexible event ordering. [0384]
  • Determining the order in which events occur in a distributed system is subject to the limits of observation. An observation is an event record collected by an observer. An observer is an entity that watches the progress of an execution and records events, but does not interfere with the system. To determine the order in which two events occur, an observer must measure them both against a common reference. [0385]
  • FIG. 25 shows a typical space/time diagram [0386] 2500, with space represented on a vertical axis 2502 and time represented on a horizontal axis 2504. With reference to FIG. 25, space/time diagram 2500 provides a starting point for discussing executions in distributed systems. Space/time diagram 2500 gives us a visual representation for discussing event ordering and for comparing various styles of observation. A set of horizontal world lines 2506, 2508, and 2510 each represent an entity that is stationary in space. The entities represented by horizontal world lines 2506, 2508, and 2510 are called processes and typically represent software processes in the subject system. The entities can also represent any entity that generates events in a sequential fashion. The spatial separation in the diagram, along vertical axis 2502, represents a virtual space, since several processes might execute on the same physical hardware. A diagonal world line 2512 is called a message and represents discrete communications that pass between two processes. A sphere 2514 represents an event. In subsequent figures vertical axis 2502 and horizontal axis 2504 are omitted from any space/time diagrams, unless vertical axis 2502 and horizontal axis 2504 provide additional clarity to a particular figure.
  • FIG. 26 shows a space/time diagram [0387] 2600 of two different observations of a single system execution, taken by a first observer 2602 and a second observer 2604. With reference to FIG. 26, first observer 2602 and second observer 2604 are entities that record event occurrence. First observer 2602 and second observer 2604 must each receive distinct notifications of each event that occurs and each must record the events in some total order. First observer 2602 and second observer 2604 are represented in space/time diagram 2600 as additional processes, or horizontal world lines. Each event recorded requires a signal from its respective process to both first observer 2602 and second observer 2604. The signals from an event x 2606 on a process 2608 to both first observer 2602 and second observer 2604 are embodied in messages 2610 and 2612, respectively. First observer 2602 records event x 2606 as preceding an event y 2614. However, second observer 2604 records event y 2614 as preceding event x 2606. Such effects may be caused by nonuniform latencies within the system.
  • However, the observations of [0388] first observer 2602 and second observer 2604 are not equally valid. A valid observation is typically an observation that preserves the order of events that depend on each other. Second observer 2604 records the receipt of a message 2616 before that message is transmitted. Thus the observation from second observer 2604 is not valid.
  • FIG. 27 shows a space/time diagram [0389] 2700 for a special, ideal observer, called the real-time observer (RTO) 2702. With reference to FIG. 27, RTO 2702 can view each event immediately as it occurs. Due to the limitations of physical clocks, and efficiency issues in employing them, it is usually not practical to implement RTO 2702. However, RTO 2702 represents an upper bound on precision in event-order determination.
  • FIG. 28 shows a space/[0390] time graph 2800 showing two valid observations of a system taken by two separate observers: RTO 2702 and a third observer 2802. With reference to FIG. 28, there is nothing special about the ordering of the observation taken by RTO 2702. Events d 2804, e 2806, and f 2808 are all independent events in this execution. Therefore, the observation produced by RTO 2702 and the observation produced by third observer 2802 can each be used to reproduce equivalent executions of the system. Any observation in which event dependencies are preserved is typically equal in value to an observation by RTO 2702. However, real-time distributed systems may need additional processes to emulate timing constraints.
  • FIG. 29 is a space/time diagram [0391] 2900 of a methodological observer, called the discrete Lamport Observer (DLO) 2902, that records each event in a set of ordered bins. With reference to FIG. 29, DLO 2902 records an event 2904 in an ordered bin 2906 based on the following rule: each event is recorded in the leftmost bin that follows all events on which it depends. DLO 2902 views events discretely and does not need a clock. DLO 2902 does, however, require explicit knowledge of event dependency. To determine the bin in which each event must be placed, DLO 2902 needs to know the bins of the immediately preceding events. The observation produced by DLO 2902 is also referred to as a topological sort of the system execution's event graph.
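The DLO's placement rule can be sketched as a topological binning over an immediate-predecessor graph. The event names and dependency map below are illustrative:

```python
# DLO-style binning: each event is placed one bin to the right of its
# rightmost immediate predecessor (bin 0 for events with no predecessors).
def dlo_bins(preds):
    """preds maps each event to the list of its immediate predecessors."""
    bins = {}
    def bin_of(e):
        if e not in bins:
            bins[e] = 1 + max((bin_of(p) for p in preds[e]), default=-1)
        return bins[e]
    for e in preds:
        bin_of(e)
    return bins

# a -> b -> c on one process; x on another process sends a message
# received at b, so x is also an immediate predecessor of b.
preds = {"a": [], "x": [], "b": ["a", "x"], "c": ["b"]}
print(dlo_bins(preds))  # {'a': 0, 'x': 0, 'b': 1, 'c': 2}
```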
  • In the following, E is the set of all events in an execution. The immediate predecessor relation, ⊲ ⊆ E×E, includes all pairs (ea, eb) such that: [0392]
  • a) If ea and eb are on the same process, ea precedes eb with no intermediate events. [0393]
  • b) If eb is a receive event, ea is the send event that generated the message. [0394]
  • Given these conditions, ea is called the immediate predecessor of eb. [0395]
  • Each event has at most two immediate predecessors. Therefore, [0396] DLO 2902 need only find the bins of at most two records before each placement. The transitive closure of the immediate predecessor relation forms a causal relation. The causal relation, → ⊆ E×E, is the smallest transitive relation such that ei ⊲ ej ⇒ ei → ej.
  • This relation defines a partial order of events and further limits the definition of a valid observation. A valid observation is an ordered record of events from a given execution, i.e., (R, ≺), where e ∈ E ⇒ record(e) ∈ R, and ≺ is an ordering operator. A valid observation has: [0397]
  • ∀ei, ej ∈ E: ei → ej ⇒ record(ei) ≺ record(ej) [0398]
  • The dual of the causal relation is a concurrence relation. The concurrence relation, ∥ ⊆ E×E, includes all pairs (ea, eb) such that neither ea → eb nor eb → ea. While the causal relation is transitive, the concurrence relation is not. The concurrence relation is symmetric, where the causal relation is not. [0399]
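These definitions translate directly into code. The sketch below, over an illustrative four-event execution, computes the causal relation as the transitive closure of immediate-predecessor edges and derives the concurrence relation as its dual:

```python
from itertools import product

# Immediate-predecessor edges (illustrative): a -> b -> c, with d on
# another process, unrelated to any of them.
ipred = {("a", "b"), ("b", "c")}

# Causal relation: smallest transitive relation containing ipred.
causal = set(ipred)
changed = True
while changed:
    changed = False
    # product() snapshots the set, so adding pairs mid-pass is safe.
    for (x, y), (u, v) in product(causal, repeat=2):
        if y == u and (x, v) not in causal:
            causal.add((x, v))
            changed = True

def concurrent(x, y):
    # Dual of the causal relation: neither event causally precedes the other.
    return x != y and (x, y) not in causal and (y, x) not in causal

print(("a", "c") in causal)   # True: causality is transitive
print(concurrent("a", "d"))   # True
print(concurrent("d", "a"))   # True: concurrence is symmetric
# Concurrence is not transitive: a || d and d || c, yet a -> c.
print(concurrent("a", "c"))   # False
```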
  • B. Event Order Tracking [0400]
  • Debugging typically requires an understanding of the order in which events occur. Above, observers were presented as separate processes; while that treatment simplified the discussion, it is not typically a practical implementation of an observer. If the observer were implemented as a physical process, the signals that indicate events would have to be transformed into physical messages, and the system would have to be synchronized so that all messages arrive in a valid order. [0401]
  • FIG. 30 depicts a space/time graph [0402] 3000 with each event having a label 3002. With reference to FIG. 30, DLO 2902 can accurately place event records in their proper bins—even if received out of order—as long as it knows the bins of the immediate predecessors. If we know the bins in which events are recorded, we can determine something about their causality. Fortunately, it is easy to label each event with the number of its intended bin. Labels 3002 are analogous to time and are typically called Lamport timestamps.
  • A Lamport timestamp is an integer t associated with an event ei such that [0403]
  • ei → ej ⇒ t(ei) < t(ej) [0404]
  • Lamport timestamps can be assigned as needed, provided the labels of an event's immediate predecessors are known. This information can be maintained with a local counter, called a Lamport clock (not shown), tPi, on each process, Pi. The clock's value is transmitted with each message Mj as tMj. Clock value tPi is updated with each event, e, as follows: [0405]
  • tPi = max(tMj, tPi) + 1, if e is a receive event; tPi = tPi + 1, otherwise
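The update rule above can be sketched in Python. The class and method names here are illustrative assumptions, not part of the disclosure:

```python
# Minimal sketch of a Lamport clock implementing the update rule above.
# Class and method names are hypothetical.

class LamportClock:
    def __init__(self):
        self.t = 0  # local counter t_Pi

    def local_event(self):
        # t_Pi = t_Pi + 1 for internal and send events
        self.t += 1
        return self.t

    def send(self):
        # a send is a local event whose timestamp t_Mj travels with the message
        return self.local_event()

    def receive(self, t_msg):
        # t_Pi = max(t_Mj, t_Pi) + 1 for a receive event
        self.t = max(t_msg, self.t) + 1
        return self.t
```

With this rule, any event's label exceeds the labels of all events it causally depends on, which is the consistency property stated above.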
  • A labeling mechanism is said to characterize the causal relation if, based on their labels alone, it can be determined whether two events are causal or concurrent. Although Lamport timestamps are consistent with causality (if t(ei) ≧ t(ej), then ei ↛ ej), they do not characterize the causal relation. [0406]
  • FIG. 31 is a space/time graph 3100 that demonstrates the inability of scalar timestamps to characterize causality between events. With reference to FIG. 31, space/time graph 3100 shows event e1 3102, event e2 3104, and event e3 3106. [0407]
  • Here e1 3102 → e2 3104, and also (e1 3102 ∥ e3 3106) ∧ (e2 3104 ∥ e3 3106). Suppose there were scalar timestamps, label(ei), that characterize causality. It can be shown that e3 3106 would then have to appear concurrent with both e1 3102 and e2 3104. However, since e1 3102 → e2 3104, their labels must differ, so it is not possible for a single scalar label of e3 3106 to express concurrence with both. [0408]
  • Event causality can be tracked completely using explicit event dependence graphs, with directed edges from each event to its immediate predecessors. Unfortunately, this method cannot store enough information with each record to determine whether two arbitrarily chosen events are causally related without traversing the dependence graph. [0409]
  • Other labeling techniques, such as vector timestamps, can characterize causality. The typical formulation of vector timestamps is based on the cardinality of event histories. A basis for vector timestamps is established by the following definitions and theorems. An event history, H(ej), of an event ej is the set of all events, ei, such that either ei → ej or ei = ej. The event history can be projected against specific processes. For a process Pj, the Pj history projection of H(ei), HPj(ei), is the intersection of H(ei) and the set of events local to Pj. The event graph represented by a space/time diagram can be partitioned into equivalence classes, with one class for each process. The set of events local to Pj is just the Pj equivalence class. [0410]
  • The intersection of any two projections from the same process is identical to at least one of the two projections. Two history projections from a single process, Hp(a) and Hp(b), must satisfy one of the following: [0411]
  • a) Hp(a)⊂Hp(b) [0412]
  • b) Hp(a)=Hp(b) [0413]
  • c) Hp(a)⊃Hp(b) [0414]
  • The cardinality of HPj(ei) is thus the number of events local to Pj that causally precede ei, plus ei itself if ei is local to Pj. Since local events always occur in sequence, we can uniquely identify an event by its process and the cardinality of its local history. [0415]
  • For events ea, eb with ea ≠ eb: HPea(ea) ⊆ HPea(eb) ⇒ ea → eb [0416]
  • FIG. 32 shows a space/time diagram 3200 with vector timestamped events. A vector timestamp 3202 is a vector label, t̂e, assigned to each event, eεE, such that the ith element represents |HPi(e)|. Given two events, e1 and e2, we can determine their causal ordering: if vector t̂ei has a smaller value for its own process's entry than the other, t̂ej, has at that same position, then ei → ej. If both vectors have larger values for their own process entries, then ei ∥ ej. It is not possible for both events to have smaller values for their own entries because, for events ea and eb, HPea(ea) ⊆ HPea(eb) implies ea → eb, and the causal relation cannot hold in both directions. It is not necessary to know the local processes of events to determine their causal order using vector timestamps. [0417]
  • The causal order of two vector timestamped events, ea and eb, from unknown processes can be determined with an element-by-element comparison of their vector timestamps: [0418]
  • (∀i, 1≦i≦n: t̂ea[i] ≦ t̂eb[i]) ∧ ea ≠ eb ⇒ ea → eb
  • (∃i: t̂ea[i] > t̂eb[i]) ∧ (∃i: t̂eb[i] > t̂ea[i]) ⇒ ea ∥ eb
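The element-by-element comparison can be sketched as follows; the function names are assumptions:

```python
# Hedged sketch of comparing two vector timestamps of equal length.

def causally_precedes(ta, tb):
    # ea -> eb if every element of ta is <= the matching element of tb
    # and the vectors are not identical
    return ta != tb and all(x <= y for x, y in zip(ta, tb))

def concurrent(ta, tb):
    # ea || eb if each vector exceeds the other in some element
    return (any(x > y for x, y in zip(ta, tb)) and
            any(y > x for x, y in zip(ta, tb)))
```

Exactly one of ea → eb, eb → ea, and ea ∥ eb holds for distinct events, which these two predicates reflect.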
  • Thus, vector timestamps both fully characterize causality and uniquely identify each event in an execution. [0419]
  • Computing vector timestamps at runtime is similar to Lamport timestamp computation. Each process (Ps) contains a vector clock (t̂Ps) with elements for every process in the system, where t̂Ps[s] always equals the number of events local to Ps. Snapshots of this vector counter are used to label each event, and snapshots are transmitted with each message. The recipient of a message with a vector snapshot can update its own vector counter (t̂Pr) by replacing it with sup(t̂Ps, t̂Pr), the element-wise maximum of t̂Ps and t̂Pr. [0420]
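A runtime vector clock of this kind might be sketched as below. Treating the receive itself as a local event (incrementing the process's own entry after taking the element-wise maximum) is an assumption consistent with t̂Ps[s] counting local events:

```python
# Hypothetical vector clock sketch; names are illustrative.

class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid       # index s of this process
        self.v = [0] * n     # one element per process in the system

    def local_event(self):
        self.v[self.pid] += 1
        return list(self.v)  # snapshot labels the event

    def send(self):
        # a send is a local event; its snapshot travels with the message
        return self.local_event()

    def receive(self, snapshot):
        # sup(t_Ps, t_Pr): element-wise maximum, then count the receive itself
        self.v = [max(a, b) for a, b in zip(self.v, snapshot)]
        self.v[self.pid] += 1
        return list(self.v)
```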
  • This technique places enough information with each message to determine message ordering. It is performed by comparing snapshots attached to each message. However, transmission of entire snapshots is usually not practical, especially if the system contains a large number of processes. [0421]
  • Vector clocks can however be maintained without transmitting complete snapshots. A transmitting process, P[0422] s, can send a list that includes only those vector clock values that have changed since its last message. A recipient, Pr, then compares the change list to its current elements and updates those that are smaller. This requires each process to maintain several vectors: one for itself, and one for each process to which it has sent messages. However, change lists do not contain enough information to independently track message order.
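The change-list optimization might be sketched as follows; the bookkeeping structure (one remembered vector per destination) is an assumption:

```python
# Hypothetical sketch: transmit only vector entries changed since the
# last message to each destination, rather than full snapshots.

class ChangeListSender:
    def __init__(self, n):
        self.clock = [0] * n
        self.last_sent = {}  # per-destination copy of the last transmission

    def changes_for(self, dest):
        prev = self.last_sent.get(dest, [0] * len(self.clock))
        delta = {i: v for i, v in enumerate(self.clock) if v != prev[i]}
        self.last_sent[dest] = list(self.clock)
        return delta         # {element index: new value}

def apply_changes(local, delta):
    # the recipient updates only those elements that are smaller
    for i, v in delta.items():
        local[i] = max(local[i], v)
    return local
```

As the text notes, such deltas keep clocks current but do not by themselves carry enough information to track message order independently.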
  • The expense of maintaining vector clocks can be a strong deterrent to employing them. Unfortunately, no technique with smaller labels can characterize causality. It has been proven that the dimension of the causal relation for an N-process distributed execution is N, and hence N-element vectors are the smallest labels characterizing causality. [0423]
  • The problem results from concurrence, without which Lamport time would be sufficient. Concurrence can be tracked with concurrency maps, where each event keeps track of all events with which it is concurrent. Since the maps characterize concurrency, adding Lamport time lets them also characterize causality (the concurrency information disambiguates the scalar time). Unfortunately, concurrency maps can only be constructed after the fact, since doing so requires examination of events from all processes. [0424]
  • In some situations, distinguishing between concurrency and causality is not a necessity, but merely a convenience. There are compact labeling techniques that allow better concurrence detection than Lamport time. One such technique uses interval clocks, in which each event record is labeled with its own Lamport time and the Lamport time of its earliest successor. This label then represents a Lamport time interval, during which the corresponding event was the latest known by the process. This gives each event a wider region with which to detect concurrence (indicated by overlapping intervals). [0425]
  • In cases where there is little or no cross-process causality (few messages), interval timestamps are not much better than Lamport timestamps. In cases with large numbers of messages, however, interval timestamps can yield better results. [0426]
  • C. Space/Time Displays in Debugging Tools [0427]
  • Space/time diagrams have typically proven useful in discussing event causality and concurrence. Space/time diagrams are also often employed as the user display in concurrent program debugging tools. [0428]
  • The Los Alamos parallel debugging system uses a text based time-process display, and Idd uses a graphic display. Both of these, however, rely on an accurate global real-time clock (impractical in most systems). [0429]
  • FIG. 33 shows a partial order event tracer (POET) [0430] display 3300. The partial order event tracer (POET) system supports several different languages and run-time environments, including Hermes, a high-level interpreted language for distributed systems, and Java. With reference to FIG. 33, POET display 3300 distinguishes among several types of events by shapes, shading, and alignment of corresponding message lines.
  • A Distributed Program Debugger (DPD) is based on a Remote Execution Manager (REM) framework. The REM framework is a set of servers on interconnected Unix machines, where each server is a Unix user-level process. Processes executing in this framework can create and communicate with processes elsewhere in the network as if they were all on the same machine. DPD uses space/time displays for debugging communication only, and it relies on separate source-level debuggers for individual processes. [0431]
  • 2. Abstraction in Event-based Debugging [0432]
  • Simple space/time displays can be used to present programmers with a wealth of information about distributed executions. Typically however, space/time diagrams are too abstract to be an ultimate debugging solution. Space/time diagrams show high-level events and message traffic, but they do not support designer interaction with the source code. On the other hand, simple space/time diagrams may sometimes have too much detail. Space/time diagrams display each distinct low-level message that contributes to a high-level transaction without support for abstracting the transaction. [0433]
  • FIG. 34 is a space/time diagram 3400 having a first compound event 3402 and a second compound event 3404. With reference to FIG. 34, even though a pair of primitive events is either causally related or concurrent, first and second compound events 3402 and 3404, or any other pair of compound events, might be neither causally related nor concurrent. Abstraction is typically applied across two dimensions, events and processes, to aid in the task of debugging distributed software. Event abstraction represents sequences of events as single entities. A group of events may occasionally have a specific semantic meaning that is difficult to recognize, much as streams of characters can have a meaning that is difficult to interpret without proper spacing and punctuation. Event abstraction can in some circumstances complicate the relationships between events. [0434]
  • Event abstraction can be applied in one of three ways: filtering, clustering, and interpretation. With event filtering, a programmer describes event types that the debugger should ignore, which are then hidden from view. With clustering, the debugger collects a number of events and presents the group as a single event. With interpretation, the debugger parses the event stream for event sequences with specific semantic meaning and presents them to a programmer. [0435]
  • Process abstraction is usually applied only as hierarchical clustering. The remainder of this section discusses these specific event and process abstraction approaches. [0436]
  • A. Event Filtering and Clustering [0437]
  • Event filtering and clustering are techniques used to hide events from a designer and thereby reduce clutter. Event filters exclude selected events from being tracked in event-based debugging techniques. In most cases, this filtering is implicit and cannot be modified without changing the source code, because the source code being debugged is designed to report only certain events to the debugger. When deployed, the code will report all such events to the tool. This approach is employed in both DPD and POET, although some events may be filtered from the display at a later time. [0438]
  • An event cluster is a group of events represented as a single event. The placement of an event in a cluster is based on simple parameters, such as virtual time bounds and process groups. Event clusters can have causal ambiguities. For example, one cluster may contain events that causally precede events in a second cluster, while other events causally follow certain events in the second cluster. [0439]
  • FIG. 35 shows a POET display 3500 involving a first convex event cluster 3502 and a second convex event cluster 3504. POET uses a virtual-time-based clustering technique that represents convex event clusters as single abstract events. A convex event cluster is a set of event instances, C, such that for events a, b, cεE with a, cεC: [0440]
  • a → b ∧ b → c ⇒ bεC [0441]
  • Convex event clusters, unlike generic event clusters, cannot overlap. [0442]
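The convexity condition can be checked directly against an event dependence graph. This brute-force sketch (names and graph encoding are assumptions) searches for a path a → b → c with b outside the cluster:

```python
# succ maps each event to its immediate successors; reachable() tests
# whether a causal path x -> ... -> y exists by graph search.

def reachable(succ, x, y):
    stack, seen = [x], {x}
    while stack:
        n = stack.pop()
        for m in succ.get(n, ()):
            if m == y:
                return True
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return False

def is_convex(succ, events, cluster):
    # convex: a, c in C and a -> b -> c imply b in C
    for a in cluster:
        for c in cluster:
            for b in events - cluster:
                if reachable(succ, a, b) and reachable(succ, b, c):
                    return False
    return True
```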
  • B. Event Interpretation [0443]
  • The third technique for applying event abstraction is interpretation, also referred to as behavioral abstraction. Both terms describe techniques that use debugging tools to interpret the behavior represented by sequences of events and present results to a designer. Most approaches to behavioral abstraction let a designer describe sequences of events using expressions, and the tools recognize the sequence of events through a combination of customized finite automata followed by explicit checks. Typically, matched expressions generate new events. [0444]
  • 1. Event description language (EDL) [0445]
  • One of the earliest behavioral abstraction techniques was Bates's event description language (EDL), where event streams are pattern-matched using shuffle automata. A match produces a new event that can, in turn, be part of another pattern. Essentially, abstract events are hierarchical and are built bottom up. [0446]
  • This approach can recognize event patterns that contain concurrent events. There are however several weaknesses in this approach. First, shuffle automata match events from a linear stream, which is subject to a strong observational bias. In addition, even if the stream constitutes a valid observation, interleaving may cause false intermediates between an event and its immediate successor. Finally, concurrent events appear to occur in some specific order. [0447]
  • Bates partially compensates for these problems in three ways. First, all intermediates between two recognized events are ignored; hence, false intermediates are skipped. Unfortunately, true intermediates are also skipped, making error detection difficult. Second, the shuffle operator, Δ, is used to identify matches with concurrent events. Unfortunately, shuffle recognizes events that occur in any order, regardless of whether they are truly ordered in the corresponding execution. For example, e1Δe2 can match with either e1 ≺ e2 or e2 ≺ e1 in the event stream, but this means the actual matches could be e1 → e2 or e2 → e1, in addition to the e1 ∥ e2 that the programmer intended to match. Third, the programmer can prescribe explicit checks to be performed on each match before asserting the results. However, the checks allowed do not include causality or concurrence checks. [0448]
  • 2. Chain Expressions [0449]
  • Chain expressions, used in the Ariadne parallel debugger, are an alternate way to describe distributed behavior patterns that have both causality and concurrence. These behavioral descriptions are based on chains of events (abstract sequences not bound to processes), p-chains (chains bound to processes), and pt-chains (composed p-chains). The syntax for describing chain expressions is fairly simple, with <a b> representing two causally related events and |[a b]| representing two concurrent events. [0450]
  • The recognition algorithm has two functions. First, it recognizes the appropriate event sequence from a linear stream (using an NFA). Second, it checks the relationships between the specific events matched. [0451]
  • For example, when looking for sequences that match the expression <|[a b]|c> (viz., a and b are concurrent, and both causally precede c), Ariadne will find the sequence a b c and then verify the relationships among them. Unfortunately, the fact that sequences are picked in order from a linear stream before relationships are checked can cause certain matches to be missed. For example, |[a b]| and |[b a]| should have the same meaning, but they do not cause identical matches. This is because Ariadne uses NFAs as the first stage in event abstraction. In the totally ordered stream to which an NFA responds, either a will precede b, preventing the NFA for the second expression from recognizing the string, or b will precede a, preventing the NFA for the first expression from recognizing the string. [0452]
  • 3. Distributed abstraction [0453]
  • The behavioral abstraction techniques described so far rely on centralized abstraction facilities. These facilities can be distributed, as well. The BEE (Basis for distributed Event Environments) project is a distributed, hierarchical, event-collection system, with debugging clients located with each process. [0454]
  • FIG. 36 shows a Basis for distributed Event Environments (BEE) abstraction facility 3600 for a single client. With reference to FIG. 36, event interpretation is performed at several levels. The first is an event sensor 3602, inserted into the source of the program under test and invoked whenever a primitive event occurs during execution. The next level is an event generator 3604, where information, including timestamps and process identifiers, is attached to each event. Event generator 3604 uses an event table 3606 to determine whether events should be passed to an event handler 3608 or simply dropped. Event handler 3608 manages event table 3606 within event generator 3604. Event handler 3608 filters and collects events and routes them to appropriate event interpreters (not shown). Event interpreters (not shown) gather events from a number of clients (not shown) and aggregate them for presentation to a programmer. Clients and their related event interpreters are placed together in groups managed by an event manager (not shown). A weakness of this technique is that it does not specifically track causality. Instead, it relies on the real-time stamps attached to specific primitive or abstract events. However, as discussed above, these timestamps are not able to characterize causality. [0455]
  • C. Process Clustering [0456]
  • Most distributed computing environments feature flat process structures, with few formally stated relationships among processes. Automatic process clustering tools can partially reverse-engineer a hierarchical structure to help remove spurious information from a debugger's view. Intuitively, a good cluster hierarchy should reveal, at the top level, high-level system behavior, and the resolution should improve proportionally with the number of processes exposed. A poor cluster hierarchy would show very little at the top level and would require a programmer to descend several hierarchical levels before getting even a rough idea about system behavior. Process clustering tools attempt to identify common interaction patterns, such as client-server, master-slave, complex server, layered system, and so forth. When these patterns are identified, the participants are clustered together. Clusters can then serve as participants in interaction patterns to be further clustered. These cluster hierarchies are strictly trees, as shown in FIG. 37, which depicts a hierarchical construction of process clusters 3700. With reference to FIG. 37, a square node 3702 represents a process (not shown) and a round node 3704 represents a process cluster (not shown). [0457]
  • Programmers can choose a debugging focus, in which they specify the aspects and detail levels they want to use to observe an execution. With reference to FIG. 37, a representative debugging focus that includes nodes I, J, E, F, G and H is shown. One drawback of this approach is that when a parent cluster is in focus, none of its children can be. For example, if we wanted to look at process K in detail, we would also need to expose at least as much detail for processes E and L and process cluster D. [0458]
  • Each process usually participates in many types of interactions with other processes. Therefore, the abstraction tools must heuristically decide between several options. These decisions have a substantial impact on the quality of a cluster hierarchy. In "Abstract Behaviour of Distributed Executions with Applications to Visualization," Ph.D. thesis, Technische Hochschule Darmstadt, Darmstadt, Germany, May 1994, T. Kunz evaluates the quality of his tool by measuring the cohesion within a cluster (a qualitative measurement, though expressed quantitatively; the higher the better) and the coupling between clusters (a qualitative measure of the information clusters must know about each other; the higher the worse). For a cluster P of m processes, cohesion is quantified by: [0459]
  • Cohesion(P) = Σi<j Simf(pi, pj) / (m(m-1)/2)
  • where Simf(P1, P2) is a similarity metric that equals: [0460]
  • Simf = A·⟨ĈP1|ĈP2⟩ / (∥ĈP1∥·∥ĈP2∥)
  • Here, ⟨â|b̂⟩ denotes the scalar product of vectors â and b̂, and ∥â∥ denotes the magnitude of vector â. ĈP1 and ĈP2 are process characteristic vectors; in them, each element contains a value between 0 and 1 that indicates how strongly a particular characteristic manifests itself in each process. Characteristics can include keywords, type names, function references, etc. A is a value that equals 1 if any of the following apply: [0461]
  • P1 and P2 are instantiations of the same source. [0462]
  • P[0463] 1 and P2 are unique instantiations of their own source.
  • P[0464] 1 and P2 communicate with each other.
  • A equals 0 if none of these is true (e.g., P1 and P2 are non-unique instantiations of separate source that do not communicate with each other). Coupling is quantified by: [0465]
  • Coupling(P) = Σi,j Simf(pi, qj) / (mn)
  • where qjεQ, Q is the complement of P, and n=|Q|. The quality of a cluster is quantified as its Cohesion minus its Coupling. In many cases, these metrics match many of the characteristics that intuitively differentiate good and poor clusters, as shown in FIGS. 38A, B, and C. With reference to FIGS. 38A and C, Cohesion is high where clusters correspond to heavy communication and where clusters correspond to processes instantiated from the same source code. Coupling is shown to be low in each of these cases. With reference to FIG. 38B, Coupling is high when clusters do not correspond to heavily communicating processes or to instances of the same source code. It is not clear, however, that the cluster in FIG. 38C should be assigned the same quality value as the cluster in FIG. 38A. Using these metrics, Kunz achieved qualities of between 0.15 and 0.31 for his clustering techniques. However, it is hard to tell what this means in terms of cluster usefulness. [0466]
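Under the stated definitions, the metrics might be computed as follows. The similarity function is passed in, since the characteristic vectors are application-specific; the sign convention in quality() is an assumption (higher cohesion and lower coupling taken to mean a better cluster):

```python
# Sketch of Kunz-style cohesion and coupling for a cluster P with
# complement Q, given a pairwise similarity function sim(p, q).

def cohesion(P, sim):
    m = len(P)
    total = sum(sim(P[i], P[j]) for i in range(m) for j in range(i + 1, m))
    return total / (m * (m - 1) / 2)

def coupling(P, Q, sim):
    return sum(sim(p, q) for p in P for q in Q) / (len(P) * len(Q))

def quality(P, Q, sim):
    # assumed convention: reward cohesion, penalize coupling
    return cohesion(P, sim) - coupling(P, Q, sim)
```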
  • 3. State-based Debugging [0467]
  • State-based debugging techniques focus on the state of the system and the state changes caused by events, rather than on events themselves. The familiar source-level debugger for sequential program debugging is state-based. This source-level debugger lets designers set breakpoints in the execution of a program, enabling them to investigate the state left by the execution to that point. This source-level debugger also lets programmers step through a program's execution and view changes in state caused by each step. [0468]
  • Concurrent systems have no unique meaning for an instant in execution time. Stopping or single stepping the whole system can unintentionally, but substantially, change the nature of interactions between processes. [0469]
  • A. Consistent Cuts and Global State [0470]
  • In distributed event-based debugging, the concept of causality is typically of such importance that little of value can be discussed without a firm understanding of causality and its implications. In distributed state-based debugging, the concept of a global instant in time is equally important. [0471]
  • Here again, it may seem intuitive to consider real-time instants as the global instants of interest. However, just as determining the real-time order of events is not practical or even particularly useful, finding accurate real-time instants makes little sense. Instead, a global instant is represented by a consistent cut. A consistent cut is a cut of an event dependency graph representing an execution that: a) intersects each process exactly once; and b) points all dependencies crossing the cut in the same direction. Like real-time instants, consistent cuts have both a past and a future. These are the subgraphs on each side of the cut. [0472]
  • FIG. 39 shows that consistent cuts can be represented as a jagged line across the space/time diagram that meets the above requirements. With reference to FIG. 39, a space/time graph 3900 is shown having a first cut 3902 and a second cut 3904. All events to the left of either first cut 3902 or second cut 3904 are in the past of the respective cut, and all events to the right are in its future. First cut 3902 is a consistent cut because no messages travel from the future to the past. Second cut 3904, however, is not consistent because a message 3906 travels from the future to the past. [0473]
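The message-crossing condition can be tested on a cut directly. In this sketch (names assumed) a cut is represented by the set of events in its past; the additional requirement that each process's past be a prefix of its local event sequence is elided:

```python
# A cut is inconsistent if some message crosses from future to past,
# i.e., a receive event is in the past while its send is in the future.

def no_backward_messages(past, messages):
    # messages: iterable of (send_event, receive_event) pairs
    return all(not (recv in past and send not in past)
               for send, recv in messages)
```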
  • FIGS. 40A, B, and C show that a distributed execution shown in a space/time diagram 4000 can be represented by a lattice of consistent cuts 4002, where ⊤ is the start of the execution, and ⊥ is system termination. With reference to FIGS. 40A, B, and C, lattice of consistent cuts 4002 represents the global statespace traversed by a single execution. Since the size of lattice of consistent cuts 4002 is on the order of |E|^|P|, it, unlike space/time diagrams, is never actually constructed. In the remainder of this chapter, to describe properties of consistent cut lattices, the symbol ⋖ relates cuts such that one immediately precedes the other, and → relates cuts between which there is a path. [0474][0475]
  • B. Single Stepping in a Distributed Environment [0476]
  • Controlled stepping, or single stepping, through regions of an execution can help with an analysis of system behavior. The programmer can examine changes in state at the completion of each step to get a better understanding of system control flow. Coherent single stepping for a distributed system requires steps to align with a path through a normal execution's consistent cut lattice. [0477]
  • DPD works with standard single-process debuggers (called client debuggers), such as DBX, GDB, etc. Programmers can use these tools to set source-level break-points and single step through individual process executions. However, doing so leaves the other processes executing during each step, which can yield unrealistic executions. [0478]
  • Zernic gives a simple procedure for single stepping using a post-mortem traversal of a consistent cut lattice. At each point in the step process, there are two disjoint sets of events: the past set, or events that have already been encountered by the stepping tool, and the future set, or those that have yet to be encountered. To perform a step, the debugger chooses an event, ei, from the future such that any events it depends on are already in the past, i.e., there are no future events, ef, such that ef → ei. This ensures that the step proceeds between two consistent cuts related by ⋖. The debugger moves this single event to the past, performing any necessary actions. [0479][0480]
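Zernic's procedure can be sketched as follows, with deps mapping each event to the events it depends on (all names are assumptions):

```python
# One post-mortem step: move a single enabled event from future to past.

def step(past, future, deps):
    for e in sorted(future):  # deterministic choice for illustration
        if all(d in past for d in deps.get(e, ())):
            future.remove(e)
            past.add(e)
            return e
    return None  # no event is enabled
```

Repeating this step traces one path through the consistent cut lattice, since each move keeps all dependencies of past events in the past.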
  • To allow more types of steps, POET's support for single stepping uses three disjoint sets: executed, ready, and non-ready. The executed set is identical to the past set in "Using Visualization Tools to Understand Concurrency," by D. Zernik, M. Snir, and D. Malki, IEEE Software 9, 3 (1992), pp. 87-92. The ready set contains all events in the future whose enabling events are all in the executed set, and the contents of the non-ready set have some enabling events in either the ready or non-ready sets. Using these sets, it is possible to perform three different types of step: global-step, step-over, and step-in. Global-step and step-over may progress between two consistent cuts not related by ⋖ (i.e., there may be several intermediate cuts between the step cuts). [0481][0482]
  • A global-step is performed by moving all events from the ready set into the past. Afterwards, the debugger must move to the ready set all events in the non-ready set whose dependencies are in the executed set. A global-step is useful when the programmer wants information about a system execution without having to look at any process in detail. [0483]
  • The step-over procedure considers a local, or single-process, projection of the ready and non-ready sets. To perform a step, it moves the earliest event from the local projections into the executed set and executes through events on the other processes until the next event in the projection is ready. This ensures that the process in focus will always have an event ready to execute in the step that follows. [0484]
  • Step-in is another type of local step. Unlike step-over, step-in does not advance the system at the completion of the step; instead, the system advance is considered to be a second step. FIGS. 41A, B, C, and D show a space/time diagram before a step 4100 and a resulting space/time diagram after performing a global-step 4102, a step-over 4104, and a step-in 4106. [0485]
  • C. Runtime Consistent Cut Algorithms [0486]
  • It is occasionally necessary to capture consistent cuts at runtime. To do so, each process performs some type of cut action (e.g., state saving). This can be done with barrier synchronization, which erects a temporal barrier that no process can pass until all processes arrive. Any cut taken immediately before, or immediately after, the barrier is consistent. However, with barrier synchronization, some processes may have a long wait before the final process arrives. [0487]
  • A more proactive technique is to use a process called the cut initiator to send perform-cut messages to all other system processes. Upon receiving a perform-cut message, a process performs its cut action, sends a cut-finished message to the initiator, and then suspends itself. After the cut initiator receives cut-finished messages from all processes, it sends each of them a message to resume computation. [0488]
  • The cut obtained by this algorithm is consistent: no process is allowed to send any messages from the time it performs its own cut action until all processes have completed the cut. This means that no post-cut messages can be received by processes that have yet to perform their own cut action. This algorithm has the undesirable characteristic of stopping the system for the duration of the cut. The following algorithms differ in that they allow some processing to continue. [0489]
  • 1. Chandy-Lamport Algorithm [0490]
  • The Chandy-Lamport algorithm does not require the system to be stopped. Once again, the cut starts when a cut initiator sends perform-cut messages to all of the processes. When a process receives a perform-cut message, it stops all work, performs its cut action, and then sends a mark on each of its outgoing channels; a mark is a special message that tells its recipient to perform a cut action before reading the next message from the channel. When all marks have been sent, the process is free to continue computation. If the recipient has already performed the cut action when it receives a mark, it can continue working as normal. [0491]
  • Each cut request and each mark associated with a particular cut are labeled with a cut identifier, such as the process ID of the cut initiator and an integer. This lets a process distinguish between marks for cuts it has already performed and marks for cuts it has yet to perform. [0492]
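A minimal sketch of the mark-handling rule, with our own class and field names: a process performs the cut action at most once per cut identifier, whether triggered by a perform-cut request or by the first mark it sees on a FIFO channel.

```python
# Sketch of Chandy-Lamport mark handling.  A mark carries a cut id; a
# process cuts on the first request or mark for that id and ignores repeats.

class CLProcess:
    def __init__(self, name):
        self.name = name
        self.cuts_done = set()     # cut ids already performed
        self.snapshots = {}

    def perform_cut(self, cut_id, state):
        if cut_id not in self.cuts_done:
            self.snapshots[cut_id] = state     # the local cut action
            self.cuts_done.add(cut_id)
            return True                        # caller must now send marks
        return False

    def receive(self, msg, state):
        kind, cut_id = msg
        if kind == "mark":
            # Cut before reading past the mark; a repeat mark is a no-op.
            self.perform_cut(cut_id, state)
```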
  • 2. Color-based Algorithms [0493]
  • The Chandy-Lamport algorithm works only for FIFO (First In First Out) channels. If a channel is non-FIFO (e.g., UDP), a post-cut message may outrun the mark and be inconsistently received before the recipient is even aware of the cut, i.e., it is received in the cut's past. The remedy to this situation is a color-based algorithm. Two such algorithms are discussed below. [0494]
  • The first is called the two-color, or red-white, algorithm. With this algorithm, information about the cut state is transferred with each message. Each process in the system has a color. Processes not currently involved in a consistent cut are white, and all messages transmitted are given a white tag. Again, there is a cut initiator that sends perform-cut messages to all system processes. When a process receives this request, it halts, performs the cut action, and changes its color to red. From this point on, all messages transmitted are tagged with red to inform the recipients that a cut has occurred. [0495]
  • Any process can accept a white message without consequence, but when a white process receives a red message, it must perform its cut action before accepting the message. Essentially, white processes treat red messages as cut requests. Red processes can accept red messages at any time, without consequence. [0496]
  • A disadvantage of the two-color algorithm is that the system must reset all of the processes back to white after they have completed their cut action. After switching back, each process must treat red messages as if they were white until they are all flushed from the previous cut. After this, each process knows that the next red message it receives signals the next consistent cut. [0497]
  • This problem is addressed by the three-color algorithm, which resembles the two-color algorithm in that every process changes color after performing a cut; it differs in that every change in color represents a cut. For colors zero through two, if a process with the color c receives a message with the color (c−1) mod 3, it registers this as a message-in-flight (see below). On the other hand, if it receives a message with the color (c+1) mod 3, it must perform its cut action and switch color to (c+1) mod 3 before receiving the message. Of course, this can now be generalized to n-color algorithms, but three colors are usually sufficient. [0498]
  • Programmers may need to know about messages transmitted across the cut, or messages-in-flight. In the two-color algorithm, messages-in-flight are simply white messages received by red processes. These can all be recorded locally, or the recipient can report them to the cut initiator. In the latter case, each red process simply sends the initiator a record of any white messages received. [0499]
  • It is not safe to switch from red to white in the two-color algorithm until the last message-in-flight has been received. This can be detected by associating a counter with each process. A process increments its counter for each message sent and decrements it for each message received. When the value of this counter is sent to the initiator at the start of each process's cut action, the initiator can use the total value to determine the total number of messages-in-flight. The initiator simply decrements this count for each message-in-flight notification it receives. [0500]
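The two-color rules and the message-in-flight counter can be combined in one small sketch. The class and field names are our own; every message carries its sender's color, and a white process treats an incoming red message as a cut request.

```python
# Sketch of the two-color (red/white) algorithm with in-flight accounting.

class ColorProcess:
    def __init__(self):
        self.color = "white"
        self.counter = 0          # sent minus received, reported at cut time
        self.snapshot = None
        self.in_flight = []       # white messages received while red

    def cut(self, state):
        self.snapshot = state     # the cut action
        self.color = "red"
        return self.counter       # value reported to the cut initiator

    def send(self, payload):
        self.counter += 1
        return (self.color, payload)       # every message carries a color

    def receive(self, msg, state):
        color, payload = msg
        self.counter -= 1
        if color == "red" and self.color == "white":
            self.cut(state)                # red message doubles as cut request
        elif color == "white" and self.color == "red":
            self.in_flight.append(payload) # message crossed the cut
        return payload
```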
  • D. State Recovery-rollback and Replay [0501]
  • Since distributed executions tend to be non-deterministic, it is often difficult to reproduce bugs that occur during individual executions. To do so, most distributed debuggers contain a rollback facility that returns the system to a previous state. For this to be feasible, all processes in the system must occasionally save their state. This is called checkpointing the system. Checkpoints do not have to save the entire state of the system. It is sufficient to save only the changes since the last checkpoint. However, such incremental checkpointing can prolong recovery. [0502]
  • DPD makes use of the UNIX fork system call to perform checkpointing for later rollback. When fork is called, it makes an exact copy of the calling process, including all current state. In the DPD checkpoint facility, the newly forked process is suspended and indexed. Rollback suspends the active process and resumes an indexed process. The problem with this approach is that it can quickly consume all system memory, especially if checkpointing occurs too frequently. DPD's solution is to let the programmer choose the checkpoint frequency through use of a slider in its GUI. [0503]
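DPD's fork-based checkpointing duplicates the whole process image; the bookkeeping it layers on top (index a saved copy, suspend the active state, resume an indexed one) can be shown portably. This sketch substitutes `copy.deepcopy` for `fork` purely for illustration; it is not DPD's implementation.

```python
# Sketch of checkpoint/rollback indexing, using deepcopy in place of fork.

import copy

class CheckpointStore:
    def __init__(self):
        self.checkpoints = []              # indexed saved states

    def checkpoint(self, state):
        self.checkpoints.append(copy.deepcopy(state))
        return len(self.checkpoints) - 1   # index used later for rollback

    def rollback(self, index):
        # "Suspend the active process and resume an indexed process":
        # here we simply hand back a copy of the saved state.
        return copy.deepcopy(self.checkpoints[index])
```

As with DPD, the memory cost grows with checkpoint frequency, which is why DPD exposes that frequency as a user-controlled slider.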
  • Processes must sometimes be returned to states that were not specifically saved. In this case, the debugger must do additional work to advance the system to the desired point. This is called replay and is performed using event trace information to guide an execution of the system. In replay, the debugger chooses an enabled process (i.e., one whose next event has no pending causal requirements) and executes it, using the event trace to determine where the process needs to block for a message that may have arrived asynchronously in the original execution. When the process blocks, the debugger chooses the next enabled process and continues from there. In this way, a replay is causally identical to the original execution. [0504]
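The replay loop above can be sketched as a scheduler that repeatedly runs an enabled process until it blocks. The trace format here (per-process lists of events with their causal requirements) is our own illustration of the idea, not the patent's data structure.

```python
# Sketch of trace-guided replay: pick an enabled process (next event has no
# pending causal requirements), run it until it blocks, then pick another.

def replay(traces):
    """traces: {proc: [(event_id, [required_event_ids]), ...]}"""
    done, order = set(), []
    pos = {p: 0 for p in traces}
    progressed = True
    while progressed:
        progressed = False
        for p, events in traces.items():
            while pos[p] < len(events):
                eid, reqs = events[pos[p]]
                if all(r in done for r in reqs):
                    done.add(eid)          # execute until the process blocks
                    order.append(eid)
                    pos[p] += 1
                    progressed = True
                else:
                    break                  # blocked waiting for a message
    return order
```

Because each event waits for its recorded causes, the resulting order is causally identical to the original execution.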
  • Checkpoints must be used in a way that prevents domino effects. The domino effect occurs when rollbacks force processes to restore more than one state. Domino effects can roll the system back to the starting point. FIG. 42 shows a space/time diagram 4200 for a system that is subject to the domino effect during rollback. With reference to FIG. 42, if the system requests a rollback to checkpoint c3 4202 of process P3 4204, all processes in the system must roll back to c1 (i.e., rolling back to P3.c2 4206 requires a roll back to P2.c2 4208, which requires a roll back to P1.c2 4210, which requires a roll back to P3.c1 4212, which requires a roll back to P2.c1 4214, which requires a final roll back to P1.c1 4216). The problem is caused by causal overlaps between message transfers and checkpoints. Performing checkpoints only at consistent cuts avoids the domino effect. [0505]
  • E. Global State Predicates [0506]
  • The ability to detect the truth value of predicates on global state yields much leverage when debugging distributed systems. This technique lets programmers raise flags when global assertions fail, set global breakpoints, and monitor interesting aspects of an execution. Global predicates are those whose truth value depends on the state maintained by several processes. They are typically denoted with the symbol Φ. Some examples include (Σl cl > 20) and (c1 < 20 ∧ c2 ≥ 5), where cl is some variable in process Pl that stores positive integers. In the worst case (such as when (Σl cl > 20) is false for an entire execution), it may be necessary to get the value of all such variables in all consistent cuts. In the following discussion, we use the notation Ca |= Φ to indicate that Φ is true in consistent cut Ca. [0507]
  • At this point, it is useful to introduce branching time temporal logic. Branching time temporal logic is predicate logic with temporal quantifiers, P, F, G, H, A, and E. PΦ is true in the present if Φ was true at some point in the past; FΦ is true in the present if Φ will be true at some point in the future; GΦ is true in the present if Φ will be true at every moment in the future; and HΦ is true in the present if Φ was true at every moment of the past. Notice that GΦ is the same as ¬F¬Φ, and HΦ is the same as ¬P¬Φ. [0508]
  • Since global time passage in distributed systems is marked by a partially ordered consistent cut lattice rather than by a totally ordered stream, we need the quantifiers A, which precedes a predicate that is true on all paths, and E, which precedes a predicate that is true on at least one path. So, AFΦ is true in the consistent cut representing the present if Φ is true at least once on all paths in the lattice leaving this cut. EPΦ is true in the consistent cut representing the present if Φ is true on at least one path leading to this cut. [0509]
  • A monotonic global predicate is a predicate Φ such that Ca |= Φ ⇒ Ca |= AGΦ. A monotonic global predicate is one that remains true after becoming true. An unstable global predicate, on the other hand, is a predicate Φ such that Ca |= Φ ⇒ Ca |= EF¬Φ. An unstable global predicate is one that may become false after becoming true. [0510]
  • 1. Detecting Monotonic Global Predicates [0511]
  • Monotonic predicates can be detected any time after becoming true. One algorithm is to occasionally take consistent cuts and evaluate the predicate at each. In fact, it is not necessary to use consistent cuts, since any transverse cut whose future is a subset of the future of the consistent cut in which the predicate first became true will also show the predicate true. [0512]
  • 2. Detecting Unstable Global Predicates [0513]
  • Detecting arbitrary unstable global predicates can take at worst |E|^|P| time, where |E|^|P| is the size of an execution's consistent cut lattice, |E| is the number of events in the execution, and |P| is the number of processes. This is so because it may be necessary to test for the predicate in every possible consistent cut. However, there are a few special circumstances that allow |E|-time algorithms. [0514]
  • Some unstable global predicates are true on only a few paths through the consistent cut lattice, while others are true on all paths. Cooper and Marzullo describe predicate qualifiers definitely Φ for predicates that are true on all paths (i.e., T |= AFΦ) and possibly Φ for those that are true on at least one path (i.e., T |= EFΦ). [0515]
  • The detection of possibly Φ for weak conjunctive predicates, or global predicates that can be expressed as conjunctions of local predicates, is O(|E|). The algorithm for this is to walk a path through the consistent cut lattice that aligns with a single process, Pt, until either: (1) the process's component of Φ is true, or (2) there is no way to proceed without diverging from Pt. In either case, the target process is switched, and the walk continues. This algorithm continues until it reaches a state where all components of the predicate are true, or until it reaches ⊥. In this way, if there are any consistent cuts where all parts of the predicate simultaneously hold, the algorithm will encounter at least one. [0516]
  • Detection of possibly Φ for weak disjunctive predicates, or global predicates that can be expressed as disjunctions of local predicates, is also O(|E|); it is the same algorithm as above, except it halts at the first node where any component is true. However, weak conjunctive and disjunctive predicates constitute only a small portion of the types of predicates that could be useful in debugging distributed systems. [0517]
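The walk described above detects possibly Φ in linear time. The brute-force sketch below only illustrates what possibly Φ *means* for a weak conjunctive predicate: there must exist one local state per process, pairwise concurrent, with every local component true. The vector-clock representation and function names are our own assumptions.

```python
# Semantic check for "possibly Φ" on a weak conjunctive predicate.
# Not the linear-time lattice walk; a brute-force illustration only.

from itertools import product

def concurrent(v1, v2):
    """Vector-clock concurrency: neither clock dominates the other."""
    return not (all(a <= b for a, b in zip(v1, v2)) or
                all(b <= a for a, b in zip(v1, v2)))

def possibly_conjunctive(states):
    """states: per process, a list of (vector_clock, local_predicate_true)."""
    candidates = [[vc for vc, ok in proc if ok] for proc in states]
    for combo in product(*candidates):
        if all(concurrent(x, y)
               for i, x in enumerate(combo)
               for y in combo[i + 1:]):
            return True            # a consistent cut satisfies every conjunct
    return False
```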
  • 3. Conclusions [0518]
  • Complicating the debugging of heterogeneous embedded systems are designs composed of concurrent and distributed processes. Most of the difficulty in debugging distributed systems results from concurrent processes with globally unscheduled and frequently asynchronous interactions. Multiple executions of a system can produce wildly varying results, even if they are based on identical inputs. The two main debugging approaches for these systems are event based and state based. [0519]
  • Event-based approaches are monitoring approaches. Events are presented to a designer in partially ordered event displays, called space/time displays. These are particularly good at showing inter-process communication over time. They can provide a designer with large amounts of information in a relatively small amount of space. [0520]
  • State-based approaches focus locally on the state of individual processes or globally on the state of the system. Designers can observe individual system states, set watches for specific global predicates, step through executions, and set breakpoints based on global state predicates. These approaches deal largely with snapshots, considering temporal aspects only as differences between snapshots. [0521]
  • As distributed systems increase in size and complexity, the sheer volume of events generated during an execution grows to a point where it is exceedingly difficult for designers to correctly identify aspects of the execution that may be relevant in locating a bug. For distributed system debugging techniques to scale to larger and faster systems, behavioral abstraction will typically become a necessity to help designers identify and interpret complicated behavioral sequences in a system execution. Finally, embedded systems must execute in a separate environment from the one in which they were designed, and they may also run for long periods of time without clear stopping points. Debugging them requires probes to report debugging information to a designer during the execution. These probes inevitably alter system behavior, which can mask existing bugs or create new bugs that are not present in the uninstrumented system. While it is not possible to completely avoid these probe effects, they can be minimized through careful placement, or masked through permanent placement. [0522]
  • Debugging by Cooperative Execution in Coordination-centric Designs [0523]
  • FIG. 43 is an example of a coordination-centric approach to the debugging of a distributed software environment 4300, in accordance with the present invention. With reference to FIG. 43, distributed software environment 4300, containing several processing elements 4302 connected so that information can be exchanged, is connected through a communication channel 4304 to a debugging host 4306. Distributed software environment 4300 can be connected either directly or indirectly to debugging host 4306. [0524]
  • Debugging, in a coordination-centric approach, is performed using “cooperative execution.” Cooperative execution refers to simultaneously executing a distributed software environment 4300 and simulating distributed software environment 4300 on debugging host 4306 based on event traces from distributed software environment 4300. Debugging host 4306 may be a general-purpose workstation. Distributed software environments are likely to have several processing elements 4302, but only a few of these have resources that let them connect directly to debugging host 4306. Those that do not have these resources either have an indirect connection to the debugging host or are opaque to debugging. [0525]
  • 1. Direct Connection [0526]
  • Events can be recorded locally on distributed software environment 4300, and event records can be transferred directly to debugging host 4306. FIG. 44 is a detailed view of a direct connection between a primary processing element 4400 of distributed software environment 4300 and debugging host 4306. In a direct connection, primary processing element 4400 contains a software program 4401, which has at least two software components 4402, a runtime system 4404, a coordinator 4406, an interface (not shown) having a port (not shown), and a primary runtime debugging architecture 4408. Software program 4401 generates an event record in response to a selected event. [0527]
  • Runtime system 4404 collects the events from software components 4402 and transfers them to primary runtime debugging architecture 4408. Primary runtime debugging architecture 4408 contains a time stamper 4410, a causality stamper 4412, a primary uplink component 4414, and a primary transfer component 4416 coupled to primary uplink component 4414. Time stamper 4410 provides a time stamp to the event record generated by software program 4401, while causality stamper 4412 provides an identification of a cause of the event associated with the corresponding event record. Primary uplink component 4414 of primary runtime debugging architecture 4408 facilitates communication through communication channel 4304 between primary processing element 4400 and debugging host 4306, while primary transfer component 4416 collects and facilitates the transfer of the time-stamped and causality-stamped event record from primary processing element 4400 to an event queue 4418 on debugging host 4306. [0528]
  • Debugging host 4306 operates on the events, simulating the activity of distributed software environment 4300 and letting the designer navigate the execution. The bulk of debugging support is thus on debugging host 4306, which reduces the probe effect in distributed software environment 4300. All debugging is based on a combination of system state and event records, which must be collected and properly annotated to record causality. [0529]
  • With a direct connection, primary processing element's runtime system 4404 asynchronously dumps all events onto debugging host 4306. Debugging host 4306 queues all events and writes them to disk. All debugging activity is based on post-processing of these events. [0530]
  • 2. Indirect Connection [0531]
  • FIG. 45 is a detailed view of an indirect connection from primary processing element 4400 to debugging host 4306. In an indirect connection, primary runtime debugging architecture 4408 contains time stamper 4410, causality stamper 4412, and primary uplink component 4414, whereas an intermediate processing element 4500 contains an intermediate runtime debugging architecture 4502, having an intermediate transfer component 4504 coupled to an intermediate uplink component 4506. [0532]
  • For primary processing element 4400 with an indirect connection to debugging host 4306, event records are routed to intermediate processing element 4500, which collects and forwards them, along with some event records of its own, to debugging host 4306. Primary uplink component 4414 facilitates communication, along communication channel 4304, between primary processing element 4400 and intermediate processing element 4500. Intermediate uplink component 4506 facilitates communication, along communication channel 4304, between intermediate processing element 4500 and debugging host 4306, while intermediate transfer component 4504 collects and facilitates the transfer of the event record from intermediate processing element 4500 to event queue 4418 in debugging host 4306. [0533]
  • 3. Storage-based Event Collection [0534]
  • If it is not possible or practical to connect distributed software environment 4300 to debugging host 4306, it may still be possible to employ cooperative execution in a post-mortem fashion. FIG. 46 depicts capturing event records in a flash memory for post-mortem distributed debugging. In the present invention, runtime system 4404 collects and transfers the events to primary runtime debugging architecture 4408, which includes time stamper 4410, causality stamper 4412, and a flash driver 4600. After the events have been time-stamped and causality-stamped, flash driver 4600 sends the events to a flash memory 4602 for storage and post-mortem debugging. [0535]
  • As long as a global name space is used, each processing element 4302 can store events separately, and the complete global event trace can be reconstructed later. Unlike connection-based collection, there is a definite limit on the amount of storage space, and it is essential to use it wisely. FIG. 47 shows how flash memory 4602 can be allocated to ensure that the entire context of a system crash can be reconstructed. [0536]
  • 4. Event Recording Call [0537]
  • The runtime system of a processing element records events from the execution, applies timestamps and causality pointers, and forwards the records to a communication channel. Any event made visible to the runtime system, either by being identified as an explicit event in the model or by eventRecord calls at runtime, is a candidate for recording. The eventRecord function can be implemented either to transfer event records to a runtime simulation host or to record them into storage. [0538]
  • To build concurrent event traces, distributed software environment 4300 must be instrumented to produce appropriate event records, and media must be applied to store them. Unlike traditional approaches, instrumentation must include information about how different software components 4402 interact. Instrumented distributed code can change the execution characteristics of distributed software environment 4300 being debugged (recall the probe effect). [0539]
  • To collect event traces, the code itself must be well instrumented. This instrumentation can be automatically applied to the code by the runtime synthesis engine. A simple form of instrumentation involves inserting event recording calls at each significant source line (such as message sent or variable changed) and can include the values sent or the values to which the variable was changed. Over-parameterization of event records can create significant delays in simulated systems and can cause a probe effect in physical systems, since the time for packaging parameters can be significant. For the most part, static instrumentation code is required for activities that do not appear as events to runtime system 4404, since runtime system 4404 can deal with explicit events. The eventRecord method assigns a unique identifier to the event record, assigns all appropriate timestamps, and identifies implicit causality. See Listing 2 below for an example of an event recording call: [0540]
    Listing 2 - Instrumented Event Record
    cl.p2.send (x + 3);
    eventRecord (“cl.p2.send”); // synthesized
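A hedged sketch of what an eventRecord implementation might do, per the description above: assign a unique identifier, a timestamp, and an implicit causality pointer to the previous event on the same process. All names here are our own illustration; the patent does not give this code.

```python
# Illustrative eventRecord: unique id, timestamp, implicit local causality.

import itertools
import time

_ids = itertools.count()
_last_event_on_process = {}

def event_record(process, label):
    prev = _last_event_on_process.get(process)   # implicit causality pointer
    rec = {
        "id": next(_ids),                        # unique identifier
        "process": process,
        "label": label,
        "timestamp": time.monotonic(),           # local timestamp
        "cause": prev,
    }
    _last_event_on_process[process] = rec["id"]
    return rec
```

A real implementation would also stamp vector-clock information so the host can reconstruct cross-process causality, and would either transmit the record or write it to storage.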
  • As with other aspects of the system, a designer does not know in advance exactly how much detail will be needed, and detail requirements may change over time. This means that the principles of selective focus should be applied to event collection, as well. While static instrumentation cannot be altered during system execution, it can be designed to be disabled by the debugging system during execution. [0541]
  • Runtime debugging support in the physical distributed software environment should be as lightweight as possible to avoid incurring any further probe effect. One of the goals in debugging a physical system is to ensure that as little of the debugging support as possible is placed on the systems themselves. [0542]
  • 5. Precomputed Event Sequences [0543]
  • Events often occur in specific, predetermined, partially ordered sequences. Therefore, instead of sending every single event in the sequence, it is possible to send a token representing the sequence and let debugging host 4306 expand it. This is an application of selective focus to event collection. Since the lowest detail levels are completely determined, there is nothing lost when operating at these higher levels of abstraction. [0544]
  • For example, with the RPC protocol, it may be necessary to indicate only the start and finish of a transaction. If a bug causes the transaction to fail, simply knowing the identity of that particular transaction can be a useful starting point for further evaluation in simulation. [0545]
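Host-side token expansion can be sketched in a few lines. The table of sequences and the record format are our own illustration; the point is that a single token stands in for a fully determined event sequence.

```python
# Sketch of precomputed-sequence expansion on the debugging host.

SEQUENCES = {
    # Hypothetical RPC transaction, fully determined at the lowest level.
    "rpc": ["call-sent", "call-received", "reply-sent", "reply-received"],
}

def expand(records):
    out = []
    for rec in records:
        if rec[0] == "token":                 # ("token", name, transaction id)
            _, name, txn = rec
            out.extend((ev, txn) for ev in SEQUENCES[name])
        else:
            out.append(rec)                   # ordinary event record
    return out
```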
  • 6. Execution Shadowing [0546]
  • Since individual traces only include events, the simulator must keep track of all other aspects of an execution. The host side of the debugger tracks system state through state change events. This lets the debugger track decisions made by the distributed system and is called execution shadowing. [0547]
  • 7. Interpolation of Events from Bus Traces [0548]
  • Events that are visible on a system bus can be recorded by observing bus signals. This requires hardware support for bus trace collection. In this case, events of a far more primitive nature must be dealt with (e.g., transitions between states for physical wires and very low-level actions). [0549]
  • FIG. 48 shows an example of low-level behavioral recognition for a simple I2C protocol. At the lowest level are signal transitions. These are translated into sub-low-level events, which are then fed into a language recognizer to generate low-level debugging events. [0550]
  • The main problems with this approach involve bandwidth. Potentially large numbers of events must be analyzed to present a reasonable view of what happens at the lower levels. This can either (1) require the hardware to save a great deal of information and then parse it later for content or (2) cause a great deal of computation overhead at runtime, which slows down the system. [0551]
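The lowest translation layer mirrors the structure of FIG. 48: sampled wire states become sub-events. The sketch below uses standard I2C conditions (START is SDA falling while SCL is high; STOP is SDA rising while SCL is high); the function and data format are our own illustration.

```python
# Sketch of translating bus signal transitions into I2C-style sub-events.

def sub_events(samples):
    """samples: list of (scl, sda) bit pairs sampled over time."""
    events = []
    for (scl0, sda0), (scl1, sda1) in zip(samples, samples[1:]):
        if scl0 == scl1 == 1 and sda0 == 1 and sda1 == 0:
            events.append("START")     # SDA falls while SCL held high
        elif scl0 == scl1 == 1 and sda0 == 0 and sda1 == 1:
            events.append("STOP")      # SDA rises while SCL held high
    return events
```

A language recognizer would then consume these sub-events to emit low-level debugging events, as the figure describes.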
  • It will be obvious to those having skill in the art that many changes may be made to the details of the above-described embodiments of this invention without departing from the underlying principles thereof. The scope of the present invention should, therefore, be determined only by the following claims. [0552]

Claims (11)

1. A software system for debugging a distributed software environment, the software system comprising:
a primary processing element having a software program that generates a corresponding event record in response to a selected event; and
a communication channel that links the primary processing element to a debugging host.
2. A software system according to claim 1 wherein the software program comprises:
at least two components;
a coordinator that manages control and data flow interactions between the components;
an interface between the coordinator and the component, the interface having a port that exposes an event;
a runtime system that collects the event record; and
a primary runtime debugging architecture that receives the event record from the runtime system and forwards the event record to the debugging host along the communication channel.
3. A software system according to claim 2 wherein the primary runtime debugging architecture comprises:
a time stamper to provide a time stamp to the event record generated by the software program; and
a causality stamper to provide an identification of a cause of the event associated with the corresponding event record.
4. A software system according to claim 3 wherein the primary runtime debugging architecture further comprises:
a primary uplink component to enable communication between the primary processing element and the debugging host along the communication channel; and
a primary transfer component coupled to the uplink component to collect and transfer the event record from the primary processing element to the debugging host along the communication channel.
5. A software system according to claim 3 wherein the software system further comprises:
an intermediate processing element disposed along the communication channel between the primary processing element and the debugging host to enable communication with the debugging host; and
the primary runtime debugging architecture further comprises a primary uplink component to enable communication between the primary processing element and the intermediate processing element.
6. A software system according to claim 5 wherein the intermediate processing element comprises:
an intermediate uplink component to enable communication between the intermediate processing element and the debugging host along the communication channel; and
an intermediate transfer component coupled to the intermediate uplink component to collect and transfer the event record from the intermediate processing element to the debugging host along the communication channel.
7. A software system according to claim 3 wherein the primary runtime debugging architecture further comprises a flash driver for interfacing with a flash memory to facilitate collection of the event record from the primary processing element and storage of the event record for subsequent transfer to a remote debugging host.
8. A software system according to claim 1 wherein the distributed software environment implements a predetermined design model having an explicitly defined event; and the software program generates the event record in response to an occurrence of the explicitly defined event.
9. A software system according to claim 1 wherein the distributed software environment comprises an explicit event recording call; and the software program generates the event record in response to an occurrence of the explicit event recording call.
10. A software system according to claim 1 wherein:
the distributed software environment executes on a target hardware platform; and
the target hardware platform comprises a probe for monitoring a selected bus trace on the target hardware platform and for generating an event record responsive to a predetermined activity on the bus trace.
11. A software system according to claim 1 wherein the software program generates a token representative of a predetermined sequence of events.
US09/885,456 2000-06-23 2001-06-19 System and method for debugging distributed software environments Abandoned US20020174415A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/885,456 US20020174415A1 (en) 2000-06-23 2001-06-19 System and method for debugging distributed software environments

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US21349600P 2000-06-23 2000-06-23
US09/885,456 US20020174415A1 (en) 2000-06-23 2001-06-19 System and method for debugging distributed software environments

Publications (1)

Publication Number Publication Date
US20020174415A1 true US20020174415A1 (en) 2002-11-21

Family

ID=22795323

Family Applications (6)

Application Number Title Priority Date Filing Date
US09/881,391 Abandoned US20030005407A1 (en) 2000-06-23 2001-06-12 System and method for coordination-centric design of software systems
US09/885,456 Abandoned US20020174415A1 (en) 2000-06-23 2001-06-19 System and method for debugging distributed software environments
US09/886,459 Abandoned US20020087953A1 (en) 2000-06-23 2001-06-20 Data structure and method for detecting constraint conflicts in coordination-centric software systems
US09/886,479 Expired - Fee Related US7003777B2 (en) 2000-06-23 2001-06-20 Coordination-centric framework for software design in a distributed environment
US09/888,061 Abandoned US20030028858A1 (en) 2000-06-23 2001-06-21 Evolution diagrams for debugging distributed embedded software applications
US09/888,082 Abandoned US20020062463A1 (en) 2000-06-23 2001-06-22 Dynamic control graphs for analysis of coordination-centric software designs

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/881,391 Abandoned US20030005407A1 (en) 2000-06-23 2001-06-12 System and method for coordination-centric design of software systems

Family Applications After (4)

Application Number Title Priority Date Filing Date
US09/886,459 Abandoned US20020087953A1 (en) 2000-06-23 2001-06-20 Data structure and method for detecting constraint conflicts in coordination-centric software systems
US09/886,479 Expired - Fee Related US7003777B2 (en) 2000-06-23 2001-06-20 Coordination-centric framework for software design in a distributed environment
US09/888,061 Abandoned US20030028858A1 (en) 2000-06-23 2001-06-21 Evolution diagrams for debugging distributed embedded software applications
US09/888,082 Abandoned US20020062463A1 (en) 2000-06-23 2001-06-22 Dynamic control graphs for analysis of coordination-centric software designs

Country Status (6)

Country Link
US (6) US20030005407A1 (en)
EP (3) EP1297428A2 (en)
AT (1) ATE305153T1 (en)
AU (4) AU2001271354A1 (en)
DE (1) DE60113538T2 (en)
WO (4) WO2002001349A2 (en)

Cited By (102)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020073401A1 (en) * 2000-12-07 2002-06-13 International Business Machines Corporation Method of detecting zombie breakpoints
US20030004671A1 (en) * 2001-06-28 2003-01-02 Mitsubishi Denki Kabushiki Kaisha Remote debugging apparatus for executing procedure preregistered in database at program breakpoint
US20040049712A1 (en) * 2002-09-11 2004-03-11 Betker Michael Richard Processor system with cache-based software breakpoints
US20040059782A1 (en) * 2002-09-20 2004-03-25 American Megatrends, Inc. Systems and methods for establishing interaction between a local computer and a remote computer
US20040117768A1 (en) * 2002-12-12 2004-06-17 International Business Machines Corporation System and method on generating multi-dimensional trace files and visualizing them using multiple Gantt charts
US20040181781A1 (en) * 2003-02-28 2004-09-16 Denso Corporation Method, apparatus and program for testing control program
US20040190773A1 (en) * 2003-03-31 2004-09-30 American Megatrends, Inc. Method, apparatus, and computer-readable medium for identifying character coordinates
US20040210872A1 (en) * 2003-04-15 2004-10-21 Dorr Robert A. Server debugging framework using scripts
US20040243883A1 (en) * 2003-05-27 2004-12-02 American Megatrends, Inc. Method and system for remote software debugging
US20040255276A1 (en) * 2003-06-16 2004-12-16 Gene Rovang Method and system for remote software testing
US20050046637A1 (en) * 2001-12-10 2005-03-03 American Megatrends, Inc. Systems and methods for capturing screen displays from a host computing system for display at a remote terminal
US20050065994A1 (en) * 2003-09-19 2005-03-24 International Business Machines Corporation Framework for restricting resources consumed by ghost agents
US20050065992A1 (en) * 2003-09-19 2005-03-24 International Business Machines Corporation Restricting resources consumed by ghost agents
US20050065803A1 (en) * 2003-09-19 2005-03-24 International Business Machines Corporation Using ghost agents in an environment supported by customer service providers
US20050097537A1 (en) * 2003-10-30 2005-05-05 Laura Joseph G. System and method for distributed processing in COBOL
US20050288916A1 (en) * 2004-06-28 2005-12-29 Graniteedge Networks Determining event causality including employment of partitioned event space
US20050288915A1 (en) * 2004-06-28 2005-12-29 Graniteedge Networks Determining event causality including employment of causal chains
US20060195821A1 (en) * 2005-02-25 2006-08-31 Niels Vanspauwen Interface converter for unified view of multiple computer system simulations
US20060195825A1 (en) * 2005-02-25 2006-08-31 Niels Vanspauwen Method and system for dynamically adjusting speed versus accuracy of computer platform simulation
WO2006099446A2 (en) * 2005-03-11 2006-09-21 Argade Pramod V Environment for controlling the execution of computer programs
US7133820B2 (en) * 2000-03-15 2006-11-07 Arc International Method and apparatus for debugging programs in a distributed environment
US20070032986A1 (en) * 2005-08-05 2007-02-08 Graniteedge Networks Efficient filtered causal graph edge detection in a causal wavefront environment
US20070067754A1 (en) * 2005-09-20 2007-03-22 Microsoft Corporation Server application state
US20070156882A1 (en) * 2005-06-09 2007-07-05 Whirlpool Corporation Data acquisition engine and system for an appliance
US20070162158A1 (en) * 2005-06-09 2007-07-12 Whirlpool Corporation Software architecture system and method for operating an appliance utilizing configurable notification messages
US7246056B1 (en) * 2003-09-26 2007-07-17 The Mathworks, Inc. Runtime parameter mapping for system simulation
US20070168975A1 (en) * 2005-12-13 2007-07-19 Thomas Kessler Debugger and test tool
US20070169055A1 (en) * 2005-12-12 2007-07-19 Bernd Greifeneder Method and system for automated analysis of the performance of remote method invocations in multi-tier applications using bytecode instrumentation
US20070192079A1 (en) * 2006-02-16 2007-08-16 Karl Van Rompaey Run-time switching for simulation with dynamic run-time accuracy adjustment
US20070240173A1 (en) * 2005-06-09 2007-10-11 Whirlpool Corporation Data acquisition engine and system for an appliance
US20070294051A1 (en) * 2006-06-15 2007-12-20 Microsoft Corporation Declaration and Consumption of A Causality Model for Probable Cause Analysis
US20080010564A1 (en) * 2006-06-19 2008-01-10 Microsoft Corporation Failure handling and debugging with causalities
US20080120605A1 (en) * 2006-10-31 2008-05-22 Microsoft Corporation Stepping and application state viewing between points
US20080126974A1 (en) * 2006-11-28 2008-05-29 Fawcett Bradley W Presenting completion progress status of an installer via join points
US20080137670A1 (en) * 2005-06-09 2008-06-12 Whirlpool Corporation Network System with Message Binding for Appliances
US20080177525A1 (en) * 2007-01-23 2008-07-24 Microsoft Corporation Integrated debugger simulator
US20080178195A1 (en) * 2007-01-23 2008-07-24 Microsoft Corporation Transparently capturing the causal relationships between requests across distributed applications
US7426717B1 (en) * 2001-11-27 2008-09-16 Adobe Systems Incorporated System and method for debugging files in a runtime environment
US20080244539A1 (en) * 2007-03-30 2008-10-02 Microsoft Corporation Enabling analysis of software source code
US20080276227A1 (en) * 2007-05-06 2008-11-06 Bernd Greifeneder Method and System for Adaptive, Generic Code Instrumentation using Run-time or Load-time generated Inheritance Information for Diagnosis and Monitoring Application Performance and Failure
US20080301251A1 (en) * 2007-05-31 2008-12-04 Mark Cameron Little Debugging in a distributed system
US20080301707A1 (en) * 2007-05-31 2008-12-04 Mark Cameron Little Rules engine for a persistent message store
US20080301286A1 (en) * 2007-05-31 2008-12-04 Mark Cameron Little Persistent message store
US20090006064A1 (en) * 2007-06-29 2009-01-01 Microsoft Corporation Replaying Distributed Systems
US7487241B2 (en) 2005-08-05 2009-02-03 Vantos, Inc. Performing efficient insertions in wavefront table based causal graphs
US7519749B1 (en) 2004-08-25 2009-04-14 American Megatrends, Inc. Redirecting input and output for multiple computers
US20090138858A1 (en) * 2007-11-27 2009-05-28 Microsoft Corporation Data Driven Profiling for Distributed Applications
US7543277B1 (en) 2003-06-27 2009-06-02 American Megatrends, Inc. Method and system for remote software debugging
US20090319993A1 (en) * 2008-06-24 2009-12-24 Microsoft Corporation, Generalized and extensible software architecture representation
US7653881B2 (en) 2006-06-19 2010-01-26 Microsoft Corporation Failure handling and debugging with causalities
US7702909B2 (en) * 2003-12-22 2010-04-20 Klimenty Vainstein Method and system for validating timestamps
US7707459B2 (en) 2007-03-08 2010-04-27 Whirlpool Corporation Embedded systems debugging
US7747742B2 (en) 2008-06-27 2010-06-29 Microsoft Corporation Online predicate checking for distributed systems
US7783799B1 (en) 2006-08-31 2010-08-24 American Megatrends, Inc. Remotely controllable switch and testing methods using same
US7827258B1 (en) 2004-03-01 2010-11-02 American Megatrends, Inc. Method, system, and apparatus for communicating with a computer management device
US20110066894A1 (en) * 2009-09-14 2011-03-17 Myspace, Inc. Debugging a map reduce application on a cluster
US7913311B2 (en) 2001-12-12 2011-03-22 Rossmann Alain Methods and systems for providing access control to electronic data
US7921288B1 (en) 2001-12-12 2011-04-05 Hildebrand Hal S System and method for providing different levels of key security for controlling access to secured items
US7921412B1 (en) * 2003-11-26 2011-04-05 Sprint Communications Company L.P. Application monitor system and method
US7921284B1 (en) 2001-12-12 2011-04-05 Gary Mark Kinghorn Method and system for protecting electronic data in enterprise environment
US7921450B1 (en) 2001-12-12 2011-04-05 Klimenty Vainstein Security system using indirect key generation from access rules and methods therefor
US7930756B1 (en) 2001-12-12 2011-04-19 Crocker Steven Toye Multi-level cryptographic transformations for securing digital assets
US7933759B2 (en) 2008-03-28 2011-04-26 Microsoft Corporation Predicate checking for distributed systems
US7950066B1 (en) 2001-12-21 2011-05-24 Guardian Data Storage, Llc Method and system for restricting use of a clipboard application
US20110131450A1 (en) * 2009-11-30 2011-06-02 Microsoft Corporation Using synchronized event types for testing an application
US8006280B1 (en) 2001-12-12 2011-08-23 Hildebrand Hal S Security system for generating keys from access rules in a decentralized manner and methods therefor
US8010843B2 (en) 2005-12-14 2011-08-30 American Megatrends, Inc. System and method for debugging a target computer using SMBus
US20110252404A1 (en) * 2009-08-03 2011-10-13 Knu-Industry Cooperation Foundation Web-based software debugging apparatus and method for remote debugging
US8040234B2 (en) 2005-06-09 2011-10-18 Whirlpool Corporation Method and apparatus for remote service of an appliance
US8090564B1 (en) 2003-11-03 2012-01-03 Synopsys, Inc. Automatic generation of transaction level bus simulation instructions from bus protocol
US8127366B2 (en) 2003-09-30 2012-02-28 Guardian Data Storage, Llc Method and apparatus for transitioning between states of security policies used to secure electronic documents
US8176334B2 (en) 2002-09-30 2012-05-08 Guardian Data Storage, Llc Document security system that permits external users to gain access to secured files
US8266674B2 (en) 2001-12-12 2012-09-11 Guardian Data Storage, Llc Method and system for implementing changes to security policies in a distributed security system
US20120253857A1 (en) * 2011-03-28 2012-10-04 Infosys Technologies Limited Structured methods for business process unification
US8327138B2 (en) 2003-09-30 2012-12-04 Guardian Data Storage Llc Method and system for securing digital assets using process-driven security policies
US20120317550A1 (en) * 2004-07-23 2012-12-13 Green Hills Software, Inc Forward post-execution software debugger
USRE43906E1 (en) 2001-12-12 2013-01-01 Guardian Data Storage Llc Method and apparatus for securing digital assets
US20130030568A1 (en) * 2010-04-23 2013-01-31 Samsung Heavy Ind. Co., Ltd. Robot system control method and a device therefor
US8533687B1 (en) 2009-11-30 2013-09-10 dynaTrade Software GmbH Methods and system for global real-time transaction tracing
US8543827B2 (en) 2001-12-12 2013-09-24 Intellectual Ventures I Llc Methods and systems for providing access control to secured data
US8543367B1 (en) 2006-02-16 2013-09-24 Synopsys, Inc. Simulation with dynamic run-time accuracy adjustment
US20130339931A1 (en) * 2012-06-19 2013-12-19 Sap Ag Application trace replay and simulation systems and methods
US20140040897A1 (en) * 2012-08-04 2014-02-06 Microsoft Corporation Function Evaluation using Lightweight Process Snapshots
US8707034B1 (en) 2003-05-30 2014-04-22 Intellectual Ventures I Llc Method and system for using remote headers to secure electronic files
US8918839B2 (en) 2001-12-12 2014-12-23 Intellectual Ventures I Llc System and method for providing multi-location access management to secured items
US8924939B2 (en) 2012-05-09 2014-12-30 International Business Machines Corporation Streams debugging within a windowing condition
US20150026687A1 (en) * 2013-07-18 2015-01-22 International Business Machines Corporation Monitoring system noises in parallel computer systems
US9047412B2 (en) 2007-05-06 2015-06-02 Dynatrace Corporation System and method for extracting instrumentation relevant inheritance relationships for a distributed, inheritance rule based instrumentation system
US9146829B1 (en) * 2013-01-03 2015-09-29 Amazon Technologies, Inc. Analysis and verification of distributed applications
US9231858B1 (en) 2006-08-11 2016-01-05 Dynatrace Software Gmbh Completeness detection of monitored globally distributed synchronous and asynchronous transactions
US9235384B2 (en) 2013-09-20 2016-01-12 Axure Software Solutions, Inc. Language notification generator
US9274919B2 (en) 2011-04-29 2016-03-01 Dynatrace Software Gmbh Transaction tracing mechanism of distributed heterogenous transactions having instrumented byte code with constant memory consumption and independent of instrumented method call depth
US9448820B1 (en) 2013-01-03 2016-09-20 Amazon Technologies, Inc. Constraint verification for distributed applications
US9727394B2 (en) 2015-04-27 2017-08-08 Microsoft Technology Licensing, Llc Establishing causality order of computer trace records
US9779012B1 (en) * 2016-02-26 2017-10-03 Mbit Wireless, Inc. Dynamic and global in-system debugger
US9804945B1 (en) 2013-01-03 2017-10-31 Amazon Technologies, Inc. Determinism for distributed applications
US10033700B2 (en) 2001-12-12 2018-07-24 Intellectual Ventures I Llc Dynamic evaluation of access rights
US10169171B2 (en) 2013-05-13 2019-01-01 Nxp Usa, Inc. Method and apparatus for enabling temporal alignment of debug information
US10268568B2 (en) * 2016-03-29 2019-04-23 Infosys Limited System and method for data element tracing
US10360545B2 (en) 2001-12-12 2019-07-23 Guardian Data Storage, Llc Method and apparatus for accessing secured electronic data off-line
US10416974B2 (en) * 2017-10-06 2019-09-17 Chicago Mercantile Exchange Inc. Dynamic tracer message logging based on bottleneck detection
US10430321B1 (en) * 2018-08-21 2019-10-01 International Business Machines Corporation White box code concurrency testing for transaction processing

Families Citing this family (167)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6378066B1 (en) * 1999-02-04 2002-04-23 Sun Microsystems, Inc. Method, apparatus, and article of manufacture for developing and executing data flow programs, and optimizing user input specifications
US20030121027A1 (en) * 2000-06-23 2003-06-26 Hines Kenneth J. Behavioral abstractions for debugging coordination-centric software designs
US7062771B2 (en) * 2000-07-03 2006-06-13 Oculus Technologies Corporation Method and apparatus for generating a decentralized model on a computer network
US7577554B2 (en) * 2001-07-03 2009-08-18 I2 Technologies Us, Inc. Workflow modeling using an acyclic directed graph data structure
US7210145B2 (en) * 2001-10-15 2007-04-24 Edss, Inc. Technology for integrated computation and communication; TICC
FI113709B (en) * 2001-12-10 2004-05-31 Nokia Corp A method for providing remote device functionality in an embedded environment
US7693522B2 (en) * 2001-12-19 2010-04-06 Thomson Licensing Method and apparatus for handing off a mobile terminal between a mobile network and a wireless LAN
US7281241B2 (en) * 2002-01-15 2007-10-09 Cadence Design (Israel) Ii Ltd. System and method for visual debugging of constraint systems
JP2003241807A (en) * 2002-02-19 2003-08-29 Yaskawa Electric Corp Robot control unit
EP1349111A1 (en) * 2002-03-27 2003-10-01 Hewlett-Packard Company Improvements in or relating to software
US7167861B2 (en) * 2002-06-28 2007-01-23 Nokia Corporation Mobile application service container
WO2004038620A1 (en) * 2002-10-28 2004-05-06 Renesas Technology Corp. System development method and data processing system
US7542471B2 (en) 2002-10-30 2009-06-02 Citrix Systems, Inc. Method of determining path maximum transmission unit
US7616638B2 (en) 2003-07-29 2009-11-10 Orbital Data Corporation Wavefront detection and disambiguation of acknowledgments
US7630305B2 (en) * 2003-07-29 2009-12-08 Orbital Data Corporation TCP selective acknowledgements for communicating delivered and missed data packets
US8233392B2 (en) * 2003-07-29 2012-07-31 Citrix Systems, Inc. Transaction boundary detection for reduction in timeout penalties
US8270423B2 (en) 2003-07-29 2012-09-18 Citrix Systems, Inc. Systems and methods of using packet boundaries for reduction in timeout prevention
US20040177139A1 (en) * 2003-03-03 2004-09-09 Schuba Christoph L. Method and apparatus for computing priorities between conflicting rules for network services
CA2432866A1 (en) * 2003-06-20 2004-12-20 Ibm Canada Limited - Ibm Canada Limitee Debugging optimized flows
US8437284B2 (en) * 2003-07-29 2013-05-07 Citrix Systems, Inc. Systems and methods for additional retransmissions of dropped packets
US7656799B2 (en) * 2003-07-29 2010-02-02 Citrix Systems, Inc. Flow control system architecture
US8432800B2 (en) * 2003-07-29 2013-04-30 Citrix Systems, Inc. Systems and methods for stochastic-based quality of service
US7698453B2 (en) 2003-07-29 2010-04-13 Orbital Data Corporation Early generation of acknowledgements for flow control
US8238241B2 (en) * 2003-07-29 2012-08-07 Citrix Systems, Inc. Automatic detection and window virtualization for flow control
US7512912B1 (en) * 2003-08-16 2009-03-31 Synopsys, Inc. Method and apparatus for solving constraints for word-level networks
JP2007528059A (en) * 2004-01-22 2007-10-04 エヌイーシー ラボラトリーズ アメリカ インク Systems and methods for software modeling, abstraction, and analysis
US20050223288A1 (en) * 2004-02-12 2005-10-06 Lockheed Martin Corporation Diagnostic fault detection and isolation
US7801702B2 (en) * 2004-02-12 2010-09-21 Lockheed Martin Corporation Enhanced diagnostic fault detection and isolation
US20050240555A1 (en) * 2004-02-12 2005-10-27 Lockheed Martin Corporation Interactive electronic technical manual system integrated with the system under test
US7584420B2 (en) * 2004-02-12 2009-09-01 Lockheed Martin Corporation Graphical authoring and editing of mark-up language sequences
US7594227B2 (en) * 2004-03-08 2009-09-22 Ab Initio Technology Llc Dependency graph parameter scoping
US20050257219A1 (en) * 2004-04-23 2005-11-17 Holt John M Multiple computer architecture with replicated memory fields
US7844665B2 (en) * 2004-04-23 2010-11-30 Waratek Pty Ltd. Modified computer architecture having coordinated deletion of corresponding replicated memory locations among plural computers
US20050262513A1 (en) * 2004-04-23 2005-11-24 Waratek Pty Limited Modified computer architecture with initialization of objects
US7707179B2 (en) * 2004-04-23 2010-04-27 Waratek Pty Limited Multiple computer architecture with synchronization
US20060095483A1 (en) * 2004-04-23 2006-05-04 Waratek Pty Limited Modified computer architecture with finalization of objects
US7849452B2 (en) * 2004-04-23 2010-12-07 Waratek Pty Ltd. Modification of computer applications at load time for distributed execution
US7657873B2 (en) * 2004-04-29 2010-02-02 Microsoft Corporation Visualizer system and methods for debug environment
US7509618B1 (en) * 2004-05-12 2009-03-24 Altera Corporation Method and apparatus for facilitating an adaptive electronic design automation tool
US7516052B2 (en) * 2004-05-27 2009-04-07 Robert Allen Hatcherson Container-based architecture for simulation of entities in a time domain
AU2005269383A1 (en) * 2004-07-28 2006-02-09 Sd Pharmaceuticals, Inc. Stable injectable composition of alpha tocopheryl succinate, analogues and salts thereof
US7970639B2 (en) * 2004-08-20 2011-06-28 Mark A Vucina Project management systems and methods
US7487501B2 (en) * 2004-08-30 2009-02-03 International Business Machines Corporation Distributed counter and centralized sensor in barrier wait synchronization
US20060120181A1 (en) * 2004-10-05 2006-06-08 Lockheed Martin Corp. Fault detection and isolation with analysis of built-in-test results
US20060085692A1 (en) * 2004-10-06 2006-04-20 Lockheed Martin Corp. Bus fault detection and isolation
US8555286B2 (en) * 2004-10-27 2013-10-08 International Business Machines Corporation Method, system, and apparatus for establishing a software configurable computing environment
WO2006050483A2 (en) * 2004-11-02 2006-05-11 Furtek Automatically deriving logical, arithmetic and timing dependencies
KR100582389B1 (en) * 2004-11-08 2006-05-23 주식회사 팬택앤큐리텔 Wireless Communication Terminal suspending the interrupt at paying using RF mode and its method
US8181182B1 (en) * 2004-11-16 2012-05-15 Oracle America, Inc. Resource allocation brokering in nested containers
US20080052281A1 (en) * 2006-08-23 2008-02-28 Lockheed Martin Corporation Database insertion and retrieval system and method
US8271448B2 (en) * 2005-01-28 2012-09-18 Oracle International Corporation Method for strategizing protocol presumptions in two phase commit coordinator
US20060265704A1 (en) * 2005-04-21 2006-11-23 Holt John M Computer architecture and method of operation for multi-computer distributed processing with synchronization
US7900193B1 (en) * 2005-05-25 2011-03-01 Parasoft Corporation System and method for detecting defects in a computer program using data and control flow analysis
US7454738B2 (en) * 2005-06-10 2008-11-18 Purdue Research Foundation Synthesis approach for active leakage power reduction using dynamic supply gating
US7427025B2 (en) * 2005-07-08 2008-09-23 Lockheed Martin Corp. Automated postal voting system and method
US7693690B2 (en) * 2005-08-09 2010-04-06 Nec Laboratories America, Inc. Disjunctive image computation for sequential systems
US7958322B2 (en) * 2005-10-25 2011-06-07 Waratek Pty Ltd Multiple machine architecture with overhead reduction
US7660960B2 (en) 2005-10-25 2010-02-09 Waratek Pty, Ltd. Modified machine architecture with partial memory updating
US20070100828A1 (en) * 2005-10-25 2007-05-03 Holt John M Modified machine architecture with machine redundancy
US7849369B2 (en) * 2005-10-25 2010-12-07 Waratek Pty Ltd. Failure resistant multiple computer system and method
US7761670B2 (en) * 2005-10-25 2010-07-20 Waratek Pty Limited Modified machine architecture with advanced synchronization
US8015236B2 (en) * 2005-10-25 2011-09-06 Waratek Pty. Ltd. Replication of objects having non-primitive fields, especially addresses
WO2007065146A2 (en) * 2005-12-02 2007-06-07 Citrix Systems, Inc. Method and apparatus for providing authentication credentials from a proxy server to a virtualized computing environment to access a remote resource
US7924884B2 (en) 2005-12-20 2011-04-12 Citrix Systems, Inc. Performance logging using relative differentials and skip recording
US8448137B2 (en) * 2005-12-30 2013-05-21 Sap Ag Software model integration scenarios
US7849446B2 (en) * 2006-06-09 2010-12-07 Oracle America, Inc. Replay debugging
US8365200B1 (en) 2006-06-30 2013-01-29 Sap Ag Using cancellation status models in a computer system
US8200715B1 (en) 2006-06-30 2012-06-12 Sap Ag Using status models with adaptable process steps in a computer system
US8706776B1 (en) 2006-06-30 2014-04-22 Sap Ag Extending status models in a computer system
US8522261B2 (en) * 2006-06-30 2013-08-27 Sap Ag Using status models with state guards in a computer system
US8694684B2 (en) 2006-08-21 2014-04-08 Citrix Systems, Inc. Systems and methods of symmetric transport control protocol compression
US8949790B2 (en) * 2006-08-30 2015-02-03 International Business Machines Corporation Debugging visual and embedded programs
AU2007299571B2 (en) * 2006-09-20 2013-09-12 National Ict Australia Limited Generating a transition system for use with model checking
WO2008040075A1 (en) * 2006-10-05 2008-04-10 Waratek Pty Limited Contention detection with modified message format
US20080120478A1 (en) * 2006-10-05 2008-05-22 Holt John M Advanced synchronization and contention resolution
US20080133859A1 (en) * 2006-10-05 2008-06-05 Holt John M Advanced synchronization and contention resolution
WO2008040066A1 (en) * 2006-10-05 2008-04-10 Waratek Pty Limited Redundant multiple computer architecture
WO2008040085A1 (en) * 2006-10-05 2008-04-10 Waratek Pty Limited Network protocol for network communications
US20080126703A1 (en) * 2006-10-05 2008-05-29 Holt John M Cyclic redundant multiple computer architecture
WO2008040064A1 (en) * 2006-10-05 2008-04-10 Waratek Pty Limited Switch protocol for network communications
US20080130652A1 (en) * 2006-10-05 2008-06-05 Holt John M Multiple communication networks for multiple computers
US20080134189A1 (en) * 2006-10-05 2008-06-05 Holt John M Job scheduling amongst multiple computers
US20080126502A1 (en) * 2006-10-05 2008-05-29 Holt John M Multiple computer system with dual mode redundancy architecture
JP5318768B2 (en) * 2006-10-05 2013-10-16 ワラテック プロプライエタリー リミテッド Advanced conflict detection
US20080133884A1 (en) * 2006-10-05 2008-06-05 Holt John M Multiple network connections for multiple computers
US20080133861A1 (en) * 2006-10-05 2008-06-05 Holt John M Silent memory reclamation
US7849151B2 (en) * 2006-10-05 2010-12-07 Waratek Pty Ltd. Contention detection
WO2008040063A1 (en) * 2006-10-05 2008-04-10 Waratek Pty Limited Multi-path switching networks
WO2008040070A1 (en) * 2006-10-05 2008-04-10 Waratek Pty Limited Asynchronous data transmission
WO2008040065A1 (en) * 2006-10-05 2008-04-10 Waratek Pty Limited Contention detection and resolution
US20080126322A1 (en) * 2006-10-05 2008-05-29 Holt John M Synchronization with partial memory replication
US7958329B2 (en) * 2006-10-05 2011-06-07 Waratek Pty Ltd Hybrid replicated shared memory
WO2008040083A1 (en) * 2006-10-05 2008-04-10 Waratek Pty Limited Adding one or more computers to a multiple computer system
US20080133692A1 (en) * 2006-10-05 2008-06-05 Holt John M Multiple computer system with redundancy architecture
US20080140973A1 (en) * 2006-10-05 2008-06-12 Holt John M Contention detection with data consolidation
US20080250221A1 (en) * 2006-10-09 2008-10-09 Holt John M Contention detection with data consolidation
US8296737B2 (en) * 2006-11-03 2012-10-23 International Business Machines Corporation Computer program for tracing impact of errors in software applications
US8191052B2 (en) 2006-12-01 2012-05-29 Murex S.A.S. Producer graph oriented programming and execution
US8307337B2 (en) 2006-12-01 2012-11-06 Murex S.A.S. Parallelization and instrumentation in a producer graph oriented programming framework
US8332827B2 (en) 2006-12-01 2012-12-11 Murex S.A.S. Producer graph oriented programming framework with scenario support
US7865872B2 (en) 2006-12-01 2011-01-04 Murex S.A.S. Producer graph oriented programming framework with undo, redo, and abort execution support
US8219650B2 (en) * 2006-12-28 2012-07-10 Sap Ag Communicating with a status management component in a computer system
US20080209405A1 (en) * 2007-02-28 2008-08-28 Microsoft Corporation Distributed debugging for a visual programming language
US7890518B2 (en) * 2007-03-29 2011-02-15 Franz Inc. Method for creating a scalable graph database
US8244772B2 (en) * 2007-03-29 2012-08-14 Franz, Inc. Method for creating a scalable graph database using coordinate data elements
US8316190B2 (en) * 2007-04-06 2012-11-20 Waratek Pty. Ltd. Computer architecture and method of operation for multi-computer distributed processing having redundant array of independent systems with replicated memory and code striping
US7721158B2 (en) * 2007-06-04 2010-05-18 Microsoft Corporation Customization conflict detection and resolution
US8276124B2 (en) * 2007-06-20 2012-09-25 Microsoft Corporation Constructing petri nets from traces for diagnostics
US8533678B2 (en) * 2007-07-13 2013-09-10 Digi International Inc. Embedded device program debug control
US20090064092A1 (en) * 2007-08-29 2009-03-05 Microsoft Corporation Visual programming language optimization
KR101473337B1 (en) * 2007-10-01 2014-12-16 삼성전자 주식회사 Method and Appartus for providing interface compatibility based on a component model
FR2927438B1 (en) * 2008-02-08 2010-03-05 Commissariat Energie Atomique METHOD FOR PRELOADING, IN A MEMORY HIERARCHY, CONFIGURATIONS OF A RECONFIGURABLE HETEROGENEOUS INFORMATION PROCESSING SYSTEM
US8504980B1 (en) 2008-04-14 2013-08-06 Sap Ag Constraining data changes during transaction processing by a computer system
JP5389902B2 (en) 2008-04-28 2014-01-15 セールスフォース ドット コム インコーポレイティッド An object-oriented system for creating and managing websites and their content
US8473085B2 (en) * 2008-04-30 2013-06-25 Perkinelmer Las, Inc. Mutex-mediated control of spatial access by appliances moveable over a common physical space
JP5195149B2 (en) * 2008-08-11 2013-05-08 富士通株式会社 Authenticity judgment method
US8307345B2 (en) * 2008-11-04 2012-11-06 Ca, Inc. Intelligent engine for dynamic and rule based instrumentation of software
US9703678B2 (en) * 2008-12-23 2017-07-11 Microsoft Technology Licensing, Llc Debugging pipeline for debugging code
US20100235809A1 (en) * 2009-03-12 2010-09-16 Honeywell International Inc. System and method for managing a model-based design lifecycle
US8607189B2 (en) * 2009-05-18 2013-12-10 National Instruments Corporation Dynamic analysis of a graphical program in a browser
US8863088B2 (en) * 2010-02-08 2014-10-14 Red Hat, Inc. Simulating a line of source code in a debugging tool
US20110228696A1 (en) * 2010-03-19 2011-09-22 Navneet Agarwal Dynamic directed acyclic graph (dag) topology reporting
US9111031B2 (en) * 2010-04-16 2015-08-18 Salesforce.Com, Inc. Method and system for simulating and analyzing code execution in an on-demand service environment
US8443342B2 (en) 2010-06-01 2013-05-14 Microsoft Corporation Static analysis using interactive and integration tools
US9223892B2 (en) 2010-09-30 2015-12-29 Salesforce.Com, Inc. Device abstraction for page generation
KR101649925B1 (en) * 2010-10-13 2016-08-31 삼성전자주식회사 Analysis for Single Thread Access of variable in Multi-threaded program
US8935360B2 (en) 2010-12-03 2015-01-13 Salesforce.Com, Inc. Techniques for metadata-driven dynamic content serving
US8296708B1 (en) * 2011-05-24 2012-10-23 Springsoft Inc. Method of constraint-hierarchy-driven IC placement
AT511334B1 (en) * 2011-07-14 2012-11-15 Fronius Int Gmbh WELDING CURRENT SOURCE AND METHOD FOR CONTROLLING THEREOF
US9805094B2 (en) 2011-11-04 2017-10-31 Ipc Systems, Inc. User interface displaying filtered information
EP2610746A1 (en) * 2011-12-30 2013-07-03 bioMérieux Job scheduler for electromechanical system for biological analysis
US9251039B2 (en) 2012-02-17 2016-02-02 Microsoft Technology Licensing, Llc Remote debugging as a service
US8996472B2 (en) 2012-04-16 2015-03-31 Sap Se Verification of status schemas based on business goal definitions
US8996473B2 (en) 2012-08-06 2015-03-31 Sap Se Checking compatibility of extended and core SAM schemas based on complex goals
US10223450B1 (en) * 2013-03-14 2019-03-05 Google Llc Data delivery
US10289406B2 (en) * 2013-04-30 2019-05-14 Entit Software Llc Dependencies between feature flags
US10417594B2 (en) 2013-05-02 2019-09-17 Sap Se Validation of functional correctness of SAM schemas including action chains
US10339229B1 (en) 2013-05-31 2019-07-02 Cadence Design Systems, Inc. Simulation observability and control of all hardware and software components of a virtual platform model of an electronics system
FR3011955B1 (en) * 2013-10-10 2015-10-30 Bull Sas METHOD FOR DEPLOYING AN APPLICATION, CORRESPONDING COMPUTER PROGRAM, SYSTEM FOR DEPLOYING AN APPLICATION, AND INSTALLATION COMPRISING THE DEPLOYMENT SYSTEM
US9098377B1 (en) 2014-05-30 2015-08-04 Semmle Limited Aggregating source code metric values
US10505826B2 (en) * 2014-09-26 2019-12-10 Oracle International Corporation Statistical pattern correlation of events in cloud deployments using codebook approach
US9417985B2 (en) * 2014-11-14 2016-08-16 Semmle Limited Distributed analysis and attribution of source code
US9785777B2 (en) * 2014-12-19 2017-10-10 International Business Machines Corporation Static analysis based on abstract program representations
US11487561B1 (en) 2014-12-24 2022-11-01 Cadence Design Systems, Inc. Post simulation debug and analysis using a system memory model
US10460047B1 (en) * 2015-02-27 2019-10-29 The Mathworks, Inc. Tentative model components
US10834065B1 (en) 2015-03-31 2020-11-10 F5 Networks, Inc. Methods for SSL protected NTLM re-authentication and devices thereof
EP3311265B1 (en) * 2015-06-18 2020-06-17 The Joan & Irwin Jacobs Technion-Cornell Innovation Institute A computing platform and method thereof for searching, executing, and evaluating computational algorithms
US10755590B2 (en) * 2015-06-18 2020-08-25 The Joan and Irwin Jacobs Technion-Cornell Institute Method and system for automatically providing graphical user interfaces for computational algorithms described in printed publications
US10802852B1 (en) * 2015-07-07 2020-10-13 Cadence Design Systems, Inc. Method for interactive embedded software debugging through the control of simulation tracing components
US9720652B2 (en) 2015-08-06 2017-08-01 Symphore, LLC Generating a software complex using superordinate design input
US9547478B1 (en) 2015-09-30 2017-01-17 Semmle Limited Hierarchical dependency analysis enhancements using disjoint-or trees
US9672135B2 (en) * 2015-11-03 2017-06-06 Red Hat, Inc. System, method and apparatus for debugging of reactive applications
EP3402106B1 (en) * 2016-01-04 2019-12-04 Zhejiang Libiao Robots Co., Ltd. Method and system for synchronization between robot and server
US10404698B1 (en) 2016-01-15 2019-09-03 F5 Networks, Inc. Methods for adaptive organization of web application access points in webtops and devices thereof
US11663110B2 (en) * 2016-10-31 2023-05-30 International Business Machines Corporation Analysis to check web API code usage and specification
US11086755B2 (en) * 2017-06-26 2021-08-10 Jpmorgan Chase Bank, N.A. System and method for implementing an application monitoring tool
WO2019099008A1 (en) 2017-11-16 2019-05-23 Hewlett-Packard Development Company, L.P. Software builds using a cloud system
US10649884B2 (en) 2018-02-08 2020-05-12 The Mitre Corporation Methods and system for constrained replay debugging with message communications
US10564940B2 (en) * 2018-05-03 2020-02-18 International Business Machines Corporation Systems and methods for programming drones
CN109508260B (en) * 2018-10-31 2021-11-12 西北工业大学 Reliability modeling and analyzing method for self-repairing processor to lockstep system
US11556374B2 (en) 2019-02-15 2023-01-17 International Business Machines Corporation Compiler-optimized context switching with compiler-inserted data table for in-use register identification at a preferred preemption point
US11514019B1 (en) 2019-12-30 2022-11-29 Cigna Intellectual Property, Inc. Systems and methods for maintaining and updating an event logging database
US11204767B2 (en) 2020-01-06 2021-12-21 International Business Machines Corporation Context switching locations for compiler-assisted context switching
US11762858B2 (en) 2020-03-19 2023-09-19 The Mitre Corporation Systems and methods for analyzing distributed system data streams using declarative specification, detection, and evaluation of happened-before relationships
US11681603B2 (en) 2021-03-31 2023-06-20 International Business Machines Corporation Computer generation of illustrative resolutions for reported operational issues
US20220334836A1 (en) * 2021-04-15 2022-10-20 Dell Products L.P. Sharing of computing resources between computing processes of an information handling system
US11843663B1 (en) * 2023-01-03 2023-12-12 Huawei Cloud Computing Technologies Co., Ltd. Vector-scalar logical clock and associated method, apparatus and system

Citations (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4965743A (en) * 1988-07-14 1990-10-23 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Discrete event simulation tool for analysis of qualitative models of continuous processing system
US5095421A (en) * 1989-08-17 1992-03-10 International Business Machines Corporation Transaction processing facility within an operating system environment
US5428782A (en) * 1989-09-28 1995-06-27 Texas Instruments Incorporated Portable and dynamic distributed applications architecture
US5485617A (en) * 1993-12-13 1996-01-16 Microsoft Corporation Method and system for dynamically generating object connections
US5551035A (en) * 1989-06-30 1996-08-27 Lucent Technologies Inc. Method and apparatus for inter-object communication in an object-oriented program controlled system
US5581691A (en) * 1992-02-04 1996-12-03 Digital Equipment Corporation Work flow management system and method
US5596750A (en) * 1992-06-09 1997-01-21 Bull S.A. System for transactional processing between an information processing server and a plurality of workstations
US5642478A (en) * 1994-12-29 1997-06-24 International Business Machines Corporation Distributed trace data acquisition system
US5694539A (en) * 1994-08-10 1997-12-02 Intrinsa Corporation Computer process resource modelling method and apparatus
US5724508A (en) * 1995-03-09 1998-03-03 Insoft, Inc. Apparatus for collaborative computing
US5737607A (en) * 1995-09-28 1998-04-07 Sun Microsystems, Inc. Method and apparatus for allowing generic stubs to marshal and unmarshal data in object reference specific data formats
US5790778A (en) * 1996-08-07 1998-08-04 Intrinsa Corporation Simulated program execution error detection method and apparatus
US5794046A (en) * 1994-09-29 1998-08-11 International Business Machines Corporation Method and system for debugging parallel and distributed applications
US5819270A (en) * 1993-02-25 1998-10-06 Massachusetts Institute Of Technology Computer system for displaying representations of processes
US5870588A (en) * 1995-10-23 1999-02-09 Interuniversitair Micro-Elektronica Centrum (Imec Vzw) Design environment and a design method for hardware/software co-design
US5920717A (en) * 1995-12-20 1999-07-06 Nec Corporation Method and apparatus for automated program-generation
US5933639A (en) * 1996-05-17 1999-08-03 International Business Machines Corporation System and method for debugging distributed programs
US5941945A (en) * 1997-06-18 1999-08-24 International Business Machines Corporation Interest-based collaborative framework
US5949998A (en) * 1996-07-03 1999-09-07 Sun Microsystems, Inc. Filtering an object interface definition to determine services needed and provided
US5980096A (en) * 1995-01-17 1999-11-09 Intertech Ventures, Ltd. Computer-based system, methods and graphical interface for information storage, modeling and stimulation of complex systems
US5999728A (en) * 1996-07-30 1999-12-07 Sun Microsystems, Inc. Method and apparatus for enhancing the portability of an object oriented interface among multiple platforms
US6003037A (en) * 1995-11-14 1999-12-14 Progress Software Corporation Smart objects for development of object oriented software
US6038381A (en) * 1997-11-25 2000-03-14 Synopsys, Inc. Method and system for determining a signal that controls the application of operands to a circuit-implemented function for power savings
US6044211A (en) * 1994-03-14 2000-03-28 C.A.E. Plus, Inc. Method for graphically representing a digital device as a behavioral description with data and control flow elements, and for converting the behavioral description to a structural description
US6052527A (en) * 1997-02-21 2000-04-18 Alcatel Method of generating platform-independent software application programs
US6083281A (en) * 1997-11-14 2000-07-04 Nortel Networks Corporation Process and apparatus for tracing software entities in a distributed system
US6125392A (en) * 1996-10-11 2000-09-26 Intel Corporation Method and apparatus for high speed event log data compression within a non-volatile storage area
US6134676A (en) * 1998-04-30 2000-10-17 International Business Machines Corporation Programmable hardware event monitoring method
US6145099A (en) * 1996-08-13 2000-11-07 Nec Corporation Debugging system
US6192419B1 (en) * 1997-06-18 2001-02-20 International Business Machines Corporation Collaborative framework for disparate application programs
US6298476B1 (en) * 1995-12-04 2001-10-02 International Business Machines Corporation Object oriented software build framework mechanism
US6314555B1 (en) * 1997-07-25 2001-11-06 British Telecommunications Public Limited Company Software system generation
US6317773B1 (en) * 1994-10-11 2001-11-13 International Business Machines Corporation System and method for creating an object oriented transaction service that interoperates with procedural transaction coordinators
US6340977B1 (en) * 1999-05-07 2002-01-22 Philip Lui System and method for dynamic assistance in software applications using behavior and host application models
US6347374B1 (en) * 1998-06-05 2002-02-12 Intrusion.Com, Inc. Event detection
US20020078431A1 (en) * 2000-02-03 2002-06-20 Reps Thomas W. Method for representing information in a highly compressed fashion
US6470482B1 (en) * 1990-04-06 2002-10-22 Lsi Logic Corporation Method and system for creating, deriving and validating structural description of electronic system from higher level, behavior-oriented description, including interactive schematic design and simulation
US6470388B1 (en) * 1999-06-10 2002-10-22 Cisco Technology, Inc. Coordinated extendable system for logging information from distributed applications
US6539501B1 (en) * 1999-12-16 2003-03-25 International Business Machines Corporation Method, system, and program for logging statements to monitor execution of a program
US6567818B1 (en) * 1999-06-14 2003-05-20 International Business Machines Corporation Employing management policies to manage instances of objects
US6665819B1 (en) * 2000-04-24 2003-12-16 Microsoft Corporation Data capture and analysis for embedded systems
US6701382B1 (en) * 1998-12-23 2004-03-02 Nortel Networks Limited Name service for transparent container objects
US6718294B1 (en) * 2000-05-16 2004-04-06 Mindspeed Technologies, Inc. System and method for synchronized control of system simulators with multiple processor cores

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5168554A (en) * 1989-10-13 1992-12-01 International Business Machines Corporation Converting trace data from processors executing in parallel into graphical form
EP0444315B1 (en) * 1990-02-26 1997-10-01 Digital Equipment Corporation System and method for software application event collection
IT1292052B1 (en) 1997-05-30 1999-01-25 Sace Spa PROCEDURE FOR PARTITIONING CONTROL FUNCTIONS IN DISTRIBUTED SYSTEMS
JP2001195406A (en) * 2000-01-06 2001-07-19 Media Fusion Co Ltd Database management system
US6523020B1 (en) * 2000-03-22 2003-02-18 International Business Machines Corporation Lightweight rule induction

Patent Citations (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4965743A (en) * 1988-07-14 1990-10-23 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Discrete event simulation tool for analysis of qualitative models of continuous processing system
US5551035A (en) * 1989-06-30 1996-08-27 Lucent Technologies Inc. Method and apparatus for inter-object communication in an object-oriented program controlled system
US5095421A (en) * 1989-08-17 1992-03-10 International Business Machines Corporation Transaction processing facility within an operating system environment
US5428782A (en) * 1989-09-28 1995-06-27 Texas Instruments Incorporated Portable and dynamic distributed applications architecture
US6470482B1 (en) * 1990-04-06 2002-10-22 Lsi Logic Corporation Method and system for creating, deriving and validating structural description of electronic system from higher level, behavior-oriented description, including interactive schematic design and simulation
US5581691A (en) * 1992-02-04 1996-12-03 Digital Equipment Corporation Work flow management system and method
US5596750A (en) * 1992-06-09 1997-01-21 Bull S.A. System for transactional processing between an information processing server and a plurality of workstations
US5819270A (en) * 1993-02-25 1998-10-06 Massachusetts Institute Of Technology Computer system for displaying representations of processes
US5485617A (en) * 1993-12-13 1996-01-16 Microsoft Corporation Method and system for dynamically generating object connections
US6044211A (en) * 1994-03-14 2000-03-28 C.A.E. Plus, Inc. Method for graphically representing a digital device as a behavioral description with data and control flow elements, and for converting the behavioral description to a structural description
US5694539A (en) * 1994-08-10 1997-12-02 Intrinsa Corporation Computer process resource modelling method and apparatus
US6154876A (en) * 1994-08-10 2000-11-28 Intrinsa Corporation Analysis of the effect of program execution of calling components with data variable checkpointing and resource allocation analysis
US5794046A (en) * 1994-09-29 1998-08-11 International Business Machines Corporation Method and system for debugging parallel and distributed applications
US6317773B1 (en) * 1994-10-11 2001-11-13 International Business Machines Corporation System and method for creating an object oriented transaction service that interoperates with procedural transaction coordinators
US5642478A (en) * 1994-12-29 1997-06-24 International Business Machines Corporation Distributed trace data acquisition system
US5980096A (en) * 1995-01-17 1999-11-09 Intertech Ventures, Ltd. Computer-based system, methods and graphical interface for information storage, modeling and stimulation of complex systems
US5724508A (en) * 1995-03-09 1998-03-03 Insoft, Inc. Apparatus for collaborative computing
US5737607A (en) * 1995-09-28 1998-04-07 Sun Microsystems, Inc. Method and apparatus for allowing generic stubs to marshal and unmarshal data in object reference specific data formats
US5870588A (en) * 1995-10-23 1999-02-09 Interuniversitair Micro-Elektronica Centrum (Imec Vzw) Design environment and a design method for hardware/software co-design
US6003037A (en) * 1995-11-14 1999-12-14 Progress Software Corporation Smart objects for development of object oriented software
US6298476B1 (en) * 1995-12-04 2001-10-02 International Business Machines Corporation Object oriented software build framework mechanism
US5920717A (en) * 1995-12-20 1999-07-06 Nec Corporation Method and apparatus for automated program-generation
US5933639A (en) * 1996-05-17 1999-08-03 International Business Machines Corporation System and method for debugging distributed programs
US6083277A (en) * 1996-07-03 2000-07-04 Sun Microsystems, Inc. Filtering an object interface definition to determine services needed and provided
US5949998A (en) * 1996-07-03 1999-09-07 Sun Microsystems, Inc. Filtering an object interface definition to determine services needed and provided
US5999728A (en) * 1996-07-30 1999-12-07 Sun Microsystems, Inc. Method and apparatus for enhancing the portability of an object oriented interface among multiple platforms
US5790778A (en) * 1996-08-07 1998-08-04 Intrinsa Corporation Simulated program execution error detection method and apparatus
US6145099A (en) * 1996-08-13 2000-11-07 Nec Corporation Debugging system
US6125392A (en) * 1996-10-11 2000-09-26 Intel Corporation Method and apparatus for high speed event log data compression within a non-volatile storage area
US6052527A (en) * 1997-02-21 2000-04-18 Alcatel Method of generating platform-independent software application programs
US5941945A (en) * 1997-06-18 1999-08-24 International Business Machines Corporation Interest-based collaborative framework
US6192419B1 (en) * 1997-06-18 2001-02-20 International Business Machines Corporation Collaborative framework for disparate application programs
US6314555B1 (en) * 1997-07-25 2001-11-06 British Telecommunications Public Limited Company Software system generation
US6083281A (en) * 1997-11-14 2000-07-04 Nortel Networks Corporation Process and apparatus for tracing software entities in a distributed system
US6038381A (en) * 1997-11-25 2000-03-14 Synopsys, Inc. Method and system for determining a signal that controls the application of operands to a circuit-implemented function for power savings
US6134676A (en) * 1998-04-30 2000-10-17 International Business Machines Corporation Programmable hardware event monitoring method
US6347374B1 (en) * 1998-06-05 2002-02-12 Intrusion.Com, Inc. Event detection
US6701382B1 (en) * 1998-12-23 2004-03-02 Nortel Networks Limited Name service for transparent container objects
US6340977B1 (en) * 1999-05-07 2002-01-22 Philip Lui System and method for dynamic assistance in software applications using behavior and host application models
US6470388B1 (en) * 1999-06-10 2002-10-22 Cisco Technology, Inc. Coordinated extendable system for logging information from distributed applications
US6567818B1 (en) * 1999-06-14 2003-05-20 International Business Machines Corporation Employing management policies to manage instances of objects
US6539501B1 (en) * 1999-12-16 2003-03-25 International Business Machines Corporation Method, system, and program for logging statements to monitor execution of a program
US20020078431A1 (en) * 2000-02-03 2002-06-20 Reps Thomas W. Method for representing information in a highly compressed fashion
US6665819B1 (en) * 2000-04-24 2003-12-16 Microsoft Corporation Data capture and analysis for embedded systems
US6718294B1 (en) * 2000-05-16 2004-04-06 Mindspeed Technologies, Inc. System and method for synchronized control of system simulators with multiple processor cores

Cited By (193)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7133820B2 (en) * 2000-03-15 2006-11-07 Arc International Method and apparatus for debugging programs in a distributed environment
US20020073401A1 (en) * 2000-12-07 2002-06-13 International Business Machines Corporation Method of detecting zombie breakpoints
US6961923B2 (en) * 2000-12-07 2005-11-01 International Business Machines Corporation Method of detecting zombie breakpoints
US20030004671A1 (en) * 2001-06-28 2003-01-02 Mitsubishi Denki Kabushiki Kaisha Remote debugging apparatus for executing procedure preregistered in database at program breakpoint
US7426717B1 (en) * 2001-11-27 2008-09-16 Adobe Systems Incorporated System and method for debugging files in a runtime environment
US7233336B2 (en) 2001-12-10 2007-06-19 American Megatrends, Inc. Systems and methods for capturing screen displays from a host computing system for display at a remote terminal
US20050046637A1 (en) * 2001-12-10 2005-03-03 American Megatrends, Inc. Systems and methods for capturing screen displays from a host computing system for display at a remote terminal
US8341407B2 (en) 2001-12-12 2012-12-25 Guardian Data Storage, Llc Method and system for protecting electronic data in enterprise environment
US7921450B1 (en) 2001-12-12 2011-04-05 Klimenty Vainstein Security system using indirect key generation from access rules and methods therefor
US9129120B2 (en) 2001-12-12 2015-09-08 Intellectual Ventures I Llc Methods and systems for providing access control to secured data
US7921284B1 (en) 2001-12-12 2011-04-05 Gary Mark Kinghorn Method and system for protecting electronic data in enterprise environment
US8543827B2 (en) 2001-12-12 2013-09-24 Intellectual Ventures I Llc Methods and systems for providing access control to secured data
US7921288B1 (en) 2001-12-12 2011-04-05 Hildebrand Hal S System and method for providing different levels of key security for controlling access to secured items
US7930756B1 (en) 2001-12-12 2011-04-19 Crocker Steven Toye Multi-level cryptographic transformations for securing digital assets
US7913311B2 (en) 2001-12-12 2011-03-22 Rossmann Alain Methods and systems for providing access control to electronic data
US9542560B2 (en) 2001-12-12 2017-01-10 Intellectual Ventures I Llc Methods and systems for providing access control to secured data
US8918839B2 (en) 2001-12-12 2014-12-23 Intellectual Ventures I Llc System and method for providing multi-location access management to secured items
US8006280B1 (en) 2001-12-12 2011-08-23 Hildebrand Hal S Security system for generating keys from access rules in a decentralized manner and methods therefor
US10033700B2 (en) 2001-12-12 2018-07-24 Intellectual Ventures I Llc Dynamic evaluation of access rights
US8266674B2 (en) 2001-12-12 2012-09-11 Guardian Data Storage, Llc Method and system for implementing changes to security policies in a distributed security system
US10229279B2 (en) 2001-12-12 2019-03-12 Intellectual Ventures I Llc Methods and systems for providing access control to secured data
US8341406B2 (en) 2001-12-12 2012-12-25 Guardian Data Storage, Llc System and method for providing different levels of key security for controlling access to secured items
USRE43906E1 (en) 2001-12-12 2013-01-01 Guardian Data Storage Llc Method and apparatus for securing digital assets
US10360545B2 (en) 2001-12-12 2019-07-23 Guardian Data Storage, Llc Method and apparatus for accessing secured electronic data off-line
US10769288B2 (en) 2001-12-12 2020-09-08 Intellectual Property Ventures I Llc Methods and systems for providing access control to secured data
US7950066B1 (en) 2001-12-21 2011-05-24 Guardian Data Storage, Llc Method and system for restricting use of a clipboard application
US8943316B2 (en) 2002-02-12 2015-01-27 Intellectual Ventures I Llc Document security system that permits external users to gain access to secured files
US7296259B2 (en) * 2002-09-11 2007-11-13 Agere Systems Inc. Processor system with cache-based software breakpoints
US20040049712A1 (en) * 2002-09-11 2004-03-11 Betker Michael Richard Processor system with cache-based software breakpoints
US20040222944A1 (en) * 2002-09-20 2004-11-11 American Megatrends, Inc. In-line video, keyboard and mouse remote management unit
US20040059782A1 (en) * 2002-09-20 2004-03-25 American Megatrends, Inc. Systems and methods for establishing interaction between a local computer and a remote computer
US7260624B2 (en) 2002-09-20 2007-08-21 American Megatrends, Inc. Systems and methods for establishing interaction between a local computer and a remote computer
US7454490B2 (en) 2002-09-20 2008-11-18 American Megatrends, Inc. In-line video, keyboard and mouse remote management unit
USRE47443E1 (en) 2002-09-30 2019-06-18 Intellectual Ventures I Llc Document security system that permits external users to gain access to secured files
US8176334B2 (en) 2002-09-30 2012-05-08 Guardian Data Storage, Llc Document security system that permits external users to gain access to secured files
US20040117768A1 (en) * 2002-12-12 2004-06-17 International Business Machines Corporation System and method on generating multi-dimensional trace files and visualizing them using multiple Gantt charts
US7131113B2 (en) * 2002-12-12 2006-10-31 International Business Machines Corporation System and method on generating multi-dimensional trace files and visualizing them using multiple Gantt charts
US20040181781A1 (en) * 2003-02-28 2004-09-16 Denso Corporation Method, apparatus and program for testing control program
US7441235B2 (en) * 2003-02-28 2008-10-21 Denso Corporation Method, apparatus and program for testing control program
US20040190773A1 (en) * 2003-03-31 2004-09-30 American Megatrends, Inc. Method, apparatus, and computer-readable medium for identifying character coordinates
US7418141B2 (en) 2003-03-31 2008-08-26 American Megatrends, Inc. Method, apparatus, and computer-readable medium for identifying character coordinates
US7117483B2 (en) * 2003-04-15 2006-10-03 Microsoft Corporation Server debugging framework using scripts
US20040210872A1 (en) * 2003-04-15 2004-10-21 Dorr Robert A. Server debugging framework using scripts
US7412625B2 (en) * 2003-05-27 2008-08-12 American Megatrends, Inc. Method and system for remote software debugging
US20040243883A1 (en) * 2003-05-27 2004-12-02 American Megatrends, Inc. Method and system for remote software debugging
US8707034B1 (en) 2003-05-30 2014-04-22 Intellectual Ventures I Llc Method and system for using remote headers to secure electronic files
US8539435B1 (en) 2003-06-16 2013-09-17 American Megatrends, Inc. Method and system for remote software testing
US20040255276A1 (en) * 2003-06-16 2004-12-16 Gene Rovang Method and system for remote software testing
US7546584B2 (en) 2003-06-16 2009-06-09 American Megatrends, Inc. Method and system for remote software testing
US7945899B2 (en) 2003-06-16 2011-05-17 American Megatrends, Inc. Method and system for remote software testing
US8046743B1 (en) 2003-06-27 2011-10-25 American Megatrends, Inc. Method and system for remote software debugging
US8898638B1 (en) 2003-06-27 2014-11-25 American Megatrends, Inc. Method and system for remote software debugging
US7543277B1 (en) 2003-06-27 2009-06-02 American Megatrends, Inc. Method and system for remote software debugging
US7472184B2 (en) 2003-09-19 2008-12-30 International Business Machines Corporation Framework for restricting resources consumed by ghost agents
US8024713B2 (en) 2003-09-19 2011-09-20 International Business Machines Corporation Using ghost agents in an environment supported by customer service providers
US20050065992A1 (en) * 2003-09-19 2005-03-24 International Business Machines Corporation Restricting resources consumed by ghost agents
US7386837B2 (en) * 2003-09-19 2008-06-10 International Business Machines Corporation Using ghost agents in an environment supported by customer service providers
US20050065994A1 (en) * 2003-09-19 2005-03-24 International Business Machines Corporation Framework for restricting resources consumed by ghost agents
US8312466B2 (en) 2003-09-19 2012-11-13 International Business Machines Corporation Restricting resources consumed by ghost agents
US20050065803A1 (en) * 2003-09-19 2005-03-24 International Business Machines Corporation Using ghost agents in an environment supported by customer service providers
US7480914B2 (en) 2003-09-19 2009-01-20 International Business Machines Corporation Restricting resources consumed by ghost agents
US20080104578A1 (en) * 2003-09-19 2008-05-01 International Business Machines Corporation Using ghost agents in an environment supported by customer service providers
US20090083749A1 (en) * 2003-09-19 2009-03-26 International Business Machines Corporation Restricting resources consumed by ghost agents
US7246056B1 (en) * 2003-09-26 2007-07-17 The Mathworks, Inc. Runtime parameter mapping for system simulation
US8327138B2 (en) 2003-09-30 2012-12-04 Guardian Data Storage Llc Method and system for securing digital assets using process-driven security policies
US8127366B2 (en) 2003-09-30 2012-02-28 Guardian Data Storage, Llc Method and apparatus for transitioning between states of security policies used to secure electronic documents
US8739302B2 (en) 2003-09-30 2014-05-27 Intellectual Ventures I Llc Method and apparatus for transitioning between states of security policies used to secure electronic documents
US20050097537A1 (en) * 2003-10-30 2005-05-05 Laura Joseph G. System and method for distributed processing in COBOL
US8090564B1 (en) 2003-11-03 2012-01-03 Synopsys, Inc. Automatic generation of transaction level bus simulation instructions from bus protocol
US7921412B1 (en) * 2003-11-26 2011-04-05 Sprint Communications Company L.P. Application monitor system and method
US7702909B2 (en) * 2003-12-22 2010-04-20 Klimenty Vainstein Method and system for validating timestamps
US20110015918A1 (en) * 2004-03-01 2011-01-20 American Megatrends, Inc. Method, system, and apparatus for communicating with a computer management device
US7827258B1 (en) 2004-03-01 2010-11-02 American Megatrends, Inc. Method, system, and apparatus for communicating with a computer management device
US8359384B2 (en) 2004-03-01 2013-01-22 American Megatrends, Inc. Method, system, and apparatus for communicating with a computer management device
US20050288915A1 (en) * 2004-06-28 2005-12-29 Graniteedge Networks Determining event causality including employment of causal chains
US20050288916A1 (en) * 2004-06-28 2005-12-29 Graniteedge Networks Determining event causality including employment of partitioned event space
US7363203B2 (en) 2004-06-28 2008-04-22 Graniteedge Networks Determining event causality including employment of partitioned event space
US8914777B2 (en) * 2004-07-23 2014-12-16 Green Hills Software Forward post-execution software debugger
US20120317550A1 (en) * 2004-07-23 2012-12-13 Green Hills Software, Inc Forward post-execution software debugger
US8001302B2 (en) 2004-08-25 2011-08-16 American Megatrends, Inc. Redirecting input and output for multiple computers
US20110066773A1 (en) * 2004-08-25 2011-03-17 American Megatrends, Inc. Redirecting input and output for multiple computers
US7840728B1 (en) 2004-08-25 2010-11-23 American Megatrends, Inc. Redirecting input and output for multiple computers
US7861020B1 (en) 2004-08-25 2010-12-28 American Megatrends, Inc. Redirecting input and output for multiple computers
US7519749B1 (en) 2004-08-25 2009-04-14 American Megatrends, Inc. Redirecting input and output for multiple computers
US7793019B1 (en) 2004-08-25 2010-09-07 American Megatrends, Inc. Redirecting input and output for multiple computers
US20110119043A1 (en) * 2005-02-25 2011-05-19 Coware, Inc. Interface converter for unified view of multiple computer system simulations
US20060195825A1 (en) * 2005-02-25 2006-08-31 Niels Vanspauwen Method and system for dynamically adjusting speed versus accuracy of computer platform simulation
US7742905B2 (en) 2005-02-25 2010-06-22 Coware, Inc. Method and system for dynamically adjusting speed versus accuracy of computer platform simulation
US20110035201A1 (en) * 2005-02-25 2011-02-10 Synopsys, Inc. Method for dynamically adjusting speed versus accuracy of computer platform simulation
US8484006B2 (en) * 2005-02-25 2013-07-09 Synopsys, Inc. Method for dynamically adjusting speed versus accuracy of computer platform simulation
US20060195821A1 (en) * 2005-02-25 2006-08-31 Niels Vanspauwen Interface converter for unified view of multiple computer system simulations
US7716031B2 (en) * 2005-02-25 2010-05-11 Coware, Inc. Interface converter for unified view of multiple computer system simulations
US8903703B2 (en) 2005-02-25 2014-12-02 Synopsys, Inc. Dynamically adjusting speed versus accuracy of computer platform simulation
US8793115B2 (en) * 2005-02-25 2014-07-29 Synopsys, Inc. Interface converter for unified view of multiple computer system simulations
WO2006099446A2 (en) * 2005-03-11 2006-09-21 Argade Pramod V Environment for controlling the execution of computer programs
WO2006099446A3 (en) * 2005-03-11 2007-04-26 Pramod V Argade Environment for controlling the execution of computer programs
US7921429B2 (en) 2005-06-09 2011-04-05 Whirlpool Corporation Data acquisition method with event notification for an appliance
US8040234B2 (en) 2005-06-09 2011-10-18 Whirlpool Corporation Method and apparatus for remote service of an appliance
US7917914B2 (en) 2005-06-09 2011-03-29 Whirlpool Corporation Event notification system for an appliance
US20080137670A1 (en) * 2005-06-09 2008-06-12 Whirlpool Corporation Network System with Message Binding for Appliances
US20070240173A1 (en) * 2005-06-09 2007-10-11 Whirlpool Corporation Data acquisition engine and system for an appliance
US20070156882A1 (en) * 2005-06-09 2007-07-05 Whirlpool Corporation Data acquisition engine and system for an appliance
US20070162158A1 (en) * 2005-06-09 2007-07-12 Whirlpool Corporation Software architecture system and method for operating an appliance utilizing configurable notification messages
US7487241B2 (en) 2005-08-05 2009-02-03 Vantos, Inc. Performing efficient insertions in wavefront table based causal graphs
US20070032986A1 (en) * 2005-08-05 2007-02-08 Graniteedge Networks Efficient filtered causal graph edge detection in a causal wavefront environment
US7698691B2 (en) * 2005-09-20 2010-04-13 Microsoft Corporation Server application state
US20070067754A1 (en) * 2005-09-20 2007-03-22 Microsoft Corporation Server application state
US20070169055A1 (en) * 2005-12-12 2007-07-19 Bernd Greifeneder Method and system for automated analysis of the performance of remote method invocations in multi-tier applications using bytecode instrumentation
US8402443B2 (en) * 2005-12-12 2013-03-19 dyna Trace software GmbH Method and system for automated analysis of the performance of remote method invocations in multi-tier applications using bytecode instrumentation
US20070168975A1 (en) * 2005-12-13 2007-07-19 Thomas Kessler Debugger and test tool
US8566644B1 (en) 2005-12-14 2013-10-22 American Megatrends, Inc. System and method for debugging a target computer using SMBus
US8010843B2 (en) 2005-12-14 2011-08-30 American Megatrends, Inc. System and method for debugging a target computer using SMBus
US8543367B1 (en) 2006-02-16 2013-09-24 Synopsys, Inc. Simulation with dynamic run-time accuracy adjustment
US7899661B2 (en) 2006-02-16 2011-03-01 Synopsys, Inc. Run-time switching for simulation with dynamic run-time accuracy adjustment
US8521499B1 (en) 2006-02-16 2013-08-27 Synopsys, Inc. Run-time switching for simulation with dynamic run-time accuracy adjustment
US20070192079A1 (en) * 2006-02-16 2007-08-16 Karl Van Rompaey Run-time switching for simulation with dynamic run-time accuracy adjustment
US9471727B2 (en) 2006-02-16 2016-10-18 Synopsys, Inc. Simulation with dynamic run-time accuracy adjustment
US20070294051A1 (en) * 2006-06-15 2007-12-20 Microsoft Corporation Declaration and Consumption of A Causality Model for Probable Cause Analysis
US7801712B2 (en) * 2006-06-15 2010-09-21 Microsoft Corporation Declaration and consumption of a causality model for probable cause analysis
US7653881B2 (en) 2006-06-19 2010-01-26 Microsoft Corporation Failure handling and debugging with causalities
US7664997B2 (en) 2006-06-19 2010-02-16 Microsoft Corporation Failure handling and debugging with causalities
US20080010564A1 (en) * 2006-06-19 2008-01-10 Microsoft Corporation Failure handling and debugging with causalities
US9231858B1 (en) 2006-08-11 2016-01-05 Dynatrace Software Gmbh Completeness detection of monitored globally distributed synchronous and asynchronous transactions
US7783799B1 (en) 2006-08-31 2010-08-24 American Megatrends, Inc. Remotely controllable switch and testing methods using same
US7979610B2 (en) 2006-08-31 2011-07-12 American Megatrends, Inc. Remotely controllable switch and testing methods using same
US20110040904A1 (en) * 2006-08-31 2011-02-17 American Megatrends, Inc. Remotely controllable switch and testing methods using same
US10769047B2 (en) 2006-10-31 2020-09-08 Microsoft Technology Licensing, Llc Stepping and application state viewing between points
US9355012B2 (en) 2006-10-31 2016-05-31 Microsoft Technology Licensing, Llc Stepping and application state viewing between points
US8429613B2 (en) * 2006-10-31 2013-04-23 Microsoft Corporation Stepping and application state viewing between points
US20080120605A1 (en) * 2006-10-31 2008-05-22 Microsoft Corporation Stepping and application state viewing between points
US8495592B2 (en) * 2006-11-28 2013-07-23 International Business Machines Corporation Presenting completion progress status of an installer via join points
US20080126974A1 (en) * 2006-11-28 2008-05-29 Fawcett Bradley W Presenting completion progress status of an installer via join points
US8065688B2 (en) * 2007-01-23 2011-11-22 Microsoft Corporation Transparently capturing the causal relationships between requests across distributed applications
US8135572B2 (en) * 2007-01-23 2012-03-13 Microsoft Corporation Integrated debugger simulator
US20080177525A1 (en) * 2007-01-23 2008-07-24 Microsoft Corporation Integrated debugger simulator
US20080178195A1 (en) * 2007-01-23 2008-07-24 Microsoft Corporation Transparently capturing the causal relationships between requests across distributed applications
US7707459B2 (en) 2007-03-08 2010-04-27 Whirlpool Corporation Embedded systems debugging
US7917900B2 (en) * 2007-03-30 2011-03-29 Microsoft Corporation Enabling analysis of software source code
US20080244539A1 (en) * 2007-03-30 2008-10-02 Microsoft Corporation Enabling analysis of software source code
US20080276227A1 (en) * 2007-05-06 2008-11-06 Bernd Greifeneder Method and System for Adaptive, Generic Code Instrumentation using Run-time or Load-time generated Inheritance Information for Diagnosis and Monitoring Application Performance and Failure
US8464225B2 (en) 2007-05-06 2013-06-11 Dynatrace Software Gmbh Method and system for adaptive, generic code instrumentation using run-time or load-time generated inheritance information for diagnosis and monitoring application performance and failure
US9047412B2 (en) 2007-05-06 2015-06-02 Dynatrace Corporation System and method for extracting instrumentation relevant inheritance relationships for a distributed, inheritance rule based instrumentation system
US8752065B2 (en) 2007-05-31 2014-06-10 Red Hat, Inc. Rules engine for a persistent message store
US7937497B2 (en) 2007-05-31 2011-05-03 Red Hat, Inc. Apparatus for selectively copying at least portions of messages in a distributed computing system
US20080301707A1 (en) * 2007-05-31 2008-12-04 Mark Cameron Little Rules engine for a persistent message store
US20080301286A1 (en) * 2007-05-31 2008-12-04 Mark Cameron Little Persistent message store
US20080301251A1 (en) * 2007-05-31 2008-12-04 Mark Cameron Little Debugging in a distributed system
US7788542B2 (en) * 2007-05-31 2010-08-31 Red Hat, Inc. Debugging in a distributed system
US7925487B2 (en) 2007-06-29 2011-04-12 Microsoft Corporation Replaying distributed systems
US20090006064A1 (en) * 2007-06-29 2009-01-01 Microsoft Corporation Replaying Distributed Systems
US8826242B2 (en) * 2007-11-27 2014-09-02 Microsoft Corporation Data driven profiling for distributed applications
US20140325062A1 (en) * 2007-11-27 2014-10-30 Microsoft Corporation Data-driven profiling for distributed applications
US10050848B2 (en) * 2007-11-27 2018-08-14 Microsoft Technology Licensing, Llc Data-driven profiling for distributed applications
US20090138858A1 (en) * 2007-11-27 2009-05-28 Microsoft Corporation Data Driven Profiling for Distributed Applications
US8121824B2 (en) 2008-03-28 2012-02-21 Microsoft Corporation Predicate checking for distributed systems
US20110178788A1 (en) * 2008-03-28 2011-07-21 Microsoft Corporation Predicate Checking for Distributed Systems
US7933759B2 (en) 2008-03-28 2011-04-26 Microsoft Corporation Predicate checking for distributed systems
US20090319993A1 (en) * 2008-06-24 2009-12-24 Microsoft Corporation Generalized and extensible software architecture representation
US7747742B2 (en) 2008-06-27 2010-06-29 Microsoft Corporation Online predicate checking for distributed systems
US8589881B2 (en) * 2009-08-03 2013-11-19 Knu-Industry Cooperation Foundation Web-based software debugging apparatus and method for remote debugging
US20110252404A1 (en) * 2009-08-03 2011-10-13 Knu-Industry Cooperation Foundation Web-based software debugging apparatus and method for remote debugging
US20110066894A1 (en) * 2009-09-14 2011-03-17 Myspace, Inc. Debugging a map reduce application on a cluster
US8572575B2 (en) * 2009-09-14 2013-10-29 Myspace Llc Debugging a map reduce application on a cluster
US8533687B1 (en) 2009-11-30 2013-09-10 dynaTrade Software GmbH Methods and system for global real-time transaction tracing
US20110131450A1 (en) * 2009-11-30 2011-06-02 Microsoft Corporation Using synchronized event types for testing an application
US20130030568A1 (en) * 2010-04-23 2013-01-31 Samsung Heavy Ind. Co., Ltd. Robot system control method and a device therefor
US20120253857A1 (en) * 2011-03-28 2012-10-04 Infosys Technologies Limited Structured methods for business process unification
US9274919B2 (en) 2011-04-29 2016-03-01 Dynatrace Software Gmbh Transaction tracing mechanism of distributed heterogenous transactions having instrumented byte code with constant memory consumption and independent of instrumented method call depth
US9811362B2 (en) 2011-04-29 2017-11-07 Dynatrace Software Gmbh Method and system for transaction controlled sampling of distributed heterogeneous transactions without source code modifications
US8924939B2 (en) 2012-05-09 2014-12-30 International Business Machines Corporation Streams debugging within a windowing condition
US8924940B2 (en) 2012-05-09 2014-12-30 International Business Machines Corporation Streams debugging within a windowing condition
US20130339931A1 (en) * 2012-06-19 2013-12-19 Sap Ag Application trace replay and simulation systems and methods
US8898643B2 (en) * 2012-06-19 2014-11-25 Sap Se Application trace replay and simulation systems and methods
US20140040897A1 (en) * 2012-08-04 2014-02-06 Microsoft Corporation Function Evaluation using Lightweight Process Snapshots
US9710357B2 (en) * 2012-08-04 2017-07-18 Microsoft Technology Licensing, Llc Function evaluation using lightweight process snapshots
US9146829B1 (en) * 2013-01-03 2015-09-29 Amazon Technologies, Inc. Analysis and verification of distributed applications
US9804945B1 (en) 2013-01-03 2017-10-31 Amazon Technologies, Inc. Determinism for distributed applications
US9448820B1 (en) 2013-01-03 2016-09-20 Amazon Technologies, Inc. Constraint verification for distributed applications
US10169171B2 (en) 2013-05-13 2019-01-01 Nxp Usa, Inc. Method and apparatus for enabling temporal alignment of debug information
US20170097856A1 (en) * 2013-07-18 2017-04-06 International Business Machines Corporation Monitoring system noises in parallel computer systems
US9558095B2 (en) 2013-07-18 2017-01-31 International Business Machines Corporation Monitoring system noises in parallel computer systems
US10203996B2 (en) * 2013-07-18 2019-02-12 International Business Machines Corporation Filtering system noises in parallel computer system during thread synchronization
US20150026687A1 (en) * 2013-07-18 2015-01-22 International Business Machines Corporation Monitoring system noises in parallel computer systems
US9361202B2 (en) * 2013-07-18 2016-06-07 International Business Machines Corporation Filtering system noises in parallel computer systems during thread synchronization
US9235384B2 (en) 2013-09-20 2016-01-12 Axure Software Solutions, Inc. Language notification generator
US9727394B2 (en) 2015-04-27 2017-08-08 Microsoft Technology Licensing, Llc Establishing causality order of computer trace records
US9779012B1 (en) * 2016-02-26 2017-10-03 Mbit Wireless, Inc. Dynamic and global in-system debugger
US10268568B2 (en) * 2016-03-29 2019-04-23 Infosys Limited System and method for data element tracing
US10416974B2 (en) * 2017-10-06 2019-09-17 Chicago Mercantile Exchange Inc. Dynamic tracer message logging based on bottleneck detection
US10990366B2 (en) 2017-10-06 2021-04-27 Chicago Mercantile Exchange Inc. Dynamic tracer message logging based on bottleneck detection
US11520569B2 (en) 2017-10-06 2022-12-06 Chicago Mercantile Exchange Inc. Dynamic tracer message logging based on bottleneck detection
US10430321B1 (en) * 2018-08-21 2019-10-01 International Business Machines Corporation White box code concurrency testing for transaction processing
US10956311B2 (en) 2018-08-21 2021-03-23 International Business Machines Corporation White box code concurrency testing for transaction processing

Also Published As

Publication number Publication date
WO2002001390A2 (en) 2002-01-03
EP1297428A2 (en) 2003-04-02
DE60113538T2 (en) 2006-06-22
US7003777B2 (en) 2006-02-21
EP1323042A2 (en) 2003-07-02
DE60113538D1 (en) 2005-10-27
WO2002001349A9 (en) 2002-03-21
WO2002001359A2 (en) 2002-01-03
WO2002001362A2 (en) 2002-01-03
US20030005407A1 (en) 2003-01-02
WO2002001349A2 (en) 2002-01-03
AU2001270079A1 (en) 2002-01-08
AU2001272985A1 (en) 2002-01-08
US20030028858A1 (en) 2003-02-06
AU2001270094A1 (en) 2002-01-08
WO2002001359A3 (en) 2002-07-18
WO2002001390A3 (en) 2002-04-25
ATE305153T1 (en) 2005-10-15
US20020062463A1 (en) 2002-05-23
WO2002001362A3 (en) 2003-04-03
US20020059558A1 (en) 2002-05-16
EP1297424B1 (en) 2005-09-21
AU2001271354A1 (en) 2002-01-08
WO2002001349A3 (en) 2002-05-10
EP1297424A2 (en) 2003-04-02
US20020087953A1 (en) 2002-07-04

Similar Documents

Publication Publication Date Title
US20020174415A1 (en) System and method for debugging distributed software environments
US20030121027A1 (en) Behavioral abstractions for debugging coordination-centric software designs
Arlat et al. Dependability of COTS microkernel-based systems
Eshuis et al. Comparing Petri net and activity diagram variants for workflow modelling–a quest for reactive Petri nets
US7640538B2 (en) Virtual threads in business process programs
JP2005527008A (en) Runtime monitoring of component-based systems
Karsai et al. A modeling language and its supporting tools for avionics systems
Goswami et al. Dynamic slicing of concurrent programs
Hao et al. VIZIR: an integrated environment for distributed program visualization
Yong Replay and distributed breakpoints in an OSF DCE environment
Karsai et al. A model-based front-end to TAO/ACE: the embedded system modeling language
Falcone et al. Monitoring distributed component-based systems
Subramonian Timed automata models for principled composition of middleware
Pedersen Multilevel debugging of parallel message passing programs
Hines Coordination-centric debugging for heterogeneous distributed embedded systems
Wang et al. CAT: Context Aware Tracing for Rust Asynchronous Programs
Duchien et al. Reflection and debug for CORBA applications
Liang Abstracting Distributed, Time-Sensitive Applications
Navarro et al. Detecting and coordinating complex patterns of distributed events with KETAL
de Kergommeaux et al. Execution replay of parallel procedural programs
Hammer Component-based architecting for distributed real-time systems: How to achieve composability?
Schmidt et al. The design and use of the ace reactor
Vishnuvajjala Software reuse in time-critical systems
Iglinski An Execution Reply Facility and Event-based Debugger for the Enterprise Parallel Programming System
El-Kadi Tap processes

Legal Events

Date Code Title Description
AS Assignment

Owner name: CONSYSTANT DESIGN TECHNOLOGIES, INC., WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HINES, KENNETH J.;REEL/FRAME:012512/0491

Effective date: 20010913

AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CONSYSTANT DESIGN TECHNOLOGIES, INC.;REEL/FRAME:014380/0907

Effective date: 20031120

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION