WO1999066404A1 - Multi-processor system bridge with controlled access - Google Patents

Publication number
WO1999066404A1
WO1999066404A1 PCT/US1999/012431 US9912431W WO9966404A1 WO 1999066404 A1 WO1999066404 A1 WO 1999066404A1 US 9912431 W US9912431 W US 9912431W WO 9966404 A1 WO9966404 A1 WO 9966404A1
Authority
WO
WIPO (PCT)
Prior art keywords
bridge
bus
processing
error
sets
Prior art date
Application number
PCT/US1999/012431
Other languages
French (fr)
Inventor
Stephen Rowlinson
Femi A. Oyelakin
Paul J. Garnett
Original Assignee
Sun Microsystems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Microsystems, Inc.
Priority to DE69901255T (patent DE69901255T2)
Priority to EP99926161A (patent EP1090350B1)
Priority to AT99926161T (patent ATE216098T1)
Priority to JP2000555161A (patent JP2002518736A)
Publication of WO1999066404A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1658 Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/004 Error avoidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2268 Logging of test results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4027 Coupling between buses using bus bridges
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1608 Error detection by comparing the output signals of redundant hardware
    • G06F11/1625 Error detection by comparing the output signals of redundant hardware in communications, e.g. transmission, interfaces

Definitions

  • TITLE MULTI-PROCESSOR SYSTEM BRIDGE WITH CONTROLLED ACCESS
  • This invention relates to a multi-processor system in which first and second processing sets (each of which may comprise one or more processors) communicate with an I/O device bus
  • the application finds particular application to fault tolerant computer systems where two or more processor sets need to communicate with an I/O device bus in lockstep with provision for identifying lockstep errors in order to detect faulty operation of the system as a whole
  • an aim is not only to be able to identify faults, but also to provide a structure which is able to provide a high degree of system availability
  • an aim of the present invention is to address these technical problems
  • a bridge for a multi-processor system. The bridge comprises a first processor bus interface for connection to an I/O bus of a first processing set, a second processor bus interface for connection to an I/O bus of a second processing set and a device bus interface for connection to a device bus. It also comprises a bridge control mechanism configured to be operable, in an operational mode, to permit access by at least one of the first and second processing sets to bridge resources and to the device bus and, in an error mode, to prevent access by the processing sets to the device bus and to permit restricted access by at least one of the processing sets to at least predetermined bridge resources
  • the bridge can act as a secure repository for information which can be used by the processing sets to investigate and diagnose the error and hopefully to recover therefrom
  • the processing sets can be prevented from corrupting devices connected to the device bus
  • bus interfaces referenced above need not be separate components of the bridge, but may be incorporated in other components of the bridge, and may indeed be simply connections for the lines of the buses concerned
  • the bridge control mechanism can be operable, in response to detection of an error state, to cause the bridge to cease operation in the operational mode and instead to operate in the error mode
  • error state registers can be provided for saving operating parameters on entry to the error mode, read only access to the error state registers being permitted by at least one processing set during the error mode
  • a posted write buffer can be provided for the storage of writes already posted by at least one processing set on entry to the error mode, read only access to the posted write buffer being permitted by at least one processing set during the error mode
  • the bridge control mechanism can be operable in an initial error mode to store in the posted write buffer any internal bridge write accesses initiated by the processing sets and to allow and to arbitrate any internal bridge read accesses initiated by the processing sets. It can also be operable in the initial mode to store in a posted write buffer any device bus write accesses initiated by the processing sets and to abort any device bus read accesses initiated by the processing sets
  • the bridge control mechanism can be operable to allow and to arbitrate any internal bridge write accesses initiated by the primary processing set, to discard any internal bridge write accesses initiated by any other processing set, and to allow and to arbitrate any internal bridge read accesses initiated by the processing sets. It can also be operable in this mode to discard any device bus write accesses initiated by the processing sets and to abort any device bus read accesses initiated by the processing sets
  • the primary processing set is a processing set which determines that it is operational, and not faulty, as a result of a fault analysis process. This allows any write accesses for the bridge or for the device bus which have already been posted by the processing sets to be saved during the initial error phase. Later write accesses to the device bus can be discarded as being erroneous
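
The access rules for the two error-mode phases described above can be summarised as a small decision function. This is only an illustrative sketch: the names `Mode`, `Target`, `Action` and `error_mode_action` are hypothetical and do not appear in the specification, and the second phase is called `PRIMARY_ERROR` here purely for convenience.

```python
from enum import Enum, auto


class Mode(Enum):
    INITIAL_ERROR = auto()   # before any processing set has asserted itself as primary
    PRIMARY_ERROR = auto()   # hypothetical name for the phase after a primary set exists


class Target(Enum):
    BRIDGE = auto()          # internal bridge resources (registers, buffers)
    DEVICE_BUS = auto()


class Action(Enum):
    STORE_IN_POSTED_WRITE_BUFFER = auto()
    ALLOW_AND_ARBITRATE = auto()
    DISCARD = auto()
    ABORT = auto()


def error_mode_action(mode, target, is_write, from_primary=False):
    """Return how the bridge treats an access while in an error mode."""
    if not is_write:
        # Reads: internal bridge reads are allowed, device bus reads are aborted.
        return Action.ALLOW_AND_ARBITRATE if target is Target.BRIDGE else Action.ABORT
    if mode is Mode.INITIAL_ERROR:
        # All writes, internal or device bus, are saved in the posted write buffer.
        return Action.STORE_IN_POSTED_WRITE_BUFFER
    # Later phase: only the primary set may write, and only to bridge resources.
    if target is Target.BRIDGE and from_primary:
        return Action.ALLOW_AND_ARBITRATE
    return Action.DISCARD
```

For example, a device bus write issued by the primary processing set in the later phase is still discarded, because device bus access is blocked for every processing set while the bridge is in an error mode.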
  • the bridge control mechanism can be further operable, in a split operational mode, to arbitrate between the first and the second processing sets for access to each other's I/O bus and to the device bus and, in a combined operational mode, to monitor lockstep operation of the first and second processing sets
  • the bridge control mechanism can be operable on power up of the bridge to operate in an initial error mode until a processor set asserts itself as a primary processing set, then in the split operational mode to enable all processing sets to be set to a corresponding state before transferring to the combined operational mode
  • the bridge can include a storage sub-system and a controllable routing matrix connected between the first processor bus interface, the second processor bus interface, the device bus interface and the storage sub-system, the bridge control mechanism being operable to control the routing matrix selectively to interconnect the first processor bus interface, the second processor bus interface, the device bus interface and the memory sub-system according to a current mode of operation
  • the bridge can include at least one further processor bus interface for connection to an I/O bus of a further processing set
  • a computer system comprising a first processing set having an I/O bus, a second processing set having an I/O bus, a device bus and a bridge, the bridge comprising a first processor bus interface connected to the I/O bus of the first processing set, a second processor bus interface connected to the I/O bus of the second processing set, a device bus interface connected to the device bus and a bridge control mechanism as described above.
  • a method of operating a multiprocessor system comprising a first processing set having an I/O bus, a second processing set having an I/O bus, a device bus and a bridge, the bridge comprising a first processor bus interface connected to the I/O bus of the first processing set, a second processor bus interface connected to the I/O bus of the second processing set and a device bus interface connected to the device bus, the method comprising selectively operating the bridge: in an operational mode to permit access by at least one of the first and second processing sets to bridge resources and to the device bus; and in an error mode to prevent access by the processing sets to the device bus and to permit restricted access by at least one of the processing sets to at least predetermined bridge resources.
  • Figure 2 is a schematic overview of a specific implementation of a system based on that of Figure 1;
  • Figure 3 is a schematic representation of one implementation of a processing set
  • Figure 4 is a schematic representation of another example of a processing set
  • Figure 5 is a schematic representation of a further processing set
  • Figure 6 is a schematic block diagram of an embodiment of a bridge for the system of Figure 1,
  • Figure 7 is a schematic block diagram of storage for the bridge of Figure 6;
  • Figure 8 is a schematic block diagram of control logic of the bridge of Figure 6;
  • Figure 9 is a schematic representation of a routing matrix of the bridge of Figure 6;
  • Figure 10 is an example implementation of the bridge of Figure 6;
  • Figure 11 is a state diagram illustrating operational states of the bridge of Figure 6;
  • Figure 12 is a flow diagram illustrating stages in the operation of the bridge of Figure 6,
  • Figure 13 is a detail of a stage of operation from Figure 12,
  • Figure 14 illustrates the posting of I/O cycles in the system of Figure 1
  • Figure 15 illustrates the data stored in a posted write buffer
  • Figure 16 is a schematic representation of a slot response register
  • Figure 17 illustrates a dissimilar data write stage
  • Figure 19 illustrates a dissimilar data read stage
  • Figure 20 illustrates an alternative dissimilar data read stage
  • Figure 21 is a flow diagram summarising the operation of a dissimilar data write mechanism
  • Figure 22 is a schematic block diagram explaining arbitration within the system of Figure 1
  • Figure 23 is a state diagram illustrating the operation of a device bus arbiter
  • Figure 24 is a state diagram illustrating the operation of a bridge arbiter
  • Figure 25 is a timing diagram for PCI signals
  • Figure 26 is a schematic diagram illustrating the operation of the bridge of Figure 6 for direct memory access
  • Figure 27 is a flow diagram illustrating a direct memory access method in the bridge of Figure 6
  • Figure 28 is a flow diagram of a re-integration process including the monitoring of a dirty RAM
  • Figure 1 is a schematic overview of a fault tolerant computing system 10 comprising a plurality of CPUsets (processing sets) 14 and 16 and a bridge 12. As shown in Figure 1, there are two processing sets 14 and 16, although in other embodiments there may be three or more processing sets. The bridge 12 forms an interface between the processing sets and I/O devices such as devices 28, 29, 30, 31 and 32
  • the term "processing set" is used to denote a group of one or more processors, possibly including memory, which output and receive common outputs and inputs
  • CPUset could be used instead, and that these terms could be used interchangeably throughout this document
  • the term "bridge" is used to denote any device, apparatus or arrangement suitable for interconnecting two or more buses of the same or different types
  • the first processing set 14 is connected to the bridge 12 via a first processing set I/O bus (PA bus) 24, in the present instance a Peripheral Component Interconnect (PCI) bus
  • the second processing set 16 is connected to the bridge 12 via a second processing set I/O bus (PB bus) 26 of the same type as the PA bus 24 (i.e. here a PCI bus)
  • the I/O devices are connected to the bridge 12 via a device I/O bus (D bus) 22, in the present instance also a PCI bus
  • although buses 22, 24 and 26 are all PCI buses, this is merely by way of example, and in other embodiments other bus protocols may be used and the D bus 22 may have a different protocol from that of the PA bus and the PB bus (P buses) 24 and 26
  • the processing sets 14 and 16 and the bridge 12 are operable in synchronism under the control of a common clock 20, which is connected thereto by clock signal lines 21
  • Some of the devices, including an Ethernet (E-NET) interface 28 and a Small Computer System Interface (SCSI) interface 29, are permanently connected to the device bus 22, but other I/O devices such as I/O devices 30, 31 and 32 can be hot insertable into individual switched slots 33, 34 and 35. Dynamic field effect transistor (FET) switching can be provided for the slots 33, 34 and 35 to enable hot insertability of the devices such as devices 30, 31 and 32
  • FET Dynamic field effect transistor
  • Figure 2 is a schematic overview of a particular implementation of a fault tolerant computer employing a bridge structure of the type illustrated in Figure 1
  • the fault tolerant computer system includes a plurality (here four) of bridges 12 on first and second I/O motherboards (MB 40 and MB 42) in order to increase the number of I/O devices which may be connected and also to improve reliability and redundancy
  • two processing sets 14 and 16 are each provided on a respective processing set board 44 and 46, with the processing set boards 44 and 46 'bridging' the I/O motherboards MB 40 and MB 42
  • a first, master clock source 20A is mounted on the first motherboard 40 and a second, slave clock source 20B is mounted on the second motherboard 42
  • Clock signals are supplied to the processing set boards 44 and 46 via respective connections (not shown in Figure 2)
  • First and second bridges 12.1 and 12.2 are mounted on the first I/O motherboard 40
  • the second bridge 12.2 is connected to the processing sets 14 and 16 by P buses 24.1 and 26.1, respectively
  • the bridge 12.1 is connected to an I/O databus (D bus) 22.1 and the bridge 12.2 is connected to an I/O databus (D bus)
  • Third and fourth bridges 12.3 and 12.4 are mounted on the second I/O motherboard 42
  • the bridge 12.3 is connected to the processing sets 14 and 16 by P buses 24.3 and 26.3, respectively
  • the bridge 12.4 is connected to the processing sets 14 and 16 by P buses 24.4 and 26.4, respectively
  • the bridge 12.3 is connected to an I/O databus (D bus) 22.3 and the bridge 12.4 is connected to an I/O databus (D bus) 22.4
  • the arrangement shown in Figure 2 can enable a large number of I/O devices to be connected to the two processing sets 14 and 16 via the D buses 22.1, 22.2, 22.3 and 22.4 for either increasing the range of I/O devices available, or providing a higher degree of redundancy, or both
  • Figure 3 is a schematic overview of one possible configuration of a processing set, such as the processing set 14 of Figure 1
  • the processing set 16 could have the same configuration
  • a plurality of processors (here four) 52 are connected by one or more buses 54 to a processing set bus controller 50
  • one or more processing set output buses 24 are connected to the processing set bus controller 50, each processing set output bus 24 being connected to a respective bridge 12
  • P bus processing set I/O bus
  • In Figure 3, individual processors operate using the common memory 56, and receive inputs and provide outputs on the common P bus(es) 24
  • Figure 4 is an alternative configuration of a processing set, such as the processing set 14 of Figure 1
  • a plurality of processor/memory groups 61 are connected to a common internal bus 64
  • Each processor/memory group 61 includes one or more processors 62 and associated memory 66 connected to an internal group bus 63
  • An interface 65 connects the internal group bus 63 to the common internal bus 64
  • individual processing groups, with each of the processors 62 and associated memory 66, are connected via a common internal bus 64 to a processing set bus controller 60
  • the interfaces 65 enable a processor 62 of one processing group to operate not only on the data in its local memory 66, but also in the memory of another processing group 61 within the processing set 14
  • the processing set bus controller 60 provides a common interface between the common internal bus 64 and the processing set I/O bus(es) (P bus(es)) 24 connected to the bridge(s) 12. It should be noted that although only
  • Figure 5 illustrates an alternative configuration of a processing set, such as the processing set 14 of Figure 1
  • a simple processing set includes a single processor 72 and associated memory 76 connected via a common bus 74 to a processing set bus controller 70
  • the processing set bus controller 70 provides an interface between the internal bus 74 and the processing set I/O bus(es) (P bus(es)) 24 for connection to the bridge(s) 12
  • the bridge(s) 12 are operable in a number of operating modes. These modes of operation will be described in more detail later. However, to assist in a general understanding of the structure of the bridge, the two operating modes will be briefly summarized here
  • a bridge 12 is operable to route addresses and data between the processing sets 14 and 16 (via the PA and PB buses 24 and 26, respectively) and the devices (via the D bus 22)
  • I/O cycles generated by the processing sets 14 and 16 are compared to ensure that both processing sets are operating correctly. Comparison failures force the bridge 12 into an error limiting mode (EState) in which device I/O is prevented and diagnostic information is collected
  • the bridge 12 routes and arbitrates addresses and data from one of the processing sets 14 and 16 onto the D bus 22 and/or onto the other one of the processing sets 16 and 14, respectively. In this mode of operation, the processing sets
  • Figure 6 is a schematic functional overview of the bridge 12 of Figure 1
  • First and second processing set I/O bus interfaces, PA bus interface 84 and PB bus interface 86, are connected to the PA and PB buses 24 and 26, respectively
  • a device I/O bus interface, D bus interface 82, is connected to the D bus 22
  • the PA, PB and D bus interfaces need not be configured as separate elements but could be incorporated in other elements of the bridge. Accordingly, within the context of this document, where a reference is made to a bus interface, this does not require the presence of a specific separate component, but rather the capability of the bridge to connect to the bus concerned, for example by means of physical or logical bridge connections for the lines of the buses concerned
  • Routing (hereinafter termed a routing matrix) 80 is connected via a first internal path 94 to the PA bus interface 84 and via a second internal path 96 to the PB bus interface 86. The routing matrix 80 is further connected via a third internal path 92 to the D bus interface 82. The routing matrix 80 is thereby able to provide I/O bus transaction routing in both directions between the PA and PB bus interfaces 84 and 86. It is also able to provide routing in both directions between one or both of the PA and PB bus interfaces and the D bus interface 82.
  • the routing matrix 80 is connected via a further internal path 100 to storage control logic 90
  • the storage control logic 90 controls access to bridge registers 110 and to a random access memory (SRAM) 126
  • SRAM random access memory
  • the routing matrix 80 is therefore also operable to provide routing in both directions between the PA, PB and D bus interfaces 84, 86 and 82 and the storage control logic 90.
  • the routing matrix 80 is controlled by bridge control logic 88 over control paths 98 and 99.
  • the bridge control logic 88 is responsive to control signals, data and addresses on internal paths 93, 95 and 97, and also to clock signals on the clock line(s) 21.
  • each of the P buses (PA bus 24 and PB bus 26) operates under a PCI protocol
  • the processing set bus controllers 50 also operate under the PCI protocol
  • the PA and PB bus interfaces 84 and 86 each provide all the functionality required for a compatible interface providing both master and slave operation for data transferred to and from the D bus 22 or internal memories and registers of the bridge in the storage subsystem 90
  • the bus interfaces 84 and 86 can provide diagnostic information to internal bridge status registers in the storage subsystem 90 on transition of the bridge to an error state (EState) or on detection of an I/O error
  • the device bus interface 82 performs all the functionality required for a PCI compliant master and slave interface for transferring data to and from one of the PA and PB buses 84 and 86
  • the D bus 82 is operable during direct memory access (DMA) transfers to provide diagnostic information to internal status registers in the storage subsystem 90 of the bridge on transition to an EState or on detection of an I/O error
  • Figure 7 illustrates in more detail the bridge registers 110 and the SRAM 124.
  • the storage control logic 110 is connected via a path (e.g. a bus) 112 to a number of register components 114, 116, 118, 120.
  • the storage control logic is also connected via a path (e.g. a bus) 128 to the SRAM 126 in which a posted write buffer component 122 and a dirty RAM component 124 are mapped.
  • these components may be configured in other ways, with other components defined as regions of a common memory (e.g. a random access memory such as the SRAM 126, with the path 112/128 being formed by the internal addressing of the regions of memory)
  • the posted write buffer 122 and the dirty RAM 124 are mapped to different regions of the SRAM memory 126, whereas the registers 114, 116, 118 and 120 are configured as separate from the SRAM memory.
  • Control and status registers (CSRs) 114 form internal registers which allow the control of various operating modes of the bridge, allow the capture of diagnostic information for an EState and for I/O errors, and control processing set access to PCI slots and devices connected to the D bus 22. These registers are set by signals from the routing matrix 80.
  • Dissimilar data registers (DDRs) 116 provide locations for containing dissimilar data for different processing sets to enable non-deterministic data events to be handled. These registers are set by signals from the PA and PB buses.
  • Bridge decode logic enables a common write to disable a data comparator and allow writes to two DDRs 116, one for each processing set 14 and 16. A selected one of the DDRs can then be read in-sync by the processing sets 14 and 16
  • the DDRs thus provide a mechanism enabling a location to be reflected from one processing set (14/16) to another (16/14)
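
As a rough illustration of the dissimilar data mechanism, the sketch below models two registers written by a single common write that bypasses comparison, followed by an in-sync read of one selected register. The class and method names are hypothetical, and the processing sets are simply labelled "A" and "B"; the real bridge does this in decode logic, not software.

```python
class DissimilarDataRegisters:
    """Toy model of a pair of DDRs, one per processing set."""

    def __init__(self):
        self.regs = {"A": None, "B": None}

    def common_write(self, value_from_a, value_from_b):
        # The two values may legitimately differ, so no comparison is made here.
        self.regs["A"] = value_from_a
        self.regs["B"] = value_from_b

    def read_in_sync(self, selected):
        # Both processing sets read the same selected register,
        # so they receive identical data and stay in lockstep.
        value = self.regs[selected]
        return value, value
```

After `common_write(0x1234, 0x5678)`, a call to `read_in_sync("A")` hands `0x1234` to both processing sets, which is the sense in which a location is reflected from one set to the other.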
  • SRRs Slot response registers
  • Disconnect registers 120 are used for the storage of data phases of an I/O cycle which is aborted while data is in the bridge on the way to another bus
  • the disconnect registers 120 receive all data queued in the bridge when a target device disconnects a transaction, or as the EState is detected
  • These registers are connected to the routing matrix 80
  • the routing matrix can queue up to three data words and byte enables. Provided the initial addresses are voted as being equal, address target controllers derive addresses which increment as data is exchanged between the bridge and the destination (or target)
  • a writer, for example a processor I/O write, or a DVMA (D bus to P bus access)
  • EState and error CSRs 114 provided for the capture of a failing cycle on the P buses 24 and 26, with an indication of the failing datum. Following a move to an EState, all of the writes initiated to the P buses are logged in the posted write buffer 122. These may be other writes that have been posted in the processing set bus controllers 50, or which may be initiated by software before an EState interrupt causes the processors to stop carrying out writes to the P buses 24 and 26
  • a dirty RAM 124 is used to indicate which pages of the main memory 56 of the processing sets 14 and
  • DMA direct memory access
  • Each page (e.g. each 8K page) is marked by a single bit in the dirty RAM 124 which is set when a DMA write occurs and can be cleared by a read and clear cycle initiated on the dirty RAM 124 by a processor 52 of a processing set 14 and 16
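
The one-bit-per-page bookkeeping can be sketched as follows. This is a minimal software model under stated assumptions: the 8K page size is the example figure given above, and the names `DirtyRam`, `record_dma_write` and `read_and_clear` are illustrative rather than taken from the specification.

```python
PAGE_SIZE = 8 * 1024  # example page size from the text (8K)


class DirtyRam:
    """Toy model: one dirty bit per page of processing set main memory."""

    def __init__(self, num_pages):
        self.bits = [False] * num_pages

    def record_dma_write(self, address):
        # A DMA write anywhere inside a page marks the whole page dirty.
        self.bits[address // PAGE_SIZE] = True

    def read_and_clear(self, page):
        # Return the current bit and clear it in the same operation.
        was_dirty = self.bits[page]
        self.bits[page] = False
        return was_dirty
```

With this model, a DMA write to address `8192 + 5` marks page 1; the first `read_and_clear(1)` reports the page as dirty and the second reports it clean.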
  • Figure 8 is a schematic functional overview of the bridge control logic 88 shown in Figure 6. All of the devices connected to the D bus 22 are addressed geographically. Accordingly, the bridge carries out decoding necessary to enable the isolating FETs for each slot before an access to those slots is initiated
  • the address decoding performed by the address decode logic 136 and 138 essentially permits four basic access types - an out-of-sync access (i.e. not in the combined mode) by one processing set (e.g. processing set 14 of
  • slot 0 on motherboard A has the same address when referred to by processing set 14 or by processing set 16
  • a single device select signal can be provided for the switched PCI slots as the FET signals can be used to enable a correct card.
  • Separate FET switch lines are provided to each slot for separately switching the FETs for the slots.
  • the SRRs 118, which could be incorporated in the CSR registers 114, are associated with the address decode functions.
  • the SRRs 118 serve in a number of different roles which will be described in more detail later. However, some of the roles are summarized here.
  • each slot may be disabled so that writes are simply acknowledged without any transaction occurring on the device bus 22, whereby the data is lost. Reads will return meaningless data, once again without causing a transaction on the device board.
  • each slot can be in one of three states
  • the states are:
  • a slot that is not owned by a processing set 14 or 16 making an access (this includes not owned or unowned slots) cannot be accessed. Accordingly, such an access is aborted.
  • When a processing set 14 or 16 is powered off, all slots owned by it move to the un-owned state. A processing set 14 or 16 can only claim an un-owned slot; it cannot wrest ownership away from another processing set. This can only be done by powering off the other processing set, or by getting the other processing set to relinquish ownership.
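
The ownership rules above can be captured in a short model. The three states themselves are not listed in this text, so the sketch assumes they are "un-owned", "owned by processing set A" and "owned by processing set B"; the `Owner` and `Slot` names are hypothetical.

```python
from enum import Enum


class Owner(Enum):
    UNOWNED = "unowned"
    SET_A = "A"   # assumed label for processing set 14
    SET_B = "B"   # assumed label for processing set 16


class Slot:
    """Toy model of the ownership state of one device slot."""

    def __init__(self):
        self.owner = Owner.UNOWNED

    def claim(self, who):
        # Only an un-owned slot can be claimed; ownership cannot be wrested away.
        if self.owner is Owner.UNOWNED:
            self.owner = who
        return self.owner is who

    def release(self, who):
        # Models both a voluntary relinquish and the owner being powered off.
        if self.owner is who:
            self.owner = Owner.UNOWNED

    def access_allowed(self, who):
        # An access by a set that does not own the slot is aborted.
        return self.owner is who
```

In this model a second processing set's `claim` returns `False` until the current owner calls `release`, mirroring the rule that ownership changes only through power-off or relinquishment.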
  • the ownership bits are accessible and settable while in the combined mode, but have no effect until a split state is entered. This allows the configuration of a split system to be determined while still in the combined mode
  • Each PCI device is allocated an area of the processing set address map. The top bits of the address are determined by the PCI slot.
  • the bridge is able to check that the device is using the correct address because a D bus arbiter informs the bridge which device is using the bus at a particular time. If a device access is a processing set address which is not valid for it, then the device access will be ignored. It should be noted that an address presented by a device will be a virtual address which would be translated by an I/O memory management unit in the processing set bus controller 50 to an actual memory address.
  • the addresses output by the address decoders are passed via the initiator and target controllers 138 and 140 to the routing matrix 80 via the lines 98 under control of a bridge controller 132 and an arbiter 134
  • An arbiter 134 is operable in various different modes to arbitrate for use of the bridge on a first-come-first-served basis using conventional PCI bus signals on the P and D buses.
  • the arbiter 134 is operable to arbitrate between the in-sync processing sets 14 and 16 and any initiators on the device bus 22 for use of the bridge 12. Possible scenarios are:
  • both processing sets 14 and 16 must arbitrate the use of the bridge and thus access to the device bus 22 and internal bridge registers (e.g. CSR registers 114).
  • the bridge 12 must also contend with initiators on the device bus 22 for use of that device bus 22.
  • Each slot on the device bus has an arbitration enable bit associated with it. These arbitration enable bits are cleared after reset and must be set to allow a slot to request a bus. When a device on the device bus 22 is suspected of providing an I/O error, the arbitration enable bit for that device is automatically reset by the bridge
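
The life cycle of the per-slot arbitration enable bits can be illustrated with a small bitmask model. The class and method names are hypothetical; the behaviour (cleared after reset, set to permit requests, cleared again on a suspected I/O error) follows the description above.

```python
class ArbitrationEnables:
    """Toy model of the per-slot arbitration enable bits."""

    def __init__(self):
        self.bits = 0  # all bits are clear after reset

    def reset(self):
        self.bits = 0

    def enable(self, slot):
        # Must be set before the slot is allowed to request the bus.
        self.bits |= 1 << slot

    def on_suspected_io_error(self, slot):
        # The bridge clears the bit for a device suspected of an I/O error.
        self.bits &= ~(1 << slot)

    def may_request_bus(self, slot):
        return bool((self.bits >> slot) & 1)
```

A slot therefore cannot request the bus straight after reset, can do so once `enable` has been called for it, and loses that ability again if the bridge flags it via `on_suspected_io_error`.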
  • a PCI bus interface in the processing set bus controller(s) 50 expects to be the master bus controller for the P bus concerned, that is it contains the PCI bus arbiter for the PA or PB bus to which it is connected
  • the bridge 12 cannot directly control access to the PA and PB buses 24 and 26.
  • the bridge 12 competes for access to the PA or PB bus with the processing set on the bus concerned under the control of the bus controller 50 on the bus concerned.
  • Also shown in Figure 8 are a comparator 130 and a bridge controller 132.
  • the comparator 130 is operable to compare I/O cycles from the processing sets 14 and 16 to determine any out-of-sync events. On determining an out-of-sync event, the comparator 130 is operable to cause the bridge controller 132 to activate an EState for analysis of the out-of-sync event and possible recovery therefrom.
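
The comparator's role can be sketched in a few lines. The fields of `IOCycle` and the `BridgeController` interface are assumptions made for illustration; the text only states that I/O cycles from the two processing sets are compared and that a mismatch causes the bridge controller to activate the EState.

```python
from typing import NamedTuple


class IOCycle(NamedTuple):
    # Hypothetical fields; the text does not enumerate what is compared.
    address: int
    is_write: bool
    data: int


class BridgeController:
    def __init__(self):
        self.in_estate = False

    def enter_estate(self):
        self.in_estate = True


def compare_cycles(cycle_a, cycle_b, controller):
    """Return True if the cycles match; otherwise trigger the EState."""
    if cycle_a != cycle_b:
        controller.enter_estate()
        return False
    return True
```

Two identical cycles leave the controller untouched, while any difference in address, direction or data in this model drives it into the EState.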
  • Figure 9 is a schematic functional overview of the routing matrix 80.
  • the routing matrix 80 comprises a multiplexer 143 which is responsive to initiator control signals 98 from the initiator controller 138 of Figure 8 to select one of the PA bus path 94, PB bus path 96, D bus path 92 or internal bus path 100 as the current input to the routing matrix
  • Separate output buffers 144, 145, 146 and 147 are provided for output to each of the paths 94, 96, 92 and 100, with those buffers being selectively enabled by signals 99 from the target controller 140 of Figure 8
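
Functionally, the routing matrix selects one input path and drives it onto whichever output buffers are enabled. The sketch below models that behaviour only; the path labels and the `route` function are hypothetical, and timing and pipelining are ignored.

```python
PATHS = ("PA", "PB", "D", "INTERNAL")  # illustrative labels for paths 94, 96, 92 and 100


def route(inputs, selected_input, enabled_outputs):
    """Select one input (the multiplexer) and copy it to the enabled output buffers."""
    value = inputs[selected_input]
    return {path: value for path in PATHS if path in enabled_outputs}
```

For instance, selecting the `"PA"` input with the `"D"` and `"INTERNAL"` outputs enabled delivers the PA value to both of those paths and to nothing else.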
  • a buffer 149. In the present embodiment three cycles of data for an I/O cycle will be held in the pipeline represented by the multiplexer 143, the buffer 149 and the buffers 144.
  • Figure 10 is a schematic representation of a physical configuration of the bridge in which the bridge control logic 88, the storage control logic 90 and the bridge registers 110 are implemented in a first field programmable gate array (FPGA) 89, the routing matrix 80 is implemented in further FPGAs 80.1 and 80.2 and the SRAM 126 is implemented as one or more separate SRAMs addressed by address control lines 127
  • the bus interfaces 82, 84 and 86 shown in Figure 6 are not separate elements, but are integrated in the FPGAs 80.1, 80.2 and 89
  • Two FPGAs 80.1 and 80.2 are used for the upper 32 bits 32-63 of a 64 bit PCI bus and the lower 32 bits 0-31 of the 64 bit PCI bus
  • FIG. 11 is a transition diagram illustrating in more detail the various operating modes of the bridge.
  • The bridge operation can be divided into three basic modes, namely an error state (EState) mode 150, a split state mode 156 and a combined state mode 158.
  • The EState mode 150 can be further divided into two states.
  • The bridge is in this initial EState 152.
  • All writes are stored in the posted write buffer 120 and reads from the internal bridge registers (e.g., the CSR registers 116) are allowed, and all other reads are treated as errors (i.e. they are aborted).
  • The individual processing sets 14 and 16 perform evaluations for determining a restart time.
  • Each processing set 14 and 16 will determine its own restart timer timing.
  • The timer setting depends on a "blame" factor for the transition to the EState. A processing set which determines that it is likely to have caused the error sets a long time for the timer. A processing set which thinks it unlikely to have caused the error sets a short time for the timer.
  • The first processing set 14 and 16 which times out becomes a primary processing set. Accordingly, when this is determined, the bridge
  • The bridge then moves (155) to the split state 156. In the split state 156, access to the device bus 22 is controlled by the SRR registers 118 while access to the bridge storage is simply arbitrated. The primary status of the processing sets 14 and 16 is ignored. Transition to a combined operation is achieved by means of a sync_reset (157). After issue of the sync_reset operation, the bridge is then operable in the combined state 158, whereby all read and write accesses on the D bus 22 and the PA and PB buses 24 and 26 are allowed. All such accesses on the PA and PB buses 24 and 26 are compared in the comparator 130. Detection of a mismatch between any read and write cycles (with an exception of specific dissimilar data I/O cycles) causes a transition 151 to the EState 150.
  • The various states described are controlled by the bridge controller 132.
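The mode transitions described above can be sketched as a small table-driven state machine. This is only an illustrative model: the event names, the `PRIMARY_ESTATE` label for the second EState sub-state, and the `next_state` helper are assumptions made for the sketch and do not come from the description.

```python
from enum import Enum, auto


class BridgeState(Enum):
    INITIAL_ESTATE = auto()  # initial EState 152
    PRIMARY_ESTATE = auto()  # assumed label: EState once a primary set exists
    SPLIT = auto()           # split state 156
    COMBINED = auto()        # combined state 158


# Allowed transitions. The event names are hypothetical labels for this sketch.
TRANSITIONS = {
    (BridgeState.COMBINED, "mismatch"): BridgeState.INITIAL_ESTATE,
    (BridgeState.INITIAL_ESTATE, "primary_claimed"): BridgeState.PRIMARY_ESTATE,
    (BridgeState.PRIMARY_ESTATE, "enter_split"): BridgeState.SPLIT,
    (BridgeState.SPLIT, "sync_reset"): BridgeState.COMBINED,
}


def next_state(state, event):
    """Return the state that follows, or the same state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Events that have no entry for the current state leave the state unchanged, which keeps the sketch total without claiming anything about how the real bridge controller treats unexpected events.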
  • The role of the comparator 130 is to monitor and compare I/O operations on the PA and PB buses in the combined state 158 and, in response to a mismatched signal, to notify the bridge controller 132, whereby the bridge controller 132 causes the transition 151 to the error state 150.
  • The I/O operations can include all I/O operations initiated by the processing sets, as well as DMA transfers in respect of DMA initiated by a device on the device bus.
  • The system is in the initial EState 152.
  • Neither of the processing sets 14 and 16 can access the D bus 22 or the P bus 26 or 24 of the other processing set 16 or 14.
  • The internal bridge registers 116 of the bridge are accessible, but are read only.
  • A system running in the combined mode 158 transitions to the EState 150 where there is a comparison failure detected in this bridge, or alternatively a comparison failure is detected in another bridge in a multi-bridge system as shown, for example, in Figure 2.
  • Transitions to an EState 150 can occur in other situations, for example in the case of a software controlled event forming part of a self test operation.
  • An interrupt is signaled to all or a subset of the processors of the processing sets via an interrupt line 95.
  • All I/O cycles generated on a P bus 24 or 26 result in reads being returned with an exception and writes being recorded in the posted write buffer.
  • The operation of the comparator 130 will now be described in more detail.
  • The comparator is connected to paths 94, 95, 96 and 97 for comparing address, data and selected control signals from the PA and PB bus interfaces 84 and 86.
  • A failed comparison of in-sync accesses to device I/O bus 22 devices causes a move from the combined state 158 to the EState 150.
  • The address, command, address parity, byte enables and parity error parameters are compared. If the comparison fails during the address phase, the bridge asserts a retry to the processing set bus controllers 50, which prevents data leaving the I/O bus controllers 50. No activity occurs in this case on the device I/O bus 22. On the processor(s) retrying, no error is returned.
  • The bridge signals a target-abort to the processing set bus controllers 50.
  • An error is returned to the processors.
  • The address, command, parity, byte enables and data parameters are compared.
  • The bridge asserts a retry to the processing set bus controllers 50, which results in the processing set bus controllers 50 retrying the cycle.
  • The posted write buffer 122 is then active. No activity occurs on the device I/O bus 22.
  • Errors fall roughly into two types: those which are made visible to the software by the processing set bus controller 50, and those which are not made visible by the processing set bus controller 50 and hence need to be made visible by an interrupt from the bridge 12. Accordingly, the bridge is operable to capture errors reported in connection with processing set read and write cycles, and DMA reads and writes.
  • FIG. 12 is a flow diagram illustrating a possible sequence of operating stages where lockstep errors are detected during a combined mode of operation.
  • Stage S1 represents the combined mode of operation where lockstep error checking is performed by the comparator 130 shown in Figure 8.
  • In Stage S2, a lockstep error is assumed to have been detected by the comparator 130.
  • In Stage S3, the current state is saved in the CSR registers 114 and posted writes are saved in the posted write buffer 122 and/or in the disconnect registers 120.
  • FIG. 13 illustrates Stage S3 in more detail. Accordingly, in Stage S31
  • The bridge controller 132 detects whether the lockstep error notified by the comparator 130 has occurred during a data phase in which it is possible to pass data to the device bus 22.
  • The bus cycle is terminated.
  • The data phases are stored in the disconnect registers 120 and control then passes to Stage S35 where an evaluation is made as to whether a further I/O cycle needs to be stored.
  • The address and data phases for any posted write I/O cycles are stored in the posted write buffer 122.
  • In Stage S34, if there are any further posted write I/O operations pending, these are also stored in the posted write buffer 122.
  • Stage S3 is performed at the initiation of the initial error state 152 shown in Figure 11. In this state, the first and second processing sets arbitrate for
  • I/O write cycles may already have been posted by the processors 52, either in their own buffers 162, or already transferred to the buffers 160 of the processing set bus controllers 50. It is the I/O write cycles in the buffers 162 and 160 which gradually propagate through and need to be stored in the posted write buffer 122.
  • A write cycle 164 posted to the posted write buffer 122 can comprise an address field 165 including an address and an address type, and between one and 16 data fields 166 including a byte enable field and the data itself.
  • The data is written into the posted write buffer 122 in the EState unless the initiating processing set has been designated as a primary CPU set. At that time, non-primary writes in an EState still go to the posted write buffer even after one of the CPU sets has become a primary processing set.
  • An address pointer in the CSR registers 114 points to the next available posted write buffer address, and also provides an overflow bit which is set when the bridge attempts to write past the top of the posted write buffer for any one of the processing sets 14 and 16. Indeed, in the present implementation, only the first 16 K of data is recorded in each buffer. Attempts to write beyond the top of the posted write buffer are ignored.
  • The value of the posted write buffer pointer can be cleared at reset, or by software using a write under the control of a primary processing set.
  • The individual processing sets independently seek to evaluate the error state and to determine whether one of the processing sets is faulty. This determination is made by the individual processors in an error state in which they individually read status from the control state and EState registers 114. During this error mode, the arbiter 134 arbitrates for access to the bridge 12.
  • In Stage S5, one of the processing sets 14 and 16 establishes itself as the primary processing set. This is determined by each of the processing sets identifying a time factor based on the estimated degree of responsibility for the error, whereby the first processing set to time out becomes the primary processing set. In Stage S5, the status is recovered for that processing set and is copied to the other processing set. The primary processing set is able to access the posted write buffer 122 and the disconnect registers 120.
  • In Stage S6, the bridge is operable in a split mode. If it is possible to re-establish an equivalent status for the first and second processing sets, then a reset is issued at Stage S7 to put the processing sets in the combined mode at Stage S1. However, it may not be possible to re-establish an equivalent state until a faulty processing set is replaced. Accordingly, the system will stay in the split mode of Stage S6 in order to continue operation based on a single processing set. After replacing the faulty processing set, the system could then establish an equivalent state and move via Stage S7 to Stage S1.
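The blame-based selection of the primary processing set in Stage S5 can be illustrated with a short sketch. The linear mapping from a blame estimate to a timer value, the default durations and the function names are all assumptions chosen for illustration; the description only states that a likely culprit sets a long time and an unlikely culprit sets a short time.

```python
def restart_timeout(blame, short=1.0, long=10.0):
    """Map a blame estimate in [0, 1] to a restart-timer duration (assumed linear)."""
    if not 0.0 <= blame <= 1.0:
        raise ValueError("blame must be between 0 and 1")
    return short + blame * (long - short)


def elect_primary(blame_by_set):
    """Return the name of the processing set whose restart timer expires first."""
    return min(blame_by_set, key=lambda name: restart_timeout(blame_by_set[name]))
```

For example, `elect_primary({"A": 0.9, "B": 0.1})` selects `"B"`, the set that considers itself less likely to have caused the error. Tie handling is not addressed in the text; this sketch simply returns the first name encountered.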
  • The comparator 130 is operable in the combined mode to compare the I/O operations output by the first and second processing sets 14 and 16. This is fine as long as all of the I/O operations of the first and second processing sets 14 and 16 are fully synchronized and deterministic. Any deviation from this will be interpreted by the comparator 130 as a loss of lockstep.
  • A solution to this problem employs the dissimilar data registers (DDR) 116 mentioned earlier.
  • The solution is to write data from the processing sets into respective DDRs in the bridge while disabling the comparison of the data phases of the write operations and then to read a selected one of the DDRs back to each processing set, whereby each of the processing sets is able to act on the same data.
  • Figure 17 is a schematic representation of details of the bridge of Figures 6 to 10. It will be noted that details of the bridge not shown in Figures 6 to 8 are shown in Figure 17, whereas other details of the bridge shown in Figures 6 to 8 are not shown in Figure 17, for reasons of clarity.
  • The DDRs 116 are provided in the bridge registers 110 of Figure 7, but could be provided elsewhere in the bridge in other embodiments.
  • One DDR 116 is provided for each processing set.
  • Two DDRs 116A and 116B are provided, one for each of the first and second processing sets 14 and 16, respectively.
  • Figure 17 represents a dissimilar data write stage.
  • The addressing logic 136 is shown schematically to comprise two decoder sections, one decoder section 136A for the first processing set and one decoder section 136B for the second processing set 16. During an address phase of a dissimilar data I/O write operation each of the processing sets 14 and 16 outputs the same predetermined address DDR-W which is separately interpreted by the respective first and second decoding sections 136A and 136B as addressing the respective first and second DDRs 116A and 116B. As the same address is output by the first and second processing sets 14 and 16, this is not interpreted by the comparator 130 as a lockstep error.
  • The decoding section 136A, or the decoding section 136B, or both are arranged to further output a disable signal 137 in response to the predetermined write address supplied by the first and second processing sets 14 and 16.
  • This disable signal is supplied to the comparator 130 and is operative during the data phase of the write operation to disable the comparator.
  • The data output by the first processing set can be stored in the first DDR 116A and the data output by the second processing set can be stored in the second DDR 116B without the comparator being operative to detect a difference, even if the data from the first and second processing sets is different.
  • The first decoding section is operable to cause the routing matrix to store the data from the first processing set 14 in the first DDR 116A.
  • The second decoding section is operable to cause the routing matrix to store the data from the second processing set 16 in the second DDR 116B.
  • The comparator 130 is once again enabled to detect any differences between I/O address and/or data phases as indicative of a lockstep error. Following the writing of the dissimilar data to the first and second DDRs 116A and 116B, the processing sets are then operable to read the data from a selected one of the DDRs 116A/116B.
  • Figure 18 illustrates an alternative arrangement where the disable signal 137 is negated and is used to control a gate 131 at the output of the comparator 130. When the disable signal is active the output of the comparator is disabled, whereas when the disable signal is inactive the output of the comparator is enabled.
  • FIG. 19 illustrates the reading of the first DDR 116A in a subsequent dissimilar data read stage.
  • Each of the processing sets 14 and 16 outputs the same predetermined address DDR-RA which is separately interpreted by the respective first and second decoding sections 136A and 136B as addressing the same DDR, namely the first DDR 116A.
  • The content of the first DDR 116A is read by both of the processing sets 14 and 16, thereby enabling those processing sets to receive the same data.
  • This enables the two processing sets 14 and 16 to achieve deterministic behavior, even if the source of the data written into the DDRs 116 by the processing sets 14 and 16 was not deterministic.
  • The processing sets could each read the data from the second DDR 116B.
  • Figure 20 illustrates the reading of the second DDR 116B in a dissimilar data read stage following the dissimilar data write stage of Figure 15.
  • Each of the processing sets 14 and 16 outputs the same predetermined address DDR-RB which is separately interpreted by the respective first and second decoding sections 136A and 136B as addressing the same DDR, namely the second DDR 116B.
  • The content of the second DDR 116B is read by both of the processing sets 14 and 16, thereby enabling those processing sets to receive the same data.
  • This enables the two processing sets 14 and 16 to achieve deterministic behavior, even if the source of the data written into the DDRs 116 by the processing sets 14 and 16 was not deterministic.
  • The selection of which of the first and second DDRs 116A and 116B is to be read can be determined in any appropriate manner by the software operating on the processing modules. This could be done on the basis of a simple selection of one or the other DDRs, or on a statistical basis or randomly or in any other manner as long as the same choice of DDR is made by both or all of the processing sets.
  • Figure 21 is a flow diagram summarizing the various stages of operation of the DDR mechanism described above.
  • In stage S10, a DDR write address DDR-W is received and decoded by the address decoder sections 136A and 136B during the address phase of the DDR write operation.
  • In stage S11, the comparator 130 is disabled.
  • In stage S12, the data received from the processing sets 14 and 16 during the data phase of the DDR write operation is stored in the first and second DDRs 116A and 116B, respectively, as selected by the first and second decode sections 136A and 136B, respectively.
  • In stage S13, a DDR read address is received from the first and second processing sets and is decoded by the decode sections 136A and 136B, respectively.
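The dissimilar data write and read stages can be modelled in a few lines. The sketch below is a behavioural illustration only: the numeric values chosen for DDR-W, DDR-RA and DDR-RB, the `DdrBridge` class and the `LockstepError` exception are hypothetical and are not taken from the description.

```python
DDR_W, DDR_RA, DDR_RB = 0x10, 0x14, 0x18  # hypothetical address values


class LockstepError(Exception):
    """Raised when the processing sets disagree while comparison is active."""


class DdrBridge:
    def __init__(self):
        self.ddr_a = None  # models DDR 116A
        self.ddr_b = None  # models DDR 116B

    def write(self, addr_a, addr_b, data_a, data_b):
        # The address phase is still compared: both sets must present the same address.
        if addr_a != addr_b:
            raise LockstepError("address phase mismatch")
        if addr_a != DDR_W:
            raise ValueError("not a dissimilar data write address")
        # Data phase: the comparator is disabled, so differing data is accepted.
        self.ddr_a, self.ddr_b = data_a, data_b

    def read(self, addr_a, addr_b):
        if addr_a != addr_b:
            raise LockstepError("address phase mismatch")
        if addr_a == DDR_RA:
            value = self.ddr_a
        elif addr_a == DDR_RB:
            value = self.ddr_b
        else:
            raise ValueError("not a dissimilar data read address")
        # Both processing sets receive the same value, restoring determinism.
        return value, value
```

After `write(DDR_W, DDR_W, 111, 222)`, a read at `DDR_RA` hands 111 to both sets and a read at `DDR_RB` hands 222 to both sets, so either choice works provided both sets make the same one.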
  • FIG. 22 is a schematic representation of the arbitration performed on the respective buses 22, 24 and 26, and the arbitration for the bridge itself.
  • Each of the processing set bus controllers 50 in the respective processing sets 14 and 16 includes a conventional PCI master bus arbiter 180 for providing arbitration to the respective buses 24 and 26.
  • Each of the master arbiters 180 is responsive to request signals from the associated processing set bus controller 50 and the bridge 12 on respective request (REQ) lines 181 and 182.
  • The master arbiters 180 allocate access to the bus on a first-come-first-served basis, issuing a grant (GNT) signal to the winning party on an appropriate grant line 183 or 184.
  • A conventional PCI bus arbiter 185 provides arbitration on the D bus 22.
  • The D bus arbiter 185 can be configured as part of the D bus interface 82 of Figure 6 or could be separate therefrom.
  • The D bus arbiter is responsive to request signals from the contending devices, including the bridge and the devices 30, 31, etc. connected to the device bus 22.
  • Respective request lines 186, 187, 188, etc. for each of the entities competing for access to the D bus 22 are provided for the request signals (REQ).
  • The D bus arbiter 185 allocates access to the D bus on a first-come-first-served basis, issuing a grant (GNT) signal to the winning entity via respective grant lines 189, 190, 192, etc.
  • Figure 23 is a state diagram summarising the operation of the D bus arbiter 185.
  • Up to six request signals may be produced by respective D bus devices and one by the bridge itself.
  • These are sorted by a priority encoder and a request signal (REQ#) with the highest priority is registered as the winner and gets a grant (GNT#) signal.
  • REQ# request signal
  • GNT# grant
  • Each winner which is selected modifies the priorities in a priority encoder so that, given the same REQ# signals on the next move to GRANT, a different device has the highest priority; hence each device has a "fair" chance of accessing DEVs.
  • The bridge REQ# has a higher weighting than D bus devices and will, under very busy conditions, get the bus for every second device.
  • If a device requesting the bus fails to perform a transaction within 16 cycles, it may lose GNT# via the BACKOFF state. BACKOFF is required as, under PCI rules, a device may access the bus one cycle after GNT# is removed. Devices may only be granted access to the D bus if the bridge is not in the EState. A new GNT# is produced at the times when the bus is idle.
  • A priority encoder can be provided to resolve access attempts which collide. In the case of a collision, the loser or losers are retried, which forces them to give up the bus. Under PCI rules retried devices must try repeatedly to access the bridge and this can be expected to happen.
  • The bridge arbiter 134 is responsive to standard PCI signals provided on standard PCI control lines 22, 24 and 25 to control access to the bridge 12.
  • Figure 25 illustrates signals associated with an I/O operation cycle on the PCI bus.
  • A PCI frame signal (FRAME#) is initially asserted.
  • Address (A) signals will be available on the DATA BUS and the appropriate command (write/read) signals (C) will be available on the command bus (CMD BUS).
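The rotating-priority behaviour of the D bus arbiter, in which each winner hands top priority to a different requester, can be sketched as follows. The `RotatingArbiter` class is a hypothetical model: it deliberately leaves out the higher weighting of the bridge REQ#, the BACKOFF state and the EState gating, and it assumes priority simply moves to the requester after the last winner.

```python
class RotatingArbiter:
    """Grant one requester at a time, rotating the highest priority after each grant."""

    def __init__(self, n_requesters):
        self.n = n_requesters
        self.top = 0  # index that currently holds the highest priority

    def grant(self, requests):
        """requests: set of requester indices asserting REQ#. Returns the winner or None."""
        for offset in range(self.n):
            candidate = (self.top + offset) % self.n
            if candidate in requests:
                # The winner drops to the lowest priority for the next arbitration.
                self.top = (candidate + 1) % self.n
                return candidate
        return None
```

With three requesters all asserting REQ# continuously, successive grants go to 0, 1, 2 and then 0 again, which is the "fair" chance described above.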
  • C command bus
  • IRDY# the initiator ready signal
  • DEVSEL# device selected signal
  • TRDY# data transfer
  • The bridge is operable to allocate access to the bridge resources and thereby to negotiate allocation of a target bus in response to the FRAME# being asserted low for the initiator bus concerned.
  • The bridge arbiter 134 is operable to allocate access to the bridge resources and/or to a target bus on a first-come-first-served basis in response to the FRAME# being asserted low.
  • The arbiters may be additionally provided with a mechanism for logging the arbitration requests, and can imply a conflict resolution based on the request and allocation history where two requests are received at an identical time.
  • A simple priority can be allocated to the various requesters, whereby, in the case of identically timed requests, a particular requester always wins the allocation process.
  • Each of the slots on the device bus 22 has a slot response register (SRR) 118, as well as other devices connected to the bus, such as a SCSI interface.
  • SRR slot response register
  • Each of the SRRs 118 contains bits defining the ownership of the slots, or the devices connected to the slots on the direct memory access bus.
  • Each SRR 118 comprises a four bit register.
  • A larger register will be required to determine ownership between more than two processing sets. For example, if three processing sets are provided, then a five bit register will be required for each slot.
  • Figure 16 illustrates schematically one such four bit register 600. As shown in Figure 16, a first bit 602 is identified as SRR[0], a second bit 604 is identified as SRR[1], a third bit 606 is identified as SRR[2] and a fourth bit 608 is identified as SRR[3].
  • Bit SRR[0] is a bit which is set when writes for valid transactions are to be suppressed.
  • Bit SRR[1] is set when the device slot is owned by the first processing set 14. This defines the access route between the first processing set 14 and the device slot. When this bit is set, the first processing set 14 can always be master of a device slot 22, while the ability for the device slot to be master depends on whether bit
  • Bit SRR[2] is set when the device slot is owned by the second processing set 16. This defines the access route between the second processing set 16 and the device slot.
  • Bit SRR[3] is an arbitration bit which gives the device slot the ability to become master of the device bus 22, but only if it is owned by one of the processing sets 14 and 16, that is if one of the SRR[1] and SRR[2]
  • When the fake bit (SRR[0]) of an SRR 118 is set, writes to the device for that slot are ignored and do not appear on the device bus 22. Reads return indeterminate data without causing a transaction on the device bus 22.
  • The fake bit SRR[0] of the SRR 118 corresponding to the device which caused the error is set by the hardware configuration of the bridge to disable further access to the device slot concerned.
  • An interrupt may also be generated by the bridge to inform the software which originated the access leading to the I/O error that the error has occurred.
  • The fake bit has an effect whether the system is in the split or the combined mode of operation.
  • Each slot can be in three states: Not-owned;
  • A slot which is not owned by the processing set making the access (this includes un-owned slots) cannot be accessed and results in an abort.
  • A processing set can only claim an un-owned slot; it cannot wrest ownership away from another processing set. This can only be done by powering off the other processing set. When a processing set is powered off, all slots owned by it move to the un-owned state. Whilst it is not possible for a processing set to wrest ownership from another processing set, it is possible for a processing set to give ownership to another processing set.
  • The owned bits can be altered when in the combined mode of operation state but they have no effect until the split mode is entered.
  • Table 2 summarizes the access rights as determined by an SRR 118.
  • The setting of SRR[2] logic high indicates that the device is owned by processing set B.
  • SRR[3] is set logic low and the device is not allowed access to the processing set. SRR[0] is set high so that any writes to the device are ignored and reads therefrom return indeterminate data.
  • The malfunctioning device is effectively isolated from the processing set, and provides indeterminate data to satisfy any device drivers, for example, that might be looking for a response from the device.
  • Figure 26 illustrates the operation of the bridge 12 for direct memory access by a device such as one of the devices 28, 29, 30, 31 and 32 to the memory 56 of the processing sets 14 and 16.
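The four SRR bits can be decoded with simple masks. The sketch below models split-mode behaviour only; the constant and function names are hypothetical, and the order in which the ownership check and the fake bit are evaluated is an assumption, since the text does not state which takes precedence.

```python
# Bit masks for SRR[0]..SRR[3]; the constant names are hypothetical.
FAKE, OWNED_A, OWNED_B, ARB_ENABLE = 0b0001, 0b0010, 0b0100, 0b1000


def slot_owner(srr):
    """Return 'A', 'B' or None according to the ownership bits SRR[1] and SRR[2]."""
    if srr & OWNED_A:
        return "A"
    if srr & OWNED_B:
        return "B"
    return None


def device_may_master(srr):
    """A slot may master the device bus only if SRR[3] is set and the slot is owned."""
    return bool(srr & ARB_ENABLE) and slot_owner(srr) is not None


def processing_set_write(srr, requester):
    """Classify a split-mode write to the slot as 'abort', 'ignored' or 'forwarded'."""
    if slot_owner(srr) != requester:
        return "abort"      # slot not owned by the requester, including un-owned slots
    if srr & FAKE:
        return "ignored"    # fake bit set: the write never reaches the device bus
    return "forwarded"
```

The isolation example above corresponds to the value `0b0101`: owned by processing set B, arbitration disabled, fake bit set.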
  • DMA direct memory access
  • The address decode logic 142 holds or has access to a geographic address map 196, which identifies the relationship between the processor address space and the slots as a result of the geographic address employed.
  • This geographic address map 196 could be held as a table in the bridge memory 126, along with the posted write buffer 122 and the dirty RAM 124. Alternatively, it could be held as a table in a separate memory element, possibly forming part of the address decoder 142 itself.
  • The map 196 could be configured in a form other than a table.
  • The address decode logic 142 is configured to verify the correctness of the DMA addresses supplied by the device 30. In one embodiment of the invention, this is achieved by comparing four significant address bits of the address supplied by the device 30 with the corresponding four address bits of the address held in the geographic addressing map 196 for the slot identified by the D bus grant signal for the DMA request. In this example, four address bits are sufficient to determine whether the address supplied is within the correct address range.
  • 32 bit PCI bus addresses are used, with bits 31 and 30 always being set to 1, bit 29 being allocated to identify which of two bridges on a motherboard is being addressed (see Figure 2) and bits 28 to 26 identifying a PCI device. Bits 25-0 define an offset from the base address for the address range for each slot. Accordingly, by comparing bits 29-26, it is possible to identify whether the address(es) supplied fall(s) within the appropriate address range for the slot concerned. It will be appreciated that in other embodiments a different number of bits may need to be compared to make this determination depending upon the allocation of the addresses.
  • The address decode logic 142 could be arranged to use the bus grant signal 184 for the slot concerned to identify a table entry for the slot concerned and then to compare the address in that entry with the address(es) received with the DMA request as described above. Alternatively, the address decode logic 142 could be arranged to use the address(es) received with the DMA address to address a relational geographic address map and to determine a slot number therefrom, which could be compared to the slot for which the bus grant signal 194 is intended and thereby to determine whether the addresses fall within the address range appropriate for the slot concerned.
  • The address decode logic 142 is arranged to permit DMA to proceed if the DMA addresses fall within the expected address space for the slot concerned. Otherwise, the address decoder is arranged to ignore the slots and the physical addresses.
  • The address decode logic 142 is further operable to control the routing of the DMA request to the appropriate processing set(s) 14/16. If the bridge is in the combined mode, the DMA access will automatically be allocated to all of the in-sync processing sets 14/16. The address decode logic 142 will be aware that the bridge is in the combined mode as it is under the control of the bridge controller 132 (see Figure 8). However, where the bridge is in the split mode, a decision will need to be made as to which, if any, of the processing sets the DMA request is to be sent.
  • The address decode logic 142 is operable to determine the ownership of the device originating the DMA request by accessing the SRR 118 for the slot concerned.
  • The appropriate slot can be identified by the D bus grant signal.
  • The address decode logic 142 is operable to control the target controller 140 (see Figure 8) to pass the DMA request to the appropriate processing set(s) 14/16 based on the ownership bits SRR[1] and SRR[2]. If bit SRR[1] is set, the first processing set 14 is the owner and the DMA request is passed to the first processing set. If bit SRR[2] is set, the second processing set 16 is the owner and the DMA request is passed to the second processing set. If neither of the bits SRR[1] and SRR[2] is set, then the DMA request is ignored by the address decoder and is not passed.
  • FIG. 27 is a flow diagram summarizing the DMA verification process as illustrated with reference to Figure 24.
  • The D-bus arbiter 160 arbitrates for access to the D bus 22.
  • In stage S21, the address decoder 142 verifies the DMA addresses supplied with the DMA request by accessing the geographic address map.
  • In stage S22, the address decoder ignores the DMA access where the address falls outside the expected range for the slot concerned.
  • The actions of the address decoder are dependent upon whether the bridge is in the combined or the split mode.
  • In stage S24, the address decoder controls the target controller 140 (see Figure 8) to cause the routing matrix 80 (see Figure 6) to pass the DMA request to both processing sets 14 and 16. If the bridge is in the split mode, the address decoder is operative to verify the ownership of the slot concerned by reference to the SRR 118 for that slot in stage S25.
  • The address decoder 142 controls the target controller 140 (see Figure 8) to cause the routing matrix 80 (see Figure 6) to pass the DMA request to the first processing set 14. If the slot is allocated to the second processing set 16 (i.e. the SRR[2] bit is set), then in stage S27 the address decoder 142 controls the target controller 140 (see Figure 8) to cause the routing matrix 80 (see Figure 6) to pass the DMA request to the second processing set 16.
  • In step S28, the address decoder 142 ignores or discards the DMA request and the DMA request is not passed to the processing sets 14 and 16.
  • A DMA, or direct vector memory access (DVMA), request sent to one or more of the processing sets causes the necessary memory operations (read or write as appropriate) to be effected on the processing set memory.
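The DMA verification and routing stages can be sketched as a single function. The address layout (bits 29 to 26 select the bridge and PCI device) follows the embodiment described above; the function names, the dictionary-based address map and the use of "A" and "B" for the first and second processing sets are assumptions made for illustration.

```python
def slot_field(addr):
    """Extract address bits 29-26: bridge select (bit 29) and PCI device (bits 28-26)."""
    return (addr >> 26) & 0xF


def route_dma(addr, granted_slot, address_map, srr_by_slot, combined):
    """Return the list of processing sets that should receive the DMA request."""
    # Stages S21/S22: ignore addresses outside the range expected for the granted slot.
    if slot_field(addr) != slot_field(address_map[granted_slot]):
        return []
    if combined:
        return ["A", "B"]      # combined mode: all in-sync processing sets
    srr = srr_by_slot[granted_slot]
    if srr & 0b0010:           # SRR[1]: owned by the first processing set
        return ["A"]
    if srr & 0b0100:           # SRR[2]: owned by the second processing set
        return ["B"]
    return []                  # un-owned slot: the request is discarded
```

For a slot whose base address is `0xCC000000` (bits 31 and 30 set, device number 3), an access to `0xCC001000` passes the check, while `0xC8001000` carries device number 2 in bits 28 to 26 and is ignored.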
  • The automatic recovery process includes reintegration of the state of the processing sets to a common status in order to attempt a restart in lockstep.
  • The processing set which asserts itself as the primary processing set as described above copies its complete state to the other processing set.
  • A problem with the copying of the content of the memory from one processing set to the other is that during this copying process a device connected to the D bus 22 might attempt to make a direct memory access (DMA) request for access to the memory of the primary processing set. If DMA is enabled, then a write made to an area of memory which has already been copied would result in the memory state of the two processors at the end of the copy not being the same. In principle, it would be possible to inhibit DMA for the whole of the copy process. However, this would be undesirable, bearing in mind that
  • A dirty RAM 124 is provided in the bridge. As described earlier, the dirty RAM 124 is configured as part of the bridge SRAM memory 126.
  • The dirty RAM 124 comprises a bit map having a dirty indicator, for example a dirty bit, for each block, or page, of memory.
  • The bit for a page of memory is set when a write access to the area of memory concerned is made. In an embodiment of the invention one bit is provided for every 8K page of main processing set memory.
  • The bit for a page of processing set memory is set automatically by the address decoder 142 when this decodes a DMA request for that page of memory for either of the processing sets 14 or 16 from a device connected to the D bus 22.
  • The dirty RAM can be reset, or cleared, when it is read by a processing set, for example by means of read and clear instructions at the beginning of a copy pass, so that it can start to record pages which are dirtied since a given time.
  • The dirty RAM 124 can be read word by word. If a large word size is chosen for reading the dirty RAM
  • The bits in the dirty RAM 124 will indicate those pages of processing set memory which have been changed (or dirtied) by DMA writes during the period of the copy. A further copy pass can then be performed for only those pages of memory which have been dirtied. This will take less time than a full copy of the memory. Accordingly, there are typically fewer pages marked as dirty at the end of the next copy pass and, as a result, the copy passes can become shorter and shorter.
  • The dirty RAM 124 is set and cleared in both the combined and split modes. This means that in split mode the dirty RAM 124 may be cleared by either processing set.
  • The dirty RAM 124 address is decoded from bits 13 to 28 of the PCI address presented by the D bus device. Erroneous accesses which present illegal combinations of the address bits 29 to 31 are mapped into the dirty RAM 124 and a bit is dirtied on a write, even though the bridge will not pass these transactions to the processing sets.
  • When reading the dirty RAM 124, the bridge defines the whole area from 0x00008000 to 0x0000ffff as dirty RAM and will clear the contents of any location in this range on a read.
  • Figure 28 is a flow diagram summarising the operation of the dirty RAM 124.
  • In stage S41, the primary processing set reads the dirty RAM 124, which has the effect of resetting the dirty RAM 124.
  • the primary processor e.g processmg set 14
  • copies the whole of its memory 56 to the memory 56 of the other processmg set e.g. processmg set 16
  • stage S43 the primary processmg set reads the dirty RAM 124 which has the effect of resetting the duty RAM 124
  • the primary processor determines whether less than a predetermmed number of bits have been w ⁇ tten in the dirty RAM 124.
  • stage S45 copies those pages of its memory 56 which have been dutied, as mdicated by the dirty bits read from the duty RAM 124 in stage S43, to the memory 56 of the other processmg set Control then passes back to stage S43. If, in stage S44, it is determmed less than the predetermmed number of bits have been wntten in the duty
  • the primary processor causes the bndge to inhibit DMA requests from the devices connected to the D bus 22. This could, for example, be achieved by clearmg the arbitration enable bit for each of the device slots, thereby denymg access of the DMA devices to the D bus 22 Alternatively, the address decoder 142 could be configured to ignore DMA requests under instructions from the primary processor Dunng the penod m which DMA accesses are prevented, the primary processor then makes a final copy pass from its memory to the memory 56 of the other processor for those memory pages co ⁇ esponding to the bits set m the duty RAM 124
  • stage S47 the primary processor can issue a reset operation for initiating a combmed mode.
  • stage S48 DMA accesses are once more permitted.
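
The copy passes of stages S41 to S48 can be pictured with a short sketch. This is only an illustration under stated assumptions, not the patented implementation: the names DirtyRam, reintegrate, dma_activity and threshold are hypothetical, memory is modelled as a list of pages, and DMA writes that would arrive during a copy pass are simulated by a callback invoked after the pass.

```python
class DirtyRam:
    """Toy model of the dirty RAM: remembers dirtied page numbers, cleared on read."""

    def __init__(self):
        self._dirty = set()

    def mark(self, page):
        # In the bridge this would be done by the address decoder on a DMA write.
        self._dirty.add(page)

    def read_and_clear(self):
        # Reading returns the dirty pages and resets the record.
        pages, self._dirty = self._dirty, set()
        return pages


def reintegrate(primary, secondary, dirty_ram, dma_activity, threshold):
    """Copy primary memory to secondary memory, loosely following S41 to S48.

    dma_activity(pass_number) is a hypothetical callback that simulates DMA
    writes occurring while a copy pass is in progress.
    Returns the number of copy passes made while DMA was still enabled.
    """
    dirty_ram.read_and_clear()                  # S41: reset the dirty RAM
    for page, data in enumerate(primary):       # S42: full copy
        secondary[page] = data
    passes = 1
    dma_activity(passes)
    while True:
        dirty = dirty_ram.read_and_clear()      # S43: read (and reset) the dirty RAM
        if len(dirty) < threshold:              # S44: few enough dirty pages?
            break
        for page in dirty:                      # S45: copy only the dirtied pages
            secondary[page] = primary[page]
        passes += 1
        dma_activity(passes)
    # S46: DMA is inhibited here, so no new pages can be dirtied
    # while the final pass copies the remaining dirty pages.
    for page in dirty:
        secondary[page] = primary[page]
    # S47 (reset into combined mode) and S48 (re-enable DMA) are not modelled.
    return passes
```

Because each pass copies only the pages dirtied during the previous one, the passes tend to shrink until the remaining set is small enough to copy with DMA briefly inhibited.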

Abstract

A bridge for a multi-processor system includes bus interfaces for connection to an I/O bus of a first processing set, an I/O bus of a second processing set and a device bus. It also comprises a bridge control mechanism configured to be operable, in an operational mode, to permit access by at least one of the first and second processing sets to bridge resources and to the device bus and, in an error mode, to prevent access by the processing sets to the device bus and to permit restricted access by at least one of the processing sets to at least predetermined bridge resources. By providing restricted access to selected parameters held in the bridge during an error mode, the bridge can act as a secure repository for information which can be used by the processing sets to investigate the error and hopefully to recover therefrom, while preventing I/O devices connected to the device bus from being corrupted by a faulty processing set. Storage in the bridge provides for buffering data pending resolution of the error.

Description

TITLE: MULTI-PROCESSOR SYSTEM BRIDGE WITH CONTROLLED ACCESS
BACKGROUND OF THE INVENTION
This invention relates to a multi-processor system in which first and second processing sets (each of which may comprise one or more processors) communicate with an I/O device bus.
The invention finds particular application to fault tolerant computer systems where two or more processor sets need to communicate with an I/O device bus in lockstep, with provision for identifying lockstep errors in order to detect faulty operation of the system as a whole. In such a fault tolerant computer system, an aim is not only to be able to identify faults, but also to provide a structure which is able to provide a high degree of system availability. In order to provide high levels of system availability, it would be desirable for such systems automatically to attempt recovery from a fault, or error condition.
Automatic recovery from an error provides significant technical challenges in that the system has to provide an environment where it can continue to operate following a fault in a manner which does not further corrupt the system, while permitting diagnostic operations to be performed.
Accordingly, an aim of the present invention is to address these technical problems.
SUMMARY OF THE INVENTION
Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims. Combinations of features from the dependent claims may be combined with features of the independent claims as appropriate and not merely as explicitly set out in the claims.
In accordance with one aspect of the invention, there is provided a bridge for a multi-processor system. The bridge comprises a first processor bus interface for connection to an I/O bus of a first processing set, a second processor bus interface for connection to an I/O bus of a second processing set and a device bus interface for connection to a device bus. It also comprises a bridge control mechanism configured to be operable, in an operational mode, to permit access by at least one of the first and second processing sets to bridge resources and to the device bus and, in an error mode, to prevent access by the processing sets to the device bus and to permit restricted access by at least one of the processing sets to at least predetermined bridge resources.
By providing restricted access to selected parameters held in the bridge during an error mode, the bridge can act as a secure repository for information which can be used by the processing sets to investigate and diagnose the error and hopefully to recover therefrom. By preventing the processing sets from having access to the device bus, a faulty processing set can be prevented from corrupting devices connected to the device bus.
It should be noted that the bus interfaces referenced above need not be separate components of the bridge, but may be incorporated in other components of the bridge, and may indeed be simply connections for the lines of the buses concerned.
The bridge control mechanism can be operable, in response to detection of an error state, to cause the bridge to cease operation in the operational mode and instead to operate in the error mode.
Storage can be provided in the bridge for buffering data pending resolution of the error. For example, error state registers can be provided for saving operating parameters on entry to the error mode, read only access to the error state registers being permitted by at least one processing set during the error mode. A posted write buffer can be provided for the storage of writes already posted by at least one processing set on entry to the error mode, read only access to the posted write buffer being permitted by at least one processing set during the error mode.
The bridge control mechanism can be operable in an initial error mode to store in the posted write buffer any internal bridge write accesses initiated by the processing sets and to allow and to arbitrate any internal bridge read accesses initiated by the processing sets. It can also be operable in the initial error mode to store in a posted write buffer any device bus write accesses initiated by the processing sets and to abort any device bus read accesses initiated by the processing sets.
In a primary error mode in which a processing set asserts itself as a primary processing set, the bridge control mechanism can be operable to allow and to arbitrate any internal bridge write accesses initiated by the primary processing set, to discard any internal bridge write accesses initiated by any other processing set, and to allow and to arbitrate any internal bridge read accesses initiated by the processing sets. It can also be operable in this mode to discard any device bus write accesses initiated by the processing sets and to abort any device bus read accesses initiated by the processing sets. The primary processing set is a processing set which determines that it is operational, and not faulty, as a result of a fault analysis process.
This allows any write accesses for the bridge or for the device bus which have already been posted by the processing sets to be saved during the initial error phase. Later write accesses to the device bus can be discarded as being erroneous. Read accesses to the device bus can safely be aborted as they can be resent on exit from the error mode. Read access by the processing sets to the bridge is possible for diagnostic purposes. When a processing set asserts itself as a primary processing set, this processing set is then able to have write access to the bridge as well.
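The access rules of the initial and primary error modes described above can be condensed into a small decision function. The following is a sketch for illustration only; the enum names and the function error_mode_action are hypothetical and do not appear in the specification.

```python
from enum import Enum


class Mode(Enum):
    INITIAL_ERROR = "initial error mode"
    PRIMARY_ERROR = "primary error mode"


class Target(Enum):
    BRIDGE = "internal bridge resource"
    DEVICE_BUS = "device bus"


class Action(Enum):
    ALLOW = "allow and arbitrate"
    POST = "store in posted write buffer"
    DISCARD = "discard"
    ABORT = "abort"


def error_mode_action(mode, target, is_write, from_primary=False):
    """Return how the bridge treats an access from a processing set in an error mode."""
    if not is_write:
        # Reads: bridge reads are allowed in both modes, device bus reads are aborted.
        return Action.ALLOW if target is Target.BRIDGE else Action.ABORT
    if mode is Mode.INITIAL_ERROR:
        # Writes that were already posted are saved for later inspection.
        return Action.POST
    # Primary error mode: only the primary processing set may write to the bridge.
    if target is Target.BRIDGE and from_primary:
        return Action.ALLOW
    return Action.DISCARD
```

For example, error_mode_action(Mode.PRIMARY_ERROR, Target.DEVICE_BUS, is_write=True, from_primary=True) yields Action.DISCARD, reflecting that no processing set may write to the device bus while the bridge is in an error mode.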
The bridge control mechanism can be further operable, in a split operational mode, to arbitrate between the first and the second processing sets for access to each other's I/O bus and to the device bus and, in a combined operational mode, to monitor lockstep operation of the first and second processing sets. The bridge control mechanism can be operable on power up of the bridge to operate in an initial error mode until a processor set asserts itself as a primary processing set, then in the split operational mode to enable all processing sets to be set to a corresponding state before transferring to the combined operational mode.
The bridge can include a storage sub-system and a controllable routing matrix connected between the first processor bus interface, the second processor bus interface, the device bus interface and the storage sub-system, the bridge control mechanism being operable to control the routing matrix selectively to interconnect the first processor bus interface, the second processor bus interface, the device bus interface and the memory sub-system according to a current mode of operation.
The bridge can include at least one further processor bus interface for connection to an I/O bus of a further processing set.
In accordance with another aspect of the invention, there is provided a computer system comprising a first processing set having an I/O bus, a second processing set having an I/O bus, a device bus and a bridge, the bridge comprising a first processor bus interface connected to the I/O bus of the first processing set, a second processor bus interface connected to the I/O bus of the second processing set, a device bus interface connected to the device bus and a bridge control mechanism as described above.
In accordance with a further aspect of the invention, there is provided a method of operating a multi-processor system comprising a first processing set having an I/O bus, a second processing set having an I/O bus, a device bus and a bridge, the bridge comprising a first processor bus interface connected to the I/O bus of the first processing set, a second processor bus interface connected to the I/O bus of the second processing set and a device bus interface connected to the device bus, the method comprising selectively operating the bridge: in an operational mode to permit access by at least one of the first and second processing sets to bridge resources and to the device bus; and in an error mode to prevent access by the processing sets to the device bus and to permit restricted access by at least one of the processing sets to at least predetermined bridge resources.
BRIEF DESCRIPTION OF THE DRAWINGS
Exemplary embodiments of the present invention will be described hereinafter, by way of example only, with reference to the accompanying drawings in which like reference signs relate to like elements and in which:
Figure 1 is a schematic overview of a fault tolerant computer system incorporating an embodiment of the invention;
Figure 2 is a schematic overview of a specific implementation of a system based on that of Figure 1;
Figure 3 is a schematic representation of one implementation of a processing set;
Figure 4 is a schematic representation of another example of a processing set;
Figure 5 is a schematic representation of a further processing set;
Figure 6 is a schematic block diagram of an embodiment of a bridge for the system of Figure 1;
Figure 7 is a schematic block diagram of storage for the bridge of Figure 6;
Figure 8 is a schematic block diagram of control logic of the bridge of Figure 6;
Figure 9 is a schematic representation of a routing matrix of the bridge of Figure 6;
Figure 10 is an example implementation of the bridge of Figure 6;
Figure 11 is a state diagram illustrating operational states of the bridge of Figure 6;
Figure 12 is a flow diagram illustrating stages in the operation of the bridge of Figure 6;
Figure 13 is a detail of a stage of operation from Figure 12;
Figure 14 illustrates the posting of I/O cycles in the system of Figure 1;
Figure 15 illustrates the data stored in a posted write buffer;
Figure 16 is a schematic representation of a slot response register;
Figure 17 illustrates a dissimilar data write stage;
Figure 18 illustrates a modification to Figure 17;
Figure 19 illustrates a dissimilar data read stage;
Figure 20 illustrates an alternative dissimilar data read stage;
Figure 21 is a flow diagram summarising the operation of a dissimilar data write mechanism;
Figure 22 is a schematic block diagram explaining arbitration within the system of Figure 1;
Figure 23 is a state diagram illustrating the operation of a device bus arbiter;
Figure 24 is a state diagram illustrating the operation of a bridge arbiter;
Figure 25 is a timing diagram for PCI signals;
Figure 26 is a schematic diagram illustrating the operation of the bridge of Figure 6 for direct memory access;
Figure 27 is a flow diagram illustrating a direct memory access method in the bridge of Figure 6; and
Figure 28 is a flow diagram of a re-integration process including the monitoring of a dirty RAM.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Figure 1 is a schematic overview of a fault tolerant computing system 10 comprising a plurality of CPUsets (processing sets) 14 and 16 and a bridge 12. As shown in Figure 1, there are two processing sets 14 and 16, although in other embodiments there may be three or more processing sets. The bridge 12 forms an interface between the processing sets and I/O devices such as devices 28, 29, 30, 31 and 32. In this document, the term "processing set" is used to denote a group of one or more processors, possibly including memory, which output and receive common outputs and inputs. It should be noted that the alternative term mentioned above, "CPUset", could be used instead, and that these terms could be used interchangeably throughout this document. Also, it should be noted that the term "bridge" is used to denote any device, apparatus or arrangement suitable for interconnecting two or more buses of the same or different types.
The first processing set 14 is connected to the bridge 12 via a first processing set I/O bus (PA bus) 24, in the present instance a Peripheral Component Interconnect (PCI) bus. The second processing set 16 is connected to the bridge 12 via a second processing set I/O bus (PB bus) 26 of the same type as the PA bus 24 (i.e. here a PCI bus). The I/O devices are connected to the bridge 12 via a device I/O bus (D bus) 22, in the present instance also a PCI bus.
Although, in the particular example described, the buses 22, 24 and 26 are all PCI buses, this is merely by way of example, and in other embodiments other bus protocols may be used and the D bus 22 may have a different protocol from that of the PA bus and the PB bus (P buses) 24 and 26.
The processing sets 14 and 16 and the bridge 12 are operable in synchronism under the control of a common clock 20, which is connected thereto by clock signal lines 21.
Some of the devices, including an Ethernet (E-NET) interface 28 and a Small Computer System Interface (SCSI) interface 29, are permanently connected to the device bus 22, but other I/O devices such as I/O devices 30, 31 and 32 can be hot insertable into individual switched slots 33, 34 and 35. Dynamic field effect transistor (FET) switching can be provided for the slots 33, 34 and 35 to enable hot insertability of the devices such as devices 30, 31 and 32. The provision of the FETs enables an increase in the length of the D bus 22 as only those devices which are active are switched on, reducing the effective total bus length. It will be appreciated that the number of I/O devices which may be connected to the D bus 22, and the number of slots provided for them, can be adjusted according to a particular implementation in accordance with specific design requirements.
Figure 2 is a schematic overview of a particular implementation of a fault tolerant computer employing a bridge structure of the type illustrated in Figure 1. In Figure 2, the fault tolerant computer system includes a plurality (here four) of bridges 12 on first and second I/O motherboards (MB 40 and MB 42) in order to increase the number of I/O devices which may be connected and also to improve reliability and redundancy. Thus, in the embodiment shown in Figure 2, two processing sets 14 and 16 are each provided on a respective processing set board 44 and 46, with the processing set boards 44 and 46 'bridging' the I/O motherboards MB 40 and MB 42. A first, master clock source 20A is mounted on the first motherboard 40 and a second, slave clock source 20B is mounted on the second motherboard 42. Clock signals are supplied to the processing set boards 44 and 46 via respective connections (not shown in Figure 2).
First and second bridges 12.1 and 12.2 are mounted on the first I/O motherboard 40. The first bridge 12.1 is connected to the processing sets 14 and 16 by P buses 24.1 and 26.1, respectively. Similarly, the second bridge 12.2 is connected to the processing sets 14 and 16 by P buses 24.2 and 26.2, respectively. The bridge 12.1 is connected to an I/O databus (D bus) 22.1 and the bridge 12.2 is connected to an I/O databus (D bus) 22.2.
Third and fourth bridges 12.3 and 12.4 are mounted on the second I/O motherboard 42. The bridge 12.3 is connected to the processing sets 14 and 16 by P buses 24.3 and 26.3, respectively. Similarly, the bridge 12.4 is connected to the processing sets 14 and 16 by P buses 24.4 and 26.4, respectively. The bridge 12.3 is connected to an I/O databus (D bus) 22.3 and the bridge 12.4 is connected to an I/O databus (D bus) 22.4.
It can be seen that the arrangement shown in Figure 2 can enable a large number of I/O devices to be connected to the two processing sets 14 and 16 via the D buses 22.1, 22.2, 22.3 and 22.4 for either increasing the range of I/O devices available, or providing a higher degree of redundancy, or both.
Figure 3 is a schematic overview of one possible configuration of a processing set, such as the processing set 14 of Figure 1. The processing set 16 could have the same configuration. In Figure 3, a plurality of processors (here four) 52 are connected by one or more buses 54 to a processing set bus controller 50. As shown in Figure 3, one or more processing set output buses 24 are connected to the processing set bus controller 50, each processing set output bus 24 being connected to a respective bridge 12. For example, in the arrangement of Figure 1, only one processing set I/O bus (P bus) 24 would be provided, whereas in the arrangement of Figure 2, four such processing set I/O buses (P buses) 24 would be provided. In the processing set 14 shown in Figure 3, individual processors operate using the common memory 56, and receive inputs and provide outputs on the common P bus(es) 24.
Figure 4 is an alternative configuration of a processing set, such as the processing set 14 of Figure 1. Here a plurality of processor/memory groups 61 are connected to a common internal bus 64. Each processor/memory group 61 includes one or more processors 62 and associated memory 66 connected to an internal group bus 63. An interface 65 connects the internal group bus 63 to the common internal bus 64. Accordingly, in the arrangement shown in Figure 4, individual processing groups, each with processors 62 and associated memory 66, are connected via a common internal bus 64 to a processing set bus controller 60. The interfaces 65 enable a processor 62 of one processing group to operate not only on the data in its local memory 66, but also in the memory of another processing group 61 within the processing set 14. The processing set bus controller 60 provides a common interface between the common internal bus 64 and the processing set I/O bus(es) (P bus(es)) 24 connected to the bridge(s) 12. It should be noted that although only two processing groups 61 are shown in Figure 4, such a structure is not limited to this number of processing groups.
Figure 5 illustrates an alternative configuration of a processing set, such as the processing set 14 of Figure 1. Here a simple processing set includes a single processor 72 and associated memory 76 connected via a common bus 74 to a processing set bus controller 70. The processing set bus controller 70 provides an interface between the internal bus 74 and the processing set I/O bus(es) (P bus(es)) 24 for connection to the bridge(s) 12.
Accordingly, it will be appreciated from Figures 3, 4 and 5 that the processing set may have many different forms and that the particular choice of a particular processing set structure can be made on the basis of the processing requirement of a particular application and the degree of redundancy required. In the following description, it is assumed that the processing sets 14 and 16 referred to have a structure as shown in Figure 3, although it will be appreciated that another form of processing set could be provided.
The bridge(s) 12 are operable in a number of operating modes. These modes of operation will be described in more detail later. However, to assist in a general understanding of the structure of the bridge, the two operating modes will be briefly summarized here. In a first, combined mode, a bridge 12 is operable to route addresses and data between the processing sets 14 and 16 (via the PA and PB buses 24 and 26, respectively) and the devices (via the D bus 22). In this combined mode, I/O cycles generated by the processing sets 14 and 16 are compared to ensure that both processing sets are operating correctly. Comparison failures force the bridge 12 into an error limiting mode (EState) in which device I/O is prevented and diagnostic information is collected. In the second, split mode, the bridge 12 routes and arbitrates addresses and data from one of the processing sets 14 and 16 onto the D bus 22 and/or onto the other one of the processing sets 16 and 14, respectively. In this mode of operation, the processing sets 14 and 16 are not synchronized and no I/O comparisons are made. DMA operations are also permitted in both modes. As mentioned above, the different modes of operation, including the combined and split modes, will be described in more detail later. However, there now follows a description of the basic structure of an example of the bridge 12.
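The comparison performed in the combined mode, as summarized above, can be illustrated with a minimal sketch. It is a simplification under assumptions: the names IoCycle and CombinedModeComparator are hypothetical, an I/O cycle is reduced to an address, a data word and a direction, and the diagnostic capture is modelled as a plain list.

```python
from typing import NamedTuple


class IoCycle(NamedTuple):
    address: int
    data: int
    is_write: bool


class CombinedModeComparator:
    """Forwards an I/O cycle to the device bus only if both processing sets agree."""

    def __init__(self):
        self.in_estate = False
        self.forwarded = []     # cycles passed on to the D bus
        self.diagnostics = []   # mismatching cycle pairs captured on failure

    def submit(self, from_a, from_b):
        if self.in_estate:
            return False        # device I/O is prevented in the EState
        if from_a != from_b:
            # A comparison failure forces the error limiting mode.
            self.in_estate = True
            self.diagnostics.append((from_a, from_b))
            return False
        self.forwarded.append(from_a)
        return True
```

Once a mismatch has been seen, every later submission is refused, which mirrors the statement that device I/O is prevented until the error has been handled.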
Figure 6 is a schematic functional overview of the bridge 12 of Figure 1. First and second processing set I/O bus interfaces, PA bus interface 84 and PB bus interface 86, are connected to the PA and PB buses 24 and 26, respectively. A device I/O bus interface, D bus interface 82, is connected to the D bus 22. It should be noted that the PA, PB and D bus interfaces need not be configured as separate elements but could be incorporated in other elements of the bridge. Accordingly, within the context of this document, where a reference is made to a bus interface, this does not require the presence of a specific separate component, but rather the capability of the bridge to connect to the bus concerned, for example by means of physical or logical bridge connections for the lines of the buses concerned.
Routing (hereinafter termed a routing matrix) 80 is connected via a first internal path 94 to the PA bus interface 84 and via a second internal path 96 to the PB bus interface 86. The routing matrix 80 is further connected via a third internal path 92 to the D bus interface 82. The routing matrix 80 is thereby able to provide I/O bus transaction routing in both directions between the PA and PB bus interfaces 84 and 86. It is also able to provide routing in both directions between one or both of the PA and PB bus interfaces and the D bus interface 82. The routing matrix 80 is connected via a further internal path 100 to storage control logic 90. The storage control logic 90 controls access to bridge registers 110 and to a random access memory (SRAM) 126. The routing matrix 80 is therefore also operable to provide routing in both directions between the PA, PB and D bus interfaces 84, 86 and 82 and the storage control logic 90. The routing matrix 80 is controlled by bridge control logic 88 over control paths 98 and 99. The bridge control logic 88 is responsive to control signals, data and addresses on internal paths 93, 95 and 97, and also to clock signals on the clock line(s) 21.
In the embodiment of the invention, each of the P buses (PA bus 24 and PB bus 26) operates under a PCI protocol. The processing set bus controllers 50 (see Figure 3) also operate under the PCI protocol. Accordingly, the PA and PB bus interfaces 84 and 86 each provide all the functionality required for a compatible interface providing both master and slave operation for data transferred to and from the D bus 22 or internal memories and registers of the bridge in the storage subsystem 90. The bus interfaces 84 and 86 can provide diagnostic information to internal bridge status registers in the storage subsystem 90 on transition of the bridge to an error state (EState) or on detection of an I/O error.
The device bus interface 82 performs all the functionality required for a PCI compliant master and slave interface for transferring data to and from one of the PA and PB buses 84 and 86. The D bus 82 is operable during direct memory access (DMA) transfers to provide diagnostic information to internal status registers in the storage subsystem 90 of the bridge on transition to an EState or on detection of an I/O error.
Figure 7 illustrates in more detail the bridge registers 110 and the SRAM 124. The storage control logic 110 is connected via a path (e.g. a bus) 112 to a number of register components 114, 116, 118, 120. The storage control logic is also connected via a path (e.g. a bus) 128 to the SRAM 126 in which a posted write buffer component 122 and a dirty RAM component 124 are mapped. Although a particular configuration of the components 114, 116, 118, 120, 122 and 124 is shown in Figure 7, these components may be configured in other ways, with other components defined as regions of a common memory (e.g. a random access memory such as the SRAM 126, with the path 112/128 being formed by the internal addressing of the regions of memory). As shown in Figure 7, the posted write buffer 122 and the dirty RAM 124 are mapped to different regions of the SRAM memory 126, whereas the registers 114, 116, 118 and 120 are configured as separate from the SRAM memory.
Control and status registers (CSRs) 114 form internal registers which allow the control of various operating modes of the bridge, allow the capture of diagnostic information for an EState and for I/O errors, and control processing set access to PCI slots and devices connected to the D bus 22. These registers are set by signals from the routing matrix 80.
Dissimilar data registers (DDRs) 116 provide locations for containing dissimilar data for different processing sets to enable non-deterministic data events to be handled. These registers are set by signals from the PA and PB buses.
Bridge decode logic enables a common write to disable a data comparator and allow writes to two DDRs 116, one for each processing set 14 and 16. A selected one of the DDRs can then be read in-sync by the processing sets 14 and 16. The DDRs thus provide a mechanism enabling a location to be reflected from one processing set (14/16) to another (16/14).
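The way the DDRs let two processing sets agree on a non-deterministic value can be sketched as follows. This is an illustration only: the class DissimilarDataRegisters, its method names and the "A"/"B" keys are hypothetical, and the disabling of the data comparator is represented simply by the fact that no equality check is made on the written values.

```python
class DissimilarDataRegisters:
    """Toy model of a pair of DDRs, one per processing set."""

    def __init__(self):
        self.ddr = {"A": None, "B": None}

    def common_write(self, value_from_a, value_from_b):
        # The two values may legitimately differ, so they are not compared;
        # each processing set's value lands in its own register.
        self.ddr["A"] = value_from_a
        self.ddr["B"] = value_from_b

    def read_in_sync(self, selected):
        # Both processing sets read the same selected register, so they
        # continue in lockstep with an identical value.
        value = self.ddr[selected]
        return value, value
```

For instance, if each processing set samples a slightly different local time, both can write their sample and then both read back the value held in the DDR of processing set A.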
Slot response registers (SRRs) 118 are used to determine ownership of device slots on the D bus 22 and to allow DMA to be routed to the appropriate processing set(s). These registers are linked to address decode logic.
Disconnect registers 120 are used for the storage of data phases of an I/O cycle which is aborted while data is in the bridge on the way to another bus. The disconnect registers 120 receive all data queued in the bridge when a target device disconnects a transaction, or as the EState is detected. These registers are connected to the routing matrix 80. The routing matrix can queue up to three data words and byte enables. Provided the initial addresses are voted as being equal, address target controllers derive addresses which increment as data is exchanged between the bridge and the destination (or target). Where a writer (for example a processor I/O write, or a DVMA (D bus to P bus access)) is writing data to a target, this data can be caught in the bridge when an error occurs. Accordingly, this data is stored in the disconnect registers 120 when an error occurs. These disconnect registers can then be accessed on recovery from an EState to recover the data associated with the write or read cycle which was in progress when the EState was initiated.
Although shown separately, the DDRs 116, the SRRs 118 and the disconnect registers may form an integral part of the CSRs 114.
EState and error CSRs 114 are provided for the capture of a failing cycle on the P buses 24 and 26, with an indication of the failing datum. Following a move to an EState, all of the writes initiated to the P buses are logged in the posted write buffer 122. These may be other writes that have been posted in the processing set bus controllers 50, or which may be initiated by software before an EState interrupt causes the processors to stop carrying out writes to the P buses 24 and 26.
A dirty RAM 124 is used to indicate which pages of the main memory 56 of the processing sets 14 and 16 have been modified by direct memory access (DMA) transactions from one or more devices on the D bus 22. Each page (e.g. each 8K page) is marked by a single bit in the dirty RAM 124 which is set when a DMA write occurs and can be cleared by a read and clear cycle initiated on the dirty RAM 124 by a processor 52 of a processing set 14 and 16.
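A bit-per-page dirty map of this kind can be sketched in a few lines. The sketch assumes 8K pages and an index taken from address bits 13 to 28, as stated elsewhere in this document for the dirty RAM address decode; the class name DirtyBitmap, the packing of bits into bytes and the 4-byte little-endian read word are illustrative assumptions, not details taken from the specification.

```python
PAGE_SHIFT = 13   # 8K pages: 2**13 bytes per page
INDEX_BITS = 16   # address bits 13..28 select the dirty bit


class DirtyBitmap:
    def __init__(self):
        # One bit per page, packed eight to a byte (assumed layout).
        self.bits = bytearray((1 << INDEX_BITS) // 8)

    @staticmethod
    def page_index(pci_address):
        # Bits above 28 are ignored, so any combination of bits 29..31
        # still maps onto some dirty bit.
        return (pci_address >> PAGE_SHIFT) & ((1 << INDEX_BITS) - 1)

    def record_dma_write(self, pci_address):
        index = self.page_index(pci_address)
        self.bits[index // 8] |= 1 << (index % 8)

    def read_and_clear_word(self, word_index, word_bytes=4):
        # Reading a word returns its dirty bits and clears them in one step.
        start = word_index * word_bytes
        word = int.from_bytes(self.bits[start:start + word_bytes], "little")
        self.bits[start:start + word_bytes] = bytes(word_bytes)
        return word
```

Two writes falling in the same 8K page set the same bit, so the map records which pages changed, not how often.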
The dirty RAM 124 and the posted write buffer 118 may both be mapped into the memory 124 in the bridge 12. This memory space can be accessed during normal read and write cycles for testing purposes.
Figure 8 is a schematic functional overview of the bridge control logic 88 shown in Figure 6. All of the devices connected to the D bus 22 are addressed geographically. Accordingly, the bridge carries out decoding necessary to enable the isolating FETs for each slot before an access to those slots is initiated.
The address decoding performed by the address decode logic 136 and 138 essentially permits four basic access types:
- an out-of-sync access (i.e. not in the combined mode) by one processing set (e.g. processing set 14 of Figure 1) to the other processing set (e.g. processing set 16 of Figure 1), in which case the access is routed from the PA bus interface 84 to the PB bus interface 86;
- an access by one of the processing sets 14 and 16 in the split mode, or both processing sets 14 and 16 in the combined mode, to an I/O device on the D bus 22, in which case the access is routed via the D bus interface 82;
- a DMA access by a device on the D bus 22 to one or both of the processing sets 14 and 16, which would be directed to both processing sets 14 and 16 in the combined mode, or to the relevant processing set 14 or 16 if out-of-sync, and if in a split mode to a processing set 14 or 16 which owns a slot in which the device is located; and
- a PCI configuration access to devices in I/O slots.
As mentioned above, geographic addressing is employed Thus, for example, slot 0 on motherboard A has the same address when refened to by processmg set 14 or by processmg set 16
Geographic addressing is used m combination with the PCI slot FET switching. Dunng a configuration access mentioned above, separate device select signals are provided for devices which are not
FET isolated A smgle device select signal can be provided for the switched PCI slots as the FET signals can be used to enable a conect card. Separate FET switch lines are provided to each slot for separately switchmg the FETs for the slots.
The SRRs 118, which could be incorporated m the CSR registers 114, are associated with the address decode functions. The SRRs 118 serve m a number of different roles which will be described m more detail later. However, some of the roles are summarized here.
In a combined mode, each slot may be disabled so that writes are simply acknowledged without any transaction occurring on the device bus 22, whereby the data is lost. Reads will return meaningless data, once again without causing a transaction on the device board.
In the split mode, each slot can be in one of three states. The states are:
- Not owned;
- Owned by processing set A 14;
- Owned by processing set B 16.
A slot that is not owned by a processing set 14 or 16 making an access (this includes not owned or unowned slots) cannot be accessed. Accordingly, such an access is aborted.
When a processing set 14 or 16 is powered off, all slots owned by it move to the un-owned state. A processing set 14 or 16 can only claim an un-owned slot; it cannot wrest ownership away from another processing set. This can only be done by powering off the other processing set, or by getting the other processing set to relinquish ownership.
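By way of illustration only, the split-mode ownership rules described above can be sketched in software as follows. This is a minimal model under stated assumptions, not the bridge hardware: the names `Owner`, `SlotOwnership`, `claim`, `relinquish`, `power_off` and `access_allowed` are hypothetical and do not appear in the description.

```python
from enum import Enum


class Owner(Enum):
    NOT_OWNED = "not owned"
    SET_A = "processing set A"
    SET_B = "processing set B"


class SlotOwnership:
    """Illustrative model of split-mode slot ownership (hypothetical names)."""

    def __init__(self, slot_count):
        self.owner = [Owner.NOT_OWNED] * slot_count

    def claim(self, slot, requester):
        # Only an un-owned slot can be claimed; ownership cannot be wrested away.
        if self.owner[slot] is Owner.NOT_OWNED:
            self.owner[slot] = requester
            return True
        return self.owner[slot] is requester

    def relinquish(self, slot, requester):
        # A processing set may give up a slot that it owns.
        if self.owner[slot] is requester:
            self.owner[slot] = Owner.NOT_OWNED

    def power_off(self, processing_set):
        # All slots owned by a powered-off processing set become un-owned.
        self.owner = [
            Owner.NOT_OWNED if o is processing_set else o for o in self.owner
        ]

    def access_allowed(self, slot, requester):
        # An access to a slot that the requester does not own is aborted.
        return self.owner[slot] is requester
```

In this sketch a second processing set can only obtain a slot after the first has relinquished it or has been powered off, mirroring the rule that ownership cannot be taken by force.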
The ownership bits are accessible and settable while in the combined mode, but have no effect until a split state is entered. This allows the configuration of a split system to be determined while still in the combined mode.
Each PCI device is allocated an area of the processing set address map. The top bits of the address are determined by the PCI slot. Where a device carries out DMA, the bridge is able to check that the device is using the correct address because a D bus arbiter informs the bridge which device is using the bus at a particular time. If a device access is to a processing set address which is not valid for it, then the device access will be ignored. It should be noted that an address presented by a device will be a virtual address which would be translated by an I/O memory management unit in the processing set bus controller 50 to an actual memory address.
The addresses output by the address decoders are passed via the initiator and target controllers 138 and 140 to the routing matrix 80 via the lines 98 under control of a bridge controller 132 and an arbiter 134. An arbiter 134 is operable in various different modes to arbitrate for use of the bridge on a first-come-first-served basis using conventional PCI bus signals on the P and D buses.
In a combined mode, the arbiter 134 is operable to arbitrate between the in-sync processing sets 14 and 16 and any initiators on the device bus 22 for use of the bridge 12. Possible scenarios are:
- processing set access to the device bus 22;
- processing set access to internal registers in the bridge 12;
- device access to the processing set memory 56.
In split mode, both processing sets 14 and 16 must arbitrate the use of the bridge and thus access to the device bus 22 and internal bridge registers (e.g. CSR registers 114). The bridge 12 must also contend with initiators on the device bus 22 for use of that device bus 22. Each slot on the device bus has an arbitration enable bit associated with it. These arbitration enable bits are cleared after reset and must be set to allow a slot to request a bus. When a device on the device bus 22 is suspected of providing an I/O error, the arbitration enable bit for that device is automatically reset by the bridge.
A PCI bus interface in the processing set bus controller(s) 50 expects to be the master bus controller for the P bus concerned, that is, it contains the PCI bus arbiter for the PA or PB bus to which it is connected. The bridge 12 cannot directly control access to the PA and PB buses 24 and 26. The bridge 12 competes for access to the PA or PB bus with the processing set on the bus concerned under the control of the bus controller 50 on the bus concerned.
Also shown in Figure 8 is a comparator 130 and a bridge controller 132. The comparator 130 is operable to compare I/O cycles from the processing sets 14 and 16 to determine any out-of-sync events. On determining an out-of-sync event, the comparator 130 is operable to cause the bridge controller 132 to activate an EState for analysis of the out-of-sync event and possible recovery therefrom.
Figure 9 is a schematic functional overview of the routing matrix 80.
The routing matrix 80 comprises a multiplexer 143 which is responsive to initiator control signals 98 from the initiator controller 138 of Figure 8 to select one of the PA bus path 94, PB bus path 96, D bus path 92 or internal bus path 100 as the current input to the routing matrix. Separate output buffers 144, 145, 146 and 147 are provided for output to each of the paths 94, 96, 92 and 100, with those buffers being selectively enabled by signals 99 from the target controller 140 of Figure 8. Between the multiplexer and the buffers 144-147, signals are held in a buffer 149. In the present embodiment three cycles of data for an I/O cycle will be held in the pipeline represented by the multiplexer 143, the buffer 149 and the buffers 144-147.
In Figures 6 to 9 a functional description of elements of the bridge has been given. Figure 10 is a schematic representation of a physical configuration of the bridge in which the bridge control logic 88, the storage control logic 90 and the bridge registers 110 are implemented in a first field programmable gate array (FPGA) 89, the routing matrix 80 is implemented in further FPGAs 80.1 and 80.2 and the SRAM 126 is implemented as one or more separate SRAMs addressed by address control lines 127. The bus interfaces 82, 84 and 86 shown in Figure 6 are not separate elements, but are integrated in the FPGAs 80.1, 80.2 and 89. Two FPGAs 80.1 and 80.2 are used for the upper 32 bits 32-63 of a 64 bit PCI bus and the lower 32 bits 0-31 of the 64 bit PCI bus. It will be appreciated that a single FPGA could be employed for the routing matrix 80 where the necessary logic can be accommodated within the device. Indeed, where an FPGA of sufficient capacity is available, the bridge control logic, storage control logic and the bridge registers could be incorporated in the same FPGA as the routing matrix. Indeed many other configurations may be envisaged, and indeed technology other than FPGAs, for example one or more Application Specific Integrated Circuits (ASICs), may be employed. As shown in Figure 10, the FPGAs 89, 80.1 and 80.2 and the SRAM 126 are connected via internal bus paths 85 and path control lines 87.
Figure 11 is a transition diagram illustrating in more detail the various operating modes of the bridge. The bridge operation can be divided into three basic modes, namely an error state (EState) mode 150, a split state mode 156 and a combined state mode 158. The EState mode 150 can be further divided into two states.
After initial resetting on powering up the bridge, or following an out-of-sync event, the bridge is in the initial EState 152. In this state, all writes are stored in the posted write buffer 120 and reads from the internal bridge registers (e.g., the CSR registers 116) are allowed, and all other reads are treated as errors (i.e. they are aborted). In this state, the individual processing sets 14 and 16 perform evaluations for determining a restart time. Each processing set 14 and 16 will determine its own restart timer timing. The timer setting depends on a "blame" factor for the transition to the EState. A processing set which determines that it is likely to have caused the error sets a long time for the timer. A processing set which thinks it unlikely to have caused the error sets a short time for the timer. The first of the processing sets 14 and 16 to time out becomes a primary processing set. Accordingly, when this is determined, the bridge moves (153) to the primary EState 154.
When either processing set 14/16 has become the primary processing set, the bridge is then operating in the primary EState 154. This state allows the primary processing set to write to bridge registers (specifically the SRRs 118). Other writes are no longer stored in the posted write buffer, but are simply lost. Device bus reads are still aborted in the primary EState 154.
Once the EState condition is removed, the bridge then moves (155) to the split state 156. In the split state 156, access to the device bus 22 is controlled by the SRR registers 118 while access to the bridge storage is simply arbitrated. The primary status of the processing sets 14 and 16 is ignored. Transition to a combined operation is achieved by means of a sync_reset (157). After issue of the sync_reset operation, the bridge is then operable in the combined state 158, whereby all read and write accesses on the D bus 22 and the PA and PB buses 24 and 26 are allowed. All such accesses on the PA and PB buses 24 and 26 are compared in the comparator 130. Detection of a mismatch between any read and write cycles (with an exception of specific dissimilar data I/O cycles) causes a transition 151 to the EState 150. The various states described are controlled by the bridge controller 132.
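The transitions just described can be summarised, purely as an illustrative sketch, by a small table-driven state machine. The enumeration values reuse the reference numerals of Figure 11 for readability; the event names (`primary_elected`, `estate_cleared`, `sync_reset`, `compare_mismatch`) are hypothetical labels chosen for this example and are not signal names from the description.

```python
from enum import Enum


class BridgeState(Enum):
    INITIAL_ESTATE = 152
    PRIMARY_ESTATE = 154
    SPLIT = 156
    COMBINED = 158


# (current state, event) -> next state. Event names are hypothetical labels;
# the trailing numbers are the transition numerals used in the text.
TRANSITIONS = {
    (BridgeState.INITIAL_ESTATE, "primary_elected"): BridgeState.PRIMARY_ESTATE,  # 153
    (BridgeState.PRIMARY_ESTATE, "estate_cleared"): BridgeState.SPLIT,            # 155
    (BridgeState.SPLIT, "sync_reset"): BridgeState.COMBINED,                      # 157
    (BridgeState.COMBINED, "compare_mismatch"): BridgeState.INITIAL_ESTATE,       # 151
}


def next_state(state, event):
    """Return the next bridge state, or the same state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Events that are not listed for a given state leave the state unchanged in this model, which is a simplification made for the sketch.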
The role of the comparator 130 is to monitor and compare I/O operations on the PA and PB buses in the combined state 151 and, in response to a mismatched signal, to notify the bridge controller 132, whereby the bridge controller 132 causes the transition 152 to the error state 150. The I/O operations can include all I/O operations initiated by the processing sets, as well as DMA transfers in respect of DMA initiated by a device on the device bus.
Table 1 below summarizes the various access operations which are allowed in each of the operational states.
TABLE 1
State            D Bus - Read                            D Bus - Write
EState           Master Abort                            Stored in Posted Write Buffer
Primary EState   Master Abort                            Lost
Split            Controlled by SRR bits and arbitrated   Controlled by SRR bits and arbitrated
Combined         Allowed and compared                    Allowed and compared
As described above, after an initial reset, the system is in the initial EState 152. In this state, neither of the processing sets 14 and 16 can access the D bus 22 or the P bus 26 or 24 of the other processing set 16 or 14. The internal bridge registers 116 of the bridge are accessible, but are read only.
A system running in the combined mode 158 transitions to the EState 150 where there is a comparison failure detected in this bridge, or alternatively a comparison failure is detected in another bridge in a multi-bridge system as shown, for example, in Figure 2. Also, transitions to an EState 150 can occur in other situations, for example in the case of a software controlled event forming part of a self test operation.
On moving to the EState 150, an interrupt is signaled to all or a subset of the processors of the processing sets via an interrupt line 95. Following this, all I/O cycles generated on a P bus 24 or 26 result in reads being returned with an exception and writes being recorded in the posted write buffer.
The operation of the comparator 130 will now be described in more detail. The comparator is connected to paths 94, 95, 96 and 97 for comparing address, data and selected control signals from the PA and PB bus interfaces 84 and 86. A failed comparison of in-sync accesses to device I/O bus 22 devices causes a move from the combined state 158 to the EState 150.
For processing set I/O read cycles, the address, command, address parity, byte enables and parity error parameters are compared. If the comparison fails during the address phase, the bridge asserts a retry to the processing set bus controllers 50, which prevents data leaving the I/O bus controllers 50. No activity occurs in this case on the device I/O bus 22. On the processor(s) retrying, no error is returned.
If the comparison fails during a data phase (only control signals and byte enables are checked), the bridge signals a target-abort to the processing set bus controllers 50. An error is returned to the processors.
In the case of processing set I/O bus write cycles, the address, command, parity, byte enables and data parameters are compared. If the comparison fails during the address phase, the bridge asserts a retry to the processing set bus controllers 50, which results in the processing set bus controllers 50 retrying the cycle again. The posted write buffer 122 is then active. No activity occurs on the device I/O bus 22.
If the comparison fails during the data phase of a write operation, no data is passed to the D bus 22. The failing data and any other transfer attributes from both processing sets 14 and 16 are stored in the disconnect registers 122, and any subsequent posted write cycles are recorded in the posted write buffer 118.
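The bridge responses to a failed comparison described above can be sketched as a simple lookup, given here as a hedged illustration only. The exact division of the compared parameters between the address phase and the data phase is an assumption made for this example (the description states it explicitly only for the data phase of a read), and the names `COMPARED`, `ON_MISMATCH` and `check_phase` are hypothetical.

```python
# Signals compared in each phase. The per-phase split is an assumption made
# for this sketch; only the read data phase is stated explicitly in the text.
COMPARED = {
    ("read", "address"): ("address", "command", "address_parity"),
    ("read", "data"): ("control", "byte_enables"),
    ("write", "address"): ("address", "command", "parity"),
    ("write", "data"): ("byte_enables", "data"),
}

# Bridge response when the comparison for a given cycle and phase fails.
ON_MISMATCH = {
    ("read", "address"): "retry",
    ("read", "data"): "target-abort",
    ("write", "address"): "retry",
    ("write", "data"): "store in disconnect registers",
}


def check_phase(cycle, phase, pa_signals, pb_signals):
    """Compare the PA and PB signals for one phase and return the bridge response."""
    key = (cycle, phase)
    matched = all(
        pa_signals.get(name) == pb_signals.get(name) for name in COMPARED[key]
    )
    return "forward" if matched else ON_MISMATCH[key]
```

For example, two write cycles that agree in the address phase but carry different data would be forwarded through the address phase and then diverted to the disconnect registers in the data phase.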
In the case of direct virtual memory access (DVMA) reads, the data control and parity are checked for each datum. If the data does not match, the bridge 12 terminates the transfer on the P bus. In the case of DVMA writes, control and parity error signals are checked for correctness.
Other signals in addition to those specifically mentioned above can be compared to give an indication of divergence of the processing sets. Examples of these are bus grants and various specific signals during processing set transfers and during DMA transfers.
Errors fall roughly into two types, those which are made visible to the software by the processing set bus controller 50 and those which are not made visible by the processing set bus controller 50 and hence need to be made visible by an interrupt from the bridge 12. Accordingly, the bridge is operable to capture errors reported in connection with processing set read and write cycles, and DMA reads and writes.
Clock control for the bridge is performed by the bridge controller 132 in response to the clock signals from the clock line 21. Individual control lines from the controller 132 to the various elements of the bridge are not shown in Figures 6 to 10.
Figure 12 is a flow diagram illustrating a possible sequence of operating stages where lockstep errors are detected during a combined mode of operation.
Stage S1 represents the combined mode of operation where lockstep error checking is performed by the comparator 130 shown in Figure 8.
In Stage S2, a lockstep error is assumed to have been detected by the comparator 130.
In Stage S3, the current state is saved in the CSR registers 114 and posted writes are saved in the posted write buffer 122 and/or in the disconnect registers 120.
Figure 13 illustrates Stage S3 in more detail. Accordingly, in Stage S31, the bridge controller 132 detects whether the lockstep error notified by the comparator 130 has occurred during a data phase in which it is possible to pass data to the device bus 22. In this case, in Stage S32, the bus cycle is terminated. Then, in Stage S33 the data phases are stored in the disconnect registers 120 and control then passes to Stage S35 where an evaluation is made as to whether a further I/O cycle needs to be stored. Alternatively, if at Stage S31 it is determined that the lockstep error did not occur during a data phase, the address and data phases for any posted write I/O cycles are stored in the posted write buffer 122. At Stage S34, if there are any further posted write I/O operations pending, these are also stored in the posted write buffer 122.
Stage S3 is performed at the initiation of the initial error state 152 shown in Figure 11. In this state, the first and second processing sets arbitrate for access to the bridge. Accordingly, in Stages S31-S35, the posted write address and data phases for each of the processing sets 14 and 16 are stored in separate portions of the posted write buffer 122, and/or in the single set of disconnect registers as described above.
Figure 14 illustrates the source of the posted write I/O cycles which need to be stored in the posted write buffer 122. During normal operation of the processing sets 14 and 16, output buffers 162 in the individual processors contain I/O cycles which have been posted for transfer via the processing set bus controllers 50 to the bridge 12 and eventually to the device bus 22. Similarly, buffers 160 in the processing set controllers 50 also contain posted I/O cycles for transfer over the buses 24 and 26 to the bridge 12 and eventually to the device bus 22.
Accordingly, it can be seen that when an error state occurs, I/O write cycles may already have been posted by the processors 52, either in their own buffers 162, or already transferred to the buffers 160 of the processing set bus controllers 50. It is the I/O write cycles in the buffers 162 and 160 which gradually propagate through and need to be stored in the posted write buffer 122.
As shown in Figure 15, a write cycle 164 posted to the posted write buffer 122 can comprise an address field 165 including an address and an address type, and between one and 16 data fields 166 including a byte enable field and the data itself.
The data is written into the posted write buffer 122 in the EState unless the initiating processing set has been designated as a primary CPU set. At that time, non-primary writes in an EState still go to the posted write buffer even after one of the CPU sets has become a primary processing set. An address pointer in the CSR registers 114 points to the next available posted write buffer address, and also provides an overflow bit which is set when the bridge attempts to write past the top of the posted write buffer for any one of the processing sets 14 and 16. Indeed, in the present implementation, only the first 16 K of data is recorded in each buffer. Attempts to write beyond the top of the posted write buffer are ignored. The value of the posted write buffer pointer can be cleared at reset, or by software using a write under the control of a primary processing set.
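The pointer and overflow behaviour described above can be sketched as follows. This is a minimal software model under stated assumptions rather than the implementation: the class name `PostedWriteBuffer` and its methods are hypothetical, the capacity is counted in stored fields (the units of the 16 K limit are not specified in the text), and clearing the overflow bit together with the pointer is likewise an assumption.

```python
class PostedWriteBuffer:
    """Illustrative model of one processing set's posted write buffer."""

    def __init__(self, capacity=16 * 1024):
        # Assumption: capacity is counted in stored fields, not bytes.
        self.capacity = capacity
        self.entries = []
        self.overflow = False

    @property
    def pointer(self):
        # Next available posted write buffer address.
        return len(self.entries)

    def post(self, address, address_type, data_fields):
        """Store one write cycle: an address field plus 1 to 16 data fields."""
        if not 1 <= len(data_fields) <= 16:
            raise ValueError("a write cycle carries between 1 and 16 data fields")
        needed = 1 + len(data_fields)
        if self.pointer + needed > self.capacity:
            # Writes past the top of the buffer set the overflow bit and are ignored.
            self.overflow = True
            return False
        self.entries.append(("address", address, address_type))
        self.entries.extend(("data", enables, data) for enables, data in data_fields)
        return True

    def clear(self):
        # Assumption: clearing the pointer also clears the overflow bit.
        self.entries.clear()
        self.overflow = False
```

A small capacity makes the overflow path easy to exercise: with `capacity=4`, a cycle with one data field fits, and a following cycle with two data fields is ignored and sets the overflow bit.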
Returning to Figure 12, after saving the status and posted writes, at Stage S4 the individual processing sets independently seek to evaluate the error state and to determine whether one of the processing sets is faulty. This determination is made by the individual processors in an error state in which they individually read status from the control state and EState registers 114. During this error mode, the arbiter 134 arbitrates for access to the bridge 12.
In Stage S5, one of the processing sets 14 and 16 establishes itself as the primary processing set. This is determined by each of the processing sets identifying a time factor based on the estimated degree of responsibility for the error, whereby the first processing set to time out becomes the primary processing set. In Stage S5, the status is recovered for that processing set and is copied to the other processing set. The primary processing set is able to access the posted write buffer 122 and the disconnect registers 120.
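The selection of the primary processing set by blame-dependent timers can be illustrated by the following sketch. The linear mapping from an estimated blame value to a timer duration, the bounds `SHORT` and `LONG`, and the function name `elect_primary` are all assumptions introduced for the example; the description only requires that a higher estimated responsibility yields a longer time.

```python
def elect_primary(blame):
    """Pick the primary processing set from per-set blame estimates.

    blame maps a processing set name to its own estimate, between 0.0 and 1.0,
    of how likely it is to have caused the error (hypothetical representation).
    """
    SHORT, LONG = 1.0, 10.0  # hypothetical timer bounds, arbitrary units
    timers = {name: SHORT + b * (LONG - SHORT) for name, b in blame.items()}
    # The first processing set to time out becomes the primary processing set.
    primary = min(timers, key=timers.get)
    return primary, timers
```

With blame estimates of 0.9 for one processing set and 0.1 for the other, the second sets the shorter timer and therefore becomes primary in this model.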
In Stage S6, the bridge is operable in a split mode. If it is possible to re-establish an equivalent status for the first and second processing sets, then a reset is issued at Stage S7 to put the processing sets in the combined mode at Stage S1. However, it may not be possible to re-establish an equivalent state until a faulty processing set is replaced. Accordingly the system will stay in the split mode of Stage S6 in order to continue operation based on a single processing set. After replacing the faulty processing set the system could then establish an equivalent state and move via Stage S7 to Stage S1.
As described above, the comparator 130 is operable in the combined mode to compare the I/O operations output by the first and second processing sets 14 and 16. This is fine as long as all of the I/O operations of the first and second processing sets 14 and 16 are fully synchronized and deterministic. Any deviation from this will be interpreted by the comparator 130 as a loss of lockstep. This is in principle correct as even a minor deviation from identical outputs, if not trapped by the comparator 130, could lead to the processing sets diverging further from each other as the individual processing sets act on the deviating outputs. However, a strict application of this puts significant constraints on the design of the individual processing sets. An example of this is that it would not be possible to have independent time of day clocks in the individual processing sets operating under their own clocks. This is because it is impossible to obtain two crystals which are 100% identical in operation. Even small differences in the phase of the clocks could be critical as to whether the same sample is taken at any one time, for example either side of a clock transition for the respective processing sets.
Accordingly, a solution to this problem employs the dissimilar data registers (DDR) 116 mentioned earlier. The solution is to write data from the processing sets into respective DDRs in the bridge while disabling the comparison of the data phases of the write operations and then to read a selected one of the DDRs back to each processing set, whereby each of the processing sets is able to act on the same data.
Figure 17 is a schematic representation of details of the bridge of Figures 6 to 10. It will be noted that details of the bridge not shown in Figures 6 to 8 are shown in Figure 17, whereas other details of the bridge shown in Figures 6 to 8 are not shown in Figure 17, for reasons of clarity.
The DDRs 116 are provided in the bridge registers 110 of Figure 7, but could be provided elsewhere in the bridge in other embodiments. One DDR 116 is provided for each processing set. In the example of the multi-processor system of Figure 1 where two processing sets 14 and 16 are provided, two DDRs 116A and 116B are provided, one for each of the first and second processing sets 14 and 16, respectively.
Figure 17 represents a dissimilar data write stage. The addressing logic 136 is shown schematically to comprise two decoder sections, one decoder section 136A for the first processing set 14 and one decoder section 136B for the second processing set 16. During an address phase of a dissimilar data I/O write operation each of the processing sets 14 and 16 outputs the same predetermined address DDR-W which is separately interpreted by the respective first and second decoding sections 136A and 136B as addressing the respective first and second DDRs 116A and 116B. As the same address is output by the first and second processing sets 14 and 16, this is not interpreted by the comparator 130 as a lockstep error.
The decoding section 136A, or the decoding section 136B, or both are arranged to further output a disable signal 137 in response to the predetermined write address supplied by the first and second processing sets 14 and 16. This disable signal is supplied to the comparator 130 and is operative during the data phase of the write operation to disable the comparator. As a result, the data output by the first processing set can be stored in the first DDR 116A and the data output by the second processing set can be stored in the second DDR 116B without the comparator being operative to detect a difference, even if the data from the first and second processing sets is different. The first decoding section is operable to cause the routing matrix to store the data from the first processing set 14 in the first DDR 116A and the second decoding section is operable to cause the routing matrix to store the data from the second processing set 16 in the second DDR 116B. At the end of the data phase the comparator 130 is once again enabled to detect any differences between I/O address and/or data phases as indicative of a lockstep error.
Following the writing of the dissimilar data to the first and second DDRs 116A and 116B, the processing sets are then operable to read the data from a selected one of the DDRs 116A/116B.
Figure 18 illustrates an alternative arrangement where the disable signal 137 is negated and is used to control a gate 131 at the output of the comparator 130. When the disable signal is active the output of the comparator is disabled, whereas when the disable signal is inactive the output of the comparator is enabled.
Figure 19 illustrates the reading of the first DDR 116A in a subsequent dissimilar data read stage. As illustrated in Figure 19, each of the processing sets 14 and 16 outputs the same predetermined address DDR-RA which is separately interpreted by the respective first and second decoding sections 136A and 136B as addressing the same DDR, namely the first DDR 116A. As a result, the content of the first DDR 116A is read by both of the processing sets 14 and 16, thereby enabling those processing sets to receive the same data. This enables the two processing sets 14 and 16 to achieve deterministic behavior, even if the source of the data written into the DDRs 116 by the processing sets 14 and 16 was not deterministic.
As an alternative, the processing sets could each read the data from the second DDR 116B. Figure 20 illustrates the reading of the second DDR 116B in a dissimilar data read stage following the dissimilar data write stage of Figure 17. As illustrated in Figure 20, each of the processing sets 14 and 16 outputs the same predetermined address DDR-RB which is separately interpreted by the respective first and second decoding sections 136A and 136B as addressing the same DDR, namely the second DDR 116B. As a result, the content of the second DDR 116B is read by both of the processing sets 14 and 16, thereby enabling those processing sets to receive the same data. As with the dissimilar data read stage of Figure 19, this enables the two processing sets 14 and 16 to achieve deterministic behavior, even if the source of the data written into the DDRs 116 by the processing sets 14 and 16 was not deterministic.
The selection of which of the first and second DDRs 116A and 116B is to be read can be determined in any appropriate manner by the software operating on the processing modules. This could be done on the basis of a simple selection of one or the other of the DDRs, or on a statistical basis or randomly or in any other manner, as long as the same choice of DDR is made by both or all of the processing sets.
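The dissimilar data mechanism can be illustrated with a short software sketch, given under stated assumptions. The class `DissimilarDataRegisters` and its `write` and `read` methods are hypothetical names; the model simply shows that each processing set's value lands in its own DDR while the comparator is disabled for the data phase, and that both processing sets then receive the same selected value.

```python
class DissimilarDataRegisters:
    """Illustrative model of DDRs 116A and 116B (hypothetical names)."""

    def __init__(self):
        self.ddr = {"A": None, "B": None}
        self.comparator_enabled = True

    def write(self, data_a, data_b):
        # Both processing sets present the same address (DDR-W); each decoder
        # section routes the data of its own processing set to its own DDR.
        self.comparator_enabled = False  # disabled for the data phase only
        self.ddr["A"] = data_a
        self.ddr["B"] = data_b
        self.comparator_enabled = True   # re-enabled at the end of the data phase

    def read(self, which):
        # Both processing sets present the same read address (DDR-RA or DDR-RB)
        # and therefore both receive the content of the same DDR.
        value = self.ddr[which]
        return value, value
```

For instance, if the two processing sets sample their independent time of day clocks as 1000 and 1003, both values are stored without a lockstep error, and a subsequent read of DDR "A" returns 1000 to both processing sets.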
Figure 21 is a flow diagram summarizing the various stages of operation of the DDR mechanism described above.
In stage S10, a DDR write address DDR-W is received and decoded by the address decoder sections 136A and 136B during the address phase of the DDR write operation.
In stage S11, the comparator 130 is disabled.
In stage S12, the data received from the processing sets 14 and 16 during the data phase of the DDR write operation is stored in the first and second DDRs 116A and 116B, respectively, as selected by the first and second decode sections 136A and 136B, respectively.
In stage S13, a DDR read address is received from the first and second processing sets and is decoded by the decode sections 136A and 136B, respectively.
If the received address DDR-RA is for the first DDR 116A, then in stage S14 the content of that DDR 116A is read by both of the processing sets 14 and 16.
Alternatively, if the received address DDR-RB is for the second DDR 116B, then in stage S15 the content of that DDR 116B is read by both of the processing sets 14 and 16.
Figure 22 is a schematic representation of the arbitration performed on the respective buses 22, 24 and 26, and the arbitration for the bridge itself.
Each of the processing set bus controllers 50 in the respective processing sets 14 and 16 includes a conventional PCI master bus arbiter 180 for providing arbitration to the respective buses 24 and 26. Each of the master arbiters 180 is responsive to request signals from the associated processing set bus controller 50 and the bridge 12 on respective request (REQ) lines 181 and 182. The master arbiters 180 allocate access to the bus on a first-come-first-served basis, issuing a grant (GNT) signal to the winning party on an appropriate grant line 183 or 184.
A conventional PCI bus arbiter 185 provides arbitration on the D bus 22. The D bus arbiter 185 can be configured as part of the D bus interface 82 of Figure 6 or could be separate therefrom. As with the P bus master arbiters 180, the D bus arbiter is responsive to request signals from the contending devices, including the bridge and the devices 30, 31, etc. connected to the device bus 22. Respective request lines 186, 187, 188, etc. for each of the entities competing for access to the D bus 22 are provided for the request signals (REQ). The D bus arbiter 185 allocates access to the D bus on a first-come-first-served basis, issuing a grant (GNT) signal to the winning entity via respective grant lines 189, 190, 192, etc.
Figure 23 is a state diagram summarising the operation of the D bus arbiter 185. In a particular embodiment up to six request signals may be produced by respective D bus devices and one by the bridge itself. On a transition into the GRANT state, these are sorted by a priority encoder and a request signal (REQ#) with the highest priority is registered as the winner and gets a grant (GNT#) signal. Each winner which is selected modifies the priorities in a priority encoder so that, given the same REQ# signals on the next move to GRANT, a different device has the highest priority; hence each device has a "fair" chance of accessing DEVs. The bridge REQ# has a higher weighting than D bus devices and will, under very busy conditions, get the bus for every second device.
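One way of realising a priority encoder whose priorities change after each grant is a rotating (round-robin) scheme, sketched below purely as an illustration. The rotation rule, the class name `RotatingPriorityArbiter` and its `grant` method are assumptions; the description does not specify how the priorities are modified, and the higher weighting of the bridge request is deliberately not modelled here.

```python
class RotatingPriorityArbiter:
    """Illustrative rotating-priority encoder (an assumed scheme, not the patent's)."""

    def __init__(self, requester_count):
        self.count = requester_count
        self.top = 0  # index that currently holds the highest priority

    def grant(self, requests):
        """Return the winning requester index from a set of indices, or None."""
        for offset in range(self.count):
            candidate = (self.top + offset) % self.count
            if candidate in requests:
                # The winner drops to the lowest priority for the next grant,
                # so the same requests yield a different winner next time.
                self.top = (candidate + 1) % self.count
                return candidate
        return None
```

With three requesters all asserting their requests continuously, successive grants in this model go to requesters 0, 1, 2 and then 0 again, giving each a fair share of the bus.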
If a device requesting the bus fails to perform a transaction within 16 cycles it may lose GNT# via the BACKOFF state. BACKOFF is required as, under PCI rules, a device may access the bus one cycle after GNT# is removed. Devices may only be granted access to the D bus if the bridge is not in the EState. A new GNT# is produced at the times when the bus is idle.
In the GRANT and BUSY states, the FETs are enabled and an accessing device is known and forwarded to the D bus address decode logic for checking against a DMA address provided by the device.
Turning now to the bridge arbiter 134, this allows access to the bridge for the first device which asserts the PCI FRAME# signal indicating an address phase. Figure 24 is a state diagram summarising the operation of the bridge arbiter 134.
As with the D bus arbiter, a priority encoder can be provided to resolve access attempts which collide. In this case (a "collision") the loser or losers are retried, which forces them to give up the bus. Under PCI rules retried devices must try repeatedly to access the bridge and this can be expected to happen.
To prevent devices which are very quick with their retry attempt from hogging the bridge, retried interfaces are remembered and assigned a higher priority. These remembered retries are prioritised in the same way as address phases. However, as a precaution this mechanism is timed out so as not to get stuck waiting for a faulty or dead device. The algorithm employed prevents a device which hasn't yet been retried, but which would be a higher priority retry than a device currently being waited for, from being retried at the first attempt.
In combined operations a PA or PB bus input selects which P bus interface will win a bridge access. Both are informed they won. Allowed selection enables latent fault checking during normal operation. EState prevents the D bus from winning.
The bridge arbiter 134 is responsive to standard PCI signals provided on standard PCI control lines 22, 24 and 25 to control access to the bridge 12.
Figure 25 illustrates signals associated with an I/O operation cycle on the PCI bus. A PCI frame signal (FRAME#) is initially asserted. At the same time, address (A) signals will be available on the DATA BUS and the appropriate command (write/read) signals (C) will be available on the command bus (CMD BUS). Shortly after the frame signal is asserted low, the initiator ready signal (IRDY#) will also be asserted low. When the device responds, a device selected signal (DEVSEL#) will be asserted low. When a target ready signal (TRDY#) is asserted low, data transfer (D) can occur on the data bus.
The bridge is operable to allocate access to the bridge resources and thereby to negotiate allocation of a target bus in response to the FRAME# being asserted low for the initiator bus concerned. Accordingly, the bridge arbiter 134 is operable to allocate access to the bridge resources and/or to a target bus on a first-come-first-served basis in response to the FRAME# being asserted low. As well as the simple first-come-first-served basis, the arbiters may be additionally provided with a mechanism for logging the arbitration requests, and can apply a conflict resolution based on the request and allocation history where two requests are received at an identical time. Alternatively, a simple priority can be allocated to the various requesters, whereby, in the case of identically timed requests, a particular requester always wins the allocation process.
Each of the slots on the device bus 22, as well as other devices connected to the bus, such as a SCSI interface, has a slot response register (SRR) 118. Each of the SRRs 118 contains bits defining the ownership of the slots, or the devices connected to the slots on the direct memory access bus. In this embodiment, and for reasons to be elaborated below, each SRR 118 comprises a four bit register. However, it will be appreciated that a larger register will be required to determine ownership between more than two processing sets. For example, if three processing sets are provided, then a five bit register will be required for each slot.
Figure 16 illustrates schematically one such four bit register 600. As shown in Figure 16, a first bit 602 is identified as SRR[0], a second bit 604 is identified as SRR[1], a third bit 606 is identified as SRR[2] and a fourth bit 608 is identified as SRR[3].
Bit SRR[0] is a bit which is set when writes for valid transactions are to be suppressed. Bit SRR[1] is set when the device slot is owned by the first processing set 14. This defines the access route between the first processing set 14 and the device slot. When this bit is set, the first processing set 14 can always be master of a device slot 22, while the ability for the device slot to be master depends on whether bit SRR[3] is set.
Bit SRR[2] is set when the device slot is owned by the second processing set 16. This defines the access route between the second processing set 16 and the device slot. When this bit is set, the second processing set 16 can always be master of the device slot or bus 22, while the ability for the device slot to be master depends on whether bit SRR[3] is set. Bit SRR[3] is an arbitration bit which gives the device slot the ability to become master of the device bus 22, but only if it is owned by one of the processing sets 14 and 16, that is, if one of the SRR[1] and SRR[2] bits is set.
When the fake bit (SRR[0]) of an SRR 118 is set, writes to the device for that slot are ignored and do not appear on the device bus 22. Reads return indeterminate data without causing a transaction on the device bus 22. In the event of an I/O error, the fake bit SRR[0] of the SRR 118 corresponding to the device which caused the error is set by the hardware configuration of the bridge to disable further access to the device slot concerned. An interrupt may also be generated by the bridge to inform the software which originated the access leading to the I/O error that the error has occurred. The fake bit has an effect whether the system is in the split or the combined mode of operation.
The ownership bits only have effect, however, in the split system mode of operation. In this mode, each slot can be in one of three states:
Not-owned;
Owned by processing set 14; and
Owned by processing set 16.
This is determined by the two SRR bits SRR[1] and SRR[2], with SRR[1] being set when the slot is owned by processing set 14 and SRR[2] being set when the slot is owned by processing set 16. If the slot is un-owned, then neither bit is set (both bits set is an illegal condition and is prevented by the hardware).
A slot which is not owned by the processing set making the access (this includes un-owned slots) cannot be accessed, and the access results in an abort. A processing set can only claim an un-owned slot; it cannot wrest ownership away from another processing set. This can only be done by powering off the other processing set. When a processing set is powered off, all slots owned by it move to the un-owned state. Whilst it is not possible for a processing set to wrest ownership from another processing set, it is possible for a processing set to give ownership to another processing set. The owned bits can be altered when in the combined mode of operation, but they have no effect until the split mode is entered.
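The ownership rules just described (claim only when un-owned, no wresting, voluntary hand-over, release on power-off) can be modelled, purely as an illustrative sketch, in a few lines of Python. The class and method names are hypothetical, and the sketch models software-visible behaviour only, not the bridge hardware.

```python
from typing import Dict, Iterable, Optional


class SlotOwnership:
    """Tracks which processing set (e.g. 14 or 16) owns each slot, if any."""

    def __init__(self, slots: Iterable[int]) -> None:
        self.owner: Dict[int, Optional[int]] = {s: None for s in slots}

    def claim(self, slot: int, pset: int) -> bool:
        # Only an un-owned slot can be claimed; ownership cannot be wrested.
        if self.owner[slot] is None:
            self.owner[slot] = pset
            return True
        return False

    def give(self, slot: int, from_pset: int, to_pset: int) -> bool:
        # The current owner may hand the slot to another processing set.
        if self.owner[slot] == from_pset:
            self.owner[slot] = to_pset
            return True
        return False

    def power_off(self, pset: int) -> None:
        # All slots owned by a powered-off processing set become un-owned.
        for slot, current in self.owner.items():
            if current == pset:
                self.owner[slot] = None
```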
Table 2 below summarizes the access rights as determined by an SRR 118.
From Table 2, it can be seen that when the 4-bit SRR for a given device is set to 1100, for example, then the slot is owned by processing set B (i.e. SRR[2] is logic high) and processing set A may not read from or write to the device (i.e. SRR[1] is logic low), although it may read from or write to the bridge. "FAKE_AT" is set logic low (i.e. SRR[0] is logic low), indicating that access to the device bus is allowed as there are no faults on the bus. As "ARB_EN" is set logic high (i.e. SRR[3] is logic high), the device with which the register is associated can become master of the D bus. This example demonstrates the operation of the register when the bus and associated devices are operating correctly.
TABLE 2
SRR [3][2][1][0] | PA BUS | PB BUS | Device Interface
0000, x00x | Read/Write bridge SRR | Read/Write bridge SRR | Access denied
0010 | Read/Write bridge; Owned D Slot | Read/Write bridge; No access to D Slot | Access denied because arbitration bit is off
0100 | Read/Write bridge; No access to D Slot | Read/Write bridge; Access to D Slot | Access denied because arbitration bit is off
1010 | Read/Write bridge; Owned D Slot | Read/Write bridge; No access to D Slot | Access to CPU B denied; Access to CPU A OK
1100 | Read/Write bridge; No access to D Slot | Read/Write bridge; Access to D Slot | Access to CPU A denied; Access to CPU B OK
0011 | Read/Write bridge; Bridge discards writes | Read/Write bridge; No access to D Slot | Access denied because arbitration bit is off
0101 | Read/Write bridge; No access to D Slot | Read/Write bridge; Bridge discards writes | Access denied because arbitration bit is off
1011 | Read/Write bridge; Bridge discards writes | Read/Write bridge; No access to D Slot | Access to CPU B denied; Access to CPU A OK
1101 | Read/Write bridge; No access to D Slot | Read/Write bridge; Bridge discards writes | Access to CPU B denied; Access to CPU A OK
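As a hedged illustration of how the four SRR bits combine, the following Python sketch decodes a register value into an owner, a fake (write-discard) flag and a device mastering permission. The constant names, the `SlotRights` type and the use of "A" and "B" as owner labels are assumptions made for this sketch; only the bit meanings are taken from the description.

```python
from dataclasses import dataclass
from typing import Optional

FAKE = 1 << 0      # SRR[0]: writes discarded, reads return indeterminate data
OWNED_A = 1 << 1   # SRR[1]: slot owned by the first processing set
OWNED_B = 1 << 2   # SRR[2]: slot owned by the second processing set
ARB_EN = 1 << 3    # SRR[3]: device may become master of the D bus


@dataclass
class SlotRights:
    owner: Optional[str]        # "A", "B" or None when un-owned
    writes_discarded: bool      # fake bit set
    device_may_master: bool     # arbitration bit set and slot owned


def decode_srr(srr: int) -> SlotRights:
    owned_a = bool(srr & OWNED_A)
    owned_b = bool(srr & OWNED_B)
    if owned_a and owned_b:
        # Both ownership bits set is an illegal condition.
        raise ValueError("both ownership bits set")
    owner = "A" if owned_a else "B" if owned_b else None
    # The arbitration bit only has effect when the slot is owned.
    may_master = bool(srr & ARB_EN) and owner is not None
    return SlotRights(owner, bool(srr & FAKE), may_master)
```

For the value 1100 this yields an owner of "B", no write discarding and permission for the device to master the D bus.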
In an alternative example, where the SRR for the device is set to 0101, the setting of SRR[2] logic high indicates that the device is owned by processing set B. However, as the device is malfunctioning, SRR[3] is set logic low and the device is not allowed access to the processing set. SRR[0] is set high so that any writes to the device are ignored and reads therefrom return indeterminate data. In this way, the malfunctioning device is effectively isolated from the processing set, and provides indeterminate data to satisfy any device drivers, for example, that might be looking for a response from the device.
Figure 26 illustrates the operation of the bridge 12 for direct memory access by a device such as one of the devices 28, 29, 30, 31 and 32 to the memory 56 of the processing sets 14 and 16. When the D bus arbiter 185 receives a direct memory access (DMA) request 193 from a device (e.g., device 30 in slot 33) on the device bus, the D bus arbiter determines whether to allocate the bus to that slot. As a result of this granting procedure, the D-bus arbiter knows the slot which has made the DMA request 193. The DMA request is supplied to the address decoder 142 in the bridge, where the addresses associated with the request are decoded. The address decoder is responsive to the D bus grant signal 194 for the slot concerned to identify the slot which has been granted access to the D bus for the DMA request.
The address decode logic 142 holds or has access to a geographic address map 196, which identifies the relationship between the processor address space and the slots as a result of the geographic address employed. This geographic address map 196 could be held as a table in the bridge memory 126, along with the posted write buffer 122 and the dirty RAM 124. Alternatively, it could be held as a table in a separate memory element, possibly forming part of the address decoder 142 itself. The map 196 could be configured in a form other than a table.
The address decode logic 142 is configured to verify the correctness of the DMA addresses supplied by the device 30. In one embodiment of the invention, this is achieved by comparing four significant address bits of the address supplied by the device 30 with the corresponding four address bits of the address held in the geographic addressing map 196 for the slot identified by the D bus grant signal for the DMA request. In this example, four address bits are sufficient to determine whether the address supplied is within the correct address range. In this specific example, 32 bit PCI bus addresses are used, with bits 31 and 30 always being set to 1, bit 29 being allocated to identify which of two bridges on a motherboard is being addressed (see Figure 2) and bits 28 to 26 identifying a PCI device. Bits 25-0 define an offset from the base address for the address range for each slot. Accordingly, by comparing bits 29-26, it is possible to identify whether the address(es) supplied fall(s) within the appropriate address range for the slot concerned. It will be appreciated that in other embodiments a different number of bits may need to be compared to make this determination, depending upon the allocation of the addresses.
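The comparison of bits 29-26 can be illustrated with a short Python sketch. The function name and the example base address are assumptions chosen for illustration; the bit positions are those given in the specific example above.

```python
# Bits 29..26 of a 32 bit PCI address: bridge select (bit 29) plus
# the three device-identifying bits (28..26).
SLOT_FIELD_MASK = 0xF << 26


def dma_address_in_range(dma_addr: int, slot_base: int) -> bool:
    """True if dma_addr lies in the address range of the slot whose
    geographic base address is slot_base."""
    return (dma_addr & SLOT_FIELD_MASK) == (slot_base & SLOT_FIELD_MASK)


# Hypothetical base address: bits 31 and 30 set, bridge 0, device number 3.
EXAMPLE_SLOT_BASE = 0xC0000000 | (3 << 26)
```

With this example base, any address from 0xCC000000 to 0xCFFFFFFF passes the check, since bits 25-0 are only an offset within the slot's range.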
The address decode logic 142 could be arranged to use the bus grant signal 194 for the slot concerned to identify a table entry for the slot concerned and then to compare the address in that entry with the address(es) received with the DMA request as described above. Alternatively, the address decode logic 142 could be arranged to use the address(es) received with the DMA request to address a relational geographic address map and to determine a slot number therefrom, which could be compared to the slot for which the bus grant signal 194 is intended and thereby to determine whether the addresses fall within the address range appropriate for the slot concerned.
Either way, the address decode logic 142 is arranged to permit DMA to proceed if the DMA addresses fall within the expected address space for the slot concerned. Otherwise, the address decoder is arranged to ignore the slots and the physical addresses.
The address decode logic 142 is further operable to control the routing of the DMA request to the appropriate processing set(s) 14/16. If the bridge is in the combined mode, the DMA access will automatically be allocated to all of the in-sync processing sets 14/16. The address decode logic 142 will be aware that the bridge is in the combined mode as it is under the control of the bridge controller 132 (see Figure 8). However, where the bridge is in the split mode, a decision will need to be made as to which, if any, of the processing sets the DMA request is to be sent.
When the system is in split mode, the access will be directed to a processing set 14 or 16 which owns the slot concerned. If the slot is un-owned, then the bridge does not respond to the DMA request. In the split mode, the address decode logic 142 is operable to determine the ownership of the device originating the DMA request by accessing the SRR 118 for the slot concerned. The appropriate slot can be identified by the D bus grant signal. The address decode logic 142 is operable to control the target controller 140 (see Figure 8) to pass the DMA request to the appropriate processing set(s) 14/16 based on the ownership bits SRR[1] and SRR[2]. If bit SRR[1] is set, the first processing set 14 is the owner and the DMA request is passed to the first processing set. If bit SRR[2] is set, the second processing set 16 is the owner and the DMA request is passed to the second processing set. If neither of the bits SRR[1] and SRR[2] is set, then the DMA request is ignored by the address decoder and is not passed to either of the processing sets 14 and 16.
Figure 27 is a flow diagram summarizing the DMA verification process as illustrated with reference to Figure 24.
In stage S20, the D-bus arbiter 160 arbitrates for access to the D bus 22.
In stage S21, the address decoder 142 verifies the DMA addresses supplied with the DMA request by accessing the geographic address map.
In stage S22, the address decoder ignores the DMA access where the address falls outside the expected range for the slot concerned. Alternatively, as represented by stage S23, the actions of the address decoder are dependent upon whether the bridge is in the combined or the split mode.
If the bridge is in the combined mode, then in stage S24 the address decoder controls the target controller 140 (see Figure 8) to cause the routing matrix 80 (see Figure 6) to pass the DMA request to both processing sets 14 and 16. If the bridge is in the split mode, the address decoder is operative to verify the ownership of the slot concerned by reference to the SRR 118 for that slot in stage S25.
If the slot is allocated to the first processing set 14 (i.e. the SRR[1] bit is set), then in stage S26 the address decoder 142 controls the target controller 140 (see Figure 8) to cause the routing matrix 80 (see Figure 6) to pass the DMA request to the first processing set 14. If the slot is allocated to the second processing set 16 (i.e. the SRR[2] bit is set), then in stage S27 the address decoder 142 controls the target controller 140 (see Figure 8) to cause the routing matrix 80 (see Figure 6) to pass the DMA request to the second processing set 16.
If the slot is unallocated (i.e. neither the SRR[1] bit nor the SRR[2] bit is set), then in stage S28 the address decoder 142 ignores or discards the DMA request and the DMA request is not passed to the processing sets 14 and 16.
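The routing decision summarised in the stages above might be sketched in Python as follows. This is an illustrative model only: the function name, the boolean inputs and the use of the reference numerals 14 and 16 as identifiers are assumptions made for the sketch.

```python
from typing import List

OWNED_BY_14 = 1 << 1   # SRR[1]
OWNED_BY_16 = 1 << 2   # SRR[2]


def route_dma(address_ok: bool, combined_mode: bool, srr: int) -> List[int]:
    """Return the processing sets to which a DMA request is passed.

    An empty list means that the request is ignored.
    """
    if not address_ok:
        return []            # address outside the expected range for the slot
    if combined_mode:
        return [14, 16]      # combined mode: both processing sets
    if srr & OWNED_BY_14:
        return [14]          # split mode, slot owned by processing set 14
    if srr & OWNED_BY_16:
        return [16]          # split mode, slot owned by processing set 16
    return []                # split mode, un-owned slot
```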
A DMA, or direct vector memory access (DVMA), request sent to one or more of the processing sets causes the necessary memory operations (read or write as appropriate) to be effected on the processing set memory.
There now follows a description of an example of a mechanism for enabling automatic recovery from an EState (see Figure 11).
The automatic recovery process includes reintegration of the state of the processing sets to a common status in order to attempt a restart in lockstep. To achieve this, the processing set which asserts itself as the primary processing set as described above copies its complete state to the other processing set. This involves ensuring that the content of the memory of both processors is the same before trying a restart in lockstep mode. However, a problem with the copying of the content of the memory from one processing set to the other is that during this copying process a device connected to the D bus 22 might attempt to make a direct memory access (DMA) request for access to the memory of the primary processing set. If DMA is enabled, then a write made to an area of memory which has already been copied would result in the memory state of the two processors at the end of the copy not being the same. In principle, it would be possible to inhibit DMA for the whole of the copy process. However, this would be undesirable, bearing in mind that it is desirable to minimise the time that the system or the resources of the system are unavailable. As an alternative, it would be possible to retry the whole copy operation when a DMA operation has occurred during the period of the copy. However, it is likely that further DMA operations would be performed during the copy retry, and accordingly this is not a good option either. Accordingly, in the present system, a dirty RAM 124 is provided in the bridge. As described earlier, the dirty RAM 124 is configured as part of the bridge SRAM memory 126.
The dirty RAM 124 comprises a bit map having a dirty indicator, for example a dirty bit, for each block, or page, of memory. The bit for a page of memory is set when a write access to the area of memory concerned is made. In an embodiment of the invention one bit is provided for every 8K page of main processing set memory. The bit for a page of processing set memory is set automatically by the address decoder 142 when this decodes a DMA request for that page of memory for either of the processing sets 14 or 16 from a device connected to the D bus 22. The dirty RAM can be reset, or cleared, when it is read by a processing set, for example by means of read and clear instructions at the beginning of a copy pass, so that it can start to record pages which are dirtied since a given time. The dirty RAM 124 can be read word by word. If a large word size is chosen for reading the dirty RAM 124, this will optimise the reading and resetting of the dirty RAM 124.
Accordingly, at the end of the copy pass the bits in the dirty RAM 124 will indicate those pages of processing set memory which have been changed (or dirtied) by DMA writes during the period of the copy. A further copy pass can then be performed for only those pages of memory which have been dirtied. This will take less time than a full copy of the memory. Accordingly, there are typically fewer pages marked as dirty at the end of the next copy pass and, as a result, the copy passes can become shorter and shorter. At some time it is necessary to decide to inhibit DMA writes for a short period for a final, short, copy pass, at the end of which the memories of the two processing sets will be the same and the primary processing set can issue a reset operation to restart the combined mode. The dirty RAM 124 is set and cleared in both the combined and split modes. This means that in split mode the dirty RAM 124 may be cleared by either processing set.
The dirty RAM 124 address is decoded from bits 13 to 28 of the PCI address presented by the D bus device. Erroneous accesses which present illegal combinations of the address bits 29 to 31 are mapped into the dirty RAM 124 and a bit is dirtied on a write, even though the bridge will not pass these transactions to the processing sets.
When reading the dirty RAM 124, the bridge defines the whole area from 0x00008000 to 0x0000ffff as dirty RAM and will clear the contents of any location in this range on a read.
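A minimal software model of such a bit map, assuming one bit per 8K page indexed by PCI address bits 13 to 28 and a clear-on-read behaviour, is sketched below in Python. The class and method names are hypothetical, and the whole map is read at once here rather than word by word.

```python
from typing import List


class DirtyRam:
    """One dirty bit per 8K page, indexed by PCI address bits 13..28."""

    PAGE_SHIFT = 13          # 8K pages
    INDEX_MASK = 0xFFFF      # bits 13..28 give a 16 bit page index

    def __init__(self) -> None:
        self.bits = bytearray((self.INDEX_MASK + 1) // 8)

    def mark_write(self, pci_addr: int) -> None:
        # Bits 29..31 are ignored, so erroneous accesses still dirty a bit.
        page = (pci_addr >> self.PAGE_SHIFT) & self.INDEX_MASK
        self.bits[page >> 3] |= 1 << (page & 7)

    def read_and_clear(self) -> List[int]:
        # Return the indices of all dirty pages, then reset the whole map.
        pages = [
            i * 8 + b
            for i, byte in enumerate(self.bits)
            for b in range(8)
            if (byte >> b) & 1
        ]
        self.bits = bytearray(len(self.bits))
        return pages
```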
As an alternative to providing a single dirty RAM 124 which is cleared on being read, two dirty RAMs could be provided which are used in a toggle mode, with one being written to while the other is read.
Figure 28 is a flow diagram summarising the operation of the dirty RAM 124.
In stage S41, the primary processing set reads the dirty RAM 124, which has the effect of resetting the dirty RAM 124.
In stage S42, the primary processor (e.g. processing set 14) copies the whole of its memory 56 to the memory 56 of the other processing set (e.g. processing set 16).
In stage S43, the primary processing set reads the dirty RAM 124, which has the effect of resetting the dirty RAM 124.
In stage S44, the primary processor determines whether fewer than a predetermined number of bits have been written in the dirty RAM 124.
If more than the predetermined number of bits have been set, then the processor in stage S45 copies those pages of its memory 56 which have been dirtied, as indicated by the dirty bits read from the dirty RAM 124 in stage S43, to the memory 56 of the other processing set. Control then passes back to stage S43.
If, in stage S44, it is determined that fewer than the predetermined number of bits have been written in the dirty RAM 124, then in stage S46 the primary processor causes the bridge to inhibit DMA requests from the devices connected to the D bus 22. This could, for example, be achieved by clearing the arbitration enable bit for each of the device slots, thereby denying access of the DMA devices to the D bus 22. Alternatively, the address decoder 142 could be configured to ignore DMA requests under instructions from the primary processor. During the period in which DMA accesses are prevented, the primary processor then makes a final copy pass from its memory to the memory 56 of the other processor for those memory pages corresponding to the bits set in the dirty RAM 124.
In stage S47 the primary processor can issue a reset operation for initiating a combined mode.
In stage S48, DMA accesses are once more permitted.
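The iterative copy procedure may be easier to follow as a sketch. The Python below models memories as dictionaries of pages and takes the dirty RAM read and the DMA inhibit as caller-supplied functions. All names are hypothetical, and the threshold comparison is one reading of "fewer than a predetermined number".

```python
from typing import Callable, Dict, List


def reintegrate(
    primary: Dict[int, bytes],
    secondary: Dict[int, bytes],
    read_and_clear_dirty: Callable[[], List[int]],
    set_dma_enabled: Callable[[bool], None],
    threshold: int,
) -> int:
    """Copy primary memory to secondary; return the number of copy passes
    made while DMA was still enabled."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    read_and_clear_dirty()                  # reset the dirty RAM
    secondary.update(primary)               # full copy pass
    passes = 1
    dirty = read_and_clear_dirty()          # pages dirtied during the copy
    while len(dirty) >= threshold:          # too many dirty pages remain
        for page in dirty:
            secondary[page] = primary[page]
        passes += 1
        dirty = read_and_clear_dirty()
    set_dma_enabled(False)                  # inhibit DMA for the final pass
    # Re-read so that writes arriving just before the inhibit are included.
    final_pages = set(dirty) | set(read_and_clear_dirty())
    for page in final_pages:
        secondary[page] = primary[page]
    # The reset operation restarting the combined mode would be issued here.
    set_dma_enabled(True)                   # DMA permitted once more
    return passes
```

The extra read of the dirty RAM after DMA is inhibited is a design choice of this sketch, made so that no write slips in between the last read and the inhibit.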
It will be appreciated that although particular embodiments of the invention have been described, many modifications/additions and/or substitutions may be made within the spirit and scope of the present invention as defined in the appended claims. For example, although in the specific description two processing sets are provided, it will be appreciated that the specifically described features may be modified to provide for three or more processing sets.

Claims

WHAT IS CLAIMED:
1. A bridge for a multi-processor system, the bridge comprising a first processor bus interface for connection to an I/O bus of a first processing set, a second processor bus interface for connection to an I/O bus of a second processing set, a device bus interface for connection to a device bus and a bridge control mechanism configured to be operable: in an operational mode to permit access by at least one of the first and second processing sets to bridge resources and to the device bus; and in an error mode to prevent access by the processing sets to the device bus and to permit restricted access by at least one of the processing sets to at least predetermined bridge resources.
2. The bridge of claim 1, wherein the bridge control mechanism is configured to be operable, in response to detection of an error state, to cause the bridge to cease operation in the operational mode and instead to operate in the error mode.
3. The bridge of claim 2, comprising storage for buffering data pending resolution of the error.
4. The bridge of claim 1, comprising error state registers for saving operating parameters on entry to the error mode, read only access to the error state registers being permitted by the processing sets during the error mode.
5. The bridge of claim 1, comprising a posted write buffer for the storage of writes already posted by at least one processing set on entry to the error mode, read only access to the posted write buffer being permitted by the processing sets during the error mode.
6. The bridge of claim 1, wherein the bridge control mechanism is configured to be operable in an initial error mode: to store in the posted write buffer any internal bridge write accesses initiated by the processing sets and to allow and to arbitrate any internal bridge read accesses initiated by the processing sets.
7. The bridge of claim 1, wherein the bridge control mechanism is configured to be operable in an initial error mode: to store in a posted write buffer any device bus write accesses initiated by the processing sets and to abort any device bus read accesses initiated by the processing sets.
8. The bridge of claim 1, wherein the bridge control mechanism is configured to be operable in a primary error mode in which a processing set asserts itself as a primary processing set: to allow and to arbitrate any internal bridge write accesses initiated by the primary processing set, to discard any internal bridge write accesses initiated by any other processing set, and to allow and to arbitrate any internal bridge read accesses initiated by the processing sets.
9. The bridge of claim 8, wherein the bridge control mechanism is configured to be operable in the primary error mode to discard any device bus write accesses initiated by the processing sets and to abort any device bus read accesses initiated by the processing sets.
10. The bridge of claim 1, wherein the bridge control mechanism is configured to be operable in a first, split, operational mode to arbitrate between the first and the second processing sets for access to each other's I/O bus and to the device bus, and in a second, combined, operational mode to monitor lockstep operation of the first and second processing sets.
11. The bridge of claim 10, wherein the bridge control mechanism is configured to be operable on power up of the bridge to operate in an initial error mode until a processor set asserts itself as a primary processing set, then to operate in the split operational mode to enable all processing sets to be set to a corresponding state before transferring to the combined operational mode.
12. The bridge of claim 1, comprising a memory sub-system and a controllable routing matrix connected between the first processor bus interface, the second processor bus interface, the device bus interface and the memory sub-system, the bridge control mechanism being configured to be operable to control the routing matrix selectively to interconnect the first processor bus interface, the second processor bus interface, the device bus interface and the memory sub-system according to a current mode of operation.
13. The bridge of claim 1, comprising at least one further processor bus interface for connection to an I/O bus of a further processing set.
14. A bridge for a multi-processor system, the bridge comprising means for connection to an I/O bus of a first processing set, to an I/O bus of a second processing set and to a device bus, and a means for controlling operation of the bridge in an operational mode to permit access by at least one of the first and second processing sets to bridge resources and to the device bus, and in an error mode to prevent access by the processing sets to the device bus and to permit restricted access by at least one of the processing sets to at least predetermined bridge resources.
15. A computer system comprising a first processing set having an I/O bus, a second processing set having an I/O bus, a device bus and a bridge, the bridge being connected to the I/O bus of the first processing set, the I/O bus of the second processing set and the device bus and comprising a bridge control mechanism configured to be operable in an operational mode to permit access by at least one of the first and second processing sets to bridge resources and to the device bus; and in an error mode to prevent access by the processing sets to the device bus and to permit restricted access by at least one of the processing sets to at least predetermined bridge resources.
16. The system of claim 15, wherein the bridge control mechanism is configured to be operable, in response to detection of an error state, to cause the bridge to cease operation in the operational mode and instead to operate in the error mode.
17. The system of claim 16, wherein the bridge comprises storage for buffering data pending resolution of the error.
18. The system of claim 15, wherein the bridge comprises error state registers for saving operating parameters on entry to the error mode, read only access to the error state registers being permitted by the processing sets during the error mode.
19. The system of claim 15, wherein the bridge comprises a posted write buffer for the storage of writes already posted by at least one processing set on entry to the error mode, read only access to the posted write buffer being permitted by the processing sets during the error mode.
20. The system of claim 15, wherein the bridge control mechanism is configured to be operable in an initial error mode to store in the posted write buffer any internal bridge write accesses initiated by the processing sets and to allow and to arbitrate any internal bridge read accesses initiated by the processing sets, to store in a posted write buffer any device bus write accesses initiated by the processing sets and to abort any device bus read accesses initiated by the processing sets.
21. The system of claim 15, wherein the bridge control mechanism is configured to be operable in a primary error mode in which a processing set asserts itself as a primary processing set to allow and to arbitrate any internal bridge write accesses initiated by the primary processing set, to discard any internal bridge write accesses initiated by any other processing set, to allow and to arbitrate any internal bridge read accesses initiated by the processing sets, to discard any device bus write accesses initiated by the processing sets and to abort any device bus read accesses initiated by the processing sets.
22. The system of claim 15, wherein the bridge control mechanism is configured to be operable in a first, split, operational mode to arbitrate between the first and the second processing sets for access to each other's I/O bus and to the device bus, and in a second, combined, operational mode to monitor lockstep operation of the first and second processing sets.
23. The system of claim 22, wherein the bridge control mechanism is configured to be operable on power up of the bridge to operate in an initial error mode until a processor set asserts itself as a primary processing set, then to operate in the split operational mode to enable all processing sets to be set to a corresponding state before transferring to the combined operational mode.
24. The system of claim 15, wherein the bridge comprises a memory sub-system and a controllable routing matrix connected between the first processor bus interface, the second processor bus interface, the device bus interface and the memory sub-system, the bridge control mechanism being configured to be operable to control the routing matrix selectively to interconnect the first processor bus interface, the second processor bus interface, the device bus interface and the memory sub-system according to a current mode of operation.
25. The system of claim 15, wherein each processing set comprises at least one processor, memory and a processing set I/O bus controller.
26. The system of claim 15, further comprising at least one further processing set, the bridge comprising at least one further processor bus interface for connection to an I/O bus of the at least one further processing set.
27. A method of operating a multi-processor system comprising a first processing set having an I/O bus, a second processing set having an I/O bus, a device bus and a bridge, the bridge being connected to the I/O bus of the first processing set, the I/O bus of the second processing set and the device bus, the method comprising selectively operating the bridge: in an operational mode to permit access by at least one of the first and second processing sets to bridge resources and to the device bus; and in an error mode to prevent access by the processing sets to the device bus and to permit restricted access by at least one of the processing sets to at least predetermined bridge resources.
28. The method of claim 27, wherein, in response to detection of an error state, the bridge ceases operation in the operational mode and instead operates in the error mode.
29. The method of claim 28, comprising buffering data in the bridge pending resolution of the error.
30. The method of claim 27, comprising saving operating parameters in error state registers in the bridge on entry to the error mode and permitting read only access to the error state registers by the processing sets during the error mode.
31. The method of claim 27, comprising storing, in a posted write buffer in the bridge, writes already posted from at least one processing set on entry to the error mode, and permitting read only access to the posted write buffer by the processing sets during the error mode.
32. The method of claim 27, comprising operating in an initial error mode in which: any internal bridge write accesses initiated by the processing sets are stored in a posted write buffer in the bridge and any internal bridge read accesses initiated by the processing sets are allowed and arbitrated by the bridge, and any device bus write accesses initiated by the processing sets are posted in a posted write buffer in the bridge and any device bus read accesses initiated by the processing sets are aborted by the bridge.
33. The method of claim 32, comprising subsequently operating in a primary error mode in which a processing set asserts itself as a primary processing set, in which any internal bridge write accesses initiated by the primary processing set are allowed and arbitrated by the bridge, any internal bridge write accesses initiated by any other processing set are discarded by the bridge, and any internal bridge read accesses initiated by the processing sets are allowed and arbitrated by the bridge, and any device bus write accesses initiated by the processing sets are discarded by the bridge and any device bus read accesses initiated by the processing sets are aborted by the bridge.
34. The method of claim 27, comprising operating in: a split operational mode in which accesses by the first and the second processing sets to each other's I/O bus and to the device bus are arbitrated by the bridge; and a combined operational mode in which lockstep operation of the first and second processing sets is monitored by the bridge.
35. The method of claim 34, wherein on power up the bridge operates in an initial error mode until a processor set asserts itself as a primary processing set, and then the bridge operates in the split operational mode to enable all processing sets to be set to a corresponding state before transferring to the combined operational mode.
PCT/US1999/012431 1998-06-15 1999-06-03 Multi-processor system bridge with controlled access WO1999066404A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
DE69901255T DE69901255T2 (en) 1998-06-15 1999-06-03 MULTI-PROCESSOR SYSTEM BRIDGE WITH ACCESS CONTROL
EP99926161A EP1090350B1 (en) 1998-06-15 1999-06-03 Multi-processor system bridge with controlled access
AT99926161T ATE216098T1 (en) 1998-06-15 1999-06-03 MULTI-PROCESSOR SYSTEM BRIDGE WITH ACCESS CONTROL
JP2000555161A JP2002518736A (en) 1998-06-15 1999-06-03 Multiprocessor system bridge with controlled access

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/097,485 US6587961B1 (en) 1998-06-15 1998-06-15 Multi-processor system bridge with controlled access
US09/097,485 1998-06-15

Publications (1)

Publication Number Publication Date
WO1999066404A1 (en) 1999-12-23

Family

ID=22263618

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/012431 WO1999066404A1 (en) 1998-06-15 1999-06-03 Multi-processor system bridge with controlled access

Country Status (6)

Country Link
US (1) US6587961B1 (en)
EP (1) EP1090350B1 (en)
JP (1) JP2002518736A (en)
AT (1) ATE216098T1 (en)
DE (1) DE69901255T2 (en)
WO (1) WO1999066404A1 (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6948092B2 (en) * 1998-12-10 2005-09-20 Hewlett-Packard Development Company, L.P. System recovery from errors for processor and associated components
US8230411B1 (en) 1999-06-10 2012-07-24 Martin Vorbach Method for interleaving a program over a plurality of cells
US7181705B2 (en) * 2000-01-18 2007-02-20 Cadence Design Systems, Inc. Hierarchical test circuit structure for chips with multiple circuit blocks
US6976217B1 (en) * 2000-10-13 2005-12-13 Palmsource, Inc. Method and apparatus for integrating phone and PDA user interface on a single processor
US6691193B1 (en) * 2000-10-18 2004-02-10 Sony Corporation Efficient bus utilization in a multiprocessor system by dynamically mapping memory addresses
US6684346B2 (en) * 2000-12-22 2004-01-27 Intel Corporation Method and apparatus for machine check abort handling in a multiprocessing system
FR2819603B1 (en) * 2001-01-16 2003-06-13 Centre Nat Rech Scient INTERRUPTION ERROR INJECTOR METHOD
US7010715B2 (en) * 2001-01-25 2006-03-07 Marconi Intellectual Property (Ringfence), Inc. Redundant control architecture for a network device
US9411532B2 (en) * 2001-09-07 2016-08-09 Pact Xpp Technologies Ag Methods and systems for transferring data between a processing device and external devices
US9552047B2 (en) 2001-03-05 2017-01-24 Pact Xpp Technologies Ag Multiprocessor having runtime adjustable clock and clock dependent power supply
US9436631B2 (en) 2001-03-05 2016-09-06 Pact Xpp Technologies Ag Chip including memory element storing higher level memory data on a page by page basis
US6950893B2 (en) * 2001-03-22 2005-09-27 I-Bus Corporation Hybrid switching architecture
US20030065861A1 (en) * 2001-09-28 2003-04-03 Clark Clyde S. Dual system masters
US6981079B2 (en) * 2002-03-21 2005-12-27 International Business Machines Corporation Critical datapath error handling in a multiprocessor architecture
US9170812B2 (en) 2002-03-21 2015-10-27 Pact Xpp Technologies Ag Data processing system having integrated pipelined array data processor
US20040059862A1 (en) * 2002-09-24 2004-03-25 I-Bus Corporation Method and apparatus for providing redundant bus control
US7103808B2 (en) * 2003-04-10 2006-09-05 International Business Machines Corporation Apparatus for reporting and isolating errors below a host bridge
US20050193246A1 (en) * 2004-02-19 2005-09-01 Marconi Communications, Inc. Method, apparatus and software for preventing switch failures in the presence of faults
US7321985B2 (en) * 2004-02-26 2008-01-22 International Business Machines Corporation Method for achieving higher availability of computer PCI adapters
US7669073B2 (en) * 2005-08-19 2010-02-23 Stratus Technologies Bermuda Ltd. Systems and methods for split mode operation of fault-tolerant computer systems
US7734843B2 (en) * 2006-05-25 2010-06-08 International Business Machines Corporation Computer-implemented method, apparatus, and computer program product for stalling DMA operations during memory migration
US7912068B2 (en) * 2007-07-20 2011-03-22 Oracle America, Inc. Low-latency scheduling in large switches
DE102008012285B3 (en) * 2008-03-03 2009-07-23 Texas Instruments Deutschland Gmbh Electronic device, has clock stage clock cyclically determining whether received data are correct, where slope of clock cycle of system clock signal is clock cyclically permitted, when received data are determined as correct
CN103186491B (en) * 2011-12-30 2017-11-07 中兴通讯股份有限公司 The implementation method and device of a kind of end-to-end hardware message transmission
CN113641622A (en) * 2020-04-27 2021-11-12 富泰华工业(深圳)有限公司 Device, method and system for accessing data bus

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0752656A2 (en) * 1992-12-17 1997-01-08 Tandem Computers Incorporated Fail-fast, fail-functional, fault-tolerant multiprocessor system
WO1997043712A2 (en) * 1996-05-16 1997-11-20 Resilience Corporation Triple modular redundant computer system

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5226152A (en) * 1990-12-07 1993-07-06 Motorola, Inc. Functional lockstep arrangement for redundant processors
GB2268817B (en) * 1992-07-17 1996-05-01 Integrated Micro Products Ltd A fault-tolerant computer system
JPH06187257A (en) * 1992-12-17 1994-07-08 Fujitsu Ltd System bus control system
US5426740A (en) * 1994-01-14 1995-06-20 Ast Research, Inc. Signaling protocol for concurrent bus access in a multiprocessor system
US5515516A (en) * 1994-03-01 1996-05-07 Intel Corporation Initialization mechanism for symmetric arbitration agents
US5530946A (en) * 1994-10-28 1996-06-25 Dell Usa, L.P. Processor failure detection and recovery circuit in a dual processor computer system and method of operation thereof
US5642506A (en) * 1994-12-14 1997-06-24 International Business Machines Corporation Method and apparatus for initializing a multiprocessor system
US5586253A (en) * 1994-12-15 1996-12-17 Stratus Computer Method and apparatus for validating I/O addresses in a fault-tolerant computer system
US6141769A (en) * 1996-05-16 2000-10-31 Resilience Corporation Triple modular redundant computer system and associated method
US5915082A (en) * 1996-06-07 1999-06-22 Lockheed Martin Corporation Error detection and fault isolation for lockstep processor systems
US6128711A (en) * 1996-11-12 2000-10-03 Compaq Computer Corporation Performance optimization and system bus duty cycle reduction by I/O bridge partial cache line writes
US5953538A (en) * 1996-11-12 1999-09-14 Digital Equipment Corporation Method and apparatus providing DMA transfers between devices coupled to different host bus bridges
SE511114C2 (en) * 1997-12-10 1999-08-09 Ericsson Telefon Ab L M Processor method, and processor adapted to operate according to the method
US6223320B1 (en) * 1998-02-10 2001-04-24 International Business Machines Corporation Efficient CRC generation utilizing parallel table lookup operations
JP3071752B2 (en) * 1998-03-24 2000-07-31 三菱電機株式会社 Bridge method, bus bridge and multiprocessor system
US6163815A (en) * 1998-05-27 2000-12-19 International Business Machines Corporation Dynamic disablement of a transaction ordering in response to an error
US6173351B1 (en) * 1998-06-15 2001-01-09 Sun Microsystems, Inc. Multi-processor system bridge
US5991900A (en) * 1998-06-15 1999-11-23 Sun Microsystems, Inc. Bus controller
US6141718A (en) * 1998-06-15 2000-10-31 Sun Microsystems, Inc. Processor bridge with dissimilar data registers which is operable to disregard data differences for dissimilar data direct memory accesses
US6167477A (en) * 1998-06-15 2000-12-26 Sun Microsystems, Inc. Computer system bridge employing a resource control mechanism with programmable registers to control resource allocation
US6138198A (en) * 1998-06-15 2000-10-24 Sun Microsystems, Inc. Processor bridge with dissimilar data registers which is operable to disregard data differences for dissimilar data write accesses
US6148348A (en) * 1998-06-15 2000-11-14 Sun Microsystems, Inc. Bridge interfacing two processing sets operating in a lockstep mode and having a posted write buffer storing write operations upon detection of a lockstep error
US6256753B1 (en) * 1998-06-30 2001-07-03 Sun Microsystems, Inc. Bus error handling in a computer system
US6189117B1 (en) * 1998-08-18 2001-02-13 International Business Machines Corporation Error handling between a processor and a system managed by the processor

Also Published As

Publication number Publication date
US6587961B1 (en) 2003-07-01
EP1090350B1 (en) 2002-04-10
JP2002518736A (en) 2002-06-25
DE69901255D1 (en) 2002-05-16
ATE216098T1 (en) 2002-04-15
DE69901255T2 (en) 2002-10-02
EP1090350A1 (en) 2001-04-11

Similar Documents

Publication Publication Date Title
US6148348A (en) Bridge interfacing two processing sets operating in a lockstep mode and having a posted write buffer storing write operations upon detection of a lockstep error
US6138198A (en) Processor bridge with dissimilar data registers which is operable to disregard data differences for dissimilar data write accesses
EP1086431B1 (en) Bus controller with cycle termination monitor
EP1086425B1 (en) Direct memory access in a bridge for a multi-processor system
EP1090350B1 (en) Multi-processor system bridge with controlled access
EP1088272B1 (en) Multi-processor system bridge
US6141718A (en) Processor bridge with dissimilar data registers which is operable to disregard data differences for dissimilar data direct memory accesses
US6167477A (en) Computer system bridge employing a resource control mechanism with programmable registers to control resource allocation
US6785763B2 (en) Efficient memory modification tracking with hierarchical dirty indicators

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
ENP Entry into the national phase

Ref country code: JP

Ref document number: 2000 555161

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1999926161

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1999926161

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 1999926161

Country of ref document: EP