US9514083B1 - Topology specific replicated bus unit addressing in a data processing system - Google Patents

Topology specific replicated bus unit addressing in a data processing system

Info

Publication number
US9514083B1
Authority
US
United States
Prior art keywords
address
bus unit
replicated
replicated bus
data processing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US14/960,507
Inventor
Richard L. Arndt
Florian Auernhammer
Hugh Shen
Derek E. Williams
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Application filed by International Business Machines Corp
Priority to US14/960,507
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION. Assignors: WILLIAMS, DEREK E.; ARNDT, RICHARD L.; AUERNHAMMER, FLORIAN; SHEN, HUGH
Priority to US15/082,751 (published as US9529760B1)
Application granted
Publication of US9514083B1

Classifications

    • G06F 13/404: Coupling between buses using bus bridges with address mapping
    • G06F 12/0815: Cache consistency protocols
    • G06F 12/0831: Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F 12/0875: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with dedicated cache, e.g. instruction or stack
    • G06F 12/0888: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, using selective caching, e.g. bypass
    • G06F 13/24: Handling requests for interconnection or transfer for access to input/output bus using interrupt
    • G06F 13/28: Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G06F 13/364: Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control using independent requests or grants, e.g. using separated request and grant lines
    • G06F 13/4282: Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • G06F 9/3004: Arrangements for executing specific machine instructions to perform operations on memory
    • G06F 9/30043: LOAD or STORE instructions; Clear instruction
    • G06F 9/3891: Concurrent instruction execution, e.g. pipeline, look ahead, using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute, organised in groups of units sharing resources, e.g. clusters
    • G06F 9/544: Buffers; Shared memory; Pipes
    • G06F 2212/1056: Simplification (indexing scheme providing a specific technical effect)
    • G06F 2212/452: Instruction code (indexing scheme, caching of specific data in cache memory)
    • G06F 2212/621: Coherency control relating to peripheral accessing, e.g. from DMA or I/O device (indexing scheme, multiprocessor cache arrangements)

Definitions

  • the cache memory hierarchy of processing cluster 104 may include a store-through level one (L1) cache (not separately shown), which may be bifurcated into separate L1 instruction and data caches, within each processor core 202, and a level two (L2) cache (not separately shown), within storage subsystem 204, utilized by processor core 202 of processing unit 200.
  • a cache hierarchy may include more than two levels of cache, which may be on-chip or off-chip, in-line, lookaside, or victim cache, and which may be fully inclusive, partially inclusive, or non-inclusive of the contents of the upper levels of cache.
  • Transactions may be initiated on local interconnect 220, local interconnect 114, and/or system interconnect 110 at a scope specified by processor cores 202. That is, a processor core 202 may specify that the scope of a transaction is: an associated processing cluster 104, which includes multiple processing units 200 (i.e., bus 220 of the associated processing cluster 104); an associated processing node 102, which includes multiple processing clusters 104, each of which includes multiple processing units 200 (i.e., bus 114 of processing node 102); or a full system scope encompassing all processing units 200 within system 100 via buses 220, 114, and 110.
  • Bus 220 of each processing cluster 104 is coupled to an associated bus 114 via a bus interface unit (BIU) 212.
  • Each processing cluster 104 further includes an instance of response logic 210, which implements a portion of a distributed coherency signaling mechanism that maintains cache coherency within data processing system 100.
  • each processing cluster 104 includes an integrated I/O (input/output) controller 230 supporting the attachment of one or more I/O devices, such as I/O device 232.
  • I/O controller 230 may issue operations on local interconnects 220 and 114 and/or system interconnect 110 in response to requests by I/O device 232.
  • communication lines 222 a-222 d couple TSRBU 216 to PUs 200 a-200 d, respectively.
  • With reference to FIG. 3, CI operation 300 includes an address field 302, an operation type field 304, and a source field 306.
  • Address field 302 provides a device address for a device to which operation 300 is directed, operation type field 304 provides an indication of whether the CI operation is a LD operation or a ST operation, and source field 306 indicates a master (e.g., a processor core 202 and software thread within that core) that is the source of the CI operation and to which CI LD data should be returned or from which CI ST data should be provided. A C sketch of this operation format appears after this list.
  • In one or more embodiments, system control software may be configured to store values into BARs 206 and 208 of storage subsystem 204 to create a mapping between processor cores 202 and TSRBUs 216 and 116.
  • With reference to FIG. 4A, an exemplary process 400 is illustrated that is executed by a storage subsystem 204 configured according to one or more embodiments of the present disclosure.
  • process 400 is initiated, at which point control transfers to block 404.
  • In block 404, storage subsystem 204 receives a CI LD or a CI ST from processor core 202.
  • In decision block 406, storage subsystem 204 determines whether an address of the CI LD or CI ST matches an address in an associated base address register (BAR) 206 or BAR 208.
  • In response to the address of the CI LD or CI ST not matching an address in BAR 206 or BAR 208, storage subsystem 204 issues the CI LD or CI ST at an appropriate scope or scopes.
  • a scope may correspond to, for example, a processing cluster level, a processing node level, or a full system level.
  • control transfers to block 410, where process 400 terminates.
  • In response to the address matching, control transfers from block 406 to block 412.
  • In block 412, storage subsystem 204 issues the CI LD or CI ST at a scope indicated by a scope value stored in scope field 207 of BAR 206 or a scope value stored in scope field 209 of BAR 208.
  • the scope of the broadcast in block 412 must be less than the full system scope. From block 412 control transfers to block 410.
  • With reference to FIG. 4B, an exemplary process 440 is illustrated that is executed by a TSRBU (i.e., TSRBU 216 and/or TSRBU 116) configured according to one embodiment of the present disclosure. While the discussion below focuses on TSRBU 216 for brevity, it should be appreciated that TSRBU 116 functions in a similar manner.
  • process 440 is initiated, at which point control transfers to block 444.
  • In block 444, TSRBU 216 receives a CI LD operation or a CI ST operation from processor core 202.
  • In decision block 446, TSRBU 216 determines whether an address of the CI LD operation or CI ST operation matches an address in an associated base address register (BAR) 218. In response to the address of the CI LD operation or CI ST operation not matching the address stored in BAR 218, control transfers from block 446 to block 448. In block 448, TSRBU 216 ignores the CI LD operation or CI ST operation with the non-matching address. From block 448 control transfers to block 450, where process 440 terminates. In response to the address of the CI LD operation or CI ST operation matching the address stored in BAR 218, control transfers from block 446 to block 452.
  • In block 452, TSRBU 216 processes the CI LD operation or CI ST operation, as TSRBU 216 has knowledge that a scope of the operation is limited, in this case, to a processing cluster level that can contain only this instance of the TSRBU. From block 452 control transfers to block 450. A C sketch of this scope-limited check appears after this list.
  • In an alternative embodiment, storage subsystem 204 does not include BARs 206 and 208 and may not utilize broadcast scopes.
  • CI LD operations and CI ST operations may be broadcast to the entire data processing system 100 or may be broadcast to a scope or scopes containing more than one TSRBU associated with the given address.
  • In this embodiment, it is assumed that each TSRBU 216 in each processing cluster 104 utilizes a same address (e.g., address ‘1000’) and that each TSRBU 116 in each processing node 102 utilizes a same address (e.g., address ‘2000’) that is different from the address utilized for TSRBUs 216. While two levels of TSRBUs are illustrated, it should be appreciated that TSRBUs may be replicated at more or less than two levels. While the discussion below focuses on TSRBU 216 for brevity, it should be appreciated that TSRBU 116 functions in a similar manner.
  • In this configuration, TSRBU 216 communicates with all PUs 200 within a single PC 104, and TSRBU 116 communicates with all PUs 200 within all PCs 104 within a processing node 102.
  • With reference to FIG. 5A, process 500 is initiated, at which point control transfers to block 504.
  • In block 504, storage subsystem 204 receives a CI LD operation or a CI ST operation from processor core 202.
  • In block 506, storage subsystem 204 broadcasts the received CI LD operation or CI ST operation to the appropriate scope or scopes as necessary to deliver the CI LD or CI ST bus operations to the appropriate TSRBU 216. From block 506 control transfers to block 508, where process 500 terminates.
  • With reference to FIG. 5B, an exemplary process 540 is illustrated that is executed by a TSRBU (i.e., TSRBU 216 and/or TSRBU 116) configured according to another embodiment of the present disclosure. While the discussion below focuses on TSRBU 216 for brevity, it should be appreciated that TSRBU 116 functions in a similar manner.
  • process 540 is initiated, at which point control transfers to block 544.
  • In block 544, TSRBU 216 receives a CI LD operation or a CI ST operation from processor core 202.
  • In decision block 546, TSRBU 216 determines whether an address of the CI LD operation or CI ST operation matches an address in an associated base address register (BAR) 218.
  • In block 548, TSRBU 216 ignores the CI LD operation or CI ST operation with the non-matching address. From block 548 control transfers to block 550, where process 540 terminates.
  • In response to the address matching, mapping logic 219 within TSRBU 216 determines, in decision block 552, whether a source value in source field 306 of the operation matches a thread assigned to TSRBU 216.
  • In response to the source value matching a thread assigned to TSRBU 216, control transfers from block 552 to block 556, where TSRBU 216 processes the CI LD operation or CI ST operation on behalf of the source thread identified by source field 306. From block 556 control transfers to block 550.
  • the present invention may be a system, a method, and/or a computer program product.
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
  • a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
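
The CI operation format of FIG. 3 and the scope-limited handling of process 440 (FIG. 4B), both walked through in the list above, can be made concrete with a short C sketch. This is a minimal illustration, not an implementation from the patent: the types, field widths, and names are assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    /* CI bus operation per FIG. 3; field widths are assumptions. */
    typedef enum { CI_LD, CI_ST } ci_op_type;

    typedef struct {
        uint64_t   addr;   /* device address (field 302) */
        ci_op_type type;   /* CI LD or CI ST (field 304) */
        uint32_t   source; /* issuing core/thread identifier (field 306) */
    } ci_op;

    /* Process-440-style snoop: the TSRBU compares the operation address
     * against its BAR (cf. BAR 218). Because the issuer restricted the
     * broadcast to a scope containing only this TSRBU instance, an
     * address match alone suffices to claim the operation. */
    static bool tsrbu_snoop_scoped(uint64_t bar218, const ci_op *op)
    {
        if (op->addr != bar218)
            return false;  /* block 448: ignore the non-matching operation */
        /* Block 452: process the CI LD or CI ST; the limited scope
         * guarantees the issuing thread is serviced by this instance. */
        return true;
    }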

Abstract

A technique for handling cache-inhibited operations in a data processing system includes receiving, at a replicated bus unit, a cache-inhibited (CI) operation. The replicated bus unit determines whether an address associated with the CI operation matches an address for the replicated bus unit and whether a source indicated by the CI operation is associated with the replicated bus unit. In response to the address associated with the CI operation matching the address for the replicated bus unit and the source indicated by the CI operation being associated with the replicated bus unit, the replicated bus unit processes the CI operation. In response to the address associated with the CI operation not matching the address for the replicated bus unit or the source indicated by the CI operation not being associated with the replicated bus unit, the replicated bus unit ignores the CI operation.

Description

BACKGROUND
The present disclosure is generally directed to addressing replicated bus units and, more particularly, to techniques for addressing topology specific replicated bus units in a data processing system. Topology specific replicated bus units (TSRBUs) are typically functional units replicated across a data processing system, possibly at varying levels of the interconnect hierarchy, with each TSRBU interacting with a specific subset of processors within the system. A commonly occurring TSRBU is an interrupt controller.
In computing, an interrupt controller is a device that is used to combine several interrupt sources onto one or more processor core lines, while allowing priority levels to be assigned to interrupt outputs. Interrupt controllers typically have a common set of registers (e.g., an interrupt request register (IRR), an in-service register (ISR), and an interrupt mask register (IMR)). The IRR specifies which interrupts are pending acknowledgement and is typically a symbolic register that cannot be directly accessed. The ISR specifies which interrupts have been acknowledged, but are still waiting for an end of interrupt (EOI) signal. The IMR specifies which interrupts are to be ignored and not acknowledged. In general, an interrupt controller may have up to two distinct interrupt requests outstanding at one time (e.g., one interrupt request waiting for acknowledgement, and one interrupt request waiting for an EOI). An interrupt controller may implement hard priorities, specific priorities, or rotating priorities, and interrupts may be edge-triggered or level-triggered.
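As one concrete illustration of the register set described above, the following C sketch models the IRR, ISR, and IMR as 32-bit masks with a fixed ("hard") priority acknowledge/EOI flow. It is a minimal sketch under assumed register widths and semantics, not a description of any particular interrupt controller.

    #include <stdint.h>

    /* Illustrative model of the common interrupt controller registers. */
    typedef struct {
        uint32_t irr; /* interrupt request register: pending, awaiting ack */
        uint32_t isr; /* in-service register: acknowledged, awaiting EOI */
        uint32_t imr; /* interrupt mask register: 1 = ignore this source */
    } intc_regs;

    /* Raise interrupt source n; masked sources are never made pending. */
    static void intc_raise(intc_regs *r, unsigned n)
    {
        uint32_t bit = 1u << n;
        if (!(r->imr & bit))
            r->irr |= bit;
    }

    /* Acknowledge the highest-priority pending interrupt (hard priority:
     * lowest source number wins). Returns -1 if nothing is pending. */
    static int intc_ack(intc_regs *r)
    {
        for (unsigned n = 0; n < 32; n++) {
            uint32_t bit = 1u << n;
            if (r->irr & bit) {
                r->irr &= ~bit; /* no longer pending acknowledgement */
                r->isr |= bit;  /* now in service, awaiting EOI */
                return (int)n;
            }
        }
        return -1;
    }

    /* End-of-interrupt for source n. */
    static void intc_eoi(intc_regs *r, unsigned n)
    {
        r->isr &= ~(1u << n);
    }

With this model, a source can have one request pending in the IRR while an earlier request from the same source sits in the ISR awaiting its EOI, matching the two-outstanding-requests behavior noted above.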
Addressing a TSRBU, e.g., an interrupt controller, has conventionally employed a unique address for each TSRBU in a data processing system. Given that each TSRBU has required software to utilize a different address to communicate with the TSRBU as the software is moved from one processor core to another processor core, user level software has conventionally been required to communicate indirectly with a TSRBU via system control software such as a hypervisor or operating system (OS). Requiring software to utilize a different address to communicate with different TSRBUs through system control software increases operating complexity.
BRIEF SUMMARY
Disclosed are a method, a data processing system, and a computer program product (e.g., in the form of a design file embodied in a computer-readable storage device) for addressing topology specific replicated bus units.
A technique for handling cache-inhibited operations in a data processing system includes receiving, at a replicated bus unit, a cache-inhibited (CI) operation. The replicated bus unit determines whether an address associated with the CI operation matches an address for the replicated bus unit and whether a source indicated by the CI operation is associated with the replicated bus unit. In response to the address associated with the CI operation matching the address for the replicated bus unit and the source indicated by the CI operation being associated with the replicated bus unit, the replicated bus unit processes the CI operation. In response to the address associated with the CI operation not matching the address for the replicated bus unit or the source indicated by the CI operation not being associated with the replicated bus unit, the replicated bus unit ignores the CI operation.
The above summary contains simplifications, generalizations and omissions of detail and is not intended as a comprehensive description of the claimed subject matter but, rather, is intended to provide a brief overview of some of the functionality associated therewith. Other systems, methods, functionality, features and advantages of the claimed subject matter will be or will become apparent to one with skill in the art upon examination of the following figures and detailed written description.
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.
BRIEF DESCRIPTION OF THE DRAWINGS
The description of the illustrative embodiments is to be read in conjunction with the accompanying drawings, wherein:
FIG. 1 is a diagram of a relevant portion of an exemplary data processing system that includes multiple processing nodes, each of which includes multiple processing clusters (PCs) configured according to one or more embodiments of the present disclosure;
FIG. 2 is a diagram of a relevant portion of one of the PCs of FIG. 1, which includes multiple processing units (PUs), configured according to an embodiment of the present disclosure;
FIG. 3 depicts an exemplary structure of a cache-inhibited (CI) operation, according to one aspect of the present disclosure;
FIG. 4A depicts a flowchart of an exemplary process implemented by a storage subsystem included in one of the PUs of FIG. 2 to issue CI operations at an indicated scope;
FIG. 4B depicts a flowchart of an exemplary process implemented by a replicated bus unit included in one of the PCs of FIG. 1 to handle a CI operation, according to one embodiment of the present disclosure;
FIG. 5A depicts a flowchart of an exemplary process implemented by a topology specific replicated bus unit (TSRBU) included in one of the PCs of FIG. 1 to handle a CI operation, according to another embodiment of the present disclosure; and
FIG. 5B depicts a flowchart of an exemplary process implemented by a TSRBU included in one of the PCs of FIG. 1 to handle a CI operation, according to yet another embodiment of the present disclosure.
DETAILED DESCRIPTION
The illustrative embodiments provide a method, a data processing system, and a computer program product (e.g., in the form of a design file embodied in a computer-readable storage device) for addressing topology specific replicated bus units (TSRBUs), e.g., interrupt controllers.
In the following detailed description of exemplary embodiments of the invention, specific exemplary embodiments in which the invention may be practiced are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, architectural, programmatic, mechanical, electrical and other changes may be made without departing from the spirit or scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims and equivalents thereof.
It is understood that the use of specific component, device and/or parameter names are for example only and not meant to imply any limitations on the invention. The invention may thus be implemented with different nomenclature/terminology utilized to describe the components/devices/parameters herein, without limitation. Each term utilized herein is to be given its broadest interpretation given the context in which that term is utilized. As may be utilized herein, the term ‘coupled’ encompasses a direct electrical connection between components or devices and an indirect electrical connection between components or devices achieved using one or more intervening components or devices. As may be used herein, the term ‘scope’ refers to a level, e.g., processing cluster level, processing node level, or data processing system level, at which an operation is issued.
As noted above, addressing a TSRBU, e.g., an interrupt controller, has conventionally employed a unique address for each instance of the TSRBU. As a software thread is moved from one physical hardware thread to another, the TSRBU associated with the software thread may change. In such a conventional system, determining the appropriate address to access a TSRBU is a significant burden. Further, user level threads typically cannot be given access to TSRBUs associated with other threads due to system integrity concerns (e.g., a given software thread could corrupt the state of another software thread’s TSRBU). For these reasons, among others, user level software has conventionally communicated indirectly with an interrupt controller via system control software. Unfortunately, communicating through system control software decreases performance and increases complexity. It would therefore be advantageous to provide a mechanism that allows user level software to directly access TSRBUs utilizing a single address that maps to an appropriate TSRBU instance for the given physical location of the hardware thread.
Software threads executing on processor cores often communicate with TSRBUs utilizing cache-inhibited load and store instructions. Cache-inhibited (CI) load and CI store instructions are load and store instructions that neither access nor populate caches, but instead directly access memory and/or devices through CI load and CI store bus operations. As used herein, ‘CI LD’ is used interchangeably to refer to a CI load instruction or bus operation and, similarly, ‘CI ST’ is used interchangeably to refer to a CI store instruction or bus operation. According to one aspect of the present disclosure, CI LDs and CI STs within a specific pre-defined address range are broadcast only within a certain portion of a data processing system. For example, a CI LD and/or CI ST may only be broadcast within a processing cluster (PC), e.g., included in a local chip possibly containing multiple processing cores.
In general, limiting the broadcast scope of CI LDs and CI STs allows a set of TSRBUs replicated in a data processing system at that scope level (one TSRBU instance per scope instance) to all be addressed using the same address. Having the functional unit initiating the CI LD or CI ST limit the scope of the CI LD or CI ST allows the addressed TSRBU to be guaranteed that the CI LD or CI ST is from a thread serviced by that specific TSRBU instance. By limiting the broadcast scope at the initiating functional unit, the CI LD and CI ST operations are visible only to the TSRBU associated with the thread issuing the CI LD or CI ST. An additional field within the CI LD or CI ST operation delivered to the TSRBU allows the TSRBU to determine which particular thread (and processor core) issued the CI LD or CI ST and process the CI LD or CI ST on behalf of that issuing thread. In this manner, user level software may utilize a fixed address to access a given group of TSRBUs and the underlying limited broadcast scope ensures that only the appropriate instance of the TSRBU observes and processes the CI LD or CI ST.
According to one aspect of the present disclosure, one or more base address registers (BARs) are added to a storage subsystem. In this case, all CI LD and CI ST operation addresses are compared to an address stored in a BAR of the storage subsystem. If a matching CI LD or CI ST address is detected, the CI LD or CI ST is restricted to only broadcast to an associated scope, e.g., processing cluster or processing node, indicated by a scope field of the appropriate BAR. In at least one embodiment, each TSRBU (e.g., at a processing cluster level and a processing node level) also includes a BAR that is loaded with a level appropriate address. In this embodiment, a TSRBU only responds to CI operations whose address matches a value stored in the TSRBU BAR.
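A minimal C sketch of this issue-side check (process 400 of FIG. 4A, walked through in the bullets above) follows. The range comparison and the BAR layout are assumptions; the disclosure states only that CI operation addresses are compared to an address stored in a BAR and that matches are broadcast at the scope in the BAR's scope field.

    #include <stdint.h>

    typedef enum { SCOPE_CLUSTER, SCOPE_NODE, SCOPE_SYSTEM } scope_t;

    /* Issue-side BAR with its scope field (cf. BAR 206/scope field 207
     * and BAR 208/scope field 209). The size field is an assumed
     * register-block extent so a whole TSRBU window matches. */
    typedef struct {
        uint64_t base;
        uint64_t size;
        scope_t  scope; /* must be less than full system scope */
    } issue_bar;

    /* On a BAR match, restrict the broadcast to the BAR's scope;
     * otherwise issue at whatever scope the operation would normally
     * use (possibly the full system scope). */
    static scope_t pick_broadcast_scope(const issue_bar *bars, int nbars,
                                        uint64_t op_addr, scope_t dflt)
    {
        for (int i = 0; i < nbars; i++) {
            if (op_addr >= bars[i].base &&
                op_addr < bars[i].base + bars[i].size)
                return bars[i].scope; /* restricted broadcast */
        }
        return dflt; /* no match: appropriate scope or scopes */
    }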
However, some data processing systems do not provide the ability to limit the broadcast scope of CI LD and CI ST operations and/or the replication of TSRBUs may not have one replicated TSRBU per broadcast scope. For these reasons, among other reasons, in another aspect of the present disclosure, an alternative means of allowing single address addressing of replicated TSRBUs in such systems is provided. In this alternate embodiment, each CI LD and CI ST operation may be broadcast in such a way that multiple TSRBUs may observe the operation either by the operation being broadcast to the entire data processing system or to some subset of the data processing system possibly containing multiple TSRBUs associated with the given address. In this case, each CI LD and CI ST operation includes a source identifier specifying the source processor and thread of the CI LD and CI ST operation. In such systems, when a TSRBU receives a CI LD or CI ST operation whose address matches the BAR in the TSRBU, the TSRBU further qualifies the access by examining the source identifier to determine if the CI LD or CI ST operation was sourced from a thread that is serviced by the given instance of the TSRBU. The set of threads that are serviced by a given TSRBU can be assigned as a fixed function of the topology of the data processing system, or may dynamically change. Each thread is assigned to one and only one TSRBU. The mapping function indicating which threads are serviced by a given instance of a TSRBU may be implemented at the TSRBU with a mapping logic function. If the mapping function indicates that a given CI LD or CI ST is from a thread not serviced by the given TSRBU, the CI LD or CI ST is ignored by the TSRBU. In this manner, TSRBUs can be addressed by user level software utilizing the same address, and only the appropriate TSRBU will process any given CI LD or CI ST operation, if necessary.
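The source-qualified variant (process 540 of FIG. 5B, walked through in the bullets above) can be sketched in C as follows. Representing the mapping logic as a 64-bit bitmap of serviced source IDs is purely an assumption standing in for ML 119/219; any mapping function that assigns each thread to exactly one TSRBU would serve.

    #include <stdbool.h>
    #include <stdint.h>

    /* TSRBU-side state: its base address register (cf. BAR 118/218)
     * plus a stand-in for the mapping logic (cf. ML 119/219) recording
     * which source threads this instance services (64 assumed). */
    typedef struct {
        uint64_t bar;            /* address this TSRBU responds to */
        uint64_t served_sources; /* bit n set => source ID n serviced here */
    } tsrbu;

    /* Require both an address match and a source match; otherwise
     * ignore the operation, since an identically addressed TSRBU
     * replica elsewhere will claim it. */
    static bool tsrbu_snoop_qualified(const tsrbu *u, uint64_t op_addr,
                                      uint32_t source_id)
    {
        if (op_addr != u->bar)
            return false; /* address not for this TSRBU */
        if (source_id > 63 || !(u->served_sources & (1ull << source_id)))
            return false; /* thread serviced by another instance */
        /* Process the CI LD or CI ST on behalf of source_id here. */
        return true;
    }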
With reference to FIG. 1, a high level block diagram of an exemplary embodiment of a cache coherent symmetric multiprocessor (SMP) data processing system 100, configured in accordance with the present disclosure, is illustrated. As shown, data processing system 100 includes multiple processing nodes 102 a, 102 b for processing data and instructions. While only two processing nodes are illustrated in FIG. 1, it should be appreciated that a data processing system configured according to the present disclosure may include more or less than two processing nodes. Processing nodes 102 a, 102 b are coupled to a system interconnect 110 for conveying address, data, and control information between processing nodes 102 a, 102 b. System interconnect 110 may be implemented, for example, as a bused interconnect, a switched interconnect, or a hybrid interconnect.
In the depicted embodiment, each processing node 102 may be realized as a multi-chip module (MCM) containing four processing clusters (PCs) 104 a-104 d, each of which may be realized as a respective integrated circuit (IC). It should be appreciated that a processing node configured according to the present disclosure may include more or less than four processing clusters. The processing clusters 104 a-104 d within each processing node 102 are coupled for communication by a local interconnect 114, which, like system interconnect 110, may be implemented with one or more buses and/or switches. As is illustrated, TSRBU 116 includes a BAR 118 that stores an address (e.g., address ‘2000’) of TSRBU 116, which is implemented at a processing node 102 level. In one embodiment of the present invention, mapping logic (ML) 119 provides a mapping function that determines which threads on which processing cores are associated with TSRBU 116. TSRBU 116 is coupled to each PU 200 (see FIG. 2) in PC 104 via communication lines 122, sixteen of which are illustrated in FIG. 1. As is shown, four communication lines 122 a are illustrated between TSRBU 116 and PC 104 a, four communication lines 122 b are illustrated between TSRBU 116 and PC 104 b, four communication lines 122 c are illustrated between TSRBU 116 and PC 104 c, and four communication lines 122 d are illustrated between TSRBU 116 and PC 104 d. It should be appreciated that each communication line 122 may include more than one wire or trace.
Local interconnect 114 is coupled for communication to system interconnect 110 via a bus interface unit 112, which provides logic for communicating signals between local interconnect 114 and system interconnect 110. Data and instructions residing in system memories 108 can generally be accessed and modified by a processor core in any processing cluster 104 in any processing node 102 of data processing system 100. In alternative embodiments, one or more system memories 108 can be coupled to local interconnect 114 and/or system interconnect 110 rather than processing clusters 104.
Those skilled in the art will appreciate that data processing system 100 can include many additional components, such as interconnect bridges, non-volatile storage, ports for connection to networks or attached devices, etc. For brevity, additional components that are not necessary for an understanding of the present disclosure are not illustrated in FIG. 1 or discussed further herein. It should also be understood, however, that the enhancements provided by the present disclosure are applicable to data processing systems of diverse architectures and are in no way limited to the generalized data processing system architecture illustrated in FIG. 1.
Referring now to FIG. 2, a more detailed block diagram of an exemplary processing cluster 104 in accordance with an embodiment of the present disclosure is illustrated. In the depicted embodiment, each processing cluster 104 includes four processing units (PUs) 200a-200d for independently processing instructions and data. Each PU 200a-200d is coupled to a communication line 122 for communicating with TSRBU 116. While four processing units are illustrated in FIG. 2, it should be appreciated that a data processing system configured according to the present disclosure may include processing clusters with more or fewer than four processing units. Each processing unit 200 includes a processor core 202 that is coupled to a storage subsystem 204. Storage subsystem 204 includes a base address register (BAR) 206 configured to store, among other information, a value that corresponds to an address of an associated TSRBU 216 at the processing cluster level, and a base address register (BAR) 208 configured to store, among other information, a value that corresponds to an address of an associated TSRBU 116 at the processing node level. In one or more embodiments, storage subsystem 204 issues a CI LD or CI ST (with a matching address) at a scope indicated by a scope value stored in scope field 207 of BAR 206 or a scope value stored in scope field 209 of BAR 208. As is illustrated, TSRBU 216 includes a BAR 218 that stores an address of TSRBU 216, among other information. In one embodiment of the present disclosure, mapping logic (ML) 219 provides a mapping function that determines which threads on which processing cores are associated with TSRBU 216.
The operation of each processor core 202 is supported by a multi-level volatile memory hierarchy having at its lowest level shared system memory 108, and at its upper levels one or more levels of cache memory (e.g., located in processor core 202 and storage subsystem 204). In the depicted embodiment, each processing cluster 104 includes an integrated memory controller (IMC) 214 that controls read and write access to system memory 108 within its processing cluster 104 in response to requests received from processor cores 202 and operations snooped on local interconnect 220. Local interconnect 220 may be implemented, for example, as a bused interconnect, a switched interconnect, or a hybrid interconnect.
The cache memory hierarchy of processing cluster 104 may include a store-through level one (L1) cache (not separately shown), which may be bifurcated into separate L1 instruction and data caches, within each processor core 202, and a level two (L2) cache (not separately shown), within storage subsystem 204, utilized by processor core 202 of processing unit 200. It should also be appreciated that a cache hierarchy may include more than two levels of cache, whether on-chip or off-chip, in-line, lookaside, or victim caches, which may be fully inclusive, partially inclusive, or non-inclusive of the contents of the upper levels of cache.
Transactions may be initiated on local interconnect 220, local interconnect 114, and/or system interconnect 110 at a scope specified by processor cores 202. That is, a processor core 202 may specify that the scope of a transaction is: an associated processing cluster 104, which includes multiple processing units 200 (i.e., bus 220 of the associated processing cluster 104); an associated processing node 102, which includes multiple processing clusters 104 that each include multiple processing units 200 (i.e., bus 114 of processing node 102); or a full system scope encompassing all processing units 200 within system 100 via buses 220, 114, and 110. Bus 220 of each processing cluster 104 is coupled to an associated bus 114 via a bus interface unit (BIU) 212.
Each processing cluster 104 further includes an instance of response logic 210, which implements a portion of a distributed coherency signaling mechanism that maintains cache coherency within data processing system 100. Finally, each processing cluster 104 includes an integrated I/O (input/output) controller 230 supporting the attachment of one or more I/O devices, such as I/O device 232. I/O controller 230 may issue operations on local interconnects 220 and 114 and/or system interconnect 110 in response to requests by I/O device 232. As noted above with respect to FIG. 1, TSRBU 116 is coupled to each PU 200 (see FIG. 2) in PC 104 via communication lines 122. Similarly, communication lines 222a-222d couple TSRBU 216 to PUs 200a-200d, respectively.
With reference to FIG. 3, an exemplary structure for a cache-inhibited (CI) bus operation 300, e.g., a cache-inhibited (CI) load (LD) or store (ST) bus operation, is illustrated according to aspects of the present disclosure. CI operation 300 includes an address field 302, an operation type field 304, and a source field 306. Address field 302 provides a device address for a device to which operation 300 is directed, operation type field 304 indicates whether the CI operation is a LD operation or a ST operation, and source field 306 identifies a master (e.g., a processor core 202 and a software thread within that core) that is the source of the CI operation and to which CI LD data should be returned or from which CI ST data should be provided. It should be appreciated that system control software may be configured to store values into BARs 206 and 208 of storage subsystem 204 to create a mapping between processor cores 202 and TSRBUs 216 and 116.
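As a non-limiting sketch, the three fields of CI operation 300 might be encoded as the following C structure; the field widths and names are assumptions, since the disclosure does not prescribe an encoding:

    #include <stdint.h>

    typedef enum { CI_LD, CI_ST } ci_op_type_t;  /* operation type field 304 */

    /* Hypothetical encoding of CI bus operation 300 of FIG. 3. */
    typedef struct {
        uint64_t     address;  /* address field 302: target device address  */
        ci_op_type_t op_type;  /* operation type field 304: load or store   */
        uint32_t     source;   /* source field 306: issuing core and thread */
    } ci_operation_t;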
With reference to FIG. 4A, an exemplary process 400 is illustrated that is executed by a storage subsystem 204 configured according to one or more embodiments of the present disclosure. In block 402, process 400 is initiated, at which point control transfers to block 404. In block 404, storage subsystem 204 receives a CI LD or a CI ST from processor core 202. Next, in decision block 406, storage subsystem 204 determines whether an address of the CI LD or CI ST matches an address in an associated base address register (BAR) 206 or BAR 208. In response to the address of the CI LD or CI ST not matching the address stored in BARs 206 and 208, control transfers from block 406 to block 408. In block 408, storage subsystem 204 issues the CI LD or CI ST at an appropriate scope or scopes. As previously discussed, a scope may correspond to, for example, a processing cluster level, a processing node level, or a full system level. From block 408, control transfers to block 410, where process 400 terminates. In response to the address of the CI LD or CI ST matching the address stored in BAR 206 or BAR 208, control transfers from block 406 to block 412. In block 412, storage subsystem 204 issues the CI LD or CI ST at a scope indicated by a scope value stored in scope field 207 of BAR 206 or a scope value stored in scope field 209 of BAR 208. For a TSRBU to be replicated more than once in system 100, the scope of the broadcast in block 412 must be less than the full system scope. From block 412, control transfers to block 410.
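A minimal master-side sketch of blocks 406, 408, and 412 follows, assuming one cluster-level BAR and one node-level BAR per storage subsystem; the scope encoding and all names are hypothetical:

    #include <stdint.h>

    typedef enum { SCOPE_CLUSTER, SCOPE_NODE, SCOPE_SYSTEM } scope_t;

    /* Hypothetical model of BAR 206/208 with scope field 207/209. */
    typedef struct {
        uint64_t address;  /* address of the associated TSRBU */
        scope_t  scope;    /* broadcast scope on a match; must be narrower
                              than SCOPE_SYSTEM for the TSRBU to be
                              replicated more than once in system 100 */
    } bar_t;

    /* Block 406: compare against both BARs; block 412: issue at the BAR's
     * restricted scope on a match; block 408: otherwise use normal scope. */
    static scope_t select_scope(const bar_t *bar206, const bar_t *bar208,
                                uint64_t op_address, scope_t normal_scope)
    {
        if (op_address == bar206->address)
            return bar206->scope;  /* e.g., SCOPE_CLUSTER for TSRBU 216 */
        if (op_address == bar208->address)
            return bar208->scope;  /* e.g., SCOPE_NODE for TSRBU 116    */
        return normal_scope;
    }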
With reference to FIG. 4B, an exemplary process 440 is illustrated that is executed by a TSRBU (i.e., TSRBU 216 and/or TSRBU 116) configured according to one or more embodiments of the present disclosure. While the discussion below focuses on TSRBU 216 for brevity, it should be appreciated that TSRBU 116 functions in a similar manner. In block 442, process 440 is initiated, at which point control transfers to block 444. In block 444, TSRBU 216 receives a CI LD operation or a CI ST operation from processor core 202. Next, in decision block 446, TSRBU 216 determines whether an address of the CI LD operation or CI ST operation matches an address in an associated base address register (BAR) 218. In response to the address of the CI LD operation or CI ST operation not matching the address stored in BAR 218, control transfers from block 446 to block 448. In block 448, TSRBU 216 ignores the CI LD operation or CI ST operation with the non-matching address. From block 448, control transfers to block 450, where process 440 terminates. In response to the address of the CI LD operation or CI ST operation matching the address stored in BAR 218, control transfers from block 446 to block 452. In block 452, TSRBU 216 processes the CI LD operation or CI ST operation, as TSRBU 216 has knowledge that the scope of the operation is limited, in this case, to a processing cluster level that can contain only this instance of the TSRBU. From block 452, control transfers to block 450.
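Because the restricted broadcast scope of block 412 guarantees that at most one TSRBU instance with a given BAR address observes the operation, the snooper-side test of FIG. 4B can be a bare address compare, as in this sketch (names are hypothetical):

    #include <stdbool.h>
    #include <stdint.h>

    /* Decision block 446: a match on BAR 218 alone identifies this TSRBU
     * instance as the target (block 452); otherwise ignore (block 448). */
    static bool tsrbu_accepts(uint64_t bar218_address, uint64_t op_address)
    {
        return op_address == bar218_address;
    }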
With reference to FIG. 5A, an exemplary process 500 is illustrated that is executed by a storage subsystem 204 configured according to one or more embodiments of the present disclosure. In this embodiment, a storage subsystem 204 does not include BARs 206 and 208 and may not utilize broadcast scopes. In such an embodiment, CI LD operations and CI ST operations may be broadcast to the entire data processing system 100 or may be broadcast to a scope or scopes containing more than one TSRBU associated with the given address. It should be appreciated that each TSRBU 216 in each processing cluster 104 utilizes a same address (e.g., address ‘1000’) and that each TSRBU 116 in each processing node 102 utilizes a same address (e.g., address ‘2000’) that is different from the address utilized for TSRBUs 216. While two levels of TSRBUs are illustrated, it should be appreciated that TSRBUs may be replicated at more or fewer than two levels. While the discussion below focuses on TSRBU 216 for brevity, it should be appreciated that TSRBU 116 functions in a similar manner.
As mentioned above, TSRBU 216 communicates with all PUs 200 within a single PC 104, while TSRBU 116 communicates with all PUs 200 within all PCs 104 within a processing node 102. In block 502, process 500 is initiated, at which point control transfers to block 504. In block 504, storage subsystem 204 receives a CI LD operation or a CI ST operation from processor core 202. Next, in block 506, storage subsystem 204 broadcasts the received CI LD operation or CI ST operation to the appropriate scope or scopes as necessary to deliver the CI LD or CI ST bus operation to the appropriate TSRBU 216. From block 506, control transfers to block 508, where process 500 terminates.
With reference to FIG. 5B, an exemplary process 540 is illustrated that is executed by a TSRBU (i.e., TSRBU 216 and/or TSRBU 116) configured according to one or more embodiments of the present disclosure. As above, the discussion below focuses on TSRBU 216 for brevity. In block 542, process 540 is initiated, at which point control transfers to block 544. In block 544, TSRBU 216 receives a CI LD operation or a CI ST operation from processor core 202. Next, in decision block 546, TSRBU 216 determines whether an address of the CI LD operation or CI ST operation matches an address in an associated base address register (BAR) 218. In response to the address of the CI LD operation or CI ST operation not matching the address stored in BAR 218, control transfers from block 546 to block 548. In block 548, TSRBU 216 ignores the CI LD operation or CI ST operation with the non-matching address. From block 548, control transfers to block 550, where process 540 terminates.
In response to the address of the CI LD operation or CI ST operation matching the address stored in BAR 218 in block 546, control transfers to decision block 552. In block 552, mapping logic 219 within TSRBU 216 determines whether a source value in source field 306 of the operation matches a thread assigned to TSRBU 216. In response to the source value in source field 306 not matching a thread assigned to TSRBU 216, control transfers from block 552 to block 548. In response to the source value in source field 306 matching a thread assigned to TSRBU 216, control transfers from block 552 to block 556, where TSRBU 216 processes the CI LD operation or CI ST operation on behalf of the source thread identified by source field 306. From block 556, control transfers to block 550.
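Combining the address qualification of block 546 with the source qualification of block 552 yields a snoop filter along the lines of the following sketch, which reuses the hypothetical topology-based mapping from the earlier examples; all types and names are assumptions:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct { uint8_t node, cluster, core, thread; } source_id_t;

    /* Hypothetical state of a cluster-level TSRBU instance. */
    typedef struct {
        uint64_t bar_address;  /* contents of BAR 218                 */
        uint8_t  my_node;      /* topology position of this instance  */
        uint8_t  my_cluster;
    } tsrbu_t;

    /* Blocks 546 and 552: process the operation only if the address
     * matches BAR 218 AND the source thread is serviced by this
     * instance; any other combination is ignored (block 548). */
    static bool tsrbu_processes(const tsrbu_t *t, uint64_t op_address,
                                source_id_t src)
    {
        if (op_address != t->bar_address)
            return false;                      /* address mismatch */
        return src.node == t->my_node &&
               src.cluster == t->my_cluster;   /* mapping logic 219 */
    }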
Accordingly, techniques have been disclosed herein that advantageously enable topology specific replicated bus units at a given level of a data processing system to be addressed using a same address.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular system, device or component thereof to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. does not denote any order or importance; rather, the terms first, second, etc. are used to distinguish one element from another.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (12)

What is claimed is:
1. A data processing system, comprising:
a storage subsystem; and
a replicated bus unit coupled to the storage subsystem, wherein the replicated bus unit is configured to:
receive a cache-inhibited (CI) operation;
determine whether an address of the CI operation matches an address for the replicated bus unit and whether a source indicated by the CI operation is a source that is serviced by the replicated bus unit;
in response to the address of the CI operation matching the address for the replicated bus unit and the source indicated by the CI operation being a source that is serviced by the replicated bus unit, process the CI operation; and
in response to the address of the CI operation not matching the address for the replicated bus unit or the source indicated by the CI operation not being a source that is serviced by the replicated bus unit, ignore the CI operation.
2. The data processing system of claim 1, wherein the replicated bus unit is included in a processing cluster.
3. The data processing system of claim 1, wherein the replicated bus unit is included in a processing node.
4. The data processing system of claim 1, wherein the replicated bus unit includes a base address register (BAR) that stores the address of the replicated bus unit.
5. The data processing system of claim 1, wherein the CI operation is a CI load operation or a CI store operation.
6. The data processing system of claim 1, wherein the replicated bus unit is an interrupt controller.
7. The data processing system of claim 1, wherein the replicated bus unit is further configured to:
return data to the source indicated by the CI operation.
8. The data processing system of claim 7, wherein the source corresponds to a master located within the storage subsystem.
9. A data processing system, comprising:
a storage subsystem; and
a first replicated bus unit coupled to the storage subsystem by a first bus, wherein the first replicated bus unit is located at a processing cluster level and is configured to:
receive a cache-inhibited (CI) operation;
determine whether an address of the CI operation matches an address for the first replicated bus unit and whether a source indicated by the CI operation is a source that is serviced by the first replicated bus unit;
in response to the address of the CI operation matching the address for the first replicated bus unit and the source indicated by the CI operation being a source that is serviced by the first replicated bus unit, process the CI operation; and
in response to the address of the CI operation not matching the address for the first replicated bus unit or the source indicated by the CI operation not being a source that is serviced by the first replicated bus unit, ignore the CI operation.
10. The data processing system of claim 9, further comprising:
a second replicated bus unit coupled to the storage subsystem by a second bus, wherein the second bus is coupled to the first bus, and wherein the second replicated bus unit is located at a processing node level and is configured to:
receive the CI operation;
determine whether the address of the CI operation matches an address for the second replicated bus unit and whether a source indicated by the CI operation is a source that is serviced by the second replicated bus unit;
in response to the address of the CI operation matching the address for the second replicated bus unit and the source indicated by the CI operation being a source that is serviced by the second replicated bus unit, process the CI operation; and
in response to the address of the CI operation not matching the address for the second replicated bus unit or the source indicated by the CI operation not being a source that is serviced by the second replicated bus unit, ignore the CI operation.
11. The data processing system of claim 10, wherein the CI operation is a CI load operation or a CI store operation.
12. The data processing system of claim 11, wherein the first and second replicated bus units are interrupt controllers.
US14/960,507 2015-12-07 2015-12-07 Topology specific replicated bus unit addressing in a data processing system Active US9514083B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/960,507 US9514083B1 (en) 2015-12-07 2015-12-07 Topology specific replicated bus unit addressing in a data processing system
US15/082,751 US9529760B1 (en) 2015-12-07 2016-03-28 Topology specific replicated bus unit addressing in a data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/960,507 US9514083B1 (en) 2015-12-07 2015-12-07 Topology specific replicated bus unit addressing in a data processing system

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/082,751 Continuation US9529760B1 (en) 2015-12-07 2016-03-28 Topology specific replicated bus unit addressing in a data processing system

Publications (1)

Publication Number Publication Date
US9514083B1 true US9514083B1 (en) 2016-12-06

Family

ID=57400030

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/960,507 Active US9514083B1 (en) 2015-12-07 2015-12-07 Topology specific replicated bus unit addressing in a data processing system
US15/082,751 Active US9529760B1 (en) 2015-12-07 2016-03-28 Topology specific replicated bus unit addressing in a data processing system

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/082,751 Active US9529760B1 (en) 2015-12-07 2016-03-28 Topology specific replicated bus unit addressing in a data processing system

Country Status (1)

Country Link
US (2) US9514083B1 (en)

Patent Citations (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4564900A (en) * 1981-09-18 1986-01-14 Christian Rovsing A/S Multiprocessor computer system
US5809533A (en) * 1993-02-18 1998-09-15 Unisys Corporation Dual bus system with multiple processors having data coherency maintenance
US5638516A (en) * 1994-08-01 1997-06-10 Ncube Corporation Parallel processor that routes messages around blocked or faulty nodes by selecting an output port to a subsequent node from a port vector and transmitting a route ready signal back to a previous node
US6205508B1 (en) * 1999-02-16 2001-03-20 Advanced Micro Devices, Inc. Method for distributing interrupts in a multi-processor system
US6295573B1 (en) * 1999-02-16 2001-09-25 Advanced Micro Devices, Inc. Point-to-point interrupt messaging within a multiprocessing computer system
US6266744B1 (en) * 1999-05-18 2001-07-24 Advanced Micro Devices, Inc. Store to load forwarding using a dependency link file
US6516368B1 (en) * 1999-11-09 2003-02-04 International Business Machines Corporation Bus master and bus snooper for execution of global operations utilizing a single token for multiple operations with explicit release
US6591321B1 (en) * 1999-11-09 2003-07-08 International Business Machines Corporation Multiprocessor system bus protocol with group addresses, responses, and priorities
US20060150007A1 (en) * 2002-08-14 2006-07-06 Victor Gostynski Parallel processing platform with synchronous system halt/resume
US20040230751A1 (en) * 2003-05-12 2004-11-18 International Business Machines Corporation Coherency management for a "switchless'' distributed shared memory computer system
US7085898B2 (en) * 2003-05-12 2006-08-01 International Business Machines Corporation Coherency management for a “switchless” distributed shared memory computer system
US20050102559A1 (en) * 2003-11-10 2005-05-12 Nokia Corporation Computer cluster, computer unit and method to control storage access between computer units
US20080209134A1 (en) * 2005-02-10 2008-08-28 International Business Machines Corporation Apparatus for Operating Cache-Inhibited Memory Mapped Commands to Access Registers
US7895407B2 (en) * 2006-11-22 2011-02-22 International Business Machines Corporation Memory consistency protection in a multiprocessor computing system
US7657683B2 (en) * 2008-02-01 2010-02-02 Redpine Signals, Inc. Cross-thread interrupt controller for a multi-thread processor
US8484307B2 (en) * 2008-02-01 2013-07-09 International Business Machines Corporation Host fabric interface (HFI) to perform global shared memory (GSM) operations
US20110219208A1 (en) * 2010-01-08 2011-09-08 International Business Machines Corporation Multi-petascale highly efficient parallel supercomputer
US20160011996A1 (en) * 2010-01-08 2016-01-14 International Business Machines Corporation Multi-petascale highly efficient parallel supercomputer
US9081501B2 (en) * 2010-01-08 2015-07-14 International Business Machines Corporation Multi-petascale highly efficient parallel supercomputer
US20130019085A1 (en) * 2011-07-12 2013-01-17 International Business Machines Corporation Efficient Recombining for Dual Path Execution
US8867304B2 (en) * 2011-12-08 2014-10-21 International Business Machines Corporation Command throttling for multi-channel duty-cycle based memory power management
US9092272B2 (en) * 2011-12-08 2015-07-28 International Business Machines Corporation Preparing parallel tasks to use a synchronization register
US20130152103A1 (en) * 2011-12-08 2013-06-13 International Business Machines Corporation Preparing parallel tasks to use a synchronization register
US20130304997A1 (en) * 2011-12-08 2013-11-14 International Business Machines Corporation Command Throttling for Multi-Channel Duty-Cycle Based Memory Power Management
US8675444B2 (en) * 2011-12-08 2014-03-18 International Business Machines Corporation Synchronized command throttling for multi-channel duty-cycle based memory power management
US20130151867A1 (en) * 2011-12-08 2013-06-13 International Business Machines Corporation Synchronized command throttling for multi-channel duty-cycle based memory power management
US20130152101A1 (en) * 2011-12-08 2013-06-13 International Business Machines Corporation Preparing parallel tasks to use a synchronization register
US9104501B2 (en) * 2011-12-08 2015-08-11 International Business Machines Corporation Preparing parallel tasks to use a synchronization register
US20130173867A1 (en) * 2011-12-28 2013-07-04 Fujitsu Limited Information processing apparatus and unauthorized access prevention method
US8521977B2 (en) * 2011-12-28 2013-08-27 Fujitsu Limited Information processing apparatus and access control method
US9110718B2 (en) * 2012-09-24 2015-08-18 Oracle International Corporation Supporting targeted stores in a shared-memory multiprocessor system
US20140089591A1 (en) * 2012-09-24 2014-03-27 Oracle International Corporation Supporting targeted stores in a shared-memory multiprocessor system
US20150293863A1 (en) * 2014-04-09 2015-10-15 International Business Machines Corporation Broadcast and unicast communication between non-coherent processors using coherent address operations
US20150293844A1 (en) * 2014-04-09 2015-10-15 International Business Machines Corporation Broadcast and unicast communication between non-coherent processors using coherent address operations

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
'A New Solution to Coherence Problems in Multicache Systems' by Censier and Feautrier, IEEE Transactions on Computers, vol. c-27, No. 12, Dec. 1978. *
'Coherence Controller Architectures for SMP-Based CC-NUMA Multiprocessors' by Michael et al., copyright 1997 by ACM. *
'e500mc Core Reference Manual' by Freescale Semiconductor, Rev. 3, Mar. 2013. *
'PowerQUICC Data Cache Coherency' by Freescale Semiconductor, Application Note, AN3544, Rev. 0, Dec. 2007. *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10083142B2 (en) * 2015-12-07 2018-09-25 International Business Machines Corporation Addressing topology specific replicated bus units
US20220283814A1 (en) * 2019-08-06 2022-09-08 Ictk Holdings Co., Ltd. Processor, processor operation method and electronic device comprising same
US11886879B2 (en) * 2019-08-06 2024-01-30 Ictk Holdings Co., Ltd. Processor, processor operation method and electronic device comprising same for selective instruction execution based on operand address

Also Published As

Publication number Publication date
US9529760B1 (en) 2016-12-27

Similar Documents

Publication Publication Date Title
US11892949B2 (en) Reducing cache transfer overhead in a system
US9886397B2 (en) Load and store ordering for a strongly ordered simultaneous multithreading core
US9842081B2 (en) Implementing modal selection of bimodal coherent accelerator
US9747225B2 (en) Interrupt controller
US11138101B2 (en) Non-uniform memory access latency adaptations to achieve bandwidth quality of service
US9367478B2 (en) Controlling direct memory access page mappings
US10628314B2 (en) Dual clusters of fully connected integrated circuit multiprocessors with shared high-level cache
US20160179720A1 (en) Device table in system memory
US20180293172A1 (en) Hot cache line fairness arbitration in distributed modular smp system
US20150261679A1 (en) Host bridge with cache hints
US9529760B1 (en) Topology specific replicated bus unit addressing in a data processing system
US10310759B2 (en) Use efficiency of platform memory resources through firmware managed I/O translation table paging
US9330024B1 (en) Processing device and method thereof
US10083142B2 (en) Addressing topology specific replicated bus units
US8938588B2 (en) Ensuring forward progress of token-required cache operations in a shared cache
WO2017011021A1 (en) Systems and methods facilitating reduced latency via stashing in systems on chips
US9529717B2 (en) Preserving an invalid global domain indication when installing a shared cache line in a cache
US10802966B2 (en) Simultaneous, non-atomic request processing within an SMP environment broadcast scope for multiply-requested data elements using real-time parallelization
US11099989B2 (en) Coherency maintenance via physical cache coordinate comparison
US20200301832A1 (en) Page-based memory operation with hardware initiated secure storage key update
US10380020B2 (en) Achieving high bandwidth on ordered direct memory access write stream into a processor cache
US20170371710A1 (en) Detecting and enforcing control-loss restrictions within an application programming interface

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ARNDT, RICHARD L.;AUERNHAMMER, FLORIAN;SHEN, HUGH;AND OTHERS;SIGNING DATES FROM 20151123 TO 20151130;REEL/FRAME:037221/0182

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FEPP Fee payment procedure

Free format text: SURCHARGE FOR LATE PAYMENT, LARGE ENTITY (ORIGINAL EVENT CODE: M1554); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4