US6105113A - System and method for maintaining translation look-aside buffer (TLB) consistency - Google Patents


Info

Publication number
US6105113A
Authority
US
United States
Prior art keywords
page table
table entry
tlb
page
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US08/915,912
Inventor
Curt F. Schimmel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
RPX Corp
Morgan Stanley and Co LLC
Original Assignee
Silicon Graphics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Silicon Graphics Inc filed Critical Silicon Graphics Inc
Priority to US08/915,912
Assigned to SILICON GRAPHICS, INC. reassignment SILICON GRAPHICS, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SCHIMMEL, CURT F.
Application granted
Publication of US6105113A
Assigned to FOOTHILL CAPITAL CORPORATION reassignment FOOTHILL CAPITAL CORPORATION SECURITY AGREEMENT Assignors: SILICON GRAPHICS, INC.
Assigned to U.S. BANK NATIONAL ASSOCIATION, AS TRUSTEE reassignment U.S. BANK NATIONAL ASSOCIATION, AS TRUSTEE SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SILICON GRAPHICS, INC.
Assigned to GENERAL ELECTRIC CAPITAL CORPORATION reassignment GENERAL ELECTRIC CAPITAL CORPORATION SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SILICON GRAPHICS, INC.
Assigned to MORGAN STANLEY & CO., INCORPORATED reassignment MORGAN STANLEY & CO., INCORPORATED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GENERAL ELECTRIC CAPITAL CORPORATION
Assigned to GRAPHICS PROPERTIES HOLDINGS, INC. reassignment GRAPHICS PROPERTIES HOLDINGS, INC. CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: SILICON GRAPHICS, INC.
Assigned to RPX CORPORATION reassignment RPX CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GRAPHICS PROPERTIES HOLDINGS, INC.

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1027Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1072Decentralised address translation, e.g. in distributed shared memory systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/68Details of translation look-aside buffer [TLB]
    • G06F2212/682Multiprocessor TLB consistency

Definitions

  • the present invention relates to translational look-aside buffers that are used for storing virtual memory address-to-physical memory address translations for a processor.
  • Computer systems including uni-processor computer systems and multiprocessor computer systems, typically run multiple processes or threads at a time. Each process requires some amount of physical memory. Often, physical memory is limited and must be allocated among the different processes.
  • Virtual memory schemes divide physical memory into pages and allocate the pages to the different processes. Physical memory that is so allocated is referred to as mapped memory.
  • each process that is allocated a block of physical memory is also provided with a set of translations for translating virtual addresses to assigned physical addresses of the allocated block.
  • Each set of translations can be stored in, for example, a page table.
  • a page table can be associated with a specific user or shared by multiple users. Alternatively, reverse page table techniques can be employed.
  • Page tables are commonly indexed by virtual page numbers and include a page table entry (PTE) for each virtual page address. If a virtual page is stored in memory, a corresponding PTE includes a physical address of the page and control information such as a valid bit, permission bits, etc. The PTE for a page can be found by looking at an index that corresponds to the virtual address. Page tables can be implemented as sparse arrays and are typically stored in main memory.
  • When a process requests a virtual address, a page table that is associated with the process is searched for the requested virtual address.
  • the process can access the desired page using the physical address in the PTE that is associated with the virtual address.
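The page-table lookup described in the bullets above can be sketched as follows. This is a minimal illustration; the class and function names, the 4K page size, and the dictionary used to model a sparse page-table array are assumptions for the example, not details taken from the patent.

```python
# Illustrative sketch of a forward page-table lookup. A PTE holds a
# physical page frame number plus control bits; the table is indexed
# by virtual page number and modeled here as a sparse dict.

PAGE_SIZE = 4096  # 4K pages, as in the examples later in the text

class PTE:
    def __init__(self, frame, valid=True, writable=True):
        self.frame = frame        # physical page frame number
        self.valid = valid        # valid bit
        self.writable = writable  # permission bit

def translate(page_table, vaddr):
    """Translate a virtual address via the page table, or raise a page fault."""
    vpn = vaddr // PAGE_SIZE             # virtual page number = table index
    offset = vaddr % PAGE_SIZE
    pte = page_table.get(vpn)            # sparse array modeled as a dict
    if pte is None or not pte.valid:
        raise LookupError("page fault")  # page not resident in memory
    return pte.frame * PAGE_SIZE + offset

page_table = {0x10: PTE(frame=0x4)}
assert translate(page_table, 0x10008) == 0x4008  # frame 0x4, offset 8
```

Once the translation succeeds, the process can access the page through the returned physical address, as the text describes.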
  • Computer systems typically employ one or more levels of cache memory between main memory and each processor in order to reduce memory access time.
  • Cache memories store data that is retrieved from main memory. Data that is retrieved by a processor must pass through the one or more levels of cache in order to get to the processor. Because caches tend to be small and physically close to the processor, sometimes located on-chip with the processor, cached data can generally be accessed much faster than data that is stored in main memory. Thus, caches are typically used to store data that needs to be repeatedly accessed by a processor, such as PTE translations.
  • Recently used translations can be stored in a translational look-aside buffer (TLB); processors can overwrite least recently used translations in a TLB with more recently used translations.
  • When the processor needs a translation, it first looks to the TLB. If a translation exists in the TLB, the processor retrieves the physical address from the TLB and accesses the data using the physical address. If the translation does not exist in the TLB (i.e., a TLB "miss"), the processor looks to the cache or main memory. These operations can be performed with hardware, software, firmware or any combination thereof.
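The TLB-first lookup order just described can be sketched as below. The capacity, eviction choice, and names are illustrative assumptions; the fallback dictionary stands in for the cache or main memory path mentioned in the text.

```python
# Sketch of the lookup order: consult the TLB first, and fall back to
# the page table (cache/main memory in the patent) on a TLB miss.

class TLB:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}  # vpn -> physical frame

    def lookup(self, vpn):
        return self.entries.get(vpn)  # None signals a TLB miss

    def fill(self, vpn, frame):
        if len(self.entries) >= self.capacity:
            # evict an arbitrary entry; real TLBs typically evict the
            # least recently used translation, as the text notes
            self.entries.pop(next(iter(self.entries)))
        self.entries[vpn] = frame

def lookup_frame(tlb, page_table, vpn):
    frame = tlb.lookup(vpn)
    if frame is None:            # TLB miss
        frame = page_table[vpn]  # go to cache or main memory
        tlb.fill(vpn, frame)     # cache the translation for next time
    return frame

tlb = TLB()
page_table = {0x10: 0x4}
assert lookup_frame(tlb, page_table, 0x10) == 0x4  # miss, then fill
assert tlb.lookup(0x10) == 0x4                     # now a TLB hit
```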
  • a PTE can be retrieved from main memory and stored in both a processor cache and a TLB. Later, an operating system can change or invalidate the PTE. For example, in a distributed shared memory (DSM) system, data that is stored in the mapped physical memory location can be migrated or moved to another physical location. As a result of the migration, the PTE is no longer valid because it stores the physical address of the data prior to the migration. Thus, the operating system updates the PTE to reflect the new physical location of the data. The copy of the PTE that is stored in the processor cache and the TLB, however, is no longer valid.
  • In order to prevent processors from retrieving invalid translations from caches and TLBs, invalid translations must be flagged with an invalid bit or have a valid bit turned off. Alternatively, invalid translations can be updated with current, valid translations.
  • Cache consistency hardware schemes are divided into two main types, directory protocols and snooping protocols.
  • In directory protocols, the sharing status of memory is maintained in a centralized directory.
  • a centralized controller is responsible for maintaining consistency of shared data. Any time that data stored in a memory location is changed, a check is performed in the centralized directory to determine whether a copy of the data is stored in a cache. If so, each copy is either updated or invalidated. For example, copies of a PTE stored in one or more caches could be invalidated by sending an invalidation signal and a page table entry address directly to each cache that stores a copy of the PTE.
  • Directory protocols are highly scalable and are preferred in large multi-processing systems.
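The directory-based invalidation described above can be sketched as follows: a central directory records which caches hold a copy of each address, so an invalidation touches exactly those caches rather than being broadcast. All class and method names here are illustrative assumptions.

```python
# Sketch of a directory protocol: the sharing status of memory is kept
# in a centralized directory, and invalidations are sent only to the
# caches listed as holding a copy.

class Directory:
    def __init__(self):
        self.sharers = {}  # address -> set of cache ids holding a copy

    def record_fill(self, addr, cache_id):
        # note that this cache now holds a copy of the line
        self.sharers.setdefault(addr, set()).add(cache_id)

    def invalidate(self, addr, caches):
        # send an invalidation only to the caches recorded as sharers
        for cache_id in self.sharers.pop(addr, set()):
            caches[cache_id].discard(addr)

caches = {0: {0x1000}, 1: {0x1000}, 2: set()}
directory = Directory()
directory.record_fill(0x1000, 0)
directory.record_fill(0x1000, 1)
directory.invalidate(0x1000, caches)  # cache 2 is never contacted
assert caches[0] == set() and caches[1] == set()
```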
  • In snooping protocols, no central directory is maintained. Instead, each cache is responsible for the consistency of its data. Actions involving shared memory are broadcast to all caches. Each cache includes snooping hardware for snooping the broadcasts and for determining whether a broadcast requires that some action be taken on the cache. For example, a PTE stored in a cache could be invalidated by broadcasting an invalidation signal and a PTE address. The address snooping hardware for each cache receives the broadcast and searches the associated cache for the PTE address. If the PTE address is found, the associated translation is invalidated or updated. Because of the broadcast requirement, snooping protocols are generally implemented in shared bus architectures. Since the number of broadcasts generally increases with the number of caches, snooping protocols are not very scalable.
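By contrast, the snooping protocol can be sketched as a shared bus that delivers every message to every cache, each of which checks its own contents. Again, the names are illustrative assumptions, not details from the patent.

```python
# Sketch of a snooping protocol: invalidations are broadcast on a
# shared bus, and every cache's snooping hardware checks whether the
# broadcast address is present locally.

class SnoopingCache:
    def __init__(self):
        self.lines = set()  # cached addresses

    def snoop(self, message):
        kind, addr = message
        if kind == "invalidate" and addr in self.lines:
            self.lines.discard(addr)  # act only if we hold a copy

class Bus:
    def __init__(self, caches):
        self.caches = caches

    def broadcast(self, message):
        for cache in self.caches:  # every cache sees every broadcast
            cache.snoop(message)

caches = [SnoopingCache(), SnoopingCache()]
caches[0].lines.add(0x2000)
Bus(caches).broadcast(("invalidate", 0x2000))
assert 0x2000 not in caches[0].lines
```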
  • In TLB consistency schemes, stale TLB entries are typically removed by broadcasting invalidation requests to all central processing units (CPUs) in a system via inter-CPU interrupts and then waiting for acknowledgments to come back from each CPU. This is almost universally performed with software; few, if any, systems use hardware. Interrupting the operating system or the CPU each time that a TLB entry needs to be invalidated, however, reduces the time that can be spent performing other tasks.
  • Alternatively, invalidation requests can be sent only to CPUs where the process has executed. In either situation, substantial communication and synchronization overhead is incurred. Moreover, operating systems can invalidate every entry in a TLB rather than just the stale entries. This results in additional TLB miss overhead for the entries that were valid.
  • What is needed is a system and method for updating translational look-aside buffers (TLBs) when a PTE in a cache is updated that does not interrupt CPU processing or the operating system and does not invalidate valid TLB entries.
  • a system and method for updating TLBs is needed which reduces communication and synchronization overhead.
  • the present invention is a system and method for maintaining consistency between translational look-aside buffers (TLBs) and page tables by combining cache consistency techniques with TLBs.
  • the system includes a TLB having a TLB table for storing a list of virtual memory address-to-physical memory address translations or page table entries (PTEs).
  • the TLB also includes a hardware-based controller for identifying and updating, or invalidating, PTEs that are stored in the TLB table when the page table entry is changed by an operating system.
  • the TLB table includes a virtual memory (VM) address tag and a PTE address tag for indexing the list of translations, or PTEs.
  • The VM address tag can be searched for a virtual memory address by the CPU. The CPU can retrieve translations, or PTEs, that are associated with the virtual memory address.
  • the PTE address tag can be searched by the TLB controller for PTE addresses that are changed by the operating system and updated in a cache. Updating can include replacing an invalid translation with a valid translation or invalidating a PTE.
  • the TLB controller includes a snooping controller, a search engine and an updating module.
  • the snooping controller snoops a cache-memory interconnect for cache consistency data.
  • Cache consistency data can be sent from a directory protocol cache consistency scheme or from a snooping protocol cache consistency scheme.
  • the search engine searches the PTE address tag of the TLB table for PTE addresses that are snooped by the snooping controller.
  • the updating module updates or invalidates PTEs, or translations, in the TLB table when the PTE is changed by the operating system.
  • When a CPU requires a physical memory address that is associated with a virtual memory address, the CPU first searches the virtual address tag of the TLB table. If a valid translation is not found in the TLB table, the translation is retrieved from a cache or from main memory and a copy of the translation is placed in the TLB table.
  • an update or invalidate signal is sent over the cache-memory interconnect to one or more caches, along with the PTE address.
  • the update or invalidate signal and PTE address are sent only to caches that have copies of the PTE.
  • the update or invalidate signal and PTE address are broadcast to all caches.
  • the TLB snooping controller snoops the cache-memory interconnect and detects the update or invalidate signal and the PTE address.
  • the TLB search engine searches the PTE address tag of the TLB table for the PTE address. If the PTE address is found in the TLB table, the associated translation is updated or invalidated by the TLB updating module. Thus, when the operating system changes a page table entry, translations in TLBs are automatically kept consistent.
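The dual-tag arrangement described above can be sketched as follows: each TLB entry carries both a virtual-address tag, searched by the CPU, and a PTE-address tag, searched by the snooping controller, so a snooped PTE update can invalidate the matching translation without interrupting the CPU. The class, field names, and addresses are illustrative assumptions.

```python
# Sketch of a TLB whose entries are tagged by both the virtual address
# (CPU side) and the address of the page table entry the translation
# came from (snooping side).

class SnoopingTLB:
    def __init__(self):
        self.entries = []  # each entry: [vm_tag, pte_addr_tag, frame, valid]

    def fill(self, vm_tag, pte_addr, frame):
        self.entries.append([vm_tag, pte_addr, frame, True])

    def cpu_lookup(self, vm_tag):
        # CPU side: search the virtual-address tags for a valid entry
        for entry in self.entries:
            if entry[0] == vm_tag and entry[3]:
                return entry[2]
        return None  # TLB miss

    def snoop(self, pte_addr):
        # snooping-controller side: search the PTE-address tags and
        # invalidate any translation cached from that page table entry
        for entry in self.entries:
            if entry[1] == pte_addr:
                entry[3] = False

tlb = SnoopingTLB()
tlb.fill(vm_tag=0x10, pte_addr=0x8040, frame=0x4)
assert tlb.cpu_lookup(0x10) == 0x4
tlb.snoop(0x8040)                    # operating system changed the PTE
assert tlb.cpu_lookup(0x10) is None  # the stale translation is gone
```

The key point the sketch illustrates is that the invalidation is driven by the snooped PTE address alone; the CPU is never interrupted.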
  • the present invention can be implemented on any computer system that employs virtual memory.
  • the present invention can be implemented in both uni-processor environments and multiple processor environments.
  • the present invention is especially useful in shared memory, multi-processor systems where page migration occurs.
  • Shared memory systems that benefit from the present invention include centralized shared memory systems, such as, symmetric multiple processor (SMP) systems and distributed shared memory (DSM) systems.
  • the present invention can be employed to maintain consistency for any number of TLBs in a system.
  • By updating TLBs via hardware, the communication overhead and costly operating system interrupts that would otherwise occur with a software-based TLB update technique are avoided.
  • FIG. 1 is a block diagram of a uni-processor system that can employ the present invention
  • FIG. 2 is a block diagram of a symmetric, shared memory, multiprocessor (SMP) system that can employ the present invention
  • FIG. 3 is a block diagram of a distributed, shared memory (DSM) system that can employ the present invention
  • FIG. 4 is a block diagram of a processor and cache node that can be employed by any of the processor systems of FIGS. 1, 2 and 3;
  • FIG. 5 is a block diagram of a virtual memory mapping scheme
  • FIG. 6 is an illustration of a page table that can be employed by a virtual memory mapping scheme for storing virtual memory address-to-physical memory address translations
  • FIG. 7 is a block diagram of the processor and cache node illustrated in FIG. 4, including a CPU and a translational look-aside buffer (TLB) for storing virtual memory address-to-physical memory address translations;
  • FIG. 8 is a detailed block diagram of the processor and cache node illustrated in FIG. 7, including details of a TLB in accordance with the present invention
  • FIG. 9 is a process flowchart illustrating a method for placing page table entries, virtual memory page tags and page table entry address tags in a TLB.
  • FIG. 10 is a process flowchart illustrating a method for updating page table entries in TLBs when an operating system changes a page table entry.
  • virtual memory address-to-physical address translations are typically stored in a page table entry of a page table.
  • Each page table entry has a page table entry (PTE) address.
  • Copies of page table entries, or translations stored therein, can be cached in a processor cache and can be placed in translational look-aside buffers (TLBs) that are coupled to CPUs.
  • the present invention is a hardware-based system and method for maintaining consistency between virtual memory address translations that are stored in a TLB with virtual memory address translations that are stored in a memory. More specifically, the present invention updates, or invalidates, translations in TLBs when an operating system changes a translation in a PTE.
  • the present invention can be implemented in any computer system that employs a virtual memory scheme.
  • the present invention can be implemented in a variety of computer systems and environments, including, but not limited to, uni-processor computer systems, shared memory, symmetric multi-processing (SMP) systems and distributed shared memory (DSM) multi-processor systems.
  • the present invention can be implemented in an Origin™ scalable, distributed shared-memory multi-processor platform, manufactured by Silicon Graphics, Inc., Mountain View, Calif.
  • a uni-processor system 110 includes a single processor and cache node 114 coupled to a main memory 112.
  • Main memory 112 stores data for use by processor and cache node 114.
  • An input/output (I/O) system 116 provides interfacing to peripheral devices, such as, for example, user interfaces and memory devices, which can include computer terminals and memory disks.
  • processor and cache node 114 can be implemented as processor and cache node 410.
  • Processor and cache node 410 includes a processor 412 coupled to a main memory, which can be main memory 112, via a cache 416.
  • For clarity, only a single processor 412 and cache memory 416 are shown; multiple processors and multiple levels of cache can be employed.
  • Cache 416 is provided for caching data retrieved from a main memory such as main memory 112. Once data is cached in cache 416, processor 412 can retrieve data from cache 416. Processor 412 can generally retrieve data from cache 416 faster than it can access data in main memory 112 because of the proximity of cache 416 to processor 412 and because the memory parts that are used to make cache 416 are faster than the memory parts that are used to make main memory. Cache 416 can include one or more levels of cache, as dictated by needs of users.
  • processor 412 processes threads for one or more processes.
  • When processor 412 needs to access data stored in main memory 112, an access request is sent. If processor 412 is permitted access to the requested data, main memory 112 returns the requested data to cache 416. Once the requested data is stored in cache 416, processor 412 can access the data as necessary. In the future, processor 412 can access data in cache 416 without accessing main memory 112.
  • a centralized, shared memory, symmetric multi-processing (SMP) system 210 includes a plurality of processor and cache nodes 212-218.
  • SMP 210 can include any number of nodes 212-218.
  • Processor and cache nodes 212-218 are coupled to a centralized, shared, main memory 220 via a bus 222.
  • An input/output (I/O) system 224 can be provided for interfacing SMP 210 with various external and peripheral devices, such as computer terminals and memory disks.
  • Processor and cache nodes 212-218 can be implemented, for example, as processor and cache node 410, in FIG. 4, described above. Alternatively, one or more processor and cache nodes 212-218 can employ a plurality of processors 412 and caches 416. In either implementation, SMP 210 permits multiple processors 412 to process a plurality of tasks in parallel. Centralized, shared memory 220 permits multiple processors 412 to share data between tasks.
  • a distributed shared memory (DSM) system 310 includes a number of processing nodes 350-364, interconnected via an interconnection network 344.
  • DSM 310 can include any number of processing nodes 350-364.
  • Each processing node 350-364 is illustrated with a processor and cache node 312-326 and a portion of distributed shared memory 328-342.
  • one or more of processing nodes 350-364 need not employ a processor and cache node.
  • Processor and cache nodes 312-326 can be implemented, for example, as processor and cache node 410 in FIG. 4, where each processor 412 accesses a portion of shared memory 328-342 through one or more levels of cache 416.
  • processor and cache nodes 312-326 can have a plurality of processors 412 and caches 416.
  • Distributed shared memory portions 328-342 are accessed by the processors within processing nodes 350-364 as if they formed a single continuous block of physical memory. As would be apparent to a person skilled in the art, one or more of processing nodes 350-364 need not employ a portion of shared memory.
  • each processing node 350-364 is shown with an optional input/output (I/O) device.
  • one or more of processing nodes 350-364 need not have an I/O device.
  • different types of I/O devices and combinations of external peripherals and resources can be used in a DSM system.
  • one or more of processing nodes 350-364 can include any combination of processors or no processors, shared memory or no shared memory and I/O or no I/O.
  • each processing node 350-364 is shown with an optional cache consistency directory.
  • the present invention is not limited to directory-based cache consistency protocols. As would be apparent to one skilled in the art, a snooping protocol or any other hardware-based protocol or software-based cache consistency scheme can be employed.
  • each processing node 350-364 can include a portion of main memory. This physical proximity between processor and memory reduces memory latency with respect to the processor and memory within a processing node.
  • DSM 310 is preferably configured so that data which is accessed most frequently by a particular processing node is placed in the portion of main memory within the processing node. If that data is subsequently accessed more frequently by a processor in another processing node, the data is migrated, or moved, to a portion of main memory within the other processing node.
  • Virtual memory pages 512 include virtual pages F1, F2, F3 and F4. Virtual pages F1, F2, F3 and F4 can be referenced by a relative offset that is typically based upon page size. For example, if an operating system employs 4K pages, page F1 has an offset of zero since it is the first virtual page. Page F2 has a virtual offset of 4K. Page F3 has a virtual offset of 8K and page F4 has a virtual offset of 12K. Virtual address spaces are usually much larger than the amount of physical memory in a system.
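The offset rule just stated follows directly from the page size; a short sketch makes the arithmetic explicit (the function name is an assumption for the example).

```python
# The Nth virtual page (0-based) begins at N times the page size.

PAGE_SIZE = 4 * 1024  # 4K pages, as in the example above

def virtual_offset(page_index):
    """Offset of the given virtual page within the address space."""
    return page_index * PAGE_SIZE

# Pages F1..F4 correspond to page indices 0..3
offsets = [virtual_offset(i) for i in range(4)]
assert offsets == [0, 4096, 8192, 12288]  # 0, 4K, 8K, 12K
```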
  • One or more of virtual pages F1, F2, F3 and F4 can be stored in physical page frames 516. F1 is stored at P4, F2 is stored at P6, and F3 is stored at P1. F4 is not currently stored in page frames 516.
  • A user that needs to access one or more of virtual pages F1, F2, F3 or F4 needs to know if and where, in physical memory, the virtual page is stored.
  • a forward page table 610 provides virtual memory address-to-physical memory address translations. Each user or application that is allotted a portion of physical memory can be provided with a separate page table 610. In some cases, page tables can be shared. Each page table 610 includes a set of page table entries (PTEs) 616-622 for storing physical addresses and control information such as a valid bit, permission bits, etc.
  • a processor translates the virtual memory address to a physical memory address and accesses the physical memory location.
  • PTEs 616-622 can also be referenced by a physical address at which each PTE is stored.
  • Virtual memory pages 614 can be referred to as implied virtual memory pages because page table 610 does not have to include specific virtual memory pages. Instead, the first PTE 616 is impliedly, or automatically, associated with the first virtual memory page. Similarly, the second PTE 618 is impliedly associated with the second virtual memory page.
  • The first virtual memory page F1 is stored in page P4, so the first PTE 616 references P4. The second virtual memory page F2 is stored in page P6, so the second PTE 618 references P6. The third virtual memory page F3 is stored in page P1, so the third PTE 620 references P1. Since the fourth virtual memory page F4 is not stored in physical memory, the fourth PTE 622 does not reference any page of physical memory. Thus, a reference to F4 will result in a page table miss or page fault and F4 will have to be retrieved from disk 514.
  • Another way of describing page table 610 is as an array that is indexed by the virtual page number of the desired mapping. For example, if the virtual address is 0x10000 and the page size of a system is 4K bytes (i.e., 0x1000 bytes), then the virtual page number is the virtual address divided by the page size, or 0x10.
  • The PTE for page 0x10 can be found by simply looking at index 0x10 in the page table array.
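The arithmetic in the example above can be written out directly; here the page-table array is modeled as a Python list indexed by virtual page number, and the PTE contents are illustrative.

```python
# Virtual page number = virtual address divided by page size, used
# directly as an index into the page-table array.

PAGE_SIZE = 0x1000  # 4K bytes
vaddr = 0x10000
vpn = vaddr // PAGE_SIZE  # virtual page number
assert vpn == 0x10

page_table = [None] * 0x20  # array indexed by virtual page number
page_table[0x10] = {"frame": 0x4, "valid": True}
assert page_table[vpn]["valid"]  # PTE found at index 0x10
```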
  • the starting address of the array itself is maintained by the operating system in such a way that it is easy to find.
  • the work can be done in hardware where the base address of the page table array is loaded into a special hardware register inside a translational look-aside buffer (TLB). TLBs are discussed more fully below.
  • Virtual address spaces are usually much larger than the amount of physical memory in a system.
  • Forward page tables, such as page table 610, are usually implemented as sparse arrays.
  • The array implementation of a page table illustrated in FIG. 6 is just one possible data structure for translating virtual memory addresses to physical memory addresses. As would be apparent to one skilled in the art, other data structures can be employed. For example, a reverse page table can be employed where the number of page table entries equals the number of virtual pages that are stored in physical memory. When a translation is needed, the page table is searched for an entry that is tagged with the virtual address that is needed. Reverse page tables are typically implemented with a hash table.
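The reverse-page-table alternative just described can be sketched as follows: one entry per resident physical page, tagged with the virtual page it holds, located via a hash table. The table contents and names are illustrative assumptions, not details from the patent.

```python
# Sketch of a reverse page table: a hash table mapping the virtual page
# tag of each resident page to the physical frame that holds it. Pages
# not present in the table are not resident and cause a page fault.

reverse_table = {0x10: 0x4, 0x11: 0x6}

def reverse_lookup(vpn):
    try:
        return reverse_table[vpn]  # entry tagged with this virtual page
    except KeyError:
        raise LookupError("page fault") from None  # not resident

assert reverse_lookup(0x10) == 0x4
```

The table has only as many entries as there are resident virtual pages, which is what makes the reverse organization compact compared to a sparse forward array.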
  • Page tables 610 are typically stored in main memory, such as main memory 112 or 220.
  • each page table 610 is preferably stored in a main memory portion 328-342 that is associated with a processor and cache node 312-326 on which an associated user is running. Page tables can also be cached in cache 416.
  • Page table entries can be placed in one or more translational look-aside buffers.
  • processor and cache node 410 is provided in greater detail wherein processor 412 includes a central processing unit (CPU) 714 and a translational look-aside buffer (TLB) 718.
  • TLB 718 is a relatively small, and typically on-chip, memory device that is used for storing a small number of most recently used virtual memory address-to-physical memory address translations. Because TLB 718 is small and typically on-chip, CPU 714 can quickly search and retrieve address translations from TLB 718.
  • When CPU 714 retrieves a translation from a main memory, such as main memories 112, 220 or 328-342, the translation is stored in cache 416. A copy of the translation is also placed in TLB 718. Later, when CPU 714 requires a translation, CPU 714 or an operating system searches TLB 718. If the translation is not found in TLB 718 (i.e., a TLB "miss"), the desired translation can be loaded from the page tables in memory by hardware, software, firmware, or any combination thereof.
  • TLBs are employed in both reduced instruction set (RISC) processors, such as MIPS processors from Silicon Graphics, Inc., and complex instruction set (CISC) processors, such as those in Motorola Corporation's 68000 line.
  • TLB 718 can provide translations to CPU 714 faster than cache 416 can.
  • TLB 718 introduces consistency problems. For example, if a translation from page table 610 is cached in cache 416 and in TLB 718, and if the page table entry is changed, the translation in cache 416 and TLB 718 must both be updated. Updating can include replacing an invalid translation with a valid translation or simply identifying the invalid translation as invalid.
  • a variety of systems and methods can be employed for maintaining consistency between translations stored in page table 610 in main memories 112, 220, 328-342 and translations that are cached in nodes such as nodes 114, 212-218 and 312-326.
  • TLB 718 Consistency of TLB 718, however, cannot be maintained with existing cache consistency systems and methods. This is because, in existing systems, TLB 718 can only be accessed by CPU 714 or by the operating system. For example, stale TLB entries are typically removed by broadcasting invalidation requests to all central processing units (CPUs) in a system via inter-CPU interrupts and then waiting for acknowledgments to come back from each CPU. Interrupting the operating system or the CPU each time that a TLB entry needs to be invalidated, however, reduces the time that can be spent performing other tasks.
  • Alternatively, TLB invalidation requests are sent only to CPUs where the process has executed. In either situation, substantial communication and synchronization overhead is incurred. Moreover, operating systems can invalidate every entry in a TLB rather than just the stale entries. This results in additional TLB miss overhead for the entries that were valid. Thus, whenever a translation in TLB 718 is to be invalidated, CPU 714 or the operating system must be interrupted in order to execute the invalidation.
  • A hardware-based system and method for updating a TLB when an operating system changes a PTE, without interrupting an associated CPU or the operating system, is illustrated as implemented in processor and cache node 410.
  • Processor and cache node 410 is coupled to a main memory 810 which can be main memory 112, 220, or 328-342, depending upon whether node 410 is implemented in a uni-processor environment, an SMP environment, or a DSM environment.
  • Cache 416 can include a cache controller 812 for controlling the contents of cache 416.
  • Cache consistency can be maintained between data cached in cache 416 and data stored in main memory 810 by a variety of consistency techniques.
  • cache consistency can be maintained with an optional cache consistency directory 814 in main memory 810.
  • cache consistency can be maintained by a snooping protocol implemented within cache controller 812 which snoops bus 816 for broadcast messages.
  • page tables such as, for example, page tables 610, 818, 820 and 822 can be provided for storing virtual memory address-to-physical memory address translations.
  • reverse page tables or any other suitable mechanism can be employed for storing virtual memory address-to-physical memory address translations.
  • TLB 718 includes a TLB table 824 for storing a list 828 of virtual memory-to-physical memory address translations, or PTEs.
  • PTE list 828 is indexed by virtual memory (VM) address tags 830.
  • VM address tags 830 can be searched by search engine 840.
  • TLB table 824 also includes PTE address tags 832 for indexing PTEs in PTE list 828. The purpose of PTE address tags 832 is discussed below.
  • When CPU 714 needs to access a virtual memory address, CPU 714 first searches VM address tags 830, using search engine 840. If the requested virtual memory address is found in VM address tags 830, the PTE that is associated with the VM address tag, or the physical address therein, is retrieved from PTE list 828. The retrieved PTE, or physical address therein, is used to access a physical page of main memory 810.
  • CPU 714 computes the page table entry address and looks in cache 416 or main memory 810 for the computed address. If the page table entry address in main memory 810 does not store a valid physical address, the system will have to access an external memory device, such as disk drive 514, via an I/O interface such as IO system 116 or 224 to retrieve the data. After the data is brought into main memory 810, an appropriate PTE is updated with the physical address of the data. The PTE and the PTE address are sent to processor and cache node 410 and stored in cache 416. The PTE and PTE address are also sent to TLB 718 where they are placed in PTE list 828 and PTE address tags 832, respectively, and associated with a virtual memory tag 830.
  • TLB 718 also includes a TLB controller 826 for detecting changes to PTEs and for updating PTEs in PTE list 828 when PTE changes are detected.
  • TLB controller 826 includes a snooping controller 838 for detecting activity on cache-memory interconnect 816 that affects PTEs.
  • Snooping controller 838 can be analogous to snooping protocols employed by cache controllers. As would be apparent to one skilled in the art given this description, snooping controller 838 can employ a variety of snooping protocols.
  • Snooping controller 838 constantly snoops cache-memory interconnect 816 for cache consistency data. Snooping controller 838 detects PTE changes on cache-memory interconnect 816 regardless of whether a snooping protocol or directory protocol is employed for maintaining consistency between cache 416 and main memory 810. This is because cache consistency data is present on cache-memory interconnect 816 either as a broadcast, in a snooping protocol based system, or as an instruction sent directly from cache consistency directory 814, in a directory-based protocol.
  • TLB controller 826 includes a search engine 834 for searching TLB table PTE address tags 832 and an updating module 836 for updating or invalidating PTEs in PTE list 828.
  • Search engine 834 searches PTE address tags 832 for the PTE address that was detected by snooping controller 838. If search engine 834 finds the PTE address tag in column 832, updating module 836 updates or invalidates the associated PTE.
  • Referring to FIG. 9, a preferred method for storing PTE data in TLBs is illustrated.
  • Referring to FIG. 10, a preferred method for updating or invalidating PTE data in TLBs, when an operating system changes a PTE, is provided.
  • the process flowcharts of FIGS. 9 and 10 are described with reference to FIG. 8, where TLB table 824 and TLB controller 826 are implemented in processor and cache node 410. It is to be understood, however, that the present invention is not limited to implementation in processor and cache node 410.
  • the present invention can be practiced in any processor system which employs a virtual memory scheme, regardless of whether a cache system is employed.
  • the present invention can be implemented for any number of processors and TLBs in a processing system.
  • the process of placing PTE data in TLBs begins at step 910, where data is mapped into main memory such as main memory 810, according to any virtual memory mapping scheme.
  • an operating system that controls processor and cache node 410 can map virtual memory, as discussed above in FIGS. 5 and 6, where virtual memory addresses 512 are mapped to physical addresses 516.
  • Main memory 810 can represent main memory 112 in uniprocessor system 110, main memory 220 in SMP 210 or main memory 328-342 in DSM 310.
  • In step 912, the operating system generates virtual memory address-to-physical memory address translations for the mapped data.
  • a page table such as page table 610 can be generated for a process that is provided with mapped memory.
  • the page table can be for the exclusive use of one user or can be shared by multiple users.
  • Page table 610 represents a variety of methods that can be employed for providing virtual memory address-to-physical memory address translations.
  • the generated page table is preferably stored in main memory 810.
  • the page table is preferably stored in main memory 112.
  • the page table is stored in main memory 220.
  • the page table can be stored in any portion of main memory 328-342.
  • the page table is stored in a portion of main memory 328-342 that is adjacent to a processor on which an associated application or user is running. Generation and storage of page tables is typically handled by the operating system.
  • In step 914, virtual memory address-to-physical memory address translations, such as those in page tables 610, 818, 820 and 822, are available for use by processors, such as processor 412, and/or the operating system.
  • When processor 412 needs to access mapped data, it references a virtual memory page.
  • In step 916, processor 412 uses search engine 840 to search for the referenced virtual memory page in VM page tags 830. If the virtual memory page tag is found in column 830, processing proceeds to step 918, where a translation, or physical address, is retrieved from an associated PTE in PTE list 828. The physical address can be used to access the page of memory. Processing then stops at step 920.
  • In step 916, if the referenced virtual page is not found in VM page tags 830 (i.e., a TLB "miss"), processing proceeds to step 922, where a PTE address is calculated for the referenced VM page.
  • In step 924, the calculated PTE address is used to retrieve the PTE from cache 416, if it is stored there, or from main memory 810. If the referenced VM page is not in memory 810, it is brought into memory 810 and a PTE at the calculated PTE address is updated accordingly.
  • In step 926, the PTE and PTE address are sent from main memory 810 to cache 416 on cache-memory interconnect 816, under control of the operating system and/or a memory controller.
  • the data is received by, and cached in, cache 416.
  • In step 928, the PTE, or the virtual memory address-to-physical memory address translation that is stored therein, is sent to processor 412.
  • When the PTE or translation is retrieved by processor 412, CPU 714 within processor 412 can use the translation to access data in physical memory.
  • In step 930, the PTE, or the translation, is placed in TLB table 824. More specifically, the PTE, or portions thereof, is placed in PTE list 828, a virtual memory page tag that is associated with the PTE is placed in VM page tags 830, and the PTE address is placed in PTE address tags 832.
  • Step 930 can be performed by CPU 714, by the operating system, by TLB controller 826, or any combination thereof. Processing stops at step 932. Steps 910-932 can be repeated, in whole or in part, as necessary for any number of processes, processors, TLBs and computer systems.
  • a preferred method for updating TLB entries when a PTE entry is changed by the operating system begins at step 1010, where a page table entry is changed.
  • Page table entries can change for a variety of reasons. For example, referring to FIG. 5, a page table entry can change if a memory management scheme associated with main memory 810 re-maps, migrates or pages-out a page of data. The associated page table entry is changed to reflect the new physical address of the mapped data. This can occur, for example, where a mapped page of physical memory is needed for some other use.
  • a cached copy of the PTE is updated or invalidated.
  • a command can be sent on cache-memory interconnect 816 to invalidate a copy of the PTE in cache 416.
  • the command can be sent as part of a hardware-based cache consistency protocol, such as a directory-based protocol or a snooping protocol, or as part of a software-based cache consistency protocol.
  • A consistency directory, such as directory 814, records where data from memory 810 is cached. If action is taken on data within main memory 810, directory 814 is checked to determine whether the data is cached. If the data is cached, an invalidate command is sent to each cache that caches the data. Thus, if the data is cached in cache 416, an invalidate signal is sent to cache 416 via cache-memory interconnect 816.
  • the invalidate signal includes data for identifying the physical address in memory 810 that stores the data that has changed. If the changed data is a PTE, the invalidate signal will identify the page table entry address of the PTE.
  • Cache 416 can employ a cache controller 812 for receiving the invalidate signal and address identification data.
  • Cache controller 812 locates the data in cache 416 and sets an invalid bit for that data or otherwise marks it as invalid.
  • main memory 810 could send an updated or current PTE, if one exists, to cache 416.
  • TLB snooping controller 838 constantly snoops cache-memory interconnect 816 for activity that might affect translations in TLB table 824.
  • TLB snooping controller 838 can detect invalidate signals and addresses on cache-memory interconnect 816, regardless of whether the signals are sent from a directory based consistency scheme or a snooping based scheme.
  • In step 1016, when TLB controller 826 detects activity that might affect translations in TLB table 824, search engine 834 searches PTE address tags 832 to determine whether an address that is detected on cache-memory interconnect 816 matches a PTE address tag in column 832.
  • In step 1018, TLB controller 826 determines whether an address that is detected on cache-memory interconnect 816 is in PTE address tags 832. If not, no further action is taken by TLB controller 826 and processing stops in step 1020.
  • If, in step 1018, the detected address is found in PTE address tags 832, processing proceeds to step 1022, where TLB controller updating module 836 updates a corresponding PTE in list 828. Updating module 836 can, for example, set an invalid bit for the PTE in list 828 that is associated with the PTE address. Thereafter, when CPU 714 uses search engine 840 to search VM page tags 830 for a virtual memory page that has an invalid bit set, CPU 714 will look to cache 416 or main memory 810 for a valid PTE at the PTE address.
  • Alternatively, TLB updating module 836 updates PTE list 828 with a valid PTE for the detected PTE address.
  • TLB controller 826 in conjunction with PTE address tag 832, thus provides a system and method for updating TLB 718 when a PTE is changed by the operating system, without interrupting CPU 714 or the operating system. This frees CPU 714 to process other tasks.
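The fill path of FIG. 9 and the snoop-driven invalidation of FIG. 10, described step by step above, can be summarized in a short sketch. This is a minimal illustration under assumed names, not the patent's implementation: the page table is modeled as a mapping from PTE address to physical frame number, and the TLB as a mapping from virtual page number to an entry that carries its PTE address tag.

```python
# Sketch of the FIG. 9 fill path and FIG. 10 snoop path (all names assumed).

def pte_address(page_table_base, vpn, pte_size=8):
    # Step 922: calculate the PTE address for the referenced VM page.
    return page_table_base + vpn * pte_size

def handle_reference(tlb, page_table, page_table_base, vpn):
    entry = tlb.get(vpn)
    if entry is not None and entry["valid"]:      # steps 916-918: TLB hit
        return entry["pfn"]
    addr = pte_address(page_table_base, vpn)      # step 922
    pfn = page_table[addr]                        # step 924: retrieve the PTE
    # Step 930: place the PTE, VM page tag and PTE address tag in the TLB.
    tlb[vpn] = {"pfn": pfn, "pte_addr": addr, "valid": True}
    return pfn

def snoop_invalidate(tlb, snooped_addr):
    # FIG. 10, steps 1016-1022: search the PTE address tags for the snooped
    # address and invalidate only the matching entry, leaving others valid.
    for entry in tlb.values():
        if entry["pte_addr"] == snooped_addr:
            entry["valid"] = False
```

Note that `snoop_invalidate` touches only the entry whose PTE address matches, mirroring the patent's point that valid TLB entries are not needlessly invalidated.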

Abstract

A system and method for maintaining consistency between translational look-aside buffers (TLBs) and page tables. A TLB has a TLB table for storing a list of virtual memory address-to-physical memory address translations, or page table entries (PTEs), and a hardware-based controller for invalidating a translation that is stored in the TLB table when a corresponding page table entry changes. The TLB table includes a virtual memory (VM) page tag and a page table entry address tag for indexing the list of translations. The VM page tag can be searched for VM pages that are referenced by a process. If a referenced VM page is found, an associated physical address is retrieved for use by the processor. The TLB controller includes a snooping controller for snooping a cache-memory interconnect for activity that affects PTEs. The page table entry address tag can be searched by a search engine in the TLB controller for snooped page table entry addresses. The TLB controller includes an updating module for invalidating or updating translations associated with snooped page table entry addresses. Translations in TLBs are thus updated or invalidated through hardware when an operating system changes a PTE, without intervention by an operating system or other software.

Description

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to translational look-aside buffers that are used for storing virtual memory address-to-physical memory address translations for a processor.
2. Related Art
Computer systems, including uni-processor computer systems and multiprocessor computer systems, typically run multiple processes or threads at a time. Each process requires some amount of physical memory. Often, physical memory is limited and must be allocated among the different processes.
In order to allocate limited physical memory among multiple processes, computer systems employ virtual memory schemes. Virtual memory schemes divide physical memory into pages and allocate the pages to the different processes. Physical memory that is so allocated is referred to as mapped memory.
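The page-based mapping just described rests on simple address arithmetic. As a concrete illustration, assuming hypothetically a 4 KB page size (the text does not fix one), a virtual address splits into a virtual page number and an offset within the page:

```python
# Illustrative sketch (not from the patent): splitting a virtual address
# into a virtual page number and an offset, assuming 4 KB (2^12) pages.
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT          # 4096 bytes per page
OFFSET_MASK = PAGE_SIZE - 1

def split_virtual_address(va):
    """Return (virtual page number, offset within page)."""
    return va >> PAGE_SHIFT, va & OFFSET_MASK

vpn, offset = split_virtual_address(0x00402ABC)
# vpn = 0x402, offset = 0xABC
```

The virtual page number is what a translation scheme maps; the offset passes through unchanged to the physical page.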
In a virtual memory scheme, each process that is allocated a block of physical memory is also provided with a set of translations for translating virtual addresses to assigned physical addresses of the allocated block. Each set of translations can be stored in, for example, a page table. A page table can be associated with a specific user or shared by multiple users. Alternatively, reverse page table techniques can be employed.
Page tables are commonly indexed by virtual page numbers and include a page table entry (PTE) for each virtual page address. If a virtual page is stored in memory, a corresponding PTE includes a physical address of the page and control information such as a valid bit, permission bits, etc. The PTE for a page can be found by looking at an index that corresponds to the virtual address. Page tables can be implemented as sparse arrays and are typically stored in main memory.
When a process requests access to a virtual memory address, a page table that is associated with the process is searched for the requested virtual address. When the virtual address is found, the process can access the desired page using the physical address in the PTE that is associated with the virtual address.
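The lookup described in the two paragraphs above can be sketched as a minimal model. The PTE field names and the sparse-dictionary representation are assumptions for illustration, as is the 4 KB page size:

```python
# Hypothetical sketch of a page-table lookup; field names (pfn, valid)
# and the 4 KB page size are illustrative, not taken from the patent.
PAGE_SHIFT = 12
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

class PTE:
    def __init__(self, pfn, valid=True):
        self.pfn = pfn        # physical frame number of the mapped page
        self.valid = valid    # control information (valid bit)

page_table = {0x402: PTE(pfn=0x9A)}   # sparse: indexed by virtual page number

def translate(va):
    vpn, offset = va >> PAGE_SHIFT, va & OFFSET_MASK
    pte = page_table.get(vpn)
    if pte is None or not pte.valid:
        raise KeyError("page fault: no valid translation for this page")
    return (pte.pfn << PAGE_SHIFT) | offset
```

A reference to virtual address `0x402ABC` would resolve through the PTE for page `0x402` to physical address `0x9AABC` in this toy mapping.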
Computer systems typically employ one or more levels of cache memory between main memory and each processor in order to reduce memory access time. Cache memories store data that is retrieved from main memory. Data that is retrieved by a processor must pass through the one or more levels of cache in order to get to the processor. Because caches tend to be small and physically close to the processor, sometimes located on-chip with the processor, cached data can generally be accessed much faster than data that is stored in main memory. Thus, caches are typically used to store data that needs to be repeatedly accessed by a processor, such as PTE translations.
In addition to caching translations, most processors employ an on-chip, translational look-aside buffer (TLB) for storing a number of most recently used, virtual memory address-to-physical memory address translations. When a processor retrieves a translation from main memory or from cache, it stores the translation in an associated TLB. The processor can retrieve a translation from the TLB faster than from the cache or from main memory. Because TLBs tend to be small, storing, for example, forty-eight translations, processors can overwrite least recently used translations in a TLB with more recently used translations.
When the processor needs a translation, it first looks to the TLB. If a translation exists in the TLB, the processor retrieves the physical address from the TLB and accesses the data using the physical address. If the translation does not exist in the TLB (i.e., a TLB "miss"), the processor looks to the cache or main memory. These operations can be performed with hardware, software, firmware or any combination thereof.
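A rough model of such a TLB follows, assuming a fully associative, 48-entry buffer with least-recently-used replacement (the specific replacement policy is an assumption; the text says only that least recently used translations can be overwritten):

```python
from collections import OrderedDict

# Illustrative TLB model: a small, fully associative buffer that evicts
# the least recently used translation. The 48-entry capacity mirrors the
# figure quoted in the text; the structure itself is an assumption.
class TLB:
    def __init__(self, capacity=48):
        self.capacity = capacity
        self.entries = OrderedDict()       # vpn -> physical frame number

    def lookup(self, vpn):
        if vpn in self.entries:            # TLB hit
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        return None                        # TLB miss: caller walks the page table

    def insert(self, vpn, pfn):
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # overwrite least recently used
        self.entries[vpn] = pfn
```

On a miss, the caller would fetch the translation from cache or main memory and call `insert`, exactly the fallback path the paragraph above describes.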
One problem that confronts both TLBs and caches is maintaining consistency of data that is stored in more than one location. For example, a PTE can be retrieved from main memory and stored in both a processor cache and a TLB. Later, an operating system can change or invalidate the PTE. For example, in a distributed shared memory (DSM) system, data that is stored in the mapped physical memory location can be migrated or moved to another physical location. As a result of the migration, the PTE is no longer valid because it stores the physical address of the data prior to the migration. Thus, the operating system updates the PTE to reflect the new physical location of the data. The copy of the PTE that is stored in the processor cache and the TLB, however, is no longer valid.
In order to prevent processors from retrieving invalid translations from caches and TLBs, invalid translations must be flagged with an invalid bit or have a valid bit turned off. Alternatively, invalid translations can be updated with current, valid translations.
Systems and methods for maintaining cache consistency, including hardware and software methods, are well known. Cache consistency hardware schemes, or protocols, are divided into two main types, directory protocols and snooping protocols. In directory protocols, the sharing status of memory is maintained in a centralized directory. In a distributed shared memory (DSM) system, the directory can be distributed. A centralized controller is responsible for maintaining consistency of shared data. Any time that data stored in a memory location is changed, a check is performed in the centralized directory to determine whether a copy of the data is stored in a cache. If so, each copy is either updated or invalidated. For example, copies of a PTE stored in one or more caches could be invalidated by sending an invalidation signal and a page table entry address directly to each cache that stores a copy of the PTE. Directory protocols are highly scalable and are preferred in large multi-processing systems.
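The directory-style invalidation just described can be sketched as follows. This is a minimal illustration with assumed class and method names, not a description of any particular directory protocol implementation:

```python
# Hedged sketch of a directory protocol: a centralized directory maps each
# memory address to the set of caches holding a copy, so invalidations are
# sent only to those caches (all names are illustrative assumptions).
class Directory:
    def __init__(self):
        self.sharers = {}                      # address -> set of caches

    def record_fill(self, addr, cache):
        # Record that a cache has taken a copy of this address.
        self.sharers.setdefault(addr, set()).add(cache)

    def write(self, addr):
        # On a change to addr, invalidate exactly the caches that share it.
        for cache in self.sharers.pop(addr, set()):
            cache.invalidate(addr)

class Cache:
    def __init__(self):
        self.lines = {}                        # addr -> cached data

    def invalidate(self, addr):
        self.lines.pop(addr, None)
```

Because the directory knows exactly which caches hold a copy, no broadcast is needed, which is why directory protocols scale to large systems.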
In snooping protocols, no central directory is maintained. Instead, each cache is responsible for the consistency of its data. Actions involving shared memory are broadcast to all caches. Each cache includes snooping hardware for snooping the broadcasts and for determining whether a broadcast requires that some action be taken on the cache. For example, a PTE stored in a cache could be invalidated by broadcasting an invalidation signal and a PTE address. The address snooping hardware for each cache receives the broadcast and searches the associated cache for the PTE address. If the PTE address is found, the associated translation is invalidated or updated. Because of the broadcast requirement, snooping protocols are generally implemented in shared bus architectures. Since the number of broadcasts generally increases with the number of caches, snooping protocols are not very scalable.
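A corresponding sketch of the snooping approach, again under assumed names: every invalidation is broadcast on a shared bus object, and each attached cache checks the address against its own contents.

```python
# Hedged sketch of a snooping protocol: writes are broadcast on a shared
# bus, and each cache's snooping hardware checks whether the broadcast
# address concerns data it holds (all names are illustrative assumptions).
class Bus:
    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, addr):
        for cache in self.caches:      # broadcast reaches every cache
            cache.snoop(addr)

class SnoopyCache:
    def __init__(self, bus):
        self.lines = {}                # addr -> cached data
        bus.attach(self)

    def snoop(self, addr):
        # Invalidate our copy if the broadcast address matches a cached line.
        self.lines.pop(addr, None)
```

Every cache sees every broadcast whether or not it holds the address, which is the scalability cost the paragraph above notes.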
Software cache coherence schemes rely on both operating system and application programmers to ensure consistency. However, it is not always possible for programmers to predict process execution and interaction, which limits the effectiveness of software techniques.
Additional details of cache consistency schemes are provided in, for example: Schimmel, "UNIX Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers," pp. 287-333, Addison Wesley, 1994; Hennessy and Patterson, "Computer Architecture: A Quantitative Approach," 2d Ed., pp. 655-693, Morgan Kaufmann, 1996; Tomasevic and Milutinovic, "The Cache Coherence Problem in Shared-Memory Multiprocessors: Hardware Solutions," IEEE Computer Society Press, 1993; and "Cache Coherency Problem in Shared Memory Multiprocessors: Software Solutions," edited by Milo Tomasevic and Veljko Milutinovic, Los Angeles, Calif., IEEE Computer Society Press, 1993, each of which is incorporated by reference herein.
In TLB consistency schemes, stale TLB entries are typically removed by broadcasting invalidation requests to all central processing units (CPUs) in a system via inter-CPU interrupts and then waiting for acknowledgments to come back from each CPU. This is almost universally performed in software; few, if any, systems use hardware. Interrupting the operating system or the CPU each time that a TLB entry needs to be invalidated, however, reduces the time that can be spent performing other tasks.
Alternatively, invalidation requests can be sent only to CPUs where the process has executed. In either situation, substantial communication and synchronization overhead is incurred. Moreover, operating systems can invalidate every entry in a TLB rather than just the stale entries. This results in additional TLB miss overhead for the entries that were valid.
What is needed is a hardware-based system and method for updating a translational look-aside buffer when a PTE in a cache is updated, one that does not interrupt CPU processing or the operating system and does not invalidate valid TLB entries. A system and method for updating TLBs is needed which reduces communication and synchronization overhead.
SUMMARY OF THE INVENTION
The present invention is a system and method for maintaining consistency between translational look-aside buffers (TLBs) and page tables by combining cache consistency techniques with TLBs. The system includes a TLB having a TLB table for storing a list of virtual memory address-to-physical memory address translations or page table entries (PTEs). The TLB also includes a hardware-based controller for identifying and updating, or invalidating, PTEs that are stored in the TLB table when the page table entry is changed by an operating system.
The TLB table includes a virtual memory (VM) address tag and a PTE address tag for indexing the list of translations, or PTEs. The VM address tag can be searched by the CPU for a virtual memory address, and the CPU can retrieve translations, or PTEs, that are associated with the virtual memory address. The PTE address tag can be searched by the TLB controller for PTE addresses that are changed by the operating system and updated in a cache. Updating can include replacing an invalid translation with a valid translation or invalidating a PTE.
In one embodiment, the TLB controller includes a snooping controller, a search engine and an updating module. The snooping controller snoops a cache-memory interconnect for cache consistency data. Cache consistency data can be sent from a directory protocol cache consistency scheme or from a snooping protocol cache consistency scheme. The search engine searches the PTE address tag of the TLB table for PTE addresses that are snooped by the snooping controller. The updating module updates or invalidates PTEs, or translations, in the TLB table when the PTE is changed by the operating system.
In operation, when a CPU requires a physical memory address that is associated with a virtual memory address, the CPU first searches the virtual address tag of the TLB table. If a valid translation is not found in the TLB table, the translation is retrieved from a cache or from main memory and a copy of the translation is placed in the TLB table.
When a translation in a page table is changed or invalidated by the operating system, an update or invalidate signal is sent over the cache-memory interconnect to one or more caches, along with the PTE address. In a directory-based protocol, the update or invalidate signal and PTE address are sent only to caches that have copies of the PTE. In a snooping protocol, the update or invalidate signal and PTE address are broadcast to all caches.
Regardless of which protocol is employed, the TLB snooping controller snoops the cache-memory interconnect and detects the update or invalidate signal and the PTE address. The TLB search engine searches the PTE address tag of the TLB table for the PTE address. If the PTE address is found in the TLB table, the associated translation is updated or invalidated by the TLB updating module. Thus, when the operating system changes a page table entry, translations in TLBs are automatically kept consistent.
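The dual-indexed TLB whose operation is summarized above can be modeled roughly as follows. The two dictionaries stand in for the VM page tag and PTE address tag columns of the TLB table; all identifiers are illustrative assumptions, not the patent's own names:

```python
# Illustrative model of a TLB indexed both by VM page tag (for CPU
# lookups) and by PTE address tag (for the snooping controller).
class TLBEntry:
    def __init__(self, vpn, pte_addr, pfn):
        self.vpn, self.pte_addr, self.pfn = vpn, pte_addr, pfn
        self.valid = True

class SnoopingTLB:
    def __init__(self):
        self.by_vpn = {}        # VM page tag index, searched by the CPU
        self.by_pte_addr = {}   # PTE address tag index, searched on snoops

    def fill(self, vpn, pte_addr, pfn):
        entry = TLBEntry(vpn, pte_addr, pfn)
        self.by_vpn[vpn] = entry
        self.by_pte_addr[pte_addr] = entry

    def lookup(self, vpn):
        # CPU-side search of the VM page tags.
        entry = self.by_vpn.get(vpn)
        return entry.pfn if entry and entry.valid else None

    def snoop(self, pte_addr):
        # Hardware path: invalidate the matching entry without any
        # interrupt to the CPU or the operating system.
        entry = self.by_pte_addr.get(pte_addr)
        if entry:
            entry.valid = False
```

The key design point is the second index: because entries can be found by PTE address, a snooped cache consistency message is enough to locate and invalidate exactly the affected translation.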
The present invention can be implemented on any computer system that employs virtual memory. Thus, the present invention can be implemented in both uni-processor environments and multiple processor environments. The present invention is especially useful in shared memory, multi-processor systems where page migration occurs. Shared memory systems that benefit from the present invention include centralized shared memory systems, such as, symmetric multiple processor (SMP) systems and distributed shared memory (DSM) systems. The present invention can be employed to maintain consistency for any number of TLBs in a system.
By updating TLBs via hardware, communication overhead and costly operating system interrupts, that would otherwise occur with a software-based TLB update technique, are avoided.
Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the following drawings.
BRIEF DESCRIPTION OF THE FIGURES
The accompanying drawings, which are incorporated herein and form part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.
The present invention is described with reference to the accompanying figures, wherein:
FIG. 1 is a block diagram of a uni-processor system that can employ the present invention;
FIG. 2 is a block diagram of a symmetric, shared memory, multiprocessor (SMP) system that can employ the present invention;
FIG. 3 is a block diagram of a distributed, shared memory (DSM) system that can employ the present invention;
FIG. 4 is a block diagram of a processor and cache node that can be employed by any of the processor systems of FIGS. 1, 2 and 3;
FIG. 5 is a block diagram of a virtual memory mapping scheme;
FIG. 6 is an illustration of a page table that can be employed by a virtual memory mapping scheme for storing virtual memory address-to-physical memory address translations;
FIG. 7 is a block diagram of the processor and cache node illustrated in FIG. 4, including a CPU and a translational look-aside buffer (TLB) for storing virtual memory address-to-physical memory address translations;
FIG. 8 is a detailed block diagram of the processor and cache node illustrated in FIG. 7, including details of a TLB in accordance with the present invention;
FIG. 9 is a process flowchart illustrating a method for placing page table entries, virtual memory page tags and page table entry address tags in a TLB; and
FIG. 10 is a process flowchart illustrating a method for updating page table entries in TLBs when an operating system changes a page table entry.
In the drawings, like reference numbers typically indicate identical or functionally similar elements. Additionally, the left-most digit(s) of a reference number typically identifies the drawing in which the reference number first appears.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Table of Contents
1. Overview
2. Example Embodiment
3. Virtual Memory Mapping and TLB Consistency
4. System and Method for Maintaining TLB Consistency
5. Conclusions
1. Overview
In conventional computer systems, virtual memory address-to-physical address translations are typically stored in a page table entry of a page table. Each page table entry has a page table entry (PTE) address. Copies of page table entries, or translations stored therein, can be cached in a processor cache and can be placed in translational look-aside buffers (TLBs) that are coupled to CPUs.
The present invention is a hardware-based system and method for maintaining consistency between virtual memory address translations that are stored in a TLB with virtual memory address translations that are stored in a memory. More specifically, the present invention updates, or invalidates, translations in TLBs when an operating system changes a translation in a PTE.
2. Example Embodiment
The present invention can be implemented in any computer system that employs a virtual memory scheme. Thus, the present invention can be implemented in a variety of computer systems and environments, including, but not limited to, uni-processor computer systems, shared memory, symmetric multi-processing (SMP) systems and distributed shared memory (DSM) multi-processor systems. For example, the present invention can be implemented in an Origin™ scalable, distributed shared-memory multi-processor platform, manufactured by Silicon Graphics, Inc., Mountain View, Calif. Brief descriptions of uni-processor systems, SMP systems and DSM systems are provided below. These examples are provided to assist in the description of the present invention, not to limit the present invention.
Referring to FIG. 1, a uni-processor system 110 includes a single processor and cache node 114 coupled to a main memory 112. Main memory 112 stores data for use by processor and cache node 114. An input/output (I/O) system 116 provides interfacing to peripheral devices, such as, for example, user interfaces and memory devices, which can include computer terminals and memory disks.
Referring to FIG. 4, processor and cache node 114 can be implemented as processor and cache node 410. Processor and cache node 410 includes a processor 412 coupled to a main memory, which can be main memory 112, via a cache 416. For clarity, only a single processor 412 and cache memory 416 are shown. One skilled in the art will recognize that multiple processors and multiple levels of cache can be employed.
Cache 416 is provided for caching data retrieved from a main memory such as main memory 112. Once data is cached in cache 416, processor 412 can retrieve data from cache 416. Processor 412 can generally retrieve data from cache 416 faster than it can access data in main memory 112 because of the proximity of cache 416 to processor 412 and because the memory parts that are used to make cache 416 are faster than the memory parts that are used to make main memory. Cache 416 can include one or more levels of cache, as dictated by needs of users.
In operation, processor 412 processes threads for one or more processes. When processor 412 needs to access data stored in main memory 112, an access request is sent. If processor 412 is permitted access to the requested data, main memory 112 returns the requested data to cache 416. Once the requested data is stored in cache 416, processor 412 can access the data as necessary. In the future, processor 412 can access data in cache 416 without accessing main memory 112.
Referring to FIG. 2, a centralized, shared memory, symmetric multi-processing (SMP) system 210 includes a plurality of processor and cache nodes 212-218. SMP 210 can include any number of nodes 212-218. Processor and cache nodes 212-218 are coupled to a centralized, shared, main memory 220 via a bus 222. An input/output I/O system 224 can be provided for interfacing SMP 210 with various external and peripheral devices, such as computer terminals and memory disks.
Processor and cache nodes 212-218 can be implemented, for example, as processor and cache node 410, in FIG. 4, described above. Alternatively, one or more processor and cache nodes 212-218 can employ a plurality of processors 412 and caches 416. In either implementation, SMP 210 permits multiple processors 412 to process a plurality of tasks in parallel. Centralized, shared memory 220 permits multiple processors 412 to share data between tasks.
Referring to FIG. 3, a distributed shared memory (DSM) system 310 includes a number of processing nodes 350-364, interconnected via an interconnection network 344. DSM 310 can include any number of processing nodes 350-364. Each processing node 350-364 is illustrated with a processor and cache node 312-326 and a portion of distributed shared memory 328-342. As would be apparent to a person skilled in the art, one or more of processing nodes 350-364 need not employ a processor and cache node.
Processor and cache nodes 312-326 can be implemented, for example, as processor and cache node 410 in FIG. 4, where each processor 412 accesses a portion of shared memory 328-342 through one or more levels of cache 416. Alternatively, one or more processor and cache nodes 312-326 can have a plurality of processors 412 and caches 416.
Distributed shared memory portions 328-342 are accessed by the processors within processing nodes 350-364 as if they formed a single continuous block of physical memory. As would be apparent to a person skilled in the art, one or more of processing nodes 350-364 need not employ a portion of shared memory.
In the example of FIG. 3, each processing node 350-364 is shown with an optional input/output (I/O) device. As would be apparent to a person skilled in the art, one or more of processing nodes 350-364 need not have an I/O device. Moreover, different types of I/O devices and combinations of external peripherals and resources can be used in a DSM system. Thus, one or more of processing nodes 350-364 can include any combination of processors or no processors, shared memory or no shared memory and I/O or no I/O.
In the example of FIG. 3, each processing node 350-364 is shown with an optional cache consistency directory. However, the present invention is not limited to directory-based cache consistency protocols. As would be apparent to one skilled in the art, a snooping protocol or any other hardware-based protocol or software-based cache consistency scheme can be employed.
By distributing physical or main memory 328-342 throughout DSM 310, each processing node 350-364 can include a portion of main memory. This physical proximity between processor and memory reduces memory latency with respect to the processor and memory within a processing node. DSM 310 is preferably configured so that data which is accessed most frequently by a particular processing node is placed in the portion of main memory within the processing node. If that data is subsequently accessed more frequently by a processor in another processing node, the data is migrated, or moved, to a portion of main memory within the other processing node.
Uni-processor systems, SMPs and DSMs, such as systems 110, 210 and 310 described with reference to FIGS. 1-3, are well known. Further details of such systems can be found in, for example, Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2d Ed. (Morgan Kaufmann Publ.: USA 1996), incorporated herein by reference.
3. Virtual Memory Mapping and TLB Consistency
Referring to FIG. 5, a virtual memory address-to-physical main memory address translation scheme 510 is illustrated. Virtual memory pages 512 include virtual pages F1, F2, F3 and F4. Virtual pages F1, F2, F3 and F4 can be referenced by a relative offset that is typically based upon page size. For example, if an operating system employs 4K pages, page F1 has an offset of zero since it is the first virtual page. Page F2 has a virtual offset of 4K. Page F3 has a virtual offset of 8K and page F4 has a virtual offset of 12K. Virtual address spaces are usually much larger than the amount of physical memory in a system.
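The offset arithmetic above can be sketched in C as follows; the 4K page size follows the example, while the function name is illustrative rather than taken from the patent.

```c
#include <stdint.h>

#define PAGE_SIZE 4096  /* 4K pages, as in the example above */

/* A virtual page's offset is its 0-based index times the page size:
   F1 -> 0, F2 -> 4K, F3 -> 8K, F4 -> 12K. */
static inline uintptr_t page_offset(unsigned page_index) {
    return (uintptr_t)page_index * PAGE_SIZE;
}
```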
One or more of virtual pages F1, F2, F3 and F4 can be stored in physical page frames 516. Here, F1 is stored at P4, F2 is stored at P6 and F3 is stored at P1. F4 is not currently stored in page frames 516. A user that needs to access one or more of virtual pages F1, F2, F3 or F4 needs to know if and where, in physical memory, the virtual page is stored.
A variety of schemes are available for providing virtual memory address-to-physical memory address translations. The most common scheme is a forward page table scheme.
Referring to FIG. 6, a forward page table 610 provides virtual memory address-to-physical memory address translations. Each user or application that is allotted a portion of physical memory can be provided with a separate page table 610. In some cases, page tables can be shared. Each page table 610 includes a set of page table entries (PTEs) 616-622 for storing physical addresses and control information such as a valid bit, permission bits, etc. A user or application that is running on a processor, such as processor 412 for example, can reference virtual pages F1, F2, F3 and F4 using their virtual memory addresses or offsets. A processor translates the virtual memory address to a physical memory address and accesses the physical memory location. PTEs 616-622 can also be referenced by a physical address at which each PTE is stored.
Virtual memory pages 614 can be referred to as implied virtual memory pages because page table 610 does not have to include specific virtual memory pages. Instead, the first PTE 616 is impliedly, or automatically, associated with the first virtual memory page. Similarly, the second PTE 618 is impliedly associated with the second virtual memory page.
Using the example from FIG. 5, the first virtual memory page F1 is stored in page P4. Thus, first PTE 616 references P4. Similarly, second virtual memory page F2 is stored in page P6 so the second PTE 618 references P6. Third virtual memory page F3 is stored in page P1 so third PTE 620 references P1. Since fourth virtual memory page F4 is not stored in physical memory, fourth PTE 622 does not reference any page of physical memory. Thus, a reference to F4 will result in a page table miss or page fault and F4 will have to be retrieved from disk 514.
Another way of describing page table 610 is as an array that is indexed by the virtual page number of the desired mapping. For example, if a virtual address is 0x10000 and the page size of a system is 4K bytes (i.e., 0x1000 bytes), then the virtual page number is the virtual address divided by the page size, or 0x10. The PTE for page 0x10 can be found by simply looking at index 0x10 in the page table array. The starting address of the array itself is maintained by the operating system in such a way that it is easy to find. Alternatively, the work can be done in hardware, where the base address of the page table array is loaded into a special hardware register inside a translational look-aside buffer (TLB). TLBs are discussed more fully below.
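The division and array indexing described above can be sketched in C; the structure layout, field names, and table size are illustrative assumptions, not part of the patent.

```c
#include <stdint.h>

#define PAGE_SIZE 0x1000  /* 4K bytes (0x1000), as in the example above */

/* A forward page table entry; the field names are illustrative. */
typedef struct {
    uintptr_t frame;   /* physical page frame number */
    unsigned  valid;   /* valid bit */
} pte_t;

/* Virtual page number = virtual address / page size, so virtual
   address 0x10000 maps to virtual page number 0x10. */
static inline uintptr_t vpn_of(uintptr_t vaddr) {
    return vaddr / PAGE_SIZE;
}

/* The PTE for a virtual address is found by simple array indexing;
   page_table is the array whose base address the operating system
   (or a TLB base register) maintains. */
static inline pte_t *pte_for(pte_t *page_table, uintptr_t vaddr) {
    return &page_table[vpn_of(vaddr)];
}
```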
Virtual address spaces are usually much larger than the amount of physical memory in a system. Thus, forward page tables, such as page table 610, are usually implemented as sparse arrays.
The array implementation of a page table illustrated in FIG. 6 is just one possible data structure for translating virtual memory addresses to physical memory addresses. As would be apparent to one skilled in the art, other data structures can be employed. For example, a reverse page table can be employed where the number of page table entries equals the number of virtual pages that are stored in physical memory. When a translation is needed, the page table is searched for an entry that is tagged with the virtual address that is needed. Reverse page tables are typically implemented with a hash table.
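A hashed reverse page table of the kind just described might be sketched as follows; the table size, hash function, and all identifiers are illustrative assumptions rather than details from the patent.

```c
#include <stdint.h>
#include <stddef.h>

/* A minimal hashed reverse page table: one entry per resident
   physical page, tagged with the virtual page it holds. */
#define NFRAMES 256

typedef struct rpte {
    uintptr_t vpn;       /* virtual page number stored in this frame */
    uintptr_t frame;     /* physical frame number */
    struct rpte *next;   /* hash-chain link */
} rpte_t;

static rpte_t *buckets[NFRAMES];

static size_t hash_vpn(uintptr_t vpn) {
    return vpn % NFRAMES;
}

/* Search the table for the entry tagged with the needed virtual
   page; NULL on a miss means the page is not resident, i.e. a
   page fault. */
rpte_t *reverse_lookup(uintptr_t vpn) {
    for (rpte_t *e = buckets[hash_vpn(vpn)]; e != NULL; e = e->next)
        if (e->vpn == vpn)
            return e;
    return NULL;
}
```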
Additional details of virtual memory management can be found in, for example: Schimmel, "UNIX Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers," pp. 7-8; and Goodheart and Cox, "The Magic Garden Explained," pp. 69-140, Prentice Hall, 1994, incorporated herein by reference.
In systems 110 and 210, page tables 610 are typically stored in main memories 112 and 220, respectively. In DSM 310, each page table 610 is preferably stored in a main memory portion 328-342 that is associated with a processor and cache node 312-326 on which an associated user is running. Page tables can also be cached in cache 416.
Page table entries can be placed in one or more translational look-aside buffers. For example, referring to FIG. 7, processor and cache node 410 is provided in greater detail wherein processor 412 includes a central processing unit (CPU) 714 and a translational look-aside buffer (TLB) 718. TLB 718 is a relatively small, and typically on-chip, memory device that is used for storing a small number of most recently used virtual memory address-to-physical memory address translations. Because TLB 718 is small and typically on-chip, CPU 714 can quickly search and retrieve address translations from TLB 718.
When CPU 714 retrieves a translation from a main memory, such as main memories 112, 220 or 328-342, the translation is stored in cache 416. A copy of the translation is also placed in TLB 718. Later, when CPU 714 requires a translation, CPU 714 or an operating system searches TLB 718. If the translation is not found in TLB 718 (i.e., a TLB "miss"), the desired translation can be loaded from the page tables in memory by hardware, software, firmware, or any combination thereof.
For example, many Reduced Instruction Set Computer (RISC) processors, such as MIPS processors from Silicon Graphics, Inc., handle TLB misses through software. More traditional Complex Instruction Set Computer (CISC) processors, such as Motorola Corporation's 68000 line, handle TLB misses with hardware.
While TLB 718 can provide translations to CPU 714 faster than cache 416 can, TLB 718 introduces consistency problems. For example, if a translation from page table 610 is cached in cache 416 and in TLB 718, and if the page table entry is changed, the translation in cache 416 and TLB 718 must both be updated. Updating can include replacing an invalid translation with a valid translation or simply identifying the invalid translation as invalid.
A variety of systems and methods can be employed for maintaining consistency between translations stored in page table 610 in main memories 112, 220, 328-342 and translations that are cached in nodes such as nodes 114, 212-218 and 312-326.
Consistency of TLB 718, however, cannot be maintained with existing cache consistency systems and methods. This is because, in existing systems, TLB 718 can only be accessed by CPU 714 or by the operating system. For example, stale TLB entries are typically removed by broadcasting invalidation requests to all central processing units (CPUs) in a system via inter-CPU interrupts and then waiting for acknowledgments to come back from each CPU. Interrupting the operating system or the CPU each time that a TLB entry needs to be invalidated, however, reduces the time that can be spent performing other tasks.
Alternatively, TLB invalidation requests are sent only to CPUs where the process has executed. In either situation, substantial communication and synchronization overhead is incurred. Moreover, operating systems can invalidate every entry in a TLB rather than just the stale entries. This results in additional TLB miss overhead for the entries that were valid. Thus, whenever a translation in TLB 718 is to be invalidated, CPU 714 or the operating system must be interrupted in order to execute the invalidation.
4. System and Method for Maintaining TLB Consistency
Referring to FIG. 8, a hardware-based system and method for updating a TLB when an operating system changes a PTE, without interrupting an associated CPU or the operating system, is illustrated as implemented in processor and cache node 410. Processor and cache node 410 is coupled to a main memory 810 which can be main memory 112, 220, or 328-342, depending upon whether node 410 is implemented in a uni-processor environment, an SMP environment, or a DSM environment. Cache 416 can include a cache controller 812 for controlling the contents of cache 416.
Cache consistency can be maintained between data cached in cache 416 and data stored in main memory 810 by a variety of consistency techniques. For example, cache consistency can be maintained with an optional cache consistency directory 814 in main memory 810. Alternatively, cache consistency can be maintained by a snooping protocol implemented within cache controller 812 which snoops bus 816 for broadcast messages.
Any number of page tables, such as, for example, page tables 610, 818, 820 and 822 can be provided for storing virtual memory address-to-physical memory address translations. Alternatively, reverse page tables or any other suitable mechanism can be employed for storing virtual memory address-to-physical memory address translations.
When a virtual memory address-to-physical memory address translation is retrieved by a processor and cache node 410 from any of the page tables 610, 818-822, the translation is cached in cache 416, like any other memory access. The translation is also sent to CPU 714 so that it can access main memory 810, using the physical memory address. When CPU 714 receives a translation from any of page tables 610, 818-822, CPU 714 preferably stores the translation in TLB 718 for future use. Note that when an operation is described as being performed by a CPU, such as CPU 714, the operation can be performed solely by the CPU, solely by an operating system (not shown) that controls the CPU or by a combination of the CPU and the operating system.
TLB 718 includes a TLB table 824 for storing a list 828 of virtual memory-to-physical memory address translations, or PTEs. PTE list 828 is indexed by virtual memory (VM) address tags 830. VM address tags 830 can be searched by search engine 840. TLB table 824 also includes PTE address tags 832 for indexing PTEs in PTE list 828. The purpose of PTE address tags 832 is discussed below.
When CPU 714 needs to access a virtual memory address, CPU 714 first searches VM address tags 830, using search engine 840. If the requested virtual memory is found in VM address tags 830, the PTE that is associated with the VM address tag, or the physical address therein, is retrieved from PTE list 828. The retrieved PTE, or physical address therein, is used to access a physical page of main memory 810.
If the requested virtual memory address is not found in VM address tags 830, CPU 714 computes the page table entry address and looks in cache 416 or main memory 810 for the computed address. If the page table entry address in main memory 810 does not store a valid physical address, the system will have to access an external memory device, such as disk drive 514, via an I/O interface such as I/O system 116 or 224 to retrieve the data. After the data is brought into main memory 810, an appropriate PTE is updated with the physical address of the data. The PTE and the PTE address are sent to processor and cache node 410 and stored in cache 416. The PTE and PTE address are also sent to TLB 718 where they are placed in PTE list 828 and PTE address tags 832, respectively, and associated with a virtual memory tag 830.
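One plausible C rendering of TLB table 824, with its VM address tags 830, PTE list 828, and PTE address tags 832, is sketched below; the reference numerals follow the patent, while the C layout, entry count, and function names are assumptions.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define TLB_ENTRIES 64  /* size is illustrative */

/* One row of TLB table 824: a VM address tag (830), the cached PTE
   (828), and the physical address at which that PTE resides in the
   page table (832). */
typedef struct {
    uintptr_t vm_tag;     /* virtual page number (VM address tag) */
    uintptr_t pte;        /* cached translation  (PTE list)       */
    uintptr_t pte_addr;   /* where the PTE lives (PTE address tag) */
    bool      valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Lookup by virtual page (search engine 840): on a hit the cached
   PTE is returned; on a miss the CPU would compute the PTE address
   and go to the cache or main memory instead. */
bool tlb_lookup(uintptr_t vpn, uintptr_t *pte_out) {
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vm_tag == vpn) {
            *pte_out = tlb[i].pte;
            return true;  /* TLB hit */
        }
    }
    return false;         /* TLB miss */
}
```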
TLB 718 also includes a TLB controller 826 for detecting changes to PTEs and for updating PTEs in PTE list 828 when PTE changes are detected. TLB controller 826 includes a snooping controller 838 for detecting activity on cache-memory interconnect 816 that affects PTEs. Snooping controller 838 can be analogous to snooping protocols employed by cache controllers. As would be apparent to one skilled in the art given this description, snooping controller 838 can employ a variety of snooping protocols.
Snooping controller 838 constantly snoops cache-memory interconnect 816 for cache consistency data. Snooping controller 838 detects PTE changes on cache-memory interconnect 816 regardless of whether a snooping protocol or directory protocol is employed for maintaining consistency between cache 416 and main memory 810. This is because cache consistency data is present on cache-memory interconnect 816 either as a broadcast, in a snooping protocol based system, or as an instruction sent directly from cache consistency directory 814, in a directory-based protocol.
TLB controller 826 includes a search engine 834 for searching TLB table PTE address tags 832 and an updating module 836 for updating or invalidating PTEs in PTE list 828. When snooping controller 838 detects activity on cache-memory interconnect 816 that affects a PTE, search engine 834 searches PTE address tags 832 for the PTE address that was detected by snooping controller 838. If search engine 834 finds the PTE address tag in column 832, updating module 836 updates or invalidates the associated PTE.
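The snoop-search-update path through snooping controller 838, search engine 834, and updating module 836 might be sketched as follows; the data layout and identifiers are illustrative assumptions, not the patent's implementation.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define TLB_ENTRIES 64  /* size is illustrative */

/* Minimal TLB rows: each cached PTE is tagged with the physical
   address at which that PTE is stored in the page table (the
   patent's PTE address tags 832). */
typedef struct {
    uintptr_t pte_addr;
    bool      valid;
} tlb_row_t;

static tlb_row_t tlb[TLB_ENTRIES];

/* Sketch of the snoop path: when snooping controller 838 sees an
   invalidate for some physical address on cache-memory interconnect
   816, search engine 834 scans the PTE address tags; on a match,
   updating module 836 marks the row invalid. Neither the CPU nor the
   operating system is interrupted. */
void tlb_snoop_invalidate(uintptr_t snooped_addr) {
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].pte_addr == snooped_addr)
            tlb[i].valid = false;  /* set the invalid bit */
}
```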
Referring to the process flowcharts of FIG. 9, a preferred method for storing PTE data in TLBs is illustrated. In FIG. 10, a preferred method for updating or invalidating PTE data in TLBs, when an operating system changes a PTE, is provided. The process flowcharts of FIGS. 9 and 10 are described with reference to FIG. 8, where TLB table 824 and TLB controller 826 are implemented in processor and cache node 410. It is to be understood, however, that the present invention is not limited to implementation in processor and cache node 410. The present invention can be practiced in any processor system which employs a virtual memory scheme, regardless of whether a cache system is employed. Moreover, the present invention can be implemented for any number of processors and TLBs in a processing system.
The process of placing PTE data in TLBs begins at step 910, where data is mapped into main memory such as main memory 810, according to any virtual memory mapping scheme. For example, an operating system that controls processor and cache node 410 can map virtual memory, as discussed above in FIGS. 5 and 6, where virtual memory addresses 512 are mapped to physical addresses 516. Main memory 810 can represent main memory 112 in uniprocessor system 110, main memory 220 in SMP 210 or main memory 328-342 in DSM 310.
In step 912, the operating system generates virtual memory address-to-physical memory address translations for the mapped data. For example, a page table, such as page table 610 can be generated for a process that is provided with mapped memory. The page table can be for the exclusive use of one user or can be shared by multiple users. Page table 610 represents a variety of methods that can be employed for providing virtual memory address-to-physical memory address translations.
The generated page table is preferably stored in main memory 810. Thus in uni-processor system 110, the page table is preferably stored in main memory 112. In SMP 210, the page table is stored in main memory 220. In DSM 310, the page table can be stored in any portion of main memory 328-342. Preferably, the page table is stored in a portion of main memory 328-342 that is adjacent to a processor on which an associated application or user is running. Generation and storage of page tables is typically handled by the operating system.
In step 914, virtual memory address-to-physical memory address translations, such as those in page tables 610, 818, 820 and 822 are available for use by processors, such as processor 412 and/or the operating system. Thus, in step 914, when processor 412 needs to access mapped data, it references a virtual memory page.
In step 916, processor 412 uses search engine 840 to search for the referenced virtual memory page in VM page tags 830. If the virtual memory page tag is found in column 830, processing proceeds to step 918 where a translation, or physical address, is retrieved from an associated PTE in PTE list 828. The physical address can be used to access the page of memory. Processing then stops at step 920.
In step 916, if the referenced virtual page is not found in VM page tags 830 (i.e., a TLB "miss"), processing proceeds to step 922 where a PTE address is calculated for the referenced VM page.
In step 924, the calculated PTE address is used to retrieve the PTE from cache 416, if it is stored there, or from main memory 810. If the referenced VM page is not in memory 810, it is brought into memory 810 and a PTE at the calculated PTE address is updated accordingly.
In step 926, the PTE and PTE address are sent from main memory 810 to cache 416 on cache-memory interconnect 816, under control of the operating system and/or a memory controller. The data is received by, and cached in, cache 416.
In step 928, the PTE, or the virtual memory address-to-physical memory address translation that is stored therein, is sent to processor 412. When the PTE or translation is retrieved by processor 412, CPU 714 within processor 412 can use the translation to access data in physical memory.
In step 930, the PTE, or the translation, is placed in TLB table 824. More specifically, the PTE, or portions thereof, are placed in PTE list 828, a virtual memory page tag that is associated with the PTE is placed in VM page tags 830 and the PTE address is placed in PTE address tags 832. Step 930 can be performed by CPU 714, by the operating system, by TLB controller 826, or any combination thereof. Processing stops at step 932. Steps 910-932 can be repeated, in whole or in part, as necessary for any number of processes, processors, TLBs and computer systems.
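Steps 916-930 can be outlined in C as follows; the replacement policy, the stand-in memory fetch, and all identifiers are illustrative assumptions rather than details from the patent.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define PAGE_SIZE   0x1000
#define TLB_ENTRIES 64

/* One TLB row holding the three tags that step 930 fills in. */
typedef struct {
    uintptr_t vm_tag, pte, pte_addr;
    bool valid;
} tlb_row_t;

static tlb_row_t tlb[TLB_ENTRIES];
static size_t next_slot;  /* trivial replacement policy, illustrative */

/* Stand-in for the cache/main-memory path of steps 924-928: fetch the
   PTE stored at the computed PTE address. Returning the address itself
   is a placeholder for a real memory read. */
static uintptr_t fetch_pte(uintptr_t pte_addr) { return pte_addr; }

/* Steps 916-930 in outline: on a miss, compute the PTE address for
   the referenced virtual page, fetch the PTE, and place the PTE, VM
   tag, and PTE address tag together into the TLB table. */
uintptr_t tlb_translate(uintptr_t vaddr, uintptr_t page_table_base) {
    uintptr_t vpn = vaddr / PAGE_SIZE;
    for (size_t i = 0; i < TLB_ENTRIES; i++)           /* step 916 */
        if (tlb[i].valid && tlb[i].vm_tag == vpn)
            return tlb[i].pte;                         /* step 918: hit */

    /* step 922: calculate the PTE address for the referenced page */
    uintptr_t pte_addr = page_table_base + vpn * sizeof(uintptr_t);
    uintptr_t pte = fetch_pte(pte_addr);               /* steps 924-928 */

    tlb_row_t *row = &tlb[next_slot++ % TLB_ENTRIES];  /* step 930 */
    *row = (tlb_row_t){ .vm_tag = vpn, .pte = pte,
                        .pte_addr = pte_addr, .valid = true };
    return pte;
}
```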
Referring to FIG. 10, a preferred method for updating TLB entries when a PTE entry is changed by the operating system is illustrated. The process begins at step 1010, where a page table entry is changed. Page table entries can change for a variety of reasons. For example, referring to FIG. 5, a page table entry can change if a memory management scheme associated with main memory 810 re-maps, migrates or pages-out a page of data. The associated page table entry is changed to reflect the new physical address of the mapped data. This can occur, for example, where a mapped page of physical memory is needed for some other use.
When a page table entry is changed, or deleted, any copies that are cached in cache 416 or in TLB 718 must be invalidated and/or updated. Thus, in step 1012, a cached copy of the PTE is updated or invalidated. For example, a command can be sent on cache-memory interconnect 816 to invalidate a copy of the PTE in cache 416. The command can be sent as part of a hardware-based cache consistency protocol, such as a directory-based protocol or a snooping protocol, or as part of a software-based cache consistency protocol.
For example, where a directory based cache consistency scheme is employed, a consistency directory, such as directory 814, records where data from memory 810 is cached. If action is taken on data within main memory 810, directory 814 is checked to determine whether the data is cached. If the data is cached, an invalidate command is sent to each cache that caches the data. Thus, if the data is cached in cache 416, an invalidate signal is sent to cache 416 via cache-memory interconnect 816. The invalidate signal includes data for identifying the physical address in memory 810 that stores the data that has changed. If the changed data is a PTE, the invalidate signal will identify the page table entry address of the PTE.
In a snooping-based cache consistency scheme, an invalidate signal is broadcast to all caches in the system. Thus, in a snooping protocol, all invalidate signals are detectable on cache-memory interconnect 816.
Cache 416 can employ a cache controller 812 for receiving the invalidate signal and address identification data. Cache controller 812 locates the data in cache 416 and sets an invalid bit for that data or otherwise marks it as invalid. Alternatively, instead of simply invalidating the cached translation in cache 416, main memory 810 could send an updated or current PTE, if one exists, to cache 416.
In step 1014, TLB snooping controller 838 constantly snoops cache-memory interconnect 816 for activity that might affect translations in TLB table 824. TLB snooping controller 838 can detect invalidate signals and addresses on cache-memory interconnect 816, regardless of whether the signals are sent from a directory based consistency scheme or a snooping based scheme.
In step 1016, when TLB controller 826 detects activity that might affect translations in TLB table 824, search engine 834 searches PTE address tag 832 to determine whether an address that is detected on cache-memory interconnect 816 matches a PTE address tag in column 832.
In step 1018, TLB controller 826 determines whether an address that is detected on cache-memory interconnect 816 is in PTE address tag 832. If not, no further action is taken by TLB controller 826 and processing stops in step 1020.
In step 1018, if the detected address is found in PTE address tag 832, processing proceeds to step 1022, where TLB controller updating module 836 updates a corresponding PTE in list 828. Updating module 836 can, for example, set an invalid bit for the PTE in list 828 that is associated with the PTE address. Thereafter, when CPU 714 uses search engine 840 to search VM page tag 830 for a virtual memory page that has an invalid bit set, CPU 714 will look to cache 416 or main memory 810 for a valid PTE at the PTE address.
Alternatively, where the cache consistency scheme employed by cache 416 replaces invalid PTEs with valid PTEs, TLB updating module 836 preferably updates TLB list 828 with a valid PTE for the detected PTE address.
After the PTE is invalidated or updated, processing stops at step 1026. TLB controller 826, in conjunction with PTE address tag 832, thus provides a system and method for updating TLB 718 when a PTE is changed by the operating system, without interrupting CPU 714 or the operating system. This frees CPU 714 to process other tasks.
5. Conclusions
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims (21)

What is claimed is:
1. A system for updating virtual memory address-to-physical memory address translations for a computer system, comprising:
a central processing unit;
a cache memory coupled to said central processing unit;
a translational look-aside buffer (TLB) table coupled to the central processing unit, said TLB table having:
a page table entry field for storing a list of page table entries, each page table entry identifying a page of physical memory that stores a virtual memory page,
a virtual memory address tag field for storing a list of virtual memory address tags, each virtual memory address tag associating a virtual memory address with each page table entry in said page table entry field, and
a page table entry address tag field for storing a list of page table entry address tags, each page table entry address tag associating a page table entry address with each page table entry in said page table entry field; and
a TLB controller that updates said TLB table without broadcast instructions by an operating system, said TLB controller having:
a snooping controller that snoops a cache-memory interconnect for page table entry addresses and for an indication that a page table entry in said cache memory has changed,
a search engine that searches said page table entry address tag in said TLB table for a snooped page table entry address when said snooping controller detects that a page table entry has changed, and
an updating module that updates a page table entry in said TLB table that is associated with the snooped page table entry address.
2. The system according to claim 1, wherein said updating module comprises a module that invalidates a translation in said TLB table by changing the state of a valid bit.
3. The system according to claim 1, wherein said updating module comprises a module that replaces an invalid translation in said TLB table with a valid translation.
4. A translational look-aside buffer (TLB), comprising:
a TLB table having:
a page table entry field for storing a list of page table entries, each page table entry identifying a page of physical memory that stores a virtual memory page,
a virtual memory address tag field for storing a list of virtual memory address tags, each virtual memory address tag associating a virtual memory address with each page table entry in said page table entry field, and
a page table entry address tag field for storing a list of page table entry address tags, each page table entry address tag associating a page table entry address with each page table entry in said page table entry field;
a snooping controller that snoops a cache-memory interconnect for page table entry addresses and for an indication that a page table entry in said cache memory has changed;
a search engine, coupled to said snooping controller, that searches said page table entry address tag in said TLB table for a snooped page table entry address when said snooping controller detects that a page table entry has changed; and
an updating module, coupled to said search engine and to said TLB table, that updates a page table entry in said TLB table that is associated with the snooped page table entry address.
5. The system according to claim 4, wherein said updating module comprises a module that invalidates a translation in said TLB table by changing the state of a valid bit.
6. The system according to claim 4, wherein said updating module comprises a module that invalidates less than all translations in said TLB table.
7. The system according to claim 4, wherein said updating module comprises a module that replaces an invalid translation in said TLB table with a valid translation.
8. A method for maintaining consistency of a translational look-aside buffer (TLB), comprising the steps of:
(1) storing information in a TLB coupled to a central processing unit, said TLB having:
a page table entry field for storing a list of page table entries, each page table entry identifying a page of physical memory that stores a virtual memory page,
a virtual memory address tag field for storing a list of virtual memory address tags, each virtual memory address tag associating a virtual memory address with each page table entry in said page table entry field, and
a page table entry address tag field for storing a list of page table entry address tags, each page table entry address tag associating a page table entry address with each page table entry in said page table entry field;
(2) snooping a cache-memory interconnect for a page table entry address and for an indication that a page table entry associated with the page table entry address has changed;
(3) searching the TLB for the page table entry address when it has been determined in step (2) that a page table entry has changed; and
(4) updating a page table entry associated with the changed page table address when the TLB includes the page table entry address.
9. The method according to claim 8, wherein step (2) comprises snooping a cache-memory interconnect of a directory-based cache consistency system.
10. The method according to claim 8, wherein step (2) comprises snooping a cache-memory interconnect of a snooping-based cache consistency system.
11. The method according to claim 8, wherein step (4) comprises updating the page table entry without interrupting a central processing unit associated with the TLB.
12. The method according to claim 8, wherein step (4) comprises updating the page table entry without intervention by an operating system.
13. The method according to claim 8, wherein step (4) comprises updating the page table entry without software intervention.
14. The method according to claim 8, wherein step (4) comprises updating the page table entry when data migrates to another memory location.
15. The method according to claim 8, wherein step (4) comprises invalidating a translation in the TLB.
16. The method according to claim 8, wherein step (4) comprises invalidating less than all translations in the TLB.
17. The method according to claim 8, wherein step (4) comprises replacing an invalid translation with a valid translation.
18. A computer system comprising:
at least one processor having at least one translation look-aside buffer (TLB), said TLB including:
a TLB table having:
a page table entry field for storing a list of page table entries, each page table entry identifying a page of physical memory that stores a virtual memory page,
a virtual memory address tag field for storing a list of virtual memory address tags, each virtual memory address tag associating a virtual memory address with each page table entry in said page table entry field, and
a page table entry address tag field for storing a list of page table entry address tags, each page table entry address tag associating a page table entry address with each page table entry in said page table entry field,
a snooping controller that snoops a cache-memory interconnect for page table entry addresses and for an indication that a page table entry in said cache memory has changed,
a search engine, coupled to said snooping controller, that searches said page table entry address tag field in said TLB table for a snooped page table entry address when said snooping controller detects that a page table entry has changed, and
an updating module, coupled to said search engine and to said TLB table, that updates a page table entry in said TLB table that is associated with the snooped page table entry address.
19. The computer system according to claim 18, wherein said updating module comprises a module that replaces an invalid translation with a valid translation.
20. The computer system according to claim 18, wherein said updating module comprises a module that invalidates a translation in said TLB table by changing the state of a valid bit.
21. The computer system according to claim 18, wherein said updating module comprises a module that invalidates less than all of the translations in said TLB table.
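The mechanism recited in claims 8 and 18 can be sketched as a small software model. This is an illustrative simulation only, not the patented hardware: the class and field names (`SnoopingTLB`, `fill`, `snoop_write`, and the address values) are hypothetical. The key idea it demonstrates is that each TLB entry carries a page table entry address tag alongside the usual virtual-address tag, so a snooped write to a PTE's memory address can locate and update the matching translation without interrupting the CPU or invoking the operating system.

```python
# Illustrative model (assumption: names and structure are this sketch's own)
# of a TLB whose entries are tagged with the physical address of the page
# table entry (PTE) they cache, enabling snoop-driven consistency updates.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TLBEntry:
    vpn: int          # virtual memory address tag (virtual page number)
    pte_addr: int     # page table entry address tag
    pte: int          # cached page table entry (e.g. physical frame number)
    valid: bool = True


class SnoopingTLB:
    def __init__(self) -> None:
        self.entries: Dict[int, TLBEntry] = {}   # keyed by virtual page number

    # Step (1): store the translation together with its PTE address tag.
    def fill(self, vpn: int, pte_addr: int, pte: int) -> None:
        self.entries[vpn] = TLBEntry(vpn, pte_addr, pte)

    def lookup(self, vpn: int) -> Optional[int]:
        e = self.entries.get(vpn)
        return e.pte if e and e.valid else None

    # Steps (2)-(4): the snooping controller observes a write to pte_addr on
    # the cache-memory interconnect, searches the PTE address tags, and
    # updates any matching entry -- flipping the valid bit (claim 5) or
    # replacing the stale translation with the new one (claim 7).
    def snoop_write(self, pte_addr: int, new_pte: Optional[int]) -> None:
        for e in self.entries.values():
            if e.pte_addr == pte_addr:
                if new_pte is None:
                    e.valid = False          # invalidate this translation only
                else:
                    e.pte = new_pte          # refresh in place
                    e.valid = True


tlb = SnoopingTLB()
tlb.fill(vpn=0x42, pte_addr=0x1000, pte=0x7)
tlb.snoop_write(0x1000, new_pte=0x9)   # PTE rewritten, e.g. page migrated
print(tlb.lookup(0x42))                # prints 9: updated with no CPU interrupt
```

Note that only entries whose PTE address tag matches are touched, mirroring claims 6 and 16: the snoop invalidates or updates less than all translations, rather than flushing the whole TLB.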
US08/915,912 1997-08-21 1997-08-21 System and method for maintaining translation look-aside buffer (TLB) consistency Expired - Lifetime US6105113A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US08/915,912 US6105113A (en) 1997-08-21 1997-08-21 System and method for maintaining translation look-aside buffer (TLB) consistency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US08/915,912 US6105113A (en) 1997-08-21 1997-08-21 System and method for maintaining translation look-aside buffer (TLB) consistency

Publications (1)

Publication Number Publication Date
US6105113A true US6105113A (en) 2000-08-15

Family

ID=25436420

Family Applications (1)

Application Number Title Priority Date Filing Date
US08/915,912 Expired - Lifetime US6105113A (en) 1997-08-21 1997-08-21 System and method for maintaining translation look-aside buffer (TLB) consistency

Country Status (1)

Country Link
US (1) US6105113A (en)

Cited By (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6263403B1 (en) * 1999-10-31 2001-07-17 Hewlett-Packard Company Method and apparatus for linking translation lookaside buffer purge operations to cache coherency transactions
US20020039365A1 (en) * 1999-03-17 2002-04-04 Broadcom Corporation Pipelined searches with a cache table
US6490671B1 (en) * 1999-05-28 2002-12-03 Oracle Corporation System for efficiently maintaining translation lockaside buffer consistency in a multi-threaded, multi-processor virtual memory system
US6496907B1 (en) * 1999-10-22 2002-12-17 Apple Computer, Inc. System and method for updating from a read-only to a read-write entry and concurrently invalidating stale cache copies from head-to-tail and tail-to-head directions
US20030055889A1 (en) * 2001-08-27 2003-03-20 Meng-Cheng Chen Cache method
US6594735B1 (en) * 1998-12-28 2003-07-15 Nortel Networks Limited High availability computing system
US20030140197A1 (en) * 2002-01-18 2003-07-24 Vanderwiel Steven Paul Multi-processor computer system using partition group directories to maintain cache coherence
US20030163543A1 (en) * 2002-02-28 2003-08-28 Silicon Graphics, Inc. Method and system for cache coherence in DSM multiprocessor system without growth of the sharing vector
US6633967B1 (en) * 2000-08-31 2003-10-14 Hewlett-Packard Development Company, L.P. Coherent translation look-aside buffer
US6651143B2 (en) * 2000-12-21 2003-11-18 International Business Machines Corporation Cache management using a buffer for invalidation requests
US20040015969A1 (en) * 2002-06-24 2004-01-22 Chang Stephen S. Controlling snoop activities using task table in multiprocessor system
US20040064654A1 (en) * 2001-03-30 2004-04-01 Willis Thomas E. Method and apparatus including heuristic for sharing TLB entries
US20040107321A1 (en) * 2000-12-14 2004-06-03 Altman Erik R. Symmetric multi-processing system
US20040215898A1 (en) * 2003-04-28 2004-10-28 International Business Machines Corporation Multiprocessor system supporting multiple outstanding TLBI operations per partition
US6851038B1 (en) * 2000-05-26 2005-02-01 Koninklijke Philips Electronics N.V. Background fetching of translation lookaside buffer (TLB) entries
US20050044128A1 (en) * 2003-08-18 2005-02-24 Scott Steven L. Decoupled store address and data in a multiprocessor system
US20050044340A1 (en) * 2003-08-18 2005-02-24 Kitrick Sheets Remote translation mechanism for a multinode system
US20050044339A1 (en) * 2003-08-18 2005-02-24 Kitrick Sheets Sharing memory within an application using scalable hardware resources
US6868481B1 (en) * 2000-10-31 2005-03-15 Hewlett-Packard Development Company, L.P. Cache coherence protocol for a multiple bus multiprocessor system
US6931510B1 (en) * 2000-07-31 2005-08-16 Sun Microsystems, Inc. Method and system for translation lookaside buffer coherence in multiprocessor systems
US20060004941A1 (en) * 2004-06-30 2006-01-05 Shah Hemal V Method, system, and program for accessesing a virtualized data structure table in cache
US20060004795A1 (en) * 2004-06-30 2006-01-05 Intel Corporation Method, system, and program for utilizing a virtualized data structure table
US20060101227A1 (en) * 2001-03-30 2006-05-11 Willis Thomas E Method and apparatus for sharing TLB entries
US20060112242A1 (en) * 2004-11-19 2006-05-25 Mcbride Gregory E Application transparent autonomic data replication improving access performance for a storage area network aware file system
US20060112243A1 (en) * 2004-11-19 2006-05-25 Mcbride Gregory E Application transparent autonomic availability on a storage area network aware file system
US20060112140A1 (en) * 2004-11-19 2006-05-25 Mcbride Gregory E Autonomic data caching and copying on a storage area network aware file system using copy services
US7069413B1 (en) 2003-01-29 2006-06-27 Vmware, Inc. Method and system for performing virtual to physical address translations in a virtual machine monitor
US20060168419A1 (en) * 2004-12-23 2006-07-27 Fujitsu Siemens Computers Gmbh Method for updating entries of address conversion buffers in a multi-processor computer system
US20060230237A1 (en) * 2005-04-07 2006-10-12 Fujitsu Limited Method and system for maintaining cache coherence of distributed shared memory system
US20060285397A1 (en) * 2005-06-06 2006-12-21 Sony Corporation Storage device
US20070174558A1 (en) * 2005-11-17 2007-07-26 International Business Machines Corporation Method, system and program product for communicating among processes in a symmetric multi-processing cluster environment
CN1333350C (en) * 2002-01-09 2007-08-22 国际商业机器公司 Method and apparatus for using global snooping to provide cache coherence to distributed computer nodes in a single coherent system
US20070214339A1 (en) * 2006-03-10 2007-09-13 Microsoft Corporation Selective address translation for a resource such as a hardware device
US20070283127A1 (en) * 2003-08-18 2007-12-06 Cray Inc. Method and apparatus for indirectly addressed vector load-add-store across multi-processors
US7334110B1 (en) 2003-08-18 2008-02-19 Cray Inc. Decoupled scalar/vector computer architecture system and method
US20080046657A1 (en) * 2006-08-18 2008-02-21 Eichenberger Alexandre E System and Method to Efficiently Prefetch and Batch Compiler-Assisted Software Cache Accesses
US7366873B1 (en) 2003-08-18 2008-04-29 Cray, Inc. Indirectly addressed vector load-operate-store method and apparatus
US7437521B1 (en) 2003-08-18 2008-10-14 Cray Inc. Multistream processing memory-and barrier-synchronization method and apparatus
CN100428198C (en) * 2005-03-31 2008-10-22 国际商业机器公司 System and method of improving task switching
US20080301398A1 (en) * 2007-06-01 2008-12-04 Intel Corporation Linear to physical address translation with support for page attributes
US7503048B1 (en) 2003-08-18 2009-03-10 Cray Incorporated Scheduling synchronization of programs running as streams on multiple processors
US7519771B1 (en) 2003-08-18 2009-04-14 Cray Inc. System and method for processing memory instructions using a forced order queue
US7543133B1 (en) * 2003-08-18 2009-06-02 Cray Inc. Latency tolerant distributed shared memory multiprocessor computer
US7617378B2 (en) 2003-04-28 2009-11-10 International Business Machines Corporation Multiprocessor system with retry-less TLBI protocol
US7735088B1 (en) 2003-08-18 2010-06-08 Cray Inc. Scheduling synchronization of programs running as streams on multiple processors
US7757497B1 (en) 2005-03-09 2010-07-20 Cray Inc. Method and apparatus for cooling electronic components
US20100217951A1 (en) * 2005-11-04 2010-08-26 Jesse Pan R and C Bit Update Handling
US20100235586A1 (en) * 2009-03-11 2010-09-16 Apple Inc. Multi-core processor snoop filtering
US20110047376A1 (en) * 2000-06-30 2011-02-24 Intel Corporation Method and apparatus for secure execution using a secure memory partition
CN101346706B (en) * 2005-12-29 2011-06-22 英特尔公司 Virtual translation look-aside buffer
US20120023296A1 (en) * 2010-05-11 2012-01-26 Shoumeng Yan Recording Dirty Information in Software Distributed Shared Memory Systems
US8307194B1 (en) 2003-08-18 2012-11-06 Cray Inc. Relaxed memory consistency model
US20130191577A1 (en) * 2012-01-04 2013-07-25 Ramesh Thomas Increasing virtual-memory efficiencies
US20140089572A1 (en) * 2012-09-24 2014-03-27 Oracle International Corporation Distributed page-table lookups in a shared-memory system
US20140129798A1 (en) * 2012-11-02 2014-05-08 International Business Machines Corporation Reducing microprocessor performance loss due to translation table coherency in a multi-processor system
US20140281296A1 (en) * 2013-03-14 2014-09-18 Nvidia Corporation Fault buffer for tracking page faults in unified virtual memory system
WO2014182584A1 (en) * 2013-05-06 2014-11-13 Microsoft Corporation Instruction set specific execution isolation
US9069715B2 (en) 2012-11-02 2015-06-30 International Business Machines Corporation Reducing microprocessor performance loss due to translation table coherency in a multi-processor system
US20160140040A1 (en) * 2014-11-14 2016-05-19 Cavium, Inc. Filtering translation lookaside buffer invalidations
US20160154584A1 (en) * 2008-06-20 2016-06-02 Netapp, Inc. System and method for achieving high performance data flow among user space processes in storage systems
US9684606B2 (en) 2014-11-14 2017-06-20 Cavium, Inc. Translation lookaside buffer invalidation suppression
WO2017190266A1 (en) * 2016-05-03 2017-11-09 华为技术有限公司 Method for managing translation lookaside buffer and multi-core processor
CN109032533A (en) * 2018-08-29 2018-12-18 新华三技术有限公司 A kind of date storage method, device and equipment
US20190361815A1 (en) * 2018-05-25 2019-11-28 Red Hat, Inc. Enhanced address space layout randomization
US10740239B2 (en) 2018-12-11 2020-08-11 International Business Machines Corporation Translation entry invalidation in a multithreaded data processing system
US10776281B2 (en) * 2018-10-04 2020-09-15 International Business Machines Corporation Snoop invalidate filter for distributed memory management unit to reduce snoop invalidate latency
US10817434B2 (en) 2018-12-19 2020-10-27 International Business Machines Corporation Interruptible translation entry invalidation in a multithreaded data processing system
US10977183B2 (en) 2018-12-11 2021-04-13 International Business Machines Corporation Processing a sequence of translation entry invalidation requests with regard to draining a processor core
CN113742333A (en) * 2020-05-29 2021-12-03 杭州海康威视数字技术股份有限公司 Dimension table data updating method and device and electronic equipment
US11741015B2 (en) * 2013-03-14 2023-08-29 Nvidia Corporation Fault buffer for tracking page faults in unified virtual memory system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5339399A (en) * 1991-04-12 1994-08-16 Intel Corporation Cache controller that alternately selects for presentation to a tag RAM a current address latch and a next address latch which hold addresses captured on an input bus
US5428761A (en) * 1992-03-12 1995-06-27 Digital Equipment Corporation System for achieving atomic non-sequential multi-word operations in shared memory
US5437017A (en) * 1992-10-09 1995-07-25 International Business Machines Corporation Method and system for maintaining translation lookaside buffer coherency in a multiprocessor data processing system
US5463750A (en) * 1993-11-02 1995-10-31 Intergraph Corporation Method and apparatus for translating virtual addresses in a data processing system having multiple instruction pipelines and separate TLB's
US5737756A (en) * 1995-04-28 1998-04-07 Unisys Corporation Dual bus computer network using dual busses with dual spy modules enabling clearing of invalidation queue for processor with store through cache while providing retry cycles for incomplete accesses to invalidation queue
US5752274A (en) * 1994-11-08 1998-05-12 Cyrix Corporation Address translation unit employing a victim TLB
US5761734A (en) * 1996-08-13 1998-06-02 International Business Machines Corporation Token-based serialisation of instructions in a multiprocessor system
US5765022A (en) * 1995-09-29 1998-06-09 International Business Machines Corporation System for transferring data from a source device to a target device in which the address of data movement engine is determined

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Goodheart, B. and James Cox, The Magic Garden Explained, Prentice Hall, 1994, pp. ix-xix and 69-457 and 634-760, Morgan and Kaufman Publishing, USA 1996. *
Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2nd Ed., pp. ix-xii, 372-457 and 634-760, Morgan and Kaufman Publishing, USA 1996. *
Schimmel, UNIX Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers, Addison-Wesley Professional Computing Series, 1994, pp. vii-xiv, 5-8 and 287-340. *
Tomasevic, M. and V. Milutinovic, The Cache Coherence Problem in Shared-Memory Multiprocessors: Hardware Solution, IEEE Computer Society Press, 1993. *

Cited By (130)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6594735B1 (en) * 1998-12-28 2003-07-15 Nortel Networks Limited High availability computing system
US20020039365A1 (en) * 1999-03-17 2002-04-04 Broadcom Corporation Pipelined searches with a cache table
US6490671B1 (en) * 1999-05-28 2002-12-03 Oracle Corporation System for efficiently maintaining translation lockaside buffer consistency in a multi-threaded, multi-processor virtual memory system
US6496907B1 (en) * 1999-10-22 2002-12-17 Apple Computer, Inc. System and method for updating from a read-only to a read-write entry and concurrently invalidating stale cache copies from head-to-tail and tail-to-head directions
US6263403B1 (en) * 1999-10-31 2001-07-17 Hewlett-Packard Company Method and apparatus for linking translation lookaside buffer purge operations to cache coherency transactions
US6851038B1 (en) * 2000-05-26 2005-02-01 Koninklijke Philips Electronics N.V. Background fetching of translation lookaside buffer (TLB) entries
US9305183B2 (en) 2000-06-30 2016-04-05 Intel Corporation Method and apparatus for secure execution using a secure memory partition
US9507962B2 (en) 2000-06-30 2016-11-29 Intel Corporation Method and apparatus for secure execution using a secure memory partition
US9547779B2 (en) 2000-06-30 2017-01-17 Intel Corporation Method and apparatus for secure execution using a secure memory partition
US8549275B2 (en) * 2000-06-30 2013-10-01 Intel Corporation Method and apparatus for secure execution using a secure memory partition
US9619672B2 (en) 2000-06-30 2017-04-11 Intel Corporation Method and apparatus for secure execution using a secure memory partition
US9323954B2 (en) 2000-06-30 2016-04-26 Intel Corporation Method and apparatus for secure execution using a secure memory partition
US9507963B2 (en) 2000-06-30 2016-11-29 Intel Corporation Method and apparatus for secure execution using a secure memory partition
US9971909B2 (en) 2000-06-30 2018-05-15 Intel Corporation Method and apparatus for secure execution using a secure memory partition
US10572689B2 (en) 2000-06-30 2020-02-25 Intel Corporation Method and apparatus for secure execution using a secure memory partition
US20110047376A1 (en) * 2000-06-30 2011-02-24 Intel Corporation Method and apparatus for secure execution using a secure memory partition
US6931510B1 (en) * 2000-07-31 2005-08-16 Sun Microsystems, Inc. Method and system for translation lookaside buffer coherence in multiprocessor systems
US6633967B1 (en) * 2000-08-31 2003-10-14 Hewlett-Packard Development Company, L.P. Coherent translation look-aside buffer
US7373457B2 (en) * 2000-10-31 2008-05-13 Hewlett-Packard Development Company, L.P. Cache coherence protocol for a multiple bus multiprocessor system
US6868481B1 (en) * 2000-10-31 2005-03-15 Hewlett-Packard Development Company, L.P. Cache coherence protocol for a multiple bus multiprocessor system
US20050177688A1 (en) * 2000-10-31 2005-08-11 Gaither Blain D. Cache coherence protocol for a multiple bus multiprocessor system
US20040107321A1 (en) * 2000-12-14 2004-06-03 Altman Erik R. Symmetric multi-processing system
US6970982B2 (en) * 2000-12-14 2005-11-29 International Business Machines Corporation Method and system for maintaining coherency in a multiprocessor system by broadcasting TLB invalidated entry instructions
US6651143B2 (en) * 2000-12-21 2003-11-18 International Business Machines Corporation Cache management using a buffer for invalidation requests
US7194597B2 (en) 2001-03-30 2007-03-20 Intel Corporation Method and apparatus for sharing TLB entries
US7073044B2 (en) * 2001-03-30 2006-07-04 Intel Corporation Method and apparatus for sharing TLB entries
US6728858B2 (en) 2001-03-30 2004-04-27 Intel Corporation Method and apparatus including heuristic for sharing TLB entries
US20060101227A1 (en) * 2001-03-30 2006-05-11 Willis Thomas E Method and apparatus for sharing TLB entries
US20040064654A1 (en) * 2001-03-30 2004-04-01 Willis Thomas E. Method and apparatus including heuristic for sharing TLB entries
US7165164B2 (en) 2001-03-30 2007-01-16 Intel Corporation Method and apparatus including heuristic for sharing TLB entries
US20030055889A1 (en) * 2001-08-27 2003-03-20 Meng-Cheng Chen Cache method
CN1333350C (en) * 2002-01-09 2007-08-22 国际商业机器公司 Method and apparatus for using global snooping to provide cache coherence to distributed computer nodes in a single coherent system
US20030140197A1 (en) * 2002-01-18 2003-07-24 Vanderwiel Steven Paul Multi-processor computer system using partition group directories to maintain cache coherence
US6820174B2 (en) * 2002-01-18 2004-11-16 International Business Machines Corporation Multi-processor computer system using partition group directories to maintain cache coherence
US20030163543A1 (en) * 2002-02-28 2003-08-28 Silicon Graphics, Inc. Method and system for cache coherence in DSM multiprocessor system without growth of the sharing vector
US6877030B2 (en) 2002-02-28 2005-04-05 Silicon Graphics, Inc. Method and system for cache coherence in DSM multiprocessor system without growth of the sharing vector
US7530066B2 (en) * 2002-06-24 2009-05-05 Chang Stephen S Controlling snoop activities using task table in multiprocessor system
US20040015969A1 (en) * 2002-06-24 2004-01-22 Chang Stephen S. Controlling snoop activities using task table in multiprocessor system
US7069413B1 (en) 2003-01-29 2006-06-27 Vmware, Inc. Method and system for performing virtual to physical address translations in a virtual machine monitor
US7617378B2 (en) 2003-04-28 2009-11-10 International Business Machines Corporation Multiprocessor system with retry-less TLBI protocol
US20040215898A1 (en) * 2003-04-28 2004-10-28 International Business Machines Corporation Multiprocessor system supporting multiple outstanding TLBI operations per partition
US7073043B2 (en) * 2003-04-28 2006-07-04 International Business Machines Corporation Multiprocessor system supporting multiple outstanding TLBI operations per partition
US20050044340A1 (en) * 2003-08-18 2005-02-24 Kitrick Sheets Remote translation mechanism for a multinode system
US7334110B1 (en) 2003-08-18 2008-02-19 Cray Inc. Decoupled scalar/vector computer architecture system and method
US20050044128A1 (en) * 2003-08-18 2005-02-24 Scott Steven L. Decoupled store address and data in a multiprocessor system
US7366873B1 (en) 2003-08-18 2008-04-29 Cray, Inc. Indirectly addressed vector load-operate-store method and apparatus
US20050044339A1 (en) * 2003-08-18 2005-02-24 Kitrick Sheets Sharing memory within an application using scalable hardware resources
US7543133B1 (en) * 2003-08-18 2009-06-02 Cray Inc. Latency tolerant distributed shared memory multiprocessor computer
US7421565B1 (en) 2003-08-18 2008-09-02 Cray Inc. Method and apparatus for indirectly addressed vector load-add-store across multi-processors
US7437521B1 (en) 2003-08-18 2008-10-14 Cray Inc. Multistream processing memory-and barrier-synchronization method and apparatus
US7735088B1 (en) 2003-08-18 2010-06-08 Cray Inc. Scheduling synchronization of programs running as streams on multiple processors
US7793073B2 (en) 2003-08-18 2010-09-07 Cray Inc. Method and apparatus for indirectly addressed vector load-add-store across multi-processors
US7577816B2 (en) * 2003-08-18 2009-08-18 Cray Inc. Remote translation mechanism for a multinode system
US7529906B2 (en) * 2003-08-18 2009-05-05 Cray Incorporated Sharing memory within an application using scalable hardware resources
US8307194B1 (en) 2003-08-18 2012-11-06 Cray Inc. Relaxed memory consistency model
US20070283127A1 (en) * 2003-08-18 2007-12-06 Cray Inc. Method and apparatus for indirectly addressed vector load-add-store across multi-processors
US7503048B1 (en) 2003-08-18 2009-03-10 Cray Incorporated Scheduling synchronization of programs running as streams on multiple processors
US7519771B1 (en) 2003-08-18 2009-04-14 Cray Inc. System and method for processing memory instructions using a forced order queue
US8504795B2 (en) 2004-06-30 2013-08-06 Intel Corporation Method, system, and program for utilizing a virtualized data structure table
US20060004795A1 (en) * 2004-06-30 2006-01-05 Intel Corporation Method, system, and program for utilizing a virtualized data structure table
US20060004941A1 (en) * 2004-06-30 2006-01-05 Shah Hemal V Method, system, and program for accessesing a virtualized data structure table in cache
US7779219B2 (en) 2004-11-19 2010-08-17 International Business Machines Corporation Application transparent autonomic availability on a storage area network aware file system
US7464124B2 (en) 2004-11-19 2008-12-09 International Business Machines Corporation Method for autonomic data caching and copying on a storage area network aware file system using copy services
US20060112243A1 (en) * 2004-11-19 2006-05-25 Mcbride Gregory E Application transparent autonomic availability on a storage area network aware file system
US8095754B2 (en) 2004-11-19 2012-01-10 International Business Machines Corporation Transparent autonomic data replication improving access performance for a storage area network aware file system
US20060112242A1 (en) * 2004-11-19 2006-05-25 Mcbride Gregory E Application transparent autonomic data replication improving access performance for a storage area network aware file system
US7991736B2 (en) 2004-11-19 2011-08-02 International Business Machines Corporation Article of manufacture and system for autonomic data caching and copying on a storage area network aware file system using copy services
US7383406B2 (en) * 2004-11-19 2008-06-03 International Business Machines Corporation Application transparent autonomic availability on a storage area network aware file system
US20090043980A1 (en) * 2004-11-19 2009-02-12 International Business Machines Corporation Article of manufacture and system for autonomic data caching and copying on a storage area network aware file system using copy services
US7457930B2 (en) * 2004-11-19 2008-11-25 International Business Machines Corporation Method for application transparent autonomic data replication improving access performance for a storage area network aware file system
US20060112140A1 (en) * 2004-11-19 2006-05-25 Mcbride Gregory E Autonomic data caching and copying on a storage area network aware file system using copy services
US20060168419A1 (en) * 2004-12-23 2006-07-27 Fujitsu Siemens Computers Gmbh Method for updating entries of address conversion buffers in a multi-processor computer system
US7757497B1 (en) 2005-03-09 2010-07-20 Cray Inc. Method and apparatus for cooling electronic components
CN100428198C (en) * 2005-03-31 2008-10-22 国际商业机器公司 System and method of improving task switching
US20060230237A1 (en) * 2005-04-07 2006-10-12 Fujitsu Limited Method and system for maintaining cache coherence of distributed shared memory system
US20060285397A1 (en) * 2005-06-06 2006-12-21 Sony Corporation Storage device
US8285916B2 (en) * 2005-06-06 2012-10-09 Sony Corporation Storage device
US8341379B2 (en) * 2005-11-04 2012-12-25 Apple Inc. R and C bit update handling
US20100217951A1 (en) * 2005-11-04 2010-08-26 Jesse Pan R and C Bit Update Handling
US7958513B2 (en) * 2005-11-17 2011-06-07 International Business Machines Corporation Method, system and program product for communicating among processes in a symmetric multi-processing cluster environment
US20070174558A1 (en) * 2005-11-17 2007-07-26 International Business Machines Corporation Method, system and program product for communicating among processes in a symmetric multi-processing cluster environment
CN101346706B (en) * 2005-12-29 2011-06-22 英特尔公司 Virtual translation look-aside buffer
US20070214339A1 (en) * 2006-03-10 2007-09-13 Microsoft Corporation Selective address translation for a resource such as a hardware device
US7493452B2 (en) * 2006-08-18 2009-02-17 International Business Machines Corporation Method to efficiently prefetch and batch compiler-assisted software cache accesses
US20080046657A1 (en) * 2006-08-18 2008-02-21 Eichenberger Alexandre E System and Method to Efficiently Prefetch and Batch Compiler-Assisted Software Cache Accesses
US20080301398A1 (en) * 2007-06-01 2008-12-04 Intel Corporation Linear to physical address translation with support for page attributes
US8799620B2 (en) 2007-06-01 2014-08-05 Intel Corporation Linear to physical address translation with support for page attributes
US11074191B2 (en) 2007-06-01 2021-07-27 Intel Corporation Linear to physical address translation with support for page attributes
US9158703B2 (en) 2007-06-01 2015-10-13 Intel Corporation Linear to physical address translation with support for page attributes
US9164916B2 (en) 2007-06-01 2015-10-20 Intel Corporation Linear to physical address translation with support for page attributes
US9164917B2 (en) 2007-06-01 2015-10-20 Intel Corporation Linear to physical address translation with support for page attributes
US9891839B2 (en) * 2008-06-20 2018-02-13 Netapp, Inc. System and method for achieving high performance data flow among user space processes in storage systems
US20160154584A1 (en) * 2008-06-20 2016-06-02 Netapp, Inc. System and method for achieving high performance data flow among user space processes in storage systems
US20100235586A1 (en) * 2009-03-11 2010-09-16 Apple Inc. Multi-core processor snoop filtering
US8868847B2 (en) * 2009-03-11 2014-10-21 Apple Inc. Multi-core processor snoop filtering
US8516220B2 (en) * 2010-05-11 2013-08-20 Intel Corporation Recording dirty information in software distributed shared memory systems
US20120023296A1 (en) * 2010-05-11 2012-01-26 Shoumeng Yan Recording Dirty Information in Software Distributed Shared Memory Systems
US20170344494A1 (en) * 2012-01-04 2017-11-30 Intel Corporation Increasing virtual-memory efficiencies
US9141559B2 (en) * 2012-01-04 2015-09-22 Intel Corporation Increasing virtual-memory efficiencies
US20130191577A1 (en) * 2012-01-04 2013-07-25 Ramesh Thomas Increasing virtual-memory efficiencies
US10169254B2 (en) * 2012-01-04 2019-01-01 Intel Corporation Increasing virtual-memory efficiencies
US9965403B2 (en) 2012-01-04 2018-05-08 Intel Corporation Increasing virtual-memory efficiencies
US9213649B2 (en) * 2012-09-24 2015-12-15 Oracle International Corporation Distributed page-table lookups in a shared-memory system
US20140089572A1 (en) * 2012-09-24 2014-03-27 Oracle International Corporation Distributed page-table lookups in a shared-memory system
US9092382B2 (en) 2012-11-02 2015-07-28 International Business Machines Corporation Reducing microprocessor performance loss due to translation table coherency in a multi-processor system
US9330018B2 (en) * 2012-11-02 2016-05-03 International Business Machines Corporation Suppressing virtual address translation utilizing bits and instruction tagging
US9697135B2 (en) 2012-11-02 2017-07-04 International Business Machines Corporation Suppressing virtual address translation utilizing bits and instruction tagging
US9330017B2 (en) * 2012-11-02 2016-05-03 International Business Machines Corporation Suppressing virtual address translation utilizing bits and instruction tagging
US20140129798A1 (en) * 2012-11-02 2014-05-08 International Business Machines Corporation Reducing microprocessor performance loss due to translation table coherency in a multi-processor system
US9069715B2 (en) 2012-11-02 2015-06-30 International Business Machines Corporation Reducing microprocessor performance loss due to translation table coherency in a multi-processor system
US20140129800A1 (en) * 2012-11-02 2014-05-08 International Business Machines Corporation Reducing microprocessor performance loss due to translation table coherency in a multi-processor system
US11487673B2 (en) * 2013-03-14 2022-11-01 Nvidia Corporation Fault buffer for tracking page faults in unified virtual memory system
US20140281296A1 (en) * 2013-03-14 2014-09-18 Nvidia Corporation Fault buffer for tracking page faults in unified virtual memory system
US11741015B2 (en) * 2013-03-14 2023-08-29 Nvidia Corporation Fault buffer for tracking page faults in unified virtual memory system
WO2014182584A1 (en) * 2013-05-06 2014-11-13 Microsoft Corporation Instruction set specific execution isolation
US20160140040A1 (en) * 2014-11-14 2016-05-19 Cavium, Inc. Filtering translation lookaside buffer invalidations
US9697137B2 (en) * 2014-11-14 2017-07-04 Cavium, Inc. Filtering translation lookaside buffer invalidations
US9684606B2 (en) 2014-11-14 2017-06-20 Cavium, Inc. Translation lookaside buffer invalidation suppression
US10795826B2 (en) 2016-05-03 2020-10-06 Huawei Technologies Co., Ltd. Translation lookaside buffer management method and multi-core processor
WO2017190266A1 (en) * 2016-05-03 2017-11-09 华为技术有限公司 Method for managing translation lookaside buffer and multi-core processor
US11030112B2 (en) * 2018-05-25 2021-06-08 Red Hat, Inc. Enhanced address space layout randomization
US20190361815A1 (en) * 2018-05-25 2019-11-28 Red Hat, Inc. Enhanced address space layout randomization
CN109032533B (en) * 2018-08-29 2021-02-26 新华三技术有限公司 Data storage method, device and equipment
CN109032533A (en) * 2018-08-29 2018-12-18 新华三技术有限公司 A kind of date storage method, device and equipment
US10776281B2 (en) * 2018-10-04 2020-09-15 International Business Machines Corporation Snoop invalidate filter for distributed memory management unit to reduce snoop invalidate latency
US10740239B2 (en) 2018-12-11 2020-08-11 International Business Machines Corporation Translation entry invalidation in a multithreaded data processing system
US10977183B2 (en) 2018-12-11 2021-04-13 International Business Machines Corporation Processing a sequence of translation entry invalidation requests with regard to draining a processor core
US10817434B2 (en) 2018-12-19 2020-10-27 International Business Machines Corporation Interruptible translation entry invalidation in a multithreaded data processing system
CN113742333A (en) * 2020-05-29 2021-12-03 杭州海康威视数字技术股份有限公司 Dimension table data updating method and device and electronic equipment
CN113742333B (en) * 2020-05-29 2023-08-04 杭州海康威视数字技术股份有限公司 Method and device for updating dimension table data and electronic equipment

Similar Documents

Publication Publication Date Title
US6105113A (en) System and method for maintaining translation look-aside buffer (TLB) consistency
JP3924206B2 (en) Non-uniform memory access (NUMA) data processing system
US7363462B2 (en) Performing virtual to global address translation in processing subsystem
US6647466B2 (en) Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy
US5897664A (en) Multiprocessor system having mapping table in each node to map global physical addresses to local physical addresses of page copies
US7765381B2 (en) Multi-node system in which home memory subsystem stores global to local address translation information for replicating nodes
EP0945805B1 (en) A cache coherency mechanism
US8285969B2 (en) Reducing broadcasts in multiprocessors
JP3889044B2 (en) Page movement in non-uniform memory access (NUMA) systems
EP1019840B1 (en) Look-up table and method of storing data therein
US6289420B1 (en) System and method for increasing the snoop bandwidth to cache tags in a multiport cache memory subsystem
US5394555A (en) Multi-node cluster computer system incorporating an external coherency unit at each node to insure integrity of information stored in a shared, distributed memory
JPH04227552A (en) Store-through-cache control system
JPH11506852A (en) Reduction of cache snooping overhead in a multi-level cache system having a large number of bus masters and a shared level 2 cache
US6625694B2 (en) System and method for allocating a directory entry for use in multiprocessor-node data processing systems
US11392508B2 (en) Lightweight address translation for page migration and duplication
US20030115402A1 (en) Multiprocessor system
Cekleov et al. Virtual-address caches. 2. Multiprocessor issues
US7093080B2 (en) Method and apparatus for coherent memory structure of heterogeneous processor systems
US8473686B2 (en) Computer cache system with stratified replacement
US7360056B2 (en) Multi-node system in which global address generated by processing subsystem includes global to local translation information
JP3295436B2 (en) Microprocessor cache consistency
US7809922B2 (en) Translation lookaside buffer snooping within memory coherent system
US9442856B2 (en) Data processing apparatus and method for handling performance of a cache maintenance operation
US20020002659A1 (en) System and method for improving directory lookup speed

Legal Events

Date Code Title Description
AS Assignment

Owner name: SILICON GRAPHICS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SCHIMMEL, CURT F.;REEL/FRAME:008776/0962

Effective date: 19970819

STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
AS Assignment

Owner name: FOOTHILL CAPITAL CORPORATION, CALIFORNIA

Free format text: SECURITY AGREEMENT;ASSIGNOR:SILICON GRAPHICS, INC.;REEL/FRAME:012428/0236

Effective date: 20011109

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: U.S. BANK NATIONAL ASSOCIATION, AS TRUSTEE, CALIFORNIA

Free format text: SECURITY INTEREST;ASSIGNOR:SILICON GRAPHICS, INC.;REEL/FRAME:014805/0855

Effective date: 20031223

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: GENERAL ELECTRIC CAPITAL CORPORATION, CALIFORNIA

Free format text: SECURITY INTEREST;ASSIGNOR:SILICON GRAPHICS, INC.;REEL/FRAME:018545/0777

Effective date: 20061017

AS Assignment

Owner name: MORGAN STANLEY & CO., INCORPORATED, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GENERAL ELECTRIC CAPITAL CORPORATION;REEL/FRAME:019995/0895

Effective date: 20070926

FPAY Fee payment

Year of fee payment: 8

REMI Maintenance fee reminder mailed
FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: GRAPHICS PROPERTIES HOLDINGS, INC., NEW YORK

Free format text: CHANGE OF NAME;ASSIGNOR:SILICON GRAPHICS, INC.;REEL/FRAME:028066/0415

Effective date: 20090604

AS Assignment

Owner name: RPX CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GRAPHICS PROPERTIES HOLDINGS, INC.;REEL/FRAME:029564/0799

Effective date: 20121224