US5699543A - Profile guided TLB and cache optimization - Google Patents

Profile guided TLB and cache optimization

Info

Publication number
US5699543A
US5699543A
Authority
US
United States
Prior art keywords
application
working
set information
sets
working set
Prior art date
Legal status
Expired - Lifetime
Application number
US08/536,951
Inventor
Sunil Saxena
Current Assignee
Intel Corp
Original Assignee
Intel Corp
Priority date
Filing date
Publication date
Application filed by Intel Corp
Priority to US08/536,951
Assigned to INTEL CORPORATION. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SAXENA, SUNIL
Application granted
Publication of US5699543A
Anticipated expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10: Address translation
    • G06F 12/1027: Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/30: Monitoring
    • G06F 11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3409: Recording or statistical evaluation of computer activity for performance assessment
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2201/00: Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F 2201/885: Monitoring specific for caches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/65: Details of virtual memory and virtual address translation
    • G06F 2212/652: Page size control
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/65: Details of virtual memory and virtual address translation
    • G06F 2212/654: Look-ahead translation

Abstract

A method and an apparatus for profile guided TLB (translation look-aside buffer) and cache optimization in an operating system. A typical operating system has a working set of information for any application that is running at a given time. This working set information can be written out by the operating system into a section of the object file. Once this information is in the object file, it may be utilized by the operating system in various ways. The method and apparatus decrease TLB misses for benchmarks, prevent infrequently used pages from disturbing the caches, and provide better hit rates on caches. These and many other advantages of the invention allow increased efficiency and optimization of a given operating system.

Description

BACKGROUND OF THE INVENTION
(1) Field of the Invention
The present invention is related to the field of memory access during application execution. More specifically, the method and apparatus of the present invention are related to profile guided TLB (translation look-aside buffer) and cache optimization in an operating system.
(2) Prior Art
Existing operating systems do not take into account any information regarding the behavior of a given application. More specifically, as an application runs, the operating system does not save or store the application's execution history as it is being generated, and therefore cannot take advantage of that information later, when the application is executed again.
There are many different types of information about a given application that are useful to an operating system which must repeatedly run that application. One example is the branching information of the application. In order to run branch instructions, a branch penalty in terms of time must be paid: branch instructions take up more CPU time than sequential instructions. If the destinations and the resources required by branch instructions are known to the operating system before an application is run, then the branch penalty associated with a branch instruction may be minimized. Many other types of information would, if available, help the operating system improve overall system efficiency. For example, it would be helpful to have information which decreases the number of translation entries necessary during program execution. Additionally, it would be helpful to have information which reduces the amount of TLB-miss and page-fault processing performed by the CPU. A TLB is a hardware cache that contains virtual-to-real address translations. Such a reduction in TLB-miss and page-fault processing by the CPU can greatly improve overall system efficiency.
More specifically, each translation entry in the TLB typically deals with one fixed page size. A page is the unit of address space for which a translation entry defines a mapping from a virtual address to a physical address. For example, given a page size of 4K bytes, a translation entry defines the mapping between 4K bytes of virtual address space and 4K bytes of physical address space. If an application requires 13K bytes, the operating system uses four pages of memory, or 16K bytes, to accommodate the needs of the application. The 4K page size would therefore require four translation entries for program execution. If the execution profile of the application shows that it only needs two pages of memory resident for most of its execution, the operating system could use this information to optimize memory allocation for the program. It could also optimize the program's TLB usage by preloading those two translation entries, avoiding TLB misses when the translation entries are not already cached.
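The arithmetic in this example is easy to reproduce. Below is a minimal C sketch; the helper name pages_needed is ours, not the patent's, and the 4K page size and 13K request are taken from the example above:

```c
#include <stdio.h>

#define PAGE_SIZE 4096u  /* 4K-byte pages, as in the example above */

/* Round a request up to whole pages; each 4K page costs one TLB entry. */
static unsigned pages_needed(unsigned bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

int main(void)
{
    unsigned request = 13 * 1024;        /* the application asks for 13K */
    unsigned pages   = pages_needed(request);

    printf("%u bytes -> %u pages (%u bytes), %u translation entries\n",
           request, pages, pages * PAGE_SIZE, pages);
    /* prints: 13312 bytes -> 4 pages (16384 bytes), 4 translation entries */
    return 0;
}
```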
It is therefore desirable to have a method and an apparatus which provide information that is generated by an application while it is executing, and which may be saved and made available for later use by the operating system when the program is re-executed.
BRIEF SUMMARY OF THE INVENTION
A method and an apparatus for profile guided TLB and cache optimization in an operating system are disclosed. A typical operating system has a working set of information for any application that is running at a given time. In a presently preferred embodiment of the present invention, this working set information can be written out by the operating system into a section of the object file. Once this information is in the object file, it may be utilized by the operating system in various ways.
The information may be stored in the form of one bit or many bytes per page. In addition, when there are many bytes per page, the information may provide per-page usage. An embodiment of the present invention provides a bitmap for a set of pages. Such information provides, among other things, the set of pages that constitute the working set of the application. Having the working set information allows an operating system to allocate the large page sizes (for example, 8K, 16K, 32K and 64K) that are becoming available on a new generation of processors. The ability to allocate a large page size reduces the number of TLB entries required for an application while it is running and provides lower TLB miss rates. Furthermore, this information may be utilized to preload TLBs, as newer generations of processors provide instructions to load TLBs.
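The bitmap form of the working set can be pictured with the sketch below. The packing order and the accessor are assumptions on our part, since the patent leaves the exact encoding open:

```c
#include <stdint.h>
#include <stdio.h>

/* One bit per 4K page: 1 = the page is in the application's working set.
 * Hypothetical layout; the patent does not fix the encoding.            */
static int page_in_working_set(const uint8_t *bitmap, unsigned page)
{
    return (bitmap[page / 8] >> (page % 8)) & 1;
}

int main(void)
{
    /* Pages 0-3 resident (low four bits set), pages 4-5 not. */
    uint8_t bitmap[1] = { 0x0F };

    for (unsigned page = 0; page < 6; page++)
        printf("page %u: %s\n", page,
               page_in_working_set(bitmap, page) ? "working set" : "cold");
    return 0;
}
```

An operating system reading such a bitmap could, for instance, back the run of set bits with one large page and leave the cold pages as ordinary 4K pages.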
The most important usage of this bitmap information is to allow effective usage of the TLB, including TLBs that support large page sizes. The amount of memory utilized by an application using multiple enlarged page sizes would be the same as that used by an application using only the 4K page size.
With the present invention, more detailed information may be made available on a per-page basis. This is advantageous for applications that run for a reasonable length of time, since the detailed information requires more space in the object file. Such detailed information may provide the true page usage of an application. Operating systems may therefore avoid many of the TLB misses for benchmarks such as SPEC and TPC. Further, the operating system may also prevent infrequently used pages from disturbing the caches, and therefore provide better cache hit rates for the remaining pages. In most processors available today, the caches may be disabled on a per-page basis; with the present invention, the caches may be disabled for infrequently used pages. Additionally, the present invention may enable the warming up of caches for frequently used pages.
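The per-page form might drive the cacheability decision as in the following sketch. The counter values and the HOT_THRESHOLD cutoff are invented for illustration; a real operating system would act by setting a cache-disable bit in the page's translation entry, which most processors support on a per-page basis as noted above:

```c
#include <stdio.h>

#define NPAGES        6
#define HOT_THRESHOLD 4   /* invented cutoff for illustration */

int main(void)
{
    /* Per-page reference counts as they might appear in a detailed profile. */
    unsigned use_count[NPAGES] = { 9, 8, 7, 7, 1, 0 };

    for (unsigned page = 0; page < NPAGES; page++) {
        if (use_count[page] >= HOT_THRESHOLD)
            printf("page %u: keep cacheable; candidate for cache warm-up\n", page);
        else
            printf("page %u: mark cache-disabled in its translation entry\n", page);
    }
    return 0;
}
```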
The method and apparatus of the present invention decrease TLB misses for the benchmarks, prevent infrequently used pages from disturbing the caches, and provide better hit rates on caches. These and many other advantages of the invention allow increased efficiency and optimization of the execution of applications by a given operating system.
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 illustrates an exemplary computer system with an exemplary implementation of the present invention.
FIG. 2 illustrates a more detailed description of the optimizer illustrated in FIG. 1.
FIG. 3 illustrates an exemplary process followed by an exemplary implementation of the present invention.
FIG. 4 illustrates two exemplary implementations of profile from FIG. 3.
FIG. 5 illustrates the general steps followed by an exemplary implementation of the present invention in generating working set information for an application.
FIG. 6 illustrates the general steps followed by an exemplary implementation of the present invention in applying the working set information to optimize the TLB.
DETAILED DESCRIPTION OF THE INVENTION
An apparatus and a method for a profile guided TLB (translation look-aside buffer) and cache optimization in an operating system are disclosed.
FIG. 1 illustrates an exemplary computer system 100 with an exemplary implementation of the present invention. Several peripheral devices may be coupled to computer system 100, including monitor 102, modem 104 and keyboard 106. Computer 100 also has CPU 108 coupled to memory 110 and disk 111. Memory 110 has operating system 112 with the present invention's optimizer 115 for optimizing profile guided TLB and cache access. Disk 111 has one or more applications 114 with a log portion 116 where information regarding applications 114 is stored. Once log 116 is produced for a given application 114, it is written into application 114 on disk 111. A copy of application 114 and its log 116, shown as application 124 and log 126, is brought into memory 110 when operating system 112 is ready to execute the application.
FIG. 2 illustrates a more detailed description of the optimizer illustrated in FIG. 1. Optimizer 115 has evaluator 200, logger 202 and working set information 204. When application 114 is initially executed, evaluator 200 determines the working set 204 of the application 114. Logger 202 then logs the working set of that particular application 114 into log 116 of application 114.
FIG. 3 illustrates an exemplary process followed by an exemplary embodiment of the present invention. Application 114 is embodied in object file 300. Object file 300 has a file header 302, a text section 304, which is mainly composed of a sequence of instructions, a read-only data section 306 and a data section 308. When application 114 is brought into memory for execution by operating system 112, text section 304 is read from disk and loaded into memory along with read-only section 306 and data section 308 from object file 300. Sections 304, 306 and 308 are exemplary sections brought into memory. Operating system 112 then begins executing the instructions in text section 304.
File header 302 indicates various information, including where the text section is, how many sections there are, and what operating system 112 needs to load in order to execute application 114. Given that in this implementation text section 304 is 32K, read-only section 306 is 16K and data section 308 is 24K, eighteen pages of physical memory are necessary with a conventional operating system, where each page, or piece, size is defined to be 4K. Furthermore, 18 TLB entries are necessary.
Logger 202 logs into the object file, under profile 310, a number that reflects the relative usage of each page during the execution of a given application. For example, the first four pages in physical memory are used more often than the fifth or sixth page. Thus, the first four pages, which are used often, may be mapped into a single page with a piece size of 16K covering all four pages. The fifth and sixth pages, which are not used very often, may be assigned two pages with a piece size of 4K each.
Similarly, pages seven and eight are mapped into a page with a piece size of 8K; pages nine through twelve are mapped into a page with a piece size of 16K; and pages thirteen through sixteen are mapped into a page with a piece size of 16K. The above mapping scheme needs only 8 translation entries, as compared to 18 entries for the conventional operating system. In addition, it provides an opportunity for an operating system to preload the TLB entries for frequently used pages and avoid TLB-miss processing on those pages. Further, the above described mapping scheme allows an operating system to load the frequently used pages from disk and altogether avoid page-fault processing for those pages. Whenever the application is executed, the present invention reduces the time required to handle TLB-miss processing by the CPU. The storage organization of the present invention effectively optimizes TLB usage.
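The entry count in this example can be reproduced by charging one translation entry per mapped piece. In the sketch below, the run lengths are taken from the mapping just described; the remaining two of the eighteen pages are assumed to stay as two 4K pieces, which is what makes the stated total of 8 entries come out:

```c
#include <stdio.h>

int main(void)
{
    /* Piece sizes, in 4K pages, from the FIG. 3 example: pages 1-4 as one
     * 16K piece, pages 5-6 as two 4K pieces, pages 7-8 as one 8K piece,
     * pages 9-12 and 13-16 as 16K pieces, and (by assumption) pages 17-18
     * as two more 4K pieces.                                              */
    unsigned piece_pages[] = { 4, 1, 1, 2, 4, 4, 1, 1 };
    unsigned entries = sizeof piece_pages / sizeof piece_pages[0];
    unsigned pages = 0;

    for (unsigned i = 0; i < entries; i++)
        pages += piece_pages[i];

    printf("%u pages covered by %u translation entries (18 entries with 4K only)\n",
           pages, entries);
    /* prints: 18 pages covered by 8 translation entries (18 entries with 4K only) */
    return 0;
}
```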
FIG. 4 illustrates two exemplary implementations of profile 310 from FIG. 3. Profile 310 may be implemented with one bit of data per page reflecting whether or not the given page is used often. For example, the first entry of exemplary profile 400 has a one, indicating that the corresponding page one is used often, whereas the fifth entry of exemplary profile 400 has a zero, indicating that the corresponding fifth page in physical memory is not used as often as the pages corresponding to the other entries in the profile.
If more bits are allocated per profile entry, then more information regarding the usage of each page may be provided. For example, with exemplary profile 402, each entry may have four bits, allowing an entry to indicate how many times the corresponding physical memory page is used in a single execution of an application, assuming that the maximum number of times such a page is utilized is under sixteen. The embodiment of the implementation of the present invention illustrated herein is for exemplary purposes only, and other implementations may occur to a person skilled in the art given the detailed description of the present invention herein.
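Both encodings can be sketched as packed bitfields. The packing order below is an assumption, since FIG. 4 does not fix one:

```c
#include <stdint.h>
#include <stdio.h>

/* Profile 400: one bit per page, 1 = used often. */
static int profile400_hot(uint8_t bits, unsigned page)
{
    return (bits >> page) & 1;
}

/* Profile 402: four bits per page, a small per-page use count (0-15). */
static unsigned profile402_count(uint32_t nibbles, unsigned page)
{
    return (nibbles >> (4 * page)) & 0xF;
}

int main(void)
{
    uint8_t  p400 = 0x0F;      /* pages 0-3 hot, pages 4-5 cold          */
    uint32_t p402 = 0x017799;  /* counts 9, 9, 7, 7, 1, 0 for pages 0-5 */

    for (unsigned page = 0; page < 6; page++)
        printf("page %u: hot bit=%d, use count=%u\n", page,
               profile400_hot(p400, page), profile402_count(p402, page));
    return 0;
}
```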
FIG. 5 illustrates the general steps followed by the present invention in generating working set information for an application. In step 501, the operating system prepares an application for execution. In step 502, variables used in collecting the working set information are initialized. The process then switches in, i.e. the application is about to be executed. In step 503, the virtual-to-physical mappings are set up to capture references to pages by the application. In step 504, the application (program) executes, during which the CPU sets the virtual-to-physical mapping references for referenced pages. When the process switches out, it is determined in step 505 whether the switch is due to a program exit. If it is not a program exit, i.e. the application will continue to execute, then in step 506 the total number of times pages are referenced by the application is updated, and the process described in steps 503 through 505 is repeated. When the process switching out signifies a program exit in step 505, then in step 507 the working set information, with the total counts of the number of times pages are referenced by the application, is written into the object file.
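In pseudo-C, the collection loop of FIG. 5 might be structured as follows. Every function name here is a placeholder for operating system or CPU internals that the patent does not spell out:

```c
#include <stdbool.h>

#define NPAGES 18  /* pages tracked for this application */

/* Placeholder hooks standing in for OS/CPU internals. */
extern void setup_reference_capture(void);        /* step 503               */
extern bool run_until_switch_out(void);           /* true = program exit    */
extern bool page_was_referenced(unsigned page);   /* per-page reference bit */
extern void write_working_set(const unsigned *counts, unsigned n); /* 507   */

void collect_working_set(void)
{
    unsigned ref_count[NPAGES] = { 0 };   /* step 502: initialize variables */

    for (;;) {
        setup_reference_capture();            /* step 503       */
        bool exited = run_until_switch_out(); /* steps 504-505  */

        for (unsigned p = 0; p < NPAGES; p++)
            if (page_was_referenced(p))
                ref_count[p]++;               /* step 506: update totals */

        if (exited)
            break;
    }
    write_working_set(ref_count, NPAGES);     /* step 507: into the object file */
}
```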
FIG. 6 illustrates the general steps followed by the present invention in applying the working set information generated for an application to optimize the TLB. In step 601, the operating system prepares an application for execution. In step 602, the working set information is read from the object file, and the virtual-to-physical translations for all the pages in the working set are set up and read from the disk. Once the process switches in, in step 603, the TLB entries for the pages in the working set for which the virtual-to-physical translations exist are preloaded. In step 604, the program (application) begins execution. Once the process switches out, it is determined whether or not the switching out indicates a program exit, i.e. that the application has completed its execution. If it does not signify a program exit, then steps 603 through 605 are repeated. On the other hand, if the process switching out signifies the program exit, then the process is completed.
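A corresponding sketch for the preload path of FIG. 6, again with placeholder hooks; a processor that exposes TLB-load instructions would supply the real preload_tlb_entry:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_WS 64  /* arbitrary bound on working-set pages for this sketch */

/* Placeholder hooks standing in for OS/CPU internals. */
extern unsigned read_working_set(unsigned *pages, unsigned max); /* step 602 */
extern bool     translation_exists(unsigned page);
extern uint64_t virt_to_phys(unsigned page);
extern void     preload_tlb_entry(unsigned page, uint64_t phys); /* step 603 */
extern bool     run_until_switch_out(void);  /* steps 604-605; true = exit */

void run_with_preloaded_tlb(void)
{
    unsigned pages[MAX_WS];
    unsigned n = read_working_set(pages, MAX_WS);  /* step 602 */

    for (;;) {
        /* Step 603: preload the TLB for working-set pages whose
         * virtual-to-physical translations have been set up.     */
        for (unsigned i = 0; i < n; i++)
            if (translation_exists(pages[i]))
                preload_tlb_entry(pages[i], virt_to_phys(pages[i]));

        if (run_until_switch_out())  /* step 604 runs; step 605 checks exit */
            break;                   /* otherwise repeat steps 603-605      */
    }
}
```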
While certain exemplary embodiments have been described in detail and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention is not to be limited to the specific arrangements and constructions shown and described, since various other modifications may occur to those with ordinary skill in the art.

Claims (21)

What is claimed:
1. A method for a profile guided translation look-aside buffer (TLB) for reducing the time required to handle TLB miss processing by a processor coupled to a memory comprising the steps of:
collecting information on a plurality of working sets for an application from a corresponding object file in said memory;
determining the frequency with which each of said plurality of working sets is accessed during execution of said application from said information;
assigning pages in said memory to said plurality of working sets according to said determined frequency, the same page being assigned to said plurality of working sets having said frequency substantially equal to said determined frequency, said same page having a size sufficient to store all said plurality of working sets assigned said same page, said plurality of working sets having a virtual address mapping to a physical address corresponding to a single page in said memory; and
preloading virtual address to physical address translation entries in said TLB for said pages assigned to said plurality of working sets, said plurality of working sets accessed by an operating system in said memory during execution of said application.
2. The method of claim 1 further comprising the step of setting up virtual to physical address mappings to capture references to pages by said application during execution of said application.
3. The method of claim 2 further comprising the step of determining whether a switching out by said application during program execution indicates a program exit and if there is no program exit, then total number of times pages are referenced by said application is updated, otherwise, if there is a program exit, total number of times said pages are referenced by said application is written into an object file associated with said application.
4. The method of claim 1 wherein said step of preloading is performed before said application begins execution.
5. The method of claim 3 further comprising the step of
determining whether said application switching out indicates that said application has completed execution.
6. An apparatus for a profile guided translation look-aside buffer and cache optimization in an operating system comprising:
a working set information for a plurality of working sets for an application stored in a corresponding object file, said plurality of working sets for use by an operating system during execution of said application; and
an optimizer coupled to said working set information, said optimizer for,
i) collecting said working set information to determine the frequency with which each of said plurality of working sets is accessed during execution of said application;
ii) assigning pages to said plurality of working sets according to said determined frequency, same page being assigned to said plurality of working sets having said frequency substantially equal to said determined frequency, said same page having a size sufficient to store all said plurality of working sets which are assigned said same page, and
iii) preloading said TLB for said pages assigned to said plurality of working sets for which virtual to physical address translation exists.
7. The apparatus of claim 6 wherein said optimizer further comprises an evaluator coupled to said working set information for determining said working set of said application.
8. The apparatus of claim 6 wherein said optimizer further comprises a logger coupled to said working set information for logging said working set information into a log corresponding to said application.
9. The apparatus of claim 6 wherein said optimizer determines whether a switching out of a process indicates said application completing execution.
10. A computer system for a profile guided translation look-aside buffer and cache optimization in an operating system comprising:
a storage element for storing,
a working set information for a plurality of working sets for an application, said plurality of working sets for use by an operating system during execution of said application, and
an optimizer coupled to said working set information, said optimizer for,
i) collecting said working set information to determine the frequency with which each of said plurality of working sets is accessed during execution of said application,
ii) assigning pages to said plurality of working sets according to said determined frequency, same page being assigned to said plurality of working sets having said frequency substantially equal to said determined frequency, said same page having a size sufficient to store all said plurality of working sets which are assigned said same page, and
iii) preloading said TLB for said pages assigned to said plurality of working sets for which virtual to physical address translation exists and
a processor coupled to said storage element for driving said optimizer.
11. The system of claim 10 wherein said optimizer further comprises an evaluator coupled to said working set information for determining said working set of said application.
12. The system of claim 10 wherein said optimizer further comprises a logger coupled to said working set information for logging said working set information into a log corresponding to said application.
13. The system of claim 10 wherein said optimizer determines whether a switching out of a process indicates said application completing execution.
14. An apparatus for a profile guided translation look-aside buffer and cache optimization in an operating system comprising:
a working set information for a plurality of working sets for an application, said plurality of working sets for use by an operating system during execution of said application; and
means for optimizing coupled to said working set information, said means for optimizing,
i) collecting said working set information to determine the frequency with which each of said plurality of working sets is accessed during execution of said application,
ii) assigning pages to said plurality of working sets according to said determined frequency, same page being assigned to said plurality of working sets having said frequency substantially equal to said determined frequency, said same page having a size sufficient to store all said plurality of working sets which are assigned said same page, and
iii) preloading said TLB for said pages assigned to said plurality of working sets for which virtual to physical address translation exists.
15. The apparatus of claim 14 wherein said means for optimizing further comprises means for evaluating coupled to said working set information for determining said working set of said application.
16. The apparatus of claim 14 wherein said means for optimizing further comprises means for logging coupled to said working set information for logging said working set information into a log corresponding to said application.
17. The apparatus of claim 14 wherein said means for optimizing determines whether a switching out of a process indicates said application completing execution.
18. A computer system for a profile guided translation look-aside buffer and cache optimization in an operating system comprising:
means for storing,
a working set information for a plurality of working sets for an application, said plurality of working sets for use by an operating system during execution of said application, and
means for optimizing coupled to said working set information, said means for optimizing,
i) collecting said working set information to determine the frequency with which each of said plurality of working sets is accessed during execution of said application,
ii) assigning pages to said plurality of working sets according to said determined frequency, same page being assigned to said plurality of working sets having said frequency substantially equal to said determined frequency, said same page having a size sufficient to store all said plurality of working sets which are assigned said same page, and
iii) preloading said TLB for said pages assigned to said plurality of working sets for which virtual to physical address translation exists; and
means for processing coupled to said storage element for driving said means for optimizing.
19. The system of claim 18 wherein said means for optimizing further comprises means for evaluating coupled to said working set information for determining said working set of said application.
20. The system of claim 18 wherein said means for optimizing further comprises means for logging coupled to said working set information for logging said working set information into a log corresponding to said application.
21. The system of claim 18 wherein said means for optimizing determines whether a switching out of a process indicates said application completing execution.
US08/536,951 1995-09-29 1995-09-29 Profile guided TLB and cache optimization Expired - Lifetime US5699543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US08/536,951 US5699543A (en) 1995-09-29 1995-09-29 Profile guided TLB and cache optimization

Publications (1)

Publication Number Publication Date
US5699543A true US5699543A (en) 1997-12-16

Family

ID=24140589

Family Applications (1)

Application Number Title Priority Date Filing Date
US08/536,951 Expired - Lifetime US5699543A (en) 1995-09-29 1995-09-29 Profile guided TLB and cache optimization

Country Status (1)

Country Link
US (1) US5699543A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4638426A (en) * 1982-12-30 1987-01-20 International Business Machines Corporation Virtual memory address translation mechanism with controlled data persistence
US4669043A (en) * 1984-02-17 1987-05-26 Signetics Corporation Memory access controller
US5060137A (en) * 1985-06-28 1991-10-22 Hewlett-Packard Company Explicit instructions for control of translation lookaside buffers
US4730249A (en) * 1986-01-16 1988-03-08 International Business Machines Corporation Method to operate on large segments of data in a virtual memory data processing system
US4881075A (en) * 1987-10-15 1989-11-14 Digital Equipment Corporation Method and apparatus for adaptive data compression
US5394537A (en) * 1989-12-13 1995-02-28 Texas Instruments Incorporated Adaptive page placement memory management system
US5282274A (en) * 1990-05-24 1994-01-25 International Business Machines Corporation Translation of multiple virtual pages upon a TLB miss
US5423014A (en) * 1991-10-29 1995-06-06 Intel Corporation Instruction fetch unit with early instruction fetch mechanism
US5493660A (en) * 1992-10-06 1996-02-20 Hewlett-Packard Company Software assisted hardware TLB miss handler

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6580431B1 (en) 1999-03-04 2003-06-17 Nexmem System, method, and computer program product for intelligent memory to accelerate processes
US20020135611A1 (en) * 1999-03-04 2002-09-26 Trevor Deosaran Remote performance management to accelerate distributed processes
US6560689B1 (en) 2000-03-31 2003-05-06 Intel Corporation TLB using region ID prevalidation
US6678815B1 (en) 2000-06-27 2004-01-13 Intel Corporation Apparatus and method for reducing power consumption due to cache and TLB accesses in a processor front-end
US20040117800A1 (en) * 2001-05-21 2004-06-17 Alexander Benedix Method for determining the optimum access strategy
US7127553B2 (en) 2001-05-21 2006-10-24 Infineon Technologies Ag Method for determining the optimum access strategy
US20030093621A1 (en) * 2001-11-09 2003-05-15 International Business Machines Corporation Different caching treatment of memory contents based on memory region
US6829679B2 (en) 2001-11-09 2004-12-07 International Business Machines Corporation Different caching treatment of memory contents based on memory region
US6848026B2 (en) 2001-11-09 2005-01-25 International Business Machines Corporation Caching memory contents into cache partitions based on memory locations
US7711903B2 (en) 2004-01-16 2010-05-04 International Business Machines Corporation Preloading translation buffers
US20050160229A1 (en) * 2004-01-16 2005-07-21 International Business Machines Corporation Method and apparatus for preloading translation buffers
US20070113044A1 (en) * 2004-01-16 2007-05-17 Day Michael N Method and Apparatus for Preloading Translation Buffers
US8095771B2 (en) 2005-05-13 2012-01-10 Microsoft Corporation Method and system for caching address translations from multiple address spaces in virtual machines
US20080215848A1 (en) * 2005-05-13 2008-09-04 John Te-Jui Sheu Method and System For Caching Address Translations From Multiple Address Spaces In Virtual Machines
US8909946B2 (en) 2005-11-15 2014-12-09 Microsoft Corporation Efficient power management of a system with virtual machines
US9772860B2 (en) 2005-11-15 2017-09-26 Microsoft Technology Licensing, Llc Efficient power management of a system with virtual machines
US9489035B2 (en) 2005-11-15 2016-11-08 Microsoft Technology Licensing, Llc Efficient power management of a system with virtual machines
US9218047B2 (en) 2005-11-15 2015-12-22 Microsoft Technology Licensing, Llc Efficient power management of a system with virtual machines
US20070112999A1 (en) * 2005-11-15 2007-05-17 Microsoft Corporation Efficient power management of a system with virtual machines
US8615643B2 (en) 2006-12-05 2013-12-24 Microsoft Corporation Operational efficiency of virtual TLBs
US8694712B2 (en) 2006-12-05 2014-04-08 Microsoft Corporation Reduction of operational costs of virtual TLBs
US20080134174A1 (en) * 2006-12-05 2008-06-05 Microsoft Corporation Reduction of operational costs of virtual TLBs
US9104594B2 (en) 2006-12-05 2015-08-11 Microsoft Technology Licensing, Llc Operational efficiency of virtual TLBs
US20080133875A1 (en) * 2006-12-05 2008-06-05 Microsoft Corporation Operational efficiency of virtual TLBs
US7788464B2 (en) 2006-12-22 2010-08-31 Microsoft Corporation Scalability of virtual TLBs for multi-processor virtual machines
US20080155168A1 (en) * 2006-12-22 2008-06-26 Microsoft Corporation Scalability of virtual TLBs for multi-processor virtual machines

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SAXENA, SUNIL;REEL/FRAME:007692/0368

Effective date: 19950928

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12