
Publication number: CN1278224C
Publication type: Grant
Application number: CN 03156991.9
Publication date: 4 Oct 2006
Filing date: 17 Sep 2003
Priority date: 17 Sep 2002
Also published as: CN1490717A, US7496494, US7844446, US20040054517, US20090157377
Inventors: E. R. Altman, R. Nair, J. K. O'Brien, K. M. O'Brien, P. H. Oden, D. A. Prener, S. W. Sathaye
Applicant: International Business Machines Corporation
Method and system for multiprocessor simulation on a multiprocessor host system
CN 1278224 C
Abstract (translated from Chinese)
A method (and system) for executing, on a host computing system, a multiprocessor program written for a target instruction set architecture, the host computing system having a plurality of processors designed to process instructions of a second instruction set architecture. The method includes representing each portion of the program, which was designed to run on a processor of the target computing system, as one or more program threads to be executed on the host computing system.
Claims (15) (translated from Chinese)
1. A method of executing, on a host computing system, a program written for a target instruction set architecture, the host computing system having a plurality of processors designed to process instructions of a second instruction set architecture, the method comprising: representing each portion of the program that was designed to run on a processor of the target computing system as one or more program threads to be executed on the host computing system; and scheduling said threads for execution on one or more processors of the host system.
2. The method of claim 1, further comprising: representing functions of the target system that are not specific to a single processor of the target system as one or more threads to be executed by one or more processors of the host system.
3. The method of claim 1, wherein the number of threads exceeds the number of processors in the host system.
4. The method of claim 1, wherein, after a given group of instructions has been interpreted a predetermined number of times, the group of instructions is translated from the target instruction set to the second instruction set.
5. The method of claim 4, wherein the interpretation process is executed as a thread in the multiprocessor system.
6. The method of claim 1, wherein, at a given time, multiple functions of said processor are represented as multiple threads.
7. The method of claim 1, further comprising: dynamically creating threads.
8. The method of claim 4, further comprising: simultaneously recognizing the translation result of a first thread and executing a second thread.
9. The method of claim 4, further comprising: simultaneously translating a first thread and executing a second thread.
10. The method of claim 1, wherein a many-to-many mapping of said threads to said processors is performed.
11. The method of claim 1, wherein a host application is running while emulation threads are being executed.
12. A method for emulation of a multiprocessor system, comprising: mapping a plurality of hardware resources of a target system to a plurality of software threads; mapping said plurality of software threads to a plurality of hardware resources of a host system; mapping state information of the target system to a memory of the host system; and improving emulation performance by dividing emulation tasks into a large number of threads.
13. A system for emulation of a multiprocessor system, comprising: means for mapping a plurality of hardware resources of a target system to a plurality of software threads; means for mapping said plurality of software threads to a plurality of hardware resources of a host system; means for mapping state information of the target system to a memory of the host system; and means for improving emulation performance by dividing emulation tasks into a large number of threads.
14. A system for executing, on a host computing system, a program written for a target instruction set architecture, the host computing system having a plurality of processors designed to process instructions of a second instruction set architecture, the system comprising: a representation unit for representing each portion of the program that was designed to run on a processor of the target computing system as one or more program threads to be executed on the host computing system; and a scheduling unit for scheduling said threads for execution on one or more processors of the host system.
15. A thread processing apparatus for a host computing system of a multiprocessor system, comprising: a thread pool for holding a plurality of threads; a thread processor for accessing a memory of said host computing system and deciding which thread in the thread pool to select for emulation; a thread generator for creating new threads and placing said new threads into said thread pool; and a thread scheduler for scheduling the plurality of threads held in said thread pool, said scheduler scanning the waiting threads and assigning, in priority order, the next thread to an available processor of said multiprocessor system.
Description (translated from Chinese)
Method and system for multiprocessor emulation on a multiprocessor host system

TECHNICAL FIELD

The present invention relates generally to computer systems and, more particularly, to a method (and system) for reproducing, on one multiprocessing computing system, the behavior of another multiprocessing computing system.

BACKGROUND

It has long been recognized that there is a need to emulate the behavior of one computer system on another computer system. Several approaches have been proposed for this purpose. A survey of these techniques appears in U.S. Patent No. 5,832,205, incorporated herein by reference.

The solution of U.S. Patent No. 5,832,205 includes a combined hardware/software scheme that emulates the instruction set of one processor on another processor. The scheme allows the hardware design to incorporate features that assist the execution of the target instruction set. However, for the same reason, it cannot emulate all systems equally effectively.

SimOS (see, e.g., Stephen A. Herrod, "Using Complete Machine Simulation to Understand Computer System Behavior," Ph.D. thesis, Stanford University, February 1998) and SimICS (see, e.g., Peter S. Magnusson, "A Design For Efficient Simulation of a Multiprocessor," Proceedings of the First International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), La Jolla, California, January 1993, pp. 69-78), both incorporated herein by reference, are examples of emulation performed without special hardware features. However, their performance is not as good as that of U.S. Patent No. 5,832,205.

In general, these systems use multiple levels of translation. Techniques have been described (see, e.g., Jim Turley, "Alpha Runs x86 Code with FX!32," Microprocessor Report, March 5, 1996) in which the extent of translation varies with the extent of code execution.

However, today's computer systems contain more than one processor (i.e., multiprocessor systems, as opposed to uniprocessor systems). Conventional techniques have not adequately addressed the emulation of these multiprocessor systems.

In addition, beyond emulating the multiple processors on such a system, other aspects that must be emulated include the various forms of communication between the processors, and the rules that determine the order in which multiple processors access memory locations.

SimOS and SimICS both attempt to emulate the behavior of multiprocessor systems, but they do not use a multiprocessor system as the host computing system.

Thus, conventional techniques have not solved the problem of multiprocessor emulation on a multiprocessor system.

That is, conventional techniques (and instruction set architectures) are often limited to (and directed at) emulating a single-processor system, whereas most of today's systems are multiprocessor systems, especially larger systems (e.g., those beyond the personal computer (PC) domain). Thus, techniques used simply to emulate one processor with another cannot be used in a multiprocessor system environment. That is, when multiple processors are present, conventional emulation techniques designed for uniprocessor systems do not work.

SUMMARY OF THE INVENTION

In view of the foregoing and other problems, drawbacks, and disadvantages of the conventional methods and structures, an object of the present invention is to provide a method and structure in which one multiprocessor system, using a certain processor instruction set and memory architecture, can efficiently emulate the behavior of another multiprocessor system that uses a different processor instruction set.

According to a first aspect of the present invention, a method (and system) for executing a program written for a target instruction set architecture on a host computing system having a plurality of processors designed to process instructions of a second instruction set architecture includes representing each portion of the program that was designed to run on a processor of the target computing system as one or more program threads to be executed on the host computing system, and scheduling said threads for execution on one or more processors of the host system.

According to a second aspect of the present invention, a system (and method) includes: means for mapping hardware resources of the target system to software threads; means for mapping the threads to hardware resources of the host system; means for mapping state information of the target system into the memory of the host system; and means for improving emulation performance by dividing the emulation task into a large number of threads.

According to a third aspect of the present invention, a thread processing apparatus for a host computer of a multiprocessor system includes: a thread pool for holding threads; a thread processor for accessing the memory of the host system and deciding which thread to select from the thread pool for emulation; a thread generator for creating new threads and placing the new threads into the thread pool; and a thread scheduler for scheduling the threads held in the thread pool, the scheduler scanning the waiting threads and assigning, in priority order, the next thread to an available processor.

According to a fourth aspect of the present invention, a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital processing apparatus performs a method of executing a program written for a target instruction set architecture on a host computing system having a plurality of processors designed to process instructions of a second instruction set architecture, the method including representing each portion of the program that was designed to run on a processor of the target computing system as one or more program threads to be executed on the host computing system.

With the unique and unobvious aspects of the present invention, emulation of a system having multiple processors can be performed efficiently. Further, the invention uses a host computer to perform this emulation.

Moreover, a key feature of the present invention is that it abandons the notion that the guest system is a piece of hardware. Instead, the invention treats the guest system as software.

Thus, the guest is viewed more abstractly as having a number of parallel threads that need to be executed, and these threads are then mapped onto the hardware resources of the host machine. This essentially eliminates the usual intermediate step of mapping the parallelism in the program onto the guest's hardware, and then emulating the guest's hardware with the host's hardware. The invention eliminates this intermediate step, even though it is understood that hardware may actually exist on the guest that could have been mapped to the host's hardware.

Thus, the invention eliminates the step of mapping the software threads of the guest machine's application onto the guest's hardware. Thereafter, each of these guest threads is scheduled to run on one or more processors of the host.

Another advantage of the invention is that such a system is easier to build and debug, because there is no longer a need to worry about getting the hardware details of the guest machine right.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects, and advantages of the present invention will be better understood from the following detailed description of preferred embodiments with reference to the drawings, in which:

Figure 1 shows a target multiprocessor computing system 100, including multiple processing units, a memory subsystem, a coherent bus interconnect, and an input/output (I/O) processor;

Figure 2 shows a classification scheme 200 for the various instructions executed on the system 100 of Figure 1;

Figure 3 shows a host multiprocessor computing system 300, including multiple host processing units, a host memory subsystem, a coherent host bus interconnect, and a host I/O processor;

Figure 4 shows the mapping 400 of the various resources of the target system 100 into the memory of the host system 300;

Figure 5 shows a thread processing software structure 500 on the host system 300;

Figure 6 shows a system in which memory accesses can be made faster by using thread-specific memory;

Figure 7 shows a system 700 for simply mapping the functions of the target system onto threads of the host system;

Figure 8 shows a system 800 for a trivial mapping of the threads of Figure 7 to the processors of a multiprocessing host system;

Figure 9 shows a system 900 for mapping the threads of Figure 7 more efficiently to the processors of a multiprocessing host system;

Figure 10 shows a more general system 1000 that can be mapped onto a multiprocessing host system;

Figure 11 shows an emulation scheme 1100 that caches translations for reuse;

Figure 12 shows a system 1200 for spawning parallel translation threads;

Figure 13 shows a system 1300, an enhancement of Figure 9, to accommodate additional translation threads; and

Figure 14 shows a signal-bearing medium 1400 (e.g., a storage medium) for storing the steps of a program according to the present invention.

DETAILED DESCRIPTION

Referring now to the drawings, and more particularly to Figures 1-14, there are shown preferred embodiments of the method and structures according to the present invention.

Figure 1 shows a general multiprocessing system 100 to be emulated. It includes a plurality of processors 110A-110D, each of which may have its own local caches, connected through some interconnection network 120 to a memory hierarchy 130, which may include additional levels of cache backed by a main memory (not shown). The system may also access I/O devices, including disks (magnetic or optical) and communication networks, through an I/O processor 140, which formats incoming requests from the system into a form the devices can understand. Obviously, the system is not limited to the four processors shown; in fact, any number of processors may be used.

Each of the processors 110A-110D in Figure 1 can be viewed as executing instructions that affect the state of the system 100. The effect of each instruction can be classified as shown in scheme 200 of Figure 2.

For example, an instruction can be broadly classified as a "local resource instruction" or a "shared resource instruction," according to whether it affects only the local resources of the processor executing it, or resources shared among all the processors. Examples of local resources are each processor's local general-purpose registers, floating-point registers, processor status registers, and control registers. Shared resources may include memory and I/O devices.

Emulation of shared resources must be performed with particular care, because multiple processors may attempt to access these resources during a given period. It is important that the order of accesses to shared resources in the emulating system be an order that could have occurred in the emulated system.

To manage this efficiently, shared resource instructions are further divided into (a) "exclusive instructions," meaning that they access shared resources that are in exclusive use by the executing processor, (b) "shared-read instructions," meaning that the shared resources used by the instruction are only read, not changed, and (c) "communication instructions," which comprise all other shared resource instructions. They are called communication instructions because they are typically used to pass information from one processor to another, as when one processor writes a value and one or more other processors read that value.
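The two-level classification above can be sketched as a small decision function. This is an illustrative sketch only; the type and field names (`insn_t`, `touches_shared`, and so on) are invented here and do not come from the patent:

```c
#include <stdbool.h>

/* Hypothetical instruction descriptor: which resources an
 * instruction touches and how. Field names are illustrative. */
typedef struct {
    bool touches_shared;      /* accesses memory or I/O (shared resources)? */
    bool shared_is_exclusive; /* shared resource held exclusively by this CPU? */
    bool writes_shared;       /* modifies the shared resource? */
} insn_t;

typedef enum {
    LOCAL_RESOURCE, /* registers, status, control -- private to the CPU */
    EXCLUSIVE,      /* shared resource, but in exclusive use */
    SHARED_READ,    /* shared resource, read-only */
    COMMUNICATION   /* all remaining shared-resource instructions */
} insn_class_t;

insn_class_t classify(insn_t i) {
    if (!i.touches_shared)     return LOCAL_RESOURCE;
    if (i.shared_is_exclusive) return EXCLUSIVE;
    if (!i.writes_shared)      return SHARED_READ;
    return COMMUNICATION;
}
```

A register-to-register add would classify as `LOCAL_RESOURCE`, while a store to memory read by another processor would classify as `COMMUNICATION`.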

Figure 3 shows a host system 300 on which the emulation is to be performed. It is similar in physical structure to the target system 100 of Figure 1, although some details may differ.

For example, the instruction set of the host processors may differ from that of the target processors. The number of processors 310A-310D and the memory sizes may also differ. The interconnection network 320 for accessing shared resources may differ in both form and function. We will assume that the host system is configured as an SMP (symmetric multiprocessing) configuration. One implication of this is that all processors 310A-310D in the host will access the same memory, and will have broadly similar latencies when accessing memory locations. Also shown is a host I/O processor 340, similar to the processor 140 shown in Figure 1.

The state of each resource in the target system is modeled by allocating a region of memory in the host system 300. It is assumed here that the emulation is performed in a shared virtual memory operating system environment. This provides a host memory size that can accommodate all the real resources of the target system.

Figure 4 shows the decomposition of the virtual memory (VM) 400 of the host 300 that emulates the various resources of the target system 100. The VM 400 includes shared resources 410 (e.g., target real memory), processor-local resources 420 (e.g., general-purpose registers, floating-point registers, program counter, control registers, etc.), I/O-local resources, and the emulation program's own memory.
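As a sketch, the partitioning of the host VM in Figure 4 might be laid out as the following C structures. The sizes, counts, and field names here are invented for illustration (the patent does not specify a concrete layout), and the register set shown is only a representative subset:

```c
#include <stdint.h>
#include <stddef.h>

#define N_TARGET_CPUS   4
#define TARGET_REAL_MEM (64u * 1024u)  /* toy size for illustration */

/* Per-target-processor local resources (420): illustrative subset. */
typedef struct {
    uint64_t gpr[32];  /* general-purpose registers */
    double   fpr[32];  /* floating-point registers */
    uint64_t pc;       /* program counter */
    uint64_t ctrl[8];  /* control registers */
} cpu_state_t;

/* Host VM image (400) of the target system's resources. */
typedef struct {
    uint8_t     real_mem[TARGET_REAL_MEM]; /* shared resources (410) */
    cpu_state_t cpu[N_TARGET_CPUS];        /* processor-local resources (420) */
    uint8_t     io_local[1024];            /* I/O-local resources */
    /* the emulator's own code and data occupy the rest of the host VM */
} target_image_t;
```

On a 64-bit virtually addressed host, such an image scales to the sizes the text mentions (tens of gigabytes of target real memory, hundreds of processors) without exhausting the address space.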

We note that a 64-bit virtually addressed host can easily accommodate tens or even hundreds of gigabytes of real memory, as well as the local resources of hundreds of processors, while still leaving sufficient addressing capacity for the emulation program itself.

In addition to shared-memory SMP, we will also assume that the operating system supports multithreading. One example of such support is the p-threads package of the Unix operating system (see, e.g., Bradford Nichols et al., "Pthreads Programming: A POSIX Standard for Better Multiprocessing" (O'Reilly Nutshell), September 1996). Such a package allows the creation of multiple program streams that execute in parallel in software, while also allowing variables to be shared safely among these program streams. In addition, these packages typically provide means to spawn new threads, to "kill" (terminate) threads, and to suspend or awaken threads.
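A minimal sketch of this style of use, with standard p-threads calls: one host thread is spawned per emulated processor, and a mutex guards a shared variable. The function names and the placeholder "work" are invented for illustration; a real emulator would run a fetch/decode/execute loop in place of the counter update:

```c
#include <pthread.h>

static int shared_counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each emulated target processor becomes one host thread. */
static void *emulate_cpu(void *arg) {
    long cpu_id = (long)arg;
    /* Placeholder for the emulation loop: safely touch a shared
     * variable, as the thread package permits. */
    pthread_mutex_lock(&lock);
    shared_counter++;
    pthread_mutex_unlock(&lock);
    return (void *)cpu_id;
}

/* Spawn n emulation threads and wait for them all to finish. */
int spawn_and_join(int n) {
    pthread_t tid[16];
    if (n > 16) n = 16;
    shared_counter = 0;
    for (long i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, emulate_cpu, (void *)i);
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}
```

`pthread_create` spawns a thread, `pthread_join` awaits its termination, and the mutex provides the safe sharing of variables described above.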

We note that systems such as those shown in Figures 1-4 exist, and the present invention may illustratively be implemented on them. Indeed, one object of the invention is to implement it without altering the physical hardware of the host system (e.g., without inserting any hardware modifications), so that emulation is performed in accordance with the invention.

Figure 5 shows a thread processing system 500 according to the present invention. The system 500 includes a thread processor 510, a thread generator 520, a thread pool 530, and a thread scheduler 540.

As shown in Figure 5, the thread processor (engine) 510 decides which thread to select from the thread pool 530 for emulation, and thereby processes (schedules) the threads held in the thread pool 530.

In the course of processing threads, the thread processor sometimes determines that new threads must be created. Threads are then created by the thread generator 520 and placed in the thread pool 530. The thread scheduler 540 scans the waiting threads and assigns, in priority order, the next thread to an available processor.
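The generator/pool/scheduler arrangement of Figure 5 might be sketched as follows. The data structures, the fixed pool capacity, and the highest-priority-first rule are assumptions made for illustration; the patent describes the components, not this particular implementation:

```c
#include <stddef.h>

#define POOL_MAX 64

typedef enum { T_WAITING, T_RUNNING, T_DONE } tstate_t;

typedef struct {
    int      id;
    int      priority;  /* larger value = more urgent */
    tstate_t state;
} emu_thread_t;

typedef struct {
    emu_thread_t slot[POOL_MAX];
    int          count;
} thread_pool_t;

/* Thread generator (520): create a thread and place it in the pool (530). */
int pool_add(thread_pool_t *p, int id, int priority) {
    if (p->count >= POOL_MAX) return -1;
    p->slot[p->count++] = (emu_thread_t){ id, priority, T_WAITING };
    return 0;
}

/* Thread scheduler (540): scan the waiting threads and pick the next
 * one, in priority order, for an available host processor. */
emu_thread_t *pool_pick(thread_pool_t *p) {
    emu_thread_t *best = NULL;
    for (int i = 0; i < p->count; i++) {
        emu_thread_t *t = &p->slot[i];
        if (t->state != T_WAITING) continue;
        if (!best || t->priority > best->priority) best = t;
    }
    if (best) best->state = T_RUNNING;
    return best;
}
```

Repeated calls to `pool_pick` hand out the waiting threads from highest to lowest priority, returning `NULL` once no waiting thread remains.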

Execution of a thread on a processor involves reading certain locations in the host virtual memory 400, and modifying those or other locations. The thread processor therefore interacts with the host virtual memory 400, thereby also mapping the guest system's memory into the host's memory. Only the host virtual memory is available to the thread processor, which is why this mapping occurs.

We note that if the model shown in Figure 5 were unavailable, what would happen in a conventional system is that the set of threads would be determined in advance (e.g., assuming one thread per host processor), followed by a one-to-one mapping using the threads available on the host. As noted above, there are many problems with this technique.

The present invention therefore uses the inventive thread processing system 500 to decide what threads need to be created and scheduled.

Given a large thread pool 530, the efficiency of the system improves as the number of processors available to process these threads increases. However, this efficiency may be limited by the number of communication-type instructions as defined in Figure 2.

Even when the entire real memory of the target system is shared, it is often possible to further divide that memory into three categories, corresponding to the three subclasses of shared resource instructions shown in Figure 2.

These regions are (a) "exclusive access regions," (b) "read-only regions," and (c) "other shared regions." Exclusive access regions are those regions that can be accessed by only a single thread. Read-only regions may be accessed by multiple threads, but are never modified. It is therefore permissible to replicate these regions and include a copy as part of a thread's local exclusive region.

Other shared regions must be treated differently. For example, if a copy is made for efficient local access, it is important that changes made by one thread be correctly communicated to all other threads that may be accessing, or will in the future access, that same region.

Figure 6 shows how the use of thread-specific memory 610 can make memory accesses faster. That is, the thread processor 510 can access thread-specific (local) memory with fast access, while the shared portion of the host virtual memory is accessed in a protected access mode.

Thus, for efficiency, the memory can be divided into several (e.g., two) parts. The first part is the part with the least communication, while the second part is the part with substantial communication between threads.

Thus, if some portions of memory are private to each thread, those portions can be made fast-access memory, while the memory of threads that need to "talk" to one another (e.g., that needs to be shared), and that may not need fast access (e.g., because permissions must be checked to determine whether such access is allowed), is formed into a shared portion of the host virtual memory. Hence, by dividing the memory into two parts, a faster overall memory access rate can be achieved.
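The two-part split can be sketched as follows: each thread reads and writes its private region directly (the fast path), while the "other shared" region is reached only under a lock (the protected path). The region sizes and the single-mutex-per-region protection scheme are assumptions for illustration; the patent does not prescribe a specific protection mechanism:

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define LOCAL_SIZE  4096
#define SHARED_SIZE 4096

/* Exclusive-access region: one per thread, no locking needed. */
typedef struct {
    uint8_t mem[LOCAL_SIZE];
} local_region_t;

/* "Other shared" region: every access goes through a lock,
 * modeling the protected access mode of Figure 6. */
typedef struct {
    pthread_mutex_t lock;
    uint8_t mem[SHARED_SIZE];
} shared_region_t;

/* Fast path: private memory, direct access. */
uint8_t local_read(local_region_t *r, size_t off) {
    return r->mem[off];
}

/* Protected path: shared memory, accessed under the lock so that
 * one thread's update is seen consistently by all others. */
uint8_t shared_read(shared_region_t *r, size_t off) {
    pthread_mutex_lock(&r->lock);
    uint8_t v = r->mem[off];
    pthread_mutex_unlock(&r->lock);
    return v;
}

void shared_write(shared_region_t *r, size_t off, uint8_t v) {
    pthread_mutex_lock(&r->lock);
    r->mem[off] = v;
    pthread_mutex_unlock(&r->lock);
}
```

The cost difference between the two paths is exactly the lock traffic, which is why confining most accesses to the exclusive regions improves overall access speed.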

Figure 7 shows a direct mapping 700 of the tasks (functions) of emulating a target multiprocessing system onto the threads of the host system.

Each processor 710A, 710B, 710C, 710D, etc., and its resources are emulated as threads 720A, 720B, 720C, 720D, etc., respectively. Also shown are an I/O processor 730 and an I/O thread 740. Obviously, the invention does not require the processing to be only that associated with processors in the conventional sense; it also covers I/O processors, channels of the IBM 390 system, coprocessors in some systems, and the like.

另外,提供了一个系统线程750,它包括目标系统的所有不是特定于处理器的功能760,以及所有仿真系统本身的功能,包括处理线程创建、线程间通信、调试和性能监视方面的任务。 In addition, the system provides a thread 750, which includes all non-specific target system 760, as well as all the simulation system itself functions in the processor functionality, including thread creation, task communication between threads, debugging and performance monitoring aspects.

我们注意到,图7所示概念可以在单个处理器上实现,其中一个单个处理器处理来自宿主的(多个)线程。 We note that the concept shown in Figure 7 may be implemented on a single processor, wherein a single processing from the host processor (s) of threads. 就是说,使用单个处理器,可以认为该概念是一个多重程序设计系统(multiprogramming system),其中在单个处理器上不断地发生各种线程之间的切换。 That is, using a single processor, it is considered that the concept is a multi-programming system (multiprogramming system), wherein the switching between the various threads continuously occur on a single processor. 仿真系统本身处于包含前述线程软件包的共享存储器SMP操作系统之下。 Simulation of the system itself is under the shared memory contains the aforementioned thread SMP operating system package.
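The Figure 7 arrangement might be sketched roughly as follows. This is a hypothetical Python illustration, not the patent's implementation: each emulated processor and the I/O processor become one host thread each, and a system thread carries the non-processor-specific work, here reduced to creating the other threads and collecting their results:

```python
import threading, queue

results = queue.Queue()

def cpu_thread(name, program):
    # Stand-in for emulating one target processor's instruction stream.
    acc = 0
    for op in program:
        acc += op
    results.put((name, acc))

def io_thread(name):
    # Stand-in for emulating an I/O processor / channel.
    results.put((name, "io-done"))

def system_thread():
    # Non-processor-specific functions: thread creation, monitoring, ...
    workers = [threading.Thread(target=cpu_thread, args=(f"cpu{i}", [1, 2, 3]))
               for i in range(4)]
    workers.append(threading.Thread(target=io_thread, args=("io0",)))
    for w in workers: w.start()
    for w in workers: w.join()

sys_t = threading.Thread(target=system_thread)
sys_t.start(); sys_t.join()
# All workers have been joined, so reading the queue's contents is safe here.
print(sorted(results.queue))
```

On a multiprocessor host, an SMP operating system is free to place each of these threads on its own processor, which is exactly the Figure 8 refinement discussed next.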

Figure 8 shows a multiprocessor system in which the thread package can be written to map each thread 820A, 820B, 820C, 820D, etc., to one of the host processors 810A, 810B, 810C, 810D, etc. (in contrast to the single-processor case described above). Also shown are an I/O thread 840 mapped to a host processor 810E, and a system thread 850 mapped to a host processor 810F.

An advantage of the approach shown in Figure 8 is that the physical communication between the host processors performing the emulation is limited to the communication that occurs between the threads themselves. Since the threads map closely to the structure of the emulated target system, the communication behavior of the host is similar to that of the target system.

However, one drawback that accompanies the approach shown in Figure 8 (e.g., the approach implies a one-to-one relationship between threads and host processors) is that the host system may be underutilized. That is, in a system where each host processor is dedicated to a single thread, if one of the emulated target processors is idle, the corresponding host processor will likewise not be fully utilized.

Another drawback of this technique is its scalability. If the host system has many more processors than the target system, the many extra processors cannot be put to proper use. Conversely, if the host system has fewer processors, a one-to-one mapping of threads to processors can be achieved only by mapping multiple target processors to the same thread.

Figure 9 illustrates a system 900 that provides a solution avoiding some of the problems above, and which includes a host processor cluster 910, a thread scheduler 920, and an emulation thread cluster 930. As will be discussed below, higher efficiency results because the system of Figure 9 takes action to balance the load across the host processors. Indeed, there may be periods during which some processors are completely idle while others are completely overloaded. The system of Figure 9, which includes the thread scheduler 920, serves to smooth (balance) the load. More specifically, the thread scheduler 920 decides when to place which thread where (e.g., on which host processor), thereby optimizing the load balancing.
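A minimal sketch of what a scheduler like 920 might do, under the assumption of a simple greedy "least-loaded processor first" policy (the patent does not prescribe a specific algorithm; all names here are illustrative):

```python
def schedule(thread_costs, num_procs):
    """Greedy least-loaded placement of emulation threads on host processors.

    Returns the per-processor load totals and a thread -> processor map.
    """
    loads = [0] * num_procs
    assignment = {}
    # Placing the heaviest threads first (longest-processing-time-first)
    # generally tightens the greedy balance.
    for tid, cost in sorted(thread_costs.items(), key=lambda kv: -kv[1]):
        proc = loads.index(min(loads))   # currently least-loaded host processor
        assignment[tid] = proc
        loads[proc] += cost
    return loads, assignment

# Four host processors, eight emulation threads with uneven amounts of work.
costs = {"t0": 8, "t1": 7, "t2": 6, "t3": 5, "t4": 4, "t5": 3, "t6": 2, "t7": 1}
loads, assignment = schedule(costs, 4)
print(loads)   # balanced totals rather than some processors idle, some overloaded
```

With these illustrative costs the greedy policy lands every processor at the same total load, which is the "smoothing" the text attributes to the thread scheduler 920.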

Thus, as noted above, as the number of threads increases, the advantage of dynamic thread mapping over static mapping grows. It is therefore possible to restructure the emulation system to provide more parallel threads, rather than the one-thread-per-processor scheme shown in Figure 7.

Figure 10 shows how a more general system 1000 is mapped (emulated) onto a multiprocessing host system. The system 1000 may include a host processor cluster 1010, a thread scheduler 1020, one or more emulation thread clusters 1030, and a plurality of host application threads 1040.

That is, the invention is applicable not only to a multiprocessing host, but also, as shown in Figure 10, to a host similar to the IBM 390 system, which has a cluster configuration comprising several distinct multiprocessors that communicate with one another. Thus, even such a system can be emulated on a multiprocessing host of the type described above.

Furthermore, the system of the invention is not limited to emulation. That is, the system is not restricted to performing emulation; for example, an application (such as Linux) can run directly on the host, in which case the application is not under emulation but runs natively on the host. In this case, the host application threads 1040 can run on the host and be scheduled/managed by the thread scheduler 1020 (which also manages the threads of the emulation thread cluster). The invention is therefore extremely useful not only for emulation, but also for running applications directly on the host.

We note that the above-mentioned article by Herrod on SimOS, and U.S. Patent No. 5,832,205, have pointed out that the performance of an emulation system can be greatly improved by caching, as follows.

That is, if a group of instructions is expected to be executed several times, it is first translated from the target instruction set to the host instruction set, and the result (translation) is stored in a special memory area called the "translation cache". When the address of this instruction group is encountered thereafter, the native host instructions are executed directly. By avoiding re-fetching and re-translating the target instructions, execution of the instruction group is significantly accelerated. Further benefits can be obtained by analyzing the instruction group and optimizing the resulting translation. Some examples of such optimizations can be found in U.S. Patent No. 5,832,205, incorporated herein by reference.

The benefit obtained by caching a translation depends both on the effort needed to translate the instruction group and on the number of times the translated (instruction) group is ultimately executed. Since, for most code, the latter is unpredictable, heuristics are used to decide on potential translation candidates.

Figure 11 shows a method 1100 for performing the above operations. A simple heuristic is to record the number of times a given instruction group has been executed in the past, and to translate the group when that count exceeds a predetermined threshold.

Figure 11 shows a translation table 1110, which is indexed by the address of the instruction group to be executed next.

In step 1120, if there is a valid entry in the translation table 1110 corresponding to this address, the entry points to the location from which the translated native instructions should be read and executed.

If there is no valid entry, as determined in step 1120, the target instructions are interpreted directly (step 1130), and a counter associated with the instruction group is incremented (step 1140).

If the counter exceeds a threshold (step 1150), for example if the group has already been interpreted 5 times, the instruction group is scheduled for translation (step 1160).

If it is determined in step 1120 that the instruction group has already been translated (i.e., the determination is "yes"), then the cached translation of the instruction group is executed in step 1170 by accessing the translation cache 1175.

Then, in step 1180, the next instruction group to be emulated is determined.
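The Figure 11 flow can be sketched as follows. This is a hypothetical Python illustration, not the patented implementation: for simplicity the translation here is performed inline as soon as the counter reaches the threshold, whereas the text schedules it for later, and names such as `translation_table` are invented for the sketch:

```python
THRESHOLD = 5

translation_table = {}   # addr -> cached native translation (1110 / 1175)
counters = {}            # addr -> number of times interpreted so far
log = []                 # record of what happened on each execution

def translate(addr):
    # Stand-in for the target-to-host instruction-group translation.
    return f"native({addr})"

def execute_group(addr):
    cached = translation_table.get(addr)           # step 1120: valid entry?
    if cached is not None:
        log.append(("cached", addr))               # step 1170: run translation
        return cached
    log.append(("interpreted", addr))              # step 1130: interpret
    counters[addr] = counters.get(addr, 0) + 1     # step 1140: bump counter
    if counters[addr] >= THRESHOLD:                # step 1150: over threshold?
        translation_table[addr] = translate(addr)  # step 1160 (inline here)
    return None

for _ in range(7):                 # the same group executed repeatedly
    execute_group(0x400)           # step 1180: next group (always 0x400 here)

interpreted = sum(1 for kind, _ in log if kind == "interpreted")
print(interpreted, len(log) - interpreted)
```

With a threshold of 5, the first five executions are interpreted and the remaining two hit the translation cache, which is exactly the amortization the heuristic is after.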

In the system described in U.S. Patent No. 5,832,205, the emulated thread performs the translation either when the threshold condition is found to be satisfied, or before the next execution of the instruction group. The time the thread spends performing the translation could otherwise have been used to begin executing the next group of instructions, so it represents an overhead of the system.

A more efficient approach is for the thread simply to place the instruction group into a translation pool (e.g., 1190) and continue with the execution of the next instruction group. Then, when the translation has been completed, the translated object is placed in the translation cache, with a pointer to it in the translation table 1110, as shown in Figure 12, discussed below.

Figure 12 illustrates a system 1200 for spawning parallel translation threads.

In Figure 12, another thread, called the translation pool manager (i.e., 1210 in Figure 12), monitors the translation pool (1190 in Figure 11) to pick out the instruction groups that need translation, independently of the execution of the processor threads. Furthermore, the translation pool manager 1210 itself need not perform the translation.

Since the process of translating one group of instructions is largely independent of the process of translating another group, the translation pool manager 1210 can spawn several threads, each of which performs the translation of one group of instructions from the translation pool, as shown in Figure 12.

In Figure 12, the translation pool manager 1210 selects an instruction group from the translation pool 1190 for translation. The translation pool manager 1210 updates the translation table 1110 and, in turn, hands the work to the translation thread scheduler 1220. The translation thread scheduler schedules the translation threads 1230A, 1230B, 1230C, 1230D, etc., which write their results into the translation cache 1240.
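Under the stated assumptions (all names invented for the sketch; the real system translates target instruction groups rather than formatting strings), the pool-manager arrangement of Figure 12 might look like this: execution threads drop addresses into the pool and keep running, while translator worker threads drain the pool and fill the cache asynchronously:

```python
import threading, queue

translation_pool = queue.Queue()     # groups awaiting translation (1190)
translation_cache = {}               # addr -> native translation (1240)
cache_lock = threading.Lock()

def translator_worker():
    while True:
        addr = translation_pool.get()
        if addr is None:             # shutdown sentinel from the manager
            break
        native = f"native({addr})"   # stand-in for the actual translation
        with cache_lock:
            translation_cache[addr] = native
        translation_pool.task_done()

def pool_manager(num_workers=3):
    # Spawn translator threads; the manager itself never translates.
    workers = [threading.Thread(target=translator_worker)
               for _ in range(num_workers)]
    for w in workers: w.start()
    return workers

workers = pool_manager()
for addr in [0x100, 0x200, 0x300, 0x400]:
    translation_pool.put(addr)       # execution threads continue immediately
translation_pool.join()              # wait until every group is translated
for _ in workers:
    translation_pool.put(None)       # tell each worker to exit
for w in workers: w.join()
print(sorted(translation_cache))
```

The executing threads pay only the cost of a `put`; the translation work itself runs on separate threads that a multiprocessing host can place on otherwise idle processors, which is the effect described for system 1200.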

The effect of this system/operation is essentially to divide the emulation task further into independent parallel threads that can be better utilized by the host multiprocessing system. The system of Figure 12 thus takes the translation feature, incorporates it, and maps it onto the framework of the system of the invention. This caching of translations (and the actual performance of the translations) thereby enhances the emulation performed by the system of the invention (and the types of threads it handles).

Figure 13 shows a system 1300 that includes a host processor cluster 1310, a thread scheduler 1320, and an emulation thread cluster 1330. Figure 13 is a modified view of Figure 9, having additional threads that can improve the performance and utilization of the host system. The benefits of these additional threads extend further to a multi-cluster mainframe emulation system such as that shown in Figure 10. Thus, although Figure 13, for the sake of clarity and brevity, shows only one emulation thread cluster by way of example, a plurality of such emulation thread clusters may be provided as shown in Figure 10, together with the host application threads 1040 shown in Figure 10.

In addition to the hardware/software environment described above, a different aspect of the invention includes a computer-implemented method for performing the above method. As an example, the method may be implemented in the particular environment discussed above.

For example, such a method may be implemented by operating a computer, as embodied by a digital data processor of the multiprocessor system, to execute a sequence of machine-readable instructions. These instructions may reside in various types of signal-bearing media.

Thus, this aspect of the invention is directed to a programmed product, comprising a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by one or more digital data processors of a multiprocessing system incorporating the processor/central processing unit (CPU) and hardware described above, to perform the method of the invention.

For example, this signal-bearing medium may comprise RAM contained within the CPU, as represented by a fast-access storage device. Alternatively, the instructions may be contained in another signal-bearing medium, such as a magnetic data storage diskette 1400 (Figure 14) directly or indirectly accessible by the CPU.

Whether contained in the diskette 1400, in the computer/CPU, or elsewhere, the instructions may be stored on a variety of machine-readable data storage media, such as DASD storage (e.g., a conventional "hard drive" or a RAID array), magnetic tape, electronic read-only memory (e.g., ROM, EPROM, or EEPROM), optical storage devices (e.g., CD-ROM, WORM, DVD, digital optical tape, etc.), paper "punch" cards, or other suitable signal-bearing media, including transmission media such as digital and analog communication links and wireless links. In an illustrative embodiment of the invention, the machine-readable instructions comprise software object code, compiled from a language such as C.

Although the invention has been described in terms of several preferred embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.

We note that, as described above, the invention has many benefits, including efficient emulation. In addition, the invention can be used as a basis for "virtualization", which facilitates load balancing. Virtualization can take various forms, including load balancing. For example, virtualization can also be used for fault tolerance: in a system having a one-to-one mapping (or another mapping scheme), if one processor fails, the system can continue to operate, because the failed processor can be abstracted away, leaving a smaller pool of host processors. Another processor can thus take over the duties of the failed processor.

Another possible application of the invention is in energy conservation. That is, when it is determined that too much energy is being consumed in the multiprocessor system, certain processors can be shut down while the functionality of the entire emulation system is maintained at a lower energy level. In this case, the functions of a processor that has been shut down can be transferred to another processor. This is relatively easy to accomplish with the invention, because each processor is treated as a thread, rather than there being a one-to-one mapping of threads to processors.

That is, the scheme of the invention is a many-to-many mapping, rather than the many-to-one mapping of the SimOS technique in the Herrod article mentioned above, or the one-to-one mapping scheme of U.S. Patent No. 5,832,205.

Further, it should be noted that the applicant's intent is to encompass equivalents of all claim elements, including as amended during prosecution of the application.

Classifications
International Classification: G06F9/46, G06F9/50, G06F15/16, G06F9/45, G06F9/30, G06F9/455, G06F9/38
Cooperative Classification: G06F9/4881, G06F9/45537, G06F2209/483
European Classification: G06F9/455H1, G06F9/48C4S
Legal Events
Date         Code  Event Description
21 Apr 2004  C06   Publication
30 Jun 2004  C10   Request of examination as to substance
4 Oct 2006   C14   Granted