US6865662B2 - Controlling VLIW instruction operations supply to functional units using switches based on condition head field - Google Patents

Info

Publication number
US6865662B2
Authority
US
United States
Prior art keywords
conditional
vliw
condition
operations
executed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US10/064,713
Other versions
US20040030860A1 (en
Inventor
Yu-Min Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Novatek Microelectronics Corp
Original Assignee
Faraday Technology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Faraday Technology Corp
Priority to US10/064,713
Assigned to FARADAY TECHNOLOGY GROP.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: WANG, YU-MIN
Publication of US20040030860A1
Application granted
Publication of US6865662B2
Assigned to FARADAY TECHNOLOGY CORP.: REQUEST FOR CORRECTION OF THE ASSIGNEE'S NAME. PREVIOUSLY RECORDED ON REEL 012970 FRAME 0194. Assignors: WANG, YU-MIN
Assigned to NOVATEK MICROELECTRONICS CORP.: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: FARADAY TECHNOLOGY CORP.
Adjusted expiration
Expired - Fee Related

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145: Instruction analysis, e.g. decoding, instruction word fields
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003: Arrangements for executing specific machine instructions
    • G06F9/30072: Arrangements for executing specific machine instructions to perform conditional operations, e.g. using predicates or guards
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885: Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units

Definitions

  • the present invention relates to very long instruction word (VLIW) processors and more particularly to a method and apparatus for a conditional control head instruction for VLIW processors.
  • VLIW very long instruction word
  • ILP instruction level parallelism
  • Each VLIW comprises a plurality of fields, or “slots”.
  • Each slot is designed to comprise a single, basic instruction comprising an operational code (opcode) and any associated operands.
  • the number of slots in the VLIW is typically machine-architecture dependent and depends on the number of functional units (FU), such as arithmetic and logic units (ALU) and floating point unit(s) (FPU) in the machine.
  • FU functional units
  • ALU arithmetic and logic units
  • FPU floating point unit
  • Each slot corresponds to a specific ALU or FPU.
  • the ALU performs operations such as addition, subtraction, and multiplication of integers and bit-wise and other Boolean operations.
  • the FPU performs floating-point operations and due to cost, only one is generally utilized in a CPU.
  • the FUs execute the operation indicated by the opcode in the corresponding slot.
  • one VLIW is transferred from memory to a pipeline during each machine cycle.
  • instruction pipelining is well known in the art and proven extremely effective in increasing throughput when the executing program code merely comprises a series of instructions to be executed sequentially.
  • Each stage in the pipeline performs a dedicated functional step related to the execution of the instruction, such as fetching a value of an operand from memory.
  • During sequential execution, the next instruction to be executed is known and can be transferred to the pipeline one machine cycle after the current instruction was transferred. Therefore, even though each instruction may require several steps to complete, once the pipeline is full, one instruction can be completed with each machine cycle.
  • conditional instructions for example an “if” statement, where it is unknown which specific instruction is to be executed next until after the conditional instruction has been completed. Because the next instruction to be executed is unknown, it is difficult or impossible to keep the pipeline full of sequentially required instructions, resulting in each conditional instruction slowing throughput.
  • FIG. 1 shows a sample program segment with a conditional “if” statement 12 written in the C programming language.
  • FIG. 2 illustrates how a conventional VLIW compiler may generate the assembly code for the program segment in FIG. 1.
  • in FIG. 2, it is not known whether the flow of execution is to be lines 32->34->36->38->40->48 or lines 32->34->42->44->46->48 until after the expression (CMPGT R0, 0) in line 32 has been evaluated. If the condition (CMPGT R0, 0) is true, one set of instructions is needed.
  • If the condition (CMPGT R0, 0) is false, a different set of instructions is needed. This ambiguity is known as the “branch delay” problem. The delay results from possibly having to flush the pipeline and wait for the transfer of the next correct instruction to the pipeline after the instruction in line 12 has been completed, a very undesirable result in high-performance processors.
  • One common method is to attempt to predict the most likely instruction to be needed following a conditional statement based on the history of execution of a particular program. For example, if “R0” has always been greater than “0” before, the prediction made when line 34 is encountered during program execution is that “R0” will again be greater than “0”. Based on this prediction, the instructions in lines 36, 38, 40, and 48 are loaded into the pipeline immediately after the instruction in line 34. If the prediction turns out correct, the branch delay problem has been circumvented.
  • a second approach that has been advanced for the branch delay problem loads all possible sequences of instructions, each sequence corresponding to one possible result of a conditional expression such as indicated in line 34.
  • all of the instructions in lines 36-48 are transferred to the pipeline.
  • Each operation in the VLIW comprises not only the opcode and related operands, but also a flag of one or more bits indicating the operation belongs to one specific possible program branch.
  • the flag with each operation in the lines 36-40 may be equal to a “1”, meaning branch “1”, and the flag with each operation in the lines 42-46 may be set to “0”.
  • the claimed invention discloses a device and method for conditionally delivering instructions enclosed in a VLIW to corresponding functional units according to the results of a prior comparison instruction.
  • the claimed invention includes a VLIW processor for executing a sequence of very long instruction words having a plurality of operations to be executed in parallel.
  • the VLIW processor has a plurality of functional units for parallel execution of the operations specified by the VLIW, an instruction register for holding the VLIW, and a condition flag for indicating the results of a comparison operation.
  • the VLIW includes a conditional head and a plurality of slots, each slot including an operational code and any related operands.
  • the conditional head has a plurality of conditional indicators, bit flags, or bits, each conditional indicator uniquely corresponding to the operation within one of the slots and specifying a condition in which the corresponding operation is to be executed if the indicated condition exists.
  • a control circuit is connected to the instruction register and the functional units to deliver the operation from the instruction register to the corresponding functional unit for execution when the condition specified in the corresponding conditional indicator exists.
  • the condition indicating that the operation is to be executed exists when the corresponding conditional indicator and the condition flag are set to the same value.
  • condition indicating that the operation is to be executed exists when the corresponding conditional indicator and a value other than the condition flag, possibly even user defined, are set to the same value.
  • Such a system by adjusting the value to be used for comparison to an appropriate value for use in selecting the individual operations to be executed, could be used for selecting only specific portions of a program code to be executed, for example, for allowing a single program to function in different, possibly restricted modes in different situations, or for any other purpose.
  • FIG. 1 is a sample program segment written in the C programming language according to the prior art.
  • FIG. 2 illustrates how a conventional very long instruction word compiler may generate the assembly code for the program segment in FIG. 1.
  • FIG. 3 illustrates how a very long instruction word compiler may generate the assembly code for the program segment in FIG. 1 according to the present invention.
  • FIG. 4 illustrates the hardware implementation of the present invention.
  • FIG. 1 is a sample program segment illustrating an “if” statement in the C programming language.
  • FIG. 2 illustrates how a conventional very long instruction word (VLIW) compiler may generate the assembly code for the program segment in FIG. 1.
  • Lines 62 and 64 each correspond to a VLIW and comprise a plurality of fields, or slots, separated by vertical double bars in the image. Each slot comprises a single operation including an operational code (opcode) and any related operands. Note that the jumps in lines 34 and 40 of FIG. 2 are not present in FIG. 3. Because jumps nearly always result in a branch delay, their absence in the code used by the present invention clearly improves throughput.
  • opcode operational code
  • the present invention accomplishes the reduction in jumps through the use of a conditional execution head (CEX) shown in line 62 as “CEX.C.C.NC.NC”.
  • CEX conditional execution head
  • the “.C.C.NC.NC” extension of the CEX indicates the conditions under which each of the four instructions included in the example VLIW of line 62 is to be executed. While four instructions are included in the example VLIW, the maximum number of instructions in one VLIW is usually hardware dependent and the present invention is not limited by the number of instructions in the VLIW.
  • the CEX instructions normally, but not necessarily, follow a comparison operation as shown in line 60 of FIG. 3, where the value in R0 is compared with zero.
  • the results of the comparison are indicated in a flag register by setting a condition flag to “1” (meaning true) or “0” (meaning false).
  • each CEX conditional indicator (also known as a bit flag), a “.C” for example, directly corresponds to one of the slots and is compared with the condition flag.
  • the conditional indicator “.C” means to execute the corresponding operation if and only if the condition flag is set to “1”.
  • A conditional indicator of “.NC” means to execute the corresponding operation if and only if the condition flag is set to “0”.
  • a CEX.C.C.NC.NC conditional execution header indicates that the first and second operations specified in the VLIW are to be executed while the third and fourth specified operations are not to be executed.
  • using the conditional execution head of the present invention reduces the number of short jumps and keeps the pipeline full, improving throughput. Additionally, by using the conditional execution head, there is no need to increase the size of each instruction and bloat the code as is done in the prior art.
  • FIG. 4 illustrates the basic hardware of a VLIW processor according to the present invention.
  • the VLIW processor 73 comprises an instruction register 79 for holding a VLIW.
  • the VLIW comprises a conditional execution head (CEX) 80 and a plurality of slots 82, 84, 86, and 88, each holding an operation to be executed in parallel.
  • the VLIW processor 73 further comprises a plurality of functional units (FU) 83, 85, 87, and 89 for executing the operations specified in the VLIW and a plurality of switches 92, 94, 96, and 98.
  • Each switch 92, 94, 96, and 98 is uniquely disposed between the location of one slot 82, 84, 86, and 88 in the VLIW and the corresponding FU 83, 85, 87, and 89 for permitting or not permitting delivery of the specified operation from the slot 82, 84, 86, and 88 to the FU 83, 85, 87, and 89 as shown in FIG. 4.
  • Also comprised by the VLIW processor 73 are a control circuit 75 for controlling the switches 92, 94, 96, and 98 and a flag register 77 comprising a condition flag 76 for holding the result of a comparison operation.
  • the switches 92, 94, 96, and 98 may be multiplexers, transistors, or any other device designed to controllably allow or halt the passage of an electrical signal and all such devices are intended to fall within the scope of the present invention. Additionally, the quantities of slots 82, 84, 86, and 88, FUs 83, 85, 87, and 89, and switches 92, 94, 96, and 98 are not limited by the present invention to being exactly four as disclosed in FIG. 4, but could be of any quantity. The exact quantity in any implementation of the present invention normally will depend on the number of FUs 83, 85, 87, and 89 comprised in the implementation.
  • a VLIW is loaded into the instruction register 79 for execution.
  • the control circuit 75, possibly a multiplexer or a plurality of comparators or transistors, compares the condition flag 76 in the flag register 77 with each of the conditional indicators 80a, 80b, 80c, and 80d in the CEX 80.
  • the conditional indicators 80a, 80b, 80c, and 80d respectively correspond to the operations located in slots 82, 84, 86, and 88 of the VLIW.
  • condition flag 76 and one or more of the conditional indicators 80a, 80b, 80c, and 80d are set to the same value, each operation whose conditional indicator 80a, 80b, 80c, or 80d matches the condition flag 76 is to be executed.
  • conditional indicators are “.C.C.NC.NC” which can be translated into the bits “1100” and assume that the comparison operation in line 60 has set the condition flag 76 to “1”.
  • the control circuit 75 compares the condition flag 76 (a “1”) with the conditional indicator 80a (a “C” which equals a “1”). Since the condition flag 76 is set to the same value as the conditional indicator 80a, the control circuit opens the corresponding switch 92 allowing delivery of the relevant operation from the slot 82 to the FU 83 for execution.
  • control circuit compares the condition flag 76 with the remaining conditional indicators 80b, 80c, and 80d and controls the corresponding switches 94, 96, 98 based on the respective comparisons. Therefore, in this example, if the value in R0 is greater than zero (line 60), the operations in slots 82 and 84 are executed and the operations in slots 86 and 88 are not executed.
  • Line 64 has a second CEX VLIW with the conditional indicators “.C.NC”. Thus, in line 64, the operation in slot 82 is executed and the operation in slot 84 is not executed.
  • conditional indicators 80a, 80b, 80c, and 80d are not matched to the condition flag 76 to indicate the desired execution of a specific operation. Instead, a different flag, register, or value, or even a combination of these, is used for comparison with the conditional indicators 80a, 80b, 80c, and 80d.
  • Such a system by adjusting the specific flag, register, or value appropriately, could be used for selecting only specific portions of a program code to be executed, for debugging, for allowing a single program to function in different, possibly restricted modes in different situations, or for any other purpose.
  • the control circuit of this embodiment would require the input of the flag, register, or value to be used for comparison with the conditional indicators 80a, 80b, 80c, and
  • the present invention can reduce the number of short jumps in program execution to reduce branch delays in a VLIW processor.
  • the present invention achieves reducing the branch delays without requiring additional bits stored in each operation in a VLIW, avoiding bloating the program code. Additionally, the present invention provides for individual control over the execution of each operation. The reduction of branch delays caused by short jumps during program execution is an important factor in maximizing throughput in a high-performance VLIW processor and is facilitated by the present invention.

Abstract

A VLIW processor for executing a sequence of very long instruction words having a plurality of operations to be executed in parallel. The VLIW processor has a plurality of functional units for parallel execution of the operations specified by the VLIW, an instruction register for holding the VLIW, and a condition flag for indicating the results of a comparison operation. The VLIW includes a conditional head and a plurality of slots, each slot including an operational code and any related operands. The conditional head has a plurality of conditional indicators, each conditional indicator uniquely corresponding to one operation and specifying a condition in which the operation is to be executed if the indicated condition exists. A control circuit is connected to the instruction register and the functional units to deliver the operation from the instruction register to the corresponding functional unit for execution when the condition exists.

Description

BACKGROUND OF INVENTION
1. Field of the Invention
The present invention relates to very long instruction word (VLIW) processors and more particularly to a method and apparatus for a conditional control head instruction for VLIW processors.
2. Description of the Prior Art
Increasing demand for computer-processing performance has been met, at least in part, with computers that are able to employ instruction level parallelism (ILP), meaning that the computer can execute a plurality of instructions simultaneously. A very long instruction word (VLIW) processor achieves this using very long instruction words. VLIW processors are employed in super-computers, mainframes, and many other applications where high-performance processing power is required.
Each VLIW comprises a plurality of fields, or “slots”. Each slot is designed to comprise a single, basic instruction comprising an operational code (opcode) and any associated operands. The number of slots in the VLIW is typically machine-architecture dependent and depends on the number of functional units (FU), such as arithmetic and logic units (ALU) and floating point unit(s) (FPU) in the machine. Each slot corresponds to a specific ALU or FPU. The ALU performs operations such as addition, subtraction, and multiplication of integers and bit-wise and other Boolean operations. The FPU performs floating-point operations and, due to cost, only one is generally utilized in a CPU. When the VLIW is executed, the FUs execute the operation indicated by the opcode in the corresponding slot.
In a conventional VLIW processor, one VLIW is transferred from memory to a pipeline during each machine cycle. The use of instruction pipelining is well known in the art and proven extremely effective in increasing throughput when the executing program code merely comprises a series of instructions to be executed sequentially. Each stage in the pipeline performs a dedicated functional step related to the execution of the instruction, such as fetching a value of an operand from memory. During sequential execution, the next instruction to be executed is known and can be transferred to the pipeline one machine cycle after the current instruction was transferred. Therefore, even though each instruction may require several steps to complete, once the pipeline is full, one instruction can be completed with each machine cycle.
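The throughput claim in the preceding paragraph can be checked with a little arithmetic. The sketch below assumes an idealized pipeline with no stalls in which every stage takes one machine cycle; the function names and the example depth of five stages are illustrative assumptions and do not come from the patent.

```python
def pipelined_cycles(instructions: int, stages: int) -> int:
    """Cycles needed to finish `instructions` on an ideal pipeline of `stages` steps."""
    if instructions == 0:
        return 0
    # The first instruction needs `stages` cycles to travel through the pipeline;
    # once the pipeline is full, one further instruction completes every cycle.
    return stages + (instructions - 1)


def unpipelined_cycles(instructions: int, stages: int) -> int:
    """Cycles needed when each instruction must finish before the next one starts."""
    return instructions * stages
```

With a hypothetical 100 sequential instructions and five stages, `pipelined_cycles(100, 5)` gives 104 cycles against 500 for `unpipelined_cycles(100, 5)`, which is the sense in which roughly one instruction completes per machine cycle.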
However, most program code also comprises conditional instructions, for example an “if” statement, where it is unknown which specific instruction is to be executed next until after the conditional instruction has been completed. Because the next instruction to be executed is unknown, it is difficult or impossible to keep the pipeline full of sequentially required instructions, resulting in each conditional instruction slowing throughput.
Please refer to FIG. 1, which shows a sample program segment with a conditional “if” statement 12 written in the C programming language. FIG. 2 illustrates how a conventional VLIW compiler may generate the assembly code for the program segment in FIG. 1. Note the use of multiple slots in lines 36 and 44, where a double vertical line indicates a slot boundary. In FIG. 2, it is not known whether the flow of execution is to be lines 32->34->36->38->40->48 or lines 32->34->42->44->46->48 until after the expression (CMPGT R0, 0) in line 32 has been evaluated. If the condition (CMPGT R0, 0) is true, one set of instructions is needed. If the condition (CMPGT R0, 0) is false, a different set of instructions is needed. This ambiguity is known as the “branch delay” problem. The delay results from possibly having to flush the pipeline and wait for the transfer of the next correct instruction to the pipeline after the instruction in line 12 has been completed, a very undesirable result in high-performance processors.
Different approaches to the branch delay problem have been advanced. One common method is to attempt to predict the most likely instruction to be needed following a conditional statement based on the history of execution of a particular program. For example, if “R0” has always been greater than “0” before, the prediction made when line 34 is encountered during program execution is that “R0” will again be greater than “0”. Based on this prediction, the instructions in lines 36, 38, 40, and 48 are loaded into the pipeline immediately after the instruction in line 34. If the prediction turns out to be correct, the branch delay problem has been circumvented. However, if the prediction turns out to be incorrect, the pipeline must be flushed to clear unwanted instructions, the correct instructions in lines 42, 44, 46, and 48 must be transferred to the flushed pipeline, and time is wasted waiting for the correct instructions to work their way through the pipeline.
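The cost of a wrong prediction can be made concrete with a small simulation. This is only a sketch of one simple history-based scheme (predict that the last outcome repeats); the patent does not specify a particular predictor, and the function name, the initial guess, and the flush penalty of three cycles are all assumptions chosen for illustration.

```python
def run_with_prediction(outcomes, flush_penalty=3):
    """Return (mispredictions, wasted_cycles) for a last-outcome predictor.

    `outcomes` is the sequence of actual results of one conditional
    (True means the condition held). `flush_penalty` is a hypothetical
    number of cycles lost each time the pipeline has to be flushed.
    """
    predicted = True  # assumed initial guess before any history exists
    mispredictions = 0
    for actual in outcomes:
        if predicted != actual:
            mispredictions += 1  # wrong path was loaded: flush and refill
        predicted = actual       # predict that history repeats
    return mispredictions, mispredictions * flush_penalty
```

A condition that always holds costs nothing under this model, while one that alternates, such as `[True, False, True, False]`, is mispredicted three times and wastes nine cycles.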
A second approach that has been advanced for the branch delay problem loads all possible sequences of instructions, each sequence corresponding to one possible result of a conditional expression such as indicated in line 34. Thus, in this method, all of the instructions in lines 36-48 are transferred to the pipeline. Each operation in the VLIW comprises not only the opcode and related operands, but also a flag of one or more bits indicating that the operation belongs to one specific possible program branch. Using FIG. 2 as an example of this approach, the flag with each operation in the lines 36-40 may be equal to a “1”, meaning branch “1”, and the flag with each operation in the lines 42-46 may be set to “0”. If the condition (CMPGT R0, 0) in line 32 turns out to be true, only the instructions in slots that have a flag equal to “1” will be executed. If the condition (CMPGT R0, 0) in line 32 turns out to be false, only the instructions in slots that have a flag equal to “0” will be executed. While this second approach helps to keep the pipeline full, it requires that each slot in each VLIW include extra room for the flag bits. The number of bits required depends on the number of program branches possible at any given time during program execution.
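The flag-per-operation scheme described above can be sketched as follows. The class, the function names, and the example operations are hypothetical and exist only to illustrate the idea; they are not taken from FIG. 2 or from any real instruction set.

```python
from dataclasses import dataclass


@dataclass
class FlaggedOp:
    text: str         # opcode and operands (illustrative strings only)
    branch_flag: int  # which possible program branch the operation belongs to


def ops_to_execute(ops, condition_result: bool):
    """Keep only the operations whose flag matches the comparison result."""
    wanted = 1 if condition_result else 0
    return [op.text for op in ops if op.branch_flag == wanted]


def flag_overhead_bits(num_operations: int, bits_per_flag: int = 1) -> int:
    """Extra storage the scheme costs: one flag field in every operation."""
    return num_operations * bits_per_flag
```

The second function shows where the code growth comes from: the overhead scales with the number of operations in the program, and with the flag width whenever more than two branches must be distinguished.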
Therefore, the prior art still lacks a solution to the branch delay problem. The predictive approach works sporadically and the flag bits approach requires additional bits to be stored with each operation in each VLIW instruction, bloating program size.
SUMMARY OF INVENTION
It is therefore a primary objective of the claimed invention to reduce the number of short jumps in program execution to reduce branch delays in a very long instruction word (VLIW) processor.
It is another objective of the claimed invention to avoid increasing the number of bits required to store a specification of an operation in a slot of the VLIW while conditionally controlling execution of each operation.
It is another objective of the claimed invention to allow each instruction in a VLIW to be individually controlled.
Briefly summarized, the claimed invention discloses a device and method for conditionally delivering instructions enclosed in a VLIW to corresponding functional units according to the results of a prior comparison instruction.
The claimed invention includes a VLIW processor for executing a sequence of very long instruction words having a plurality of operations to be executed in parallel. The VLIW processor has a plurality of functional units for parallel execution of the operations specified by the VLIW, an instruction register for holding the VLIW, and a condition flag for indicating the results of a comparison operation. The VLIW includes a conditional head and a plurality of slots, each slot including an operational code and any related operands. The conditional head has a plurality of conditional indicators, bit flags, or bits, each conditional indicator uniquely corresponding to the operation within one of the slots and specifying a condition in which the corresponding operation is to be executed if the indicated condition exists.
A control circuit is connected to the instruction register and the functional units to deliver the operation from the instruction register to the corresponding functional unit for execution when the condition specified in the corresponding conditional indicator exists. In one example of the claimed invention, the condition indicating that the operation is to be executed exists when the corresponding conditional indicator and the condition flag are set to the same value.
In another example of the claimed invention, the condition indicating that the operation is to be executed exists when the corresponding conditional indicator and a value other than the condition flag, possibly even user defined, are set to the same value. Such a system, by adjusting the value to be used for comparison to an appropriate value for use in selecting the individual operations to be executed, could be used for selecting only specific portions of a program code to be executed, for example, for allowing a single program to function in different, possibly restricted modes in different situations, or for any other purpose.
It is an advantage of the claimed invention that by allowing individually controlled conditional execution of the operations specified in the VLIW, the number of short jumps during program execution can be reduced, improving throughput.
These and other objectives of the claimed invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment, which is illustrated in the various figures and drawings.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a sample program segment written in the C programming language according to the prior art.
FIG. 2 illustrates how a conventional very long instruction word compiler may generate the assembly code for the program segment in FIG. 1.
FIG. 3 illustrates how a very long instruction word compiler may generate the assembly code for the program segment in FIG. 1 according to the present invention.
FIG. 4 illustrates the hardware implementation of the present invention.
DETAILED DESCRIPTION
Please refer to FIGS. 1-3. FIG. 1 is a sample program segment illustrating an “if” statement in the C programming language. FIG. 2 illustrates how a conventional very long instruction word (VLIW) compiler may generate the assembly code for the program segment in FIG. 1. FIG. 3 illustrates how a very long instruction word compiler may generate the assembly code for the program segment in FIG. 1 according to the present invention.
Lines 62 and 64 each correspond to a VLIW and comprise a plurality of fields, or slots separated by vertical double bars in the image. Each slot comprises a single operation including an operational code (opcode) and any related operands. Note that the jumps in lines 34 and 40 of FIG. 2 are not present in FIG. 3. Because jumps nearly always result in a branch delay, their absence in the code used by the present invention clearly improves throughput.
The present invention accomplishes the reduction in jumps through the use of a conditional execution head (CEX) shown in line 62 as “CEX.C.C.NC.NC”. The “.C.C.NC.NC” extension of the CEX indicates the conditions under which each of the four instructions included in the example VLIW of line 62 is to be executed. While four instructions are included in the example VLIW, the maximum number of instructions in one VLIW is usually hardware dependent and the present invention is not limited by the number of instructions in the VLIW.
The CEX instructions normally, but not necessarily, follow a comparison operation as shown in line 60 of FIG. 3, where the value in R0 is compared with zero. As in prior art VLIW processors, when a comparison operation is performed, the results of the comparison are indicated in a flag register by setting a condition flag to “1” (meaning true) or “0” (meaning false).
When the VLIW shown in line 62 is loaded into an instruction register, each CEX conditional indicator (also known as a bit flag), a “.C” for example, directly corresponds to one of the slots and is compared with the condition flag. The conditional indicator “.C” means to execute the corresponding operation if and only if the condition flag is set to “1”. A conditional indicator of “.NC” means to execute the corresponding operation if and only if the condition flag is set to “0”. Thus, if the result of the comparison in line 60 is true and the condition flag is set to “1”, a CEX.C.C.NC.NC conditional execution head indicates that the first and second operations specified in the VLIW are to be executed while the third and fourth specified operations are not to be executed. Consequently, using the conditional execution head of the present invention reduces the number of short jumps and keeps the pipeline full, improving throughput. Additionally, by using the conditional execution head, there is no need to increase the size of each instruction and thereby bloat the code, as is done in the prior art.
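The indicator semantics described above can be sketched in a few lines of Python. This is only an illustrative software model and not part of the disclosed hardware; the function names `parse_cex` and `slots_to_execute`, and the idea of passing the head as a string, are assumptions made for the example.

```python
def parse_cex(head: str) -> list[int]:
    """Translate a head such as "CEX.C.C.NC.NC" into indicator bits.

    ".C" maps to 1 (execute when the condition flag is 1) and
    ".NC" maps to 0 (execute when the condition flag is 0).
    """
    name, *indicators = head.split(".")
    if name != "CEX":
        raise ValueError(f"not a conditional execution head: {head!r}")
    mapping = {"C": 1, "NC": 0}
    return [mapping[ind] for ind in indicators]


def slots_to_execute(head: str, condition_flag: int) -> list[bool]:
    """Return, for each slot, whether its operation is to be executed."""
    return [bit == condition_flag for bit in parse_cex(head)]


print(slots_to_execute("CEX.C.C.NC.NC", 1))  # [True, True, False, False]
print(slots_to_execute("CEX.C.C.NC.NC", 0))  # [False, False, True, True]
```

With the condition flag at “1” the first two slots are selected, and with the flag at “0” the last two are selected, so both arms of the “if” statement can share one VLIW without a jump.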
FIG. 4 illustrates the basic hardware of a VLIW processor according to the present invention. The VLIW processor 73 comprises an instruction register 79 for holding a VLIW. The VLIW comprises a conditional execution head (CEX) 80 and a plurality of slots 82, 84, 86, and 88, each holding an operation to be executed in parallel. The VLIW processor 73 further comprises a plurality of functional units (FU) 83, 85, 87, and 89 for executing the operations specified in the VLIW and a plurality of switches 92, 94, 96, and 98. Each switch 92, 94, 96, and 98 is uniquely disposed between the location of one slot 82, 84, 86, and 88 in the VLIW and the corresponding FU 83, 85, 87, and 89 for permitting or not permitting delivery of the specified operation from the slot 82, 84, 86, and 88 to the FU 83, 85, 87, and 89 as shown in FIG. 4. Also comprised by the VLIW processor 73 are a control circuit 75 for controlling the switches 92, 94, 96, and 98 and a flag register 77 comprising a condition flag 76 for holding the result of a comparison operation.
The switches 92, 94, 96, and 98 may be multiplexers, transistors, or any other device designed to controllably allow or halt the passage of an electrical signal and all such devices are intended to fall within the scope of the present invention. Additionally, the quantities of slots 82, 84, 86, and 88, FUs 83, 85, 87, and 89, and switches 92, 94, 96, and 98 are not limited by the present invention to being exactly four as disclosed in FIG. 4, but could be of any quantity. The exact quantity in any implementation of the present invention normally will depend on the number of FUs 83, 85, 87, and 89 comprised in the implementation.
A VLIW is loaded into the instruction register 79 for execution. The control circuit 75, possibly a multiplexer or a plurality of comparators or transistors, compares the condition flag 76 in the flag register 77 with each of the conditional indicators 80 a, 80 b, 80 c, and 80 d in the CEX 80. In this example, the conditional indicators 80 a, 80 b, 80 c, and 80 d respectively correspond to the operations located in slots 82, 84, 86, and 88 of the VLIW. If the condition flag 76 and one or more of the conditional indicators 80 a, 80 b, 80 c, and 80 d are set to the same value, each operation whose conditional indicator 80 a, 80 b, 80 c, or 80 d matches the condition flag 76 is to be executed.
Please refer to the CEX VLIW in line 62 of FIG. 3. Here, the conditional indicators are “.C.C.NC.NC”, which can be translated into the bits “1100”. Assume that the comparison operation in line 60 has set the condition flag 76 to “1”. The control circuit 75 compares the condition flag 76 (a “1”) with the conditional indicator 80 a (a “.C”, which equals a “1”). Since the condition flag 76 is set to the same value as the conditional indicator 80 a, the control circuit 75 opens the corresponding switch 92, allowing delivery of the relevant operation from the slot 82 to the FU 83 for execution. Similarly, the control circuit 75 compares the condition flag 76 with the remaining conditional indicators 80 b, 80 c, and 80 d and controls the corresponding switches 94, 96, and 98 based on the respective comparisons. Therefore, in this example, if the value in R0 is greater than zero (line 60), the operations in slots 82 and 84 are executed and the operations in slots 86 and 88 are not executed. Line 64 has a second CEX VLIW with the conditional indicators “.C.NC”. Thus, in line 64, the operation in slot 82 is executed and the operation in slot 84 is not executed.
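The walk-through above can be mirrored by a small simulation of the FIG. 4 data path. The sketch below is a simplified model under stated assumptions: the class name `VLIWProcessor`, its methods, and the placeholder operation names (`op_a` through `op_f`) are hypothetical and do not reproduce the actual operations of FIG. 3. A switch is treated as “open” when it delivers its operation, following the usage in the description.

```python
from dataclasses import dataclass, field


@dataclass
class VLIWProcessor:
    """Toy model of FIG. 4: one switch and one functional unit per slot."""
    condition_flag: int = 0                          # models condition flag 76
    executed: list[str] = field(default_factory=list)

    def control_circuit(self, indicators: list[int]) -> list[bool]:
        # A switch is "open" (delivers its operation) when the
        # conditional indicator equals the condition flag.
        return [bit == self.condition_flag for bit in indicators]

    def issue(self, indicators: list[int], slots: list[str]) -> list[str]:
        if len(indicators) != len(slots):
            raise ValueError("one conditional indicator per slot is required")
        switches = self.control_circuit(indicators)
        # Only operations behind open switches reach a functional unit.
        delivered = [op for op, is_open in zip(slots, switches) if is_open]
        self.executed.extend(delivered)
        return delivered


cpu = VLIWProcessor(condition_flag=1)
print(cpu.issue([1, 1, 0, 0], ["op_a", "op_b", "op_c", "op_d"]))  # ['op_a', 'op_b']
print(cpu.issue([1, 0], ["op_e", "op_f"]))                        # ['op_e']
```

The two calls correspond to the “1100” head of line 62 and the “.C.NC” head of line 64 with the condition flag set to “1”. The model accepts any number of slots, consistent with the statement that the quantity is not limited to four.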
Another embodiment of the present invention functions similarly to the above description except that the conditional indicators 80 a, 80 b, 80 c, and 80 d are not matched to the condition flag 76 to indicate the desired execution of a specific operation. Instead, a different flag, register, or value, or even a combination of these, is used for comparison with the conditional indicators 80 a, 80 b, 80 c, and 80 d. Such a system, by adjusting the specific flag, register, or value appropriately, could be used for selecting only specific portions of a program code to be executed, for debugging, for allowing a single program to function in different, possibly restricted modes in different situations, or for any other purpose. Obviously, the control circuit of this embodiment would require the input of the flag, register, or value to be used for comparison with the conditional indicators 80 a, 80 b, 80 c, and 80 d.
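One way to picture this alternative embodiment is a control function whose comparison source is supplied from outside. The following Python sketch is an assumption-laden illustration only: `make_control_circuit`, the `state` dictionary, and the `debug_mode` entry are hypothetical names that do not appear in the disclosure.

```python
from typing import Callable


def make_control_circuit(selector: Callable[[], int]) -> Callable[[list[int]], list[bool]]:
    """Build a control function that compares indicators with selector()."""
    def control(indicators: list[int]) -> list[bool]:
        value = selector()
        return [bit == value for bit in indicators]
    return control


# Hypothetical processor state; neither key name comes from the patent.
state = {"condition_flag": 1, "debug_mode": 0}

by_flag = make_control_circuit(lambda: state["condition_flag"])
by_mode = make_control_circuit(lambda: state["debug_mode"])
# A combination of sources: 1 only when the flag is set and debug mode is on.
by_both = make_control_circuit(lambda: state["condition_flag"] & state["debug_mode"])

indicators = [1, 1, 0, 0]
print(by_flag(indicators))  # [True, True, False, False]
print(by_mode(indicators))  # [False, False, True, True]
print(by_both(indicators))  # [False, False, True, True]
```

Because the selector is read each time a VLIW is issued, changing the chosen flag, register, or value changes which operations are delivered without altering the program code.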
In contrast to the prior art, the present invention can reduce the number of short jumps in program execution, thereby reducing branch delays in a VLIW processor. The present invention achieves this reduction without requiring additional bits to be stored in each operation in a VLIW, which avoids bloating the program code. Additionally, the present invention provides for individual control over the execution of each operation. The reduction of branch delays caused by short jumps during program execution is an important factor in maximizing throughput in a high-performance VLIW processor and is facilitated by the present invention.
Those skilled in the art will readily observe that numerous modifications and alterations of the method and device may be made while retaining the teachings of the invention. For example, it is obvious that it does not have to be the condition flag that effectively selects the operations to be executed; any kind of indicator, a variable for example, could be used instead. This useful feature of the present invention is also lacking in the prior art and further distinguishes the present invention from the prior art. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.

Claims (13)

1. A very long instruction word (VLIW) processor for executing a very long instruction word, the VLIW having a conditional execution head (CEX) and a plurality of operations, the VLIW processor comprising:
a plurality of functional units for executing operations in parallel;
an instruction register having a plurality of slots for holding the conditional execution head and the operations, the conditional execution head comprising a plurality of conditional indicators each specifying a condition under which a corresponding operation comprised in the VLIW is to be executed;
a plurality of switches each connected to a slot and a functional unit for controlling delivery of an operation from the slot to the functional unit for execution; and
a control circuit connected to the instruction register and each switch, the control circuit being capable of opening or closing the switch according to the condition specified in the corresponding conditional indicator;
wherein functional units which are connected to opened switches will receive operations from slots connected to the switches, functional units which are connected to closed switches will be prohibited from receiving operations, and operations which are received by functional units will be executed in parallel.
2. The VLIW processor of claim 1 wherein each conditional indicator uniquely corresponds to only one operation comprised in the VLIW and each operation uniquely corresponds to only one conditional indicator comprised in the conditional execution head.
3. The VLIW processor of claim 1 wherein the operation comprises an operational code specifying the operation to be executed by the functional unit.
4. The VLIW processor of claim 1 further comprises a condition flag indicating the results of a comparison operation.
5. The VLIW processor of claim 4 wherein the conditional indicator is a single bit.
6. The VLIW processor of claim 5 wherein the condition to execute the operation is met when the conditional indicator and the condition flag are set to the same value.
7. A very long instruction word (VLIW) processor having a plurality of functional units for executing a very long instruction word, the VLIW comprising a conditional execution head (CEX) and a plurality of operations, the maximum number of operations to be executed being equal to the number of functional units, the VLIW processor comprising:
an instruction register having a plurality of slots for holding the conditional execution head and the operations, the conditional execution head comprising a plurality of conditional indicators, the number of conditional indicators being equal to the number of operations comprised in the VLIW and each conditional indicator uniquely corresponding to only one operation, each conditional indicator indicating a condition under which the corresponding operation is to be executed;
a plurality of switches for controlling the delivery of the operation from the instruction register to the functional unit, each switch connecting only one slot and only one functional unit in a one-to-one correspondence; and
a control circuit connected to the instruction register and the switch, the control circuit capable of opening the switch when the condition indicating the corresponding operation is to be executed is met and for closing the switch when the condition indicating the corresponding operation is to be executed is not met.
8. The VLIW processor of claim 7 wherein the operation comprises an operational code specifying the operation to be executed by the functional unit.
9. The VLIW processor of claim 7 further comprises a condition flag indicating the results of a comparison operation.
10. The VLIW processor of claim 9 wherein the condition to execute the operation is met when the conditional indicator and the condition flag are set to the same value.
11. A method of program execution by a very long instruction word (VLIW) processor, the VLIW comprising a conditional execution head (CEX) and a plurality of operations, the conditional head comprising a plurality of conditional indicators, each conditional indicator corresponding to a predetermined operation and for specifying a condition when the operation is to be executed, the VLIW processor comprising an instruction register having a plurality of slots for holding the conditional execution head and the operations, a plurality of functional units for executing operations in parallel, and a plurality of switches, each switch in a one-to-one correspondence with and connected between only one slot and only one functional unit, the switch for controlling delivery of the operation to the functional unit, the method comprising:
opening the switch when the condition specified by the conditional indicator is met and closing the switch when the condition specified by the conditional indicator is not met.
12. The method of claim 11 wherein the VLIW processor further comprises a condition flag indicating the results of a comparison operation.
13. The method of claim 12 wherein the condition to execute the operation is met when the conditional indicator and the condition flag are set to the same value.
US10/064,713 2002-08-08 2002-08-08 Controlling VLIW instruction operations supply to functional units using switches based on condition head field Expired - Fee Related US6865662B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/064,713 US6865662B2 (en) 2002-08-08 2002-08-08 Controlling VLIW instruction operations supply to functional units using switches based on condition head field

Publications (2)

Publication Number Publication Date
US20040030860A1 US20040030860A1 (en) 2004-02-12
US6865662B2 true US6865662B2 (en) 2005-03-08

Family

ID=31493946

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/064,713 Expired - Fee Related US6865662B2 (en) 2002-08-08 2002-08-08 Controlling VLIW instruction operations supply to functional units using switches based on condition head field

Country Status (1)

Country Link
US (1) US6865662B2 (en)

Also Published As

Publication number Publication date
US20040030860A1 (en) 2004-02-12

Legal Events

Date Code Title Description
AS Assignment

Owner name: FARADAY TECHNOLOGY GROP., TAIWAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:WANG, YU-MIN;REEL/FRAME:012970/0194

Effective date: 20020806

AS Assignment

Owner name: FARADAY TECHNOLOGY CORP., TAIWAN

Free format text: REQUEST FOR CORRECTION OF THE ASSIGNEE'S NAME. PREVIOUSLY RECORDED ON REEL 012970 FRAME 0194.;ASSIGNOR:WANG, YU-MIN;REEL/FRAME:015929/0948

Effective date: 20020806

FPAY Fee payment

Year of fee payment: 4

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20130308

AS Assignment

Owner name: NOVATEK MICROELECTRONICS CORP., TAIWAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:FARADAY TECHNOLOGY CORP.;REEL/FRAME:041174/0755

Effective date: 20170117