WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Hardware predication for conditional instruction path branching    
United States Patent6754812   
Link to this pagehttp://www.wikipatents.com/6754812.html
Inventor(s)Abdallah; Mohammad A. (Folsom, CA); Al-Dajani; Khalid D. (Orangevale, CA)
AbstractAn instruction associated with a condition is executed when the condition is resolved. In executing the instruction, a first operation designated by the instruction is performed to produce a first result, and a second operation is performed to produce a second result. The first result or the second result is output based on how the condition is resolved.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 6754812
Hardware predication for conditional instruction path branching - US Patent 6754812 Drawing
Hardware predication for conditional instruction path branching
Inventor     Abdallah; Mohammad A. (Folsom, CA); Al-Dajani; Khalid D. (Orangevale, CA)
Owner/Assignee     Intel Corporation (Santa Clara, CA)
Patent assignment
All assignments
Publication Date     June 22, 2004
Application Number     09/610,895
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     July 6, 2000
US Classification     712/234
Int'l Classification     G06F 009/38
Examiner     Treat; William M.
Assistant Examiner    
Attorney/Law Firm     Blakely, Sokoloff, Taylor & Zafman LLP
Address
Parent Case    
Priority Data    
USPTO Field of Search     712/234
Patent Tags     hardware predication conditional instruction path branching
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
6170052
Morrison
712/236
Jan,2001

[0 after 0 votes]
6076153
Grochowski

Jun,2000

[0 after 0 votes]
6047370
Grochowski

Apr,2000

[0 after 0 votes]
6044221
Gupta

Mar,2000

[0 after 0 votes]
6021487
Maliszewski
712/221
Feb,2000

[0 after 0 votes]
5999736
Gupta
717/158
Dec,1999

[0 after 0 votes]
5909573
Sheaffer
712/240
Jun,1999

[0 after 0 votes]
5832260
Arora

Nov,1998

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A method comprising:

executing an identified conditional instruction having an associated first instruction path and a second instruction path, the executing comprising:

(i) performing both a first operation from the first instruction path to produce a first result and a second operation from the second instruction path to produce a second result, and

(ii) outputting the first result or the second result based on how a condition associated with the conditional instruction is resolved.

2. The method of claim 1, wherein the performing comprises performing the first operation using at least one operand for the instruction to produce the first result and performing the second operation to produce an operand for the instruction as the second result.

3. The method of claim 1, wherein the instruction is a conditional micro-operation and wherein the method comprises decoding a macro-instruction in a branch instruction path of a program to produce the conditional micro-operation.

4. The method of claim 1, wherein prior to executing the identified conditional instruction, the method comprises:

(a) identifying a conditional branch instruction in a program as the identified conditional instruction;

(b) associating a condition for the identified conditional branch instruction with one or more instructions in a first branch instruction path and a second branch instruction path for the conditional branch instruction to produce one or more conditional instructions.

5. The method of claim 4, wherein the identifying comprises identifying a conditional forward branch instruction as the identified conditional instruction.

6. The method of claim 4, wherein the identifying comprises identifying a conditional branch instruction having a fall through branch instruction path having less than or equal to a predetermined number of instructions and/or a conditional branch instruction having a target branch instruction path having less than or equal to a predetermined number of instructions.

7. The method of claim 4, wherein the identifying comprises identifying a conditional branch instruction that is not predictable within a predetermined degree of accuracy.

8. The method of claim 4, wherein each conditional instruction is a conditional micro-operation and wherein the method comprises decoding one or more macro-instructions in the branch instruction path to produce one or more conditional micro-operations.

9. A processor comprising:

an operation unit to perform an operation designated by a conditional instruction on one or more operands for the conditional instruction to produce a first result;

a bypass bus coupled to an input of the operation unit to receive an operand for the conditional instruction and to present the operand as a second result; and

a multiplexer to selectively output either the first result or the second result based on how a condition associated with the conditional instruction is resolved.

10. The processor of claim 9, comprising a conditional branch processing unit to associate a condition for a conditional branch instruction with one or more instructions in one or more branch instruction paths for the conditional branch instruction to produce one or more conditional instructions.

11. The processor of claim 9, wherein the conditional instruction is a conditional micro-operation, and wherein the processor comprises a decoder to decode macro-instructions of a program into micro-operations.

12. A processor comprising:

(a) a dispatch/execute unit to dispatch and execute instructions for a program out of order, the dispatch/execute unit comprising an execution unit to perform both a first operation from a first instruction path associated with a conditional instruction to produce a first result and a second operation from a second instruction path associated with the conditional instruction to produce a second result and to output the first result or the second result based on how a condition associated with the conditional instruction is resolved;

(b) a reorder buffer to store results of executed instructions; and

(c) a retire unit to retire results of executed instructions.

13. The processor of claim 12, comprising a conditional branch processing unit to associate a condition for a conditional branch instruction with one or more instructions in one or more branch instruction paths for the conditional branch instruction to produce one or more conditional instructions.

14. The processor of claim 12, comprising a fetch/decode unit to fetch macro-instructions of the program and to decode the fetched macro-instructions into micro-operations, wherein the dispatch/execute unit dispatches and executes micro-operations.

15. The processor of claim 12, wherein the execution unit comprises:

(i) an operation unit to perform the operation designated by the conditional instruction on one or more operands for the conditional instruction to produce the first result,

(ii) a bypass bus coupled to an input of the operation unit to receive an operand for the conditional instruction and to present the operand as a second result, and

(iii) a multiplexer to selectively output either the first result or the second result based on how the condition associated with the conditional instruction is resolved.

16. A computer system comprising:

(a) a memory to store instructions of a program; and

(b) a processor to perform both a first operation from a first instruction path associated with a conditional instruction to produce a first result and a second operation from a second instruction path associated with the conditional instruction to produce a second result and to output the first result or the second result based on how a condition associated with the conditional instruction is resolved.

17. The computer system of claim 16, wherein the processor comprises:

(i) a dispatch/execute unit to dispatch and execute instructions for the program out of order, the dispatch/execute unit comprising an execution unit to perform the first operation and the second operation and to output the first result or the second result,

(ii) a reorder buffer to store results of executed instructions, and

(iii) a retire unit to retire results of executed instructions.

18. The computer system of claim 16, wherein the processor comprises a conditional branch processing unit to associate a condition for a conditional branch instruction with one or more instructions in one or more branch instruction paths for the conditional branch instruction to produce one or more conditional instructions.

19. The computer system of claim 16, wherein the processor comprises a fetch/decode unit to fetch macro-instructions of the program and to decode the fetched macro-instructions into micro-operations.

20. The computer system of claim 16, wherein the processor comprises:

(i) an operation unit to perform the operation designated by the conditional instruction on one or more operands for the conditional instruction to produce the first result,

(ii) a bypass bus coupled to an input of the operation unit to receive an operand for the conditional instruction and present the operand as a second result, and

(iii) a multiplexer to selectively output either the first result or the second result based on how the condition associated with the conditional instruction is resolved.

21. The computer system of claim 16, comprising:

a memory controller hub to provide an interface to the memory; and

an input/output controller hub coupled to the memory controller hub.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to the field of computer systems. More particularly, the present invention relates to the field of processing instructions having conditional program execution flow control.

2. Description of Related Art

Typical processors execute instructions out of order to help improve instruction throughput. Such processors typically process instructions through a pipeline that fetches instructions from memory, decodes each instruction, executes the instruction, and retires the instruction. The operation of each stage of the pipeline typically overlaps those of the other stages in time to help process instructions faster.

By identifying instructions that may be executed regardless of whether one or more prior fetched instructions are executed, typical processors may benefit from executing instructions in parallel, that is overlapping the execution of two or more instructions in time, and/or from executing instructions out of order to avoid stalling on any one instruction, for example, while waiting for the completed execution of an instruction upon which the stalled instruction depends. Instructions executed out of order are retired by the pipeline in order.

The pipeline generally fetches instructions of a program in a sequential order as defined by the program until the program alters its sequential flow with a jump or branch instruction, for example.

An unconditional branch instruction, for example, identifies a non-sequential target instruction that is to follow the unconditional branch instruction. The pipeline identifies the target instruction and then continues fetching instructions of the program starting with the target instruction. Before identifying the target instruction, however, the pipeline may have already fetched and started processing one or more instructions sequentially following the unconditional branch instruction as defined by the program. The alteration in the sequential flow of the program therefore penalizes the execution of the program as the pipeline is to flush such instruction(s) and restart fetching and processing at the target instruction. By identifying the target instruction early in the pipeline, such as in the fetch or decode stage for example, the pipeline helps avoid or minimize this penalty.

A conditional branch instruction identifies a non-sequential target instruction that is to follow the conditional branch instruction if a condition identified by the conditional branch instruction is satisfied. If the condition is not satisfied, the fall through instruction, that is the instruction sequentially following the conditional branch instruction as defined by the program, is to follow the conditional branch instruction. Because resolution of the condition relies on the execution of one or more other instructions, the condition may not be resolved when the conditional branch instruction is fetched. As the pipeline then cannot determine which instruction is to follow the conditional branch instruction, the pipeline typically predicts whether the target instruction or the fall through instruction will follow at the risk of penalizing the execution of the program if the pipeline later determines the wrong instruction was selected. If, for example, the pipeline selects the target instruction and the condition is not satisfied, execution of the program is penalized as the pipeline flushes the target instruction and any fetched instructions following the target instruction when the conditional branch instruction is retired and restarts fetching and processing at the fall through instruction.

The pipeline may try to predict how the condition will be resolved, for example, based on prior executions of the same conditional branch instruction in the program. Typical pipelines, however, cannot accurately predict how every conditional branch instruction will be resolved every time and will therefore incur execution penalties due to branch mispredictions.

Software predicated instructions, such as a conditional move instruction for example, may be used to eliminate or reduce branch instructions and therefore avoid or minimize execution penalties associated with branch mispredictions. Software predication, however, requires compiler help to substitute code in eliminating branch instructions and an instruction set architecture that provides for the software predicated instructions.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 illustrates an exemplary computer system comprising a processor having an instruction pipeline with hardware predication for conditional instruction path branching;

FIG. 2 illustrates, for one embodiment, a processor having an instruction pipeline with hardware predication for conditional instruction path branching;

FIG. 3 illustrates, for one embodiment, a flow diagram for processing instructions with hardware predication for conditional instruction path branching;

FIG. 4 illustrates, for one embodiment, a fetch/decode unit for the instruction pipeline of FIG. 3;

FIG. 5 illustrates, for one embodiment, a dispatch/execute unit for the instruction pipeline of FIG. 3;

FIG. 6 illustrates, for one embodiment, a flow diagram for dispatching and executing conditional micro-operations;

FIG. 7 illustrates, for one embodiment, conditional execution circuitry with destination bypassing;

FIG. 8 illustrates, for another embodiment, a flow diagram for dispatching and executing conditional micro-operations;

FIG. 9 illustrates, for one embodiment, a flow diagram for dispatching and executing micro-operations dependent on conditional micro-operations; and

FIG. 10 illustrates, for one embodiment, dual execution circuitry with destination bypassing.

DETAILED DESCRIPTION

The following detailed description sets forth an embodiment or embodiments in accordance with the present invention for hardware predication for conditional instruction path branching. In the following description, details are set forth such as specific processor architecture, instruction processing techniques, etc., in order to provide a thorough understanding of the present invention. It will be evident, however, that the present invention may be practiced without these details. In other instances, well-known function blocks, interfaces, etc., have not been described in particular detail so as not to obscure the present invention.

Exemplary Computer System

FIG. 1 illustrates an exemplary computer system 100 comprising a processor 102 having an instruction pipeline 200 with hardware predication for conditional instruction path branching in accordance with the present invention. Although described in the context of computer system 100, the present invention may be implemented in any suitable computer system comprising any suitable one or more integrated circuits.

As illustrated in FIG. 1, computer system 100 comprises another processor 104 that may also have an instruction pipeline with hardware predication for conditional instruction path branching, a processor bus 110, and a chipset 120. Processors 102 and 104 and chipset 120 are coupled to processor bus 110. Processors 102 and 104 may each comprise any suitable processor architecture and for one embodiment comprise an Intel.RTM. Architecture used, for example, in the Pentium.RTM. family of processors available from Intel.RTM. Corporation of Santa Clara, Calif. Computer system 100 for other embodiments may comprise one, three, or more processors any of which may comprise an instruction pipeline with hardware predication for conditional instruction path branching.

Chipset 120 for one embodiment comprises a memory controller hub (MCH) 130, an input/output (I/O) controller hub (ICH) 140, and a firmware hub (FWH) 170. MCH 130, ICH 140, and FWH 170 may each comprise any suitable circuitry and for one embodiment are each formed as a separate integrated circuit chip. Chipset 120 for other embodiments may comprise any suitable one or more integrated circuit devices.

MCH 130 may comprise any suitable interface controllers to provide for any suitable communication link to processor bus 110 and/or to any suitable device or component in communication with MCH 130. MCH 130 for one embodiment provides suitable arbitration, buffering, and coherency management for each interface.

MCH 130 is coupled to processor bus 110 and provides an interface to processors 102 and 104 over processor bus 110. Processor 102 and/or processor 104 may alternatively be combined with MCH 130 to form a single chip. MCH 130 for one embodiment also provides an interface to a main memory 132 and a graphics controller 134 each coupled to MCH 130. Main memory 132 stores data and/or instructions, for example, for computer system 100 and may comprise any suitable memory, such as a dynamic random access memory (DRAM) for example. Graphics controller 134 controls the display of information on a suitable display 136, such as a cathode ray tube (CRT) or liquid crystal display (LCD) for example, coupled to graphics controller 134. MCH 130 for one embodiment interfaces with graphics controller 134 through an accelerated graphics port (AGP). Graphics controller 134 for one embodiment may alternatively be combined with MCH 130 to form a single chip.

MCH 130 is also coupled to ICH 140 to provide access to ICH 140 through a hub interface. ICH 140 provides an interface to I/O devices or peripheral components for computer system 100. ICH 140 may comprise any suitable interface controllers to provide for any suitable communication link to MCH 130 and/or to any suitable device or component in communication with ICH 140. ICH 140 for one embodiment provides suitable arbitration and buffering for each interface.

For one embodiment, ICH 140 provides an interface to one or more suitable integrated drive electronics (IDE) drives 142, such as a hard disk drive (HDD) or compact disc read only memory (CD ROM) drive for example, to store data and/or instructions for example, one or more suitable universal serial bus (USB) devices through one or more USB ports 144, an audio coder/decoder (codec) 146, and a modem codec 148. ICH 140 for one embodiment also provides an interface through a super I/O controller 150 to a keyboard 151, a mouse 152, one or more suitable devices, such as a printer for example, through one or more parallel ports 153, one or more suitable devices through one or more serial ports 154, and a floppy disk drive 155. ICH 140 for one embodiment further provides an interface to one or more suitable peripheral component interconnect (PCI) devices coupled to ICH 140 through one or more PCI slots 162 on a PCI bus and an interface to one or more suitable industry standard architecture (ISA) devices coupled to ICH 140 by the PCI bus through an ISA bridge 164. ISA bridge 164 interfaces with one or more ISA devices through one or more ISA slots 166 on an ISA bus.

ICH 140 is also coupled to FWH 170 to provide an interface to FWH 170. FWH 170 may comprise any suitable interface controller to provide for any suitable communication link to ICH 140. FWH 170 for one embodiment may share at least a portion of the interface between ICH 140 and super I/O controller 150. FWH 170 comprises a basic input/output system (BIOS) memory 172 to store suitable system and/or video BIOS software. BIOS memory 172 may comprise any suitable non-volatile memory, such as a flash memory for example.

Instruction Pipeline with Hardware Predication

Processor 102 comprises instruction pipeline 200 with hardware predication for conditional instruction path branching to help avoid or minimize any program execution penalty due to branch mispredictions.

As illustrated in FIG. 2, processor 102 for one embodiment comprises instruction pipeline 200 with hardware predication, instruction cache 210, data cache 212, secondary cache 214, bus interface unit 216, and processor architecture registers 218. Bus interface unit 216 couples system bus 110, instruction cache 210, data cache 212, and secondary cache 214 to one another. Instruction cache 210, data cache 212, and registers 218 are each coupled to instruction pipeline 200.

Instruction cache 210, data cache 212, and secondary cache 214 form a two cache level memory subsystem to help ensure a steady supply of instructions and data to instruction pipeline 200. Instruction cache 210 and data cache 212 are at a primary cache level and may be accessed relatively quickly as instruction cache 210 and data cache 212 are each relatively small in size and closely coupled to instruction pipeline 200. Secondary cache 214 is at a secondary cache level and stores more instructions and data for instruction pipeline 200 relative to instruction cache 210 and data cache 212 yet has a slower access time relative to instruction cache 210 and data cache 212.

Instruction cache 210 and/or secondary cache 214 store instructions accessed from main memory 132 through bus interface unit 216 for processing by instruction pipeline 200. Instruction cache 210 and/or secondary cache 214 may also store recently and/or frequently used instructions. Data cache 212 and secondary cache 214 store data accessed from main memory 132 through bus interface unit 216 for processing by instruction pipeline 200. Data cache 212 and/or secondary cache 214 may also store recently and/or frequently used data. Instruction cache 210, data cache 212, and secondary cache 214 may store instructions and/or data in accordance with any suitable caching scheme. Although described as comprising instruction cache 210, data cache 212, and secondary cache 214, processor 102 may comprise any other suitable memory subsystem for storing instructions and data for instruction pipeline 200.

Instruction pipeline 200 for one embodiment comprises a fetch/decode unit 202, a reorder buffer 204, a dispatch/execute unit 206, and a retire unit 208. Fetch/decode unit 202 is coupled to instruction cache 210. Reorder buffer 204 is coupled to fetch/decode unit 202, dispatch/execute unit 206, and retire unit 208. Dispatch/execute unit 206 is coupled to fetch/decode unit 202 and data cache 212. Retire unit 208 is coupled to data cache 212 and registers 218.

Instruction pipeline 200 for one embodiment processes instructions of a program in accordance with a flow diagram 300 as illustrated in FIG. 3. Instruction pipeline 200 may process any suitable instruction at any suitable level, such as macro-instructions for example. The program for one embodiment defines a sequential order for the instructions of the program and comprises one or more conditional branch instructions. As used in this detailed description, a conditional branch instruction encompasses any instruction defined to alter the flow of execution of instructions of a program based on whether one or more conditions have been satisfied. Each conditional branch instruction for one embodiment identifies a condition and a target instruction that is to follow the conditional branch instruction if the condition is satisfied. The conditional branch instruction may identify any suitable condition and target instruction in any suitable manner. Conditional branch instructions are also known as conditional jump instructions, for example.

For block 302 of FIG. 3, fetch/decode unit 202 fetches a next instruction of a program from instruction cache 210. Fetch/decode unit 202 may fetch instructions from the program in any suitable manner.

Fetch/decode unit 202 for block 304 identifies whether the fetched instruction is a conditional branch instruction. If so, fetch/decode unit 202 for block 306 identifies whether the fetched instruction is a qualifying conditional branch instruction and, if so, for block 308 predicts fall through execution for the identified qualifying conditional branch instruction. The next instruction fetched by fetch/decode unit 202 will therefore be the instruction sequentially following the qualifying conditional branch instruction as defined by the program. Fetch/decode unit 202 may identify conditional branch instructions in any suitable manner and may define and identify qualifying conditional branch instructions in any suitable manner.

Fetch/decode unit 202 for one embodiment for block 306 identifies whether the fetched instruction is a conditional forward branch instruction, that is whether a target instruction identified by the conditional branch instruction is positioned after the conditional branch instruction in the sequential order of instructions as defined by the program. Fetch/decode unit 202 for one embodiment for block 306 identifies whether the fetched instruction is a conditional branch instruction identifying a target instruction within a suitable predetermined number of instructions from the conditional branch instruction. Fetch/decode unit 202 for one embodiment for block 306 identifies how predictable the identified conditional branch instruction is, for example, by determining how often the condition is resolved in the same manner each time the conditional branch instruction is executed. If fetch/decode unit 202 for block 306 identifies a conditional branch instruction as a conditional forward branch instruction, as identifying a target instruction within a suitable predetermined number of instructions from the conditional branch instruction, and/or as not being predictable within a suitable predetermined degree of accuracy, fetch/decode unit 202 for block 308 predicts fall through execution for the identified conditional branch instruction.

If the fetched instruction is identified as a conditional branch instruction for block 304 yet is not a qualifying conditional branch instruction as determined for block 306, fetch/decode unit 202 for block 310 predicts either the identified target instruction or the fall through instruction will follow at the risk of penalizing the execution of the program if the wrong instruction was selected. Fetch/decode unit 202 may perform such branch predictions for block 310 in any suitable manner.

Fetch/decode unit 202 for block 312 decodes the fetched instruction into one or more micro-operations. Fetch/decode unit 202 may decode instructions into any suitable one or more micro-operations in any suitable manner. Although described in the context of micro-operations, fetch/decode unit 202 for other embodiments may decode the fetched instruction into any suitable one or more instructions at any suitable one or more instruction levels.

Fetch/decode unit 202 for block 314 determines whether the fetched instruction is in a fall through branch instruction path or any target branch instruction path for an identified qualifying conditional branch instruction. A fall through branch instruction path for a conditional branch instruction comprises one or more instructions that are executed only if a condition for the conditional branch instruction is not satisfied. A target branch instruction path for a conditional branch instruction comprises one or more instructions that are executed only if a condition for the conditional branch instruction is satisfied. Because the target instruction for a conditional branch instruction may be executed regardless of how a condition for the conditional branch instruction is resolved, each conditional branch instruction may not have a target branch instruction path. Fetch/decode unit 202 may identify instructions in a fall through branch instruction path and in any target branch instruction path in any suitable manner.

If the fetched instruction is in the fall through branch instruction path or any target branch instruction path for an identified qualifying conditional branch instruction, fetch/decode unit 202 for block 316 associates a condition for the qualifying conditional branch instruction with each micro-operation for the fetched instruction. Fetch/decode unit 202 may associate the condition with each micro-operation for the fetched instruction in any suitable manner. In decoding a fetched instruction into one or more micro-operations and associating a condition with each such micro-operation, fetch/decode unit 202 effectively decodes the fetched instruction into one or more conditional micro-operations.

As an illustration as to how a condition is associated with one or more fetched instructions, an exemplary program fragment contains the following instructions: JC (Target1) ADD S1,S2 DEC S1 Target1: SUB S1,S2

where JC (Target1) designates to jump or branch to the instruction at Target1 if condition C is satisfied or to continue with the next sequential instruction if condition C is not satisfied, ADD S1,S2 designates to add the content of logical register S1 to that of logical register S2 and store the sum in logical register S1, DEC S1 designates to decrement the content of logical register S1, and SUB S1,S2 designates to subtract the content of logical register S2 from that of logical register S1 and store the difference in logical register S1.

When fetch/decode unit 202 fetches the conditional branch instruction JC (Target1), fetch/decode unit 202 for this illustration identifies JC (Target1) as a qualifying conditional branch instruction, for example, because JC (Target1) is a forward conditional branch instruction, identifies the target instruction SUB S1,S2 within five instructions of JC (Target1), and is not predictable within a predetermined degree of accuracy. Fetch/decode unit 202 predicts fall through execution for JC (Target1) and therefore continues fetching instructions in sequential order as defined by the program. As fetch/decode unit 202 fetches and decodes the instructions in the fall through branch instruction path for JC (Target1), fetch/decode unit 202 associates the condition C with the one or more micro-operations for each such instruction, effectively associating the condition C with each such instruction as follows. JC (Target1) ADD S1,S2/C' DEC S1/C' Target1: SUB S1,S2

The condition C is illustrated in inverse form, that is as C', because the instructions in the fall through branch instruction path are to be executed only if the condition C is not satisfied.

Because the target instruction SUB S1,S2 is to be executed regardless of how the condition C is resolved, the conditional branch instruction JC (Target1) does not have a target branch instruction path in this illustration.

As another illustration as to how a condition is associated with one or more fetched instructions, an exemplary program fragment contains the following instructions: JC (Target1) ADD S1,S2 DEC S1 JMP Target2 Target1: SUB S1,S2 Target2: MUL S4,S5

where JMP Target2 designates to unconditionally jump to the instruction at Target2 and MUL S4,S5 designates to multiply the content of logical register S4 by that of logical register S5 and store the product in logical register S4.

As fetch/decode unit 202 fetches and decodes the instructions in the fall through branch instruction path for JC (Target1) and the instruction in the target branch instruction path for JC (Target1), fetch/decode unit 202 associates the condition C with the one or more micro-operations for each such instruction, effectively associating the condition C with each such instruction as follows. JC (Target1) ADD S1,S2/C' DEC S1/C' JMP Target2/C' Target1: SUB S1,S2/C Target2: MUL S4,S5

The instruction in the target branch instruction path, that is SUB S1,S2, is to be executed only if the condition C is satisfied.

Fetch/decode unit 202 for block 318 maps any sources and renames any destinations for each micro-operation for the fetched instruction. Fetch/decode unit-202 may perform mapping and renaming in any suitable manner.

Fetch/decode unit 202 for block 320 allocates each micro-operation for the fetched instruction in reorder buffer 204. Fetch/decode unit 202 may allocate each micro-operation in reorder buffer 204 in any suitable manner.

Fetch/decode unit 202 may comprise any suitable circuitry. As illustrated in FIG. 4, fetch/decode unit 202 for one embodiment comprises an instruction pointer 402, a branch prediction unit 404, a decoder 406, a conditional branch processing unit 408, and a register alias table (RAT) and allocate unit 410.

Fetch/decode unit 202 controls instruction pointer 402 to identify for block 302 the next instruction to be fetched from instruction cache 210 based on inputs, for example, from branch prediction unit 404, exception/interrupt status, and/or branch misprediction indications from dispatch/execute unit 206.

Branch prediction unit 404 receives fetched instructions from instruction cache 210, identifies each fetched conditional branch instruction for block 304, and predicts for block 310 either a target instruction or the fall through instruction for the conditional branch instruction is to be fetched next. Branch prediction unit 404 may perform branch predictions in any suitable manner. Branch prediction unit 404 for one embodiment identifies qualifying conditional branch instructions for block 306 and predicts fall through execution for block 308. Branch prediction unit 404 is coupled to instruction cache 210, instruction pointer 402, and dispatch/execute unit 206 and may comprise any suitable circuitry.

Decoder 406 is coupled to instruction cache 210 and receives and decodes each fetched instruction into one or more micro-operations for block 312. Decoder 406 may comprise any suitable circuitry to decode each fetched instruction into any suitable one or more micro-operations in any suitable manner. Decoder 406 for one embodiment decodes each instruction into one or more triadic micro-operations. A triadic micro-operation comprises an operation code or opcode and may comprise up to two logical source operands and one logical destination operand.

Decoder 406 for one embodiment tags a micro-operation for each qualifying conditional branch instruction to identify the qualifying conditional branch to conditional branch processing unit 408 and dispatch/execute unit 206. Each qualifying conditional branch may be identified in any suitable manner by branch prediction unit 404 and/or decoder 406. Decoder 406 may also decode each qualifying conditional branch instruction in a suitable manner so as to distinguish micro-operations for qualifying conditional branch instructions from other conditional branch instructions.

Decoder 406 for one embodiment tags a micro-operation for each conditional branch instruction that is not a qualifying conditional branch instruction with suitable information identifying the fall through branch instruction path and the predicted instruction path for the conditional branch instruction to help dispatch/execute unit 206 identify branch mispredictions. The fall through branch instruction path and the predicted instruction path may be identified in any suitable manner by branch prediction unit 404 and/or decoder 406.

Conditional branch processing unit 408 receives micro-operations from decoder 406, identifies micro-operations in the fall through branch instruction path and in any target branch instruction path for a qualifying conditional branch instruction for block 314, and associates a condition for the qualifying conditional branch instruction with each such identified micro-operation for block 316. Conditional branch processing unit 408 for one embodiment identifies qualifying conditional branch instructions based on micro-operations received from decoder 406. Conditional branch processing unit 408 is coupled to decoder 406 and may comprise any suitable circuitry to identify micro-operations in the fall through branch instruction path and in any target branch instruction path for a qualifying conditional branch instruction and to associate a condition for the qualifying conditional branch instruction with each such identified micro-operation in any suitable manner. Conditional branch processing unit 408 for one embodiment tags each such identified micro-operation with a conditional flag identifying the condition as an additional source operand for the micro-operation.

RAT and allocate unit 410 receives micro-operations from conditional branch processing unit 408 and maps any sources and renames any destinations for each micro-operation for block 318. RAT and allocate unit 410 for one embodiment for block 318 converts logical register references to physical register references and in so doing forms dependency links between physical destinations and sources-using a rename map. For one embodiment where conditional branch processing unit 408 tags a micro-operation with a conditional flag identifying a condition for a qualifying conditional branch instruction, RAT and allocate unit 410 attaches to the tagged micro-operation an identifier of the same physical flag register upon which the qualifying conditional branch instruction depends.

As an example, fetch/decode unit 202 may decode, map, and rename the macro-instruction ADD Ldest,Lsource from a fall through branch instruction path for a qualifying conditional branch instruction into the micro-operation ADD Pdest4.rarw.(Pdest1, Pdest2), Pdest3:flag, where Ldest is a logical destination register, Lsource is a logical source register, ADD Ldest,Lsource designates to add the content of logical register Ldest to that of logical register Lsource and store the sum in logical register Ldest, Pdest4 is a physical destination register to store the result of the ADD instruction, Pdest1 is a physical destination register corresponding to logical register Lsource, Pdest2 is a physical destination register corresponding to logical register Ldest, and Pdest3 is a physical destination register corresponding to a flag register to store a conditional flag upon which the qualifying conditional branch instruction depends.

RAT and allocate unit 410 also allocates each micro-operation in reorder buffer 204 for block 320. In entering micro-operations in reorder buffer 204, RAT and allocate unit 410 for one embodiment for block 320 adds status information to the micro-operations to prepare them for out-of-order execution.

RAT and allocate unit 410 is coupled to conditional branch processing unit 408 and reorder buffer 204 and may comprise any suitable circuitry to perform mapping, renaming, and allocation in any suitable manner.

Reorder buffer 204 receives and stores each micro-operation from fetch/decode unit 202. Reorder buffer 204 also stores micro-operations that have already been executed by dispatch/execute unit 206 but not yet retired. Reorder buffer 204 may comprise any suitable circuitry and for one embodiment comprises an array of content-addressable memory (CAM).

Dispatch/execute unit 206 for block 322 of FIG. 3 dispatches micro-operations stored in reorder buffer 204 for execution and executes dispatched micro-operations. Dispatch/execute unit 206 schedules and executes micro-operations stored in reorder buffer 204 in accordance with data dependencies among such micro-operations and execution resource availability and therefore supports out-of-order execution of micro-operations. Dispatch/execute unit 206 stores any result of executing a micro-operation with that micro-operation in reorder buffer 204.

Dispatch/execute unit 206 may comprise any suitable circuitry. As illustrated in FIG. 5, dispatch/execute unit 206 for one embodiment comprises a reservation station 502, integer execution units 511 and 512, floating point execution units 513 and 514, and a memory interface execution unit 515. Each execution unit 511-515 is coupled to reservation station 502. Although illustrated as comprising five execution units 511-515, dispatch/execute unit 206 for other embodiments may comprise any suitable number of execution units each of which may execute any suitable type of micro-operation.

Reservation station 502 is coupled to reorder buffer 204 and scans the status of micro-operations in reorder buffer 204 to identify micro-operations that are ready to be executed, such as micro-operations having available source operands for example. Reservation station 502 for block 322 dispatches each ready micro-operation to an appropriate execution unit 511, 512, 513, 514, or 515 available to execute t