|
Description  |
|
|
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to the field of computer systems.
More particularly, the present invention relates to the field of
processing instructions having conditional program execution flow control.
2. Description of Related Art
Typical processors execute instructions out of order to help improve
instruction throughput. Such processors typically process instructions
through a pipeline that fetches instructions from memory, decodes each
instruction, executes the instruction, and retires the instruction. The
operation of each stage of the pipeline typically overlaps those of the
other stages in time to help process instructions faster.
By identifying instructions that may be executed regardless of whether one
or more prior fetched instructions are executed, typical processors may
benefit from executing instructions in parallel, that is overlapping the
execution of two or more instructions in time, and/or from executing
instructions out of order to avoid stalling on any one instruction, for
example, while waiting for the completed execution of an instruction upon
which the stalled instruction depends. Instructions executed out of order
are retired by the pipeline in order.
The pipeline generally fetches instructions of a program in a sequential
order as defined by the program until the program alters its sequential
flow with a jump or branch instruction, for example.
An unconditional branch instruction, for example, identifies a
non-sequential target instruction that is to follow the unconditional
branch instruction. The pipeline identifies the target instruction and
then continues fetching instructions of the program starting with the
target instruction. Before identifying the target instruction, however,
the pipeline may have already fetched and started processing one or more
instructions sequentially following the unconditional branch instruction
as defined by the program. The alteration in the sequential flow of the
program therefore penalizes the execution of the program as the pipeline
is to flush such instruction(s) and restart fetching and processing at the
target instruction. By identifying the target instruction early in the
pipeline, such as in the fetch or decode stage for example, the pipeline
helps avoid or minimize this penalty.
A conditional branch instruction identifies a non-sequential target
instruction that is to follow the conditional branch instruction if a
condition identified by the conditional branch instruction is satisfied.
If the condition is not satisfied, the fall through instruction, that is
the instruction sequentially following the conditional branch instruction
as defined by the program, is to follow the conditional branch
instruction. Because resolution of the condition relies on the execution
of one or more other instructions, the condition may not be resolved when
the conditional branch instruction is fetched. As the pipeline then cannot
determine which instruction is to follow the conditional branch
instruction, the pipeline typically predicts whether the target
instruction or the fall through instruction will follow at the risk of
penalizing the execution of the program if the pipeline later determines
the wrong instruction was selected. If, for example, the pipeline selects
the target instruction and the condition is not satisfied, execution of
the program is penalized as the pipeline flushes the target instruction
and any fetched instructions following the target instruction when the
conditional branch instruction is retired and restarts fetching and
processing at the fall through instruction.
The pipeline may try to predict how the condition will be resolved, for
example, based on prior executions of the same conditional branch
instruction in the program. Typical pipelines, however, cannot accurately
predict how every conditional branch instruction will be resolved every
time and will therefore incur execution penalties due to branch
mispredictions.
Software predicated instructions, such as a conditional move instruction
for example, may be used to eliminate or reduce branch instructions and
therefore avoid or minimize execution penalties associated with branch
mispredictions. Software predication, however, requires compiler help to
substitute code in eliminating branch instructions and an instruction set
architecture that provides for the software predicated instructions.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and not limitation
in the figures of the accompanying drawings, in which like references
indicate similar elements and in which:
FIG. 1 illustrates an exemplary computer system comprising a processor
having an instruction pipeline with hardware predication for conditional
instruction path branching;
FIG. 2 illustrates, for one embodiment, a processor having an instruction
pipeline with hardware predication for conditional instruction path
branching;
FIG. 3 illustrates, for one embodiment, a flow diagram for processing
instructions with hardware predication for conditional instruction path
branching;
FIG. 4 illustrates, for one embodiment, a fetch/decode unit for the
instruction pipeline of FIG. 3;
FIG. 5 illustrates, for one embodiment, a dispatch/execute unit for the
instruction pipeline of FIG. 3;
FIG. 6 illustrates, for one embodiment, a flow diagram for dispatching and
executing conditional micro-operations;
FIG. 7 illustrates, for one embodiment, conditional execution circuitry
with destination bypassing;
FIG. 8 illustrates, for another embodiment, a flow diagram for dispatching
and executing conditional micro-operations;
FIG. 9 illustrates, for one embodiment, a flow diagram for dispatching and
executing micro-operations dependent on conditional micro-operations; and
FIG. 10 illustrates, for one embodiment, dual execution circuitry with
destination bypassing.
DETAILED DESCRIPTION
The following detailed description sets forth an embodiment or embodiments
in accordance with the present invention for hardware predication for
conditional instruction path branching. In the following description,
details are set forth such as specific processor architecture, instruction
processing techniques, etc., in order to provide a thorough understanding
of the present invention. It will be evident, however, that the present
invention may be practiced without these details. In other instances,
well-known function blocks, interfaces, etc., have not been described in
particular detail so as not to obscure the present invention.
Exemplary Computer System
FIG. 1 illustrates an exemplary computer system 100 comprising a processor
102 having an instruction pipeline 200 with hardware predication for
conditional instruction path branching in accordance with the present
invention. Although described in the context of computer system 100, the
present invention may be implemented in any suitable computer system
comprising any suitable one or more integrated circuits.
As illustrated in FIG. 1, computer system 100 comprises another processor
104 that may also have an instruction pipeline with hardware predication
for conditional instruction path branching, a processor bus 110, and a
chipset 120. Processors 102 and 104 and chipset 120 are coupled to
processor bus 110. Processors 102 and 104 may each comprise any suitable
processor architecture and for one embodiment comprise an Intel.RTM.
Architecture used, for example, in the Pentium.RTM. family of processors
available from Intel.RTM. Corporation of Santa Clara, Calif. Computer
system 100 for other embodiments may comprise one, three, or more
processors any of which may comprise an instruction pipeline with hardware
predication for conditional instruction path branching.
Chipset 120 for one embodiment comprises a memory controller hub (MCH) 130,
an input/output (I/O) controller hub (ICH) 140, and a firmware hub (FWH)
170. MCH 130, ICH 140, and FWH 170 may each comprise any suitable
circuitry and for one embodiment are each formed as a separate integrated
circuit chip. Chipset 120 for other embodiments may comprise any suitable
one or more integrated circuit devices.
MCH 130 may comprise any suitable interface controllers to provide for any
suitable communication link to processor bus 110 and/or to any suitable
device or component in communication with MCH 130. MCH 130 for one
embodiment provides suitable arbitration, buffering, and coherency
management for each interface.
MCH 130 is coupled to processor bus 110 and provides an interface to
processors 102 and 104 over processor bus 110. Processor 102 and/or
processor 104 may alternatively be combined with MCH 130 to form a single
chip. MCH 130 for one embodiment also provides an interface to a main
memory 132 and a graphics controller 134 each coupled to MCH 130. Main
memory 132 stores data and/or instructions, for example, for computer
system 100 and may comprise any suitable memory, such as a dynamic random
access memory (DRAM) for example. Graphics controller 134 controls the
display of information on a suitable display 136, such as a cathode ray
tube (CRT) or liquid crystal display (LCD) for example, coupled to
graphics controller 134. MCH 130 for one embodiment interfaces with
graphics controller 134 through an accelerated graphics port (AGP).
Graphics controller 134 for one embodiment may alternatively be combined
with MCH 130 to form a single chip.
MCH 130 is also coupled to ICH 140 to provide access to ICH 140 through a
hub interface. ICH 140 provides an interface to I/O devices or peripheral
components for computer system 100. ICH 140 may comprise any suitable
interface controllers to provide for any suitable communication link to
MCH 130 and/or to any suitable device or component in communication with
ICH 140. ICH 140 for one embodiment provides suitable arbitration and
buffering for each interface.
For one embodiment, ICH 140 provides an interface to one or more suitable
integrated drive electronics (IDE) drives 142, such as a hard disk drive
(HDD) or compact disc read only memory (CD ROM) drive for example, to
store data and/or instructions for example, one or more suitable universal
serial bus (USB) devices through one or more USB ports 144, an audio
coder/decoder (codec) 146, and a modem codec 148. ICH 140 for one
embodiment also provides an interface through a super I/O controller 150
to a keyboard 151, a mouse 152, one or more suitable devices, such as a
printer for example, through one or more parallel ports 153, one or more
suitable devices through one or more serial ports 154, and a floppy disk
drive 155. ICH 140 for one embodiment further provides an interface to one
or more suitable peripheral component interconnect (PCI) devices coupled
to ICH 140 through one or more PCI slots 162 on a PCI bus and an interface
to one or more suitable industry standard architecture (ISA) devices
coupled to ICH 140 by the PCI bus through an ISA bridge 164. ISA bridge
164 interfaces with one or more ISA devices through one or more ISA slots
166 on an ISA bus.
ICH 140 is also coupled to FWH 170 to provide an interface to FWH 170. FWH
170 may comprise any suitable interface controller to provide for any
suitable communication link to ICH 140. FWH 170 for one embodiment may
share at least a portion of the interface between ICH 140 and super I/O
controller 150. FWH 170 comprises a basic input/output system (BIOS)
memory 172 to store suitable system and/or video BIOS software. BIOS
memory 172 may comprise any suitable non-volatile memory, such as a flash
memory for example.
Instruction Pipeline with Hardware Predication
Processor 102 comprises instruction pipeline 200 with hardware predication
for conditional instruction path branching to help avoid or minimize any
program execution penalty due to branch mispredictions.
As illustrated in FIG. 2, processor 102 for one embodiment comprises
instruction pipeline 200 with hardware predication, instruction cache 210,
data cache 212, secondary cache 214, bus interface unit 216, and processor
architecture registers 218. Bus interface unit 216 couples system bus 110,
instruction cache 210, data cache 212, and secondary cache 214 to one
another. Instruction cache 210, data cache 212, and registers 218 are each
coupled to instruction pipeline 200.
Instruction cache 210, data cache 212, and secondary cache 214 form a two
cache level memory subsystem to help ensure a steady supply of
instructions and data to instruction pipeline 200. Instruction cache 210
and data cache 212 are at a primary cache level and may be accessed
relatively quickly as instruction cache 210 and data cache 212 are each
relatively small in size and closely coupled to instruction pipeline 200.
Secondary cache 214 is at a secondary cache level and stores more
instructions and data for instruction pipeline 200 relative to instruction
cache 210 and data cache 212 yet has a slower access time relative to
instruction cache 210 and data cache 212.
Instruction cache 210 and/or secondary cache 214 store instructions
accessed from main memory 132 through bus interface unit 216 for
processing by instruction pipeline 200. Instruction cache 210 and/or
secondary cache 214 may also store recently and/or frequently used
instructions. Data cache 212 and secondary cache 214 store data accessed
from main memory 132 through bus interface unit 216 for processing by
instruction pipeline 200. Data cache 212 and/or secondary cache 214 may
also store recently and/or frequently used data. Instruction cache 210,
data cache 212, and secondary cache 214 may store instructions and/or data
in accordance with any suitable caching scheme. Although described as
comprising instruction cache 210, data cache 212, and secondary cache 214,
processor 102 may comprise any other suitable memory subsystem for storing
instructions and data for instruction pipeline 200.
Instruction pipeline 200 for one embodiment comprises a fetch/decode unit
202, a reorder buffer 204, a dispatch/execute unit 206, and a retire unit
208. Fetch/decode unit 202 is coupled to instruction cache 210. Reorder
buffer 204 is coupled to fetch/decode unit 202, dispatch/execute unit 206,
and retire unit 208. Dispatch/execute unit 206 is coupled to fetch/decode
unit 202 and data cache 212. Retire unit 208 is coupled to data cache 212
and registers 218.
Instruction pipeline 200 for one embodiment processes instructions of a
program in accordance with a flow diagram 300 as illustrated in FIG. 3.
Instruction pipeline 200 may process any suitable instruction at any
suitable level, such as macro-instructions for example. The program for
one embodiment defines a sequential order for the instructions of the
program and comprises one or more conditional branch instructions. As used
in this detailed description, a conditional branch instruction encompasses
any instruction defined to alter the flow of execution of instructions of
a program based on whether one or more conditions have been satisfied.
Each conditional branch instruction for one embodiment identifies a
condition and a target instruction that is to follow the conditional
branch instruction if the condition is satisfied. The conditional branch
instruction may identify any suitable condition and target instruction in
any suitable manner. Conditional branch instructions are also known as
conditional jump instructions, for example.
For block 302 of FIG. 3, fetch/decode unit 202 fetches a next instruction
of a program from instruction cache 210. Fetch/decode unit 202 may fetch
instructions from the program in any suitable manner.
Fetch/decode unit 202 for block 304 identifies whether the fetched
instruction is a conditional branch instruction. If so, fetch/decode unit
202 for block 306 identifies whether the fetched instruction is a
qualifying conditional branch instruction and, if so, for block 308
predicts fall through execution for the identified qualifying conditional
branch instruction. The next instruction fetched by fetch/decode unit 202
will therefore be the instruction sequentially following the qualifying
conditional branch instruction as defined by the program. Fetch/decode
unit 202 may identify conditional branch instructions in any suitable
manner and may define and identify qualifying conditional branch
instructions in any suitable manner.
Fetch/decode unit 202 for one embodiment for block 306 identifies whether
the fetched instruction is a conditional forward branch instruction, that
is whether a target instruction identified by the conditional branch
instruction is positioned after the conditional branch instruction in the
sequential order of instructions as defined by the program. Fetch/decode
unit 202 for one embodiment for block 306 identifies whether the fetched
instruction is a conditional branch instruction identifying a target
instruction within a suitable predetermined number of instructions from
the conditional branch instruction. Fetch/decode unit 202 for one
embodiment for block 306 identifies how predictable the identified
conditional branch instruction is, for example, by determining how often
the condition is resolved in the same manner each time the conditional
branch instruction is executed. If fetch/decode unit 202 for block 306
identifies a conditional branch instruction as a conditional forward
branch instruction, as identifying a target instruction within a suitable
predetermined number of instructions from the conditional branch
instruction, and/or as not being predictable within a suitable
predetermined degree of accuracy, fetch/decode unit 202 for block 308
predicts fall through execution for the identified conditional branch
instruction.
If the fetched instruction is identified as a conditional branch
instruction for block 304 yet is not a qualifying conditional branch
instruction as determined for block 306, fetch/decode unit 202 for block
310 predicts either the identified target instruction or the fall through
instruction will follow at the risk of penalizing the execution of the
program if the wrong instruction was selected. Fetch/decode unit 202 may
perform such branch predictions for block 310 in any suitable manner.
Fetch/decode unit 202 for block 312 decodes the fetched instruction into
one or more micro-operations. Fetch/decode unit 202 may decode
instructions into any suitable one or more micro-operations in any
suitable manner. Although described in the context of micro-operations,
fetch/decode unit 202 for other embodiments may decode the fetched
instruction into any suitable one or more instructions at any suitable one
or more instruction levels.
Fetch/decode unit 202 for block 314 determines whether the fetched
instruction is in a fall through branch instruction path or any target
branch instruction path for an identified qualifying conditional branch
instruction. A fall through branch instruction path for a conditional
branch instruction comprises one or more instructions that are executed
only if a condition for the conditional branch instruction is not
satisfied. A target branch instruction path for a conditional branch
instruction comprises one or more instructions that are executed only if a
condition for the conditional branch instruction is satisfied. Because the
target instruction for a conditional branch instruction may be executed
regardless of how a condition for the conditional branch instruction is
resolved, each conditional branch instruction may not have a target branch
instruction path. Fetch/decode unit 202 may identify instructions in a
fall through branch instruction path and in any target branch instruction
path in any suitable manner.
If the fetched instruction is in the fall through branch instruction path
or any target branch instruction path for an identified qualifying
conditional branch instruction, fetch/decode unit 202 for block 316
associates a condition for the qualifying conditional branch instruction
with each micro-operation for the fetched instruction. Fetch/decode unit
202 may associate the condition with each micro-operation for the fetched
instruction in any suitable manner. In decoding a fetched instruction into
one or more micro-operations and associating a condition with each such
micro-operation, fetch/decode unit 202 effectively decodes the fetched
instruction into one or more conditional micro-operations.
As an illustration as to how a condition is associated with one or more
fetched instructions, an exemplary program fragment contains the following
instructions:
JC (Target1)
ADD S1,S2
DEC S1
Target1: SUB S1,S2
where JC (Target1) designates to jump or branch to the instruction at
Target1 if condition C is satisfied or to continue with the next
sequential instruction if condition C is not satisfied, ADD S1,S2
designates to add the content of logical register S1 to that of logical
register S2 and store the sum in logical register S1, DEC S1 designates to
decrement the content of logical register S1, and SUB S1,S2 designates to
subtract the content of logical register S2 from that of logical register
S1 and store the difference in logical register S1.
When fetch/decode unit 202 fetches the conditional branch instruction JC
(Target1), fetch/decode unit 202 for this illustration identifies JC
(Target1) as a qualifying conditional branch instruction, for example,
because JC (Target1) is a forward conditional branch instruction,
identifies the target instruction SUB S1,S2 within five instructions of JC
(Target1), and is not predictable within a predetermined degree of
accuracy. Fetch/decode unit 202 predicts fall through execution for JC
(Target1) and therefore continues fetching instructions in sequential
order as defined by the program. As fetch/decode unit 202 fetches and
decodes the instructions in the fall through branch instruction path for
JC (Target1), fetch/decode unit 202 associates the condition C with the
one or more micro-operations for each such instruction, effectively
associating the condition C with each such instruction as follows.
JC (Target1)
ADD S1,S2/C'
DEC S1/C'
Target1: SUB S1,S2
The condition C is illustrated in inverse form, that is as C', because the
instructions in the fall through branch instruction path are to be
executed only if the condition C is not satisfied.
Because the target instruction SUB S1,S2 is to be executed regardless of
how the condition C is resolved, the conditional branch instruction JC
(Target1) does not have a target branch instruction path in this
illustration.
As another illustration as to how a condition is associated with one or
more fetched instructions, an exemplary program fragment contains the
following instructions:
JC (Target1)
ADD S1,S2
DEC S1
JMP Target2
Target1: SUB S1,S2
Target2: MUL S4,S5
where JMP Target2 designates to unconditionally jump to the instruction at
Target2 and MUL S4,S5 designates to multiply the content of logical
register S4 by that of logical register S5 and store the product in
logical register S4.
As fetch/decode unit 202 fetches and decodes the instructions in the fall
through branch instruction path for JC (Target1) and the instruction in
the target branch instruction path for JC (Target1), fetch/decode unit 202
associates the condition C with the one or more micro-operations for each
such instruction, effectively associating the condition C with each such
instruction as follows.
JC (Target1)
ADD S1,S2/C'
DEC S1/C'
JMP Target2/C'
Target1: SUB S1,S2/C
Target2: MUL S4,S5
The instruction in the target branch instruction path, that is SUB S1,S2,
is to be executed only if the condition C is satisfied.
Fetch/decode unit 202 for block 318 maps any sources and renames any
destinations for each micro-operation for the fetched instruction.
Fetch/decode unit-202 may perform mapping and renaming in any suitable
manner.
Fetch/decode unit 202 for block 320 allocates each micro-operation for the
fetched instruction in reorder buffer 204. Fetch/decode unit 202 may
allocate each micro-operation in reorder buffer 204 in any suitable
manner.
Fetch/decode unit 202 may comprise any suitable circuitry. As illustrated
in FIG. 4, fetch/decode unit 202 for one embodiment comprises an
instruction pointer 402, a branch prediction unit 404, a decoder 406, a
conditional branch processing unit 408, and a register alias table (RAT)
and allocate unit 410.
Fetch/decode unit 202 controls instruction pointer 402 to identify for
block 302 the next instruction to be fetched from instruction cache 210
based on inputs, for example, from branch prediction unit 404,
exception/interrupt status, and/or branch misprediction indications from
dispatch/execute unit 206.
Branch prediction unit 404 receives fetched instructions from instruction
cache 210, identifies each fetched conditional branch instruction for
block 304, and predicts for block 310 either a target instruction or the
fall through instruction for the conditional branch instruction is to be
fetched next. Branch prediction unit 404 may perform branch predictions in
any suitable manner. Branch prediction unit 404 for one embodiment
identifies qualifying conditional branch instructions for block 306 and
predicts fall through execution for block 308. Branch prediction unit 404
is coupled to instruction cache 210, instruction pointer 402, and
dispatch/execute unit 206 and may comprise any suitable circuitry.
Decoder 406 is coupled to instruction cache 210 and receives and decodes
each fetched instruction into one or more micro-operations for block 312.
Decoder 406 may comprise any suitable circuitry to decode each fetched
instruction into any suitable one or more micro-operations in any suitable
manner. Decoder 406 for one embodiment decodes each instruction into one
or more triadic micro-operations. A triadic micro-operation comprises an
operation code or opcode and may comprise up to two logical source
operands and one logical destination operand.
Decoder 406 for one embodiment tags a micro-operation for each qualifying
conditional branch instruction to identify the qualifying conditional
branch to conditional branch processing unit 408 and dispatch/execute unit
206. Each qualifying conditional branch may be identified in any suitable
manner by branch prediction unit 404 and/or decoder 406. Decoder 406 may
also decode each qualifying conditional branch instruction in a suitable
manner so as to distinguish micro-operations for qualifying conditional
branch instructions from other conditional branch instructions.
Decoder 406 for one embodiment tags a micro-operation for each conditional
branch instruction that is not a qualifying conditional branch instruction
with suitable information identifying the fall through branch instruction
path and the predicted instruction path for the conditional branch
instruction to help dispatch/execute unit 206 identify branch
mispredictions. The fall through branch instruction path and the predicted
instruction path may be identified in any suitable manner by branch
prediction unit 404 and/or decoder 406.
Conditional branch processing unit 408 receives micro-operations from
decoder 406, identifies micro-operations in the fall through branch
instruction path and in any target branch instruction path for a
qualifying conditional branch instruction for block 314, and associates a
condition for the qualifying conditional branch instruction with each such
identified micro-operation for block 316. Conditional branch processing
unit 408 for one embodiment identifies qualifying conditional branch
instructions based on micro-operations received from decoder 406.
Conditional branch processing unit 408 is coupled to decoder 406 and may
comprise any suitable circuitry to identify micro-operations in the fall
through branch instruction path and in any target branch instruction path
for a qualifying conditional branch instruction and to associate a
condition for the qualifying conditional branch instruction with each such
identified micro-operation in any suitable manner. Conditional branch
processing unit 408 for one embodiment tags each such identified
micro-operation with a conditional flag identifying the condition as an
additional source operand for the micro-operation.
RAT and allocate unit 410 receives micro-operations from conditional branch
processing unit 408 and maps any sources and renames any destinations for
each micro-operation for block 318. RAT and allocate unit 410 for one
embodiment for block 318 converts logical register references to physical
register references and in so doing forms dependency links between
physical destinations and sources-using a rename map. For one embodiment
where conditional branch processing unit 408 tags a micro-operation with a
conditional flag identifying a condition for a qualifying conditional
branch instruction, RAT and allocate unit 410 attaches to the tagged
micro-operation an identifier of the same physical flag register upon
which the qualifying conditional branch instruction depends.
As an example, fetch/decode unit 202 may decode, map, and rename the
macro-instruction ADD Ldest,Lsource from a fall through branch instruction
path for a qualifying conditional branch instruction into the
micro-operation ADD Pdest4.rarw.(Pdest1, Pdest2), Pdest3:flag, where Ldest
is a logical destination register, Lsource is a logical source register,
ADD Ldest,Lsource designates to add the content of logical register Ldest
to that of logical register Lsource and store the sum in logical register
Ldest, Pdest4 is a physical destination register to store the result of
the ADD instruction, Pdest1 is a physical destination register
corresponding to logical register Lsource, Pdest2 is a physical
destination register corresponding to logical register Ldest, and Pdest3
is a physical destination register corresponding to a flag register to
store a conditional flag upon which the qualifying conditional branch
instruction depends.
RAT and allocate unit 410 also allocates each micro-operation in reorder
buffer 204 for block 320. In entering micro-operations in reorder buffer
204, RAT and allocate unit 410 for one embodiment for block 320 adds
status information to the micro-operations to prepare them for
out-of-order execution.
RAT and allocate unit 410 is coupled to conditional branch processing unit
408 and reorder buffer 204 and may comprise any suitable circuitry to
perform mapping, renaming, and allocation in any suitable manner.
Reorder buffer 204 receives and stores each micro-operation from
fetch/decode unit 202. Reorder buffer 204 also stores micro-operations
that have already been executed by dispatch/execute unit 206 but not yet
retired. Reorder buffer 204 may comprise any suitable circuitry and for
one embodiment comprises an array of content-addressable memory (CAM).
Dispatch/execute unit 206 for block 322 of FIG. 3 dispatches
micro-operations stored in reorder buffer 204 for execution and executes
dispatched micro-operations. Dispatch/execute unit 206 schedules and
executes micro-operations stored in reorder buffer 204 in accordance with
data dependencies among such micro-operations and execution resource
availability and therefore supports out-of-order execution of
micro-operations. Dispatch/execute unit 206 stores any result of executing
a micro-operation with that micro-operation in reorder buffer 204.
Dispatch/execute unit 206 may comprise any suitable circuitry. As
illustrated in FIG. 5, dispatch/execute unit 206 for one embodiment
comprises a reservation station 502, integer execution units 511 and 512,
floating point execution units 513 and 514, and a memory interface
execution unit 515. Each execution unit 511-515 is coupled to reservation
station 502. Although illustrated as comprising five execution units
511-515, dispatch/execute unit 206 for other embodiments may comprise any
suitable number of execution units each of which may execute any suitable
type of micro-operation.
Reservation station 502 is coupled to reorder buffer 204 and scans the
status of micro-operations in reorder buffer 204 to identify
micro-operations that are ready to be executed, such as micro-operations
having available source operands for example. Reservation station 502 for
block 322 dispatches each ready micro-operation to an appropriate
execution unit 511, 512, 513, 514, or 515 available to execute t | | |