Claims
We claim:
1. In a processor having a decoder and an execution unit for executing
microinstructions, a method is provided for selecting one of two data
values based upon control states of the processor, the method comprising
the steps of:
providing a microinstruction indicating actions to be taken by the
execution unit in resolving a specified condition, with resolution of the
specified condition being contingent upon the control states of the
processor;
providing a first data value as a first source input to the
microinstruction;
providing a second data value as a second source input to the
microinstruction; and
issuing the microinstruction to the execution unit for execution, wherein
during execution of the microinstruction, the first source input value is
selected when the specified condition is determined to be true and the
second source input value is selected when the specified condition is
determined to be false.
2. In a processor having an execution unit for executing microinstructions
and for providing a resolution to a specified condition and a decoder, a
method is provided for selecting one of two data values based upon a
specified condition, resolution of the specified condition being
contingent on predetermined processor state values, the state values being
selected from a group consisting of processor operating mode values and
processor privilege values, the method comprising the steps of:
providing the predetermined processor state values to hardware circuitry of
the execution unit;
providing a microinstruction indicating actions to be taken by the
execution unit in resolving the specified condition;
providing a first data value as a first source input to the
microinstruction;
providing a second data value as a second source input to the
microinstruction;
issuing the microinstruction to the execution unit for execution;
reading the predetermined processor state values provided to the execution
unit;
resolving the specified condition in the execution unit;
selecting the first source input value of the microinstruction when the
specified condition is true;
selecting the second source input value of the microinstruction when the
specified condition is false; and
allocating a storage location to which the selected source input value will
be written, the storage location being preselected from a group consisting
of a processor register, a memory location within the processor and a
memory location external to the processor.
3. The method according to claim 2, wherein the method further comprises
the steps of:
writing the selected source input value to the storage location, the
storage location comprising the processor register having an associated
flag;
setting the associated flag when the selected source input value written to
the processor register comprises a binary zero;
clearing the associated flag when the selected source input value written
to the processor register does not comprise a binary zero; and
writing the flag to the storage location allocated.
4. The method according to claim 2, wherein the first data value and the
second data value provided to the microinstruction comprise values
selected from the group consisting of register values, immediate values,
constant values and memory values.
5. The method according to claim 2, wherein the processor operating mode
values and processor privilege values represent Intel Processor
Architecture operating modes and privilege states.
6. In an out-of-order processor having a decoder, at least one execution
unit for executing microinstructions out-of-order and a reorder buffer
having storage locations provided for buffering execution results of
corresponding microinstructions, a method is provided for selecting one of
two data values based upon control states of the processor, the method
comprising the steps of:
providing a microinstruction indicating actions to be taken by the
execution unit in resolving a specified condition;
providing a first data value as a first source input to the
microinstruction;
providing a second data value as a second source input to the
microinstruction;
issuing the microinstruction to the execution unit for execution;
determining the control states of the processor;
resolving the specified condition in the execution unit;
selecting the first source input value of the microinstruction when the
specified condition is true; and
selecting the second source input value of the microinstruction when the
specified condition is false.
7. The method according to claim 6, wherein the step of determining the
control states of the processor is performed by the steps of:
providing predetermined values to hardware circuitry of the execution unit,
the predetermined values comprising processor control state values
selected from the group consisting of processor operating mode values and
processor privilege values; and
reading the predetermined values provided to the execution unit.
8. The method according to claim 6, wherein the method further comprises
the step of allocating a storage location to which the selected source
input value will be written, the storage location comprising one of a
processor register, a memory location within the processor and a memory
location external to the processor.
9. The method according to claim 8, wherein the method further comprises
the steps of:
writing the selected source input value to the storage location, the
storage location comprising the processor register, the processor register
having an associated flag;
setting the associated flag when the selected source input value written to
the processor register comprises a binary zero;
clearing the associated flag when the selected source input value written
to the processor register does not comprise a binary zero; and
writing the flag to the storage location allocated.
10. The method according to claim 6, wherein the processor further
comprises a dispatch buffer for temporarily storing the microinstruction
until the execution unit is available, an instruction decoder having
microcode read only memory for decoding instructions into the
microinstruction and an allocator for assigning the microinstruction
storage location in each of the dispatch buffer and the reorder buffer,
and wherein the step of issuing the microinstruction to the execution unit
for execution is performed by issuing the microinstruction from the
microcode read only memory of the instruction decoder upon detection of an
instruction the execution of which depends upon the control states of the
processor.
11. The method according to claim 6, wherein the first data value and the
second data value to the microinstruction comprise values selected from
the group consisting of register values, immediate values, constant values
and memory values.
12. The method according to claim 6, wherein the processor control states
comprise states selected from the group consisting of processor operating
mode states and processor privilege states.
13. An apparatus provided in a processor having a decoder, the apparatus
for selecting one of two data values based upon control states of the
processor, the apparatus comprising:
a decoder for providing a sequence of microinstructions, each
microinstruction having an opcode indicating actions to be taken by the
processor in resolving a specified condition, a first source input value
and a second source input value;
an execution unit for executing microinstructions, the execution unit
having hardware circuitry for collecting predetermined control state
values to be used in resolving the specified condition, the predetermined
state values being selected from a group consisting of processor operating
mode values and processor privilege values, the execution unit executing
each microinstruction to produce as output the first source input value of
the microinstruction when the specified condition is true and the second
source input value of the microinstruction when the specified condition is
false.
14. The apparatus according to claim 13, wherein the execution unit in
execution of the microinstruction further writes the selected source input
value to a storage location, the storage location being selected from the
group consisting of a processor register, a memory location within the
processor and a memory location external to the processor.
15. The apparatus according to claim 14, wherein the processor comprises an
out-of-order processor in which the execution unit of the processor
executes microinstructions out-of-order, the storage location including a
location within a reorder buffer, the reorder buffer buffering execution
results and data after execution of corresponding microinstructions.
16. The apparatus according to claim 15, wherein the processor further
comprises a dispatch buffer for temporarily storing microinstructions
until the execution unit is available, an instruction decoder having
microcode read only memory for decoding instructions into
microinstructions and an allocator for assigning microinstructions storage
locations in the dispatch buffer and the reorder buffer, and wherein each
microinstruction is issued from the microcode read only memory of the
instruction decoder upon detection of the instruction the execution of
which depends upon the control states of the processor.
17. The apparatus according to claim 14, wherein the storage location
comprises a processor register having an associated flag and the selected
source input value written to the processor register causes the associated
flag to be set when the selected source input value comprises a binary
zero, the selected source input value written to the processor register
further causing the associated flag to be cleared when the selected source
input value does not comprise a binary zero.
18. The apparatus according to claim 13, wherein the first source input
value and the second source input value of the microinstruction comprise
values selected from the group consisting of register values, immediate
values, constant values and memory values.
19. The apparatus according to claim 13, wherein the predetermined control
state values comprise values selected from the group consisting of
processor operating mode values and processor privilege values.
20. The apparatus according to claim 19, wherein the processor operating
mode values and processor privilege values comprise values corresponding
to operating modes and privileges defined by the Intel Architecture.
21. The apparatus according to claim 13, wherein the processor is
implemented in a computer system comprising an input/output means for
providing a communications interface and a memory means coupled to the
input/output means for providing input data and output data to interface
with a computer user.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to programming operations in a microprocessor, and
more specifically, to the use of a conditional move microinstruction for
implementing processor state dependent operations. The invention is
particularly pertinent to speculative, out-of-order processors which
predict program flow and execute instructions out-of-order, but may also
be used in conventional pipelined and non-pipelined processors.
2. Art Background
I. State Dependent Operations in Pipelined, In-Order Microprocessors
Simple microprocessors generally process instructions one at a time. Each
instruction can be considered as being processed in five sequential
stages: instruction fetch, instruction decode, operand fetch, execute and
writeback. During instruction fetch, an instruction pointer from a program
counter is sent to an instruction memory, such as an instruction cache, to
retrieve a macroinstruction. The macroinstruction is decoded into
microinstructions or micro-operations (uops) which specify an opcode in
addition to source and destination register addresses. During operand
fetch, a register file is addressed with the source register addresses to
return the source operand values. In the execution stage, the uop and the
source operand values are sent to an execution unit for execution. During
writeback, the result value of the microinstruction execution is written
to the register file at the destination register address encoded in the
microinstruction.
Within simple microprocessors, different dedicated logic blocks perform
each processing stage. Each logic block waits until all the previous logic
blocks complete operations before beginning its operation. Without
pipelining, the microprocessor processes the uops sequentially one after
another. However, to improve microprocessor efficiency, microprocessor
architectures are now designed with overlapped pipeline stages so that the
microprocessor can operate on several uops simultaneously.
In the processing of state dependent instructions, the results derived from
execution of these instructions depend upon the current state of the
microprocessor. But since the state of the processor may be changed by
certain control instructions which may be fetched and decoded but not
executed before the fetching of the state dependent instructions, it is
possible that some state dependent instructions will be erroneously
fetched. This is because the fetching of a state dependent instruction is
based upon a processor state that may subsequently be modified by a
previously fetched control instruction. In this case, the processor would
have to detect the change in state and the fact that a particular state
dependent instruction was erroneously fetched so as to stop the execution
of the state dependent instruction and cause a fault to occur indicating
that the state dependent instruction should not be executed and that
another flow of uops should be fetched.
In order to prevent such a situation from occurring, conventional pipelined
processors are designed to detect the existence of a control instruction
at the decode stage and stall the pipeline by issuing fake uops (no-ops)
to the execution unit until the result of the control instruction (i.e. a
possible change in state) is determined during its execution. Once the
control instruction reaches the execution unit and its execution is
complete, the decoder is informed of any change in state and can resume
the normal fetching of instructions. Obviously, processors which utilize
this method incur a performance penalty due to the number of clock cycles
that are wasted during the pipeline stall.
Additionally, in the execution of state dependent instructions, many clock
cycles are required to access the state information needed and to resolve
their dependencies. For example, in execution of a privileged instruction,
the processor would have to read the proper control registers, place the
information in the proper format, compare the proper values and perform a
select (i.e. a conditional move operation) based upon the comparison. For
example, consider the relatively complex pseudo-instruction shown below:
IF [(CPL=0) & (IOPL=3) & (VME)], THEN SELECT A (INSTRUCTION EXECUTION), OR
ELSE B (FAULT)
In order to resolve the above condition, the following pseudo-uops would be
required:
A: T0:=compare (CPL,0)
T1:=select_Equal(A,B)
B: T0:=compare (IOPL,3)
T1:=select_Equal(T1,B)
C: T0:=compare (VME, TRUE)
T1:=select_Equal(T1,B)
The value within register T1 can then be checked by microcode to determine
whether execution of the instruction can proceed (T1=A) or whether a fault
must be posted (T1=B). In calculation of these operations, however, the
processor would require many clock cycles (i.e. approximately 5) in order
to (A) read the CPL control register, do a mask to get the lower 2 bits of
the CPL register, and compare CPL to 0; (B) read the IOPL value, mask and
shift the value, and compare IOPL to 3; and (C) read the processor mode,
mask the mode value, and check to see if the mode is enabled. Nonetheless,
even after all this has been done, the result of these calculations may
indicate that sufficient privilege does not exist, thereby requiring
microcode to signal a fault to the writeback logic of the execution unit
so that a fault can be posted instead of executing the privileged
instruction.
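The chained compare and select steps described above can be modeled in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names ProcessorState, compare, select_equal and check_privilege, and the string results "A" and "B", are hypothetical stand-ins for the pseudo-uops and control registers, not actual microcode.

```python
from dataclasses import dataclass


@dataclass
class ProcessorState:
    """Hypothetical snapshot of the control state read by the pseudo-uops."""
    cpl: int    # current privilege level (held in the low 2 bits)
    iopl: int   # I/O privilege level
    vme: bool   # processor mode bit checked in step C


def compare(value, expected):
    """Model of the compare pseudo-uop: True when the operands are equal."""
    return value == expected


def select_equal(equal, if_equal, otherwise):
    """Model of the select_Equal pseudo-uop: pick a source from the compare result."""
    return if_equal if equal else otherwise


def check_privilege(state, a="A", b="B"):
    """Chain three compare/select pairs; returns a (execute) or b (fault)."""
    t0 = compare(state.cpl & 0b11, 0)   # step A: mask CPL, compare to 0
    t1 = select_equal(t0, a, b)
    t0 = compare(state.iopl, 3)         # step B: compare IOPL to 3
    t1 = select_equal(t0, t1, b)
    t0 = compare(state.vme, True)       # step C: check the mode bit
    t1 = select_equal(t0, t1, b)
    return t1
```

In this model a failed check at any step forces the result to B, and later steps cannot change it back, which mirrors how T1 carries the running result through the three select operations.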
Furthermore, the performance of privilege or mode sensitive algorithms and
updates based on processor state (i.e. instructions that modify the
control flags based upon processor mode) also gives rise to similar
performance penalties. When instructions which modify the control flags
are executed, for example STI, CLI and IRET in the Intel
architecture, the execution unit will take several cycles to determine the
current processor mode. Thereafter, based on the current mode, a jump (or
branch) will or will not be taken to an algorithm or routine which
determines whether a particular control flag will be modified. Yet, for
processors which predict the flow of instructions instead of stalling the
pipeline, if the branch is conditionally taken and later found to be
mispredicted, more cycles will be lost due to the instructions that were
speculatively fetched which must now be canceled or flushed from the
pipeline.
Hence, the performance of the above state dependent operations in
conventional in-order, pipelined processors significantly reduces the
efficiency of the processor due to the wasted cycles needed to stall the
processor upon detection of control instructions and those required to
resolve the conditions of state dependent instructions or recover from
mispredicted branches.
II. Speculative, Out-of-Order Processors
For pipelined microprocessors to operate more efficiently, an instruction
fetch unit at the head of the pipeline must continually provide the
pipeline with a stream of instructions. However, conditional branch
instructions within an instruction stream prevent the instruction fetch
unit from fetching subsequent instructions that are known to be correct
since the conditions for such instructions are not resolved until
execution.
To alleviate this problem, some newer pipelined microprocessors use branch
prediction mechanisms that predict the outcome of branches, and then fetch
subsequent instructions according to the branch prediction. Branch
prediction is achieved using a branch target buffer to store the history
of a branch instruction based only upon the instruction pointer or address
of that instruction. Every time a branch instruction is fetched, the
branch target buffer predicts the target address of the branch using the
branch history. For a more detailed discussion of branch prediction,
please refer to Tse Yu Yeh and Yale N. Patt, Two-Level Adaptive Branch
Prediction, the 24th ACM/IEEE International Symposium and Workshop on
MicroArchitecture, November 1991, and Tse Yu Yeh and Yale N. Patt,
Alternative Implementations of Two-Level Adaptive Branch Prediction,
Proceedings of the Nineteenth International Symposium on Computer
Architecture, May 1992.
In combination with speculative execution, out-of-order dispatch of
instructions to the execution units results in a substantial increase in
instruction throughput. With out-of-order completion, any number of
instructions are allowed to be in execution in the execution units, up to
the total number of pipeline stages in all the functional units.
Instructions may complete out of order because instruction dispatch is not
stalled when a functional unit takes more than one cycle to compute a
result. Consequently, a functional unit may complete an instruction after
subsequent instructions have already completed. For a detailed explanation
of speculative out-of-order execution, please refer to M. Johnson,
Superscalar Microprocessor Design, Prentice Hall, 1991, Chapters 2, 3, 4,
and 7.
In a processor using out-of-order execution, instruction dispatch is
stalled when there is a conflict for a functional unit or when an issued
instruction depends on a result that is not yet computed. In order to
prevent or mitigate stalls in decoding, the prior art provides for a
temporary storage buffer (referred to herein as a dispatch buffer) between
the decode and execute stages. The processor decodes instructions and
places (or "issues") them into the dispatch buffer as long as there is
room in the buffer, and at the same time, examines instructions in the
dispatch buffer to find those that can be dispatched to the execution
units (i.e. those instructions for which all source operands and the
appropriate execution units are available).
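The selection rule for the dispatch buffer (dispatch a uop only when all of its source operands and a suitable execution unit are available) can be sketched as follows. This is a simplified model, not the processor's actual logic: the Uop fields, the unit names "mem" and "alu", and the one-uop-per-unit-per-cycle rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    sources: tuple   # names of the values this uop reads
    unit: str        # hypothetical execution unit class, e.g. "alu" or "mem"


def dispatchable(buffer, ready_values, free_units):
    """Return the uops whose sources are all ready and whose unit is free.

    The buffer is scanned oldest first, and each free unit accepts at most
    one uop per call (one cycle in this model).
    """
    available = set(free_units)
    picked = []
    for uop in buffer:
        operands_ready = all(src in ready_values for src in uop.sources)
        if uop.unit in available and operands_ready:
            picked.append(uop)
            available.discard(uop.unit)   # the unit is busy for this cycle
    return picked


buffer = [
    Uop("load", ("addr",), "mem"),
    Uop("add", ("eax", "t"), "alu"),   # waits for t, the result of the load
    Uop("mov", (), "alu"),             # no register sources, can go at once
]
picked = dispatchable(buffer, ready_values={"addr", "eax"}, free_units={"mem", "alu"})
```

Here the younger mov is dispatched ahead of the older add, because the add is still waiting for the load result.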
Instructions are dispatched from the dispatch buffer to the execution units
with little regard for their original program order. However, the
capability to issue instructions out-of-order introduces a constraint on
register usage. To understand this problem, consider the following
pseudo-microcode sequence:
1. t ← load (memory)
2. eax ← add (eax,t)
3. ebx ← add (ebx,eax)
4. eax ← mov (2)
5. edx ← add (eax,3)
The micro-instructions and registers shown above are generic and will be
recognized by those familiar with the art as those of the well known Intel
microprocessor architecture.
In an out-of-order machine executing these instructions, it is likely that
the machine would complete execution of the fourth instruction before the
second instruction, because the fourth instruction (a move) may require only
one clock cycle, while the load instruction and the immediately following ADD
instruction may require a total of four clock cycles, for example.
However, if the fourth instruction is executed before the second
instruction, then the fourth instruction would probably incorrectly
overwrite the first operand of the second instruction, leading to an
incorrect result. Instead of the second instruction producing a value that
the third instruction would use, the fourth instruction produces a value
that would destroy a value that the second one uses.
This type of dependency is called a storage conflict, because the reuse of
storage locations (including registers) causes instructions to interfere
with one another, even though the conflicting instructions are otherwise
independent. Such storage conflicts constrain instruction dispatch and
reduce performance.
It is known in the art that storage conflicts can be avoided by using
register renaming where additional registers are used to reestablish the
correspondence between registers and values. Using register renaming, the
additional "physical" registers are associated with the original "logical"
registers and values needed by the program. To implement this technique,
the processor typically allocates a new register for every new value
produced (i.e., for every instruction that writes a register). An
instruction identifying the original logical register for the purpose of
reading its value obtains instead the value in the newly allocated
register. Thus, the hardware renames the original register identifier in
the instruction to identify the new register and the correct value. The
same register identifier in several different instructions may access
different hardware registers depending on the locations of register
references with respect to the register assignments.
With renaming, the example instruction sequence depicted above becomes:
1. t_a ← load (mem)
2. eax_b ← add (eax_a,t_a)
3. ebx_b ← add (ebx_a,eax_b)
4. eax_c ← mov (2)
5. edx_a ← add (eax_c,3)
In this sequence, each assignment to a register creates a new instance of
the register, denoted by an alphabetic subscript. The creation of a
renamed register for eax in the fourth instruction avoids the resource
dependency on the second and third instructions, and does not interfere
with correctly supplying an operand to the fifth instruction. Renaming
allows the fourth instruction to be dispatched immediately, whereas,
without renaming, the instruction must be delayed until execution of the
second and third instructions. When an instruction is decoded, its result
value is assigned a location in a functional storage unit (referred to
herein as a reorder buffer), and its destination register number is
associated with this location. This renames the destination register to
the reorder buffer location. When a subsequent instruction refers to the
renamed destination register, in order to obtain the value considered to
be stored in the register, the instruction may instead obtain the value
stored in the reorder buffer if that value has already been computed.
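The renaming rule described above (every write to a register creates a new instance, and every read refers to the most recent instance) can be illustrated with a small Python sketch. It is a model for explanation only: the tuple encoding of the instructions, the REGISTERS set and the `<-` arrow in the output are assumptions, and real hardware maps logical registers to reorder buffer locations rather than to lettered names.

```python
import string

# Assumption: the only register names appearing in the example sequence.
REGISTERS = {"t", "eax", "ebx", "edx"}


def rename(program):
    """Rename (dest, op, sources) tuples so that every write gets a new instance.

    Returns strings such as 'eax_b <- add(eax_a, t_a)'.
    """
    instance = {}   # register name -> index of its current instance (0 = 'a')

    def tag(reg):
        return f"{reg}_{string.ascii_lowercase[instance[reg]]}"

    renamed = []
    for dest, op, sources in program:
        args = []
        for src in sources:
            if src in REGISTERS:
                instance.setdefault(src, 0)   # a value live on entry is instance 'a'
                args.append(tag(src))
            else:
                args.append(src)              # immediate or memory operand
        # A write allocates the next instance; a first-ever write uses 'a'.
        instance[dest] = instance[dest] + 1 if dest in instance else 0
        renamed.append(f"{tag(dest)} <- {op}({', '.join(args)})")
    return renamed


program = [
    ("t", "load", ["mem"]),
    ("eax", "add", ["eax", "t"]),
    ("ebx", "add", ["ebx", "eax"]),
    ("eax", "mov", ["2"]),
    ("edx", "add", ["eax", "3"]),
]
renamed = rename(program)
```

Applied to the five-instruction example, this produces the same instance letters as the renamed sequence in the text: the fourth instruction writes eax_c, so it no longer conflicts with the second and third instructions, which use eax_a and eax_b.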
The use of register renaming in the reorder buffer not only avoids register
resource dependencies to permit out-of-order execution, but also plays a
key role in speculative execution. If the instruction sequence given above
is considered to be part of a predicted branch, then one can see that
execution of those instructions using the renamed registers in the reorder
buffer has no effect on the actual registers denoted by the instructions. Thus,
if it is determined that the branch was mispredicted, the results
calculated and stored in the reorder buffer may be erased and the pipeline
flushed without affecting the actual registers found in the processor's
register file. If the predicted branch affected the values in the register
file, then it would be difficult to recover from branch misprediction
because it would be difficult to determine the values stored in the
registers before the predicted branch was taken without the use of
redundant registers in the reorder buffer.
When a result is output from an execution unit, it is written back to the
reorder buffer. The result may also provide an input operand to one or
more waiting instructions buffered in the dispatch buffer, indicating that
the source operand is ready for dispatch to one or more execution units
along with the instructions using the operand. After the value is written
into the reorder buffer, subsequent instructions continue to fetch the
value from the reorder buffer, unless the entry is superseded by a new
register assignment and until the value is retired by writing it to the
register file.
After the processor determines that the predicted instruction flow is
correct, the processor commits the speculative results of those
instructions that were stored in the reorder buffer to an architectural
state by writing those results to the register file. This process is known
as retirement wherein the instructions are architecturally committed or
retired according to their original program order (i.e. the original
instruction sequence).
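The interplay of out-of-order writeback, in-order retirement and flushing on a misprediction can be summarized in a toy model. The class below is a sketch under simplifying assumptions (one destination per entry, no operand forwarding, a flush that discards every outstanding entry); the class and method names are hypothetical and do not describe the actual reorder buffer circuitry.

```python
class ReorderBuffer:
    """Toy reorder buffer: results arrive in any order, retire in program order."""

    def __init__(self):
        self.entries = {}         # entry id -> {"dest", "value", "done"}
        self.head = 0             # id of the oldest entry not yet retired
        self.next_id = 0          # id given to the next allocated entry
        self.register_file = {}   # architectural (committed) state

    def allocate(self, dest):
        """Reserve an entry for a decoded uop, in program order."""
        entry_id = self.next_id
        self.entries[entry_id] = {"dest": dest, "value": None, "done": False}
        self.next_id += 1
        return entry_id

    def write_back(self, entry_id, value):
        """Record a result produced by an execution unit."""
        self.entries[entry_id]["value"] = value
        self.entries[entry_id]["done"] = True

    def retire(self):
        """Commit completed entries from the head; stop at the first incomplete one."""
        retired = 0
        while self.head in self.entries and self.entries[self.head]["done"]:
            entry = self.entries.pop(self.head)
            self.register_file[entry["dest"]] = entry["value"]
            self.head += 1
            retired += 1
        return retired

    def flush(self):
        """Discard all speculative results, leaving the register file untouched."""
        self.entries.clear()
        self.head = self.next_id
```

A result written back for a younger uop stays in the buffer until every older uop has completed, and a flush removes speculative results without touching the register file, which is the recovery property the text relies on.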
III. State Dependent Operations in Out-of-Order Processors
In out-of-order microprocessors, the processor state needed for execution
of state dependent instructions is located either in the register file or
in microcode control registers distributed throughout the processor's
architecture. However, due to the speculative, out-of-order nature of the
processor, the problems involved with processing state dependent
operations, such as checking privileged instructions, executing privilege
or mode sensitive algorithms and updating processor state, become much
worse.
One problem is in the out-of-order nature of execution which gives rise to
significantly greater performance penalties. The number of pipestages for
an out-of-order processor between the decode stage and the retirement
stage (where the register file is updated) is increased by approximately
10 stages over that for an in-order processor. Hence, a pipeline stall at
the decode stage caused by a control instruction requesting a change of
state would waste many more cycles in an out-of-order processor, thereby
increasing the performance penalty to an unacceptable value. However,
out-of-order does not, in and of itself, increase the length of the
pipeline. In one embodiment of the present invention, the microprocessor
uses superpipelining, a technique which increases the number of stages in
each pipe while shortening each stage. This is done so that pipe stages
which require short periods of time to execute are not penalized due to
longer periods required by preceding or subsequent pipe stages. This
technique is what increases the number of pipe stages in the present
invention over past implementations. The primary effect of out-of-order
execution is the increase in the number of microinstructions which may be
outstanding in the portion of the pipeline which supports out-of-order
execution. Also, note that out-of-order execution allows operations which
come after a given operation to contend for execution unit resources in
some cases. This can further lengthen the pipeline for a microinstruction
in the pipeline.
Similarly, the pipeline length together with the size of the reorder
buffer determines the number of speculative uops that are in the pipeline
at any one time, this number ranging between approximately 30 and 50 uops.
Therefore, the cost of taking a speculative branch (i.e. by predicting the
result of a conditional move or jump instruction) later found to be
mispredicted (at the execute stage) would give rise to another
unacceptable performance penalty due to the large number of speculative
uops that would have to be flushed in addition to the lost opportunity
costs in terms of the clock cycles wasted by the flushed uops.
Furthermore, with regard to microcode determining processor state at the
decode stage, such as with privileged instruction checking, the
disjunction between the instruction decoder, the execution units and the
retirement logic in an out-of-order processor would also require a
substantial investment in hardware and microcode to enable state updates
to occur at the various functional units throughout the processor. Since
the back-end, out-of-order functional units have little control over the
front-end, in-order functional units, a substantial amount of
communications or signaling hardware would have to be implemented between
the decoder and the updated processor state kept in the real register file
and in microcode registers throughout the processor. Even so, the
broadcasting of state updates would cause more penalties due to the
multiple state updates required for each state change.
Accordingly, it is an object of the present invention to provide a method
and apparatus in a microprocessor for conditionally selecting one of two
data values based upon control states of a processor via a
microinstruction.
It is another object of the present invention to provide a method and
apparatus for performing processor state dependent operations in an
out-of-order processor through the use of microcode while minimizing
performance penalties caused by pipe stalling, conditional moves and
conditional jumps.
It is a further object of the present invention to provide a method and
apparatus for performing privileged instruction checking, privilege or
mode sensitive algorithm execution and privileged updating in an
out-of-order processor through the use of a microinstruction that avoids
the complexity and expense of dedicated hardware that would otherwise have
to be implemented between the front-end and back-end of the processor.
SUMMARY OF THE INVENTION
The present invention provides a microinstruction for conditionally
selecting one of two data values based upon control states of a processor.
The microinstruction is preferably utilized in an out-of-order processor,
although it may be used in conventional processors, to perform state
dependent operations, including but not limited to privilege or mode
sensitive instruction checking, privilege or mode sensitive algorithm
execution and processor state updating. This is accomplished by issuing,
from microcode to an execution unit upon decoding of a state dependent
instruction, a conditional move operation that takes advantage of
condition resolving circuitry implemented within the execution unit. The
execution unit's circuitry makes available state information in the form
of result values that can be immediately used by the microinstruction upon
its execution to resolve the conditions which it specifies. Upon immediate
resolution of a specified condition, one of two values (or microcode
temporary registers having values therein) is selected in order to
properly complete the state dependent operation or to take other
appropriate action such as posting a fault.
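The behavior summarized above, a single operation that resolves a condition against processor control state and returns one of two source values, can be sketched in Python. This is a behavioral model only: the names ControlState, CONDITIONS and select_on_state, and the condition labels, are hypothetical and are not taken from the microinstruction encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlState:
    """Hypothetical control state made available inside the execution unit."""
    cpl: int
    iopl: int
    vme: bool


# Hypothetical table of conditions the execution unit can resolve directly.
CONDITIONS = {
    "CPL0_IOPL3_VME": lambda s: s.cpl == 0 and s.iopl == 3 and s.vme,
    "CPL0": lambda s: s.cpl == 0,
}


def select_on_state(condition, src1, src2, state):
    """Model of one conditional select: src1 if the condition is true, else src2."""
    return src1 if CONDITIONS[condition](state) else src2


state = ControlState(cpl=0, iopl=3, vme=True)
result = select_on_state("CPL0_IOPL3_VME", "execute", "fault", state)
```

Compared with a chain of separate compare and select steps, the whole condition is resolved in one operation in this model, because the state values are assumed to be already present in the execution unit.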
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a generalized block diagram of one embodiment of the microprocessor
in which the present invention is utilized.
FIG. 2 is a block diagram of the microprocessor shown in FIG. 1 in which
the pertinent in-order, front-end functional units and out-of-order,
back-end functional units are shown.
FIG. 3 is a block diagram of one embodiment of a computer system in which
the out-of-order microprocessor of the present invention may be
implemented.
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides a method and apparatus for performing
processor state dependent operations in a microprocessor through the use
of a microinstruction that selects one of two data values based upon
control states of a processor. For purposes of explanation, specific
embodiments are set forth in detail to provide a thorough understanding of
the present invention. However, it will be apparent to one skilled in the
art that the present invention may be practiced with other embodiments and
without all the specific details set forth. In other instances, well known
elements, devices, circuits, process steps and the like are not set forth
in detail in order to avoid unnecessarily obscuring the present invention.
I. System Block Diagram
FIG. 1 is a generalized block diagram of one embodiment of a speculative,
out-of-order processor according to the present invention. This particular
embodiment includes a variety of functional units grouped together in
clusters forming a bus cluster, an instruction fetch cluster, an issue
cluster, an out-of-order cluster, an execution cluster and a memory
cluster. In particular, these clusters can be further categorized into an
in-order section comprising the bus cluster, the instruction fetch cluster
and the issue cluster, and an out-of-order section comprising the
out-of-order cluster, the execution cluster and the memory cluster.
The functional units and corresponding interconnections pertinent to the
description of the present invention are shown in more detail in FIG. 2.
With reference to FIG. 2, the in-order section (or front-end) of the
microprocessor is denoted as 120, while the out-of-order section (or
back-end) is denoted as 130. The in-order section 120 includes an
instruction fetch unit (IFU) 102 having an instruction cache (ICACHE) and
an instruction translation lookaside buffer (ITLB) (neither being shown),
a branch target buffer (BTB) 104, an instruction decoder (ID) 106, a
microinstruction sequencer (MS) 107, an allocator (ALLOC) 112 and a
register alias table (RAT) 114. The out-of-order section 130 includes a
reservation station (RS) 118, a number of execution units (EUs) 116 (i.e.
an instruction execution unit (IEU) and a memory execution unit (MEU)), a
retire control circuit