BACKGROUND OF THE INVENTION
The subject matter of this invention relates to computing systems, and more
particularly, to a multiple instruction stream, multiple data pipeline for
use in a functional unit of such computing system, such as a floating
point unit, which is designed to operate in conjunction with a single
instruction stream, single data architecture.
Most computer processors utilize some form of pipelining. In a pipelined
computer processor, more than one instruction of an instruction stream is
being executed at the same time. However, each of the instructions being
executed is disposed within a different stage of the pipe. The
performance of a pipelined processor is necessarily better than the
performance of a non-pipelined processor. There are different types of
pipelining. One type is termed "single instruction stream single data
(SISD)" pipelining. In the SISD type of pipelining, individual
instructions are pipelined with at most a single data operation. However,
using the SISD pipelining approach, many "hazards" were encountered.
Hazards are encountered upon entering the pipeline at a maximum possible
new data rate. The "hazards" can be divided into two categories, namely,
structural hazards and data dependent hazards. A structural hazard occurs
when two pieces of data attempt to use the same hardware and thus
collisions occur. Data dependent hazards may occur when the events
transpiring in one stage of a pipeline determine whether or not data may
pass through another stage of the pipeline. For example, in a pipeline
having two stages, each stage requiring use of a single memory, when one
stage is using the memory, the other stage must remain idle until the
first stage is no longer using the memory. Another type of pipeline
approach is termed "multiple instruction stream, multiple data (MIMD)"
pipelining. When the MIMD type of pipelining is being used, rather than
pipe individual instructions, as in the SISD pipeline approach,
instruction "streams" are piped. The MIMD pipeline approach did not
encounter the hazards problem. However, although instruction streams are
being piped in the MIMD approach, a first instruction stream must complete
execution before a second instruction stream could commence execution.
Thus, although the performance of the MIMD pipeline was better than the
performance of the SISD pipeline, the performance of the MIMD pipeline was
limited by the "one instruction stream at a time" execution philosophy.
SUMMARY OF THE INVENTION
Accordingly, it is a primary object of the present invention to introduce a
novel type of pipeline for computer functional units, hereinafter termed a
"dynamic MIMD pipeline".
It is another object of the present invention to introduce the dynamic MIMD
pipeline which is not limited by the "one instruction stream at a time"
execution philosophy.
It is another object of the present invention to introduce the dynamic MIMD
pipeline capable of simultaneously executing a multiple number of
instruction streams in a multiple number of pipelines thereby increasing
substantially the performance of the functional unit embodying the dynamic
MIMD pipeline.
In accordance with these and other objects of the present invention, a
plurality of pipes are capable of piping, for execution thereof, a further
plurality of instructions. Each pipe is capable of simultaneously storing,
for execution, a plurality of instructions. Thus, the plurality of pipes
are capable of simultaneously storing, for execution, the further
plurality of instructions. The further plurality of instructions are
chosen from a plurality of instruction streams which are executing
simultaneously in the plurality of pipes. Since the instructions in a
particular pipe may be in various stages of completion of execution, in
order to keep an accurate record of the execution disposition of each
instruction in the pipe, a dynamic history table stores information
associated with each instruction disposed in each of the plurality of
pipes, the information for each instruction including the pipe number in
which the instruction is temporarily stored, and the status of completion
of execution of the particular instruction. A handshakes and global
hazards circuit determines the busy status of the functional unit, in
which the dynamic MIMD pipe is embodied, and responds to other functional
units in the computer system, such as the central processing unit (CPU).
It also determines if any hazards exist. If the functional unit is not
busy and no hazards exist, the next instruction from one of the plurality
of instruction streams enters the next available pipe. An MIMD/SISD switch
circuit determines if an incoming instruction is greater than "X" bits
long (e.g.--64), and if so, the switch switches the dynamic MIMD pipeline
of the present invention to the standard SISD mode and executes the
incoming instruction in the "one instruction stream at a time" execution
philosophy mode. SISD is also invoked for "difficult" instructions which
are considered to be divides and square roots.
Further scope of applicability of the present invention will become
apparent from the detailed description presented hereinafter. It should be
understood, however, that the detailed description and the specific
examples, while representing a preferred embodiment of the invention, are
given by way of illustration only, since various changes and modifications
within the spirit and scope of the invention will become obvious to one
skilled in the art from a reading of the following detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
A full understanding of the present invention will be obtained from the
detailed description of the preferred embodiment presented hereinbelow,
and the accompanying drawings, which are given by way of illustration only
and are not intended to be limitative of the present invention, and
wherein:
FIG. 1 illustrates a block diagram of a prior art standard MIMD
architecture pipeline;
FIG. 2 illustrates the dynamic MIMD/SISD pipeline floating point unit 20 of
the present invention;
FIG. 3 illustrates the instruction stack 20.1 of FIG. 2 and includes FIGS.
3a-3c with suitable legends in tabular format;
FIG. 4 illustrates with appropriate legends the dynamic history table 20.7
of FIG. 2;
FIG. 5 illustrates the pipe1 (ADD) internal control registers 20.6a of FIG.
2 and includes with appropriate legends FIGS. 5a-5b;
FIG. 6 illustrates the pipe2 (MULT) internal control registers 20.6b of
FIG. 2 and includes with appropriate legends FIGS. 6a-6b;
FIG. 7 illustrates the pipe3 (LOAD RX) 20.6c of FIG. 2;
FIG. 8 illustrates the pipe4 (MISCELLANEOUS) 20.6d of FIG. 2;
FIG. 9 illustrates the dbus stack 20.9 and the dbus stack controls 20.10 of
FIG. 2 and includes with appropriate legends FIGS. 9a-9b;
FIG. 10 illustrates a construction of the handshakes and global hazards
circuit 20.3 of FIG. 2;
FIG. 11 illustrates a construction of the initialization circuit 20.5 of
FIG. 2; and
FIG. 12 illustrates an example instruction stream.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
By way of background information, the dynamic MIMD pipeline of the present
invention is incorporated into a functional unit of a computer system.
Such a functional unit may be a floating point unit (FPU). In addition to
the FPU, the computer system also includes a cache, a central processing
unit (CPU), and a vector processor (VP). The Floating Point Unit (FPU)
receives data directly from the cache, the Central Processing Unit (CPU),
or the Vector Processor (VP); and receives instructions from the CPU. The
CPU does not control the data coming from the cache. The CPU requests data
(from the cache) while sending instructions to the FPU. While the data is
being accessed by the CPU from the cache, the CPU continues to send
instructions to the FPU without regard to synchronization of the cycle by
which data is being accessed from the cache with the cycle by which the
corresponding instructions are being sent to the FPU. Therefore, the data
arriving at the FPU at cycle N may be data pertaining to an instruction
delivered to the FPU in cycle M, where M ≤ N. The CPU
requests that certain operations be performed by the FPU and other units,
such as the cache, via a bus called the CBUS. The CBUS is the only means
by which instructions are communicated between the CPU and the FPU. The
CBUS conducts handshake control signals and instruction opcodes. When the
CPU transmits these requests, the functional units to which these requests
are sent are called Processor Bus Units (PBU). The FPU comprises one of
the PBUs. When the CPU encounters an instruction which it cannot execute,
and a PBU should execute the instruction, the CPU transmits a Processor
Bus Operation (PBO) signal to the appropriate PBU. For instance, if the
CPU decoded an instruction to be a multiply floating point long, since it
is much easier for the FPU to perform this operation, the CPU transmits
the PBO signal to the FPU requiring the FPU to perform the multiply
floating point long instruction.
The FPU comprises two main parts: a first section in which data actually
flows, and a second section into which instructions are introduced and
subsequently converted into control signals. This specification describes
the second section.
Referring to FIG. 1, a prior art MIMD pipeline system is illustrated.
In FIG. 1, a storage 10 stores a plurality of instruction streams, and in
particular, the state of each such stream. An initialization control 12 is
connected to the output of the storage 10, and a pipeline circuit 14 is
connected to the output of the initialization control 12, the pipeline
circuit 14 having no hazards detection circuit. The output of the pipeline
14 is connected to the storage 10.
In operation, an instruction stream, stored in storage 10 of FIG. 1, is
transmitted to the initialization control 12. The initialization control
12, in response thereto, transmits each instruction of the instruction
stream, one at a time, to the pipeline circuit 14. The instructions are
piped within pipeline circuit 14 and executed, one at a time. In response,
updated instructions are transmitted from the pipeline circuit 14 for
storage in storage 10. When the last instruction of the instruction stream
is transmitted to the pipeline circuit 14 from the initialization control
12, piped and executed therein, the last updated instruction of the
original instruction stream is transmitted to storage 10. At this point,
another instruction stream is transmitted from the storage 10 to the
initialization control 12 for execution thereof in the pipeline circuit
14. It is evident that, in the configuration of FIG. 1, the original
instruction stream must complete piping and execution within the pipeline
circuit 14 before the next instruction stream may be transmitted from
storage 10 to the initialization control 12 for piping and execution
within the pipeline circuit 14. This is the limitation and disadvantage
associated with the standard MIMD pipeline approach.
Referring to FIG. 2, a dynamic MIMD pipeline 20, according to the
present invention, is illustrated.
In FIG. 2, the CBUS is connected to an instruction stack 20.1. An output of
the instruction stack 20.1 is connected to a decode circuit 20.2. The
decode circuit 20.2 output is connected to a handshakes and global hazards
circuit 20.3, a mimd/sisd switch 20.4, and an initialization circuit 20.5.
The outputs of the handshakes and global hazards circuit 20.3 and the
mimd/sisd switch 20.4 are connected to the inputs of the initialization
circuit 20.5. The initialization circuit 20.5 output is connected to a
dynamic history table 20.7, to pipeline circuits 20.6, and to floating
point registers (FPR) 20.8. The handshakes and global hazards circuit 20.3
output is also connected to an exception handler circuit 20.11, the output
of which is further connected to the dynamic history table 20.7. Pipeline
circuits 20.6 are also connected to the dynamic history table 20.7 and to
the exception handler circuit 20.11, and produce an output which is
conducted on a bus called the DBUS which is connected to the data cache,
and conducted to the FPR 20.8 which is a local architecturally defined
storage. The output of the dynamic history table 20.7 is used to control
the gating of the output to the DBUS and to the FPR 20.8. The CBUS, in
addition to being input to the instruction stack 20.1, is also input to a
DBUS stack controls 20.10 circuit. The output of the DBUS stack controls
20.10 circuit is connected to a DBUS stack 20.9, the input of which is
connected to the DBUS. The output of the DBUS stack 20.9 and the output of
the FPRs 20.8 generate the data which begins the data flow.
The dynamic MIMD pipeline of the present invention, illustrated in FIG. 2,
may be subdivided into two paths: one for instructions and controls (CBUS
path), and the other for data flow (DBUS path). The instructions are
received via the CBUS and put in the instruction stack 20.1, and then
decoded via decoder 20.2. The data is introduced into the dynamic MIMD
pipeline of FIG. 2 via the DBUS.
The handshakes and global hazards circuit 20.3 of FIG. 2 transmits
"handshake" signals to the CPU and detects global hazards. A further
construction of the handshakes and global hazards circuit 20.3 may be
found in FIG. 10 of the drawings. A more detailed description of the
handshakes and global hazards circuit 20.3 of FIG. 10 will be set forth
below in one of the following paragraphs of this specification. The CBUS
contains a set of handshake signals to be transmitted between the CPU and
each PBU, including the FPU. When the FPU receives a request via the CBUS,
the handshakes and global hazards circuit 20.3 of the FPU is required to
send an acknowledge signal, a busy signal, or an interrupt signal back to
the CPU if the CBUS request was sent from the CPU and the FPU is the only
PBU involved in the request. The Acknowledge handshake signal is sent from
the FPU to the CPU if a CBUS request is sent to the FPU and the FPU is not
BUSY. The interrupt signal is sent from the FPU to the CPU if a data
exception is encountered and critical information is stored in the status
word. The busy handshake signal is sent from the FPU to the CPU if the FPU
cannot accept another instruction for execution. The handshake signals,
acknowledge, busy, interrupt, are sent to the CPU from the handshakes and
global hazards circuit 20.3 of the FPU. Global hazards are detected in the
handshakes and global hazards circuit 20.3 of the FPU and a signal is
transmitted therefrom, for transmission to the initialization circuit
20.5, representative of the existence of such hazards. The handshake logic
20.3 (in connection with the initialization circuit 20.5) delivers the
appropriate responses of the FPU to the other processor bus units (PBU).
It also helps to detect the beginning and the end of an instruction
stream. The global hazards circuit 20.3 detects the existence of hazards
due to data dependencies of instructions on other executing instructions
(data interlock).
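The handshake rules described above can be summarized in a short Python sketch. The names `Handshake` and `select_handshake` are hypothetical, and the priority given to the interrupt signal over the busy signal is an assumption made for illustration; the specification does not state a priority among the three signals.

```python
from enum import Enum


class Handshake(Enum):
    ACKNOWLEDGE = "ack"
    BUSY = "busy"
    INTERRUPT = "int"


def select_handshake(data_exception: bool, can_accept: bool) -> Handshake:
    """Pick the signal returned to the CPU for one CBUS request (illustrative)."""
    if data_exception:
        # Assumed priority: a data exception is reported before anything else.
        return Handshake.INTERRUPT
    if not can_accept:
        # The FPU cannot accept another instruction for execution.
        return Handshake.BUSY
    # A request arrived and the FPU is not busy.
    return Handshake.ACKNOWLEDGE
```

In this sketch exactly one of the three signals is produced per request, which matches the requirement that the FPU answer each CBUS request with an acknowledge, busy, or interrupt signal.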
Depending upon the incoming instruction being decoded by decoder 20.2, the
MIMD/SISD switch 20.4 switches to either SISD mode or MIMD mode. If an
incoming instruction involves operands more than 64 bits in length or if
an instruction is determined to be difficult to execute, the MIMD/SISD
switch 20.4 selects the SISD mode, otherwise it uses the MIMD mode.
The specific instructions which are considered to be "difficult" and which
invoke SISD mode are:
The Divides, both floating and fixed point
The Square Roots
Operations which involve extended operands
During execution in the SISD mode, everything is shut down except for the
execution of the difficult instruction; this is accomplished by holding
the BUSY signal, generated from the handshakes circuit 20.3 of the FPU to
the CPU, active. This stops the CPU from sending any more requests to the
FPU via the CBUS. The following instructions or any of their combinations
will cause MIMD/SISD switch 20.4 to switch the pipeline mode to the MIMD
mode:
FLOATING POINT OPERATIONS
ADDs
COMPAREs
HALVE
LOADs
MULTIPLYs
STOREs
SUBTRACTs
FIXED POINT OPERATIONS--microcode
MULTIPLY
OTHER OPERATIONS--microcode
LOADs
STOREs
STATUS WORD
INDIRECT MODE
RETRY
The following instructions will cause MIMD/SISD switch 20.4 to switch the
pipeline mode to the SISD mode.
FLOATING POINT OPERATIONS--microcode
ADD extended
MULTIPLY extended
DIVIDEs
DIVIDE extended
SQUARE ROOT
LOAD rounded extended
FIXED POINT OPERATIONS--microcode
DIVIDE
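The mode selection performed by the MIMD/SISD switch 20.4 can be sketched as follows. This is a minimal illustration only: the instruction name strings, the set `SISD_INSTRUCTIONS`, and the function `select_mode` are hypothetical labels chosen for the sketch, not identifiers taken from the hardware.

```python
# Hypothetical labels for the instructions listed above as invoking SISD mode.
SISD_INSTRUCTIONS = {
    "FLP ADD extended",
    "FLP MULTIPLY extended",
    "FLP DIVIDE",
    "FLP DIVIDE extended",
    "FLP SQUARE ROOT",
    "FLP LOAD rounded extended",
    "FXP DIVIDE",
}

# Operands longer than this many bits force SISD mode.
MAX_MIMD_OPERAND_BITS = 64


def select_mode(instruction: str, operand_bits: int) -> str:
    """Return "SISD" for long-operand or difficult instructions, else "MIMD"."""
    if operand_bits > MAX_MIMD_OPERAND_BITS:
        return "SISD"
    if instruction in SISD_INSTRUCTIONS:
        return "SISD"
    return "MIMD"
```

For example, `select_mode("FLP MULTIPLY", 64)` selects the MIMD mode, while `select_mode("FLP SQUARE ROOT", 64)` selects the SISD mode.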
The initialization circuit 20.5 of FIG. 2 starts the pipe, and updates the
Dynamic History Table 20.7. A further construction of the initialization
circuit 20.5 may be found in FIG. 11 of the drawings. A more detailed
description of the initialization circuit 20.5 of FIG. 11 will be set
forth below in one of the following paragraphs of this specification. In
connection with the handshake/hazard logic 20.3, the initialization
circuit 20.5 determines the beginning and end of an instruction stream and
determines if any data dependent hazards exist. After the decode step, the
type of instruction, as indicated by the output from the decoder 20.2, is
compared, in the initialization circuit 20.5 and the global hazard circuit
20.3, with the completion status of the first cycle status of the
appropriate pipe to use, as indicated by the internal pipe controls
20.6a-d. If there are no global hazards, as indicated by the dynamic
history table 20.7, and no immediate internal hazards exist, as indicated
by the output of the handshakes and global hazards circuit 20.3, the
instruction is initialized. If the BUSY handshake signal is developed by
the handshakes and global hazards circuit 20.3, no initialization takes
place in the initialization circuit 20.5. Initialization involves starting
the status controls of the appropriate pipe and also entering a new line
in the Dynamic History Table. Notification of initialization is handled by
the handshake controls 20.3 which sends the acknowledge signal to the CPU
indicating that the instruction has been started or by sending a busy
signal to the CPU, indicating that the FPU has the instruction but that the
flow of incoming instructions should be stopped because the FPU cannot
handle many more instructions. The initialization logic 20.5 and the
global hazards logic 20.3 determine the beginning and end of a stream of
instructions. The response "acknowledge" and "not busy" to an instruction
not already in a stream indicates the beginning of a stream, and "busy"
indicates the end of a stream. The global hazards circuit 20.3 is used to
determine hazards due to data "dependencies". The initialization logic
20.5 adds new lines to the dynamic history table 20.7. Therefore,
initialization consists of handshaking, updating the history file, and
possibly dealing with data hazards.
The dynamic MIMD pipeline of FIG. 2 includes four pipeline circuits 20.6:
pipe1 20.6a, pipe2 20.6b, pipe3 20.6c, and pipe4 20.6d. Thus, there are
four categories of instructions, one category for each pipe 20.6a through
20.6d.
Data on the DBUS is processed by either the FPRs 20.8, or the DBUS stack
20.9 which is controlled by the DBUS stack controls 20.10.
The exception handler 20.11 determines if there is an exception. The types
of data exceptions that can occur while executing instructions are:
Exponent Overflow Exception
Exponent Underflow Exception
Floating Point Divide Exception
Fixed Point Divide Exception
Significance Exception
Square Root Exception
When an instruction is detected that causes one of these exceptions, all
instructions received after this must be cancelled as if they were never
received even though they may be already executing; this is a property of
a SISD architecture which must be preserved by the dynamic MIMD
architecture. This is done by changing all the valid bits in the Dynamic
History Table to zero after the instruction causing the exception has
completed. In addition, the CPU and the other units are notified of the
interrupt and they must cancel their instructions until the CPU begins an
interrupt handler routine.
The dynamic MIMD pipeline 20, disposed in the FPU of the computer system,
receives instructions via the CBUS and the FPU responds back, as do the
other processor bus units, by transmitting certain "handshake" signals
including an ACKnowledge handshake signal, a BUSY handshake signal, and an
INTerrupt handshake signal. Since the CPU works in a pipeline mode and
sends PBO commands out every cycle, regardless of whether the last PBO was
ACKnowledged, the PBUs must determine whether the last PBO was
acknowledged before proceeding to execute the next PBO. The PBUs include a
"smart" interface. Therefore, using the smart interface, a PBU must check
on the handshakes of other PBUs with the CPU. A PBU is required to send
one of the three handshake signals (from the handshakes circuit 20.3 of
FIG. 2) to the CPU in the cycle after a PBO was received by the PBU. If
hazards are encountered by a PBU, such as the FPU, a BUSY handshake signal
is sent to the CPU by a PBU. When the BUSY signal is sent to the CPU, the
PBU holds the received instruction and the following instruction in an
instruction stack (such as instruction stack 20.1 of FIG. 2 for FPU) so
that the sequence of instructions received from the CPU can be maintained.
Thus, implemented on the FPU, as part of the instruction stack 20.1, is a
CBUS Register 20.1.2 and a CBUS STACK 20.1.1 which hold the received
instruction and the following instruction, respectively. Instructions are
not stacked unless hazards, which cause generation of the BUSY handshake
signal, are encountered. The FPU accepts as many instructions as it can
handle; however, the FPU does not contain as much information as is
contained by the CPU, since the CPU can halt an instruction before the
instruction is even sent to the bus units if it sees, in its buffer of
instructions, that problems may be encountered. When PBOs are sent from
the CPU that require execution by the FPU and another bus unit, such as
the data cache, the FPU has no power to prevent the data cache from
starting the execution of the instruction. Thus, the most efficient method
for the FPU to pipe is to go as far as possible until a hazard is
encountered.
Referring to FIGS. 3a-3c, a construction of the instruction stack 20.1 of
FIG. 2 is illustrated. FIG. 3a illustrates the construction of the
instruction stack 20.1, FIG. 3b illustrates the bits on the CBUS during a
hardwired mode, and FIG. 3c illustrates the bits on the CBUS during a
microcode mode.
In FIG. 3a, the instruction stack 20.1 comprises the CBUS Stack register
20.1.1, and a CBUS register 20.1.2 connected to the output of the CBUS
stack register 20.1.1. The instruction stack 20.1 as well as the CBUS
consist of 25 bits of information for, at most, 2 instructions. These 25
bits of information comprise:
bit 0--the PBO bit which indicates whether the FPU is in a hardwired mode
or a microcode mode; if in hardwired mode (0), exceptions are reported to
the CPU; if in microcode mode (1), exceptions are stored in the status
word (see FIG. 8, element number 20.6d.3) but are not reported;
bit 1--the FPU request bit which signals the FPU that this instruction must
be executed by the FPU;
bit 2--the IPU/Cache request bit which signals the cache to decode the
instruction;
bit 3--the VP request bit;
bits 17 to 19--in microcode mode, these bits are the SRC, source,
identifier bits;
bits 20 and 22--in microcode mode, these bits are the DST, destination,
identifier bits;
bits 4 to 10--the instruction opcode bits;
bits 17 to 19--in hardwired mode, these bits are the interrupt tag field
which is stored in the status word on an exception; and
bit 24--the parity bit used for checking the validity of the instruction.
Thus, the instruction on the CBUS is introduced into the instruction stack
20.1 via the CBUS bits defined above.
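The bit layout above can be illustrated with a small decoder. This sketch assumes that bit 0 is the most significant bit of the 25-bit CBUS word, which is not stated in the text. The names `bit_field` and `decode_cbus` are hypothetical. The DST field is left out because its exact bit positions are not unambiguous in the list above, and the parity bit is only extracted, since the parity sense (odd or even) is not given.

```python
CBUS_WIDTH = 25  # number of bits carried on the CBUS per instruction


def bit_field(word: int, first: int, last: int) -> int:
    """Extract bits first..last inclusive, counting bit 0 as the MSB (assumed)."""
    width = last - first + 1
    shift = CBUS_WIDTH - 1 - last
    return (word >> shift) & ((1 << width) - 1)


def decode_cbus(word: int) -> dict:
    """Split a 25-bit CBUS word into the fields named in the specification."""
    if not 0 <= word < (1 << CBUS_WIDTH):
        raise ValueError("CBUS word must fit in 25 bits")
    return {
        "microcode_mode": bool(bit_field(word, 0, 0)),  # PBO bit: 0 hardwired, 1 microcode
        "fpu_request": bool(bit_field(word, 1, 1)),
        "cache_request": bool(bit_field(word, 2, 2)),
        "vp_request": bool(bit_field(word, 3, 3)),
        "opcode": bit_field(word, 4, 10),
        # SRC identifier in microcode mode, interrupt tag in hardwired mode.
        "bits_17_19": bit_field(word, 17, 19),
        "parity": bit_field(word, 24, 24),
    }
```

Whether bits 17 to 19 are read as the SRC identifier or as the interrupt tag depends on the PBO bit, so a caller would inspect `microcode_mode` before interpreting `bits_17_19`.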
Referring to FIG. 4, a construction of the dynamic history table 20.7 of
FIG. 2 is illustrated.
In FIG. 4, the dynamic history table 20.7 comprises 17 bits of information
stored away for at most 8 instructions at a time. The Dynamic History
Table 20.7 consists of data that is needed when it is necessary for an
incoming instruction to enter one of pipes 20.6a-20.6d and to complete
from these pipes. Since the instructions are stacked, the table 20.7
provides a means of sequencing the completion of execution of instructions
of one or more instruction streams. The limitation of the CBUS, to send
one instruction at a time, determines the instruction's starting time.
Since the execution of the instructions of the one or more instruction
streams may take multiple cycles to complete, and since there exists more
than one pipe, it is possible that multiple instructions will be executing
at the same time. Due to architectural constraints, the instruction
completion sequence must be maintained because of possible unpredictable
results if an interrupt occurred and the instructions were not sequential.
Therefore, there is a need to maintain and store sequencing information
and completion information in the table 20.7. The dynamic history table
20.7 stores the following information:
1. the pipe number (PIPE NO),
2. the write address (WR ADDR),
3. whether the instruction is a write type (WT) of instruction,
4. a tag (INT TAG) which uniquely identifies it with an instruction in a
stack on the CPU,
5. whether the instruction is an SISD instruction type (M/S), where "M"
implies an MIMD instruction type and "S" implies an SISD instruction type,
6. the result length (LEN),
7. whether it is a hardwired or microcoded PBO (H),
8. some retry information (PSW PTR), and
9. a valid bit (V).
The pipe number (PIPE NO) is critical because it sequences the multiple
pipes of pipeline circuits 20.6a-20.6d of FIG. 2 (four in all). Sequencing is of
very little concern in a one pipe system, but, with multiple pipes,
tracking information must be maintained. The write address (WR ADDR),
write type (WT), and result length (LEN) help in completing the
instruction. The tag information (INT TAG) is stored away if an exception
occurs and helps in identifying the exact instruction which caused the
exception. If it is an SISD instruction, completion is not sensed by
checking whether valid data is disposed at the end of a pipe; instead, it
is determined by a counter which counts the cycles.
The most important bit is the valid bit (V) which indicates whether the
instruction in this entry of the stack is valid. The valid bit (V) is
cleared when an exception occurs. The valid bit (V) entry is cleared and
the stack is shifted upon completion of an instruction. Thus, a quick
method is available to cancel all pending instructions in the FPU, that
is, by clearing the valid bit (V) in the dynamic history table 20.7.
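The behavior of the dynamic history table 20.7 can be modeled in software as a bounded first-in, first-out queue. The following is a sketch under stated assumptions: the class and field names are hypothetical, the field types are guesses (the text gives only the total of 17 bits, not the individual field widths), and a Python `deque` stands in for the shifting hardware stack.

```python
from collections import deque
from dataclasses import dataclass

MAX_ENTRIES = 8  # the table holds at most 8 instructions at a time


@dataclass
class HistoryEntry:
    pipe_no: int       # PIPE NO: pipe holding the instruction
    wr_addr: int       # WR ADDR: write address
    write_type: bool   # WT: instruction is a write type
    int_tag: int       # INT TAG: identifies the instruction on the CPU
    sisd: bool         # M/S: True for an SISD instruction
    length: int        # LEN: result length
    hardwired: bool    # H: hardwired (True) or microcoded PBO
    psw_ptr: int       # PSW PTR: retry information
    valid: bool = True # V: valid bit


class DynamicHistoryTable:
    def __init__(self) -> None:
        self._entries = deque()

    def is_full(self) -> bool:
        return len(self._entries) >= MAX_ENTRIES

    def add(self, entry: HistoryEntry) -> None:
        """Enter a new line when an instruction is initialized."""
        if self.is_full():
            raise OverflowError("dynamic history table is full")
        self._entries.append(entry)

    def oldest(self):
        """Return the oldest entry, or None when the table is empty."""
        return self._entries[0] if self._entries else None

    def complete_oldest(self) -> HistoryEntry:
        """Remove the oldest entry; the remaining entries shift up."""
        return self._entries.popleft()

    def cancel_all(self) -> None:
        """Clear every valid bit, cancelling all pending instructions."""
        for entry in self._entries:
            entry.valid = False
```

Because entries are appended in arrival order and removed only from the front, the oldest entry always identifies the instruction that must complete next.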
Referring to FIGS. 5-8, a construction of the pipeline 20.6 of FIG. 2
is illustrated. In particular, FIG. 5 illustrates the construction of
pipe1 20.6a which is used for add type instructions, such as add,
subtract, divide, compares, and square roots. The FIG. 5 pipe functions in
three cycles. FIG. 6 illustrates the construction of pipe2 20.6b, the
multiply pipe, which functions in 5 cycles, and is used for multiply
instructions. FIG. 7 illustrates the construction of pipe3 20.6c, which is
used for load RX type instructions and functions in two cycles. FIG. 8
illustrates the construction of pipe4 20.6d, which is used for all other
miscellaneous functions, which are usually either a write or a read of
some auxiliary or status registers.
In FIG. 2, pipeline 20.6a includes a controls section and a pipe1 section.
Similarly, pipes 20.6b through 20.6d each include a controls section. The
pipeline controls section of pipe 20.6a-20.6d controls the internal parts
of each pipe by pushing the operations as far as possible through the
pipes and sensing when the FPRs 20.8 are interlocked and by determining
where good data can be found. To better understand these controls, it is
best to first understand the layout of each pipe. In FIG. 2 and in FIGS.
5-8, pipeline 20.6 comprises four pipes: 20.6a through 20.6d. In MIMD
mode, these pipes have different lengths thus creating complexities in
controlling these pipes globally. Internal to the pipes, there are
registers and, associated with these registers, are status fields.
Referring to FIG. 5b, as indicated more thoroughly below, the status fields
for the registers in the add pipe of FIG. 5 consist of the operand's FPR
address (ADDR), whether SISD or MIMD mode is invoked (M/S), some bypass
information (2BY), whether this stage in the pipe is for a valid
instruction (VI), whether the data in the register is valid (VD), and
whether the instruction is RX or RR type (RX).
Referring to FIG. 6b, the status fields associated with the registers of
the multiply pipe of FIG. 6 store this information and further information
including the length of the operands (LI), whether it's a floating point
or fixed point operand (FLP), and whether the data is still valid (VR),
even if this stage of the pipe does not have a valid instruction. The
other pipes do not need status information because they are very short.
Referring again to FIG. 2, the status information for each stage of the
pipe flags to the following stage its validity and then, in the following
cycle, the next stage becomes valid if there are no contentions. Thus, the
flags help in determining contention and push the instruction's data as
far as it can through the pipe. After the pipe in question locates the
data on the DBUS and execution takes place in the pipe, the pipe in
question waits for the Dynamic History Table's oldest entry to match with
the pipe number (PIPE NO) of the pipe in question; at this point, the pipe
is allowed to complete thus maintaining instruction completion
synchronization.
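The effect of this completion rule on pipes of different lengths can be illustrated with a small timing model. The sketch assumes one instruction enters per cycle, no hazards occur, and at most one instruction completes per cycle; the last of these is an assumption not stated in the text. The latencies are the cycle counts given above for pipe1, pipe2, and pipe3. Pipe4 is omitted because no cycle count is given for it. The names `PIPE_LATENCY` and `completion_cycles` are hypothetical.

```python
# Pipe number -> cycles needed: pipe1 (add) 3, pipe2 (multiply) 5, pipe3 (load RX) 2.
PIPE_LATENCY = {1: 3, 2: 5, 3: 2}


def completion_cycles(pipe_sequence):
    """Return the cycle in which each instruction is allowed to complete.

    Instruction i enters its pipe in cycle i (starting at 1). It may complete
    only after it has finished executing and after every older instruction
    has completed, which models the match against the oldest table entry.
    """
    completions = []
    previous = 0
    for issue_cycle, pipe_no in enumerate(pipe_sequence, start=1):
        ready = issue_cycle + PIPE_LATENCY[pipe_no] - 1  # last execution cycle
        done = max(ready, previous + 1)  # wait for the older instruction
        completions.append(done)
        previous = done
    return completions
```

For the sequence multiply, add, load RX, `completion_cycles([2, 1, 3])` returns `[5, 6, 7]`: the add and the load finish executing in cycle 4 but are held until the older multiply has completed.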
Referring to FIG. 5, pipe1 (add) internal control registers 20.6a are
illustrated.
In FIG. 5a, pipe 1 20.6a includes an alignment register 20.6a.4, an FA
register 20.6a.1, an FB register 20.6a.2, an A register 20.6a.5, a B
register 20.6a.6, an adder 20.6a.7, an FS register 20.6a.3, an S register
20.6a.8, and a post normalizer register 20.6a.9. In FIG. 5b, status fields
associated with the FA register 20.6a.1, FB register 20.6a.2, and FS
register 20.6a.3 are illustrated.
FIG. 5 illustrates the add pipe 20.6a and its associated internal pipe
control registers. The add pipe of FIG. 5 consists of three cycles. During
the first cycle, the data is retrieved from the FPRs 20.8 and/or
from the DBUS. Alignment is accomplished by the alignment hardware
20.6a.4. Operands are latched into A register 20.6a.5 and B register
20.6a.6. In cycle two, the actual add is performed by the adder 20.6a.7
and the result is stored in S register 20.6a.8. In the third and final
cycle, the post normalizer 20.6a.9 shifts out leading zeros if required
and the data is sent back to the FPRs 20.8. The previously described
function reflects the manner in which the pipe handles add instructions.
For other instructions, belonging to the same category, at least some of
the registers or some of the internal bypassing controls in the pipe are
used. Thus, the three cycle add pipe is used for many different
instructions. To maintain and control this pipe, three major control
registers are needed: FA register 20.6a.1, FB register 20.6a.2, and FS
register 20.6a.3.
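The three cycles of the add pipe (align, add, post-normalize) can be traced on a toy number format. The format used here is an assumption made purely for illustration and is not taken from the specification: operands are unsigned, a value equals `fraction * 16**exponent`, and the fraction holds six hexadecimal digits. The carry handling in the add step is likewise an assumption. All function names are hypothetical.

```python
FRACTION_DIGITS = 6                      # assumed fraction width in hex digits
FRACTION_LIMIT = 16 ** FRACTION_DIGITS   # first value that no longer fits
TOP_DIGIT_SHIFT = 4 * (FRACTION_DIGITS - 1)


def align(a, b):
    """Cycle 1: shift the fraction of the smaller-exponent operand right."""
    (exp_a, frac_a), (exp_b, frac_b) = a, b
    exp = max(exp_a, exp_b)
    frac_a >>= 4 * (exp - exp_a)  # digits shifted out are dropped
    frac_b >>= 4 * (exp - exp_b)
    return exp, frac_a, frac_b


def add(exp, frac_a, frac_b):
    """Cycle 2: add the aligned fractions (carry handling is assumed)."""
    total = frac_a + frac_b
    if total >= FRACTION_LIMIT:  # carry out of the top digit
        total >>= 4
        exp += 1
    return exp, total


def post_normalize(exp, frac):
    """Cycle 3: shift out leading zero digits."""
    while frac != 0 and (frac >> TOP_DIGIT_SHIFT) == 0:
        frac <<= 4
        exp -= 1
    return exp, frac


def add_pipe(a, b):
    """Run one operand pair through the three stages in order."""
    return post_normalize(*add(*align(a, b)))
```

For example, `add_pipe((2, 0x001000), (0, 0x100000))` aligns the second operand to exponent 2, adds the fractions to obtain `0x002000`, and shifts out two leading zero digits to give `(0, 0x200000)`.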
Referring to FIG. 5b, the status fields for the FA register 20.6a.1, FB
register 20.6a.2, and FS register 20.6a.3 are illustrated.
In FIG. 5b, such status fields include the following bits:
1. FPR address bits (ADDR) of the operand which are used to locate operands
that may be interlocked
2. A valid instruction bit (VI) which is used to indicate that this stage
in the pipe is valid for an instruction
3. A valid data bit (VD) which indicates that the associated data register
is valid
4. A MIMD/SISD pipe indicator (M/S) which signals the instruction end. For
MIMD mode, the end is reached when the last stage is valid and there is no
contention on completion. For SISD mode, it is a little more complex
because the instruction may loop several times through the pipe.
5. A bit to indicate that the instruction is an RX type of instruction (RX)
which, on the FB register, indicates that its address bits are really
invalid and the data bus should be watched for incoming data if not
already valid.
6. A bit which indicates the first cycle of a two cycle bypass (2BY).
Sometimes it takes two cycles to retrieve interlocked data once it has
been located.
Referring to FIG. 6a, pipe2 20.6b, the multiply pipe, and the control
registers, internal to the pipe, are illustrated.
In FIG. 6a, pipe2 20.6b, the multiply pipe, includes FXA register 20.6b.1,
FYS register 20.6b.2, FXB register 20.6b.3, FY register 20.6b.4, FP
register 20.6b.5, XA register 20.6b.6, 3X hardware 20.6b.7, XB and 3X
registers 20.6b.8, Y register 20.6b.9, M1 hardware 20.6b.10, M2 hardware
20.6b.11, and P register 20.6b.12. The multiply pipe consists of 5 cycles
if no hazards are encountered:
Cycle 1--Operand 1 is loaded into XA register 20.6b.6 from the FPRs 20.8
and, if operand 2 is also from the FPRs, it is read and stored in a
temporary register because the bus structure limits the loading to one
operand at a time.
Cycle 2--Operand 2 is loaded from either the temporary register or from the
dbus to Y register 20.6b.9; concurrently, a 3 times multiple of operand 1
is calculated by the 3X hardware 20.6b.7 and stored in 3X register
20.6b.8. The XA register 20.6b.6 directly loads XB register 20.6b.8.
Cycle 3 and Cycle 4--These are the two cycles of actual execution of the
multiplier. These cycles, termed the M1 and M2 cycles, use the M1 hardware
20.6b.10 and the M2 hardware 20.6b.11. No registers separate the two
cycles of execution. Thus, XB and 3X registers 20.6b.8 and Y register
20.6b.9 must be held for these two cycles until the data is latched in P
register 20.6b.12.
Cycle 5--This cycle involves a write from P register 20.6b.12 to the FPRs
20.8. If the result is extended, there is a cycle 6, which is a second