|
Claims  |
|
|
What is claimed is:
1. A data processing system for processing a sequence of program
instructions, said system comprising
an instruction pipeline having a plurality of serially operating
instruction stages for reading instructions from storage and for forming
therefrom plural address data for use during execution of said
instructions,
an execution pipeline connected to said instruction pipeline and having a
plurality of serially operating execution stages for receiving said
address data and for employing said address data formed by said
instruction pipeline for referencing stored data to be employed for
executing said instructions,
a pipeline control unit connected to said instruction pipeline and said
execution pipeline for operating said instruction pipeline and said
execution pipeline,
means coupled to said instruction pipeline and said execution pipeline for
predicting collisions between read data read from a register in the
instruction pipeline phase of operation in response to a sequentially
later program instruction and write data written in registers during the
execution phase of operation in response to a sequentially earlier program
instruction, wherein the execution phase can include a plurality of
execution cycles each of which modifies values in zero, one, or more
registers, and wherein said first instruction requires one of said
modified values to continue valid operation, said prediction means
comprising
means for predicting a data destination address for said sequentially
earlier program instruction,
means for comparing the predicted destination address with an actual data
read address of said sequentially later program instruction and for
generating a predicted collision signal when the two addresses match, and
said pipeline control unit having means connected with said comparing means
and responsive to said predicted collision signal for inserting a delay
between said sequentially later program instruction and said sequentially
earlier program instruction for forming the write data address before the
read address data is used to access data.
2. The data processing system of claim 1 wherein said data destination
address predicting means comprises
means for examining selected bits of an instruction for deriving said
predicted destination address for the instruction.
3. The apparatus of claim 2 further wherein said microcode instructions
provide for storing any register update on a final multi-instruction
microinstruction in a selected storage register.
4. A data processing system for processing a sequence of program
instructions comprising
an instruction pipeline having a plurality of serially operating
instruction stages for reading instructions from storage and for forming
therefrom plural address data for use during execution of said
instructions,
an execution pipeline connected to the instruction pipeline having a
plurality of serially operating execution stages for receiving said
address data and for employing said address data formed by said
instruction pipeline for referencing stored data for use during execution
of said instructions,
a pipeline control unit connected to the instruction pipeline and the
execution pipeline for operating said instruction pipeline and said
execution pipeline,
detecting means coupled to said instruction pipeline and to said execution
pipeline for detecting collisions between data read from a register in the
instruction pipeline phase of operation in response to a first instruction
and write data written in registers during the execution phase of
operation in response to an earlier instruction, wherein said execution
phase can include a plurality of execution cycles each of which modifies
values in zero, one, or more registers and wherein said first instruction
requires one of said modified values to continue valid operation,
said detecting means comprising,
means for storing said modified values generated during the execution phase
and write register addresses associated therewith,
means for comparing the associated write register address of each modified
value with a read register address read by the instruction pipeline, and
means connected with the comparing means and responsive to a match between
said compared register addresses for directing the modified value being
written at said register address to replace in at least one of said
instruction and execution pipelines data previously designated for use by
said first instruction during said instruction phase of operation.
5. The data processing system of claim 4 wherein said directing means
further comprises
a selector coupled to said comparing means and responsive to the comparison
of said addresses for selecting either of the data read during the
instruction phase or said modified value, and
a storage register coupled to said execution pipeline for storing the last
written modified value generated during said execution phase of operation.
6. The data processing system of claim 5 further comprising
a microcode storage element, and
means coupled to said storage element for reading microinstructions from
said storage element for controlling operation of said detecting means.
7. The data processing system of claim 4 wherein said directing means
further comprises
means for detecting collisions between a portion of the modified register
and a corresponding portion of the read data, and
said directing means replaces only said portion of the read data with the
portion of the modified data value.
8. The data processing system of claim 7 wherein said register portion is
one-half of the read data.
9. The data processing system of claim 4 further wherein said detecting
means comprises
means for generating a pipeline collision signal in response to a detected
collision, and
means for generating an execution stage signal indicating that generation
of the modified data has not been completed, and
wherein said pipeline control unit comprises
means coupled to the detecting means and responsive to said pipeline
collision signal and said execution stage signal for inhibiting operation
of all pipeline stages other than those required to generate said modified
data until said modified data has been generated.
10. The data processing system of claim 4 wherein said directing means
includes
means coupled to a plurality of pipeline stages for directing said modified
data to said plurality of pipeline stages, each said stage then using said
modified data in place of said originally read data.
11. In a method of operating a data processing system for processing a
sequence of program instructions, the data processing system having an
instruction pipeline having a plurality of serially operating instruction
stages for reading instructions from storage and for forming therefrom
plural address data for use during execution of the instructions, and an
execution pipeline connected to the instruction pipeline and having a
plurality of serially operating execution stages for receiving the address
data and for employing the address data formed by the instruction pipeline
for referencing stored data to be employed for executing the instructions,
the improvement comprising
a collision detection method comprising the steps of
detecting collisions between read data read from a register during the
instruction pipeline phase of operation in response to a sequentially
later instruction and write data written in registers during the execution
phase of operation in response to a sequentially earlier instruction,
wherein the execution phase can include a plurality of execution cycles
each of which modifies values in zero, one, or more registers, and wherein
the first instruction requires one of the modified values to continue
valid operation,
said detection step further comprising the steps of
storing the modifed values generated during the execution phase and write
register addresses associated therewith,
comparing the associated write register address of each modified value with
a read register address read by the instruction pipeline, and
responding to a match between said compared addresses for selecting the
modified value being written into the matching register address to replace
in at least one of said instruction and execution pipelines data
previously designated for use by said first instruction during the
instruction phase of operation.
12. The collision detection method of claim 11 further comprising the steps
of
generating a pipeline collision signal in response to a detected collision,
generating an execution stage signal indicating that generation of the
modified data has not been completed, and
inhibiting operation of all pipeline stages other than those required to
generate said modified data until said modified data generation has been
completed.
13. The collision detection method of claim 11 further comprising the step
of
directing the modified data to a plurality of pipeline stages, each stage
using the modified data in place cf the originally read data during the
instruction phase of operation.
14. In a method for operating a data processing system for processing a
sequence of program instructions, the data processing system having an
instruction pipeline having a plurality of serially operating instruction
stages for reading instructions from storage, and for forming therefrom
plural address data for use during execution of the instructions, and an
execution pipeline, connected to the instruction pipeline, and having a
plurality of serially operating execution stages for receiving the address
data and for employing the address data formed by the instruction pipeline
for referencing stored data to be employed for execution said
instructions, the improvement comprising
a collision prediction method comprising the steps of
predicting the data destination address written by the sequentially earlier
of two instructions which may potentially collide,
comparing the predicted destination address with an actual data read
address of a sequentially later occurring instruction, and
delaying processing of said later instruction for a time for forming the
write data address of said sequentially earlier instruction, and
comparing said formed write data address with data read address thereby
determining whether a collision between said two instructions.
15. The collision prediction method of claim 14 further comprising the step
of
examining selected bits of an instruction for deriving said predicted
destination address for the instruction.
16. The method of claim 15 further comprising the step of
storing any register update on a final multi-instruction microinstruction
in a selected storage register. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
BACKGROUND OF THE INVENTION
The present invention relates to the field of digital computers and, in
particular, to apparatus and methods for processing instructions in high
speed data processing systems.
Data processing systems generally include a central processor, an
associated storage system (or main memory), and peripheral devices and
associated interfaces. Typically, the main memory consists of relatively
low cost, high-capacity digital storage devices. The peripheral devices
may be, for example, non-volatile semi-permanent storage media, such as
magnetic disks and magnetic tape drives. In order to carry out tasks, the
central processor of such systems executes a succession of instructions
which operate on data. The succession of instructions and the data those
instructions reference are referred to as a program.
In operation of such systems, programs are initially brought to an
intermediate storage area, usually in the main memory. The central
processor may then interface directly to the main memory to execute the
stored program. However, this procedure places limitations on performance
due principally to the relatively long times required in accessing that
main memory. To overcome these limitations a high speed (i.e. relatively
fast access) storage system, in some cases called a cache, is used for
holding currently used portions of programs within the central processor
itself. The cache interfaces with main memory through memory control
hardware which handles program transfers between the central processor,
main memory and the peripheral device interfaces.
One form of computer, typically a "mainframe" computer has been developed
in the prior art to concurrently hardware process a succession of
instructions in a so-called "pipeline" processor. In such pipeline
processors each instruction is executed in part at each of a succession of
stages. After the instruction has been processed at each of the stages,
the execution is complete. With this configuration, as an instruction is
passed from one stage to the next, that instruction is replaced by the
next instruction in the program. Thus, the stages together form a
"pipeline" which, at any given time, is executing, in part, a succession
of instructions. Such instruction pipelines for processing a plurality of
instructions in parallel are found in several mainframe computers. These
processors consist of single pipelines of varying length and employ
hardwired logic for all data manipulation. The large quantity of control
logic in such machines makes them extremely fast, but also very expensive.
Another form of computer system, typically a "minicomputer," incorporates
microcode control of instruction execution. Generally, under microcode
control, each instruction is fully executed before execution of the next
instruction begins. Microcode-controlled execution does not provide as
high performance (principally in terms of speed) as hardwired control, but
the microcode control does permit significant cost advantages compared to
hardwired systems. As a result, microcode control of instruction execution
has been employed in many cost-sensitive machines. Microcode reduces the
total quantity of hardware in the processor and also allows much more
flexibility in terms of adapting to changes which may be required during
system operation. Unfortunately, the conventional pipeline techniques for
instruction execution are not compatible with the multiple steps which
must be performed to execute some instructions in a microcode-controlled
environment.
Accordingly, it is an object of the present invention to provide an
improved computer system.
Another object is to provide performance characteristics heretofore
associated only with mainframes while maintaining a cost profile
consistent with the minicomputers.
It is yet another object to provide a computer system incorporating
pipelined instruction processing and microcode-controlled instruction
execution.
SUMMARY OF THE INVENTION
The invention relates to a data processing system and pipeline control
method for processing a sequence of program instructions in a computer.
The data processing system has an instruction pipeline having a plurality
of serially operating instruction stages for reading instructions from
storage and for forming therefrom plural address data to be employed
during execution of the program instructions. The data processing system
further has an execution pipeline having a plurality of serially operating
execution stages for receiving the address data and for employing that
data, formed by the instruction pipeline for referencing stored data to be
employed for executing the program instructions.
The data processing system further features circuitry for detecting
collisions between pipeline stages and for halting operation of one or
more of the pipeline stages to provide a separation between the colliding
instructions. Similarly, where a latter stage of the pipeline alters the
data to be used by an earlier stage, an appropriate "flushing" of the
pipeline can be accomplished should a previously employed address be
incorrect, or the instruction pipeline can be halted until the data is
properly available from the execution pipeline (registered by-pass).
In another aspect, the data processing system relates to an instruction
pipeline and an execution pipeline each having a plurality of serially
operating instruction stages and a pipeline control unit for operating the
instruction and execution pipelines. In this aspect of the invention there
is featured circuitry for detecting collisions between read data from a
register in the instruction pipeline phase of operation, read in response
to a first program instruction, and write data written in registers during
the execution phase of operation in response to an earlier instruction.
During the execution phase, a plurality of execution cycles, during each
of which a register can be modified, occur, and the first instruction will
require one of the modified values to continue valid operation. The
detection circuitry features storage for the modified values generated
during the execution phase and storage for the write register addresses
associated therewith. Circuitry is provided for comparing the associated
write register address for each modified value with the read register
address employed by the instruction pipeline. When a match is found,
directing circuitry replaces the modified value, to be written at the
matched address for the data previously designated to be used during the
associated instruction phase of operation.
In selected embodiments of this aspect of the invention, only a portion of
the originally read data may be modified and replaced. In other aspects of
the invention, operation of all pipeline stages, except those required to
generate modified data, can take place so that the modified data can be
generated prior to a continuation of operation along the instruction, and
portions of the execution, pipelines.
The invention also features circuitry for detecting potential collisions
between data read from a register in the instruction pipeline phase of
operation and data which may be written at a later time. According to this
aspect of the invention, the instructions themselves are examined and
potential collisions predicted. In response to the prediction of a
potential conflict, the pipeline control unit provides delay between the
potentially colliding instructions so that if a collision takes place, it
can be handled in due course by the system structure described above.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects of this invention, the various features
thereof, as well as the invention itself, may be more fully understood
from the following description, when read together with the accompanying
drawings in which:
FIG. 1 shows, in block diagram form, an exemplary computer system embodying
the present invention.
FIG. 1A depicts, in block diagram form, the instruction processor,
including the two three-stage pipelines, showing overlap and flow between
stages, and the pipeline control unit, of the central processor of the
system of FIG. 1;
FIG. 2 depicts the five hardware units that form the instruction processor
of FIG. 2, showing major data paths for the processing of instructions;
FIG. 3 shows, in block diagram form, the pipeline control unit of FIG. 2;
FIG. 3A shows, in block diagram form, the decode logic for the pipeline
control unit of FIG. 4;
FIG. 4 shows, in detailed block diagram form, the pipelines of FIG. 1;
FIG. 5 depicts the flow of instructions through the two pipelines, with
examples of alteration to normal processing flow; FIG. 6 illustrates the
clock generation of the ID stage of the IP pipelines of FIG. 1A;
FIG. 7 depicts a block diagram of the Shared Program Cache of FIG. 1A;
FIG. 8 depicts a block diagram of the Instruction Pre-Processor of FIG. 1A;
FIG. 9 depicts a block diagram of the Micro-Control Store of FIG. 1A; and
FIG. 10 depicts a combined block diagram of the two Execution units of FIG.
1A;
FIG. 11 shows, in block diagram form, the branch cache of the system of
FIG. 4; and
FIG. 12 shows, in block diagram form, the register bypass network of the
Instruction Pre-Processor of FIG. 8.
DESCRIPTION OF THE PREFERRED EMBODIMENT
FIG. 1 shows a computer system embodying the present invention. The system
includes a central processor, main memory, peripheral interface and
exemplary peripheral devices.
This system of FIG. 1 processes computer data instructions in the central
processor which includes instruction pre-processing hardware, local
program storage, micro-control store, and execution hardware. The central
processor includes two independent pipelines; the Instruction Pipeline
(IP) and the Execution Pipeline (EP). In the preferred form, each pipeline
is three stages in length (where the processing time associated with each
stage is nominally the same), with the last stage of the IP being
overlapped with the first stage of the EP. With this configuration, an
instruction requires a minimum of five stage times for completion. All
control for advancing instructions through all required stages originates
from a Pipeline Control Unit (PCU) in the central processor. The PCU
controls the stages to be clocked dynamically, based on pipeline status
information gathered from all stages.
This form of the invention processes instructions defined in the System
Architecture Reference Guide, 2d Ed. (PRC3060-182) Revision 18,2,
published by Prime Computer, Inc., Natick, Mass., and supports the machine
architecture, which includes a plurality of addressing modes, defined in
the Reference Guide. In keeping with this architecture, words are 16 bits
in length, and double words are 32 bits in length. This form of the
invention is optimized to perform address formations including BR+X+D,
BR+GRH+D and RP+X+D, where BR (Base Register) is a 32-bit starting address
pointer, X (Index) is a 16-bit register, GRH (high side of General
Register) is a 16-bit quantity, D (the displacement) is contained
explicitly in the instruction and may be either 9 or 16 bits, and RP is
the current value of the program counter.
PRINCIPLES OF PIPELINE OPERATION
Pipeline Stage
FIG. 1A shows, in functional block diagram form, two three-stage pipelines,
an Instruction Pipeline (IP) and an Execution Pipeline (EP), together with
the pipeline control unit (PCU) in the central processor. The Instruction
Pipeline includes an Instruction Fetch (IF) stage 2, an Instruction Decode
(ID) stage 3, and an Address Generation (AG) stage 4. The Execution
Pipeline (EP) includes a Control Formation (CF) stage 5, an Operand
Execute (OE) stage 6, and an Execute Store (ES) stage 7. The PCU 1 is
depicted in detailed block diagram form in FIGS. 3 and 3A and the IF, ID,
AG, CF, OE and ES stages are depicted in detailed block diagram form in
FIG. 4.
FIG. 2 shows an embodiment of the IP, EP and PCU of FIG. 1A in terms of
five hardware units: Instruction Pre-Processor (IPP) 8, Shared Program
Cache (SPC) 9, Execution-1 board (EX1) 10, Execution-2 board (EX2) 11, and
Mirco-Control Store (MCS) 12. The hardware units of FIG. 2 are
representative of groupings of the various elements of the IP and EP of
FIG. 4. The respective hardware units are shown in detailed form in FIGS.
7-10. In alternative embodiments, other groupings of the various elements
of the IP and EP may be used.
Briefly, in the illustrated grouping of FIG. 2, the Shared Program Cache 9
contains local storage and provides instructions by way of bus 13 to the
Instruction Pre-Processor 8, and provides memory operands by way of bus 14
to the Execution-1 board 10. The IPP 8 supplies memory operand addresses
by way of bus 15 to the SPC 9, register operands and immediate data by way
of bus 17 to EX1 10, and control decode addresses by way of bus 19 to the
Micro-Control Store 12. EXl 10 operates on memory operands received by way
of bus 14 from the SPC 9 and register file operands received by way of bus
16 from the Execution-2 board 11, and transfers partial results by way of
bus 18 to EX2 11 for post-processing and storage. EX2 11 also performs
multiplication operations. The MCS 12 provides microprogrammed algorithmic
control for the four blocks 8-11, while the PCU 1 provides pipeline stage
manipulation for all blocks 8-12.
The pipeline stage operations are completed within the various hardware
units 8-12 as follows:
IF (Instruction Fetch): A Look-ahead program counter on SPC 9 is loaded
into a local general address register; instruction(s) are accessed from a
high speed local memory (cache).
ID (Instruction Decode): Instruction data is transferred from SPC to IPP 8;
IPP 8 decodes instructions, forming micro-control store entry point
information for MCS 12, and accessing registers for address generation in
IPP 8.
AG (Address Generation): IPP 8 forms instruction operand address and
transfers value to SPC 9 address register.
CF (Control Formation): MCS 12 accesses local control store word and
distributes control information to all boards.
OE (Operand Execute): SPC 9 accesses memory data operands in cache; EX1 10
receives memory data operands from SPC 9, register operands from IPP 8,
and begins arithmetic operations.
ES (Execute Store): EX1 10 and EX2 11 complete arithmetic operation and
store results.
The Address Generation and Control Formation stages are overlapped in time
within the data system. The IP and EP operate synchronously under the
supervision of the pipeline control unit (PCU) 1, which interfaces to each
stage with two enable lines (ENCxxl and ENCxx2) that provide two distinct
clock phases within each stage, as indicated in FIG. 1A. The notation "xx"
refers to a respective one of the reference designations IF, ID, AG, CF,
OE and ES. The six ENCxx2 lines denote the respective stage operations are
complete and the data (or control) processed in those stages are ready for
passing to the next stage.
Clocking of Pipeline Stages
Timing and clocking in the dual pipelines (IP and EP) are synchronized by
two signals--the master clock MCLK and the enable-end-of-phase signal
ENEOP. ENEOP is produced by the Pipeline Control Unit 1 and notifies all
boards of the proper time to examine the stage clock enable signal lines
(ENCxx1 and ENCxx2) in order to produce phase 1 and phase 2 stage clocks
from the master clock MCLK. (See FIG. 6). Pipeline stages always consist
of two phases. Phase 1 lasts for exactly two MCLK pulses while phase 2 can
last for an arbitrary number of MCLK pulses, as described below, depending
on the conditions present in both the IP and the EP.
An example of how MCLK and ENEOP and the stage clock enables interact on
each board to form the clocks which define the stage boundaries is shown
in FIG. 6 for the Instruction Decode stage 2. Register 22 generates clock
signals when enabled by ENEOP. When ENCID1 is present the clock CID1 is
generated; when ENCID2 is present, the clock CID2 is generated.
PIPELINE CONTROL UNIT
The Pipeline Control Unit 1 shown in FIGS. 3 and 3A controls the flow of
instructions through the dual pipelines (IP and EP) by generating the
enable signals for all clocks which define stage boundaries and relative
overlap of the IP and EP. The PCU 1 includes stage clock enable decode
logic 23 and the Pipeline State Register (PSR) 24. PCU 1 receives as
inputs:
1. Instruction information and exception and register conditions from the
IPP 8
2. Exception and cache conditions from the SPC 9
3. Microcode specified timing conditions related to the length of stage OE
and the overlap of stage OE and CF from the MCS 12
4. Exception conditions from EX1 10 and EX2 11.
The PCU 1 has complete control of all stage boundaries. With that control:
1. The PCU 1 can hold the IP while cycling multi-microcode through the EP.
2. The PCU 1 can alter the flow of instructions based on control
information provided by microcode.
3. The PCU 1 can extend all stages if extra time is required for a
particular stage to finish its operation.
4. The PCU 1 can alter the relative overlap of stages OE and CF of the EP
in order to allow different types of microcode sequencing (as described
below in conjunction with EX1,2).
5. The PCU 1 can flush out instructions in the IP and recycle the IP to
load new instructions upon detecting incorrect flow (such as an incorrect
flow prediction provided by Branch Cache 34).
6. The PCU 1 can idle the EP with no-operation (NOP) cycles, while cycling
the IP, for example, when IRP 27,33 in the SPC 9 is reloaded after an
incorrect program flow sequence.
7. The PCU 1 can suspend all pipeline operations during non-overlappable
operations such as "cache miss" access to main memory.
8. The PCU 1 can introduce separation between sequential instructions in
the IP under certain conditions, such as "collisions" between
instructions.
9. The PCU 1 can keep an instruction held in the IF stage upon detecting an
instruction-related exception, and then allow the other instructions
currently in the pipeline to complete processing so that the exception can
be processed in the correct order.
The Pipeline Control Unit (PCU) 1 which controls the clocking of the stages
in the IP and EP is shown in detail in FIG. 3A. Condition signals received
from the IPP 8, SPC 9, MSC 12, EX1 10, and EX2 11 hardware units are
utilized to produce enable signals for clocks in the IF 2, ID 3, AG 4, CF
5, OE 6, and ES 7 stages of the dual pipelines (IP and EP). There are two
major elements in PCU 1 which produce the clock enable signals ENCxx1,2:
the pipeline state register (PSR) 24 (including state registers
180,182,184,186,188,190) and the stage clock enable decode logic 23
(including conbinatorial logic blocks (181,183,185,187,189,191). The state
registers 180,182,184,186,188,190 indicate that the respective pipeline
stages are ready to be enabled. if there are no conditions received by the
PCU 1 which should inhibit the stage from proceeding. When the stages are
in operation, the state registers 180,182,184,186,188,190 provide a timing
r | | |