WikiPatents - Community Patent Review
Dynamic multiple instruction stream multiple data multiple pipeline apparatus for floating-point single instruction stream single data architectures    
United States Patent 4916652
Link to this page: http://www.wikipatents.com/4916652.html
Inventor(s): Schwarz; Eric M. (Endicott, NY); Vassiliadis; Stamatis (Vestal, NY)
Abstract: A dynamic multiple instruction stream, multiple data, multiple pipeline (MIMD) apparatus simultaneously executes more than one instruction associated with a multiple number of instruction streams utilizing multiple data associated with the multiple number of instruction streams in a multiple number of pipeline processors. Since instructions associated with a multiple number of instruction streams are being executed simultaneously by a multiple number of pipeline processors, a tracking mechanism is needed for keeping track of the pipe in which each instruction is executing. As a result, a dynamic history table maintains a record of the pipeline processor number in which each incoming instruction is executing, and other characteristics of the instruction. When a particular instruction is received, it is decoded and its type is determined. Each pipeline processor handles a certain category of instructions; the particular instruction is transmitted to the pipeline processor having its corresponding category. However, before transmission, the pipeline processor is checked for completion of its oldest instruction by consulting the dynamic history table. If the table indicates that the oldest instruction in the pipeline processor should complete, execution of the oldest instruction in such processor completes, leaving room for insertion of the particular instruction therein for execution. When the particular instruction is transmitted to its associated pipeline processor, information including the pipe number is stored in the dynamic history table for future reference.



Owner/Assignee     International Business Machines Corporation (Armonk, NY)
Publication Date     April 10, 1990
Application Number     07/102,985
Filing Date     September 30, 1987
US Classification     708/510 708/508 712/208 712/214 712/218
Int'l Classification     G06F 007/38
Examiner     Laroche; Eugene R.
Assistant Examiner     Ham; Seung
Attorney/Law Firm     Romney; David S., Bouchard; John H.
USPTO Field of Search     364/200 364/736 364/748 364/900 364/231.8
Patent Tags     dynamic multiple instruction stream multiple data multiple pipeline floating-point single instruction stream single data architectures
   

References

U.S. References
Patent No.   Inventor   U.S. Class   Date
4594655      Hao        712/218      Jun. 1986
4390946      Lane       712/239      Jun. 1983
3840861      Amdahl     713/501      Oct. 1974
Claims
 


We claim:

1. A pipelined processing apparatus having a plurality of pipelined processors accommodating different types of instructions, comprising:

receiving means for receiving incoming instructions, which receiving means includes decode means for decoding each of the incoming instructions thereby identifying a category type for said each of the incoming instructions;

a plurality of pipelined processors each connected to said receiving means for temporarily holding at least some of the incoming instructions for execution, each pipelined processor being identified by a unique number;

switch means connected to said receiving means and coupled to said plurality of pipelined processors, said switch means changing to a standard single pipelining mode for a first category of instructions identified by said decode means and changing to a dynamic multiple pipelining mode for a second category of instructions identified by said decode means, with each instruction in said second category of instructions having the unique number for the pipelined processor which accommodates that type of instruction;

table means connected to the plurality of pipelined processors and to the receiving means for recording the numbers of one or more pipelined processor means in which said at least some of the incoming instructions are temporarily held for execution, and for controlling the completion of instruction processing by the plurality of pipelined processors so that such completion will be in a predetermined sequence; and

wherein said receiving means determines if a particular one of the pipelined processors is available for use, said receiving means inserting an incoming instruction in said particular one of the pipelined processors for execution when said particular one of the pipelined processors becomes available for use.

2. The apparatus of claim 1 wherein said first category of instructions which are processed in a standard single pipelining mode includes instructions having a length outside a predetermined range of bits or instructions which are more difficult to execute such as divides and square roots.

3. The apparatus of claim 1 which further includes hazard circuit means interconnected between said receiving means, said table means and said plurality of pipelining processors, for detecting data dependent hazards as well as collision hazards and for generating control signals to either allow or alternatively prevent execution of an instruction by one of the pipelined processors.

4. The apparatus of claim 1, wherein said particular one of the pipelined processors is available for use when the particular pipelined processor is not full of instructions or when, if the particular pipelined processor is full of instructions, the number of the particular pipelined processor matches one of the numbers recorded in the table means and, if the numbers match, the oldest one of said instructions in said particular pipelined processor has been executed.

5. The apparatus of claim 1, wherein said receiving means records the number of said particular one of the pipelined processors in said table means when said receiving means inserts said particular incoming instruction in said particular one of the pipelined processors for execution.

6. The apparatus of claim 1, wherein said receiving means comprises stacking means for stacking at least one of the incoming instructions.

7. The apparatus of claim 6, wherein said receiving means comprises decode means connected to the stacking means for decoding the oldest one of the stacked incoming instructions thereby producing a decoded output signal.

8. The apparatus of claim 7, wherein said receiving means comprises handshake means responsive to the decoded output signal of said decode means for developing a valid instruction signal and an acknowledge signal if said oldest one of the stacked incoming instructions is valid.

9. The apparatus of claim 8, wherein said receiving means comprises selection means responsive to said decoded output signal, said acknowledge signal, and said valid instruction signal for identifying and selecting a particular one of the plurality of pipelined processor means in accordance with said decoded output signal, said valid instruction signal, and said acknowledge signal and for passing certain characteristics of said oldest one of the stacked incoming instructions to the selected particular one of the plurality of pipelined processor means when the particular pipelined processor means is selected.

10. The apparatus of claim 6, wherein said stacking means comprises first stacking means for stacking at least two of the incoming instructions and second stacking means for also stacking said at least two of the incoming instructions, said second stacking means storing the numbers of the pipelined processor means in which said at least two of the incoming instructions are held for execution.

11. The apparatus of claim 10, further comprising: data receiving means for receiving incoming data corresponding to said incoming instructions, said data receiving means including third stacking means for stacking the incoming data.

12. The apparatus of claim 11, wherein the instructions stacked in said second stacking means correspond, respectively, to the data stacked in said third stacking means whereby the oldest one of the instructions stacked in said second stacking means is executed in conjunction with corresponding data stacked in said third stacking means.

13. The apparatus of claim 12, wherein said oldest one of the instructions stacked in said second stacking means is also being held in one of said plurality of pipelined processor means, said corresponding data being transmitted to said one of said plurality of pipelined processor means for association with said oldest one of the instructions in accordance with the number of said one of said plurality of pipelined processor means stored in said second stacking means.

14. In an apparatus including an instruction receiving means for receiving instructions, a plurality of pipelined processor means, each pipelined processor means including internal pipe controls, a table means connected to the instruction receiving means and the plurality of pipelined processor means, and data receiving means for receiving data corresponding to the received instructions, a method of inserting a received instruction received via the instruction receiving means into one of the pipelined processor means and for associating a corresponding received data received via said data receiving means with said received instruction in said one of the pipelined processor means, comprising the steps of:

identifying said received instruction thereby determining the identity of said one of the pipelined processor means from among said plurality of pipelined processor means;

consulting said table means and the internal pipe controls associated with said one of the pipelined processor means to determine if said one of the pipelined processor means is ready for receipt of said received instruction and said corresponding received data;

stacking said instructions received via said instruction receiving means, including said received instruction, in an instruction stacking means, each instruction in each stack of the instruction stacking means including a number identifying one of the pipelined processors;

separately stacking said data received via said data receiving means, including said corresponding received data, in a data stacking means, there being a one-to-one correspondence between the stacked instructions and the stacked data; and

transmitting said received instruction and said corresponding received data from said instruction receiving means and said data stacking means to said one of the pipelined processor means identified in accordance with the number associated with said received instruction stored in said instruction stacking means when said one of the pipelined processor means is ready for receipt of said received instruction and said corresponding received data.

15. The method of claim 14, further comprising the step of storing in said table means the numbers of said plurality of pipelined processor means which previously received an instruction and its corresponding data, and wherein said consulting step includes searching said table means to determine if the number of said one of the pipelined processor means is recorded in said table means.

16. The method of claim 15, wherein said searching step includes determining if said number is located in a certain position in said table means indicating the oldest entry in said table means.

17. The method of claim 16, wherein after the searching step, if the number of said one of said plurality of pipelined processor means associated with said received instruction is recorded in said certain position in said table means, checking said one of said plurality of pipelined circuit means to determine if the oldest one of said previously received instructions has been fully executed to completion, and if so, allowing it to exit said one of said plurality of pipelined processor means, whereby if the oldest previously received instruction is processed to completion and is allowed to exit, the transmitting step to said one pipelined processor means can be performed for another of said received instructions and its corresponding data.
Description
 


BACKGROUND OF THE INVENTION

The subject matter of this invention relates to computing systems, and more particularly, to a multiple instruction stream, multiple data pipeline for use in a functional unit of such computing system, such as a floating point unit, which is designed to operate in conjunction with a single instruction stream, single data architecture.

Most computer processors utilize some form of pipelining. In a pipelined computer processor, more than one instruction of an instruction stream is being executed at the same time. However, each of the instructions being executed is disposed within a different stage of the pipe. The performance of a pipelined processor is necessarily better than the performance of a non-pipelined processor. There are different types of pipelining. One type is termed "single instruction stream single data (SISD)" pipelining. In the SISD type of pipelining, individual instructions are pipelined with at most a single data operation.

However, using the SISD pipelining approach, many "hazards" are encountered. Hazards are encountered upon entering the pipeline at the maximum possible new data rate. The hazards can be divided into two categories, namely, structural hazards and data dependent hazards. A structural hazard occurs when two pieces of data attempt to use the same hardware and thus collisions occur. Data dependent hazards may occur when the events transpiring in one stage of a pipeline determine whether or not data may pass through another stage of the pipeline. For example, in a pipeline having two stages, each stage requiring use of a single memory, when one stage is using the memory, the other stage must remain idle until the first stage is no longer using the memory.

Another type of pipeline approach is termed "multiple instruction stream, multiple data (MIMD)" pipelining. When the MIMD type of pipelining is being used, rather than pipe individual instructions, as in the SISD pipeline approach, instruction "streams" are piped. The MIMD pipeline approach does not encounter the hazards problem. However, although instruction streams are being piped in the MIMD approach, a first instruction stream must complete execution before a second instruction stream can commence execution. Thus, although the performance of the MIMD pipeline is better than the performance of the SISD pipeline, the performance of the MIMD pipeline is limited by the "one instruction stream at a time" execution philosophy.

SUMMARY OF THE INVENTION

Accordingly, it is a primary object of the present invention to introduce a novel type of pipeline for computer functional units, hereinafter termed a "dynamic MIMD pipeline".

It is another object of the present invention to introduce the dynamic MIMD pipeline which is not limited by the "one instruction stream at a time" execution philosophy.

It is another object of the present invention to introduce the dynamic MIMD pipeline capable of simultaneously executing a multiple number of instruction streams in a multiple number of pipelines thereby increasing substantially the performance of the functional unit embodying the dynamic MIMD pipeline.

In accordance with these and other objects of the present invention, a plurality of pipes are capable of piping, for execution thereof, a further plurality of instructions. Each pipe is capable of simultaneously storing, for execution, a plurality of instructions. Thus, the plurality of pipes are capable of simultaneously storing, for execution, the further plurality of instructions. The further plurality of instructions are chosen from a plurality of instruction streams which are executing simultaneously in the plurality of pipes. Since the instructions in a particular pipe may be in various stages of completion of execution, in order to keep an accurate record of the execution disposition of each instruction in the pipe, a dynamic history table stores information associated with each instruction disposed in each of the plurality of pipes, the information for each instruction including the pipe number in which the instruction is temporarily stored, and the status of completion of execution of the particular instruction. A handshakes and global hazards circuit determines the busy status of the functional unit, in which the dynamic MIMD pipe is embodied, and responds to other functional units in the computer system, such as the central processing unit (CPU). It also determines if any hazards exist. If the functional unit is not busy and no hazards exist, the next instruction from one of the plurality of instruction streams enters the next available pipe. An MIMD/SISD switch circuit determines if an incoming instruction is greater than "X" bits long (e.g., 64), and if so, the switch switches the dynamic MIMD pipeline of the present invention to the standard SISD mode and executes the incoming instruction in the "one instruction stream at a time" execution philosophy mode. SISD is also invoked for "difficult" instructions, which are considered to be divides and square roots.

Further scope of applicability of the present invention will become apparent from the detailed description presented hereinafter. It should be understood, however, that the detailed description and the specific examples, while representing a preferred embodiment of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become obvious to one skilled in the art from a reading of the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

A full understanding of the present invention will be obtained from the detailed description of the preferred embodiment presented hereinbelow, and the accompanying drawings, which are given by way of illustration only and are not intended to be limitative of the present invention, and wherein:

FIG. 1 illustrates a block diagram of a prior art standard MIMD architecture pipeline;

FIG. 2 illustrates the dynamic MIMD/SISD pipeline floating point unit 20 of the present invention;

FIG. 3 illustrates the instruction stack 20.1 of FIG. 2 and includes FIGS. 3a-3c with suitable legends in tabular format;

FIG. 4 illustrates with appropriate legends the dynamic history table 20.7 of FIG. 2;

FIG. 5 illustrates the pipe1 (ADD) internal control registers 20.6a of FIG. 2 and includes with appropriate legends FIGS. 5a-5b;

FIG. 6 illustrates the pipe2 (MULT) internal control registers 20.6b of FIG. 2 and includes with appropriate legends FIGS. 6a-6b;

FIG. 7 illustrates the pipe3 (LOAD RX) 20.6c of FIG. 2;

FIG. 8 illustrates the pipe4 (MISCELLANEOUS) 20.6d of FIG. 2;

FIG. 9 illustrates the dbus stack 20.9 and the dbus stack controls 20.10 of FIG. 2 and includes with appropriate legends FIGS. 9a-9b;

FIG. 10 illustrates a construction of the handshakes and global hazards circuit 20.3 of FIG. 2;

FIG. 11 illustrates a construction of the initialization circuit 20.5 of FIG. 2; and

FIG. 12 illustrates an example instruction stream.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

By way of background information, the dynamic MIMD pipeline of the present invention is incorporated into a functional unit of a computer system. Such a functional unit may be a floating point unit (FPU). In addition to the FPU, the computer system also includes a cache, a central processing unit (CPU), and a vector processor (VP). The FPU receives data directly from the cache, the CPU, or the VP, and receives instructions from the CPU. The CPU does not control the data coming from the cache. The CPU requests data (from the cache) while sending instructions to the FPU. While the data is being accessed by the CPU from the cache, the CPU continues to send instructions to the FPU without regard to synchronization of the cycle by which data is being accessed from the cache with the cycle by which the corresponding instructions are being sent to the FPU. Therefore, the data arriving at the FPU at cycle N may be data pertaining to an instruction delivered to the FPU in cycle M, where M ≤ N. The CPU requests that certain operations be performed by the FPU and other units, such as the cache, via a bus called the CBUS. The CBUS is the only means by which instructions are communicated between the CPU and the FPU. The CBUS conducts handshake control signals and instruction opcodes. When the CPU transmits these requests, the functional units to which these requests are sent are called Processor Bus Units (PBU). The FPU comprises one of the PBUs. When the CPU encounters an instruction which it cannot execute, and a PBU should execute the instruction, the CPU transmits a Processor Bus Operation (PBO) signal to the appropriate PBU. For instance, if the CPU decoded an instruction to be a multiply floating point long, since it is much easier for the FPU to perform this operation, the CPU transmits the PBO signal to the FPU requiring the FPU to perform the multiply floating point long instruction.
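The decoupled arrival of instructions and data described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that every instruction needs exactly one datum and that data arrive in the same order as the instructions they belong to; the function name `pair_in_order` and the event tuples are hypothetical and do not come from the patent.

```python
from collections import deque


def pair_in_order(events):
    """Pair each instruction with its datum, oldest first.

    `events` is an iterable of ("instr", name) or ("data", name) tuples in
    arrival order. Assumption: one datum per instruction, delivered in order.
    """
    instrs, data, pairs = deque(), deque(), []
    for kind, value in events:
        # Queue the arrival on the instruction side or the data side.
        (instrs if kind == "instr" else data).append(value)
        # Whenever both queues are non-empty, the oldest entries belong together.
        while instrs and data:
            pairs.append((instrs.popleft(), data.popleft()))
    return pairs


# Data D1 arrives two cycles after instruction I1 (cycle M <= cycle N).
arrivals = [("instr", "I1"), ("instr", "I2"), ("data", "D1"),
            ("instr", "I3"), ("data", "D2")]
pairs = pair_in_order(arrivals)
```

In this run `I3` stays queued because its datum has not yet arrived, which is the situation the stacks in the FPU have to tolerate.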

The FPU comprises two main parts: a first section in which data actually flows, and a second section into which instructions are introduced and subsequently converted into control signals. This specification describes the second section.

Referring to FIG. 1, a prior art MIMD pipeline system is illustrated.

In FIG. 1, a storage 10 stores a plurality of instruction streams, and in particular, the state of each such stream. An initialization control 12 is connected to the output of the storage 10, and a pipeline circuit 14 is connected to the output of the initialization control 12, the pipeline circuit 14 having no hazards detection circuit. The output of the pipeline 14 is connected to the storage 10.

In operation, an instruction stream, stored in storage 10 of FIG. 1, is transmitted to the initialization control 12. The initialization control 12, in response thereto, transmits each instruction of the instruction stream, one at a time, to the pipeline circuit 14. The instructions are piped within pipeline circuit 14 and executed, one at a time. In response, updated instructions are transmitted from the pipeline circuit 14 for storage in storage 10. When the last instruction of the instruction stream is transmitted to the pipeline circuit 14 from the initialization control 12, piped and executed therein, the last updated instruction of the original instruction stream is transmitted to storage 10. At this point, another instruction stream is transmitted from the storage 10 to the initialization control 12 for execution thereof in the pipeline circuit 14. It is evident that, in the configuration of FIG. 1, the original instruction stream must complete piping and execution within the pipeline circuit 14 before the next instruction stream may be transmitted from storage 10 to the initialization control 12 for piping and execution within the pipeline circuit 14. This is the limitation and disadvantage associated with the standard MIMD pipeline approach.
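The cost of the "one instruction stream at a time" limitation can be estimated with a small Python sketch. It uses the textbook model of an ideal pipeline, in which n instructions take n + d - 1 cycles to pass through d stages; this model, the function names, and the example numbers are assumptions for illustration and are not taken from the patent.

```python
def serial_stream_cycles(stream_lengths, pipe_depth):
    """Cycles needed when each stream must drain before the next one enters.

    Assumes an ideal pipeline: n instructions take n + pipe_depth - 1 cycles.
    """
    return sum(n + pipe_depth - 1 for n in stream_lengths if n > 0)


def back_to_back_cycles(stream_lengths, pipe_depth):
    """Hypothetical lower bound if streams could enter without draining."""
    total = sum(stream_lengths)
    return total + pipe_depth - 1 if total > 0 else 0


# Two streams of four instructions each in a three-stage pipe.
serial = serial_stream_cycles([4, 4], pipe_depth=3)
ideal = back_to_back_cycles([4, 4], pipe_depth=3)
```

Under these assumptions the serial arrangement pays the fill and drain overhead once per stream rather than once overall.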

Referring to FIG. 2, a dynamic MIMD pipeline 20, according to the present invention, is illustrated.

In FIG. 2, the CBUS is connected to an instruction stack 20.1. An output of the instruction stack 20.1 is connected to a decode circuit 20.2. The decode circuit 20.2 output is connected to a handshakes and global hazards circuit 20.3, a MIMD/SISD switch 20.4, and an initialization circuit 20.5. The outputs of the handshakes and global hazards circuit 20.3 and the MIMD/SISD switch 20.4 are connected to the inputs of the initialization circuit 20.5. The initialization circuit 20.5 output is connected to a dynamic history table 20.7, to pipeline circuits 20.6, and to floating point registers (FPR) 20.8. The handshakes and global hazards circuit 20.3 output is also connected to an exception handler circuit 20.11, the output of which is further connected to the dynamic history table 20.7. Pipeline circuits 20.6 are also connected to the dynamic history table 20.7 and to the exception handler circuit 20.11, and produce an output which is conducted on a bus called the DBUS, which is connected to the data cache, and conducted to the FPR 20.8, which is a local architecturally defined storage. The output of the dynamic history table 20.7 is used to control the gating of the output to the DBUS and to the FPR 20.8. The CBUS, in addition to being input to the instruction stack 20.1, is also input to a DBUS stack controls 20.10 circuit. The output of the DBUS stack controls 20.10 circuit is connected to a DBUS stack 20.9, the input of which is connected to the DBUS. The output of the DBUS stack 20.9 and the output of the FPRs 20.8 generate the data which begins the data flow.

The dynamic MIMD pipeline of the present invention, illustrated in FIG. 2, may be subdivided into two paths: one for instructions and controls (CBUS path), and the other for data flow (DBUS path). The instructions are received via the CBUS and put in the instruction stack 20.1, and then decoded via decoder 20.2. The data is introduced into the dynamic MIMD pipeline of FIG. 2 via the DBUS.

The handshakes and global hazards circuit 20.3 of FIG. 2 transmits "handshake" signals to the CPU and detects global hazards. A further construction of the handshakes and global hazards circuit 20.3 may be found in FIG. 10 of the drawings. A more detailed description of the handshakes and global hazards circuit 20.3 of FIG. 10 will be set forth below in one of the following paragraphs of this specification. The CBUS contains a set of handshake signals to be transmitted between the CPU and each PBU, including the FPU. When the FPU receives a request via the CBUS, the handshakes and global hazards circuit 20.3 of the FPU is required to send an acknowledge signal, a busy signal, or an interrupt signal back to the CPU if the CBUS request was sent from the CPU and the FPU is the only PBU involved in the request. The acknowledge handshake signal is sent from the FPU to the CPU if a CBUS request is sent to the FPU and the FPU is not busy. The interrupt signal is sent from the FPU to the CPU if a data exception is encountered and critical information is stored in the status word. The busy handshake signal is sent from the FPU to the CPU if the FPU cannot accept another instruction for execution. The handshake signals (acknowledge, busy, interrupt) are sent to the CPU from the handshakes and global hazards circuit 20.3 of the FPU. Global hazards are detected in the handshakes and global hazards circuit 20.3 of the FPU and a signal is transmitted therefrom, for transmission to the initialization circuit 20.5, representative of the existence of such hazards. The handshake logic 20.3 (in connection with the initialization circuit 20.5) delivers the appropriate responses of the FPU to the other processor bus units (PBU). It also helps to detect the beginning and the end of an instruction stream. The global hazards circuit 20.3 detects the existence of hazards due to data dependencies of instructions on other executing instructions (data interlock).

Depending upon the incoming instruction being decoded by decoder 20.2, the MIMD/SISD switch 20.4 switches to either SISD mode or MIMD mode. If an incoming instruction involves operands more than 64 bits in length or if an instruction is determined to be difficult to execute, the MIMD/SISD switch 20.4 selects the SISD mode; otherwise, it selects the MIMD mode.

The specific instructions which are considered to be "difficult" and which invoke SISD mode are:

The Divides, both floating and fixed point

The Square Roots

Operations which involve extended operands

During execution in the SISD mode, everything is shut down except for the execution of the difficult instruction; this is accomplished by holding the BUSY signal, generated from the handshakes circuit 20.3 of the FPU to the CPU, active. This stops the CPU from sending any more requests to the FPU via the CBUS. The following instructions or any of their combinations will cause MIMD/SISD switch 20.4 to switch the pipeline mode to the MIMD mode:

FLOATING POINT OPERATIONS

ADDs

COMPAREs

HALVE

LOADs

MULTIPLYs

STOREs

SUBTRACTs

FIXED POINT OPERATIONS--microcode

MULTIPLY

OTHER OPERATIONS--microcode

LOADs

STOREs

STATUS WORD

INDIRECT MODE

RETRY

The following instructions will cause MIMD/SISD switch 20.4 to switch the pipeline mode to the SISD mode.

FLOATING POINT OPERATIONS--microcode

ADD extended

MULTIPLY extended

DIVIDEs

DIVIDE extended

SQUARE ROOT

LOAD rounded extended

FIXED POINT OPERATIONS--microcode

DIVIDE
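The mode selection described above can be sketched in Python as follows. This is a minimal illustration and not the patent's circuit: the mnemonics in `DIFFICULT_OPS`, the function name `select_mode`, and the parameter names are hypothetical, and the 64-bit threshold is the example value given in the text.

```python
from enum import Enum


class Mode(Enum):
    MIMD = "MIMD"
    SISD = "SISD"


# Hypothetical mnemonics standing in for the "difficult" operations
# (divides and square roots) named in the text.
DIFFICULT_OPS = {"DIVIDE", "SQUARE_ROOT"}


def select_mode(opcode, operand_bits, max_mimd_bits=64):
    """Pick the pipeline mode for one decoded instruction."""
    if operand_bits > max_mimd_bits:
        # Extended operands force the one-stream-at-a-time mode.
        return Mode.SISD
    if opcode in DIFFICULT_OPS:
        # Divides and square roots are treated as difficult.
        return Mode.SISD
    return Mode.MIMD
```

For example, `select_mode("ADD", 64)` stays in MIMD mode, while `select_mode("ADD", 128)` (an extended add) and `select_mode("DIVIDE", 32)` both select SISD mode.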

The initialization circuit 20.5 of FIG. 2 starts the pipe and updates the Dynamic History Table 20.7. A further construction of the initialization circuit 20.5 may be found in FIG. 11 of the drawings, and a more detailed description of that circuit will be set forth below in this specification. In connection with the handshake/hazard logic 20.3, the initialization circuit 20.5 determines the beginning and end of an instruction stream and determines if any data dependent hazards exist. After the decode step, the type of instruction, as indicated by the output from the decoder 20.2, is compared, in the initialization circuit 20.5 and the global hazard circuit 20.3, with the completion status of the first cycle of the appropriate pipe to use, as indicated by the internal pipe controls 20.6a-d. If there are no global hazards, as indicated by the dynamic history table 20.7, and no immediate internal hazards exist, as indicated by the output of the handshakes and global hazards circuit 20.3, the instruction is initialized. If the BUSY handshake signal is developed by the handshakes and global hazards circuit 20.3, no initialization takes place in the initialization circuit 20.5. Initialization involves starting the status controls of the appropriate pipe and also entering a new line in the Dynamic History Table. Notification of initialization is handled by the handshake controls 20.3, which either send the acknowledge signal to the CPU, indicating that the instruction has been started, or send a busy signal to the CPU, indicating that the FPU has the instruction but that the flow of incoming instructions should be stopped because the FPU cannot handle many more instructions. The initialization logic 20.5 and the global hazards logic 20.3 determine the beginning and end of a stream of instructions. The response "acknowledge" and "not busy" to an instruction not already in a stream indicates the beginning of a stream, and "busy" indicates the end of a stream. The global hazards circuit 20.3 is used to determine hazards due to data "dependencies". The initialization logic 20.5 adds new lines to the dynamic history table 20.7. Therefore, initialization consists of handshaking, updating the history file, and possibly dealing with data hazards.
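The initialization decision described above can be sketched roughly as follows. This is an illustrative Python model under simplifying assumptions, not the circuit of FIG. 11: the names `FpuModel` and `try_initialize`, the boolean hazard inputs, and the dictionary form of a table line are hypothetical. Only the ACK/BUSY outcome and the two initialization actions (starting the pipe's status controls and entering a new table line) are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class FpuModel:
    # Hypothetical state: one "started" flag per pipe, plus the table lines.
    pipe_started: dict = field(
        default_factory=lambda: {n: False for n in (1, 2, 3, 4)}
    )
    history_table: list = field(default_factory=list)


def try_initialize(fpu, pipe_no, global_hazard, internal_hazard):
    """Return the handshake ("ACK" or "BUSY") that would be sent to the CPU."""
    if global_hazard or internal_hazard:
        # BUSY developed: no initialization takes place.
        return "BUSY"
    # Initialization: start the status controls of the appropriate pipe
    # and enter a new line in the dynamic history table.
    fpu.pipe_started[pipe_no] = True
    fpu.history_table.append({"pipe_no": pipe_no, "valid": True})
    return "ACK"
```

In this model, a call such as `try_initialize(FpuModel(), 2, False, False)` starts pipe 2 and adds one table line, while any hazard leaves both untouched.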

The dynamic MIMD pipeline of FIG. 2 includes four pipeline circuits 20.6: pipe1 20.6a, pipe2 20.6b, pipe3 20.6c, and pipe4 20.6d. Thus, there are four categories of instructions, one category for each pipe 20.6a through 20.6d.

Data on the DBUS is processed by either the FPRs 20.8, or the DBUS stack 20.9 which is controlled by the DBUS stack controls 20.10.

The exception handler 20.11 determines if there is an exception. The types of data exceptions that can occur while executing instructions are:

Exponent Overflow Exception

Exponent Underflow Exception

Floating Point Divide Exception

Fixed Point Divide Exception

Significance Exception

Square Root Exception

When an instruction is detected that causes one of these exceptions, all instructions received after this must be cancelled as if they were never received even though they may be already executing; this is a property of a SISD architecture which must be preserved by the dynamic MIMD architecture. This is done by changing all the valid bits in the Dynamic History Table to zero after the instruction causing the exception has completed. In addition, the CPU and the other units are notified of the interrupt and they must cancel their instructions until the CPU begins an interrupt handler routine.

The dynamic MIMD pipeline 20, disposed in the FPU of the computer system, receives instructions via the CBUS, and the FPU responds, as do the other processor bus units, by transmitting certain "handshake" signals including an ACKnowledge handshake signal, a BUSY handshake signal, and an INTerrupt handshake signal. Since the CPU works in a pipeline mode and sends PBO commands out every cycle, regardless of whether the last PBO was ACKnowledged, the PBUs must determine whether the last PBO was acknowledged before proceeding to execute the next PBO. The PBUs include a "smart" interface. Therefore, using the smart interface, a PBU must check on the handshakes of other PBUs with the CPU. A PBU is required to send one of the three handshake signals (from the handshakes circuit 20.3 of FIG. 2) to the CPU in the cycle after a PBO was received by the PBU. If hazards are encountered by a PBU, such as the FPU, the PBU sends a BUSY handshake signal to the CPU. When the BUSY signal is sent to the CPU, the PBU holds the received instruction and the following instruction in an instruction stack (such as instruction stack 20.1 of FIG. 2 for the FPU) so that the sequence of instructions received from the CPU can be maintained. Thus, implemented on the FPU, as part of the instruction stack 20.1, are a CBUS Register 20.1.2 and a CBUS STACK 20.1.1 which hold the received instruction and the following instruction, respectively. Instructions are not stacked unless hazards, which cause generation of the BUSY handshake signal, are encountered. The FPU accepts as many instructions as it can handle; however, the FPU does not contain as much information as is contained by the CPU, since the CPU can halt an instruction before the instruction is even sent to the bus units if it sees, in its buffer of instructions, that problems may be encountered. 
When PBOs are sent from the CPU that require execution by the FPU and another bus unit, such as the data cache, the FPU has no power to prevent the data cache from starting the execution of the instruction. Thus, the most efficient method for the FPU to pipe is to go as far as possible until a hazard is encountered.
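The stacking behavior described above can be illustrated with a small Python model. It is a sketch under stated assumptions: the class and method names are hypothetical, and the handshake returned for the "following" instruction is assumed to be BUSY, which the text does not state. What is taken from the text is that nothing is stacked without a hazard, and that the CBUS register holds the received instruction while the CBUS stack holds the one after it, so that order is preserved.

```python
class InstructionStack:
    """Hypothetical two-deep holding area: CBUS register plus CBUS stack."""

    def __init__(self):
        self.cbus_register = None  # holds the received instruction
        self.cbus_stack = None     # holds the following instruction

    def receive(self, instruction, hazard):
        """Accept one PBO and return the handshake for it."""
        if self.cbus_register is None and not hazard:
            return "ACK"  # no hazard and nothing held: not stacked
        if self.cbus_register is None:
            self.cbus_register = instruction
        elif self.cbus_stack is None:
            self.cbus_stack = instruction
        else:
            raise OverflowError("model holds at most two instructions")
        return "BUSY"  # assumed for both held instructions

    def release(self):
        """Hazard cleared: return the oldest held instruction, keeping order."""
        oldest = self.cbus_register
        self.cbus_register, self.cbus_stack = self.cbus_stack, None
        return oldest
```

Releasing twice after a hazard returns the two held instructions in the order in which the CPU sent them.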

Referring to FIGS. 3a-3c, a construction of the instruction stack 20.1 of FIG. 2 is illustrated. FIG. 3a illustrates the construction of the instruction stack 20.1, FIG. 3b illustrates the bits on the CBUS during a hardwired mode, and FIG. 3c illustrates the bits on the CBUS during a microcode mode.

In FIG. 3a, the instruction stack 20.1 comprises the CBUS Stack register 20.1.1, and a CBUS register 20.1.2 connected to the output of the CBUS stack register 20.1.1. The instruction stack 20.1 as well as the CBUS consist of 25 bits of information for, at most, 2 instructions. These 25 bits of information comprise:

bit 0--the PBO bit which indicates whether the FPU is in a hardwired mode or a microcode mode; if in hardwired mode (0), exceptions are reported to the CPU; if in microcode mode (1), exceptions are stored in the status word (see FIG. 8, element number 20.6d.3) but are not reported;

bit 1--the FPU request bit which signals the FPU that this instruction must be executed by the FPU;

bit 2--the IPU/Cache request bit which signals the cache to decode the instruction;

bit 3--the VP request bit;

bits 17 to 19--in microcode mode, these bits are the SRC, source, identifier bits;

bits 20 and 22--in microcode mode, these bits are the DST, destination, identifier bits;

bits 4 to 10--the instruction opcode bits;

bits 17 to 19--in hardwired mode, these bits are the interrupt tag field which is stored in the status word on an exception; and

bit 24--the parity bit used for checking the validity of the instruction.

Thus, the instruction on the CBUS is introduced into the instruction stack 20.1 via the CBUS bits defined above.
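As an illustration of the bit layout listed above, the following Python sketch extracts the fields from a 25-bit CBUS word. It assumes that bit 0 is the leftmost (most significant) bit of the word, which the text does not state, and the function names are hypothetical. The DST field is left out because the text gives its position only as "bits 20 and 22", and bits not described in the list are ignored.

```python
WORD_BITS = 25


def cbus_field(word, first, last):
    """Extract bits first..last inclusive, with bit 0 as the leftmost bit (assumed)."""
    width = last - first + 1
    shift = WORD_BITS - 1 - last
    return (word >> shift) & ((1 << width) - 1)


def decode_cbus(word):
    """Split a 25-bit CBUS word into the fields named in the text."""
    microcode = cbus_field(word, 0, 0) == 1  # bit 0: 0 = hardwired, 1 = microcode
    decoded = {
        "mode": "microcode" if microcode else "hardwired",
        "fpu_request": cbus_field(word, 1, 1),
        "cache_request": cbus_field(word, 2, 2),
        "vp_request": cbus_field(word, 3, 3),
        "opcode": cbus_field(word, 4, 10),
        "parity": cbus_field(word, 24, 24),
    }
    # Bits 17 to 19 are SRC in microcode mode and the interrupt tag in hardwired mode.
    key = "src" if microcode else "interrupt_tag"
    decoded[key] = cbus_field(word, 17, 19)
    return decoded
```

The parity bit is only extracted here; the sketch does not check it, since the parity convention is not given.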

Referring to FIG. 4, a construction of the dynamic history table 20.7 of FIG. 2 is illustrated.

In FIG. 4, the dynamic history table 20.7 comprises 17 bits of information stored away for at most 8 instructions at a time. The Dynamic History Table 20.7 consists of data that is needed when it is necessary for an incoming instruction to enter one of pipes 20.6a-20.6d and to complete from these pipes. Since the instructions are stacked, the table 20.7 provides a means of sequencing the completion of execution of instructions of one or more instruction streams. The limitation of the CBUS, to send one instruction at a time, determines the instruction's starting time. Since the execution of the instructions of the one or more instruction streams may take multiple cycles to complete, and since there exists more than one pipe, it is possible that multiple instructions will be executing at the same time. Due to architectural constraints, the instruction completion sequence must be maintained because of possible unpredictable results if an interrupt occurred and the instructions were not sequential. Therefore, there is a need to maintain and store sequencing information and completion information in the table 20.7. The dynamic history table 20.7 stores the following information:

1. the pipe number (PIPE NO),

2. the write address (WR ADDR),

3. whether the instruction is a write type (WT) of instruction,

4. a tag (INT TAG) which uniquely identifies it with an instruction in a stack on the CPU,

5. whether the instruction is an SISD instruction type (M/S), where "M" implies an MIMD instruction type and "S" implies an SISD instruction type,

6. the result length (LEN),

7. whether it is a hardwired or microcoded PBO (H),

8. some retry information (PSW PTR), and

9. a valid bit (V).

The pipe number (PIPE NO) is critical because it sequences the multiple pipes of pipeline circuits 20.6a-20.6d of FIG. 2 (four in all). Sequencing is of very little concern in a one pipe system, but, with multiple pipes, tracking information must be maintained. The write address (WR ADDR), write type (WT), and result length (LEN) help in completing the instruction. The tag information (INT TAG) is stored away if an exception occurs and helps in identifying the exact instruction which caused the exception. If the instruction is an SISD instruction, completion is not sensed by looking to see if valid data is disposed at the end of a pipe; instead, it is determined by a counter which counts the cycles. The most important bit is the valid bit (V), which indicates whether the instruction in this entry of the stack is valid. The valid bit (V) is cleared when an exception occurs. Upon completion of an instruction, the valid bit (V) of its entry is cleared and the stack is shifted. Thus, a quick method is available to cancel all pending instructions in the FPU, that is, by clearing the valid bit (V) in the dynamic history table 20.7.
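The table just described can be modeled as a short queue of records. The Python sketch below is illustrative only: the class and method names are hypothetical, the field widths within the 17 bits are not given in the text and are therefore not modeled, and returning `False` when the table is full is an assumption. The nine fields, the limit of 8 entries, the shift on completion, and cancellation by clearing the valid bits follow the text.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_ENTRIES = 8  # the table holds at most 8 instructions at a time


@dataclass
class Entry:
    pipe_no: int        # PIPE NO
    wr_addr: int        # WR ADDR
    wt: bool            # WT: write type of instruction
    int_tag: int        # INT TAG
    mimd: bool          # M/S: True for MIMD, False for SISD
    length: int         # LEN
    hardwired: bool     # H
    psw_ptr: int        # PSW PTR (retry information)
    valid: bool = True  # V


class DynamicHistoryTable:
    def __init__(self):
        self.entries: List[Entry] = []  # index 0 is the oldest instruction

    def add_line(self, entry: Entry) -> bool:
        """Enter a new line; returns False if the table is full (assumed behavior)."""
        if len(self.entries) >= MAX_ENTRIES:
            return False
        self.entries.append(entry)
        return True

    def oldest(self) -> Optional[Entry]:
        return self.entries[0] if self.entries else None

    def complete_oldest(self) -> Optional[Entry]:
        """On completion, remove the oldest entry; the remaining entries shift up."""
        return self.entries.pop(0) if self.entries else None

    def cancel_all(self) -> None:
        """On an exception, clear every valid bit to cancel pending instructions."""
        for entry in self.entries:
            entry.valid = False
```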

Referring to FIGS. 5 through 8, a construction of the pipeline 20.6 of FIG. 2 is illustrated. In particular, FIG. 5 illustrates the construction of pipe1 20.6a which is used for add type instructions, such as add, subtract, divide, compares, and square roots. The FIG. 5 pipe functions in three cycles. FIG. 6 illustrates the construction of pipe2 20.6b, the multiply pipe, which functions in 5 cycles, and is used for multiply instructions. FIG. 7 illustrates the construction of pipe3 20.6c, which is used for load RX type instructions and functions in two cycles. FIG. 8 illustrates the construction of pipe4 20.6d, which is used for all other miscellaneous functions, which are usually either a write or a read of some auxiliary or status registers.
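The assignment of instruction categories to pipes can be summarized in a lookup table. The Python sketch below is a hypothetical illustration: the instruction-kind strings and the function `select_pipe` are invented names, and the cycle count of pipe4 is left as `None` because the text does not give it.

```python
# Cycle counts as stated in the text; None where no count is given.
PIPE_CYCLES = {1: 3, 2: 5, 3: 2, 4: None}

# Hypothetical instruction-kind names mapped to the pipe handling their category.
PIPE_FOR_KIND = {
    "add": 1,
    "subtract": 1,
    "divide": 1,
    "compare": 1,
    "square_root": 1,
    "multiply": 2,
    "load_rx": 3,
}


def select_pipe(kind):
    """Return the pipe number for an instruction kind; anything else goes to pipe4."""
    return PIPE_FOR_KIND.get(kind, 4)
```

For example, `select_pipe("subtract")` selects pipe1, while an unlisted kind such as a status register read falls through to pipe4.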

In FIG. 2, pipeline 20.6a includes a controls section and a pipe1 section. Similarly, pipes 20.6b through 20.6d each include a controls section. The pipeline controls section of each of pipes 20.6a-20.6d controls the internal parts of that pipe by pushing the operations as far as possible through the pipe, by sensing when the FPRs 20.8 are interlocked, and by determining where good data can be found. To better understand these controls, it is best to first understand the layout of each pipe. In FIG. 2 and in FIGS. 5-8, pipeline 20.6 comprises four pipes: 20.6a through 20.6d. In MIMD mode, these pipes have different lengths, thus creating complexities in controlling these pipes globally. Internal to the pipes, there are registers and, associated with these registers, are status fields.

Referring to FIG. 5b, as indicated more thoroughly below, the status fields for the registers in the add pipe of FIG. 5 consist of the operand's FPR address (ADDR), whether SISD or MIMD mode is invoked (M/S), some bypass information (2BY), whether this stage in the pipe is for a valid instruction (VI), whether the data in the register is valid (VD), and whether the instruction is RX or RR type (RX).

Referring to FIG. 6b, the status fields associated with the registers of the multiply pipe of FIG. 6 store this information and further information including the length of the operands (LI), whether it's a floating point or fixed point operand (FLP), and whether the data is still valid (VR), even if this stage of the pipe does not have a valid instruction. The other pipes do not need status information because they are very short.

Referring again to FIG. 2, the status information for each stage of the pipe flags its validity to the following stage and then, in the following cycle, the next stage becomes valid if there are no contentions. Thus, the flags help in determining contention and push the instruction's data as far as it can go through the pipe. After the pipe in question locates the data on the DBUS and execution takes place in the pipe, the pipe in question waits for the Dynamic History Table's oldest entry to match the pipe number (PIPE NO) of the pipe in question; at this point, the pipe is allowed to complete, thus maintaining instruction completion synchronization.
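The completion rule above, in which a pipe may complete only when the oldest table entry belongs to it, can be illustrated with a small cycle-by-cycle simulation. The Python sketch below makes several simplifying assumptions that are not in the text: one instruction is issued per cycle with no hazards, an instruction issued in cycle c with latency n has its result ready in cycle c + n, at most one instruction completes per cycle, and the 8-entry limit of the table is ignored. The function name and tuple layout are hypothetical.

```python
from collections import deque


def completion_order(instructions):
    """instructions: list of (name, pipe_no, latency) tuples in issue order.

    Returns a list of (name, cycle) pairs in the order of completion.
    """
    table = deque()  # the oldest entry is on the left
    for issue_cycle, (name, pipe_no, latency) in enumerate(instructions):
        table.append(
            {"name": name, "pipe_no": pipe_no, "ready_at": issue_cycle + latency}
        )

    completed = []
    cycle = 0
    while table:
        oldest = table[0]
        # Only the pipe named by the oldest entry may complete, and only
        # once its result is ready; younger results wait in their pipes.
        if oldest["ready_at"] <= cycle:
            completed.append((oldest["name"], cycle))
            table.popleft()
        cycle += 1
    return completed
```

With a 5-cycle multiply issued first and a 3-cycle add issued in the next cycle, the add finishes executing first in this model but still completes after the multiply, so the original instruction order is kept.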

Referring to FIG. 5, pipe1 (add) internal control registers 20.6a are illustrated.

In FIG. 5a, pipe1 20.6a includes an alignment register 20.6a.4, an FA register 20.6a.1, an FB register 20.6a.2, an A register 20.6a.5, a B register 20.6a.6, an adder 20.6a.7, an FS register 20.6a.3, an S register 20.6a.8, and a post normalizer register 20.6a.9. In FIG. 5b, status fields associated with the FA register 20.6a.1, FB register 20.6a.2, and FS register 20.6a.3 are illustrated.

FIG. 5 illustrates the add pipe 20.6a and its associated internal pipe control registers. The add pipe of FIG. 5 consists of three cycles. During the first cycle, the data is retrieved from the FPRs 20.8 and/or from the DBUS. Alignment is accomplished by the alignment hardware 20.6a.4. Operands are latched into A register 20.6a.5 and B register 20.6a.6. In cycle two, the actual add is performed by the adder 20.6a.7 and the result is stored in S register 20.6a.8. In the third and final cycle, the post normalizer 20.6a.9 shifts out leading zeros if required and the data is sent back to the FPRs 20.8. The previously described function reflects the manner in which the pipe handles add instructions. For other instructions belonging to the same category, at least some of the registers or some of the internal bypassing controls in the pipe are used. Thus, the three cycle add pipe is used for many different instructions. To maintain and control this pipe, three major control registers are needed: FA register 20.6a.1, FB register 20.6a.2, and FS register 20.6a.3.
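The three cycles of the add pipe (align, add, post-normalize) can be illustrated arithmetically. The Python sketch below is not the hardware of FIG. 5: it assumes a binary fraction of 8 bits, unsigned operands given as (exponent, fraction) pairs with value fraction / 2**8 * 2**exponent, truncation of bits shifted out during alignment, and a right shift on carry-out that the text does not describe. All names are hypothetical.

```python
FRAC_BITS = 8  # assumed fraction width, chosen small for readability


def cycle1_align(a, b):
    """Cycle 1: shift the fraction of the operand with the smaller exponent right."""
    (ea, fa), (eb, fb) = a, b
    if ea < eb:
        return eb, fa >> (eb - ea), fb
    return ea, fa, fb >> (ea - eb)


def cycle2_add(exp, fa, fb, subtract=False):
    """Cycle 2: the actual add (or subtract) of the aligned fractions."""
    result = fa - fb if subtract else fa + fb
    if result < 0:
        raise ValueError("sketch assumes the first operand is not smaller")
    return exp, result


def cycle3_post_normalize(exp, s):
    """Cycle 3: shift out leading zeros, adjusting the exponent to match."""
    if s == 0:
        return 0, 0
    while s >= (1 << FRAC_BITS):        # carry-out case (assumed handling)
        s >>= 1
        exp += 1
    while s < (1 << (FRAC_BITS - 1)):   # leading zeros in the fraction
        s <<= 1
        exp -= 1
    return exp, s


def add_pipe(a, b, subtract=False):
    exp, fa, fb = cycle1_align(a, b)
    exp, s = cycle2_add(exp, fa, fb, subtract)
    return cycle3_post_normalize(exp, s)


def to_float(value):
    exp, frac = value
    return frac / (1 << FRAC_BITS) * 2 ** exp
```

Subtracting two operands of similar magnitude is the case in which the post normalizer has leading zeros to shift out.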

Referring to FIG. 5b, the status fields for the FA register 20.6a.1, FB register 20.6a.2, and FS register 20.6a.3 are illustrated.

In FIG. 5b, such status fields include the following bits:

1. FPR address bits (ADDR) of the operand which are used to locate operands that may be interlocked

2. A valid instruction bit (VI) which is used to indicate that this stage in the pipe is valid for an instruction

3. A valid data bit (VD) which indicates that the associated data register is valid

4. A MIMD/SISD pipe indicator (M/S) which signals the instruction end. For MIMD mode, the instruction ends when the last stage is valid and there is no contention on completion. For SISD mode, it is a little more complex because the instruction may loop several times through the pipe.

5. A bit to indicate that the instruction is an RX type of instruction (RX) which, on the FB register, indicates that its address bits are really invalid and the data bus should be watched for incoming data if not already valid.

6. A bit which indicates the first cycle of a two cycle bypass (2BY). Sometimes it takes two cycles to retrieve interlocked data once it has been located.
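The six status bits listed above can be gathered into one record per control register. The Python sketch below is an illustrative reading of the list, with hypothetical names: `fb_operand_source` models item 5 (an RX instruction makes the FB address bits invalid, so the data bus is watched unless the data is already valid), and `mimd_instruction_end` models the MIMD case of item 4. The SISD case is not modeled, since the text gives no rule for it beyond saying it is more complex.

```python
from dataclasses import dataclass


@dataclass
class StageStatus:
    addr: int = 0         # ADDR: FPR address bits of the operand
    vi: bool = False      # VI: this stage holds a valid instruction
    vd: bool = False      # VD: the associated data register is valid
    mimd: bool = True     # M/S: True for MIMD, False for SISD
    rx: bool = False      # RX: instruction is an RX type
    two_by: bool = False  # 2BY: first cycle of a two cycle bypass


def fb_operand_source(fb):
    """Where the FB operand comes from, per the RX and VD bits."""
    if fb.vd:
        return "already valid"
    if fb.rx:
        return "watch DBUS"  # address bits are invalid for RX
    return "FPR at ADDR"


def mimd_instruction_end(last_stage, contention):
    """MIMD mode: the instruction ends when the last stage is valid with no contention."""
    return last_stage.mimd and last_stage.vi and not contention
```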

Referring to FIG. 6a, pipe2 20.6b, the multiply pipe, and the control registers internal to the pipe are illustrated.

In FIG. 6a, pipe2 20.6b, the multiply pipe, includes FXA register 20.6b.1, FYS register 20.6b.2, FXB register 20.6b.3, FY register 20.6b.4, FP register 20.6b.5, XA register 20.6b.6, 3X hardware 20.6b.7, XB and 3X registers 20.6b.8, Y register 20.6b.9, M1 hardware 20.6b.10, M2 hardware 20.6b.11, and P register 20.6b.12. The multiply pipe consists of 5 cycles if no hazards are encountered:

Cycle 1--Operand 1 is loaded into XA register 20.6b.6 from the FPRs 20.8 and, if operand 2 is also from the FPRs, it is read and stored in a temporary register because the bus structure limits the loading to one operand at a time.

Cycle 2--Operand 2 is loaded from either the temporary register or from the DBUS to Y register 20.6b.9; concurrently, a 3 times multiple of operand 1 is calculated by the 3X hardware 20.6b.7 and stored in 3X register 20.6b.8. The XA register 20.6b.6 directly loads XB register 20.6b.8.

Cycle 3 and Cycle 4--These are the two cycles of actual execution of the multiplier. These cycles, termed the M1 and M2 cycle, use the M1 hardware 20.6b.10 and the M2 hardware 20.6b.11. No registers separate the two cycles of execution. Thus, XB and 3X registers 20.6b.8 and Y register 20.6b.9 must be held for these two cycles until the data is latched in P register 20.6b.12.

Cycle 5--This cycle involves a write from P register 20.6b.12 to the FPRs 20.8. If the result is extended, there is a cycle 6, which is a second