WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Method and apparatus for store-into-instruction-stream detection and maintaining branch prediction cache consistency    
United States Patent5226130   
Link to this pagehttp://www.wikipatents.com/5226130.html
Inventor(s)Favor; John G. (San Jose, CA); Van Dyke; Korbin (Fremont, CA); Stiles; David R. (Sunnyvale, CA)
AbstractThe present invention provides for the updating of both the instructions in a branch prediction cache and instructions recently provided to an instruction pipeline from the cache when an instruction being executed attempts to change such instructions ("Store-Into-Instruction-Stream"). The branch prediction cache (BPC) includes a tag identifying the address of instructions causing a branch, a record of the target address which was branched to on the last occurrence of each branch instruction, and a copy of the first several instructions beginning at this target address. A separate instruction cache is provided for normal execution of instructions, and all of the instructions written into the branch prediction cache from the system bus must also be stored in the instruction cache. The instruction cache monitors the system bus for attempts to write to the address of an instruction contained in the instruction cache. Upon such a detection, that entry in the instruction cache is invalidated, and the corresponding entry in the branch prediction cache is invalidated. A subsequent attempt to use an instruction in the branch prediction cache which has been invalidated will detect that it is not valid, and will instead go to main memory to fetch the instruction, where it has been modified.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5226130
Method and apparatus for store-into-instruction-stream detection and

     maintaining branch prediction cache consistency - US Patent 5226130 Drawing
Method and apparatus for store-into-instruction-stream detection and maintaining branch prediction cache consistency
Inventor     Favor; John G. (San Jose, CA); Van Dyke; Korbin (Fremont, CA); Stiles; David R. (Sunnyvale, CA)
Owner/Assignee     NexGen Microsystems (Milpitas, CA)
Patent assignment
All assignments
Publication Date     July 6, 1993
Application Number     07/485,312
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     February 26, 1990
US Classification     712/238 711/125 711/144 712/207
Int'l Classification     G06F 009/38 G06F 009/42
Examiner     Harrell; Robert B.
Assistant Examiner     Mohamed; Ayni
Attorney/Law Firm     Townsend and Townsend Khourie and Crew
Address
Parent Case    
Priority Data    
USPTO Field of Search     395/425 395/375
Patent Tags     store-into-instruction-stream detection and maintaining branch prediction cache consistency
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
4991078
Wilhelm
712/218
Feb,1991

[0 after 0 votes]
4943908
Emma
712/240
Jul,1990

[0 after 0 votes]
4942520
Langendorf
711/140
Jul,1990

[0 after 0 votes]
4933837
Freidin
711/125
Jun,1990

[0 after 0 votes]
4926323
Baror
712/238
May,1990

[0 after 0 votes]
4894772
Langendorf
712/240
Jan,1990

[0 after 0 votes]
4882673
Witt
714/42
Nov,1989

[0 after 0 votes]
4860199
Langendorf
711/213
Aug,1989

[0 after 0 votes]
4853840
Shibuya
712/237
Aug,1989

[0 after 0 votes]
4847753
Matsuo
712/238
Jul,1989

[0 after 0 votes]
4827402
Wada
712/234
May,1989

[0 after 0 votes]
4802113
Onishi
712/233
Jan,1989

[0 after 0 votes]
4796175
Matsuo
712/207
Jan,1989

[0 after 0 votes]
4777594
Jones
712/240
Oct,1988

[0 after 0 votes]
4777587
Case
712/235
Oct,1988

[0 after 0 votes]
4775955
Liu
711/145
Oct,1988

[0 after 0 votes]
4722050
Lee

Jan,1988

[0 after 0 votes]
4679141
Pomerene
712/240
Jul,1987

[0 after 0 votes]
4594659
Guenthner
712/207
Jun,1986

[0 after 0 votes]
4442488
Hall
711/125
Apr,1984

[0 after 0 votes]
4295193
Pomerene
712/215
Oct,1981

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. An apparatus comprising:

a system bus;

an instruction cache, coupled to said system bus, including validity bits for indicating whether an entry is valid and means for setting a validity bit for an entry to an invalid state upon detection of a store signal on said system bus for an address matching said entry;

a branch prediction cache including, for each entry, a branch entry address, a branch target address, branch target instruction data and a validity bit;

means for comparing an address of an instruction cache entry to said branch target addresses;

control means for writing instructions into said branch prediction cache only if said instructions are also written into said instruction cache and for removing instructions from said branch prediction cache whenever said instructions are removed from said instruction cache; and

means for setting a validity bit for a branch prediction cache entry to invalid when the branch target address matches the address of an instruction cache entry with a validity bit in an invalid state;

such that a store-into-instruction-stream will be detected by said instruction cache and will result in the invalidation of any corresponding entries in both said instruction cache and said branch prediction queue;

wherein said instruction cache is direct-mapped and said branch prediction cache is fully associative; and

wherein index bits of an instruction address are used to map into said instruction cache, and bits more significant than said index bits are not compared to said branch target addresses.

2. An apparatus comprising:

a system bus;

an instruction cache, coupled to said system bus, including validity bits for indicating whether an entry is valid and means for setting a validity bit for an entry to an invalid state upon detection of a store signal on said system bus for an address matching said entry;

a branch prediction cache including, for each entry, a branch instruction address, a branch target address, branch target instruction data and validity bit;

means for comparing an address of an instruction cache entry to said branch target addresses;

control means for writing instructions into said branch prediction cache only if said instructions are also written into said instruction cache and for removing instructions from said branch prediction cache whenever said instructions are removed from said instruction cache; and

means for setting a validity bit for a branch prediction cache entry to invalid when the branch target address matches the address of an instruction cache entry with a validity bit in an invalid state;

such that a store-into-instruction-stream will be detected by said instruction cache and will result in the invalidation of any corresponding entries in both said instruction cache and said branch prediction queue;

wherein each line of said branch prediction cache includes a plurality of branch target instructions immediately following said branch instruction to form a set of instruction bytes, said set of instruction bytes being aligned differently than a line of said instruction cache, and further comprising means for storing at least a portion of an end address of said set and comparing said portion of said end address to a portion of an address of an instruction cache entry.

3. The apparatus of claim 2 further comprising control means for ensuring that all the addresses of said set of instructions are within the same page of memory, such that said second means for comparing said portion of said end address to a portion of an address of an instruction cache entry being operable without comparing bits more significantly than a page boundary bit.

4. An apparatus comprising:

a system bus;

an instruction cache, coupled to said system bus, including validity bits for indicating whether an entry is valid and means for setting a validity bit for an entry to an invalid state upon detection of a store signal on said system bus for an address matching said entry;

a branch prediction cache including, for each entry, a branch instruction address, a branch target address, branch target instruction data and a validity bit;

means for comparing an address of an instruction cache entry to said branch target addresses;

control means for writing instructions into said branch prediction cache only if said instructions are also written into said instruction cache and for removing instructions from said branch prediction cache whenever said instructions are removed from said instruction cache; and

means for setting a validity bit for a branch prediction cache entry to invalid when the branch target address matches the address of an instruction cache entry with a validity bit in an invalid state;

such that a store-into-instruction-stream will be detected by said instruction cache and will result in the invalidation of any corresponding entries in both said instruction cache and said branch prediction queue;

wherein said means for comparing does not compare bits more significant than index bits of said instruction cache.
 Description Submit all comments and votes
 


Related applications filed concurrently herewith are entitled "Two-Level Branch Prediction Cache", U.S. patent application Ser. No. 07/485,306, now abandoned, "Integrated Single-Structure Branch Prediction Cache", U.S. patent application Ser. No. 07/485,307, now U.S. Pat. No. 5,093,778 and "Integrated Instruction Queue and Branch Target Cache"U.S. patent application Ser. No. 485,304.

BACKGROUND OF THE INVENTION

The present invention relates to branch prediction caches, and in particular to methods and apparatus for handling the modification of an instruction in the branch prediction cache or already provided to an instruction pipeline by an instruction farther along the pipeline.

As computer designers have designed increasingly higher performance implementations of various computer architectures, a number of classes of techniques have been developed to achieve these increases in performance. Broadly speaking, many of these techniques can be categorized as forms of pipelining, caching, and hardware parallelism. Some of these techniques are generally applicable to and effective in the implementation of most types of computer architectures, while others are most appropriate in the context of speeding up the implementations of Complex Instruction Set Computers (CISC).

The purpose of creating a high-performance implementation of a CISC architecture system is to achieve the appearance of each instruction having a processing time of one or a few processor/clock cycles. However, such timing typically is only approximated. One of the principal reasons for not achieving a one-cycle processing time is the existence of various types of dependencies between neighboring instructions. The dependencies may result in the occurrence of processing delays.

One critical area in the achievement of a one-cycle processing time is in the handling of control dependencies, i.e. branch-type instructions. In the context of a CISC architecture implementation these instructions tend to be difficult insofar as being able to quickly calculate or otherwise determine the target address of a branch, to quickly resolve the proper path of subsequent instruction processing in the case of conditional branches, and in all cases to then quickly restart the fetching of instructions at the new address. Pipeline processing delays result when these operations cannot be performed quickly.

To minimize the actual impact of these delays on processing throughput, various types of prediction and caching techniques are available. The purpose of a designer in applying these various types of techniques is to always accurately predict the information to be produced by the above operations, i.e. branch target address, conditional branch direction, and the first one or more instructions at the branch target address. The percentage success rates of these prediction techniques then reduce the effective delay penalties incurred by the above three operations in corresponding amounts.

Generally speaking, existing techniques are based on the retention or caching of information from the prior processing of branch instructions. When a branch instruction is encountered again, and information from previous processing of this instruction is still to be found in the prediction cache structure, the cached information is then used to make an "intelligent" dynamic prediction for the current occurrence of the branch. When no such information is to be found in the prediction cache structure, either a "dumber" static prediction must be made, or normal processing, with the attendant possibility of incurring delays, must be performed.

Existing high-performance CISC designs use forms of cache structures to hold various combinations of information with the intention of predicting one or more of the three types of information mentioned above. In an aggressive all-encompassing design each entry holds: a record of the actual target address associated with the last occurrence of the branch; a copy of the first several instructions at this target address; and, in the case of conditional branches, a history record of the direction taken by each of the past couple of branch occurrences.

In parallel with the fetching and/or decoding of a branch instruction, the instruction is also looked up in the branch prediction cache. Generally, this look-up is based on the fetch address of the branch or on a closely related address. As the instruction is being decoded, the branch history information is used to predict the direction of conditional branches. The history information determines whether subsequent instruction processing should continue with the instructions sequentially following the branch, or with the sequence of instructions starting at the target address.

Whether the branch is conditional or unconditional, if processing is to continue with the target instruction stream, then the processing of successive instructions proceed without delay using the branch target instructions from the cache. At the same time, fetching of further non-cached instructions is immediately initiated using the predicted branch target address, plus an appropriate increment.

While this branch prediction design offers the possibility of fast, efficient processing of the predicted branches, the possibility of processing a branch instruction based on erroneous information is introduced. In general, handling mispredicted aspects of processing a branch must be an integral part of the overall central processing unit (CPU) design.

There are subtler issues stemming from the nature of the implemented architecture. In the case of many CISC architectures, one part of a program may modify other parts. These modified program parts are then executed. The result is that the modified image of these instructions, instead of the original image, is then executed.

For some CISC architectures which allow programs to modify itself, this type of programming practice has become an established practice within a significant portion of the existing software base. Consequently, to maintain backward software compatibility, new CPU implementations often must not only implement the direct semantics of the architecture's instruction set, but also maintain the appearances of this expected secondary semantic behavior. In the case of higher performance implementations this can become a significant, and potentially difficult, requirement to satisfy.

When cache-based dynamic branch prediction in incorporated into the design difficulties with high-performance arise. It is possible that the target address of a branch instruction, which might otherwise be a constant value for that instruction instance in memory, may be modified by what is normally an instruction that writes data to memory. Or, for that matter, the branch instruction opcode may be changed to that of a different type of branch instruction or to that of a non-branch type of instruction.

Further, even if none of the bytes of a branch instruction itself are modified, one of the target instructions may be modified. To the extent that these instructions are fetched from main memory after any such modification has taken affect, there is no problem. But if, for example, a copy of the modified instruction is held in a branch prediction cache such as described above, and is fetched from there instead of main memory, then a consistency problem exists.

For both branch and target instruction modifications, the design of a branch prediction cache and associated control circuitry must maintain a sufficient degree of consistency to ensure proper processing of instructions. The maintenance of consistency must encompass not only conventional data/instruction cache consistency, but also consistency with respect to memory store instructions modifying other instructions which are executed shortly thereafter.

The consistency problem is essentially similar to that encountered with more conventional data/instruction cache structures used in high-performance CPU designs. Data writes by the CPU must be appropriately reflected in the state and/or contents of any affected cache entries. But the scope of the problem is more general than this. When other devices within the system, such as Direct Memory Address (DMA) devices or CPU's, modify main memory, the issue of cache/main memory consistency again arises. For a branch prediction cache these other devices are additional sources or causes of inconsistencies which must also be covered.

In extreme "store-into-instruction-stream" cases, such as a modifying instruction immediately followed by a branch and then a modified target instruction, this can be difficult. Particularly for highly pipelined, high-performance CPU designs, implementing this can prove to be expensive in terms of additional hardware circuitry, complex to design, and/or forcing compromise in the overall performance attainable by the design.

Pipelining, particularly the deep pipelining that is common in high-performance implementations of CISC architectures, results in large instruction processing latencies and high degrees of overlap between the processing of successive instructions. Access of a branch prediction cache and usage of the resultant prediction information and target instructions generally occurs early in such pipelines. Execution of memory writes by instructions storing to memory, on the other hand, generally takes place late in such pipelines.

Consequently, actions such as fetching target instructions from a branch prediction cache (BPC) can easily occur before architecturally preceding memory writes modifying such target instructions have actually been performed. Such actions may even occur before the store addresses have been generated, this being the earliest point at which potential consistency problems could be checked for and detected. The result is that consistency must be maintained with respect to not only the explicit contents of the branch prediction cache, but also with respect to the implicit contents associated with instructions currently being processed.

Insofar as these consistency issues apply to target instructions fetched from the cache, they also apply to instructions temporarily stored in and then taken from instruction pre-fetch queues. As a transient form of cache, a pre-fetch queue can lead to similar inconsistencies.

Overall, the need to maintain branch prediction cache and more general fetched instruction consistency with respect to cases of "store-into-instruction-stream", as well as with respect to more generic modifications of main memory blocks, is a difficult problem.

SUMMARY OF THE INVENTION

The present invention provides for the updating of both the instructions in a branch prediction cache and instructions recently provided to an instruction pipeline from the cache when an instruction being executed attempts to change such instructions ("Store-Into-Instruction-Stream"). The branch prediction cache (BPC) includes a tag identifying the address of instructions causing a branch, a record of the target address which was branched to on the last occurrence of each branch instruction, and a copy of the first several instructions beginning at this target address. A separate instruction cache is provided for normal execution of instructions. and all of the instructions written into the branch prediction cache from the system bus must also be stored in the instruction cache. The instruction cache monitors the system bus for attempts to write to the address of an instruction contained in the instruction cache. Upon such a detection, that entry in the instruction cache is invalidated, and the corresponding entry in the branch prediction cache is invalidated. A subsequent attempt to use an instruction in the branch prediction cache which has been invalidated will detect that it is not valid, and will instead go to main memory to fetch the instruction, where it has been modified.

The present invention also provides for invalidating an instruction already provided to an instruction pipeline when that instruction is attempted to be changed by an instruction currently being executed. This is done by maintaining a copy of the target instruction address of the branch after the target instructions are provided to the instruction pipeline. These instructions (if still present) are then invalidated upon a detection of an attempt to write to them in main memory, and an error signal is generated. A control circuit for the instruction pipeline receives this error signal, and flushes the instruction pipeline of all instructions and results from the branch target instruction onward, and restarts the instruction pipeline at the branch target instruction. This stored group of instructions is referred to as a pre-fetch queue.

The branch target cache contains the basic hardware necessary for implementing the pre-fetch queue. One or more 24 byte portions of the branch target cache are simply designated as being used for the pre-fetch queue. These instructions can then be invalidated just like any other instruction in the branch target cache.

For a fuller understanding of the nature and advantages of the invention, reference should be made to the ensuing detailed description taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the computer system incorporating the present invention;

FIG. 2 is a block diagram of the decoder block of FIG. 1;

FIG. 3 is a block diagram of the front end block of FIG. 2, showing the branch prediction cache of the present invention;

FIG. 4 is a block diagram of the operation or the instruction cache of the present invention;

FIg. 5 is a diagram showing the contents of the branch prediction cache the of the present invention;

FIG. 6 is a diagram of the consistency check comparisons done in the present invention;

FIG. 7 is a block diagram of the branch prediction cache according to the present invention;

FIG. 8 is a block diagram of the tag RAM of FIG. 7;

FIG. 9 is a diagram of the target address RAM of FIG. 7;

FIG. 10 is a block diagram of the branch target instruction sequence RAM of FIG. 7;

FIG. 11 is a block diagram of the target instruction valid and branch history memory of FIG. 7;

FIG. 12 is a diagram of the address selection logic of FIG. 7;

FIG. 13 is a diagram illustrating the pre-fetch queue in the branch target instruction sequence RAM of the present invention;

FIG. 14 is a flowchart of the operation of the control circuit for monitoring the pre-fetch queue for store-into instruction stream errors;

FIG. 15 is a diagram of the store-into-stream detection logic; and

FIG. 16 is a diagram of the multiple instruction streams resulting from multiple branches.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The following System Overview and Pipeline Control System Overview describe a pipeline system which can be flushed and restarted when a branch error is detected several cycles after the branch is taken. This is a necessary element of aspects of the invention set forth in Maintaining Branch Prediction Cache Consistency.

System Overview

FIG. 1 is a block diagram of a CPU 10 incorporating the present invention. The CPU is designed to execute an instruction set compatible with that of the Intel 80386, as described in the Intel 80386 Programmer's Reference Manual published by Intel Corporation, Santa Clara, California, 1986. Each block in the diagram corresponds generally to a separate integrated circuit chip or group of chips in a current embodiment.

An Instruction Decoder (DEC) 12 performs instruction fetch, instruction decode, and pipeline control. DEC 12 optionally interleaves instruction pre-fetch of up to three simultaneous instruction streams. DEC 12 contains a fully associative Branch Prediction Cache (BPC) according to the present invention. The BPC is an integrated structure which contains dynamic branch history data, a physical branch target address, and a branch target buffer for each cache entry. As branch instructions are decoded, the BPC is consulted for information about that branch. Independent of the direction predicted, branches are executed in a single cycle and do not cause pipeline bubbles.

On each cycle, an instruction is selected from one of the three instruction buffers or a branch target buffer in the BPC. The instruction is decoded, assembled into an internal 96-bit decoded instruction word, referred to as a pseudo-op (p-op), and dispatched to the various functional units. Instruction decode generally proceeds at a single cycle rate. Each p-op issued by DEC 12 is given a tag which uniquely identifies each instruction currently outstanding in the machine. Tags are issued in increasing order, allowing easy determination of relative age of any two outstanding tags. Bus transactions between chips include the tag of the originating instruction. Functional units pair up instructions, addresses, and operands with these tags.

DEC 12 is also responsible for tracking the status of outstanding instructions, pipeline control, and for invoking exception processing when needed.

An address Preparation Unit (AP) 15 calculates effective addresses, performs segment relocation, and implements a demand paged memory management system. It contains a translation lookaside buffer (TLB).

An Integer Execution Unit (IEU) 17 performs single cycle execution of most integer instructions. It contains an 8.times.32 multiplier and accumulator array, as well as microcode for multiply and divide instructions. The pipeline control architecture allows the IEU to perform parallel and/or out-of-order execution of integer instructions.

A Numerics Processor (NP) 20 may optionally be included in the CPU. It is a high performance implementation of the IEEE floating point standard. The NP is integrated into the pipeline and does not incur any special overhead for the transfer of instructions and operands. Integer (IEU) and floating point (NP) instructions execute concurrently.

A Memory and Cache Controller (MCC) 25 is responsible for controlling the instruction and data caches and implements the cache coherency protocol. The MCC controls the interface to the System Bus, supporting high speed single and block mode transfers between cache and memory. The MCC also contains write reservation tables for integer, floating point, and system writes, and includes read after write short-circuit paths.

An Instruction Tag (ITAG) chip 27 contains the tag RAM for an Instruction Cache (ICache) 30. The tag RAM contains the address tag, a "Valid" bit, and an "Attention" bit for each line in the ICache. The Attention bit indicates that the DEC chip may also have data from this line cached in the BPC.

A Data Tag (DTAG) chip 32 contains the tag RAM for a Data Cache (DCache) 35. The tag RAM contains the address tag and line state bits for each line in the DCache. The possible line states are Absent, Shared Read, Owned Clean, and Owned Dirty, supporting a write back multiprocessor cache coherency protocol (modified write once). The tag RAM is dual ported to allow both CPU and bus snooping cache look-ups in a single cycle.

Each functional unit chip is packaged in a custom ceramic pin grid array (PGA) which contains power and ground planes and associated decoupling capacitors Roughly 25% of the pins are dedicated to power and ground. For 0.8 micron to 1.2 micron processes, I/0 delays are comparable to on-chip critical paths. Inter-chip I/0 is incorporated into the pipeline, and thus does not add to the machine cycle time. ICache 30 and DCache 35 use conventional static RAM's.

Communications between the various functional units are carried out over a number of internal buses. (These include: a 64-bit IFETCH.sub.-- DATA bus for instruction fetches; a 96-bit P-OP bus for communicating issued p-ops to the ApU, the lEU, and the Np; a 52-bit pADR bus for communicating physical addresses; a 64-bit (32 bits in each direction) data cache bus DIOBUS for data cache transfers; a 32-bit bus DXBUS for inter-chip exchange; a 64-bit bus for cache/memory updates; and a number of termination buses (AP.sub.-- TERM, IEU.sub.-- TERM, NP.sub.-- TERM, and MCC.sub.-- TERM) from the functional units to DEC 12. Some of these buses are full width and some (e.g. P-OP bus) are half-width (time multiplexed). Interactions between functional units are generally limited to well defined transactions on the internal processor buses.

Pipeline Control System Overview

Pipeline control of the processor is distributed across the functional units mentioned above. No centralized scheduling or score boarding of the pipeline is performed. DEC 12 does observe certain overall resource constraints in the architecture and will occasionally hold off on issuing a decoded instruction which would violate resource limitations. Each functional unit is responsible for scheduling its own internal operations. Interlock checking is performed at a local level.

In a deeply pipelined machine, exception detection at various stages in the pipeline creates significant control difficulties. Each stage must be careful to hold off modification of state while any other stage may yet detect an exception on a previous instruction. Special purpose control logic is common, and careful pipeline simulations must be performed.

The processor deals with this complexity