WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Dual instruction set processor having a pipeline with a pipestage functional unit that is relocatable in time and sequence order    
United States Patent5542059   
Link to this pagehttp://www.wikipatents.com/5542059.html
Inventor(s)Blomgren; James S. (San Jose, CA)
AbstractA CPU pipeline is able to process instructions from a complex instruction set computer CISC instruction set and from a reduced instruction set computer RISC set. A mode register is provided to indicate whether RISC or CISC instructions are currently being processed. Two instruction decode units are used, one for each instruction set. Compound CISC instructions flow from the decode pipestage to the address generate stage, then to an operand cache stage, and finally to an algebraic execute stage before the results are written back to the GPR register. When the CPU switches to RISC mode by clearing a mode bit in the mode register, the pipeline is re-arranged for processing the simpler RISC instructions. Two outputs are provided for the RISC instruction decoder. The first output is for simple execute-type instructions, while the second output is for load/store-type instructions, and connects to the address generate pipestage, which generates an address for the operand cache stage. These instructions are prevented from continuing to the execute stage by a mux. The mux normally connects the operand cache stage to the execute stage when CISC instructions are being processed, but the mux directly connects the second output of the RISC instruction decoder to the execute stage when the mode register enables RISC instruction decoding. This reduces the latency for RISC instructions by 1 or 2 clocks. An alternate embodiment re-arranges the pipeline dynamically as simple instructions are detected by the decode units. The preferred embodiment uses a fixed pipeline with the execute hardware relocatable to the D, C, or M pipestages. Thus the pipeline is optimized for both RISC and CISC instructions.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5542059
Dual instruction set processor having a pipeline with a pipestage

     functional unit that is relocatable in time and sequence order - US Patent 5542059 Drawing
Dual instruction set processor having a pipeline with a pipestage functional unit that is relocatable in time and sequence order
Inventor     Blomgren; James S. (San Jose, CA)
Owner/Assignee     Exponential Technology, Inc. (San Jose, CA)
Patent assignment
All assignments
Publication Date     July 30, 1996
Application Number     08/361,017
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     December 21, 1994
US Classification     712/41 710/240 712/208 712/209
Int'l Classification     G06F 009/30
Examiner     Lall; Parshotam S.
Assistant Examiner     Maung; Zarni
Attorney/Law Firm     Auvinen; Stuart T.
Address
Parent Case     RELATED APPLICATION This application is a continuation of Ser. No. 08/180,023 filed 1/11/94, now abandoned.
Priority Data    
USPTO Field of Search     395/375 395/725 364/231.8 364/232.23
Patent Tags     dual instruction set processor pipeline pipestage functional is relocatable time sequence order
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5291586
Jen
712/220
Mar,1994

[0 after 0 votes]
5287465
Kurosawa
712/215
Feb,1994

[0 after 0 votes]
5269007
Hanawa
712/218
Dec,1993

[0 after 0 votes]
5255384
Sachs
711/207
Oct,1993

[0 after 0 votes]
5241636
Kohn
712/215
Aug,1993

[0 after 0 votes]
5230069
Brelsford
718/100
Jul,1993

[0 after 0 votes]
5226164
Nadas
712/209
Jul,1993

[0 after 0 votes]
5222223
Webb, Jr.
711/140
Jun,1993

[0 after 0 votes]
5210832
Maier
712/227
May,1993

[0 after 0 votes]
5167023
de Nicolas

Nov,1992

[0 after 0 votes]
5150468
Staplin
712/245
Sep,1992

[0 after 0 votes]
5136696
Beckwith
712/240
Aug,1992

[0 after 0 votes]
5097407
Hino
712/209
Mar,1992

[0 after 0 votes]
5077657
Cooper

Dec,1991

[0 after 0 votes]
5077654
Ohtsuki

Dec,1991

[0 after 0 votes]
5073855
Staplin

Dec,1991

[0 after 0 votes]
5067069
Fite

Nov,1991

[0 after 0 votes]
4992934
Portanova
712/209
Feb,1991

[0 after 0 votes]
4991081
Bosshart
711/3
Feb,1991

[0 after 0 votes]
4972317
Buonomo
712/227
Nov,1990

[0 after 0 votes]
4962519
Upadhya
378/133
Oct,1990

[0 after 0 votes]
4841476
Mitchell
703/26
Jun,1989

[0 after 0 votes]
4821187
Ueda
712/246
Apr,1989

[0 after 0 votes]
4812975
Adachi
703/26
Mar,1989

[0 after 0 votes]
4794522
Simpson
703/26
Dec,1988

[0 after 0 votes]
4780819
Kashiwagi
703/23
Oct,1988

[0 after 0 votes]
4763242
Lee
703/27
Aug,1988

[0 after 0 votes]
4633417
Wilburn
703/28
Dec,1986

[0 after 0 votes]
4538241
Levin
711/207
Aug,1985

[0 after 0 votes]
4377844
Kaufman
711/220
Mar,1983

[0 after 0 votes]
5230045
Sindhu
711/203
Dec,1969

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


I claim:

1. A CPU having a pipeline for processing a plurality of instructions from two separate instruction sets, said pipeline comprising:

first instruction decode means for decoding instructions from said first instruction set, said first decode means providing decoded instructions to said plurality of functional units, said first instruction set having a first encoding of opcodes to instructions;

second instruction decode means for decoding instructions from said second instruction set, said second instruction set having a second encoding of opcodes to instructions, said second encoding of opcodes to instructions being separate and independent from said first encoding of optodes to instructions, said second decode means providing decoded instructions to said plurality of functional units;

enable means for enabling either said first instruction decode means or said second instruction decode means, said enable means coupled to said first instruction decode means and coupled to said second instruction decode means;

a plurality of functional units for processing said plurality of instructions;

a relocatable functional unit in said plurality of functional units, said relocatable functional unit for executing native instructions from both of said two separate instruction sets;

means for indicating which one of said two separate instruction sets is being processed by said plurality of functional units; and

means for relocating, responsive to said indicating means, said relocating means for temporally relocating said relocatable functional unit in time and sequence order to other functional units in said plurality of functional units.

2. The CPU of claim 1 wherein said two separate instruction sets comprise a first and a second instruction set, and wherein said relocatable functional unit is relocated by said relocating means when instructions from said first instruction set are being processed by said plurality of functional units.

3. The CPU of claim 2 wherein said relocatable functional unit is a functional unit for executing algebraic operations relocated by said relocating means when instructions from said first instruction set are being processed by said plurality of functional units.

4. The CPU of claim 2 wherein said first instruction decode means includes means for detecting a simple execute instruction and wherein a functional unit for executing algebraic operations is relocated by said relocating means when said simple execute instruction is detected by said first instruction decode means.

5. A CPU pipeline for executing a complex instruction set computer CISC instruction set and a reduced instruction set computer RISC instruction set, said pipeline comprising:

CISC decode means for decoding said CISC instruction set;

RISC decode means for decoding said RISC instruction set;

enable means for enabling either said RISC decode means or said CISC decode means, said enable means coupled to said RISC decode means and coupled to said CISC decode means:

a plurality of functional units, each functional unit in said plurality of functional units for executing both native RISC and native CISC instructions, said plurality of functional units receiving decoded insauctions from said CISC decode means and said RISC decode means, said plurality of functional units arranged in a sequence of functional units;

a relocatable functional unit in said plurality of functional units;

means for indicating if said RISC instruction set or said CISC instruction set is being processed by said plurality of functional units;

means for relocating, responsive to said means for indicating, said relocating means for relocating temporally said relocatable functional unit in time and sequence to other functional units in said plurality of functional units if said means for indicating indicates that said RISC instruction set is being processed.

6. The pipeline of claim 5 wherein said RISC instruction set has an encoding of opcodes to instructions separate and independent from the encoding of said CISC instruction set.

7. The pipeline of claim 6 wherein said relocatable functional unit is relocated relative to other functional units by said means for relocation when said enable means enables said RISC decode means.

8. The pipeline of claim 7 wherein said relocatable functional unit relocated by said means for relocation is a functional unit for execution of algebraic and logic instructions.

9. The pipeline of claim 8 wherein said functional unit for execution is relocated by said means for relocation to an earlier position in said sequence of functional units in said pipeline.

10. The pipeline of claim 9 wherein said functional unit for execution is relocated by said means for relocation to immediately after said RISC decode means, said functional unit for execution receiving decoded RISC execute instructions from said RISC decode means when said means for indicating indicates that said RISC instruction set is being processed.

11. A CPU pipeline for executing a complex instruction set computer CISC instruction set and a reduced instruction set computer RISC instruction set, said pipeline comprising:

CISC decode means for decoding said CISC instruction set;

RISC decode means for decoding said RISC instruction set;

enable means for enabling either said RISC decode means or said CISC decode means, said enable means coupled to said RISC decode means and coupled to said CISC decode means;

a plurality of functional units, said plurality of functional units receiving decoded instructions from said CISC decode means and said RISC decode means, said plurality of functional units arranged in a sequence of functional units;

a relocatable functional unit in said plurality of functional units, said relocatable functional unit in said plurality of functional units for executing both native RISC and native CISC instructions;

means for indicating if said RISC instruction set or said CISC instruction set is being processed by said plurality of functional units;

means for relocating, responsive to said means for indicating, said relocating means for relocating temporally said relocatable functional unit in time and sequence to other functional units in said plurality of functional units if said means for indicating indicates that said RISC instruction set is being processed.

12. The pipeline of claim 11 wherein said RISC instruction set has an encoding of opcodes to instructions separate and independent from the encoding of said CISC instruction set.

13. The pipeline of claim 11 wherein said relocatable functional unit is relocated relative to other functional units by said means for relocation when said enable means enables said RISC decode means.

14. The pipeline of claim 13 wherein said relocatable functional unit relocated by said means for relocation is a functional unit for execution of algebraic and logic instructions.

15. The pipeline of claim 14 wherein said functional unit for execution is relocated by said means for relocation to an earlier position in said sequence of functional units in said pipeline.

16. The pipeline of claim 15 wherein said functional unit for execution is relocated by said means for relocation to immediately after said RISC decode means, said functional unit for execution receiving decoded RISC execute instructions from said RISC decode means when said means for indicating indicates that said RISC instruction set is being processed.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to microprocessor architectures, and more particularly to a pipeline with variable latencies to support execution of multiple instruction sets.

RELATED APPLICATION

This application is related to a co-pending application for a "Dual-Instruction-Set Architecture CPU with Hidden Software Emulation Mode", invented by Blomgren and Richter, Ser. No. 08/179,926, filed 1/11/94, and assigned to the same assignee as this application.

2. Description of the Related Art

The performance of microprocessors has been increased through the use of the well-known technique of pipelining. A pipelined central processing unit or CPU is divided into several units referred to as stages or pipestages, each pipestage typically requiring one processor clock period to perform its function. As an instruction is processed by the microprocessor, it flows through the pipeline: first the instruction is fetched from memory by the Fetch pipestage, then the instruction is decoded by the D stage, the decoded instruction may then be executed by an arithmetic-logic-unit (ALU) or adder, then the result from the execute stage is written to a register file or to memory. While a first instruction is in the execute stage, the following instruction is in the D stage, and the next following instruction in the Fetch stage. Thus many instructions are being processed at the same time, but each instruction is processed over several clock periods. The result is that the clock period may be reduced, improving performance.

Pipelining has worked very well with simple, well-organized instruction sets such as with reduced instruction set computers or RISC instruction sets. However, older, more complex instructions set computers or CISC instruction sets contain instructions that require additional use of functional units. Some complex, compound instructions actually perform the equivalent work of two or more simple instructions. A high-performance design may require adding more functional units and stages to the pipeline than are necessary for the simpler instructions. The difficulty arises in trying to process both simple and complex instructions in the same pipeline. If the pipeline is to execute both a simple RISC and a complex CISC instruction set, the difficulty is intensified.

When instructions are pipelined, the results from one instruction may be needed by a subsequent instruction, even before the instruction completes. Techniques such as bypassing and forwarding of results can route the result from one instruction to a subsequent instruction, when both instructions are in different pipestages of the pipeline. However, the subsequent instruction will still have to wait for the next pipestage to be released by the previous instruction.

All microprocessors perform 3 basic types of instructions: accessing memory, performing algebraic operations, and control transfer. These 3 types can be referred to as LOAD/STORE, ALU, and BRANCH operations. Regardless of the architecture or instruction set, all instructions are composed of these 3 component operations. An example is in the x86 instruction set, made popular by personal computers (PC's) using Intel Corporation (of Santa Clara, Calif.) 386 and 486 microprocessors. The x86 POP instruction performs a LOAD from a stack in memory followed by an ALU operation to increment the stack pointer register. Compound instructions are common in CISC instruction sets such as the x86 set, but are rare in RISC instruction sets.

OPERATIONAL LATENCIES

ALU operations include addition, subtraction, Boolean operations, and bit shifts. Multiplication, division, and other complex floating-point operations may also be performed if sufficient hardware resources are provided. This type of instruction usually takes one or two operands from a high-speed internal CPU general-purpose register file (GPR), and stores the result back to this register file. Since data is not transferred off the CPU die, the operation is very fast, typically requiring one clock period. The latency, or time required to perform the operation defined by the instruction, is one clock period. Latency does not include fetch, decode, or write-back time normally required for instruction processing; latency here refers to the time to perform a component operation.

LOAD/STORE operations must first compute a memory address where the data resides, and then write data to that memory location or read data from that location or address. Data is transferred between a register in the GPR and the memory, which can be slow DRAM-based system memory or cache memory. The cache may be on the CPU die or off-chip. With a cache, the transfer will usually take one clock, while the address computation, which normally requires addition, takes an additional clock. Thus the LOAD/STORE operations require a total of 2 clocks, for a latency of two.

Control transfer or BRANCH operations calculate a new address to load into the instruction pointer. If the branch is conditional, a new target address for the code to jump to is calculated, and a branch condition is evaluated, usually from the condition code register associated with the ALU. Branch operations may be quite complex to pipeline, but optimizations and prediction techniques are possible. A supplementary adder may be provided to calculate the target address early, during the decode pipestage, and the Fetch stage may be designed to fetch both the target instruction and the sequential (branch not taken) instruction. However, since the branch may have to wait for the condition codes to be set by a previous instruction's ALU pipestage, and the next instruction must be decoded after the branch decision is made, the latency is at least two clocks.

LATENCY DIAGRAM

FIG. 1 is a latency diagram that is useful in designing pipelines. Each box in the figure represents an operation or function that requires one clock to complete. Connections between boxes show how one operation may depend upon the results of another operation. The computational work performed by any instruction can be analyzed with this latency diagram. Instruction cache 10 contains a buffer of instructions that have not yet been decoded, and may contain instructions that will not be executed if a branch occurs. Branch adder 12 is used to calculate the target address for a branch. Instruction decode and register file 14 decodes an instruction fetched into the instruction cache, and provides the register operands to adder 16, which performs an ALU operation, or can calculate an address. Operand cache 18 is a cache of main memory data or operands and can be written into for a STORE operation or read from in a LOAD operation.

If an operation has a greater latency than 1 clock, then the diagram may be modified accordingly. For example, if the operand cache 18 were slow and required 2 clocks, then box 18 could be replaced with two boxes in sequence. Similarly, the adder could be replaced with two or three boxes for floating-point operations. Connections may also be modified depending upon the design; for example, a very high-speed design might not allow connection "D", the bypass around the ALU. Another design might have adder 16 located after operand cache 18 rather than before it, or in both locations.

A LOAD/STORE will flow through the latency diagram, FIG. 1, starting as an instruction in the I-cache 10, decoded in block 14, which provides address components from the register file or immediate values from the instruction itself, and ALU control information, along path "B" to adder 16. Adder 16 generates a memory address from these address components and provides this address along path "C" to the operand cache 18. The operand cache stores or loads the data specified by the address. If the operation is a load, then the data read from the cache is available to the adder 16 along path "E", and is loaded into the register file (not shown). Thus the load operation takes 4 clocks to execute and provide its data result. Four clocks are required because of dependencies within the load instruction itself: the operand cache could not be accessed before the address was generated, and the address could not be generated before the register file provided the components.

An operand dependency may exist with the instruction following the load. If the subsequent instruction is an Add using the data loaded by the load instruction, then the Add instruction will be in the adder block 16 while the load instruction is in the operand cache 18. However, the adder cannot perform the add until the end of the clock period when the data is provided from the cache 18 to the adder 16 along path "E". Thus the add instruction must wait or "stall" in the add stage 16 for an additional clock before starting the add operation. The stall would still be necessary even if several adder blocks 16 were provided, because the data was not yet available to the add instruction.

Recently, compilers have been designed to re-order instructions to try to reduce dependencies that cause stalls. In the above example, if the Add instruction could occur after another instruction, rather than immediately following the load instruction, then the stall would be avoided. RISC compilers in particular have been successful at instruction re-ordering, allowing for CPU's with multiple functional units to increase performance using dual-pipeline techniques such as super-scalar designs. However, CISC instructions may perform both the load and the add as one atomic instruction, eliminating the possibility of re-ordering. Thus the CISC instruction set itself imposes latencies.

RISC PIPELINE

RISC instructions are typically simple instructions that do not perform both an ALU operation and a cache or memory operand access. Thus path "E" of FIG. 1 is not used within a single instruction, but may be used by a second instruction following a load instruction. However, re-ordering compilers reduce or eliminate the need for path "E". Thus a simple pipeline for RISC is: ##STR1##

A write-back stage is normally also included at the end of the pipeline when the results are written back into the register file and the condition codes are modified. However, this stage does not affect the dependencies and is thus not shown. Likewise the fetch stage is not shown. The adder is designed for both ALU operations and address generation, since address generation is usually simple in RISC instructions. An instruction that uses the ALU will store its result back to the register file rather than the operand cache memory, while an instruction accessing the operand cache will not use the ALU except for generating the address in the operand cache. The Execute pipestage for RISC instructions can perform an address generation or an ALU operation, but not both at the same time.

The diagram below indicates the