| United States Patent | 5970241 |
| Inventor(s) | Deao; Douglas E. (Brookshire, TX);
Seshan; Natarajan (Houston, TX);
Lell; Anthony J. (Houston, TX) |
| Abstract | A data processing system on an integrated circuit 42 with microprocessor 1
and peripheral devices 60-61 is provided with an emulation unit 50 which
allows debugging and emulation of integrated circuit 42 when connected to
an external test system 51. Microprocessor 1 has an instruction execution
pipeline which has several execution phases which involve fetch/decode
units 10a-c and functional execution units 12, 14, 16 and 18. The pipeline
of microprocessor 1 is unprotected so that memory access latency to data
memory 22 and register file 20 can be utilized by system program code
which is stored in instruction memory 23. Emulation unit 50 provides means
for emulating the unprotected pipeline of microprocessor 1 and for rapidly
uploading and downloading memories 22-23. Emulation unit 50 operates in a
manner to prevent extraneous operations from occurring which could
otherwise affect memories 22-23 or peripheral devices 60-61 during
emulation. |
Maintaining synchronism between a processor pipeline and subsystem
pipelines during debugging of a data processing system |
| Publication Date |
October 19, 1999 |
| Filing Date |
November 19, 1997 |
| Parent Case |
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is related to coassigned application Ser. No. 08/783,382
(TI-22105), Ser. No. 09/008,909 (TI-22106), Ser. No. 08/788,751
(TI-22108), Ser. No. 09/012,676 (TI-22109), Ser. No. 09/012,380
(TI-23604), Ser. No. 09/012,381 (TI-24333), Ser. No. 09/012,324
(TI-24334), Ser. No. 09/012,693 (TI-24335), Ser. No. 09/012,325
(TI-24942), Ser. No. 08/974,742 (TI-24946), Ser. No. 08/974,741
(TI-24947), Ser. No. 08/974,630 (TI-24948), Ser. No. 09/012,327
(TI-25248), Ser. No. 09/012,329 (TI-25309), Ser. No. 09/012,326
(TI-25310), and Ser. No. 09/012,813 (TI-25311) all filed contemporaneously
herewith and incorporated herein by reference. |
Description  |
NOTICE
(C) Copyright 1997 Texas Instruments Incorporated. A portion of the
disclosure of this patent document contains material which is subject to
copyright protection. The copyright owner has no objection to the
facsimile reproduction by anyone of the patent disclosure, as it appears
in the Patent and Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever.
TECHNICAL FIELD OF THE INVENTION
This invention relates to microprocessors, and particularly relates to
architectures of very long instruction word processors.
BACKGROUND OF THE INVENTION
As the technology for manufacturing integrated circuits advances, more and
more logic functions may be included in a single integrated circuit
device. Modern integrated circuit (IC) devices include large numbers of
gates on a single semiconductor chip, with these gates interconnected so
as to perform multiple and complex functions, such as, for example, those
in a general-purpose microprocessor. The manufacture of such circuits
incorporating such Very Large Scale Integration (VLSI) requires that the
fabrication of the circuit be error free, as some manufacturing defects
may prevent it from performing all of the functions that it is designed to
perform. This requires verification of the design of the circuit and also
various types of electrical testing after manufacture.
However, as the complexity of the circuit increases, so does the cost and
difficulty of verifying and electrically testing each of the devices in
the circuit. From an electrical test standpoint, in order to totally
verify that each gate in a VLSI circuit functions properly, one must
ideally be able to exercise each of the gates not only individually (in
the digital sense, determining that it is neither stuck-open nor
stuck-closed), but also in conjunction with the other gates in the circuit
in all possible combinations of operations. This is normally accomplished
by automated testing equipment (ATE) that employs test vectors to perform
the desired tests. A test vector describes the desired test input (or
signals), associated clock pulse (or pulses), and expected test output (or
signals) for every package pin during a period of time, often in an
attempt to "test" a particular gate (or macro). For complex circuitry,
this may involve a large number of test vectors and accordingly a long
test time. Macro and cell are used herein to mean the same thing and may
be used interchangeably.
Circuit designers have used stuck-fault modeling techniques in improving
the efficiency of the testing of such VLSI circuits. Stuck-fault modeling
is directed not to stuck-open or stuck-closed defects in individual gates,
but to the effect of such defective gates (and defective interconnections)
resulting in stuck-high and stuck-low nodes of the logic circuit. Minimum
patterns of test vectors are then derived for the exercising of the logic
circuit. Applying such test vectors to the circuit detects stuck-high and
stuck-low nodes if defects are present. Such techniques have been
successful in improving the test efficiency of current generation VLSI
circuits.
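The stuck-fault approach described above can be illustrated with a minimal sketch. The three-input circuit y = (a AND b) OR c, the node names, and the greedy selection strategy are all assumptions chosen for illustration; none of them is taken from the patent. The sketch forces each node stuck-high or stuck-low in turn and picks a small set of test vectors whose outputs differ from the fault-free circuit.

```python
from itertools import product

def circuit(a, b, c, stuck=None):
    """Evaluate y = (a AND b) OR c.

    `stuck` is a hypothetical (node_name, value) pair that forces one node
    stuck-high (1) or stuck-low (0).
    """
    nodes = {"a": a, "b": b, "c": c}
    if stuck and stuck[0] in nodes:
        nodes[stuck[0]] = stuck[1]
    n1 = nodes["a"] & nodes["b"]          # internal node
    if stuck and stuck[0] == "n1":
        n1 = stuck[1]
    y = n1 | nodes["c"]                   # output node
    if stuck and stuck[0] == "y":
        y = stuck[1]
    return y

# Every node, stuck-low and stuck-high.
FAULTS = [(node, v) for node in ("a", "b", "c", "n1", "y") for v in (0, 1)]

def detected_by(vector):
    """Faults whose output differs from the fault-free output for this vector."""
    good = circuit(*vector)
    return {f for f in FAULTS if circuit(*vector, stuck=f) != good}

def small_test_set():
    """Greedy cover: small, but not guaranteed to be the true minimum."""
    vectors = list(product((0, 1), repeat=3))
    remaining = set(FAULTS)
    chosen = []
    while remaining:
        best = max(vectors, key=lambda v: len(detected_by(v) & remaining))
        gained = detected_by(best) & remaining
        if not gained:
            break                         # leftover faults are undetectable
        chosen.append(best)
        remaining -= gained
    return chosen, remaining

chosen, undetected = small_test_set()
print(chosen, undetected)
```

The selected vectors cover every modeled fault with fewer patterns than the exhaustive set of eight, which is the efficiency gain the paragraph refers to.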
In addition, specific circuit configurations in the VLSI circuit may have
some of their gates inaccessible for all but a special combination of
signals, thereby hiding a fault unless a very specific pattern of signals
is presented. However, the cost of performing such testing on 100% of the
manufactured circuits is staggering, considering the high cost of the test
equipment required to exercise each circuit in conjunction with the long
time required to present each possible combination to each gate. This has
in the past forced integrated circuit manufacturers to test less than all
of the active devices in a chip, with the attendant quality levels of the
product being less than optimal. Thus, one of the major problems in
integrated circuit design is the ability to adequately test the final IC
design, and this problem increases with increasing complexity of the
integrated circuit.
One way to address this problem is through design for test (DFT). The key
concepts in DFT are controllability and observability. Controllability is
the ability to set and reset the state of every node in the circuit, while
observability is the ability to observe either directly or indirectly the
state of any node in the circuit. The purpose of DFT is to increase the
ability to control and observe internal and external nodes from external
inputs/outputs. That is, DFT techniques may be employed for logic
verification and DC parametric tests.
Designing testability into any circuit will affect the circuitry to some
degree. Additional logic will probably have to be added. This additional
logic will increase the amount of silicon required to implement the
design. The savings from enhanced testability do not usually show up until
the development time and testing costs of the circuit and its end system
are analyzed.
In conjunction with the stuck-fault modeling and associated test
generation, other circuitry may be included in the VLSI circuit
specifically designed to improve its testability. One type of test
circuitry is a scan path in the logic circuit. A scan path consists of a
chain of synchronously clocked master/slave latches (or registers), each
of which is connected to a particular node in the logic circuit. These
latches can be loaded with a serial data stream ("scan in") presetting the
logic circuit nodes to a predetermined state. The logic circuit then can
be exercised in normal fashion, with the result of the operation (at each
of the nodes having a scan latch) stored in its respective latch. By
serially unloading the contents of the latches ("scan out"), the result of
the particular test operation at the associated nodes is read out and may
be analyzed for improper node operation. Repetition of this operation with
a number of different data patterns effectively tests all necessary
combinations of the logic circuit, but with a reduced test time and cost
compared to separately testing each active component or cell and all their
possible interactions. Scan paths permit circuit initialization by
directly writing to the latches (or registers) and directly observing the
contents of the latches (or registers). Using scan paths helps to reduce
the quantity of test vectors compared to traditional "functional mode"
approaches. Techniques for scanning such data are discussed by E. J.
McCluskey in A Survey of Design for Testability Scan Techniques, VLSI
Design (Vol. 5, No. 12, pp. 38-61, December 1984).
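The scan-in, exercise, scan-out sequence described above can be sketched as a simple shift-register model. The class name, the three-latch chain, and the example logic function are assumptions made for illustration and do not come from the patent.

```python
class ScanChain:
    """A chain of latches that can be shifted serially or clocked functionally."""

    def __init__(self, length):
        self.latches = [0] * length

    def shift(self, bit_in):
        """One scan clock: shift by one position and return the bit that falls out."""
        bit_out = self.latches[-1]
        self.latches = [bit_in] + self.latches[:-1]
        return bit_out

    def scan_in(self, pattern):
        """Serially load `pattern` so that pattern[i] ends up in latch i."""
        for bit in reversed(pattern):
            self.shift(bit)

    def capture(self, logic):
        """One functional clock: each latch stores the logic output at its node."""
        self.latches = list(logic(self.latches))

    def scan_out(self):
        """Serially unload the latches, returned in latch order (zeros shift in)."""
        out = [self.shift(0) for _ in range(len(self.latches))]
        return out[::-1]


def example_logic(s):
    # Hypothetical combinational logic between the latches.
    return [s[0] ^ s[1], s[1] & s[2], 1 - s[0]]


chain = ScanChain(3)
chain.scan_in([1, 0, 1])        # preset the nodes to a known state
chain.capture(example_logic)    # exercise the logic for one clock
result = chain.scan_out()       # read the captured node values back out
print(result)
```

Comparing the scanned-out values against the values expected from the fault-free logic is what flags an improperly operating node.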
Also as VLSI technology is advancing, users of integrated circuits are
desiring specially designed and constructed integrated circuits, for
performing functions customized for the user's application. Such
integrated circuits have been called Application-Specific Integrated
Circuits (ASICs). For an ASIC device to be cost-competitive with general
purpose microcomputers which may have special functions implemented in
programmable firmware, and cost-competitive with a board design made up of
smaller scale integrated circuits, the design time of the ASIC circuit
must be short and the ASIC circuit must be manufacturable and testable at
low cost. Accordingly, it is useful for such circuits to be modular in
design, with each of the modules performing a certain function, so that a
new ASIC circuit may be constructed by combining previously-designed
circuit modules. Such an approach can also be used for non-ASIC
microcomputers and microprocessors. Regardless of the end product, the use
of a modular approach allows the designer to use logic which has
previously been verified, and proven manufacturable. However, if logic
modules containing existing scan paths are placed into a new circuit
application, new test patterns will generally be required for the new
device, thereby lengthening the design/manufacture cycle time.
A modular approach to utilizing scan paths and other testability circuits
has been used to provide thorough coverage of all possible faults in an
efficient manner. However, this approach utilizes system buses to set up
and operate the scan test, so that even though each module is tested
independently, the test pattern designed for a given module depends upon
the operation of other modules in the logic circuit for purposes of bus
control and module selection. This results in the testability of a
particular module depending upon the fault-free operation of other
modules. In addition, the automatic test program generator (ATPG) program
which sets the conditions for test of a given module depends upon the
position of the module relative to other modules, and upon the operating
features of such other modules. While reduced test times and costs are
thus achieved by such modularity, the use of system buses to load and
unload the scan paths in the individual modules may not only affect the
operation of the particular module, but is likely to also preclude
"porting" of the test program for a given module from one logic circuit to
another.
Recently, MegaModules have been used in the design of ASICs. (MegaModule is
a trademark of Texas Instruments Incorporated.) Types of MegaModules
include SRAMs, FIFOs, register files, RAMs, ROMs, universal asynchronous
receiver-transmitters (UARTs), programmable logic arrays and other such
logic circuits. MegaModules are usually defined as integrated circuit
modules of at least 500 gates in complexity and having a complex ASIC
macro function. These MegaModules may be predesigned and stored in an ASIC
design library. The MegaModules can then be selected by the designer and
placed within a certain area on the desired IC chip. This allows ASIC
designers to integrate MegaModules into their logic as easily as simple
macros.
Another solution to this testing problem of an ASIC is the use of a
so-called Parallel Module Test (PMT), which is often referred to as a
"direct connect" scheme. (Parallel Module Test is a trademark of Texas
Instruments Incorporated.) PMT is a direct connect scheme, because it
connects external pins to a MegaModule bypassing all other logic, buffers,
etc. It is primarily intended as a logic verification testability scheme
and has recently been enhanced to address limited VIH/VIL and ICCQ
testability schemes. However, even PMT may have problems since the logic
states of the ASIC's circuitry may be disturbed as part of the test
process during test selection and enabling.
Another solution is the test access port and boundary-scan architecture
defined by the IEEE 1149.1 standard, a so-called JTAG test port. IEEE
1149.1 is primarily intended as a system test solution. The IEEE 1149.1
standard requires a minimum of four package pins to be dedicated to the
test function. The IEEE 1149.1 standard requires boundary scan cells for
each I/O buffer, which adds data delay to all normal operation function
pins as well as silicon overhead. Although it has "hooks" for controlling
some internal testability schemes, it is not optimized for chip-level
testing. IEEE 1149.1 does not explicitly support testing of internal DC
parametrics.
Software breakpoints (SWBP) provide another mechanism to allow the debug of
microprocessor code and to evaluate performance. A SWBP is typically
accomplished through opcode replacement, provided the program resides in a
writable memory module which allows the opcode at the stop point to be
replaced in memory with the software breakpoint opcode. In most machines,
when a SWBP opcode reaches the first execute stage of an instruction
execution pipeline, it causes the pipeline to stop advancing or trap to an
interrupt service routine, and set a debug status bit indicating the
pipeline has stopped or trapped. In processors classified as protected
pipelines, instructions fetched into the pipeline after the SWBP are not
executed. Instructions that are already in the pipeline are allowed to
complete. To restart execution the pipeline can be cleared and then
restarted by simply prefetching the opcode at the SWBP memory address
after the opcode is replaced in memory with the original opcode.
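A minimal sketch of opcode replacement follows. The breakpoint opcode value, the class and method names, and the one-opcode-per-address memory model are assumptions for illustration only. The model also ignores pipelining, so it shows only the replace, stop, restore, and restart sequence described above.

```python
SWBP = 0xFF  # hypothetical software breakpoint opcode


class Debugger:
    def __init__(self, memory):
        self.memory = memory   # writable program memory: one opcode per address
        self.saved = {}        # address -> original opcode

    def set_breakpoint(self, addr):
        """Replace the opcode at the stop point with the breakpoint opcode."""
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = SWBP

    def clear_breakpoint(self, addr):
        """Put the original opcode back in memory."""
        self.memory[addr] = self.saved.pop(addr)

    def run(self, pc=0):
        """Execute from pc until a breakpoint opcode or the end of memory.

        Returns the stopping pc and the list of opcodes that were executed.
        """
        executed = []
        while pc < len(self.memory):
            op = self.memory[pc]
            if op == SWBP:
                return pc, executed
            executed.append(op)
            pc += 1
        return pc, executed


program = [0x10, 0x11, 0x12, 0x13]
dbg = Debugger(program)
dbg.set_breakpoint(2)
stop_pc, before = dbg.run()        # stops at the replaced opcode
dbg.clear_breakpoint(stop_pc)      # restore the original opcode
end_pc, after = dbg.run(stop_pc)   # refetch from the breakpoint address
```

The restart simply refetches from the breakpoint address once the original opcode is back in memory, which is the protected-pipeline behavior the paragraph describes.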
Microprocessor designers have increasingly endeavored to exploit
parallelism to improve performance. One parallel architecture that has
found application in some modern microprocessors is the very long
instruction word, or VLIW, architecture. VLIW architecture microprocessors
are called that because they handle VLIW format instructions.
A VLIW format instruction is a long fixed-width instruction that encodes
multiple concurrent operations. VLIW systems use multiple independent
functional units. Instead of issuing multiple independent instructions to
the units, a VLIW system combines the multiple operations into one very
long instruction. In a VLIW system, computer instructions for multiple
integer operations, floating point operations, and memory references may
be combined in a single, wide, VLIW instruction.
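As a rough illustration of combining several operations into one wide instruction, the sketch below packs fixed-width operation fields into a single integer and unpacks them again. The 32-bit slot width, the slot ordering, and the function names are assumptions; the actual instruction encoding is defined by the opcode map and fetch packet figures, not by this sketch.

```python
SLOT_BITS = 32  # assumed width of one operation field


def pack_vliw(ops):
    """Combine several operation words into one wide instruction word."""
    word = 0
    for i, op in enumerate(ops):
        if not 0 <= op < (1 << SLOT_BITS):
            raise ValueError("operation does not fit in one slot")
        word |= op << (i * SLOT_BITS)
    return word


def unpack_vliw(word, count):
    """Split a wide instruction word back into its operation fields."""
    mask = (1 << SLOT_BITS) - 1
    return [(word >> (i * SLOT_BITS)) & mask for i in range(count)]


# One integer op, one floating point op, and one memory reference (made-up values).
ops = [0x00000001, 0x12345678, 0xDEADBEEF]
wide = pack_vliw(ops)
```

Each unpacked field would be routed to its own functional unit, so the units receive their operations together instead of as separately issued instructions.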
Testing and debugging such a complex pipeline is difficult, even when the
techniques described in the preceding paragraphs are used. These and other
disadvantages of the prior art are overcome by the present invention,
however, and improved methods and apparatus for chip-level testing, as
well as system-level debugging, are provided.
SUMMARY OF THE INVENTION
According to the present invention, synchronization between an unprotected
instruction execution pipeline internal to a microprocessor and a memory
subsystem data pipeline external to the microprocessor can advantageously
be maintained by utilizing a "ready" output signal or signals from the
microprocessor to warn of a pending halt condition and a second "ready"
output signal or signals to indicate the microprocessor has halted. In an
unprotected pipeline, load and store operations are completed independently
of a load or store instruction which initiated the operation. An
instruction which follows the initiating load or store instruction can
take advantage of memory access latency to retrieve data from the target
location of the initiating load or store instruction prior to completion
of the load or store operation.
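The behavior of an unprotected pipeline can be sketched with a toy model in which a load takes several cycles to complete and nothing stalls the instructions that follow it. The latency value, the instruction tuples, and the class name are assumptions for illustration; they are not the timing of the actual processor.

```python
LOAD_LATENCY = 4  # assumed number of cycles before a load updates its register


class UnprotectedPipeline:
    def __init__(self, memory, regs=None):
        self.memory = memory
        self.regs = dict(regs or {})
        self.cycle = 0
        self.pending = []  # in-flight loads: (completion_cycle, register, value)

    def _complete_loads(self):
        """Write back every load whose latency has elapsed."""
        for done_cycle, reg, value in self.pending:
            if done_cycle <= self.cycle:
                self.regs[reg] = value
        self.pending = [p for p in self.pending if p[0] > self.cycle]

    def step(self, instr):
        """Issue one instruction per cycle; no interlock waits for pending loads."""
        self._complete_loads()
        result = None
        if instr[0] == "load":
            _, reg, addr = instr
            self.pending.append((self.cycle + LOAD_LATENCY, reg, self.memory[addr]))
        elif instr[0] == "read":
            result = self.regs.get(instr[1], 0)
        # any other instruction (for example ("nop",)) only consumes a cycle
        self.cycle += 1
        return result


cpu = UnprotectedPipeline(memory=[99], regs={"A1": 7})
cpu.step(("load", "A1", 0))       # load starts, A1 still holds its old value
early = cpu.step(("read", "A1"))  # uses the old value during the load latency
cpu.step(("nop",))
cpu.step(("nop",))
late = cpu.step(("read", "A1"))   # the load has completed by now
```

Because code may depend on reading the old value during the latency window, a debugger that halts mid-window must preserve exactly which operations are still in flight.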
A method for maintaining synchronism between a processor instruction
execution pipeline and a subsystem data pipeline in such a data processing
system during debugging includes the following steps:
executing system code in the processor instruction execution pipeline in a
normal operational manner to initiate a plurality of operations in the
instruction execution pipeline and in the data pipeline;
sending a first signal from the processor to the subsystem to indicate a
pending halt;
conditioning the subsystem for halting in response to receipt of the first
signal;
halting the normal operation of the processor pipeline such that at least
one of the plurality of operations is still pending;
sending a second signal to the subsystem to indicate the processor pipeline
is halted;
halting the subsystem in response to receipt of the second signal such that
any of the operations in the subsystem pipeline which correspond to the at
least one of the plurality of operations still pending in the instruction
execution pipeline is maintained; and
continuing execution of the system code in the processor instruction
execution pipeline in a manner that no extraneous operations occur within
the data processing system.
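The steps above can be sketched as a two-signal handshake between a processor loop and a subsystem model. The pipeline depth, the class and method names, and the operation labels are assumptions made for illustration; the sketch only shows that operations in flight at the halt are held and later complete in their original order.

```python
from collections import deque

DEPTH = 2  # assumed number of operations the subsystem keeps in flight


class Subsystem:
    def __init__(self):
        self.pipeline = deque()  # operations started but not yet completed
        self.completed = []
        self.armed = False       # set by the first signal (pending halt)
        self.halted = False      # set by the second signal (processor halted)

    def on_pending_halt(self):
        self.armed = True

    def on_processor_halted(self):
        if self.armed:
            self.halted = True

    def resume(self):
        self.armed = self.halted = False

    def clock(self, new_op=None):
        if self.halted:
            return               # frozen: in-flight operations are held as-is
        if new_op is not None:
            self.pipeline.append(new_op)
        if len(self.pipeline) > DEPTH or (new_op is None and self.pipeline):
            self.completed.append(self.pipeline.popleft())


def run(ops, halt_at=None, halt_cycles=3):
    """Issue ops one per cycle, optionally halting for debug before ops[halt_at]."""
    sub = Subsystem()
    held = []
    for i, op in enumerate(ops):
        if i == halt_at:
            sub.on_pending_halt()       # first signal: condition the subsystem
            sub.on_processor_halted()   # second signal: processor pipeline halted
            held = list(sub.pipeline)   # operations still pending at the halt
            for _ in range(halt_cycles):
                sub.clock()             # clocks during the halt change nothing
            sub.resume()                # continue execution of the system code
        sub.clock(op)
    while sub.pipeline:
        sub.clock()                     # drain the remaining operations
    return sub.completed, held


ops = ["ld0", "st1", "ld2", "ld3", "st4"]
normal, _ = run(ops)
debugged, held = run(ops, halt_at=3)
```

In this model the run with a debug halt completes the same operations in the same order as the uninterrupted run, which is the sense in which no extraneous operations occur.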
BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages of the present invention will become apparent
by reference to the following detailed description when considered in
conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram of a digital signal processor (DSP), showing
components thereof pertinent to an embodiment of the present invention;
FIG. 2 is a block diagram of the functional units, data paths and register
files of FIG. 1;
FIG. 3 shows the addressing mode register (AMR) of the DSP of FIG. 1;
FIG. 4 shows the control status register (CSR) which contains control and
status bits of the DSP of FIG. 1;
FIG. 5 depicts a general-purpose input register (IN) which supports 32
general-purpose input signals of the DSP of FIG. 1;
FIG. 6 depicts a general-purpose output register (OUT) which supports 32
general-purpose output signals of the DSP of FIG. 1;
FIG. 7 illustrates the register storage scheme for 40-bit data of the DSP
of FIG. 1;
FIGS. 8A-8J show an opcode map for the DSP of FIG. 1;
FIG. 9 shows the basic format of a fetch packet of the DSP of FIG. 1;
FIG. 10A depicts a fetch packet of FIG. 9 with fully serial p-bits;
FIG. 10B depicts a fetch packet of FIG. 9 with fully parallel p-bits;
FIG. 10C depicts a fetch packet of FIG. 9 with partially serial p-bits;
FIG. 11 shows the phases of the pipeline of the DSP of FIG. 1;
FIG. 12 shows the branch instruction phases;
FIG. 13 shows the operation of the pipeline of the DSP of FIG. 1 based on
clock cycles and fetch packets;
FIG. 14 depicts fetch packet n, which contains three execute packets, shown
followed by six fetch packets (n+1 through n+6), each with one execute
packet (containing 8 parallel instructions);
FIG. 15 is a block diagram of an MTAP to Test Port Interface for the
processor of FIG. 1;
FIG. 16 is a timing diagram of a MegaModule Reset Sequence for the
processor of FIG. 1;
FIG. 17A shows the interrupt flag register (IFR) which contains the status
of INT4-INT15 and NMI;
FIG. 17B shows the interrupt enable register (IER) of the DSP of FIG. 1;
FIG. 17C shows the interrupt set register (ISR), which allows setting or
clearing interrupts manually in the IFR;
FIG. 17D shows the interrupt clear register (ICR), which allows setting
or clearing interrupts manually in the IFR;
FIG. 18 is a timing diagram of detection of Analysis interrupts for the
processor of FIG. 1;
FIGS. 19A and 19B illustrate two analysis interrupt related instructions,
SWI and B ARP;
FIG. 20 is a block diagram describing MTAP to CPU Interface Signals for the
MTAP of FIG. 15;
FIG. 21 is a block diagram of an MTAP to MegaModule Domain Interface for
the processor of FIG. 1;
FIG. 22 is a state diagram of the test port states for the processor of
FIG. 1;
FIG. 23A is a timing diagram of a clock switch from Functional Run to Scan
on UCLK for the processor of FIG. 1;
FIG. 23B is a timing diagram of a clock switch from Functional Run on TCLK
for the processor of FIG. 1;
FIG. 23C is a timing diagram of a clock switch from Functional Run on UCLK
to Functional Run on TCLK for the processor of FIG. 1;
FIG. 24 is a table of a scan chain for a Data Scan based on the MSEND bits
for the processor of FIG. 1;
FIG. 25 is a timing diagram showing various cases of halting for the
processor of FIG. 1;
FIG. 26 is a circuit diagram of circuitry to form signal ERDY;
FIG. 27A is a timing diagram of a CPU test port requested halt during
interrupt processing for the processor of FIG. 1;
FIG. 27B is a timing diagram illustrating a Test Port Requested Test Halt;
FIG. 28 is a timing diagram of a pipeline halt showing a pipeline
management process for emulation for the processor of FIG. 1;
FIG. 29 is a timing diagram showing a pipeline restoration process after
emulation for the processor of FIG. 1;
FIG. 30A illustrates an analysis control register for the processor of FIG.
1;
FIG. 30B illustrates an analysis data register for the processor of FIG. 1;
FIG. 30C illustrates an analysis data interrupt return pointer register for
the processor of FIG. 1;
FIG. 30D illustrates a data streaming register for the processor of FIG. 1;
FIG. 31 is a timing diagram of the instruction execution pipeline for the
processor of FIG. 1 showing various pipeline phases;
FIG. 32 is a block diagram illustrating pin connections to a megamodule in
the processor of FIG. 1;
FIG. 33 is a block diagram illustrating JTAG instruction and data register
paths for the processor of FIG. 1;
FIG. 34A illustrates JTAG instruction register contents when Strap status
is selected in the registers of FIG. 33;
FIG. 34B illustrates JTAG instruction register contents when Stop Emulation
status is selected in the registers of FIG. 33;
FIG. 34C illustrates JTAG instruction register contents when Real Time
Emulation status is selected in the registers of FIG. 33;
FIG. 34D illustrates JTAG instruction register contents when Emulation
Error status is selected in the registers of FIG. 33;
FIG. 35 is a block diagram of a JTAG to MPSD Interface for the processor of
FIG. 1;
FIG. 36 illustrates the emulation control register of FIG. 33;
FIG. 37 is a block diagram of a code state machine (CSM) for the MTAP of
the processor of FIG. 1;
FIG. 38 is a schematic of a clock source switch for the CSM of FIG. 37;
FIG. 39 is a schematic of circuitry to generate an EVTA interrupt for the
processor of FIG. 1;
FIG. 40 illustrates the counter register of FIG. 33;
FIG. 41 is a block diagram of domain interconnections for the processor of
FIG. 1;
FIG. 42 is a block diagram illustrating a stream scan register within the
MTAP of FIG. 41;
FIG. 43 is a schematic of EMU pin connection for the processor of FIG. 1;
and
FIG. 44 is a block diagram of a JTAG TAP configuration for the processor of
FIG. 1.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
FIG. 1 is a block diagram of a microprocessor 1 which has an embodiment of
the present invention. Microprocessor 1 is a VLIW digital signal processor
("DSP"). In the interest of clarity, FIG. 1 only shows those portions of
microprocessor 1 that are relevant to an understanding of an embodiment of
the present invention. Details of general construction for DSPs are well
known, and may be found readily elsewhere. For example, U.S. Pat. No.
5,072,418 issued to Frederick Boutaud, et al, describes a DSP in detail
and is incorporated herein by reference. U.S. Pat. No. 5,329,471 issued to
Gary Swoboda, et al, describes in detail how to test and emulate a DSP and
is incorporated herein by reference. Details of portions of microprocessor
1 relevant to an embodiment of the present invention are explained in
sufficient detail hereinbelow, so as to enable one of ordinary skill in
the microprocessor art to make and use the invention.
In microprocessor 1 there are shown a central processing unit (CPU) 10,
data memory 22, program memory 23, peripherals 60 and an external memory
interface (EMIF) with a direct memory access (DMA) 61. CPU 10 further has
an instruction fetch/decode unit 10a-c, a plurality of execution units,
including an arithmetic and load/store unit D1, a multiplier M1, an
ALU/shifter unit S1, an arithmetic logic unit ("ALU") L1, and a shared
multiport register file 20a from which data are read and to which data are
written. Decoded instructions are provided from the instruction
fetch/decode unit 10a-c to the functional units D1, M1, S1, and L1 over
various sets of control lines which are not shown. Data are provided
to/from the register file 20a from/to load/store unit D1 over a first
set of busses 32a, to multiplier M1 over a second set of busses 34a, to
ALU/shifter unit S1 over a third set of busses 36a and to ALU L1 over a
fourth set of busses 38a. Data are provided to/from the memory 22 from/to
the load/store unit D1 via a fifth set of busses 40a. Note that the
entire data path described above is duplicated with register file 20b and
execution units D2, M2, S2, and L2. Instructions are fetched by fetch unit
10a from instruction memory 23 over a set of busses 41. Emulation
circuitry 50 provides access to the internal operation of integrated
circuit 1 which can be controlled by an external test/development system
(XDS) 51.
External test system 51 is representative of a variety of known test
systems for debugging and emulating integrated circuits. One such system
is described in U.S. Pat. No. 5,535,331 which is incorporated herein by
reference. Test circuitry 52 contains control registers and parallel
signature analysis circuitry for testing integrated circuit 1.
Note that the memory 22 and memory 23 are shown in FIG. 1 to be a part of a
microprocessor 1 integrated circuit, the extent of which is represented by
the box 42. The memories 22-23 could just as well be external to the
microprocessor 1 integrated circuit 42, or part of them could reside on the
integrated circuit 42 and part of them be external to the integrated circuit
42. These are matters of design choice. Also, the particular selection and
number of execution units are a matter of design choice, and are not
critical to the invention.
When microprocessor 1 is incorporated in a data processing system,
additional memory or peripherals may be connected to microprocessor 1, as
illustrated in FIG. 1. For example, Random Access Memory (RAM) 70, a Read
Only Memory (ROM) 71 and a Disk 72 are shown connected via an external bus
73. Bus 73 is connected to the External Memory Interface (EMIF) which is
part of functional block 61 within microprocessor 42. A Direct Memory
Access (DMA) controller is also included within block 61. The DMA
controller is generally used to move data between memory and peripherals
within microprocessor 1 and memory and peripherals which are external to
microprocessor 1.
FIG. 2 is a block diagram of the execution units and register files of the
microprocessor of FIG. 1 and shows a more detailed view of the buses
connecting the various functional blocks. In this figure, all data busses
are 32 bits wide, unless otherwise noted. Bus 40a has an address bus DA1
which is driven by mux 200a. This allows an address generated by either
load/store unit D1 or D2 to provide an address for loads or stores for
register file 20a. Data Bus LD1 loads data from an address in memory 22
specified by address bus DA1 to a register in load unit D1. Unit D1 may
manipulate the data provided prior to storing it in register file 20a.
Likewise, data bus ST1 stores data from register file 20a to memory 22.
Load/store unit D1 performs the following operations: 32-bit add,
subtract, linear and circular address calculations. Load/store unit D2
operates similarly to unit D1, with the assistance of mux 200b for
selecting an address.
ALU unit L1 performs the following types of operations: 32/40 bit
arithmetic and compare operations; leftmost 1 or 0 bit counting for 32
bits; normalization count for 32 and 40 bits; and logical operations. ALU
L1 has input src1 for a 32 bit source operand and input src2 for a second
32 bit source operand. Input msb_src is an 8 bit value used to form
40 bit source operands. ALU L1 has an output dst for a 32 bit destination
operand. Output msb_dst is an 8 bit value used to form 40 bit
destination operands. Two 32 bit registers in register file 20a are
concatenated to hold a 40 bit operand. Mux 211 is connected to input src1
and allows a 32 bit operand to be obtained from register file 20a via bus
38a or from register file 20b via bus 210. Mux 212 is connected to input
src2 and allows a 32 bit operand to be obtained from register file 20a via
bus 38a or from register file 20b via bus 210. ALU unit L2 operates
similarly to unit L1.
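The way an 8 bit most-significant part and a 32 bit part combine into a 40 bit operand can be sketched as follows. The helper names and the wrap-around treatment of overflow are assumptions for illustration; the text above specifies only the field widths.

```python
MASK8 = 0xFF
MASK32 = 0xFFFFFFFF
MASK40 = (1 << 40) - 1


def join40(msb, low):
    """Form a 40 bit operand from an 8 bit upper part and a 32 bit lower part."""
    return ((msb & MASK8) << 32) | (low & MASK32)


def split40(value):
    """Split a 40 bit result into its 8 bit upper part and 32 bit lower part."""
    value &= MASK40
    return (value >> 32) & MASK8, value & MASK32


def add40(a, b):
    """40 bit addition; wrapping on overflow is an assumption of this sketch."""
    return (a + b) & MASK40


# A carry out of the low 32 bits propagates into the 8 bit upper part.
total = add40(join40(0x01, 0xFFFFFFFF), join40(0x00, 0x00000001))
msb_dst, dst = split40(total)
```

Here `msb_dst` and `dst` play the roles of the two ALU outputs, and each would be written to one of the two concatenated 32 bit registers.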
ALU/shifter unit S1 performs the following types of operations: 32 bit
arithmetic operations; 32/40 bit shifts and 32 bit bit-field operations;
32 bit logical operations;