United States Patent: 6055649
Inventor(s): Deao; Douglas E. (Brookshire, TX); Seshan; Natarajan (Houston, TX); Lell; Anthony J. (Houston, TX)
Abstract: A data processing system on an integrated circuit 42 with microprocessor 1 and peripheral devices 60-61 is provided with an emulation unit 50 which allows debugging and emulation of integrated circuit 42 when connected to an external test system 51. Microprocessor 1 has an instruction execution pipeline which has several execution phases which involve fetch/decode units 10a-c and functional execution units 12, 14, 16 and 18. The pipeline of microprocessor 1 is unprotected so that memory access latency to data memory 22 and register file 20 can be utilized by system program code which is stored in instruction memory 23. Emulation unit 50 provides means for emulating the unprotected pipeline of microprocessor 1 and for rapidly uploading and downloading memories 22-23. Emulation unit 50 operates in a manner to prevent extraneous operations from occurring which could otherwise affect memories 22-23 or peripheral devices 60-61 during emulation.
Title: Processor test port with scan chains and data streaming
Publication Date: April 25, 2000
Filing Date: November 19, 1997
Parent Case:
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is related to coassigned applications Ser. No. 08/783,382, (TI-22105); Ser. No. 09/008,909, (TI-22106); Ser. No. 08/788,751, (TI-22108); Ser. No. 09/012,676, (TI-22109); Ser. No. 09/012,380, (TI-23604); Ser. No. 09/012,381, (TI-24333); Ser. No. 09/012,324, (TI-24334); Ser. No. 09/012,693, (TI-24335); Ser. No. 09/012,325, (TI-24942); Ser. No. 08/974,742, (TI-24946); Ser. No. 08/974,741, (TI-24947); Ser. No. 09/012,332, (TI-24956); Ser. No. 08/974,589, (TI-25049); Ser. No. 08/974,014, (TI-25112); Ser. No. 08/974,744, (TI-25113); Ser. No. 09/012,327, (TI-25248); Ser. No. 09/012,329, (TI-25309); Ser. No. 09/012,326, (TI-25310); and Ser. No. 09/012,813, (TI-25311), all filed contemporaneously herewith and incorporated herein by reference.
Description
NOTICE
(C) Copyright 1997 Texas Instruments Incorporated. A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of
the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
TECHNICAL FIELD OF THE INVENTION
This invention relates to microprocessors, and particularly relates to architectures of very long instruction word processors.
BACKGROUND OF THE INVENTION
As the technology for manufacturing integrated circuits advances, more and more logic functions may be included in a single integrated circuit device. Modern integrated circuit (IC) devices include large numbers of gates on a single semiconductor chip, with these gates interconnected so as to perform multiple and complex functions, such as, for example, those in a general-purpose microprocessor. The manufacture of such Very Large Scale Integration (VLSI) circuits requires error-free fabrication, because some manufacturing defects may prevent a circuit from performing all of the functions that it is designed to perform. This requires verification of the design of the circuit and also various types of electrical testing after manufacture.
However, as the complexity of the circuit increases, so do the cost and difficulty of verifying and electrically testing each of the devices in the circuit. From an electrical test standpoint, to fully verify that each gate in a VLSI circuit functions properly, one must ideally be able to exercise each of the gates not only individually (in the digital sense, determining that it is neither stuck-open nor stuck-closed), but also in conjunction with the other gates in the circuit in all possible combinations of operations. This is normally accomplished by automated test equipment (ATE) that employs test vectors to perform the desired tests. A test vector describes the desired test input (or signals), associated clock pulse (or pulses), and expected test output (or signals) for every package pin during a period of time, often in an attempt to "test" a particular gate (or macro). For complex circuitry, this may involve a large number of test vectors and accordingly a long test time. The terms macro and cell are used interchangeably herein.
Circuit designers have used stuck-fault modeling techniques to improve the efficiency of testing such VLSI circuits. Stuck-fault modeling is directed not to stuck-open or stuck-closed defects in individual gates, but to the effect of such defective gates (and defective interconnections), which result in stuck-high and stuck-low nodes of the logic circuit. Minimum sets of test vector patterns are then derived for exercising the logic circuit. Applying such test vectors to the circuit detects stuck-high and stuck-low nodes if defects are present. Such techniques have been successful in improving the test efficiency of current generation VLSI circuits.
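To make stuck-fault modeling concrete, the following Python sketch simulates a small hypothetical circuit, y = (a AND b) OR c, injects a stuck-at-0 or stuck-at-1 fault on each node, and lists the input vectors that expose each fault. It then derives a compact test set with a greedy set-cover heuristic. The circuit, node names and the greedy selection are illustrative assumptions, not the procedure used by any particular ATPG tool; greedy selection is not guaranteed to be minimum in general.

```python
from itertools import product

# Hypothetical example circuit: n1 = a AND b, y = n1 OR c.
NODES = ("a", "b", "c", "n1", "y")
VECTORS = list(product((0, 1), repeat=3))  # every (a, b, c) input combination


def simulate(a, b, c, fault=None):
    """Evaluate the circuit; `fault` is (node, stuck_value) or None."""
    def node(name, value):
        # A stuck-at fault overrides whatever value the node would otherwise carry.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a, b, c = node("a", a), node("b", b), node("c", c)
    n1 = node("n1", a & b)
    return node("y", n1 | c)


def detecting_vectors(fault):
    """Input vectors whose output differs from the fault-free circuit."""
    return [v for v in VECTORS if simulate(*v) != simulate(*v, fault=fault)]


def greedy_test_set():
    """Greedily pick vectors until every detectable stuck-at fault is covered."""
    uncovered = {}
    for name in NODES:
        for stuck in (0, 1):
            detected_by = set(detecting_vectors((name, stuck)))
            if detected_by:  # skip undetectable (redundant) faults
                uncovered[(name, stuck)] = detected_by
    chosen = []
    while uncovered:
        # Pick the vector that detects the most still-uncovered faults.
        best = max(VECTORS, key=lambda v: sum(v in vs for vs in uncovered.values()))
        chosen.append(best)
        uncovered = {f: vs for f, vs in uncovered.items() if best not in vs}
    return chosen


if __name__ == "__main__":
    for name in NODES:
        for stuck in (0, 1):
            print(f"{name} stuck-at-{stuck}: {detecting_vectors((name, stuck))}")
    print("test set:", greedy_test_set())
```

On this circuit the ten possible stuck-at faults are covered by four vectors instead of all eight input combinations, which is the kind of reduction the stuck-fault approach aims for.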
In addition, specific circuit configurations in the VLSI circuit may leave some of their gates inaccessible to all but a special combination of signals, thereby hiding a fault unless a very specific pattern of signals is presented. However, the cost of performing such testing on 100% of the manufactured circuits is staggering, considering the high cost of the test equipment required to exercise each circuit in conjunction with the long time required to present each possible combination to each gate. This has in the past forced integrated circuit manufacturers to test less than all of the active devices in a chip, with the attendant quality levels of the product being less than optimal. Thus, one of the major problems in integrated circuit design is the ability to adequately test the final IC design, and this problem increases with increasing complexity of the integrated circuit.
One way to address this problem is through design for test (DFT). The key concepts in DFT are controllability and observability. Controllability is the ability to set and reset the state of every node in the circuit, while observability is the ability to observe either directly or indirectly the state of any node in the circuit. The purpose of DFT is to increase the ability to control and observe internal and external nodes from external inputs/outputs. DFT techniques may be employed for both logic verification and DC parametric tests.
Designing testability into any circuit will affect the circuitry to some degree. Additional logic will probably have to be added. This additional logic will increase the amount of silicon required to implement the design. The savings from
enhanced testability do not usually show up until the development time and testing costs of the circuit and its end system are analyzed.
In conjunction with the stuck-fault modeling and associated test generation, other circuitry specifically designed to improve testability may be included in the VLSI circuit. One type of test circuitry is a scan path in the logic circuit.
A scan path consists of a chain of synchronously clocked master/slave latches (or registers), each of which is connected to a particular node in the logic circuit. These latches can be loaded with a serial data stream ("scan in") presetting the logic
circuit nodes to a predetermined state. The logic circuit then can be exercised in normal fashion, with the result of the operation (at each of the nodes having a scan latch) stored in its respective latch. By serially unloading the contents of the
latches ("scan out"), the result of the particular test operation at the associated nodes is read out and may be analyzed for improper node operation. Repetition of this operation with a number of different data patterns effectively tests all necessary
combinations of the logic circuit, but with a reduced test time and cost compared to separately testing each active component or cell and all their possible interactions. Scan paths permit circuit initialization by directly writing to the latches (or
registers) and directly observing the contents of the latches (or registers). Using scan paths helps to reduce the quantity of test vectors compared to traditional "functional mode" approaches. Techniques for scanning such data are discussed by E. J.
McCluskey in A Survey of Design for Testability Scan Techniques, VLSI Design (Vol. 5, No. 12, pp. 38-61, December 1984).
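The scan-in, exercise, scan-out sequence described above can be sketched as a shift register in Python. This is a minimal behavioral model under stated assumptions: the class name, the four-latch chain and the `invert_all` logic function are hypothetical, and real scan latches are master/slave hardware clocked by a dedicated scan clock rather than a list.

```python
from typing import Callable, List


class ScanChain:
    """Toy model of a scan path: a serial chain of latches, each tied to a logic node."""

    def __init__(self, length: int):
        self.latches: List[int] = [0] * length

    def shift(self, bit_in: int) -> int:
        """One scan clock: every latch passes its bit one position toward scan-out."""
        bit_out = self.latches[-1]
        self.latches = [bit_in] + self.latches[:-1]
        return bit_out

    def scan_in(self, pattern: List[int]) -> List[int]:
        """Shift a full pattern in; returns the old bits in the order they leave the chain."""
        # The first bit shifted in travels to the last latch, so feed the pattern
        # in reverse to leave latches[i] == pattern[i].
        return [self.shift(b) for b in reversed(pattern)]

    def capture(self, logic: Callable[[List[int]], List[int]]) -> None:
        """One functional clock: each latch captures the result at its node."""
        self.latches = logic(list(self.latches))

    def scan_out(self) -> List[int]:
        """Shift the captured state out and return it in latch order."""
        out = [self.shift(0) for _ in self.latches]
        return list(reversed(out))


if __name__ == "__main__":
    def invert_all(state):
        # Hypothetical logic under test: every node drives the inverse of its preset value.
        return [1 - s for s in state]

    chain = ScanChain(4)
    chain.scan_in([1, 0, 1, 1])   # preset the nodes to a known state
    chain.capture(invert_all)     # exercise the logic for one clock
    response = chain.scan_out()   # read the captured node values
    expected = [0, 1, 0, 0]
    print("pass" if response == expected else f"fail: {response}")
```

Comparing the scanned-out response against the expected response is what lets a tester localize an improperly operating node without probing it directly.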
Also, as VLSI technology advances, users of integrated circuits increasingly desire specially designed and constructed integrated circuits that perform functions customized for the user's application. Such integrated circuits have been called
Application-Specific Integrated Circuits (ASICs). For an ASIC device to be cost-competitive with general purpose microcomputers which may have special functions implemented in programmable firmware, and cost-competitive with a board design made up of
smaller scale integrated circuits, the design time of the ASIC circuit must be short and the ASIC circuit must be manufacturable and testable at low cost. Accordingly, it is useful for such circuits to be modular in design, with each of the modules
performing a certain function, so that a new ASIC circuit may be constructed by combining previously-designed circuit modules. Such an approach can also be used for non-ASIC microcomputers and microprocessors. Regardless of the end product, the use of
a modular approach allows the designer to use logic which has previously been verified, and proven manufacturable. However, if logic modules containing existing scan paths are placed into a new circuit application, new test patterns will generally be
required for the new device, thereby lengthening the design/manufacture cycle time.
A modular approach to utilizing scan paths and other testability circuits has been used to provide thorough coverage of all possible faults in an efficient manner. However, this approach utilizes system buses to set up and operate the scan test,
so that even though each module is tested independently, the test pattern designed for a given module depends upon the operation of other modules in the logic circuit for purposes of bus control and module selection. This results in the testability of a
particular module depending upon the fault-free operation of other modules. In addition, the automatic test program generator (ATPG) program which sets the conditions for test of a given module depends upon the position of the module relative to other
modules, and upon the operating features of such other modules. While reduced test times and costs are thus achieved by such modularity, the use of system buses to load and unload the scan paths in the individual modules may not only affect the
operation of the particular module, but is likely to also preclude "porting" of the test program for a given module from one logic circuit to another.
Recently, MegaModules have been used in the design of ASICs. (MegaModule is a trademark of Texas Instruments Incorporated.) Types of MegaModules include SRAMs, FIFOs, register files, RAMs, ROMs, universal asynchronous receiver-transmitters
(UARTs), programmable logic arrays and other such logic circuits. MegaModules are usually defined as integrated circuit modules of at least 500 gates in complexity and having a complex ASIC macro function. These MegaModules may be predesigned and
stored in an ASIC design library. The MegaModules can then be selected by the designer and placed within a certain area on the desired IC chip. This allows ASIC designers to integrate MegaModules into their logic as easily as simple macros.
Another solution to this testing problem of an ASIC is the use of a so-called Parallel Module Test (PMT), which is often referred to as a "direct connect" scheme. (Parallel Module Test is a trademark of Texas Instruments Incorporated.) PMT is a
direct connect scheme, because it connects external pins to a MegaModule bypassing all other logic, buffers, etc. It is primarily intended as a logic verification testability scheme and has recently been enhanced to address limited VIH/VIL and ICCQ
testability schemes. However, even PMT may have problems since the logic states of the ASIC's circuitry may be disturbed as part of the test process during test selection and enabling.
Another solution is the test access port and boundary-scan architecture defined by the IEEE 1149.1 standard, a so-called JTAG test port. IEEE 1149.1 is primarily intended as a system test solution. The IEEE 1149.1 standard requires a minimum of
four package pins to be dedicated to the test function. The IEEE 1149.1 standard requires boundary scan cells for each I/O buffer, which adds data delay to all normal operation function pins as well as silicon overhead. Although it has "hooks" for
controlling some internal testability schemes, it is not optimized for chip-level testing. IEEE 1149.1 does not explicitly support testing of internal DC parametrics.
Software breakpoints (SWBP) provide another mechanism to allow the debug of microprocessor code and to evaluate performance. A SWBP is typically accomplished through opcode replacement, provided the program resides in a writable memory module
which allows the opcode at the stop point to be replaced in memory with the software breakpoint opcode. In most machines, when a SWBP opcode reaches the first execute stage of an instruction execution pipeline, it causes the pipeline to stop advancing
or trap to an interrupt service routine, and set a debug status bit indicating the pipeline has stopped or trapped. In processors classified as protected pipelines, instructions fetched into the pipeline after the SWBP are not executed. Instructions
that are already in the pipeline are allowed to complete. To restart execution the pipeline can be cleared and then restarted by simply refetching the opcode at the SWBP memory address after the opcode is replaced in memory with the original opcode.
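The opcode-replacement mechanism can be sketched in Python as follows. It is a simplified, non-pipelined model assumed for illustration: `SWBP_OPCODE` is a hypothetical value rather than a real instruction encoding, and the `Target` class stands in for a processor with writable program memory and a debug status bit.

```python
from typing import Dict, List

SWBP_OPCODE = 0xDEADBEEF  # hypothetical breakpoint opcode, not a real encoding


class Target:
    """Non-pipelined toy processor with writable program memory."""

    def __init__(self, program: List[int]):
        self.memory = list(program)    # writable program memory
        self.saved: Dict[int, int] = {}  # address -> original opcode
        self.pc = 0
        self.halted = False            # debug status bit
        self.executed: List[int] = []  # trace of executed opcodes

    def set_breakpoint(self, addr: int) -> None:
        """Replace the opcode at the stop point with the SWBP opcode."""
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = SWBP_OPCODE

    def clear_breakpoint(self, addr: int) -> None:
        """Restore the original opcode so it can be refetched."""
        self.memory[addr] = self.saved.pop(addr)

    def run(self) -> None:
        """Execute until a SWBP reaches execution or the program ends."""
        self.halted = False
        while self.pc < len(self.memory):
            opcode = self.memory[self.pc]
            if opcode == SWBP_OPCODE:
                self.halted = True  # stop; PC still points at the breakpoint address
                return
            self.executed.append(opcode)
            self.pc += 1
```

Usage: after `set_breakpoint(2)` and `run()`, the target halts with `pc == 2`; calling `clear_breakpoint(2)` and then `run()` again refetches the original opcode from that address and continues. A real protected pipeline would additionally let instructions already in flight complete and discard those fetched after the SWBP, which this sketch does not model.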
Microprocessor designers have increasingly endeavored to exploit parallelism to improve performance. One parallel architecture that has found application in some modern microprocessors is the very long instruction word, or VLIW, architecture. VLIW microprocessors are so named because they execute VLIW-format instructions.
A VLIW format instruction is a long fixed-width instruction that encodes multiple concurrent operations. VLIW systems use multiple independent functional units. Instead of issuing multiple independent instructions to the units, a VLIW system
combines the multiple operations into one very long instruction. In a VLIW system, computer instructions for multiple integer operations, floating point operations, and memory references may be combined in a single, wide, VLIW instruction.
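As a rough illustration of combining several operations into one long fixed-width instruction, this Python sketch packs up to eight 32-bit operation words into a single 256-bit word and unpacks them again. The slot width, slot count and NOP encoding are assumptions chosen for the example; they do not describe the actual instruction format of the processor discussed here.

```python
from typing import List

SLOT_BITS = 32  # assumed width of one operation
SLOTS = 8       # assumed number of operations per long instruction
NOP = 0         # hypothetical encoding for an unused slot


def pack(ops: List[int]) -> int:
    """Combine up to SLOTS operations into one wide instruction word."""
    if len(ops) > SLOTS:
        raise ValueError("too many operations for one instruction word")
    word = 0
    for i, op in enumerate(list(ops) + [NOP] * (SLOTS - len(ops))):
        if not 0 <= op < (1 << SLOT_BITS):
            raise ValueError(f"operation {op:#x} does not fit in {SLOT_BITS} bits")
        word |= op << (i * SLOT_BITS)  # slot i occupies bits [32*i, 32*i + 31]
    return word


def unpack(word: int) -> List[int]:
    """Split a wide instruction word back into its per-unit operations."""
    mask = (1 << SLOT_BITS) - 1
    return [(word >> (i * SLOT_BITS)) & mask for i in range(SLOTS)]
```

Each slot would be routed to its own functional unit, which is why the decoder of a VLIW machine can stay simple: the parallelism is fixed at compile time rather than discovered in hardware.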
Testing and debugging such a complex pipeline is difficult, even when the techniques described in the preceding paragraphs are used. These and other disadvantages of the prior art are overcome by the present invention, however, and improved
methods and apparatus for chip-level testing, as well as system-level debugging, are provided.
SUMMARY OF THE INVENTION
In accordance with the present invention, during the debug process of a data processing system it is advantageous to provide high speed downloads of program memory and high speed uploads and downloads of data memory. The data streaming process
according to the present invention eliminates communication overhead previously associated with a serial scan test access port by providing a continuous stream of data on the scan channel of the test access port.
A method for debugging a data processing system in which a processor has a test port for transferring data into and out of the processor includes the following steps:
transferring first data into a first memory element within the processor via the test port;
moving the first data from the first memory element to a different memory element accessible to said processor by executing at least one instruction within the processor in response to the step of transferring data;
ceasing instruction execution within the processor; and
repeating the transferring, moving and ceasing steps until a plurality of data is transferred.
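The ordering of the claimed steps can be sketched as a loop in Python. This models only the transfer/move/cease sequence, not the test port protocol or the continuous scan stream that removes per-word communication overhead. The class and method names, the choice of a single register as the "first memory element", the 32-bit word width and byte addressing are all hypothetical assumptions.

```python
from typing import Dict, Iterable


class StreamingTarget:
    """Toy processor model for the transfer/move/cease loop (all names hypothetical)."""

    def __init__(self):
        self.stream_register = 0          # first memory element, loaded via the test port
        self.memory: Dict[int, int] = {}  # different memory element accessible to the CPU
        self.running = False

    def test_port_write(self, value: int) -> None:
        """Transferring step: scan one data word into the stream register."""
        self.stream_register = value & 0xFFFFFFFF

    def execute_store(self, address: int) -> None:
        """Moving step: execute one store instruction, then cease execution."""
        self.running = True
        self.memory[address] = self.stream_register
        self.running = False  # ceasing step


def download(target: StreamingTarget, base: int, words: Iterable[int]) -> int:
    """Repeat transfer/move/cease until every word has been moved; returns the count."""
    count = 0
    for offset, word in enumerate(words):
        target.test_port_write(word)
        target.execute_store(base + 4 * offset)  # assumes byte addressing of 32-bit words
        count += 1
    return count
```

An upload would run the same loop in the opposite direction: execute a load into the stream register, cease execution, then transfer the register contents out through the test port.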
BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages of the present invention will become apparent by reference to the following detailed description when considered in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram of a digital signal processor (DSP), showing components thereof pertinent to an embodiment of the present invention;
FIG. 2 is a block diagram of the functional units, data paths and register files of FIG. 1;
FIG. 3 shows the addressing mode register (AMR) of the DSP of FIG. 1;
FIG. 4 shows the control status register (CSR) which contains control and status bits of the DSP of FIG. 1;
FIG. 5 depicts a general-purpose input register (IN) which supports 32 general-purpose input signals of the DSP of FIG. 1;
FIG. 6 depicts a general-purpose output register (OUT) which supports 32 general-purpose output signals of the DSP of FIG. 1;
FIG. 7 illustrates the register storage scheme for 40-bit data of the DSP of FIG. 1;
FIGS. 8A-8J show an opcode map for the DSP of FIG. 1;
FIG. 9 shows the basic format of a fetch packet of the DSP of FIG. 1;
FIG. 10A depicts a fetch packet of FIG. 9 with fully serial p-bits;
FIG. 10B depicts a fetch packet of FIG. 9 with fully parallel p-bits;
FIG. 10C depicts a fetch packet of FIG. 9 with partially serial p-bits;
FIG. 11 shows the phases of the pipeline of the DSP of FIG. 1;
FIG. 12 shows the branch instruction phases;
FIG. 13 shows the operation of the pipeline of the DSP of FIG. 1 based on clock cycles and fetch packets;
FIG. 14 depicts fetch packet n, which contains three execute packets, shown followed by six fetch packets (n+1 through n+6), each with one execute packet (containing 8 parallel instructions);
FIG. 15 is a block diagram of an MTAP to Test Port Interface for the processor of FIG. 1;
FIG. 16 is a timing diagram of a MegaModule Reset Sequence for the processor of FIG. 1;
FIG. 17A shows the interrupt flag register (IFR) which contains the status of INT4-INT15 and NMI;
FIG. 17B shows the interrupt enable register (IER) of the DSP of FIG. 1;
FIG. 17C shows the interrupt set register (ISR), which allows interrupts to be set manually in the IFR;
FIG. 17D shows the interrupt clear register (ICR), which allows interrupts to be cleared manually in the IFR;
FIG. 18 is a timing diagram of detection of Analysis interrupts for the processor of FIG. 1;
FIGS. 19A and 19B illustrate two analysis interrupt related instructions, SWI and B ARP;
FIG. 20 is a block diagram describing MTAP to CPU Interface Signals for the MTAP of FIG. 15;
FIG. 21 is a block diagram of an MTAP to MegaModule Domain Interface for the processor of FIG. 1;
FIG. 22 is a state diagram of the test port states for the processor of FIG. 1;
FIG. 23A is a timing diagram of a clock switch from Functional Run to Scan on UCLK for the processor of FIG. 1;
FIG. 23B is a timing diagram of a clock switch from Functional Run on TCLK for the processor of FIG. 1;
FIG. 23C is a timing diagram of a clock switch from Functional Run on UCLK to Functional Run on TCLK for the processor of FIG. 1;
FIG. 24 is a table of a scan chain for a Data Scan based on the MSEND bits for the processor of FIG. 1;
FIG. 25 is a timing diagram showing various cases of halting for the processor of FIG. 1;
FIG. 26 is a circuit diagram of circuitry to form signal ERDY;
FIG. 27A is a timing diagram of a CPU test port requested halt during interrupt processing for the processor of FIG. 1;
FIG. 27B is a timing diagram illustrating a Test Port Requested Test Halt;
FIG. 28 is a timing diagram of a pipeline halt showing a pipeline management process for emulation for the processor of FIG. 1;
FIG. 29 is a timing diagram showing a pipeline restoration process after emulation for the processor of FIG. 1;
FIG. 30A illustrates an analysis control register for the processor of FIG. 1;
FIG. 30B illustrates an analysis data register for the processor of FIG. 1;
FIG. 30C illustrates an analysis data interrupt return pointer register for the processor of FIG. 1;
FIG. 30D illustrates a data streaming register for the processor of FIG. 1;
FIG. 31 is a timing diagram of the instruction execution pipeline for the processor of FIG. 1 showing various pipeline phases;
FIG. 32 is a block diagram illustrating pin connections to a megamodule in the processor of FIG. 1;
FIG. 33 is a block diagram illustrating JTAG instruction and data register paths for the processor of FIG. 1;
FIG. 34A illustrates JTAG instruction register contents when Strap status is selected in the registers of FIG. 33;
FIG. 34B illustrates JTAG instruction register contents when Stop Emulation status is selected in the registers of FIG. 33;
FIG. 34C illustrates JTAG instruction register contents when Real Time Emulation status is selected in the registers of FIG. 33;
FIG. 34D illustrates JTAG instruction register contents when Emulation Error status is selected in the registers of FIG. 33;
FIG. 35 is a block diagram of a JTAG to MPSD Interface for the processor of FIG. 1;
FIG. 36 illustrates the emulation control register of FIG. 33;
FIG. 37 is a block diagram of a code state machine (CSM) for the MTAP of the processor of FIG. 1;
FIG. 38 is a schematic of a clock source switch for the CSM of FIG. 37;
FIG. 39 is a schematic of circuitry to generate an EVTA interrupt for the processor of FIG. 1;
FIG. 40 illustrates the counter register of FIG. 33;
FIG. 41 is a block diagram of domain interconnections for the processor of FIG. 1;
FIG. 42 is a block diagram illustrating a stream scan register within the MTAP of FIG. 41;
FIG. 43 is a schematic of EMU pin connection for the processor of FIG. 1; and
FIG. 44 is a block diagram of a JTAG TAP configuration for the processor of FIG. 1.
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
FIG. 1 is a block diagram of a microprocessor 1 which has an embodiment of the present invention. Microprocessor 1 is a VLIW digital signal processor ("DSP"). In the interest of clarity, FIG. 1 only shows those portions of microprocessor 1 that
are relevant to an understanding of an embodiment of the present invention. Details of general construction for DSPs are well known, and may be found readily elsewhere. For example, U.S. Pat. No. 5,072,418 issued to Frederick Boutaud, et al,
describes a DSP in detail and is incorporated herein by reference. U.S. Pat. No. 5,329,471 issued to Gary Swoboda, et al, describes in detail how to test and emulate a DSP and is incorporated herein by reference. Details of portions of microprocessor
1 relevant to an embodiment of the present invention are explained in sufficient detail hereinbelow, so as to enable one of ordinary skill in the microprocessor art to make and use the invention.
In microprocessor 1 there are shown a central processing unit (CPU) 10, data memory 22, program memory 23, peripherals 60 and an external memory interface (EMIF) with a direct memory access (DMA) 61. CPU 10 further has an instruction
fetch/decode unit 10a-c, a plurality of execution units, including an arithmetic and load/store unit D1, a multiplier M1, an ALU/shifter unit S1, an arithmetic logic unit ("ALU") L1, a shared multiport register file 20a from which data are read and to
which data are written. Decoded instructions are provided from the instruction fetch/decode unit 10a-c to the functional units D1, M1, S1, and L1 over various sets of control lines which are not shown. Data are provided to/from the register file 20a
from/to load/store unit D1 over a first set of busses 32a, to multiplier M1 over a second set of busses 34a, to ALU/shifter unit S1 over a third set of busses 36a and to ALU L1 over a fourth set of busses 38a. Data are provided to/from the memory 22
from/to the load/store units D1 via a fifth set of busses 40a. Note that the entire data path described above is duplicated with register file 20b and execution units D2, M2, S2, and L2. Instructions are fetched by fetch unit 10a from instruction
memory 23 over a set of busses 41. Emulation circuitry 50 provides access to the internal operation of integrated circuit 1 which can be controlled by an external test/development system (XDS) 51.
External test system 51 is representative of a variety of known test systems for debugging and emulating integrated circuits. One such system is described in U.S. Pat. No. 5,535,331 which is incorporated herein by reference. Test circuitry 52
contains control registers and parallel signature analysis circuitry for testing integrated circuit 1.
Note that the memory 22 and memory 23 are shown in FIG. 1 to be a part of a microprocessor 1 integrated circuit, the extent of which is represented by the box 42. The memories 22-23 could just as well be external to the microprocessor 1 integrated circuit 42, or part of them could reside on the integrated circuit 42 and part of them be external to the integrated circuit 42. These are matters of design choice. Also, the particular selection and number of execution units are a matter of design choice, and are not critical to the invention.
When microprocessor 1 is incorporated in a data processing system, additional memory or peripherals may be connected to microprocessor 1, as illustrated in FIG. 1. For example, Random Access Memory (RAM) 70, a Read Only Memory (ROM) 71 and a
Disk 72 are shown connected via an external bus 73. Bus 73 is connected to the External Memory Interface (EMIF) which is part of functional block 61 within microprocessor 42. A Direct Memory Access (DMA) controller is also included within block 61.
The DMA controller is generally used to move data between memory and peripherals within microprocessor 1 and memory and peripherals which are external to microprocessor 1.
FIG. 2 is a block diagram of the execution units and register files of the microprocessor of FIG. 1 and shows a more detailed view of the buses connecting the various functional blocks. In this figure, all data busses are 32 bits wide, unless
otherwise noted. Bus 40a has an address bus DA1 which is driven by mux 200a. This allows an address generated by either load/store unit D1 or D2 to provide an address for loads or stores for register file 20a. Data Bus LD1 loads data from an address
in memory 22 specified by address bus DA1 to a register in load unit D1. Unit D1 may manipulate the data provided prior to storing it in register file 20a. Likewise, data bus ST1 stores data from register file 20a to memory 22. Load/store unit D1
performs the following operations: 32-bit add, subtract, linear and circular address calculations. Load/store unit D2 operates similarly to unit D1, with the assistance of mux 200b for selecting an address.
ALU unit L1 performs the following types of operations: 32/40 bit arithmetic and compare operations; leftmost 1 or leftmost 0 bit counting for 32 bits; normalization count for 32 and 40 bits; and logical operations. ALU L1 has input src1 for a 32 bit source operand and input src2 for a second 32 bit source operand. Input msb_src is an 8 bit value used to form 40 bit source operands. ALU L1 has an output dst for a 32 bit destination operand. Output msb_dst is an 8 bit value used to form 40 bit destination operands. Two 32 bit registers in register file 20a are concatenated to hold a 40 bit operand. Mux 211 is connected to input src1 and allows a 32 bit operand to be obtained from register file 20a via bus 38a or from register file 20b via bus 210. Mux 212 is connected to input src2 and allows a 32 bit operand to be obtained from register file 20a via bus 38a or from register file 20b via bus 210. ALU unit L2 operates similarly to unit L1.
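The 40 bit operand held in two concatenated 32 bit registers can be illustrated with a small Python sketch that splits a signed 40 bit value into a low 32 bit word and an 8 bit upper part (the role played by msb_src/msb_dst), then recombines them with sign extension. Which register of the pair holds which part, and the use of only the low 8 bits of the second register, are assumptions here; the actual storage scheme is the one shown in FIG. 7, which is not reproduced in this text.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF
MASK8 = 0xFF


def split40(value: int) -> Tuple[int, int]:
    """Split a signed 40 bit value into (low 32 bits, upper 8 bits) register contents."""
    if not -(1 << 39) <= value < (1 << 39):
        raise ValueError("value does not fit in 40 bits (signed)")
    raw = value & ((1 << 40) - 1)  # 40 bit two's-complement pattern
    return raw & MASK32, (raw >> 32) & MASK8


def join40(low: int, high: int) -> int:
    """Recombine register contents into a signed 40 bit integer."""
    raw = ((high & MASK8) << 32) | (low & MASK32)
    return raw - (1 << 40) if raw & (1 << 39) else raw  # sign-extend from bit 39
```

For example, `split40(-1)` yields `(0xFFFFFFFF, 0xFF)`, and `join40` restores `-1`; the 8 extra bits give 40 bit arithmetic guard bits above a 32 bit result.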
ALU/shifter unit S1 performs the following types of operations: 32 bit arithmetic operations; 32/40 bit shifts and 32 bit bit-field operations; 32 bit logical operations; branching; and constant generation. ALU S1 has input src1 for a 32 bit source operand and input src2 for a second 32 bit source operand. Input msb_src is an 8 bit value used to form 40 bit source operands. ALU S1 has an output dst for a 32 bit destination operand. Output msb_dst is an 8 bit value used to form 40 bit destination operands. Mux 213 is connected to input src2 and allows a 32 bit operand to be obtained from register file 20a via bus 36a or from register file 20b via bus 210. ALU unit S2 operates similarly to unit S1, but can additionally perform register transfers to/from the control register file 102.
Multiplier M1 performs 16×16 multiplies. Multiplier M1 has input src1 for a 32 bit source operand and input src2 for a 32 bit source operand. Multiplier M1 has an output dst for a 32 bit destination operand. Mux 214 is connected to input src2 and allows a 32 bit operand to be obtained from register file 20a via bus 34a or from register file 20b via bus 210. Multiplier M2 operates similarly to multiplier M1.
As depicted in FIG. 2, one unit (.S2) can read from and write to the control register file 102 using buses 220 and 221. Table 2 lists the control registers contained in the control register file, and briefly describes each. The control
registers are described more fully later herein. Each control register is accessed by the MVC instruction; see the MVC instruction description later herein.
TABLE 2: Control Registers

| Abbreviation | Name | Description |
|---|---|---|
| AMR | Addressing mode register | Specifies whether to use linear or circular addressing for one of eight registers; also contains sizes for circular addressing |
| CSR | Control status register | Contains the global interrupt enable bit, cache control bits, and other miscellaneous control and status bits |
| IFR | Interrupt flag register | Displays status of interrupts |
| ISR | Interrupt set register | Allows you to set pending interrupts manually |
| ICR | Interrupt clear register | Allows you to clear pending interrupts manually |
| IER | Interrupt enable register | Allows enabling/disabling of individual interrupts |
| ISTP | Interrupt service table pointer | Points to the beginning of the interrupt service table |
| IRP | Interrupt return pointer | Contains the address to be used to return from a maskable interrupt |
| NRP | Nonmaskable interrupt return pointer | Contains the address to be used to return from a nonmaskable interrupt |
| IN | General-purpose input register | Contains 32 input signals |
| OUT | General-purpose output register | Contains 32 output signals |
| PCE1 | Program counter | Contains the address of the fetch packet that contains the execute packet in the E1 pipeline stage |
| PDATA_O | Program data out | Contains 32 output signals; used by the STP instruction to write to program space |
FIG. 3 shows the addressing mode register (AMR). Eight registers (A4-A7, B4-B7) can perform circular addressing. For each of these registers, the AMR specifies the addressing mode. A 2-bit field for each register is used to select the address
modification mode: linear (the default) or circular mode. With circular addressing, the field also specifies which BK (block size) field to use for a circular buffer. In addition, the buffer must be aligned on a byte boundary equal to the block size.
The mode select field encoding is shown in Table 3.
TABLE 3: Addressing Mode Field Encoding

| Mode | Description |
|---|---|
| 00 | Linear modification (default at reset) |
| 01 | Circular addressing using the BK0 field |
| 10 | Circular addressing using the BK1 field |
| 11 | Reserved |
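Decoding the AMR mode fields according to Table 3 can be sketched in Python as below. The bit positions are an assumption, since FIG. 3 is not reproduced here: the sketch places the 2-bit fields for A4-A7 and B4-B7 in bits 1:0 through 15:14 in that order, BK0 in bits 20:16 and BK1 in bits 25:21.

```python
from typing import Dict, Tuple

# Assumed layout: 2-bit mode field for register i at bits [2i+1:2i].
REGISTERS = ("A4", "A5", "A6", "A7", "B4", "B5", "B6", "B7")
MODES = {0b00: "linear", 0b01: "circular/BK0", 0b10: "circular/BK1", 0b11: "reserved"}
BK0_SHIFT, BK1_SHIFT = 16, 21  # assumed positions of the 5-bit block size fields


def decode_amr(amr: int) -> Tuple[Dict[str, str], int, int]:
    """Return the per-register addressing mode plus the raw BK0 and BK1 values."""
    modes = {name: MODES[(amr >> (2 * i)) & 0b11] for i, name in enumerate(REGISTERS)}
    bk0 = (amr >> BK0_SHIFT) & 0x1F
    bk1 = (amr >> BK1_SHIFT) & 0x1F
    return modes, bk0, bk1
```

For instance, an AMR value of `0b01` selects circular addressing through BK0 for A4 and leaves every other register in linear mode, which is also the reset default.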
The block size fields, BK0 and BK1, specify block sizes for circular addressing. The five bits in BK0 and BK1 specify the width. The formula for calculating the block size width is:

$$\text{Block size} = 2^{N+1}$$

where N is the value in BK1 or BK0.
Table 4 shows block size calculations for all 32 possibilities.
TABLE 4: Block Size Calculations

| N | Block Size |
|---|---|
| 00000 | 2 |
| 00001 | 4 |
| 00010 | 8 |
| 00011 | 16 |
| 00100 | 32 |
| 00101 | 64 |
| 00110 | 128 |
| 00111 | 256 |
| 01000 | 512 |
| 01001 | 1,024 |
| 01010 | 2,048 |
| 01011 | 4,096 |
| 01100 | 8,192 |
| 01101 | 16,384 |
| 01110 | 32,768 |
| 01111 | 65,536 |
| 10000 | 131,072 |
| 10001 | 262,144 |
| 10010 | 524,288 |
| 10011 | 1,048,576 |
| 10100 | 2,097,152 |
| 10101 | 4,194,304 |
| 10110 | 8,388,608 |
| 10111 | 16,777,216 |
| 11000 | 33,554,432 |
| 11001 | 67,108,864 |
| 11010 | 134,217,728 |
| 11011 | 268,435,456 |
| 11100 | 536,870,912 |
| 11101 | 1,073,741,824 |
| 11110 | 2,147,483,648 |
| 11111 | 4,294,967,296 |
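The block size rule in Table 4 (2 raised to N+1) and the requirement that a circular buffer be aligned on a boundary equal to its block size suggest a simple address update: keep the address bits above the block and let only the low bits wrap. The following Python sketch shows that idea; the function names are hypothetical, and it is an illustration consistent with the alignment rule rather than a description of the exact hardware address calculation.

```python
def block_size(n: int) -> int:
    """Block size selected by a 5-bit BK field value N (Table 4): 2 ** (N + 1)."""
    if not 0 <= n <= 31:
        raise ValueError("BK field is 5 bits")
    return 1 << (n + 1)


def circular_add(address: int, offset: int, n: int) -> int:
    """Add offset to address, wrapping within the aligned block of size 2 ** (N + 1)."""
    mask = block_size(n) - 1
    # Upper bits select the aligned buffer; only the low bits wrap around.
    return (address & ~mask) | ((address + offset) & mask)
```

With N = 3 (a 16-byte block), `circular_add(0x100C, 8, 3)` wraps to `0x1004` instead of leaving the buffer, and a negative offset such as `circular_add(0x1000, -4, 3)` wraps to `0x100C`.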
The control status register (CSR), shown in FIG. 4, contains control and status bits. The function of the bit fields in the CSR is shown in Table 5.

TABLE 5: Control Status Register: Bit Fields, Read/Write Status and Function

| Bit Position | Width | BitField Name | Function |
|---|---|---|---|
| 31-24 | 8 | CPU ID | CPU ID. Defines which CPU. |
| 23-16 | 8 | Rev ID | Revision ID. Defines silicon revision of the CPU. |
| 15-10 | 6 | PWRD | Control power down modes. The values will always be read as zero. |
| 9 | 1 | SAT | The saturate bit, set when any unit performs a saturate, can be |