RELATED CASES
This application discloses subject matter also disclosed in copending
applications Ser. Nos. 282,629; 283,139; 282,469; 282,538; and 282,540,
filed Dec. 9, 1988, and further discloses subject matter also disclosed in
prior copending application Ser. No. 118,503, filed Nov. 9, 1987, all of
said applications being assigned to Tandem Computers Incorporated, the
assignee of this invention.
BACKGROUND OF THE INVENTION
This invention relates to computer systems, and more particularly to I/O
processor control in a fault-tolerant multiprocessor system.
Highly reliable digital processing is achieved in various computer
architectures employing redundancy. For example, TMR (triple modular
redundancy) systems may employ three CPUs executing the same instruction
stream, along with three separate main memory units and separate I/O
devices which duplicate functions, so if one of each type of element
fails, the system continues to operate. Another fault-tolerant type of
system is shown in U.S. Pat. No. 4,228,496, issued to Katzman et al, for
"Multiprocessor System", assigned to Tandem Computers Incorporated.
Various methods have been used for synchronizing the units in redundant
systems; for example, in said prior application Ser. No. 118,503, filed
Nov. 9, 1987, by R. W. Horst, for "Method and Apparatus for Synchronizing
a Plurality of Processors", also assigned to Tandem Computers
Incorporated, a method of "loose" synchronizing is disclosed, in contrast
to other systems which have employed a lock-step synchronization using a
single clock, as shown in U.S. Pat. No. 4,453,215 for "Central Processing
Apparatus for Fault-Tolerant Computing", assigned to Stratus Computer,
Inc. A technique called "synchronization voting" is disclosed by Davies &
Wakerly in "Synchronization and Matching in Redundant Systems", IEEE
Transactions on Computers, June 1978, pp. 531-539. A method for interrupt
synchronization in redundant fault-tolerant systems is disclosed by Yoneda
et al in Proceedings of 15th Annual Symposium on Fault-Tolerant Computing,
June 1985, pp. 246-251, "Implementation of Interrupt Handler for Loosely
Synchronized TMR Systems". U.S. Pat. No. 4,644,498 for "Fault-Tolerant
Real Time Clock" discloses a triple modular redundant clock configuration
for use in a TMR computer system. U.S. Pat. No. 4,733,353 for "Frame
Synchronization of Multiply Redundant Computers" discloses a
synchronization method using separately-clocked CPUs which are
periodically synchronized by executing a synch frame.
As high-performance microprocessor devices have become available, using
higher clock speeds and providing greater capabilities, such as the Intel
80386 and Motorola 68030 chips operating at 25-MHz clock rates, and as
other elements of computer systems such as memory, disk drives, and the
like have correspondingly become less expensive and of greater capability,
the performance and cost of high-reliability processors have been required
to follow the same trends. In addition, standardization on a few operating
systems in the computer industry in general has vastly increased the
availability of applications software, so a similar demand is made on the
field of high-reliability systems; i.e., a standard operating system must
be available.
It is therefore the principal object of this invention to provide an
improved high-reliability computer system, particularly of the
fault-tolerant type. Another object is to provide an improved redundant,
fault-tolerant type of computing system, and one in which high performance
and reduced cost are both possible; particularly, it is preferable that
the improved system avoid the performance burdens usually associated with
highly redundant systems. A further object is to provide a
high-reliability computer system in which the performance, measured in
reliability as well as speed and software compatibility, is improved but
yet at a cost comparable to other alternatives of lower performance. An
additional object is to provide a high-reliability redundant computer
system which is capable of detecting faulty system components and placing
them off-line, then reintegrating repaired system components without
shutting down the system.
SUMMARY OF THE INVENTION
In accordance with one embodiment of the invention, a computer system
employs multiple identical CPUs executing the same instruction stream, and
has multiple, identical memory modules storing duplicates of the same
data.
I/O functions are implemented using multiple, identical I/O busses (in the
example, two such I/O busses are shown), and each of the I/O busses is
separately coupled to only one of the memory modules. A number of I/O
processors are coupled to two I/O busses, and I/O devices are coupled to
pairs of the I/O processors but accessed by only one of the I/O processors
at a given time. Therefore, if an I/O processor becomes inoperative, any
I/O device connected to this I/O processor can be redesignated to be
accessed through the other I/O processor, without system shutdown. Since
one memory module is designated primary, only the I/O bus for this module
will be controlling the I/O processors, and I/O traffic between memory
module and I/O is not voted. The CPUs can access the I/O processors
through the memory modules (each access being voted just as the memory
accesses are voted), but the I/O processors can only access the memory
modules, not the CPUs; the I/O processors can only send interrupts to the
CPUs, and these interrupts are collected in the memory modules before
presenting to the CPUs. Thus synchronization overhead for I/O device
access is not burdening the CPUs, yet fault tolerance is provided. If an
I/O processor fails, the other one of the pair can take over control of
the I/O devices for this I/O processor by merely changing the addresses
used for the I/O device in the I/O page table maintained by the operating
system. In this manner, fault tolerance and reintegration of an I/O device
is possible without system shutdown, and yet without hardware expense and
performance penalty associated with voting and the like in these I/O
paths.
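By way of illustration only, the take-over just described may be pictured with the following minimal Python sketch. The table layout, the device names, the I/O processor names and the address values are hypothetical and are chosen solely for the example; the text above states only that the operating system changes the addresses used for the I/O device in its I/O page table.

```python
# Hypothetical model of an I/O page table. Each device is reached through the
# address window of one I/O processor (IOP) of a pair. All names and address
# values here are illustrative assumptions, not taken from the disclosure.
IOP_BASE = {"IOP-0": 0x1000_0000, "IOP-1": 0x2000_0000}


class IOPageTable:
    def __init__(self):
        self.entries = {}  # device name -> (active IOP, standby IOP)

    def attach(self, device, active, standby):
        self.entries[device] = (active, standby)

    def address_of(self, device, offset=0):
        """Address the CPUs would use to reach the device right now."""
        active, _ = self.entries[device]
        return IOP_BASE[active] + offset

    def fail_over(self, failed_iop):
        """Redirect every device reached through failed_iop to its partner."""
        moved = []
        for device, (active, standby) in self.entries.items():
            if active == failed_iop:
                # Only the table entry changes; the device itself is untouched.
                self.entries[device] = (standby, active)
                moved.append(device)
        return moved


table = IOPageTable()
table.attach("disk0", active="IOP-0", standby="IOP-1")
table.attach("tty0", active="IOP-1", standby="IOP-0")
moved = table.fail_over("IOP-0")  # disk0 is now addressed through IOP-1
```

In this sketch the fail-over is a pure software update of the table, which is the reason no voting hardware is needed in the I/O path for this purpose.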
One of the features of the disclosed embodiment of the invention is the ability
to replace faulty components, such as CPU modules or memory modules,
without shutting down the system. Thus, the system is available for
continuous use even though components may fail and have to be replaced. In
addition, the ability to obtain a high level of fault tolerance with fewer
system components, e.g., no fault-tolerant clocking needed, only two
memory modules needed instead of three, voting circuits minimized, etc.,
means that there are fewer components to fail, and so the reliability is
enhanced. That is, there are fewer failures because there are fewer
components, and when there are failures the components are isolated to
allow the system to keep running, while the components can be replaced
without system shutdown.
The CPUs of this system preferably use a commercially-available
high-performance microprocessor chip for which operating systems such as
Unix.TM. are available. The parts of the system which make it
fault-tolerant are either transparent to the operating system or easily
adapted to the operating system. Accordingly, a high-performance
fault-tolerant system is provided which allows compatibility with
contemporary widely-used multi-tasking operating systems and applications
software.
BRIEF DESCRIPTION OF THE DRAWINGS
The features believed characteristic of the invention are set forth in the
appended claims. The invention itself, however, as well as other features
and advantages thereof, may best be understood by reference to the
detailed description of a specific embodiment which follows, when read in
conjunction with the accompanying drawings, wherein:
FIG. 1 is an electrical diagram in block form of a computer system
according to one embodiment of the invention;
FIG. 2 is an electrical schematic diagram in block form of one of the CPUs
of the system of FIG. 1;
FIG. 3 is an electrical schematic diagram in block form of the
microprocessor chip used in the CPU of FIG. 2;
FIGS. 4 and 5 are timing diagrams showing events occurring in the CPU of
FIGS. 2 and 3 as a function of time;
FIG. 6 is an electrical schematic diagram in block form of one of the
memory modules in the computer system of FIG. 1;
FIG. 7 is a timing diagram showing events occurring on the CPU to memory
busses in the system of FIG. 1;
FIG. 8 is an electrical schematic diagram in block form of one of the I/O
processors in the computer system of FIG. 1;
FIG. 9 is a timing diagram showing events vs. time for the transfer
protocol between a memory module and an I/O processor in the system of
FIG. 1;
FIG. 10 is a timing diagram showing events vs. time for execution of
instructions in the CPUs of FIGS. 1, 2 and 3;
FIG. 10a is a detail view of a part of the diagram of FIG. 10;
FIGS. 11 and 12 are timing diagrams similar to FIG. 10 showing events vs.
time for execution of instructions in the CPUs of FIGS. 1, 2 and 3;
FIG. 13 is an electrical schematic diagram in block form of the interrupt
synchronization circuit used in the CPU of FIG. 2;
FIGS. 14, 15, 16 and 17 are timing diagrams like FIGS. 10 or 11 showing
events vs. time for execution of instructions in the CPUs of FIGS. 1, 2
and 3 when an interrupt occurs, illustrating various scenarios;
FIG. 18 is a physical memory map of the memories used in the system of
FIGS. 1, 2, 3 and 6;
FIG. 19 is a virtual memory map of the CPUs used in the system of FIGS. 1,
2, 3 and 6;
FIG. 20 is a diagram of the format of the virtual address and the TLB
entries in the microprocessor chips in the CPU according to FIG. 2 or 3;
FIG. 21 is an illustration of the private memory locations in the memory
map of the global memory modules in the system of FIGS. 1, 2, 3 and 6; and
FIG. 22 is an electrical diagram of a fault-tolerant power supply used with
the system of the invention according to one embodiment.
DETAILED DESCRIPTION OF SPECIFIC EMBODIMENT
With reference to FIG. 1, a computer system using features of the invention
is shown in one embodiment having three identical processors 11, 12 and
13, referred to as CPU-A, CPU-B and CPU-C, which operate as one logical
processor, all three typically executing the same instruction stream; the
only time the three processors are not executing the same instruction
stream is in such operations as power-up self test, diagnostics and the
like. The three processors are coupled to two memory modules 14 and 15,
referred to as Memory-#1 and Memory-#2, each memory storing the same data
in the same address space. In a preferred embodiment, each one of the
processors 11, 12 and 13 contains its own local memory 16, as well,
accessible only by the processor containing this memory.
Each one of the processors 11, 12 and 13, as well as each one of the memory
modules 14 and 15, has its own separate clock oscillator 17; in this
embodiment, the processors are not run in "lock step", but instead are
loosely synchronized by a method such as is set forth in the
above-mentioned application Ser. No. 118,503, i.e., using events such as
external memory references to bring the CPUs into synchronization.
External interrupts are synchronized among the three CPUs by a technique
employing a set of busses 18 for coupling the interrupt requests and
status from each of the processors to the other two; each one of the
processors CPU-A, CPU-B and CPU-C is responsive to the three interrupt
requests, its own and the two received from the other CPUs, to present an
interrupt to the CPUs at the same point in the execution stream. The
memory modules 14 and 15 vote the memory references, and allow a memory
reference to proceed only when all three CPUs have made the same request
(with provision for faults). In this manner, the processors are
synchronized at the time of external events (memory references), resulting
in the processors typically executing the same instruction stream, in the
same sequence, but not necessarily during aligned clock cycles in the time
between synchronization events. In addition, external interrupts are
synchronized to be executed at the same point in the instruction stream of
each CPU.
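The voting of memory references described above may be illustrated by a short Python sketch. It is a simplified model only: the representation of a request as a (command, address, data) tuple and the rule that two matching requests suffice when the third is missing or different are assumptions made for the example, standing in for the "provision for faults" mentioned in the text.

```python
from collections import Counter


def vote(requests):
    """Majority-vote the memory requests presented by three CPUs.

    requests: list of three items, each a (command, address, data) tuple,
    or None when that CPU has not presented a request (for example because
    it has faulted). Returns (winning_request, dissenting_cpu_indices), or
    (None, []) when no two CPUs agree and the reference cannot proceed.
    """
    present = [r for r in requests if r is not None]
    if not present:
        return None, []
    winner, count = Counter(present).most_common(1)[0]
    if count < 2:
        return None, []
    # Any CPU whose request is absent or differs is reported for fault handling.
    dissent = [i for i, r in enumerate(requests) if r != winner]
    return winner, dissent


good = ("write", 0x100, 7)
all_agree = vote([good, good, good])
one_bad = vote([good, ("write", 0x100, 9), good])
```

In the sketch, the CPUs need not present their requests in the same clock cycle; the vote is taken once the requests are available, which corresponds to the loose synchronization at memory references described above.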
The CPU-A processor 11 is connected to the Memory-#1 module 14 and to the
Memory-#2 module 15 by a bus 21; likewise the CPU-B is connected to the
modules 14 and 15 by a bus 22, and the CPU-C is connected to the memory
modules by a bus 23. These busses 21, 22, 23 each include a 32-bit
multiplexed address/data bus, a command bus, and control lines for address
and data strobes. The CPUs have control of these busses 21, 22 and 23, so
there is no arbitration, or bus-request and bus-grant.
Each one of the memory modules 14 and 15 is separately coupled to a
respective input/output bus 24 or 25, and each of these busses is coupled
to two (or more) input/output processors 26 and 27. The system can have
multiple I/O processors as needed to accommodate the I/O devices needed
for the particular system configuration. Each one of the input/output
processors 26 and 27 is connected to a bus 28-1 or 28-2, which may be of a
standard configuration such as a VMEbus.TM., and each bus 28-1 or 28-2 is
connected to one or more bus interface modules 29 for interface with a
standard I/O controller 30. Each bus interface module 29 is connected to
two of the busses 28-1 or 28-2, so failure of one I/O processor 26 or 27,
or failure of one of the bus channels 28-1 or 28-2, can be tolerated. The
I/O processors 26 and 27 can be addressed by the CPUs 11, 12 and 13
through the memory modules 14 and 15, and can signal an interrupt to the
CPUs via the memory modules. Disk drives, terminals with CRT screens and
keyboards, and network adapters, are typical peripheral devices operated
by the controllers 30. The controllers 30 may make DMA-type references to
the memory modules 14 and 15 to transfer blocks of data. Each one of the
I/O processors 26, 27, etc., has certain individual lines directly
connected to each one of the memory modules for bus request, bus grant,
etc.; these point-to-point connections are called "radials" and are
included in a group of radial lines 31.
A system status bus 32 is individually connected to each one of the CPUs
11, 12 and 13, to each memory module 14 and 15, and to each of the I/O
processors 26 and 27, for the purpose of providing information on the
status of each element. This status bus provides information about which
of the CPUs, memory modules and I/O processors is currently in the system
and operating properly.
An acknowledge/status bus 33 connecting the three CPUs and two memory
modules includes individual lines by which the modules 14 and 15 send
acknowledge signals to the CPUs when memory requests are made by the CPUs,
and at the same time a status field is sent to report on the status of the
command and whether it executed correctly. The memory modules not only
check parity on data read from or written to the global memory, but also
check parity on data passing through the memory modules to or from the I/O
busses 24 and 25, as well as checking the validity of commands. It is
through the status lines in bus 33 that these checks are reported to the
CPUs 11, 12 and 13, so if errors occur a fault routine can be entered to
isolate a faulty component.
Even though both memory modules 14 and 15 are storing the same data in
global memory, and operating to perform every memory reference in
duplicate, one of these memory modules is designated as primary and the
other as back-up, at any given time. Memory write operations are executed
by both memory modules so both are kept current, and also a memory read
operation is executed by both, but only the primary module actually loads
the read-data back onto the busses 21, 22 and 23, and only the primary
memory module controls the arbitration for multi-master busses 24 and 25.
To keep the primary and back-up modules executing the same operations, a
bus 34 conveys control information from primary to back-up. Either module
can assume the role of primary at boot-up, and the roles can switch during
operation under software control; the roles can also switch when selected
error conditions are detected by the CPUs or other error-responsive parts
of the system.
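The primary and back-up roles may be pictured with the following minimal Python sketch. The class and method names are hypothetical; the sketch shows only that both modules execute every write and every read, that the read data returned to the CPUs comes from the module currently designated primary, and that the designation can be switched.

```python
class MemoryModule:
    """One global memory module, modelled as a dictionary of words."""

    def __init__(self, name):
        self.name = name
        self.cells = {}

    def write(self, address, value):
        self.cells[address] = value

    def read(self, address):
        return self.cells.get(address, 0)


class GlobalMemory:
    """Two modules kept current together; one is primary at any given time."""

    def __init__(self):
        self.modules = [MemoryModule("Memory-#1"), MemoryModule("Memory-#2")]
        self.primary = 0  # index of the module currently designated primary

    def write(self, address, value):
        for module in self.modules:  # both modules execute every write
            module.write(address, value)

    def read(self, address):
        values = [m.read(address) for m in self.modules]  # both execute the read
        return values[self.primary]  # only the primary drives the read data

    def switch_roles(self):
        self.primary = 1 - self.primary


gm = GlobalMemory()
gm.write(0x40, 123)
before = gm.read(0x40)
gm.switch_roles()  # e.g. under software control or on a detected error
after = gm.read(0x40)
```

Because the back-up module has executed the same writes, a read issued after the roles are switched returns the same data as before.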
Certain interrupts generated in the CPUs are also voted by the memory
modules 14 and 15. When the CPUs encounter such an interrupt condition
(and are not stalled), they signal an interrupt request to the memory
modules by individual lines in an interrupt bus 35, so the three interrupt
requests from the three CPUs can be voted. When all interrupts have been
voted, the memory modules each send a voted-interrupt signal to the three
CPUs via bus 35. This voting of interrupts also functions to check on the
operation of the CPUs. The three CPUs synchronize the voted interrupt
signal via the inter-CPU bus 18 and present the interrupt to the
processors at a common point in the instruction stream. This interrupt
synchronization is accomplished without stalling any of the CPUs.
CPU Module:
Referring now to FIG. 2, one of the processors 11, 12 or 13 is shown in
more detail. All three CPU modules are of the same construction in a
preferred embodiment, so only CPU-A will be described here. In order to
keep costs within a competitive range, and to provide ready access to
already-developed software and operating systems, it is preferred to use a
commercially-available microprocessor chip, and any one of a number of
devices may be chosen. The RISC (reduced instruction set) architecture has
some advantage in implementing the loose synchronization as will be
described, but more-conventional CISC (complex instruction set)
microprocessors such as Motorola 68030 devices or Intel 80386 devices
(available in 20-MHz and 25-MHz speeds) could be used. High-speed 32-bit
RISC microprocessor devices are available from several sources in three
basic types; Motorola produces a device as part number 88000, MIPS
Computer Systems, Inc. and others produce a chip set referred to as the
MIPS type, and Sun Microsystems has announced a so-called SPARC.TM. type
(scalable processor architecture). Cypress Semiconductor of San Jose,
Calif., for example, manufactures a microprocessor referred to as part
number CY7C601 providing 20-MIPS (million instructions per second),
clocked at 33-MHz, supporting the SPARC standard, and Fujitsu manufactures
a CMOS RISC microprocessor, part number S-25, also supporting the SPARC
standard.
The CPU board or module in the illustrative embodiment, used as an example,
employs a microprocessor chip 40 which is in this case an R2000 device
designed by MIPS Computer Systems, Inc., and also manufactured by
Integrated Device Technology, Inc. The R2000 device is a 32-bit processor
using RISC architecture to provide high performance, e.g., 12-MIPS at
16.67-MHz clock rate. Higher-speed versions of this device may be used
instead, such as the R3000 that provides 20-MIPS at 25-MHz clock rate. The
processor 40 also has a co-processor used for memory management, including
a translation lookaside buffer to cache translations of logical to
physical addresses. The processor 40 is coupled to a local bus having a
data bus 41, an address bus 42 and a control bus 43. Separate instruction
and data cache memories 44 and 45 are coupled to this local bus. These
caches are each of 64K-byte size, for example, and are accessed within a
single clock cycle of the processor 40. A numeric or floating point
co-processor 46 is coupled to the local bus if additional performance is
needed for these types of calculations; this numeric processor device is
also commercially available from MIPS Computer Systems as part number
R2010. The local bus 41, 42, 43, is coupled to an internal bus structure
through a write buffer 50 and a read buffer 51. The write buffer is a
commercially available device, part number R2020, and functions to allow
the processor 40 to continue to execute Run cycles after storing data and
address in the write buffer 50 for a write operation, rather than having
to execute stall cycles while the write is completing.
In addition to the path through the write buffer 50, a path is provided to
allow the processor 40 to execute write operations bypassing the write
buffer 50. This path, a write buffer bypass 52, allows the processor,
under software selection, to perform synchronous writes. If the write
buffer bypass 52 is enabled (write buffer 50 not enabled) and the
processor executes a write then the processor will stall until the write
completes. In contrast, when writes are executed with the write buffer
bypass 52 disabled the processor will not stall because data is written
into the write buffer 50 (unless the write buffer is full). If the write
buffer 50 is enabled when the processor 40 performs a write operation, the
write buffer 50 captures the output data from bus 41 and the address from
bus 42, as well as controls from bus 43. The write buffer 50 can hold up
to four such data-address sets while it waits to pass the data on to the
main memory. The write buffer runs synchronously with the clock 17 of the
processor chip 40, so the processor-to-buffer transfers are synchronous
and at the machine cycle rate of the processor. The write buffer 50
signals the processor if it is full and unable to accept data. Read
operations by the processor 40 are checked against the addresses contained
in the four-deep write buffer 50, so if a read is attempted to one of the
data words waiting in the write buffer to be written to memory 16 or to
global memory, the read is stalled until the write is completed.
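The behavior of the write buffer described above may be illustrated by the following Python sketch. It is a behavioral model only, with hypothetical method names; it reflects the four-entry depth, the full indication to the processor, and the check of read addresses against pending writes, and does not model the timing of the actual device.

```python
from collections import deque


class WriteBuffer:
    DEPTH = 4  # up to four data-address sets, as stated in the text

    def __init__(self):
        self.pending = deque()  # (address, data) pairs awaiting memory

    def is_full(self):
        return len(self.pending) >= self.DEPTH

    def store(self, address, data):
        """Accept a write. Returns False when full, i.e. the processor stalls."""
        if self.is_full():
            return False
        self.pending.append((address, data))
        return True

    def read_must_stall(self, address):
        """A read of an address still waiting in the buffer has to wait."""
        return any(a == address for a, _ in self.pending)

    def drain_one(self, memory):
        """Pass the oldest pending write on to memory (a dict in this model)."""
        if self.pending:
            address, data = self.pending.popleft()
            memory[address] = data


wb = WriteBuffer()
memory = {}
accepted = [wb.store(i * 4, i) for i in range(5)]  # the fifth write finds it full
stall_on_read = wb.read_must_stall(8)              # address 8 is still pending
wb.drain_one(memory)                               # oldest write reaches memory
```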
The write and read buffers 50 and 51 are coupled to an internal bus
structure having a data bus 53, an address bus 54 and a control bus 55.
The local memory 16 is accessed by this internal bus, and a bus interface
56 coupled to the internal bus is used to access the system bus 21 (or bus
22 or 23 for the other CPUs). The separate data and address busses 53 and
54 of the internal bus (as derived from busses 41 and 42 of the local bus)
are converted to a multiplexed address/data bus 57 in the system bus 21,
and the command and control lines are correspondingly converted to command
lines 58 and control lines 59 in this external bus.
The bus interface unit 56 also receives the acknowledge/status lines 33
from the memory modules 14 and 15. In these lines 33, separate status
lines 33-1 or 33-2 are coupled from each of the modules 14 and 15, so the
responses from both memory modules can be evaluated upon the event of a
transfer (read or write) between CPUs and global memory, as will be
explained.
The local memory 16, in one embodiment, comprises about 8-Mbyte of RAM
which can be accessed in about three or four of the machine cycles of
processor 40, and this access is synchronous with the clock 17 of this
CPU, whereas the memory access time to the modules 14 and 15 is much
greater than that to local memory, and this access to the memory modules
14 and 15 is asynchronous and subject to the synchronization overhead
imposed by waiting for all CPUs to make the request then voting. For
comparison, access to a typical commercially-available disk memory through
the I/O processors 26, 27 and 29 is measured in milliseconds, i.e.,
considerably slower than access to the modules 14 and 15. Thus, there is a
hierarchy of memory access by the CPU chip 40, the highest being the
instruction and data caches 44 and 45 which will provide a hit ratio of
perhaps 95% when using 64-KByte cache size and suitable fill algorithms.
The second highest is the local memory 16, and again by employing
contemporary virtual memory management algorithms a hit ratio of perhaps
95% is obtained for memory references for which a cache miss occurs but a
hit in local memory 16 is found, in an example where the size of the local
memory is about 8-MByte. The net result, from the standpoint of the
processor chip 40, is that perhaps greater than 99% of memory references
(but not I/O references) will be synchronous and will occur in either the
same machine cycle or in three or four machine cycles.
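The figure of greater than 99% follows from combining the two example hit ratios, as the following short Python calculation shows. The function name is hypothetical, and the two ratios of 95% are the illustrative values given above rather than measured results.

```python
def synchronous_fraction(cache_hit, local_hit):
    """Fraction of memory references served by the cache or by local memory.

    cache_hit: probability that a reference hits in the caches 44 and 45.
    local_hit: probability that a cache miss is found in the local memory 16.
    """
    return cache_hit + (1.0 - cache_hit) * local_hit


# 0.95 + 0.05 * 0.95 = 0.9975, so about 0.25% of references go to global memory.
fraction = synchronous_fraction(0.95, 0.95)
global_fraction = 1.0 - fraction
```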
The local memory 16 is accessed from the internal bus by a memory
controller 60 which receives the addresses from address bus 54, and the
address strobes from the control bus 55, and generates separate row and
column addresses, and RAS and CAS controls, for example, if the local
memory 16 employs DRAMs with multiplexed addressing, as is usually the
case. Data is written to or read from the local memory via data bus 53. In
addition, several local registers 61, as well as non-volatile memory 62
such as NVRAMs, and high-speed PROMs 63, as may be used by the operating
system, are accessed by the internal bus; some of this part of the memory
is used only at power-on, some is used by the operating system and may be
almost continuously within the cache 44, and other parts may be within the
non-cached part of the memory map.
External interrupts are applied to the processor 40 by one of the pins of
the control bus 43 or 55 from an interrupt circuit 65 in the CPU module of
FIG. 2. This type of interrupt is voted in the circuit 65, so that before
an interrupt is executed by the processor 40 it is determined whether or
not all three CPUs are presented with the interrupt; to this end, the
circuit 65 receives interrupt pending inputs 66 from the other two CPUs 12
and 13, and sends an interrupt pending signal to the other two CPUs via
line 67, these lines being part of the bus 18 connecting the three CPUs
11, 12 and 13 together. Also, for voting other types of interrupts,
specifically CPU-generated interrupts, the circuit 65 can send an
interrupt request from this CPU to both of the memory modules 14 and 15 by
a line 68 in the bus 35, then receive separate voted-interrupt signals
from the memory modules via lines 69 and 70; both memory modules will
present the external interrupt to be acted upon. An interrupt generated in
some external source such as a keyboard or disk drive on one of the I/O
channels 28-1 or 28-2, for example, will not be presented to the interrupt
pin of the chip 40 from the circuit 65 until each one of the CPUs 11, 12
and 13 is at the same point in the instruction stream, as will be
explained.
Since the processors 40 are clocked by separate clock oscillators 17, there
must be some mechanism for periodically bringing the processors 40 back
into synchronization. Even though the clock oscillators 17 are of the same
nominal frequency, e.g., 16.67-MHz, and the tolerance for these devices is
about 25-ppm (parts per million), the processors can potentially become
many cycles out of phase unless periodically brought back into synch. Of
course, every time an external interrupt occurs the CPUs will be brought
into synch in the sense of being interrupted at the same point in their
instruction stream (due to the interrupt synch mechanism), but this does
not help bring the cycle count into synch. The mechanism of voting memory
references in the memory modules 14 and 15 will bring the CPUs into synch
(in real time), as will be explained. However, some conditions result in
long periods where no memory reference occurs, and so an additional
mechanism is used to introduce stall cycles to bring the processors 40
back into synch. A cycle counter 71 is coupled to the clock 17 and the
control pins of the processor 40 via control bus 43 to count machine
cycles which are Run cycles (but not Stall cycles). This counter 71
includes a count register having a maximum count value selected to
represent the period during which the maximum allowable drift between CPUs
would occur (taking into account the specified tolerance for the crystal
oscillators); when this count register overflows action is initiated to
stall the faster processors until the slower processor or processors catch
up. This counter 71 is reset whenever a synchronization is done by a
memory reference to the memory modules 14 and 15. Also, a refresh counter
72 is employed to perform refresh cycles on the local memory 16, as will
be explained. In addition, a counter 73 counts machine cycles which are Run
cycles but not Stall cycles, like the counter 71 does, but this counter 73
is not reset by a memory reference; the counter 73 is used for interrupt
synchronization as explained below, and to this end produces the output
signals CC-4 and CC-8 to the interrupt synchronization circuit 65.
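The relation between oscillator tolerance and the maximum count of the counter 71 may be illustrated by the following Python sketch. The allowed drift of eight cycles used in the example is a hypothetical value chosen only for illustration, as are the function and class names; the sketch assumes the worst case in which one oscillator runs fast by the full tolerance while another runs slow by the same amount.

```python
def max_run_count(allowed_drift_cycles, tolerance_ppm):
    """Run cycles that may elapse before two CPUs drift apart by the allowed amount.

    Worst case assumed: one oscillator is fast by tolerance_ppm and another is
    slow by tolerance_ppm, so the relative drift is 2 * tolerance_ppm.
    """
    return allowed_drift_cycles * 1_000_000 // (2 * tolerance_ppm)


class CycleCounter:
    """Counts Run cycles only; reset whenever a memory reference resynchronizes."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def tick(self, is_run_cycle):
        """Advance one machine cycle; return True when a resynchronization is due."""
        if is_run_cycle:
            self.count += 1  # Stall cycles are not counted
        if self.count >= self.limit:
            self.count = 0
            return True
        return False

    def memory_reference(self):
        self.count = 0


# With 25 ppm oscillators and a hypothetical allowance of 8 cycles of drift:
limit = max_run_count(allowed_drift_cycles=8, tolerance_ppm=25)
seconds_at_16_67_mhz = limit / 16.67e6  # roughly ten milliseconds of Run cycles

counter = CycleCounter(limit=3)  # tiny limit so the overflow is easy to see
events = [counter.tick(True), counter.tick(False), counter.tick(True), counter.tick(True)]
```

The counter 73 would be modelled in the same way except that it has no reset on a memory reference.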
The processor 40 has a RISC instruction set which does not support
memory-to-memory instructions, but instead only memory-to-register or
register-to-memory instructions (i.e., load or store). It is important to
keep frequently-used data and the currently-executing code in local
memory. Accordingly, a block-transfer operation is provided by a DMA state
machine 74 coupled to the bus interface 56. The processor 40 writes a word
to a register in the DMA circuit 74 to function as a command, and writes
the starting address and length of the block to registers in this circuit
74. In one embodiment, the microprocessor stalls while the DMA circuit
takes over and executes the block transfer, producing the necessary
addresses, commands and strobes on the busses 53-55 and 21. The command
executed by the processor 40 to initiate this block transfer can be a read
from a register in the DMA circuit 74. Since memory management in the Unix
operating system relies upon demand paging, these block transfers will
most often be pages being moved between global and local memory and I/O
traffic. A page is 4-KBytes. Of course, the busses 21, 22 and 23 support
single-word read and write transfers between CPUs and global memory; the
block transfers referred to are only possible between local and global
memory.
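The block-transfer sequence described above may be sketched in Python as follows. The register names, the command encodings and the use of separate starting addresses for global and local memory are assumptions made for the example; the text states only that the processor writes a command, a starting address and a length to registers in the DMA circuit 74, which then moves the block while the processor stalls.

```python
PAGE_BYTES = 4 * 1024  # page size stated in the text
WORD_BYTES = 4         # one 32-bit word per bus transfer


class DMAStateMachine:
    """Toy model; register names and command encodings are hypothetical."""

    def __init__(self, global_memory, local_memory):
        self.global_memory = global_memory  # dict: byte address -> word
        self.local_memory = local_memory    # dict: byte address -> word
        self.global_start = 0               # "register": start in global memory
        self.local_start = 0                # "register": start in local memory
        self.length = 0                     # "register": block length in bytes

    def run(self, command):
        """Copy `length` bytes word by word; return the number of words moved."""
        if command == "global_to_local":
            src, src_base = self.global_memory, self.global_start
            dst, dst_base = self.local_memory, self.local_start
        elif command == "local_to_global":
            src, src_base = self.local_memory, self.local_start
            dst, dst_base = self.global_memory, self.global_start
        else:
            raise ValueError(f"unknown command: {command}")
        words = 0
        for offset in range(0, self.length, WORD_BYTES):
            dst[dst_base + offset] = src.get(src_base + offset, 0)
            words += 1
        return words


# Page-in of one 4-KByte page from global memory to local memory.
global_mem = {0x2000 + off: off for off in range(0, PAGE_BYTES, WORD_BYTES)}
local_mem = {}
dma = DMAStateMachine(global_mem, local_mem)
dma.global_start, dma.local_start, dma.length = 0x2000, 0x0, PAGE_BYTES
words_moved = dma.run("global_to_local")
```

One 4-KByte page corresponds to 1024 word transfers on the 32-bit bus in this model.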
The Processor:
Referring now to FIG. 3, the R2000 or R3000 type of microprocessor 40 of
the example embodiment is shown in more detail. This device includes a
main 32-bit CPU 75 containing thirty-two 32-bit general purpose registers
76, a 32-bit ALU 77, a zero-to-64 bit shifter 78, and a 32-by-32
multiply/divide circuit 79. This CPU also has a program counter 80 along
with associated incrementer and adder. These components are coupled to a
processor bus structure 81, which is coupled to the local data bus 41 and
to an instruction decoder 82 with associated control logic to execute
instructions fetched via data bus 41. The 32-bit local address bus 42 is
driven by a virtual memory management arrangement including a translation
lookaside buffer (TLB) 83 within an on-chip memory-management coprocessor.
Also in the memory-management coprocessor are exception and control
registers 83a and MMU registers 83b. The TLB 83 contains sixty-four
entries to be compared with a virtual address received from the
microprocessor block 75 via virtual address bus 84. The low-order 16-bit
part 85 of the bus 42 is driven by the low-order part of this virtual
address bus 84, and the high-order part is from the bus 84 if the virtual
address is used as the physical address, or is the tag entry from the TLB
83 via output 86 if virtual addressing is used and a hit occurs. The
control lines 43 of the local bus are connected to pipeline and bus
control circuitry 87, driven from the internal bus structure 81 and the
control logic 82.
The microprocessor block 75 in the processor 40 is of the RISC type in that
most instructions execute in one machine cycle, and the instruction set
uses register-to-register and load/store instructions rather than having
complex instructions involving mem