WikiPatents - Community Patent Review
Multiple processor system having shared memory with private-write capability    
United States Patent 4965717
Link to this page: http://www.wikipatents.com/4965717.html
Inventor(s): Cutts, Jr.; Richard W. (Georgetown, TX); Mehta; Nikhil A. (Austin, TX); Jewett; Douglas E. (Austin, TX)
Abstract: A computer system in a fault-tolerant configuration employs multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references by the multiple CPUs are voted by each of the memory modules. A private-write area is included in the shared memory space in the memory modules to allow functions such as software voting of state information unique to CPUs. All CPUs write state information to their private-write area, then all CPUs read all the private-write areas for functions such as detecting differences in interrupt cause or the like.
   














Owner/Assignee     Tandem Computers Incorporated (Cupertino, CA)
Publication Date     October 23, 1990
Application Number     07/283,573
Filing Date     December 13, 1988
US Classification     714/12 711/148 714/3 714/5 714/9 714/10 714/48
Int'l Classification     G06F 015/16
Examiner     Zache; Raulfe B.
Attorney/Law Firm     Arnold, White & Durkee
USPTO Field of Search     364/200 364/300 364/900
   

References

U.S. References

Reference   Inventor        Classification   Date
4794601     Kikuchi         714/758          Dec. 1988
4785453     Chandran        714/25           Nov. 1988
4783731     Miyazaki        711/148          Nov. 1988
4783733     Greig           710/105          Nov. 1988
4779008     Kessels         327/147          Oct. 1988
4683570     Bedard          714/797          Jul. 1987
4672535     Katzman         710/38           Jun. 1987
4667287     Allen           709/234          May 1987
4648035     Fava            711/202          Mar. 1987
4597084     Dynneson        714/805          Jun. 1986
4034347     Probert, Jr.    709/214          Jul. 1977
4015243     Kurpanek        710/107          Mar. 1977
Claims
 


What is claimed is:

1. A computer system comprising:

(a) multiple CPUs, a sequence of instructions executed separately by each one of said CPUs whereby the multiple CPUs are executing the same sequence of instructions,

(b) a common memory coupled to said CPUs and having memory space accessed by all said CPUs,

(c) a private memory space in said memory space of said common memory for storing state information for each CPU of said multiple CPUs, each said private memory space writable only by one CPU,

(d) said state information in each said private memory space for all of said multiple CPUs being readable by all of said multiple CPUs to thereby evaluate said state information for equality by each CPU.

2. A system according to claim 1 wherein said common memory includes a separate private memory space for each one of said multiple CPUs.

3. A system according to claim 1 wherein memory accesses made by said CPUs to said common memory are voted by said common memory before being executed.

4. A system according to claim 3 wherein memory accesses made by said CPUs to said private memory space are voted to compare addresses but not data.

5. A system according to claim 1 wherein said private memory space for all of said multiple CPUs has a given logical address associated with instructions executed by said CPUs, but is translated to a unique address for each private memory space before addressing said common memory.

6. A system according to claim 1 wherein there are three of said CPUs, and wherein said common memory includes a pair of redundant memories, said memory space of said common memory being duplicated in said pair of redundant memories.

7. A computer system having multiple CPUs comprising:

(a) a shared memory coupled to each of said multiple CPUs, the shared memory having memory space accessed by all of said multiple CPUs,

(b) each one of said multiple CPUs also having a separate private-write memory space in said shared memory for storing state information, each said private-write space writable only by one of said multiple CPUs;

(c) each said private-write memory space being readable by all of said multiple CPUs.

8. A system according to claim 7 wherein each one of said multiple CPUs is executing a given sequence of instructions, said given sequence being the same for all of said multiple CPUs.

9. A system according to claim 7 wherein said shared memory votes memory requests made by said multiple CPUs to said shared memory.

10. A system according to claim 9 wherein said shared memory votes write requests made to each of said private-write space by comparing addresses but not data.

11. A system according to claim 6 wherein there are three of said CPUs, and wherein said shared memory includes a pair of redundant memories, said memory space of said shared memory being duplicated in said pair of redundant memories.

12. A method of operating a computer system having multiple processors, comprising the steps of:

(a) storing data by each of said multiple processors in a shared memory having memory space accessed by all of said multiple processors,

(b) also storing information by each one of said multiple processors in a private memory space for each multiple processor writable only by one multiple processor.

13. A method according to claim 12 including the step of executing a given instruction stream in each one of said multiple processors.

14. A method according to claim 12 wherein said step of storing data includes voting memory requests to said shared memory made by said multiple processors.

15. A method according to claim 12 wherein step of storing information in private memory space includes making a write request to said private memory space for all of said multiple processors by each of said multiple processors but executing the write request only in one private memory space for each one of said processors.

16. A method according to claim 12 including the step of evaluating for equality said information from said private memory space by each one of said multiple processors.

17. A method according to claim 12 including the step of reading said information in said private memory space for all multiple processors by each multiple processor.

18. A method according to claim 17 including the step of executing a given instruction stream in each one of said multiple processors, and wherein said step of storing data includes voting memory requests to said shared memory made by said multiple processors.

19. A method according to claim 18 wherein said multiple processors are loosely synchronized upon said step of voting memory requests.

20. A method according to claim 12 wherein there are three of said processors, and wherein said step of storing data in a shared memory includes storing said data in a pair of redundant memories, said memory space of said shared memory being duplicated in said pair of redundant memories.
Description
 


RELATED CASES

This application discloses subject matter also disclosed in copending application Ser. Nos. 282,469, 282,538, 282,540, 282,629, 283,139 and 283,141, filed Dec. 9, 1988, and Ser. No. 283,574, filed Dec. 13, 1988 and further discloses subject matter also disclosed in prior copending application Ser. No. 118,503, filed Nov. 9, 1987, all of said applications to Tandem Computers Incorporated, the assignee of this invention.

BACKGROUND OF THE INVENTION

This invention relates to computer systems, and more particularly to a memory arrangement for a fault-tolerant system using multiple CPUs.

Highly reliable digital processing is achieved in various computer architectures employing redundancy. For example, TMR (triple modular redundancy) systems may employ three CPUs executing the same instruction stream, along with three separate main memory units and separate I/O devices which duplicate functions, so if one of each type of element fails, the system continues to operate. Another fault-tolerant type of system is shown in U.S. Pat. No. 4,228,496, issued to Katzman et al, for "Multiprocessor System", assigned to Tandem Computers Incorporated. Various methods have been used for synchronizing the units in redundant systems; for example, in said prior application Ser. No. 118,503, filed Nov. 9, 1987, by R. W. Horst, for "Method and Apparatus for Synchronizing a Plurality of Processors", also assigned to Tandem Computers Incorporated, a method of "loose" synchronizing is disclosed, in contrast to other systems which have employed a lock-step synchronization using a single clock, as shown in U.S. Pat. No. 4,453,215 for "Central Processing Apparatus for Fault-Tolerant Computing", assigned to Stratus Computer, Inc. A technique called "synchronization voting" is disclosed by Davies & Wakerly in "Synchronization and Matching in Redundant Systems", IEEE Transactions on Computers, Jun. 1978, pp. 531-539. A method for interrupt synchronization in redundant fault-tolerant systems is disclosed by Yondea et al in Proceedings of 15th Annual Symposium on Fault-Tolerant Computing, Jun. 1985, pp. 246-251, "Implementation of Interrupt Handler for Loosely Synchronized TMR Systems". U.S. Pat. No. 4,644,498 for "Fault-Tolerant Real Time Clock" discloses a triple modular redundant clock configuration for use in a TMR computer system. U.S. Pat. No. 4,733,353 for "Frame Synchronization of Multiply Redundant Computers" discloses a synchronization method using separately-clocked CPUs which are periodically synchronized by executing a synch frame.

As high-performance microprocessor devices have become available, using higher clock speeds and providing greater capabilities, such as the Intel 80386 and Motorola 68030 chips operating at 25-MHz clock rates, and as other elements of computer systems such as memory, disk drives, and the like have correspondingly become less expensive and of greater capability, the performance and cost of high-reliability processors has been required to follow the same trends. In addition, standardization on a few operating systems in the computer industry in general has vastly increased the availability of applications software, so a similar demand is made on the field of high-reliability systems; i.e., a standard operating system must be available.

It is therefore the principal object of this invention to provide an improved high-reliability computer system, particularly of the fault-tolerant type. Another object is to provide an improved redundant, fault-tolerant type of computing system, and one in which high performance and reduced cost are both possible; particularly, it is preferable that the improved system avoid the performance burdens usually associated with highly redundant systems. A further object is to provide a high-reliability computer system in which the performance, measured in reliability as well as speed and software compatibility, is improved but yet at a cost comparable to other alternatives of lower performance. An additional object is to provide a high-reliability computer system which is capable of executing an operating system which uses virtual memory management with demand paging, and having protected (supervisory or "kernel") mode; particularly an operating system also permitting execution of multiple processes; all at a high level of performance.

SUMMARY OF THE INVENTION

In accordance with one embodiment of the invention, a computer system employs multiple identical CPUs executing the same instruction stream, and has multiple identical memory modules storing duplicates of the same data. In order to avoid imposing the performance burden of fault-tolerant operation on the CPUs themselves, and imposing the expense, complexity and timing problems of fault-tolerant clocking, the multiple CPUs each have their own separate and independent clocks, but are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of others until all execute the function simultaneously; interrupts are also synchronized to the CPUs, ensuring that the CPUs execute the interrupt at the same point in their instruction stream. The multiple asynchronous memory references by the separate CPUs are voted at separate ports of each of the memory modules at the time of the memory request, but read data is not voted when returned to the CPUs.

Although the memory modules are essentially duplicates of one another, storing the same data, there is still a need in some situations to be able to store data separately by each CPU in a manner such that the data is readable by all CPUs. Of course, the CPUs of the example embodiment have local memory (not in the memory modules but instead on the CPU modules) but this local memory is not accessible by the other CPUs. Thus, an area of private-write memory is included in the shared memory area, so that unique state information can be written by each CPU then read by the others to do a compare operation, for example. The private write is accessed in a manner such that the instruction streams of the CPUs are still identical, and addresses used are identical, so the integrity of the identical code stream is maintained. Voting of data is suspended when a private write operation is detected by the memory modules, since this data may differ, but the addresses and commands are still voted. The area used for private write may be changed, or eliminated, under control of the instruction stream. Accordingly, the ability to compare unique data is provided in a flexible manner, without bypassing the synchronization and voting mechanisms, and without disturbing the identical nature of the code executed by the multiple CPUs.
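
The private-write behavior summarized above may be illustrated by a minimal sketch, offered only as a toy software model and not as the circuitry of the embodiment. The class name SharedMemoryModule, the function software_vote, the CPU labels and the dictionary layout are all hypothetical names introduced for illustration. The sketch assumes three CPUs that issue the same address with possibly different data; the address is voted, the data is not, and each CPU's data lands only in its own slot.

```python
from typing import Dict, List, Tuple

CPU_IDS = ("A", "B", "C")  # three CPUs, as in the example embodiment


class SharedMemoryModule:
    """Toy model of one memory module with a private-write area (hypothetical layout)."""

    def __init__(self) -> None:
        # One private-write slot per CPU; only the owning CPU's data lands in it.
        self.private: Dict[str, int] = {cpu: 0 for cpu in CPU_IDS}

    def private_write(self, requests: Dict[str, Tuple[int, int]]) -> None:
        """requests maps cpu_id -> (address, data). Addresses are voted; data is not."""
        addresses = {addr for addr, _ in requests.values()}
        if len(addresses) != 1:
            raise ValueError("address vote failed: CPUs issued different addresses")
        for cpu, (_, data) in requests.items():
            self.private[cpu] = data  # each CPU's data goes only to its own slot

    def read_all(self) -> List[int]:
        """Every CPU may read every private-write slot."""
        return [self.private[cpu] for cpu in CPU_IDS]


def software_vote(values: List[int]) -> bool:
    """Return True when all CPUs reported the same state information."""
    return len(set(values)) == 1
```

In use, each CPU would write its own view of, say, an interrupt cause, then read all three slots and call software_vote to detect a disagreement.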

BRIEF DESCRIPTION OF THE DRAWINGS

The features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as other features and advantages thereof, may best be understood by reference to the detailed description of a specific embodiment which follows, when read in conjunction with the accompanying drawings, wherein:

FIG. 1 is an electrical diagram in block form of a computer system according to one embodiment of the invention;

FIG. 2 is an electrical schematic diagram in block form of one of the CPUs of the system of FIG. 1;

FIG. 3 is an electrical schematic diagram in block form of one of the microprocessor chips used in the CPU of FIG. 2;

FIGS. 4 and 5 are timing diagrams showing events occurring in the CPU of FIGS. 2 and 3 as a function of time;

FIG. 6 is an electrical schematic diagram in block form of one of the memory modules in the computer system of FIG. 1;

FIG. 7 is a timing diagram showing events occurring on the CPU to memory busses in the system of FIG. 1;

FIG. 8 is an electrical schematic diagram in block form of one of the I/O processors in the computer system of FIG. 1;

FIG. 9 is a timing diagram showing events vs. time for the transfer protocol between a memory module and an I/O processor in the system of FIG. 1;

FIG. 10 is a timing diagram showing events vs. time for execution of instructions in the CPUs of FIGS. 1, 2 and 3;

FIG. 10a is a detail view of a part of the diagram of FIG. 10;

FIGS. 11 and 12 are timing diagrams similar to FIG. 10 showing events vs. time for execution of instructions in the CPUs of FIGS. 1, 2 and 3;

FIG. 13 is an electrical schematic diagram in block form of the interrupt synchronization circuit used in the CPU of FIG. 2;

FIGS. 14, 15, 16 and 17 are timing diagrams like FIGS. 10 or 11 showing events vs. time for execution of instructions in the CPUs of FIGS. 1, 2 and 3 when an interrupt occurs, illustrating various scenarios;

FIG. 18 is a physical memory map of the memories used in the system of FIGS. 1, 2, 3 and 6;

FIG. 19 is a virtual memory map of the CPUs used in the system of FIGS. 1, 2, 3 and 6;

FIG. 20 is a diagram of the format of the virtual address and the TLB entries in the microprocessor chips in the CPU according to FIG. 2 or 3;

FIG. 21 is an illustration of the private memory locations in the memory map of the global memory modules in the system of FIGS. 1, 2, 3 and 6; and

FIG. 22 is an electrical diagram of a fault-tolerant power supply used with the system of the invention according to one embodiment.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENT

With reference to FIG. 1, a computer system using features of the invention is shown in one embodiment having three identical processors 11, 12 and 13, referred to as CPU-A, CPU-B and CPU-C, which operate as one logical processor, all three typically executing the same instruction stream; the only time the three processors are not executing the same instruction stream is in such operations as power-up self test, diagnostics and the like. The three processors are coupled to two memory modules 14 and 15, referred to as Memory-#1 and Memory-#2, each memory storing the same data in the same address space. In a preferred embodiment, each one of the processors 11, 12 and 13 contains its own local memory 16, as well, accessible only by the processor containing this memory.

Each one of the processors 11, 12 and 13, as well as each one of the memory modules 14 and 15, has its own separate clock oscillator 17; in this embodiment, the processors are not run in "lock-step", but instead are loosely synchronized by a method such as is set forth in the above-mentioned application Ser. No. 118,503, i.e., using events such as external memory references to bring the CPUs into synchronization. External interrupts are synchronized among the three CPUs by a technique employing a set of busses 18 for coupling the interrupt requests and status from each of the processors to the other two; each one of the processors CPU-A, CPU-B and CPU-C is responsive to the three interrupt requests, its own and the two received from the other CPUs, to present an interrupt to the CPUs at the same point in the execution stream. The memory modules 14 and 15 vote the memory references, and allow a memory reference to proceed only when all three CPUs have made the same request (with provision for faults). In this manner, the processors are synchronized at the time of external events (memory references), resulting in the processors typically executing the same instruction stream, in the same sequence, but not necessarily during aligned clock cycles in the time between synchronization events. In addition, external interrupts are synchronized to be executed at the same point in the instruction stream of each CPU.
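
The loose synchronization on memory references described above may be sketched as follows. This is a simplified software model under stated assumptions, not the hardware of the memory modules: the class VotingPort, the (command, address) request encoding, and the use of a simple majority as the "provision for faults" are all assumptions made for illustration. Timeouts for a CPU that never makes its request are not modeled.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Request = Tuple[str, int]  # (command, address); a hypothetical encoding


class VotingPort:
    """Toy model of a memory module collecting one request per CPU before voting."""

    def __init__(self, cpu_ids=("A", "B", "C")) -> None:
        self.cpu_ids = tuple(cpu_ids)
        self.pending: Dict[str, Request] = {}

    def submit(self, cpu: str, request: Request) -> Optional[Request]:
        """Record a request. Return None while the CPU must stall, else the voted request."""
        self.pending[cpu] = request
        if len(self.pending) < len(self.cpu_ids):
            return None  # CPUs that arrive early wait for the slower ones
        winner, votes = Counter(self.pending.values()).most_common(1)[0]
        self.pending.clear()
        if votes * 2 <= len(self.cpu_ids):
            # Assumed fault policy: without a majority the reference does not proceed.
            raise RuntimeError("vote failed: no majority among CPU requests")
        return winner
```

The first two callers of submit receive None, modeling a stalled CPU; the reference proceeds only when the last CPU arrives, which is the event that brings the three instruction streams back into step.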

The CPU-A processor 11 is connected to the Memory-#1 module 14 and to the Memory-#2 module 15 by a bus 21; likewise the CPU-B is connected to the modules 14 and 15 by a bus 22, and the CPU-C is connected to the memory modules by a bus 23. These busses 21, 22, 23 each include a 32-bit multiplexed address/data bus, a command bus, and control lines for address and data strobes. The CPUs have control of these busses 21, 22 and 23, so there is no arbitration, or bus-request and bus-grant.

Each one of the memory modules 14 and 15 is separately coupled to a respective input/output bus 24 or 25, and each of these busses is coupled to two (or more) input/output processors 26 and 27. The system can have multiple I/O processors as needed to accommodate the I/O devices needed for the particular system configuration. Each one of the input/output processors 26 and 27 is connected to a bus 28, which may be of a standard configuration such as a VMEbus.TM., and each bus 28 is connected to one or more bus interface modules 29 for interface with a standard I/O controller 30. Each bus interface module 29 is connected to two of the busses 28, so failure of one I/O processor 26 or 27, or failure of one of the bus channels 28, can be tolerated. The I/O processors 26 and 27 can be addressed by the CPUs 11, 12 and 13 through the memory modules 14 and 15, and can signal an interrupt to the CPUs via the memory modules. Disk drives, terminals with CRT screens and keyboards, and network adapters, are typical peripheral devices operated by the controllers 30. The controllers 30 may make DMA-type reference to the memory modules 14 and 15 to transfer blocks of data. Each one of the I/O processors 26, 27, etc., has certain individual lines directly connected to each one of the memory modules for bus request, bus grant, etc.; these point-to-point connections are called "radials" and are included in a group of radial lines 31.

A system status bus 32 is individually connected to each one of the CPUs 11, 12 and 13, to each memory module 14 and 15, and to each of the I/O processors 26 and 27, for the purpose of providing information on the status of each element. This status bus provides information about which of the CPUs, memory modules and I/O processors is currently in the system and operating properly.

An acknowledge/status bus 33 connecting the three CPUs and two memory modules includes individual lines by which the modules 14 and 15 send acknowledge signals to the CPUs when memory requests are made by the CPUs, and at the same time a status field is sent to report on the status of the command and whether it executed correctly. The memory modules not only check parity on data read from or written to the global memory, but also check parity on data passing through the memory modules to or from the I/O busses 24 and 25, as well as checking the validity of commands. It is through the status lines in bus 33 that these checks are reported to the CPUs 11, 12 and 13, so if errors occur a fault routine can be entered to isolate a faulty component.

Even though both memory modules 14 and 15 are storing the same data in global memory, and operating to perform every memory reference in duplicate, one of these memory modules is designated as primary and the other as back-up, at any given time. Memory write operations are executed by both memory modules so both are kept current, and also a memory read operation is executed by both, but only the primary module actually loads the read-data back onto the busses 21, 22 and 23, and only the primary memory module controls the arbitration for multi-master busses 24 and 25. To keep the primary and back-up modules executing the same operations, a bus 34 conveys control information from primary to back-up. Either module can assume the role of primary at boot-up, and the roles can switch during operation under software control; the roles can also switch when selected error conditions are detected by the CPUs or other error-responsive parts of the system.
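
The primary and back-up roles described above may be modeled by a short sketch. The classes MemoryModule and MemoryPair and the method swap_roles are hypothetical names; the sketch assumes integer data words and ignores the control bus 34 and the I/O bus arbitration, showing only that both modules execute every operation while a single module drives read data.

```python
from typing import Dict, List, Optional


class MemoryModule:
    """Toy model of one global memory module."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cells: Dict[int, int] = {}
        self.is_primary = False

    def write(self, address: int, data: int) -> None:
        self.cells[address] = data  # writes are executed by both modules

    def read(self, address: int) -> Optional[int]:
        value = self.cells.get(address, 0)  # the read is executed by both modules
        return value if self.is_primary else None  # only the primary drives the bus


class MemoryPair:
    """Two modules holding duplicate data, one designated primary at any given time."""

    def __init__(self) -> None:
        self.modules: List[MemoryModule] = [MemoryModule("Memory-#1"), MemoryModule("Memory-#2")]
        self.modules[0].is_primary = True  # either module could be chosen at boot-up

    def write(self, address: int, data: int) -> None:
        for module in self.modules:
            module.write(address, data)

    def read(self, address: int) -> int:
        driven = [v for v in (m.read(address) for m in self.modules) if v is not None]
        if len(driven) != 1:
            raise RuntimeError("exactly one module should drive read data")
        return driven[0]

    def swap_roles(self) -> None:
        """Exchange primary and back-up, as software might do during operation."""
        for module in self.modules:
            module.is_primary = not module.is_primary
```

Because both modules hold the same data, a read returns the same value before and after swap_roles; only the module that supplies it changes.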

Certain interrupts generated in the CPUs are also voted by the memory modules 14 and 15. When the CPUs encounter such an interrupt condition (and are not stalled), they signal an interrupt request to the memory modules by individual lines in an interrupt bus 35, so the three interrupt requests from the three CPUs can be voted. When all interrupts have been voted, the memory modules each send a voted-interrupt signal to the three CPUs via bus 35. This voting of interrupts also functions to check on the operation of the CPUs. The three CPUs synchronize the voted interrupt signal via the inter-CPU bus 18 and present the interrupt to the processors at a common point in the instruction stream. This interrupt synchronization is accomplished without stalling any of the CPUs.

CPU Module

Referring now to FIG. 2, one of the processors 11, 12 or 13 is shown in more detail. All three CPU modules are of the same construction in a preferred embodiment, so only CPU-A will be described here. In order to keep costs within a competitive range, and to provide ready access to already-developed software and operating systems, it is preferred to use a commercially-available microprocessor chip, and any one of a number of devices may be chosen. The RISC (reduced instruction set) architecture has some advantage in implementing the loose synchronization as will be described, but more-conventional CISC (complex instruction set) microprocessors such as Motorola 68030 devices or Intel 80386 devices (available in 20-MHz and 25-MHz speeds) could be used. High-speed 32-bit RISC microprocessor devices are available from several sources in three basic types: Motorola produces a device as part number 88000, MIPS Computer Systems, Inc. and others produce a chip set referred to as the MIPS type, and Sun Microsystems has announced a so-called SPARC.TM. type (scalable processor architecture). Cypress Semiconductor of San Jose, Calif., for example, manufactures a microprocessor referred to as part number CY7C601 providing 20-MIPS (million instructions per second), clocked at 33-MHz, supporting the SPARC standard, and Fujitsu manufactures a CMOS RISC microprocessor, part number S-25, also supporting the SPARC standard.

The CPU board or module in the illustrative embodiment, used as an example, employs a microprocessor chip 40 which is in this case an R2000 device designed by MIPS Computer Systems, Inc., and also manufactured by Integrated Device Technology, Inc. The R2000 device is a 32-bit processor using RISC architecture to provide high performance, e.g., 12-MIPS at 16.67-MHz clock rate. Higher-speed versions of this device may be used instead, such as the R3000 that provides 20-MIPS at 25-MHz clock rate. The processor 40 also has a co-processor used for memory management, including a translation lookaside buffer to cache translations of logical to physical addresses. The processor 40 is coupled to a local bus having a data bus 41, an address bus 42 and a control bus 43. Separate instruction and data cache memories 44 and 45 are coupled to this local bus. These caches are each of 64K-byte size, for example, and are accessed within a single clock cycle of the processor 40. A numeric or floating point co-processor 46 is coupled to the local bus if additional performance is needed for these types of calculations; this numeric processor device is also commercially available from MIPS Computer Systems as part number R2010. The local bus 41, 42, 43, is coupled to an internal bus structure through a write buffer 50 and a read buffer 51. The write buffer is a commercially available device, part number R2020, and functions to allow the processor 40 to continue to execute Run cycles after storing data and address in the write buffer 50 for a write operation, rather than having to execute stall cycles while the write is completing.

In addition to the path through the write buffer 50, a path is provided to allow the processor 40 to execute write operations bypassing the write buffer 50. This path, a write buffer bypass 52, allows the processor, under software selection, to perform synchronous writes. If the write buffer bypass 52 is enabled (write buffer 50 not enabled) and the processor executes a write then the processor will stall until the write completes. In contrast, when writes are executed with the write buffer bypass 52 disabled the processor will not stall because data is written into the write buffer 50 (unless the write buffer is full). If the write buffer 50 is enabled when the processor 40 performs a write operation, the write buffer 50 captures the output data from bus 41 and the address from bus 42, as well as controls from bus 43. The write buffer 50 can hold up to four such data-address sets while it waits to pass the data on to the main memory. The write buffer runs synchronously with the clock 17 of the processor chip 40, so the processor-to-buffer transfers are synchronous and at the machine cycle rate of the processor. The write buffer 50 signals the processor if it is full and unable to accept data. Read operations by the processor 40 are checked against the addresses contained in the four-deep write buffer 50, so if a read is attempted to one of the data words waiting in the write buffer to be written to memory 16 or to global memory, the read is stalled until the write is completed.
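
The behavior of the four-deep write buffer described above may be illustrated by a minimal software model. The class WriteBuffer and its method names are hypothetical; the sketch assumes a simple first-in, first-out drain order and models only the three conditions stated in the text: posting a write, signalling full, and stalling a read whose address matches a pending write. The bypass path 52 is not modeled.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WriteBuffer:
    """Toy model of a four-entry posted-write buffer."""

    DEPTH = 4  # the buffer holds up to four data-address sets

    def __init__(self) -> None:
        self.entries: Deque[Tuple[int, int]] = deque()  # (address, data) pairs

    def is_full(self) -> bool:
        return len(self.entries) >= self.DEPTH

    def post_write(self, address: int, data: int) -> bool:
        """Return True if the write was captured, False if the processor must stall."""
        if self.is_full():
            return False
        self.entries.append((address, data))
        return True

    def read_must_stall(self, address: int) -> bool:
        """A read stalls while a write to the same address is still pending."""
        return any(pending == address for pending, _ in self.entries)

    def drain_one(self, memory: Dict[int, int]) -> None:
        """Complete the oldest pending write (assumed FIFO order)."""
        if self.entries:
            address, data = self.entries.popleft()
            memory[address] = data
```

A processor model would call post_write on each store and continue running when it returns True, and would call read_must_stall before each load so that it never observes stale data.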

The write and read buffers 50 and 51 are coupled to an internal bus structure having a data bus 53, an address bus 54 and a control bus 55. The local memory 16 is accessed by this internal bus, and a bus interface 56 coupled to the internal bus is used to access the system bus 21 (or bus 22 or 23 for the other CPUs). The separate data and address busses 53 and 54 of the internal bus (as derived from busses 41 and 42 of the local bus) are converted to a multiplexed address/data bus 57 in the system bus 21, and the command and control lines are correspondingly converted to command lines 58 and control lines 59 in this external bus.

The bus interface unit 56 also receives the acknowledge/status lines 33 from the memory modules 14 and 15. In these lines 33, separate status lines 33-1 or 33-2 are coupled from each of the modules 14 and 15, so the response from both memory modules can be evaluated upon the event of a transfer (read or write) between CPUs and global memory, as will be explained.

The local memory 16, in one embodiment, comprises about 8-MByte of RAM which can be accessed in about three or four of the machine cycles of processor 40, and this access is synchronous with the clock 17 of this CPU. The memory access time to the modules 14 and 15, by contrast, is much greater than that to local memory, and this access to the memory modules 14 and 15 is asynchronous and subject to the synchronization overhead imposed by waiting for all CPUs to make the request and then voting. For comparison, access to a typical commercially-available disk memory through the I/O processors 26, 27 and 29 is measured in milliseconds, i.e., considerably slower than access to the modules 14 and 15. Thus, there is a hierarchy of memory access by the CPU chip 40, the highest being the instruction and data caches 44 and 45 which will provide a hit ratio of perhaps 95% when using 64-KByte cache size and suitable fill algorithms. The second highest is the local memory 16, and again by employing contemporary virtual memory management algorithms a hit ratio of perhaps 95% is obtained for memory references for which a cache miss occurs but a hit in local memory 16 is found, in an example where the size of the local memory is about 8-MByte. The net result, from the standpoint of the processor chip 40, is that perhaps greater than 99% of memory references (but not I/O references) will be synchronous and will occur in either the same machine cycle or in three or four machine cycles.
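
The "greater than 99%" figure follows from combining the two hit ratios quoted above. The short calculation below is a worked example only; the function name is hypothetical, and the two 95% values are the approximate figures from the text, not measured results.

```python
def synchronous_fraction(cache_hit, local_hit_given_miss):
    """Fraction of memory references served by the caches or by local memory."""
    # References that miss the cache (1 - cache_hit) may still hit in local memory.
    return cache_hit + (1.0 - cache_hit) * local_hit_given_miss


# Approximate hit ratios quoted in the text.
fraction = synchronous_fraction(0.95, 0.95)
print(f"{fraction:.2%} of memory references are synchronous")
```

With both ratios at 95%, only 0.05 x 0.05 = 0.25% of references fall through to the memory modules 14 and 15, leaving about 99.75% served synchronously.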

The local memory 16 is accessed from the internal bus by a memory controller 60 which receives the addresses from address bus 54, and the address strobes from the control bus 55, and generates separate row and column addresses, and RAS and CAS controls, for example, if the local memory 16 employs DRAMs with multiplexed addressing, as is usually the case. Data is written to or read from the local memory via data bus 53. In addition, several local registers 61, as well as non-volatile memory 62 such as NVRAMs, and high-speed PROMs 63, as may be used by the operating system, are accessed by the internal bus; some of this part of the memory is used only at power-on, some is used by the operating system and may be almost continuously within the cache 44, and other parts may be within the non-cached part of the memory map.

External interrupts are applied to the processor 40 by one of the pins of the control bus 43 or 55 from an interrupt circuit 65 in the CPU module of FIG. 2. This type of interrupt is voted in the circuit 65, so that before an interrupt is executed by the processor 40 it is determined whether or not all three CPUs are presented with the interrupt; to this end, the circuit 65 receives interrupt pending inputs 66 from the other two CPUs 12 and 13, and sends an interrupt pending signal to the other two CPUs via line 67, these lines being part of the bus 18 connecting the three CPUs 11, 12 and 13 together. Also, for voting other types of interrupts, specifically CPU-generated interrupts, the circuit 65 can send an interrupt request from this CPU to both of the memory modules 14 and 15 by a line 68 in the bus 35, then receive separate voted-interrupt signals from the memory modules via lines 69 and 70; both memory modules will present the external interrupt to be acted upon. An interrupt generated in some external source such as a keyboard or disk drive on one of the I/O channels 28, for example, will not be presented to the interrupt pin of the chip 40 from the circuit 65 until each one of the CPUs 11, 12 and 13 is at the same point in the instruction stream, as will be explained.

Since the processors 40 are clocked by separate clock oscillators 17, there must be some mechanism for periodically bringing the processors 40 back into synchronization. Even though the clock oscillators 17 are of the same nominal frequency, e.g., 16.67-MHz, and the tolerance for these devices is about 25-ppm (parts per million), the processors can potentially become many cycles out of phase unless periodically brought back into synch. Of course, every time an external interrupt occurs the CPUs will be brought into synch in the sense of being interrupted at the same point in their instruction stream (due to the interrupt synch mechanism), but this does not help bring the cycle count in synch. The mechanism of voting memory references in the memory modules 14 and 15 will bring the CPUs into synch (in real time), as will be explained. However, some conditions result in long periods where no memory reference occurs, and so an additional mechanism is used to introduce stall cycles to bring the processors 40 back into synch. A cycle counter 71 is coupled to the clock 17 and the control pins of the processor 40 via control bus 43 to count machine cycles which are Run cycles (but not Stall cycles). This counter 71 includes a count register having a maximum count value selected to represent the period during which the maximum allowable drift between CPUs would occur (taking into account the specified tolerance for the crystal oscillators); when this count register overflows, action is initiated to stall the faster processors until the slower processor or processors catch up. This counter 71 is reset whenever a synchronization is done by a memory reference to the memory modules 14 and 15. Also, a refresh counter 72 is employed to perform refresh cycles on the local memory 16, as will be explained. In addition, a counter 73 counts machine cycles which are Run cycles but not Stall cycles, as the counter 71 does, but this counter 73 is not reset by a memory reference; the counter 73 is used for interrupt synchronization as explained below, and to this end produces the output signals CC-4 and CC-8 to the interrupt synchronization circuit 65.
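
The choice of maximum count for the cycle counter 71 can be illustrated with a short calculation and a toy counter. This is a sketch under assumptions: the maximum allowable drift of 8 cycles is a hypothetical value chosen for the example (the text does not give one here), the names are invented, and clearing the counter on overflow is assumed rather than stated. The 25-ppm tolerance is from the text; the worst case assumes one oscillator runs 25 ppm fast while another runs 25 ppm slow, for a relative drift of 50 ppm.

```python
def cycles_until_drift(max_drift_cycles, tolerance_ppm):
    """Run cycles after which two oscillators may differ by max_drift_cycles in the worst case."""
    # Worst case: one oscillator is +tolerance_ppm and the other is -tolerance_ppm.
    worst_case_ppm = 2 * tolerance_ppm
    return max_drift_cycles * 1_000_000 // worst_case_ppm


class CycleCounter:
    """Toy model of a counter that advances on Run cycles only (names are hypothetical)."""

    def __init__(self, max_count):
        self.max_count = max_count
        self.count = 0

    def tick(self, is_run_cycle):
        """Advance on a Run cycle; return True when the count register overflows."""
        if not is_run_cycle:
            return False  # Stall cycles are not counted
        self.count += 1
        if self.count >= self.max_count:
            self.count = 0  # assumption: the counter restarts once resynchronization begins
            return True
        return False

    def reset_on_memory_sync(self):
        """A voted memory reference resynchronizes the CPUs, so the count starts over."""
        self.count = 0


# Hypothetical example: allow at most 8 cycles of drift at 25-ppm tolerance.
max_count = cycles_until_drift(8, 25)
counter = CycleCounter(max_count)
```

Under these assumed numbers the counter would overflow after 160,000 Run cycles, which at 16.67-MHz is a little under 10 milliseconds without an intervening memory reference.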

The processor 40 has a RISC instruction set which does not support memory-to-memory instructions, but instead only memory-to-register or register-to-memory instructions (i.e., load or store). It is important to keep frequently-used data and the currently-executing code in local memory. Accordingly, a block-transfer operation is provided by a DMA state machine 74 coupled to the bus interface 56. The processor 40 writes a word to a register in the DMA circuit 74 to function as a command, and writes the starting address and length of the block to registers in this circuit 74. In one embodiment, the microprocessor stalls while the DMA circuit takes over and executes the block transfer, producing the necessary addresses, commands and strobes on the busses 53-55 and 21. The command executed by the processor 40 to initiate this block transfer can be a read from a register in the DMA circuit 74. Since memory management in the Unix operating system relies upon demand paging, these block transfers will most often be pages being moved between global and local memory and I/O traffic. A page is 4-KBytes. Of course, the busses 21, 22 and 23 support single-word read and write transfers between CPUs and global memory; the block transfers referred to are only possible between local and global memory.
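
The register-driven block transfer described above might be modeled as follows. This is a minimal sketch, not the DMA circuit 74 itself: the register names (command, local_addr, global_addr, length_words), the two command codes, the use of separate local and global starting addresses, and the word-indexed lists standing in for memory are all assumptions made for illustration. The 4-KByte page size is from the text, and a 32-bit word is assumed to be 4 bytes.

```python
PAGE_BYTES = 4096   # page size given in the text
WORD_BYTES = 4      # assumption: one 32-bit word is 4 bytes


class DmaStateMachine:
    """Toy model of a block transfer between local and global memory (names are hypothetical)."""

    TO_LOCAL = 0    # hypothetical command code: global memory -> local memory
    TO_GLOBAL = 1   # hypothetical command code: local memory -> global memory

    def __init__(self, local_mem, global_mem):
        self.local_mem = local_mem     # word-indexed list standing in for local memory
        self.global_mem = global_mem   # word-indexed list standing in for global memory
        self.command = self.TO_LOCAL
        self.local_addr = 0            # starting word index in local memory
        self.global_addr = 0           # starting word index in global memory
        self.length_words = 0          # block length in words

    def start(self):
        """Models the register read that initiates the transfer; returns the words moved."""
        for i in range(self.length_words):
            if self.command == self.TO_LOCAL:
                self.local_mem[self.local_addr + i] = self.global_mem[self.global_addr + i]
            else:
                self.global_mem[self.global_addr + i] = self.local_mem[self.local_addr + i]
        return self.length_words


# Usage: page in one 4-KByte page from global memory to the start of local memory.
global_mem = list(range(4096))
local_mem = [0] * 2048
dma = DmaStateMachine(local_mem, global_mem)
dma.command = DmaStateMachine.TO_LOCAL
dma.global_addr = 1024
dma.local_addr = 0
dma.length_words = PAGE_BYTES // WORD_BYTES
words_moved = dma.start()
```

In this model the processor does nothing between the call to start() and its return, which loosely corresponds to the embodiment in which the microprocessor stalls while the transfer runs.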

The Processor

Referring now to FIG. 3, the R2000 or R3000 type of microprocessor 40 of the example embodiment is shown in more detail. This device includes a main 32-bit CPU 75 containing thirty-two 32-bit general purpose registers 76, a 32-bit ALU 77, a zero-to-64 bit shifter 78, and a 32-by-32 multiply/divide circuit 79. This CPU also has a program counter 80 along with associated incrementer and adder. These components are coupled to a processor bus structure 81, which is coupled to the local data bus 41 and to an instruction decoder 82 with associated control logic to execute instructions fetched via data bus 41. The 32-bit local address bus 42 is driven by a virtual memory management arrangement including a translation lookaside buffer (TLB) 83 within an on-chip memory-management coprocessor. The TLB 83 contains sixty-four entries to be compared with a virtual address received from the microprocessor block 75 via virtual address bus 84. The low-order 16-bit part 85 of the bus 42 is driven by the low-order part of this virtual address bus 84, and the high-order part is from the bus 84 if the virtual address is used as the physical address, or is the tag entry from the TLB 83 via output 86 if virtual addressing is used and a hit occurs. The control lines 43 of the local bus are connected to pipeline and bus control circuitry 87, driven from the internal bus structure 81 and the control logic 82.
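
The way the local address bus 42 is assembled from the virtual address and the TLB 83 can be sketched in a few lines. This is an illustration under assumptions: the function and exception names are hypothetical, the TLB is modeled as a plain dictionary from the high-order virtual address bits to the high-order physical address bits, and raising an exception on a miss is a stand-in for whatever the processor actually does. The 64-entry size and the 16-bit low-order part are taken from the text as written.

```python
OFFSET_BITS = 16                       # low-order part 85 of bus 42, per the text
OFFSET_MASK = (1 << OFFSET_BITS) - 1
TLB_ENTRIES = 64                       # number of TLB entries, per the text


class TlbMiss(Exception):
    """Raised in this model when no TLB entry matches (hypothetical handling)."""


def translate(virtual_address, tlb, mapped=True):
    """Form a 32-bit physical address from a 32-bit virtual address.

    tlb maps the high-order virtual bits to the high-order physical bits.
    mapped=False models the case where the virtual address is used as the physical address.
    """
    if len(tlb) > TLB_ENTRIES:
        raise ValueError("model holds at most 64 entries")
    offset = virtual_address & OFFSET_MASK   # low-order bits pass straight through
    if not mapped:
        return virtual_address & 0xFFFFFFFF
    high = virtual_address >> OFFSET_BITS
    if high not in tlb:
        raise TlbMiss(hex(virtual_address))
    return (tlb[high] << OFFSET_BITS) | offset


# Usage with one hypothetical TLB entry.
tlb = {0x0040: 0x1234}
physical = translate(0x0040ABCD, tlb)
```

Here the low-order bits 0xABCD are carried through unchanged while the high-order bits are replaced by the matching TLB entry.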

The microprocessor block 75 in the proc