WikiPatents - Community Patent Review
Coupled memory multiprocessor computer system including cache coherency management protocols    
United States Patent 5303362
Link to this page: http://www.wikipatents.com/5303362.html
Inventor(s): Butts, Jr.; H. Bruce (Redmond, WA); Orbits; David A. (Redmond, WA); Abramson; Kenneth D. (Seattle, WA)
Abstract: A coherent coupled memory multiprocessor computer system that includes a plurality of processor modules (11a, 11b . . . ), a global interconnect (13), an optional global memory (15) and an input/output subsystem (17,19) is disclosed. Each processor module (11a, 11b . . . ) includes: a processor (21); cache memory (23); cache memory controller logic (22); coupled memory (25); coupled memory control logic (24); and a global interconnect interface (27). Coupled memory (25) associated with a specific processor (21), like global memory (15), is available to other processors (21). Coherency between data stored in coupled (or global) memory and similar data replicated in cache memory is maintained by either a write-through or a write-back cache coherency management protocol. The selected protocol is implemented in hardware, i.e., logic, form, preferably incorporated in the coupled memory control logic (24) and in the cache memory controller logic (22). In the write-through protocol, processor writes are propagated directly to coupled memory while invalidating corresponding data in cache memory. In contrast, the write-back protocol allows data owned by a cache to be continuously updated until requested by another processor, at which time the coupled memory is updated and other cache blocks containing the same data are invalidated.

Owner/Assignee: Digital Equipment Corporation (Maynard, MA)
Publication Date: April 12, 1994
Application Number: 07/673,766
Filing Date: March 20, 1991
US Classification: 711/121
Int'l Classification: G06F 012/06
Examiner: Dixon; Joseph L.
Assistant Examiner: Nguyen; Hiep T.
Attorney/Law Firm: Christensen, O'Connor, Johnson & Kindness
USPTO Field of Search: 364/200 MS File; 364/900 MS File; 395/400; 395/425
Claims
 


The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:

1. A coherent coupled memory multiprocessor computer system comprising:

a global interconnect for interconnecting a plurality of processor modules; and

a plurality of processor modules, each of said processor modules including a processor, cache memory, coupled memory, coupled memory control logic and a global interconnect interface, said coupled memory control logic and said global interconnect interface of each of said processor modules coupling said global interconnect and the coupled memory and the cache memory of the processor modules such that said coupled memory can be accessed by the processor associated with the coupled memory without using said global interconnect and such that said coupled memory can be accessed by the processors of other processor modules via said global interconnect, said coupled memory control logic including means for maintaining coherency between data stored in the coupled memory of the associated processor module and data stored in the cache memory of the associated processor module and the cache memories of other processor modules.

2. A coherent coupled memory multiprocessor computer system as claimed in claim 1 wherein said coupled memory control logic coherency maintaining means implements a write-through cache coherency management protocol.

3. A coherent coupled memory multiprocessor computer system as claimed in claim 1 wherein said coupled memory control logic coherency maintaining means implements a write-back cache coherency management protocol.

4. A coherent coupled memory multiprocessor computer system as claimed in claim 1 wherein:

said cache memory stores data in blocks;

said coupled memory includes a plurality of memory locations each equal in size to said cache blocks; and

each coupled memory location includes a shared status bit, said shared status bits denoting whether a copy of the data stored in the related memory location is or is not also stored in a cache.

5. A coherent coupled memory multiprocessor computer system as claimed in claim 4 wherein said coupled memory control logic coherency maintaining means implements a write-through cache coherency management protocol.

6. A coherent coupled memory multiprocessor computer system as claimed in claim 4 wherein each of said plurality of coupled memory locations also includes an exclusive status bit, said shared and exclusive bits denoting whether a copy of the data stored in the related memory location is or is not also stored in a cache and, if stored in a cache, whether the cache stored data can be updated.

7. A coherent coupled memory multiprocessor computer system as claimed in claim 6 wherein said coupled memory control logic coherency maintaining means implements a write-back cache coherency management protocol.

8. A coherent coupled memory multiprocessor computer system as claimed in claim 6 wherein each of said plurality of processor modules also includes cache memory controller logic for coupling said cache memory to said processor, said cache memory controller logic including coherency maintaining means, said cache memory controller logic coherency maintaining means forming a portion of said means for maintaining coherency between data stored in the coupled memory of the associated processor module and data stored in the cache memory of the associated processor module and the cache memories of other processor modules.

9. A coherent coupled memory multiprocessor computer system as claimed in claim 8 wherein said coupled memory control logic coherency maintaining means and said cache memory controller logic coherency maintaining means implement a write-back cache coherency management protocol.

10. A coherent coupled memory multiprocessor computer system as claimed in claim 8 wherein each block of data stored in a cache includes an ownership bit denoting whether the associated data is readable or writable.

11. A coherent coupled memory multiprocessor computer system as claimed in claim 10 wherein said coupled memory control logic coherency maintaining means and said cache memory controller logic coherency maintaining means implement a write-back cache coherency management protocol.

12. A coherent coupled memory multiprocessor computer system as claimed in claim 10 wherein said cache memory controller logic controls whether a cache block of data can be updated based on the status of said ownership bit.

13. A coherent coupled memory multiprocessor computer system as claimed in claim 12 wherein said coupled memory control logic coherency maintaining means and said cache memory controller logic coherency maintaining means implement a write-back cache coherency management protocol.

14. A coherent coupled memory multiprocessor computer system as claimed in claim 12 wherein the status of said exclusive bit associated with said coupled memory locations is controlled by the status of the ownership bit associated with the same data stored in a cache.

15. A coherent coupled memory multiprocessor computer system as claimed in claim 14 wherein said coupled memory control logic coherency maintaining means and said cache memory controller logic coherency maintaining means implement a write-back cache coherency management protocol.
Description
 


TECHNICAL AREA

This invention relates to multiprocessor computer systems and, more particularly, to multiprocessor computer systems in which system memory is distributed such that a portion of system memory is coupled to each processor of the system.

BACKGROUND OF THE INVENTION

Memory latency, i.e., the time required to access data or instructions stored in the memory of a computer, has increasingly become the bottleneck that prevents the full realization of the speed of contemporary single and multiprocessor computer systems. This result is occurring because the speed of integrated processors has outstripped memory subsystem speed. In order to operate most efficiently and effectively, fast processors require the contradictory features of reduced memory latency and larger memory size. Larger memory size implies greater physical size, greater communication distances, and slower access time due to the additional signal buffers needed to drive heavily loaded address, data and control signal lines, all of which increase memory latency. The primary negative effect of memory latency is its effect on processor speed. The longer it takes to obtain data from memory, the slower a processor runs because processors usually remain idle when they are waiting for data. This negative effect has increased as processor speed has outstripped memory subsystem speed. Despite the gains made in high-density, high-speed integrated memories, the progress to date still leaves the memory subsystem as the speed-limiting link in computer system design. This is true regardless of whether the computer system includes a single processor or a plurality of processors.

One way to reduce average memory latency is to add a cache subsystem to a computer system. A cache subsystem consists of a small memory, situated adjacent to a processor, that is hardware controlled rather than software controlled. Frequently used datum and instructions are replicated in cache memories. Cache subsystems capitalize on the property that once a datum or instruction has been fetched from system memory, it is very likely that it will be reused in the near future. Due to the close association between cache memory and its associated processor and the nature of the control (hardware as opposed to software), cache memory latency is several times lower than that of system memory. Because access is much more rapid, overall speed is improved in computer systems that include a cache subsystem. As memory latency increases with memory size, hierarchical caches have been developed to maintain average memory latency at a low level. Some high-performance processors include separate instruction and datum caches that can be simultaneously accessed. (For simplicity of description, datum, instructions and any other forms of information commonly stored in computer memories are collectively hereinafter referred to as data.)
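The effect of a cache on average latency can be illustrated with a small calculation. The following is a minimal sketch, not part of the patent; the function name and the cycle counts are hypothetical, and the model assumes a miss simply costs the full system memory latency.

```python
def average_latency(hit_rate, cache_latency, memory_latency):
    """Average access time when a fraction hit_rate of accesses hit the cache.

    Assumption: a miss costs memory_latency; a hit costs cache_latency.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_latency + (1.0 - hit_rate) * memory_latency


# Hypothetical numbers: 1-cycle cache, 20-cycle system memory.
for rate in (0.0, 0.9, 0.99):
    print(rate, average_latency(rate, 1, 20))
```

With these illustrative numbers, a 90% hit rate brings the average from 20 cycles down to about 2.9 cycles, which is why reuse of recently fetched data matters so much.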

While computer systems that include a cache subsystem have a number of advantages, one disadvantage is the expense of cache memories. This disadvantage is enhanced because a cache memory does not add capacity to system memory. Rather, cache memories are add-ons to system memory, because, as noted above, cache memories replicate data stored in system memory. The replication of data leads to another disadvantage of cache memories, namely, the need to maintain coherency between data stored at two or more locations in the memories of a computer system. More specifically, because data stored at either location can be independently updated, a computer system that includes a cache subsystem requires a way of maintaining coherency between independent sources of the same data. If coherency is not maintained, data at one location will become stale when the same data at another location is updated. The use of stale data can lead to errors.

Several different types of cache management algorithms have been developed to govern what occurs when data stored in a cache are updated. The simplest algorithm is known as a "write-through" cache coherency management protocol. A write-through cache coherency management protocol causes processor writes to be propagated directly to system memory. All caches throughout the computer system are searched, and any copies of written data are either invalidated or updated. While a write-through cache coherency management protocol can be used with multiprocessor computer systems that include a large number of processors, a write-through cache coherency management protocol is better suited for single processor computer systems or multiprocessor computer systems incorporating a limited number, e.g., four, of processors.

A more complex, but higher performance, coherency management algorithm is known as a "write-back" cache coherency management protocol. Like a write-through cache coherency management protocol, a write-back cache coherency management protocol is an algorithm that is normally incorporated in the hardware of a computer system that controls the operation of a cache. In a write-back cache coherency management protocol, initial processor writes are written only to cache memory. Later, as necessary, updated data stored in a cache memory is transferred to system memory. Updated data transfer occurs when an input/output device or another processor requires the updated data. A write-back cache coherency management protocol is better suited than a write-through cache coherency management protocol for use in multiprocessor computer systems that include a large number of processors (e.g., 24) because a write-back cache coherency management protocol greatly reduces write traffic and therefore has a lower impact on the system interconnect.
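The difference in write traffic between the two protocols can be sketched by counting how many writes reach system memory for a single block. This is an illustrative model only; the event names "write" and "remote_read" and the function name are assumptions introduced here, not terms from the patent.

```python
def memory_writes(events, policy):
    """Count writes sent to system memory for one cached block.

    events: sequence of "write" (local processor write) and "remote_read"
            (another processor or an I/O device needs the data).
    policy: "write-through" or "write-back".
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(policy)
    sent = 0
    dirty = False  # write-back only: cache holds data newer than memory
    for event in events:
        if event == "write":
            if policy == "write-through":
                sent += 1       # every write is propagated to memory
            else:
                dirty = True    # the update stays in the cache for now
        elif event == "remote_read":
            if dirty:
                sent += 1       # updated block is transferred on demand
                dirty = False
        else:
            raise ValueError(event)
    return sent


trace = ["write"] * 10 + ["remote_read"]
print(memory_writes(trace, "write-through"), memory_writes(trace, "write-back"))
```

For ten consecutive writes followed by one remote request, the write-through count grows with the number of writes while the write-back count stays at one transfer.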

One of the first write-back coherency management protocols was suggested by Dr. James R. Goodman in his paper entitled "Using Cache Memory to Reduce Processor-Memory Traffic" (10th International Symposium on Computer Architecture, 1983). Dr. Goodman's improvement is based on the observation that if the sole copy of data associated with a specific system memory location is stored in a cache, the cache copy can be repeatedly modified without the need to broadcast write-invalidate messages to all other system caches each time a modification occurs. More specifically, Dr. Goodman's improvement requires the addition of a state bit to each cache copy. The state bit indicates that the copy is either "shared" or "owned." When a system memory location is first read, and data supplied to a cache, the state bit is set to the "shared" state. If the cache copy is later written, i.e., modified, the state bit transitions to the "owned" state. At the same time, a write-invalidate message is broadcast, resulting in the updated cache copy of the data being identified as the only valid copy associated with the related system memory location. As long as the cache location remains in an owned state, it can be rewritten, i.e., updated, without the need to broadcast further write-invalidate messages. A remote request for the data at the memory location associated with the cached copy causes a transition back to the shared state. The read request is satisfied either by the cache supplying the data and then updating the related system memory location, or by the cache delaying the memory request until valid data is rewritten to the system memory location.
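The shared/owned behavior described above can be sketched as a small state machine for one cache copy. This follows the description in the preceding paragraph rather than the full protocol in Dr. Goodman's paper; the class name, method names, the added "invalid" state and the counters are hypothetical, and a write to an invalid copy is assumed to take ownership directly.

```python
class CacheCopy:
    """One cached copy with a shared/owned state bit (plus 'invalid')."""

    def __init__(self):
        self.state = "invalid"
        self.invalidates_sent = 0   # write-invalidate broadcasts issued
        self.memory_updates = 0     # times system memory was brought up to date

    def local_read(self):
        if self.state == "invalid":
            self.state = "shared"   # first read supplies data to the cache

    def local_write(self):
        if self.state != "owned":
            self.invalidates_sent += 1  # one broadcast on the first write
            self.state = "owned"
        # Already owned: rewritten with no further broadcast.

    def remote_read(self):
        if self.state == "owned":
            self.memory_updates += 1    # system memory receives valid data
            self.state = "shared"

    def remote_invalidate(self):
        self.state = "invalid"      # another cache took ownership


copy = CacheCopy()
copy.local_read()
for _ in range(3):
    copy.local_write()
copy.remote_read()
print(copy.state, copy.invalidates_sent, copy.memory_updates)
```

Three consecutive writes produce a single write-invalidate broadcast, and the later remote request returns the copy to the shared state with one update of system memory.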

Recently, proposals have been made to distribute memory throughout a multiprocessor computer system, rather than use bank(s) of global memory accessible by all processors via a common interconnect bus. More specifically, in distributed memory multiprocessor computer systems, a portion of system memory is physically located adjacent to the processor the memory portion is intended to serve. Research in this area has grown out of attempts to find ways of creating effective multiprocessor computer systems out of a large number of powerful workstations connected together via a network link. In the past, distributed shared memory computer networks have used various software-implemented protocols to share memory space. The software-implemented protocols make the distributed memory simulate a common global memory accessible by all of the computers connected to the network. Memory latency is improved because the portion of memory associated with a specific processor can be accessed by that processor without use of the network link. An example of this research work is the BBN Butterfly computer system developed by BBN Laboratories, Inc. See Butterfly™ Parallel Processor Overview, BBN Report No. 6148, Version 1, Mar. 6, 1986, and The Uniform System Approach to Programming the Butterfly™ Parallel Processor, BBN Report No. 6149, Version 2, Jun. 16, 1986.

A drawback of the software-implemented protocols used in the BBN Butterfly and similar computer systems is their extremely poor performance when the amount of sharing between processors is large or when the memory associated with a single processor is insufficient to meet the needs of a program and it becomes necessary to use the memory associated with another processor and/or to make data calls to storage devices, such as a hard disk. In the past, such requirements have significantly reduced processing speed. Such requirements have also negatively impacted the bandwidth requirements of the network linking the processors together. A further disadvantage has been the increased overhead associated with the management of data stored at different locations in the distributed memory of the computer system. More specifically, the processing speed of prior art distributed memory multiprocessor computer systems has been improved by replicating shared data in the memories associated with the different processors needing the data. This has a number of disadvantages. First, replicating data in system memory creates a high memory overhead, particularly because system memory is stored on a page basis and page sizes are relatively large. Recently, page sizes of 64K bytes have been proposed. In contrast, cache memories store data in blocks of considerably smaller size. A typical cache block of data is 64 bytes. Thus, the "granularity" of the data replicated in system memory is considerably larger than the granularity of data replicated in cache memory. The large granularity size leads to other disadvantages. Greater interconnect bandwidth is required to transfer larger data granules than smaller data granules. Coherency problems are increased because of the likelihood that more processors will be contending for the larger granules than the number contending for smaller granules on a packet-to-packet basis.
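The granularity argument can be made concrete with the two sizes quoted above. The following is a small worked calculation; the constant and function names are hypothetical, and the 100-byte shared item is an illustrative assumption.

```python
PAGE_BYTES = 64 * 1024   # 64K-byte page size mentioned in the text
BLOCK_BYTES = 64         # typical cache block size mentioned in the text


def bytes_moved(shared_bytes, granule_bytes):
    """Bytes transferred to replicate shared_bytes in whole granules."""
    granules = -(-shared_bytes // granule_bytes)  # ceiling division
    return granules * granule_bytes


blocks_per_page = PAGE_BYTES // BLOCK_BYTES
print(blocks_per_page)
print(bytes_moved(100, PAGE_BYTES), bytes_moved(100, BLOCK_BYTES))
```

One page spans 1024 cache blocks, so replicating a 100-byte item moves a full 65536-byte page under page granularity but only two 64-byte blocks (128 bytes) under block granularity.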

In summary, in the last several years, very large-scale integrated circuit (VLSI) processor speeds have been increased by roughly an order of magnitude due to continual semiconductor improvements and due to the introduction of reduced instruction set computer (RISC) architectures. As processor speeds have improved, large, fast and expensive cache memories have been needed in order to reduce average memory latency and keep processor idle times reasonable. Even with improved cache memories and improved ways of maintaining data coherency, average memory latency remains the bottleneck to improving the performance of multiprocessor computer systems.

A major portion of memory latency, i.e., memory access time, is the latency of the network that interconnects the multiprocessors, memory, and input/output modules of multiprocessor computer systems. Regardless of whether the interconnect network is a fully interconnected switching network or a shared bus, the time it takes for a memory request to travel between a processor and system memory is directly added to the actual memory operational latency. Interconnect latency includes not only the actual signal propagation delays, but overhead delays such as synchronization of the interconnect timing environment with the interconnect arbitration, which increases rapidly as processors are added to a multiprocessor system.

Recent attempts to improve memory latency have involved distributing system memory so that it is closer to the processors requiring access to the memory. This has led to sharing data in system memory, which, in turn, has required the implementation of coherency schemes. In the past system memory coherency schemes have been implemented in software. Because they have been implemented in software they have been slow. Further, system memory sharing requires that large data granules be transferred. Large data granules take up large parts of system memory, require large amounts of interconnect bandwidth to transfer from one memory location to another, and are more likely to be referenced by a large number of processors than smaller data granules. The present invention is directed to providing a multiprocessor computer system that overcomes these disadvantages.

SUMMARY OF THE INVENTION

The present invention is directed to providing a multiprocessor system that overcomes the problems outlined above. More specifically, the present invention is directed to providing a multiprocessor computer system wherein system memory is broken into sections, denoted coupled memory, and distributed throughout the system such that a coupled memory is closely associated with each processor. The close association improves memory latency and reduces the need for system interconnect bandwidth. Coupled memory is not cache memory. Cache memory stores replications of data stored in system memory. Coupled memory is system memory. As a general rule, data stored at one location in coupled memory is not replicated at another coupled memory location. Moreover, the granular size of data stored in caches, commonly called blocks of data, is considerably smaller than the granular size of data stored in system memory, commonly called pages. Coupled memory is lower in cost since it does not provide the high performance of cache memory. Coupled memory is directly accessible by its associated processor, i.e., coupled memory is accessible by its associated processor without use of the system interconnect. More importantly, while coupled memory is closely associated with a specific processor, unlike a cache, coupled memory is accessible by other system processors via the system interconnect. In addition to coupled memory, system memory may include global memory, i.e., memory not associated with a processor, but rather shared equally by all processors via the system interconnect. For fast access, frequently used system memory data are replicated in caches associated with each processor of the system. Cache coherency is maintained by the system hardware.
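The access pattern described above, local access without the interconnect and remote access through it, can be sketched as follows. This is a simplified illustration, not the patented hardware: caches and global memory are omitted, and the class names, the contiguous address ranges per module and the transfer counter are all assumptions.

```python
class ProcessorModule:
    """A module whose coupled memory covers one contiguous address range."""

    def __init__(self, module_id, base, size):
        self.module_id = module_id
        self.base = base
        self.size = size
        self.coupled = {}  # address -> value

    def owns(self, address):
        return self.base <= address < self.base + self.size


class System:
    def __init__(self, modules):
        self.modules = modules
        self.interconnect_transfers = 0

    def _home_of(self, requester, address):
        """Find the module holding the address; count interconnect use."""
        for module in self.modules:
            if module.owns(address):
                if module is not requester:
                    self.interconnect_transfers += 1  # remote access
                return module
        raise ValueError("address not mapped to any coupled memory")

    def write(self, requester, address, value):
        self._home_of(requester, address).coupled[address] = value

    def read(self, requester, address):
        return self._home_of(requester, address).coupled.get(address, 0)


a = ProcessorModule(0, base=0, size=1024)
b = ProcessorModule(1, base=1024, size=1024)
system = System([a, b])
system.write(a, 10, 7)          # local: no interconnect transfer
value = system.read(b, 10)      # remote: one interconnect transfer
print(value, system.interconnect_transfers)
```

Each address has exactly one home location, which mirrors the statement that data in coupled memory is, as a general rule, not replicated at another coupled memory location.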

More specifically, in accordance with this invention, a coherent coupled memory multiprocessor computer system that includes a plurality of processor modules, a global interconnect, an optional global memory and an input/output subsystem is provided. Each processor module includes a processor, cache memory, cache memory controller logic, coupled memory, coupled memory control logic and a global interconnect interface. The coupled memory associated with each specific processor and global memory, if any, form system memory, i.e., coupled memory, like global memory, is available to other processors. Coherency between similar (i.e., replicated) data stored in specific coupled memory locations and in both local and remote caches is maintained by either write-through or write-back cache coherency management protocols. The cache coherency management protocols are implemented in hardware, i.e., logic, form and, thus, constitute a part of the computer system hardware.

In embodiments of the invention incorporating a write-through cache coherency management protocol, each time a memory reference occurs, the protocol logic determines if the read or write is of local or remote origin and the state of a shared bit associated with the related system memory location. The shared bit denotes if the data or instruction at the addressed coupled memory location has or has not been shared with a remote processor. Based on the nature of the command (read or write), the source of the command (local or remote) and the state of the shared data bit (set or clear), the write-through protocol logic controls the invalidating of cache-replicated data or instructions and the subsequent state of the shared bit. Thereafter, the read or write operation takes place.
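The decision described above takes three inputs (command, source, shared bit) and yields invalidation actions and the next value of the shared bit. The sketch below shows the shape of such a decision function. The specific transitions are assumptions chosen for illustration, not the ones defined by the patent's figures: a remote read is assumed to set the shared bit, and any write is assumed to invalidate stale copies and clear it, with the writer assumed not to retain a cached copy of remotely homed data.

```python
def write_through_step(command, source, shared):
    """Return (invalidate_local_cache, invalidate_remote_caches, next_shared).

    ASSUMED transition rules, for illustration only.
    """
    if command not in ("read", "write") or source not in ("local", "remote"):
        raise ValueError("unexpected command or source")

    if command == "read":
        # ASSUMPTION: a remote read hands a copy to another module.
        return (False, False, shared or source == "remote")

    if source == "local":
        # ASSUMPTION: remote copies exist only if the shared bit is set.
        return (False, shared, False)

    # Remote write: the local cache may hold a stale copy (ASSUMPTION).
    return (True, shared, False)


print(write_through_step("read", "remote", False))
print(write_through_step("write", "local", True))
```

Because the shared bit is consulted before invalidating, a local write to data that was never shared needs no remote invalidation traffic in this model.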

Embodiments of the invention incorporating a write-back cache coherency management protocol also determine if a read or write is of local or remote origin. The protocol logic also determines the state of shared and exclusive bits associated with the addressed coupled memory location. Based on the nature of the command (read or write), the source of the command (local or remote) and the state of the shared and exclusive bits (set or clear), the write-back protocol logic controls the invalidating of cache-stored data, the subsequent state of the shared and exclusive bits and the supplying of data to the source of read commands or the writing of data to the coupled memory. The write-back cache coherency management protocol logic also determines the state of an ownership bit associated with replicated data stored in caches and uses the status of the ownership bit to control the updating of replicated data stored in caches.
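A comparable sketch for the write-back case tracks a shared bit and an exclusive bit per coupled memory location. As before, the transitions are assumptions made for illustration and are not taken from the patent's figures: the exclusive bit is assumed to mean that one cache owns a writable copy, a request for an exclusively held location is assumed to force the owner to write back first, and a write request is assumed to grant ownership to the requester. The names LocationState, write_back_step and the action strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocationState:
    shared: bool = False      # a cached copy may exist elsewhere
    exclusive: bool = False   # one cache owns a writable copy


def write_back_step(state, command, source):
    """Update state in place and return the list of resulting actions.

    ASSUMED transition rules, for illustration only.
    """
    if command not in ("read", "write") or source not in ("local", "remote"):
        raise ValueError("unexpected command or source")

    actions = []
    had_owner = state.exclusive
    if had_owner:
        actions.append("owner_writes_back")   # coupled memory is updated
        state.exclusive = False

    if command == "read":
        actions.append("supply_data")
        if source == "remote" or had_owner:
            state.shared = True               # a copy now lives in a cache
    else:
        if state.shared or had_owner:
            actions.append("invalidate_cached_copies")
        state.shared = False
        state.exclusive = True                # requester's cache owns the block
        actions.append("grant_ownership")
    return actions


loc = LocationState()
print(write_back_step(loc, "read", "remote"), loc)
print(write_back_step(loc, "write", "local"), loc)
print(write_back_step(loc, "read", "remote"), loc)
```

In this model the owner's data reaches coupled memory only when another request arrives for the location, which matches the general idea that owned data can be updated repeatedly until requested elsewhere.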

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other features and advantages of this invention will become better understood by reference to the following detailed description of preferred embodiments of the invention when taken in conjunction with the accompanying drawings, wherein:

FIG. 1 is a block diagram of a coherent coupled memory multiprocessor system formed in accordance with this invention;

FIG. 2 is a flow diagram illustrating the write operation of a write-through cache coherency management protocol suitable for use in embodiments of the invention;

FIG. 3 is a flow diagram illustrating the read operation of a write-through cache coherency management protocol suitable for use in embodiments of the invention;

FIG. 4 is a state diagram illustrating the logic used to carry out the write-through cache coherency management protocol illustrated in FIGS. 2 and 3;

FIG. 5 is a flow diagram illustrating a processor cache read request of a write-back cache coherency management protocol suitable for use in embodiments of the invention;

FIG. 6 is a flow diagram illustrating a processor cache write request of a write-back cache coherency management protocol suitable for use in embodiments of the invention;

FIG. 7 is a state diagram illustrating the logic used to carry out the processor cache read and write requests illustrated in FIGS. 5 and 6;

F