WikiPatents - Community Patent Review
Method and apparatus for supporting read, write, and invalidation operations to memory which maintain cache consistency    
United States Patent 5572702
Link to this page: http://www.wikipatents.com/5572702.html
Inventor(s): Sarangdhar; Nitin V. (Beaverton, OR); Rhodehamel; Michael W. (Beaverton, OR); Merchant; Amit A. (Citrus Heights, CA); Fisch; Matthew A. (Beaverton, OR); Brayton; James M. (Beaverton, OR)
Abstract: Requests to memory issued by an agent on a bus are satisfied while maintaining cache consistency. The requesting agent may issue a request to another agent, or the memory unit, by placing the request on the bus. Each agent on the bus snoops the bus to determine whether the issued request can be satisfied by accessing its cache. An agent which can satisfy the request using its cache, i.e., the snooping agent, issues a signal to the requesting agent indicating so. The snooping agent places the cache line which corresponds to the request onto the bus, which is retrieved by the requesting agent. In the event of a read request, the memory unit also retrieves the cache line data from the bus and stores the cache line in main memory. In the event of a write request, the requesting agent transfers write data over the bus along with the request. This write data is retrieved by both the memory unit, which temporarily stores the data, and the snooping agent. Subsequently, the snooping agent transfers the entire cache line over the bus. The memory unit retrieves this cache line, merges it with the write data previously stored, and writes the merged cache line to memory.
   














Inventor: Sarangdhar; Nitin V. (Beaverton, OR); Rhodehamel; Michael W. (Beaverton, OR); Merchant; Amit A. (Citrus Heights, CA); Fisch; Matthew A. (Beaverton, OR); Brayton; James M. (Beaverton, OR)
Owner/Assignee: Intel Corporation (Santa Clara, CA)
Publication Date: November 5, 1996
Application Number: 08/202,790
Filing Date: February 28, 1994
US Classification: 711/146, 711/155
Int'l Classification: G06F 013/14
Examiner: Shah; Alpesh M.
Attorney/Law Firm: Blakely, Sokoloff, Taylor & Zafman
USPTO Field of Search: 395/800 395/775 395/650 395/425 395/325 395/250 395/275 395/840 395/842 395/856 395/287 395/440 395/445 395/467 395/481 395/468 395/469 395/470 395/471 395/472 395/473 395/496 395/482 364/131 364/132 364/133 364/134
   
U.S. References

5420991, Konigsfeld, 711/150, May 1995
5386511, Murata, 711/120, Jan 1995
5345578, Manasse, 711/146, Sep 1994
5341487, Derwin, 711/146, Aug 1994
5119485, Ledbetter, Jr., 711/146, Jun 1992
5072369, Theus, Dec 1991
4959777, Holman, Jr., 711/141, Sep 1990
Claims
 


What is claimed is:

1. A method of operating a computer system which includes a bus coupled to a memory unit, a first processor having a first cache, and a second processor having a second cache, said method comprising the steps of:

(a) issuing a request as part of a pipelined transaction from said first processor to a target agent;

(b) snooping said bus by said second processor to determine whether data corresponding to said request is contained in said second cache;

(c) issuing one of either a first signal or a second signal by said second processor, said first signal indicating that said data is contained in said second cache in a modified state, and said second signal indicating that said data is contained in said second cache in an unmodified state;

(d) placing said data from said second cache onto said bus to satisfy said request during said pipelined transaction, provided said first signal is issued; and

(e) updating, based on which of said first signal or said second signal is issued in said step (c), a state of a cache line of said first cache corresponding to said data and a state of a cache line of said second cache corresponding to said data.

2. The method of claim 1 further comprising the step of latching said data from said second cache off said bus by said first processor.

3. The method of claim 1 further comprising the step of taking said data from said second cache off said bus by said target agent.

4. The method of claim 1 further comprising responding to said first processor from said target agent.

5. The method of claim 1 wherein said target agent is said memory unit.

6. The method of claim 1 wherein said request is an invalidation of cache line request.

7. The method of claim 5 wherein said request is a read request, said method further comprising taking said data from said second cache off said bus by said memory unit.

8. The method of claim 7 further comprising storing said data in said memory unit.

9. The method of claim 1 wherein said request is a write request, said method further comprising:

placing write data corresponding to said write request on said bus by said first processor; and

taking said write data off said bus by said memory unit and said second processor.

10. The method of claim 9 further comprising:

taking said data from said second cache off said bus by said memory unit;

creating an updated data unit by said memory unit by merging said write data and said data from said second cache; and

storing said updated data unit in said memory unit.

11. The method of claim 9 wherein said write request is a write to a portion of said cache line of said second cache.

12. A method of operating a computer system which includes a bus coupled to a requesting agent, said requesting agent having a first cache, a snooping agent having a second cache, and a memory unit, said method comprising the steps of:

(a) issuing a read request as part of a pipelined transaction by said requesting agent, said read request having a target agent;

(b) snooping said bus by said snooping agent to determine whether said read request can be satisfied by accessing data corresponding to said request contained in said second cache;

(c) issuing one of either a first signal or a second signal from said snooping agent to said requesting agent indicating said read request can be satisfied by accessing said cache, said first signal indicating that said data is contained in said second cache in a modified state, and said second signal indicating that said data is contained in said second cache in an unmodified state;

(d) placing said data from said second cache onto said bus during said pipelined transaction to satisfy said read request;

(e) taking said data from said cache off said bus by said memory unit; and

(f) updating, based on which of said first signal or said second signal is issued in said step (c), a state of a cache line of said first cache corresponding to said data and a state of a cache line of said second cache corresponding to said data.

13. The method of claim 12 further comprising the step of storing said data in said memory unit.

14. A method of operating a computer system which includes a bus coupled to a requesting processor having a first cache, a snooping processor having a second cache, and a memory unit, said method comprising the steps of:

(a) issuing a write request as part of a pipelined transaction by said requesting processor, said write request having a target agent;

(b) snooping on said bus by said snooping processor to determine whether first data corresponding to said write request is contained in said second cache;

(c) issuing one of either a first signal or a second signal from said snooping processor to said requesting processor, said first signal indicating that said first data is contained in said second cache in a modified state, and said second signal indicating that said first data is contained in said second cache in an unmodified state;

(d) placing second data corresponding to said write request on said bus by said requesting processor during said pipelined transaction;

(e) taking said second data corresponding to said write request off said bus by said snooping processor; and

(f) updating, based on which of said first signal or said second signal is issued in said step (c), a state of a cache line of said first cache corresponding to said second data and a state of a cache line of said second cache corresponding to said second data.

15. The method of claim 14 further comprising the steps of:

creating an updated data unit by merging said first data and said second data; and

storing said updated data unit in said second cache.

16. The method of claim 15 wherein said write request is a write to a portion of said cache line of said second cache.

17. A computer system comprising:

a pipelined bus coupled to a requesting agent, a snooping agent, and a memory unit, wherein said requesting agent is for issuing a request as part of a pipelined transaction having a target agent, wherein said requesting agent includes a first cache, and wherein said snooping agent includes a second cache;

said snooping agent for,

snooping on said bus in a first phase of said pipelined transaction to determine whether data corresponding to said request is contained in a cache line of said second cache, and

issuing one of either a first signal or a second signal and placing data from said cache to satisfy said request onto said bus in a second phase of said pipelined transaction, wherein said first signal indicates that said data is contained in said second cache in a modified state, and said second signal indicates that said data is contained in said second cache in an unmodified state, and wherein said snooping agent is also for updating a state of said cache line of said second cache based on which of said first signal and said second signal issued; and

wherein said requesting agent is also for updating a state of a cache line of said first cache corresponding to said data based on which of said first signal and said second signal is issued.

18. The system of claim 17 wherein said second phase is subsequent to said first phase.

19. The system of claim 17, wherein said memory unit is for taking said data from said second cache off said bus and storing said data.

20. The system of claim 17, wherein:

said requesting agent is also for placing data corresponding to said request on said bus in a third phase of said pipelined transaction; and

said memory unit is for taking said data corresponding to said request off said bus in said third phase.

21. The system of claim 20 wherein said memory unit is also for creating an updated data unit by merging said data corresponding to said request and said data from said second cache, and storing said updated data unit.

22. A computer system comprising:

a pipelined bus;

a first processor coupled to said bus, said first processor having a first cache memory;

a second processor coupled to said bus, said second processor having a second cache memory;

means for issuing a pipelined transaction on said bus from said first processor;

means for snooping said bus to determine whether first data corresponding to said pipelined transaction is contained in said second cache memory, said means for snooping being coupled to said bus;

means for issuing one of either a first signal or a second signal on said bus in a first phase of said pipelined transaction, said first signal indicating said first data is contained in said second cache memory in a modified state, and said second signal indicating said first data is contained in said second cache memory in an unmodified state;

means for placing said first data onto said bus in a second phase of said pipelined transaction; and

means for updating, based on whether said means for issuing issues said first signal or said second signal, a state of a cache line of said first cache memory corresponding to said first data and a state of a cache line of said second cache memory corresponding to said first data.

23. The system of claim 22 wherein said first phase is prior to said second phase.

24. The system of claim 22 further comprising means for placing second data corresponding to said pipelined transaction onto said bus prior to said first phase.

25. The system of claim 24 further comprising: means for creating an updated data unit by combining said first data and said second data; and means for storing said updated data unit.

26. A computer system comprising:

a bus;

a requesting processor coupled to said bus for issuing a write request in a first phase of a pipeline, said write request having a target agent, said requesting processor including a first cache memory; a memory unit coupled to said bus;

a snooping agent coupled to said bus, wherein said snooping agent includes a second cache memory, and wherein said snooping agent is for,

snooping on said bus in a second phase of said pipeline to determine whether data corresponding to said write request is contained in a cache line of said second cache memory, and

issuing one of either a first signal or a second signal in said second phase and placing first data from said cache to satisfy said write request onto said bus in a third phase of said pipeline, wherein said first signal indicates that said first data is contained in said second cache memory in a modified state, and said second signal indicates that said first data is contained in said second cache memory in an unmodified state, and wherein said snooping agent is also for updating a state of a cache line of said second cache memory based on whether said first signal or said second signal is issued;

wherein said requesting processor is also for updating a state of a cache line of said first cache memory based on whether said first signal or said second signal is issued; and

said memory unit for taking said first data off said bus and storing said first data.

27. The system of claim 26 wherein:

said requesting processor is also for placing second data corresponding to said write request on said bus in said first phase; and

said memory unit is also for taking said second data off said bus in said second phase.

28. The system of claim 27 wherein said memory unit is also for creating an updated data unit by merging said first data and said second data, and storing said updated data unit.

29. The method of claim 1, wherein said updating step (e) comprises the steps of:

indicating said state of said cache line of said first cache is shared; and

indicating said state of said cache line of said second cache is shared.

30. The method of claim 1, wherein said updating step (e) keeps both said cache line of said first cache and said cache line of said second cache valid.
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention pertains to the field of computer system buses. More particularly, this invention relates to the field of maintaining cache consistency in a multi-agent computer system.

2. Background

The speed and performance of modern computer systems is ever-increasing. One aspect of these advancements is the increasing use of high-speed cache memories. In many modern computer systems, multiple agents may reside on a single bus, each of which may have its own, individual high-speed cache.

The implementations of these caches vary; however, each generally attempts to operate efficiently with both the latest advances in microprocessor technology and the other agents residing on the bus, such as I/O devices. For example, some agents on the bus may be capable of performing accesses which take advantage of the full size of a cache line, whereas other agents may only be capable of performing accesses which use a portion of a cache line. Thus, computer systems must efficiently manage writing to a portion of a cache line which has been modified within the cache of another agent, i.e., a partial write data transfer to a modified cache line, while at the same time efficiently managing other cache line requests.

Additionally, in computer systems with multiple caches, cache consistency must be maintained in order to ensure proper system behavior. Cache consistency refers to all caches agreeing on which cache(s), or memory, has the most recent version of any particular cache line. One method of maintaining cache consistency includes a modified, or dirty (altered), bit associated with each cache line. This bit indicates whether the data contained in the cache is more recent than the data in main memory.
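As a minimal sketch of the dirty-bit idea just described, the following Python models a single cache line whose flag is set whenever the cached copy is changed without main memory being updated. The class name, field names, and the 8-byte line size are illustrative assumptions and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    tag: int             # address tag for the line (hypothetical field)
    data: bytes          # current contents of the cached line
    dirty: bool = False  # True when the cached copy is newer than main memory

    def write(self, offset: int, payload: bytes) -> None:
        """Overwrite part of the line in the cache only, and mark it dirty."""
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write does not fit inside the cache line")
        buf = bytearray(self.data)
        buf[offset:offset + len(payload)] = payload
        self.data = bytes(buf)
        self.dirty = True  # main memory is now stale for this line


line = CacheLine(tag=0x40, data=bytes(8))
line.write(2, b"\xaa\xbb")
```

A line starts clean and becomes dirty on the first local write; other agents would need to learn about that state before relying on main memory.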

When a new memory transaction over the bus begins, all memory and caching agents participate in order to complete the transaction and maintain cache consistency. By maintaining cache consistency, all agents know whether their copy of the data is dirty, clean, or invalid.

One solution to the cache consistency problem utilizes a three-transaction, or three-phase, solution. For example, in a partial memory write data transaction to a writeback memory space, the agent making the request to memory issues the request in the first phase. Another agent on the bus, the snooping agent, which has a modified copy of the cache line corresponding to the request asserts a signal indicating this. In response to this signal, the memory unit, such as a memory controller connected to a main memory, issues a back-off signal. The requesting agent receives this back-off signal and knows it must wait to reinitiate the write request.

In phase two, the snooping agent knows from phase one that another agent desires access to the modified cache line, so the snooping agent issues a write operation of the cache line to the memory unit. Thus, main memory is updated with the most recent version of the cache line. In phase three, the memory unit releases the back-off signal to the requesting processor and the requesting processor is allowed to re-try the request. When the request is retried, the most recent version is in main memory and the request is successful (assuming no other agent has modified the cache line in its cache in the meantime).
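The cost of this prior-art approach can be illustrated with a small Python sketch that counts the bus transactions needed for one partial write. The function name, the dictionary-based memory and cache, and the choice to drop the snooper's copy after its writeback are assumptions made for illustration only.

```python
def partial_write_with_backoff(memory, snooper_modified, addr, offset, payload):
    """Perform one partial write and return the bus transactions it needed.

    memory:           dict mapping address -> line bytes (main memory)
    snooper_modified: dict mapping address -> line bytes held modified by the snooper
    """
    transactions = []
    if addr in snooper_modified:
        # Transaction 1: the request is issued, the snooper signals a modified
        # copy, and the memory unit backs the requester off.
        transactions.append("write request (backed off)")
        # Transaction 2: the snooper writes its modified line to main memory.
        # Dropping the snooper's copy afterwards is an assumption of this sketch.
        memory[addr] = snooper_modified.pop(addr)
        transactions.append("snooper writeback")
    # Final transaction: the (re)issued write now finds current data in memory.
    line = bytearray(memory[addr])
    line[offset:offset + len(payload)] = payload
    memory[addr] = bytes(line)
    transactions.append("write request (completed)")
    return transactions
```

When the snooper holds a modified copy, the sketch records three transactions for a single write, which is the inefficiency the background section describes.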

In modern computer systems, the importance of system speed is ever-increasing. Thus, while this three-phase solution is effective, it is not efficient in a pipelined environment because it requires three transactions to successfully complete the request. Therefore, it would be advantageous to provide a mechanism that maintains cache consistency while efficiently performing memory transactions to modified cache lines. The present invention provides such a solution.

Furthermore, as microprocessors become smaller and smaller, efficient use of a limited amount of space is becoming increasingly important. Thus, it would be advantageous to provide a system which maintains cache consistency while employing a minimal amount of logic complexity.

Additionally, many modern computer systems utilize multiple processing agents, each of which may have its own cache. The number of processing agents, as well as the existence of multiple processing agents, varies. Thus, it would be advantageous to provide a versatile system which supports the insertion of multiple processors with a minimal amount of additional logic and expense. The present invention provides such a solution.

SUMMARY AND OBJECTS OF THE INVENTION

A method and apparatus for supporting read, write, and invalidation operations to memory which maintain cache consistency are described herein. Multiple caching agents and a memory unit are coupled to a bus. One agent may issue a request to another agent, or the memory unit, by placing the request on the bus. Each agent on the bus snoops the bus to determine whether the issued request can be satisfied by accessing its cache. If no cache intervention is necessary, the operation is completed between the memory controller and the caching agent. Otherwise, an agent which can satisfy the request using its cache, i.e., the snooping agent, issues a signal to the requesting agent indicating so. The snooping agent then places the cache line which corresponds to the request onto the bus during the response phase. The requesting agent retrieves this data from the bus, if needed, and proceeds to use it. The memory agent in this case becomes a participant instead of the main responding agent.

In the event of a read request, the memory unit also retrieves the cache line data from the bus. The memory unit stores the cache line in main memory, thereby updating the cache line in main memory.
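A rough Python sketch of the read case follows: when the snooping agent holds the line modified, the same data driven onto the bus is taken by both the requester and the memory unit within one transaction. The state labels "M" and "S", the function name, and the choice to leave both caches in a shared state (one outcome mentioned in the claims) are assumptions of this sketch, not a statement of the only embodiment.

```python
from dataclasses import dataclass


@dataclass
class Line:
    data: bytes
    state: str  # "M" = modified, "S" = shared (labels assumed for this sketch)


def snooped_read(addr, requester, snooper, memory):
    """Satisfy a read; return (data, source). Caches are dicts of addr -> Line."""
    held = snooper.get(addr)
    if held is not None and held.state == "M":
        bus_data = held.data     # snooping agent places the line on the bus
        memory[addr] = bus_data  # memory unit takes the same data off the bus
        held.state = "S"         # assumed outcome: both copies end up shared
        source = "snooper"
    else:
        bus_data = memory[addr]  # no cache intervention needed
        source = "memory"
    # Marking the requester's copy shared in both cases is a simplification.
    requester[addr] = Line(bus_data, "S")
    return bus_data, source
```

The point of the sketch is that main memory is brought up to date as a side effect of the read itself, with no separate writeback transaction.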

In the event of a write request, the requesting agent transfers write data over the bus along with the request. This write data is retrieved by the memory unit, which temporarily stores the data. Subsequently, the snooping agent transfers the entire cache line over the bus; the memory unit retrieves this cache line, merges it with the write data previously stored, and writes the merged cache line to memory.
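The merge step for a partial write can be sketched in Python as below. The sketch assumes that the requester's write data takes precedence over the corresponding bytes of the snooping agent's cache line, since the write is the more recent update; the function name, the offset-based interface, and the 32-byte line size are illustrative assumptions.

```python
def merge_partial_write(cache_line: bytes, offset: int, write_data: bytes) -> bytes:
    """Overlay previously stored write data onto a full cache line.

    cache_line: the entire line transferred by the snooping agent
    offset:     byte position within the line where the write begins (assumed)
    write_data: the partial write data the memory unit stored earlier
    """
    if offset < 0 or offset + len(write_data) > len(cache_line):
        raise ValueError("write does not fit inside the cache line")
    merged = bytearray(cache_line)
    merged[offset:offset + len(write_data)] = write_data
    return bytes(merged)


LINE_SIZE = 32  # assumed line size in bytes
modified_line = bytes(range(LINE_SIZE))
merged_line = merge_partial_write(modified_line, 8, b"\xff" * 4)
```

Bytes outside the written range keep the snooping agent's values, so the line written to memory reflects both the earlier modifications and the new partial write.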

In a pipelined bus architecture, multiple read, write and other memory operations are carried out. One aspect of cache consistency management is an accurate method of snoop ownership dropoff and pickup boundaries. That is, the bus agents accurately indicate to other agents whether requests can be satisfied by accessing their respective caches. The bus agents participating in a memory transaction complete their snoop state transitions prior to the snoop phase of the subsequent transaction, thereby providing accurate snooping for the subsequent transaction.
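The ordering requirement in this paragraph can be illustrated with a short Python sketch in which each transaction's state transition is applied before the next transaction's snoop lookup. The state letters and the specific transitions shown are assumptions chosen for illustration; the text here states only that the transitions complete before the following snoop phase.

```python
def run_pipeline(transactions, states):
    """Return the snoop result observed by each transaction, in order.

    transactions: list of (kind, addr) tuples, kind in {"read", "invalidate"}
    states:       dict addr -> state letter for one snooping agent
                  ("M" modified, "S" shared, "I" invalid; labels assumed)
    """
    results = []
    for kind, addr in transactions:
        snoop = states.get(addr, "I")  # snoop phase sees the current state
        results.append(snoop)
        # The transition below completes before the next loop iteration,
        # i.e., before the snoop phase of the subsequent transaction.
        if kind == "read" and snoop == "M":
            states[addr] = "S"
        elif kind == "invalidate":
            states[addr] = "I"
    return results
```

For example, two back-to-back reads of a line the agent holds modified observe "M" and then "S", because the first transaction's transition has already taken effect when the second is snooped.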

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1A shows an overview of a multiprocessor computer system of the present invention;

FIG. 1B shows a block diagram of an exemplary bus cluster system of the present invention;

FIG. 2 is a timing diagram of two bus transactions in one embodiment of the present invention;

FIG. 3 is a flowchart describing the steps for supporting an operation to memory on a pipelined bus in one embodiment of the present invention;

FIG. 4A is a flowchart describing the steps for performing a read operation on a pipelined bus in one embodiment of the present invention;

FIG. 4B is a flowchart describing the steps for updating cache states in a read operation in an alternate embodiment of the present invention;

FIG. 4C is a flowchart describing the steps for updating cache states in a read operation in another embodiment of the present invention;

FIG. 5A is a flowchart describing the steps for performing a write operation on a pipelined bus in one embodiment of the present invention;

FIG. 5B is a flowchart describing the steps for updating cache states in a write operation in an alternate embodiment of the present invention;

FIG. 5C is a flowchart describing the steps for updating cache states in a write operation in another embodiment of the present invention; and

FIG. 6 is a flowchart describing the steps for performing a cache line invalidation operation on a pipelined bus in one embodiment of the present invention.

DETAILED DESCRIPTION

A method and apparatus for supporting read, write, and invalidation operations to memory which maintain cache consistency are described in detail. In the following description, for purposes of explanation, specific details such as processor configurations, components, bus hierarchies, etc., are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well known structures, devices, functions, and procedures are shown in block diagram form in order to avoid obscuring the present invention. It should be noted that the present invention can be applied to a variety of different processor architectures. Furthermore, the present invention can be practiced in a variety of manners, such as by a single or multiple chip implementation or by fabrication by silicon, gallium arsenide, or other processes.

FIG. 1A shows an overview of an example multiprocessor computer system of the present invention. The computer system generally comprises a processor-memory bus or other communication means 101 for communicating information between one or more processors 102 and 103. Processor-memory bus 101 includes address, data and control buses. Processors 102 and 103 may include a small, extremely fast internal cache memory, commonly referred to as a level one (L1) cache memory, for temporarily storing data and instructions on-chip. In addition, a bigger, slower level two (L2) cache memory 104 can be coupled to processor 102 for temporarily storing data and instructions for use by processor 102. In one embodiment, processors 102 and 103 are Intel® architecture compatible microprocessors; however, the present invention may utilize any type of microprocessor, including different types of processors.

Also coupled to processor-memory bus 101 is processor 103 for processing information in conjunction with processor 102. Processor 103 may comprise a parallel processor, such as a processor similar to or the same as processor 102. Alternatively, processor 103 may comprise a co-processor, such as a digital signal processor. The processor-memory bus 101 provides system access to the memory and input/output (I/O) subsystems. A memory controller 122 is coupled with processor-memory bus 101 for controlling access to a random access memory (RAM) or other dynamic storage device 121 (commonly referred to as a main memory) for storing information and instructions for processor 102 and processor 103. Memory controller 122 maintains a strong order of read and write operations. A mass data storage device 125, such as a magnetic disk and disk drive, for storing information and instructions, and a display device 123, such as a cathode ray tube (CRT), liquid crystal display (LCD), etc., for displaying information to the computer user are coupled to processor-memory bus 101.

In one embodiment memory controller 122 contains a snarfing buffer 170, cache line buffer 171, implicit writeback (IW) logic 172 and merge logic 173. Snarfing buffer 170 is a temporary data buffer for storage of data "snarfed" off the bus. Cache line buffer 171 is a temporary data buffer used for storing Implicit Writeback Data Transfer data taken off the bus. In one embodiment, IW logic 172 stores data received from the bus into either snarfing buffer 170 or cache line buffer 171, depending on the source of the data. In an alternate embodiment, IW logic 172 transfers the data directly to main memory without storing the data in a temporary buffer. IW logic 172 may also issue an implicit writeback response onto the bus, depending on the request and whether memory controller 122 will be transferring data to satisfy the request. In one mode, IW logic 172 is coupled to the bus through a bus interface 175, which takes data off the bus and places data onto the bus.
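The buffer routing described above can be sketched in Python as follows. The class and method names, the "requester"/"snooper" source labels, and keying the buffers by address are assumptions for illustration; the text states only that the IW logic chooses a buffer depending on the source of the data.

```python
class MemoryControllerBuffers:
    """Toy model of the two temporary buffers in the memory controller."""

    def __init__(self):
        self.snarfing_buffer = {}    # addr -> write data snarfed off the bus
        self.cache_line_buffer = {}  # addr -> implicit writeback line data

    def take_off_bus(self, addr, source, payload):
        """Store bus data in the buffer that matches where it came from."""
        if source == "requester":
            # Write data sent along with the request is snarfed.
            self.snarfing_buffer[addr] = payload
        elif source == "snooper":
            # A full line sent by the snooping agent is implicit writeback data.
            self.cache_line_buffer[addr] = payload
        else:
            raise ValueError(f"unknown data source: {source!r}")
```

Keeping the two kinds of data apart until both have arrived lets a later step combine them into one cache line before anything is written to main memory.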

Merge logic 173 merges the data in snarfing buffer 170 and cache line buffer 171 together, then stores the cache line in main memory 121. In one mode, merge logic 173 stores the cache line in main memory 121 via a memory interface 176. In one embodiment, snarfing buffer 170 and cache line buffer 171 represent multiple buffers. That is, memory controller 122 contains multiple snarfing buffers and multiple cache line buffers. Additional buffers have not been shown so as not to clutter the drawings.

Memory controller 122's use of snarfing buffer 170, cache line buffer 171, IW logic 172, and merge logic 173 in the present invention is discussed in more detail below. The transferring of data between memory controller 122 and the bus and between memory controller 122 and main memory 121 will be understood by one skilled in the art, and thus will not be described further.

In an alternate embodiment, snarfing buffer 170, cache line buffer 171, IW logic 172, and merge logic 173 are included in other agent(s) on the bus. For example, processors 102 or 103 of FIG. 1A, or agents 153 through 156 or cluster manager 157 of FIG. 1B, may include this additional logic, snarfing data off the bus and merging it to store in the agent's internal memory. In one mode, processors 102 and 103 include a temporary storage buffer for storing the merged data line; thus, a subsequent request for the same data line can be satisfied directly from the temporary storage buffer rather than memory. In another mode, agents store the merged cache line in an L2 cache memory, such as L2 cache memory 104 of FIG. 1A.

In one embodiment, processor 102 also includes an L1 cache memory 138 coupled to a cache controller 139. In one mode, cache controller 139 asserts signals on processor-memory bus 101 and receives signals on bus 101 through bus interface 141. Cache controller 139 checks requests on bus 101 to determine whether cache memory 138 contains a copy of the requested cache line by checking the state of the cache line as described below. Cache controller 139 may assert a HIT# signal or a HITM# signal, depending on the state of the cache line, as discussed below. In one mode, cache controller 139 transfers the cache line in cache memory 138 to bus 101 when it asserts a HITM# signal in response to the request. In one embodiment, cache controller 139 places the cache line onto processor-memory bus 101 through bus interface 141. The issuing of signals and transferring of data to and from bus 101 by cache controller 139 will be understood by one skilled in the art, and thus will not be described further.

In an alternate embodiment, cache controller 139 also controls an L2 cache memory 104 coupled to processor 102. Cache controller 139 interacts with cache memory 104 in the same way as described above for cache memory 138.

Processor 102 as shown includes additional detail. The remaining agents on processor-memory bus 101, as well as those agents in clusters 151 and 152 of FIG. 1B, may also include the additional logic shown in processor 102. In one embodiment, all agents on the bus which issue requests include cache memory 138 and cache controller 139. This additional logic has not been shown in all agents so as not to clutter the drawings.

An input/output (I/O) bridge 124 is coupled to processor-memory bus 101 and system I/O bus 131 to provide a communication path or gateway through which devices on either processor-memory bus 101 or I/O bus 131 can access, or transfer data to and from, devices on the other bus. Essentially, bridge 124 is an interface between the system I/O bus 131 and the processor-memory bus 101.

I/O bus 131 communicates information between peripheral devices in the computer system. Devices that may be coupled to system I/O bus 131 include a display device 132, such as a cathode ray tube or liquid crystal display; an alphanumeric input device 133, including alphanumeric and other keys, for communicating information and command selections to other devices in the computer system (e.g., processor 102); and a cursor control device 134 for controlling cursor movement. Moreover, a hard copy device 135, such as a plotter or printer, for providing a visual representation of the computer images, and a mass storage device 136, such as a magnetic disk and disk drive, for storing information and instructions, may also be coupled to system I/O bus 131.

In some implementations, it may not be required to provide a display device for displaying information. Certain implementations of the present invention may include additional processors or other components. Additionally, certain implementations of the present invention may not require nor include all of the above components. For example, processor 103, display device 123, or mass storage device 125 may not be coupled to processor-memory bus 101. Furthermore, the peripheral devices shown coupled to system I/O bus 131 may be coupled to processor-memory bus 101; in addition, in some implementations only a single bus may exist with the processors 102 and 103, memory controller 122, and peripheral devices 132 through 136 coupled to the single bus.

FIG. 1B is a block diagram showing an exemplary bus cluster system of the present invention. The present invention can apply to multiprocessor computer systems having one or more clusters of processors. FIG. 1B shows two such clusters 151 and 152. Each of these clusters comprises a number of processors. For example, cluster 151 comprises four agents 153, 154, 155 and 156 and a cluster manager 157, which includes another cache memory. Agents 153 through 156 can include microprocessors, co-processors, digital signal processors, etc.; in one embodiment, agents 153 through 156 are identical to processor 102 of FIG. 1A. Cluster manager 157 and its cache are shared between these four agents 153 through 156.

In one embodiment, each cluster also includes a local memory controller and/or a local I/O bridge. For example, cluster 151 may include a local memory controller 165 coupled to processor bus 162. Local memory controller 165 manages accesses to a RAM or other dynamic storage device 166 contained within cluster 151. Cluster 151 may also include a local I/O bridge 167 coupled to processor bus 162. Local I/O bridge 167 manages accesses to I/O devices within the cluster, such as a mass storage device 168, or to an I/O bus, such as system I/O bus 131 of FIG. 1A.

Each cluster is coupled to a memory system bus 158. These clusters 151 and 152 are coupled to various other components of the computer system through a system interface 159. The system interface 159 includes a high speed I/O interface 160 for interfacing the computer system to the outside world and a memory interface 161 which provides access to a main memory, such as a DRAM memory array (these interfaces are described in greater detail in FIG. 1A).

Certain implementations of the present invention may not require nor include all of the above components. For example, cluster 151 or 152 may comprise fewer than four agents. Additionally, certain implementations of the present invention may include additional processors or other components.

The computer system of the present invention utilizes a writeback configuration with the well-known M.E.S.I. (Modified, Exclusive, Shared, or Invalid) protocol. The caches have tags that include a bit called the modified, or dirty (altered), bit. This bit is set if a cache location has been updated with new information and therefore contains information that is more recent than the corresponding information in main system memory 121.

The M.E.S.I. protocol is implemented by assigning state bits for each cached line. These states are dependent upon both data transfer activities performed by the local processor as the bus master, and snooping activities performed in response to transactions generated by other bus masters. M.E.S.I. represents four states. They define whether a line is valid (i.e., hit or miss), whether it is available in other caches (i.e., shared or exclusive), and whether it is modified (i.e., more recent than the copy in main memory). The four states are defined as follows:

[M]- MODIFIED This state indicates a line which is exclusively available in only this cache (all other caches are I), and is modified (i.e., main memory's copy is stale). A Modified line can be read or updated locally in the cache without acquiring the memory bus. Because a Modified line is the only up-to-date copy of data, it is the cache controller's responsibility to write-back this data to memory on snoop accesses to it.

[E]- EXCLUSIVE Indicates a line which is exclusively available in only this cache (all other caches are I), and that this line is not modified (main memory also has a valid copy). Writing to an Exclusive line causes it to change to the Modified state and can be done without informing other caches or memory. On a snoop to E state it is the responsibility of the main memory to provide the data.

[S]- SHARED Indicates that this line is potentially shared with other caches. The same line may exist in one or more other caches (main memory also has a valid copy).

[I]- INVALID Indicates that the line is not available in the cache. A read to this cache line will be a miss and cause the cache controller to execute a line fill (i.e., fetch the entire line and deposit it into the cache SRAM).

The states determine the actions of the cache controller with regard to activity related to a line, and the state of a line may change due to those actions. All transactions which may require state changes in other caches are broadcast on the shared memory bus.
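The four state definitions above can be summarized in a short Python sketch. It is an illustration only. The behavior for Modified and Exclusive lines follows the definitions above; the handling of a local write to a Shared or Invalid line, and the mapping of states to the HIT# and HITM# signals, are assumptions based on conventional M.E.S.I. behavior rather than statements from this passage. The function names are hypothetical.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

def local_write(state):
    """Return (next_state, needs_bus) for a write by the local processor."""
    if state in (State.M, State.E):
        # From the definitions: M is updated locally; E changes to M
        # without informing other caches or memory.
        return State.M, False
    # Assumption: a write to an S or I line must first go to the bus so
    # that other caches can change state; the line then becomes M.
    return State.M, True

def must_write_back_on_snoop(state):
    """Only a Modified line holds the sole up-to-date copy of the data."""
    return state is State.M

def snoop_signal(state):
    """Assumed signal mapping: HITM# for M, HIT# for E or S, none for I."""
    if state is State.M:
        return "HITM#"
    if state in (State.E, State.S):
        return "HIT#"
    return None
```

For a snoop that finds the line in the E or S state, main memory holds a valid copy, so in this sketch the cache only reports the hit and does not supply data.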

In one embodiment, bus activity is hierarchically organized into operations, transactions, and phases. An operation is a bus procedure that appears atomic to software, such as reading a naturally aligned memory location. Executing an operation usually requires one transaction but may require multiple transactions, such as in the case of deferred replies in which requests and replies are different transactions. A transaction is the set of bus activities related to a single request, from request bus arbitration through response-initiated data transfers on the data bus.

A transaction contains up to six distinct phases. However, certain phases are optional based on the transaction and response type. A phase uses a particular signal group to communicate a particular type of information. These phases are:

Arbitration Phase

Request Phase

Error Phase

Snoop Phase

Response Phase

Data Transfer Phase

In one mode, the Data Transfer Phase is optional and is used only if a transaction transfers data. The data phase is request-initiated if the data is available at the time of initiating the request (e.g., for a write transaction). The data phase is response-initiated if the data is available at the time of generating the transaction response (e.g., for a read transaction). A transaction may contain both a request-initiated data transfer and a response-initiated data transfer.
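The distinction between the two kinds of data transfer can be sketched as follows. The function `data_transfers` and its string arguments are hypothetical. The case in which a write contains both transfers is modeled on the abstract, where the requesting agent first transfers write data and the snooping agent subsequently transfers the entire cache line; treating that second transfer as response-initiated is an assumption of this sketch.

```python
def data_transfers(request, hit_modified=False):
    """List the data transfers a transaction would contain in this sketch.

    request:      "read", "write", or "invalidate" (hypothetical labels)
    hit_modified: True if a snooping agent holds the line in Modified state
    """
    transfers = []
    if request == "write":
        # Write data is available when the request is issued.
        transfers.append("request-initiated")
    if request == "read" or (request == "write" and hit_modified):
        # Data becomes available when the response is generated.
        transfers.append("response-initiated")
    return transfers
```

Under these assumptions a request that moves no data, such as an invalidation that finds no modified copy, yields an empty list, which corresponds to a transaction with no Data Transfer Phase.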

Different phases from different transactions can overlap, thereby pipelining bus usage and improving bus performance. FIG. 2 shows exemplary overlapped request/response phases for two transactions. Referring to FIG. 2, every transaction begins with an Arbitration Phase, in which a requesting agent becomes the bus owner. The second phase is the Request Phase in which the bus owner drives a request and address information on the bus. The third phase of a transaction is an Error Phase, three clocks after the Request Phase. The Error Phase indicates any immediate errors triggered by the request. The fourth phase of a transaction is a Snoop Phase, four or more clocks from the Request Phase. The Snoop Phase indicates if the cache line accessed in a transaction is valid or modified (dirty) in any agent's cache.
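The clock offsets given above can be turned into a small scheduling sketch that shows how two transactions overlap. The offsets of three and four clocks come from the text; everything else is an assumption made for illustration, including measuring the offsets from a single Request Phase clock, the `snoop_stall` parameter, and the spacing of three clocks between the two requests. It does not reproduce FIG. 2.

```python
ERROR_OFFSET = 3      # from the text: Error Phase is three clocks after the Request Phase
MIN_SNOOP_OFFSET = 4  # from the text: Snoop Phase is four or more clocks from the Request Phase

def phase_clocks(request_clock, snoop_stall=0):
    """Return the clock at which each early phase occurs for one transaction.

    request_clock: clock of the Request Phase (assumed reference point)
    snoop_stall:   hypothetical extra clocks before the Snoop Phase
    """
    if snoop_stall < 0:
        raise ValueError("snoop_stall must be non-negative")
    return {
        "request": request_clock,
        "error": request_clock + ERROR_OFFSET,
        "snoop": request_clock + MIN_SNOOP_OFFSET + snoop_stall,
    }

# Two transactions issued three clocks apart (hypothetical spacing).
t1 = phase_clocks(1)
t2 = phase_clocks(4)

# The second request is driven while the first transaction is still
# in its Error Phase, so the two transactions overlap on the bus.
overlaps = t2["request"] <= t1["snoop"]
```

Because each phase uses its own signal group, the second transaction's Request Phase in this sketch does not conflict with the first transaction's Error Phase even though both occur on the same clock.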

The Response Phase indicates whether the transaction failed or succeeded, whether the response is immediate or deferred, and whether the transaction includes data phases. If a transaction contains a response-initiated dat