WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Multiple variable cache replacement policy    
United States Patent6523091   
Link to this pagehttp://www.wikipatents.com/6523091.html
Inventor(s)Tirumala; Anup S. (San Jose, CA); Tremblay; Marc (Menlo Park, CA)
AbstractA method for selecting a candidate to mark as overwritable in the event of a cache miss while attempting to avoid a write back operation. The method includes associating a set of data with the cache access request, each datum of the set is associated with a way, then choosing an invalid way among the set. Where no invalid ways exist among the set, the next step is determining a way that is not most recently used among the set. Next, the method determines whether a shared resource is crowded. When the shared resource is not crowded, the not most recently used way is chosen as the candidate. Where the shared resource is crowded, the next step is to determine whether the not most recently used way differs from an associated source in the memory and where the not most recently used way is the same as an associated source in the memory, the not most recently used way is chosen as the candidate. Where the not most recently used way differs from an associated source in the memory, the candidate is chosen as the way among the set that does not differ from an associated source in the memory. Where all ways among the set differ from respective sources in the memory, the not most recently used way is chosen as the candidate and the not most recently used way is stored in the shared resource.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Inventor     Tirumala; Anup S. (San Jose, CA); Tremblay; Marc (Menlo Park, CA)
Owner/Assignee     Sun Microsystems, Inc. (Santa Clara, CA)
Patent assignment
All assignments
Publication Date     February 18, 2003
Application Number     09/931,115
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     August 16, 2001
US Classification     711/133 711/134 711/136 711/143
Int'l Classification     G06F  012/00
Examiner     Kim; Matthew
Assistant Examiner     Elmon; Stephen
Attorney/Law Firm     Zagorin, O'Brien & Graham, LLP
Address
Parent Case     CROSS-REFERENCE TO RELATED APPLICATION This application is a Continuation of U.S. patent application Ser. No. 09/411,468, filed Oct. 1, 1999 now U.S. Pat. No. 6,282,617 and entitled "Multiple Variable Cache Replacement Policy," and naming Anup S. Tirumala and Marc Tremblay as inventors issued as U.S. Pat. No. 6,282,617 on Aug. 21, 2001. This application relates to U.S. patent application Ser. No. 09/204,480, filed Dec. 12, 1998, and entitled, "A Multiple-Thread Processor for Threaded Software Applications," and naming Marc Tremblay and William Joy as inventors. These applications are assigned to Sun Microsystems, Inc., the assignee of the present invention, and are hereby incorporated by reference, in their entirety and for all purposes.
Priority Data    
USPTO Field of Search     711/133 711/134 711/136 711/143
Patent Tags     multiple variable cache replacement policy
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
6282617
Tirumala
711/133
Aug,2001

[0 after 0 votes]
6192449
Circello
711/133
Feb,2001

[0 after 0 votes]
6014728
Baror
711/133
Jan,2000

[0 after 0 votes]
5996071
White
712/238
Nov,1999

[0 after 0 votes]
5765199
Chang
711/168
Jun,1998

[0 after 0 votes]
5765190
Circello
711/118
Jun,1998

[0 after 0 votes]
5734881
White
712/238
Mar,1998

[0 after 0 votes]
5701448
White
712/233
Dec,1997

[0 after 0 votes]
5636354
Lear
711/3
Jun,1997

[0 after 0 votes]
5627992
Baror
711/133
May,1997

[0 after 0 votes]
5590379
Hassler
710/31
Dec,1996

[0 after 0 votes]
5561779
Jackson
711/122
Oct,1996

[0 after 0 votes]
5553262
Ishida
711/123
Sep,1996

[0 after 0 votes]
5479636
Vanka
711/133
Dec,1995

[0 after 0 votes]
5428761
Herlihy
711/130
Jun,1995

[0 after 0 votes]
5386546
Hamaguchi
711/133
Jan,1995

[0 after 0 votes]
5025366
Baror
711/128
Jun,1991

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A circuit that selects a candidate to mark as overwritable in the event of a cache miss, comprising:

logic that receives a cache access request, wherein said cache access request is associated with a main memory address;

logic that determines whether the contents of said main memory address are present in a data cache;

logic that associates, when the contents of said main memory address are not present in said data cache, said main memory address with a set within said data cache, wherein said set includes a plurality of ways;

logic that determines whether any of said plurality of ways is an invalid way;

logic that selects, if said invalid way exists, said invalid way as the candidate;

cache replacement logic that selects, where no said invalid way exists among said set, a way among said set as a preliminary candidate, wherein said cache replacement logic is based on the state of at least one affected resource.

2. A computer system for selecting a candidate to mark as overwritable in the event of a cache miss comprising:

a main memory;

a data cache that is shared by a plurality of processing units;

means for receiving a main memory address in said main memory;

means for determining whether the contents of said main memory address are present in said data cache;

means for associating said main memory address with a set within said data cache when the contents of said main memory address are not present in said data cache, wherein said set includes a plurality of ways;

means for determining whether any of said plurality of ways is an invalid way and that selects said invalid way as the candidate;

means for applying, where no invalid ways exist among said set, a cache replacement policy to select a way among said set as a preliminary candidate, wherein said cache replacement policy is based on the state of at least one affected resource.

3. The computer system of claim 2 wherein said at least one affected resource comprises a cross bar switch.

4. The computer system of claim 2 wherein said at least one affected resource comprises a memory controller.

5. The computer system of claim 2 wherein said at least one affected resource comprises a memory controller buffer.

6. The computer system of claim 2 wherein said at least one affected resource comprises a write back buffer.

7. The computer system of claim 6 wherein said means for applying a cache replacement policy further comprises:

means for determining whether said write back buffer is crowded;

means for determining, when said write back buffer is crowded, whether said preliminary candidate is dirty;

means for selecting said preliminary candidate as the candidate when (said write back buffer is not crowded) OR (said write back buffer is crowded AND said preliminary candidate is not dirty); and

means for storing the contents of said preliminary candidate in said write back buffer and selects said preliminary candidate as the candidate, when said write back buffer is crowded and said preliminary candidate is dirty.

8. A circuit that selects a candidate to mark as overwritable in the event of a cache miss, comprising:

means for receiving a cache access request, wherein said cache access request is associated with a main memory address;

means for determining whether the contents of said main memory address are present in a data cache;

means for associating, when the contents of said main memory address are not present in said data cache, said main memory address with a set within said data cache, wherein said set includes a plurality of ways;

means for determining whether any of said plurality of ways is an invalid way;

means for selecting, if said invalid way exists, said invalid way as the candidate;

means for selecting, when no said invalid way exists among said set, a way among said set as a preliminary candidate, wherein said selecting a way is based on the state of at least one affected resource.

9. The circuit of claim 8 wherein said at least one affected resource comprises a cross bar switch.

10. The circuit of claim 8 wherein said at least one affected resource comprises a memory controller.

11. The circuit of claim 8 wherein said at least one affected resource comprises a memory controller buffer.

12. The circuit of claim 8 wherein said at least one affected resource comprises a write back buffer.

13. The circuit of claim 8 wherein said means for selecting a way comprises:

means for determining whether said write back buffer is crowded;

means for determining, when said write back buffer is crowded, whether said preliminary candidate is dirty;

means for selecting, where (said write back buffer is not crowded) OR (said write back buffer is crowded AND said preliminary candidate is not dirty), said preliminary candidate as the candidate; and

means for storing, where said write back buffer is crowded and said preliminary candidate is dirty, the contents of said preliminary candidate in said write back buffer and that selects said preliminary candidate as the candidate.
 Description Submit all comments and votes
 


BACKGROUND

1. Field of the Invention

The invention relates to processor caches and more particularly to determining which data in a cache to overwrite or write back in the event of a cache miss.

2. Discussion of Related Art

Processors have attained widespread use throughout many industries. A goal of any processor is to process information quickly. One technique which is used to increase the speed with which the processor processes information is to provide the processor with an architecture which includes a fast local memory called a cache. Another technique which is used to increase the speed with which the processor processes information is to provide a processor architecture with multiple processing units.

A cache is used by the processor to temporarily store instructions and data. A cache which stores both instructions and data is referred to as a unified cache; a cache which stores only instructions is an instruction cache, and a cache which stores only data is a data cache. Providing a processor architecture with either a unified cache or an instruction cache and a data cache is a matter of design choice.

A factor in the performance of the processor is the probability that a processor-requested data item is already in the cache. When a processor attempts to access an item of information, it is either present in the cache or not. If present, a cache "hit" occurs. If the item is not in the cache when requested by the processor, a cache "miss" occurs. It is desirable when designing a cache system to achieve a high cache hit rate, or "hit ratio".

After a cache miss occurs, the information requested by the processor must then be retrieved from memory and brought into the cache so that it may be accessed by the processor. A search for an item of information that is not stored in the cache after a cache miss usually results in an expensive and time-consuming effort to retrieve the item of information from the main memory of the system. To maximize the number of cache hits, data that is likely to be referenced in the near future operation of the processor is stored in the cache. Two common strategies for maximizing cache hits are storing the most recently referenced data and storing the most commonly referenced data.

In most existing systems, a cache is subdivided into sets of cache line slots. When each set contains only one line, then each main memory line can only be stored in one specific line slot in the cache. This is called direct mapping. In contrast, each set in most modern processors contains a number of lines. Because each set contains several lines, a main memory line mapped to a given set may be stored in any of the lines, or "ways", in the set.

When a cache miss occurs, the line of memory containing the missing item is loaded into the cache, replacing another cache line. This process is called cache replacement. In a direct mapping system, each line from main memory is restricted to be placed in a single line slot in the cache. This direct mapping approach simplifies the cache replacement process, but tends to limit the hit ratio due to the lack of flexibility with line mapping. In contrast, flexibility of line mapping, and therefore a higher hit ratio, can be achieved by increasing the level of associativity. Increased associativity means that the number of lines per set is increased so that each line in main memory can be placed in any of the line slots ("ways") within the set. During cache replacement, one of the lines in the set must be replaced. The method for deciding which line in the set is to be replaced after a cache miss is called a cache replacement policy.

Several conventional cache replacement policies for selecting a datum in the cache to overwrite include Random, Least-Recently Used (LRU), Pseudo-LRU, and Not-Most-Recently-Used (NMRU). Random is the simplest cache replacement policy to implement, since the line to be replaced in the set is chosen at random. The LRU method is more complex, as it requires a logic circuit to keep track of actual access of each line in the set by the processor. According to the LRU algorithm, if a line has not been accessed recently, chances are that it will not be accessed any more, and therefore it is a good candidate for replacement. Another replacement policy, NMRU, keeps track of the most recently accessed line. This most recently accessed line is not chosen for replacement, since the principle of spatial locality says that there is a high probability that, once an information item is accessed, other nearby items in the same line will be accessed in the near future. The NMRU method requires a logic circuit to keep track of the most recently accessed line within a set. In all cache replacement policies, the line selected for replacement may be referred to as a "candidate".

Once a candidate is selected, further processing must occur in the cache in order to ensure the preservation of memory coherency. If the contents of the candidate has been altered in the cache since it was retrieved from memory, then the candidate is "dirty" and a memory incoherency exists. Before the contents of the dirty candidate can be replaced with the new information requested by the processor, the current contents of the dirty candidate must be updated to memory. This operation is called a "write back" operation. While the implementation of such a scheme allows reduced bus traffic because multiple changes to a cache line need be loaded into memory only when the cache line is about to be replaced, a drawback to the write back operation is delay. That is, access to the cache is slowed or even halted during a write back operation.

SUMMARY

A method selects a candidate to mark as overwritable in the event of a cache miss while attempting to avoid a write back operation. The selection includes associating a set of data with the cache access request, each datum of the set is associated with a way, then choosing an invalid way among the set. Where no invalid ways exist among the set, the processor next determines a way that is not most recently used among the set. Next, the method determines whether a shared resource is crowded. When the shared resource is not crowded, the not most recently used way is chosen as the candidate. Where the shared resource is crowded, the method next determines whether the not most recently used way differs from an associated source in the memory and where the not most recently used way is the same as an associated source in the memory, the not most recently used way is chosen as the candidate.

In one embodiment, the method for selecting a candidate to mark as overwritable in the event of a cache miss includes receiving the cache access requests, where the request is associated with a main memory address, determining whether the contents of the main memory address are present in a data cache, when the access "misses", associating the main memory address with a set within the data cache, determining whether any way in the set is invalid, and choosing an invalid way, if one exists, as the candidate. If no invalid way exists, a cache replacement policy is applied to select a way as a preliminary candidate. The cache replacement policy is based on the state of at least one affected resource, such as the "crowded" state of a write back buffer, the state of a cross bar switch, or the state of a memory controller buffer.

When the shared resource is a write back buffer, when the not most recently used way differs from an associated source in the memory, the candidate is chosen as the way among the set that does not differ from an associated source in the memory. Where all ways among the set differ from respective sources in the memory, the not most recently used way is chosen as the candidate and the not most recently used way is stored in the write back buffer.

When the back buffer is crowded, one embodiment determines whether the preliminary candidate is dirty. Where the write back buffer is not crowded or the write back buffer is crowded and the preliminary candidate is not dirty, the preliminary candidate is chosen as the candidate. Where, however, the write back buffer is crowded and the preliminary candidate is dirty, then the contents of the preliminary candidate are stored in the write back buffer and the preliminary candidate is chosen s the candidate.

In one embodiment, all or some of the method described above is implemented in a computer system having a data cache that is shared by a plurality of processing units.

The present invention will be more fully understood in light of the following detailed description taken together with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a computer system in accordance with an embodiment of the present invention.

FIG. 2 shows a block diagram of a data cache unit of the computer system of FIG. 1.

FIG. 3 shows a sample status word.

FIG. 4 shows a block diagram of a shared write back buffer of the data cache unit of FIG. 2.

FIG. 5 shows a block diagram of a selection circuit of the data cache unit of FIG. 2.

FIG. 6 shows two logical banks of a data array of the data cache unit of FIG. 2.

FIG. 7 shows one embodiment of a cache replacement operation.

FIG. 8 shows one embodiment of an arbitration circuit.

The use of the same reference numbers in different figures indicates the same or like elements.

DETAILED DESCRIPTION

The present invention relates to a multi-variable cache replacement policy that takes information in addition to the state of the cache into account. As with many prior art cache replacement policies, the status of the requested set is taken into account. Status bits indicate the state of the set, including which ways are available to be overwritten (i.e., invalid), which ways have recently been used for a cache miss, and which ways are "dirty" and therefore are candidates for a write back operation. In addition, the present invention also takes into account the state of the machine outside the cache. That is, the present invention also factors into the replacement policy the state of resources that are affected by the replacement policy. In the preferred embodiment, this is accomplished by factoring into the replacement policy whether the write back buffer is crowded. This allows more efficiency than prior art systems. If the write back buffer is crowded, a dirty way is not selected unless there are no non-dirty candidates. Triggering of a write back operation is thus prevented until both the write back buffer is crowded and all ways in the set are both valid and dirty. In alternative embodiments, the state of other resources, such as a cross bar switch, memory controller, and memory controller buffers may be factored into the replacement policy. The following sets forth a detailed description of a mode for carrying out the invention. The description is intended to be illustrative of the invention and should not be taken to be limiting.

FIG. 1 shows a computer system 100 in accordance with the present invention. Computer system 100 includes a data cache unit (DCU) 102 coupled to first processing unit 104 (MPU0) and second processing unit 106 (MPU1). While the preferred embodiment includes two processing units, the invention may include a plurality of any number of processing units. The units included in this plurality, such as first processing unit 104 and second processing unit 106 may be media processor units. For example, U.S. application Ser. No. 09/204,480 filed by inventors Marc Tremblay and William Joy, entitled "Multiple-Tread Processor for Threaded Software Applications", which is hereby incorporated by reference, sets forth a media processor unit in accordance with the invention.

FIG. 1 illustrates that the data cache unit 102 is coupled to each MPU as well as to main memory. First processing unit 104 is coupled to data cache unit 102 via a 64-bit data path, a 32-bit address path, a retry path and a hit path. Second processing unit 106 is also coupled to data cache unit 102 via a 64-bit data path, a 32-bit address path, a retry path and a hit path. The data cache unit 102 is coupled to a conventional main memory 108 by conventional bus 110. More specifically, data cache unit 102 is coupled to bus 110 via a 64-bit data-in path, as well as a 64-bit data-out path, and a 27-bit buffer flush address path.

FIG. 2 illustrates in greater detail the data cache unit 102, which stores data for faster access by first processing unit 104 and second processing unit 106 than would be possible by accessing main memory 108. FIG. 2 shows that data cache unit 102 comprises data array 202, status array 204, directory array 206, fill buffer 208, shared write back buffer 210, and selection circuit 212. Each of these constituents of the data cache unit 102 is discussed in further detail below. Data array 202 is discussed first, followed by discussions of directory array 206, status array 204, selection circuit 212, fill buffer 208, and write back buffer 210.

FIG. 2 illustrates that data array 202 receives a 32-bit address signal (add_MPU0) from first processing unit 104, a 32-bit address signal (add_MPU1) from second processing unit 106, and a 256-bit data signal from fill buffer 208. Data array 202 also receives first and second hit signals from directory array 206 (hit0, hit1). Data array 202 provides a 64-bit data signal to first processing unit 104 (datum0) and a 64-bit data signal to second processing unit 106 (datum1). Data array 202 also provides the 64-bit data signal datum0 and the 64-bit data signal datum 1 to write back buffer 210.

Data array 202 stores the data of data cache unit 102. In the preferred embodiment, data array 202 includes four logical banks 240a-240d, each bank storing 128 lines of 256 bits. A suitable implementation of a logical bank 240 is a static random access memory (SRAM). FIG. 2 shows that data array 202 also comprises two multiplexers 230a, 230b. The operation of data array 202 is described in more detail below.

Regarding the directory array 206, FIG. 2 illustrates that directory array 206 receives the 32-bit address signal (add_MPU0) from first processing unit 104 and the 32-bit address signal (add_MPU1) from second processing unit 106. Directory array 206 also receives the first and second 15-bit status signals from status array 204 (status0, status1). Directory array 206 provides first and second hit signals to data array 202. Directory array 206 also provides first and second data-out signals containing a tag address (rdata0, rdata1) to write back buffer 210.

Directory array 206 stores addresses of data stored in a corresponding location within data array 202 of data cache unit 102. Directory array 206 includes four logical banks 260a-260d that each stores 128 20-bit wide lines, where the 20-bits correspond to the 20 more significant bits of the 32-bit address. A datum is stored in a predetermined location within one of the four logical banks 260a-260d. Each of the four predetermined locations is labeled a "way". A "set" includes the four possible "ways" in which a datum can be stored. A suitable implementation of a logical bank 260 is a static random access memory (SRAM). FIG. 2 shows that directory array 206 also includes two comparators 270a, 270b. The operation of directory array 206 is described in more detail below.

Turning now to the status array, FIG. 2 illustrates that status array 204 receives the 32-bit address signal (add_MPU0) from first processing unit 104 and the 32-bit address signal (add_MPU1) from second processing unit 106. Status array 204 also receives first and second 15-bit status signals from selection circuit 212 (status0, status1). Status array 204 provides valid bits to the directory array 206. Status array 204 also provides a first and second 15-bit status signal (status0, status1). to selection circuit 212.

Status array 204 stores status words that include information concerning each "way" of data array 202. Status array 204 includes one or more logical banks 250 for storing 128 status words that are 15 bits each. A suitable implementation of a logical bank 250 is a static random access memory (SRAM). The operation of status array 204 is described in more detail later.

Still referring to FIG. 2, our discussion of the data cache unit 102 constituents turns to the selection circuit 212. Selection circuit 212 generates a new 15-bit status word to be updated a cycle after every load/store access and stored in the status array 204. (FIG. 3 illustrates the format of the 15-bit status word, as is discussed immediately below.) The selection circuit 212 also generates the victim number for cache replacement and indicates if the candidate is dirty, signifying that the candidate's current data must be loaded into the write back buffer before it is overwritten. FIG. 2 illustrates that the selection circuit 212 receives from the status array 204 the status word for the access. The selection circuit then modifies the status word. For example, the dirty bit may need to be set (on a store hit), the replacement bits may need to be updated and the valid bit may need to be cleared. The updated status word 300 is then set back to the status array.

FIG. 3 shows a sample status word 300. Status word 300 is a 15-bit word that indicates lock status, a reference way, whether each of four ways, 0-3, has been utilized on a previous cache miss, whether each of the four ways is dirty, and whether each of the four ways is valid. More specifically, bits R1 and R2 represent the reference way to be used by the selection circuit 212 to implement the cache replacement algorithm; as discussed below. For instance, in a NMRU cache replacement policy, bits R1 and R2 would contain the most-recently-used way for a particular set. In a LRU cache replacement policy, bits R1 and R2 would contain the least-recently-used way. Bits M0-M3 indicate whether the corresponding way has already been taken due to a cache miss. This M indicator simplifies the victim number generation logic in the cache replacement algorithm. Bits V0-V3 indicate whether the corresponding way is valid. An invalid way is a way that is free of meaningful data and therefore is a likely candidate to be overwritten on a cache miss. In other words, no new data has been fetched into an invalid way since that way was last flushed to memory. Bits D0-D3 indicate whether the corresponding way is dirty. That is, not only does that way contain meaningful data, but the data has been changed since it was retrieved from memory, and a memory incoherency therefore exists. Bit L, the lock bit, indicates that the cache line is locked in place and cannot be moved. The lock bit is set, for example, upon an atomic load hit. Setting the lock bit operates to disable any access to the set until the lock bit is reset.

Selection circuit 212 of data cache unit 102 implements a cache replacement policy by changing the "miss" bit in the appropriate status word to reflect which "way" is a candidate for replacement. Selection circuit 212 receives status words associated with requested data from status array 204 and provides an updated status word to status array 204 where applicable.

FIG. 5 shows a block diagram of selection circuit 212 which updates the status array 204 and implements the multi-variable replacement policy 700 of the present invention to generate a victim (or "candidate") number to be used for cache overwrite upon a cache miss. Selection circuit 212 receives the 15-bit status0 signal and the 15-bit status1 signal from the status array 204 as well as the full bits f1, f2 from the write back buffer 210. Selection circuit 212 also receives as control inputs a miss0 and miss1 signal. These 4-bit miss signals are logical inversions of the hit0 and hit1 signals that are sent from the directory array 206 to the data array 202. Another input that the selection circuit 212 is a fill buffer status from the fill buffer 208. Selection circuit 212 provides an updated 15-bit status0 signal and an updated 15-bit status1 signal to status array 204. The operation of selection circuit 212 will be discussed in more detail below.

The fill buffer 208, the next constituent of the data cache unit 102 to be discussed, is used when a cache miss occurs. A cache miss occurs when the line of memory requested by a processor MPU0, MPU1 is not already in the data cache unit 102. Fill buffer 208 receives the 32-bit address signal (add_MPU0) from first processing unit 104 and the 32-bit address signal (add_MPU1) from second processing unit 106. Fill buffer 208 receives a 64-bit data signal from main memory 108 and holds the data from main memory 108 that is to be stored in the data cache unit 102. FIG. 2 illustrates that fill buffer 208 includes a data register 220 that stores data to be written into data array 202. Data register 220 stores 256 bits of data. Fill buffer 208 provides the 256-bit data signal to data array 202. Fill buffer 208 also sends a 64-bit data signal, data_MPU0, and a second 64-bit data signal, data_MPU1, to the data array 202. Finally, fill buffer 208 also provides a fill buffer hit status to the data array 202 and to the selection circuit 212.

FIG. 2 further illustrates that fill buffer 208 also includes an address register 222 that stores addresses and certain status bits associated with data to be written into the data array. Address register also stores the "way" to which the data is to be stored in the data array. The operation of fill buffer 208 is described in more detail below.

Finally, our discussion of the data cache unit 102 constituents turns to the write back buffer 210. Write back buffer 210 serves, when a cache miss occurs, as a temporary place holder for dirty blocks until they can be pushed to memory. A "dirty" block is a block whose contents have been modified since the block was last obtained from main memory 108. Before a dirty block is stored in the write back buffer 210, the selection circuit 212 assigns it a "victim" number that is stored in the status word 300 (see M0, M1, M2, M3 in FIG. 3, discussed below). A victim number is the particular way chosen, according to the cache replacement policy, to be the place holder on a cache miss for a given set. Once a dirty block is "victimized", then data may be read out of the dirty victim and latched into the write back buffer 210. FIG. 2 illustrates that the write back buffer 210 receives from the data array 202 a 64-bit data signal (datum0) associated with first processing unit 104 and also receives from the data array 202 a 64-bit data signal (datum1) associated with second processing unit 106. The write back buffer also receives from the directory array 206 a data-out signal (rdata0) for first processing unit 104 and a data-out signal (rdata1) for second processing unit 106. The data-out signals (rdata0, rdata1) contain the tag address of the dirty block. FIG. 2 illustrates that the write back buffer 210 also receives a set_addr signal for each processing unit 104, 106, which indicates the set address for the dirty block. The set_addr signals are made up of all or part of the bits present in add_MPU0 and add_MPU1.

FIG. 4 shows a block diagram of shared write back buffer 210. The write back buffer is shared by MPU0104 and MPU1106 (as is illustrated in FIG. 2) because there is only one write back buffer 210 in the data cache unit 102. FIG. 4 illustrates that the shared write back buffer 210 includes address bank 402, data bank 404, and selector circuit 406, which is controlled by the cache control logic (not shown). Data bank 404 of shared write back buffer 210 comprises two entries, each entry consisting of a cache-line-sized data register 404a, 404b. In the preferred embodiment, each data register 404a, 404b stores 256 bits of data that it receives from the data array 202. Similarly, address bank 402 of the write back buffer 210 also comprises two entries 402a, 402b, with each entry able to store the address of a dirty candidate that should be written back to main memory 108. One skilled in the art will recognize that the architecture of a write back buffer may have many variations and should not be limited to the physical implementation depicted in FIG. 4. A write back buffer can have several levels. For instance, a shared write back buffer could be implemented in multiple levels, instead of the two-entry address bank 402 and data bank 404 illustrated in FIG. 4, with each MPU 104, 106 having a lower-level separate buffer that communicates-with a higher-level shared buffer. Similarly, a shared write back buffer could have a shared write back buffer communicating with a lower-level split write back buffer Furthermore, one skilled in the art will realize that, although the buffer components 404a, 404b, 404a, 404b, 406 are logically connected, they need not necessarily reside physically adjacent to each other within the processor architecture. (As an analogous example, one should note that, in the preferred embodiment, the fill buffer data registers 222a, 222b illustrated in FIG. 2 are logically associated with the fill buffer 208, but they are physically partitioned as part of the data array 202.)

Address entries 402a, 402b further include an f bit, f1 and f2, respectively, that indicates whether each respective address entry 402a, 402b is full. For example, if both f1 and f2 contain a value of binary one, then write back buffer 210 is full. The f1 and f2 bits are set by control logic associated with the write back buffer 210. Shared write back buffer 210 provides signal "full" to the selection circuit 212 for use in the cache replacement policy described in more detail below.

The present invention's use of a single shared write buffer 210 comprising multiple data registers 404a, 404b and address entries 402a, 402b departs from prior art data cache units that contain a separate write back buffer allocated to each processor. The preferred embodiment of the present invention, with its shared write back buffer 210, provides for more efficient usage of the data registers 404a, 404b. Because write back operations slow or halt the operation of the data cache unit 102, providing a shared write back buffer 210 reduces delays in the operation of the data cache unit 102 by reducing write back operations. For instance, in a prior art system, when a first processor causes a write of a first data word to an associated first write back buffer but the associated first write back buffer is filled to capacity, a data word stored in the first write back buffer is written back to memory. In contrast, FIG. 4 illustrates that the present invention provides a second register 404b with capacity to store a data word. Applying the above example to the present invention, the write back operation could be avoided by writing the first data word to the second data register 404b. If both entries of the write back buffer 210 is full, then it operates in a first-in-first-out (FIFO) fashion