WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Method and apparatus for cache memory access with separate fetch and store queues    
United States Patent5590379   
Link to this pagehttp://www.wikipatents.com/5590379.html
Inventor(s)Hassler; Joseph A. (West Chester, PA); Deal; Gregory K. (West Chester, PA); Koss; Timothy A. (Pottstown, PA); Heil; Stephen F. (West Chester, PA)
AbstractA two-way set associative cache memory system for a parallel-pipelined computer system uses separate queue structures to hold main memory fetch and store requests generated by the central processing unit (CPU). A memory access unit, coupled between the cache memory system and the CPU selects the next request to be processed by the main memory from between the requests at the heads of the fetch and store queues. The request at the head of the fetch queue is preferred over the request at the head of the store queue unless the memory partition to be used by the fetch request is still busy with a previous request while the partition to be used by the store request is idle. Data retrieved from the main memory replaces data in the cache according to an algorithm that prefers empty pages within a set to pages that contain data and prefers pages that do not have pending update requests scheduled to pages that do have pending update requests scheduled. In the event that only pages having pending update requests are found, input requests to the cache are inhibited until at least one fetch request for a page in the set is completed and the page is no longer marked as having a pending update request.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5590379
Method and apparatus for cache memory access with separate fetch and

     store queues - US Patent 5590379 Drawing
Method and apparatus for cache memory access with separate fetch and store queues
Inventor     Hassler; Joseph A. (West Chester, PA); Deal; Gregory K. (West Chester, PA); Koss; Timothy A. (Pottstown, PA); Heil; Stephen F. (West Chester, PA)
Owner/Assignee     Unisys Corporation (Blue Bell, PA)
Patent assignment
All assignments
Publication Date     December 31, 1996
Application Number     08/212,129
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     March 14, 1994
US Classification     710/31 710/3 710/5 711/113 711/119 711/136 711/141 711/164
Int'l Classification     G06F 013/18 G06F 005/06 G06F 013/364
Examiner     Swann; Tod R.
Assistant Examiner     Peikari; James
Attorney/Law Firm     Starr; Mark T. Sowell; John B. , Scott; Thomas J. ,
Address
Parent Case     This application is a divisional of Ser. No. 013,254, filed Feb. 3, 1993, now U.S. Pat. No. 5,450,564, which is a continuation of Ser. No. 518,911, filed May 4, 1990, abandoned.
Priority Data    
USPTO Field of Search     395/375 395/250 395/440 395/446 395/463 395/468 395/491 395/823 395/825 395/851 246/488 364/DIG. 1
Patent Tags     cache memory access separate fetch and store queues
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5379379
Becker
711/3
Jan,1995

[0 after 0 votes]
5185871
Frey
712/205
Feb,1993

[0 after 0 votes]
5133522
Testart
246/468
Jul,1992

[0 after 0 votes]
5123095
Papadopoulos
712/218
Jun,1992

[0 after 0 votes]
4901233
Liptay
712/218
Feb,1990

[0 after 0 votes]
4872111
Daberkow
711/140
Oct,1989

[0 after 0 votes]
4870569
Nakatani
711/151
Sep,1989

[0 after 0 votes]
4870565
Yamamoto
711/113
Sep,1989

[0 after 0 votes]
4853603
Onoue
700/262
Aug,1989

[0 after 0 votes]
4853846
Johnson
710/300
Aug,1989

[0 after 0 votes]
4843543
Isobe
711/148
Jun,1989

[0 after 0 votes]
4839791
Ito

Jun,1989

[0 after 0 votes]
4814981
Rubinfeld
711/144
Mar,1989

[0 after 0 votes]
4797813
Igarashi
711/118
Jan,1989

[0 after 0 votes]
4794521
Ziegler
711/130
Dec,1988

[0 after 0 votes]
4757440
Scheuneman
714/53
Jul,1988

[0 after 0 votes]
4755936
Stewart
711/118
Jul,1988

[0 after 0 votes]
4745545
Schiffleger

May,1988

[0 after 0 votes]
4742446
Morioka
711/144
May,1988

[0 after 0 votes]
4736287
Druke
711/128
Apr,1988

[0 after 0 votes]
4707784
Ryan
711/140
Nov,1987

[0 after 0 votes]
4682284
Schrofer
710/55
Jul,1987

[0 after 0 votes]
4648030
Bomba
711/141
Mar,1987

[0 after 0 votes]
4636946
Hartung
711/136
Jan,1987

[0 after 0 votes]
4583166
Hartung
711/113
Apr,1986

[0 after 0 votes]
4533996
Hartung
710/3
Aug,1985

[0 after 0 votes]
4430701
Christian
711/119
Feb,1984

[0 after 0 votes]
4403288
Christian
710/5
Sep,1983

[0 after 0 votes]
5134561
Liptay
711/164
Dec,1969

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


The invention claimed is:

1. In a computer system having a main memory and a central processing unit (CPU) including an N-way set associative cache memory system with each data set including M data values and being stored in a respectively different one of N pages of said cache memory, where M and N are integers, said cache memory system including a list of data sets reserved for pending operations, a method of selecting which data set of an associated plurality of N data sets, constituting a group of candidates for replacement, is to be replaced by a set of data to be retrieved from the main memory, said method comprising the steps of:

determining if any of the N page entries for the N data sets in the associated group of data sets is marked for a deferred replace operation;

if a selected one of the N pages entries is found to be marked as being subject to a deferred replace operation, removing the data set scored in the selected page from the group of candidates for replacement;

comparing the page entries to the list of reserved data sets to determine if the page entries for all of the candidates for replacement are reserved for pending operations;

if one of the page entries for the candidates for replacement is not reserved for pending operations, selecting for replacement said candidate for replacement which is not reserved; and,

if all of the page entries for the candidates for replacement are reserved for pending operations, selecting one of said candidates for replacement depending on whether each of said pending operations is a store operation or a fetch operation, and marking the page entry corresponding to the selected candidate data set as being subject to a deferred replace operation.

2. The method of selecting a data set to be replaced set forth in claim 1, wherein said deferred replace operation includes the steps of:

scheduling said cache memory system to transmit the M data values in the selected data set to main memory when the status of the selected data set is no longer reserved for a previous request; and

changing the status of the selected data set to be reserved for said deferred replace operation.

3. A method in accordance with claim 1, in which the step of selecting one of said candidates when all of the page entries for the candidates for selection are reserved for pending operations includes the steps of:

if all of the page entries for the candidates for selection are reserved for pending snore operations, and one of said candidates for selection has an active fetch request, selecting another of said candidates selection which does not have an active fetch request;

if all of the page entries for the candidates for selection are reserved for pending fetch operations, and at least one of said candidates for selection is empty, selecting one of said candidates for selection which is empty; and

if all of the page entries for the candidates for selection are reserved for pending fetch operations, and none of said candidates for selection is empty, selecting any one of said candidates for selection.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

This invention relates to cache memory systems for use in a parallel-pipelined computer system and in particular to such a cache memory having an efficient interface with main memory.

A conventional computer system includes a central processing unit (CPU) a memory subsystem and an input/output (I/O) subsystem. In some computer systems, the principle connection among these three entities is a single bidirectional bus. In other systems, the CPU and memory are connected by one bus while the I/O subsystem is connected to the memory by another bus. It is well known that, in either configuration, the performance of the system is at least partly dependent on the speed at which data can be transferred between the CPU and the memory.

In general terms, this problem has been addressed by increasing the "bandwidth" of the bus. As used herein, the bandwidth of a bus is a measure of the amount of data which can be transferred over the bus in a unit time interval.

There are, however, many ways in which the apparent bandwidth of the memory bus may be increased. One method is to provide a wider bus, that is to say one which conveys a greater number of bits in parallel between the CPU and the memory. Another method is to provide faster memory. This may be done either by using higher performance memory components or by partitioning the memory so that multiple memory fetch operations may proceed in parallel.

Interleaving is a particular type of memory partitioning. In an interleaved memory subsystem, memory cells having successive addresses are assigned to respectively different partitions. This partitioning allows the fetch operations for consecutively addressed words to be overlapped. In a four-way interleaved memory subsystem, for example, data for any four memory cells having consecutive addresses may be fetched or stored in parallel. Thus, the apparent memory cycle time of an interleaved memory is a fraction of the memory cycle time of the actual memory devices as long as a significant portion of memory operations access consecutive memory cells.

Partitioning does not speed up all memory accesses, however, since successive memory access requests which map onto the same memory partition will occur at the memory cycle time of the actual memory devices.

Another way in which the bandwidth of data transfers between the CPU and memory may be increased is to use high-performance devices in the memory subsystem. This method tends to be of limited utility since both the devices themselves and the hardware used to support them tend to increase the cost of the computer system without producing a proportional increase in performance.

High-performance memory devices of this type, however, are widely used in cache memory systems which interface between the CPU and the main memory subsystem. A cache memory attempts to take advantage of any tendencies toward temporal or spatial locality of reference in computer programs by fetching data from cells surrounding each memory cell accessed by the program. Data in these surrounding cells is held in a relatively small, high-speed memory local to the CPU until it is overwritten by data from other memory access requests. While this data is in the cache, it may be accessed by the CPU-without substantial delay.

While cache memories may tend to increase system performance for programs that frequently access data in relatively small data sets, they present problems when large volumes of data are accessed by the program controlling the CPU. For these programs, data values in the cache memory are continually being replaced to satisfy the requests issued by the CPU. Due to the operation of the cache, each of these requests produces multiple memory access requests which may delay subsequent requests. In addition, when new data to be added to the cache must replace some of the existing data, a data replacement algorithm is invoked to determine which of the existing data entries in the cache is to be replaced by the new data. This replacement algorithm may significantly delay memory accesses by the CPU or it may entail the use of relatively costly dedicated electronic devices.

SUMMARY OF THE INVENTION

The present invention is embodied in a computer system having a cache memory which provides an interface between a CPU and a main memory. The cache memory is coupled to the main memory by two queues, one for data to be fetched from the memory and one for data to be stored into the memory. The interface between the cache and main memories includes circuitry which selects between the next fetch request and the next store request for the next memory access to be performed.

According to one aspect of the invention, the selection circuitry is conditioned to prefer fetch requests to store requests.

According to another aspect of the invention, the main memory is partitioned and the selection circuitry compares the requested partitions of both the store and fetch requests to the partitions accessed by each of the last N memory requests, where N is an integer. The selection circuitry favors access requests to other partitions over requests to the partitions of these last N memory requests.

According to yet another aspect of the invention, the cache memory is organized as a two-way set associative memory and the data replacement algorithm takes into account such factors as impending replacement requests for either set of data values at the requested location.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system which includes an embodiment of the present invention.

FIG. 2 is a block diagram which shows details of the memory access unit (MAU) of the computer system shown in FIG. 1.

FIG. 3 is a block diagram which shows details of the server availability priority control (SAPC) circuitry used in the MAU circuitry shown in FIG. 2.

FIG. 4 is a block diagram which shows details of the cache memory circuitry of the computer system shown in FIG. 1.

FIG. 5 is a flow chart diagram which is useful for explaining the operation of the cache memory circuitry shown in FIG. 4.

FIGS. 6A and 6B are flow chart diagrams which are useful for explaining the operation of the circuitry shown in FIG. 2.

DETAILED DESCRIPTION OF THE INVENTION

Overview

FIG. 1 is a block diagram of a parallel-pipelined computer system in which multiple central processing units (CPU's) may be coupled to a main memory 130. Only one CPU 110 is shown in FIG. 1. The CPU 110 includes a code unit (CU) 112, an execution unit (EU) 114, a reference unit (RU) 116 and a cache memory 118. The code unit retrieves instructions from a main memory 130 and partially decodes them. Any memory references in the code processed by the code unit are handled by the RU 116. The data cache memory 118 of the CPU 110 is coupled to main memory 130 via a memory access unit (MAU) 120.

When the RU 116 attempts to resolve a memory reference from an instruction, it passes a virtual memory address to an address conversion unit (ACU) 117 which translates the virtual address into an absolute address. The ACU 117 then passes the absolute address to the cache memory 118. If the referenced address is found, the cache memory 118 either returns the referenced data item to the EU 114 or performs the store operation into the cache using data provided by the EU.

If, however, the referenced address is not found, the cache memory 118 conditions the MAU 120 to make an access request to the main memory 130. As shown in FIG. 1, the cache 118 may simultaneously make two types of access request, a memory fetch request via the bus FR and a memory store request via the bus SR. These parallel requests are analyzed by the MAU 120 to determine if either one would make better use of the memory resources. To this end, two tests are made by the MAU 120. First, the memory access unit 120 determines if either one of the access requests would use a memory partition that may be still busy handling a previous access request. If so, the non-interfering request is preferred over the potentially interfering request. The second test prefers memory fetch requests to memory store requests, that is to say, fetch requests are given a higher priority than store requests. This second test is only performed if the first test does not establish a preferred request.

The cache memory 118, used in this embodiment of the invention, is a two-way set associative memory. Consequently, each addressed location in the memory includes two four-word data sets. Whenever a new data request is made and both of its corresponding data sets are occupied, the cache memory 118 chooses one set to be replaced by the new data. The algorithm for choosing the set to be replaced avoids choosing a set for which a pending request already exists. If, however, both sets have pending requests, one of the sets is selected at random.

The exemplary cache memory 118 is a purgeless cache. Data values are stored in the cache memory 118 in four-word sets and the status of any four-word set is held in the main memory 130. Thus, if processor A fetches a four-word set from memory and intends to modify the data in the set, the cache status entry for the data in the main memory 130 will indicate that the data in the set is held by processor A exclusively. Any attempt by, for example, processor B to access the data in main memory will be unsuccessful until the modified data has been written back into the main memory 130 by processor A.

Alternatively, processor A may request data from memory which will not be modified. In this instance, the cache-status entry for the four-word set will indicate that processor A has the data in a shared state. Processor B may also access shared data but may not gain exclusive access to the data until the set has been invalidated in processor A's cache.

Access to data in the memory 130 is controlled by a demand scheme. A processor which needs to access data can cause the processor which currently holds the data in its associated cache to give it up. Thus, if a task running on processor A requests exclusive access to data which is held with share status by processor B, The memory 130 will send the data to processor A and simultaneously send a purge command for the addressed location to processor B. This command will cause processor B to mark the data set invalid in its cache. It is noted that the memory 130 issues purge commands which invalidate copies of data held in the various cache memories 118 coupled to the memory 130. This system has a reduced processing overhead relative to a system in which the cache 118 or the MAU 120 would be assigned the task of invalidating copies held by other cache memories.

When processor A requests share access to data which is held exclusively by processor B, the memory 130 cannot send the addressed data to the MAU 120 of processor A but, instead, sends a return code to processor B which causes processor B to write the data set back into memory. Once the data has been returned to memory and its cache status has been updated, the data may be sent to processor A.

DETAILED DESCRIPTION

FIG. 2 is a block diagram of the MAU 120 which couples the cache memory 118 to the main memory 130. As shown in FIG. 2, the main memory 130 is divided into two memory banks (BANK 0 and BANK 1). Each bank, in turn, is divided into six slots (M00 through M05 and M10 through M15, respectively) and each slot is divided into two partitions which are controlled by respective servers S0 and S1.

On the occurrence of a cache miss or when a data set in the cache is to be replaced, the cache 119 issues, respectively, a fetch request, through the cache fill list/memory reference table (CFL/MRT) 212, or a store request, through the replace queue (RQ) 210. As described below, in reference to FIG. 4, the CFL/MRT 212 and the RQ 210 are circular queues. The cache memory 118 places memory requests into each of these queues and the MAU 120 removes requests from the queues.

The next entries to be taken from the queues RQ and CFL/MRT are applied to the forward address translation (FAT) circuitry 214 and 216 respectively. The FAT circuitry 214 and 216 are coupled to a forward address map (FMAP) 218 which provides the translation from the absolute memory address held in the queue entry to the physical address of the data in the main memory 130. The translated addresses provided by the FATs 214 and 216 and data values to be stored into the main memory 130, provided by the RQ 210 are applied to respectively different data input ports of a multiplexer 220.

The operation of the MAU 120 is described with reference to FIGS. 2, 6A and 6B. As shown in FIG. 2, the multiplexer 220 and the RQ 210 and CFL/MRT 212 are controlled by signals provided by the server availability priority control (SAPC) circuitry 230. As set forth below in reference to FIG. 3, the SAPC 230, at step 252 of FIG. 6A, conditions the multiplexer 220 to select either the next store request from the RQ 210 or the next fetch request from the memory reference table 212 as the next memory operation to be performed. In addition, when a memory operation has been selected, the SAPC updates the read pointer in the appropriate one of RQ 210 or CFL/MRT 212 via a control bus CONT.

The output port of the multiplexer 220 is coupled to three registers, two memory operation control registers (MOCA 222 and MOCB 224) and an MAU output register (MO) 226. The registers 222, 226 and 224 are controlled by signals (not shown) provided by the SAPC circuitry 230. In response to these control signals, the address values provided by the FATs 214 and 216 is selectively loaded into the MO register 226 and the appropriate control bits are set in either MOCA 222 or MOCB 224, depending on which memory bank is to receive the request. The data values provided by the replace queue 210 are always loaded into the MAU output register 226. The output port of the register MOCA 222 is coupled to the address input port of each slot of BANK 0 while the output port of the register MOCB 224 is coupled to the address input port of each slot of BANK 1. The MO register 226 has two output ports, one coupled to the data bus of BANK 0 and one coupled to the data bus of BANK 1.

In response to a store request from the replace queue 210, the server availability priority control circuitry, at step 254 of FIG. 6A, first conditions the multiplexer 220 to pass the store address value provided by the FAT 214. Based on memory bank information contai