|
|
|
| United States Patent | 5590379 |
| Link to this page | http://www.wikipatents.com/5590379.html |
| Inventor(s) | Hassler; Joseph A. (West Chester, PA);
Deal; Gregory K. (West Chester, PA);
Koss; Timothy A. (Pottstown, PA);
Heil; Stephen F. (West Chester, PA) |
| Abstract | A two-way set associative cache memory system for a parallel-pipelined
computer system uses separate queue structures to hold main memory fetch
and store requests generated by the central processing unit (CPU). A
memory access unit, coupled between the cache memory system and the CPU
selects the next request to be processed by the main memory from between
the requests at the heads of the fetch and store queues. The request at
the head of the fetch queue is preferred over the request at the head of
the store queue unless the memory partition to be used by the fetch
request is still busy with a previous request while the partition to be
used by the store request is idle. Data retrieved from the main memory
replaces data in the cache according to an algorithm that prefers empty
pages within a set to pages that contain data and prefers pages that do
not have pending update requests scheduled to pages that do have pending
update requests scheduled. In the event that only pages having pending
update requests are found, input requests to the cache are inhibited until
at least one fetch request for a page in the set is completed and the page
is no longer marked as having a pending update request. |
|
|
|
Title Information  |
|
|
|
|
|
Drawing from US Patent 5590379 |
|
|
Method and apparatus for cache memory access with separate fetch and
store queues |
|
|
|
|
|
| Publication Date |
December 31, 1996 |
|
|
|
|
|
| Filing Date |
March 14, 1994 |
|
|
|
|
|
|
|
|
|
|
|
| Parent Case |
This application is a divisional of Ser. No. 013,254, filed Feb. 3, 1993,
now U.S. Pat. No. 5,450,564, which is a continuation of Ser. No. 518,911,
filed May 4, 1990, abandoned. |
|
|
|
|
|
|
|
|
|
|
|
|
|
Title Information  |
|
|
References  |
|
|
| *references marked with an asterisk below are user-added references |
|
U.S. References |
|
|
| Add a new US reference: |
| | Reference | Relevancy | Comments | Reference | Relevancy | Comments | 5379379 Becker 711/3 Jan,1995 |      Your vote accepted [0 after 0 votes] | | 5185871 Frey 712/205 Feb,1993 |      Your vote accepted [0 after 0 votes] | | 5133522 Testart 246/468 Jul,1992 |      Your vote accepted [0 after 0 votes] | | 5123095 Papadopoulos 712/218 Jun,1992 |      Your vote accepted [0 after 0 votes] | | 4901233 Liptay 712/218 Feb,1990 |      Your vote accepted [0 after 0 votes] | | 4872111 Daberkow 711/140 Oct,1989 |      Your vote accepted [0 after 0 votes] | | 4870569 Nakatani 711/151 Sep,1989 |      Your vote accepted [0 after 0 votes] | | 4870565 Yamamoto 711/113 Sep,1989 |      Your vote accepted [0 after 0 votes] | | 4853603 Onoue 700/262 Aug,1989 |      Your vote accepted [0 after 0 votes] | | 4853846 Johnson 710/300 Aug,1989 |      Your vote accepted [0 after 0 votes] | | 4843543 Isobe 711/148 Jun,1989 |      Your vote accepted [0 after 0 votes] | | 4839791 Ito
Jun,1989 |      Your vote accepted [0 after 0 votes] | | 4814981 Rubinfeld 711/144 Mar,1989 |      Your vote accepted [0 after 0 votes] | | 4797813 Igarashi 711/118 Jan,1989 |      Your vote accepted [0 after 0 votes] | | 4794521 Ziegler 711/130 Dec,1988 |      Your vote accepted [0 after 0 votes] | | 4757440 Scheuneman 714/53 Jul,1988 |      Your vote accepted [0 after 0 votes] | | 4755936 Stewart 711/118 Jul,1988 |      Your vote accepted [0 after 0 votes] | | 4745545 Schiffleger
May,1988 |      Your vote accepted [0 after 0 votes] | | 4742446 Morioka 711/144 May,1988 |      Your vote accepted [0 after 0 votes] | | 4736287 Druke 711/128 Apr,1988 |      Your vote accepted [0 after 0 votes] | | 4707784 Ryan 711/140 Nov,1987 |      Your vote accepted [0 after 0 votes] | | 4682284 Schrofer 710/55 Jul,1987 |      Your vote accepted [0 after 0 votes] | | 4648030 Bomba 711/141 Mar,1987 |      Your vote accepted [0 after 0 votes] | | 4636946 Hartung 711/136 Jan,1987 |      Your vote accepted [0 after 0 votes] | | 4583166 Hartung 711/113 Apr,1986 |      Your vote accepted [0 after 0 votes] | | 4533996 Hartung 710/3 Aug,1985 |      Your vote accepted [0 after 0 votes] | | 4430701 Christian 711/119 Feb,1984 |      Your vote accepted [0 after 0 votes] | | 4403288 Christian 710/5 Sep,1983 |      Your vote accepted [0 after 0 votes] | | 5134561 Liptay 711/164 Dec,1969 |      Your vote accepted [0 after 0 votes] | | | | | |
|
|
|
|
U.S. References |
|
|
Foreign References |
|
|
|
|
|
|
Foreign References |
|
|
Other References |
|
|
|
|
|
|
Other References |
|
|
|
|
|
References  |
|
|
|
|
|
| Market Size |
|
Estimate the gross annual revenues of the relevant market
sector:
|
| | |
| |
|
|
| Market Share |
|
Estimate the percentage of the relevant market sector this invention will capture:
|
| | |
| |
|
|
| Reasonable Royalty |
|
What percentage of gross sales should the inventor or assignee be paid?
|
| | |
| |
|
|
|
Public's "Guesstimation" of Royalty Value
|
| Market Size | N/A | [No votes] | | x | Market Share | N/A | [No votes] | | x | Reasonable Royalty | N/A | [No votes] |
| | N/A | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
Market Review  |
|
|
Technical Review  |
|
|
Claims  |
|
|
The invention claimed is:
1. In a computer system having a main memory and a central processing unit
(CPU) including an N-way set associative cache memory system with each
data set including M data values and being stored in a respectively
different one of N pages of said cache memory, where M and N are integers,
said cache memory system including a list of data sets reserved for
pending operations, a method of selecting which data set of an associated
plurality of N data sets, constituting a group of candidates for
replacement, is to be replaced by a set of data to be retrieved from the
main memory, said method comprising the steps of:
determining if any of the N page entries for the N data sets in the
associated group of data sets is marked for a deferred replace operation;
if a selected one of the N pages entries is found to be marked as being
subject to a deferred replace operation, removing the data set scored in
the selected page from the group of candidates for replacement;
comparing the page entries to the list of reserved data sets to determine
if the page entries for all of the candidates for replacement are reserved
for pending operations;
if one of the page entries for the candidates for replacement is not
reserved for pending operations, selecting for replacement said candidate
for replacement which is not reserved; and,
if all of the page entries for the candidates for replacement are reserved
for pending operations, selecting one of said candidates for replacement
depending on whether each of said pending operations is a store operation
or a fetch operation, and marking the page entry corresponding to the
selected candidate data set as being subject to a deferred replace
operation.
2. The method of selecting a data set to be replaced set forth in claim 1,
wherein said deferred replace operation includes the steps of:
scheduling said cache memory system to transmit the M data values in the
selected data set to main memory when the status of the selected data set
is no longer reserved for a previous request; and
changing the status of the selected data set to be reserved for said
deferred replace operation.
3. A method in accordance with claim 1, in which the step of selecting one
of said candidates when all of the page entries for the candidates for
selection are reserved for pending operations includes the steps of:
if all of the page entries for the candidates for selection are reserved
for pending snore operations, and one of said candidates for selection has
an active fetch request, selecting another of said candidates selection
which does not have an active fetch request;
if all of the page entries for the candidates for selection are reserved
for pending fetch operations, and at least one of said candidates for
selection is empty, selecting one of said candidates for selection which
is empty; and
if all of the page entries for the candidates for selection are reserved
for pending fetch operations, and none of said candidates for selection is
empty, selecting any one of said candidates for selection. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
BACKGROUND OF THE INVENTION
This invention relates to cache memory systems for use in a
parallel-pipelined computer system and in particular to such a cache
memory having an efficient interface with main memory.
A conventional computer system includes a central processing unit (CPU) a
memory subsystem and an input/output (I/O) subsystem. In some computer
systems, the principle connection among these three entities is a single
bidirectional bus. In other systems, the CPU and memory are connected by
one bus while the I/O subsystem is connected to the memory by another bus.
It is well known that, in either configuration, the performance of the
system is at least partly dependent on the speed at which data can be
transferred between the CPU and the memory.
In general terms, this problem has been addressed by increasing the
"bandwidth" of the bus. As used herein, the bandwidth of a bus is a
measure of the amount of data which can be transferred over the bus in a
unit time interval.
There are, however, many ways in which the apparent bandwidth of the memory
bus may be increased. One method is to provide a wider bus, that is to say
one which conveys a greater number of bits in parallel between the CPU and
the memory. Another method is to provide faster memory. This may be done
either by using higher performance memory components or by partitioning
the memory so that multiple memory fetch operations may proceed in
parallel.
Interleaving is a particular type of memory partitioning. In an interleaved
memory subsystem, memory cells having successive addresses are assigned to
respectively different partitions. This partitioning allows the fetch
operations for consecutively addressed words to be overlapped. In a
four-way interleaved memory subsystem, for example, data for any four
memory cells having consecutive addresses may be fetched or stored in
parallel. Thus, the apparent memory cycle time of an interleaved memory is
a fraction of the memory cycle time of the actual memory devices as long
as a significant portion of memory operations access consecutive memory
cells.
Partitioning does not speed up all memory accesses, however, since
successive memory access requests which map onto the same memory partition
will occur at the memory cycle time of the actual memory devices.
Another way in which the bandwidth of data transfers between the CPU and
memory may be increased is to use high-performance devices in the memory
subsystem. This method tends to be of limited utility since both the
devices themselves and the hardware used to support them tend to increase
the cost of the computer system without producing a proportional increase
in performance.
High-performance memory devices of this type, however, are widely used in
cache memory systems which interface between the CPU and the main memory
subsystem. A cache memory attempts to take advantage of any tendencies
toward temporal or spatial locality of reference in computer programs by
fetching data from cells surrounding each memory cell accessed by the
program. Data in these surrounding cells is held in a relatively small,
high-speed memory local to the CPU until it is overwritten by data from
other memory access requests. While this data is in the cache, it may be
accessed by the CPU-without substantial delay.
While cache memories may tend to increase system performance for programs
that frequently access data in relatively small data sets, they present
problems when large volumes of data are accessed by the program
controlling the CPU. For these programs, data values in the cache memory
are continually being replaced to satisfy the requests issued by the CPU.
Due to the operation of the cache, each of these requests produces
multiple memory access requests which may delay subsequent requests. In
addition, when new data to be added to the cache must replace some of the
existing data, a data replacement algorithm is invoked to determine which
of the existing data entries in the cache is to be replaced by the new
data. This replacement algorithm may significantly delay memory accesses
by the CPU or it may entail the use of relatively costly dedicated
electronic devices.
SUMMARY OF THE INVENTION
The present invention is embodied in a computer system having a cache
memory which provides an interface between a CPU and a main memory. The
cache memory is coupled to the main memory by two queues, one for data to
be fetched from the memory and one for data to be stored into the memory.
The interface between the cache and main memories includes circuitry which
selects between the next fetch request and the next store request for the
next memory access to be performed.
According to one aspect of the invention, the selection circuitry is
conditioned to prefer fetch requests to store requests.
According to another aspect of the invention, the main memory is
partitioned and the selection circuitry compares the requested partitions
of both the store and fetch requests to the partitions accessed by each of
the last N memory requests, where N is an integer. The selection circuitry
favors access requests to other partitions over requests to the partitions
of these last N memory requests.
According to yet another aspect of the invention, the cache memory is
organized as a two-way set associative memory and the data replacement
algorithm takes into account such factors as impending replacement
requests for either set of data values at the requested location.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a computer system which includes an embodiment
of the present invention.
FIG. 2 is a block diagram which shows details of the memory access unit
(MAU) of the computer system shown in FIG. 1.
FIG. 3 is a block diagram which shows details of the server availability
priority control (SAPC) circuitry used in the MAU circuitry shown in FIG.
2.
FIG. 4 is a block diagram which shows details of the cache memory circuitry
of the computer system shown in FIG. 1.
FIG. 5 is a flow chart diagram which is useful for explaining the operation
of the cache memory circuitry shown in FIG. 4.
FIGS. 6A and 6B are flow chart diagrams which are useful for explaining the
operation of the circuitry shown in FIG. 2.
DETAILED DESCRIPTION OF THE INVENTION
Overview
FIG. 1 is a block diagram of a parallel-pipelined computer system in which
multiple central processing units (CPU's) may be coupled to a main memory
130. Only one CPU 110 is shown in FIG. 1. The CPU 110 includes a code unit
(CU) 112, an execution unit (EU) 114, a reference unit (RU) 116 and a
cache memory 118. The code unit retrieves instructions from a main memory
130 and partially decodes them. Any memory references in the code
processed by the code unit are handled by the RU 116. The data cache
memory 118 of the CPU 110 is coupled to main memory 130 via a memory
access unit (MAU) 120.
When the RU 116 attempts to resolve a memory reference from an instruction,
it passes a virtual memory address to an address conversion unit (ACU) 117
which translates the virtual address into an absolute address. The ACU 117
then passes the absolute address to the cache memory 118. If the
referenced address is found, the cache memory 118 either returns the
referenced data item to the EU 114 or performs the store operation into
the cache using data provided by the EU.
If, however, the referenced address is not found, the cache memory 118
conditions the MAU 120 to make an access request to the main memory 130.
As shown in FIG. 1, the cache 118 may simultaneously make two types of
access request, a memory fetch request via the bus FR and a memory store
request via the bus SR. These parallel requests are analyzed by the MAU
120 to determine if either one would make better use of the memory
resources. To this end, two tests are made by the MAU 120. First, the
memory access unit 120 determines if either one of the access requests
would use a memory partition that may be still busy handling a previous
access request. If so, the non-interfering request is preferred over the
potentially interfering request. The second test prefers memory fetch
requests to memory store requests, that is to say, fetch requests are
given a higher priority than store requests. This second test is only
performed if the first test does not establish a preferred request.
The cache memory 118, used in this embodiment of the invention, is a
two-way set associative memory. Consequently, each addressed location in
the memory includes two four-word data sets. Whenever a new data request
is made and both of its corresponding data sets are occupied, the cache
memory 118 chooses one set to be replaced by the new data. The algorithm
for choosing the set to be replaced avoids choosing a set for which a
pending request already exists. If, however, both sets have pending
requests, one of the sets is selected at random.
The exemplary cache memory 118 is a purgeless cache. Data values are stored
in the cache memory 118 in four-word sets and the status of any four-word
set is held in the main memory 130. Thus, if processor A fetches a
four-word set from memory and intends to modify the data in the set, the
cache status entry for the data in the main memory 130 will indicate that
the data in the set is held by processor A exclusively. Any attempt by,
for example, processor B to access the data in main memory will be
unsuccessful until the modified data has been written back into the main
memory 130 by processor A.
Alternatively, processor A may request data from memory which will not be
modified. In this instance, the cache-status entry for the four-word set
will indicate that processor A has the data in a shared state. Processor B
may also access shared data but may not gain exclusive access to the data
until the set has been invalidated in processor A's cache.
Access to data in the memory 130 is controlled by a demand scheme. A
processor which needs to access data can cause the processor which
currently holds the data in its associated cache to give it up. Thus, if a
task running on processor A requests exclusive access to data which is
held with share status by processor B, The memory 130 will send the data
to processor A and simultaneously send a purge command for the addressed
location to processor B. This command will cause processor B to mark the
data set invalid in its cache. It is noted that the memory 130 issues
purge commands which invalidate copies of data held in the various cache
memories 118 coupled to the memory 130. This system has a reduced
processing overhead relative to a system in which the cache 118 or the MAU
120 would be assigned the task of invalidating copies held by other cache
memories.
When processor A requests share access to data which is held exclusively by
processor B, the memory 130 cannot send the addressed data to the MAU 120
of processor A but, instead, sends a return code to processor B which
causes processor B to write the data set back into memory. Once the data
has been returned to memory and its cache status has been updated, the
data may be sent to processor A.
DETAILED DESCRIPTION
FIG. 2 is a block diagram of the MAU 120 which couples the cache memory 118
to the main memory 130. As shown in FIG. 2, the main memory 130 is divided
into two memory banks (BANK 0 and BANK 1). Each bank, in turn, is divided
into six slots (M00 through M05 and M10 through M15, respectively) and
each slot is divided into two partitions which are controlled by
respective servers S0 and S1.
On the occurrence of a cache miss or when a data set in the cache is to be
replaced, the cache 119 issues, respectively, a fetch request, through the
cache fill list/memory reference table (CFL/MRT) 212, or a store request,
through the replace queue (RQ) 210. As described below, in reference to
FIG. 4, the CFL/MRT 212 and the RQ 210 are circular queues. The cache
memory 118 places memory requests into each of these queues and the MAU
120 removes requests from the queues.
The next entries to be taken from the queues RQ and CFL/MRT are applied to
the forward address translation (FAT) circuitry 214 and 216 respectively.
The FAT circuitry 214 and 216 are coupled to a forward address map (FMAP)
218 which provides the translation from the absolute memory address held
in the queue entry to the physical address of the data in the main memory
130. The translated addresses provided by the FATs 214 and 216 and data
values to be stored into the main memory 130, provided by the RQ 210 are
applied to respectively different data input ports of a multiplexer 220.
The operation of the MAU 120 is described with reference to FIGS. 2, 6A and
6B. As shown in FIG. 2, the multiplexer 220 and the RQ 210 and CFL/MRT 212
are controlled by signals provided by the server availability priority
control (SAPC) circuitry 230. As set forth below in reference to FIG. 3,
the SAPC 230, at step 252 of FIG. 6A, conditions the multiplexer 220 to
select either the next store request from the RQ 210 or the next fetch
request from the memory reference table 212 as the next memory operation
to be performed. In addition, when a memory operation has been selected,
the SAPC updates the read pointer in the appropriate one of RQ 210 or
CFL/MRT 212 via a control bus CONT.
The output port of the multiplexer 220 is coupled to three registers, two
memory operation control registers (MOCA 222 and MOCB 224) and an MAU
output register (MO) 226. The registers 222, 226 and 224 are controlled by
signals (not shown) provided by the SAPC circuitry 230. In response to
these control signals, the address values provided by the FATs 214 and 216
is selectively loaded into the MO register 226 and the appropriate control
bits are set in either MOCA 222 or MOCB 224, depending on which memory
bank is to receive the request. The data values provided by the replace
queue 210 are always loaded into the MAU output register 226. The output
port of the register MOCA 222 is coupled to the address input port of each
slot of BANK 0 while the output port of the register MOCB 224 is coupled
to the address input port of each slot of BANK 1. The MO register 226 has
two output ports, one coupled to the data bus of BANK 0 and one coupled to
the data bus of BANK 1.
In response to a store request from the replace queue 210, the server
availability priority control circuitry, at step 254 of FIG. 6A, first
conditions the multiplexer 220 to pass the store address value provided by
the FAT 214. Based on memory bank information contai | | |