Claims
What is claimed is:
1. A computer system, comprising:
a system controller;
a main memory coupled to said system controller;
a data processor having a cache memory having N cache lines for storing N
data blocks, where N is an integer greater than 4, N master cache tags
(Etags), including one Etag for each said cache line in said cache memory,
and a writeback buffer for storing a dirty victim data block displaced
from said cache memory until it is written back into said main memory;
said Etag for each cache line storing an address index and an Etag state
value that indicates whether said data block stored in said cache line
includes data modified by said data processor;
said data processor including a master interface, coupled to said system
controller, for sending memory transaction requests to said system
controller, said memory transaction requests including read requests and
writeback requests; each memory transaction request specifying an address
for an associated data block to be read or written;
said master interface further including cache coherence logic for
responding to a cache miss on any cache line in said cache memory by (A)
generating a read request, and (B) when said cache miss requires a cache
line to be victimized and said victim cache line includes modified data,
according to the Etag state value in the corresponding Etag, storing the
data block having said modified data in said writeback buffer and
generating a writeback request;
said system controller including a set of N duplicate cache tags (Dtags),
each Dtag corresponding to one of said Etags and storing a Dtag state
value and the same address index as the corresponding Etag; said Dtag
state value indicating whether said data block stored in the corresponding
cache line includes data modified by said data processor;
said system controller further including an N+1th Dtag;
said system controller including memory transaction request logic for
processing each said memory transaction request by said data processor;
said system controller's memory transaction request logic including
writeback logic for processing said writeback request by writing the data
block in said writeback buffer into said main memory and invalidating the
state value in the corresponding Dtag;
said system controller's memory transaction request logic including read
logic for processing said read request by (A) identifying a victim cache
line, if any, in said cache memory and accessing the Dtag corresponding to
said victim cache line to determine whether processing said read request
will displace from said cache memory a data block that includes modified
data, (B) retrieving a data block from said main memory corresponding to
said read request and providing it to said data processor for storage in
said data processor's cache memory, (C) storing a Dtag state value and
address tag in the Dtag corresponding to said victim cache line when
processing said read request does not displace from said cache memory a
modified data block and when said corresponding Dtag's state value is
invalid, (D) storing said Dtag state value and address tag for said read
request in said N+1th Dtag when processing said read request does displace
from said cache memory a modified data block and said corresponding Dtag's
state value is not invalid, and (E) transferring said N+1th Dtag into said
Dtag corresponding to said victim cache line when said writeback logic
invalidates said Dtag state value in said corresponding Dtag; and
wherein said memory transaction request logic processes said read request
and writeback request such that processing of either of said read request
and writeback request may be completed prior to the other in accordance
with resource availability for processing said requests.
2. The computer system of claim 1,
each said read request including a DVP flag that has a first value when
said read request corresponds to a cache fill operation that displaces a
modified data block from said cache memory, said data block displacement
being represented by said writeback request; said DVP flag having a second
value, distinct from said first value, when said read request corresponds
to a cache fill operation that does not displace a modified data block
from said cache memory; and
said transaction request logic including logic for storing a Dtag state
value and address tag in the Dtag corresponding to said victim cache line
when said DVP flag in said read request has said second value and when said
corresponding Dtag's state value is invalid, and storing said Dtag state
value and address tag for said read request in said N+1th Dtag when said
DVP flag in said read request has said first value and said corresponding
Dtag's state value is not invalid.
3. The computer system of claim 1,
said Etag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Modified (O), Exclusive
Clean (E), Shared Clean (S), and Invalid (I);
said Dtag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Modified (O), Shared Clean
(S), and Invalid (I); and
wherein said Dtag state stored in said Dtags never indicates said Exclusive
Clean (E) state and when each data processor modifies data stored in its
cache memory in a cache line whose Etag thereby transitions from said E
state to said M state, said data processor does not generate a
corresponding transaction request and the corresponding Dtag remains
unchanged with a Dtag state equal to said M state.
4. The computer system of claim 1,
wherein said main memory is a reflective memory;
said Etag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Exclusive Clean (E), Shared Clean
(S), and Invalid (I);
said Dtag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Clean (S), and Invalid (I);
and
wherein said Dtag state stored in said Dtags never indicates said Exclusive
Clean (E) state and when each data processor modifies data stored in its
cache memory in a cache line whose Etag thereby transitions from said E
state to said M state, said data processor does not generate a
corresponding transaction request and the corresponding Dtag remains
unchanged with a Dtag state equal to said M state.
5. A computer system, comprising:
a system controller;
a main memory coupled to said system controller;
a data processor having a cache memory having N cache lines for storing N
data blocks, where N is an integer greater than 4, N master cache tags
(Etags), including one Etag for each said cache line in said cache memory,
and a writeback buffer for storing a dirty victim data block displaced
from said cache memory until it is written back into said main memory;
said Etag for each cache line storing an address index and an Etag state
value that indicates whether said data block stored in said cache line
includes data modified by said data processor;
said data processor including a master interface, coupled to said system
controller, for sending memory transaction requests to said system
controller, said master interface including at least two parallel outgoing
request queues for storing memory transaction requests to be sent to said
system controller; said memory transaction requests including read
requests and writeback requests; each memory transaction request
specifying an address for an associated data block to be read or written;
said master interface further including cache coherence logic for
responding to a cache miss on any cache line in said cache memory by (A)
storing a read request in a first one of said outgoing request queues, and
(B) when said cache miss requires a cache line to be victimized and said
victim cache line, according to the Etag state value in the corresponding
Etag, includes modified data, storing the data block having said modified
data in said writeback buffer and storing a writeback request in a second
one of said outgoing request queues;
said system controller including a set of N duplicate cache tags (Dtags),
each Dtag corresponding to one of said Etags and storing a Dtag state
value and the same address index as the corresponding Etag; said Dtag
state value indicating whether said data block stored in the corresponding
cache line includes data modified by said data processor;
said system controller further including an N+1th Dtag;
said system controller including memory transaction request logic for
processing each said memory transaction request by said data processor;
said system controller's memory transaction request logic including
writeback logic for processing said writeback request by writing the data
block in said writeback buffer into said main memory and invalidating said
state value in the corresponding Dtag;
said system controller's memory transaction request logic including read
logic for processing said read request by (A) identifying a victim cache
line in said cache memory, if any, and accessing the Dtag corresponding to
said victim cache line to determine whether processing said read request
will displace from said cache memory a data block that includes modified
data, (B) retrieving a data block from said main memory corresponding to
said read request and providing it to said data processor for storage in
said data processor's cache memory, (C) storing a Dtag state value and
address tag in the Dtag corresponding to said victim cache line when
processing said read request does not displace from said cache memory a
modified data block and when the Dtag state value corresponding to the
victim cache line is invalid, (D) storing said Dtag state value and
address tag for said retrieved data block in said N+1th Dtag when
processing said read request does displace from said cache memory a
modified data block and said corresponding Dtag's state value is not
invalid, and (E) transferring said N+1th Dtag into said Dtag corresponding
to said victim cache line when said writeback logic invalidates said Dtag
state value in said corresponding Dtag;
wherein said memory transaction request logic processes said read request
and writeback request such that processing of either of said read request
and writeback request may be completed prior to the other in accordance
with resource availability for processing said requests.
6. The computer system of claim 5,
each said read request including a DVP flag that has a first value when
said read request corresponds to a cache fill operation that displaces a
modified data block from said cache memory, said data block displacement
being represented by said writeback request; said DVP flag having a second
value, distinct from said first value, when said read request corresponds
to a cache fill operation that does not displace a modified data block
from said cache memory; and
said transaction request logic including logic for storing a Dtag state
value and address tag in the Dtag corresponding to said victim cache line
when said DVP flag in said read request has said second value and when said
corresponding Dtag's state value is invalid, and storing said Dtag state
value and address tag for said read request in said N+1th Dtag when said
DVP flag in said read request has said first value and said corresponding
Dtag's state value is not invalid.
7. The computer system of claim 5,
said Etag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Modified (O), Exclusive
Clean (E), Shared Clean (S), and Invalid (I);
said Dtag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Modified (O), Shared Clean
(S), and Invalid (I); and
wherein said Dtag state stored in said Dtags never indicates said Exclusive
Clean (E) state and when each data processor modifies data stored in its
cache memory in a cache line whose Etag thereby transitions from said E
state to said M state, said data processor does not generate a
corresponding transaction request and the corresponding Dtag remains
unchanged with a Dtag state equal to said M state.
8. The computer system of claim 5,
wherein said main memory is a reflective memory;
said Etag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Exclusive Clean (E), Shared Clean
(S), and Invalid (I);
said Dtag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Clean (S), and Invalid (I);
and
wherein said Dtag state stored in said Dtags never indicates said Exclusive
Clean (E) state and when each data processor modifies data stored in its
cache memory in a cache line whose Etag thereby transitions from said E
state to said M state, said data processor does not generate a
corresponding transaction request and the corresponding Dtag remains
unchanged with a Dtag state equal to said M state.
9. A method for parallelizing writeback and read transactions in a packet
switched cache coherent multiprocessor system having a system controller
coupled to a main memory and to a data processor having a cache memory,
comprising the steps of:
storing master cache tags (Etags) in said data processor, including one
Etag for each cache line in said cache memory, said Etag for each cache
line storing an address index and an Etag state value that indicates
whether a data block stored in said cache line includes data modified by
said data processor;
storing in a writeback buffer of said data processor a dirty victim data
block displaced from said cache memory until it is written back into said
main memory;
storing a set of N duplicate tags (Dtags) for said cache memory in said
system controller, each Dtag corresponding to one of said Etags including
a Dtag state value and the same address index as the corresponding Etag;
said Dtag state value indicating whether said data block stored in the
corresponding cache line includes data modified by said data processor;
sending memory transaction requests from said data processor to said system
controller, said memory transaction requests including read requests and
writeback requests;
responding to a cache miss in said cache memory by (A) generating a read
request, and (B) when said cache miss requires victimizing a data block
that, according to the Etag state value in a corresponding Etag, includes
modified data, storing the data block having said modified data in a
writeback buffer and generating a writeback request;
processing writeback requests by writing the data block in said writeback
buffer into said main memory and invalidating the state value in the
corresponding Dtag; and
processing said read request by:
(A) identifying a victim cache line in said cache memory, if any, and
accessing the Dtag corresponding to said victim cache line to determine
whether processing said read request will displace from said cache memory
a data block that includes modified data;
(B) retrieving a data block from said main memory corresponding to said
read request and providing it to said data processor for storage in said
data processor's cache memory at said victim cache line;
(C) storing a Dtag state value and address tag in the Dtag corresponding to
said victim cache line when processing said read request does not displace
from said cache memory a modified data block and when said corresponding
Dtag's state value is invalid;
(D) storing said Dtag state value and address tag for said retrieved data
block in an N+1th Dtag when processing said read request does displace from
said cache memory a modified data block and said corresponding Dtag's
state value is not invalid; and
(E) transferring said N+1th Dtag into said Dtag corresponding to said
victim cache line when said writeback processing step invalidates said
Dtag state value in said corresponding Dtag;
wherein memory transaction request logic processes said read request and
writeback request such that processing of either of said read request and
writeback request may be completed prior to the other in accordance with
resource availability for processing said requests.
10. The method of claim 9,
each said read request including a DVP flag that has a first value when
said read request corresponds to a cache fill operation that displaces a
modified data block from said cache memory, said data block displacement
being represented by said writeback request; said DVP flag having a second
value, distinct from said first value, when said read request corresponds
to a cache fill operation that does not displace a modified data block
from said cache memory; and
said read request processing step including storing a Dtag state value and
address tag in the Dtag corresponding to said victim cache line when said
DVP flag in said read request has said second value and when said
corresponding Dtag's state value is invalid, and storing said Dtag state
value and address tag for said read request in said N+1th Dtag when said
DVP flag in said read request has said first value and said corresponding
Dtag's state value is not invalid.
11. The method of claim 9,
said Etag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Modified (O), Exclusive
Clean (E), Shared Clean (S), and Invalid (I);
said Dtag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Modified (O), Shared Clean
(S), and Invalid (I); and
wherein said Dtag state stored in said Dtags never indicates said Exclusive
Clean (E) state and when each data processor modifies data stored in its
cache memory in a cache line whose Etag thereby transitions from said E
state to said M state, said data processor does not generate a
corresponding transaction request and the corresponding Dtag remains
unchanged with a Dtag state equal to said M state.
12. The method of claim 9,
wherein said main memory is a reflective memory;
said Etag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Exclusive Clean (E), Shared Clean
(S), and Invalid (I);
said Dtag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Clean (S), and Invalid (I);
and
wherein said Dtag state stored in said Dtags never indicates said Exclusive
Clean (E) state and when each data processor modifies data stored in its
cache memory in a cache line whose Etag thereby transitions from said E
state to said M state, said data processor does not generate a
corresponding transaction request and the corresponding Dtag remains
unchanged with a Dtag state equal to said M state.
13. A method for parallelizing writeback and read transactions in a packet
switched cache coherent multiprocessor system having a system controller
coupled to a main memory and to a data processor having a cache memory,
comprising the steps of:
storing master cache tags (Etags) in said data processor, including N
Etags, one Etag for each cache line in said cache memory, said Etag for
each cache line storing an address index and an Etag state value that
indicates whether a data block stored in said cache line includes data
modified by said data processor;
storing in a writeback buffer of said data processor a dirty victim data
block displaced from said cache memory until it is written back into said
main memory;
storing duplicate tags (Dtags) for said cache memory in said system
controller;
responding to a cache miss in said cache memory by (A) generating a read
request, and (B) when said cache miss requires victimizing a cache line
containing a data block that, according to the Etag state value in the
corresponding Etag, includes modified data, storing said data block having
said modified data in a writeback buffer and generating a writeback
request;
processing said writeback requests by writing the data block in said
writeback buffer into said main memory and invalidating the state value in
the corresponding Dtag; and
processing said read request by:
(A) identifying a victim cache line in said cache memory, if any, and
accessing the Dtag corresponding to said victim cache line to determine
whether processing said read request will displace from said cache memory
a data block that includes modified data;
(B) retrieving a data block from said main memory corresponding to said
read request and providing it to said data processor for storage in said
data processor's cache memory;
(C) storing a Dtag state value and address tag in the Dtag corresponding to
said victim cache line when processing said read request does not displace
from said cache memory a modified data block and when said corresponding
Dtag's state value is invalid;
(D) storing said Dtag state value and address tag for said retrieved data
block in an N+1th Dtag when processing said read request does displace from
said cache memory a modified data block and said corresponding Dtag's
state value is not invalid; and
(E) transferring said N+1th Dtag into said Dtag corresponding to said
victim cache line when said writeback processing step invalidates said
Dtag state value in said corresponding Dtag;
wherein memory transaction request logic processes said read request and
writeback request such that processing of either of said read request and
writeback request may be completed prior to the other in accordance with
resource availability for processing said requests.
14. The method of claim 13,
each said read request including a DVP flag that has a first value when
said read request corresponds to a cache fill operation that displaces a
modified data block from said cache memory, said data block displacement
being represented by said writeback request; said DVP flag having a second
value, distinct from said first value, when said read request corresponds
to a cache fill operation that does not displace a modified data block
from said cache memory; and
said read request processing step including storing a Dtag state value and
address tag in the Dtag corresponding to said victim cache line when said
DVP flag in said read request has said second value and when said
corresponding Dtag's state value is invalid, and storing said Dtag state
value and address tag for said read request in said N+1th Dtag when said
DVP flag in said read request has said first value and said corresponding
Dtag's state value is not invalid.
15. The method of claim 13,
said Etag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Modified (O), Exclusive
Clean (E), Shared Clean (S), and Invalid (I);
said Dtag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Modified (O), Shared Clean
(S), and Invalid (I); and
wherein said Dtag state stored in said Dtags never indicates said Exclusive
Clean (E) state and when each data processor modifies data stored in its
cache memory in a cache line whose Etag thereby transitions from said E
state to said M state, said data processor does not generate a
corresponding transaction request and the corresponding Dtag remains
unchanged with a Dtag state equal to said M state.
16. The method of claim 13,
wherein said main memory is a reflective memory;
said Etag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Exclusive Clean (E), Shared Clean
(S), and Invalid (I);
said Dtag state being selected from the set of states consisting
essentially of: Exclusive Modified (M), Shared Clean (S), and Invalid (I);
and
wherein said Dtag state stored in said Dtags never indicates said Exclusive
Clean (E) state and when each data processor modifies data stored in its
cache memory in a cache line whose Etag thereby transitions from said E
state to said M state, said data processor does not generate a
corresponding transaction request and the corresponding Dtag remains
unchanged with a Dtag state equal to said M state.
Description
The present invention relates generally to multiprocessor computer systems
in which the processors share memory resources, and particularly to a
multiprocessor computer system that utilizes an interconnect architecture
and cache coherence methodology to minimize memory access latency by
parallelizing read and writeback transactions for improved system
throughput.
BACKGROUND OF THE INVENTION
The need to maintain "cache coherence" in multiprocessor systems is well
known. Maintaining "cache coherence" means, at a minimum, that whenever
data is written into a specified location in a shared address space by one
processor, the caches for any other processors which store data for the
same address location are either invalidated, or updated with the new
data.
There are two primary system architectures used for maintaining cache
coherence. One, herein called the cache snoop architecture, requires that
each data processor's cache include logic for monitoring a shared address
bus and various control lines so as to detect when data in shared memory
is being overwritten with new data, determining whether its data
processor's cache contains an entry for the same memory location, and
updating its cache contents and/or the corresponding cache tag when data
stored in the cache is invalidated by another processor. Thus, in the
cache snoop architecture, every data processor is responsible for
maintaining its own cache in a state that is consistent with the state of
the other caches.
In a second cache coherence architecture, herein called the memory
directory architecture, main memory includes a set of status bits for
every block of data that indicate which data processors, if any, have the
data block stored in cache. The main memory's status bits may store
additional information, such as which processor is considered to be the
"owner" of the data block if the cache coherence architecture requires
storage of such information.
In these cache coherence architectures, read-writeback transaction pairs
arise when a read miss requires victimizing a cache line which has
modified data, thereby necessitating a writeback to main memory. In the
prior art, these transactions normally are strictly ordered, with the
victimizing read transaction executing prior to the writeback transaction
in order to allow the requesting processor to receive the data right away.
In addition to the strict ordering, cache coherence architectures of the
prior art required that these read and writeback transactions be sequentially
executed, not allowing for any other coherent transactions to be executed
from the same processor between the read and the writeback transactions,
even when transactions are directed to a different cache index.
Accordingly, an architecture which supported parallelized transactions
would provide reduced latency in processing the individual read-writeback
transaction pairs along with an improvement in the overall transaction
throughput.
SUMMARY OF THE INVENTION
In summary, the present invention is a multiprocessor computer system that
has a multiplicity of sub-systems and a main memory coupled to a system
controller. An interconnect module interconnects the main memory and
sub-systems in accordance with interconnect control signals received from
the system controller.
All of the sub-systems include a port that transmits and receives data as
data packets of a fixed size. At least two of the sub-systems are data
processors, each having a respective cache memory that stores multiple
blocks of data and a set of master cache tags (Etags), including one cache
tag for each data block stored by the cache memory.
Each data processor includes a master interface having master classes for
sending memory transaction requests to the system controller and for
receiving cache access requests from the system controller corresponding
to memory transaction requests by other ones of the data processors. The
master classes allow for the simultaneous launching of read and writeback
transactions. The system controller includes memory transaction request
logic for processing each memory transaction request by a data processor,
for determining which one of the cache memories and main memory to couple
to the requesting data processor, for sending corresponding interconnect
control signals to the interconnect module so as to couple the requesting
data processor to the determined one of the cache memories and main
memory, and for sending a reply message to the requesting data processor
to prompt the requesting data processor to transmit or receive one data
packet to or from the determined one of the cache memories and main
memory.
The system controller maintains a set of duplicate cache tags (Dtags) for
each of the data processors, the set of duplicate cache tags for each data
processor having an equal number of cache tags as the corresponding set of
master cache tags. Each master cache tag denotes a master cache state and
an address tag; the duplicate cache tag corresponding to each master cache
tag denotes a second cache state and the same address tag as the
corresponding master cache tag.
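
The relationship between the master and duplicate cache states can be illustrated with a short sketch. It is only an illustration, assuming the five-state Etag set and four-state Dtag set recited in the claims; the names `State` and `dtag_state_for` are hypothetical and do not appear in the specification.

```python
from enum import Enum


class State(Enum):
    """Cache line states named in the claims (M, O, E, S, I)."""
    M = "Exclusive Modified"
    O = "Shared Modified"
    E = "Exclusive Clean"
    S = "Shared Clean"
    I = "Invalid"


def dtag_state_for(etag_state: State) -> State:
    """Return the state a Dtag would record for a given Etag state.

    Assumption drawn from the claims: the Dtags never hold E. A line the
    processor holds as E is tracked as M, so a later silent E -> M
    transition in the processor requires no transaction and no Dtag update.
    """
    return State.M if etag_state is State.E else etag_state
```

Under this assumption every other state is mirrored unchanged, and only the E state is folded into M on the system controller side.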
The system controller further includes logic for executing a
read-writeback pair of transactions in parallel, including an Nth+1 Dtag
and a transient writeback buffer for each data processor. The Nth+1 Dtag
for each processor stores the cache state and address tag of the cache
line associated with a read transaction which is executed prior to an
associated writeback transaction of a read-writeback transaction pair. The
system controller contains Dtag update logic for transferring the Dtag
value stored in the Nth+1 Dtag entry to its proper Dtag location upon the
execution of the associated writeback transaction.
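
The Dtag update behavior described above can be sketched in a few lines. This is a minimal model under stated assumptions, not the controller's actual logic: the class names `Dtag` and `DtagArray`, the method names, and the `dirty_victim_pending` parameter (standing in for the DVP flag of the claims) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

INVALID = "I"


@dataclass
class Dtag:
    state: str = INVALID
    address_tag: Optional[int] = None


class DtagArray:
    """N duplicate tags for one processor plus one transient (Nth+1) entry."""

    def __init__(self, n_lines: int) -> None:
        self.tags: List[Dtag] = [Dtag() for _ in range(n_lines)]
        self.transient: Optional[Dtag] = None        # the Nth+1 Dtag
        self.transient_index: Optional[int] = None   # line it belongs to

    def on_read(self, index: int, address_tag: int, state: str,
                dirty_victim_pending: bool) -> None:
        new_tag = Dtag(state, address_tag)
        if dirty_victim_pending and self.tags[index].state != INVALID:
            # The read ran before its paired writeback: the old Dtag must
            # stay visible, so the new value is parked in the Nth+1 entry.
            self.transient = new_tag
            self.transient_index = index
        else:
            # No dirty victim, or the writeback already invalidated the
            # Dtag: the new value goes straight to its proper location.
            self.tags[index] = new_tag

    def on_writeback(self, index: int) -> None:
        self.tags[index] = Dtag()  # invalidate the victim's Dtag
        if self.transient is not None and self.transient_index == index:
            # Move the parked value to its proper Dtag location.
            self.tags[index] = self.transient
            self.transient = None
            self.transient_index = None
```

In this sketch either ordering of the read and the writeback leaves the same final Dtag contents, which is the property the Nth+1 entry is meant to provide.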
The writeback buffer in each data processor stores the data block
previously stored in a victimized cache line until the associated
writeback transaction is completed. Accordingly, upon a cache miss, the
interconnect may execute the read and writeback transactions in parallel
relying on the transient writeback buffer and the Nth+1 Dtag entry to
accommodate any ordering of the transactions. As a result, the read request
and writeback request of a read-writeback transaction pair are processed
such that either of the two requests may be completed prior to the other in
accordance with resource
availability for processing those requests. For instance, if the read and
writeback transactions reference two different main memory banks, one of
those memory banks may be busy while the other is available for immediate
use. Thus, using the present invention, the transaction which references
the available memory bank will be processed first, regardless of whether
that transaction is the read transaction or the writeback transaction.
This is in direct contrast with other systems in which read-writeback
pairs are handled in a fixed order, and thus do not make optimal use of
system resources.
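
The memory bank example can be made concrete with a small sketch. It is purely illustrative and assumes a simplified model in which each transaction targets one bank and each bank has a known time at which it becomes free; the function name `completion_order` and both parameters are hypothetical.

```python
from typing import Dict, List


def completion_order(pair: Dict[str, int],
                     bank_free_at: Dict[int, int]) -> List[str]:
    """Order the transactions of a read-writeback pair by bank availability.

    pair maps a transaction name to the memory bank it references;
    bank_free_at maps a bank number to the time at which it becomes free.
    The transaction whose bank frees up first is processed first.
    """
    return sorted(pair, key=lambda name: bank_free_at[pair[name]])
```

For example, with the read directed to bank 0 (busy until time 5) and the writeback directed to bank 1 (free at time 0), the writeback is processed first; swapping the bank availability reverses the order.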
BRIEF DESCRIPTION OF THE DRAWINGS
Additional objects and features of the invention will be more readily
apparent from the following detailed description and appended claims when
taken in conjunction with the drawings, in which:
FIG. 1 is a block diagram of a computer system incorporating the present
invention.
FIG. 2 is a block diagram of a computer system showing the data bus and
address bus configuration used in one embodiment of the present invention.
FIG. 3 depicts the signal lines associated with a port in a preferred
embodiment of the present invention.
FIG. 4 is a block diagram of the interfaces and port ID register found in a
port in a preferred embodiment of the present invention.
FIG. 5 is a block diagram of a computer system incorporating the present
invention, depicting request and data queues used while performing data
transfer transactions.
FIG. 6 is a block diagram of the System Controller Configuration register
used in a preferred embodiment of the present invention.
FIG. 7 is a block diagram of a caching UPA master port and the cache
controller in the associated UPA module.
FIGS. 8, 8A, 8B, 8C, and 8D show a simplified flow chart of typical
read/write data flow transactions in a preferred embodiment of the present
invention.
FIG. 9 depicts the writeback buffer and Dtag Transient Buffers used for
handling coherent cache writeback operations.
FIGS. 10A-10E show the data packet formats for various transaction request
packets.
FIG. 11 is a state transition diagram of the cache tag line states for each
cache entry in an Etag array in a preferred embodiment of the present
invention.
FIG. 12 is a state transition diagram of the cache tag line states for each
cache entry in