A non-uniform memory access (NUMA) computer system includes a node interconnect to which a remote node and a home node are coupled. The home node contains a home system memory, and the remote node includes at least one processing unit and a cache. In response to the cache deallocating an unmodified cache line that corresponds to data resident in the home system memory, a cache controller of the cache issues a deallocate operation on a local interconnect of the remote node. In one embodiment, the deallocate operation is further transmitted to the home node via the node interconnect only in response to an indication, such as a combined response, that no other cache in the remote node caches the cache line. In response to receipt of the deallocate operation, a memory controller in the home node updates a local memory directory associated with the home system memory to indicate that the remote node does not hold a copy of the cache line.
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is related to the following co-pending applications, which are filed of even date herewith, assigned to the assignee of the present application and incorporated herein by reference: (1) U.S. patent application Ser. No. 09/885,992 (2) U.S. patent application Ser. No. 09/885,996 (3) U.S. patent application Ser. No. 09/885,994 (4) U.S. patent application Ser. No. 09/886,000 (5) U.S. patent application Ser. No. 09/885,991 (6) U.S. patent application Ser. No. 09/885,998 (7) U.S. patent application Ser. No. 09/885,999 (8) U.S. patent application Ser. No. 09/886,004 (9) U.S. patent application Ser. No. 09/885,997 and.
A cache controller for a processor in a remote node of a system bus in a multiway multiprocessor link sends out a cache deallocate address transaction (CDAT) for a given cache line when that cache line is flushed and information from memory in a home node is no longer deemed valid for that cache line of that remote node processor. A local snoop of that CDAT transaction is then performed as a background function by other processors in the same remote node. If the snoop results indicate that same information is valid in another cache, and that cache decides it better to keep it valid in that remote node, then the information remains there. If the snoop results indicate that the information is not valid among caches in that remote node, or will be flushed due to the CDAT, the system memory directory in the home node of the multiprocessor link is notified and changes state in response to this. The system has higher performance due to the cache line maintenance functions being performed in the background rather than based on mainstream demand.
A non-uniform memory access (NUMA) computer system includes at least one remote node and a home node coupled by a node interconnect. The home node contains a home system memory and a memory controller. In response to receipt of a data request from a remote node, the memory controller determines whether to grant exclusive or non-exclusive ownership of requested data specified in the data request by reference to history information indicative of prior data accesses originating in the remote node. The memory controller then transmits the requested data and an indication of exclusive or non-exclusive ownership to the remote node.
In a multiprocessor non-uniform cache architecture system, multiple CPU cores shares one non-uniform cache that can be partitioned into multiple cache portions with varying access latencies. A placement prediction mechanism predicts whether a cache line should remain in a cache portion or migrate to another cache portion. The prediction mechanism maintains one or more prediction counters for each cache line. A prediction counter can be incremented or decremented by a constant or a variable determined by some runtime information, or set to its maximum or minimum value. An effective placement prediction mechanism can reduce average access latencies without causing cache thrashing among cache portions.
A computer system includes a home node and one or more remote nodes coupled by a node interconnect. The home node includes a local interconnect, a node controller coupled between the local interconnect and the node interconnect, a home system memory, a memory directory including a plurality of entries, and a memory controller coupled to the local interconnect, the home system memory and the memory directory. The memory directory includes a plurality of entries that each provide an indication of whether or not an associated data granule in the home system memory has a corresponding cache line held in at least one remote node. The memory controller includes demand invalidation circuitry that, responsive to a data request for a requested data granule in the home system memory, reads an associated entry in the memory directory and issues an invalidating command to at least one remote node holding a cache line corresponding to the requested data granule. In addition, the memory controller includes directory scrubbing logic that, independently of any data request, periodically reads entries in the memory directory and, responsive to an entry indicating at least one remote node holds a cache line corresponding to the associated data granule, issues a flush query to the at least one remote node to request deallocation of the cache line corresponding to the associated data granule.
A memory module includes several memory devices coupled to a memory hub. The memory hub includes several link interfaces coupled to respective processors, several memory controller coupled to respective memory devices, a cross-bar switch coupling any of the link interfaces to any of the memory controllers, a write buffer and read cache for each memory device and a read synchronization module. The read synchronization module includes a write pointer, a read pointer and a buffer. The write pointer is incremented in response to the receipt of read data. The read pointer increments in response to coupling of the read data from the memory hub. A comparator compares the read pointer an the write pointer, and the comparison is used to adjust the memory timing.