This disclosure describes a snooping coherency protocol for a multiprocessor network in which every processor has its own private cache and bus interface means, and the network is connected via a common system bus. Each processor has its own cache directory and image directory, which duplicate each other non-atomically. The snooping protocol uses the duality of the directories, coupled with the non-atomicity of directory updates, to maximize processor-cache availability and minimize processor-cache access times, thus supporting high-performance architectures.
A system including a plurality of processor nodes is configured to execute a cache coherence protocol that avoids the use of negative acknowledgments and ordering requirements on the underlying transaction-message interconnect/network, and implements store-conditional memory transactions. A store-conditional memory transaction succeeds if a directory tracking the state of a memory line of information unambiguously indicates that the requesting node is the exclusive owner of the memory line, if the directory ambiguously indicates that the requesting node is sharing the memory line and the requesting node is in fact sharing the memory line, or if the directory unambiguously indicates that the requesting node is sharing the memory line. The store-conditional memory transaction fails if the directory unambiguously indicates that the requesting node is not sharing the memory line, or if the directory ambiguously indicates that the requesting node may be sharing the memory line and the requesting node is in fact not sharing the memory line.
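The success and failure conditions of the store-conditional transaction in the preceding abstract can be read as a small decision function. The following is a minimal sketch only: the `DirectoryView` enum, its member names, and the `store_conditional_succeeds` function are hypothetical names introduced for illustration and do not come from the disclosure.

```python
from enum import Enum, auto


class DirectoryView(Enum):
    """Hypothetical encoding of what the directory says about the requester."""
    EXCLUSIVE_OWNER = auto()  # unambiguously the exclusive owner of the line
    SHARING = auto()          # unambiguously a sharer of the line
    MAYBE_SHARING = auto()    # ambiguous: the requester may or may not share it
    NOT_SHARING = auto()      # unambiguously not a sharer of the line


def store_conditional_succeeds(view: DirectoryView,
                               node_actually_sharing: bool) -> bool:
    """Return True if the store-conditional should succeed.

    node_actually_sharing only matters when the directory is ambiguous;
    it stands for the requesting node's real state.
    """
    if view in (DirectoryView.EXCLUSIVE_OWNER, DirectoryView.SHARING):
        return True
    if view is DirectoryView.MAYBE_SHARING:
        return node_actually_sharing
    return False  # NOT_SHARING: unambiguous failure
```

In this reading, the requesting node's actual state is consulted only in the ambiguous case; the three unambiguous directory states decide the outcome on their own.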
A shared memory multiprocessor having a packet switched bus, together with write back caches for connecting individual processors to that bus, employs a consistency protocol that permits the caches to store multiple copies of read/write data at identical physical addresses for use as needed by the respective processors. The protocol causes the hardware to automatically and transparently maintain the consistency of this data. To that end, the caches detect when a datum becomes shared by monitoring the traffic on the bus, thereby enabling them to broadcast an updating write on the bus whenever their respective processors issue a write to a shared address. If desired, this protocol may be extended to include an advisory invalidate for reducing the amount of address sharing that occurs, thereby enhancing the efficiency of the protocol. The protocol maintains a consistent view of memory for the processors, while permitting I/O devices to have direct access to the memory system.
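The write-update behavior in the preceding abstract (detect sharing by snooping, then broadcast writes to shared addresses) can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the disclosed hardware: the `Cache` and `Bus` classes and all method names are hypothetical, the bus is modeled as direct method calls rather than packets, and neither eviction nor the optional advisory invalidate is modeled.

```python
class Bus:
    """Toy shared bus: forwards reads and updating writes to the other caches."""

    def __init__(self, memory):
        self.memory = memory          # address -> value (backing store)
        self.caches = []
        self.updates_broadcast = 0    # how many updating writes went on the bus

    def read_miss(self, requester, addr):
        value, shared = self.memory.get(addr, 0), False
        for cache in self.caches:
            if cache is not requester:
                hit, cached_value = cache.snoop_read(addr)
                if hit:               # another cache holds the datum
                    value, shared = cached_value, True
        return value, shared

    def write_update(self, writer, addr, value):
        self.updates_broadcast += 1
        still_shared = False
        for cache in self.caches:
            if cache is not writer and cache.snoop_update(addr, value):
                still_shared = True
        return still_shared


class Cache:
    """Toy write-back cache that tracks a per-line 'shared' flag."""

    def __init__(self, bus):
        self.lines = {}               # address -> [value, shared_flag]
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            value, shared = self.bus.read_miss(self, addr)
            self.lines[addr] = [value, shared]
        return self.lines[addr][0]

    def write(self, addr, value):
        if addr not in self.lines:
            self.read(addr)           # allocate the line first
        line = self.lines[addr]
        line[0] = value
        if line[1]:                   # shared: broadcast an updating write
            line[1] = self.bus.write_update(self, addr, value)

    def snoop_read(self, addr):
        if addr in self.lines:
            self.lines[addr][1] = True  # someone else now has a copy
            return True, self.lines[addr][0]
        return False, None

    def snoop_update(self, addr, value):
        if addr in self.lines:
            self.lines[addr][0] = value
            return True
        return False


memory = {0x10: 1}
bus = Bus(memory)
cache_a, cache_b = Cache(bus), Cache(bus)
cache_a.read(0x10)
cache_a.write(0x10, 5)    # private line: no bus traffic
cache_b.read(0x10)        # line becomes shared in both caches
cache_b.write(0x10, 7)    # shared line: one updating write is broadcast
```

After this sequence both caches hold the latest value while the backing store still holds the original one, since the sketch never writes a line back.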
A static random access memory provides interconnection of local wordlines and bit lines to share charge during bulk write operations. Prior to a bulk write cycle, a bit line for each memory cell is driven to a first voltage level. Subsequently, the bit lines and the local wordlines are interconnected for sharing charge between the bit lines and the local wordlines. Next, the bit lines are disconnected from the local wordlines and the bit lines are driven to a second voltage level while the local wordlines are driven to the first voltage level to address the memory cells. Then the bit lines and local wordlines are reconnected to distribute charge from the local wordlines to the bit lines. Lastly, the bit lines are again disconnected from the local wordlines and driven to the first voltage level preparatory to resuming normal operation.
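The charge-sharing steps in the preceding abstract rest on ordinary charge conservation between two capacitive nodes. As a rough illustration only, assume a bit line with lumped capacitance $C_{BL}$ at voltage $V_{BL}$ and a local wordline with lumped capacitance $C_{WL}$ at voltage $V_{WL}$ immediately before they are interconnected; these symbols are assumptions introduced here and do not appear in the disclosure.

```latex
% Charge before connection equals charge after connection (ideal, lossless switch):
C_{BL} V_{BL} + C_{WL} V_{WL} = (C_{BL} + C_{WL})\, V_{f}
\quad\Longrightarrow\quad
V_{f} = \frac{C_{BL} V_{BL} + C_{WL} V_{WL}}{C_{BL} + C_{WL}}
```

Under these assumptions, each interconnection step moves both nodes to the common voltage $V_f$ without drawing charge from the supply, so the drivers only have to supply the difference between $V_f$ and the target level in the following step.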
A hierarchical memory structure includes a directory-based main memory coupled to multiple first storage devices, each to store data signals retrieved from the main memory. Ones of the first storage devices are further respectively coupled to second storage devices, each to store data signals retrieved from the respectively coupled first storage devices. Fetch requests to retrieve data signals are issued by ones of the storage devices to the main memory. In response, the main memory determines where the most recent data copy resides and issues a return request, if necessary, to retrieve that copy for the requesting storage device. A speculative return generation logic circuit is coupled to at least two of the first storage devices to intercept the fetch requests. In response to an intercepted request, the speculative return generation logic circuit generates a speculative return request directly to one or more of the other coupled first storage devices. This speculative return request causes any updated copies of the requested data signals that may be stored at a lower level in the hierarchical memory to be transferred to the first storage device. If a return request for the data is then issued by the main memory in response to the fetch request, the requested data signals are already resident in a first storage device and are readily available to the main memory.
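The interplay between the fetch request, the speculative return, and the main memory's own return request in the preceding abstract can be sketched as follows. This is an illustrative model under stated assumptions: every class, attribute, and method name is hypothetical, the directory is reduced to a single owner per address, and timing is represented only by the order of the calls.

```python
class SecondLevelDevice:
    def __init__(self):
        self.updated = {}   # address -> modified data not yet passed upward


class FirstLevelDevice:
    def __init__(self, children=()):
        self.children = list(children)
        self.data = {}      # address -> data resident at this level

    def speculative_return(self, addr):
        """Pull any updated copy of addr up from the second-level devices."""
        for child in self.children:
            if addr in child.updated:
                self.data[addr] = child.updated.pop(addr)

    def return_request(self, addr):
        """Main memory asks for the latest copy; report if it was resident."""
        resident = addr in self.data
        if not resident:
            self.speculative_return(addr)   # slow path: search lower level now
        return self.data.pop(addr, None), resident


class SpeculativeReturnLogic:
    def __init__(self, devices):
        self.devices = list(devices)

    def intercept_fetch(self, requester, addr):
        """On a fetch, prompt the other first-level devices to stage the data."""
        for device in self.devices:
            if device is not requester:
                device.speculative_return(addr)


class MainMemory:
    def __init__(self):
        self.store = {}         # address -> data
        self.owner = {}         # toy directory: address -> first-level device
        self.return_log = []    # True where the return found data resident

    def fetch(self, requester, addr):
        owner = self.owner.get(addr)
        if owner is not None and owner is not requester:
            data, resident = owner.return_request(addr)
            self.return_log.append(resident)
            if data is not None:
                self.store[addr] = data
        self.owner[addr] = requester
        return self.store.get(addr)


leaf = SecondLevelDevice()
leaf.updated[0x40] = "new"
dev_a, dev_b = FirstLevelDevice(), FirstLevelDevice([leaf])
memory = MainMemory()
memory.store[0x40] = "old"
memory.owner[0x40] = dev_b
logic = SpeculativeReturnLogic([dev_a, dev_b])

logic.intercept_fetch(dev_a, 0x40)    # speculative return stages the data
fetched = memory.fetch(dev_a, 0x40)   # return request finds it resident
```

The point of the sketch is the ordering: because the speculative return runs before the main memory's return request, the updated copy is already at the first level when that request arrives.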
A main memory is subdivided into a shared region, which is subject to write access from a plurality of processors and an input/output device, and a plurality of private regions, each of which is subject to write access only from its associated processor. Each of the cache devices includes a region discriminating circuit for determining whether an address generated by the processor is to be used for an access to the shared region or to one of the private regions. If the access is to the shared region, the cache devices operate according to the write-through method. If the access is to a private region, the cache devices operate according to the copy-back method. When the processor or the input/output device rewrites data in the shared region of the main memory, the stored data of the shared region in the cache device of the processor is invalidated.
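The region-dependent write policy in the preceding abstract can be sketched in a few lines. The sketch makes several assumptions that are not in the disclosure: the shared region is taken to be a single address range with the hypothetical bounds `SHARED_BASE` and `SHARED_LIMIT`, the class and method names are invented, and the invalidation is modeled as an explicit `snoop_shared_write` call on a cache other than the writer's.

```python
SHARED_BASE, SHARED_LIMIT = 0x0000, 0x1000   # hypothetical shared-region bounds


def is_shared(addr):
    """Stand-in for the region discriminating circuit."""
    return SHARED_BASE <= addr < SHARED_LIMIT


class RegionAwareCache:
    def __init__(self, memory):
        self.memory = memory    # address -> value (main memory)
        self.lines = {}         # address -> cached value
        self.dirty = set()      # private lines modified but not yet copied back

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if is_shared(addr):
            self.memory[addr] = value   # write-through for the shared region
        else:
            self.dirty.add(addr)        # copy-back: defer the memory update

    def evict(self, addr):
        if addr in self.dirty:          # copy the modified line back to memory
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)

    def snoop_shared_write(self, addr):
        """Invalidate a cached shared-region line rewritten by someone else."""
        if is_shared(addr):
            self.lines.pop(addr, None)


memory = {}
cache0, cache1 = RegionAwareCache(memory), RegionAwareCache(memory)
cache1.read(0x10)                 # cache1 now holds a copy of a shared address
cache0.write(0x10, 9)             # write-through: memory is updated at once
cache1.snoop_shared_write(0x10)   # stale copy in cache1 is invalidated
cache0.write(0x2000, 4)           # private address: memory not yet updated
```

Because shared-region writes always reach main memory immediately, invalidating a stale copy is enough for the next read to observe the new value; private-region data reaches memory only when the line is evicted.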