A high-speed buffer store arrangement for use in a data processing system having multiple cache buffer storage units in a hierarchical arrangement permits fast transfer of wide data blocks. Input and output latches are integrated on each cache chip, thus avoiding separate intermediate buffering. Input and output latches are interconnected by 64-byte wide data buses so that data blocks can be shifted rapidly from one cache hierarchy level to another and back. Chip-internal feedback connections from output to input latches allow data blocks to be selectively reentered into a cache after reading. An additional register array is provided so that data blocks can be furnished again after transfer from cache to main memory or CPU without accessing the respective cache. Wide data blocks can be transferred within one cycle, which ties up the caches far less during transfer operations and so increases their availability.
A set-associative cache memory having incremental access latencies among sets is disclosed. The cache memory has multiple congruence classes of cache lines. Each congruence class includes a number of sets organized in a set-associative manner. In accordance with a preferred embodiment of the present invention, the cache memory further includes a means for accessing each of the sets with an access time dependent on a relative location of each of the sets such that access latency varies incrementally among sets.
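The idea of a latency that grows with a set's position can be sketched as a small software model. This is only an illustration under stated assumptions: the class name, the linear latency formula, and the cycle counts are hypothetical and are not taken from the disclosure, which does not specify how the increments are realized.

```python
class IncrementalLatencyCache:
    """Toy model: each set in a congruence class has its own access latency."""

    def __init__(self, num_classes, num_sets, base_cycles=1, step_cycles=1):
        self.num_classes = num_classes      # congruence classes
        self.num_sets = num_sets            # sets per congruence class
        self.base_cycles = base_cycles      # assumed latency of the nearest set
        self.step_cycles = step_cycles      # assumed extra latency per set position
        self.tags = [[None] * num_sets for _ in range(num_classes)]

    def _split(self, address):
        # Low-order part selects the congruence class, the rest is the tag.
        return address % self.num_classes, address // self.num_classes

    def latency_of(self, set_index):
        # Assumption: latency increases linearly with the set's position.
        return self.base_cycles + self.step_cycles * set_index

    def fill(self, address, set_index):
        cls, tag = self._split(address)
        self.tags[cls][set_index] = tag

    def lookup(self, address):
        """Return (hit, latency in cycles); latency is None on a miss."""
        cls, tag = self._split(address)
        for set_index, stored in enumerate(self.tags[cls]):
            if stored == tag:
                return True, self.latency_of(set_index)
        return False, None


cache = IncrementalLatencyCache(num_classes=4, num_sets=4)
cache.fill(5, set_index=0)   # nearest set
cache.fill(9, set_index=3)   # farthest set, same congruence class as 5
near = cache.lookup(5)
far = cache.lookup(9)
miss = cache.lookup(13)
```

In this model a hit in set 0 costs fewer cycles than a hit in set 3 of the same congruence class, which is the incremental behavior the abstract describes.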
A method and apparatus are described that permit an application to control data transfer from a memory object of a source device to a sink device. The application can request that an operating system establish a mapping between a fast buffer and a memory object storing the data. The operating system then establishes the mapping between the fast buffer and the memory object thereby permitting the application to direct that the data of the memory object be transferred to the sink device. Thus, the sink device can use direct memory access to the source device to transfer the data from the memory object. Furthermore, if the application modifies a portion of the data of the memory object prior to directing the transfer, only the modified portion of the data is copied to main memory prior to transfer to the sink device.
A set-associative cache memory having asymmetric latency among sets is disclosed. The cache memory has multiple congruence classes of cache lines. Each congruence class includes a number of sets organized in a set-associative manner. The cache memory further includes a means for accessing at least one of the sets faster than the remaining sets, which share an identical access latency.
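The asymmetric case reduces to a two-level latency function: one or more fast sets, and a single common latency for all others. The sketch below assumes set 0 is the fast set and uses illustrative cycle counts; neither detail comes from the disclosure.

```python
def access_latency(set_index, fast_sets=frozenset({0}), fast_cycles=1, slow_cycles=2):
    """Return the assumed access latency, in cycles, of one set.

    Sets listed in fast_sets are accessed in fast_cycles; every remaining
    set shares the same slow_cycles latency.
    """
    return fast_cycles if set_index in fast_sets else slow_cycles


# Latencies for a hypothetical four-way congruence class.
latencies = [access_latency(s) for s in range(4)]
```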
A low-latency network receive interface reduces the copying of message data by directly coupling the network to a cache and by providing an address-based message in which an incoming message block preincorporates an address so that messages can be directly stored in their final destination. In a preferred embodiment, the message data size is made equal to the cache block size so that cache blocks can be updated atomically. The small message size, equal to that of a cache block, also reduces transfer time, unlike Direct Memory Access (DMA) approaches in which a large amount of data must accumulate prior to transfer to main memory as a block. In one embodiment, the cache to which message data is directly coupled is divided into a message cache and a data cache, with the incoming message block coupled directly to the message cache. When an incoming message arrives, its address is compared with addresses in the data cache, and if that address is already occupied, the data held there is purged in an invalidation process. The processor first accesses the data cache, and if no valid data exists at the corresponding address, it accesses the message cache, which is in turn followed by accessing main memory if no valid cache data exists. This direct cache coupling of incoming message data eliminates latency due to buffering of the incoming message data in temporary storage prior to copying the message data.
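The receive path and the processor's lookup order can be sketched as follows. This is a minimal software model, not the disclosed hardware: the class and method names are hypothetical, dictionaries stand in for the caches, and the second lookup level is assumed to be the message cache.

```python
class ReceiveInterface:
    """Toy model of a split message cache / data cache receive path."""

    def __init__(self, main_memory):
        self.data_cache = {}            # address -> cache block
        self.message_cache = {}         # address -> cache block from the network
        self.main_memory = main_memory  # address -> block (backing store)

    def receive(self, address, block):
        # The incoming block carries its destination address. Any copy of
        # that address in the data cache is now stale, so invalidate it.
        self.data_cache.pop(address, None)
        # Store the block directly in the message cache, with no staging buffer.
        self.message_cache[address] = block

    def read(self, address):
        """Return (block, source), trying data cache, message cache, then memory."""
        if address in self.data_cache:
            return self.data_cache[address], "data cache"
        if address in self.message_cache:
            return self.message_cache[address], "message cache"
        return self.main_memory[address], "main memory"


memory = {0x40: b"old", 0x80: b"mem"}
interface = ReceiveInterface(memory)
interface.data_cache[0x40] = b"old"   # processor had cached this address
interface.receive(0x40, b"new")       # a message arrives for the same address
fresh = interface.read(0x40)
fallback = interface.read(0x80)
```

Because the stale data-cache entry is purged on arrival, the read of `0x40` falls through to the message cache and sees the new block rather than the old one.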
A buffer storage system is provided between an instruction executing portion and a main storage to enable the instruction executing portion to quickly fetch and store data in a frequently accessed address area. The buffer storage system includes a plurality of buffer storages, each storing the same data, and a move-out buffer register provided between the plurality of buffer storages and the main storage. When data held in the buffer storages is required to be moved out to the main storage, a part of the data to be moved out is transferred from each of the plurality of buffer storages to a corresponding portion of the move-out buffer register, and then the data is transferred from the move-out buffer register to the main storage. The transfers of the parts of the data from the plurality of buffer storages to the corresponding portions of the move-out buffer register are carried out concurrently, and the parts of the data concurrently transferred from all of the buffer storages to the move-out buffer register constitute the whole of the data to be moved out.
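The move-out step can be sketched as each buffer storage supplying a different slice of the same block. The function name, the even split, and the byte-level layout below are assumptions made for illustration; the loop runs sequentially here, whereas the abstract describes the per-storage transfers as happening concurrently in hardware.

```python
def move_out(buffer_storages, address, block_size):
    """Assemble one block in a move-out register from identical buffer storages.

    Each storage holds the same data, so storage i only has to supply
    slice i of the block. The last storage takes any remainder.
    """
    count = len(buffer_storages)
    part = block_size // count
    register = bytearray(block_size)   # stands in for the move-out buffer register
    for i, storage in enumerate(buffer_storages):
        lo = i * part
        hi = block_size if i == count - 1 else lo + part
        # In hardware these slice transfers would all occur in the same cycle.
        register[lo:hi] = storage[address + lo:address + hi]
    return bytes(register)


data = bytes(range(16))
storages = [data, data]                # two buffer storages with the same contents
block = move_out(storages, address=8, block_size=8)
```

With two storages, each one drives half of the register, so the whole block is ready for main storage after a single parallel read instead of two sequential ones.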