A cache memory unit is constructed to have a two-stage pipeline shareable by a plurality of sources which include two independently operated central processing units (CPUs). Apparatus included within the cache memory unit operates to allocate alternate time slots to the two CPUs which offset their operations by a pipeline stage. This permits one pipeline stage of the cache memory unit to perform a directory search for one CPU while the other pipeline stage performs a data buffer read for the other CPU. Each CPU is programmed to use less than all of the time slots allocated to it. Thus, the processing units operate conflict-free while pipeline stages are freed up for processing requests from other sources, such as replacement data from main memory or cache updates.
The present invention relates generally to a protocol engine for use in a multiprocessor computer system. The protocol engine, which implements a cache coherence protocol, includes a clock signal generator for generating signals denoting interleaved even clock periods and odd clock periods, a memory transaction state array for storing entries, each denoting the state of a respective memory transaction, and processing logic. The memory transactions are divided into even and odd transactions whose states are stored in distinct sets of entries in the memory transaction state array. The processing logic has interleaving circuitry for processing during even clock periods the even memory transactions and for processing during odd clock periods the odd memory transactions.
In a pipeline computer, a store sub-instruction address is stored into a most recent address register as well as into an address buffer in response to a store request and a load sub-instruction address is supplied to a main memory in response to a load sub-instruction. When a store sub-instruction data is stored into a buffer following the storage of the store sub-instruction address into the most recent address register, the contents of the address buffer and the data buffer are transferred to a location of the main memory specified by the sub-instruction address. Main memory data is retrieved from a location specified by an associated load sub-instruction address. A match or a mismatch is detected between the address in the most recent address register and an address generated in response to a subsequent load sub-instruction. If the latter occurs just prior to the storage of store sub-instruction data into the data buffer and if it accesses the same data as the store sub-instruction data, a match will be detected and data from an arithmetic logic unit is stored into a replay data register and returned to the request source. If a mismatch is detected and data read out of the main memory is passed through the reply data register to the request source.
A processor controlled interface between a processor, instruction cache, and main memory provides for simultaneously refilling the cache with an instruction block from main memory and processing the instructions in the block while they are being written to the cache.
In a binary search, two storage units are prepared so that when the least significant bit of the search address is "0" and "1", even and odd address banks respectively are used. The search object data is classified according to data belonging to the odd and even addresses in continuous addresses and allocated to these two storage units. Further, a search tree of the search address is constructed so that two data of object for a next search are stored in different storage units. Upon the binary search, addresses for the two storage units are set according to this search tree. Therefore, simultaneous readout of data is enabled, so that readout and comparison are carried out in parallel. Further, according to multiple division search of the invention, if 2 bits of the least significant bits of the search address are "00", "01", "10" and "11", a search object data is stored in first through fourth banks respectively. A search tree of the search address is constructed so that four data of object for a next search are stored in different banks. Upon a search, an address for each bank is set according to the search tree so as to enable readout of data at the same time, thereby reducing time required for the search.
A technique of processing pipeline commands in parallel so as to minimize pipeline stalls. This is accomplished in accordance with the invention without need for the complex resource allocation techniques of the prior art by arbitrating access to critical pipeline resources on the phase of the system clock. For example, one control process may access the critical pipeline resource only during an even phase of the system clock, while a second control process may access the critical pipeline resource only during the odd phase of the clock. These processes may run at the same time if the pipelined instructions being executed by each process have no data dependencies since structural hazards are effectively eliminated by time-sharing the data buses on the respective phases of the system clock. The benefits of dynamically scheduled pipelined systems may thus be obtained without the complex scoreboarding and other scheduling algorithms used in the prior art to prevent pipeline hazards.