The disclosed embodiments filter out many unnecessary interrogations of the cache directories of processors in a multiprocessor (MP) system, thereby reducing the required size of the buffer invalidation address stack (BIAS) associated with each processor, and increasing the efficiency of each processor by allowing it to access its cache during the machine cycles which in prior MPs had been required for invalidation interrogation. Invalidation interrogation of each remote processor cache directory may be done when each channel or processor generates a store request to a shared main storage. A filter memory is provided with each BIAS in the MP. The filter memory records the cache block address in each invalidation request transferred to its associated BIAS. The filter memory deletes an address when it is deleted from the cache directory and retains the most recent cache access requests. The filter memory may have one or more registers, or be an array. Invalidation interrogation addresses from each remote processor and from local and/or remote channels are received and compared against each valid address recorded in the filter memory. If they compare unequal, the received address is recorded in the filter memory as a valid address, and it is gated into the BIAS to perform a cache interrogation. If equal, the inputted address is prevented from entering the filter memory or the BIAS, so that it cannot cause any cache interrogation. Deletion from the filter memory is done when the associated processor fetches a block of data into its cache. Deletion may be of all entries in the filter memory, or of only a valid entry having an address equal to the block fetch address in a fetch address register (FAR). Deletion may be done by resetting a valid bit with each entry.
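The compare, record, and delete behavior described in the abstract above can be illustrated with a minimal software sketch. This is not the disclosed hardware: the class name `InvalidationFilter`, the `capacity` parameter, the oldest-first replacement, and the use of Python lists for the filter memory and the BIAS are all assumptions made for illustration.

```python
class InvalidationFilter:
    """Hypothetical software model of a filter memory in front of a BIAS."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # assumed number of filter registers
        self.entries = []         # valid block addresses, oldest first
        self.bias = []            # addresses gated into the BIAS

    def invalidation_request(self, block_addr):
        """Return True if the address is gated into the BIAS, False if filtered."""
        if block_addr in self.entries:
            # Equal compare: the address enters neither the filter nor the BIAS.
            return False
        if len(self.entries) == self.capacity:
            # Assumed policy: drop the oldest entry to keep the most recent ones.
            self.entries.pop(0)
        self.entries.append(block_addr)
        self.bias.append(block_addr)
        return True

    def block_fetch(self, fetch_addr, clear_all=False):
        """Delete entries when the local processor fetches a block into its cache."""
        if clear_all:
            self.entries.clear()
        elif fetch_addr in self.entries:
            self.entries.remove(fetch_addr)


f = InvalidationFilter(capacity=2)
first = f.invalidation_request(0x100)   # unequal compare: goes to the BIAS
second = f.invalidation_request(0x100)  # equal compare: filtered out
f.block_fetch(0x100)                    # block re-fetched: entry deleted
third = f.invalidation_request(0x100)   # must interrogate the cache again
```

In this sketch, a repeated invalidation for the same block is dropped until the local processor fetches that block again, at which point the next invalidation must reach the cache directory.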
Effective expansion of a common intermediate buffer memory by equivalent use of the buffer memory in each CPU in a multiprocessor system. A method and system for achieving buffer memory coincidence is applied to a multiprocessor system provided with central processing units, buffer memories contained in the respective central processing units, a main memory, and an intermediate buffer memory connected between the main memory and the buffer memories, wherein buffer invalidation address information (BIA GO) is sent from the intermediate buffer memory to the i-th central processing unit (BIA GO #i) in accordance with a logical expression in which the overlined term "REQ.CPU" indicates that the i-th central processing unit does not provide a request for accessing the intermediate buffer memory, the term "W" indicates that the request is a request for writing a data block, the terms "F" and overlined "F" indicate that the accessed data block is found and is not found, respectively, in the intermediate buffer memory, the term "COPY #i" indicates that a copy of the corresponding data block is stored in the buffer memory of the i-th central processing unit, the validity flag term "VIF" indicates the possibility that the copy flags are incorrect, and the symbols "x" and "υ" represent a logical product and a logical sum, respectively. This method makes it possible to store data blocks which exist only in the buffer memory but do not exist in the intermediate buffer memory.
A data processing system for vector processing having a main memory accessible in parallel by a plurality of processors, each processor having a cache memory, wherein, in response to a storage instruction given to the main memory by a processor, a main memory block of a given size (BS) and having a given start address (B) and containing element data spaced at an interelement distance (D) being preempted as a result of the storage instruction, a single block address invalidation takes place at each cache memory previously having data stored at that main memory location, the single block address invalidation corresponding to (BS/D) cache address invalidations, whereby repeated sequential individual cache address invalidation operations for each address in the preempted block are no longer required.
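The relationship between one block invalidation and the (BS/D) individual invalidations it replaces can be sketched as follows. This is an illustrative model only: the function names are hypothetical, the cache is represented as a plain set of cached addresses, and BS is assumed to be a whole multiple of D.

```python
def element_addresses(start, block_size, distance):
    """Addresses that per-element invalidation would visit one at a time."""
    return [start + k * distance for k in range(block_size // distance)]


def invalidate_block(cached_addresses, start, block_size):
    """Single operation: drop every cached address in [start, start + block_size)."""
    return {a for a in cached_addresses if not (start <= a < start + block_size)}


cache = {0x1000, 0x1008, 0x1010, 0x2000}
per_element = element_addresses(0x1000, 24, 8)  # BS = 24, D = 8, so 3 addresses
remaining = invalidate_block(cache, 0x1000, 24)
```

Here one call to `invalidate_block` removes the same three entries that three separate per-address invalidations would have removed, leaving only the unrelated address `0x2000`.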
An apparatus which filters the invalidates to be propagated onto a private processor bus is provided, so that the processor bus is not overloaded with invalidate requests. The present invention also describes a method of reducing the number of invalidates propagated to each processor. A memory interface filters the invalidates by using a second private bus, the invalidate bus, which communicates with the cache controller. The cache controller can tell the memory interface whether data corresponding to the address on the invalidate bus is resident in the private cache memory of that processor. In this way, the memory interface has to request the private processor bus to perform the invalidate only when necessary.
The hybrid cache control provides a sharing (SH) flag with each line representation in each private CP cache directory in a multiprocessor (MP) to uniquely indicate for each line in the associated cache whether it is to be handled as a store-in-cache (SIC) line when its SH flag is in non-sharing state, or as a store-through (ST) cache line when its SH flag is in sharing state. At any time the hybrid cache can have some lines operating as ST lines, and other lines as SIC lines. A newly fetched line (resulting from a cache miss) has its SH flag set to non-sharing (SIC) state in its location determined by cache replacement selection circuits, unless the SH flag for the requested line is dynamically set to sharing (ST) state if a cross-interrogation (XI) hit in another cache is found by the XI controls, which cross-interrogate all other cache directories in the MP for every store or fetch cache miss and for every store cache hit of an ST line (having SH=1). An XI hit signals that a conflicting copy of the line has been found in another cache. If the conflicting cache line is changed from its corresponding MS line, the cache line is cast out to MS. The sharing (SH) flag for the conflicting line is set to sharing state for a fetch miss, but the conflicting line is invalidated for a store miss.
Hierarchical multiprocessor systems with common level expansion modules. The invention includes an architecture for such a multiprocessor system. One facet of such a multiprocessor system is a memory control system for minimizing duplicate read requests, comprising: a plurality of processing systems; a bus connecting the processing systems; a memory for storing variables; circuitry operable for receiving read requests through the bus from other processing systems; a memory for queuing incoming read requests, wherein the memory for queuing incoming read requests is connected to the circuitry operable for receiving read requests; a memory for queuing outgoing read requests, wherein the memory for queuing outgoing read requests is connected to the bus and the memory for storing variables; and circuitry for comparing the incoming read requests to the queued read requests, wherein the circuitry ignores duplicates of a first read request prior to the first read request leaving the memory for queuing outgoing read requests.
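The duplicate-suppression rule in the abstract above can be illustrated with a short sketch. It models the two queues in software only; the class name `ReadRequestFilter`, its method names, and the use of `collections.deque` are assumptions for illustration, not the claimed circuitry.

```python
from collections import deque


class ReadRequestFilter:
    """Hypothetical model of incoming and outgoing read-request queues."""

    def __init__(self):
        self.incoming = deque()  # read requests received from the bus
        self.outgoing = deque()  # read requests waiting to be serviced

    def receive(self, address):
        """Queue a read request arriving from another processing system."""
        self.incoming.append(address)

    def compare_and_forward(self):
        """Move incoming requests to the outgoing queue, ignoring duplicates."""
        ignored = 0
        while self.incoming:
            address = self.incoming.popleft()
            if address in self.outgoing:
                ignored += 1  # duplicate of a request that has not left yet
            else:
                self.outgoing.append(address)
        return ignored

    def send_next(self):
        """Release the oldest outgoing request, or None if the queue is empty."""
        return self.outgoing.popleft() if self.outgoing else None


q = ReadRequestFilter()
q.receive(0x40)
q.receive(0x40)                         # duplicate while the first is still queued
ignored_first = q.compare_and_forward()
sent = q.send_next()                    # the first request leaves the queue
q.receive(0x40)                         # same address again, no longer a duplicate
ignored_second = q.compare_and_forward()
```

The second request for `0x40` is ignored only because the first has not yet left the outgoing queue; once it has been sent, a later request for the same address is queued normally.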