Multiprocessor coherent cache system including two level shared cache with separately allocated processor storage locations and inter-level duplicate entry replacement
A cache memory subsystem has multilevel directory memory and buffer memory pipeline stages shared by at least a pair of independently operated central processing units. For completely independent operation, each processing unit is allocated one-half of the total available cache memory space by separate accounting replacement apparatus included within the buffer memory stage. A multiple allocation memory (MAM) is also included in the buffer memory stage. During each directory allocation cycle performed for a processing unit, the allocated space of the other processing unit is checked for the presence of a multiple allocation. The address of the multiply allocated location associated with the lower-priority processing unit is stored in the MAM, allowing for earliest data replacement and thereby maintaining data coherency between the two independently operated processing units.
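The allocation cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented hardware: the class name `TwoCpuCache`, the round-robin replacement policy, and the treatment of the MAM as a simple queue of slot numbers are all hypothetical choices made for the example.

```python
class TwoCpuCache:
    """Toy model: two CPUs, each owning half of the cache, plus a MAM."""

    def __init__(self, slots_per_cpu, low_priority_cpu=1):
        # Each CPU allocates only within its own half of the cache.
        self.space = {0: [None] * slots_per_cpu, 1: [None] * slots_per_cpu}
        # Assumed replacement policy: simple round robin per CPU.
        self.round_robin = {0: 0, 1: 0}
        self.low = low_priority_cpu
        # MAM: slots in the low-priority CPU's half that duplicate an
        # address held in the other half; these are replaced first.
        self.mam = []

    def allocate(self, cpu, address):
        """Allocate a slot for `address` in `cpu`'s half; return the slot."""
        if cpu == self.low and self.mam:
            slot = self.mam.pop(0)  # earliest replacement of a duplicate
        else:
            slot = self.round_robin[cpu]
            self.round_robin[cpu] = (slot + 1) % len(self.space[cpu])
        self.space[cpu][slot] = address
        # Check the other CPU's half for a multiple allocation.
        if address in self.space[1 - cpu]:
            dup = self.space[self.low].index(address)
            if dup not in self.mam:
                self.mam.append(dup)
        return slot


cache = TwoCpuCache(slots_per_cpu=4)
cache.allocate(0, 0x100)         # CPU 0 caches address 0x100
cache.allocate(1, 0x100)         # CPU 1 caches it too: duplicate recorded
first_reuse = cache.allocate(1, 0x200)  # duplicate slot is replaced first
```

In this sketch the duplicate held by the lower-priority unit is always the next location to be replaced in that unit's half, which is one plausible reading of "earliest data replacement".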
The invention comprises a system bus apparatus and method for a multi-arm, multiprocessor computer system having a main memory and localized buffer cache memories at each processor. Each block of data in a cache includes tag bits that identify the condition of the data block in relation to the corresponding data in main memory and other caches. The system bus (SYSBUS) comprises three subparts: 1) a MESSAGE/DATA bus, 2) a REQUEST/GRANT bus and 3) a BCU bus. The MESSAGE/DATA bus is coupled to every device on the system and is used for transferring messages, data and addresses. The REQUEST/GRANT bus couples between every device on an arm of the system and that arm's bus control unit (BCU). The BCU bus couples between the various BCUs. Both the MESSAGE/DATA bus and the BCU bus include ACK/NACK/HIT bits, which are used when responding to messages received over the SYSBUS to inform the message-issuing device whether the receiving devices received the message and, if so, the condition of the data in relation to other caches and main memory. The protocol allows inconsistent copies of data to exist and prevents stale data from being used erroneously by monitoring the tag bits and the ACK/NACK/HIT bits. Further, under the appropriate conditions, a copy of the most recent data block may be transferred from one cache to another (with appropriate updating of tags) without updating the main memory. When a memory operation will bring about a situation where cache coherence can no longer be maintained, main memory is updated with the most recent copy of the data and the other caches are either updated or tagged as invalid.
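As a rough illustration of how ACK/NACK/HIT responses might be collected by a message-issuing device, consider the Python sketch below. The abstract does not specify when a device answers NACK or how the issuer reacts, so the `busy` flag, the retry outcome, and the names `SnoopingCache`, `Response`, and `broadcast_read` are assumptions introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Response:
    ack: bool  # True = ACK (message received), False = NACK
    hit: bool  # True if the responder holds the addressed block


class SnoopingCache:
    def __init__(self, blocks, busy=False):
        self.blocks = set(blocks)
        self.busy = busy  # assumed reason for a NACK

    def snoop(self, address):
        """Respond to a message seen on the bus."""
        if self.busy:
            return Response(ack=False, hit=False)
        return Response(ack=True, hit=address in self.blocks)


def broadcast_read(caches, address):
    """Collect responses and summarize them for the issuing device."""
    responses = [cache.snoop(address) for cache in caches]
    if not all(r.ack for r in responses):
        return "retry"  # assumed policy: reissue if any device NACKs
    return "hit" if any(r.hit for r in responses) else "miss"


caches = [SnoopingCache({0x40}), SnoopingCache(set())]
outcome = broadcast_read(caches, 0x40)
```

A "hit" outcome tells the issuer that another cache holds the block, which is the situation in which the abstract's cache-to-cache transfer could apply; a "miss" would leave main memory as the source.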
A method and arrangement for providing each thread of execution (28, 30, 32 and 34) of a multi-threading digital data processing environment with private copies of each set of initialization data (regions 60-1 through 60-4 and 62-1 through 62-4) that is required by procedures (44, 46) which are executed in the context of more than one of the threads. The regions (duplicate data copies) are generated from templates (56, 58) that include a base or original copy of the required set of initialization data. The templates are formulated during operation of the digital data processing system to compile, link and load the procedures and are each identified by a region descriptor (72) which includes a region index (a non-negative integer) and the memory address of the template. Regions are created when the initialization data of the region is required by a procedure that is executing within the context of a thread (i.e., regions are created on an as-needed basis), and the memory address of each region is stored in a thread address array (70) so that subsequent access by procedures executing in the context of the same thread can be made using the region descriptor. To conserve system memory, the thread address array is not established in memory until the corresponding thread is being executed and a procedure of the thread requires initialization data (i.e., a region for that thread is to be created).
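The lazy, per-thread creation of regions from templates can be approximated in Python as follows. This is only a sketch: `RegionDescriptor`, `get_region`, and the use of a dictionary keyed by region index in place of an array of memory addresses are hypothetical simplifications, and `threading.local` stands in for whatever per-thread storage the described system uses.

```python
import copy
import threading
from dataclasses import dataclass


@dataclass
class RegionDescriptor:
    index: int      # region index (a non-negative integer)
    template: dict  # stands in for the template's memory address


_thread_state = threading.local()


def get_region(desc):
    """Return the calling thread's private copy of the template data."""
    # The thread address array is not created until first needed.
    array = getattr(_thread_state, "address_array", None)
    if array is None:
        array = _thread_state.address_array = {}
    # The region itself is also created on demand, from the template.
    if desc.index not in array:
        array[desc.index] = copy.deepcopy(desc.template)
    return array[desc.index]


desc = RegionDescriptor(index=0, template={"counter": 0})
get_region(desc)["counter"] = 5  # modifies only this thread's region

seen_by_other_thread = []
worker = threading.Thread(
    target=lambda: seen_by_other_thread.append(get_region(desc)["counter"])
)
worker.start()
worker.join()
```

The worker thread receives a fresh copy made from the unmodified template, so the change made by the first thread is not visible to it.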
A toroidally-connected distributed-memory parallel computer having rows of processors, with each processor having an independent memory. The computer includes at least one common I/O channel adapted to be connected to a single row of processors by buffering mechanisms. Each buffering mechanism is associated with one processor of the single row of processors.
A method of addressing units of data stored in a memory using logical addresses. The logical addresses include fewer bits than necessary to uniquely address each unit of data stored in the memory. Translation of a logical address begins with analysis of a first logical address associated with a first unit of data to determine whether it is an even logical address or an odd logical address. A number of similar steps are taken in either case. If the first logical address is an even logical address, these steps include coupling the first logical address to an even translation table. The even translation table stores an even pointer for each even logical address. Each even pointer includes fewer bits than necessary to uniquely identify each unit of data stored in the memory. The even translation table couples the even pointer to an even memory, which stores units of data associated with even logical addresses. Finally, in response to the even pointer, the first unit of data is output from the even memory. On the other hand, if the first logical address is an odd logical address, then the first logical address is coupled to an odd translation table. The odd translation table stores an odd pointer for each odd logical address. Like the even pointers, each odd pointer includes fewer bits than necessary to uniquely identify each unit of data stored in the memory. In response to the first logical address, the odd translation table couples an odd pointer to an odd memory, which stores units of data associated with odd logical addresses. The odd memory then outputs the first unit of data associated with the first logical address.
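The even/odd translation path can be sketched in Python as shown below. The class name `EvenOddTranslator` and the use of dictionaries for the translation tables and lists for the two memories are assumptions made for illustration; the point is only that each pointer indexes one half of the storage and can therefore be narrower than a pointer into the whole memory.

```python
class EvenOddTranslator:
    """Toy model of separate even and odd translation tables."""

    def __init__(self, even_table, odd_table, even_memory, odd_memory):
        self.even_table = even_table    # even logical address -> pointer
        self.odd_table = odd_table      # odd logical address -> pointer
        self.even_memory = even_memory  # units for even logical addresses
        self.odd_memory = odd_memory    # units for odd logical addresses

    def read(self, logical_address):
        """Translate a logical address and return the unit of data."""
        if logical_address % 2 == 0:
            pointer = self.even_table[logical_address]
            return self.even_memory[pointer]
        pointer = self.odd_table[logical_address]
        return self.odd_memory[pointer]


translator = EvenOddTranslator(
    even_table={0: 1, 2: 0},
    odd_table={1: 0, 3: 1},
    even_memory=["unit-2", "unit-0"],
    odd_memory=["unit-1", "unit-3"],
)
units = [translator.read(address) for address in range(4)]
```

Here four units are stored, yet every pointer fits in a single bit because it only has to select between the two locations of its own memory.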
The index field of an address maps to low order cache directory address lines. The remaining cache directory address line, the highest order line, is indexed by the parity of the address tag for the cache entry to be stored to or retrieved from the corresponding cache directory entry. Thus, even parity address tags are stored in cache directory locations with zero in the most significant index/address bit, while odd parity address tags are stored in cache directory locations with one in the most significant index/address bit. The opposite arrangement (msb 1=even parity; msb 0=odd parity) may also be employed, as may configurations in which parity supplies the least significant bit rather than the most significant bit. In any of these cases, even/odd parity is implied based on the location of the address tag within the cache directory. In associative caches, the mechanism may be configured so that even parity address tags are stored in one set of congruence classes (rows) or congruence class members (columns) of the cache directory, while odd parity address tags are stored in another set. The parity of an address tag field within a presented address is also utilized to test the parity of an address tag stored in the indexed location, with address tag and parity matches indicating a cache hit. In the described example, the implied parity mechanism saves about 1/12th (approximately 8%) of the cache directory array space relative to configurations requiring stored parity associated with each cache directory entry. Furthermore, this mechanism reduces delays within critical cache directory access paths.
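The implied-parity indexing described above can be modeled with a short Python sketch. It covers only the direct-mapped case with parity supplying the most significant index bit; the names `ImpliedParityDirectory` and `parity` are hypothetical, and a Python list stands in for the directory array.

```python
def parity(value):
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


class ImpliedParityDirectory:
    """Toy direct-mapped directory whose top index bit is the tag parity."""

    def __init__(self, index_bits):
        self.index_bits = index_bits
        # One extra address line (the msb) is driven by tag parity,
        # so no parity bit is stored alongside each tag.
        self.entries = [None] * (1 << (index_bits + 1))

    def _slot(self, tag, index):
        return (parity(tag) << self.index_bits) | index

    def store(self, tag, index):
        self.entries[self._slot(tag, index)] = tag

    def lookup(self, tag, index):
        """Hit only if the stored tag matches in both value and parity."""
        slot = self._slot(tag, index)
        stored = self.entries[slot]
        if stored is None:
            return False
        implied_parity = slot >> self.index_bits
        # In software the parity test is redundant when the tags match;
        # it is kept to mirror the hardware check on the stored tag.
        return stored == tag and parity(stored) == implied_parity


directory = ImpliedParityDirectory(index_bits=4)
directory.store(tag=0b1011, index=5)  # odd parity: lands in the upper half
hit = directory.lookup(tag=0b1011, index=5)
miss = directory.lookup(tag=0b1001, index=5)  # even parity: lower half
```

Because the location encodes the parity, a stored tag whose bits no longer agree with the half of the directory it sits in can be flagged without any stored parity bit.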