A processor and a level two (L2) dynamic random access memory (DRAM) cache are integrated and fabricated on a single chip. As an extension of this basic structure, the invention also contemplates multiprocessor "node" chips in which multiple processors are integrated on a single chip with an L2 cache. Integrating the processor and the L2 DRAM cache on one chip yields high on-chip bandwidth, reduced latency, and higher performance. A loosely coupled multiprocessor system can be built by connecting a plurality of such processors, each with its own integrated L2 DRAM cache. Alternatively, the single-chip technology can be used to integrate a plurality of processors on one chip with an L2 DRAM cache that may be either private or shared. This approach overcomes a number of issues that limit the performance and cost of a memory hierarchy. When the L2 DRAM cache is placed on the same chip as the processor, the time needed for two chip-to-chip crossings is eliminated. Because these crossings require off-chip drivers and receivers and must be synchronized with the system clock, the time involved is substantial. The integrated L2 DRAM cache therefore reduces latency.
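The latency argument can be made concrete with a toy model. The sketch below is a hypothetical illustration, not the patent's analysis: it assumes an off-chip L2 hit pays two chip-to-chip crossings, each rounded up to a whole system-clock period to reflect the synchronization requirement, while an on-chip L2 hit pays only the array access. The function name, parameters, and the numbers in the usage line are all illustrative assumptions.

```python
import math


def l2_hit_latency_ns(array_access_ns: float,
                      crossing_ns: float = 0.0,
                      system_clock_ns: float = 1.0,
                      off_chip: bool = True) -> float:
    """Hypothetical latency model for an L2 hit as seen by the processor.

    Off-chip L2: two chip-to-chip crossings (driver + receiver delay), each
    assumed to be rounded up to a whole system-clock period.
    On-chip L2: only the array access remains.
    """
    if system_clock_ns <= 0:
        raise ValueError("system_clock_ns must be positive")
    if not off_chip:
        return array_access_ns
    # Assumption: a crossing cannot complete mid-cycle, so round up to a clock edge.
    synced_crossing = math.ceil(crossing_ns / system_clock_ns) * system_clock_ns
    return array_access_ns + 2 * synced_crossing


if __name__ == "__main__":
    # Illustrative numbers only: 10 ns array, 3 ns raw crossing, 4 ns system clock.
    print(l2_hit_latency_ns(10.0, crossing_ns=3.0, system_clock_ns=4.0))  # 18.0
    print(l2_hit_latency_ns(10.0, off_chip=False))                        # 10.0
```

The rounding step is why even a short raw crossing delay can cost a full clock period in each direction.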
Embodiments of an apparatus, method, and system provide for no-operation instruction ("NOP") folding such that information regarding the presence of a NOP instruction in the instruction stream is folded into a buffer entry for another instruction. Information regarding a target NOP instruction is thus maintained in a buffer entry associated with an instruction other than the target NOP instruction. For at least one embodiment, NOP information is folded into entries of a re-order buffer.
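A minimal sketch of the folding idea: in this toy re-order buffer, a NOP never gets an entry of its own. Instead it increments a counter (the hypothetical `folded_nops` field) on the youngest existing entry. The class names, the choice to fold into the preceding entry, and the handling of NOPs that arrive while the buffer is empty are assumptions. The abstract does not say which entry receives the folded information.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class ROBEntry:
    opcode: str
    folded_nops: int = 0  # hypothetical field: NOPs represented by this entry


class ReorderBuffer:
    """Toy re-order buffer in which NOPs never occupy an entry of their own."""

    def __init__(self) -> None:
        self.entries: Deque[ROBEntry] = deque()
        # NOPs seen while the buffer is empty wait here until the next
        # non-NOP instruction is allocated (an assumption of this sketch).
        self.pending_nops = 0

    def allocate(self, opcode: str) -> None:
        if opcode == "nop":
            if self.entries:
                # Fold the NOP into the youngest existing entry.
                self.entries[-1].folded_nops += 1
            else:
                self.pending_nops += 1
            return
        self.entries.append(ROBEntry(opcode, folded_nops=self.pending_nops))
        self.pending_nops = 0

    def retire_oldest(self) -> int:
        """Retire the oldest entry and return how many instructions it covers."""
        entry = self.entries.popleft()
        return 1 + entry.folded_nops


if __name__ == "__main__":
    rob = ReorderBuffer()
    for op in ["add", "nop", "nop", "mul", "nop"]:
        rob.allocate(op)
    print(len(rob.entries))     # 2 entries for 5 instructions
    print(rob.retire_oldest())  # 3 (add + 2 folded NOPs)
    print(rob.retire_oldest())  # 2 (mul + 1 folded NOP)
```

Retirement still accounts for every NOP, but the buffer capacity consumed is only that of the non-NOP instructions.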
An efficient bootstrap-loading system scans cache lines into a cache store queue during a scan phase and then transmits them from the cache store queue to a cache memory array during a functional phase. Scan circuitry stores a given cache line in a set of latches associated with one of a plurality of cache entries in the cache store queue, and then passes the cache line from the latch set to the associated cache entry. The cache lines may be scanned in from test software that is external to the computer system. Read/claim dispatch logic dispatches store instructions for the cache entries to read/claim machines. The read/claim machines evaluate a mode bit indicating that the cache entries in the cache store queue are scanned cache lines, and then write those cache lines to the cache memory array without obtaining write permission. In the illustrative embodiment, the cache memory is an L2 cache.
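The two phases can be sketched in Python as follows. This is a behavioral toy, not the hardware design. The mode-bit name `scanned_mode`, the round-robin dispatch across read/claim machines, and modeling the cache array as a dictionary are assumptions. A store without the mode bit set is simply reported as not completed, because obtaining write permission is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoreQueueEntry:
    address: Optional[int] = None
    latches: Optional[bytes] = None  # latch set loaded by the scan circuitry
    line: Optional[bytes] = None     # cache entry fed from the latch set


class CacheStoreQueue:
    def __init__(self, n_entries: int) -> None:
        self.entries: List[StoreQueueEntry] = [StoreQueueEntry() for _ in range(n_entries)]
        self.scanned_mode = False  # hypothetical name for the mode bit

    def scan_in(self, idx: int, address: int, line: bytes) -> None:
        """Scan phase: load the latch set, then pass the line to its entry."""
        entry = self.entries[idx]
        entry.latches = line
        entry.address = address
        entry.line = entry.latches
        self.scanned_mode = True


class ReadClaimMachine:
    def __init__(self, array: Dict[int, bytes]) -> None:
        self.array = array

    def store(self, address: int, line: bytes, scanned_mode: bool) -> bool:
        """Write the line directly if the mode bit marks it as scanned in.

        An ordinary store would first need write permission (not modeled),
        so it is reported as not completed.
        """
        if scanned_mode:
            self.array[address] = line
            return True
        return False


def functional_phase(queue: CacheStoreQueue, array: Dict[int, bytes],
                     n_machines: int = 2) -> int:
    """Dispatch each filled entry to a read/claim machine, round robin."""
    machines = [ReadClaimMachine(array) for _ in range(n_machines)]
    filled = [e for e in queue.entries if e.line is not None]
    written = 0
    for i, entry in enumerate(filled):
        if machines[i % n_machines].store(entry.address, entry.line, queue.scanned_mode):
            written += 1
    return written
```

With this split, the slow serial scan only fills the queue, and the array writes proceed at functional speed.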
A method and device maintain data coherency in a semiconductor memory device that combines two or more memory chips into one chip and operates according to a late-select synchronous pipeline input/output protocol. The method includes generating first and second bypass summation signals from the chip block select address signal input in the latest write operation and from comparison signals obtained by comparing the latest write address with the current read address; and generating, from the first and second bypass summation signals and an internal clock signal, first and second bypass control signals having opposite logic values. When all the comparison signals are the same, a bypass operation is performed in one of the read paths associated with the memory chips, and a normal read operation is performed through the other read paths.
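The control logic can be sketched at the behavioral level for two chips. This Python model is a simplification under stated assumptions: "all comparison signals the same" is taken to mean that every address bit of the current read matches the latest write address, chips are numbered 0 and 1, and the controls are asserted only while the internal clock `clk` is high. In a late-write pipeline the latest write data is typically still held outside the array, which is presumably why the matching chip's read path must bypass. The function names are hypothetical.

```python
from typing import List, Tuple


def comparison_signals(write_addr: int, read_addr: int, width: int) -> List[bool]:
    """One signal per address bit: True where the write and read bits agree."""
    diff = write_addr ^ read_addr
    return [((diff >> i) & 1) == 0 for i in range(width)]


def bypass_controls(write_addr: int, write_chip_sel: int, read_addr: int,
                    width: int, clk: bool) -> Tuple[bool, bool]:
    """Return (bypass control for chip 0, bypass control for chip 1)."""
    match = all(comparison_signals(write_addr, read_addr, width))
    # Bypass summation signals: the address match qualified by the chip
    # block select latched from the latest write.
    sum0 = match and write_chip_sel == 0
    sum1 = match and write_chip_sel == 1
    # Gate with the internal clock; on a match the two controls are opposite.
    return (clk and sum0, clk and sum1)


def read_path_modes(write_addr: int, write_chip_sel: int, read_addr: int,
                    width: int, clk: bool) -> Tuple[str, ...]:
    """Map each chip's control to the read path it selects."""
    ctrl = bypass_controls(write_addr, write_chip_sel, read_addr, width, clk)
    return tuple("bypass" if c else "normal" for c in ctrl)


if __name__ == "__main__":
    # Read hits the address last written to chip 1: chip 1 bypasses.
    print(read_path_modes(0x2A, 1, 0x2A, 8, True))  # ('normal', 'bypass')
```

Any address mismatch, or a low clock, leaves both read paths in normal mode in this sketch.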
Architectures, methods, and systems are presented that combine multiple directories (e.g., the L1 and L2 directories) into a single directory while still allowing each level to use the organization best suited to overall performance. This integration is performed without compromising the organization at each level. With a few small additions, the L2 directory performs both the L1 and L2 directory functions simultaneously. Additionally, the same organizational structure allows the L2 array to serve simultaneously as a traditional L1 array and as an L2 array. In one aspect of the present invention, an architecture is provided for a first- and second-level memory hierarchy, or cache, including: a first data storage array for the first-level memory hierarchy; a second data storage array for the second-level memory hierarchy; and a single address translation directory that combines the directories for the first- and second-level memory hierarchy into one directory satisfying the organization requirements of both levels. Also provided is a system having a three-level memory hierarchy that comprises a single combined directory serving each of three separate storage arrays. Each storage array serves a respective level of the three-level memory hierarchy, and the organization of the various levels is not compromised by the use of the single combined directory.
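A minimal Python sketch of a two-level combined directory. The assumption here is that the "small addition" to the L2 directory is a per-entry L1-residency flag, so one lookup answers both "L1 hit?" and "L2 hit?". FIFO replacement is also an assumption, and L1 capacity limits and the three-level variant are not modeled. The class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class CombinedDirectory:
    """One directory, organized like the L2, that also tracks L1 residency."""

    def __init__(self, sets: int, ways: int, line_size: int) -> None:
        self.sets, self.ways, self.line_size = sets, ways, line_size
        # Per set: tag -> in_l1 flag (the hypothetical "small addition").
        self.table: List[Dict[int, bool]] = [{} for _ in range(sets)]

    def _index_tag(self, addr: int) -> Tuple[int, int]:
        line = addr // self.line_size
        return line % self.sets, line // self.sets

    def fill(self, addr: int, into_l1: bool) -> None:
        idx, tag = self._index_tag(addr)
        entries = self.table[idx]
        if tag not in entries and len(entries) >= self.ways:
            # FIFO replacement. Dropping the entry also drops its L1 flag,
            # so there is no separate L1 directory to keep consistent.
            del entries[next(iter(entries))]
        entries[tag] = entries.get(tag, False) or into_l1

    def lookup(self, addr: int) -> str:
        idx, tag = self._index_tag(addr)
        entries = self.table[idx]
        if tag not in entries:
            return "miss"
        return "L1" if entries[tag] else "L2"


if __name__ == "__main__":
    d = CombinedDirectory(sets=4, ways=2, line_size=64)
    d.fill(0x1000, into_l1=True)
    d.fill(0x2000, into_l1=False)
    print(d.lookup(0x1000), d.lookup(0x2000), d.lookup(0x3000))  # L1 L2 miss
```

Because both levels share one tag lookup, a single directory probe resolves the access to L1, L2, or a miss.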
A data cache is built with the same memory array dimensions as a conventional n-way associative cache but is organized as an (n-1)-way associative cache, so one associative column is left unused for data. That extra column is organized as an independent logical translation look-aside buffer (TLB) that is n-way associative. Thus, there is no separate TLB array for the cache; the TLB is contained within the data cache array. In this way, the cache can be implemented on a single chip and can be relatively large, on the order of 8 MB or more.
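The abstract does not say how a single column becomes an n-way associative TLB. The Python sketch below assumes one plausible mapping: groups of n consecutive rows of the spare column form one TLB set, so a cache with `sets` rows yields `sets // n` TLB sets of n ways each. The class name, the (vpn, pfn) entry format, and the replace-way-0 policy are hypothetical, and data-way operations are omitted.

```python
from typing import List, Optional, Tuple

Slot = Optional[Tuple[int, int]]  # (virtual page number, physical frame number)


class CacheWithEmbeddedTLB:
    """Array sized for n ways: ways 0..n-2 hold data, way n-1 holds the TLB."""

    def __init__(self, sets: int, n: int) -> None:
        if sets % n:
            raise ValueError("sets must be a multiple of n in this sketch")
        self.sets, self.n = sets, n
        self.data_ways = n - 1      # usable data associativity
        self.tlb_column = n - 1     # spare column reused for translations
        self.tlb_sets = sets // n   # assumption: n rows form one TLB set
        self.array: List[List[Slot]] = [[None] * n for _ in range(sets)]

    def _tlb_rows(self, vpn: int) -> range:
        tset = vpn % self.tlb_sets
        return range(tset * self.n, (tset + 1) * self.n)

    def tlb_insert(self, vpn: int, pfn: int) -> None:
        rows = self._tlb_rows(vpn)
        for r in rows:
            slot = self.array[r][self.tlb_column]
            if slot is None or slot[0] == vpn:
                self.array[r][self.tlb_column] = (vpn, pfn)
                return
        # All n ways of this TLB set are full: replace way 0 (simplistic policy).
        self.array[rows[0]][self.tlb_column] = (vpn, pfn)

    def translate(self, vpn: int) -> Optional[int]:
        for r in self._tlb_rows(vpn):
            slot = self.array[r][self.tlb_column]
            if slot is not None and slot[0] == vpn:
                return slot[1]
        return None


if __name__ == "__main__":
    c = CacheWithEmbeddedTLB(sets=16, n=4)
    c.tlb_insert(5, 42)
    print(c.translate(5), c.translate(9))  # 42 None
```

Under this mapping the TLB capacity equals the number of cache sets, and it grows automatically with the cache.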