Apparatus and method for prefetching subblocks from a low speed memory to a high speed memory of a memory hierarchy depending upon state of replacing bit in the low speed memory
A prefetching mechanism for a memory hierarchy which includes at least two levels of storage, with L1 being a high-speed low-capacity memory, and L2 being a low-speed high-capacity memory, with the units of L2 and L1 being blocks and sub-blocks respectively, with each block containing several sub-blocks in consecutive addresses. Each sub-block is provided with an additional bit, called an r-bit, which indicates that the sub-block has been previously stored in L1 when the bit is 1, and has not been previously stored in L1 when the bit is 0. Initially, when a block is loaded into L2, the r-bit of each of its sub-blocks is set to 0. When a sub-block is transferred from L1 to L2, its r-bit is then set to 1 in the L2 block, to indicate its previous storage in L1. When the CPU references a given sub-block which is not present in L1, and has to be fetched from L2 to L1, the remaining sub-blocks in this block having r-bits set to 1 are prefetched to L1. This prefetching of the other sub-blocks having r-bits set to 1 results in a more efficient utilization of the L1 storage capacity and in a higher hit ratio.
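The r-bit scheme described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the names `Hierarchy`, `load_block`, and `reference`, the four sub-blocks per block, and the LRU replacement policy for L1 are all hypothetical choices made for illustration. The abstract does not say when an r-bit is cleared again, so the sketch never clears one.

```python
SUBBLOCKS_PER_BLOCK = 4  # assumed block geometry, not taken from the abstract


class Hierarchy:
    """Toy two-level hierarchy: L1 holds sub-blocks, L2 holds blocks with r-bits."""

    def __init__(self, l1_capacity):
        self.l1_capacity = l1_capacity
        self.l1 = []       # (block_id, sub_index) pairs, least recently used first
        self.r_bits = {}   # block_id -> list of r-bits kept with the L2 block

    def load_block(self, block_id):
        # A block newly loaded into L2 starts with all r-bits at 0.
        self.r_bits[block_id] = [0] * SUBBLOCKS_PER_BLOCK

    def _install(self, key):
        if key in self.l1:
            return
        if len(self.l1) >= self.l1_capacity:
            # Assumed LRU replacement: the victim goes back from L1 to L2,
            # and its r-bit in the L2 block is set to 1.
            victim_block, victim_sub = self.l1.pop(0)
            self.r_bits[victim_block][victim_sub] = 1
        self.l1.append(key)

    def reference(self, block_id, sub):
        """CPU reference. Returns (hit, list of prefetched sub-block indices)."""
        key = (block_id, sub)
        if key in self.l1:
            self.l1.remove(key)
            self.l1.append(key)  # refresh LRU position
            return True, []
        if block_id not in self.r_bits:
            self.load_block(block_id)
        # On an L1 miss, the other sub-blocks of this block whose r-bit is 1
        # (and which are not already in L1) are prefetched with the demand fetch.
        prefetch = [
            i for i, r in enumerate(self.r_bits[block_id])
            if r == 1 and i != sub and (block_id, i) not in self.l1
        ]
        self._install(key)
        for i in prefetch:
            self._install((block_id, i))
        return False, prefetch
```

In this sketch, a sub-block that was evicted from L1 earlier is brought back together with any sibling that misses later, while siblings that were never in L1 are left in L2.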
A spatial footprint predictor includes a mechanism to measure spatial footprints of nominating cache-lines and hold the footprints. In some embodiments, the mechanism includes an active macro-block table (AMBT) to measure the spatial footprints and a spatial footprint table (SFT) to hold the spatial footprints. In other embodiments, the mechanism includes a macro-block table (MBT) in which macro-blocks may be active or inactive.
A multi-level instruction cache memory system for a computer processor. A relatively large cache has both instructions and data. The large cache is the primary source of data for the processor. A smaller cache dedicated to instructions is also provided. The smaller cache is the primary source of instructions for the processor. Instructions are copied from the larger cache to the smaller cache during times when the processor is not accessing data in the larger cache. A prefetch buffer transfers instructions from the larger cache to the smaller cache. If a cache miss occurs for the smaller cache, and the instruction is in the prefetch buffer, the system provides the instruction with no delay relative to a fetch from the smaller instruction cache. If a cache miss occurs for the smaller cache, and the instruction is being fetched from the larger cache, or available in the larger cache, the system provides the instruction with minimal delay relative to a fetch from the smaller instruction cache.
A computer memory management method for cache memory uses a deconfirmation technique to provide a simple sequential prefetching algorithm. Access sequentiality is predicted based on simple histories. Each memory line in cache memory is associated with a bit in an S-vector, which is called the S-bit for the line. When the S-bit is on, sequentiality is predicted, meaning that the sequentially next line is regarded as a good candidate for prefetching if that line is not already in the cache memory. The key to the operation of the memory management method is the manipulation (turning on and off) of the S-bits.
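The prediction step of the S-bit scheme can be sketched as follows. The abstract does not state the rules for turning S-bits on and off, so this sketch leaves that policy to the caller through a hypothetical `set_s_bit` method. Other assumptions made for illustration: the S-vector is indexed by line number, every S-bit starts on, and the cache is unbounded (no replacement is modelled).

```python
class SVectorPrefetcher:
    """Toy sequential prefetcher driven by one S-bit per memory line."""

    def __init__(self, num_lines, default_on=True):
        # Assumption: S-bits start on; the abstract does not give an initial state.
        self.s_vector = [default_on] * num_lines
        self.cache = set()  # line numbers currently cached (unbounded in this sketch)

    def set_s_bit(self, line, on):
        # Hook for the S-bit manipulation policy, which is left unspecified here.
        self.s_vector[line] = on

    def access(self, line):
        """Access a line. Returns (hit, prefetched line number or None)."""
        hit = line in self.cache
        self.cache.add(line)
        nxt = line + 1
        prefetched = None
        # When the S-bit is on, the sequentially next line is prefetched,
        # provided it exists and is not already in the cache.
        if self.s_vector[line] and nxt < len(self.s_vector) and nxt not in self.cache:
            self.cache.add(nxt)
            prefetched = nxt
        return hit, prefetched
```

Turning an S-bit off through `set_s_bit` suppresses the prefetch of the following line, which is the only effect the S-bit has in this sketch.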
When a CPU outputs an address for read-out from a memory, access to a cache memory is immediately started by use of its address signal, and in the meantime a cache controller determines whether or not the data required by the CPU exists in the cache memory and, if so, generates a selection signal for outputting only the data read out from a desired bank of the cache memory to a data bus. Accordingly, the time necessary for address comparison in the cache controller is not added to the access cycle time of the cache memory, so that the overall access time can be shortened and the throughput of the system can be improved.
A memory system for a computational circuit having a pipeline includes at least one functional unit and an address generator that generates a memory address. A coherent cache memory is responsive to the address generator and is addressed by the memory address. The cache memory is capable of generating a cache memory output. A non-coherent directory-less associative memory is responsive to the address generator and is addressable by the memory address. The associative memory receives input data from the cache memory. The associative memory is capable of generating an associative memory output that is delivered to the functional unit. A comparison circuit compares the associative memory output to the cache memory output and asserts a miscompare signal when the associative memory output is not equal to the cache memory output.
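The comparison step in the abstract above can be modelled in a few lines. This is a rough behavioural sketch under stated assumptions, not the claimed circuit: the class name `SpeculativeMemory`, the modulo indexing of the directory-less memory, and the `cache_read` callback standing in for the coherent cache are all hypothetical.

```python
class SpeculativeMemory:
    """Toy model: a tagless fast memory checked against the coherent cache."""

    def __init__(self, entries):
        # Directory-less: no tags are stored, so an entry may hold data
        # belonging to a different address (assumed modulo indexing).
        self.assoc = [0] * entries

    def read(self, address, cache_read):
        """Returns (associative output, cache output, miscompare flag)."""
        idx = address % len(self.assoc)
        speculative = self.assoc[idx]        # delivered to the functional unit
        authoritative = cache_read(address)  # coherent cache memory output
        miscompare = speculative != authoritative
        # The associative memory receives its input data from the cache.
        self.assoc[idx] = authoritative
        return speculative, authoritative, miscompare
```

A raised miscompare flag tells the consumer that the value it received from the fast memory differs from the cache output; how the pipeline recovers is outside the scope of this sketch.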