A cache memory is capable of concurrently accepting and working on completion of more than one cache access from a plurality of processors connected in parallel. Current accesses to the cache are handled by current-access-completion circuitry, which determines whether the current access is capable of immediate completion and either completes the access immediately, if so capable, or transfers it to pending-access-completion circuitry, if not. The latter circuitry works on completion of pending accesses: for each pending access it determines and stores status information prescribing the steps required to complete the access, and it redetermines that status information as conditions change. While current and pending accesses are being completed, their addresses are compared to those of memory accesses in progress on the system.
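The split between immediate and pending completion can be illustrated with a minimal software sketch. The abstract describes circuitry, so everything here is an assumption made for illustration: the names `Access`, `Cache`, `required_steps`, `submit`, and `conditions_changed`, and the two example step names, are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field


@dataclass
class Access:
    address: int
    # Status information: the steps still required to complete this access.
    steps: list = field(default_factory=list)


class Cache:
    def __init__(self):
        self.lines = set()      # addresses currently held in the cache (assumed model)
        self.in_flight = set()  # addresses of memory accesses in progress on the system
        self.pending = []       # accesses that could not complete immediately

    def required_steps(self, access):
        """Determine the steps needed to complete an access under current conditions."""
        steps = []
        # Compare the access address to memory accesses already in progress.
        if access.address in self.in_flight:
            steps.append("wait_for_memory_access")
        if access.address not in self.lines:
            steps.append("fetch_line")
        return steps

    def submit(self, access):
        """Current-access path: complete now, or hand over to the pending path."""
        access.steps = self.required_steps(access)
        if not access.steps:
            return "completed"
        self.pending.append(access)
        return "pending"

    def conditions_changed(self):
        """Pending path: redetermine status and return accesses that can now complete."""
        completed, still_pending = [], []
        for access in self.pending:
            access.steps = self.required_steps(access)
            (still_pending if access.steps else completed).append(access)
        self.pending = still_pending
        return completed
```

In this sketch an access whose step list is empty completes at once; any other access is stored with its step list, which is recomputed whenever `conditions_changed` is called.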
A vector processing system provides high performance vector processing using a System-On-a-Chip (SOC) implementation technique. One or more scalar processors (or cores) operate in conjunction with a vector processor, and the processors collectively share access to a plurality of memory interfaces coupled to Dynamic Random Access read/write Memories (DRAMs). In typical embodiments the vector processor operates as a slave to the scalar processors, executing computationally intensive Single Instruction Multiple Data (SIMD) codes in response to commands received from the scalar processors. The vector processor implements a vector processing Instruction Set Architecture (ISA) including machine state, instruction set, exception model, and memory model.
A cache memory with reduced request blocking decides whether to accept a request based on the types of requests it is already servicing. A request that hits the cache memory, or that misses the cache memory but does not conflict with any request already being serviced, is not blocked. A request that misses the cache memory and also conflicts with one or more requests already being serviced is blocked. In one embodiment, conflicts for write requests are determined by checking whether the cache is already retrieving a cache line from system memory for a request that maps into the same cache set as the write request. If such a request exists, then a conflict occurs. In this embodiment, conflicts for read requests are determined by checking whether the cache is already servicing an outstanding request to memory for the same address. If so, then a conflict occurs. If not, then a conflict does not occur unless the victim line for the read request is dirty and no space exists in a write-back buffer to temporarily store the victim line.
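The blocking rules of this embodiment can be written out as a small decision function. This is a sketch under stated assumptions, not the patented circuit: `Request`, `CacheState`, `is_blocked`, and all field names are hypothetical, addresses are compared at cache-line granularity, and a single set of in-flight line fills stands in for both "retrieving a cache line from system memory" and "an outstanding request to memory".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    address: int
    is_write: bool


@dataclass
class CacheState:
    num_sets: int
    line_size: int
    resident_lines: set         # line numbers currently held in the cache
    fills_in_flight: set        # line numbers being retrieved from system memory
    dirty_victim_sets: set      # set indices whose victim line is dirty
    write_back_free_slots: int  # free entries in the write-back buffer


def line_of(state, address):
    return address // state.line_size


def set_of(state, line):
    return line % state.num_sets


def is_blocked(state, req):
    """Return True if the request must be blocked under the rules above."""
    line = line_of(state, req.address)
    if line in state.resident_lines:
        return False  # a hit is never blocked
    target_set = set_of(state, line)
    if req.is_write:
        # Write miss: conflict if any in-flight fill maps to the same set.
        return any(set_of(state, l) == target_set for l in state.fills_in_flight)
    # Read miss: conflict if the same line is already being fetched ...
    if line in state.fills_in_flight:
        return True
    # ... or if the victim is dirty and the write-back buffer has no room.
    victim_dirty = target_set in state.dirty_victim_sets
    return victim_dirty and state.write_back_free_slots == 0
```

Note the asymmetry the abstract describes: a write miss is blocked by any fill to the same set, while a read miss is blocked only by a fill to the same line or by a dirty victim with no write-back space.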
A memory system and method of using same are provided. In one embodiment of the present invention, the memory system may include a plurality of logic sections that may be used to facilitate execution of relatively complex atomic read-modify-write operations.
The invention describes a system and method for creating and using dependencies to determine the order in which transaction requests are serviced in a multiple-queue environment. When more than one outstanding transaction affects the same memory location, dependencies are established to ensure the correct sequencing of the competing transactions. In a preferred embodiment, as each request is inserted, the other outstanding requests are checked to determine whether any of them accesses the same memory location. If so, a dependency is created which ensures that the youngest queue entry present at the time of the check is serviced before the request being inserted.
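The insertion-time check can be sketched as follows. This is an illustrative model only: `Entry`, `DependencyTracker`, and their methods are hypothetical names, "youngest" is modeled with a global insertion sequence number, and the queues are represented by a simple label on each entry rather than by separate hardware structures.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    seq: int                          # global insertion order; larger means younger
    queue: str                        # label of the queue that holds this request
    address: int
    depends_on: Optional[int] = None  # seq of the entry that must be serviced first


class DependencyTracker:
    def __init__(self):
        self._next_seq = itertools.count()
        self.outstanding = {}  # seq -> Entry, across all queues

    def insert(self, queue, address):
        """Insert a request, depending on the youngest outstanding entry for the same address."""
        same_location = [e for e in self.outstanding.values() if e.address == address]
        youngest = max(same_location, key=lambda e: e.seq, default=None)
        entry = Entry(next(self._next_seq), queue, address,
                      youngest.seq if youngest is not None else None)
        self.outstanding[entry.seq] = entry
        return entry

    def ready(self, entry):
        """An entry may be serviced once the entry it depends on is no longer outstanding."""
        return entry.depends_on is None or entry.depends_on not in self.outstanding

    def complete(self, entry):
        if not self.ready(entry):
            raise RuntimeError("dependency not yet serviced")
        del self.outstanding[entry.seq]
```

Because each new entry depends only on the youngest matching entry, requests to the same location form a chain, which is enough to keep them in insertion order without comparing every pair.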
A non-locking queueing mechanism is described for transferring information from a sending unit to a receiving unit through a queue in which there is no interference between the independent units (sender and receiver) during enqueueing or dequeueing. The invention thus avoids any form of interlock or serialization. The mechanism includes a first pointer (D) identifying the element area in the queueing device where the last dequeued information element, if any, was located; a second pointer register for logging a second pointer (E) identifying the element area in the queueing device where the last enqueued information element, if any, was located; a first control block activated by the sending unit to enqueue the information element into the queueing device and to update the second pointer; and a second control block activated by the receiving unit to dequeue the information element from the queueing device and to update the first pointer.
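The two-pointer arrangement resembles a single-producer, single-consumer ring buffer, and a minimal sketch along those lines is given here. The ring layout, the class name `NonLockingQueue`, and the full/empty tests are assumptions for illustration; the abstract does not specify them. The property being shown is that the sender writes only E and the receiver writes only D, so neither side needs a lock.

```python
class NonLockingQueue:
    """Ring of element areas with pointers D (last dequeued) and E (last enqueued)."""

    def __init__(self, capacity):
        # One spare element area lets "full" be told apart from "empty".
        self._slots = [None] * (capacity + 1)
        self._d = 0  # area of the last dequeued element; written only by the receiver
        self._e = 0  # area of the last enqueued element; written only by the sender

    def enqueue(self, item):
        """Sender side: returns False when the queue is full."""
        nxt = (self._e + 1) % len(self._slots)
        if nxt == self._d:
            return False
        self._slots[nxt] = item  # store the element first ...
        self._e = nxt            # ... then publish it by advancing E
        return True

    def dequeue(self):
        """Receiver side: returns (True, item), or (False, None) when empty."""
        if self._d == self._e:
            return (False, None)
        nxt = (self._d + 1) % len(self._slots)
        item = self._slots[nxt]
        self._d = nxt            # release the area by advancing D
        return (True, item)
```

In hardware or in a language without a global interpreter lock, the store to the element area would also have to become visible before the pointer update, which this Python sketch does not address.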