A computer system has a plurality of processors in a multiprocessor system with each processor associated with a cache memory. The cache traffic is monitored by the respective processors to determine the load for each of the cache memories. Signals corresponding to the cache loads are generated and analyzed. A target processor is selected for a push data operation from a bus agent to the cache memory using the load information. The push operations to the caches are optimized based on the cache traffic information.
Systems and methods for improving the performance of a multiprocessor system by enabling a first processor to initiate the retrieval of data and the storage of the data in the cache memory of a second processor. One embodiment comprises a system having a plurality of processors coupled to a bus, where each processor has a corresponding cache memory. The processors are configured so that a first one of the processors can issue a preload command directing a target processor to load data into the target processor's cache memory. The preload command may be issued in response to a preload instruction in program code, or in response to an event. The first processor may include an explicit identifier of the target processor in the preload command, or the selection of the target processor may be left to another agent, such as an arbitrator coupled to the bus.
Methods and apparatus to perform direct cache access in multiple core processors are described. In an embodiment, data corresponding to a direct cache access request is stored in a storage unit and a corresponding read request is generated. Other embodiments are also described.