A high-speed method for maintaining a summary of thread activity reduces the number of remote-memory operations for an n processor, multiple node computer system from n.sup.2 to (2n-1) operations. The method uses a hierarchical summary of-thread-activity data structure that includes structures such as first and second level bit masks. The first level bit mask is accessible to all nodes and contains a bit per node, the bit indicating whether the corresponding node contains a processor that has not yet passed through a quiescent state. The second level bit mask is local to each node and contains a bit per processor per node, the bit indicating whether the corresponding processor has not yet passed through a quiescent state. The method includes determining from a data structure on the processor's node (such as a second level bitmask) if the processor has passed through a quiescent state. If so, it is then determined from the data structure if all other processors on its node have passed through a quiescent state. If so, it is then indicated in a data structure accessible to all nodes (such as the first level bitmask) that all processors on the processor's node have passed through a quiescent state. The local generation number can also be stored in the data structure accessible to all nodes. If a processor determines from this data structure that the processor is the last processor to pass through a quiescent state, the processor updates the data structure for storing a number of the current generation stored in the memory of each node.
A method, system and computer program product for modifying data elements in a shared data element group that must be updated atomically for the benefit of readers requiring group integrity. A global generation number is associated with the data element group and each member receives a copy of this number when it is created. Each time an update is performed, the global generation number is incremented and the updated element's copy of this number is set to the same value. For each updated data element, a link is maintained from the new version to the pre-update version thereof, either directly or using pointer-forwarding entities. When a search is initiated, the current global generation number is referenced at the commencement of the search. As data elements in the group are traversed, the reader traverses the links between new and old data element versions to find a version having a matching generation number, if any. Following the occurrence of a grace period in which all readers have passed through quiescent states, all old data element versions are freed.
A system and method for increasing Operating System (OS) idle loop performance in a simulator environment. Upon encountering an OS idle loop condition on a processor, OS program flow is skipped ahead by an amount of time, thereby conserving the host machine's resources that would otherwise have been spent in supporting the OS idle loop execution. If another processor initiates an inter-processor message directed to a processor whose OS program flow has been skipped forward, that processor is capable of skipping backward in time, if necessary, to service the inter-processor message.
Read-copy-update (RCU) is performed within real-time and other types of systems, such that memory barrier usage within RCU is reduced. A computerized system includes processors, memory, updaters, and readers. The updaters update contents of a section of the memory by using first and second sets of per-processor counters, first and second sets of per-processor need-memory-barrier bits, and a global flip-counter bit. The global flip-counter bit specifies which of the first or second set of the per-processor counters and the per-processor need-memory-barrier bits is a current set, and which is a last set. The readers read the contents of the section of the memory by using the first and second sets of per-processor counters, the first and second sets of per-processor need-memory-barrier bits, and the global flip-counter bit, in a way that significantly reduces the need for memory barriers during such read operations.
A method for managing a mutex in a data processing system is presented. For each mutex, an average acquisition cost is maintained that indicates an average consumption of computational resources that has been incurred by threads attempting to acquire the mutex. If a thread attempts to acquire a locked mutex, then the thread enters a spin state or a sleep state based on restrictive conditions and the average acquisition cost value for the mutex at that time. A thread-specific current acquisition cost value is maintained that represents the consumption of computational resources by the thread after the initial attempt to acquire the mutex and prior to acquiring the mutex. When the thread acquires the mutex, the thread-specific current acquisition cost value is included into the average acquisition cost value.