|
Claims  |
|
|
What we claim is:
1. A memory subsystem comprising:
a first buffer comprising a plurality of entries corresponding in number to a subset of data blocks in a coupled memory device, each of the entries for storing status information for each of the associated data blocks;
a second buffer coupled to the first buffer, the second buffer comprising at least one entry comprising a data block field for storing a data block, an address field for indicating the address of the data block at the coupled memory device and a
flag field for indicating whether the data block stored in the associated data field should be written to the coupled memory device, where a state of the flag field is determined in response to status information stored in the first buffer at the entry
corresponding to the data block.
2. The memory subsystem according to claim 1, wherein the first buffer is accessed in parallel with a write operation of the data block to the second buffer and wherein the status information accessed from the first buffer is used for
selectively writing the data block in the second buffer to the coupled memory device.
3. The memory subsystem according to claim 1, wherein requests issued to the coupled memory device are serviced by the second buffer responsive to the address and flag fields of the at least one entry of the second buffer.
4. The memory subsystem according to claim 1, wherein each of the entries of the first buffer includes an owner field identifying an owner of the associated data block, and wherein the flag field in the second buffer is set in response to the
owner information in the first buffer.
5. The memory subsystem according to claim 4, wherein the data block in the data block field of the second buffer is written to the coupled memory device when the flag field in the second buffer is set.
6. The memory subsystem according to claim 5 wherein the flag field in the second buffer is set responsive to a source of the data in the data field of the second buffer corresponding to the owner identified in the owner field of the associated
entry of the first buffer.
7. The memory subsystem according to claim 1, further comprising:
means for selectively servicing a request to the coupled memory device by either the second buffer or the coupled memory device including:
means for comparing an address of the request to the address field of the at least one entry in the second buffer;
means, responsive to a match between the address of the request and the address in the address field of the at least one entry of the second buffer, and further responsive to a set flag in the associated flag field, for servicing the request with
the data block from the data field of the second buffer; and
means, responsive to either a mismatch between the address of the request and the address in the address field or a reset flag in the associated flag field, for servicing the request with the data block from the coupled memory device.
8. The memory subsystem according to claim 1 further comprising:
a cache for temporary storage of a plurality of data blocks from the coupled memory device, wherein each of the plurality of the entries of the first buffer correspond to one of the plurality of data blocks stored in the cache memory, and wherein
the status information of each entry further comprises:
address information identifying the address of the data block at the cache;
a validity flag indicating that the copy of the associated data block in the coupled cache memory device is current; and
wherein the flag field of the at least one entry of the second buffer is set responsive to the address information and validity flag of the corresponding data block in the first buffer.
9. The memory subsystem according to claim 8, wherein the flag field for the data block is set in the at least one entry of the second buffer in response to the validity flag of the entry of the first buffer associated with the data block
indicating that the associated data is current.
10. A memory subsystem comprising:
a first buffer comprising a plurality of entries corresponding in number to a first subset of data blocks in a coupled memory device, each of the entries for storing first status information for each of the associated data blocks;
a second buffer comprising a plurality of entries corresponding in number to a second subset of data blocks in a coupled memory device, each of the entries for storing second status information for each of the associated data blocks; and
a third buffer coupled to the first buffer and the second buffer, the third buffer comprising at least one entry comprising a data block field for storing a data block, an address field for indicating the address of the data block at the coupled
memory device and a flag field for indicating whether the data block stored in the associated data field should be written to the coupled memory device, where a state of the flag field is determined in response to the first and second status information
stored in the first buffer and the second buffer, at respective entries corresponding to the data block.
11. The memory subsystem according to claim 10, wherein the first and second buffers are accessed in parallel with a write operation of the data block in the third buffer, and wherein the first and second status information accessed from the
first and second buffers respectively are used for selectively writing the data block in the third buffer to the coupled memory device.
12. The memory subsystem according to claim 10, wherein requests issued to the coupled memory device are serviced by the third buffer responsive to the address and flag fields of the at least one entry of the second buffer.
13. The memory subsystem according to claim 10 further comprising:
a cache for temporary storage of a plurality of data blocks from the coupled memory device, wherein each of the plurality of the entries of the second buffer correspond to one of the plurality data blocks stored in the cache memory, and wherein
the status information of each entry further comprises:
address information identifying the address of data block at the coupled cache memory device; and
a validity flag indicating that the copy of the associated data block in the coupled cache memory device is current.
14. The memory subsystem according to claim 13, wherein each of the entries of the first buffer includes an owner field identifying an owner of the associated data block, and wherein the flag field in the third buffer is set responsive to the
address information and validity flag of the corresponding data block in the second buffer if the data block is one of the plurality of data blocks stored in the cache, or responsive to a source of the data in the data field of the second buffer
corresponding to the owner identified in the owner field of the associated entry of the first buffer if the data block is not one of the plurality of data blocks stored in the cache.
15. A method of writing a modified block of data from a cache device to a coupled memory device in response to an instruction comprising the steps of:
initiating a retrieval of status information associated with the modified block of data from a first buffer, the first buffer having a plurality of entries;
storing the modified block of data in a second buffer, the second buffer comprising at least one entry, with each of the at least one entries comprising a flag indicating whether the modified block of data at the associated entry is to be written
to the coupled memory device, the step of storing including the step of setting the flag associated with the modified block of data in the buffer to indicate that the modified data block should not be written to the coupled memory device; and
writing to the coupled memory device any of the at least one entries in the second buffer having the associated flag indicating that the data entry is to be written to the coupled memory device, wherein the steps of initiating, storing and
writing are performed in parallel.
16. The method according to claim 15, further comprising the step of:
updating the flag of the entry in the second buffer associated with the modified data block in response to the status information retrieved from the first buffer.
17. The method according to claim 15 wherein each of the at least one entries in the second buffer further includes an address field, and wherein the method further comprises the step of servicing a request issued to the coupled memory device by
the second buffer responsive to the address and flag field of one of the at least one entries in the second buffer, wherein the address of the one of the at least one entries corresponds to the address of the request.
18. The method according to claim 15, wherein each of the plurality of entries of the first buffer includes an owner field identifying an owner of the associated data block, and wherein the flag field in the second buffer is set in response to
the owner information in the first buffer.
19. The memory subsystem according to claim 18 wherein the flag field in the second buffer is set responsive to a source of the modified data the second buffer matching the owner identified in the owner field of status information of the
associated entry of the first buffer.
20. The memory subsystem according to claim 15 wherein each of the plurality of the entries of the first buffer correspond to one of a plurality data blocks stored in the cache memory, and wherein the status information of each entry in the
first buffer further comprises:
address information identifying the address of the data block at the coupled memory device;
a validity flag indicating that the associated data block in the coupled cache memory device is current; and
wherein the flag field of the at least one entry of the second buffer is set responsive to the address information and validity flag of the corresponding entry in the first buffer.
21. The memory subsystem according to claim 20, wherein the flag field for the data block is set in the at least one entry of the second buffer in response to the validity flag of the entry of the first buffer associated with the data block
indicating that the associated data is current.
22. A method of writing a modified block of data from a cache device having a plurality of entries to a coupled memory device in response to an instruction comprising the steps of:
initiating a retrieval of first status information associated with the modified block of data from a first buffer having a first plurality of entries;
initiating a retrieval of second status information associated with the modified block of data from a second buffer, the second buffer having a second plurality of entries corresponding in number to the plurality of entries of the cache device;
storing the modified block of data in a third buffer, the third buffer comprising at least one entry, with each of the at least one entries comprising a flag indicating whether the modified block of data at the associated entry is to be written
to the coupled memory device, the step of storing including the step of setting the flag associated with the modified block of data in the buffer to indicate that the modified data block should not be written to the coupled memory device; and
writing to the coupled memory device any of the at least one entries in the third buffer having the associated flag indicating that the data entry is to be written to the coupled memory device, wherein the steps of initiating a retrieval of first
status information, initiating a retrieval of second status information, storing the modified block and writing any of the at least one entries are performed in parallel.
23. The method according to claim 22, further comprising the step of:
updating the flag of the entry in the third buffer associated with the modified data block in response to the first and second status information.
24. The method according to claim 23 wherein each of the at least one entries in the third buffer further includes an address field, and wherein the method further comprises the step of servicing a request issued to the coupled memory device by
the third buffer responsive to the address and flag field of one of the at least one entries in the third buffer, wherein the address of the one of the at least one entries corresponds to the address of the request.
25. The method according to claim 24, wherein each of the plurality of entries of the first buffer includes an owner field identifying an owner of the associated data block.
26. The memory subsystem according to claim 25 wherein the status information of each of the plurality of entries in the second buffer further comprises:
address information identifying the address of a data block associated with the entry at the coupled memory device;
a validity flag indicating that the associated data block in the coupled cache memory device is current.
27. The method according to claim 26, wherein the flag field of the at least one entry of the third buffer is set responsive to the address information and validity flag of the corresponding entry in the second buffer if the second status
information is retrieved before the first status information, and wherein the flag field in the second buffer is set responsive to a source of the modified block of data matching the owner identified in the owner field of status information of the
associated entry of the first buffer if the first status information is retrieved before the second status information. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
FIELD OF THE INVENTION
This invention relates in general to the field of computer architecture and more specifically to distributed shared-memory multi-processing systems.
BACKGROUND OF THE INVENTION
As it is known in the art, symmetric multi-processing computers allow for high performance application processing. Typical symmetric multi-processing computer systems include a number of processors coupled together by a bus. One characteristic
of a symmetric multi-processing system is that memory space is shared among all of the processors. One or more operating systems are stored in memory and control the distribution of processes or threads among the various processors.
By allowing many different processors to execute different processes or threads simultaneously, the execution speed of a given application may be greatly increased. In theory the performance of a system could be improved by simply increasing the
number of processors in the multi-processing system. In reality, the continued addition of processors past a certain saturation point serves merely to increase communication bottlenecks and thereby limit the overall performance of the system.
For example, referring now to FIG. 1A, a typical prior art multi-processor system 2 including eight processors coupled together via a common interconnect bus is shown. During operation, each of the processors 3a-3h communicate with the other
processors and with a shared memory 4 via a shared interconnect bus 5. The symmetric multi-processing arrangement of FIG. 1A has been adequate for multi-processors built to date. However, with the advent of faster microprocessors, a common shared
interconnect is not capable of sufficiently exercising the full performance potential of the coupled microprocessors. Because the only communication link between the processors and memory is the shared bus, the bus may rapidly become saturated with
requests from the processors, thereby increasing delays as each processor attempts to gain access to the system bus. Therefore, although the processors may be able to operate at enhanced speeds, the limiting factor in terms of performance is the
available bandwidth of the system bus.
Communication bandwidth is a key factor in the performance of SMP systems. Since bandwidth may not be uniform between pairs or subsets of nodes in the SMP system, the industry uses a "bisection bandwidth" measurement for determining the
communication bandwidth of an SMP system. Bisection bandwidth is determined in the following manner. All possible ways of partitioning the system into two portions of equal compute power (equal number of processors) are ascertained. For each
partition, the sustainable bandwidth between the two partitions is determined. The minimum of all of the sustainable bandwidths is the bisection bandwidth of the interconnect. The minimum bandwidth between the two partitions indicates the communication
bandwidth sustainable by the multiprocessor system in the presence of worst-case communication patterns. Thus, a large bisection bandwidth is desirable.
Several interconnection architectures or "topologies" have been used in the prior art to overcome bus saturation problems. These topologies include meshes, touri, hypercubes and enhanced hypercubes.
As an example, a mesh interconnect is shown as system 7 in FIG. 1B. The major advantage of the mesh network is its simplicity and ease of wiring. Each node is connected to a small number of other neighboring nodes. However, the mesh
interconnect has three significant drawbacks. First, messages must on average traverse a large number of nodes to get to their destination, and as a result the communication latency is high. Second, the bisection bandwidth does not scale as well for a
mesh topology as it does for other topologies. Finally, because each of the messages may traverse different paths within the mesh, there are no natural ordering points within an SMP system, and therefore the cache coherence protocols required to
implement the mesh topology are often quite complex.
The torus, hypercube, and enhanced hypercube topologies are all topologies wherein the nodes are interconnected in various complex arrangements, for example in a torus arrangement or a cube arrangement. The torus, hypercube and enhanced
hypercube interconnects are more complex than the mesh interconnect, but offer better latency and bandwidth than the mesh interconnect. However, like the mesh interconnect, the torus, hypercube and enhanced hypercube topologies do not provide natural
ordering points, and thus a complex cache coherence protocol must be implemented for each of those systems.
In shared-memory multiprocessor systems, processors typically employ private caches to store data determined likely to be accessed in the future. Since processors may read data from their private cache and may update data in the private cache
without writing it back to memory, a mechanism is needed to ensure that the private caches of each of the processors are kept consistent, or coherent. The mechanism that is used to ensure coherency of data in the SMP system is referred to as the cache
coherence protocol.
Besides the topology, bandwidth, and latency of the physical interconnect the efficiency of the cache coherence protocol is a key factor in system performance. Cache coherency protocols may introduce latencies, bottlenecks, inefficiencies or
complexity in several ways.
The latency of load and store operations is often directly affected by the protocol of the design. For example, in some protocols, a store operation is not considered complete until all invalidate messages have made it to their target processors
and acknowledgment messages have made it all the way back to the original processor. The latency of stores here is much higher than a protocol wherein the original processor does not have to wait for the Invalidates to make it to their destination.
Further, the acknowledgments consume a significant fraction of the system bandwidth.
Bottlenecks are often introduced due to high occupancy of controllers. "Occupancy" is a term of art; it indicates the amount of time a controller is unavailable after it receives a request. In some protocols, when a directly controller receives
a request corresponding to a memory location, it becomes unavailable for other requests to the same memory location until certain acknowledgments corresponding to the former command arrive at the directory. If the controller receives conflicting
requests at a higher than average rate, it becomes a bottleneck.
The design of the cache coherence protocol also affects hardware complexity. For instance, some protocols introduce deadlock and fairness problems, which are then addressed with additional mechanisms. This results in added hardware complexity.
It is desirable to provide a symmetric multiprocessing system that minimizes the latency of operations, provides large communication bandwidth, provides low controller occupancy, and can scale to a large number of processors.
SUMMARY OF THE INVENTION
The present invention is advantageously employed in a system where multiple multi-processing nodes are coupled together via a switch. A memory control subsystem includes a delayed write buffer, coupled to a memory device, for temporary storage
of data that is to be conditionally written to the memory device. The delayed write buffer includes at least one entry storing a cache line size data block and a flag, where the flag is set when the data block should be written to memory. The memory
control system also includes a buffer for storing status information about each of the data blocks of the coupled memory device. The flag is set depending upon the status information (including owner information) associated with the data block in the
delayed write buffer. While the data block is stored in the delayed write buffer, the status information may be accessed to determine whether the data block that was written into the delayed write buffer is still owned by the source seeking to write the
data block to memory. If it was not owned, it is not written to memory. If it is owned, it is written to memory when the next data block is received at the delayed write buffer. References to the data block that are received while it is stored in the
first buffer may be serviced by the first buffer. With such an arrangement, a memory write operation is capable of being performed each time a block of data is received by the delayed write buffer. By accessing the status information while the data
block is stored in the delayed write buffer, in parallel with writing the data block into the delayed write buffer and writing the previous data block from the delayed write buffer into memory, memory cycles are not wasted pending status accesses.
According to one aspect of the invention, a memory subsystem includes a first buffer comprising a plurality of entries corresponding in number to a subset of data blocks in a coupled memory device. Each of the entries stores status information
for each of the associated data blocks. The subsystem further includes and a second buffer coupled to the first buffer, the second buffer comprising at least one entry comprising a data block field for storing a data block, an address field for
indicating the address of the data block at the coupled memory device and a flag field for indicating whether the data block stored in the associated data field should be written to the coupled memory device. A state of the flag field is determined in
response to status information stored in the first buffer at the entry corresponding to the data block.
According to another aspect of the invention, a memory subsystem includes a first buffer comprising a plurality of entries corresponding in number to a first subset of data blocks in a coupled memory device. Each of the entries for storing first
status information for each of the associated data blocks. The memory subsystem further includes a second buffer comprising a plurality of entries corresponding in number to a second subset of data blocks in a coupled memory device, where each of the
entries for store second status information for each of the associated data blocks. A third buffer is coupled to the first buffer and the second buffer. The third buffer includes at least one entry comprising a data block field for storing a data
block, an address field for indicating the address of the data block at the coupled memory device and a flag field for indicating whether the data block stored in the associated data field should be written to the coupled memory device. A state of the
flag field is determined in response to the first and second status information stored in the first buffer and the second buffer, at respective entries corresponding to the data block.
According to a further aspect of the invention, a method of writing a modified block of data from a cache device to a coupled memory device in response to an instruction includes the steps of initiating a retrieval of status information
associated with the modified block of data from a first buffer, the first buffer having a plurality of entries, storing the modified block of data in a second buffer, the second buffer comprising at least one entry, with each of the at least one entries
comprising a flag indicating whether the modified block of data at the associated entry is to be written to the coupled memory device, the step of storing including the step of setting the flag associated with the modified block of data in the buffer to
unwriteable, and writing any of the at least one entries in the second buffer having the associated flag indicating that the data entry is to be written to the coupled memory device to the coupled memory device, wherein the steps of initiating, storing
and writing are performed in parallel.
BRIEF DESCRIPTION OF THE DRAWINGS
The above-mentioned and other features of the invention will now become more apparent by reference to the following description taken in connection with the accompanying drawings in which:
FIGS. 1A-1B are block diagrams of two prior art symmetric multi-processor computer systems;
FIG. 2 is a block diagram of one embodiment of a multi-processor computer node of one embodiment of the invention comprising a switch;
FIG. 3 is a block diagram illustrating the data path of the switch of FIG. 1 comprising a number of Simultaneous Insertion Buffers;
FIG. 4A is a block diagram of one embodiment of one of the Simultaneous Insertion Buffers of FIG. 3;
FIG. 4B is a block diagram of one implementation of logic for controlling one of the Simultaneous Input Buffers of FIG. 4;
FIG. 5 is a block diagram of a second embodiment of one of the Simultaneous Insertion Buffers of FIG. 3;
FIG. 6 is a block diagram of the multi-processor computer node of FIG. 2, augmented for connection into a larger network of similar nodes;
FIG. 7A is one embodiment of an SMP system implemented using multiple nodes similar to the multi-processor node of FIG. 6;
FIG. 7B is another embodiment of an SMP system implemented using multiple nodes similar to the multi-processor node of FIG. 6;
FIG. 8 is a block diagram of a global port of FIG. 6;
FIG. 9 illustrates an entry in a directory of the multi-processor node of FIG. 6;
FIG. 10 illustrates a Transaction Tracking Table (TTT) for use in the global port of FIG. 8;
FIG. 11 is a block diagram of a hierarchical switch for coupling the multiple nodes in FIG. 7A;
FIG. 12A is a block diagram of one embodiment of interconnect logic for the hierarchical switch that eliminates deadlock;
FIG. 12B is a flow diagram of the operation of the interconnect logic of FIG. 12A;
FIG. 13 is a flow diagram of the method used in the interconnect logic of FIG. 12A to assert flow control to stop data being transmitted from one of the multi-processing nodes;
FIG. 14 is a timing diagram illustrating the transfer of address and data packets on the busses to and from the hierarchical switch;
FIG. 15 is a block diagram of one embodiment of buffer logic for maintaining order at the hierarchical switch;
FIG. 16 is a block diagram of another embodiment of buffer logic for maintaining order for the hierarchical switch;
FIG. 17 is a flow diagram illustrating one method of operating the buffer logic of FIG. 16;
FIG. 18 is a block diagram of another embodiment of buffer logic for maintaining order at the hierarchical switch;
FIG. 19 is a table illustrating the translation of processor instructions to network instructions for use in the SMP of FIGS. 7A or 7B;
FIGS. 20A-20J illustrate a number of communication flows for transferring packets between nodes in the SMP of FIGS. 7A or 7B;
FIG. 21 is a block diagram illustrating the layout of a memory module for use in the multi-processor system of FIGS. 2 or 6;
FIG. 22 is a timing diagram illustrating the control logic used by the memory module of FIG. 21 for delayed write operations;
FIG. 23 is a flow diagram illustrating the use of discrete transactions that are mapped to channels for maintaining cache coherency in one embodiment of the invention;
FIG. 24 is a block diagram illustrating one implementation of a shared queue structure for handling virtual channels in the SMP of FIGS. 7A or 7B;
FIG. 25 is a block diagram illustrating an implementation of individual channel buffering in the nodes and hierarchical switches of the SMP of FIGS. 7A or 7B;
FIG. 26 is a block diagram for illustrating the problems that may arise if some amount of ordering between virtual channels in not maintained;
FIGS. 27A-27C are block diagrams illustrating the flow and ordering constraints on the Q1 channel for providing coherent communication in the SMP of FIGS. 7A or 7B;
FIGS. 28A and 28B are a block diagram illustrating the ambiguity problems that arise because of the coarse vector presence bits of the directory entries of the SMP of FIGS. 7A and 7B;
FIG. 29 is a block diagram illustrating the method used to prevent data ambiguity from arising as a result of the problem described in FIG. 28;
FIG. 30 is a block diagram for illustrating a coherency issue that arises from packets on different channels being received out of sequence;
FIG. 31 is a block diagram illustrating the use of Fill Markers for preventing the coherency problem described in FIG. 29;
FIG. 32 is an entry in the TTT reflecting the status of an instruction during the flow described with regard to FIG. 31;
FIGS. 33A-33B are block diagrams illustrating the operation of Change to Dirty commands in the SMP system;
FIG. 34 is a block diagram illustrating the use of Shadow commands for remedying the problem described with regard to FIG. 33;
FIG. 35 is an entry in the TTT reflecting the status of an instruction during the flow described with regard to FIG. 34; and
FIG. 36 is a flow diagram illustrating permissible sequential ordering of instructions in the example described in FIG. 35.
DESCRIPTION OF THE PREFERRED EMBODIMENT
According to one embodiment of the invention, a hierarchical Symmetric Multi-Processing (SMP) system includes a number of SMP nodes coupled together via a high performance switch. Thus, each of the SMP nodes act as a building block in the SMP
system. Below, the components and operation of one SMP node building block is first described, followed by a description of the operation of the SMP system and subsequently a description of a cache coherence protocol that is used to maintain memory
coherency in the large SMP system.
SMP Node Building Block
Referring now to FIG. 2, a multi-processor node 10 includes four processor modules 12a, 12b, 12c, and 12d. Each processor module comprises a central processing unit (CPU). In a preferred embodiment, Alpha.RTM. 21264 processor chips,
manufactured by Digital Equipment Corporation.RTM. are used, although other types of processor chips capable of supporting the below described coherency protocol may alternatively be used.
Multi-processor node 10 includes a memory 13, which may include a number of memory modules 13a-13d. The memory may provide 32 GBytes of storage capacity, with each of the 4 memory modules storing 8 Gigabytes. Each of the memory modules is
apportioned into a number of blocks of memory, where
each block may include, for example 64 bytes of data. Data is generally retrieved from memory in blocks.
In addition, multi-processing node 10 includes an I/O processor (IOP) module 14 for controlling transfer of data between external devices (not shown) and the multi-processor node 10 via a coupled I/O bus 14a. In one embodiment of the invention,
the I/O bus may operate according to the Peripheral Computer Interconnect (PCI) protocol. The IOP 14 includes an IOP cache 14c and an IOP tag store 14b. The IOP cache 14c provides temporary storage for data from memory 13 that is transferred to
external devices on the PCI bus 14a. The IOP tag store 14b is a 64 entry tag store for storing coherency information for data being moved between external devices, processors and memory.
The coherency of data stored in the memory 13 of the multi-processor node is maintained by means of a Duplicate Tag store (DTAG) 20. The DTAG 20 is shared by all of the processors 12a-12d, and is apportioned into 4 banks, where each bank is
dedicated to storing status information corresponding to data used by an associated one of the processors.
The DTAG, Memory and IOP are coupled to a logical bus referred to as the Arb bus 17. Memory block requests issued by the processor are routed via the local switch 15 to the Arb bus 17. The DTAG 20 and IOP 14 look up the state of the block in
the processors' and IOP's caches and atomically update their state for the memory block. The Arb bus 17 acts as a serialization point for all memory references. The order in which memory request appear on the Arb bus is the order in which processors
perceive the results of the requests.
The processor modules 12a-12d, memory modules 13a-13d and IOP module 14 are coupled together via a local, 9 port switch 15. Each of the interfacing modules 12a-12d, 13a-13d and 14 are connected to the local switch by means of a like number of
bi-directional, clock forwarded data links 16a-16i. In one embodiment, each of the data links forwards 64 bits of data and 8 bits of error correcting code (ECC) one each edge of a system clock operating at a rate of 150 MHZ. Thus, the data bandwidth of
each of the data links 16a-16i is 2.4 Gigabytes/sec.
Local switch 15 includes an Quad Switch Address control chip (QSA chip) 18 and a Quad Switch data slice chip (QSD chip) 19. QSA chip 18 includes an arbiter (QS Arb) 11 for controlling address paths between the processor modules, IOP, and memory. In addition, QSA chip 18 provides control to the QSD chip 19 to control the flow of data through the local switch 15 as described below.
QSD chip 19 provides a switch interconnect for all data paths between the processor modules, memory modules and IOP. Although not shown in FIG. 2, as will be described below, if the multi-processor node 10 were coupled to other multi-processor
nodes via a global port, the QSD and OSA would additionally provide a switch interconnect for the global port. Each of the processors may request data from one of the available resources, such as the memory devices 13a-13d, other processors 12a-12d, IOP
14 or alternatively resources in other multi-processor nodes via the global port. Thus, the local switch 15 should be able to accommodate simultaneous input from a variety of resources while maintaining the high bus bandwidth of 2.4 GBytes.
The local switch is able to handle mulitple concurrent transactions. Since each transaction typically uses multiple resources (such as memory banks, datapaths, queues), the control functions of the local switch can be very complex. For
instance, a transaction may require a memory bank to be available in stage 0 of the transaction, the datapath from memory bank to processor port be available in stage 1, and the datapath from processor port to processor be available in stage 2. The
local switch arbiter (QSA Arb 11 in the QSA 18) arbitrates among requests in such a manner that once a transaction is initiated, resources required by a transaction in each stage are available as required.
More significantly, the arbiter guarantees that all requests and processors get fair access to the resources by ensuring that particular requests do not fail to win arbitration for a long time (potentially indefinitely) while others make
progress. For instance, consider a transaction T that requires three resources A, B, and C. Transaction T may not win arbitration until all three resources are guaranteed to be available in the appropriate stages of the transaction. If the artiber
bases its decision only on the availability of resources, then it is possible that T may not succeed for a long time while other transactions wh | | |