WikiPatents - Community Patent Review
Memory cancel response optionally cancelling memory controller's providing of data in response to a read operation    
United States Patent 6370621
Link to this page: http://www.wikipatents.com/6370621.html
Inventor(s): Keller; James B. (Palo Alto, CA)
Abstract: A messaging scheme that conserves system memory bandwidth during a memory read operation in a multiprocessing computer system is described. A source processing node sends a memory read command to a target processing node to read data from a designated memory location in a system memory associated with the target processing node. The target node transmits a read response to the source node containing the requested data and also concurrently transmits a probe command to one or more of the remaining nodes in the multiprocessing computer system. In response to the probe command, each remaining processing node checks whether it has a cached copy of the requested data. If a processing node, other than the source and the target nodes, finds a modified cached copy of the designated memory location, that processing node responds with a memory cancel response sent to the target node and a read response sent to the source node. The read response contains the modified cache block containing the requested data, and the memory cancel response causes the target node to abort further processing of the memory read command, and to stop transmission of the read response if the target node has not transmitted the read response yet. The memory cancel message thus attempts to avoid relatively lengthy and time-consuming system memory accesses when the system memory holds stale data.



Title Information
[Drawing from US Patent 6370621]
Inventor: Keller; James B. (Palo Alto, CA)
Owner/Assignee: Advanced Micro Devices, Inc. (Sunnyvale, CA)
Publication Date: April 9, 2002
Application Number: 09/217,699
Filing Date: December 21, 1998
Examiner: Ellis; Kevin L.
Assistant Examiner: Patel; Gautam R.
Attorney/Law Firm: Merkel; Lawrence J.; Conley, Rose & Tayon, PC

References
 U.S. References
 
Patent No.  Inventor        U.S. Class  Date
6275905     Keller          711/141     Aug. 2001
6138218     Arimilli                    Oct. 2000
6108737     Sharma          710/107     Aug. 2000
6101420     VanDoren                    Aug. 2000
6098115     Eberhard                    Aug. 2000
6085263     Sharma                      Jul. 2000
6070231     Ottinger                    May 2000
6049851     Bryg            711/141     Apr. 2000
6038644     Irie            711/141     Mar. 2000
6018791     Arimilli                    Jan. 2000
6012127     McDonald        711/141     Jan. 2000
5991819     Young                       Nov. 1999
5927118     Minote                      Jul. 1999
5893144     Wood                        Apr. 1999
5887138     Hagersten       709/215     Mar. 1999
5878268     Hagersten       712/28      Mar. 1999
5859983     Heller          709/251     Jan. 1999
5749095     Hagersten       711/141     May 1998
5673413     Deshpande       711/141     Sep. 1997
5659708     Arimilli        711/146     Aug. 1997
5560038     Haddock                     Sep. 1996
5303362     Butts, Jr.      711/121     Apr. 1994
Claims
 


What is claimed is:

1. A multiprocessing computer system comprising:

a plurality of processing nodes interconnected through an interconnect structure, wherein said plurality of processing nodes includes:

a first processing node configured to initiate a first read operation to read data from a designated memory location;

a second processing node configured to respond to said first read operation by initiating a second read operation to read and transfer said data from said designated memory location to said first processing node; and

a third processing node configured to transmit a memory cancel response to said second processing node upon detecting within said third processing node a modified copy of said designated memory location, and wherein said memory cancel response causes said second processing node to abort further processing of said second read operation;

wherein said second processing node is configured to transfer said data read during said second read operation by transmitting a first read response to said first processing node, and wherein said memory cancel response causes said second processing node to cancel transmission of said first read response when said second processing node receives said memory cancel response prior to transmitting said first read response.

2. The multiprocessing computer system as in claim 1, wherein said interconnect structure includes a first plurality of dual-unidirectional links.

3. The multiprocessing computer system of claim 2, wherein each dual-unidirectional link in said first plurality of dual-unidirectional links interconnects a respective pair of processing nodes from said plurality of processing nodes.

4. The multiprocessing computer system according to claim 3, further comprising a plurality of I/O devices, wherein said interconnect structure further includes a second plurality of dual-unidirectional links, and wherein each of said plurality of I/O devices is coupled to a respective processing node through a corresponding one of said second plurality of dual-unidirectional links.

5. The multiprocessing computer system of claim 4, wherein each dual-unidirectional link in said first and said second plurality of dual-unidirectional links performs packetized information transfer and includes a pair of unidirectional buses comprising:

a transmission bus carrying a first plurality of binary packets; and

a receiver bus carrying a second plurality of binary packets.

6. The multiprocessing computer system of claim 5, wherein each of said plurality of processing nodes includes:

a plurality of circuit elements comprising:

a processor core,

a cache memory,

a memory controller,

a bus bridge,

a graphics logic,

a bus controller, and

a peripheral device controller; and

a plurality of interface ports, wherein each of said plurality of circuit elements is coupled to at least one of said plurality of interface ports.

7. The multiprocessing computer system according to claim 6, wherein at least one of said plurality of interface ports in said each of said plurality of processing nodes is coupled to a corresponding dual-unidirectional link selected from the group consisting of said first and said second plurality of dual-unidirectional links.

8. The multiprocessing computer system as in claim 1, further comprising:

a plurality of system memories; and

a plurality of memory buses, wherein each of said plurality of system memories is coupled to a corresponding one of said plurality of processing nodes through a respective one of said plurality of memory buses.

9. The multiprocessing computer system of claim 8, wherein each of said plurality of memory buses is bi-directional.

10. The multiprocessing computer system according to claim 8, wherein a first memory from said plurality of system memories is coupled to said second processing node, wherein said first memory includes said designated memory location, and wherein said second processing node accesses said first memory during said second read operation.

11. The multiprocessing computer system as in claim 1, wherein said second processing node is configured to respond to said first read operation by transmitting a probe command to said third processing node.

12. The multiprocessing computer system of claim 11, wherein said second processing node is configured to transmit said probe command regardless of whether a copy of said designated memory location is cached within said third processing node.

13. The multiprocessing computer system according to claim 11, wherein said probe command causes said third processing node to transmit a read response to said first processing node.

14. The multiprocessing computer system as in claim 13, wherein said read response includes a data packet containing said modified copy of said designated memory location cached within said third processing node.

15. The multiprocessing computer system as in claim 1, wherein a size of said data read during said second read operation is dependent on a type of said first read operation.

16. The multiprocessing computer system according to claim 1, wherein said first read response includes a data packet containing said data read during said second read operation.

17. The multiprocessing computer system as in claim 1, wherein said third processing node is configured to transmit a second read response concurrently with said memory cancel response, wherein said second read response is transmitted to said first processing node.

18. The multiprocessing computer system of claim 17, wherein said second read response includes a data packet containing said modified copy of said designated memory location cached within said third processing node.

19. The multiprocessing computer system according to claim 18, wherein said second processing node is configured to transmit a target done response to said first processing node upon receiving said memory cancel response from said third processing node, wherein said target done response is transmitted regardless of whether said first read response is transmitted.

20. The multiprocessing computer system as in claim 19, wherein said first processing node is configured to transmit a source done message to said second processing node upon receiving said target done response and said second read response.

21. The multiprocessing computer system of claim 20, wherein said source done message signifies completion of said first read operation according to a predetermined data transfer protocol and allows said second processing node to respond to a subsequent data transfer operation involving said designated memory location.

22. In a multiprocessing computer system comprising a plurality of processing nodes interconnected through an interconnect structure, wherein said plurality of processing nodes includes a first processing node, a second processing node, and a third processing node, a method for selectively reading a content of a memory location in a memory associated with said second processing node, said method comprising:

initiating a first read operation by said first processing node to read said content of said memory location;

further initiating a second read operation by said second processing node to respond to said first read operation, wherein said second processing node reads and transfers said content of said memory location to said first processing node during said second read operation, the second read operation including a first read response from said second processing node to said first processing node, wherein said first read response includes a first data packet for said content of said memory location;

said third processing node transmitting a memory cancel response to said second processing node upon detecting within said third processing node a modified copy of said memory location;

said memory cancel response causing said second processing node to abort further processing of said second read operation; and

said memory cancel response causing said second processing node to cancel transmission of said first read response when said second processing node receives said memory cancel response prior to said transmitting said first read response.

23. The method as in claim 22, wherein a size of said first data packet is dependent on a type of said first read operation.

24. The method according to claim 22, further comprising:

said third processing node transmitting a second read response concurrently with said memory cancel response, wherein said second read response is transmitted to said first processing node.

25. The method as in claim 24, wherein said second read response includes a second data packet containing said modified copy of said memory location cached within said third processing node.

26. The method of claim 25, further comprising:

said second processing node transmitting a target done response to said first processing node upon receiving said memory cancel response from said third processing node, wherein said target done response is transmitted regardless of whether said first read response is transmitted.

27. The method according to claim 26, further comprising:

said first processing node transmitting a source done message to said second processing node upon receiving said target done response and said second read response.
Description
 


BACKGROUND OF THE INVENTION

1. Field of the invention

The present invention broadly relates to computer systems, and more particularly, to a messaging scheme to accomplish cache-coherent data transfers in a multiprocessing computing environment.

2. Description of the Related Art

Generally, personal computers (PCs) and other types of computer systems have been designed around a shared bus system for accessing memory. One or more processors and one or more input/output (I/O) devices are coupled to memory through the shared bus. The I/O devices may be coupled to the shared bus through an I/O bridge, which manages the transfer of information between the shared bus and the I/O devices. The processors are typically coupled directly to the shared bus or through a cache hierarchy.

Unfortunately, shared bus systems suffer from several drawbacks. For example, since multiple devices are attached to the shared bus, the bus is typically operated at a relatively low frequency. Further, system memory read and write cycles through the shared system bus take substantially longer than information transfers involving a cache within a processor or involving two or more processors. Another disadvantage of the shared bus system is a lack of scalability to a larger number of devices. The available bus bandwidth is fixed (and may decrease if adding devices reduces the operable frequency of the bus). Once the bandwidth requirements of the devices attached to the bus (either directly or indirectly) exceed the available bandwidth of the bus, devices will frequently be stalled when attempting to access the bus. Overall performance may be decreased unless a mechanism is provided to conserve the limited system memory bandwidth.

A read or a write operation addressed to a non-cache system memory takes more processor clock cycles than similar operations between two processors or between a processor and its internal cache. The limitations on bus bandwidth, coupled with the lengthy access time to read or write to a system memory, negatively affect the computer system performance.

One or more of the above problems may be addressed using a distributed memory system. A computer system employing a distributed memory system includes multiple nodes. Two or more of the nodes are connected to memory, and the nodes are interconnected using any suitable interconnect. For example, each node may be connected to each other node using dedicated lines. Alternatively, each node may connect to a fixed number of other nodes, and transactions may be routed from a first node to a second node to which the first node is not directly connected via one or more intermediate nodes. The memory address space is assigned across the memories in each node.

Nodes may additionally include one or more processors. The processors typically include caches that store cache blocks of data read from the memories. Furthermore, a node may include one or more caches external to the processors. Since the processors and/or nodes may be storing cache blocks accessed by other nodes, a mechanism for maintaining coherency within the nodes is desired.

SUMMARY OF THE INVENTION

The problems outlined above are in large part solved by a computer system as described herein. The computer system may include multiple processing nodes, two or more of which may be coupled to separate memories which may form a distributed memory system. The processing nodes may include caches, and the computer system may maintain coherency between the caches and the distributed memory system.

In one embodiment, the present invention relates to a multiprocessing computer system where the processing nodes are interconnected through a plurality of dual unidirectional links. Each pair of unidirectional links forms a coherent link structure that connects only two of the processing nodes. One unidirectional link in the pair sends signals from a first processing node to a second processing node connected through that pair of unidirectional links. The other unidirectional link in the pair carries the reverse flow of signals, i.e. it sends signals from the second processing node to the first processing node. Thus, each unidirectional link forms a point-to-point interconnect that is designed for packetized information transfer. Communication between two processing nodes may be routed through one or more of the remaining nodes in the system.

Each processing node may be coupled to a respective system memory through a memory bus. The memory bus may be bidirectional. Each processing node comprises at least one processor core and may optionally include a memory controller for communicating with the respective system memory. Other interface logic may be included in one or more processing nodes to allow connectivity with various I/O devices through one or more I/O bridges.

In one embodiment, one or more I/O bridges may be coupled to their respective processing nodes through a set of non-coherent dual unidirectional links. These I/O bridges communicate with their host processors through this set of non-coherent dual unidirectional links in much the same way as two directly-linked processors communicate with each other through a coherent dual unidirectional link.

In one embodiment, when a first processing node sends a read command to a second processing node to read data from a designated memory location associated with the second processing node, the second processing node responsively transmits a probe command to all the remaining processing nodes in the system. The probe command is transmitted regardless of whether one or more of the remaining nodes have a copy of the data cached in their respective cache memories. Each processing node that has a cached copy of the designated memory location updates its cache tag associated with that cached data to reflect the current status of the data. Each processing node that receives a probe command sends, in return, a probe response indicating whether that processing node has a cached copy of the data. In the event that a processing node has a cached copy of the designated memory location, the probe response from that processing node further includes the state of the cached data (i.e., modified, shared, etc.).
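The probe step can be sketched as a minimal Python model, assuming each node's cache is a plain dictionary from address to state. The `State` names, the `Node` and `ProbeResponse` classes, and the `next_state` parameter (loosely standing in for the NextState field of FIG. 10B) are illustrative assumptions, not the patent's encodings. The sketch also assumes "all the remaining processing nodes" means every node except the target, including the source.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    # Illustrative cache states; the text only names "modified, shared, etc."
    INVALID = "invalid"
    SHARED = "shared"
    MODIFIED = "modified"


@dataclass
class ProbeResponse:
    responder: str
    hit: bool     # True if the responder held a cached copy
    state: State  # state of that copy before the probe updated it


@dataclass
class Node:
    name: str
    cache: dict = field(default_factory=dict)  # address -> State (hypothetical)

    def handle_probe(self, address: int, next_state: State) -> ProbeResponse:
        state = self.cache.get(address, State.INVALID)
        if state is not State.INVALID:
            # Update the cache tag to reflect the block's new status.
            self.cache[address] = next_state
        return ProbeResponse(self.name, state is not State.INVALID, state)


def broadcast_probe(target: str, nodes: list, address: int,
                    next_state: State) -> list:
    # The target probes every other node, whether or not it caches the address.
    return [n.handle_probe(address, next_state) for n in nodes if n.name != target]


nodes = [Node("12A"), Node("12B"), Node("12C", {0x40: State.SHARED}), Node("12D")]
responses = broadcast_probe("12B", nodes, 0x40, State.SHARED)
print([(r.responder, r.hit, r.state.value) for r in responses])
```

Every probed node answers, so the source can count responses without knowing in advance which nodes hold a copy.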

The target processing node, i.e. the second processing node, sends a read response to the source processing node, i.e. the first processing node. This read response contains the data requested by the source node through the read command. The first processing node acknowledges receipt of the data by transmitting a source done response to the second processing node. When the second processing node receives the source done response it removes the read command (received from the first processing node) from its command buffer queue. The second processing node may, at that point, start to respond to a command to the same designated memory location. This sequence of messaging is one step in maintaining cache-coherent system memory reads in a multiprocessing computer system. The data read from the designated memory location may be less than the whole cache block in size if the read command specifies so.
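The target side of this handshake can be modeled as a small command buffer keyed by address. This is a sketch under simplifying assumptions: the `TargetQueue` class and its method names are hypothetical, and only one command per address is treated as in flight, so a later command to the same address waits until the source done response for the earlier read arrives.

```python
from collections import deque


class TargetQueue:
    """Hypothetical model of the target node's command buffer queue."""

    def __init__(self):
        self.active = {}        # address -> source node of the in-flight read
        self.pending = deque()  # (source, address) commands waiting their turn

    def receive_read(self, source: str, address: int) -> bool:
        """Accept a read command; return True if it can be serviced now."""
        if address in self.active:
            self.pending.append((source, address))
            return False
        self.active[address] = source
        return True

    def receive_source_done(self, source: str, address: int):
        """Retire the read for `address`; return the next source started, if any."""
        if self.active.get(address) != source:
            raise ValueError("source done does not match the active read")
        del self.active[address]
        # The target may now respond to the next command for the same address.
        for i, (src, addr) in enumerate(self.pending):
            if addr == address:
                del self.pending[i]  # safe: we return before iterating again
                self.active[address] = src
                return src
        return None
```

Commands to other addresses are unaffected, which matches the text's narrower claim that only commands to the same designated memory location must wait.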

Upon receiving the probe command, all of the remaining nodes check the status of the cached copy, if any, of the designated memory location as described before. In the event that a processing node, other than the source and the target nodes, finds a cached copy of the designated memory location that is in a modified state, that processing node responds with a memory cancel response sent to the target node, i.e. the second processing node. This memory cancel response causes the second processing node to abort further processing of the read command, and to stop transmission of the read response, if it hasn't sent the read response yet. All the other remaining processing nodes still send their probe responses to the first processing node. The processing node that has the modified cached data sends that modified data to the first processing node through its own read response. The messaging scheme involving probe responses and read responses thus maintains cache coherency during a system memory read operation.
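The decision made by a probed node other than the source and the target can be sketched as a pure function that returns the packets it emits. The `Packet` record and the packet-kind strings (`"ProbeResp"`, `"MemCancel"`, `"RdResponse"`) are illustrative names rather than the patent's packet encodings, and the cache state is passed as a plain string for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    kind: str                     # illustrative packet-type name
    src: str
    dst: str
    data: Optional[bytes] = None


def probed_node_packets(node: str, source: str, target: str,
                        state: str, line: Optional[bytes]) -> List[Packet]:
    """Packets a probed node (neither source nor target) sends back."""
    if state == "modified":
        # Tell the target to stop its memory read, and supply the data directly.
        return [Packet("MemCancel", node, target),
                Packet("RdResponse", node, source, line)]
    # Otherwise just report the presence or absence of a copy to the source.
    return [Packet("ProbeResp", node, source)]
```

The memory cancel goes to the target while the data goes straight to the source, so the modified block never has to be written back before the source can use it.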

The memory cancel response further causes the second processing node to transmit a target done response to the first processing node regardless of whether it earlier sent the read response to the first processing node. The first processing node waits for all the responses to arrive (i.e., the probe responses, the target done response, and the read response from the processing node having the modified cached data) before completing the data read cycle by sending a source done response to the second processing node. In this embodiment, the memory cancel response conserves system memory bandwidth by causing the second processing node to abort a time-consuming memory read operation when a modified copy of the requested data is cached at a different processing node. Data transfer latency is thus reduced, because a data transfer between two processing nodes over the high-speed dual unidirectional link is substantially faster than a similar data transfer between a processing node and a system memory over a relatively slow system memory bus.
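Both ends of a cancelled read can be sketched with a simple event-driven model. `Target`, `source_done_ready`, and the `(kind, sender)` tuples are hypothetical names; the sketch only captures the ordering rules stated above: a memory cancel suppresses the target's read response only if that response has not gone out yet, the target done response is sent either way, and the source sends source done only after every expected response is in.

```python
from typing import Iterable, List, Tuple


class Target:
    """Hypothetical target-node state for one read command."""

    def __init__(self) -> None:
        self.cancelled = False
        self.outbox: List[str] = []

    def memory_read_complete(self) -> None:
        # Data came back from system memory; send it unless already cancelled.
        if not self.cancelled:
            self.outbox.append("RdResponse")

    def on_memory_cancel(self) -> None:
        self.cancelled = True
        # Target done is sent whether or not the read response already went out.
        self.outbox.append("TgtDone")


def source_done_ready(received: Iterable[Tuple[str, str]], target: str,
                      probed: Iterable[str], owner: str) -> bool:
    """True once the source holds every response it waits for."""
    expected = {("ProbeResp", n) for n in probed if n != owner}
    expected.add(("TgtDone", target))
    expected.add(("RdResponse", owner))  # the modified data from the owner
    return expected <= set(received)
```

Modeling the source's wait as a set-containment check keeps it independent of the order in which packets arrive over the links.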

BRIEF DESCRIPTION OF THE DRAWINGS

A better understanding of the present invention can be obtained when the following detailed description of the preferred embodiment is considered in conjunction with the following drawings, in which:

FIG. 1 is a block diagram of one embodiment of a computer system.

FIG. 2 shows in detail one embodiment of the interconnect between a pair of processing nodes from FIG. 1.

FIG. 3 is a block diagram of one embodiment of an information packet.

FIG. 4 is a block diagram of one embodiment of an address packet.

FIG. 5 is a block diagram of one embodiment of a response packet.

FIG. 6 is a block diagram of one embodiment of a command packet.

FIG. 7 is a block diagram of one embodiment of a data packet.

FIG. 8 is a table illustrating exemplary packet types that may be employed in the computer system of FIG. 1.

FIG. 9 is a diagram illustrating an example flow of packets corresponding to a memory read operation.

FIG. 10A is a block diagram of one embodiment of a probe command packet.

FIG. 10B is a block diagram for one embodiment of the encoding for the NextState field in the probe command packet of FIG. 10A.

FIG. 11A is a block diagram of one embodiment of a read response packet.

FIG. 11B shows in one embodiment the relationship between the Probe, Tgt and Type fields of the read response packet of FIG. 11A.

FIG. 12 is a block diagram of one embodiment of a probe response packet.

FIG. 13 is a diagram illustrating an example flow of packets involving a memory cancel response.

FIG. 14 is a diagram illustrating an example flow of packets showing a messaging scheme that combines probe commands and memory cancel response.

FIG. 15 is an exemplary flowchart for the transactions involved in a memory read operation.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Turning now to FIG. 1, one embodiment of a multiprocessing computer system 10 is shown. In the embodiment of FIG. 1, computer system 10 includes several processing nodes 12A, 12B, 12C, and 12D. Each processing node is coupled to a respective memory 14A-14D via a memory controller 16A-16D included within each respective processing node 12A-12D. Additionally, processing nodes 12A-12D include one or more interface ports 18, also known as interface logic, to communicate among the processing nodes 12A-12D, and to also communicate between a processing node and a corresponding I/O bridge. For example, processing node 12A includes interface logic 18A for communicating with processing node 12B, interface logic 18B for communicating with processing node 12C, and a third interface logic 18C for communicating with yet another processing node (not shown). Similarly, processing node 12B includes interface logic 18D, 18E, and 18F; processing node 12C includes interface logic 18G, 18H, and 18I; and processing node 12D includes interface logic 18J, 18K, and 18L. Processing node 12D is coupled to communicate with an I/O bridge 20 via interface logic 18L. Other processing nodes may communicate with other I/O bridges in a similar fashion. I/O bridge 20 is coupled to an I/O bus 22.

The interface structure that interconnects processing nodes 12A-12D includes a set of dual-unidirectional links. Each dual-unidirectional link is implemented as a pair of packet-based unidirectional links to accomplish high-speed packetized information transfer between any two processing nodes in the computer system 10. Each unidirectional link may be viewed as a pipelined, split-transaction interconnect. Each unidirectional link 24 includes a set of coherent unidirectional lines. Thus, each pair of unidirectional links may be viewed as comprising one transmission bus carrying a first plurality of binary packets and one receiver bus carrying a second plurality of binary packets. The content of a binary packet will primarily depend on the type of operation being requested and the processing node initiating the operation. One example of a dual-unidirectional link structure is links 24A and 24B. The unidirectional lines 24A are used to transmit packets from processing node 12A to processing node 12B and lines 24B are used to transmit packets from processing node 12B to processing node 12A. Other sets of lines 24C-24H are used to transmit packets between their corresponding processing nodes as illustrated in FIG. 1.

A similar dual-unidirectional link structure may be used to interconnect a processing node and its corresponding I/O device, or a graphic device or an I/O bridge as is shown with respect to the processing node 12D. A dual-unidirectional link may be operated in a cache coherent fashion for communication between processing nodes or in a non-coherent fashion for communication between a processing node and an external I/O or graphic device or an I/O bridge. It is noted that a packet to be transmitted from one processing node to another may pass through one or more remaining nodes. For example, a packet transmitted by processing node 12A to processing node 12D may pass through either processing node 12B or processing node 12C in the arrangement of FIG. 1. Any suitable routing algorithm may be used. Other embodiments of computer system 10 may include more or fewer processing nodes than those shown in FIG. 1.
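Since any suitable routing algorithm may be used, the following shows only one possibility: a breadth-first search for a minimum-hop path over a link table. The topology is inferred from the FIG. 1 description (12A links to 12B and 12C, and 12D is reachable through either of them); the `LINKS` table and the `route` function are hypothetical.

```python
from collections import deque

# Link table inferred from the FIG. 1 description (hypothetical).
LINKS = {
    "12A": ["12B", "12C"],
    "12B": ["12A", "12D"],
    "12C": ["12A", "12D"],
    "12D": ["12B", "12C"],
}


def route(src: str, dst: str, links: dict = LINKS):
    """Breadth-first search for a minimum-hop path; None if unreachable."""
    prev = {src: None}          # node -> node it was reached from
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in links.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                frontier.append(nxt)
    return None


print(route("12A", "12D"))
```

With this link order, traffic from 12A to 12D passes through 12B; a real implementation might instead balance load between the 12B and 12C paths.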

Processing nodes 12A-12D, in addition to a memory controller and interface logic, may include other circuit elements such as one or more processor cores, an internal cache memory, a bus bridge, a graphics logic, a bus controller, a peripheral device controller, etc. Broadly speaking, a processing node comprises at least one processor and may optionally include a memory controller for communicating with a memory and other logic as desired. Further, each circuit element in a processing node may be coupled to one or more interface ports depending on the functionality being performed by the processing node. For example, some circuit elements may only couple to the interface logic that connects an I/O bridge to the processing node, some other circuit elements may only couple to the interface logic that connects two processing nodes, etc. Other combinations may be easily implemented as desired.

Memories 14A-14D may comprise any suitable memory devices. For example, a memory 14A-14D may comprise one or more RAMBUS DRAMs (RDRAMs), synchronous DRAMs (SDRAMs), static RAM, etc. The memory address space of the computer system 10 is divided among memories 14A-14D. Each processing node 12A-12D may include a memory map used to determine which addresses are mapped to which memories 14A-14D, and hence to which processing node 12A-12D a memory request for a particular address should be routed. In one embodiment, the coherency point for an address within computer system 10 is the memory controller 16A-16D coupled to the memory that is storing the bytes corresponding to the address. In other words, the memory controller 16A-16D is responsible for ensuring that each memory access to the corresponding memory 14A-14D occurs in a cache coherent fashion. Memory controllers 16A-16D may comprise control circuitry for interfacing to memories 14A-14D. Additionally, memory controllers 16A-16D may include request queues for queuing memory requests.
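A memory map of the kind described can be sketched as a sorted table of base addresses, one contiguous range per node. The 1 GiB-per-node layout, the `MEMORY_MAP` table, and `home_node` are assumptions for illustration; the text only says the address space is divided among memories 14A-14D.

```python
from bisect import bisect_right

# Hypothetical layout: each node's memory holds one contiguous 1 GiB range.
MEMORY_MAP = [  # (base address, home processing node), sorted by base
    (0x0000_0000, "12A"),
    (0x4000_0000, "12B"),
    (0x8000_0000, "12C"),
    (0xC000_0000, "12D"),
]
ADDRESS_LIMIT = 0x1_0000_0000  # 4 GiB in total for this example
_BASES = [base for base, _ in MEMORY_MAP]


def home_node(address: int) -> str:
    """Node whose memory controller is the coherency point for `address`."""
    if not 0 <= address < ADDRESS_LIMIT:
        raise ValueError(f"address {address:#x} is not mapped")
    # bisect_right finds the last base <= address.
    return MEMORY_MAP[bisect_right(_BASES, address) - 1][1]
```

A read command for an address is routed to `home_node(address)`, whose memory controller then orders all accesses to that address.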

Generally, interface logic 18A-18L may comprise a variety of buffers for receiving packets from one unidirectional link and for buffering packets to be transmitted upon another unidirectional link. Computer system 10 may employ any suitable flow control mechanism for transmitting packets. For example, in one embodiment, each transmitting interface logic 18 stores a count of the number of buffers of each type within the receiving interface logic at the other end of the link to which the transmitting interface logic is connected. The transmitting interface logic does not transmit a packet unless the receiving interface logic has a free buffer to store the packet. As a receiving buffer is freed by routing a packet onward, the receiving interface logic transmits a message to the sending interface logic to indicate that the buffer has been freed. Such a mechanism may be referred to as a "coupon-based" system.
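The coupon-based scheme above can be sketched from the transmitter's point of view as a per-type credit counter. This is a minimal illustration, not the patented circuit; the class name `CouponSender`, the buffer-type names "command" and "response", and the buffer counts are hypothetical, and a Python list stands in for the unidirectional link.

```python
class CouponSender:
    """Transmit side of one link under coupon-based flow control (sketch)."""

    def __init__(self, receiver_buffers):
        # Free buffers of each (hypothetical) type at the receiving interface
        # logic on the other end of the link.
        self.coupons = dict(receiver_buffers)
        self.link = []  # stands in for the unidirectional link

    def try_send(self, buffer_type, packet):
        """Transmit packet only if the receiver has a free buffer for it."""
        if self.coupons.get(buffer_type, 0) == 0:
            return False  # hold the packet; no receive buffer is available
        self.coupons[buffer_type] -= 1
        self.link.append((buffer_type, packet))
        return True

    def buffer_freed(self, buffer_type):
        """Handle the receiver's message that a buffer of this type was freed."""
        self.coupons[buffer_type] += 1


sender = CouponSender({"command": 2, "response": 1})
print(sender.try_send("response", "pkt1"))  # True: uses the only coupon
print(sender.try_send("response", "pkt2"))  # False: must wait for a coupon
sender.buffer_freed("response")             # receiver routed a packet onward
print(sender.try_send("response", "pkt2"))  # True
```

Because the sender never transmits without a coupon, the receiver never has to drop or reject a packet for lack of buffer space.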

Turning next to FIG. 2, a block diagram illustrating processing nodes 12A and 12B is shown to illustrate in more detail one embodiment of the dual unidirectional link structure connecting the processing nodes 12A and 12B. In the embodiment of FIG. 2, lines 24A (the unidirectional link 24A) include a clock line 24AA, a control line 24AB, and a command/address/data bus 24AC. Similarly, lines 24B (the unidirectional link 24B) include a clock line 24BA, a control line 24BB, and a command/address/data bus 24BC.

A clock line transmits a clock signal that indicates a sample point for its corresponding control line and the command/address/data bus. In one particular embodiment, data/control bits are transmitted on each edge (i.e. rising edge and falling edge) of the clock signal. Accordingly, two data bits per line may be transmitted per clock cycle. The amount of time employed to transmit one bit per line is referred to herein as a "bit time". The above-mentioned embodiment includes two bit times per clock cycle. A packet may be transmitted across two or more bit times. Multiple clock lines may be used depending upon the width of the command/address/data bus. For example, two clock lines may be used for a 32-bit command/address/data bus (with one half of the command/address/data bus referenced to one of the clock lines, and the other half of the command/address/data bus and the control line referenced to the other clock line).

The control line indicates whether the data transmitted upon the command/address/data bus during a given bit time belongs to a control packet or to a data packet. The control line is asserted to indicate a control packet, and deasserted to indicate a data packet. Certain control packets indicate that a data packet follows. The data packet may immediately follow the corresponding control packet. In one embodiment, other control packets may interrupt the transmission of a data packet. Such an interruption may be performed by asserting the control line for a number of bit times during transmission of the data packet and transmitting the bit times of the control packet while the control line is asserted. Control packets that interrupt a data packet may not themselves indicate that a data packet will follow.
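On the receiving side, the control line lets interleaved control and data bit times be separated. The sketch below is a simplified illustration that only sorts bit times into two streams; it assumes, hypothetically, that each bit time is presented as a `(control_line, value)` pair, and it does not model how packet boundaries are delimited (which would depend on packet lengths not described here). The function name `split_packets` and the sample labels are hypothetical.

```python
def split_packets(bit_times):
    """Separate interleaved control and data bit times (simplified sketch).

    bit_times is a sequence of (control_line, value) pairs, one per bit time.
    An asserted control line (True) marks a control-packet bit time and a
    deasserted one (False) marks a data-packet bit time, so a control packet
    inserted in the middle of a data packet goes to the control stream while
    the data stream resumes unbroken.
    """
    control, data = [], []
    for control_line, value in bit_times:
        (control if control_line else data).append(value)
    return control, data


stream = [
    (True, "C0"), (True, "C1"),    # control packet announcing a data packet
    (False, "D0"), (False, "D1"),  # first part of the data packet
    (True, "X0"), (True, "X1"),    # interrupting control packet
    (False, "D2"), (False, "D3"),  # remainder of the data packet
]
print(split_packets(stream))
# (['C0', 'C1', 'X0', 'X1'], ['D0', 'D1', 'D2', 'D3'])
```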

The command/address/data bus comprises a set of lines for transmitting the data, command, response and address bits. In one embodiment, the command/address/data bus may comprise 8, 16, or 32 lines. Each processing node or I/O bridge may employ any one of the supported numbers of lines according to design choice. Other embodiments may support other sizes of command/address/data bus as desired.
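Combining the definition of a bit time with the supported bus widths gives a simple transfer-time calculation. This is a worked sketch assuming one bit per line per bit time and two bit times per clock cycle, as in the embodiment described for the clock line; the 8-byte packet size and the function name `transfer_time` are hypothetical.

```python
def transfer_time(packet_bytes, bus_width_bits):
    """Return (bit_times, clock_cycles) to send one packet over a link.

    Assumes each line carries one bit per bit time and that data is sampled
    on both clock edges, i.e. two bit times per clock cycle.
    """
    # Integer ceiling division: a partially filled bit time still costs one.
    bit_times = (packet_bytes * 8 + bus_width_bits - 1) // bus_width_bits
    clock_cycles = (bit_times + 1) // 2
    return bit_times, clock_cycles


for width in (8, 16, 32):
    print(width, transfer_time(8, width))
# 8 (8, 4)
# 16 (4, 2)
# 32 (2, 1)
```

Doubling the bus width halves the number of bit times for the same packet, which is the trade-off a node or I/O bridge makes when choosing among the supported widths.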

According to one embodiment, the command/address/data bus lines and the clock line may carry inverted data (i.e. a logical one is represented as a low voltage on the line, and a log