WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Method and apparatus for enhancing the fault-tolerance of a network    
United States Patent5592610   
Link to this pagehttp://www.wikipatents.com/5592610.html
Inventor(s)Chittor; Suresh S. (Beaverton, OR)
AbstractA method and apparatus for enhancing the fault-tolerance of a network finds a set of computing nodes within the network which are available for use in the network upon detection of a faulty component. This set of available computing nodes is found by first determining a set of computing nodes within the network which are physically connected together. A connectivity value for each computing node within this set is then determined. A subset of this set is then generated such that each computing node in the subset is able to transfer data to and from each other computing node in the subset. This subset is then utilized as the set of available computing nodes. In one embodiment, the set of computing nodes which are physically connected together is the largest set of physically connected computing nodes in the system.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5592610
Method and apparatus for enhancing the fault-tolerance of a network - US Patent 5592610 Drawing
Method and apparatus for enhancing the fault-tolerance of a network
Inventor     Chittor; Suresh S. (Beaverton, OR)
Owner/Assignee     Intel Corporation (Santa Clara, CA)
Patent assignment
All assignments
Publication Date     January 7, 1997
Application Number     08/363,353
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     December 21, 1994
US Classification     714/4 714/43
Int'l Classification     G06F 011/34
Examiner     Beausoliel Jr.; Robert W.
Assistant Examiner     D ecady; Albert
Attorney/Law Firm     Blakely, Sokoloff Taylor & Zafman
Address
Parent Case    
Priority Data    
USPTO Field of Search     395/183.19 395/182.02 395/183.17 395/183.2 395/185.04 395/180 395/181 395/182.01 395/183.19 370/85.14 370/85.13 370/85.4 370/60 370/94.1 370/94.3 364/268.9
Patent Tags     enhancing fault-tolerance network
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5218676
Ben-Ayed

Jun,1993

[0 after 0 votes]
5151900
Snyder
370/400
Sep,1992

[0 after 0 votes]
5138615
Lamport
370/400
Aug,1992

[0 after 0 votes]
5105424
Flaig
709/243
Apr,1992

[0 after 0 votes]
5058105
Mansour
370/228
Oct,1991

[0 after 0 votes]
5008882
Peterson

Apr,1991

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A method for finding a set of available computing nodes in a network upon detection of a faulty network component, the method comprising the steps of:

(a) determining a set of nodes in the network which are physically coupled together;

(b) for a first node of the set of nodes determining a connectivity value which indicates a number of nodes of the set of nodes to which the first node has a deadlock-free path;

(c) repeating the determining step (b) for each node of the set of nodes;

(d) determining a subset of the set of nodes such that each node in the subset can transfer data to and from each other node in the subset, wherein the determining of the subset is based on the connectivity value for each node of the set of nodes; and

(e) using the subset of the set of nodes as the set of available computing nodes.

2. The method of claim 1, wherein the determining step (a) comprises determining a largest set of nodes in the network which are physically coupled together, excluding the faulty network component.

3. The method of claim 1, wherein the network supports a plurality of virtual networks, and wherein the determining step (b) comprises determining the connectivity value using the plurality of virtual networks.

4. The method of claim 3, wherein the determining step (d) comprises a step of checking whether a deadlock-free path exists between each node in the subset using a different deadlock-free routing scheme for each of the plurality of virtual networks.

5. The method of claim 1, wherein the determining step (d) comprises a step of checking whether a deadlock-free path exists between each node in the subset.

6. The method of claim 1, wherein the determining step (d) comprises a step of removing nodes from the set of nodes until a deadlock-free path exists between each node remaining in the subset of nodes.

7. The method of claim 6, wherein the determining step (d) further comprises removing a second node prior to removing a third node, wherein the second node has a lower connectivity value than the third node.

8. The method of claim 6, wherein the determining step (d) further comprises adding a second node to the subset of nodes, wherein the second node is one of the set of nodes, and wherein the second node was previously removed from the subset of nodes.

9. An apparatus in a network comprising:

a memory unit;

a network interface unit coupled to the memory unit; and

a processing unit coupled to the memory unit and the network interface unit which, upon discovery of a faulty network component, determines a set of nodes in the network which the apparatus is physically coupled to and determines a number of nodes of the set of nodes to which the apparatus has a deadlock-free path, wherein the processing unit also determines a subset of the set of nodes such that each node in the subset can transfer data to and from each other node in the subset, and wherein the processing unit uses the subset of the set of nodes as a set of available network nodes.

10. The apparatus of claim 9, wherein the processing unit also determines, for each node of the set of nodes, a number of nodes of the set of nodes to which each node has a deadlock-free path.

11. The apparatus of claim 9, wherein the network supports a plurality of virtual networks and wherein the processing unit determines the number of nodes to which the apparatus has a deadlock-free path using the plurality of virtual networks.

12. The apparatus of claim 9, wherein the processing unit selects the deadlock-free path between the apparatus and each node of the subset of nodes and stores the path in the memory unit.

13. The apparatus of claim 9, wherein the apparatus is one of the set of nodes.

14. A network system comprising:

a plurality of routing devices;

a plurality of communication links coupled to the plurality of routing devices; and

a plurality of computing nodes coupled to the plurality of routing devices, wherein each node of the plurality of computing nodes determines, upon discovery of a faulty network component, a set of nodes in the network system which are physically coupled together and determines a number of nodes of the set of nodes to which each node has a deadlock-free path, wherein each of the plurality of computing nodes also determines a subset of the set of nodes such that each node in the subset can transfer data to and from each other node in the subset, and wherein each node of the subset of nodes uses the subset of the set of nodes as a set of available nodes.

15. The system of claim 14, wherein the plurality of routing devices are coupled together in a two-dimensional mesh network configuration.

16. The system of claim 14, wherein each of the plurality of computing nodes is connected to one routing device of the plurality of routing devices.

17. The system of claim 14, wherein each routing device of the plurality of routing devices comprises a plurality of input queues for each of the plurality of communication links input to each routing device.

18. The system of claim 14, wherein each of the plurality of routing devices supports a plurality of virtual networks.

19. An apparatus for finding a set of available nodes in a network upon detection of a faulty network component comprising:

means for determining a largest set of nodes in the network which are physically coupled together;

for each node of the largest set of nodes, means for determining a connectivity value which indicates a number of nodes of the largest set of nodes to which a particular node has a deadlock-free path;

means for determining a subset of the largest set of nodes such that each node in the subset can transfer data to and from each other node in the subset, wherein the means for determining a subset determines the subset based on the connectivity value for each node of the largest set of nodes; and

means for using the subset of the largest set of nodes as the set of available nodes.

20. The apparatus of claim 19, wherein the network supports a plurality of virtual networks, and wherein the means for determining the connectivity value determines the connectivity value using the plurality of virtual networks.

21. The apparatus of claim 20, wherein the means for determining a subset checks whether a deadlock-free path exists between each node in the subset using a different deadlock-free routing scheme for each of the plurality of virtual networks.

22. The apparatus of claim 19, wherein the means for determining a subset removes nodes from the largest set of nodes until a deadlock-free path exists between each node remaining in the subset of nodes.

23. The apparatus of claim 22, wherein the means for determining a subset also removes a first node prior to removing a second node, wherein the first node has a lower connectivity value than the second node.

24. The apparatus of claim 22, wherein the means for determining a subset also adds a first node to the subset of nodes, wherein the first node is one of the largest set of nodes, and wherein the first node was previously removed from the subset of nodes.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention pertains to computer networks. More particularly, this invention relates to improving the ability of a network to route around faulty components.

2. Background

Modern computer technology is advancing at a very fast rate and has resulted in high-performance computing components being made available in smaller and smaller packages. These small, high-performance components are finding expanded uses in a wide range of personal, business and academic fields.

One use of these high-performance components is in network systems. In a network system, multiple processing units are coupled together to perform various programmed tasks. For example, the processing units may be networked together as a local area network (LAN) in an office building to allow individuals with personal computer systems in the building to communicate with one another. Such network systems are beneficial to users because they allow the users to communicate with each other, such as by electronic mail or transferring data files between one another. Or, by way of another example, a "supercomputer" may contain multiple processing units which are coupled together via a high-performance network and which operate together to perform various programmed tasks. These supercomputers are beneficial to users because they provide an extremely fast, powerful and cost-effective system to carry out users' requests.

However, one disadvantage of network systems is that the greater the number of components in the system, the greater the chances that a component will become faulty during system operation. A network with thousands of components has a relatively low mean time between failure for the system components. That is, there is a relatively high probability that one component within the network will fail within a given period of time (for example, one failure per week). In order to be useful to the user(s), the network should be able to resolve these component failures. A system which shuts itself down upon detecting a faulty component and cannot re-start until the component is repaired or replaced reduces the availability of the system and increases the inconvenience to the users. Thus, it would be beneficial to provide a system which is able to automatically bypass faulty network components.

Furthermore, many users have neither the expertise nor the desire to resolve a component failure in the network by indicating to the network how to route around the faulty component. Many users do not have the technical expertise required to perform such a correction. Furthermore, performing such a correction could be very time-consuming, and distracts the user from his or her other responsibilities. Thus, it would be beneficial to provide a system which resolves the failure of a component in a manner which is transparent to the system user(s).

In addition, depending on the layout of a network, a faulty component could cut off multiple good components from the remainder of the network. Depending on the type of network, this could mean that some personal computers could not communicate with others, or that certain processing units would not be available to the system user, even though they are in good working condition. Thus, it would be beneficial to provide a system which reduces the number of good components which are disconnected from the remainder of the system by a faulty component.

Additionally, network systems should effectively resolve "deadlock" situations. A deadlock situation occurs when one or more components within the network cannot advance in their operation due to resources within the system which the component(s) requires being unavailable. The occurrence of a deadlock situation is dependent on the routing technique utilized in the system. In one routing technique, referred to as "circuit switching," a source node sends control information for a packet through its intended path to a destination node in the network to reserve each link in the path. Once the entire path is reserved, the source node transfers the data along the reserved path to the destination node. In another routing technique, referred to as "wormhole routing," the source node sends the necessary control information through its intended path to the destination node, followed immediately by the data. That is, the source node does not wait for the entire path to be reserved prior to beginning transfer of the data. In both of these routing techniques, the data packet maintains reservation of portions of the path already reserved while waiting for subsequent portions to be reserved. Thus, a deadlock situation may arise when, for example, two or more source nodes are attempting to transfer data to one or more destination nodes and none can advance because the other is blocking a portion of the data path required by the other. Thus, in order to provide continued performance of a network system, such deadlock issues need to be resolved.

The present invention provides for these and other advantageous results.

SUMMARY OF THE INVENTION

A method and apparatus for enhancing the fault-tolerance of a network is described herein. The present invention finds a set of computing nodes within the network which are available for use in the network upon detection of a faulty component. The present invention finds this set of available computing nodes by first determining a set of computing nodes within the network which are physically connected together. The present invention then determines a connectivity value for each computing node within this set. A subset of this set is then generated such that each computing node in the subset is able to transfer data to and from each other computing node in the subset. This subset is then utilized as the set of available computing nodes. In one embodiment, the set of computing nodes which are physically connected together is the largest set of physically connected computing nodes in the system.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1A is a network system according to one embodiment of the present invention;

FIG. 1B is a network system according to an alternate embodiment of the present invention;

FIG. 2 shows a router coupled to a computing node according to one embodiment of the present invention;

FIG. 3 shows a packet of data according to one embodiment of the present invention;

FIG. 4 shows a routing device according to one embodiment of the present invention;

FIG. 5 is a flowchart showing the steps followed in the present invention in routing packets of data along paths through a network system; and

FIG. 6 shows the steps followed by one embodiment of the present invention in establishing a set of nodes which have a deadlock-free path between each other.

DETAILED DESCRIPTION

In the following detailed description numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances well known methods, procedures, components, and circuits have not been described in detail so as not to obscure the present invention.

Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

FIG. 1A shows a network system according to one embodiment of the present invention. A two-dimensional mesh network 100 is shown including multiple routing devices 102, also referred to as routers. Each router 102 is coupled to the two, three, or four routers adjacent to the router in the matrix, depending on its location in the matrix as shown. A network 100 may have any number of routing devices. In one embodiment, network 100 includes 16 such routing devices organized in a four-by-four grid, creating a 4-ary 2-dimensional network.

Each router 102 is coupled to its adjacent routers 102 via a bi-directional communication link 105. Communication link 105 can be any of a wide variety of conventional communication devices. In one embodiment, communication link 105 is a set of wires or other signal transmission medium via which signals issued by a source router propagate to a destination router.

It should be noted that the present invention is not limited to two-dimensional mesh networks as shown FIG. 1A. Routers may be coupled together in a k-ary n-dimensional network, where k and n are any number greater than or equal to one. For example, a three-dimensional mesh-based network may be utilized in which each router is coupled to three, four, five or six other routers, depending on their location within the network. Alternatively, the routers may be connected in a torus network 150, as shown in FIG. 1B. In a torus network, the routers on the ends of the network are directly coupled to the routers on the opposing ends of the network; thus, each router is directly coupled to four other routers. For example, router 152 is directly coupled to routers 154, 156, 158 and 160. In an alternate embodiment, the routers may be connected in a partial torus network, such that the routers on only two of the ends of the network are directly coupled together, thereby resulting in each router being directly coupled to three or four other routers depending on its location within the network.

Furthermore, it will be appreciated that the network systems shown in FIGS. 1A and 1B represent a wide variety of computer networks. For example, the network could be a mesh-based interprocessor communication network utilized to couple computing nodes together in a supercomputer. Alternatively, the network could be a LAN which couples multiple personal computers together, such as multiple file or video servers.

In one embodiment of the present invention, each router 102 is coupled to a computing node 200 as shown in FIG. 2. A computing node 200 is shown comprising a bus or other communication device 210 for communicating information between one or more processors 215 and 220 for processing information and instructions. In one implementation, the present invention includes Intel.RTM. architecture microprocessors as processors 215 and 220; however, the present invention may utilize any type of microprocessor architecture. In one embodiment, bus 210 includes address, data and control buses. The system also includes random access memory (RAM) 225 coupled with bus 210 for storing information and instructions for the processors 215 and 220, a read only memory (ROM) 230 coupled with the bus 210 for storing static information and instructions for the processors 215 and 220, mass storage device 235 such as a magnetic or optical disk and disk drive coupled with the bus 210 for storing information and instructions, and input/output (I/O) devices 240 coupled with the bus 210 which input and output data and control information to and from the processors 215 and 220. I/O devices 240 include, for example, a display device, an alphanumeric input device including alphanumeric and function keys, and a cursor control device. A hard copy device such as a plotter or printer may also be included in I/O devices 240 for providing a visual representation of computer images.

A network interface unit 205 is also coupled with the bus 210 for allowing the node 200 to communicate with the router 102. In an alternate embodiment, network interface unit 205 is coupled to a separate I/O bus, such as a Peripheral Component Interconnect (PCI) bus, which is coupled to bus 21 0 via a bus bridge. In another alternate embodiment, network interface unit 205 is included as part of I/0 devices 240. The network interface unit 205 operates in a conventional manner to transfer information to and from a router 102.

In one embodiment, the method of the present invention is implemented as a series of software routines that are run by the processors 215 and 220 of the computing nodes in the system. These software routines interact with the network to establish paths around faulty components. It will be appreciated by those skilled in the art, however, that in an alternative embodiment, the present invention may be implemented in discrete hardware or firmware.

It will be appreciated that certain implementations of the present invention may include additional processors or other components. Furthermore, certain implementations of the present invention may not require nor include all of the above components. For example, processor 220 or a display device may not be coupled to bus 210.

Returning to FIG. 1 A, the present invention routes packets of data from source nodes to destination nodes in the network 100 utilizing "custom routing". A packet of data, as referred to herein, is the data which is being transferred from the source node to the destination node. Each packet may be of any size, typically ranging from a few bytes to several megabytes. In custom routing, the source node determines which path through the network 100 to utilize for transferring packets of data to a particular destination node. The path of a packet of data refers to the routers and links the packet travels through between the source and destination nodes. The path of the packet consists of one or more pathway segments. For example, assuming every router in FIG. 1A is coupled to a computing node, a path in the network of FIG. 1A exists where node A is the source node transferring information to the destination node E. The path node A selects includes pathway segments between node A and node B, node B and node C, node C and node D, and node D and node E.

In one embodiment of the present invention, custom routing is implemented utilizing multiple header blocks as shown in FIG. 3. In this embodiment, each router in the network is assigned an identifier, such as a unique identification number. The packet 300 is generated by the source node and includes multiple header blocks which contain the necessary control information to indicate to the routers the path of the packet through network 100. The packet 300 shown includes four header blocks 310, 320, 330 and 340. Header block 310 is the first header block and indicates that the source node is the computing node coupled to router A and the first pathway segment is from router A to router B. The subsequent header blocks 320, 330 and 340 indicate pathway segments from router B to router C, router C to router D, and router D to router E, respectively. It should be noted that the header blocks in packet 300 indicate where the path through network 100 should begin, end, and the destination routers for pathway segments. For example, in FIG. 1A additional routers may exist between routers A, B, C, D and E. However, since those additional routers are not indicated as destination routers, the path of the packet continues in the same direction at these additional routers.

The header blocks 310-340 contain control information which indicate the proper path through the network 100. Following the header blocks 310-340, the packet 300 includes the data 350 and the tail 360. The data, as discussed above, can be bytes or megabytes. The tail 360 indicates the end of packet 300. Packets of data are transferred through the network 100 in units referred to as "flits". In one implementation of the present invention, each flit is 64 bits; however, a flit could contain any number of bits. In one embodiment of the present invention, each one of the links 105 shown in FIG. 1A is a 16-bit bus or other communication means. In an alternate embodiment, each one of the links 105 is capable of transferring an entire flit between two routers concurrently. For example, if a flit is 64 bits then each one of the links is a 64-bit bus or other communication means. In one embodiment of the present invention, each one of the header blocks 310-340 and the tail 360 is a single flit. The data portion 350 typically comprises multiple flits and may comprise any number of flits. In one implementation, data portion 350 typically comprises 4 k bytes, or 512 64-bit flits. In an alternate implementation, data portion 350 typically comprises four bytes, or four 64-bit flits.

In one embodiment of the present invention, custom routing is implemented using a "wormhole routing" technique. In wormhole routing, when a packet of data begins transfer along a path, each portion of the path it begins on is reserved for that packet until the packet tail 360 is received. That is, when router A begins the transfer to router B, router A determines whether the link between router A and the next router is available. If the link is not available, then the router A waits to begin transmission of the packet until the link is available. Once the link is available, router A transfers the first flit, header block 310, from router A to the next router. Note that this next router may be router B, or may be an additional router between router A and router B. Router A then holds that portion of the path to the next router for this packet of data and continues to transfer subsequent data in the packet to the next router until the tail 360 is transferred. It should be noted, however, that other packets may also be transferred over this same physical link utilizing "virtual channels," as discussed in more detail below.

Each time a router receives data for a new packet, the router checks whether the next link in the path is available, and begins transfer along that link once it is available. The proper link to transfer the data to is indicated in the header blocks. In one implementation, when a router receives a new header block it checks whether it is the destination router for that block. If it is not the destination router, then the router transfers the header block and all subsequent flits in the packet in the same direction within the network. However, if the router is the destination router indicated by the first header block, it discards the first header block and checks the subsequent flit (that is, the second header block) to determine the next pathway segment. Once it determines the next pathway segment, the router transfers the second header block and all subsequent flits to the next router in the path.

Once the destination node is reached, the first flit received by the destination router is the last header block. The second flit received by the destination router is the first flit of data 350. In one implementation, this first flit of data 350 is a flit of control information indicating it is the first flit of data. Alternatively, the destination router could know the flit is data and not a header block because the control information included within each header block is not contained within the data flit, or the last header block may contain control information indicating it is the last header block. Thus, the destination router knows that the computing node coupled to the router is the proper destination for the subsequent flits, and transfers the subsequent flits in the packet to the computing node rather than to another router.

FIG. 4 shows a routing device according to one embodiment of the present invention in more detail. The router 400 has four input links 401,402, 403 and 404, and four output links 411,412, 413 and 414. Each of the four input links and the four output links represent one of the links 105 shown in FIG. 1A. Thus, the router 400 shown in FIG. 4 is coupled to four other routers 102 of FIG. 1A. The router 400 also includes a crossbar 41 6 which transfers data received on an input link 401,402, 403 or 404 to the appropriate output link 411,412, 413 or 414. It will be appreciated that the number of input links and output links which are directly coupled to a router is dependent on the topology of the network and that router's location in the network, as shown in FIGS. 1A and 1B.

Each of the four input links 401-404 is coupled to four different inbound queues 418a, 418b, 418c and 418d. The inbound queues 418a-418d are temporary storage facilities in router 400 for incoming data. In one embodiment of the present invention, each one of the inbound queues 418a-418d is a 16-flit first in-first out (FIFO) buffer. Therefore, each of the inbound queues 418a-418d can be smaller than the size of a packet being routed through the network.

Each one of the queues 418a-418d represents a different "virtual channel" of the router. That is, even though the router 400 is coupled to only the single input link 401, router 400 can temporarily store data before passing it on to the next router via one of the output links 411-414. Thus, even though a single physical network exists, the router 400 is able to temporarily store data from four separate packets. Thus, router 400 supports four "virtual channels," because the router is able to support four separate data paths utilizing its temporary storage capabilities. Each one of these four virtual channels corresponds to one of the inbound queues 418a-418d.

In one embodiment of the present invention, these virtual channels are configured as static virtual networks. A static virtual network is a network in which paths between source and destination nodes utilize the same numbered virtual channels. For example, if two virtual networks are supported by the system, then a path between source and destination nodes is entirely in the first network or entirely in the second network (however, note that two separate paths, one in each virtual network, could exist). Thus, packets which router 400 receives on a particular virtual network and transfers to input queue 418a (for example, virtual channel A) will be transferred to virtual channel A in each of the other routers in the system which receive the packet.

In an alternate embodiment of the present invention, static virtual networks are not required. For example, a particular source node may indicate that the path to a destination node should utilize a first virtual channel for the first pathway segment and a second virtual channel for the second pathway segment. In this embodiment, the channel selecting logic 420 checks the control information in the header blocks received and transfers the subsequent flits of the packet to the channel indicated by the header block. Note that in this embodiment the router 400 also checks whether the second virtual channel of the next link is available (for example, another packet may currently be utilizing the second network).

Each node within the network system is able to utilize any one of these four virtual networks. Which virtual network a source node utilizes to transfer data to a destination node is dependent on several factors, including the existence of a deadlock-free path within the virtual network as discussed in more detail below. In the example shown, four virtual networks are supported. It will be appreciated however, that any number of virtual networks can be supported by router 400 by utilizing the proper number of input queues 418.

Data is received by router 400 in flits, as discussed above. Upon receipt of a flit of a new packet via input link 401, a channel selecting logic 420 coupled to input link 401 checks the control information in the flit. In one embodiment of the present invention, the first flit of a new packet is a header block. If the header block indicates that the current router is not the destination router for the pathway segment, then the channel selecting logic 420 checks the control information in the header block to determine which virtual network the packet is using. The channel selecting logic 420 then transfers the first flit and all subsequent flits in the packet to the channel of input link 401 corresponding to the virtual network and asserts a request signal to an arbitration unit to transfer the flits of the packet to the output link continuing in the same direction through the network.

However, if the header block indicates that the current router is the destination router for this pathway segment, then channel selecting logic 420 discards the first header block and checks the second flit. If the second flit is a header block, then channel selecting logic 420 checks the control information in the header block to determine which of the virtual networks the packet is using. The channel selecting logic 420 then transfers the second and all subsequent flits in the packet to the channel of input link 401 corresponding to that virtual network and asserts a request signal to the arbitration unit to transfer the flits of the packet to the output link indicated by the header block. The channel selecting logic 420 continues to transfer all subsequent flits to this channel until the tail of the packet is received. If, however, the second flit is not a header block, then the packet is transferred to the computing node connected to the router 400, as discussed in more detail below.

In one embodiment of the present invention, channel selecting logic 420 stores the directions of the input links 401-404 and the output links 411-414. Channel selecting logic 420 utilizes these directions when router 400 is not a destination router for the packet. In this situation, router 400 transfers the packet to the next router along the same direction in the network as the previous link. Channel selecting logic 420 is able to determine which output link 411-414 is in the same direction based on these stored directions.

Data received by router 400 via input links 402-404 is handled analogously to the discussion above regarding input link 401.

In one embodiment of the present invention, router 400 includes an additional set of inbound queues 419a-419d which are coupled directly to the computing node connected to router 400. These inbound queues 419a-419d are used analogous to inbound queues 418a-418d discussed above, except that the source of the packets transferred to inbound queues 419a-419d is the computing node rather than another routing device. Thus, when the computing node connected to the routing device is the source node for a particular packet, the flits of that packet are transferred from the computing node to the inbound queues 419a-419d and channel selecting logic coupled to the inbound queues 419a-419d asserts a request signal to the arbitration unit to transfer the flits of the packet to the appropriate outbound link, as indicated by the header information for the packet.

Crossbar 416 transfers flits from the four input links 401-404 to the four output links 411-414. The channels of each of the four input links 401-404 are multiplexed onto the output links 411-414 by crossbar 416. In one embodiment of the present invention, router 400 operates according to an arbitration policy which ensures that the data being received on each input link 401-404 is treated fairly. In one implementation, this arbitration policy is the well-known round-robin scheme. By ensuring that each input link is treated fairly, packets being transferred on multiple virtual networks progress through the network independent of the progress of packets on any other virtual network. Alternatively, other arbitration policies may be employed which are less fair, such as giving one or more channels priority over the remaining channels. However, these other arbitration policies should be such that the progress of packets on one virtual network does not prevent the progress of packets on another virtual network indefinitely.

In one embodiment of the present invention, channel selecting logic 420 maintains the current state for each channel of the four input channels. That is, channel selecting logic 420 keeps track of whether a packet is currently being transferred via a particular channel and the correct outbound link for a packet if a packet is being transferred. In this embodiment, crossbar 416 is a coupling device which connects each input queue of each input link to each outbound channel of each output link. When a new packet is received by channel selecting logic 420, it asserts an access request signal to an arbitration unit for access to the appropriate output channel. If another input link is currently using the requested output channel then the request is denied. Once the output channel is available, the arbitration unit asserts an access granted signal to channel selecting logic 420. Channel selecting logic 420 does not begin transfer of the data to the output link until this access granted signal is received. Thus, conflicts between multiple input links for the same output channel are resolved by the arbitration unit granting access to the output channel to only one input link at a time. Note that if only a single channel has data to be transferred, the crossbar 416 allows that channel to monopolize the crossbar 416 until another channel has data to be transferred. The arbitration unit grants access to the output channels according to the arbitration policy utilized by the router, as discussed above.

It will be appreciated that other implementations of an arbitration unit may also be utilized within router 400. Any of a wide variety of arbitration policies and units may be employed by router 400 which allow an inbound link to obtain access to an outbound link, which allow the inbound link to maintain access to the outbound link until the transfer of the packet is completed, and which resolve conflicts between multiple input links for access to the same output link.

The appropriate output link for a flit can be determined in a wide variety of manners. In one embodiment, channel selecting logic 420 stores the appropriate output link for each packet, as discussed above. In an alternate embodiment, channel selecting logic 420 asserts a signal to crossbar 416 each time it receives a new header block. This signal indicates to crossbar 416 the appropriate output link 411-414 for all flits coming from the appropriate inbound queue until a new signal is asserted. Alternatively, control logic within crossbar 416 may monitor the flits as they are transferred through the crossbar 416. Each time the control logic monitors a flit which is a header block indicating the next portion of the path for this router, the crossbar stores which output link all subsequent flits from that queue should be transferred to (until another appropriate header block is received).

If a flit received by router 400 is the first flit in a new packet, then router 400 does not transfer the flit to the appropriate output link 411,412, 413 or 414 until that appropriate output link is reserved. For example, a flit for a new packet may be received on input link 401 which is to be transferred to channel A of output link 412. However, another packet may have already been received on input link 403 which is currently transferring data to channel A of output link 412. Thus, router 400 waits until the packet being received on input link 403 is finished before transferring the packet from input link 401 to channel A of output link 412. It should be noted that, under certain circumstances, flits received via input link 401 may fill the inbound queue for channel A before the packet being received on input link 403 is completely transferred to channel A of output link 412. When this occurs, router 400 asserts a signal to the router it is receiving the packet from on input link 401. This signal indicates to the prior router that the inbound queue is full and that transfer of flits to router 400 should be suspended until the signal is deasserted. Router 400 deasserts the signal once the channel A of output link 412 is available. It should further be noted that the suspension of transferring flits by the prior router may result in the inbound queue of the prior router being filled. In this situation, the prior router asserts a signal to the router it is receiving flits from to suspend transfer of flits until the signal is deasserted, analogous to the discussion above.

In the embodiment shown in FIG. 4, each output link 411-414 is associated with four outbound channels, each of which corresponds to an inbound