WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Load balancing of network by maintaining in each computer information regarding current load on the computer and load on some other computers in the network    
United States Patent5539883   
Link to this pagehttp://www.wikipatents.com/5539883.html
Inventor(s)Allon; David (Jerusalem, IL); Bach; Moshe (Haifa, IL); Moatti; Yosef (Haifa, IL); Teperman; Abraham (Haifa, IL)
AbstractA method is described of operating a computer in a network of computers using an improved load balancing technique. Logical links are generated between the computer and other computers in the network so that a tree structure is formed, the computer being logically linked to one computer higher up the tree and a number of computers lower down the tree. Stored information is maintained in the computer regarding the current load on the computer and the load on at least some of the other computers in the network by causing the computer periodically to distribute the information to the computers to which it is logically linked, and to receive from the computers similar such information and to update its own information in accordance therewith, so that the information can be used to determine a computer in the network that can accept extra load. A sender-initiated embodiment of the invention includes the further step of, when the computer is overloaded, using the information to determine a computer that can accept extra load and transferring at least one task to that computer. The load balancing technique is scalable, fault tolerant, flexible and supports clustering, thus making it suitable for use in networks having very large numbers of computers.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5539883
Load balancing of network by maintaining in each computer information

     regarding current load on the computer and load on some other computers

     in the network - US Patent 5539883 Drawing
Load balancing of network by maintaining in each computer information regarding current load on the computer and load on some other computers in the network
Inventor     Allon; David (Jerusalem, IL); Bach; Moshe (Haifa, IL); Moatti; Yosef (Haifa, IL); Teperman; Abraham (Haifa, IL)
Owner/Assignee     International Business Machines Corporation (Armonk, NY)
Patent assignment
All assignments
Publication Date     July 23, 1996
Application Number     08/231,352
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     April 20, 1994
US Classification    
Int'l Classification    
Examiner     Lee; Thomas C.
Assistant Examiner     Meky; Moustafa Mohamed
Attorney/Law Firm     Pintner; James C.
Address
Parent Case     This is a continuation of application Ser. No. 07/968,713 filed on Oct. 30, 1992 now abandoned.
Priority Data     Oct 31, 1991 [IL] 099923 Sep 08, 1992 [EP] 92308137
USPTO Field of Search    
Patent Tags     load balancing network maintaining each computer information regarding current load computer load some other computers network
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5283897
Georgiadis
718/105
Feb,1994

[0 after 0 votes]
5241677
Naganuma
718/105
Aug,1993

[0 after 0 votes]
5155858
DeBruler
718/105
Oct,1992

[0 after 0 votes]
5150360
Perlman
370/402
Sep,1992

[0 after 0 votes]
5115505
Bishop
718/104
May,1992

[0 after 0 votes]
5053950
Naganuma
718/105
Oct,1991

[0 after 0 votes]
5031089
Liu
709/226
Jul,1991

[0 after 0 votes]
4811337
Hart
370/256
Mar,1989

[0 after 0 votes]
4748558
Hirosawa
718/105
May,1988

[0 after 0 votes]
4633387
Hartung
718/105
Dec,1986

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A method of operating a first computer in a network of computers, the method comprising the steps of:

generating logical links between the first computer and other computers in the network so that a tree structure is formed, the logical links including a link to one of the other computers higher up the tree and links to a number of computers lower down the tree; and

maintaining in the first computer stored information regarding a current load on the first computer and a load on at least some of the other computers in the network, the step of maintaining including causing the first computer:

(i) periodically to distribute the information to the computers to which it is logically linked,

(ii) to receive from said other computers similar such information, and

(iii) to update its own information in accordance therewith, so that the information can be used to determine ones of the other computers in the network that can accept extra load:

wherein the step of maintaining includes maintaining information stored in the first computer, including a number of entries, each entry containing information regarding:

(i) a load on one of the other computers in the network,

(ii) a number of links in the tree separating that other computer from the first computer, and

(iii) the one of the other computers, which are linked to the first computer, from which the entry was last received; and wherein,

the method further comprises the steps, when the first computer receives the similar information from on one of the other computers to which it is linked, of:

(a) incrementing a number of links value in each entry of the received similar information by one;

(b) deleting entries in the received information which originated from the other computer;

(c) deleting entries in the information already stored in the first computer which were received from the other computer;

(d) merging the received similar information with the information already stored in the first computer; and

(e) sorting the merged information in ascending order of load, entries with equal load being sorted in ascending order of number of links separation from the first computer.

2. A method as claimed in claim 1, further comprising the step, when the first computer is overloaded, of using the information to determine a second computer that can accept extra load and transferring at least one task to the second computer.

3. A method as claimed in claim 1 further comprising the step, if a second computer higher up the tree, to which the first computer is logically linked, fails or is otherwise inoperative, of generating a new logical link to another computer in the network which has capacity for accepting new downward links.

4. A method as claimed in claim 1, wherein the step of maintaining includes:

(i) using the periodic distribution of load information to determine whether or not the other computers to which the first computer is linked in the tree are operational, and

(ii) marking the first computer, if one of the other computers lower in the tree to which the first computer is linked is not operational, as having capacity to accept new downward links.

5. A method as claimed in claim 1, further comprising the step of randomly permuting entries in the information, the entries relating to computers which have the same load and number of links separation from the computer, prior to the step of maintaining by causing the computer periodically to distribute.

6. A method as claimed in claim 1, further comprising the steps, if an entry corresponding to the computer appears first in its own sorted information, of:

attaching a counter to the entry;

initializing the counter to negative value of a spare load capacity of the computer;

incrementing the counter, if that entry is still first, whenever the information is distributed; and

removing that entry from the information, if the counter becomes positive;

whereby a probability that the same entry will appear first in the information stored in different computers is reduced.

7. A method as claimed in claim 1, wherein:

the step of causing periodically to distribute is executed according to a first period which includes a second period of periodic distribution of the information to the one computer higher up the tree; and

the second period depends on a rate at which new tasks are created in the computer.

8. A method as claimed in claim 7, wherein, in the step of causing, the first period depends on a rate at which new tasks are created in the other computers to which the first computer is logically linked in the tree structure.

9. A method as claimed in claim 1, wherein, in the step of causing periodically to distribute, the information is distributed to one of the other computers which is lower in the tree in response to receipt by the first computer of the similar information from the other computer lower in the tree.

10. A method of operating a network of computers, comprising the step of:

operating each computer of the network using the method as claimed in claim 1.

11. A method as claimed in claim 10, wherein the step of generating logical links includes assigning a rank to each computer, no two computers being assigned the same rank, each computer being linked to one computer of lower rank and a number of computers of higher rank to form the tree structure.

12. A computer comprising:

means for operating in a network of similar such computers,

means for logically linking the computer to similar such computers in a tree structure network, and for identifying the similar such computers;

means for storing information regarding a load on the computer and loads on at least some of the similar such computers in the network;

means for sending said information to the similar such computers to which the computer is logically linked;

means for receiving similar information from said similar such computers to which the computer is logically linked, and for updating the stored information in accordance therewith; and

means for selecting one of the similar such computers in the network having spare load capacity from said information, and for transferring tasks to the selected computer:

wherein the means for storing includes means for maintaining information stored in the first computer, including a number of entries, each entry containing information regarding:

(i) a load on one of the other computers in the network,

(ii) a number of links in the tree separating that other computer from the first computer, and

(iii) the one of the other computers, which are linked to the first computer, from which the entry was last received; and

wherein, the computer further comprises means, operable when the first computer receives the similar information from one of the other computers to which it is linked, for:

(a) incrementing a number of links value in each entry of the received similar information by one;

(b) deleting entries in the received information which originated from the other computer;

(c) deleting entries in the information already stored in the first computer which were received from the other computer;

(d) merging the received similar information with the information already stored in the first computer; and

(e) sorting the merged information in ascending order of load, entries with equal load being sorted in ascending order of number of links separation from the first computer.

13. A computer as claimed in claim 12, further comprising:

means for accessing a stored list of all the computers in the network;

means for selecting a computer from the list as a candidate neighboring computer for linking in the tree structure;

means for sending a message to the selected computer indicating a link request;

means for establishing a logical link with the selected computer by updating the means for identifying if a positive response from the selected computer is received;

means for receiving a message from one of the similar such computers indicating that a link is requested; and

means for determining, on receipt of such a link request message, whether or not there is capacity for establishing a downward link, for sending a positive response to the one similar such computer which sent the link request message if there is such capacity and updating the identifying means accordingly.

14. A network of computers comprising a plurality of computers as claimed in claim 13.

15. A network of computers comprising a plurality of computers as claimed in claim 12.

16. A computer as claimed in claim 12, further comprising means, operable when the first computer is overloaded, for using the information to determine a second computer that can accept extra load and transferring at least one task to the second computer.

17. A computer as claimed in claim 12 further comprising means, operable when a second computer higher up the tree, to which the first computer is logically linked, fails or is otherwise inoperative, for generating a new logical link to another computer in the network which has capacity for accepting new downward links.

18. A computer as claimed in claim 12, wherein the means for storing includes:

(i) means for using the periodic distribution of load information to determine whether or not the other computers to which the first computer is linked in the tree are operational, and

(ii) means for marking the first computer, if one of the other computers lower in the tree to which the first computer is linked is not operational, as having capacity to accept new downward links.

19. A computer as claimed in claim 12 further comprising means for randomly permuting entries in the information, the entries relating to computers which have the same load and number of links separation from the computer.

20. A computer as claimed in claim 12, further comprising means, operable when an entry corresponding to the computer appears first in its own sorted information, for:

attaching a counter to the entry;

initializing the counter to negative value of a spare load capacity of the computer;

incrementing the counter, if that entry is still first, whenever the information is distributed; and

removing that entry from the information, if the counter becomes positive;

whereby a probability that the same entry will appear first in the information stored in different computers is reduced.

21. A computer as claimed in claim 12, wherein:

the means for causing periodically to distribute is operable according to a first period which includes a second period of periodic distribution of the information to the one computer higher up the tree; and

the second period depends on a rate at which new tasks are created in the computer.

22. A computer as claimed in claim 21, wherein, in the means for causing, the first period depends on a rate at which new tasks are created in the other computers to which the first computer is logically linked in the tree structure.

23. A computer as claimed in claim 12, wherein the means for causing periodically to distribute includes means for distributing the information to one of the other computers which is lower in the tree in response to receipt by the first computer of the similar information from the other computer lower in the tree.

24. A network of computers, comprising a plurality of computers as claimed in claim 12.

25. A network as claimed in claim 24, wherein, for each computer, the means for generating logical links includes means for assigning a rank to each computer, no two computers being assigned the same rank, each computer being linked to one computer of lower rank and a number of computers of higher rank to form the tree structure.

26. A computer program product, for use with a computer, comprising:

a recording medium;

means, recorded on the recording medium, for directing the computer to operate in a network of similar such computers,

means, recorded on the recording medium, for directing the computer to logically link the computer to similar such computers in a tree structure network, and for identifying the similar such computers;

means, recorded on the recording medium, for directing the computer to store information regarding a load on the computer and loads on at least some of the similar such computers in the network;

means, recorded on the recording medium, for directing the computer to send said information to the similar such computers to which the computer is logically linked;

means, recorded on the recording medium, for directing the computer to receive similar information from said similar such computers to which the computer is logically linked, and for directing the computer to update the stored information in accordance therewith; and

means, recorded on the recording medium, for directing the computer to select one of the similar such computers in the network having spare load capacity from said information, and for transferring tasks to the selected computer;

wherein the means for directing to store includes means, recorded on the recording medium, for directing the computer to maintain information stored in the first computer, including a number of entries, each entry containing information regarding:

(i) a load on one of the other computers in the network,

(ii) a number of links in the tree separating that other computer from the first computer, and

(iii) the one of the other computers, which are linked to the first computer, from which the entry was last received; and

wherein, the computer further comprises means, recorded on the recording medium, operable when the first computer receives the similar information from one of the other computers to which it is linked, for directing the computer:

(a) to increment a number of links value in each entry of the received similar information by one;

(b) to delete entries in the received information which originated from the other computer;

(c) to delete entries in the information already stored in the first computer which were received from the other computer;

(d) to merge the received similar information with the information already stored in the first computer; and

(e) to sort the merged information in ascending order of load, entries with equal load being sorted in ascending order of number of links separation.

27. A computer program product as claimed in claim 26, further comprising:

means, recorded on the recording medium, for directing the computer to access a stored list of all the computers in the network;

means, recorded on the recording medium, for directing the computer to select a computer from the list as a candidate neighboring computer for linking in the tree structure;

means, recorded on the recording medium, for directing the computer to send a message to the selected computer indicating a link request;

means, recorded on the recording medium, for directing the computer to establish a logical link with the selected computer by updating the means for directing to identify if a positive response from the selected computer is received;

means, recorded on the recording medium, for directing the computer to receive a message from one of the similar such computers indicating that a link is requested; and

means, recorded on the recording medium, for directing the computer to determine, on receipt of such a link request message, whether or not there is capacity for establishing a downward link, for directing the computer to send a positive response to the one similar such computer which sent the link request message if there is such capacity, and for directing the computer to update the identifying means accordingly.

28. A computer program product as claimed in claim 26, further comprising means, recorded on the recording medium, operable when the first computer is overloaded, for directing the computer to use the information to determine a second computer that can accept extra load and for directing the computer to transfer at least one task to the second computer.

29. A computer program product as claimed in claim 26 further comprising means, recored on the recording medium, operable when a second computer higher up the tree, to which the first computer is logically linked, fails or is otherwise inoperative, for directing the computer to generate a new logical link to another computer in the network which has capacity for accepting new downward links.

30. A computer program product as claimed in claim 26, wherein the means for directing to store includes:

(i) means, recorded on the recording medium, for directing the computer to use the periodic distribution of load information to determine whether or not the other computers to which the first computer is linked in the tree are operational, and

(ii) means, recorded on the recording medium, for directing the computer to mark the first computer, if one of the other computers lower in the tree to which the first computer is linked is not operational, as having capacity to accept new downward links.

31. A computer program product as claimed in claim 26 further comprising means, recorded on the recording medium, for directing the computer to randomly permute entries in the information, the entries relating to computers which have the same load and number of links separation from the computer.

32. A computer program product as claimed in claim 26, further comprising means, recorded on the recording medium, operable when an entry corresponding to the computer appears first in its own sorted information, for directing the computer:

to attach a counter to the entry;

to initialize the counter to negative value of a spare load capacity of the computer;

to increment the counter, if that entry is still first, whenever the information is distributed; and

to remove that entry from the information, if the counter becomes positive;

whereby a probability that the same entry will appear first in the information stored in different computers is reduced.

33. A computer program product as claimed in claim 26, wherein:

the means for directing to cause periodically to distribute is operable according to a first period which includes a second period of periodic distribution of the information to the one computer higher up the tree; and

the second period depends on a rate at which new tasks are created in the computer.

34. A computer program product as claimed in claim 33, wherein, in the means for directing to cause, the first period depends on a rate at which new tasks are created in the other computers to which the first computer is logically linked in the tree structure.

35. A computer program product as claimed in claim 26, wherein the means for directing to cause periodically to distribute, includes means, recorded on the recording medium, for directing the computer to distribute the information to one of the other computers which is lower in the tree in response to receipt by the first computer of the similar information from the other computer lower in the tree.
 Description Submit all comments and votes
 


FIELD OF THE INVENTION

The invention relates to methods of operating computers in networks and to computers capable of operating in networks.

BACKGROUND OF THE INVENTION

Local area networks of computers are an indispensable resource in many organisations. One problem in a large network of computers is that it is sometimes difficult to make efficient use of all the computers in the network. Load sharing or balancing techniques increase system throughput by attempting to keep all computers busy. This is done by off-loading processes from overloaded computers to idle ones thereby equalising load on all machines and minimising overall response time.

Load balancing methods can be classified according to the method used to achieve balancing. They can be `static` or `dynamic` and `central` or `distributed`.

In static load balancing, a fixed policy--deterministic or probabilistic--is followed independent of the current system state. Static load balancing is simple to implement and easy to analyse with queuing models. However, its potential benefit is limited since it does not take into account changes in the global system state. For example, a computer may migrate tasks to a computer which is already overloaded.

In dynamic schemes the system notes changes in its global status and decides whether to migrate tasks based on the current state of the computers. Dynamic policies are inherently more complicated than static policies since they require each computer to have knowledge of the state of other computers in the system. This information must be continuously updated.

In a central scheme one computer contains the global state information and makes all the decisions. In a distributed scheme no one computer contains global information. Each computer makes migration decisions based on the state information it has.

Load-balancing methods can be further classified as `sender-initiated` and `receiver-initiated`. In sender-initiated policies an overloaded computer seeks a lightly loaded computer. In receiver-initiated policies lightly loaded computers advertise their capability to receive tasks. It has been shown that if costs of task-transfer are comparable under both policies then sender-initiated strategies outperform receiver-initiated strategies for light to moderate system loads, while receiver-initiated schemes are preferable at high system loads.

Conventionally, a dynamic load balancing mechanism is composed of the following three phases:

1. Measuring the load of the local machine.

2. Exchanging local load information with other machines in the network.

3. Transferring a process to a selected machine.

Local load can be computed as a function of the number of jobs on the run queue, memory usage, paging rate, file use and I/O rate, or other resource usage. The length of the run queue is a generally accepted load metric.

The receipt of load information from other nodes gives a snapshot of the system load. Having this information, each computer executes a transfer policy to decide whether to run a task locally or to transfer it to another node. If the decision is to transfer a task then a location policy is executed to determine the node the task should be transferred to.

If a load balancing method is to be effective in networks having large numbers of computers it must be scalable, in the sense that network traffic increases linearly with the size of the network, flexible so that computers can be easily added to or removed from the network, robust in the event of failure of one or more of the computers, and must support to some degree clustering of the computers.

Centralised methods, though attractive due to their simplicity, are not fault-tolerant. The central computer can detect dead nodes and it takes one or two messages to re-establish connection. However, if the central node fails, the whole scheme falls apart. Load balancing configuration can be re-established by maintaining a prioritised list of alternative administrators in each node or by implementing an election method to elect a new centralised node. When a central node fails, other nodes switch to the new central node. This enhancement, however, increases the management complexity and cost of the method.

In addition, management overhead for a centralised method is unacceptably large. As the number of nodes increases, the time spent by the central node in handling load information increases, and there must come a point at which it will overload.

Several prior art load balancing methods are both dynamic and distributed.

For example, in the method described in Barak A. and Shiloh A. SOFTWARE--PRACTICE AND EXPERIENCE, 15(9):901-913, September 1985 [R1], each computer maintains a fixed size load vector that is periodically distributed between computers. Each computer executes the following method. First, the computer updates its own load value. Then, it chooses a computer at random and sends the first half of its load vector to the chosen computer. When receiving a load vector, each computer merges the information with its local load vector according to a predefined rule.

A problem with this method is that the size of the load vector has to be chosen carefully. Finding the optimal value for the load vector is difficult and it must be tuned to a given system. A large vector contains load information for many nodes. Thus, many nodes will know of a lightly loaded node, increasing the chance that it will quickly receive many migrating processes and will overload. On the other hand, the size of the load vector should not be so small that load values do not propagate through the network in a reasonable time. For large number n of nodes in the network, the expected time to propagate load information through the network is O(log n).

The method described in R1 can handle any number of computers after tuning the length of the load vector. Therefore, the method is scalable. However, it is flexible only up to a point. Adding a small number of nodes to the network does not require any change. Adding a large number of nodes requires changing the size of the load vector in all computers, thus increasing the administrative overhead. The method is fault-tolerant and continues to work in spite of single failures, however there is no built-in mechanism for detecting dead nodes and updating information on them in the load vectors. Thus information about failed nodes is not propagated and other nodes may continue to migrate processes to them, with the result of decreased response time.

The number of communications per unit time is O(n). However, the method does not support clustering. When a node wants to off-load process to another node it chooses a candidate from its load vector. Since the information on nodes in the load vector of node p is updated from random nodes, the set of candidates for off-loading is a random subset of n and not a controlled set for p.

In Lin F. C. H. and Keller R. M. PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, 329-336, May 1986 [R2], a distributed and synchronous load balancing method using a gradient model is disclosed. Computers are logically arranged in a grid and the load on the computers is represented as a surface. Each computer interacts solely with its immediate neighbours on the grid. A computer with high load is a `hill` and an idle computer is a `valley`. A neutral computer is one which is neither overloaded nor idle. Load balancing is a form of relaxation of the surface. Tasks migrate from hills towards valleys and the surface is flattened. Each computer computes its distance to the nearest idle computer and uses that distance for task migration.

An overloaded node transfers a task to its immediate neighbour in the direction of an idle node. A task continues to move until it reaches an idle node. If a former idle node is overloaded when a task arrives, the task moves to another idle node.

This method, though scalable, is only partially flexible. It is easy to add or remove nodes at the edge of the grid, but it is difficult to do so in the middle since the grid is fixed. To do this, reconfiguration is required, which increases the overhead inherent in the method. Detection of dead nodes is not easy. Since only changes of the state of a node are sent to its neighbours, a node's failure remains undetected until a job is transferred to it. Late detection delays migrating processes and, therefore, increases overall response time. In this method clustering is not supported. Each node transfers processes to one of its immediate neighbours which may transfer it in a different direction. The overloaded node has no control as to where the off-loaded processes will eventually execute. Management overhead to start the grid is low and the number of messages is O(n). Administrative overhead for reconfiguration is high since it requires changes in all neighbouring nodes for a failed node. In this method node overhead is high and network traffic increases because tasks move in hops rather than directly to an idle node and it takes long time to migrate a process.

The buddy set algorithm proposed in Shin K. G. and Chang Y. C. IEEE TRANSACTIONS ON COMPUTERS, 38(8), August 1989 [R3], aims for very fast load sharing. For each node two lists are defined: Its buddy set, the set of its neighbours, and its preferred list, an ordered list of nodes in its buddy set to which tasks are transferred. Each node can be in one of the following states: under-loaded, medium-loaded and overloaded. Each node contains the status of all nodes in its buddy set. Whenever the status of a node changes, it broadcasts its new state to all nodes in its buddy set. The internal order of preferred lists varies per node in order to minimise the probability that two nodes in a buddy set will transfer a task to the same node. When a node is overloaded it scans its preferred list for an under-loaded node and transfers a task to that node. Overloaded nodes drop out of preferred lists. Buddy sets and their preferred lists overlap, so processes migrate around the network.

Again, this method, though scalable in that the number of messages increases linearly with the number of computers in the network n, is not flexible. Adding or removing nodes from the network requires recomputation of the preferred lists in all nodes of the related buddy set. If a node is added to more than one buddy set the recomputation must be done for each node in each buddy set. Detection of dead nodes is difficult for the same reason as in the gradient model described in R2. Clustering is supported but reconfiguratio expensive since it requires recomputation of new buddy sets and preferred lists, as for adding and removing nodes. The administrative overhead inherent in the method is therefore high.

SUMMARY OF THE INVENTION

According to a first aspect of the present invention there is provided a method of operating a computer in a network of computers, the method comprising: generating logical links between the computer and other computers in the network so that a tree structure is formed, the computer being logically linked to one computer higher up the tree and a number of computers lower down the tree; and maintaining in the computer stored information regarding the current load on the computer and the load on at least some of the other computers in the network by causing the computer periodically to distribute the information to the computers to which it is linked, to receive from said computers similar such information and to update its own information in accordance therewith, so that the information can be used to determine computers in the network that can accept extra load.

The invention is applicable to both sender-initiated and receiver-initiated load balancing. A embodiment of the invention using a sender-initiated load balancing technique includes the further step of, when the computer is overloaded, using the information to determine a computer that can accept extra load and transferring at least one task to that computer.

There is further provided a method of operating a network of computers by operating each computer using the above method.

The generation of the logical links can be achieved by assigning a rank to each computer, no two computers being assigned the same rank, each computer being logically linked to one computer of lower rank and a number of computers of higher rank to form the tree structure.

The tree structure can be maintained if a computer fails, or is otherwise inoperative, by generating new logical links between each of the computers lower down the tree to which the failed computer was linked and other computers, which have capacity for accepting new downward links.

An improved load balancing technique is employed which has the property of scalability by limiting the communication load on each computer and the knowledge of the network topology which is required to be stored in each computer. The actions taken by each computer are thus independent of the total number of computers in the network.

Since the tree structure is dynamically built and maintained, the method is also flexible. Changing a tree configuration requires simply changing the ranking of the reconfigured computers. The new ranking files have to be consistent with existing ranking files to avoid cycles. A computer is reconfigured by logically disconnecting it so that it appears inoperative to its neighbours, replacing its configuration file, and letting it reconnect to the tree. A new configuration is achieved without disturbing other nodes on the tree and without disrupting local services.

Detection and reconnection of failed nodes is also provided for. If a computer higher up the tree does not respond within a certain time, each higher ranked computer previously connected to it tries to relink itself elsewhere in the tree. If it temporarily fails to do this, it acts as root of a disconnected sub-tree, and load balancing occurs within the sub-tree. Eventually, the flags marking nodes as already tried or dead will be reset, and the computer, which periodically tries to attach itself to the tree, will reconnect to the tree.

Advantageously, the periodic distribution of load information across the links of the tree is used by each computer to determine whether or not the computers to which it is linked are operational, each computer seeking, if the computer to which it is linked higher in the tree is not operational, to generate a new link with a computer having lower rank, having the capacity for new downward links, and being marked, if one of the computers lower in the tree to which it is linked is not operational, as having capacity to accept new downward links.

In a preferred form of the invention, the information stored in each computer contains a number of entries, each entry containing information regarding the load on a particular one of the computers in the network, i, the number of links in the tree separating it from the computer in which the information is stored, and the name (i.e., rank) of the computer, logically linked to the computer in which the information is stored, from which the entry was last received; and wherein, when each computer receives the similar information from a computer to which it is logically linked, the following steps are performed:

a) the number of links separation value in each entry of the received information is incremented by one;

b) entries in the received information which originated from the receiving computer are deleted;

c) entries in the information already stored in the receiving computer which were received from the sending computer are deleted;

d) the received information is merged with the information already stored in the receiving computer; and

e) the merged information is sorted in ascending order of load, entries with equal load being sorted in ascending order of number of links separation from the receiving computer.

Before each distribution of information between computers, entries in the information relating to computers which have the same load and number of links separation from the computer can be randomly permuted, so that the probability that the same entry will appear first in the information stored in different computers is reduced. Also, if an entry corresponding to the computer appears first in its own sorted information, a counter can be attached to the entry and initialised to the negative value of the spare load capacity of the computer. The counter is incremented, if that entry is still first, whenever the information is distributed. If the counter becomes positive, that entry is removed from the information. The probability that the same entry will appear first in the information stored in different computers is thereby reduced.

In an advantageous form of the invention, the period of the periodic distribution of the information to a computer higher up the tree is related to the rate at which new tasks are created in the computer and/or the rate at which new tasks are created in the computers to which it is logically linked in the tree structure. This has the advantage that if one or more nodes suddenly become highly loaded, this information will spread more quickly throughout the network.

A further advantage of the method provided in the present invention is that the case where the overall load in the network increases steadily is handled gracefully. In such a case, when local load vectors are searched for a target computer, fewer candidates will be found and the number of migrations will decrease. When overall load decreases, more candidates will be found in the local load vectors and migrations will resume. In other words, the network performance degrades gracefully during network load saturation and does not crash, rather normal operation is resumed when overall load returns to normal.

Viewed from another aspect, the invention provides a computer capable of operating in a network of similar such computers, the computer comprising: means for identifying computers in the network to which the computer is logically linked in a tree structure; means for storing information regarding the load on the computer and at least some of the other computers in the network; means for sending said information to the computers to which the computer is logically linked in the tree structure; means for receiving similar information from said computers to which the computer is logically linked, and updating the stored information in accordance therewith; and means for selecting one of the other computers in the network having spare load capacity using said information and transferring tasks to the selected computer.

The computer preferably also includes means for accessing a stored list of all the computers in the network; means for selecting a computer from this list as a candidate neighbouring computer for linking in the tree structure; means for sending a message to the selected computer indicating a link request; means for establishing a logical link with the selected computer by updating the identifying means if a positive response from the selected computer is received; means for receiving messages from other computers indicating that a link is requested; means for determining, on receipt of such a link request message, whether or not there is capacity for establishing a downward link; and means for sending a positive response to the sender of the link request message if there is such capacity and for updating the identifying means accordingly.

There is also provided a network of such computers.

BRIEF DESCRIPTION OF THE DRAWINGS

An embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawing, wherein:

FIGS 1a, 1b, 1c and 1d are flow diagrams showing the request and receive phases of the tree generation;

FIGS 2a, 2b and 2c show stages of the tree generation according to the present invention;

FIGS 3a, 3b and 3c illustrate an example of tree maintenance in the embodiment of the invention; and

FIGS 4a, 4b and 4c illustrate the handling of clustering of nodes in the embodiment of the invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

This embodiment of the invention is composed of two main parts. The