WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Delay-based congestion avoidance in computer networks    
United States Patent5193151   
Link to this pagehttp://www.wikipatents.com/5193151.html
Inventor(s)Jain; Rajendra K. (Sudbury, MA)
AbstractA packet data communication system employs a congestion avoidance method in which each node measures the round-trip delay occurring when it sends data to a destination and receives an acknowledgement. This delay is measured for different load levels, and a comparison of these delays is used to determine whether to increase or decrease the load level. The load level can be adjusted by adjusting the window size (number of packets sent in to the network) or by adjusting the packet rate (packets per unit time). The objective is operation at the knee in the throughput vs. traffic curve, so that the data throughput is high and the round trip delay is low. Control is accomplished at each node individually, without intervention by the router or server, so system overhead is not increased.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Inventor     Jain; Rajendra K. (Sudbury, MA)
Owner/Assignee     Digital Equipment Corporation (Maynard, MA)
Patent assignment
All assignments
Publication Date     March 9, 1993
Application Number     07/400,858
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     August 30, 1989
US Classification    
Int'l Classification    
Examiner     Shaw; Gareth D.
Assistant Examiner     Chaki; Kakali
Attorney/Law Firm     Arnold, White & Durkee
Address
Parent Case    
Priority Data    
USPTO Field of Search    
Patent Tags     delay-based congestion avoidance computer networks
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5042029
Hayakawa
370/231
Aug,1991

[0 after 0 votes]
4962498
May, Jr.
370/474
Oct,1990

[0 after 0 votes]
4779267
Limb
370/232
Oct,1988

[0 after 0 votes]
4771391
Blasbalg
709/232
Sep,1988

[0 after 0 votes]
4769815
Hinch
370/236
Sep,1988

[0 after 0 votes]
4736369
Barzilai
370/231
Apr,1988

[0 after 0 votes]
4617657
Drynan
370/394
Oct,1986

[0 after 0 votes]
4569042
Larson
370/250
Feb,1986

[0 after 0 votes]
4551833
Turner
370/236
Nov,1985

[0 after 0 votes]
4475192
Fernow
370/232
Oct,1984

[0 after 0 votes]
4404557
Grow
370/455
Sep,1983

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. For use in a data communication network having a plurality of nodes, including a first node including a processor characterized by a changeable parameter which affects the level of loading imposed on the network by the transmission of data by said first node, the plurality of nodes also including a destination node, a method of congestion avoidance comprising the steps of:

a) sending a first group of data from the first node to said destination node, loading of said network during travel of said first group of data from said first node to said destination node being at a first loading value which represents the loading imposed on the network when said first group is sent by the first node;

b) receiving at the first node an acknowledgement that at least some of the data in the first group was received by the destination node;

c) measuring by said processor of said first node a first time delay which is the time delay between the sending of the first group of data and the receiving of the corresponding acknowledgement;

d) sending a second group of data from the first node to the destination node, loading of said network during travel of said second group of data from said first node to said destination node being at a second loading value which represents the loading imposed when the second group is sent by the first node;

e) receiving at the first node an acknowledgement that at least some of the data in the second group was received by the destination node;

f) measuring by said processor of said first node a second time delay which is the time delay between the sending of the second group and the receiving of the corresponding acknowledgement;

g) computing by said processor of said first node a quantity which is a function of the ratio between (1) the relative difference between the first and second time delays, and (2) the relative difference between the first and second loading values; and

h) automatically changing by said processor of said first node said changeable parameter so as to increase the loading from any data to be subsequently sent by the first node if the said quantity is less than a predetermined value, and changing at said first node said changeable parameter so as to decrease such loading if said quantity is greater than said predetermined value.

2. A method according to claim 1, wherein the step of changing includes changing a parameter consisting of the window size said window size being equal to the maximum number of data packets the first node may transmit at a time.

3. A method according to claim 2, wherein the first and second loading values are the respective window sizes used by the first node when sending data immediately prior to sending the first and second groups, respectively.

4. A method according to claim 1, wherein the step of changing includes changing a parameter consisting of the bit rate at which the first node may transmit data.

5. A method according to claim 4, wherein the first and second loading values are the respective bit rates at which the first node sent data immediately prior to sending the first and second groups, respectively.

6. A method according to claim 1, wherein the step of changing includes changing a parameter consisting of the maximum packet rate at which the first node may transmit data.

7. A method according to claim 6, wherein the first and second loading values are the respective packet rates at which the first node sent data immediately prior to sending the first and second groups, respectively.

8. A method of operating a computer network, said computer network including a source node and a destination node, wherein said source node includes a processor, comprising the steps of:

a) sending first data from said source node to said destination node by said network at a first loading level and receiving at said source node a first acknowledgement from said destination node of said network, said first acknowledgement confirming receipt of at least part of said first data; and measuring by said processor a first delay between said sending said first data and said receiving said first acknowledgement;

b) measuring by said processor in said source node a second delay between sending second data from said source node to said destination node and receiving at said source node a second acknowledgement from said destination node, said second acknowledgement confirming receipt of at least part said second data, said second data being sent by said source node with a second loading level different from said first loading level;

c) automatically changing the loading level by said processor of said source node, in response to a function of the difference in said measured first and second delays, said step of changing including decreasing from said second loading level if said second delay is greater than said first delay and increasing from said second loading level if said second delay is less than said first delay.

9. A method according to claim 8 wherein said step of changing the loading includes one of the steps of (a) changing the number of packets of data sent by said source and (b) changing the rate of packets sent by said source; said number of packets being termed the "window size".

10. A method according to claim 8 wherein said step of sending a first group of sata includes transmitting a number of packets of data.

11. A method according to claim 10 wherein said step of decreasing the loading includes decreasing the number of packets and said step of increasing the loading includes increasing the number of packets.

12. A method according to claim 11 wherein said step of changing the changeable parameter is also in response to the measured delays as a function of the relative change in the window sizes.

13. A method of controlling a source node in a data communications network, said source node including a processor, comprising the steps of:

a) sending first data from said source node of said network to a destination node in said network, using a selected data loading;

b) receiving at said source node first acknowledgement indicating receipt of at least part of said first data by said destination node;

c) measuring by said source node the delay time between said step of sending the first data and said step of receiving the first acknowledgement;

d) sending second data from said source node of said network to said destination node, using a different data loading from said selected data loading;

e) receiving at said source node a second acknowledgement indicating receipt of at least part of said second data by said destination node;

f) measuring said source node the delay time between said step of sending the second data and said step of receiving the second acknowledgement;

g) comparing by said processor at said source node said first and second measured delay times to produce a difference value;

h) and, at said source node, in response to a function of said difference value, changing the loading by increasing the data loading for further data sent by said source node to said destination node if said second delay time is less than said first delay time and by decreasing the data loading for further data sent by said source node to said destination node if said second delay time is greater than said first delay time.

14. A method according to claim 13 wherein said steps of sending first and second data includes sending data in a plurality of separate packets.

15. A method according to claim 14 wherein said step of decreasing said data loading is by reducing, by an integral number of packets, the size of window during which packets of data are sent, and wherein said step of increasing said data loading is by increasing, by an integral number of packets, the size of window during which packets of data are sent.

16. A method according to claim 13 wherein said data communications network includes a plurality of nodes and said steps are performed independently at each of said nodes.

17. A method according to claim 16 including the step of connecting said nodes to a router by serial links.

18. A method according to claim 13 wherein said steps of increasing and decreasing said data loading include varying said data loading above and below an operating point for said network to provide maximum throughput at minimum delay time.

19. A method of operating a packet data communications network having a plurality of nodes said nodes including a source node and a destination node, said source node including a processor, comprising the steps of:

a) sending first data from said source node of said network to said destination node of said network, using a selected first level of loading for said first data, said first data including a number of packets;

b) receiving at said source node a first acknowledgement of receipt of at least part of said first data by said destination node;

c) measuring and recording by said processor at said source node a first delay time between said step of sending said first data and said step of receiving said first acknowledgement;

d) sending second data from said source node of said network to said destination node, using a second level of loading for said step of sending second data different from said selected first level of loading, said second data including a number of packets;

e) receiving at said source node a second acknowledgement of receipt of at least part of said second data by said destination node;

f) measuring by said processor at said source node a second delay time between said step of sending said second data and said step of receiving said second acknowledgement, and comparing said recorded first delay time with the second delay time, to determine a change in delay time; and

g) subsequently sending from said source node to said destination node third data at a loading greater than said second level if said change is a negative value and at a loading less than said second level if said change is a positive value, to approach an operating point of maximum throughput at minimum delay time.

20. A method according to claim 19 wherein said steps of sending said first data, said second data, and said further data each includes sending data in a plurality of separate packets.

21. A method according to claim 20 wherein said step of sending said third data includes changing, by an integral number of packets, the size of window during which packets of data are sent.

22. Apparatus for operating a computer network, wherein said network includes a source node and a destination node, comprising: means including a processor at said source node for measuring a first delay between sending first data from said source node to said destination node in said network at a first loading value and receiving at said source node a first acknowledgement from said destination node in said network indicating receipt of at least part of said first data, and means including said processor at said source node for subsequently measuring a second delay between sending second data from said source node to said destination node in said network at a second loading value different from said first loading value and receiving at said source node a second acknowledgement from said destination node in said network indicating receipt of at least part of said second data; and means including said processor at said source node for thereafter changing the loading by said source for subsequent data sent by said source node to said destination node by varying the loading from said second loading value toward another loading value in response to the difference between said measured first and second delays, the loading being decreased if said difference is positive and increased if said difference is negative.

23. Apparatus according to claim 22 wherein said source node includes means for sending said first data as a number of packets of data, and said means for changing the loading changes said number of packets of data sent by said source node.

24. Apparatus for controlling a first node in a data communications network, comprising: transmitter/receiver means including a processor in said first node and having sending means for sending first data from said first node to a destination node using a selected window size and then subsequently sending second data from said first node to said destination node using a second window size different from said selected window size, said transmitter/receiver means having receiving means for receiving at said first node acknowledgements from said destination node indicating receipt of at least parts of said first and second data after first and second delay times, respectively; control means including said processor in said first node including means for comparing said first delay time between said sending and receiving for said first data with said second delay time between said sending and receiving for said second data; and said transmitter/receiver means including means for subsequent sending of third data using an altered window size in response to said means for comparing, said means for subsequent sending using an altered window size larger than said second window size if said second delay time is less than said first delay time and less than said second window size if said second delay time is greater than said first delay time.

25. Apparatus according to claim 24 wherein said transmitter/receiver means transmits a number of packets of data from said first node of the network through at least one router to said destination node of the network.

26. Apparatus according to claim 24 wherein said sending means sends said first and second data by data packets, and wherein said window sizes each consists of an integral number of data packets.

27. Apparatus according to claim 26 wherein said means for subsequent sending causes said window size to increase when said comparing of the delays shows there is no difference in the delays.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

This invention relates to data communication systems, and more particularly to a delay-based procedure for congestion avoidance in a packet data communication network employing serial data transmission.

Traditional congestion control schemes act to reduce the load on the network when the load on the network becomes excessive and packets start getting lost. They allow the network to recover from congestion. Congestion avoidance mechanisms, on the other hand, are prevention mechanisms which allow the network to operate at the optimum load.

In copending application entitled "Congestion Avoidance in Computer Networks," Ser. No. 184,945, filed Apr. 22, 1988, by Rajendra K. Jain, K. K. Ramakrishnan and Dah-Ming Chiu, assigned to Digital Equipment Corporation, assignee of this invention, there is disclosed a congestion avoidance scheme for computer networks in which each router in a network seeks to constrain the total traffic it handles, i.e., its load, to assure that the load is within a region of optimum performance for that router. To accomplish this control, the router generates feedback to the nodes by means of bits in the packets being received by those nodes, to indicate the overload condition. While this system is effective in the function of congestion avoidance, nevertheless the requirement that certain feedback bits be added to the packet is not consistent with use of the system in networks where the packet format is fixed and new bits cannot be added. Also, in a heterogeneous network containing nodes and servers of diverse types, the bits may be meaningless to some equipment. In addition, the method adds overhead to the router or server function.

Other computer network architectures have schemes for congestion control. For example, the Digital Network Architecture (DNA) uses a timeout-based congestion control as disclosed in IEEE Journal on Selected Areas in Communications, October 1986, pp. 1162-1167, and square root buffer limiting as disclosed in IEEE Trans. on Communications, March 1978, pp. 328-337. IBM's System Networking Architecture (SNA) uses congestion bits called a "change window indicator" (CWI) and a "reset window indicator" (RWI) in packets flowing in the reverse direction to ask source nodes to reduce the load during congestion, as disclosed in IBM Systems Journal, Vol. 18, No. 2, 1979, pp. 298-314. In general, other congestion schemes consist of a feedback signal from the network to the users (in the form of timeout, bits, or messages) and a load-control mechanism exercised by the users (reduced window rate).

These prior schemes are all congestion control schemes. Also, they have been dedicated to a specific network system design and protocol, and are not applicable to other types of network architectures. In a network consisting of heterogeneous subnetworks, the congestion feedback from one subnetwork usually would have no meaning to sources on other subnetworks.

It is the principal object of this invention to provide a congestion avoidance procedure for computer networks which is operable in heterogeneous networks, i.e., networks in which subnetworks of differing architecture, e.g., SNA, TCP/IP, ISO/OSI, DNA, etc., may be present. Another object is to provide a congestion avoidance scheme which does not add overhead to the network itself, i.e., to the packet-forwarding routers or servers, and one which does not add overhead to the packet in the form of additional bits for feedback. A further object is to provide a congestion avoidance scheme which does not itself inject additional packets into the network, thereby adding to the traffic load. An additional object is to provide a congestion avoidance scheme which is able to adapt to changing network configurations and traffic, adapting dynamically to a moving target of optimum loading.

SUMMARY OF THE INVENTION

In accordance with one embodiment of the invention, a packet data communication system employs a congestion avoidance method in which each node measures the round-trip delay occurring when it sends data to a destination and receives an acknowledgement. This delay is measured for different load levels, and a comparison of these delays is used to determine whether to increase or decrease the load level. The load level can be changed by adjusting a network tuning parameter such as the window size (number of packets sent in to the network) or the packet rate (packets per unit time). The objective is operation at the knee in the throughput vs. traffic curve, so that the data throughput is high and the round trip delay is low. Control is accomplished at each node individually, without intervention by the router or server, so system overhead is not increased.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as other features and advantages thereof, will best be understood by reference to a detailed description of a specific embodiment which follows, when read in conjunction with the accompanying drawings, wherein:

FIG. 1 is an electrical diagram in block form of a data communication system which may use features of the invention;

FIGS. 2a-2c are graphic representations of throughput, time delay and power as a function of load (number of packets sent) in a network such as that of FIG. 1;

FIG. 3 is a space-time diagram showing events vs. time for packet data transmission and reception in a system of FIG. 1;

FIG. 4 is a graphic representation of delay vs. window size for a network according to one embodiment;

FIG. 5 is a graphic representation of window size vs. time for a node employing the procedure of the invention, according to one embodiment;

FIG. 6 is a graphic representation of window size vs. time similar to FIG. 5, for a node employing the procedure of the invention, according to another example;

FIGS. 7-10 are graphic representations of window size vs. time similar to FIGS. 5 and 6, for a node employing the procedure of the invention, according to other examples.

FIG. 11 is an electrical diagram in block form of one of the adapters 11 used in the computer interconnect system of FIG. 1; and

FIG. 12 is a diagram of one embodiment of a packet format which may be used in the computer interconnect system of FIG. 1.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENT

Referring to FIG. 1, a packet-type data communication system for transmitting and receiving data packets is illustrated according to one embodiment which may implement features of the invention. A number of users or nodes 10a-10n are connected to a router or server 11 by links 12. The links 12 may be point-to-point communication links or a shared communication medium such as a token ring or Ethernet LAN. Each one of the nodes 10a-10n is a CPU or similar processor-type device which is capable of generating and receiving message packets. The nodes 10 could be disk controllers, high speed printer facilities, or other resources of this type, as well as high-performance data processors. The router 11 is coupled by a link 13 to a router or server 14, to which a number of nodes 15a-15n are connected by links 16. Similarly, the router 14 is coupled by a link 17 to a router 18 which is connected by links 19 to a number of nodes 20a-20n. There may be many more of such routers in the complete network. Each of the routers 11, 14 and 18 may have up to perhaps many dozens of nodes coupled to it, and there may be many of the routers, so the network may contain thousands of nodes. The links 13 and 17 may include wire links in the same building or between buildings, or may include long-distance fibre optic links, as well as satellite links.

The network may consist of several subnetworks each of which may follow a different protocol. For example, the three routers shown in FIG. 1 may be parts of three different subnetworks following DNA, SNA, and TCP/IP, protocols, respectively.

One of the nodes in the system of FIG. 1 would often have rather large blocks of data to send to another node, perhaps several megabytes, but using the packets of typical format this data must be split into many packets of perhaps 512-bytes each. Instead of sending hundreds or thousands of the packets in one burst, however, a node must follow the priority and fairness criteria established for the network.

The routers contain buffers to receive packets from a node and hold them for forwarding towards the destination when a path is free. These buffers act as first-in-first-out memories. When the buffers are full, however, an incoming packet is merely discarded, and the sender will have to resend it. Buffers also exist at the receiving node (destination).

According to one method of flow control, depending upon the availability of buffers, the destination node allows the source node to send a number of packets, called a window; if this window exceeds the store-and-forward capacity of these buffers in the routers, then a source would send packets needlessly when they were being discarded. Buffer overflow may occur in routers even if it does not occur in the destination.

Another alternative for limiting the load on the network is via packet rates. In this method, the destination permits the source to send packets at a rate of a certain number of packets per unit of time. In this "rate-based" scheme, if the rate permitted by the destination exceeds the capacity of the intermediate routers, too many packets may queue up at the router and eventually the buffers may become full, leading to packet loss.

The limiting rate in a rate-based scheme, can also be specified in terms of bits per second or bytes per second instead of packets per unit of time.

In addition to the window size or packet rate parameters, a network protocol generally has other parameters, referred to as network tuning parameters, which may be adjusted to optimize network performance. Generally, these tuning parameters also affect network loading.

Referring to FIGS. 2a-2c, the performance of a network as seen in FIG. 1 is illustrated as a function of the load, where the load is a measure of the number of packets per unit of time entered by the nodes to the network. In FIG. 2a, it is seen that the throughput (total number of packets sent by any node and received by the destination node) increases linearly with load when the load is low, then when load level 35 is reached a "knee" of the curve occurs, after which an increased load does not increase the throughput significantly but instead additional messages sent will result in unnecessary queueing and increased delay. In FIG. 2b, the delay per packet is plotted as a function of load; below the knee 35 the delay is essentially constant, but above the knee the delay increases with increasing load. At a load level 36, referred to as the "cliff", the capacity of the network is reached, in that additional packets sent by source nodes will not result in any packets getting through, so the throughput approaches zero and the delay approaches infinity, as seen in FIGS. 2a and 2b. It is desirable that a network operate at the knee 35, rather than at the cliff 36. Congestion control schemes attempt to keep the network load to the left of the cliff while congestion avoidance schemes try to maintain the load at the optimum knee level 35. There are several ways to define the knee. One definition is in terms of "network" which is defined as the ratio of network throughput to the round trip delay. Sometimes an exponent .alpha. is used and the power is defined as follows: ##EQU1## The knee is defined as the load at which the power is maximum as shown in FIG. 2c. By choosing the parameter .alpha. greater than one, the network designers can choose to operate the network at a relatively higher delay. Similarly, by choosing .alpha. less than one, the designers can choose to operate at a relatively higher throughput.

An analysis of a method of calculating the optimum window size for operating at the knee will be first made using the assumption that there are no other nodes on the network sending packets, so the only source of delay or limitation in throughput is the capacity of the routers to handle traffic, i.e., how long a queue can be built up before reaching the knee. The following analysis uses these symbols:

W=Window=Number of packets in the network

T=Throughput in packets per unit of time

D=Round-trip delay

P=Power=T.sup..alpha. /D

.alpha.=Exponent used in defining Power

The round-trip delay D and the throughput T are both functions of the window W:

The power is defined as the ratio of throughput and delay: ##EQU2## Here, .alpha. is a parameter chosen by the network designers. From the definition of P, the following can be expressed:

At the point of maximum power, i.e., at the knee: ##EQU3## Thus, at the knee, the relative (percentage) increase in delay is .alpha. times the relative increase in throughput. If .alpha.=1, the percentage increase in delay is equal to the percentage increase in throughput at the knee. Before the knee: ##EQU4## the relative increase in delay is smaller than the relative gain in throughput. After the knee: ##EQU5## the relative increase in delay is smaller than the relative gain in throughput.

To obtain a higher relative increase in delay at the knee, then the system provides .alpha.>1. Similarly, a system with .alpha.>1 provides higher relative increase in throughput at the knee.

For a window flow controlled network, at a given node the throughput T is W packets per round-trip delay, or ##EQU6## and therefore,

At the knee (i.e., where dP=0): ##EQU7## By solving the above condition for W, we get the optimal window size W as: ##EQU8## The above equality holds exactly only if the derivative dD/dW and the delay D are measured at W=W. At other values of W, the equation can be used to find the relative value of W with respect to W, that is, to find if W is less (or more) than W.

The analysis so far is valid for all networks or resources since no assumptions are made about the behavior of the internal components of the network, deterministic or probabilistic distributions of service times, or linear or nonlinear behavior of the delay versus window curve.

If there are no other users on the network, the above analysis provides a way for one node (one user) to determine the knee using the measured delay D and the gradient dD/dW of the delay-window curve.

The value of W as calculated using the above equation gives the optimal direction for window adjustment. If the current window is less than W, then the window should be increased. Similarly, if the current window W is less than W, the window size should be decreased. The exact difference between W and W may or may not be meaningful; for example, if the gradient dD/dW is zero at a particular W, then W is infinite indicating that W should be increased, but this should not be interpreted to mean that the path has infinite knee capacity. At different values of window W, the calculated W may be different, but in each case it points in the right direction. In other words, only the sign, and not the magnitude, of the difference (W-W), is meaningful.

One way to determine the correct direction of window adjustment is to use the normalized delay gradient (NDG) which is defined as the ratio: ##EQU9## If the load is low, NDG is low, or, if the load is high, NDG is high. At the knee, NDG is given by: ##EQU10## If the parameter .alpha.=1, the NDG at knee is one-half. Thus, by calculating NDG, the decision of whether to increase or decrease the window size is apparent.

For multiuser systems, the above equation for W is no longer straightforward. There are two optimal operating points, one being "social" and the other being "selfish".

Given n users (nodes) sharing a single path, the system throughput T is a function of the sum of the windows of all n users: ##EQU11## Here, W.sub.i is the window of the i.sup.th user, and D is the common delay experienced by each of the n users. The system power is defined on the basis of system throughput: ##EQU12## The point of maximum system power is given by a set of n equations like the following: ##EQU13## The optimal operating point so obtained is called the social optimum. Note that the delay D and the partial derivatives .differential.D/.differential.W.sub.i in the above equation should be evaluated at W.sub.i =W.sub.i.

Each individual user's power P.sub.i is based on the user's throughput T.sub.i and is given by ##EQU14## The user's power is maximum when: ##EQU15## The operating point so obtained is called a selfish optimum. It is clear by examining the above equations for W.sub.i that W.sub.i obtained by the selfish optimum is not the same as that obtained by the social optimum. They may not indicate that the node should change its window size in the same direction. The two values are equal if .SIGMA..sub.j.noteq.i W.sub.j =0, that is, if there is only one user on the network; for such a case, either equation for W.sub.i can be used to determine the direction of window adjustment.

Social considerations would lead a conscientious user to lower the window size as other users increase their traffic, while selfish considerations would lead a user to increase its window size as other users increase their traffic. This behavior is not only mathematically true according to the above relationships for W.sub.i, but it is also intuitively true; it is observed that persons start hoarding a resource and increase their apparent demand for it if the resource begins to exhibit a condition of short supply. In congestion avoidance, the aim is attaining social optimum, because selfish optimum leads to a race condition in which each user tries to maximize its power at the cost of that of the others, and the windows keep increasing without bound.

An examination of the equation above for W.sub.i for social optimum shows that at a given node it is necessary to know the windows of all the other users in order to determine the social optimum window at this node. A congestion avoidance method requiring each node to send a packet to all of the other nodes to inform the other users of the window size being used would cause too much overhead in added traffic to be acceptable. Fortunately, there is a special case in which knowledge of other users' windows is not required to achieve the social optimum. This special case involves deterministic networks.

A deterministic computer network is one in which the packet service time (the time to forward a packet) at the servers or routers is not a random variable. The service time per packet at different servers may be different, but at each server the time is fixed. Analytically, such networks can be modeled by a closed queueing network of m D/D/1 servers, where m is the number of queues that the packets and their acknowledgements pass through in one round trip through the network. For such networks the delay versus window curve consists of two straight line segments meeting at the knee. Before the knee, the delay is constant: ##EQU16## where, t.sub.i is the service time of the i.sup.th server. After the knee, the delay increases linearly:

where t.sub.b is the service time of the bottleneck server, i.e., ##EQU17## Fixed delay servers such as satellite links are not included in the maximum determination but are included in the summation. The two equations for delay above can be combined into one: ##EQU18## The power is maximum at the knee, where: ##EQU19## This equation for optimal window size allows calculation of the knee capacity of the path ##EQU20##

The congestion avoidance procedure executed at a node need to answer the following three questions: (1) Whether to increase or decrease the size of the window? This is the "decision function". (2) How large a change in window size should be made? This is the "increase/decrease" algorithm. (3) How often should a change be made in the window size? This is referred to as "decision frequency". These three components together form what is called user policy. The delay-based procedure according to the invention has no network policy since the network does not explicitly participate in the congestion avoidance method as will be described, although other congestion avoidance methods such as that of above-referenced copending application may be used at the same time.

The decision function of the procedure determines the direction of window adjustment at a node. The normalized delay gradient NDG can be used as the decision function. For deterministic networks, NDG is zero to the left of the knee. Given round-trip delays D and D.sub.old at windows W and W.sub.old respectively, the decision function consists of checking simply if NDG is zero. This algorithm is described below: ##EQU21##

In the above algorithm, .delta. is the threshold value chosen by the network designers. The threshold value of .alpha./(1+.alpha.), 1/2, zero, etc. may be used depending upon the desired operating point. W.sub.min and W.sub.max are the lower and upper bounds on the window. The upper bound is set equal to the flow control window permitted by the receiving node based on its local buffer availability considerations. The lower bound is greater or equal to one since the window cannot be reduced to zero.

By setting W.sub.min =W.sub.max, the window adjustment can be disabled. Note the window must either increase or decrease at every decision point. It cannot remain constant (except when the procedure has been disabled by setting W.sub.min =W.sub.max). This is necessary since the network load is constantly changing, and it is important to ensure that changes in gradient, if any, are detected as soon as possible. Also note that instead of checking whether the change in delay (D-D.sub.old) is zero, the comparison is whether NDG is zero. These two conditions may be equivalent but the latter is preferred since NDG is a dimensionless quantity and its value remains the same regardless of whether delays are measured in nanoseconds, milliseconds, seconds, or any other time units. The difference in delay could be made to look arbitrarily small (or large) by appropriate manipulation of its units, but NDG is not susceptible to such manipulations.

The procedure uses additive increase and multiplicative decrease algorithms which are the simplest alternatives leading to fairness and convergence for multiple users starting at arbitrary window values. Thus, if the window has to be increased, we do so additively:

For a decrease, window is multiplied by a factor less than one:

The parameters .DELTA.W and c affect the amplitude and frequency of oscillations when the system operating point approaches the knee. Recommended values of these two parameters are .DELTA.W=1 and c=0.875.

The choice of additive increase and multiplicative decrease is for the following reasons. If the network is operating below the knee, all users increase their window size equally, but, if the network is congested, the multiplicative decrease makes users with larger windows reduce more than those with smaller windows, making the allocation more fair. Note that 0.875=1-2.sup.-3. Thus, the multiplication can be performed without floating point hardware, and by simple logical shift instructions. The recommended values of the increase/decrease parameters lead to small oscillations and are easy to implement. The computations should be rounded to the nearest integer. Truncation, instead of rounding, results in a slightly lower fairness.

The "decision frequency" component of the procedure helps decide how often to change the window size. Changing too often leads to unnecessary oscillations, whereas changing infrequently leads to a system that takes too long to adapt. According to general system control theory, the optimal control frequency depends upon the feedback delay--the time between applying a control (change window size) and getting feedback from the network corresponding to this control. In a computer network such as that of FIG. 1, it takes one round-trip delay to affect the control, that is, for the new window to take effect, and another round-trip delay to get the resulting change fed back from the network to the node which made the change. The operation of the congestion avoidance system is illustrated in FIG. 3, which depicts the flow of data packets and acknowledgements over time. In FIG. 3, prior to time t=0 the window size W is W.sub.0 (in the illustration, W.sub.0 =2 packets), and at t=0 the window size is changed to W.sub.1 (in the illustration, W.sub.1 =3 packets). Beginning at t= 0, three packets are sent, and beginning at t=D.sub.0 the acknowledgements for these three packets begin to arrive at the source node. At time t=D.sub.0 +D.sub.1 the acknowledgements for the three packets sent beginning at time t=D.sub.0 start to arrive.

The delay experienced by a packet is a function of the window size used before the packet is sent. The delay D.sub.0 is a function of W.sub.0, and the delay D.sub.1 is a function of W.sub.1. This, therefore, leads to the conclusion that windows be adjusted once every two round-trip delays (two window turns) and that only the feedback signals received in the most recent cycle be used in window adjustment.

In the procedure as outlined above, alternate delay measurements are discarded. This leads to a slight loss of information which can be avoided by a simple modification. The delay experienced by every packet is a function of the number of packets already in the network. This number is normally equal to the current window except at the point of window change. If for those packets whose sending times are recorded for round-trip delay measurements, there is also recorded the number W.sub.out of packets outstanding (packets sent but not acknowledged) at the time of sending, the delay D and the number W.sub.out have a one-to-one correspondence. Any two {W.sub.out, D} pairs can thus be used to compute NDG. This modification allows updating the window every round-trip delay. The increased information results in a faster response to the network changes.

With regard to initialization, the procedure does not set any requirements on the window values to be used when a node starts a new connection through the network. Transports can start the connections at any window value and the procedure will eventually bring the load to the knee level. Nonetheless, starting at the minimum window value is recommended as this causes minimal affect on other users that may already be using the network.

A simulation model was used to demonstrate the performance of various delay-based congestion avoidance alternatives. This same model was used earlier for evaluation of a timeout-based congestion control method as described in IEEE Journal on Selected Areas in Communications, October 1986, pp. 1162-1167 and of the binary feedback congestion avoidance method of the above-mentioned copending application which is also described in Proc. ACM Sigcomm '88, Stanford, CA, August 1988. The model allows simulation of a general computer network with several terrestrial and satellite links, with any reasonable number of users, intermediate systems, and links.

Referring to FIGS. 4 and 5, a network configuration was simulated with a satellite link and with several terrestrial links; that is, the network was similar to that of FIG. 1, with one of the links 13, 17, etc., a satellite link and others wire or fibre optics links. Most large networks now in use generally consist of several wide area networks (WANs) and local area networks (LANs) connected together via satellite links, together forming a "very large area network" or VLAN. The queueing model of the simulated network consists of four servers with deterministic service times of 2, 5, 3, and 4 units of time, and the satellite link is represented by a fixed (regardless of window) delay of 62.5 units of time. All service times are relative to source service time which therefore has a service time of 1. For this network, the bottleneck server's service time t.sub.b 5, and .SIGMA.t.sub.i =77.5. If the total number of packets in this network is W, the delay D is given by:

The knee of the delay curve as seen in FIG. 4 is at W.sub.knee =77.5/5=15.5.

A plot of window size as a function of time, as obtained from simulation using the sample procedure, is shown using a solid line curve in FIG. 5, where it is seen that within sixteen window adjustments, the window reaches the optimal value (shown using a broken line curve) and then oscillates between the values of twelve and sixteen. Every fourth cycle, the window curve takes an upturn at thirteen (rather than at twelve) because window values are maintain