WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Dual counter consistency control for fault tolerant network file servers    
United States Patent5713017   
Link to this pagehttp://www.wikipatents.com/5713017.html
Inventor(s)Lin; Dah-Haur David (Austin, TX); Shi; Shaw-Ben (Austin, TX); Wei; Yi-Hsiu (Austin, TX)
AbstractA consistency control method for a fault tolerant file system. Data files in this fault tolerant file system are replicated on file system servers. The update request will be sent to all the file servers. However, since different servers might receive update requests in different order. The "dual-counter" scheme described in this invention is to resolve this problem. To perform an update, the client needs to obtain a sequence number from a sequencer. Two counters will be maintained by each server. The first counter is used primarily by the sequencer to sent out sequence numbers. The second counter is used by the server to keep track of the largest sequence number of requests which have been completed on the server. With the sequence numbers and the first and second counters, updates will be performed in the same order on different file servers.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5713017
Dual counter consistency control for fault tolerant network file servers - US Patent 5713017 Drawing
Dual counter consistency control for fault tolerant network file servers
Inventor     Lin; Dah-Haur David (Austin, TX); Shi; Shaw-Ben (Austin, TX); Wei; Yi-Hsiu (Austin, TX)
Owner/Assignee     International Business Machines Corporation (Armonk, NY)
Patent assignment
All assignments
Publication Date     January 27, 1998
Application Number     08/484,228
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     June 7, 1995
US Classification     707/8 707/9 707/10
Int'l Classification     G06F 017/30
Examiner     Black; Thomas G.
Assistant Examiner     Coby; Frantz
Attorney/Law Firm     LaBaw; Jeffrey S.
Address
Parent Case    
Priority Data    
USPTO Field of Search     395/182.04 395/575 395/200.03 395/200.02 395/650 395/608 395/610 395/606 395/607 395/609 455/601 370/16.1 364/200
Patent Tags     dual counter consistency control fault tolerant network file servers
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5630124
Coyle, Jr.
707/103R
May,1997

[0 after 0 votes]
5561797
Gilles
707/8
Oct,1996

[0 after 0 votes]
5557792
Josten

Sep,1996

[0 after 0 votes]
5544151
Baek
370/223
Aug,1996

[0 after 0 votes]
5513314
Kandasamy
714/6
Apr,1996

[0 after 0 votes]
5499367
Bamford
707/8
Mar,1996

[0 after 0 votes]
5497463
Stein
709/203
Mar,1996

[0 after 0 votes]
5495609
Scott
707/8
Feb,1996

[0 after 0 votes]
5459862
Garliepp
707/8
Oct,1995

[0 after 0 votes]
5448723
Rowett
714/4
Sep,1995

[0 after 0 votes]
5371885
Letwin
707/205
Dec,1994

[0 after 0 votes]
5339408
Bruckert
714/11
Aug,1994

[0 after 0 votes]
5325517
Baker
714/11
Jun,1994

[0 after 0 votes]
5307487
Tavares
707/8
Apr,1994

[0 after 0 votes]
5307490
Davidson
719/328
Apr,1994

[0 after 0 votes]
5247672
Mohan
711/152
Sep,1993

[0 after 0 votes]
5226129
Ooi
712/204
Jul,1993

[0 after 0 votes]
5210871
Lala
710/116
May,1993

[0 after 0 votes]
5191652
Dias
709/251
Mar,1993

[0 after 0 votes]
5123104
Levine
707/1
Jun,1992

[0 after 0 votes]
5084816
Boese

Jan,1992

[0 after 0 votes]
5023942
Goepel
398/173
Jun,1991

[0 after 0 votes]
4945474
Elliott
714/16
Jul,1990

[0 after 0 votes]
4823310
Grand
707/8
Apr,1989

[0 after 0 votes]
4713755
Worley, Jr.
711/123
Dec,1987

[0 after 0 votes]
4630045
Georgiou
340/2.25
Dec,1986

[0 after 0 votes]
4569015
Dolev
709/201
Feb,1986

[0 after 0 votes]
4318182
Bachman
718/105
Mar,1982

[0 after 0 votes]
4112488
Smith, III
709/222
Sep,1978

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


We claim:

1. In a distributed computing environment, a method of managing concurrent updates by a group of server machines coupled together by a network, each server machine having a respective first and second counter, the method comprising the steps of:

in response to a client request from a client machine also coupled to the network, sending a unique sequence number identifying the client request derived from the first counter in a first server machine to the group of server machines and the client machine;

incrementing the first counter in the group of server machines;

in response to a client update request from the client machine including the unique sequence number, checking by each server machine in the group of server machines against its second counter that all other update requests with lower sequence numbers have been received;

in response to missing sequence numbers, requesting from other servers in the group of servers for update requests corresponding to the missing sequence numbers; and

performing the update request if all update requests up to the unique sequence number have been received by the respective server machine.

2. A method for managing a replicated file server including a plurality of servers coupled by a network, comprising the steps of:

sending a sequence number from a counter in a first server to a client requesting a right to perform an update request, the sequence number identifying a particular update request;

sending the sequence number and the update request to the plurality of servers by the requesting client;

checking by a second server in the plurality of servers the sequence number against a counter in the second server that all updates with lower sequence numbers have been received; and

performing the update request if all update requests up to the sequence number have been received by the second server.

3. The method as recited in claim 2 further comprising the steps of:

checking by a third server in the plurality of servers the sequence number against a counter in the third server that all updates with lower sequence numbers have been received;

in response to missing sequence numbers, requesting from the second server for update requests corresponding to the missing sequence numbers: and

performing the update request once all update requests up to the sequence number have been received by the third server.

4. The method as recited in claim 3 wherein each of the servers in the plurality of servers has a respective first and second counter, the first counter of a type used to derive sequence numbers, the second counter of a type used to check a sequence number accompanying an update request.

5. The method as recited in claim 4 wherein a request for a sequence number is sent to the plurality of servers by the requesting client and the method further comprising the steps of:

in response to the reception of the request for a sequence number by a second server, determining whether a sequence number has been sent by the first server;

in response to an apparent failure of the first server, determining whether the first server has failed; and

in response to the failure of the first server, assuming creation of sequence numbers by the second server.

6. The method as recited in claim 2 wherein the update request is multicast to each of the plurality of servers.

7. The method as recited in claim 2 wherein a plurality of clients request a right to perform update requests, a sequence number is sent to each client by the first server and each of the plurality of servers perform update requests in an order in which the sequence numbers were sent.

8. The method as recited in claim 6 further comprising the steps of:

setting a timer in response to receiving the request for the right to perform an update request;

in response to an apparent failure of the requesting client to send the sequence number and the update request to the plurality of servers within a period of time, checking whether any of the plurality of servers received the sequence number and update request; and

in response to nonreception by the plurality of servers, cancelling the sequence number in the plurality of servers.

9. A computer memory for storing a set of computer readable instructions for managing a replicated file server including a plurality of servers coupled by a network, comprising:

means for requesting a sequence number from a counter in a first server by a client, the sequence number for identifying an update request to be made by the client;

means for sending the sequence number and the update request to the plurality of servers by the requesting client;

means for checking by a second server in the plurality of servers the sequence number against a counter in the second server that all update requests with lower sequence numbers have been received; and

means for performing the update request if all update requests up to the sequence number have been received by the second server.

10. The memory as recited in claim 9 further comprising:

means for checking by a third server in the plurality of servers the sequence number against a counter in the third server that all updates with lower sequence numbers have been received;

means responsive to missing sequence numbers for requesting from the second server for update requests corresponding to the missing sequence numbers: and

means for performing the update request once all update requests up to the sequence number have been received by the third server.

11. The memory as recited in claim 10 further comprising:

means responsive to reception of a request for a sequence number for a second server for determining whether a sequence number has been sent by the first server;

means responsive to an apparent failure of the first server for determining whether the first server has failed; and

means responsive to the failure of the first server for assuming creation of sequence numbers by the second server.

12. The memory as recited in claim 11 further comprising:

means for setting a timer in response to receiving the request for a sequence number;

means responsive to an apparent failure of the requesting client to send the sequence number and the update request to the plurality of servers within a period of time, checking whether any of the plurality of servers received the sequence number and update request; and

means responsive to nonreception by the plurality of servers for cancelling the sequence number in the plurality of servers.

13. A replicated file server comprising:

a network;

a plurality of file servers coupled to the network, each file server having a respective first and second counter, the first counter in a first server for generating sequence numbers which entitle requesting clients to perform update requests, the second counter in the plurality of servers for sequencing update requests from the requesting clients; and

a plurality of clients coupled to the network for making update requests to the plurality of servers.

14. The replicated file server recited in claim 13 further comprising means for multicasting update requests to the plurality of servers.

15. The replicated file server recited in claim 14 further comprising:

means for multicasting requests for sequence numbers to the plurality of servers by the requesting clients;

means for multicasting a particular sequence number and a corresponding update request to the plurality of servers by the requesting client;

means for checking by each server in the plurality of servers the particular sequence number against a respective second counter that all update requests corresponding to lower sequence numbers have been received; and

means for performing the update request corresponding to the particular sequence number if all update requests up to the particular sequence number have been received by the respective server.

16. The replicated file server as recited in claim 15 further comprising:

means responsive to missing sequence numbers for requesting for update requests corresponding to the missing sequence numbers: and

means for performing all update requests corresponding to the missing sequence numbers.

17. The replicated file server as recited in claim 13 further comprising:

means responsive to reception of a request for a sequence number for a second server for determining whether a sequence number has been sent by the first server;

means responsive to an apparent failure of the first server for determining whether the first server has failed; and

means responsive to the failure of the first server for assuming creation of sequence numbers by the second server.

18. The replicated file server as recited in claim 13 further comprising:

means for setting a timer in response to receiving the request for a sequence number;

means responsive to an apparent failure of a requesting client to send the sequence number and the update request to the plurality of servers within a period of time, checking whether any of the plurality of servers received the sequence number and update request; and

means responsive to nonreception by the plurality of servers for cancelling the sequence number in the plurality of servers.
 Description Submit all comments and votes
 


DESCRIPTION

BACKGROUND OF THE INVENTION

This invention relates generally to distributed computing environments. More particularly, it relates to consistency control for a fault tolerant replicated file system.

A basic technique for improving reliability of a file system is to mask the effects of failures through replication. There are two major approaches of building a fault tolerant file system: hardware replication approaches and software replication approaches. HA-NFS is representative of the hardware replication approach. HA-NFS uses dual-ported disks which are accessible to a server and a backup. Each HA-NFS server is assigned a backup that can access the server's disk if the server fails. The server stores information about its volatile state on a disk log. If the server fails, the backup will reconstruct the server's lost state using the log. For further details on HA-NFS, the user is referred to "A Highly Available Network File Server", A. Birdie et el. in USENIX Conference Proceedings, pp. 199-205, Jan. 1991.

However, in a distributed environment, it is not always necessary to use special hardware for improved reliability. The computers connected by the network are a natural resource of duplicates. The software replication approach replicates file systems on workstations in the network. If the primary file system crashes, one of the available file systems will take over and provide the data access services. Consistency control protocols are implemented to ensure the consistency among file system replicas. Several existing file systems can be categorized as software approaches. The advantages of a software replication approach are:

1. Flexibility. A replicated file system can reside on any workstation in the network.

2. Fault tolerance. Tolerate both hardware and software failures. Since each workstation has its own hardware devices and operating system, the failure of one workstation does not affect the operation of others.

3. Performance. Better performance can be achieved through distributing file read operations among file system replicas.

Two known software replication approaches include RFNS and Deceit. RNFS achieves fault-tolerance by replicating the server and by using an automatic broadcast protocol for communication. The client sends its requests to an intermediary (an agent) that in turn broadcasts to all of the servers. To the client, the agent appears to be a file server that has an exceptionally reliable secondary storage. The advantage of this approach is that the clients can stay unchanged. However, all RNFS requests have to pass through the agent. For each RPC to the server, an additional RPC is required for this extra level of indirection. Another problem with the agent is that it is a critical process. If the agent fails, all files are not accessible. RNFS introduces a mechanism to replicate the agents and control the data consistency among the agents. The reader is referred to K. Marzullo and F. Schmuck "Supplying high availability with a standard network file system", IEEE Eighth International Conference on Distributed Computing, June 1988, pages 447-453 for more details on RNFS.

Similar to RNFS, Deceit uses an intermediary, the Deceit server to catch the requests from the clients and forward them to a group of replicated servers. Deceit provides a wide range of protocol "control knobs" to allow trade-offs among performance, availability, concurrency and consistency. Thus, users that do not require certain properties on some files are not forced to pay the corresponding overhead. Deceit is built on the ISIS Distributed Programming Environment, which provides reliable process groups, communication primitives and failure detection. More details on Deceit can be found in A. Siegel, K. Birman and K. Marzullo. "Deceit: A flexible distributed file systems" In Summer 1990 USENIX Conference, pages 51-61, Anaheim, Calif., June 1990. ISIS is described in greater detail in K. Birman and R. Cooper "The ISIS project: real experience with a fault tolerant programming system", Oper. Sys. Rev. vol. 25 No. 2 Apr. 1991 pages 103-107.

The invention of the fault tolerant network file system discussed below is an example of the software replication approaches.

SUMMARY OF THE INVENTION

Therefore, it is an object of the invention to synchronize multiple updates to a replicated file.

It is another object of the invention that failures and recovery be transparent to the users of the file system.

It is another object of the invention to allow updates to replicate file servers even though one of the file servers has failed.

It is another object of the invention to support heterogenous hardware platforms.

It is another object of the invention to minimize the performance impact of adding fault tolerance to a distributed file system.

It is another object of the invention to support heterogeneous file systems.

These and other objects are accomplished by a consistency control method for a fault tolerant file system. Data files in this fault tolerant file system are replicated on a group of file system servers. The update request will be sent to all the file servers. However, since different servers might receive update requests in a different order, each server is equipped with the "dual-counter" mechanism described in this invention. To perform an update, the client needs to obtain a sequence number from a sequencer, typically a designated server of the group of file servers. Two counters will be maintained by each server. The first counter is used primarily by the sequencer to send out sequence numbers. The second counter is used by each server to keep track of the largest sequence number of requests which have been completed on the server. With the sequence numbers and the first and second counters, updates will be performed in the same order on different file servers.

BRIEF DESCRIPTION OF THE DRAWINGS

These objects, features and advantages will be more readily understood with reference to the attached figures and following description.

FIG. 1 depicts a computer system configured according to the teachings of the present invention.

FIG. 2 depicts a distributed file system according to the invention.

FIG. 3 is a flow diagram of the dual counter synchronization mechanism.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The invention may be run on a variety of computers or collection of computers under a number of different operating systems. The computer could be, for example, a personal computer, a mini computer, mainframe computer or a computer running in a distributed network of other computers. Although the specific choice of computer is limited only by disk and disk storage requirements, computers in the IBM PS/2 (TM) series of computers could be used in the present invention. For additional information on IBM's PS/2 series of computers, the reader is referred to Technical Reference Manual Personal Systems/2 Model 50, 60 Systems IBM Corporation, Part No. 68X2224 Order Number S68X-2224 and Technical Reference 2 Manual Personal Systems/2 (Model 80) IBM Corporation Part No. 68X 2256 Order Number S68X-2254. One operating system which an IBM PS/2 personal computer may run is IBM's OS/2 2.0 (TM) for more information on the IBM OS/2 2.0 Operating System the reader is referred to OS/2 2.0 Technical Library, Programming Guide Vol. 1, 2, 3 Version 2.00 Order Nos. 10G6261, 10G6495, 10G6494.

In the alternative, the computer system might be in the IBM RISC System/6000 (TM) line of computers which run on the AIX (TM) operating system. The various models of the RISC System/6000 is described in many publications of the IBM Corporation for example, RISC System/6000, 7073 and 7016 POWERstation and POWERserver Hardware Technical reference, Order No. SA23-2644-00. The AIX operating system is described in General Concepts and Procedure--AIX Version 3 for RISC System/6000 Order No. SC23-2202-00 as well as other publications of the IBM Corporation.

In FIG. 1, a computer 10, comprising a system unit 11, a keyboard 12, a mouse 13 and a display 14 are depicted in block diagram form. The system unit 11 includes a system bus or plurality of system buses 21 to which various components are coupled and by which communication between the various components is accomplished. The microprocessor 22 is connected to the system bus 21 and is supported by read only memory (ROM) 23 and random access memory (RAM) 24 also connected to system bus 21. A microprocessor in the IBM PS/2 series of computers is one of the Intel family of microprocessors including the 386 or 486 microprocessors. However, other microprocessors including, but not limited to, Motorola's family of microprocessors such as the 68000, 68020 or the 68030 microprocessors and various Reduced Instruction Set Computer (RISC) microprocessors such as the PowerPC chip manufactured by IBM, or others Hewlett Packard, Sun, Motorola and others may be used in the specific computer.

The ROM 23 contains among other code the Basic Input-Output system (BIOS) which controls basic hardware operations such as the interaction and the disk drives and the keyboard. The RAM 24 is the main memory into which the operating system and application programs are loaded. The memory management chip 25 is connected to the system bus 21 and controls direct memory access operations including, passing data between the RAM 24 and hard disk drive 26 and floppy disk drive 27. The CD ROM 32 also coupled to the system bus 21 is used to store a large amount of data, e.g., a multimedia program or presentation.

Also connected to this system bus 21 are various I/O controllers: The keyboard controller 28, the mouse controller 29, the video controller 30, and the audio controller 31. As might be expected, the keyboard controller 28 provides the hardware interface for the keyboard 12, the mouse controller 29 provides the hardware interface for mouse 13, the video controller 30 is the hardware interface for the display 14, and the audio controller 31 is the hardware interface for the speakers 15. An I/O controlle