|
|
|
| United States Patent | 5713017 |
| Link to this page | http://www.wikipatents.com/5713017.html |
| Inventor(s) | Lin; Dah-Haur David (Austin, TX);
Shi; Shaw-Ben (Austin, TX);
Wei; Yi-Hsiu (Austin, TX) |
| Abstract | A consistency control method for a fault tolerant file system. Data files
in this fault tolerant file system are replicated on file system servers.
The update request will be sent to all the file servers. However, since
different servers might receive update requests in different order. The
"dual-counter" scheme described in this invention is to resolve this
problem. To perform an update, the client needs to obtain a sequence
number from a sequencer. Two counters will be maintained by each server.
The first counter is used primarily by the sequencer to sent out sequence
numbers. The second counter is used by the server to keep track of the
largest sequence number of requests which have been completed on the
server. With the sequence numbers and the first and second counters,
updates will be performed in the same order on different file servers. |
|
|
|
Title Information  |
|
|
|
|
|
Drawing from US Patent 5713017 |
|
|
Dual counter consistency control for fault tolerant network file servers |
|
|
|
|
|
| Publication Date |
January 27, 1998 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Title Information  |
|
|
References  |
|
|
| *references marked with an asterisk below are user-added references |
|
U.S. References |
|
|
| Add a new US reference: |
| | Reference | Relevancy | Comments | Reference | Relevancy | Comments | 5630124 Coyle, Jr. 707/103R May,1997 |      Your vote accepted [0 after 0 votes] | | 5561797 Gilles 707/8 Oct,1996 |      Your vote accepted [0 after 0 votes] | | 5557792 Josten
Sep,1996 |      Your vote accepted [0 after 0 votes] | | 5544151 Baek 370/223 Aug,1996 |      Your vote accepted [0 after 0 votes] | | 5513314 Kandasamy 714/6 Apr,1996 |      Your vote accepted [0 after 0 votes] | | 5499367 Bamford 707/8 Mar,1996 |      Your vote accepted [0 after 0 votes] | | 5497463 Stein 709/203 Mar,1996 |      Your vote accepted [0 after 0 votes] | | 5495609 Scott 707/8 Feb,1996 |      Your vote accepted [0 after 0 votes] | | 5459862 Garliepp 707/8 Oct,1995 |      Your vote accepted [0 after 0 votes] | | 5448723 Rowett 714/4 Sep,1995 |      Your vote accepted [0 after 0 votes] | | 5371885 Letwin 707/205 Dec,1994 |      Your vote accepted [0 after 0 votes] | | 5339408 Bruckert 714/11 Aug,1994 |      Your vote accepted [0 after 0 votes] | | 5325517 Baker 714/11 Jun,1994 |      Your vote accepted [0 after 0 votes] | | 5307487 Tavares 707/8 Apr,1994 |      Your vote accepted [0 after 0 votes] | | 5307490 Davidson 719/328 Apr,1994 |      Your vote accepted [0 after 0 votes] | | 5247672 Mohan 711/152 Sep,1993 |      Your vote accepted [0 after 0 votes] | | 5226129 Ooi 712/204 Jul,1993 |      Your vote accepted [0 after 0 votes] | | 5210871 Lala 710/116 May,1993 |      Your vote accepted [0 after 0 votes] | | 5191652 Dias 709/251 Mar,1993 |      Your vote accepted [0 after 0 votes] | | 5123104 Levine 707/1 Jun,1992 |      Your vote accepted [0 after 0 votes] | | 5084816 Boese
Jan,1992 |      Your vote accepted [0 after 0 votes] | | 5023942 Goepel 398/173 Jun,1991 |      Your vote accepted [0 after 0 votes] | | 4945474 Elliott 714/16 Jul,1990 |      Your vote accepted [0 after 0 votes] | | 4823310 Grand 707/8 Apr,1989 |      Your vote accepted [0 after 0 votes] | | 4713755 Worley, Jr. 711/123 Dec,1987 |      Your vote accepted [0 after 0 votes] | | 4630045 Georgiou 340/2.25 Dec,1986 |      Your vote accepted [0 after 0 votes] | | 4569015 Dolev 709/201 Feb,1986 |      Your vote accepted [0 after 0 votes] | | 4318182 Bachman 718/105 Mar,1982 |      Your vote accepted [0 after 0 votes] | | 4112488 Smith, III 709/222 Sep,1978 |      Your vote accepted [0 after 0 votes] | | | | | |
|
|
|
|
U.S. References |
|
|
Foreign References |
|
|
|
|
|
|
Foreign References |
|
|
Other References |
|
|
|
|
|
|
Other References |
|
|
|
|
|
References  |
|
|
|
|
|
| Market Size |
|
Estimate the gross annual revenues of the relevant market
sector:
|
| | |
| |
|
|
| Market Share |
|
Estimate the percentage of the relevant market sector this invention will capture:
|
| | |
| |
|
|
| Reasonable Royalty |
|
What percentage of gross sales should the inventor or assignee be paid?
|
| | |
| |
|
|
|
Public's "Guesstimation" of Royalty Value
|
| Market Size | N/A | [No votes] | | x | Market Share | N/A | [No votes] | | x | Reasonable Royalty | N/A | [No votes] |
| | N/A | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
Market Review  |
|
|
Technical Review  |
|
|
Claims  |
|
|
We claim:
1. In a distributed computing environment, a method of managing concurrent
updates by a group of server machines coupled together by a network, each
server machine having a respective first and second counter, the method
comprising the steps of:
in response to a client request from a client machine also coupled to the
network, sending a unique sequence number identifying the client request
derived from the first counter in a first server machine to the group of
server machines and the client machine;
incrementing the first counter in the group of server machines;
in response to a client update request from the client machine including
the unique sequence number, checking by each server machine in the group
of server machines against its second counter that all other update
requests with lower sequence numbers have been received;
in response to missing sequence numbers, requesting from other servers in
the group of servers for update requests corresponding to the missing
sequence numbers; and
performing the update request if all update requests up to the unique
sequence number have been received by the respective server machine.
2. A method for managing a replicated file server including a plurality of
servers coupled by a network, comprising the steps of:
sending a sequence number from a counter in a first server to a client
requesting a right to perform an update request, the sequence number
identifying a particular update request;
sending the sequence number and the update request to the plurality of
servers by the requesting client;
checking by a second server in the plurality of servers the sequence number
against a counter in the second server that all updates with lower
sequence numbers have been received; and
performing the update request if all update requests up to the sequence
number have been received by the second server.
3. The method as recited in claim 2 further comprising the steps of:
checking by a third server in the plurality of servers the sequence number
against a counter in the third server that all updates with lower sequence
numbers have been received;
in response to missing sequence numbers, requesting from the second server
for update requests corresponding to the missing sequence numbers: and
performing the update request once all update requests up to the sequence
number have been received by the third server.
4. The method as recited in claim 3 wherein each of the servers in the
plurality of servers has a respective first and second counter, the first
counter of a type used to derive sequence numbers, the second counter of a
type used to check a sequence number accompanying an update request.
5. The method as recited in claim 4 wherein a request for a sequence number
is sent to the plurality of servers by the requesting client and the
method further comprising the steps of:
in response to the reception of the request for a sequence number by a
second server, determining whether a sequence number has been sent by the
first server;
in response to an apparent failure of the first server, determining whether
the first server has failed; and
in response to the failure of the first server, assuming creation of
sequence numbers by the second server.
6. The method as recited in claim 2 wherein the update request is multicast
to each of the plurality of servers.
7. The method as recited in claim 2 wherein a plurality of clients request
a right to perform update requests, a sequence number is sent to each
client by the first server and each of the plurality of servers perform
update requests in an order in which the sequence numbers were sent.
8. The method as recited in claim 6 further comprising the steps of:
setting a timer in response to receiving the request for the right to
perform an update request;
in response to an apparent failure of the requesting client to send the
sequence number and the update request to the plurality of servers within
a period of time, checking whether any of the plurality of servers
received the sequence number and update request; and
in response to nonreception by the plurality of servers, cancelling the
sequence number in the plurality of servers.
9. A computer memory for storing a set of computer readable instructions
for managing a replicated file server including a plurality of servers
coupled by a network, comprising:
means for requesting a sequence number from a counter in a first server by
a client, the sequence number for identifying an update request to be made
by the client;
means for sending the sequence number and the update request to the
plurality of servers by the requesting client;
means for checking by a second server in the plurality of servers the
sequence number against a counter in the second server that all update
requests with lower sequence numbers have been received; and
means for performing the update request if all update requests up to the
sequence number have been received by the second server.
10. The memory as recited in claim 9 further comprising:
means for checking by a third server in the plurality of servers the
sequence number against a counter in the third server that all updates
with lower sequence numbers have been received;
means responsive to missing sequence numbers for requesting from the second
server for update requests corresponding to the missing sequence numbers:
and
means for performing the update request once all update requests up to the
sequence number have been received by the third server.
11. The memory as recited in claim 10 further comprising:
means responsive to reception of a request for a sequence number for a
second server for determining whether a sequence number has been sent by
the first server;
means responsive to an apparent failure of the first server for determining
whether the first server has failed; and
means responsive to the failure of the first server for assuming creation
of sequence numbers by the second server.
12. The memory as recited in claim 11 further comprising:
means for setting a timer in response to receiving the request for a
sequence number;
means responsive to an apparent failure of the requesting client to send
the sequence number and the update request to the plurality of servers
within a period of time, checking whether any of the plurality of servers
received the sequence number and update request; and
means responsive to nonreception by the plurality of servers for cancelling
the sequence number in the plurality of servers.
13. A replicated file server comprising:
a network;
a plurality of file servers coupled to the network, each file server having
a respective first and second counter, the first counter in a first server
for generating sequence numbers which entitle requesting clients to
perform update requests, the second counter in the plurality of servers
for sequencing update requests from the requesting clients; and
a plurality of clients coupled to the network for making update requests to
the plurality of servers.
14. The replicated file server recited in claim 13 further comprising means
for multicasting update requests to the plurality of servers.
15. The replicated file server recited in claim 14 further comprising:
means for multicasting requests for sequence numbers to the plurality of
servers by the requesting clients;
means for multicasting a particular sequence number and a corresponding
update request to the plurality of servers by the requesting client;
means for checking by each server in the plurality of servers the
particular sequence number against a respective second counter that all
update requests corresponding to lower sequence numbers have been
received; and
means for performing the update request corresponding to the particular
sequence number if all update requests up to the particular sequence
number have been received by the respective server.
16. The replicated file server as recited in claim 15 further comprising:
means responsive to missing sequence numbers for requesting for update
requests corresponding to the missing sequence numbers: and
means for performing all update requests corresponding to the missing
sequence numbers.
17. The replicated file server as recited in claim 13 further comprising:
means responsive to reception of a request for a sequence number for a
second server for determining whether a sequence number has been sent by
the first server;
means responsive to an apparent failure of the first server for determining
whether the first server has failed; and
means responsive to the failure of the first server for assuming creation
of sequence numbers by the second server.
18. The replicated file server as recited in claim 13 further comprising:
means for setting a timer in response to receiving the request for a
sequence number;
means responsive to an apparent failure of a requesting client to send the
sequence number and the update request to the plurality of servers within
a period of time, checking whether any of the plurality of servers
received the sequence number and update request; and
means responsive to nonreception by the plurality of servers for cancelling
the sequence number in the plurality of servers. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
DESCRIPTION
BACKGROUND OF THE INVENTION
This invention relates generally to distributed computing environments.
More particularly, it relates to consistency control for a fault tolerant
replicated file system.
A basic technique for improving reliability of a file system is to mask the
effects of failures through replication. There are two major approaches of
building a fault tolerant file system: hardware replication approaches and
software replication approaches. HA-NFS is representative of the hardware
replication approach. HA-NFS uses dual-ported disks which are accessible
to a server and a backup. Each HA-NFS server is assigned a backup that can
access the server's disk if the server fails. The server stores
information about its volatile state on a disk log. If the server fails,
the backup will reconstruct the server's lost state using the log. For
further details on HA-NFS, the user is referred to "A Highly Available
Network File Server", A. Birdie et el. in USENIX Conference Proceedings,
pp. 199-205, Jan. 1991.
However, in a distributed environment, it is not always necessary to use
special hardware for improved reliability. The computers connected by the
network are a natural resource of duplicates. The software replication
approach replicates file systems on workstations in the network. If the
primary file system crashes, one of the available file systems will take
over and provide the data access services. Consistency control protocols
are implemented to ensure the consistency among file system replicas.
Several existing file systems can be categorized as software approaches.
The advantages of a software replication approach are:
1. Flexibility. A replicated file system can reside on any workstation in
the network.
2. Fault tolerance. Tolerate both hardware and software failures. Since
each workstation has its own hardware devices and operating system, the
failure of one workstation does not affect the operation of others.
3. Performance. Better performance can be achieved through distributing
file read operations among file system replicas.
Two known software replication approaches include RFNS and Deceit. RNFS
achieves fault-tolerance by replicating the server and by using an
automatic broadcast protocol for communication. The client sends its
requests to an intermediary (an agent) that in turn broadcasts to all of
the servers. To the client, the agent appears to be a file server that has
an exceptionally reliable secondary storage. The advantage of this
approach is that the clients can stay unchanged. However, all RNFS
requests have to pass through the agent. For each RPC to the server, an
additional RPC is required for this extra level of indirection. Another
problem with the agent is that it is a critical process. If the agent
fails, all files are not accessible. RNFS introduces a mechanism to
replicate the agents and control the data consistency among the agents.
The reader is referred to K. Marzullo and F. Schmuck "Supplying high
availability with a standard network file system", IEEE Eighth
International Conference on Distributed Computing, June 1988, pages
447-453 for more details on RNFS.
Similar to RNFS, Deceit uses an intermediary, the Deceit server to catch
the requests from the clients and forward them to a group of replicated
servers. Deceit provides a wide range of protocol "control knobs" to allow
trade-offs among performance, availability, concurrency and consistency.
Thus, users that do not require certain properties on some files are not
forced to pay the corresponding overhead. Deceit is built on the ISIS
Distributed Programming Environment, which provides reliable process
groups, communication primitives and failure detection. More details on
Deceit can be found in A. Siegel, K. Birman and K. Marzullo. "Deceit: A
flexible distributed file systems" In Summer 1990 USENIX Conference, pages
51-61, Anaheim, Calif., June 1990. ISIS is described in greater detail in
K. Birman and R. Cooper "The ISIS project: real experience with a fault
tolerant programming system", Oper. Sys. Rev. vol. 25 No. 2 Apr. 1991
pages 103-107.
The invention of the fault tolerant network file system discussed below is
an example of the software replication approaches.
SUMMARY OF THE INVENTION
Therefore, it is an object of the invention to synchronize multiple updates
to a replicated file.
It is another object of the invention that failures and recovery be
transparent to the users of the file system.
It is another object of the invention to allow updates to replicate file
servers even though one of the file servers has failed.
It is another object of the invention to support heterogenous hardware
platforms.
It is another object of the invention to minimize the performance impact of
adding fault tolerance to a distributed file system.
It is another object of the invention to support heterogeneous file
systems.
These and other objects are accomplished by a consistency control method
for a fault tolerant file system. Data files in this fault tolerant file
system are replicated on a group of file system servers. The update
request will be sent to all the file servers. However, since different
servers might receive update requests in a different order, each server is
equipped with the "dual-counter" mechanism described in this invention. To
perform an update, the client needs to obtain a sequence number from a
sequencer, typically a designated server of the group of file servers. Two
counters will be maintained by each server. The first counter is used
primarily by the sequencer to send out sequence numbers. The second
counter is used by each server to keep track of the largest sequence
number of requests which have been completed on the server. With the
sequence numbers and the first and second counters, updates will be
performed in the same order on different file servers.
BRIEF DESCRIPTION OF THE DRAWINGS
These objects, features and advantages will be more readily understood with
reference to the attached figures and following description.
FIG. 1 depicts a computer system configured according to the teachings of
the present invention.
FIG. 2 depicts a distributed file system according to the invention.
FIG. 3 is a flow diagram of the dual counter synchronization mechanism.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The invention may be run on a variety of computers or collection of
computers under a number of different operating systems. The computer
could be, for example, a personal computer, a mini computer, mainframe
computer or a computer running in a distributed network of other
computers. Although the specific choice of computer is limited only by
disk and disk storage requirements, computers in the IBM PS/2 (TM) series
of computers could be used in the present invention. For additional
information on IBM's PS/2 series of computers, the reader is referred to
Technical Reference Manual Personal Systems/2 Model 50, 60 Systems IBM
Corporation, Part No. 68X2224 Order Number S68X-2224 and Technical
Reference 2 Manual Personal Systems/2 (Model 80) IBM Corporation Part No.
68X 2256 Order Number S68X-2254. One operating system which an IBM PS/2
personal computer may run is IBM's OS/2 2.0 (TM) for more information on
the IBM OS/2 2.0 Operating System the reader is referred to OS/2 2.0
Technical Library, Programming Guide Vol. 1, 2, 3 Version 2.00 Order Nos.
10G6261, 10G6495, 10G6494.
In the alternative, the computer system might be in the IBM RISC
System/6000 (TM) line of computers which run on the AIX (TM) operating
system. The various models of the RISC System/6000 is described in many
publications of the IBM Corporation for example, RISC System/6000, 7073
and 7016 POWERstation and POWERserver Hardware Technical reference, Order
No. SA23-2644-00. The AIX operating system is described in General
Concepts and Procedure--AIX Version 3 for RISC System/6000 Order No.
SC23-2202-00 as well as other publications of the IBM Corporation.
In FIG. 1, a computer 10, comprising a system unit 11, a keyboard 12, a
mouse 13 and a display 14 are depicted in block diagram form. The system
unit 11 includes a system bus or plurality of system buses 21 to which
various components are coupled and by which communication between the
various components is accomplished. The microprocessor 22 is connected to
the system bus 21 and is supported by read only memory (ROM) 23 and random
access memory (RAM) 24 also connected to system bus 21. A microprocessor
in the IBM PS/2 series of computers is one of the Intel family of
microprocessors including the 386 or 486 microprocessors. However, other
microprocessors including, but not limited to, Motorola's family of
microprocessors such as the 68000, 68020 or the 68030 microprocessors and
various Reduced Instruction Set Computer (RISC) microprocessors such as
the PowerPC chip manufactured by IBM, or others Hewlett Packard, Sun,
Motorola and others may be used in the specific computer.
The ROM 23 contains among other code the Basic Input-Output system (BIOS)
which controls basic hardware operations such as the interaction and the
disk drives and the keyboard. The RAM 24 is the main memory into which the
operating system and application programs are loaded. The memory
management chip 25 is connected to the system bus 21 and controls direct
memory access operations including, passing data between the RAM 24 and
hard disk drive 26 and floppy disk drive 27. The CD ROM 32 also coupled to
the system bus 21 is used to store a large amount of data, e.g., a
multimedia program or presentation.
Also connected to this system bus 21 are various I/O controllers: The
keyboard controller 28, the mouse controller 29, the video controller 30,
and the audio controller 31. As might be expected, the keyboard controller
28 provides the hardware interface for the keyboard 12, the mouse
controller 29 provides the hardware interface for mouse 13, the video
controller 30 is the hardware interface for the display 14, and the audio
controller 31 is the hardware interface for the speakers 15. An I/O
controlle | | |