CROSS REFERENCE TO RELATED APPLICATIONS
This application is related in subject matter to the following applications
filed concurrently herewith and assigned to a common assignee:
Application Ser. No. 07/014,899 filed by A. Chang, G. H. Neuman, A. A.
Shaheen-Gouda, and T. A. Smith for A System And Method For Using Cached
Data At A Local Node After Re-opening A File At A Remote Node In A
Distributed Networking Environment.
Application Ser. No. 07/014,884 filed by D. W. Johnson, L. W. Henson, A. A.
Shaheen-Gouda, and T. A. Smith for Negotiating Communicating Conventions
Between Nodes In A Network.
Application Ser. No. 07/014,900 filed by D. W. Johnson, A. A.
Shaheen-Gouda, T. A. Smith for Distributed File Access Structure Lock.
Application Ser. No. 07/014,891 filed by L. W. Henson, A. A. Shaheen-Gouda,
and T. A. Smith for File and Record Locking Between Nodes in a Distributed
Data Processing Environment.
Application Ser. No. 07/014,892 filed by D. W. Johnson, L. K. Loucks, C. H.
Sauer, and T. A. Smith for Single System Image Uniquely Defining An
Environment for Each User In A Data Processing System.
Application Ser. No. 07/014,888 filed by D. W. Johnson, L. K. Loucks, A. A.
Shaheen-Gouda for Interprocess Communication Queue Location Transparency.
Application Ser. No. 07/014,889 filed by D. W. Johnson, A. A.
Shaheen-Gouda, and T. A. Smith for Directory Cache Management In a
Distributed Data Processing System.
The disclosures of the foregoing co-pending applications are incorporated
herein by reference.
DESCRIPTION
1. Field of the Invention
This invention relates to processing systems connected through a network,
and more particularly to the accessing of files between local and remote
processing systems within the network.
2. Background Art
As shown in FIG. 1, a distributed networking environment 1 consists of two
or more nodes A, B, C, connected through a communication link or a network
3. The network 3 can be either a local area network (LAN), or a wide area
network (WAN). The latter consists of switched or leased teleprocessing
(TP) connections to other nodes, or to a systems network architecture
(SNA) network of systems.
At any of the nodes A, B, C, there may be a processing system 10A, 10B,
10C, such as a personal computer. Each of these processing systems 10A,
10B, 10C, may be a single user system or a multi-user system with the
ability to use the network 3 to access files located at a remote node. For
example, the processing system 10A at local node A, is able to access the
files 5B, 5C at the remote nodes B, C.
The problems encountered in accessing remote nodes can be better understood
by first examining how a stand-alone system accesses files. In a
stand-alone system 10, as shown in FIG. 2, a local buffer 12 in the
operating system 11 is used to buffer the data transferred between the
permanent storage 2, such as a hard file or a disk in a personal computer,
and the user address space 14. The local buffer 12 in the operating system
11 is also referred to as a local cache or kernel buffer.
In the stand-alone system, the kernel buffer 12 is organized as blocks 15,
each designated by a device number and a logical block number within the
device. When a read system call 16 is issued, it carries a file
descriptor of the file 5 and a byte range within the file 5, as shown in
step 101, FIG. 3. The operating system 11 converts this information to a
device number and logical block numbers in the device, step 102, FIG. 3.
Then the operating system 11 reads the cache 12 according to the device
number and logical block numbers, step 103.
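Steps 101 to 103 can be sketched as follows. This is a minimal
illustration and not the kernel code: the block size of 4096 bytes, the
BLOCK_MAP table standing in for the file system's mapping of file blocks to
device blocks, and the function names are all assumptions made for the
example.

```python
BLOCK_SIZE = 4096  # assumed block size; the text does not give one

# Hypothetical stand-in for the file system's mapping:
# (file descriptor, block index within file) -> (device number, logical block number)
BLOCK_MAP = {(3, 0): (1, 120), (3, 1): (1, 57)}

def blocks_for_read(fd, offset, length):
    """Step 102: convert a file descriptor and byte range to device blocks."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return [BLOCK_MAP[(fd, i)] for i in range(first, last + 1)]

def read_blocks(cache, disk, fd, offset, length):
    """Step 103: read each block through a cache keyed by (device, block)."""
    data = []
    for key in blocks_for_read(fd, offset, length):
        if key not in cache:       # miss: fetch the block from the disk once
            cache[key] = disk[key]
        data.append(cache[key])    # hit: served from the kernel buffer
    return data
```

A byte range that straddles a block boundary maps to more than one device
block, which is why the conversion returns a list.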
Any data read from the disk 2 is kept in the cache block 15 until the cache
block 15 is needed for other data. Consequently, any successive read
requests from an application 4 that is running on the processing system 10
for the same data previously read are satisfied from the cache 12 and not
the disk 2.
Reading from the cache is less time consuming than going out to the fixed
disk 2, accessing the correct disk sectors, and reading from the disk.
Similarly, data written from the application 4 is not saved immediately on
the disk 2, but is written to the cache 12. This saves disk accesses if
another write operation is issued to the same block. Modified data blocks
in the cache 12 are saved on the disk 2 periodically.
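The write behavior described above is commonly called a write-back cache.
The sketch below is illustrative only; the class name, the dirty set, and
the disk_writes counter are assumptions introduced to make the saving in
disk accesses visible.

```python
class KernelBuffer:
    """Write-back cache sketch: a write marks a block dirty; sync() saves it later."""

    def __init__(self, disk):
        self.disk = disk        # dict standing in for permanent storage
        self.blocks = {}        # (device, block) -> data
        self.dirty = set()      # blocks modified since the last sync
        self.disk_writes = 0    # counts accesses to permanent storage

    def write(self, key, data):
        self.blocks[key] = data     # no disk access here
        self.dirty.add(key)

    def sync(self):
        """Periodic save of modified blocks to the disk."""
        for key in sorted(self.dirty):
            self.disk[key] = self.blocks[key]
            self.disk_writes += 1
        self.dirty.clear()
```

Two writes to the same block followed by one sync cost a single disk
access in this sketch, rather than two.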
Use of a cache in a stand-alone system that utilizes an AIX.sup.1 (Advanced
Interactive Executive) operating system improves the overall performance
of the system since disk accessing is eliminated for successive reads and
writes. Overall performance is enhanced because accessing permanent
storage is slower and more expensive than accessing a cache.
.sup.1 AIX is a trademark of IBM Corporation.
In a distributed environment, as shown in FIG. 1, there are two ways the
processing system 10C in local node C could read the file 5A from node A.
In one way, the processing system 10C could copy the whole file 5A, and
then read it as if it were a local file 5C residing at node C. Reading a
file in this way creates a problem if another processing system 10A at
another node A modifies the file 5A after the file 5A has been copied at
node C as file 5C. The processing system 10C would not have access to
these latest modifications to the file 5A.
Another way for processing system 10C to access a file 5A at node A is to
read one block N1 at a time as the processing system at node C requires
it. A problem with this method is that every read has to go across the
network communication link 3 to the node A where the file resides. Sending
the data for every successive read is time consuming.
Accessing files across a network presents two competing problems as
illustrated above. One problem involves the time required to transmit data
across the network for successive reads and writes. On the other hand, if
the file data is stored in the node to reduce network traffic, the file
integrity may be lost. For example, if one of the several nodes is also
writing to the file, the other nodes accessing the file may not be
accessing the latest updated file that has just been written. As such, the
file integrity is lost since a node may be accessing incorrect and
outdated files.
Within this document, the term "server" will be used to indicate the
processing system where the file is permanently stored, and the term
"client" will be used to mean any other processing system having processes
accessing the file. It is to be understood, however, that the term
"server" does not mean a dedicated server as that term is used in some
local area network systems. The distributed services system in which the
invention is implemented is truly a distributed system supporting a wide
variety of applications running at different nodes in the system which may
access files located anywhere in the system.
The invention to be described hereinafter was implemented in a version of
the UNIX.sup.2 operating system but may be used in other operating systems
having characteristics similar to the UNIX operating system. The UNIX
operating system was developed by Bell Telephone Laboratories, Inc., for
use on a Digital Equipment Corporation (DEC) minicomputer but has become a
popular operating system for a wide range of minicomputers and, more
recently, microcomputers. One reason for this popularity is that the UNIX
operating system is written in the C programming language, also developed
at Bell Telephone Laboratories, rather than in assembly language so that
it is not processor specific. Thus, compilers written for various machines
to give them C capability make it possible to transport the UNIX operating
system from one machine to another. Therefore, application programs
written for the UNIX operating system environment are also portable from
one machine to another. For more information on the UNIX operating system,
the reader is referred to UNIX.TM. System, User's Manual, System V,
published by Western Electric Co., January 1983. A good overview of the
UNIX operating system is provided by Brian W. Kernighan and Rob Pike in
their book entitled The Unix Programming Environment, published by
Prentice-Hall (1984). A more detailed description of the design of the
UNIX operating system is to be found in a book by Maurice J. Bach, Design
of the Unix Operating System, published by Prentice-Hall (1986).
.sup.2 Developed and licensed by AT&T. UNIX is a registered trademark of
AT&T in the U.S.A. and other countries.
AT&T Bell Labs has licensed a number of parties to use the UNIX operating
system, and there are now several versions available. The most current
version from AT&T is version 5.2. Another version known as the Berkeley
version of the UNIX operating system was developed by the University of
California at Berkeley. Microsoft, the publisher of the popular MS-DOS and
PC-DOS operating systems for personal computers, has a version known under
their trademark as XENIX. With the announcement of the IBM RT PC.sup.3
(RISC (reduced instruction set computer) Technology Personal Computer) in
1985, IBM Corp. released a new operating system called AIX which is
compatible at the application interface level with AT&T's UNIX operating
system, version 5.2, and includes extensions to the UNIX operating system,
version 5.2. For more description of the AIX operating system, the reader
is referred to AIX Operating System Technical Reference, published by IBM
Corp., First Edition (Nov. 1985).
.sup.3 RT and RT PC are trademarks of IBM Corporation.
The invention is specifically concerned with distributed data processing
systems characterized by a plurality of processors interconnected in a
network. As actually implemented, the invention runs on a plurality of IBM
RT PCs interconnected by IBM's Systems Network Architecture (SNA), and
more specifically SNA LU 6.2 Advanced Program to Program Communication
(APPC). An Introduction To Advanced Program-To-Program Communication
(APPC), Technical Bulletin by IBM International Systems Centers, July
1983, number GG24-1584-0, and IBM RT PC SNA Access Method Guide and
Reference, Aug. 15, 1986, are two documents that further describe SNA LU
6.2.
SNA uses as its link level Ethernet.sup.4, a local area network (LAN)
developed by Xerox Corp., or SDLC (Synchronous Data Link Control). A
simplified description of local area networks including the Ethernet local
area network may be found in a book by Larry E. Jordan and Bruce Churchill
entitled Communications and Networking for the IBM PC, published by Robert
J. Brady (a Prentice-Hall company) (1983). A more definitive description
of communications systems for computers, particularly of SNA and SDLC, is
to be found in a book by R. J. Cypser entitled Communications Architecture
for Distributed Systems, published by Addison-Wesley (1978). It will,
however, be understood that the invention may be implemented using other
and different computers than the IBM RT PC interconnected by other
networks than the Ethernet local area network or IBM's SNA.
.sup.4 Ethernet is a trademark of Xerox Corporation.
As mentioned, the invention to be described hereinafter is directed to a
distributed data processing system in a communication network. In this
environment, each processor at a node in the network potentially may
access all the files in the network no matter at which nodes the files may
reside.
Other approaches to supporting a distributed data processing system in a
UNIX operating system environment are known. For example, Sun Microsystems
has released a Network File System (NFS) and Bell Laboratories has
developed a Remote File System (RFS). The Sun Microsystems NFS has been
described in a series of publications including S. R. Kleiman, "Vnodes: An
Architecture for Multiple File System Types in Sun UNIX", Conference
Proceedings, USENIX 1986 Summer Technical Conference and Exhibition, pp.
238 to 247; Russel Sandberg et al., "Design and Implementation of the Sun
Network File System", Conference Proceedings, Usenix 1985, pp. 119 to 130;
Dan Walsh et al., "Overview of the Sun Network File System", pp. 117 to
124; JoMei Chang, "Status Monitor Provides Network Locking Service for
NFS"; JoMei Chang, "SunNet", pp. 71 to 75; and Bradley Taylor, "Secure
Networking in the Sun Environment", pp. 28 to 36. The AT&T RFS has also
been described in a series of publications including Andrew P. Rifkin et
al., "RFS Architectural Overview", USENIX Conference Proceedings, Atlanta,
Ga. (June 1986), pp. 1 to 12; Richard Hamilton et al., "An Administrator's
View of Remote File Sharing", pp. 1 to 9; Tom Houghton et al., "File
Systems Switch", pp. 1 to 2; and David J. Olander et al., "A Framework for
Networking in System V", pp. 1 to 8.
One feature of the distributed services system in which the subject
invention is implemented which distinguishes it from the Sun Microsystems
NFS, for example, is that Sun's approach was to design what is essentially
a stateless machine. More specifically, the server in a distributed system
may be designed to be stateless. This means that the server does not store
any information about client nodes, including such information as which
client nodes have a server file open, whether client processes have a file
open in read_only or read_write modes, or whether a client
has locks placed on byte ranges of the file. Such an implementation
simplifies the design of the server because the server does not have to
deal with error recovery situations which may arise when a client fails or
goes off-line without properly informing the server that it is releasing
its claim on server resources.
An entirely different approach was taken in the design of the distributed
services system in which the present invention is implemented. More
specifically, the distributed services system may be characterized as a
"stateful implementation". A "stateful" server, such as that described
here, does keep information about who is using its files and how the files
are being used. This requires that the server have some way to detect the
loss of contact with a client so that accumulated state information about
that client can be discarded. The cache management strategies described
here, however, cannot be implemented unless the server keeps such state
information. The management of the cache is affected, as described below,
by the number of client nodes which have issued requests to open a server
file and the read/write modes of those opens.
SUMMARY OF THE INVENTION
It is therefore an object of this invention to improve the response time in
accessing remote files.
It is a further object of this invention to maintain the file integrity in
a distributed networking environment.
The system and method of this invention takes into account three different
situations when processing systems are reading and writing to files in a
distributed networking environment. In the first situation, all reading
and writing to a file is performed at a single client node. In the second
situation, all nodes only read from a file. In the third situation, more
than one node is performing a read from a file, and at least one node is
writing to the file. The third situation may also be brought about if the
device is open for a write at the server.
In the first two situations, a local client cache exists in every node. The
client processes executing at the client nodes access the server file via
two step caching: the client cache and the server cache. Using the client
cache efficiently to access a remote file can significantly improve the
performance since it can save network traffic and overhead.
In the third situation, client caching is not used since file integrity is
deemed more important than performance speed. In a distributed networking
environment, the first two situations occur more frequently than the third
situation. Consequently, by providing for these three separate situations,
overall performance is optimized without sacrificing file integrity.
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 shows three processing systems connected in a networking environment
as known in the art.
FIG. 2 shows a stand-alone processing system using a kernel buffer as known
in the art.
FIG. 3 shows a flow chart of a read to the kernel buffer in a stand-alone
system as known in the art.
FIG. 4 shows three distributed processing systems connected in a network
for accessing files across the network with client and server caches.
FIG. 5 shows a client and server node having client and server caches,
respectively, in READONLY or ASYNC synchronization mode.
FIG. 6 shows the three synchronization modes used for managing the use of
client and server caches in a distributed networking environment.
FIG. 7 shows the transitions between the three synchronization modes.
FIG. 8 shows a client accessing a file at the server in FULLSYNC s_mode.
FIG. 9 shows the steps during a read when a client cache is used, and when
the client cache is not used.
FIG. 10 is a block diagram of the data structure illustrating the scenario
for following a path to a file operation at a local node as performed by
the operating system which supports the subject invention.
FIGS. 11 and 12 are block diagrams of the data structures illustrating the
before and after conditions of the scenario for a mount file operation at
a local node as performed by the operating system.
FIG. 13A shows a file tree whose immediate descendants are all directories.
FIG. 13B shows a file tree whose immediate descendants are a collection of
directories and simple files.
FIG. 13C shows a file tree whose immediate descendants are all simple files.
FIG. 14 is a block diagram of the data structure for the distributed file
system shown in FIG. 4.
FIGS. 15A to 15F are block diagrams of component parts of the data
structure shown in FIG. 14.
FIGS. 16, 17 and 18 are block diagrams of the data structures illustrating
the scenarios for a mount file operation and following a path to a file at
a local and remote node in a distributed system as performed by the
operating system.
FIG. 19 is a diagram showing the control flow of accesses to a file by two
client nodes.
FIG. 20 is a diagram showing a deadlock when two operations are currently
executing.
FIG. 21 is a diagram showing the execution steps of an open request from a
client node.
DESCRIPTION OF THE PREFERRED EMBODIMENT
In the present invention as shown in FIG. 4, a local cache 12A, 12B, 12C,
exists at every node A, B, C. If file 5 permanently resides at node A on
disk 2A, node A is referred to as the server. At the server A, use of the
cache 12A by local processes 13A executing at the server node A is the same
as that in a stand-alone system as discussed above in the Background Art.
However, remote processes 13B, 13C executing at nodes B, C, access the file
5 through a two step caching scheme using a server cache and a client
cache as shown more clearly in FIG. 5. The server node A gets blocks of
file 5 from disk 2A and stores them in the server cache 12A. Client node B
goes out over the network 3 and gets blocks of file 5 from the server
cache 12A. Client node B stores the blocks of file 5 as they existed in the
server cache 12A into the client cache 12B. When the user address space
14B of client node B seeks data from any block of file 5, the client cache
12B is accessed instead of going across the network 3 for each access.
Using the client cache 12B to access a remote file 5 can significantly
improve the performance since it can save network traffic and overhead.
The system and method of this invention manages the use of the client cache
12B and server cache 12A in a distributed environment to achieve high
performance while preserving the file access semantics at the application
program level. This allows existing programs which run on a stand-alone
system to run on a distributed system without any modification.
The file access semantics preserve a file's integrity as it is being
opened by different processes that issue read and write system calls to
access and modify the file. The file access semantics require that only
one I/O operation is allowed on any byte range at a time, and once an I/O
operation starts, it cannot be pre-empted by any other I/O operation to
the same byte range of the file.
An example of this is given by referring again to FIG. 5. If process 131
issues a write system call to a byte range N1-N2 in file 5, the write
system call can only be executed when the entire byte range N1-N2 is
available for access by process 131, and no read operation involving the
byte range N1-N2 is being executed. During the execution of the write
system call, all other operations involving the byte range N1-N2 in file 5
are suspended until the write is completed. The write is not completed
until the bytes are written to the local cache 12A. When a write request
is complete, the written data in the cache 12A is visible to any
subsequent read operation by any of the other processes 131-13N.
Another requirement of file access semantics is that when a file byte range
such as N1-N2, which can be a record or a set of related records accessed
by the same I/O operation, is visible to a reading process, the file byte
range N1-N2 must always have a consistent set of data reflecting the last
update to this range. This range is never available for access while a
write operation is being executed. In this way the next read issued by a
process will read the data just written and not the old outdated data.
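The rule that only one I/O operation may act on a byte range at a time can
be sketched as an admission check. This is a simplified model rather than
the mechanism of the invention: the ByteRangeGate class, its method names,
and the use of inclusive (start, end) pairs are assumptions for
illustration.

```python
def overlaps(a, b):
    """True if inclusive byte ranges a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

class ByteRangeGate:
    """Admits an I/O operation only if no operation in progress overlaps its range."""

    def __init__(self):
        self.in_progress = []   # byte ranges of operations currently executing

    def try_start(self, byte_range):
        if any(overlaps(byte_range, r) for r in self.in_progress):
            return False        # caller waits; the running operation is not pre-empted
        self.in_progress.append(byte_range)
        return True

    def finish(self, byte_range):
        self.in_progress.remove(byte_range)
```

An operation on a disjoint range is admitted immediately, so the
restriction applies per byte range and not to the file as a whole.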
In a distributed networking environment of this invention as shown in FIG.
5, the execution of read and write system calls from different application
programs 4A, 4B, and processes 131-13N, 231-23N are synchronized such that
the file access semantics as discussed above are preserved. The system and
method of this invention guarantees synchronization by utilizing various
cache synchronization (sync) modes. For a specific file 5, the I/O calls
are synchronized by either the client B or the server A depending on the
location of the processes 131-13N, 231-23N which have the file 5 open for
access, and the sync mode.
The three synchronization modes are shown in FIG. 6, and are described with
reference to FIG. 4. The first mode 141 is referred to as ASYNC s_mode,
or asynchronous mode. The file 5 operates in this mode 141 if the
file 5 is open for read/write access by processes 13C executing at only
one client remote node C, as shown in block 144, FIG. 6. In this mode 141,
all of the control is in the client node C. Both the server cache 12A and
client cache 12C are used for these read/write operations. A read or write
operation requires access to the server cache 12A only if it cannot be
satisfied from the client cache 12C. Modified blocks at the client 12C are
written to the server 12A by the periodic sync operation, or when the file
5 is closed by all processes 13C in the client node C, or when a block
must be written in order to make room for other data being brought into
the cache. Additionally, modified blocks are written to the server when the
file changes from ASYNC s_mode to FULLSYNC s_mode.
A second mode 142 is READONLY s_mode. The READONLY s_mode 142 is used for
files 5 that are open for read only access from processes 13C in only one
node C, or from processes 13B, 13C in more than one node B, C, as shown in
block 145, FIG. 6. In this mode 142, the server cache 12A and the client
caches 12B and/or 12C are used. The read request is issued for a block or
more at a time. Every subsequent read request from the same client, either
B or C, to the specific block does not go to the server 12A. Instead, it
is read from the respective client cache, either 12B or 12C. In other
words, a read operation does not require access to the server 12A if it
can be satisfied from the client cache 12C or 12B. In summary, the file 5
operates in mode 142 if the file 5 is open for read only access by any of
the processes 13A, 13B, 13C, in any of the nodes A, B, C.
A third mode 143 is FULLSYNC s_mode. The FULLSYNC s_mode 143 is used for
files 5 open in more than one node A, B, where at least one node has the
file 5 open for write access. In the FULLSYNC s_mode 143,
the client cache 12C or 12B is bypassed, and only the server cache 12A is
used. All read and write operations are executed at the server 12A.
In a distributed environment 1, FIG. 4, files 5 will most frequently be
open for read only by processes 13A, 13B, 13C, at several nodes A, B, C in
the READONLY s_mode 142, FIG. 6, or open for update at only one node in
the ASYNC s_mode 141, FIG. 6. Less frequently, a file will be open for
read and write access by processes executing at more than one node in the
FULLSYNC s_mode 143, FIG. 6. In both the READONLY s_mode 142, FIG. 6, and
the ASYNC s_mode 141, FIG. 6, the use of a client cache 12B, FIG. 5,
significantly reduces the remote read/write response time of accessing
file 5, and improves the overall system performance.
As shown in FIG. 8, in the FULLSYNC s_mode, the client cache is not
used. The client node B accesses the file 5 from the server A over the
network 3 for each read and write. Although the read/write response time
increases in this mode, the file access semantics are preserved since a
client does not retain a file 5 in a local cache that has not been updated
along with the corresponding file residing at the server.
Utilizing the three modes to manage the use of the client cache optimizes
overall system performance by combining both an overall average increase
in read/write response speed with file integrity. Using a client cache in
some situations decreases the read/write response time; while not using a
client cache in other situations preserves the file system semantics.
A file's sync mode is not only dependent on which nodes have the file open,
and whether the file is open for read or write, but also on whether the
device where the file resides is open in raw access mode. Raw access for a
device means that a block of data LBN1, FIG. 5, within a device 2A is
accessed. In this way, the reads and writes of the device 2A read and
write to a block LBN1 of device 2A. It is not relevant which file the
block belongs to. The device 2A can be open for raw access from a process
131-13N at the server node A. It cannot be open for raw access from a
remote node B, C.
In reference to FIG. 5, the server cache 12A is managed as blocks LBN1 of a
device 2A, similar to a stand-alone system as described above with
reference to FIG. 2. The server A looks at the server cache 12A as a
logical block LBN1 within a device 2A. The client B has no knowledge of
where the file 5 resides on the device 2A. All that client B knows is that
it accesses a file 5 on block number N1 on device 2A. The client cache 12B
handles the data as logical blocks N1 of files 5. In the server cache 12A,
the data is handled as logical blocks LBN1 of devices 2A. In handling the
data this way, the server can guarantee that if data is written to the
device as a raw device, and if there is another read of a block of the
file that happens to be the same block that was written to the device,
then the read would see the newly written data. This preserves the file
system semantics.
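The guarantee described above follows from keying the server cache by
device block rather than by file. A small sketch, in which the FILE_BLOCKS
map, the block contents, and the function names are hypothetical:

```python
disk = {("2A", 40): b"old"}                # device blocks on permanent storage
server_cache = {}                          # (device, logical block number) -> data
FILE_BLOCKS = {("file5", 0): ("2A", 40)}   # hypothetical file-to-device block map

def raw_write(device, lbn, data):
    """Write through the block special device: addressed by device block only."""
    server_cache[(device, lbn)] = data

def file_read(file_id, n):
    """Read block n of a file: converted to a device block, then looked up."""
    key = FILE_BLOCKS[(file_id, n)]
    if key not in server_cache:
        server_cache[key] = disk[key]
    return server_cache[key]
```

Because both paths resolve to the same cache entry, a file read issued
after a raw write to the same device block returns the newly written data.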
If the file is being accessed in a client node B, and the file is in ASYNC
or READONLY mode, as shown in FIG. 5, the client operating system 11B does
not convert the file descriptor and byte range within the file in the
system call READ (file descriptor, N1) 16 to the device number and the
logical block number in the device. The client does convert the file
descriptor and byte range to a file handle, node identifier, and logical
block number within the file. In the client cache 12B, there are blocks 17
that are designated by file handle, node identifier, and logical block
number within the file. When a read 16 is issued from a client application
4B, step 104, FIG. 9, the request for the read goes to the operating
system 11B with the file descriptor and the byte range within the file.
The operating system then looks in the client cache 12B, step 105, FIG. 9.
If the file handle, node identifier, and logical block number within the
file is there, the cache 12B is read, step 106, FIG. 9. If it isn't there,
step 107, FIG. 9, the read is sent to the server, step 108, FIG. 9. The
server then takes the file handle and the logical block number within the
file and converts it to a device number and logical block in the device,
step 109, FIG. 9. This conversion is necessary since the server cache 12A
is managed by device number and block number within the device as it is in
a stand-alone system. After the read is sent to the server, it is handled
the same as if the read was coming from its own application in a
stand-alone system as described above with reference to FIG. 2.
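The read path of FIG. 9 can be sketched as two functions, one for each
node. The names client_read and make_server_read, the block_map table, and
the use of dictionaries for the caches and the disk are assumptions for
illustration; the network is modeled as a plain function call.

```python
def client_read(client_cache, server_read, file_handle, node_id, block_no):
    """Steps 105 to 108: look in the client cache, else send the read to the server."""
    key = (file_handle, node_id, block_no)      # how blocks 17 are designated
    if key in client_cache:                     # step 106: satisfied locally
        return client_cache[key]
    data = server_read(file_handle, block_no)   # step 108: goes to the server
    client_cache[key] = data
    return data

def make_server_read(block_map, server_cache, disk):
    """Step 109: the server converts (file handle, block) to (device, block)."""
    def server_read(file_handle, block_no):
        dev_key = block_map[(file_handle, block_no)]   # (device number, logical block)
        if dev_key not in server_cache:
            server_cache[dev_key] = disk[dev_key]
        return server_cache[dev_key]
    return server_read
```

The client never sees a device number: its cache key is built only from
the file handle, the node identifier, and the block number within the file.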
A closed file does not have a synchronization mode. However, once a file is
first opened by a process, the file's sync mode is initialized according
to the following as illustrated in FIG. 7.
The sync mode for a file is initialized to ASYNC 141 if the device (D)
where the file resides is closed 161, i.e., it is not open as a special
device, and the file is open for write access at one remote node 162.
The sync mode for a file is READONLY 142 if the device where the file
resides is closed, and the file is open for read only access in one or
more nodes 163, or both the device and the file are open for read only
access 164.
The sync mode for a file is initialized to FULLSYNC 143 if the device
where the file resides is open as a block special device for read/write
access 165, or the file is opened in more than one node and at least one of
the opens is for writing. A block special device means that there is a raw
access to the device.
Once a file is initialized to a mode, if the conditions change, the file
mode may change. Transitions from one mode to another, as shown by lines
171-176 in FIG. 7, may occur under the following conditions.
If a file is presently in ASYNC mode 141, and the number of nodes where the
file is open becomes two or more, 181, then the sync mode changes to
FULLSYNC 143 as shown via line 172, FIG. 7. Also, if there is an open of
the block special device D where the file resides, 182, the sync mode will
change from ASYNC 141 to FULLSYNC 143. In a close operation for the file,
if the close operation is not the last close of the file, and the file is
still open for write, there is no mode change. However, if the close
operation is the last close of the file for write access such that all the
remaining opens are for read access, 183, then the new mode becomes
READONLY 142 as shown via line 174. If the close operation is the last
close of the file, then there is no sync mode.
If a file is presently in READONLY s_mode 142 and there is a file
open operation, there will not be a mode change if the open is for read.
However, if the open is for write, then the new sync mode is ASYNC 141 if
all the opens are in one client node, 184, as shown via line 173.
Otherwise, the sync mode is FULLSYNC. Furthermore, if the device where the
file resides is open for read/write access, 187, the new sync mode for the
file is FULLSYNC mode 143. For a close operation, if the close is the last
close of the file, there is no sync mode for the file. If the file is
still open at one or more nodes after a close operation, there is no
change to the sync mode.
If a file is presently in FULLSYNC mode 143 and there is another open for
the file, or the device where the file resides is opened, there is no sync
mode change. If after a close operation of the file, there remains an open
for read/write access at one remote node, and the block special device
where the file resides is not open, the sync mode is changed to ASYNC
s_mode 141, as shown by block 188 via line 171. The sync mode is
changed from FULLSYNC 143 to READONLY 142 if the block special device
where the file resides is not open, and the file is open for read only
access at one or more nodes as shown by block 189 on line 175, or if the
block special device where the file resides is open for read only access
and the file is open for read only access as shown in block 190 on line
175.
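The initialization rules and the transitions of FIG. 7 can both be read as
re-evaluating one function of the current opens after every open or close.
The sketch below takes that reading; the function name, the encoding of the
device state as None, "r" or "rw", and the dictionary of open counts are
assumptions. The sketch does not distinguish the server node from remote
nodes, and cases the text does not spell out, such as a single writer while
the device is open for read only, fall through to FULLSYNC as the
conservative choice.

```python
ASYNC, READONLY, FULLSYNC = "ASYNC", "READONLY", "FULLSYNC"

def sync_mode(device_open, opens):
    """Return the sync mode implied by the current opens, or None if the file is closed.

    device_open: None (device closed), "r" (read only) or "rw" (read/write)
    opens: {node: (opens_for_read, opens_for_write)}
    """
    nodes = [n for n, (r, w) in opens.items() if r + w > 0]
    if not nodes:
        return None                  # a closed file has no sync mode
    if device_open == "rw":
        return FULLSYNC              # raw read/write access to the device
    if not any(opens[n][1] > 0 for n in nodes):
        return READONLY              # every open is for read only
    if len(nodes) == 1 and device_open is None:
        return ASYNC                 # a writer, but at one node only
    return FULLSYNC                  # a writer plus other nodes, or assumed fallback
```

For example, a file open for write at node C alone is ASYNC; a read open
arriving from node B makes the same function return FULLSYNC, and closing
the write open at C then yields READONLY.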
All open and close operations for files and devices are resolved at the
server node. The server determines the sync mode of an open file when
executing any operation that may change the mode. The server also performs
the change of the synchronization modes. As the server gets new opens or
closes for the file, a change in synchronization modes for the file may be
triggered. If the required sync mode is not the current one, the server
sends a "change sync mode" remote procedure call (rpc) to all the clients
with the file open.
After a file is opened for the first time, the client that opened the file
is informed of the mode of the file. If the mode is either ASYNC or
READONLY, the client can start using the client cache for reads, and also
for writes if the mode is ASYNC, as shown in FIG. 5. The client does not
have to read or write over the communications link to the server. If the
mode is FULLSYNC as shown in FIG. 8, the client cache is not used, and the
client must send the read or write over the communications link 3 to the
server.
The server A, FIG. 5, always sets the mode 151 of the file 5. The mode of
the file is the same at every node that has the file open. The server A
also knows which nodes have the file open, and whether the opens are for
reads, or writes. The server A doesn't have to know which processes
131-13N, 231-23N within a node have a file open. The server keeps all the
above information in a file access structure list 150. Each element of the
file access structure list 150 contains a node which has the file open
152, the number of opens for read 153 in the node, and the number of opens
for write 154 in the node.
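A possible shape for the file access structure list 150 is sketched below.
The field names and the record_open and record_close methods are
hypothetical; only the three items kept per element (node, opens for read,
opens for write) come from the description.

```python
from dataclasses import dataclass

@dataclass
class FileAccess:
    node: str              # node which has the file open (152)
    read_opens: int = 0    # number of opens for read in the node (153)
    write_opens: int = 0   # number of opens for write in the node (154)

class FileAccessList:
    """Per-file list kept at the server; one element per node with the file open."""

    def __init__(self):
        self.entries = {}  # node -> FileAccess

    def record_open(self, node, for_write):
        entry = self.entries.setdefault(node, FileAccess(node))
        if for_write:
            entry.write_opens += 1
        else:
            entry.read_opens += 1

    def record_close(self, node, for_write):
        entry = self.entries[node]
        if for_write:
            entry.write_opens -= 1
        else:
            entry.read_opens -= 1
        if entry.read_opens == 0 and entry.write_opens == 0:
            del self.entries[node]     # node no longer has the file open

    def nodes(self):
        return sorted(self.entries)

    def has_writer(self):
        return any(e.write_opens > 0 for e in self.entries.values())
```

Counts are kept per node, not per process, which matches the statement
that the server does not need to know which processes have the file open.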
If a file is in ASYNC mode, then only one client node has the file open,
and the cache at this client may contain blocks which have been modified
but which have not been written to the server. The client cache may also
contain blocks which have been read from the server but which have not
been modified. When a file changes sync mode from ASYNC to FULLSYNC, the
server notifies the client of the change, and the client writes all of the
file's modified blocks to the server and discards any unmodified blocks.
If the file is in READONLY sync mode, then many clients may have the file
open, and the caches in each of these clients may contain blocks which
have been read from the server. The client caches may not, however,
contain any modified blocks. When a file changes sync mode from READONLY
to FULLSYNC, the server notifies each client of the change. Each client
then discards any of the file's blocks from its cache.
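The client's use of its cache in each mode, and its response to a change
to FULLSYNC, can be sketched as follows. The ClientFile class and its
method names are illustrative assumptions, and the server is modeled as a
dictionary of blocks rather than a remote node.

```python
class ClientFile:
    """Client-side view of one open remote file (illustrative only)."""

    def __init__(self, mode, server):
        self.mode = mode        # "ASYNC", "READONLY" or "FULLSYNC", as told by the server
        self.server = server    # dict standing in for the server: block number -> data
        self.cache = {}         # client cache: block number -> data
        self.modified = set()   # blocks written locally but not yet sent to the server

    def read(self, n):
        if self.mode == "FULLSYNC":
            return self.server[n]        # client cache bypassed
        if n not in self.cache:
            self.cache[n] = self.server[n]
        return self.cache[n]

    def write(self, n, data):
        if self.mode == "ASYNC":
            self.cache[n] = data         # stays local until written to the server
            self.modified.add(n)
        else:
            self.server[n] = data        # FULLSYNC: every write goes to the server

    def change_to_fullsync(self):
        """Handle the server's notice that the file is now in FULLSYNC mode."""
        for n in sorted(self.modified):  # only an ASYNC client has modified blocks
            self.server[n] = self.cache[n]
        self.modified.clear()
        self.cache.clear()               # discard the file's remaining cached blocks
        self.mode = "FULLSYNC"
```

A client that was in READONLY mode has an empty modified set, so the same
handler reduces to discarding its cached blocks.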
DESCRIPTION OF OPERATION
The preceding discussion provides an implementation-independent
description of file caching. The following discussion is a more detailed
description of an implementation of the caching method. Caching only
applies to files which have been opened, but in the implementation
described, a remote file must first be made accessible by a "mount"
operation before it can be opened. The mount operation constructs data
structures which must be in place before the open can be performed.
Therefore, the first part of the implementation discussion describes
mounting and path following in some detail. This detail is not directly
relevant to the caching discussion, but it is necessary ground work
preparatory to a detailed discussion of managing the cache.
The following disclosure describes solutions to problems which are
encountered when creating a distributed file system in which the logic
that manages a machine's files is altered to allow files that physically
reside in several different machines to appear to be part of the local
machine's file system. The implementatio