Claims
Having thus described our invention, what we claim as new and desire to
secure by Letters Patent is set forth in the following claims:
1. A data processing system having a hierarchical file system and connected
at a local node to at least one remote node by a communication link, said
data processing system comprising:
at least one file in the hierarchical file system located in at least one
of said at least one remote node;
a directory file in said local node, said directory file having at least
one immediate descendent file for which said directory file is a parent
file;
means for mounting, in the data processing system, the at least one file in
said at least one remote node to a local name of the at least one
immediate descendent file in said local node, while preserving the
contents of the at least one immediate descendent file, said means for
mounting causing a path name to be associated with at least one portion of
the hierarchical file system; and
means for accessing, by an application unaware of the location of the at
least one file, contents of the mounted at least one file in said at least
one remote node by using said local name.
2. The data processing system as recited in claim 1 wherein said descendent
file is a simple file.
3. The data processing system as recited in claim 1 wherein said descendent
file is a directory file.
4. A data processing system having a hierarchical file system, said data
processing system comprising:
at least one file in said hierarchical file system;
a directory file having at least one immediate descendent file for which
said directory file is a parent file; and
means for mounting, in the data processing system, the at least one file
over a name of the at least one immediate descendent file while preserving
the contents of the at least one immediate descendent file, said means for
mounting causing a path name to be associated with at least one portion of
the hierarchical file system; and
means for accessing, by an application unaware of the location of the at
least one file, contents of the mounted at least one file by using said
name.
5. A method, performed by a computer, of accessing a remote file from a
local node in a data processing system having a hierarchical file system,
said method comprising:
mounting, in the data processing system, the remote file at a remote node
over a local file at the local node while preserving the contents of the
local file, said step of mounting causing a path name to be associated
with at least one portion of the hierarchical file system; and
accessing, by an application unaware of the location of the remote file,
the remote file at the local node through a path to the local file.
6. The method of claim 5 further comprising the step of copying the remote
file to the local file before mounting the remote file over the local
file.
7. The method of claim 6 wherein the step of copying further comprises the
steps of:
mounting the remote file over a stub file;
copying contents of the remote file to the local file; and
unmounting the remote file.
8. A method, performed by a computer, of accessing a second file in a data
processing system having a hierarchical file system, said method
comprising the steps of:
mounting, in the data processing system, the second file over a first file
while preserving the contents of the first file, said step of mounting
causing a path name to be associated with at least one portion of the
hierarchical file system; and
accessing, by an application unaware of the location of said second file,
the second file through a path to the first file.
9. The method of claim 8 further comprising the step of copying the second
file to the first file before mounting the second file over the first
file.
10. The method of claim 9 wherein the step of copying further comprises the
steps of:
mounting the second file over a stub file;
copying contents of the second file to the first file; and
unmounting the second file.
11. In a data processing system having a plurality of nodes comprising a
plurality of system files containing characteristics of the data
processing system, at least one set of said system files being a master set
of system files, a hierarchical file system having a plurality of user
directories and files for a plurality of users and a file of default file
tree organizations for each of said users, a method, performed by a
computer, of creating a single system image unique for each user on each
of said plurality of nodes comprising:
maintaining a file of default file tree organizations for each of said
plurality of users;
creating a set of stub files at each of said plurality of nodes;
mounting said master set of system files onto said set of stub files to
create a path from each of said nodes to said master set of system files,
said step of mounting causing a path name to be associated with at least
one portion of the hierarchical file system;
copying said master set of system files into system files of said plurality
of nodes;
unmounting said master set of system files;
deleting said set of stub files;
mounting said master set of system files over said system files of said
plurality of nodes; and
creating a default file tree for each of said users according to the
default file tree organizations to give each user a unique view of the
system individual to that user, said unique view for the user being
identical for that user at every node in the system.
12. The method according to claim 11 further comprising the steps of:
using said plurality of system files in read only mode to continue to give
each of said users a same file interface if said master set of system
files are unavailable; and
mounting said master set of system files over said system files of said
plurality of nodes when the master set is again available.
13. The method according to claim 11 wherein said step of creating a
default file tree for each of said users comprises the steps of:
retrieving said file of default file tree organizations for said plurality
of users;
determining which of said user directories and files are contained on
another of said plurality of nodes in an organization of said default file
tree;
creating stub directories for each of said user directories and files
contained on another of said plurality of nodes; and
mounting said user directories and files over said local stub directories
to allow said user to have an identical view of the system as would be
obtained from any other node in the system by that user.
14. A data processing system having a plurality of named files in a
hierarchical file system and having a connection at a local node to at
least one remote node by a communication link, said data processing system
comprising:
means for mounting, in said data processing system, at least one file
located at one of said at least one remote node to a local name of a named
file located at said local node, while preserving the contents of said
named file, said means for mounting causing a path name to be associated
with at least one portion of the hierarchical file system; and
means for using said local name of said named file located at said local
node, by an application unaware of the location of said mounted at least
one file located at one of said at least one remote node, to access
contents of said mounted at least one file.
15. A data processing system having a plurality of named files in a
hierarchical file system, said data processing system comprising:
at least one first file;
at least one second file having a name;
means for mounting, in said data processing system, said first file to said
name of said at least one second file, while preserving the contents of
said second file, said means for mounting causing a path name to be
associated with at least one portion of the hierarchical file system;
means for using, after said mounting, by an application unaware of the
location of said at least one first file, said name to access the contents
of said at least one first file;
means for unmounting, in said data processing system, said at least one
first file from said name; and
means for using, after said unmounting, said name by an application to
access the preserved contents of said at least one second file.
16. A method, performed by a data processing system, of accessing from a
local node a remote file in a hierarchical file system residing at a
remote node connected to said local node by a communication link, said
method comprising:
mounting, by said data processing system, said remote file to a local name
of a named file residing at said local node while preserving the contents
of said named file, said means for mounting causing a path name to be
associated with at least one portion of the hierarchical file system; and
using said local name, by an application unaware of the location of said
remote file, to access contents of said remote file.
17. A method, performed by a data processing system, of accessing a first
file in a hierarchical file system, said method comprising the steps of:
mounting, by said data processing system, said first file to a name of a
second file, while preserving the contents of said second file, said step
of mounting causing a path name to be associated with at least one portion
of the hierarchical file system;
using, after said mounting, said name by an application unaware of the
location of the first file to access contents of said first file;
unmounting, by said data processing system, said first file from said name;
and
using, after said unmounting, said name by an application to access the
preserved contents of said second file.
18. A computer program product having a computer readable medium having a
computer program recorded thereon for use in a data processing system for
accessing from a local node a remote file in a hierarchical file system
residing at a remote node, wherein said local node is connected to at
least one remote node by a communication link, said computer program
product comprising:
program code means for causing a mounting, by said data processing system,
of said remote file to a local name of a named file residing at said local
node while preserving the contents of said named file, said mounting
causing a path name to be associated with at least one portion of the
hierarchical file system; and
means for using said local name, by an application unaware of the location
of said remote file, to access contents of said remote file.
19. A computer program product having a computer readable medium having a
computer program recorded thereon for use in a data processing system for
accessing a file in a hierarchical file system, said computer program
product comprising:
program code means for causing a mounting, by said data processing system,
of a first file to a name of a second file while preserving the contents
of said second file, said mounting causing a path name to be associated
with at least one portion of the hierarchical file system;
means for using, after said mounting, said name by an application unaware
of the location of said first file to access contents of said first file;
means for unmounting said first file from said name of said second file;
and
means for using, after said unmounting, said name by an application to
access the preserved contents of said second file.
20. A distributed data processing system including a hierarchical file
system and having a plurality of nodes interconnected by a communication
link, said distributed data processing system comprising:
a named directory having at least one shared file containing information
for use by said plurality of nodes;
at least one unique local file, residing in said named directory,
containing information for use by at least one local node of said
plurality of nodes;
means for mounting, by said distributed data processing system, in said at
least one local node, one of said at least one shared file residing at at
least one remote node of said plurality of nodes, over a named file in
said named directory at said local node, while maintaining said at least
one unique local file at said local node, said means for mounting causing
a path name to be associated with at least one portion of the hierarchical
file system;
means for using, from any one of said at least one local node, a first path
through the named directory and the named file to access the mounted at
least one shared file; and
means for using, from any one of said at least one local node, a second
path through the named directory to access the at least one unique local
file of said local node.
21. The system of claim 20 wherein said means for using a first path
accesses the mounted at least one shared file from any one of said at
least one local node through a path /etc/passwd.
22. The system of claim 20 wherein said means for using a first path
accesses the mounted at least one shared file from any one of said at
least one local node through a path /etc/group.
23. The system of claim 20 wherein said means for using a first path
accesses the mounted at least one shared file from any one of said at
least one local node through a path /etc/motd.
24. A system of claim 20 wherein one of said at least one shared file is
mounted over a same named file in each of said plurality of nodes, and
said at least one shared file is accessed through a same first path from
each of said plurality of nodes.
25. A system of claim 20 further comprising means for accessing, by a user,
a same plurality of files from any one of said nodes in a consistent
manner independent of where any one of said plurality of files resides,
and independent of the node that the user is currently using, whereby said
user has a single system image from any one of said nodes.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
This application is related in subject matter to the following applications
filed concurrently herewith and assigned to a common assignee:
Application Ser. No. 07/014,899, now U.S. Pat. No. 4,897,781, filed by A.
Chang, G. H. Neuman, A. A. Shaheen-Gouda, and T. A. Smith for A System And
Method For Using Cached Data At A Local Node After Re-opening A File At A
Remote Node In A Distributed Networking Environment.
Application Ser. No. 07/014,884, now abandoned, filed by D. W. Johnson, L.
W. Henson, A. A. Shaheen-Gouda, and T. A. Smith for A System and Method
for Version Level Negotiation.
Application Ser. No. 07/014,897, now U.S. Pat. No. 4,887,204, filed by D. W.
Johnson, G. H. Neuman, C. H. Sauer, A. A. Shaheen-Gouda, and T. A. Smith
for A System And Method For Accessing Remote Files In A Distributed
Networking Environment.
Application Ser. No. 07/014,900, now abandoned, filed by D. W. Johnson, A.
A. Shaheen-Gouda, and T. A. Smith for Distributed File Access Structure Lock.
Application Ser. No. 07/014,891 filed by L. W. Henson, A. A. Shaheen-Gouda,
and T. A. Smith for Distributed File and Record Locking.
Application Ser. No. 07/014,888 filed by D. W. Johnson, L. K. Loucks, and A. A.
Shaheen-Gouda for Interprocess Communication Queue Location Transparency.
Application Ser. No. 07/014,889 filed by D. W. Johnson, A. A.
Shaheen-Gouda, and T. A. Smith for Directory Cache Management In A
Distributed Data Processing System.
The disclosures of the foregoing co-pending applications are incorporated
herein by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention generally relates to improvements in operating systems for a
distributed data processing system and, more particularly, to an operating
system for a multi-processor system interconnected by a local area network
(LAN) or a wide area network (WAN). IBM's Systems Network Architecture
(SNA) may be used to construct the LAN or WAN. The operating system
according to the invention permits the accessing of files by processors in
the system, no matter where those files are located in the system. The
invention is disclosed in terms of a preferred embodiment which is
implemented in a version of the UNIX.sup.1 operating system; however, the
invention could be implemented in other and different operating systems.
.sup.1 Developed and licensed by AT&T. UNIX is a registered trademark of
AT&T in the U.S.A. and other countries.
2. Description of the Related Art
Virtual machine operating systems are known in the prior art which make a
single real machine appear to be several machines. These machines can be
very similar to the real machine on which they are run or they can be very
different. While many virtual machine operating systems have been
developed, perhaps the most widely used is VM/370 which runs on the IBM
System/370. The VM/370 operating system creates the illusion that each of
several users operating from terminals has a complete System/370 with
varying amounts of disk and memory capacity.
The physical disk devices are managed by the VM/370 operating system. The
physical volumes residing on disk are divided into virtual volumes of
various sizes and assigned and accessed by users carrying out a process
called mounting. Mounting defines and attaches physical volumes to a
VM/370 operating system and defines the virtual characteristics of the
volumes such as size, security and ownership.
Moreover, under VM/370 a user can access and use any of the other operating
systems running under VM/370 either locally on the same processor or
remotely on another processor. A user in Austin can use a function of
VM/370 called "passthru" to access another VM/370 or MVS/370 operating
system on the same processor or, for example, a processor connected into
the same SNA network and located in Paris, France. Once the user has
employed this function, the files attached to the other operating system
are available for processing by the user.
There are some significant drawbacks to this approach. First, when the user
employs the "passthru" function to access another operating system either
locally or remotely, the files and operating environment that were
previously being used are no longer available until the new session has
been terminated. The only way to process files from the other session is
to send the files to the other operating system and effectively make
duplicate copies on both disks. Second, the user must have a separate
"logon" on all the systems that are to be accessed. This provides the
security necessary to protect the integrity of the system, but it also
creates a tremendous burden on the user. For further background, the
reader is referred to the textbook by Harvey M. Deitel entitled An
Introduction to Operating Systems, published by Addison-Wesley (1984), and
in particular to Chapter 22 entitled "VM: A Virtual Machine Operating
System". A more in-depth discussion may be had by referring to the
textbook by Harold Lorin and Harvey M. Deitel entitled Operating Systems,
published by Addison-Wesley (1981), and in particular to Chapter 16
entitled "Virtual Machines".
The invention to be described hereinafter was implemented in a version of
the UNIX operating system but may be used in other operating systems
having characteristics similar to the UNIX operating system. The UNIX
operating system was developed by Bell Telephone Laboratories, Inc., for
use on a Digital Equipment Corporation (DEC) minicomputer but has become a
popular operating system for a wide range of minicomputers and, more
recently, microcomputers. One reason for this popularity is that the UNIX
operating system is written in the C programming language, also developed
at Bell Telephone Laboratories, rather than in assembly language so that
it is not processor specific. Thus, compilers written for various machines
to give them C capability make it possible to transport the UNIX operating
system from one machine to another. Therefore, application programs
written for the UNIX operating system environment are also portable from
one machine to another. For more information on the UNIX operating
system, the reader is referred to UNIX System, User's Manual, System V,
published by Western Electric Co., January 1983. A good overview of the
UNIX operating system is provided by Brian W. Kernighan and Rob Pike in
their book entitled The Unix Programming Environment, published by
Prentice-Hall (1984). A more detailed description of the design of the
UNIX operating system is to be found in a book by Maurice J. Bach, Design
of the Unix Operating System, published by Prentice-Hall (1986).
AT&T Bell Labs has licensed a number of parties to use the UNIX operating
system, and there are now several versions available. The most current
version from AT&T is version 5.2. Another version known as the Berkeley
version of the UNIX operating system was developed by the University of
California at Berkeley. Microsoft, the publisher of the popular MS-DOS and
PC-DOS operating systems for personal computers, has a version known under
their trademark as XENIX. With the announcement of the IBM RT PC.sup.2
(RISC (reduced instruction set computer) Technology Personal Computer) in
1985, IBM Corp. released a new operating system called AIX.sup.3 (Advanced
Interactive Executive) which is compatible at the application interface
level with AT&T's UNIX operating system, version 5.2, and includes
extensions to the UNIX operating system, version 5.2. For more description
of the AIX operating system, the reader is referred to AIX Operating
System Technical Reference, published by IBM Corp., First Edition (Nov.
1985).
.sup.2 RT and RT PC are registered trademarks of IBM Corporation.
.sup.3 AIX is a trademark of IBM Corporation.
The invention is specifically concerned with distributed data processing
systems characterized by a plurality of processors interconnected in a
network. As actually implemented, the invention runs on a plurality of IBM
RT PCs interconnected by IBM's Systems Network Architecture (SNA), and
more specifically SNA LU 6.2 Advanced Program to Program Communication
(APPC). SNA uses as its link level Ethernet.sup.4, a local area network
(LAN) developed by Xerox Corp., or SDLC (Synchronous Data Link Control). A
simplified description of local area networks including the Ethernet local
area network may be found in a book by Larry E. Jordan and Bruce
Churchill entitled Communications and Networking for the IBM PC, published
by Robert J. Brady (a Prentice-Hall company) (1983). A more definitive
description of communications systems for computers, particularly of SNA
and SDLC, is to be found in a book by R. J. Cypser entitled Communications
Architecture for Distributed Systems, published by Addison-Wesley (1978).
It will, however, be understood that the invention may be implemented
using other and different computers than the IBM RT PC interconnected by
other networks than the Ethernet local area network or IBM's SNA.
.sup.4 Ethernet is a trademark of Xerox Corporation.
As mentioned, the invention to be described hereinafter is directed to a
distributed data processing system in a communication network. In this
environment, each processor at a node in the network potentially may
access all the files in the network no matter at which nodes the files may
reside. As shown in FIG. 1, a distributed network environment 1 may
consist of two or more nodes A, B and C connected through a communication
link or network 3. The network 3 can be a local area network (LAN) as
mentioned or a wide area network (WAN), the latter comprising a switched
or leased teleprocessing (TP) connection to other nodes or to a SNA
network of systems. At any of the nodes A, B or C there may be a
processing system 10A, 10B or 10C, such as the aforementioned IBM RT PC.
Each of these systems 10A, 10B and 10C may be a single user system or a
multi-user system with the ability to use the network 3 to access files
located at a remote node in the network. For example, the processing
system 10A at local node A is able to access the files 5B and 5C at the
remote nodes B and C.
The problems encountered in accessing remote nodes can be better understood
by first examining how a standalone system accesses files. In a standalone
system, such as 10 shown in FIG. 2, a local buffer 12 in the operating
system 11 is used to buffer the data transferred between the permanent
storage 2, such as a hard file or a disk in a personal computer, and the
user address space 14. The local buffer 12 in the operating system 11 is
also referred to as a local cache or kernel buffer. For more information
on the UNIX operating system kernel, see the aforementioned books by
Kernighan et al. and Bach. The local cache can be best understood in terms
of a memory resident disk. The data retains the physical characteristics
that it had on disk; however, the information now resides in a medium that
lends itself to faster data transfer rates very close to the rates
achieved in main system memory.
In the standalone system, the kernel buffer 12 is identified by blocks 15
which are designated as device number and logical block number within the
device. When a read system call 16 is issued, it is issued with a file
descriptor of the file 5 and a byte range within the file 5, as shown in
step 101 in FIG. 3. The operating system 11 takes this information and
converts it to a device number and logical block numbers of the device in
step 102. Then the operating system 11 reads the cache 12 according to the
device number and logical block numbers, step 103.
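The read path of steps 101 to 103 can be sketched as follows. This is a minimal illustration only and not the AIX implementation: the names Disk, KernelBuffer and read, the 512-byte block size, and the assumption that the file occupies the device contiguously from block 0 (so that a byte offset maps directly to a logical block number) are all hypothetical, and the file descriptor lookup is omitted.

```python
BLOCK_SIZE = 512  # assumed block size; the text does not give one


class Disk:
    """Stands in for the permanent storage 2 (hypothetical model)."""

    def __init__(self, device, data):
        self.device = device
        self.data = data
        self.reads = 0  # number of physical block reads performed

    def read_block(self, block_no):
        self.reads += 1
        start = block_no * BLOCK_SIZE
        return self.data[start:start + BLOCK_SIZE]


class KernelBuffer:
    """Cache whose blocks are identified by (device number, logical block number)."""

    def __init__(self):
        self.blocks = {}

    def get(self, disk, block_no):
        key = (disk.device, block_no)
        if key not in self.blocks:          # miss: a disk access must be made
            self.blocks[key] = disk.read_block(block_no)
        return self.blocks[key]             # hit: served from the cache


def read(cache, disk, offset, length):
    """Convert a byte range to logical block numbers, then read via the cache."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    data = b"".join(cache.get(disk, n) for n in range(first, last + 1))
    skip = offset - first * BLOCK_SIZE
    return data[skip:skip + length]
```

In this sketch a read that spans a block boundary touches two cache blocks, and repeating the same read causes no further disk access because both blocks are already cached.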
Any data read from the disk 2 is kept in the cache block 15 until the cache
block 15 is needed for other data. Consequently, any successive read request
from an application program 4 running on the processing system 10 for
data previously read from the disk is satisfied from the cache 12 and
not the disk 2. Reading from the cache is less time consuming than
accessing the disk; therefore, reading from the cache improves the
performance of the application 4. If the data which is to be
accessed is not in the cache, then a disk access must be made, but this
requirement occurs infrequently.
Similarly, data written from the application 4 is not saved immediately on
the disk 2 but is written to the cache 12. This again saves time,
improving the performance of the application 4. Modified data blocks in
the cache 12 are saved on the disk 2 periodically under the control of the
operating system 11.
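The deferred write behavior described above can be illustrated by the following sketch. It is an assumption-laden model, not the operating system's code: the class name WriteBehindCache, the use of a dictionary to stand in for the disk 2, and the explicit sync method (standing in for the periodic save performed under operating system control) are hypothetical.

```python
class WriteBehindCache:
    """Writes go to the cache; modified blocks reach the disk only on sync()."""

    def __init__(self, disk):
        self.disk = disk      # dict: block number -> bytes (stand-in for disk 2)
        self.blocks = {}      # cached copies of blocks
        self.dirty = set()    # block numbers modified since the last sync

    def read(self, block_no):
        if block_no not in self.blocks:
            self.blocks[block_no] = self.disk[block_no]
        return self.blocks[block_no]

    def write(self, block_no, data):
        # The application returns immediately; the disk is not touched here.
        self.blocks[block_no] = data
        self.dirty.add(block_no)

    def sync(self):
        # Called periodically in a real system; explicit in this sketch.
        for block_no in sorted(self.dirty):
            self.disk[block_no] = self.blocks[block_no]
        self.dirty.clear()
```

Between a write and the next sync, the cache holds the only current copy of the block, which is why readers on the same system must go through the cache rather than the disk.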
Use of a cache in a standalone system that utilizes the AIX operating
system, which is the environment in which the invention was implemented,
improves the overall performance of the system disk and minimizes access
time by eliminating the need for successive read and write disk
operations.
In the distributed networking environment shown in FIG. 1, there are two
ways the processing system 10C in local node C could read the file 5A from
node A. In one way, the processing system 10C could copy the whole file 5A
and then read it as if it were a local file 5C residing at node C. Reading
the file in this way creates a problem if another processing system 10B at
node B, for example, modifies the file 5A after the file 5A has been
copied at node C. The processing system 10C would not have access to the
latest modifications to the file 5A.
Another way for processing system 10C to access a file 5A at node A is to
read one block at a time as the processing system at node C requires it. A
problem with this method is that every read has to go across the network
communications link 3 to the node A where the file resides. Sending the
data for every successive read is time consuming.
Accessing files across a network presents two competing problems as
illustrated above. One problem involves the time required to transmit data
across the network for successive reads and writes. On the other hand, if
the file data is stored in the node to reduce network traffic, the file
integrity may be lost. For example, if one of the several nodes is also
writing to the file, the other nodes accessing the file may not be
accessing the latest updated file that has just been written. As such, the
file integrity is lost, and a node may be accessing incorrect and outdated
files. Within this document, the term "server" will be used to indicate
the processing system where the file is permanently stored, and the term
"client" will be used to mean any other processing system having processes
accessing the file. The invention to be described hereinafter is part of
an operating system which provides a solution to the problem of managing
distributed information.
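The two competing problems can be made concrete with a small sketch. The class FileServer and its methods are hypothetical, and counting a whole-file copy as a single round trip is a deliberate simplification; the point is only to contrast a stale local copy with the cost of going across the link for every block.

```python
class FileServer:
    """Stands in for the node where the file is permanently stored."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.round_trips = 0  # requests that crossed the communication link

    def fetch_all(self):
        # Approach 1: the client copies the whole file and reads it locally.
        self.round_trips += 1
        return list(self.blocks)

    def fetch_block(self, n):
        # Approach 2: the client reads one block at a time as it needs it.
        self.round_trips += 1
        return self.blocks[n]


server = FileServer([b"a", b"b"])
local_copy = server.fetch_all()      # client copies the file
server.blocks[0] = b"A"              # another node then modifies the file
stale = local_copy[0]                # still b"a": the copy missed the update
fresh = server.fetch_block(0)        # b"A", at the cost of another round trip
```

The copied file is cheap to read but no longer reflects the latest modification, while the block-at-a-time read is current but pays for a network transfer on every access.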
Other approaches to supporting a distributed data processing system in a
UNIX operating system environment are known. For example, Sun Microsystems
has released a Network File System (NFS) and Bell Laboratories has
developed a Remote File System (RFS). The Sun Microsystems NFS has been
described in a series of publications including S. R. Kleiman, "Vnodes: An
Architecture for Multiple File System Types in Sun UNIX", Conference
Proceedings, USENIX 1986 Summer Technical Conference and Exhibition, pp.
238 to 247; Russel Sandberg et al., "Design and Implementation of the Sun
Network Filesystem", Conference Proceedings, Usenix 1985, pp. 119 to 130;
Dan Walsh et al., "Overview of the Sun Network File System", pp. 117 to
124; JoMei Chang, "Status Monitor Provides Network Locking Service for
NFS"; JoMei Chang, "SunNet", pp. 71 to 75; and Bradley Taylor, "Secure
Networking in the Sun Environment", pp. 28 to 36. The AT&T RFS has also
been described in a series of publications including Andrew P. Rifkin et
al., "RFS Architectural Overview", USENIX Conference Proceedings, Atlanta
Ga. (June 1986), pp. 1 to 12; Richard Hamilton et al., "An
Administrator's View of Remote File Sharing", pp. 1 to 9; Tom Houghton et
al., "File Systems Switch", pp. 1 to 2; and David J. Olander et al., "A
Framework for Networking in System V", pp. 1 to 8.
One feature which distinguishes the distributed services system in which
the subject invention is implemented from the Sun Microsystems NFS, for
example, is its treatment of state. Sun's approach was to design what is
essentially a stateless machine. More specifically, the server in a
distributed system may be designed to be stateless. This means that the
server does not store any information about client nodes, including such
information as which client nodes have a server file open, whether client
processes have a file open in read_only or read_write modes, or whether a
client has locks placed on byte ranges of the file. Such an implementation
simplifies the design of the server because the server does not have to
deal with error recovery situations which may arise when a client fails or
goes off-line without properly informing the server that it is releasing
its claim on server resources.
An entirely different approach was taken in the design of the distributed
services system in which the present invention is implemented. More
specifically, the distributed services system may be characterized as a
"stateful implementation". A "stateful" server, such as that described
here, does keep information about who is using its files and how the files
are being used. This requires that the server have some way to detect the
loss of contact with a client so that accumulated state information about
that client can be discarded. The cache management strategies described
here, however, cannot be implemented unless the server keeps such state
information. The management of the cache is affected, as described below,
by the number of client nodes which have issued requests to open a server
file and the read/write modes of those opens.
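The kind of state such a server keeps can be sketched as follows. This is a hypothetical model only: the class StatefulServer, its method names, and the representation of modes as the strings "read_only" and "read_write" are assumptions, and no cache management policy is implied. The sketch merely records which client nodes have a file open, in which modes, and discards a client's state when contact with it is lost.

```python
from collections import defaultdict


class StatefulServer:
    """Tracks, per file, which client nodes have it open and in what modes."""

    def __init__(self):
        # file name -> {client node -> list of open modes}
        self.opens = defaultdict(dict)

    def open(self, client, name, mode):
        self.opens[name].setdefault(client, []).append(mode)

    def close(self, client, name, mode):
        modes = self.opens[name][client]
        modes.remove(mode)
        if not modes:
            del self.opens[name][client]

    def client_lost(self, client):
        # Loss of contact detected: discard accumulated state for that client.
        for clients in self.opens.values():
            clients.pop(client, None)

    def clients(self, name):
        return sorted(self.opens[name])

    def writers(self, name):
        return sorted(c for c, modes in self.opens[name].items()
                      if "read_write" in modes)
```

A stateless server would hold none of this information, so it could not answer questions such as how many client nodes currently have a given file open for writing.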
SUMMARY OF THE INVENTION
It is therefore a general object of this invention to provide a distributed
services system for an operating system which supports a multi-processor
data processing system interconnected in a communications network that
provides user transparency as to file location in the network and as to
performance.
It is another, more specific object of the invention to provide a technique
for giving a user a single system image from any node of a distributed
environment.
According to the invention, these objects are accomplished by keeping one
set of master system files which each of the distributed nodes uses by
creating a set of stub files at the remote node, mounting the master
system files onto the stub files, copying the master system files into a
set of local system files, unmounting the master system files and deleting
the stub files, and mounting the master system files over the local system
files. The local copies of the master system files are used in the event
that the node containing the master system files is not available. In
addition, a file tree for each of the individual users is maintained to
allow each user to access the same files from any node in a consistent
manner regardless of the node that the user is currently using.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, aspects and advantages of the invention
will be better understood from the following detailed description of the
preferred embodiment of the invention with reference to the accompanying
drawings, in which:
FIG. 1 is a block diagram showing a typical distributed data processing
system in which the subject invention is designed to operate;
FIG. 2 is a block diagram illustrating a typical standalone processor
system;
FIG. 3 is a flowchart showing the steps performed by an operating system
when a read system call is made by an application running on a processor;
FIG. 4 is a block diagram of the data structure illustrating the scenario
for following a path to a file operation at a local node as performed by
the operating system which supports the subject invention;
FIG. 5 is a block diagram of the data structures illustrating the before
condition of the scenario for a mount file operation at a local node as
performed by the operating system;
FIG. 6 is a block diagram of the data structures illustrating the after
condition of the scenario for a mount file operation at a local node as
performed by the operating system;
FIG. 7A shows a file tree whose immediate descendants are all directories;
FIG. 7B shows a file tree whose immediate descendants are a collection of
directories and simple files;
FIG. 7C shows a file tree whose immediate descendants are all simple files;
FIG. 8 is a block diagram of the data structure for the distributed file
system shown in FIG. 13;
FIG. 9A is a block diagram of the VFS part of the data structure shown in
FIG. 8;
FIG. 9B is a block diagram of the VNODE part of the data structure shown in
FIG. 8;
FIG. 9C is a block diagram of the INODE part of the data structure shown in
FIG. 8;
FIG. 9D is a block diagram of the FILE ACCESS part of the data structure
shown in FIG. 8;
FIG. 9E is a block diagram of the NODE TABLE ENTRY part of the data
structure shown in FIG. 8;
FIG. 9F is a block diagram of the SURROGATE INODE part of the data
structure shown in FIG. 8;
FIG. 10 is a block diagram of the initial conditions of the data structures
illustrating the scenario for a mount file operation;
FIG. 11 is a block diagram of the final conditions of the data structures
illustrating the scenario for a mount file operation;
FIG. 12 is a block diagram of the data structures for a mount file
operation illustrating the process of following a path to a file at a
local and remote node in a distributed system as performed by the
operating system; and
FIG. 13 is a block diagram, similar to FIG. 1, showing a distributed data
processing system according to the invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The following disclosure describes solutions to problems which are
encountered when creating a distributed file system in which the logic
that manages a machine's files is altered to allow files that physically
reside in several different machines to appear to be part of the local
machine's file system. The implementation described is an extension of the
file system of the AIX operating system. Reference should be made to the
above-referenced Technical Reference for more information on this
operating system. Specific knowledge of the following AIX file system
concepts is assumed: tree-structured, also known as hierarchical, file
system directories; and file system organization, including inodes.
The essential aspects of a file system that are relevant to this discussion
are listed below:
(a) each file on an individual file system is uniquely identified by its
inode number;
(b) directories are files, and thus a directory can be uniquely identified
by its inode number.
Note: In some contexts it is necessary to distinguish between files which
are directories and files which are not directories (e.g., files which
simply contain ordinary data, or other file types supported by UNIX
derivative operating systems such as special files or pipes).
In this disclosure the term "simple file" is used to indicate such a
non-directory file. Unless otherwise indicated the term "file" may mean
either a directory file or a simple file, and, of course, the term
"directory" means a directory file.
(c) a directory contains an array of entries of the following form:
name--inode number
where the inode number may be that of a simple file or that of another
directory.
Note: A directory may contain other directories, which, in turn, may
contain other directories or simple files.
Thus a directory may be viewed as the root of a subtree which may include
many levels of descendant directories, with the leaves of the tree being
"simple files".
In this disclosure the term "descendants" means all of the files which
exist in the file tree below a particular directory, even those which can
be reached only by going through other directories. The "immediate
descendants" of a directory are only those files (simple files or
directories) whose names appear in the directory.
(d) by convention, the inode number of the file system's root directory is
inode number 2.
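Items (a) through (d) above may be illustrated by the following minimal
sketch. It is not the AIX implementation; the names Inode, fs,
immediate_descendants and descendants, and all inode numbers other than 2,
are hypothetical and chosen only for illustration. A directory is modeled
as a mapping from names to inode numbers, and the two senses of
"descendants" defined above are computed from that mapping.

```python
from dataclasses import dataclass, field

ROOT_INODE = 2  # by convention, the root directory of a file system


@dataclass
class Inode:
    """Hypothetical in-memory stand-in for an inode on one file system."""
    number: int
    is_dir: bool
    # For directories only: entries of the form name -> inode number.
    entries: dict = field(default_factory=dict)


def immediate_descendants(fs, ino):
    """Inode numbers of the files whose names appear in directory `ino`."""
    return sorted(fs[ino].entries.values())


def descendants(fs, ino):
    """Inode numbers of every file in the tree below directory `ino`."""
    found = []
    for child in fs[ino].entries.values():
        found.append(child)
        if fs[child].is_dir:
            found.extend(descendants(fs, child))
    return sorted(found)


# One file system: inode number -> Inode (illustrative values).
fs = {
    2: Inode(2, True, {"dir1": 5, "notes": 7}),
    5: Inode(5, True, {"dir2": 9}),
    7: Inode(7, False),            # a simple file
    9: Inode(9, True, {"file": 12}),
    12: Inode(12, False),          # a simple file
}
```

In this example the immediate descendants of the root are inodes 5 and 7,
while its descendants also include inodes 9 and 12, which can be reached
only by going through other directories.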
The following discussion describes how traditional UNIX operating systems
use mounts of entire file systems to create file trees, and how paths are
followed in such a file tree.
Following the path "/dir1/dir2/file" within a device's file system thus
involves the following steps:
1. Read the file identified by inode number 2 (the device's root
directory).
2. Search the directory for an entry with name = dir1.
3. Read the file identified by the inode number associated with dir1 (this
is the next directory in the path).
4. Search the directory for an entry with name = dir2.
5. Read the file identified by the inode number associated with dir2 (this
is the next directory in the path).
6. Search the directory for an entry with name = file.
7. The inode number associated with file in this directory is the inode
number of the simple file identified by the path "/dir1/dir2/file".
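The seven steps above may be sketched as a single loop. This is an
illustrative sketch only, assuming each directory is available as an
in-memory mapping from names to inode numbers; the function name
follow_path, the table directories, and the inode numbers 5, 9 and 12 are
hypothetical.

```python
ROOT_INODE = 2  # inode number of the device's root directory


def follow_path(directories, path):
    """Return the inode number identified by `path` on one file system.

    `directories` maps the inode number of each directory to its entries,
    a dict of name -> inode number. Simple files do not appear as keys.
    """
    ino = ROOT_INODE                       # step 1: start at the root
    for name in (c for c in path.split("/") if c):
        if ino not in directories:         # cannot search a simple file
            raise NotADirectoryError(f"inode {ino} is not a directory")
        entries = directories[ino]         # read the directory file
        if name not in entries:            # search for the entry by name
            raise FileNotFoundError(name)
        ino = entries[name]                # next file in the path
    return ino


# Illustrative contents for the path "/dir1/dir2/file".
directories = {
    2: {"dir1": 5},
    5: {"dir2": 9},
    9: {"file": 12},
}
```

With these tables, follow_path(directories, "/dir1/dir2/file") reads
inodes 2, 5 and 9 in turn and returns 12, the inode number of the simple
file.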
The file trees which reside on individual file systems are the building
blocks from which a node's aggregate file tree is built. A particular
device (e.g., hard file partition) is designated as the device which
contains a node's root file system. The file tree which resides on
another device can be added to the node's file tree by performing a mount
operation. The two principal parameters to the mount operation are (1) the
name of the device which holds the file tree to be mounted and (2) the
path to the directory upon which the device's file tree is to be mounted.
This directory must already be part of the node's file tree; i.e., it must
be a directory in the root file system, or it must be a directory in a
file system which has already been added (via a mount operation) to the
node's file tree.
After the mount has been accomplished, paths which would ordinarily flow
through the "mounted over" directory instead flow through the root inode
of the mounted file system. A mount operation proceeds as follows:
1. Follow the path to the mount point and get the inode number and device
number of the directory which is to be covered by the mounted device.
2. Create a data structure which contains essentially the following:
(a) the device name and inode number of the covered directory; and
(b) the device name of the mounted device.
Path following in the node's aggregate file tree consists of (a) following
the path in a device file tree until encountering an inode which has been
mounted over (or, of course, the end of the path); (b) once a mount point
is encountered, using the mount data structure to determine which device
is next in the path; and (c) resuming path following at inode 2 (the root
inode) in the device indicated in the mount structure.
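The traditional mount operation and the resulting path following may be
sketched as follows. This is a simplified model under stated assumptions,
not the operating system's code: the class Node, its methods, the device
names "hd0" and "hd1", and all inode numbers other than 2 are hypothetical.
The mount structure is modeled as an in-memory table keyed by the device
and inode number of the covered directory.

```python
ROOT_INODE = 2  # root inode of every device file system


class Node:
    """Hypothetical model of one node's aggregate file tree."""

    def __init__(self, devices, root_device):
        # devices: device name -> {directory inode: {name: inode number}}
        self.devices = devices
        self.root_device = root_device
        # (device, covered inode) -> mounted device; held only in memory
        self.mounts = {}

    def _cross(self, dev, ino):
        # Steps (b) and (c): at a mounted-over inode, continue at the
        # root inode of the device named in the mount structure.
        while (dev, ino) in self.mounts:
            dev, ino = self.mounts[(dev, ino)], ROOT_INODE
        return dev, ino

    def resolve(self, path):
        """Follow `path` and return (device, inode number)."""
        dev, ino = self._cross(self.root_device, ROOT_INODE)
        for name in (c for c in path.split("/") if c):
            ino = self.devices[dev][ino][name]   # step (a)
            dev, ino = self._cross(dev, ino)
        return dev, ino

    def mount(self, device, mount_point):
        """Mount the whole file tree on `device` over a directory."""
        dev, ino = self.resolve(mount_point)     # mount step 1
        if ino not in self.devices[dev]:
            raise NotADirectoryError(mount_point)
        self.mounts[(dev, ino)] = device         # mount step 2


devices = {
    "hd0": {2: {"usr": 4, "etc": 6}, 4: {}, 6: {"passwd": 8}},
    "hd1": {2: {"bin": 3}, 3: {"ls": 11}},
}
node = Node(devices, "hd0")
node.mount("hd1", "/usr")
```

After the mount, node.resolve("/usr/bin/ls") leaves device "hd0" at inode
4 and continues at inode 2 of "hd1", whereas node.resolve("/etc/passwd")
never leaves "hd0".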
The mount data structures are volatile; they are not recorded on disk. The
list of desired mounts must be re-issued each time the machine is powered
up as part of the Initial Program Load (IPL).
The preceding discussion describes how traditional UNIX operating systems
use mounts of entire file systems to create file trees and how paths are
followed in such a file tree. Such an implementation is restricted to
mounting the entire file system which resides on a device. The virtual
file system concept
described herein and in the reference material allows (1) mounting a
portion of the file system which resides on a device by allowing the
mounting of files (directories or simple files) in addition to allowing
mounting of devices, and (2) mounting either remote or local directories
over directories which are already part of the file tree. The invention
described herein is an enhancement to the virtual file system concept
which further allows the mounting of simple files (remote or local) over
simple files which are already part of the file tree.
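The difference between a device mount and a file mount may be illustrated
by the sketch below. It is a hypothetical model only: the class FileMounts,
the (device, inode) identifiers, and the device names are not taken from
this disclosure, and the check that a directory covers only a directory and
a simple file covers only a simple file is an assumption drawn from the
pairing described in the preceding paragraph. The mount table here records
a specific mounted file rather than a whole device, so path following
continues at that file instead of at the root inode of a device.

```python
class FileMounts:
    """Hypothetical table of file-over-file mounts."""

    def __init__(self, is_dir):
        # is_dir: (device, inode) -> True for a directory, False for a
        # simple file. A device name may denote a remote node's device.
        self.is_dir = is_dir
        # covered (device, inode) -> mounted (device, inode)
        self.table = {}

    def mount(self, mounted, covered):
        """Mount the file `mounted` over the existing file `covered`."""
        if self.is_dir[mounted] != self.is_dir[covered]:
            # Assumption: directories cover directories, and simple
            # files cover simple files.
            raise ValueError("directory/simple-file mismatch")
        self.table[covered] = mounted

    def cross(self, file_id):
        """Return the file actually reached when `file_id` is named."""
        seen = set()
        while file_id in self.table:
            if file_id in seen:
                raise RuntimeError("mount cycle")
            seen.add(file_id)
            file_id = self.table[file_id]
        return file_id


is_dir = {
    ("hd0", 8): False,      # local simple file
    ("remote", 20): False,  # simple file on another node (illustrative)
    ("hd0", 4): True,       # local directory
}
mounts = FileMounts(is_dir)
mounts.mount(("remote", 20), ("hd0", 8))
```

In this model the covered file ("hd0", 8) is left untouched; only the
table entry redirects accesses made through its name to ("remote", 20).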
In the virtual file system, the operations which are performed on a
particular device file system are clearly separated from those operations
which deal with constructing and using the node's aggregate file tree. A
node's virtual file system allows access to both local and remote files.
The management of local files is a simpler problem than management of
remote files. For this reason, the discussion of the virtual file system
is broken into two parts. The first part describes only local operations.
This part provides a base from which to discuss remote operations. The
same data structures and operations are used for both remote and local
operations. The discussion on local operations describes those aspects of
the data and procedures which are relevant to standalone operations. The
discussion on remote operations adds information pertinent to remote
operations without, however, reiterating what was discussed in the local
operations section.
FIG. 4 shows the relationship that exists among the data structures of the
virtual file system. Every mount operation creates