Claims
We claim:
1. In a distributed system having a replication facility and a number of
computer systems that each include a storage device, a method comprising
the steps of:
providing a plurality of files organized into a tree of files;
replicating a single one of the files that is stored in the storage device
of one of the computer systems using the replication facility so that a
copy of the file is stored in the storage device of another of the
computer systems; and
replicating a subtree of files of multiple levels, from the tree of files,
that is stored in the storage device of one of the computer systems using
the replication facility so that a copy of the subtree of files is stored
in the storage device of another of the computer systems.
2. The method of claim 1, further comprising the step of replicating the
single file using the replication facility so that a copy of the file is
stored in the storage device of a third of the computer systems in the
distributed system.
3. The method of claim 1, further comprising the step of replicating the
subtree using the replication facility so that a copy of the subtree is
stored in the storage device of a third of the computer systems in the
distributed system.
4. The method of claim 3 wherein the subtree being replicated includes at
least three levels of files.
5. A distributed system comprising:
a plurality of computer systems, each computer system including a storage
device for storing files;
a namespace manager for managing a namespace of the system as a tree
structure of names of the files; and
a replication facility for replicating a subtree of the namespace that
includes multiple levels.
6. In a distributed system having a reconciler facility and a number of
computer systems, a method comprising the steps of:
providing a first copy of a file in one of the computer systems and a
second copy of the file in another of the computer systems;
reconciling the first copy of the file with the second copy of the file
using the reconciler facility so that the second copy of the file
incorporates any changes made to the first copy of the file since last
reconciled;
providing a first copy of a group of files in one of the computer systems
and a second copy of the group of files in another of the computer
systems; and
reconciling the first copy of the group of files with the second copy of
the group of files using the reconciler facility so that the second copy
of the group of files incorporates any changes made to the first copy of
the group of files since last reconciled.
7. The method of claim 6 wherein the step of reconciling the first copy of
the group of files with the second copy of the group of files further
comprises the step of reconciling on a pair by pair basis each file in the
first copy of the group of files with a corresponding file in the second
copy of the group of files.
8. In a distributed system having a replication facility and a number of
computer systems, each including a storage device, a method comprising the
steps of:
providing a first copy of a group of files stored in the storage device of
a first of the computer systems;
providing a second copy of the group of files stored in the storage device
of a second of the computer systems;
making changes to files in the first copy of the group of files;
propagating the changes to the second copy of the group of files upon
occurrence of an event;
making additional changes to files in the first copy of the group of files;
and
propagating the additional changes to the second copy of the group of files
upon occurrence of another event.
9. The method recited in claim 8 wherein the event is the elapsing of a
predetermined time period.
10. The method recited in claim 9 wherein the other event is also the
elapsing of a predetermined time period.
11. The method of claim 8 wherein the event is a request by the second
computer system to receive the changes.
12. The method of claim 11 wherein the other event is a request by the
second computer system to receive the additional changes.
13. The method recited in claim 8, further comprising the step of
reconciling the second copy of the group of files with the first copy of
the group of files so that the second copy of the group of files
incorporates the changes made to the first copy of the group of files.
14. The method recited in claim 13, further comprising the step of
reconciling the second copy of the group of files with the first copy of
the group of files so that the second copy of the group of files
incorporates the additional changes made to the first copy of the group of
files.
15. In a distributed system having a replication facility and computer
systems that each include a storage device, a method comprising the steps
of:
storing files, having names, in the storage devices of the computer
systems;
providing a distributed namespace comprising a logical organization of the
names of the stored files; and
replicating selected portions of a group of files stored in the storage
devices of one of the computer systems and whose names form a part of the
distributed namespace using the replication facility to create new files
holding the selected portions of the files.
16. The method recited in claim 15, further comprising the step of
replicating the new files to distribute the new files across at least a
portion of the computer systems of the distributed system.
17. In a distributed system having a first computer system and a second
computer system, a method comprising the steps of:
providing a first copy of a set of files of a given class that are stored
in the first computer system;
providing a second copy of the set of files of the given class that are
stored in the second computer system;
reconciling the first copy of the set of files with the second copy of the
set of files using a class-specific reconciler that only reconciles files
of the given class.
18. The method recited in claim 17, further comprising the steps of:
making changes to the first copy of the set of files;
reconciling the first copy of the set of files with the second copy of the
set of files using a class-independent reconciler that reconciles files
regardless of class.
19. In a distributed system having a private replication mechanism and
computer systems for running processes that each include a storage device,
a method comprising the steps of:
running an application program on one of the computer systems;
making a request to the private replication mechanism to replicate a set of
files within the application program, each of the files maintaining a list
of processes that are permitted to access the file; and
replicating the set of files using the private replication mechanism to
produce a new set of files without replicating, for each file, the list of
processes that are permitted to access the file.
20. In a distributed system having a first computer system and a second
computer system, a method comprising the steps of:
providing a collection of files at the first computer system;
in response to a request to replicate the collection of files to the second
computer system, determining whether all or none of the files in the
collection should be replicated;
where it is determined that all of the files in the collection should be
replicated, replicating all of the files in the collection so that a
replica of the collection is provided at the second computer system; and
where it is determined that none of the files in the collection should be
replicated, replicating none of the files in the collection.
21. In a distributed system having a first computer system and a second
computer system, a method comprising the steps of:
providing a first copy of a group of files in the first computer system;
providing a second copy of the group of files in the second computer
system;
making changes to the first copy of the group of files;
providing an agent for the first copy of the group of files, wherein each
agent has access rights to access and read the files in the first copy of
the group of files;
providing a reconciler at the second computer system for reconciling the
second copy of the group of files with the first copy of the group of
files;
granting a proxy to the reconciler from the agent of the first copy of the
group of files, said proxy granting the reconciler limited authority to
access and read the files in the first copy of the group of files; and
reconciling the second copy of the group of files with the first copy of
the group of files using the reconciler so that the changes made to the
first copy of the group of files are made to the second copy of the group
of files.
22. In a distributed system, a method comprising:
providing heterogeneous file systems in the distributed system;
providing a storage manager for each file system to manage access to files
in the file system;
in response to a request to reconcile a first set of files with a second
set of files, granting access to the first set of files by the storage
manager for the file system that holds the first set of files and granting
access to the second set of files by the storage manager for the file
system that holds the second set of files; and
reconciling the first set of files with the second set of files under
control of the storage managers of the respective file systems holding the
first set of files and the second set of files.
23. The method of claim 22 wherein each copy of a file stored in the file
systems is provided a storage-specific identifier by the storage manager.
24. The method of claim 22 wherein each storage manager reports changes to
the files in its file system.
25. The method of claim 24 wherein the changes include deletions of files.
26. The method of claim 24 wherein the changes include renaming of files.
27. The method of claim 24 wherein the changes include moving of files in
the distributed system.
28. The method of claim 24 wherein the changes are reported to a change log
and wherein the step of reconciling is performed using the change log.
29. The method of claim 22 wherein each copy of a file is assigned a
unique identifier and wherein the step of reconciling includes comparing
identifiers to determine which files are to be reconciled.
Description
TECHNICAL FIELD
The present invention relates generally to data processing systems and,
more particularly, to replication facilities used within distributed
systems.
BACKGROUND OF THE INVENTION
Replication facilities have been provided in a number of different types of
software products. For instance, replication facilities have been
incorporated in database products, network directory service products, and
groupware products. Many of the conventional replication facilities are
limited in terms of what they can replicate. For instance, many
conventional replicators can only replicate one type of logical structure
(i.e., a file). Furthermore, the conventional replicators are limited in
terms of the quantity of the logical structures that may be replicated at
a time. In particular, many conventional replicators can only replicate
one file at a time.
SUMMARY OF THE INVENTION
In accordance with a first aspect of a preferred embodiment of the present
invention, a method is practiced in a distributed system having a
replication facility and a number of computer systems that each include a
storage device. In this method, a plurality of files are provided and
organized into a tree. A single one of the files is replicated using the
replication facility such that a copy of the file is stored in the storage
device of a different computer system than the original copy of the file.
A subtree of files of multiple levels is also replicated. The subtree is
originally stored on the storage device of one of the computer systems.
Replication is performed using the replication facility such that a copy
of the subtree and its files is stored in the storage device of another
of the computer systems.
In accordance with another aspect of the present invention, a first copy of
a file is provided in one of the computer systems. A second copy of the
file is provided in another of the computer systems. The first copy of the
file is reconciled with the second copy of the file using a reconciler
facility. The reconciliation ensures that the second copy of the file
incorporates any changes made to the first copy of the file. A first copy
of a group of files is provided in one of the computer systems, and a
second copy of the group of files is provided in another of the computer
systems. The reconciler facility is used to reconcile the first copy of
the group of files with the second copy of the group of files so that the
second copy of the group of files incorporates any changes made to the
first copy of the group of files since last reconciled.
In accordance with a further aspect of the present invention, a first copy
of a group of files is stored in the storage device of a first of the
computer systems. A second copy of the group of files is stored in the
storage device of a second of the computer systems. Changes are made to at
least one of the files in the first copy of the group of files. The changes
are propagated to the second copy of the group of files upon the occurrence
of an event. Additional changes are made to at least one of the files in
the first copy of the group of files, and these changes are also propagated
to the second copy of the group of files upon the occurrence of another
event.
In accordance with yet another aspect of the present invention, a first
copy of a group of files is stored in the storage device of the first
computer system. The second copy of the group of files is stored in the
storage device of a second computer system. Any changes made to the first
copy of the group of files are incrementally sent to the second computer
system so that the changes may be made to the second copy of the group of
files.
In accordance with an additional aspect of the present invention, a first
set of files that are stored in one of the storage devices is specified to
be replicated. A filter is specified for determining what files in the
first set of files are to be replicated. The files specified by the filter
are replicated using the replication facility to produce a second set of
files.
In accordance with a still further aspect of the present invention, files
having names are stored in the storage devices of the computer systems of
the distributed system. A distributed namespace is provided. The
distributed namespace comprises a logical organization of the names of the
stored files. Selected portions of a group of files in the namespace are
replicated to create new files holding the selected portions of the files.
In accordance with a further aspect of the present invention, a first copy
of a set of files of a given class is stored in a first computer system.
A second copy of the set of files is stored in a second computer system.
The first copy of the set of files is reconciled with the second copy of
the set of files using a class-specific reconciler that only reconciles
files of the given class. The files may be stored as persistent objects,
which are organized into classes. Objects and classes will be discussed
below.
In accordance with another aspect of the present invention, an application
program is run on one of the computer systems of a distributed system. A
request is made within the application program to a private replication
mechanism to replicate a set of files. Each of the files maintains a list
of processes that are permitted to access the file. The set of files is
replicated using the private replication mechanism to produce a new set of
files without replicating the list of processes that are permitted to
access the file.
In accordance with a further aspect of the present invention, a first copy
of a group of files is provided in a first computer system and a second
copy of the group of files is provided in a second computer system.
Changes are made to the first copy of the group of files. An agent is
provided for the first copy of the group of files. Each agent has access
rights to access and read the files in the first copy of the group of
files. A reconciler is provided at the second computer system for
reconciling the second copy of the group of files with the first copy of
the group of files. A proxy is granted from the agent of the first copy of
the group of files to the reconciler. The proxy grants the reconciler
limited authority to access and read the files in the first copy of the
group of files. The reconciler then reconciles the second copy of the
group of files with the first copy of the group of files so that the
changes that were made to the first copy of the group of files are also
made to the second copy of the group of files.
In accordance with a final aspect of the present invention, a method is
practiced in a distributed system. In this method, heterogeneous file
systems are provided in the distributed system. A storage manager is
provided for each file system to manage access to the files held therein.
In response to a request to reconcile a first set of files with a second
set of files, access is granted to the first set of files by the storage
manager for the file system that holds the first set of files and access
is granted to the second set of files by the storage manager for the file
system that holds the second set of files. The first set of files is
reconciled with the second set of files under the control of the storage
managers of the respective file systems that hold the first set of files
and the second set of files.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a block diagram of a distributed system suitable for practicing
a preferred embodiment of the present invention.
FIG. 1B is a diagram of a distributed namespace for a distributed system in
accordance with the preferred embodiment of the present invention.
FIG. 2 is a block diagram of a change log used in the preferred embodiment
of the present invention.
FIG. 3 is a block diagram of a replication information block (RIB) used in
the preferred embodiment of the present invention.
FIG. 4 is a block diagram illustrating the functional components of the
replication facility used in the preferred embodiment of the present
invention.
FIG. 5 is a diagram illustrating the interaction of elements that play a
role in public replication in the preferred embodiment of the present
invention.
FIG. 6 is a flowchart of the steps performed in replication in the
preferred embodiment of the present invention.
FIG. 7 is a flowchart illustrating the steps performed to provide security
during replication in the preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
A preferred embodiment of the present invention provides a replication
facility for use in a distributed environment. The replication facility
supports weakly consistent replication of any subtree of persistent
objects in the distributed namespace of the system. The replication
facility may replicate single objects or may replicate logical structures
that include multiple objects. The replication facility reconciles local
copies of objects with remote copies of objects. Reconciliation occurs on
a pair-wise basis such that each object in a local set of objects is
reconciled with its corresponding object in the remote set of objects.
The reconciliation may occur over heterogeneous file systems.
FIG. 1A depicts a distributed system 10 that is suitable for practicing the
preferred embodiment of the present invention. The distributed system 10
includes an interconnection mechanism 12, such as a local area network
(LAN), wide area network (WAN), or other interconnection mechanism, that
interconnects a number of different data processing resources. The data
processing resources include workstations 14, 16, 18 and 20, printers 22
and 24, and secondary storage devices 26 and 28. Each of the workstations
14, 16, 18 and 20 includes a respective memory 30, 32, 34 and 36. Each of
the memories 30, 32, 34 and 36 holds a copy of a distributed operating
system 38. Each workstation 14, 16, 18 and 20 may implement a separate
file system.
Those skilled in the art will appreciate that the present invention may be
practiced on configurations other than the configuration shown in FIG. 1A.
The distributed system 10 shown in FIG. 1A is intended to be merely
illustrative and not limiting of the present invention. For instance, the
interconnection mechanism 12 may interconnect a number of networks
together that are running separate network operating systems.
The preferred embodiment of the present invention allows users and system
administrators to replicate persistent "objects". An object, in this
context, is a logical structure that holds at least one data field. Groups
of objects with similar properties and common semantics are organized into
object classes. A number of different object classes may be defined for
the distributed system 10. Although the preferred embodiment of the
present invention employs objects, those skilled in the art will
appreciate that the present invention is not limited to an object-oriented
environment; rather, the present invention may also be practiced in
non-object-oriented environments. The present invention is not limited to
replication of objects; rather, it is more generalized to support the
replication of logical structures, such as files or file directories.
The operating system 38 includes a file system for storing the objects that
are used in the preferred embodiment of the present invention. The objects
are organized into a distributed namespace 19 (FIG. 1B). The distributed
namespace 19 is a logical tree-like structure formed from the object names
21 stored in the file system of the operating system 38. The distributed
namespace 19 illustrates the hierarchy among the named objects of the
system 10 (FIG. 1A).
The replication facility of the preferred embodiment of the present
invention provides not only for the duplication of objects so that objects
may be distributed across the distributed system, but also provides for
reconciliation of multiple copies of objects (i.e., multimaster
replication). Reconciliation refers to reconciling an object with a
changed object so that the object reflects the changes made to the changed
object. For instance, suppose that a remote copy of an object has been
changed and a local copy of the object has not yet been updated to reflect
the changes. Each object not only has contents but also has a name and
location within the distributed file system. Reconciliation involves
reconciling the two copies of the object such that the local copy of the
object is changed in a like fashion to how the remote copy of the object
was changed. The term "replication," as used herein, refers to not only
duplicating objects so that multiple copies of the objects are distributed
across the distributed system 10, but also refers to reconciliation of the
copies of the objects.
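The reconciliation described above can be pictured with a minimal sketch.
It assumes, purely for illustration, that each copy of an object carries an
integer version counter and that a higher counter means a more recent
change; the names Copy and reconcile are hypothetical and do not appear in
the description. Only content changes are handled here, not deletions,
renames or moves.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: object name -> (version counter, contents).
Copy = Dict[str, Tuple[int, str]]


def reconcile(local: Copy, remote: Copy) -> List[str]:
    """Update `local` in place so it reflects newer changes found in `remote`.

    Objects are compared pair by pair, by name. Returns the names updated.
    """
    updated = []
    for name, (remote_version, remote_contents) in remote.items():
        # An object missing locally is treated as version 0 (assumption).
        local_version, _ = local.get(name, (0, ""))
        if remote_version > local_version:
            local[name] = (remote_version, remote_contents)
            updated.append(name)
    return updated


local = {"a.txt": (1, "old"), "b.txt": (2, "same")}
remote = {"a.txt": (2, "new"), "b.txt": (2, "same")}
updated = reconcile(local, remote)
```

In this example only a.txt has changed remotely, so only that object is
rewritten in the local copy.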
Before discussing the preferred embodiment of the present invention in more
detail below, it is helpful to introduce a few key concepts that will be
referenced below. An "object set" is a collection of objects that are
grouped together for replication. An object set may include a single
object or a sub-tree of objects. The object set is specified by the user
or administrator who requests replication. A "replica set," in contrast,
is a collection of systems which each own a local copy of an object set,
and a "replica" is a member of a replica set.
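As an illustration of how an object set may be either a single object or a
sub-tree of objects, the following sketch walks a small namespace tree and
lists the paths that would belong to the object set. The NamespaceNode
class, the object_set function and the example names are hypothetical and
serve only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class NamespaceNode:
    """Hypothetical node in the tree-like distributed namespace."""
    name: str
    children: Dict[str, "NamespaceNode"] = field(default_factory=dict)

    def add(self, child: "NamespaceNode") -> "NamespaceNode":
        self.children[child.name] = child
        return child


def object_set(root: NamespaceNode, prefix: str = "") -> Iterator[str]:
    """Yield the path of every object in the sub-tree rooted at `root`."""
    path = f"{prefix}/{root.name}"
    yield path
    for child in root.children.values():
        yield from object_set(child, path)


docs = NamespaceNode("docs")
specs = docs.add(NamespaceNode("specs"))
specs.add(NamespaceNode("draft.txt"))
docs.add(NamespaceNode("readme.txt"))

# An object set holding a single object.
single = list(object_set(specs.children["draft.txt"], "/docs/specs"))
# An object set holding a multi-level sub-tree.
subtree = list(object_set(docs))
```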
To insulate the replication facility from the underlying physical storage
system (e.g., the type of file system employed to store objects) and to
provide extensibility, the preferred embodiment of the present invention
adopts the abstraction of a replicated object store (ReplStore). The
ReplStore abstraction allows the replication facility to be applied across
heterogeneous file systems. The ReplStore presents a group of interfaces
that must be supported for an underlying physical storage system to
support replication facilities. In particular, only those objects that
reside in object stores that support the ReplStore interfaces can be
replicated. An interface is a named group of logically related functions.
The interface specifies signatures (such as parameters) for the group of
related functions provided by an interface. The interface does not provide
code for implementing the functions; rather, the code for implementing the
function is provided by objects or by other implementations. Objects that
provide the code for an instance of an interface are said to "support" the
interface. The code provided by an object that supports an interface must
comply with the signature specified within the interface. Thus, in the
example described above, the object store that stores the objects in the
object set must support the ReplStore interfaces in order for the object
set to be replicated. Implementations of the ReplStore interfaces are
provided for each of the file systems within the distributed system 10 in
order to support replication over each of the file systems.
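A rough sketch of the ReplStore idea follows, using an abstract base class
to stand in for the group of interfaces. The description does not list the
functions in the ReplStore interfaces, so the read and write methods, the
InMemoryStore and PlainStore classes, and the can_replicate helper are all
assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ReplStore(ABC):
    """Hypothetical stand-in for the ReplStore interfaces.

    It declares signatures only; implementing code is supplied by each store.
    """

    @abstractmethod
    def read(self, robid: str) -> bytes:
        ...

    @abstractmethod
    def write(self, robid: str, data: bytes) -> None:
        ...


class InMemoryStore(ReplStore):
    """One possible implementation, backed by a dictionary."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def read(self, robid: str) -> bytes:
        return self._objects[robid]

    def write(self, robid: str, data: bytes) -> None:
        self._objects[robid] = data


class PlainStore:
    """A store that does not support the ReplStore interfaces."""


def can_replicate(store: object) -> bool:
    # Only objects residing in stores that support the interfaces qualify.
    return isinstance(store, ReplStore)


store = InMemoryStore()
store.write("obj-0001", b"contents")
```

The replication code can then be written once against the abstract class,
with one implementation supplied per underlying file system.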
Each ReplStore provides a mechanism for identifying replicated objects on
the local volume. This mechanism is the replicated object ID (ROBID). The
ROBID is an abstraction that encapsulates the identity as well as other
information about an object that is being replicated. The ReplStore
supports routines for serializing and deserializing ROBIDs. The ROBID of
an object provides a mechanism for performing numerous operations. For
instance, an object can be retrieved from storage using information
contained in the ROBID. Further, a component name of an object can be
derived from its ROBID.
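The serializing and deserializing of ROBIDs might look like the following
sketch. The description does not specify the contents or the wire format of
a ROBID, so the two fields shown and the use of JSON are assumptions chosen
only to demonstrate a round trip.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Robid:
    """Hypothetical ROBID: an identity plus other information about an object."""
    object_id: str
    component_name: str

    def serialize(self) -> bytes:
        # JSON is an illustrative choice of format, not the patent's.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @staticmethod
    def deserialize(raw: bytes) -> "Robid":
        return Robid(**json.loads(raw.decode("utf-8")))


robid = Robid(object_id="obj-0001", component_name="draft.txt")
round_tripped = Robid.deserialize(robid.serialize())
```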
Each ReplStore maintains a replicated storage change log 40 (FIG. 2). The
change log 40 includes a number of change items 42 that specify changes
that have been made to objects in the object set. Each change item 42
includes a type field 44, a serialized ROBID field 46 for the object that
is changed, a time field 48 indicating the time that the change occurred
(local time) and a replication information block (RIB) field 50 holding a
RIB that is associated with the change. In the embodiment described
herein, there are five types of changes that may be specified within the
type field 44. These changes are deletion, creation, modification,
renaming, and moving. A deletion occurs when an object is deleted.
Creation occurs when the object is created. A modification occurs when the
contents of the object are modified in some way. A renaming occurs when
the component name of the object is modified and moving occurs when the
object is moved under a new parent in the distributed namespace of the
system.
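The layout of a change item can be sketched as a small record type. The
four fields and the five change types come from the description above; the
Python names, the use of a float for local time and the tuple used for the
RIB are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ChangeType(Enum):
    DELETION = "deletion"
    CREATION = "creation"
    MODIFICATION = "modification"
    RENAMING = "renaming"
    MOVING = "moving"


@dataclass(frozen=True)
class ChangeItem:
    change_type: ChangeType       # type field 44
    serialized_robid: bytes       # serialized ROBID field 46
    local_time: float             # time field 48 (local time)
    rib: Tuple[str, int, str]     # RIB field 50: (originator, change id, propagator)


class ChangeLog:
    """Hypothetical append-only change log kept by a ReplStore."""

    def __init__(self) -> None:
        self.items: List[ChangeItem] = []

    def record(self, item: ChangeItem) -> None:
        self.items.append(item)


log = ChangeLog()
log.record(ChangeItem(ChangeType.CREATION, b"obj-0001", 100.0, ("siteA", 1, "siteA")))
log.record(ChangeItem(ChangeType.RENAMING, b"obj-0001", 105.0, ("siteA", 2, "siteA")))
```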
A cursor 49 is maintained within the change log 40 that acts as an index
into the list of change items 42. The cursor 49 acts as a marker in the
list of change items 42. In addition, a change log may include multiple
cursors. The cursor 49 may take the form of a time stamp. The cursor 49
may, for example, identify the beginning of changes that have occurred
after a point in time.
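A cursor in the form of a time stamp can be sketched as follows. The
function name changes_after and the list-of-tuples log are hypothetical,
and the sketch assumes the log is kept in time order.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical log entry: (local time of change, description of change).
ChangeEntry = Tuple[float, str]


def changes_after(log: List[ChangeEntry], cursor: float) -> List[ChangeEntry]:
    """Return the entries whose time is strictly later than the cursor."""
    times = [t for t, _ in log]
    return log[bisect_right(times, cursor):]


log = [(100.0, "create a"), (105.0, "modify a"), (110.0, "rename a")]
pending = changes_after(log, cursor=105.0)
# After processing, the cursor can be advanced to the last change seen.
new_cursor = pending[-1][0] if pending else 105.0
```

Keeping one cursor per replication partner would be one way to make use of
multiple cursors in a single change log, although the description does not
say how the cursors are assigned.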
Every object in an object set that is being replicated is stamped with an
RIB 51 (FIG. 3). The RIB 51 has three fields: an originator field 53, a
change identifier field 55, and a propagator field 57. The originator
field 53 specifies where the last change to the object occurred. The
change identifier field 55, in contrast, identifies the last change to the
object relative to the originator identified within the originator field
53. Lastly, the propagator field 57 specifies the identity of the party
who sent the change to the local site. When an object is changed locally,
the RIB 51 associated with the object is modified to reflect the local
site as the originator and the propagator. The change identifier is
stamped appropriately.
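The stamping of a RIB can be sketched as below. The three fields follow the
description; the integer change identifier and the function names are
assumptions, and the handling of a received change (keeping the originator
and change identifier while recording the sender as propagator) is one
reading of the propagator field rather than a stated rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rib:
    originator: str   # where the last change to the object occurred
    change_id: int    # identifies the change relative to the originator
    propagator: str   # who sent the change to the local site


def stamp_local_change(local_site: str, next_change_id: int) -> Rib:
    """A local change makes the local site both originator and propagator."""
    return Rib(originator=local_site, change_id=next_change_id, propagator=local_site)


def stamp_received_change(incoming: Rib, sender: str) -> Rib:
    """Assumed handling of a change received from `sender`."""
    return Rib(incoming.originator, incoming.change_id, sender)


made_at_a = stamp_local_change("siteA", 7)
seen_at_c = stamp_received_change(made_at_a, sender="siteB")
```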
Replication is useful for the distributed system 10 in that it provides
load balancing and availability. Replication provides load balancing by
having more than one copy of an object stored across the distributed
system 10 to limit the load on any one copy of the object. Replication
enhances availability by allowing multiple copies of important objects to
be distributed across the system 10. The enhanced availability increases
the fault resilience of the system. Specifically, by having copies of
important objects distributed across the system 10, users are less
affected by failures within the system that prevent or limit access to
objects. The enhanced availability also enhances the performance of the
system.
The preferred embodiment of the present invention is embodied in a
replication facility 54 (FIG. 4) that is part of the operating system 38.
Nevertheless, those skilled in the art will appreciate that the
replication facility of the present invention may also be implemented in
other environments, including graphical user interfaces. As shown in FIG.
4, the replication facility 54 includes three primary functional
components: a copying component 56, a reconciler component 58 and a
control component 60. The replication facility 54 uses the copying
component 56 for duplication. In addition, the replication facility 54
reconciles copies of object sets using the reconciler component 58 to
ensure that they are consistent with each other. This reconciliation
ensures a consistent view of the objects across the distributed system 10.
One level of control exerted by the control component 60 concerns how
replication is invoked. Replication may be invoked manually or
automatically. Manual invocation requires that an explicit request to
replicate be made by a user or other party. The user or other party must
specify the object set and the destination for replication. The
destinations are not specified for each replication cycle; rather, a
replica connection is specified initially. The replica connection
identifies the two replicas and the object set that are to be involved in
replication. In contrast, automatic invocation occurs when replication is
triggered by certain events 67 (see FIG. 5) or by the passage of a certain
amount of time (which may be construed as a type of event). Replication
may be prescheduled to occur at fixed time intervals. Another aspect of
control exerted by the control component 60 concerns who may invoke
replication. Replication may be invoked by an appropriately privileged
party.
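The two ways of invoking replication can be pictured with a short sketch. A
replica connection records the two replicas and the object set once, and a
schedule decides whether a replication cycle should start, either because
of a manual request or because a fixed interval has elapsed. All names and
the use of seconds are assumptions for illustration; privilege checks are
omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaConnection:
    """Specified once, not per replication cycle."""
    replica_a: str
    replica_b: str
    object_set: str


@dataclass
class Schedule:
    interval_seconds: float
    last_run: float = 0.0

    def should_run(self, now: float, manual_request: bool = False) -> bool:
        # Manual invocation always triggers; otherwise wait for the interval.
        return manual_request or (now - self.last_run) >= self.interval_seconds

    def mark_run(self, now: float) -> None:
        self.last_run = now


connection = ReplicaConnection("siteA", "siteB", "/docs")
schedule = Schedule(interval_seconds=60.0, last_run=100.0)
```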
The preferred embodiment of the present invention provides two types of
replication: public replication and private replication. Public
replication refers to a process that may be performed only by
appropriately privileged parties to produce a "public" copy of an object
set. In public replication, each of the copies of the object set that are
produced cooperates with the other copies to maintain consistency. The
nodes in the namespace that store the public copies, in aggregate, form a
public replica set, and the members of the set keep state information to
maintain consistency among the copies. Access restrictions on the objects
are preserved. Changes that occur in a public copy of an object set are
reconciled with other public copies.
Private replication