WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Replicated resource management system for managing resources in a distributed application and maintaining a relativistic view of state    
United States Patent6233623   
Link to this pagehttp://www.wikipatents.com/6233623.html
Inventor(s)Jeffords; Jason (Dover, NH), Dev; Roger (Durham, NH)
AbstractA method and apparatus for accessing resource objects contained in a distributed memory space in a communications network, including dividing the distributed memory space into a plurality of memory pools, each pool containing a collection of resource objects, providing a plurality of resource manager objects, each resource manager object having an associated set of memory pools and a registry of network unique identifiers for the resource objects in those pools, and accessing a given resource object via its network identifier. Another aspect of the invention is to provide a relativistic view of state of a plurality of objects, each object generating a state vector representing that object's view of its own state and the state of all other objects, each object sending its state vector to other objects, and each object maintaining a state matrix of the state vectors.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 6233623
Replicated resource management system for managing resources in a
     distributed application and maintaining a relativistic view of state - US Patent 6233623 Drawing
Replicated resource management system for managing resources in a distributed application and maintaining a relativistic view of state
Inventor     Jeffords; Jason (Dover, NH) , Dev; Roger (Durham, NH)
Owner/Assignee     Cabletron Systems, Inc. (Rochester, NH)
Patent assignment
All assignments
Publication Date     May 15, 2001
Application Number     08/585,054
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     January 11, 1996
US Classification     719/316 711/173
Int'l Classification    
Examiner     Oberley; Alvin E.
Assistant Examiner     Courtenay III; St. John
Attorney/Law Firm     Wolf, Greenfield & Sacks, P.C.
Address
Parent Case    
Priority Data    
USPTO Field of Search     395/670 395/671 395/672 395/673 395/674 395/675 395/676 395/677 395/678 711/1 711/2 711/3 711/4 711/5 711/1 711/2 711/3 711/4 711/5 711/1 711/2 711/3 711/4 711/5 709/1 709/100 709/101 709/102 709/103 709/104 709/105 709/106 709/107 709/108 709/100 709/101 709/102 709/103 709/104 709/105 709/106 709/107 709/108 709/100 709/101 709/102 709/103 709/104 709/105 709/106 709/107 709/108 709/316
Patent Tags     replicated resource management managing resources a distributed application maintaining relativistic view state
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5619682
Mayer et al.

Sep,1997

[0 after 0 votes]
5852719
Fishler et al.

Sep,1997

[0 after 0 votes]
5485455
Dobbins et al.

Jan,1996

[0 after 0 votes]
5261044
Dev et al.

Nov,1993

[0 after 0 votes]
5263165
Janis

Nov,1993

[0 after 0 votes]
5179554
Lomicka et al.

Jan,1993

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A method of accessing resource objects contained in a distributed memory space in a communications network comprising:

dividing the distributed memory space into one or more memory pools, each memory pool comprising a dedicated plurality of segments of the distributed memory space for storing a collection of resource objects, each resource object being a software object having a network unique identifier and containing methods and attributes;

providing a plurality of resource manager objects, each resource manager object having an associated set of memory pools and a registry of the network unique indentifiers for the resource objects in the associated set of memory pools; and

accessing a given resource object via its network unique identifier in the registry of the resource manager object.

2. The method of claim 1, further including providing a plurality of processes to perform a distributed processing function, each process having an associated resource manager object, and the plurality of processes sharing the methods and attributes of the resource objects in at least one associated memory pool in order to provide the distributed processing function.

3. The method of claim 1, further comprising constructing a distributed application, utilizing a plurality of processes located on a plurality of host systems in the network, by providing each process with an application interface to the resource manager objects.

4. The method of claim 1, further including transporting distributable representations of resource objects to the resource manager objects for an associated memory pool.

5. The method of claim 1, wherein each resource object includes one or more methods for changing an attribute's value and one or more methods for sending the attribute's value to the resource manager objects of the associated memory pool.

6. The method of claim 1, wherein the resource manager objects provide location-independent object-level messaging based on objectOIDs.

7. The method of any one of claims 2 and 3, wherein the processes perform one or more of the following steps:

(a) examine the resource objects;

(b) use the resource objects;

(c) instantiate the resource objects;

(d) delete resource objects from the associated memory pool;

(e) change the attributes of the resource objects;

(f) synchronize the change of attributes of the resource objects;

(g) receive attribute change notifications for the resource objects;

(h) use any one of object level messaging and object level remote procedure calls for any one of steps (a) through (g).

8. The method of any one of claims 1, 2 and 3 wherein the resource manager objects perform one or more of the following steps:

(i) object transport;

(j) object level message delivery;

(k) automatic object replication;

(l) resource manager object synchronization;

(m) failure detection.

9. The method of claim 2, wherein the plurality of processes communicate through object level messaging and object level remote procedure calls.

10. The method of claim 2, further comprising:

assigning ownership of each of the resource objects to any one of the processes, wherein work requested by calling a method of an owned resource object is performed by the process that owns the called resource object.

11. The method of claim 2, wherein each resource manager object maintains state information which is used for one or more of the following:

coordination of work among the processes;

determining validity of data obtained from the processes;

fault detection and isolation;

voting and agreement paradigms.

12. The method of claim 2, wherein the resource manager objects perform process level failure detection.

13. The method of claim 2, wherein the processes are organized in a peer-to-peer configuration.

14. The method of claim 2, wherein ownership of each resource object is assigned to one of the processes, and the work requested by calling a method of an owned resource object is performed by the process that owns the called resource object.

15. The method of claim 2, wherein each memory pool is assigned a pool identifier which is unique across process boundaries.

16. The method of claim 2, wherein each resource manager object is assigned a resource manager object identifier which is unique across process boundaries.

17. The method of claim 3, wherein the distributed application is a network switch control application, and memory pools are provided relating to the following services: network topology; policy; directory; and calls.

18. The method of claim 3, further comprising applying application specific logic for deriving application specific objects from the resource objects.

19. The method of claim 10, wherein:

if one of the processes fails, the ownership of resource objects of the failed process is automatically reassigned to another one of the processes.

20. The method of claim 10, wherein when a first joining process of a first resource manager object seeks to join a first memory pool, the first memory pool is synchronized by:

the first resource manager object sending a join pool request to the other resource manager objects;

each resource manager object already joined in the first memory pool responding to the request by sending all the resource objects owned by its associated process to the joining process; and

the first resource manager object sending all of the resource objects owned by the joining process to the already joined processes.

21. The method of claim 11, wherein each resource manager object maintains state information for a contact status determination and for its associated memory pools and the resource manager objects synchronize the state information.

22. The method of claim 11, wherein the state information forms a state matrix and synchronization requires the state matrix to be determinant.

23. The method of claim 11, wherein each resource manager object maintains a relativistic view of state based upon state information received from other resource manager objects.

24. The method of claim 14, wherein the network unique identifier is an object identifier (OID) that is unique across process boundaries, and the OID comprises:

PoolOID.ServiceAddress.InstanceID where:

PoolOID identifies the memory pool which contains the resource object;

ServiceAddress identifies the process that created the resource object; and

InstanceID is a unique ID identifying an instance of an object under the prefix given by PoolOID.

25. The method of claim 14, wherein upon failure of one of the processes, each resource object owned by the failed process calls an ownership arbitration method in order to assign ownership to another process.

26. The method of claim 14, wherein object ownership is reassigned in order to redistribute the processing load.

27. The method of claim 14, wherein object ownership is reassigned in order to allow a resource manager object to leave an associated memory pool.

28. In a communications network having a distributed memory space in a plurality of hosts, apparatus for managing the distributed memory space comprising:

the distributed memory space being divided into a plurality of memory pools, each memory pool comprising a dedicated plurality of segments of the distributed memory for storing a collection of resource objects, each resource object being a software object having a network unique identifier and containing methods and attributes; and

a plurality of resource manager objects located on different hosts in the network, each resource manager object having an associated set of memory pools and a replicated set of resource objects for the associated memory pools.

29. In a distributed computing method, wherein a number of cooperating processes require access to resource objects, the improvement comprising:

a) providing a distributed memory space containing resource objects;

b) providing a plurality of pool objects, each pool object identifying an associated set of resource objects for dividing the resource objects in the distributed memory space into pools, wherein each pool comprises a plurality of dedicated segments of the distributed memory;

c) providing each cooperating process with a resource manager object object, each resource manager object object identifying an associated set of pools in which the cooperating process requires access to the contained resource objects;

d) each resource manager object object replicating the resource objects in its associated set of pools and providing access by the cooperating process to the replicated resource objects; and

e) each resource manager object object synchronizing its state with the other resource manager object objects.

30. The method of claim 29, further comprising the step of:

storing the resource objects in a persistent storage medium.

31. The method of claim 29, wherein the synchronizing step includes:

determining whether the resource manager object object is in an active state.

32. The method of claim 31, wherein the synchronizing step includes:

determining the state of the pool objects.

33. The method of claim 29, further comprising:

assigning ownership of each resource object to any one of the processes, wherein the owning process is responsible for the work performed by the owned resource object.

34. The method of any one of claims 29 and 32, wherein the determining step includes:

generating a state vector comprising the object's view of itself and relative view of other objects.

35. The method of claim 32, wherein the synchronizing step includes:

determining the state of the resource objects.

36. The method of claim 33, further comprising the step of:

a cooperating process leaving the computing function by sending a leave pool message to the resource manager objects already joined to the pool.

37. The method of claim 33, further comprising the step of:

the resource manager objects determining a distribution of workload among the processes.

38. The method of claim 33, further comprising the step of:

a new cooperating process joining the computing function by sending a join pool message to the resource manager objects.

39. The method of claim 34, wherein the determining step comprises:

determining whether the state vector is determinant.

40. The method of claim 34, wherein the determining step comprises:

generating a state matrix of state vectors.

41. The method of claim 34, further comprising:

applying application specific logic to the state vectors by derivation to provide application specific state vectors.

42. The method of claim 40, wherein the determining step comprises:

determining whether the state matrix is determinant.

43. The method of claim 40, further comprising:

applying application specific logic to the state matrices by derivation for providing application specific state matrices.

44. The method of claim 43, further comprising:

determining the contact status of the cooperating processes from the application specific state matrices.

45. Apparatus for performing a distributed computing function in a system having a plurality of hosts, each host having a local processer and memory apportioned into one or more pools, where each pool comprises a dedicated plurality of segments of the memory, the apparatus comprising:

a first host including a first process and a first resource manager object which identifies one or more pools containing resource objects which the first process requires access to;

a second host including a second process and a second resource manager object which identifies one or more pools containing resource objects which the second process requires access to; and

each resource object being contained in the local memory of the host having the process which requires access to the resource object;

wherein each of the first and second processes can access the resource object in local memory contained in the same host as the process.

46. In a system comprising a plurality of hosts and a connection device for enabling communication between the hosts, each host having a local processor and local memory apportioned into one or more pools, each pool comprising a dedicated plurality of segments of the local memory, and the combined local memories comprising a distributed memory space, a software system for enabling the hosts to perform a distributed computing function comprising:

a plurality of cooperating processes contained on different hosts;

each cooperating process having a resource manager object identifying an associated set of pools in which the cooperating process requires access;

a plurality of pool objects for dividing the distributed memory space into pools, each pool object identifying an associated set of resource objects contained in the distributed memory space; and

each host which contains a cooperating process having the pool objects and the resource objects to which the cooperating process requires access.
 Description Submit all comments and votes
 


FIELD OF THE INVENTION

The present invention relates to a system for controlling and coordinating distributed applications and for maintaining a relativistic view of state which can be used, for example, to insure a consistent view of resources in a distributed computing function.

BACKGROUND OF THE INVENTION

Distributed processing has many different forms and embodies many different techniques depending on the nature of the data and the objectives of a given application. Typical objectives include: transaction performance, locality of data, minimization of network traffic, high availability, minimization of storage requirements, extensibility, cost to produce, etc. Many applications have certain objectives of overriding significance which dictate the distributed processing technique employed.

There is one type of application for which the overriding concern is high availability (resiliency). This type includes switching systems used in a communications network, as well as control systems in the areas of avionics, industrial control, and stock trading. In these systems, it is assumed that any component may fail and the system must continue to run in the event of such failure. These systems are designed to be "fault-tolerant" usually by sacrificing many other objectives (e.g. cost, performance, flexibility and storage) in order to achieve high availability.

Traditionally, fault-tolerant systems have been built as tightly coupled systems with specialized hardware and software components all directed toward achieving high availability. It would thus be desirable to provide a more generally applicable technique for achieving high availability without reliance on specialized components.

Another important factor for a distributed application is the need to know the relative state of a plurality of peer processes before certain actions occur. Such coordination of effort requires that each process know not only what it thinks about the state of the other processes, but also what the other processes think about its state. This is called a "relativistic" view of process state.

For example, assume a system has been developed that uses four cooperating processes to perform a task. Also assume that all four processes must be active before the coordinated task can begin; an active process is defined as a process that has successfully contacted its peers. On system start-up, each process must contact all other processes. After contacting its peers, each process must wait until all other processes have contacted their peers. Thus, only after all processes have contacted their peers, and all the peers know of this contact status, can the task begin.

This is actually a very common situation when attempting to coordinate a task among several processes. It would be desirable to provide a mechanism for gathering a relativistic view of state among various processes.

SUMMARY OF THE INVENTION

In one aspect, the present invention is a method for constructing high-availability distributed applications. This method, referred to herein as "Replicated Resource Management" (RRM), provides a core set of services which enable the replication of data (e.g., state) information in a plurality of processes and optionally the sharing of workload by those processes. This core set of service enables synchronization of each new RRM process with existing RRM processes, recovery in the event of a process level or other failure, and an application level interface (to the RRM) that enables easy and effective use of RRM services in a distributed application.

RRM utilizes a distributed memory space, which an RRM process may examine at will. An RRM process may join and leave a distributed computing function, also at will. The shared memory space persists across process boundaries so long as at least one RRM process remains (or a persistent store is employed).

An analogy may be drawn between RRM and a long-running meeting or plurality of meetings in different rooms. People may come and go from a given meeting, and the meeting may continue indefinitely. As people enter a meeting, they are given information that allows them to participate in the meeting. As they leave, the meeting continues, using information they may have supplied. As long as at least one person remains in the room, the meeting can be said to be continuing (in session). If the last person leaves the room, all information gathered and distributed in the meeting may be recorded, so that an interested party may retrieve this information at a later date.

In the above analogy, the meeting room is analogous to a shared memory space or resource pool. The people participating in the meeting are analogous to RRM processes, and the information exchanged is analogous to RRM resources. This is illustrated in FIG. 1, where each RRM process has an associated resource manager (RM) which identifies several resource pools of interest, i.e., several meetings which are of interest to the person. In this way, a resource manager (process) participates in just its areas (pools) of interest, maintains a registry of resources in these associated pools, and includes mechanisms for communicating with other resource managers (processes) in order to maintain a consistent view of the state of the resources in the associated pools.

In another aspect, the invention provides a mechanism which allows a distributed application to develop a "relativistic view" of the state of its processes. More generally, the mechanism may be used to provide a relativistic view of any process or object. In the following discussion, we refer to objects instead of processes. The invention provides three procedures for defining a relativistic view of state, which may be used separately or together.

First, a state vector is provided. This is a one-dimensional associative array of logical object name to logical object state. The name is an "index" into the vector, and the state is stored in a "slot" associated with the index. A state vector is generated by an object and describes what that object thinks is the state of all objects in the vector. It is the object's view of itself and "relative" view of all other objects. A state vector is illustrated in FIG. 4.

Second, a state matrix (see FIG. 5) is a two-dimensional associative array composed of state vectors. Rows and columns of the matrix are indexed by logical object name; the intersection of a row and column defines a "slot". A row describes what an object thinks is the state of all objects in the row (this is the same as the state vector described above), while a column describes what all objects think is the state of a single object. General rules are provided for determining whether each of the state vectors and/or state matrices is determinant. For example, a determinant contact status may be achieved when every active resource manager agrees on the present state of all other active resource managers.

Third, application specific logic (ASL--i.e., a set of rules specific to an application) may be applied to the general rules associated with state vectors and state matrices for providing application specific state vectors (ASSVs) and application specific state matrices (ASSMs), respectively. Because state vectors and state matrices are objects, ASL may be added to them by derivation. As illustrated in FIG. 6, a zero, cardinality relationship (o, m) indicates that a state matrix is composed of zero or many state vectors. This relationship is inherited by the application specific state vectors and application specific state matrices. Again, by way of example, the ASSVs and ASSMs may be used to determine the contact status of processes in a distributed application.

These and other features of the present invention will be more fully understood by the following detailed description and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic system level overview of the replicated resource management (RRM) system of this invention;

FIG. 2 is a schematic illustration of communication between resource managers RM1-RM4;

FIG. 3 is a schematic architectural view of hosts A and B, each having their own processes and resource managers, but sharing a distributed memory space containing a number of resource pools;

FIG. 4 is an example of a state vector;

FIG. 5 is an example of a state matrix comprising a plurality of state vectors;

FIG. 6 is a schematic illustration showing derivation of application specific state vectors (ASSV's) and application specific state matrices (ASSM's);

FIG. 7 is an example of a state matrix for a distributed application, showing the contact status of its component processes;

FIG. 8 is a flow chart of a method for updating a state matrix; and

FIG. 9 is a schematic illustration of a general purpose computer for implementing the invention.

DETAILED DESCRIPTION

RRM Overview

A glossary of terms is included at the end of the detailed description. A particular replicated resource management (RRM) system described herein is implemented in the C++ object-oriented programming language.

FIG. 1 is a system level overview of the major RRM components. A plurality of resource managers (10, 11, 12, 13, 14) are provided, each having an associated set of resource pools (16, 17, 18). A distributed memory space (24) exists across the network for the duration of an RRM processing session, and is segmented by the resource pools (16, 17, 18). Each resource manager (RM): participates in the distribution and synchronization of the resource objects (20) in its associated pools; includes a central registry of resources in its associated pools; and contains mechanisms for communicating with other resource managers. As described hereinafter, a resource manager specifies its pools of interest to all other active resource managers, after which time it will receive all of the resource objects in the pools of interest.

A resource pool (16, 17, 18) is a distributed object whose purpose is to collect and organize a coherent set of distributed resources. Resource pools are embodied through a set of one or more resource managers on one or more host systems (see FIG. 3).

A resource object (20) has attributes (data) that are significant to the overall system. Resource objects are contained in resource pools and like the resource pools, are distributed objects. Resource objects exist in all the same resource managers as their containing resource pool. Resources are sub-classed by users of a system in order to provide application level object information, i.e., users derive their objects from a resource class. In this way, the user's information is replicated across all applicable resource managers.

Applications run on top of the RRM system (i.e., they use the RRM's services). Several different applications may view the same resource pools at any given time. This allows information to be shared between applications performing different tasks on, or with, the resource objects maintained by the RRM system.

FIG. 2 illustrates four resource managers (30-33), exchanging information in order to maintain a coherent set of distributed resources. This exchange of information will be further described below in regard to the maintenance of a relativistic view of state between objects.

FIG. 3 illustrates a distribution of RRM components in a communications network. For example, host A (40) has a plurality of processes designated as Process A1 (42) and Process A2 (43). Each process contains a resource manager (44 and 45 respectively). The processes A1 and A2 each have their own pools of interest, designated by arrows directed to shared resource pools (50-51). The pools, designated as resource pools X and Y, each contain one or more resources (52-55), designated as X1 through XN, and Y1 through YN, respectively. As shown in FIG. 3, Process A1 is interested in Resource Pool X; Process A2 is interested in both of Resource Pools X and Y. A second host B (60) contains its own set of processes, designated as Process B1 (62) and Process B2 (63), with their own resource managers (64, 65). Process B1 is interested in Resource Pool X, while Process B2 is interested in both of Resource Pools X and Y.

The following is a more detailed description of RRM operation and services, with subheadings provided for ease of understanding.

RRM Services

RRM provides many services--some of which are transparent to applications, while others are used by applications.

Internal services are transparent to the applications, i.e., the applications see the effects of these services but do not need or want to access them directly. Some of the more important transparent services may include:

process and network status (state) determination

message and object transport

object-level message delivery

automatic object and state replication

RRM synchronization and constancy maintenance

basic fail/over capability

External (application level) services are used by applications directly through the RRM's application interface (API). The major services may include:

distributed object-level messaging and remote procedure calls (RPCs)

object-level contact established/lost notification

distributed object-level attribute changes and attribute change notification

hooks for application level object ownership arbitration and ownership change

Each application that wishes to use the RRM will:

include the RRM subsystem;

initialize the RRM subsystem with:

its peers (i.e., other processes this RRM should communicate with); and

the resource pools this application is interested in.

The processes that compose an RRM system may be started in a completely asynchronous manner, that is, they may be started in any order and at any time. Therefore, it is important that each resource manager handles or avoids all potentially destructive/dangerous start-up interactions.

When an RRM process starts, it attempts to contact all other RRM processes it has been configured to know about. If no other RRM processes exist in the network, the RRM process starts itself in stand-alone mode. This allows an application to run nearly as quickly as it would run without the RRM capability.

If an RRM process determines that it is running in a network with other RRM processes, it establishes a dialog (connection) with the other processes in the network. During this discovery process, each RRM process learns about the relative state of all other RRM processes in the network. This is done through a resource manager state vector exchange protocol, i.e., the use of state matrices filled by the vectors received at each resource manager (see subsequent discussion). Once a stable (deterministic) network state has been achieved (all active RRM processes are reporting the same state information about each other), synchronization of resource pools is initiated.

Similar to synchronization of peer processes, initialization also involves synchronization of the resource pools in which a given application has an interest. This is described in greater detail below under "Consistency and Synchronization."

Once the RRM subsystem is initialized with the above information, its services may be used by the application. At this point, the application may do the following:

examine/use the resource objects in the resource pools

instantiate new resource objects in the resource pools

delete resource objects from the resource pools

change the attributes of resource objects

receive attribute change notifications for resource objects

use object-level messaging and object-level remote procedure calls

receive contact lost/established notifications

RRM application level services are provided almost entirely through the resource object base class. Thus, an application that wants to use the RRM will need to derive its distributable/managed objects (resources) from the RRM's resource object.

When using the RRM, an application developer needs to decide what objects/information should be distributed to peer applications. This decision can involve many complex trade-offs between overall system performance, resiliency, and application level requirements.

RRM performance can be difficult to quantify as it is greatly influenced by application level design and constraints. However, performance within the RRM is affected by two major factors:

distribution and synchronization of state

parallelism of processing in RRM processes

RRM processes maintain a distributed, shared memory space. As such, each resource object's state information is accessible within each participating RRM process. This "sharing of state" comes with a cost, both in terms of maintaining state synchronization information and in actually passing state information between RRM processes.

This cost can be offset through the parallelism inherent in RRM processes. Since several RRM processes can use the shared state information simultaneously, and can communicate through object level messaging and remote procedure calls, the cost of state maintenance is offset through the distribution of processing load. Carefully designed applications can use the facilities of the RRM to enhance performance, while simultaneously gaining resiliency.

Resiliency, which can be one of the major benefits of RRM, is gained through the sharing of state (resource objects) and the concept of resource object ownership. Each RRM process "owns" certain resource objects. Object ownership denotes what process is responsible for performing all processing for the "owned" object. In short, the owner of an object does all work associated with that object. When a process fails, all objects owned by the failed process must have their ownership changed to processes that are still participating in the distributed computing function. This is done automatically by the RRM (see "Failure and Recovery" below).

Applications that wish to be resilient must decide what state information needs to be distributed so that they may recover in the event of a process failure. Consider the following example: a network switch control system consisting of one active switch controller and one hot-standby controller. In the event of failure of the first switch controller, the hot-standby takes control of the switches managed by the first controller. The goal of this system is to provide virtually uninterrupted network service in the event of switch controller failure.

Varying degrees of resiliency and network service can be supplied by this system. If a very high availability system is desired, the following state information may be distributed:

the network's physical topology (switches, edge devices, and links); and

all connections present in the network.

Given this information, the hot-standby controller could non-disruptively take control of the switched network in the event of controller failure. However, several other levels of service may be offered depending on the type and amount of state information being distributed. At least two other schemes could be employed:

propagate no resource objects, but use the RRM's process level failure detection capabilities; or

propagate topology information by creating resource objects that represent the network topology.

The first option would provide resiliency through process level failure detection, but would cause major disruptions in network availability. In this case, the switch controller that takes over the network will have no information about the current state of the network. Thus, before it could take control of the network it would have to discover or confirm the current physical topology of the netwo