|
Claims  |
|
|
We claim:
1. A method for operating a computing apparatus having a plurality of
nodes, each node having storage means for storing a plurality of data
items and means responsive to a request for accessing a specified data
item, and communication means interconnecting selected nodes, the method
including the steps of controlling access to the specified data item and
controlling the modification of copies of the data item, characterized by
the steps of:
distributing data access control to each node of the apparatus; and,
responsive to a request having a specified currency, dynamically
replicating data items while selectively deferring conformation of the
replicated data;
such that the most current data items migrate to their respective affinity
nodes.
2. The method of claim 1, wherein said replicating step is further
characterized by:
selectively updating a data item at a request processing node in connection
with the processing of a request at said request processing node while
selectively deferring conformation of copies of the updated data item at
other nodes.
3. A method for operating a computing system including a plurality of
nodes, with each node having means for storing at least one data item,
comprising the steps of:
responsive to a request having a specified currency, dynamically
replicating data under distributed system control, while
selectively deferring conformation of the replicated data.
4. A method for operating a computing apparatus to dynamically replicate
data under distributed system control comprising the steps of:
storing a data item at each of two nodes;
storing at each of the nodes a dipole half for said date item, the dipole
half stored at a first node describing the status of the data item at the
second node assumed by said first node;
responsive to a request for access to said data item at said first node,
(1) determining from the dipole half stored at said first node if the
status at the second node assumed by the first node is in conflict with
the request, (2) if in conflict, resolving the conflict by communicating
with said second node, and (3) when the conflict is resolved, granting the
request for access.
5. A method for operating a computing apparatus including a plurality of
nodes, with each node having means for storing at least one data item and
means responsive to a request for accessing a specified data item, and
communication means interconnecting selected nodes, the computing
apparatus being operated in response to a request for accessing a
specified data item and controlling the modification of copies of the data
item at a plurality of nodes, the method characterized by the steps of:
generating a data request specifying a required currency for a specific
data item at the node of the request;
determining if the node of the request views any other node as having for
said specific data item a currency status which is in conflict with said
required currency;
resolving currency conflicts; and thereafter
granting the request for accessing the specified data item.
6. A method for operating a first node of a multinode computing apparatus,
said first node including means for storing at least one data item, means
responsive to a request for accessing a specified data item, and a port
for communicating data and control signals with respect to at least one
other node, said first node being operated in response to a request for
accessing a specified data item and controlling the modification of copies
of data items at said first node, the method characterized by the steps
of:
generating a data request specifying a required currency for a specific
data item at said first node;
determining if said first node views any other node as having for said
specific data item a currency status which is in conflict with said
required currency;
communicating a message containing a data request to other nodes and
receiving from each such other node a response resolving any currency
conflicts; and thereafter
granting the request for access to the specified data item.
7. The method of claim 6, further characterized by the steps of:
storing or enabling generation of a set of dipole halves at said first
node, with a dipole half in said set for each other node with which said
first node shares a data item; each such dipole half specifying the status
assumed by said first node for the shared data item at the other node.
8. The method of claim 7, further characterized by the step of:
specifying in each such dipole half the currency control, quality control,
and responsibility control assumed by said first node for the shared data
item at an other node; wherein said currency control describes the
currency of the contents of the shared data item at said other node
assumed by said first node and is established as one of exclusive, unique
clean, shared clean, prior, or not accessible; wherein said quality
control defines the relative quality of the shared data item at said first
node and at or through said other node; and wherein said responsibility
control specifies whether said other node has agreed to always be able to
obtain a copy of said shared data item at the currency given by said
currency control on behalf of said first node.
9. A method for operating a node of a computing system, said node including
means for storing at least one data item and control information for use
in controlling access to such data item, and a port for communicating data
and control signals with respect to at least one other node, the method
characterized by the steps of:
responsive to a request specifying a required currency for a specific data
item at said node, determining if said node views any other node as having
for said spcific data item a currency status which is in conflict with
said required currency; and
for each such currency conflict, generating a message containing a data
request to resolve the conflict for communication to the other node.
10. The method of claim 9, further characterized by the steps of:
responsive to a request for access to a data item, generating a request
quark specifying the currency control set, quality control and
responsibility control required by the said request; and
storing at said node in a status and control file for each data item stored
at said node a set of dipole halves, with one dipole half in said set for
each other node with which said node shares such data item, each such
dipole half specifying the status of the shared data item at the other
node assumed by said node.
11. A method for operating a first node of a computing system, said first
node including means for storing at least one data item, means responsive
to a request for accessing a specified data item, and a port for
communicating data and control signals with respect to at least one other
node, said first node being operated in response to the request for
accessing the specified data item and controlling the modification of
copies of data items at said first node, the method characterized by the
steps of:
storing or enabling generation of a set of dipole halves at said first
node, with a dipole half for each other node with which said node shares a
data item;
responsive to a request quark specifying for a given data item the dipole
half desired at said first node by a related node and the dipole half at
the related node for said first node, executing a quark processing unit
comprising the following steps:
determining if any dipole halves stored at said first node for said given
data item have a conflicting status;
generating a conflict quark for each dipole half having a conflicting
status for communication to the other node having the conflicting status;
waiting for a response to each communicated conflict quark;
modifying dipole halves stored at said first node for other nodes as
required to reflect processing of said request quark, and modifying the
dipole half at said first node for said related node as required to
reflect processing of said request quark; and
selectively generating a response quark or otherwise responding to the
originator of said request quark.
12. A method for operating a computing system including a plurality of
nodes interconnected by a communication network; each node storing in a
data file a plurality of data items which may be unique or copies of data
items stored at other nodes; each node, responsive to a request for access
to a specified data item by an application, controlling the applications's
access to said stored data item; and each node selectively communicating
the updated data item to other nodes storing copies of the specified dta
item; wherein improvements to the controlling, storing, and communicating
steps provide dynamic replication of data under distributed system control
and comprise the steps of:
storing a dipole with respect to a data item shared by a node pair, a
dipole half being stored in each node of the node pair specifying the
status, assumed by this node, of the shared data item at a related node,
with the data state of a given data item at this node being defined by the
set of dipole halves stored at this node with respect to all related nodes
sharing copies of that given data item;
responsive to a request for access to a data item from an application at
this node having a copy of the data item at a data state consistent with
the request, granting the request without network interaction with related
nodes;
responsive to a request for access to a data item from an application at
this node having a data state conflict with the request, negotiating a
dipole change with each related node with which this node receiving the
access request shares a conflicting dipole, and upon resolution of all
such conflicts, granting the request; and
responsive to an update request which has been granted, storing the updated
data item at this node and only thereafter in connection with an
application request communicating from some other node, or in the course
of data base conformation processing, communicating the updated item to
that other node. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
BACKGROUND OF INVENTION
1. Field of the Invention
This invention relates to new and useful improvements in methods of
operating general purpose digital computing systems. More specifically, it
relates to a method for utilizing data storage and communication resources
in a multiprocessing, distributed data base system by dynamically
replicating data under distributed system control.
2. Description of the Prior Art
A multiprocessing general purpose computing system typically includes a
plurality of nodes interconnected by a communication network. Each node,
in such a system, may include a data processor, a data storage device, and
communication ports. A data processor may be executing, in a
multiprogramming mode, under control of a plurality of operating system
components, in which event the data processor may be considered a
plurality of nodes. The data storage device stores a data file, the
operating system and its information management components, and user
application programs.
Data is information abstracted from some aspect of business important to an
enterprise. The challenge is to utilize data storage and communication
resources of the system so as to give end users access to the data with an
availability, performance, and cost commensurate with their business
needs. Access to the data must be controlled to ensure the consistency and
integrity of the data. Among additional characteristics of data accesses
in a distributed data processing environment are geographic and temporal
affinity. The basis for distributed data structures is geographic
affinity: accesses to a given data item tend to cluster geographically. A
basis for the method for dynamic replication of data is temporal affinity:
data items which have been accessed recently may be more likely to be
accessed in the near future than data items not recently accessed. The
node at which accesses for a given data item tend to cluster is called the
affinity node; the affinity node for a given data item may not be known
ahead of time, and it may vary with time.
Distributed data technology may be categorized according to the attributes
of data location, degree of data sharing, degree to which data base
management control is provided network-wide, and to the type of data
access. Data location may be centralized, partitioned, or replicated.
Degree of data sharing may be centralized, decentralized, or distributed.
Data base management control may be user provided (distributed data) or
system provided (distributed data base). Data access may be by transaction
shipping, function shipping, or data shipping.
Historically, a centralized approach has been used for managing data base
storage and accesses. In this approach, both data management and
application processing are centralized. A single data base manager is
used, and the teleprocessing network is used to connect users to the
central facility. In a variation on the centralized approach, some of the
processing is distributed among nodes in a network, but the data is kept
centralized.
The advantages of a centralized data base approach are that (1) data base
integrity can be ensured by the single data base manager; (2) all
application programs can be written to a single application programming
interface: application programs need not be aware of data location since
all data is stored in one location; (3) many tools are available to solve
the problems of administering data in the centralized environment; and (4)
a single system is easier to operate, maintain and control.
Some disadvantages to the centralized approach are: (1) communication costs
are high for some enterprises: application performance may be degraded due
to communication delays; (2) data availability may be poor due to
instability in the teleprocessing network or the central system, which may
have to be mitigated by backup systems and redundant communication; and
(3) the processing capabilities of a single system have already been
reached by some enterprises.
Two approaches for distributing data to the nodes of a distributed data
system are (1) partitioning and (2) static replication. In the partitioned
data approach there is no primary copy of the data base, whereas there may
be in static replication.
A partitioned data base approach divides the data base into distinct
partitions, and the partitions are spread among the nodes. A given data
item resides at only one node location. Each location has a data base
manager which manages the data at its location. A data distribution
manager takes a data request from an application program and maps it to a
local request, if the data is held locally, or to a remote request if the
data is held at another location.
Good data availability and access performance result in a partitioned
distributed data base if the data required is held locally. Furthermore,
data base integrity is facilitated since each data item is managed by a
single data base manager. These results may be achieved if a good
partitioning algorithm exists, is known ahead of time, and is stable.
In a partitioned data base, the system must provide a network-wide scope of
recovery for programs which change data at more than one location.
Among the disadvantages of a partitioned data base system are (1) reduced
availability and performance result if the partitioning algorithm does not
match the data access patterns; (2) the application program may have to be
aware of the data location, or at least the data partitioning algorithm,
and access the data base differently, depending upon data location; (3)
changing the data base partitioning algorithm is very difficult because
data location is reflected in the application programs, exits, or
declarations at each node; (4) existing data relocation and algorithm
changes at each node must be synchronized network-wide, and therefore the
partitioning algorithm may not be adjusted as needed to maintain optimum
performance and availability; and (5) programs which access data items
uniformly across a partitioned data base, or which must access the entire
data base, will suffer poor performance and availability.
Static replication techniques for distributing data include those with and
without a central node. In the former, the central location stores a
primary copy of the data base, and each location has a data base manager
and a copy of the data base. In typical uses of static replication, the
primary data base is copied and sent to each replica location, or node,
where the data then becomes available for local processing. Data
modifications made at each replica location are collected for later
processing against the primary data base. Between periods of application
processing, local modifications are sent to the central location and
applied against the primary data base. Because this technique for managing
replicated data bases does nothing to prevent multiple updates, the
occurrence of such must be detected during primary data base update and
resolved manually; otherwise, the application must be restricted so that,
somehow, multiple updates do not occur. After the primary data base has
been made to conform to replica changes, new copies are sent to the
replica locations, and the whole process starts over again.
The main advantage of static replication with a primary copy at a central
location is high availability and good response time since all data is
locally accessible. However, significant disadvantages exist, among them:
(1) because the system does not prevent multiple updates, data base
integrity is difficult to ensure, severly restricting the data base
processing which is feasible for static replicas; (2) the system does not
ensure current data for application accesses requiring such; (3) special
operational procedures are required for collecting and applying replica
modifications to the primary data base, which can be costly and prone to
error: typically, primary data base conformation occurs in the middle of
the night, and since this is when problems are most likely to be
encountered, key personnel must be available; and (4) the data base may
not be available during the conformation procedure: providing a large
enough window for conformation is not feasible in many applications, the
data transmission bandwidth may be unnecessarily large because updates and
replicas are transmitted only during the narrow window between periods of
operation, and if one or more of the nodes is incapacitated, then
conformation may not be possible in the scheduled window.
Many variations have been described in the literature with respect to the
basic techniques of static replication described above. The application
can be designed so that multiple updates do not occur, or the replicas can
be limited to read accesses only. The application program can collect
updates itself for later transmission to the primary location, or this
information can be gleaned from data base manager logs. Full replicas or
only partial replicas can be formed at the replica locations. The entire
replica data base or only changes to data held can be transmitted.
Replicas can be continually synchronized by sending modifications made by
a transaction to the various nodes and receiving acknowledgments as part
of transaction termination processing. Such techniques of synchronization
may solve the integrity problems of static replication, but lose much of
the performance and availability benefits.
U.S. Pat. No. 4,007,450 by Haibt describes a distributed data control
system where each node shares certain of its data sets in common with
other nodes, there being no primary copy at a central location, but
replicas are continually synchronized. Each node is operative to update
any shared data set unless one of the other nodes is also seeking to
update, in which event the node with the higher priority prevails. Each
node stores in its memory the node location of each shared data set and
the updating priority each node has with respect to each respective set of
shared data. When a data set is updated at a node, all nodes having a
replica are sent the update. As above, such a technique solves the
integrity problems of static replication, but loses much of the
performance and availability benefits.
A method for utilizing data storage and communication resources in a
distributed data base system is needed which avoids the disadvantages
while maintaining the advantages of the central, partitioned, and static
replication techniques: (1) high availability and performance for data
which is held at the node where it is accessed; (2) high availability and
performance for application programs that can accept data which is
possibly not the most current; (3) data location transparency, such that
application programmers and end users need not be aware of data location
or even of a data partitioning algorithm and the data base appears the
same as in the single system, or central, approach; (4) data automatically
migrates to the locations where it is accessed without the need for a
partitioning algorithm; (5) good performance for programs which access the
data base uniformly: a given node can be made to have a copy of every data
item in the data base, but it need not be the most current; (6) no special
update procedures or windows required, but rather are managed network-wide
by the system; (7) data base integrity, with multiple updates prevented
and application programs receiving data as current as they require; and
(8) the teleprocessing network can be unstable, with the control not
requiring that communication with all or any other node be available to
access data held locally.
SUMMARY OF THE INVENTION
The invention provides a method for operating a computing system including
a plurality of nodes, with each node having means for storing at least one
data item, characterized by the steps of accepting a request having a
specified currency, and responsive thereto dynamically replicating data
under distributed system control while selectively deferring conformation
of the replicated data.
This invention further provides a new and useful method for operating a
multiprocessing system including a communication network interconnecting a
plurality of data processing nodes accessing a distributed data base, the
method including the steps of storing unique and replicated data items at
a plurality of nodes, and responsive to a request by an application at
this node enabling access by the application to the copy of a data item
stored at this node, and communicating copies of updated data items to
other nodes; wherein the improvement controls utilization of data storage
and communication resources to enhance data access performance and ensure
network-wide data consistency and integrity where accesses to data may
have time varying and a priori unknown geographic and temporal affinity
and wherein the communication network may be unstable, by dynamically
replicating data under distributed system control, and comprises the steps
of:
storing a dipole with respect to a data item shared by a node pair, a
dipole half being stored in each node of the node pair specifying the
status of the shared data item at the related node of the node pair
assumed by this node of the node pair, with the data state of a given data
item at this node being defined by the set of dipole halves stored at this
node with respect to all related nodes sharing copies of said given data
item;
responsive to a request for access to a data item from an application at
this node, where this node stores a copy of the data item at a data state
consistent with the request, granting the request without network
interaction with other nodes; otherwise,
responsive to a request for access to a data item from an application at
this node, where this node has a data state inconsistent with the request,
negotiating a dipole change with each related node with which this node
receiving the access request shares a conflicting dipole, and upon
resolving all conflicting dipoles, granting the request; and
responsive to an update request which has been granted, storing the updated
data item at this node and only thereafter in connection with application
requests communicated from related nodes, or in the course of data base
conformation processing with respect to related nodes, communicating the
updated data item to related nodes.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagrammatic flowchart showing the data storage, processing,
and communication resources to be allocated among competing tasks in a
representative configuration of nodes in a distributed data base and
multiprocessing/multiprogramming system.
FIG. 2 is a diagrammatic illustration of a typical data base.
FIG. 3 is a diagrammatic representation of the format of a communication
message.
FIG. 4 is a diagrammatic illustration of a quark processing unit.
FIGS. 5 and 6 are flowchart illustrations of the method steps required to
process a request for access to a data item and a utility program request,
respectively.
FIG. 7 shows the manner of combining FIGS. 7A and 7B into one illustration.
FIGS. 7A and 7B provide a more detailed diagrammatic illustration of the
formats and field interrelationships associated with the quark processing
unit of FIG. 4.
DESCRIPTION OF THE PREFERRED EMBODIMENT
According to the preferred embodiment of the invention, the distributed
data control system components are illustrated in FIG. 1, which shows
three nodes 10, 12, 14 interconnected by communication links 16, 18 and by
links 20, 22 to other nodes in a distributed data base, multiprocessing,
multiprogramming system. No one node is a primary node. Every node
10,12,14, . . . , has the capability of storing some number of data items
in respective data files: 36, 38, 40. Copies of data items are dynamically
created at a node as required to support the processing which occurs at
the node. Physically, a node may be a general purpose computer, or central
electronic complex, such as an IBM System/360 or 370, described in U.S.
Pat. No. 3,400,371 by G. M. Amdahl, et al, and in IBM System/370
Principles of Operation, IBM Publication GA22-7000-6.
As is illustrated by the EMPLOYEE data base of FIG. 2, different
applications have different requirements for the currency of the data they
access. An application program 26, for example, which is updating the
payroll information 30 to compute a raise for a given employee 32 needs to
work with the most current value, herein designated CLEAN, of the PAYROLL
segment 30. A data item is CLEAN, in the view of application program 26,
if its value represents the most recently committed update and if there
are no uncommitted updates outstanding: that is, no updates to the values
in the employee's record, committed or uncommitted, will be allowed at
another node 10, 14. However, another application program 28 which is
preparing a report on job statistics by location is probably content with
JOBHIST segment 34 values which are possibly not the most current, herein
designated PRIOR. PRIOR data allows those transactions which do not
require the most recently committed value for a data item to enjoy
potentially improved performance and availability.
Thus transactions within applications 24, 26 and 28 can have different
currency requirements for the data they access. In generating a request
for a data item, application program 24, would specify the key of the
record sought and the appropriate currency. In this way, applications 24,
26, 28 can take advantage of the increased performance and availability
which can often be achieved by specifying less stringent currency
requirements. Typically, multiple transactions with differing currency
requirements can process the same file simultaneously, and these
transactions can be running at the same or different nodes 10, 12, 14. The
specification of currency which may be performed under control of the
system programmer and data base administrator facilities, by the
application programmer, or by defaults, is concerned with concurrent usage
of a data item where at least one user may be updating the data item. The
term update is used generically to include insertion, deletion, and
modification of data item contents. All updates are "atomic": A data item
will never be read or updated while in a partially updated state with
respect to a single update request. All updates are either committed or
backed out. A committed update is an update that can no longer be backed
out, and it will be made available to other transactions anywhere in the
network, as will be described hereafter.
Referring once again to FIG. 1, the major components of the distributed
data control system will be described. Each node 10, 12, 14 has the
capability of storing some number of data items. In the exemplary
embodiment described herein, a data item may be a DL/1 data base record,
such as is shown in FIG. 2. These data items may be held uniquely at the
node or may be replicas of data items held at one or more other nodes.
This replica data is stored in data files 36, 38, 40, which may be located
in main storage or on separate storage devices. Data which is required to
honor an application data base call from, say, application program 26 can
be requested from another node 10 and stored in data file 38. Existing
data in data file 38 may have to be removed to make room for the new data.
Application programs 24, 26, 28 run under the control of transaction
managers 42, 44, 46, respectively. Examples of transaction managers 42,
44, 46 include IMS/VS DC, described in IBM publication GH20-1260, and
CICS/VS, described in IBM publication GC33-0066.
Data distribution managers 48, 50, 52 ensure data integrity and
consistency, and implement the major portion of the method steps of this
invention, as will be described in greater detail hereafter. Data
distribution manager 48 receives a data call from application 24 via
transaction manager 42, ensures that the requested data is held at the
required currency in data file 36, and then passes the data call to data
base manager 54. Similarly, data base managers 56, 58 are provided at each
of the other nodes 12, 14.
Access conflicts between application programs 24, 26, 28 running at
different nodes 10, 12, 14 are managed using control information in a
format designated dipoles. Each of the two nodes sharing a data item has
one-half of the dipole. A node's dipole half describes that node's view of
the other node's status for the shared data item. It describes, for
example, whether a copy of the data item is available through the other
node, the currency (CLEAN or PRIOR) it is assuming for its copy, and
whether it is authorized to make updates to the data item. The format of a
dipole half (sometimes abbreviated in the figures and text as DIPOLEH)
will be described hereafter in connection with FIG. 7.
The set of all dipole halves held at a node defines the status of the data
at the node, and is referred to as the data state at the node. Data state
information is held in the status and control (SAC) files 60, 62, 64 for
nodes 10, 12, and 14, respectively. Data distribution manager (DDM) 48 can
determine from SAC file 60 whether or not applications 24 at its node 10
can safely reference or update a data item in data file 36 as well as
obtain information about the currency of the data item.
Any item an application program 24 inserts, deletes, or replaces a data
item in data file 36 a record of the update is placed in SAC file 60. It
is not necessary to inform other locations 12, 14 of the updates as they
occur at node 10, and thus, conformation of the copies held at the other
node is deferred.
Data base manager 54 receives data calls from transaction manager 42 (which
have been intercepted by data distribution manager 48 and processed as
hereinafter described), and accesses data file 36 according to the data
call and returns the results to the application through elements 42 and
48. Examples of data base managers useful herein are DL/1 DOS/VS,
described in IBM publication GH20-1246, and IMS/VS, described in IBM
publication GH20-1260. The description of this preferred embodiment will
assume a DL/1 environment.
Communication facilities 66, 68, 70 provide internodal communication, and
the following, for example, may be used as such communication facilities:
ACF/VTAM, described in IBM publication GC38-0282 and IMS/VS DC, supra. The
multiple region option (MRO) of CICS/VS may be used when a plurality of
nodes 10, 12, 14 physically reside on the same central electronic complex.
Processing of a transaction request for a data item at a node is governed
by the data state at the node, which as previously noted is defined by the
set of dipole halves stored in the node's SAC file, with a dipole half
existing for each combination of this node and related node. As each
dipole half describes this node's view of the corresponding data item at
the related node, the state at this node is an exocentric view of the
status of data in the network. Dipole halves are communicated between
nodes 10, 12, 14 over links 16, 18 in messages 72 having the format FIG.
3. Referring to FIG. 3, a message 72 from a source node (say, 12) to a
destination node (say, 14) includes the following fields: source
reflection ID 76, destination reflection ID 78, quark 74 and data 80. If
data 80 accompanies a message 72, its presence will be indicated by a
field in a request dipole half in quark 74.
Quark 74 describes a unit of work, and is used to communicate and respond
to requests for dipole changes and data. It includes the following fields:
requesting node name and data file (RN), this node name and data file
(TN), data item key (K), request dipole half at TN for RN, dipole half at
RN for TN, and response desired indicator (RDI). Processing at source node
12 may be waiting on a response to this message 72. If processing is
waiting, then some identification of that processing is sent in source
reflection ID. When destination node 14 prepares a response, it returns
this ID in the destination reflection ID field 78 of message 72. The
distributed data manager 52 at destination node 14 may be waiting on this
message 72. That processing is identified by destination reflection ID 78
of this message 72. This information is used at destination node 14 to
determine locking requirements and in notifying the waiting DDM 52 when
this message 72 has been received and successfully processed at
destination node 14.
Referring now to FIG. 4, description will be given of the fundamental quark
processing unit 82.
Request quark 84 is analyzed by procedures defined by one or more of the
three tables: conflict determination (CONFON) 86; request reflection
(RESPON) 88; and response generation (RESPRN) 90.
CONFON 86 determines whether or not there are conflicts for a given request
quark 84. The data item key in request quark 84 is used to select those
dipole halves which describe the same data item for nodes other than the
requesting node. Each dipole half for another node is compared to the
desired dipole half for the requesting node. If there is a conflict, then
conflict quark 92 is generated to be sent to the other node.
If any conflicts were found, then quark processing unit 82 waits 94 until
notified that all conflicts 92 have been resolved before further
processing request quark 84.
RESPON 88 determines whether or not any dipole halves for other nodes need
to be modified to reflect the processing of quark 84 from the requesting
node. For example, if request quark 84 is accompanied by a new data item
value, then the dipole halves for all other nodes with a replica of that
data item may need to be marked to indicate that those other nodes now
have worse data than this node.
RESPRN 90 changes the dipole half for the requesting node to reflect
processing of request quark 84 and generates response quark 96.
Referring now to FIG. 5, a description will be given of an overview of the
processing of request, conflict, and response quarks for a transaction
request (data call) occurring at transaction node 10. The process is
initiated by receipt at a transaction node 10 of a data call from an
application 24 at node 10 for access to a data item, the data call
including the data item key and the currency requirements.
DL/I calls GU, GHU, and ISRT are data calls in the exemplary embodiment.
Quark processing is not required for DL/I calls REPL, DLET, GNP, and GHNP
calls since DL/I requires that they be immediately preceded by one of the
above data calls. (The DL/I calls GN and GHN are not supported since they
do not necessarily include a key K.)
In addition, for each data item modified by the application program, the
equivalent of a data call called local advice (LADV) is generated by the
DDM at the time the modification is committed by the Data Base Manager.
Quark processing must be performed for each such LADV.
The data call is converted into a request quark 100, which is processed
through quark processing unit 102 executed by data distribution manager 48
to ensure that the requested data is held at the required currency in data
file 36 of this transaction node 10. Conflict quarks 104 are generated and
processed as required at other nodes (including 12) to resolve conflicting
data states, with a conflict quark 104 generated for each conflicting
dipole half identified by CONFON 106.
Each conflict quark 104, outgoing from node 10 in message 105, becomes a
request quark (such as 108) when received at another node (such as
destination node 12). There the second request quark 108 is processed
through an instance 110 of the quark processing unit. This may require the
generation of still further conflict quarks for communication to still
other nodes, as represented by links 112, where the process recurses. When
a response represented by links 114 is received for each of the still
further conflict quarks, the dipole half at other node 12 for node 10 is
modified by RESPRN 134 to remove the original conflict between node 10 and
node 12 (detected by CONFON 106), whereupon RESPRN 134 generates response
quark 116. Since the response desired indicator (RDI) is set in request
quark 108, derived from conflict quark 104, a response quark 116 is
created and included in message 118 back to the source node 10.
If destination node 12 for this message 105 determines that an additional
interchange is required to complete processing, then it sets the response
desired indicator in response quark 116. A (response) message 118 is
returned regardless of the indicator in quark 104. When received at the
transaction node 10, response quark 116 becomes an instance of request
quark 120, with one such instance required for each conflict quark 104
sent out from transaction node 10. Each instance of request quark 120 is
processed through an instance 122 of quark processing unit 82, successful
completion of which is required for instance 102 to process to completion.
Referring again to FIG. 5, quark processing units 102, 110, 122 include
CONFON 106, 124, 126, RESPON 128, 130, 132, and RESPRN 134 and 136,
respectively. RESPRN is not included in quark processing unit 102 since
the request quark 100 is generated on behalf of a local application; no
dipole half need be changed and no response quark need be generated. The
first instance 102, 110 of a quark processing unit at a node also includes
wait 138, 140. Such waits 138, 140 are not required in processing
instances of request quark 120 received in response to conflict quark 104
because it is not expected that CONFON 125 will detect any conflicts with
request quark 120; consequently, no conflict quarks will be generated by
CONFON 126, and there will be no wait required. (CONFON 126 could,
therefore, be eliminated but it is included in each instance of processing
unit 122 to facilitate the implementation of the recursive unit 82 (122)
described hereafter in connection with FIG. 7.)
The source reflection indentifier field 76 in message 105 is set by
transaction node 10 to identify the wait 138. When other node 12 responds,
message 118 will include a destination reflection identifier field 78 set
to the above source reflection identifier to identify message 118 to wait
138. Lines 135 represent the collection of all (response) messages 118,
which when received and processed permit completion of processing through
RESPON 128. (As all dipole half changes required at node 10 in resolving
conflict quarks 104 have been made by RESPRN 136, RESPON 128 can initiate
a response to the data call from application 24.)
CONFON 86 (106, 124, 126), RESPON 88 (128, 130, 132), and RESPRN 90 (134,
136) are implemented, in this embodiment, by tables processed according to
procedures to be described hereafter for performing the steps of
identifying and resolving conflicting dipoles. CONFON 86 detects dipole
conflicts between a request quark 84 received at a node (e.g., 10) and the
dipole halves in SAC file 60 at that node. RESPRN 90 processes and updates
as required the dipole half stored in SAC file (e.g., 60) of node 10 for a
requesting node RN, and RESPON 88 processes and updates as required the
dipole halves stored in the SAC file (e.g., 60) of node 10 for other nodes
ON than the requesting node RN.
The preferred embodiment assumes that the two halves of a dipole can be
changed by RESPRN 134, 136 within a single scope of recovery. CICS/VS
OS/VS Version 1 Release 6 is an example of a transaction manager which
provides such capability.
Referring now to FIG. 6, a description will be given | | |