|
Claims  |
|
|
We claim:
1. In a computer-implemented concurrent transaction and readonly query
processing system in which a plurality N of versions of a database are
maintained during a present version period (v), said database including a
plurality of data records that are updated by said transactions and read
by said queries, a plurality of versions of each of said plurality of data
records being distributed among said database versions, each said database
version (i) consisting essentially of said data records existing at the
beginning of the corresponding said version period (i), each said version
period (i) being the time interval between one said database version (i)
and the next subsequent said database version (i+1), said database
versions including a present database version (v), a present stable
database version (v-1) and an oldest database version (v-N+1), wherein N
is a positive integer greater than 2, a method for concurrency control
comprising the unordered steps of:
(a) maintaining a dynamic Uncommitted List (UL) of all said transactions
that are uncommitted;
(b) maintaining for said present database version (v) a present version (v)
of a Non-Stable and Uncommitted List (NSUL) of all said transactions that
are either uncommitted or were committed no earlier than during said
present version period(v);
directing all said read-only queries that arrive during said present
version period (v) to said data records in said present stable database
version (v-1); and
(d) refreshing said present database version (v) by performing the steps of
(d.1) setting a refresh time t after all said read-only queries that
arrived during said oldest version period (v-N+2) are terminated,
(d.2) creating a new version (v+1) of said database containing said data
records existing at said refresh time t,
(d.3) creating a new version (v+1) of said NSUL containing said
transactions listed at said refresh time t in said dynamic UL, and
(d.4) initiating a new version period (v+1) at said refresh time t by
assigning said present database version (v) to be the new present stable
database version (v), whereby database atomicity is maintained buy using
said present stable database version (v) for subsequent read-only query
processing without lock conflict or transaction rollback.
2. The method of claim 1 wherein said refreshing step (d) is performed
before the updating of a data record written by a previous committed
transaction listed in said NSUL.
3. The method of claim 1 wherein said setting step (d.1) further comprises
the step of:
(d.1.1) aborting all said read-only queries that had arrived during said
oldest version period (v-N+2).
4. The method of claim 2 wherein said processing system includes record
accession means for accessing said database records, said record accession
means having at least one record key structure corresponding to each said
record, said structure having a plurality of substructures each
corresponding to one of said N+1 record versions, further comprising the
unordered step of:
(e) updating said record key structure corresponding to said updated
record.
5. The method of claim 2 wherein:
said UL and NSULs comprise bit maps having at least one bit position
corresponding to each said transaction that is still, active or was
terminated after the start of said version period (v-N+2).
6. The method of claim 1 wherein said processing system includes record
accession means for accessing said database records, said record accession
means having at least ne record key structure corresponding to each said
record, said structure having a plurality of substructures each
corresponding to one of said N+1 record versions, further comprising the
step of:
(e) updating said record key structure corresponding to said updated
record.
7. The method of claim 1 wherein:
said UL and said NSULs comprise bit maps having at least one bit position
corresponding to each said transaction that is still active or was
terminated after the start of said oldest version period (v-N+2).
8. In a computer-implemented concurrent transaction and read-only query
processing system in which at least two versions of a database are
maintained during a present version period, said database including a
plurality of data records that are updated by said transactions and read
by said queries, at least two versions of each data record of said
plurality of data records being distributed among said database versions,
each said database version(i) consisting essentially of said data records
existing at the beginning of a corresponding said version period (i), each
said version period (i) being the time interval between one said database
version (i) and the next subsequent said database version (i+1), said
versions including a present version and a stable version, a method for
concurrency control comprising the underordered steps of:
(a) maintaining a dynamic Uncommitted List (UL) of all said transactions
that are uncommitted;
(b) maintaining a Non-Stable and Uncommitted List (NSUL) of all said
transactions that are either uncommitted or were committed during said
present version period;
(c) directing all said read-only queries that arrive during said present
version period to said data records in said stable database version; and
(d) refreshing said present database version by performing the steps of
(d.1) setting a refresh time t after all said read-only queries are
terminated,
(d.2) creating a new version of said database containing said data records
existing at said refresh time t.
(d.3) resetting said NSUL to equal said UL, and
(d.4) initiating a new version period at said refresh time t by assigning
said present database version to be the new stable database version,
whereby database atomicity is maintained by using said stable database
version for read-only query processing without lock conflict or
transaction rollback.
9. The method of claim 8 wherein said refreshing step (d) is performed
before updating a record written by a committed transaction listed in said
NSUL.
10. The method of claim 8, wherein said setting step (d.1) comprises the
step of:
(d.1.1) aborting all said read-only queries that had arrived during the
oldest said version period for which a corresponding database version is
maintained.
11. The method of claim 8 wherein:
said UL and said NSUL comprise bit maps having at least one bit position
corresponding to each said transaction that is still active or was
terminated after said start of the stable version period.
12. The method of claim 8 wherein said processing system includes record
accession means for accessing said database records, said record accession
means having at least one record key structure corresponding to each said
record, said structure having a plurality of substructures each
corresponding to one of said record versions, further comprising the
unordered step of:
(e) updating said record key structure corresponding to said updated
record.
13. In a distributed computer-implemented concurrent transaction and query
processing system having a plurality of nodes in each of which are
maintained portions of up to N versions of a distributed database, said
distributed database including a plurality of data records distributed
among said nodes, each data record of said plurality of data records being
updated by said transactions and read by said queries, a plurality of
versions of each said data record being distributed among said database
versions, each said database version (i) at each node consisting
essentially of the version (i) of said data records existing at said each
node at the beginning of a corresponding said version period (i) for each
said node, each said version period (i) for said each node being the time
interval between one said database version (i) and the next subsequent
said database version (i+1) for said each node, said database versions
including a present database version (v), a present stable database
version (v-1) and an oldest database version (v-N+2), said nodes having
both original and received transactions and queries, wherein N is a
positive integer greater than 2, a method for concurrency control
comprising the unordered steps of:
(a) maintaining within each said node a dynamic Uncommitted List (UL) of
all said transactions that are uncommitted within said each node;
(b) maintaining for said present version period (v) within each said node a
present version (v) of a Non-Stable and Uncommitted List (NSUL) of all
said transactions that are either uncommitted or were committed within
said each node no earlier than during said present version period (v) for
said each node;
(c) maintaining within each node a dynamic Precommit List (PL) of all
precommitted transactions involving said each node;
(d) directing within each node all said queries that originate during said
present version period (v) of the query originating node to said data
records of said present stable database version (v-1) within said each
node;
(e) refreshing said present database version (v) by performing within each
node the steps of
(e.1) setting a refresh time t after all said read-only queries that
arrived at said each node during said oldest version period (v-N+2 are
terminated,
(e.2) creating a new version (v+1) of said NSUL containing said
transactions listed at said refresh time t in said dynamic UL within said
each node, and
(e.3) initiating a new version period (v+1) within said each node at said
refresh time t by assigning said present database version (v) to be the
new present stable database version (v) and creating a new present
database version v+1);
(f) aborting, within said each node, every said transaction having an
originating node version period earlier than said present version period
(v) in said each; and
(g) aborting within said each node all said read-only queries directed to a
database version represented in said dynamic PL for said each node.
14. The method of claim 13 wherein said refreshing step (e) is performed
before the updating within said each node of a data record written by a
committed transaction listed in said NSUL for said each node.
15. The method of claim 14 wherein said setting step (e.1) comprises the
step of:
(e.1.1) aborting within said each node all said read-only queries that had
arrived during said oldest version period (v-N+2) within said each node.
16. The method of claim 13 wherein:
said UL, said PL and said NSULs are implemented as bit maps having a single
bit position corresponding to each said transaction that is still active
or was terminated after the start of said oldest version period (v-N+2).
17. The method of claim 13 wherein said processing system includes record
accession means in each said node for accessing said database records,
said record accession means having at least one record key structure
corresponding to each said record in said each node, said structure having
a plurality of substructures each corresponding to one of said N+1 record
versions, further comprising the unordered step of:
(g) updating said key structure corresponding to said each updated record
within said each node.
18. A computer-implemented concurrent transaction and query processing
system comprising:
multi-version database means for storing up to N versions of a database
during a present version period (v), said database including a plurality
of data records that are updated by said transactions and read by said
queries, a plurality of versions of each data record of said plurality of
data records being distributed among said database versions, each said
database version (i) consisting essentially of said data records existing
at the beginning of a corresponding said version period (i), each said
version period (i) being the time interval between one said database
version (i) and the next subsequent said database version (i+1), said
database versions including a present database version (v) and a present
stable database version (v-1);
Uncommitted List (UL) means coupled to said multi-version database means
for identifying all uncommitted transactions;
Non-Stable and Uncommitted List (NSUL) means coupled to said multi-version
database means for identifying all said transactions that are either
uncommitted or were committed no earlier than during said present version
period (v);
record accession means coupled to said multi-version database means for
locating each said data record version and for identifying the transaction
that created said each data record version;
version initiation means coupled to said multi-version database means for
starting a new version period by assigning said present database version
(v) to be the new present stable database version (v); and
garbage collection means for eliminating superfluous database records in
response to an update by a transaction to a data record written by a
previous committed transaction listed in said NSUL.
19. A computer-implemented concurrent transaction and query processing
system comprising:
multi-version database means for storing two versions of a database, said
database including a plurality of data records that are updated by said
transactions and ready by said queries, up to two versions of each data
record of said plurality of data records being distributed among said two
database versions, each said database version (i) consisting essentially
of said data records existing at the beginning of a corresponding said
version period (i), each said version period (i) being the time interval
between one said database version (i) and the next subsequent said
database version (i+1), said data record versions including a present data
record version and a present stable data record version;
Uncommitted List (UL) means coupled to said multi-version database means
for identifying all uncommitted transactions;
Non-Stable and Uncommitted List (NSUL) means coupled to said multi-version
database means for identifying all said transactions that are either
uncommitted or were committed during said present version period;
record accession means coupled to said multi-version database means for
locating each said record version and for identifying the transaction that
created said each record version;
version initiation means coupled to said multi-version database means for
starting a new version period by assigning said present data record
version to be the new present stable data record version; and
garbage collection means for eliminating superfluous database records in
response to an update by a transaction to a data record written by a
previous committed transaction listed in said NSUL. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
BACKGROUND OF THE INVENTION
Field of the Invention
This invention relates to methods for concurrent database access by
multiple transactions and queries in general, and more specifically, to
techniques for eliminating transaction waiting during read-only queries by
maintaining multiple versions of records.
Description of the Related Art
The simultaneous or concurrent execution of transactions that update a few
data more or less randomly and transactions that access large numbers of
data is a well-known problem in the database systems art. A transaction
may read from the database, write to the database, or both. Because
accessing transactions must lock records to ensure consistency, locking
conflicts can arise between concurrent transactions. Upon such a conflict,
one of the transactions must wait. The locking conflict is logical and the
database management system solution is limited to serializing the
transactions. However, other solutions to the wait-on-conflict problem are
possible if the extensive transaction is limited to a read-only
transaction or query.
When several transactions execute concurrently in the same database, the
consistency of data may no longer be preserved. Thus, the system must
control the interaction among the concurrent transactions. This control is
achieved through a concurrency control procedure that exploits and
guarantees transactional consistency. Because a transaction is a unit that
preserves consistency, any schedule produced by a concurrency control
scheme for processing a set of transactions concurrently must be
computationally equivalent to a schedule produced by executing the same
transactions serially in some order. This property is usually guaranteed
by maintaining serializability in a manner known in the art.
To ensure serializability, any of a variety of different concurrency
control methods may be used. In general, the concurrency control methods
known in the art ensure serializability by either delaying an operation or
aborting the transaction that issued the operation. The classes of methods
well-known in the art include "locking protocols", "timestamp ordering"
schemes, "validation" techniques, and "multi-version" schemes. For a
survey of the related art, reference is made, for example, to H. F. Korth,
et al, "Database System Concepts", Chapter 11, Concurrency Control,
McGraw-Hill, New York, 1986.
Selection of a concurrency control scheme should reflect consideration of
the desired balance of several conflicting criteria. The selected method
must guard against cascading rollbacks during recovery without reducing
system parallelism more than absolutely necessary. For instance, a typical
multi-version concurrency control scheme assumes that each "write"
operation creates a new version of the updated record. When a "read"
operation is issued to the updated record, the system selects one of the
record versions to be read. The concurrency control scheme must ensure
that the version selection is made in a manner that guarantees
serializability. This can be accomplished through the use of timestamps,
for instance. In such a system, a read operation always succeeds while a
write operation may result in transaction rollbacks.
The application of such multi-version techniques is useful for reducing the
locking required to ensure serializability of concurrent transactions. Any
locking reduction will beneficially reduce transaction wait states and
thereby increase system parallelism. Thus, the art is replete with
techniques for locking reduction, including other multi-version
approaches.
R. Bayer, et al disclose a complex multi-version support mechanism (Bayer,
R., Heller, H., and Reiser, A.,
"Parallelism and Recovery in Database Systems", Transactions on Database
Systems, Vol. 5, No. 2, Jun. 1980). Bayer, et al require a continuous and
costly maintenance of a graph that tracks the inter-transaction
dependencies to avoid both non-serializable transaction executions and
deadlocks. Bayer, et al require even read-only transactions or queries to
lock the data objects being read, although the read-locks are granted
without wait states. Query transactions also incur the cost of analyzing
the dependency graph to locate the object version that must be read.
Sometimes non-query updating transactions may be rolled back to preserve
data atomicity. Read-only queries are never rolled back.
Because only two versions of any data item are maintained by Bayer, et al,
the commit of an updating transaction may be delayed by a read-only query
that is actively reading the earlier record version targeted for update.
The Bayer, et al approach also incurs additional lock-related overhead for
updating transactions compared to the overhead required in
non-multi-version concurrency control schemes. They do not consider space
management, structures required to track locations of different object
versions, partial rollbacks, and incremental versioning, which are all
problematic issues known for multi-versioning concurrency control schemes.
D. Reed proposes the use of timestamps for synchronization (Reed, D.,
"Naming and Synchronization in a Decentralized Computer System", PhD
Thesis, Technical Report MIT/LTS/TR-205, MIT, Sept. 1978). Reed's method
requires all read-only queries to update the timestamp control information
associated with the data objects being read. His method permits creation
of an unlimited number of object versions, thereby raising potential space
management problems. Reed does not consider the garbage collection
problem. Reading of records may be delayed and updating transactions may
be aborted to avoid serializability violations.
Stearns, et al propose a similar method that may block or abort read-only
queries under some circumstances and may delay the committing of updating
transactions until termination of read-only queries accessing previous
object versions (Stearns, R. E., Rosenkrantz, D. J., "Distributed Database
Concurrency Controls Using Before-Values, Proc. SIGMOD International
Conference on Management of Data, Ann Arbor, Apr. 1981).
A. Chan, et al propose a method that permits creation of any number of
object versions, thereby raising potential space management problems
(Chan, A., Fox, S., Lin, W-T., Nori, A., and Ries, D., "The Implementation
of an Integrated Concurrency Control and Recovery Scheme", Proc. SIGMOD
International Conference on Management of Data, Orlando, Jun. 1982.).
Chan, et al provide versioning at the page level, thus requiring the
transfer of an entire page to a slot in their "version pool" even if only
a small part of the page is changed. Besides the path length overhead,
this approach unnecessarily consumes buffer and disk space. Moreover, if a
read-only query accesses a logical page having an uncommitted version,
then the query must search at least one additional page before locating
the committed page version needed, thereby increasing I/0 overhead. This
occurs because their different page versions are backchained and each
version is labelled with the identifier code of the creating transaction.
A read-only query is required to read the most recent version of the page
created by an updating transaction that is committed at the time the
read-only query has begun. Because of this, every new read-only query must
be associated with a Committed Transaction List ("CTL").
This Chan, et al page level versioning method uses a "version pool",
guaranteeing that clustered access to physically contiguous pages for
read-only queries, especially long ones, is not possible. Their garbage
collection method may waste space in "version pool" and every updating
transaction must track the slots used in the "version pool". Chan, et al
clearly support only page level locking in their versioning scheme. Page
level locking, especially for index data, normally leads to an intolerably
low level of concurrency. Chan, et al do not actually discuss how their
versioning is done for index data. Their method requires all modifications
made by an updating transaction to be forced to disk at commit time.
Because versioning is being done at the page level and because the before
images of modified records are not logged, the previous version of the
modified page must also be forced to disk before the modified version of
the page with uncommitted changes is put on disk. These are a costly
operations.
Chan and Gray later extend the Chan, et al scheme to the case of
distributed read-only queries (Chan, A., and Gray, R., "Implementing
Distributed Read-Only Transactions", IEEE Transactions on Software
Engineering, Vol. SE-11, No. 2, 1985). The Chan and Gray algorithm causes
the CTL of a given site to be transmitted in all precommit and commit
messages sent by that site. Thus, read votes cannot be used to avoid the
second phase of commit processing as is possible in other distributed
database management systems known in the art (e.g., R* DDBMS). Sites that
receive CTLs from other sites merge them with their own CTLs to create new
versions of their own CTLs. To avoid aborting a read-only query because of
premature garbage collection of the data needed by the query at the remote
site, the set of retrieval sites that the ready-only query will visit must
be known in advance and each of those sites must first be queried for its
CTL before the query transaction starts. The union of the received CTLs
must then be transmitted to all retrieval sites for use by the query to
determine the data object versions to be read. Once the retrieval sites
communicate their CTLs, they must be somehow prevented from
garbage-collecting the data snapshot defined by the CTLs. This Chan and
Gray algorithm incurs additional overhead to maintain CTLs in stable
storage. These CTLs could become quite large. Chan and Gray do not suggest
how premature garbage collection can be prevented or even detected.
Reference is also made to the disclosure by J. Robinson, et al of a
technique for concurrency control using on-demand versioning to eliminate
lock contention when an updating transaction must lock a record in a page
required by a read-only query (Robinson, J., Thomasian, A., and Yu, P.S.,
"Elimination of Lock Contention in Relational Databases Accessed by
Read-Only Queries and On-Line Update Transaction", I.B.M. Technical
Bulletin 06-88, p. 180-85, Jun. 1988). Robinson, et al disclose a method
wherein a page version containing the record accessed by a read-only query
is prepared for the query and the global copy of the page is made
available to the updating transaction. Versioning granularity can be
varied and the version pages may be stored in a shared buffer pool along
with other data. Different versions of the same page may co-exist in the
buffer for queries started at different times. Robinson, et al do not
consider the problems of efficiently controlling the various versions or
of the efficient garbage collection of version data following read-only
query termination.
S. Todd discloses a concurrency system that provides a deadlock-free
environment for use in a distributed database system that tracks multiple
versions of data pages (Todd S., "Concurrency System Suitable for
Distributed Databases", I.B.M. Technical Disclosure Bulletin, 06-78, p.
383-386, Jun. 1978). Todd describes a method for labeling various page
versions to indicate version status as replacement, lock or current. His
method replaces the use of read locks with version control and permits all
read-only queries to access the database without wait states.
In U.S. Pat. No. 4,627,019, Fred K. Ng discloses a versioning technique
that stores an array of access blocks, each block defining the database
location of a version of a relation, only one of which is defined as
current. However, his technique requires a new access block in the access
dictionary (defining a new database location) for every update
transaction. Database access by each of a plurality of database
transactions is permitted only through the relation block associated with
that particular transaction. Ng's technique appears to be intended for use
in systems with few updating transactions and many read-only queries.
In U.S. Pat. No. 4,853,843, Denise J. Ecklund discloses a system for
merging virtual partitions of a distributed database where various
database versions exist among the partitions. In U.S. Pat. No. 4,875,159,
Richard W. Cary, et al disclose a version management system for
synchronizing two versions in a multi-processor system. Both the Ecklund
and Cary, et al techniques are intended for coordinating versioning
control among independent processors in a distributed data processing
system.
There is a clearly-felt need in the art to improve the optimization of
tradeoffs between transaction concurrency and the delays arising from lock
conflicts in concurrent processing systems. The related unresolved
problems and deficiencies are clearly felt in the art and are solved by
the present invention in the manner described below.
SUMMARY OF THE INVENTION
The present invention is a method for maintaining at least one
"stable-state" version of the database for use by read-only transactions
or queries concurrently with updating transactions. A query need only
access a consistent state of the database and often does not require that
consistent state to reflect the current database state at the time the
query terminates. Thus, maintaining a "stable-state" version of the
database for use by queries permits extensive read-only queries to access
consistent data without the associated read locks forcing concurrent
updating transactions to wait for query completion and without forcing a
query to wait for commit of updating transactions. When no query is
active, the "stable-state" version can be updated to reflect a later
committed database state. Occasionally, when a read-only query must access
the most recent database version, it may be relabeled and restarted as an
updating transaction and thereby acquire the necessary read locks on all
data.
The concept of a flexible "version period" for a database processing system
is introduced. Because transactions move the state of the database along
the time axis, the time axis may be divided into a series of "version
periods", which are numbered sequentially. An updating transaction is
assigned the version number of the version period during which the
transaction commits. A read-only query is assigned the version number of
the version period during which it arrives. An updating transaction is
permitted to acquire read and write locks on the most current version of
each record accessed. A read-only query is directed to read records from a
present "stable-state" version of the database. The "stable-state" version
accessed by a read-only query remains unchanged during the life of the
query.
A new version of a record is created whenever any updating transaction
writes new data to a record that was created by a previous committed
transaction. An arbitrarily long chain of increasingly outdated
"stable-state" versions can be created and maintained. Each "stable-state"
version need be retained only as long as necessary to terminate all active
queries that are using that particular version.
The method of the present invention is implemented by using a novel
"version-block" record key structure to provide a direct access path to
the associated record data. Version control and tracking is implemented by
maintaining several transaction identification lists. First, in main
memory, the system maintains a List of Uncommitted update transactions,
(UL), and a List of committed but Not yet Stable state update
transactions, (NSL). Actually, the union of UL and NSL, which is NSUL, is
maintained as the list of interest instead of NSL. A separate NSUL is kept
for each aging stable-state database version Because the record key
structure provides a direct path to each record version and also
identifies the creating transaction for each record version, the record
version can be defined as the version of the creating transaction (that
is, the version period during which the creating transaction committed).
Accordingly, the record key structure of this invention enables the system
to quickly redetermine the currency of a record version by comparing the
associated creating transaction with the two transaction lists UL and
NSUL. This efficient redetermination method allows the system to
dynamically track all database versions as they flow back along the
timeline.
The present stable-state version can be refreshed by switching the present
version period to a new later value. This is done, in part, by resetting
the oldest NSUL to UL to flush out all non-stable committed transactions.
The switching can occur as soon as all read-only queries using the oldest
stable-state database version are terminated. This refreshing method
brings each recently committed version of the database into atomic
visibility for subsequent read-only queries.
The method of the present invention is applicable to distributed processing
systems and is dynamic in that it can include from two to an indefinitely
large number of database versions. The application of this method to a
non-distributed database processing system is a special case of the
distributed system application and there is no significant increase in
overhead incurred by using the more generalized distributed concurrency
control embodiment for the local non-distributed application. Also, there
is little additional overhead cost in using the N-version embodiment to
maintain merely two versions, which is the simplest case of interest.
These two versions are the "present" version and the "present stable"
version.
Thus, it is an object of this invention to provide a consistent
"stable-state" of the database for use by read-only transactions, herein
called queries, without requiring read locks, thereby avoiding
lock-conflicts with concurrently processed updating transactions having
both read and write locks on the database. It is another object of this
invention to support efficient distributed database access and to permit
record-level locking.
It is an advantage of the present invention that recovery can be performed
by using shadow pages and logs, or by using write-ahead logging. It is a
feature of the present invention that all read-only queries not requiring
the latest state of the data will not acquire any long duration locks,
thereby avoiding all waits caused by lock-conflicts with concurrent
updating transactions. It is another feature of this invention that
read-only queries do not update any control information (e.g. timestamps)
associated with the data records queried. There is no need to assign
timestamps to queries or transactions because the serialization order
among conflicting transactions is determined dynamically by the method of
the present invention.
It is both an object and an advantage of this invention that no forced
roll-back of transactions is necessary to accommodate system support of
multiple database versions and such roll-backs are used only in event of
deadlocks between such transactions. It is a feature of this invention
that the maximum number of versions retained by the system is dynamically
adjustable. Another feature of this invention is that in-place updating of
data on disk is possible even before transaction termination and there is
no need for deferred update. There is also no need to force to disk all
modified pages at commit time.
An advantage of this invention is that incremental versioning can be
performed to reduce storage requirement increases arising from database
version maintenance. Also, recovery log volume is reduced because the
"before" image of updated fields (or of the complete record during a
deletion) need not be logged, except when the same record is being updated
more than once by a single transaction.
It is yet another object of this invention to minimize the storage volume
required to maintain multiple versions. It is an advantage of this
invention that the concept of version periods reduces the volume of
storage required because, unlike with other multi-version schemes, a new
database version need not be created at each record update.
Another advantage of this invention is that no beginning transaction or
query is required to declare the set of data items that it will read or
write. It is yet another object and advantage of this invention that
read-only queries need never be forcibly rolled back by the system, except
in a distributed database processing system as an alternative to unwanted
delays in version switching. Because versions may be switched only upon
termination of the oldest active read-only queries, it is an advantage of
this invention that any read-only query may be forcibly terminated and
restarted as an updating transaction having read and write locking
capability, thereby permitting version switching on demand without
rollback or wait states.
It is another advantage of this invention that asynchronous garbage
collection can be performed in the background. Garbage collection is
efficiently controlled by means of bit-mapped transaction tables and a
"version block" record key structure.
The foregoing, together with other features and advantages of the present
invention, will become more apparent when referring to the following
specifications, claims and the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of the present invention, reference is
now made to the following detailed description of the embodiments
illustrated in the accompanying drawings, wherein:
FIG. 1 illustrates the concept of version periods on a time line;
FIG. 2 shows a schematic illustration of the version block record key
structure of this invention;
FIG. 3 illustrates a bit-mapped embodiment of the UL and NSUL transaction
tables;
FIG. 4 provides a PASCAL-type pseudocode embodiment of the version block
data structure;
FIGS. 5A-5C provide a PASCAL-type pseudocode embodiment of the record
update method of this invention;
FIG. 6 provides PASCAL-type pseudocode embodiment of the method of this
invention for locking a version period in shared mode;
FIG. 7 provides a PASCAL-type pseudocode embodiment of the version
switching method of this invention for a non-distributed N-version system;
FIGS. 8A-8B provide a PASCAL-type pseudocode embodiment of the two-phase
commit protocol for updating transactions in the general distributed
N-version embodiment of the method of this invention;
FIG. 9 provides a PASCAL-type pseudocode embodiment of the method of this
invention for starting a read-only query in a distributed N-version
system;
FIG. 10 provides a PASCAL-type pseudocode embodiment of the method of this
invention for execution of a read-only query in a distributed N-version
system; and
FIG. 11 provides a PASCAL-type pseudocode embodiment of the version
switching method of this invention for a distributed N-version system.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIG. 1 illustrates the database versions and transaction types concepts as
used herein. The ongoing series of transactions moves the state of the
database along the time axis 12. Time axis 12 is divided into version
periods numbered (v-2), (v-1), (v), and so forth, as exemplified by
present version period 14. For the purposes of this disclosure, a
read-and-write transaction is denoted by a T and denominated an updating
transaction. A read-only transaction is denoted by a Q and denominated a
query. A type T transaction is of version (v) if and only if T commits
during version period (v). A type Q query is of version (v) if and only if
it arrives during version period (v). A query of version (v) sees all of
the results of transactions of version less than (v). That is, as shown in
FIG. 1, all type Q queries of version (v) will see that database state
committed at the beginning of version period (v). Thus, a query 16 is
directed to the database as it existed at the beginning of present version
period 14 and query 16 is said to have present version (v). Another query
18 is said to have version (v-1) because it refers to the database as it
existed at the beginning of version period (v-1).
Periodically, the database system switches to a new version period,
allowing the queries to see more recent changes to the database A
transaction 20 is said to have version (v-1) because it committed during
the version period (v-1). Conversely, any transaction that arrives during
present version period 14 and remains uncommitted until after the end of
present version period 14 must have a version number greater than (v).
The method of this invention is sufficiently general for application to
distributed database systems and can maintain an indefinitely large (N)
number of versions of such a database. However, for purposes of clear and
simple exposition of this invention, the operation of a two version
embodiment is first presented. This two version embodiment allows only two
kinds of transactions to be active at the same time. These are types T and
Q transactions of the current version (v). In this embodiment, the system
can switch to a new version period only when no query is active. The more
general N version embodiment of the method of this invention removes this
restriction and allows the version period to be switched immediately while
the older version periods are gradually phased out. Thus, the N version
embodiment permits queries of the last (N-1) version periods to remain
active concurrently with type T transactions. The concepts necessary to an
understanding of the N version embodiment will be appreciated by referring
to the following discussion of the two version embodiment.
A first element of this invention requires the system to maintain a record
key structure for each record in the database (see FIG. 2). The database
is assumed to be a series of records, each of which is identified by such
a key, which can be a logical key or any other suitable type of record
identifier. It is also assumed that type Q queries, which are read-only
and which do not acquire locks, are identified as such when started and
that all type T transactions use read and write locks to synchronize among
themselves. With these assumptions, the two version embodiment was found
to require no more than three simultaneous record versions These three
versions are shown in FIG. 2 as a first version, PTR(1), representing the
uncommitted state of the record, a second version, PTR(2), representing
the last committed record value that is not in the stable state, and a
third version, | | |