WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Multiple version database concurrency control system    

Get related patents on CD
United States Patent5280612   
Link to this pagehttp://www.wikipatents.com/5280612.html
Inventor(s)Lorie; Raymond A. (San Jose, CA); Mohan; Chandrasekaran (San Jose, CA); Pirahesh; Mir H. (San Jose, CA)
AbstractAn improved concurrency control system for application to a distributed concurrent transaction and query processing system using multi-version database records to overcome delays arising from lock conflicts. Read-only queries are afforded a consistent "stable state" of the database during the life of the query. Updating transactions requiring locks can proceed without waiting for the termination of long queries. At least two database versions are necessary, although availability of more versions permits long read-only queries to phase-out over time without forcing new queries to use aged "stable-state" data and without roll-back. Read-only queries can be terminated and converted to locking transactions to permit an update of the "stable state" database version before the queries would normally terminate. A novel record key structure having a plurality of substructures corresponding to the several database versions is used to access database records. Rapid selection of proper record version and efficient version tracking and updating is effected using several bit-mapped transaction index tables.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History Custom Search
Inventor     Lorie; Raymond A. (San Jose, CA); Mohan; Chandrasekaran (San Jose, CA); Pirahesh; Mir H. (San Jose, CA)
Owner/Assignee     International Business Machines Corporation (Armonk, NY)
Patent assignment
All assignments
Company News
Publication Date     January 18, 1994
Application Number     07/801,769
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     November 26, 1991
US Classification     707/8 707/9 707/203 707/206 710/242
Int'l Classification     G06F 015/40 G06F 012/00
Examiner     Kulik; Paul V.
Assistant Examiner    
Attorney/Law Firm     Baker, Maxham, Jester & Meador
Address
Parent Case    
Priority Data    
USPTO Field of Search     395/600 395/650 395/425
Patent Tags     multiple version database concurrency control
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5043876
Terry
707/201
Aug,1991

[0 after 0 votes]
4875159
Cary
707/203
Oct,1989

[0 after 0 votes]
4853843
Ecklund
707/203
Aug,1989

[0 after 0 votes]
4627019
Ng
707/8
Dec,1986

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B

[0 market size comments]
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%

[0 market share comments]
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%

[0 reasonable royalty comments]
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

[0 Guesstimation of Royalty Value Comments]
License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
[0 license availability comments]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
[0 owner/assignee comments]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

[0 competitive advantage comments]
Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

[0 commercial alternatives comments]
 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


We claim:

1. In a computer-implemented concurrent transaction and readonly query processing system in which a plurality N of versions of a database are maintained during a present version period (v), said database including a plurality of data records that are updated by said transactions and read by said queries, a plurality of versions of each of said plurality of data records being distributed among said database versions, each said database version (i) consisting essentially of said data records existing at the beginning of the corresponding said version period (i), each said version period (i) being the time interval between one said database version (i) and the next subsequent said database version (i+1), said database versions including a present database version (v), a present stable database version (v-1) and an oldest database version (v-N+1), wherein N is a positive integer greater than 2, a method for concurrency control comprising the unordered steps of:

(a) maintaining a dynamic Uncommitted List (UL) of all said transactions that are uncommitted;

(b) maintaining for said present database version (v) a present version (v) of a Non-Stable and Uncommitted List (NSUL) of all said transactions that are either uncommitted or were committed no earlier than during said present version period(v);

directing all said read-only queries that arrive during said present version period (v) to said data records in said present stable database version (v-1); and

(d) refreshing said present database version (v) by performing the steps of

(d.1) setting a refresh time t after all said read-only queries that arrived during said oldest version period (v-N+2) are terminated,

(d.2) creating a new version (v+1) of said database containing said data records existing at said refresh time t,

(d.3) creating a new version (v+1) of said NSUL containing said transactions listed at said refresh time t in said dynamic UL, and

(d.4) initiating a new version period (v+1) at said refresh time t by assigning said present database version (v) to be the new present stable database version (v), whereby database atomicity is maintained buy using said present stable database version (v) for subsequent read-only query processing without lock conflict or transaction rollback.

2. The method of claim 1 wherein said refreshing step (d) is performed before the updating of a data record written by a previous committed transaction listed in said NSUL.

3. The method of claim 1 wherein said setting step (d.1) further comprises the step of:

(d.1.1) aborting all said read-only queries that had arrived during said oldest version period (v-N+2).

4. The method of claim 2 wherein said processing system includes record accession means for accessing said database records, said record accession means having at least one record key structure corresponding to each said record, said structure having a plurality of substructures each corresponding to one of said N+1 record versions, further comprising the unordered step of:

(e) updating said record key structure corresponding to said updated record.

5. The method of claim 2 wherein:

said UL and NSULs comprise bit maps having at least one bit position corresponding to each said transaction that is still, active or was terminated after the start of said version period (v-N+2).

6. The method of claim 1 wherein said processing system includes record accession means for accessing said database records, said record accession means having at least ne record key structure corresponding to each said record, said structure having a plurality of substructures each corresponding to one of said N+1 record versions, further comprising the step of:

(e) updating said record key structure corresponding to said updated record.

7. The method of claim 1 wherein:

said UL and said NSULs comprise bit maps having at least one bit position corresponding to each said transaction that is still active or was terminated after the start of said oldest version period (v-N+2).

8. In a computer-implemented concurrent transaction and read-only query processing system in which at least two versions of a database are maintained during a present version period, said database including a plurality of data records that are updated by said transactions and read by said queries, at least two versions of each data record of said plurality of data records being distributed among said database versions, each said database version(i) consisting essentially of said data records existing at the beginning of a corresponding said version period (i), each said version period (i) being the time interval between one said database version (i) and the next subsequent said database version (i+1), said versions including a present version and a stable version, a method for concurrency control comprising the underordered steps of:

(a) maintaining a dynamic Uncommitted List (UL) of all said transactions that are uncommitted;

(b) maintaining a Non-Stable and Uncommitted List (NSUL) of all said transactions that are either uncommitted or were committed during said present version period;

(c) directing all said read-only queries that arrive during said present version period to said data records in said stable database version; and

(d) refreshing said present database version by performing the steps of

(d.1) setting a refresh time t after all said read-only queries are terminated,

(d.2) creating a new version of said database containing said data records existing at said refresh time t.

(d.3) resetting said NSUL to equal said UL, and

(d.4) initiating a new version period at said refresh time t by assigning said present database version to be the new stable database version, whereby database atomicity is maintained by using said stable database version for read-only query processing without lock conflict or transaction rollback.

9. The method of claim 8 wherein said refreshing step (d) is performed before updating a record written by a committed transaction listed in said NSUL.

10. The method of claim 8, wherein said setting step (d.1) comprises the step of:

(d.1.1) aborting all said read-only queries that had arrived during the oldest said version period for which a corresponding database version is maintained.

11. The method of claim 8 wherein:

said UL and said NSUL comprise bit maps having at least one bit position corresponding to each said transaction that is still active or was terminated after said start of the stable version period.

12. The method of claim 8 wherein said processing system includes record accession means for accessing said database records, said record accession means having at least one record key structure corresponding to each said record, said structure having a plurality of substructures each corresponding to one of said record versions, further comprising the unordered step of:

(e) updating said record key structure corresponding to said updated record.

13. In a distributed computer-implemented concurrent transaction and query processing system having a plurality of nodes in each of which are maintained portions of up to N versions of a distributed database, said distributed database including a plurality of data records distributed among said nodes, each data record of said plurality of data records being updated by said transactions and read by said queries, a plurality of versions of each said data record being distributed among said database versions, each said database version (i) at each node consisting essentially of the version (i) of said data records existing at said each node at the beginning of a corresponding said version period (i) for each said node, each said version period (i) for said each node being the time interval between one said database version (i) and the next subsequent said database version (i+1) for said each node, said database versions including a present database version (v), a present stable database version (v-1) and an oldest database version (v-N+2), said nodes having both original and received transactions and queries, wherein N is a positive integer greater than 2, a method for concurrency control comprising the unordered steps of:

(a) maintaining within each said node a dynamic Uncommitted List (UL) of all said transactions that are uncommitted within said each node;

(b) maintaining for said present version period (v) within each said node a present version (v) of a Non-Stable and Uncommitted List (NSUL) of all said transactions that are either uncommitted or were committed within said each node no earlier than during said present version period (v) for said each node;

(c) maintaining within each node a dynamic Precommit List (PL) of all precommitted transactions involving said each node;

(d) directing within each node all said queries that originate during said present version period (v) of the query originating node to said data records of said present stable database version (v-1) within said each node;

(e) refreshing said present database version (v) by performing within each node the steps of

(e.1) setting a refresh time t after all said read-only queries that arrived at said each node during said oldest version period (v-N+2 are terminated,

(e.2) creating a new version (v+1) of said NSUL containing said transactions listed at said refresh time t in said dynamic UL within said each node, and

(e.3) initiating a new version period (v+1) within said each node at said refresh time t by assigning said present database version (v) to be the new present stable database version (v) and creating a new present database version v+1);

(f) aborting, within said each node, every said transaction having an originating node version period earlier than said present version period (v) in said each; and

(g) aborting within said each node all said read-only queries directed to a database version represented in said dynamic PL for said each node.

14. The method of claim 13 wherein said refreshing step (e) is performed before the updating within said each node of a data record written by a committed transaction listed in said NSUL for said each node.

15. The method of claim 14 wherein said setting step (e.1) comprises the step of:

(e.1.1) aborting within said each node all said read-only queries that had arrived during said oldest version period (v-N+2) within said each node.

16. The method of claim 13 wherein:

said UL, said PL and said NSULs are implemented as bit maps having a single bit position corresponding to each said transaction that is still active or was terminated after the start of said oldest version period (v-N+2).

17. The method of claim 13 wherein said processing system includes record accession means in each said node for accessing said database records, said record accession means having at least one record key structure corresponding to each said record in said each node, said structure having a plurality of substructures each corresponding to one of said N+1 record versions, further comprising the unordered step of:

(g) updating said key structure corresponding to said each updated record within said each node.

18. A computer-implemented concurrent transaction and query processing system comprising:

multi-version database means for storing up to N versions of a database during a present version period (v), said database including a plurality of data records that are updated by said transactions and read by said queries, a plurality of versions of each data record of said plurality of data records being distributed among said database versions, each said database version (i) consisting essentially of said data records existing at the beginning of a corresponding said version period (i), each said version period (i) being the time interval between one said database version (i) and the next subsequent said database version (i+1), said database versions including a present database version (v) and a present stable database version (v-1);

Uncommitted List (UL) means coupled to said multi-version database means for identifying all uncommitted transactions;

Non-Stable and Uncommitted List (NSUL) means coupled to said multi-version database means for identifying all said transactions that are either uncommitted or were committed no earlier than during said present version period (v);

record accession means coupled to said multi-version database means for locating each said data record version and for identifying the transaction that created said each data record version;

version initiation means coupled to said multi-version database means for starting a new version period by assigning said present database version (v) to be the new present stable database version (v); and

garbage collection means for eliminating superfluous database records in response to an update by a transaction to a data record written by a previous committed transaction listed in said NSUL.

19. A computer-implemented concurrent transaction and query processing system comprising:

multi-version database means for storing two versions of a database, said database including a plurality of data records that are updated by said transactions and ready by said queries, up to two versions of each data record of said plurality of data records being distributed among said two database versions, each said database version (i) consisting essentially of said data records existing at the beginning of a corresponding said version period (i), each said version period (i) being the time interval between one said database version (i) and the next subsequent said database version (i+1), said data record versions including a present data record version and a present stable data record version;

Uncommitted List (UL) means coupled to said multi-version database means for identifying all uncommitted transactions;

Non-Stable and Uncommitted List (NSUL) means coupled to said multi-version database means for identifying all said transactions that are either uncommitted or were committed during said present version period;

record accession means coupled to said multi-version database means for locating each said record version and for identifying the transaction that created said each record version;

version initiation means coupled to said multi-version database means for starting a new version period by assigning said present data record version to be the new present stable data record version; and

garbage collection means for eliminating superfluous database records in response to an update by a transaction to a data record written by a previous committed transaction listed in said NSUL.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates to methods for concurrent database access by multiple transactions and queries in general, and more specifically, to techniques for eliminating transaction waiting during read-only queries by maintaining multiple versions of records.

Description of the Related Art

The simultaneous or concurrent execution of transactions that update a few data more or less randomly and transactions that access large numbers of data is a well-known problem in the database systems art. A transaction may read from the database, write to the database, or both. Because accessing transactions must lock records to ensure consistency, locking conflicts can arise between concurrent transactions. Upon such a conflict, one of the transactions must wait. The locking conflict is logical and the database management system solution is limited to serializing the transactions. However, other solutions to the wait-on-conflict problem are possible if the extensive transaction is limited to a read-only transaction or query.

When several transactions execute concurrently in the same database, the consistency of data may no longer be preserved. Thus, the system must control the interaction among the concurrent transactions. This control is achieved through a concurrency control procedure that exploits and guarantees transactional consistency. Because a transaction is a unit that preserves consistency, any schedule produced by a concurrency control scheme for processing a set of transactions concurrently must be computationally equivalent to a schedule produced by executing the same transactions serially in some order. This property is usually guaranteed by maintaining serializability in a manner known in the art.

To ensure serializability, any of a variety of different concurrency control methods may be used. In general, the concurrency control methods known in the art ensure serializability by either delaying an operation or aborting the transaction that issued the operation. The classes of methods well-known in the art include "locking protocols", "timestamp ordering" schemes, "validation" techniques, and "multi-version" schemes. For a survey of the related art, reference is made, for example, to H. F. Korth, et al, "Database System Concepts", Chapter 11, Concurrency Control, McGraw-Hill, New York, 1986.

Selection of a concurrency control scheme should reflect consideration of the desired balance of several conflicting criteria. The selected method must guard against cascading rollbacks during recovery without reducing system parallelism more than absolutely necessary. For instance, a typical multi-version concurrency control scheme assumes that each "write" operation creates a new version of the updated record. When a "read" operation is issued to the updated record, the system selects one of the record versions to be read. The concurrency control scheme must ensure that the version selection is made in a manner that guarantees serializability. This can be accomplished through the use of timestamps, for instance. In such a system, a read operation always succeeds while a write operation may result in transaction rollbacks.

The application of such multi-version techniques is useful for reducing the locking required to ensure serializability of concurrent transactions. Any locking reduction will beneficially reduce transaction wait states and thereby increase system parallelism. Thus, the art is replete with techniques for locking reduction, including other multi-version approaches.

R. Bayer, et al disclose a complex multi-version support mechanism (Bayer, R., Heller, H., and Reiser, A.,

"Parallelism and Recovery in Database Systems", Transactions on Database Systems, Vol. 5, No. 2, Jun. 1980). Bayer, et al require a continuous and costly maintenance of a graph that tracks the inter-transaction dependencies to avoid both non-serializable transaction executions and deadlocks. Bayer, et al require even read-only transactions or queries to lock the data objects being read, although the read-locks are granted without wait states. Query transactions also incur the cost of analyzing the dependency graph to locate the object version that must be read. Sometimes non-query updating transactions may be rolled back to preserve data atomicity. Read-only queries are never rolled back.

Because only two versions of any data item are maintained by Bayer, et al, the commit of an updating transaction may be delayed by a read-only query that is actively reading the earlier record version targeted for update. The Bayer, et al approach also incurs additional lock-related overhead for updating transactions compared to the overhead required in non-multi-version concurrency control schemes. They do not consider space management, structures required to track locations of different object versions, partial rollbacks, and incremental versioning, which are all problematic issues known for multi-versioning concurrency control schemes.

D. Reed proposes the use of timestamps for synchronization (Reed, D., "Naming and Synchronization in a Decentralized Computer System", PhD Thesis, Technical Report MIT/LTS/TR-205, MIT, Sept. 1978). Reed's method requires all read-only queries to update the timestamp control information associated with the data objects being read. His method permits creation of an unlimited number of object versions, thereby raising potential space management problems. Reed does not consider the garbage collection problem. Reading of records may be delayed and updating transactions may be aborted to avoid serializability violations.

Stearns, et al propose a similar method that may block or abort read-only queries under some circumstances and may delay the committing of updating transactions until termination of read-only queries accessing previous object versions (Stearns, R. E., Rosenkrantz, D. J., "Distributed Database Concurrency Controls Using Before-Values, Proc. SIGMOD International Conference on Management of Data, Ann Arbor, Apr. 1981).

A. Chan, et al propose a method that permits creation of any number of object versions, thereby raising potential space management problems (Chan, A., Fox, S., Lin, W-T., Nori, A., and Ries, D., "The Implementation of an Integrated Concurrency Control and Recovery Scheme", Proc. SIGMOD International Conference on Management of Data, Orlando, Jun. 1982.). Chan, et al provide versioning at the page level, thus requiring the transfer of an entire page to a slot in their "version pool" even if only a small part of the page is changed. Besides the path length overhead, this approach unnecessarily consumes buffer and disk space. Moreover, if a read-only query accesses a logical page having an uncommitted version, then the query must search at least one additional page before locating the committed page version needed, thereby increasing I/0 overhead. This occurs because their different page versions are backchained and each version is labelled with the identifier code of the creating transaction. A read-only query is required to read the most recent version of the page created by an updating transaction that is committed at the time the read-only query has begun. Because of this, every new read-only query must be associated with a Committed Transaction List ("CTL").

This Chan, et al page level versioning method uses a "version pool", guaranteeing that clustered access to physically contiguous pages for read-only queries, especially long ones, is not possible. Their garbage collection method may waste space in "version pool" and every updating transaction must track the slots used in the "version pool". Chan, et al clearly support only page level locking in their versioning scheme. Page level locking, especially for index data, normally leads to an intolerably low level of concurrency. Chan, et al do not actually discuss how their versioning is done for index data. Their method requires all modifications made by an updating transaction to be forced to disk at commit time. Because versioning is being done at the page level and because the before images of modified records are not logged, the previous version of the modified page must also be forced to disk before the modified version of the page with uncommitted changes is put on disk. These are a costly operations.

Chan and Gray later extend the Chan, et al scheme to the case of distributed read-only queries (Chan, A., and Gray, R., "Implementing Distributed Read-Only Transactions", IEEE Transactions on Software Engineering, Vol. SE-11, No. 2, 1985). The Chan and Gray algorithm causes the CTL of a given site to be transmitted in all precommit and commit messages sent by that site. Thus, read votes cannot be used to avoid the second phase of commit processing as is possible in other distributed database management systems known in the art (e.g., R* DDBMS). Sites that receive CTLs from other sites merge them with their own CTLs to create new versions of their own CTLs. To avoid aborting a read-only query because of premature garbage collection of the data needed by the query at the remote site, the set of retrieval sites that the ready-only query will visit must be known in advance and each of those sites must first be queried for its CTL before the query transaction starts. The union of the received CTLs must then be transmitted to all retrieval sites for use by the query to determine the data object versions to be read. Once the retrieval sites communicate their CTLs, they must be somehow prevented from garbage-collecting the data snapshot defined by the CTLs. This Chan and Gray algorithm incurs additional overhead to maintain CTLs in stable storage. These CTLs could become quite large. Chan and Gray do not suggest how premature garbage collection can be prevented or even detected.

Reference is also made to the disclosure by J. Robinson, et al of a technique for concurrency control using on-demand versioning to eliminate lock contention when an updating transaction must lock a record in a page required by a read-only query (Robinson, J., Thomasian, A., and Yu, P.S., "Elimination of Lock Contention in Relational Databases Accessed by Read-Only Queries and On-Line Update Transaction", I.B.M. Technical Bulletin 06-88, p. 180-85, Jun. 1988). Robinson, et al disclose a method wherein a page version containing the record accessed by a read-only query is prepared for the query and the global copy of the page is made available to the updating transaction. Versioning granularity can be varied and the version pages may be stored in a shared buffer pool along with other data. Different versions of the same page may co-exist in the buffer for queries started at different times. Robinson, et al do not consider the problems of efficiently controlling the various versions or of the efficient garbage collection of version data following read-only query termination.

S. Todd discloses a concurrency system that provides a deadlock-free environment for use in a distributed database system that tracks multiple versions of data pages (Todd S., "Concurrency System Suitable for Distributed Databases", I.B.M. Technical Disclosure Bulletin, 06-78, p. 383-386, Jun. 1978). Todd describes a method for labeling various page versions to indicate version status as replacement, lock or current. His method replaces the use of read locks with version control and permits all read-only queries to access the database without wait states.

In U.S. Pat. No. 4,627,019, Fred K. Ng discloses a versioning technique that stores an array of access blocks, each block defining the database location of a version of a relation, only one of which is defined as current. However, his technique requires a new access block in the access dictionary (defining a new database location) for every update transaction. Database access by each of a plurality of database transactions is permitted only through the relation block associated with that particular transaction. Ng's technique appears to be intended for use in systems with few updating transactions and many read-only queries.

In U.S. Pat. No. 4,853,843, Denise J. Ecklund discloses a system for merging virtual partitions of a distributed database where various database versions exist among the partitions. In U.S. Pat. No. 4,875,159, Richard W. Cary, et al disclose a version management system for synchronizing two versions in a multi-processor system. Both the Ecklund and Cary, et al techniques are intended for coordinating versioning control among independent processors in a distributed data processing system.

There is a clearly-felt need in the art to improve the optimization of tradeoffs between transaction concurrency and the delays arising from lock conflicts in concurrent processing systems. The related unresolved problems and deficiencies are clearly felt in the art and are solved by the present invention in the manner described below.

SUMMARY OF THE INVENTION

The present invention is a method for maintaining at least one "stable-state" version of the database for use by read-only transactions or queries concurrently with updating transactions. A query need only access a consistent state of the database and often does not require that consistent state to reflect the current database state at the time the query terminates. Thus, maintaining a "stable-state" version of the database for use by queries permits extensive read-only queries to access consistent data without the associated read locks forcing concurrent updating transactions to wait for query completion and without forcing a query to wait for commit of updating transactions. When no query is active, the "stable-state" version can be updated to reflect a later committed database state. Occasionally, when a read-only query must access the most recent database version, it may be relabeled and restarted as an updating transaction and thereby acquire the necessary read locks on all data.

The concept of a flexible "version period" for a database processing system is introduced. Because transactions move the state of the database along the time axis, the time axis may be divided into a series of "version periods", which are numbered sequentially. An updating transaction is assigned the version number of the version period during which the transaction commits. A read-only query is assigned the version number of the version period during which it arrives. An updating transaction is permitted to acquire read and write locks on the most current version of each record accessed. A read-only query is directed to read records from a present "stable-state" version of the database. The "stable-state" version accessed by a read-only query remains unchanged during the life of the query.

A new version of a record is created whenever any updating transaction writes new data to a record that was created by a previous committed transaction. An arbitrarily long chain of increasingly outdated "stable-state" versions can be created and maintained. Each "stable-state" version need be retained only as long as necessary to terminate all active queries that are using that particular version.

The method of the present invention is implemented by using a novel "version-block" record key structure to provide a direct access path to the associated record data. Version control and tracking is implemented by maintaining several transaction identification lists. First, in main memory, the system maintains a List of Uncommitted update transactions, (UL), and a List of committed but Not yet Stable state update transactions, (NSL). Actually, the union of UL and NSL, which is NSUL, is maintained as the list of interest instead of NSL. A separate NSUL is kept for each aging stable-state database version Because the record key structure provides a direct path to each record version and also identifies the creating transaction for each record version, the record version can be defined as the version of the creating transaction (that is, the version period during which the creating transaction committed). Accordingly, the record key structure of this invention enables the system to quickly redetermine the currency of a record version by comparing the associated creating transaction with the two transaction lists UL and NSUL. This efficient redetermination method allows the system to dynamically track all database versions as they flow back along the timeline.

The present stable-state version can be refreshed by switching the present version period to a new later value. This is done, in part, by resetting the oldest NSUL to UL to flush out all non-stable committed transactions. The switching can occur as soon as all read-only queries using the oldest stable-state database version are terminated. This refreshing method brings each recently committed version of the database into atomic visibility for subsequent read-only queries.

The method of the present invention is applicable to distributed processing systems and is dynamic in that it can include from two to an indefinitely large number of database versions. The application of this method to a non-distributed database processing system is a special case of the distributed system application and there is no significant increase in overhead incurred by using the more generalized distributed concurrency control embodiment for the local non-distributed application. Also, there is little additional overhead cost in using the N-version embodiment to maintain merely two versions, which is the simplest case of interest. These two versions are the "present" version and the "present stable" version.

Thus, it is an object of this invention to provide a consistent "stable-state" of the database for use by read-only transactions, herein called queries, without requiring read locks, thereby avoiding lock-conflicts with concurrently processed updating transactions having both read and write locks on the database. It is another object of this invention to support efficient distributed database access and to permit record-level locking.

It is an advantage of the present invention that recovery can be performed by using shadow pages and logs, or by using write-ahead logging. It is a feature of the present invention that all read-only queries not requiring the latest state of the data will not acquire any long duration locks, thereby avoiding all waits caused by lock-conflicts with concurrent updating transactions. It is another feature of this invention that read-only queries do not update any control information (e.g. timestamps) associated with the data records queried. There is no need to assign timestamps to queries or transactions because the serialization order among conflicting transactions is determined dynamically by the method of the present invention.

It is both an object and an advantage of this invention that no forced roll-back of transactions is necessary to accommodate system support of multiple database versions and such roll-backs are used only in event of deadlocks between such transactions. It is a feature of this invention that the maximum number of versions retained by the system is dynamically adjustable. Another feature of this invention is that in-place updating of data on disk is possible even before transaction termination and there is no need for deferred update. There is also no need to force to disk all modified pages at commit time.

An advantage of this invention is that incremental versioning can be performed to reduce storage requirement increases arising from database version maintenance. Also, recovery log volume is reduced because the "before" image of updated fields (or of the complete record during a deletion) need not be logged, except when the same record is being updated more than once by a single transaction.

It is yet another object of this invention to minimize the storage volume required to maintain multiple versions. It is an advantage of this invention that the concept of version periods reduces the volume of storage required because, unlike with other multi-version schemes, a new database version need not be created at each record update.

Another advantage of this invention is that no beginning transaction or query is required to declare the set of data items that it will read or write. It is yet another object and advantage of this invention that read-only queries need never be forcibly rolled back by the system, except in a distributed database processing system as an alternative to unwanted delays in version switching. Because versions may be switched only upon termination of the oldest active read-only queries, it is an advantage of this invention that any read-only query may be forcibly terminated and restarted as an updating transaction having read and write locking capability, thereby permitting version switching on demand without rollback or wait states.

It is another advantage of this invention that asynchronous garbage collection can be performed in the background. Garbage collection is efficiently controlled by means of bit-mapped transaction tables and a "version block" record key structure.

The foregoing, together with other features and advantages of the present invention, will become more apparent when referring to the following specifications, claims and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention, reference is now made to the following detailed description of the embodiments illustrated in the accompanying drawings, wherein:

FIG. 1 illustrates the concept of version periods on a time line;

FIG. 2 shows a schematic illustration of the version block record key structure of this invention;

FIG. 3 illustrates a bit-mapped embodiment of the UL and NSUL transaction tables;

FIG. 4 provides a PASCAL-type pseudocode embodiment of the version block data structure;

FIGS. 5A-5C provide a PASCAL-type pseudocode embodiment of the record update method of this invention;

FIG. 6 provides PASCAL-type pseudocode embodiment of the method of this invention for locking a version period in shared mode;

FIG. 7 provides a PASCAL-type pseudocode embodiment of the version switching method of this invention for a non-distributed N-version system;

FIGS. 8A-8B provide a PASCAL-type pseudocode embodiment of the two-phase commit protocol for updating transactions in the general distributed N-version embodiment of the method of this invention;

FIG. 9 provides a PASCAL-type pseudocode embodiment of the method of this invention for starting a read-only query in a distributed N-version system;

FIG. 10 provides a PASCAL-type pseudocode embodiment of the method of this invention for execution of a read-only query in a distributed N-version system; and

FIG. 11 provides a PASCAL-type pseudocode embodiment of the version switching method of this invention for a distributed N-version system.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 illustrates the database versions and transaction types concepts as used herein. The ongoing series of transactions moves the state of the database along the time axis 12. Time axis 12 is divided into version periods numbered (v-2), (v-1), (v), and so forth, as exemplified by present version period 14. For the purposes of this disclosure, a read-and-write transaction is denoted by a T and denominated an updating transaction. A read-only transaction is denoted by a Q and denominated a query. A type T transaction is of version (v) if and only if T commits during version period (v). A type Q query is of version (v) if and only if it arrives during version period (v). A query of version (v) sees all of the results of transactions of version less than (v). That is, as shown in FIG. 1, all type Q queries of version (v) will see that database state committed at the beginning of version period (v). Thus, a query 16 is directed to the database as it existed at the beginning of present version period 14 and query 16 is said to have present version (v). Another query 18 is said to have version (v-1) because it refers to the database as it existed at the beginning of version period (v-1).

Periodically, the database system switches to a new version period, allowing the queries to see more recent changes to the database A transaction 20 is said to have version (v-1) because it committed during the version period (v-1). Conversely, any transaction that arrives during present version period 14 and remains uncommitted until after the end of present version period 14 must have a version number greater than (v).

The method of this invention is sufficiently general for application to distributed database systems and can maintain an indefinitely large (N) number of versions of such a database. However, for purposes of clear and simple exposition of this invention, the operation of a two version embodiment is first presented. This two version embodiment allows only two kinds of transactions to be active at the same time. These are types T and Q transactions of the current version (v). In this embodiment, the system can switch to a new version period only when no query is active. The more general N version embodiment of the method of this invention removes this restriction and allows the version period to be switched immediately while the older version periods are gradually phased out. Thus, the N version embodiment permits queries of the last (N-1) version periods to remain active concurrently with type T transactions. The concepts necessary to an understanding of the N version embodiment will be appreciated by referring to the following discussion of the two version embodiment.

A first element of this invention requires the system to maintain a record key structure for each record in the database (see FIG. 2). The database is assumed to be a series of records, each of which is identified by such a key, which can be a logical key or any other suitable type of record identifier. It is also assumed that type Q queries, which are read-only and which do not acquire locks, are identified as such when started and that all type T transactions use read and write locks to synchronize among themselves. With these assumptions, the two version embodiment was found to require no more than three simultaneous record versions These three versions are shown in FIG. 2 as a first version, PTR(1), representing the uncommitted state of the record, a second version, PTR(2), representing the last committed record value that is not in the stable state, and a third version,