WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Soft checkpointing system using log sequence numbers derived from stored data pages and log records for database recovery    

Custom CD of patents similar to US5043866 : Soft checkpointing system using log sequence numbers derived from stored data pages and log records for database recovery - $19.95
United States Patent5043866   
Link to this pagehttp://www.wikipatents.com/5043866.html
Inventor(s)Myre, Jr.; William W. (Austin, TX); Shih; Cheng-Fong (Austin, TX)
AbstractFunctions MINBUFLSN and LOWTRANLSN, implemented in a computerized routine, are defined and comprise first and second components of a checkpoint. MINBUFLSN is functionally related to a first update to a first of "dirty" data pages in the RAM buffer. LOWTRANLSN is functionally related to the earliest update of a sequence in a transaction table wherein each update corresponds to an uncommitted transaction. The two components are derived during write-ahead logging and stored in the log header periodically as a function of logging activity. Upon recovery, the checkpoint is retrieved and a functional comparison between the components thereof employed in the recovery algorithm. The conventional analysis pass of the recovery log is avoided and a reduced overhead during logging is provided as well as an efficient recovery.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Inventor     Myre, Jr.; William W. (Austin, TX); Shih; Cheng-Fong (Austin, TX)
Owner/Assignee     International Business Machines Corporation (Armonk, NY)
Patent assignment
All assignments
Company News
Publication Date     August 27, 1991
Application Number     07/179,194
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     April 8, 1988
US Classification     707/202 714/16
Int'l Classification     G06F 011/30 G06F 009/312 G06F 012/08
Examiner     Williams Jr.; Archie E.
Assistant Examiner     Coleman; Eric
Attorney/Law Firm     Carwell; Robert M.
Address
Parent Case    
Priority Data    
USPTO Field of Search     364/200 MS File 364/900 MS File 371/12
Patent Tags     soft checkpointing log sequence numbers derived stored data pages log records database recovery
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
4878167
Kapulka
714/16
Oct,1989

[0 after 0 votes]
4868744
Reinsch
714/19
Sep,1989

[0 after 0 votes]
4852092
Makita
714/17
Jul,1989

[0 after 0 votes]
4823261
Bank
711/152
Apr,1989

[0 after 0 votes]
4819159
Shipley
714/19
Apr,1989

[0 after 0 votes]
4819232
Krings
714/13
Apr,1989

[0 after 0 votes]
4819156
DeLorme
714/15
Apr,1989

[0 after 0 votes]
4814971
Thatte
714/15
Mar,1989

[0 after 0 votes]
4751702
Beier
714/13
Jun,1988

[0 after 0 votes]
4703481
Fremont
714/17
Oct,1987

[0 after 0 votes]
4697266
Finley
714/16
Sep,1987

[0 after 0 votes]
4674038
Brelsford
714/15
Jun,1987

[0 after 0 votes]
4665520
Strom
714/15
May,1987

[0 after 0 votes]
4631673
Haas
707/100
Dec,1986

[0 after 0 votes]
4507751
Gawlick
707/202
Mar,1985

[0 after 0 votes]
4498145
Baker
707/202
Feb,1985

[0 after 0 votes]
4459658
Gabbe
714/6
Jul,1984

[0 after 0 votes]
4429360
Hoffman
710/268
Jan,1984

[0 after 0 votes]
4159517
Paradine
710/57
Jun,1979

[0 after 0 votes]
4077059
Cordi
711/122
Feb,1978

[0 after 0 votes]
4020466
Cordi
707/201
Apr,1977

[0 after 0 votes]
3783256
Caputo
714/15
Jan,1974

[0 after 0 votes]
3736566
Anderson
714/15
May,1973

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B

[0 market size comments]
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%

[0 market share comments]
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%

[0 reasonable royalty comments]
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

[0 Guesstimation of Royalty Value Comments]
License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
[0 license availability comments]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
[0 owner/assignee comments]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

[0 competitive advantage comments]
Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

[0 commercial alternatives comments]
 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


We claim:

1. A method for use in a computerized database system having a processor and associated memory in a database comprising:

storing data corresponding to a plurality of updated data pages in said memory;

deriving from said stored data MINBUFLSNs functionally related to updates of corresponding ones of said updated pages;

storing log records each corresponding to an in-flight database transaction in said memory;

deriving from said stored log records LOWTRANLSNs functionally related to one of said in-flight transactions;

deriving checkpoints as a function of said MINBUFLSNs and said LOWTRANLSNs;

storing said checkpoints in said memory;

writing a plurality of log records to a recovery log in said memory;

generating a metric of said writing of said log records comprised of a number N corresponding to a number of said writes of said log records to said recovery log; writing a plurality of checkpoints as a function of said metric;

retrieving said MINBUFLSNs and said LOWTRANLSNs from one of said stored checkpoints stored in said memory;

comparing said retrieved MINBUFLSNs and said LOWTRANLSNs; and

starting full forward recovery of said database by said system at a location in said recovery log functionally related to said MINBUFLSNs if said LOWTRANLSNs is equal to or greater than said MINBUFLSNs.

2. A method for generating components of a checkpoint for use in recovery of a database, comprising:

1. generating indicators of a first sequence of occurrence of updates to said database, each said update corresponding to an in-flight transaction and one of said indicators;

2. selecting one of said indicators as a first of said components of said checkpoint;

3. generating LSNs which are indicators of a second sequence of occurrence of first updates to data pages in a buffer;

4. selecting one of said LSNs as a value derived from a function MINBUFLSN corresponding to a first update of a first of said data pages in said buffer to be updated as a second of said components of said checkpoint;

5. writing to a recovery log a representation of said first and second components;

6. generating a measure of log record creation; and

7. regulating time between said writing of adjacent ones of said next sets of first and second components as a function of said measurement.

3. The method of claim 2 wherein said measurement comprises N writes to said recovery log.
 Description Submit all comments and votes
 


TECHNICAL FIELD

This invention relates to computerized databases and, more particularly, to systems and methods for recovering data after system crashes.

BACKGROUND ART

It is well known that computerized database systems have gained wide acceptance in numerous applications. The data accumulated therein often represents vast amounts of expense and effort and is extremely valuable if not vital to the user, whereby data loss can be quite serious and costly.

Accordingly, in addition to the more conventional function of database systems in storing and manipulating data, they must also provide an additional important function of data recovery in the event of system crashes wherein normal processing ceases. One difficulty in providing recovery mechanisms was the requirement that the database be restored to a consistent state. A classic illustration of the problem of inconsistency occurs in the case of banking applications for example. A debit to a customer's account record on disk may be made followed by a crash of the bank's database system before a credit is made to a correlative account. The credit action might even have been completed in the sense of being entered in main or RAM buffer memory but simply not written to disk yet. The image of the database out on the permanent storage disk was thereby said to be in an inconsistent state.

To solve the problem, the notion of transaction boundaries developed in the art which bounded discrete sets of database actions whereby all actions such as the debit and credit actions in the example between boundaries if completed would always leave the database in a consistent state. In other words, by transaction control in a database with respect to a sequence of updates to the database either all of the updates of the sequence would be completed or none. In the event of a problem and need to recover the database after a crash before a transaction was completed operations on the database could be backed out to these transaction boundaries.

In addition to the consistency problem addressed by the concepts of transaction boundaries and commits, yet another difficulty associated with database recovery related to the problem of storage of prior data images in the event the database had to be restored to those images at recovery. In the simple previous example, this might mean retaining the original state of the two accounts in the event that the partial or in-flight transaction of only the debit action occurred. The transaction could then be backed out so that the database was left in the original consistent state prior to the incomplete transaction changing the database.

One technique which developed in the art for retaining prior image data for recovery purposes was known as shadowing or shadow paging. Early database systems employing this technique included System R, and the commercial database product SQL DS developed by the IBM Corporation, Armonk, N.Y. In this technique copies of historical data pages were retained. At transaction commits, a new copy of pages from the historical copy was made including the changes which became the new committed copy and the previous copy was deleted. Each time additional database changes occurred, the present committed copy was copied with the changes into a new committed copy of the data pages and the prior correlative data page deleted. A performance benefit of this technique occurred at commit times because changing from the old to the new data pages involved merely changing a pointer. There was no necessity to redo database actions as in the case of the later technique of write-ahead logging to be next described.

Nevertheless numerous well known disadvantages of the shadowing techniques became evident giving rise to development of algorithms supporting the write-ahead logging. These disadvantages included extra RAM and disk space and associated overhead for maintaining the shadow copies of data because in updating every page in the database second copies were required. Additional drawbacks included costly checkpoints, disturbing the physical clustering of data resulting in data fragmentation, inefficient maps, and extra I/O for page map blocks.

In summary it was operationally found that the penalties of shadow paging were too great and thus write-ahead logging was developed and thought to be a better solution to the recovery problem. An early example of this technique may be found in the aforementioned SQL/DS database system. In this technique instead of retaining an entire second copy of data, a linear record or journal is simply retained of what was done before and after a database action. Any action represented in RAM buffers even with respect to a completed transaction will not necessarily have been written out on disk. Accordingly, such work will be lost upon system crash and must be redone on recovery from the recovery log unlike in the case of the aforementioned shadow paging technique.

Under the write-ahead logging protocol in general, logged records corresponding to the changes must first be written to disk before any correlative data pages with the changes are written to the data file. One important aspect of write-ahead logging relates to checkpointing whereby points in the recovery log at which recovery should begin in the event of a system crash are periodically determined and written out. One factor upon which efficient checkpointing depends is the internal structure of the database and, more particularly the granularity of locking. In multiple user databases, a well known problem arises when more than one user is seeking to access the same data. Returning to the previous example, while a first user is reading the aforementioned debit account a second user may be changing the credit account. While the latter user's transaction is in-flight, the first user may then access the same credit account yielding inconsistent results.

A solution to the problem has been to restrict access to a portion of the database to one user, such restriction being referred to as a lock and the size of the portion of the database restricted being directly related to the "granularity" of the lock. In the previously mentioned SQL/DS database, for example, granularity of locking was at the physical data page level which encompassed many records. The significance of such large granularity in terms of database recovery was that it simplified the problems of recovery in checkpointing. The ability to commence recovery at an optimum checkpoint was a simpler problem than in systems of sub-page granularity wherein for example locking on a record level base is effected.

To illustrate why larger granularity simplifies recovery in checkpointing problems, in the case of record locking the physical page moving data in from disk and out from a RAM buffer may contain updates from more than one transaction. Moreover, of these transactions one may be committed, one may be aborting, and one may be stopped at the point of a system crash. At that point, in a database system such as DB2 with larger granularity locking, the page would at most be affected by one transaction. Consequently it would be relatively easy to deal with recovery related to that page. However, with respect to sub-page level granularity, the same page due to concurrent updates facilitated by such sub-page locking, might have multiple transactions running on it making recovery more difficult because all related database changes must then be made in the proper sequence.

Notwithstanding the simpler recovery algorithms associated with locking granularity at the data page level wherein for example it is relatively easy to keep track of whether a data page has been paged out and an update needed, developments and interest increased in provision for sub-page granularity locking and resultant recovery for multiple transactions within a data page. One reason for the desire for such concurrency was that page level locking related to a physical size of the page. However, databases operate in terms of logical objects or entities rather than arbitrary physical limitations, and thus locking at a table or record level on logical data objects was highly desired, i.e., sub-page level granularity such as on a record level. Data pages may exist in the database for purposes of I/O however the locks themselves should preferably functionally not be so limited. Thus, it was desired to provide for multiple transactions with locks on records within a single page for concurrency purposes in the context of the write-ahead logging technique for recovery.

Accordingly, techniques were developed for sub-page granularity locking and recovery techniques for multiple transactions within a data page. In one technique, write-ahead logging was provided with locking granularity finer than the size of the data page (or unit used to bring data in or write data out within the database). During normal database operations, the technique would write out or "log" detailed information about the state of the buffer pool. More particularly, status of the pages in the RAM buffer pool (i.e., which pages were "dirty", actions which dirtied them, with associated log sequence numbers or LSNs, etc.) would be periodically written out to a recovery log and put in a record containing this check point information. In addition to this data necessary for recovery, log records of the changes to the actual data in the database were also written out to the recovery log. The entire log was thus a sequence of records comprised both of checkpoint information as well as the actual data updates to the database.

At recovery time, an analysis pass would occur. During this phase, a forward pass would be made to the log, wherein all of the log records containing checkpoint information which were previously written to the log would be read out and analyzed. The optimal checkpoint at which to begin recovery would then be calculated from this information during the analysis pass.

More particularly, from the numerous checkpoint records two LSNs would be determined: an LSN related to a first update to the earliest of "dirty" pages in the buffer; and an LSN corresponding to a first update to the earliest transaction still in flight or "uncommitted". The optimal recovery point would correspond to the minimum of these LSNs. These LSNs calculated during the analysis pass will hereinafter be understood to correspond to MINBUFLSN and LOWTRANLSN of the present invention which are periodically determined and stored during normal forward processing in accordance therewith. As used herein "LSN" will refer to "log sequence number"; "LOWTRANLSN" will refer to "low transaction log sequence number"; and "MINBUFLSN" will refer to "minimum buffer log sequence number."

One problem with such prior techniques was that the entire log of the buffer pool had to be scanned (i.e., an analysis pass was required) to read all the log records just in order to calculate these two LSN values from which the optimal recovery point could be determined. This is to be contrasted with the present invention wherein the MINBUFLSN and LOWTRANLSN values are periodically determined and written to the log record as aforesaid, thereby obviating the need for an analysis pass.

Although in the prior art actual instantaneous values for MINBUFLSN and LOWTRANLSN might be more current than those last written to the log in accordance with the invention thereby yielding a more optimal recovery point, the latter or nevertheless immediately available upon recovery without the necessity for the analysis pass.

Accordingly, with the foregoing in mind, it is readily apparent that a novel technique was desired for soft checkpointing. Such a technique was desired which avoided hereinbefore noted disadvantages of shadowing while at the same time providing for sub-page concurrency and recovery from multiple page transactions within a data page. Such systems and methods were further desired for database recovery which were both quick and efficient particularly in terms of overhead required during run time to support the technique, and which avoided requiring an analysis phase on recovery.

SUMMARY OF THE INVENTION

Functions MINBUFLSN and LOWTRANLSN, implemented in a computerized routine, are defined and comprise first and second components of a checkpoint. MINBUFLSN is functionally related to a first update to a first of "dirty" data pages in the RAM buffer. LOWTRANLSN is functionally related to the earliest update of a sequence in a transaction table wherein each update corresponds to an uncommitted transaction. The two components are derived during write-ahead logging and stored in the log header periodically as a function of logging activity. Upon recovery, the checkpoint is retrieved and a functional comparison between the components thereof employed in the recovery algorithm. The conventional analysis pass of the recovery log is avoided and a reduced overhead during logging is provided as well as an efficient recovery.

BRIEF DESCRIPTION OF DRAWING

FIG. 1 is a block diagram illustrating the main functional components of the system and method of the present invention.

FIG. 2 is a conceptual illustration of a RAM buffer of a database depicting the relationship between MINBUFLSN and LSN's associated with data pages in the buffer.

FIG. 3 is a conceptual illustration of a transaction table associated with the database depicting the relationship of LOWTRANLSN to the LSN's associated with transactions represented in the table.

FIG. 4 is a flow diagram illustrating the method of deriving and maintaining LOWTRANLSN during normal database operation as shown in block 12 of FIG. 1.

FIG. 5 is a flow diagram illustrating the method of deriving and maintaining MINBUFLSN during normal database operation as shown in block 10 of FIG. 1.

FIG. 6 is a flow diagram illustrating the method for storing the checkpoint (MINBUFLSN, LOWTRANLSN) shown at block 14 of FIG. 1 produced by the methods depicted in FIGS. 4 and 5 corresponding to blocks 12 and 10 of FIG. 1, respectively.

FIG. 7 is a flow diagram illustrating in more detail use of the checkpoint produced at blocks 10 and 12 of FIG. 1 and stored at block 14 of FIG. 1 to commence database recovery.

FIG. 8 is an illustration of a computerized database system with the recovery feature of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

In order to describe the subject invention an explanation of certain terms and concepts hereinafter used is provided as well as a brief overview of the system and methods employed. This is followed by a more detailed description of the operation of the invention with reference to the accompanying figures.

In the art of database recovery employing write-ahead logging protocol, the following terminology is conventionally employed:

Data Pages

This term refers to fixed size blocks of storage that contain user data. Such data pages may be paged into some form of secondary storage such as a RAM buffer (wherein they will be lost in the event of a system crash). Alternatively, these pages may be paged out to a conventional form of primary storage such as a hard disk file (wherein they are preserved in the event of such a crash). A data page in a buffer will contain the most current updates thereto whereas an older copy thereof on such a disk may lack the most recent updates to the page.

Log Record

The term log record refers to a record containing both before and after images of a single data change. This log record is conventionally used to permit a REDO or UNDO of a data change using the information contained in the log record.

Recovery Log

This refers to a sequence of the just-described log records which thereby records all changes made to the database data pages in the order in which they were effected. By referencing the recovery log, the recovery process can restore the database to a consistent state.

Log Sequence Number

Each aforesaid log record in the recovery log is identified by a unique log sequence number or LSN. The LSN of a log record is the logical byte offset of the first byte of the log record from the logical beginning of the log.

Write-Ahead Logging

When a data page is updated or "dirtied", the update generates a log record with a corresponding identifying LSN. The data page is also updated with this LSN. Any "dirtied" data page contains the LSN of the log record recording the last update made to the page. In this manner, a data page is associated with a specific point in the log identified by its corresponding LSN such that all log records after this LSN do not reference that particular corresponding page. When a "dirty" data page is written out to disk, the write-ahead logging protocol specifies that the recovery log must have been previously written out to disk up to and including the log record identified by the LSN on the page to be written. This procedure insures that no data page is written out to disk before all log records recording changes to that page are already out on disk. Thus, in the event of recovery the database can thereby be restored to a consistent state.

MINBUFLSN

The database RAM buffer will at any given time contain 0 or more "dirty" pages. In the case where the buffer contains one or more "dirty" pages, one page will exist that is the oldest, i.e., a page which was updated prior to any other "dirty" page. The LSN of the first update to this page is herein defined as MINBUFLSN. It will be noted that MINBUFLSN is not necessarily identical to an LSN written to the page but rather the LSN of the first update to the page. (It is identical until the second update.)

Importance of the MINBUFLSN parameter is that it identifies the first point in the log where logged operations may need to be redone. This is because the data pages associated with any previous log records have already been written out to disk and are no longer in the buffer. When there are no "dirty" pages in the buffer, the next available or free LSN is employed as MINBUFLSN, since this LSN is the first location at which a write to a buffer page could occur.

LOWTRANLSN

This is the LSN of the first log record written by the oldest in-flight transaction. In other words, it is the smallest LSN of any log record written by a transaction still in flight. LOWTRANLSN identifies a point in the log prior to which log records of in-flight transactions do not appear.

With the foregoing in mind, an overview of the general concept of the present invention now follows. Periodically the aforementioned MINBUFLSN and LOWTRANLSN are determined and written to the RAM version of the log file header. The LSN of the last log record written to disk is also updated. The log file header is thence written to disk. These actions constitute the normal run time overhead of the checkpoint system and method of the present invention. It will be noted that maintaining MINBUFLSN and LOWTRANLSN requires relatively little overhead as does the occasional disk I/O in writing the log file header containing the checkpoint (MINBUFLSN, LOWTRANLSN).

When recovery of the database is necessitated, a start point STARTLSN in the log is determined as the minimum of (LOWTRANLSN, MINBUFLSN). If STARTLSN is smaller than the last LSN written to disk, then recovery is completed; otherwise recovery is not required. In addition, however, if LOWTRANLSN is smaller than MINBUFLSN, no data pages need be read in while recovering between LOWTRANLSN and MINBUFLSN inasmuch as the updates logged will have been applied. This phase of the invention will hereinafter be referred to as "mini analysis". After MINBUFLSN has been reached, the normal REDO process of recovery resumes.

With reference to FIG. 1, referring first to FIG. 1, in accordance with the present invention it will be recalled that a checkpoint comprised of MINBUFLSN. and LOWTRANLSN will periodically be produced as represented by blocks 10 and 12, and thence also periodically written out to the log file header of permanent storage such as the hard file or fixed disk, as further represented at block 14. Line 16 is intended to conceptually designate the functional distinction between processes above the line for deriving and storing the checkpoint from those below line 16 wherein such checkpoints are employed after a system crash to facilitate recovery of the database. These latter recovery techniques are accordingly generically indicated at block 18 wherein the previously derived and stored checkpoint will be retrieved and utilized in the recovery process to be hereinafter described in greater detail.

Referring now to FIG. 2, this illustration is intended to conceptually depict the concept of MINBUFLSN. Secondary storage in the form of a RAM buffer 20 of a data processing system will contain a plurality of data pages 22 and 24. Some of these pages 22 may be "clean" whereas others are "dirty" pages, 24, meaning that updates have been written to them with correlative log records indicating the before and after images of these changes. Associated with each "dirty" page is a unique LSN 26 of the log write which caused the respective pages' status to change from "clean" to "dirty". As hereinbefore described, MINBUFLSN is the LSN of the first update to the oldest page, i.e., the LSN of the page updated prior to any other "dirty" page updates. It will be recalled that when there are no "dirty" pages in the buffer 20, the next available or free LSN is used as the MINBUFLSN, since this LSN is the first location at which a write to a buffer page could occur.

Thus, in the example o