Multiprocessor system for updating status information through flip-flopping read version and write version of checkpoint data    
United States Patent 4823261
Inventor(s): Bank; Judith H. (37 Balmoral Dr., Chestnut Ridge, NY 10977); Familetti; Harry G. (15 Hemlock Ct., Fishkill, NY 12524); Lickel; Charles W. (82 Mandalay Dr., Poughkeepsie, NY 12603)
Abstract: An apparatus and method employs dual checkpoint data sets for communicating system status. A journal of changed data is implemented to reduce I/O to a subsystem's shared data area on a non-volatile shared storage device. The journal provides for an increase in the amount of time that a processor may have access to the shared data area. Also, two versions of the data area are implemented in order to ensure the integrity of the continuously updated data area. The two versions flip-flop depending upon which one has the most recent updates. That is, the version that has the most recent updates becomes the to-be-read-from data area and is read by the processor that currently has access to the shared data area during this series of I/O operations. The other version becomes the to-be-written-to data area and is written to by the processor that currently has access to the data area in order to update the to-be-written-to version to the current level. The to-be-written-to version then becomes the to-be-read-from version during the next series of I/O operations.
   














Owner/Assignee: International Business Machines Corp. (Armonk, NY)
Publication Date: April 18, 1989
Application Number: 06/934,378
Filing Date: November 24, 1986
US Classification: 711/152 700/5 703/25
Int'l Classification: G06F 015/74 G06F 011/30
Examiner: Williams Jr.; Archie E.
Assistant Examiner: Lee; Thomas C.
Attorney/Law Firm: Biela; Joseph A. Porter; William B.
USPTO Field of Search: 364/200 MS File 364/900 MS File 364/134 371/10

References

U.S. References
4654819   Stiffler   711/162   Mar, 1987
4591976   Webber     714/20    May, 1986
4413327   Sabo       711/162   Nov, 1983
Claims
 


What is claimed is:

1. In a multiplexed data area computer complex including processors, main storage having program address spaces, at least one non-volatile storage device connected to said main storage, at least one subsystem in one of said program address spaces in said main storage and shared status data stored in fields in a predetermined area in said main storage for each of said processors, the apparatus for communicating subsystem status among said processors comprising:

a single, shared to-be-read-from version of said shared status data including a first storage area formatted on said storage device, said first storage area including records for containing changed data fields,

a single, shared to-be-written-to version of said shared status data including a second storage area formatted on said storage device, said second storage area including records for containing changed data fields, and

a program in one of said processors or address spaces for updating subsystem status on said to-be-written-to version by performing output operations on said to-be-written-to version and input operations on said to-be-read-from version when one of said processors solely controls and has access to said shared data, said operations comprising:

means for reading from said first storage area and overwriting portions of said shared status data in said predetermined area with data in said changed data fields read from said first storage area,

means for reading only changed ones of said records in said to-be-read-from version and overwriting said shared status data in said predetermined area with said changed ones of said records,

means for writing said one of said changed records in said shared status data in said predetermined area to said to-be-written-to version,

means for recording in-progress updates to said data fields in said predetermined area corresponding to one of said processors that solely controls and has access to said shared status data in said predetermined area,

means for writing in-progress updates that have been written to said shared status data in said predetermined area to said to-be-written-to version,

means for flip-flopping said to-be-read-from version to said to-be-written-to version and said to-be-written-to version to said to-be-read-from version following said input and output by said solely controlling processor, and

means for repeating said operations with a subsequent one of said processors controlling and having access to said shared status data.

2. In a multi-access spool computer complex including processors, main storage, at least one direct access storage device (DASD) connected to said main storage, a subsystem in an address space in said main storage and a checkpoint data set stored in a predetermined area in said main storage containing control blocks for each of said processors, the improvement comprising:

a single, shared to-be-read-from version of said checkpoint data set including a first journal formatted on a track of a DASD, said first journal including records for containing changed control blocks,

a single, shared to-be-written-to version of said checkpoint data set including a second journal formatted on a track of a DASD, said second journal including records for containing changed control blocks, and

a channel program in said address space for performing input/output operations on said versions when one of said processors solely controls and has access to said checkpoint data set, said operations comprising:

means for reading from said first journal and overwriting portions of said checkpoint data set in said predetermined area with data read from said first journal,

means for reading only changed ones of said records in said to-be-read-from version and overwriting records in said checkpoint data set in said predetermined area with said changed ones of said records,

means for writing said ones of said changed records in said checkpoint data set in said predetermined area to said to-be-written-to version,

means for recording the status of in-progress updates to control blocks in said checkpoint data set corresponding to said one of said processors that solely controls and has access to said checkpoint data set in said predetermined area,

means for writing in-progress updates that have been written to said checkpoint data set in said predetermined area to said second journal in said to-be-written-to version,

means for flip-flopping said to-be-read-from version to said to-be-written-to version and said to-be-written-to version to said to-be-read-from version following said input/output operations by said solely controlling processor, and

means for repeating said operations with a different one of said processors controlling and having access to said checkpoint data set.

3. In a multi-access spool computer complex including processors, main storage, at least one direct access storage device (DASD) connected to said main storage by a channel subsystem, a subsystem in an address space in said main storage and a checkpoint data set stored in a predetermined area in said main storage containing control blocks for each of said processors, the method for communicating subsystem status among said processors comprising the steps of:

creating a single, shared to-be-read-from version of said checkpoint data set including a first journal formatted on a track of a DASD, said first journal including records for containing changed control blocks, and a single, shared to-be-written-to version of said checkpoint data set including a second journal formatted on a track of a DASD, said second journal including records for containing changed control blocks, and

executing a channel program in said address space for performing input/output operations on said versions when one of said processors solely controls and has access to said checkpoint data set, said operations comprising the steps of:

reading from said first journal and overwriting portions of said checkpoint data set in said predetermined area with only changed data read from said first journal,

reading only changed ones of said records in said to-be-read-from version and overwriting records in said checkpoint data set in said predetermined area with said changed ones of said records,

writing said ones of said changed records in said checkpoint data set in said predetermined area to said to-be-written-to version,

recording status information of in-progress updates to control blocks in said checkpoint data set corresponding to said one of said processors that solely controls and has access to said checkpoint data set in said predetermined area,

writing in-progress updates that have been written to said checkpoint data set in said predetermined area to said second journal in said to-be-written-to version,

flip-flopping said to-be-read-from version to said to-be-written-to version and said to-be-written-to version to said to-be-read-from version following said input/output operations by said solely controlling processor, and

repeating said operations with a different one of said processors controlling and having access to said checkpoint data set.

4. The multi-access spool computer complex of claim 3 in which said status of in-progress updates is recorded in a change area on said address space.

5. The multi-access spool computer complex of claim 2 in which control bytes having bits each corresponding to a processor in said complex control said I/O operations on said versions.
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention generally relates to a multi-access spool computer complex in which the checkpoint mechanism for providing communication among processors in the complex has been improved. In particular, this invention relates to the reduction in input/output to a subsystem checkpoint data set and the increase in the amount of time that a processor in the complex can update the checkpoint data set.

2. Prior Art

The checkpoint (data set) function has existed since the HASP (Houston Automatic Spooling Program) in the late 1960s. In HASP, and in early versions of the job entry subsystem (JES) following HASP, the first several tracks of direct access storage devices were formatted into special checkpoint records. Periodically, the storage copy of the job and output queues was written to the direct access storage device (DASD), hence the name "checkpoint". Originally the sole purpose of the checkpoint was to maintain a copy of the work queues on the DASD so that the system could be restarted. When the processor was restarted after a failure or normal shutdown, HASP could read the checkpoint from the spool and continue processing as if it had never been interrupted. (The "spool" is a space allocated by the JES for containing data related to the jobs to be run and job output.)

Subsequent changes were made such that the checkpoint data set now resides on a shared DASD which is accessible to all the processors in a multi-access spool (MAS) complex. By reading and writing data from/to the checkpoint data set, a processor maintains an up-to-date copy of the checkpoint information in its storage. When a processor changes data in the queues, it must reflect the change to other processors via the checkpoint data set.

The MAS complex was introduced with the release of MVS and JES Release 3 in order to permit loose coupling of from two to seven processors (members). In the MAS complex, the checkpoint now serves as the communication pathway between processors (or multiple JES address spaces on one processor) each running asynchronously but cooperatively. Since hardware control over the updating of information was needed, the checkpoint information was moved to a separate MVS data set. In a JES MAS complex, all processors share access to the same set of job and output queues. A job can be read in by any processor, can execute on any processor, its output can be printed or punched by each of the processors, and operators can control jobs anywhere in the complex. Each processor in a MAS complex maintains an in-storage copy of these job and output queues. All processors in the complex are therefore equal in control and in processing responsibility.

As the main mechanism for communication between processors in a MAS complex, the checkpoint data set contains much of the information that JES needs in order to control its functions. Aside from the job queues and output queues from which work is selected, the checkpoint also contains a record of member values describing the overall configuration of the MAS environment and specific characteristics and information describing the current status of each member. All of this data is used to control the operation of the checkpoint and related aspects of JES. The checkpoint data set also contains data that the processors in the complex use to "share" the spool space. This includes extent information about the spool volumes which are currently mounted and the bit map from which spool space is allocated. In fact, structures in the job and output queues contain pointers through which all data on the spool volumes are actually located. The checkpoint data set is shared by the members using a time-slicing protocol. Each member is assured the privilege of exclusively updating the data set during the time that it has control of and access to the data set.

As installations grew, the size of their checkpoint data sets also grew and some MAS installations began to experience performance problems related to the amount of time required to perform checkpoint I/O (read/write) operations. JES had to read the entire data set because there was no way of determining which records had been changed by other processors in the complex. As a result, JES could become essentially serialized until the checkpoint I/O operation completed. This led to remote job entry timeouts and slow responses to operator commands. (It was not anticipated that the speed of processors in a MAS would be increased more rapidly than the speed of checkpoint I/O operations.) Also, much of the I/O for the checkpoint data set was unnecessary because many of the records in the job queue and job output table were unchanged. Thus the records being read overlaid identical data already in storage.

The problems were alleviated to some extent with the implementation of control bytes which reside on the checkpoint data set itself for identifying records that were changed. As a result, only the changed records had to be read. The physical format of the data set was changed to consist of fixed length 4K blocks. JES also began using the services of MVS to manage the real storage associated with checkpoint I/O buffers.

Nevertheless, problems with respect to unrecoverable checkpoint failures continued to occur and the amount of time required to perform I/O operations was still too great. Clearly JES was incapable of providing adequate error recovery for its checkpoint data set and did not provide efficient communication among processors within a complex. The slowness of checkpoint I/O operations and the loss of the checkpoint data set itself are still major problems in large MAS installations.

Special error detection mechanisms, including unusual channel programming and a philosophy of minimal dependence on hardware error reporting, were implemented. An auxiliary locking mechanism to supplement the shared DASD RESERVE/RELEASE locking facility was invented. A copy of the checkpoint data set was maintained as a backup in case of media damage or other failures. The mechanism for recovery from severe failures is the availability of a backup (duplex) copy of the checkpoint data set. The duplex data set must reside on a separate DASD device from the (primary) checkpoint data set.

JES CHECKPOINT CYCLE

There are four stages in a checkpoint cycle. The first stage begins with a read operation to update the in-storage queues and to ensure exclusive control of the checkpoint data set. The first part of the read operation issues a RESERVE for the checkpoint device and reads control data from the first DASD track. The second part of the read operation reads all of the records that were changed by other members. In the second stage, a member is said to "own" the checkpoint. During this "useful time" it can perform processing on and make updates to the checkpoint data. During this stage, a write operation causes the transfer, to the duplex data set, of all updated records that were not yet recorded on the duplex data set. This is a duplex write. Also during this stage, the member (processor) can cause intermediate update levels (intermediate WRITE) of the checkpoint data to be written to DASD without losing its ownership. At the completion of the "ownership" stage or HOLD interval, the final write of the data set is made and a RELEASE operation is performed which enables the next member to begin the checkpoint cycle. This is the third stage. The fourth stage is a dormant period during which a processor makes no attempt to access the data set and thereby allows other processors time to complete their active phases.
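The four stages described above can be pictured as a simple repeating sequence. The following is a minimal illustrative sketch only; the names Stage, next_stage and one_cycle are hypothetical and do not come from JES.

```python
from enum import Enum


class Stage(Enum):
    """The four stages of one member's checkpoint cycle (names are illustrative)."""
    READ = "RESERVE the device, read control data and changed records"
    OWNERSHIP = "useful time: updates, duplex writes, intermediate writes"
    FINAL_WRITE = "final write of the data set, then RELEASE"
    DORMANT = "no access attempted, other members take their turn"


# Enum members iterate in definition order, which matches the cycle order.
_ORDER = list(Stage)


def next_stage(stage):
    """Return the stage that follows `stage`, wrapping from DORMANT back to READ."""
    return _ORDER[(_ORDER.index(stage) + 1) % len(_ORDER)]


def one_cycle(start=Stage.READ):
    """Return the stages of one full cycle beginning at `start`."""
    stages = [start]
    while len(stages) < len(_ORDER):
        stages.append(next_stage(stages[-1]))
    return stages
```

Calling one_cycle() yields the stages in the order given in the text, and next_stage(Stage.DORMANT) returns to the read stage for the member's next turn.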

Since the JES checkpoint data set can be accessed by several processors, the RESERVE/RELEASE feature of shared DASD is used to control access to it. When a particular JES system wishes to access the checkpoint, a RESERVE is issued. This allows the member to update the data set until it issues a RELEASE. If another processor attempts to access the device on which the checkpoint data set resides while the first processor is still holding the RESERVE, the second processor will be returned a busy condition to its I/O operation. When the processor holding the RESERVE finally issues the RELEASE for the DASD, an interrupt (device end) is signalled to all processors which experienced the busy condition. At this point, the DASD is unlocked, and the other members can again try to RESERVE it.
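The RESERVE/RELEASE behavior described above can be modeled in a few lines. This is a sketch under simplifying assumptions: the class SharedDevice and its methods are hypothetical names, the "busy" condition is modeled as a False return value, and the device-end interrupt is modeled as the list returned by release.

```python
class SharedDevice:
    """Toy model of a shared DASD that supports RESERVE and RELEASE."""

    def __init__(self):
        self.owner = None        # member currently holding the RESERVE
        self.busy_members = []   # members that were returned a busy condition

    def reserve(self, member):
        """Try to RESERVE the device; return True on success, False for busy."""
        if self.owner is None or self.owner == member:
            self.owner = member
            return True
        if member not in self.busy_members:
            self.busy_members.append(member)
        return False

    def release(self, member):
        """RELEASE the device; return the members to be signalled (device end)."""
        if self.owner != member:
            raise RuntimeError("only the reserving member can RELEASE")
        self.owner = None
        signalled, self.busy_members = self.busy_members, []
        return signalled
```

For example, if member "A" holds the RESERVE, a reserve attempt by "B" returns False; when "A" releases, "B" is among the signalled members and can try again.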

There are two basic modes of operating a MAS complex. The most prevalent is in a controlled environment where each processor "gets its turn" owning the checkpoint data set. The other, contention mode, is not recommended. In contention mode, it is possible to allow a processor to compete for the checkpoint at all times, essentially by eliminating the "dormant" period. That is, all processors attempt the RESERVE simultaneously.

CHECKPOINT DATA SET FORMAT

The first track on DASD contains three control records: an 8-byte CHECK record, a LOCK record composed of an 8-byte key field and an 8-byte data field, and the MASTER record. The job queues and output queues are segmented into 4K records which reside on the remainder of the data set. There is a possibility that some portion of the last 4K record in the job queue and in the job output table may not be used, since the sizes of checkpoint structures are rounded to fit into 4K boundaries. The checkpoint data set is a non-standard data set, containing both keyed and non-keyed records.
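The rounding to 4K boundaries mentioned above is simple arithmetic. The sketch below assumes 4K means 4096 bytes; the function names are hypothetical.

```python
RECORD_SIZE = 4096  # assumed size in bytes of one 4K checkpoint record


def records_needed(structure_bytes):
    """Number of 4K records needed to hold a structure (ceiling division)."""
    return -(-structure_bytes // RECORD_SIZE)


def unused_bytes(structure_bytes):
    """Bytes left unused in the last 4K record after rounding up."""
    return records_needed(structure_bytes) * RECORD_SIZE - structure_bytes
```

A 10,000-byte structure, for instance, occupies three records (12,288 bytes), leaving 2,288 bytes of the last record unused.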

The CHECK record is an 8-byte record at the beginning of the first track of the checkpoint data set. It contains a check value used to help determine whether the remainder of the data in the checkpoint is valid. That is, it will be used in conjunction with a companion value in another record to indicate whether the previous update operation completed successfully.

The LOCK record is the only keyed record in the checkpoint data set. It consists of an 8-byte key portion and an 8-byte data portion which are identical. It is used as a software lock in addition to the normal hardware RESERVE/RELEASE mechanism. This record is used to control access to the data set.

The JES MASTER record is on the first track of the checkpoint data set. It contains data such as initialization parameters which affect the configuration of the MAS complex, the shared queue control element (QSE) data areas that represent the status of each processor in the complex and the checkpoint control bytes (CTLBs). A copy of the "check" value is in the first part of this record. This copy is compared to the value in the CHECK record during "read" operations to determine whether the records in the checkpoint data set are all at the same update level or state.

CHECKPOINT DATA SET LOCK

The checkpoint data set lock is used as a backup for the RESERVE/RELEASE feature of a shared DASD. RESERVE/RELEASE, by itself, is not an adequate mechanism to guarantee that simultaneous updates will not occur, because it has a tendency to open the lock, unintentionally, when failures occur. The checkpoint data set lock provided by JES, on the other hand, tends to lock closed under these conditions. It will always ensure that the data in the checkpoint data set is good by prohibiting simultaneous updates under any circumstances. Because of this characteristic of locking closed when a failure occurs, this lock requires a manual operation to reset it.

When a processor gets control of the shared checkpoint, it will write a value into the key and data portions of the LOCK record. When the lock is not held by any processor, a value of zero will be recorded. Processors in the MAS complex can determine whether the shared data set is available by using a predetermined channel "search" command (CCW) with a zero data field. If this command is successful, the remainder of a channel program (CCWs) will set the key field to the appropriate value of the processor. The channel program basically performs a "compare and swap" operation on the lock record. See U.S. Pat. No. 3,886,525 which is assigned to IBM, and which discloses and claims the "compare and swap" concept.

The operation to obtain the lock will be done as part of the initial "read" channel program which is executed as each processor's checkpoint interval begins. The initial "read" channel program begins by locating the LOCK record in the checkpoint data set. It then ensures that the lock is currently unowned. If the key of the LOCK record is currently 0 (lock unowned), it then goes on to "read" the MASTER record and the CHECK record. Finally, the initial "read" operation sets the lock by writing the ID of the member that will be owning the lock into the key and data of the LOCK record. If the lock record is currently owned (key of the LOCK record is currently non-zero), the channel program reads the value of the lock record in order to determine which member (processor) owns the checkpoint lock.

When this situation occurs, JES will attempt its own error recovery. If the system which lost the RESERVE is still running, it will eventually clear the key value, allowing the looping members to proceed. If it is not running, a JES operator command will be used by one of the other members to reset the lock value, on behalf of the failed member. Thus this lock can be used as a backup for the RESERVE/RELEASE hardware mechanism, since it tends to lock "closed", rather than "open", when hardware or software failures occur.
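The "compare and swap" behavior of the LOCK record can be sketched as follows. This is an illustration, not the channel program itself: in the real system the search and write are carried out by CCWs on the device, whereas this single-threaded Python model only shows the logic. The names LockRecord, try_acquire, release_lock and operator_reset are hypothetical.

```python
class LockRecord:
    """Toy LOCK record: key and data portions are kept identical."""

    def __init__(self):
        self.key = 0    # zero means the lock is not held by any member
        self.data = 0


def try_acquire(lock, member_id):
    """Set the lock to member_id if it is unowned.

    Returns (acquired, owner): owner is the member that holds the lock
    after the attempt.
    """
    if member_id == 0:
        raise ValueError("member ID must be non-zero; zero means unowned")
    if lock.key == 0:                 # the search for a zero key succeeded
        lock.key = member_id
        lock.data = member_id
        return True, member_id
    return False, lock.key            # read the lock to learn the owner


def release_lock(lock, member_id):
    """Clear the lock; only the owning member may do so."""
    if lock.key != member_id:
        raise RuntimeError("lock is not held by this member")
    lock.key = lock.data = 0


def operator_reset(lock):
    """Manual reset on behalf of a failed member (models the operator command)."""
    lock.key = lock.data = 0
```

Because nothing clears the lock automatically, a member that fails while holding it leaves the lock "closed" until operator_reset is called, which mirrors the behavior described above.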

CHECKPOINT DATA SET CHECK RECORD

JES is capable of detecting when the checkpoint data read at the beginning of the checkpoint cycle is invalid. If another member fails during a checkpoint update, the data in the primary (as opposed to the backup) checkpoint data set may have been partially updated and is therefore not valid. This occurrence can be detected by using the check value in the CHECK record of the checkpoint data set. Each time that data is written to the checkpoint, an incremented counter (which ranges from 1 to 127) will be recorded in both the MASTER checkpoint record and the CHECK record. All checkpoint write operations will write the counter value in the MASTER record first. Then, after all of the changed queue records have been written, the CHECK record will be written as the last write operation (just before the I/O completion verification). When a member is to "read" the checkpoint data, it first compares the counter value with the check value to determine the integrity of the checkpoint data. If the values are unequal, then one of the other members in the complex must have failed during a checkpoint write operation and error recovery action is necessary. This always involves a warm start, i.e., a restart of JES that retains the work already in the queues.
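The write ordering described above (MASTER counter first, CHECK record last) can be illustrated with a small model. This is a sketch under stated assumptions: the text gives the counter range of 1 to 127 but not what happens at 127, so the wrap back to 1 is an assumption, and the class and function names are hypothetical.

```python
COUNTER_MAX = 127  # the counter ranges from 1 to 127


def next_counter(counter):
    """Increment the counter; wrapping from 127 to 1 is an assumption."""
    return 1 if counter >= COUNTER_MAX else counter + 1


class Checkpoint:
    """Toy checkpoint data set with a MASTER counter and a CHECK value."""

    def __init__(self):
        self.master_counter = 1
        self.check_value = 1
        self.records = {}


def write_checkpoint(ckpt, changed_records, fail_before_check=False):
    """Write changed records in the order described in the text."""
    new_value = next_counter(ckpt.master_counter)
    ckpt.master_counter = new_value        # MASTER record is written first
    ckpt.records.update(changed_records)   # then the changed queue records
    if fail_before_check:
        return                             # simulated member failure mid-write
    ckpt.check_value = new_value           # CHECK record is written last


def is_valid(ckpt):
    """A reader compares the MASTER counter with the CHECK value."""
    return ckpt.master_counter == ckpt.check_value
```

A write that completes leaves the two values equal; a write interrupted before the CHECK record leaves them unequal, which is the condition a reading member detects.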

CHECKPOINT DATA SET I/O OPERATIONS

As discussed above, the checkpoint cycle begins with a read operation to update the in-storage queues (control blocks). Each read operation is divided into two separate I/Os. The first issues a RESERVE for the checkpoint device and reads control data including the control bytes from the first DASD track. The control bytes (CTLBs) are used to build a channel program to read all of the records that were changed by other members.
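One way control bytes could identify the records a member must read is sketched below. The exact bit semantics of the CTLBs are not given in this passage, so the scheme here is an assumption: one control byte per record, with bit i set when member i has not yet read the latest copy of that record. All function names are hypothetical.

```python
def mark_changed(ctlbs, record_no, writer, n_members):
    """Record that `writer` changed a record: set the bit of every other member."""
    all_members = (1 << n_members) - 1
    ctlbs[record_no] |= all_members & ~(1 << writer)


def records_to_read(ctlbs, member):
    """List the record numbers whose control byte has this member's bit set."""
    mask = 1 << member
    return [i for i, ctlb in enumerate(ctlbs) if ctlb & mask]


def mark_read(ctlbs, member):
    """Clear this member's bit in every control byte after it has read them."""
    mask = 1 << member
    for i in range(len(ctlbs)):
        ctlbs[i] &= ~mask
```

With up to seven members, one byte per record is enough under this assumed scheme. The list returned by records_to_read stands in for the channel program that reads only the changed records.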

In addition to the storage actually used to contain the checkpoint data, JES maintains a checkpoint I/O buffer. During normal checkpoint I/O, actual CCWs transfer data to this buffer. Using the control bytes, JES fixes the real frames associated with the pages in the buffer that will be used. After a channel program has been built and executed, JES moves each of the "read pages" to the appropriate area of the actual checkpoint storage. JES then releases the "old" data in the associated frames (or storage slots on DASD) before "replacing" the data. JES then releases the frames used as an I/O buffer. The above operation is performed in a loop, a page at a time, so that the real frame requirement for the I/O never exceeds by more than one the total number of pages read.

The write operations performed by the checkpoint processor are substantially the reverse of the read operations. Each page of the I/O buffer that will be used is fixed. For each page in the checkpoint data set that was changed, the storage in the actual checkpoint area is moved to the fixed page in the I/O buffer, and real addresses in the channel program are adjusted. The changed pages are then written to DASD. The pages in the I/O buffer that were fixed are released. It is important to note that entire records are read and written regardless of the number of bytes of data that were changed or modified within the records.

A rather complete discussion of direct access storage devices is found in a book entitled "Introduction to IBM Direct Access Storage Devices" written by Marilyn Bohl and published by Science Research Associates in 1981. This book is incorporated herein by reference. Additional pertinent references include U.S. Pat. Nos. 4,507,751 and 4,310,883, both assigned to IBM.

Faster central processors tend to require more JES services per second than slower ones. This means that the number of checkpoint records updated per second increases with processor speed. This, in turn, increases the number of records that must be read by each member of the configuration before useful work can begin. During the time that the data set is being read or written (except for "intermediate" WRITEs), all updates are prevented. This, in turn, inhibits almost all JES functions because most JES processing requires the ability to update the checkpoint data set.

Some installations find that faster processors are constrained by the inability to complete all of the requested JES functions during the time that the data set is "owned" by a processor. This causes poor response time and inability to fully utilize the processing power of the computer.

It is therefore an object of this invention to reduce I/O to a checkpoint data set and to increase the amount of time that a processor can update the checkpoint data set in order to speed up checkpoint I/O operations thereby increasing the performance of a multi-access spool complex.

It is also an object of this invention to improve communication among processors by speeding up the data set I/O operations.

A further object of the invention is to improve the reliability, availability and serviceability characteristics of the data set.

An object of the invention is to reduce the dependency of the processor executing the subsystem on I/O operations to the data set.

SUMMARY OF THE INVENTION

A method and apparatus for providing improved communication among processors in a multi-plexed data area computer complex is disclosed and claimed. The complex includes processors, main storage with program address spaces, at least one non-volatile storage device connected to the main storage, at least one subsystem in one of the program address spaces and shared data stored in fields in an area in main storage for each processor and program address space.

The improvement is directed to:

a to-be-read-from version of the shared data including a storage area, which has records for containing changed data fields, formatted on a storage device;

a to-be-written-to version of the shared data including another storage area, which also has records for containing changed data fields, formatted on a storage device; and

a program in one of the processors or address spaces that performs input/output operations on the versions when one of the processors solely controls and has access to the shared data. The versions then flip-flop following the performance of the input/output operations.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1a is a schematic block diagram of the computer complex showing the dual data set configuration;

FIG. 1b is a schematic block diagram of the dual data set configuration showing the data set versions and each corresponding journal;

FIG. 2 shows the records in a journal in a data set version as well as other records in the data set version on a direct access storage device;

FIG. 3 shows the events of checkpoint cycle operation involving the dual data set configuration;

FIG. 4 shows the format of a data set to include a journal;

FIG. 5 shows the format of journal records;

FIG. 6 is a schematic diagram showing the records in a data set in storage;

FIGS. 7a-d show the progression of data set versions, update levels and in-storage data set copies in a four-processor complex during the operation of the checkpoint cycle;

FIG. 8 shows the events of the checkpoint cycle involving the dual data set configuration as each of three processors gains control of and has access to the data set versions;

FIG. 9 shows the update level changes to each data set version and to the in-storage copy of the data set for one processor during a checkpoint cycle operation;

FIGS. 10A-D show the events that occur on the direct access storage device and in virtual storage during READ 1 processing, READ 2 processing, primary WRITE processing and intermediate and final WRITE processing during a checkpoint cycle operation;

FIG. 11 is a schematic block diagram of lists in the change area in virtual storage having entries that are updated each time a corresponding control block is updated by a processor in the complex;

FIG. 12a is a flow chart showing the READ 1 processing operation;

FIG. 12b is a flow chart showing the READ 2 processing operation;

FIG. 12c is a flow chart showing the primary, intermediate and final WRITE processing operation;

FIG. 13 is a schematic block diagram of the control blocks, channel command words and service routine provided for the subsystem in the virtual storage area;

FIG. 14 is a schematic block diagram that shows the mapping of the storage configuration of the logical areas of the data set by checkpoint information tables; and

FIGS. 15A-D show how a control byte corresponding to a 4K page that contains control blocks is changed as processor access to the data set changes during a checkpoint cycle operation.

DESCRIPTION OF THE PREFERRED EMBODIMENT

In accordance with the invention, a journal is implemented in order to reduce input/output (I/O) to a subsystem's shared data area (such as a job entry subsystem (JES) checkpoint data set) on a non-volatile shared storage device such as a direct access storage device (DASD), and to increase the amount of time that is available for data update access to the shared data in fields of main storage by a given member, i.e. a processor, of a multi-plexed data area computer complex as shown in FIG. 1a. To simplify the discussion of the preferred embodiment, the shared data area is a checkpoint data set, such as the one used by the job entry subsystem to provide for communication among processors in a multi-access spool computer complex, and the shared storage device is DASD. Multi-plexed data area means that each member owns the checkpoint at different times. The journal (or change log) is intended to alleviate the overall complex checkpoint constraint by increasing the amount of time that a member may "own" (or "hold" or "have control of and access to") the shared data, i.e. may actually be permitted to update the data residing in the checkpoint data set, relative to the time spent acquiring and relinquishing "ownership" of it. Most useful work can be accomplished only during periods that allow update access, i.e. when a member "owns" and controls the checkpoint data set, excluding the periods of time for acquiring and releasing data set "ownership". Also, in accordance with the invention, a dual data set configuration is implemented (created) to provide the necessary data integrity in case of a DASD or member failure while the checkpoint data set is being updated with the data in the journal. The dual data set configuration (10) is made up of two data sets or two versions (12 and 14) of the same data set on DASD as shown in FIG. 1a. One is a to-be-written-to version and the other is a to-be-read-from version; both are further discussed below.

There are three coherent copies of the data set, representing its three most recent update levels (discussed below), that can be reconstructed from the versions and associated journals. Two versions allow the writing of updated 4K records to take place at the time of a primary WRITE operation, i.e. during "useful time", rather than during a final WRITE (non-useful time). This increases the useful time available for updates to the checkpoint data set.

"Useful time", in the context of this invention, is the time during which one of the members in the complex is permitted to update the checkpoint data set, and thus to perform functions that require such updates. "Non-useful time" is defined herein to be the period of time during which no member in the complex "owns" the data set (all are "dormant") or a member "owns" the data set but is in the process of performing initial READ or final WRITE operations.

Also in accordance with the invention, the checkpoint data set is read and written in a way that shortens the length of time required for a checkpoint cycle to occur. Fewer bytes of data are read and written during a checkpoint cycle. A checkpoint cycle is defined to begin at the time a member acquires "ownership" and control of the data set. This includes the time that the member takes to read the checkpoint and "overwrite" journal updates in storage, as well as the time that the member takes to write the new "update level" of the data and to relinquish "ownership" of the data set. There is reduced I/O to the checkpoint data set during substantially all update accesses by a given member of the complex. Also, the dual data set configuration eliminates the need for a backup (or duplex) copy. Since performance is a primary consideration with respect to checkpoint I/O, the improvement disclosed and claimed herein increases performance substantially over the prior art. For example, in the prior art, if a two member complex had 1.8 million calls to the execute channel program service routine per week, then approximately 16.5 hours of channel time was used per week. Of these 16.5 hours, approximately 3.9 hours were non-useful time due to unnecessary rotations of DASD. This non-useful time has been substantially eliminated. There is no requirement that the versions reside on separate DASD devices, although this is recommended for increased performance.
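The prior-art figures quoted above can be put in proportion with a short calculation. The following is only an illustrative sketch in Python; the constant names are chosen here and do not appear in the specification, and the per-call averages are derived values, not figures stated in the text.

```python
# Figures quoted in the text for a two member prior-art complex.
CALLS_PER_WEEK = 1_800_000   # calls to the execute channel program service routine
CHANNEL_HOURS = 16.5         # channel time used per week
NON_USEFUL_HOURS = 3.9       # portion lost to unnecessary DASD rotations

# Share of channel time that was non-useful (about 24%).
non_useful_fraction = NON_USEFUL_HOURS / CHANNEL_HOURS

# Derived averages per call, in milliseconds (about 33 ms and about 7.8 ms).
ms_per_call = CHANNEL_HOURS * 3600 * 1000 / CALLS_PER_WEEK
non_useful_ms_per_call = NON_USEFUL_HOURS * 3600 * 1000 / CALLS_PER_WEEK
```

On these numbers, roughly a quarter of the weekly channel time was spent on rotational delay.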

Reduction in the checkpoint cycle time and I/O time is due, in large part, to the new journal portion of each version of the data set shown in FIGS. 1a and 1b. The journal (16) of version 1 (12) and the journal (18) of version 2 (14) are shown in FIG. 1b. As a result of implementing the journal, entire records (each 4096 bytes in length and otherwise referred to as 4K pages) will not be read or written during the checkpoint cycle. Instead, only the changed elements of each record, e.g. a control block or data field or data byte(s), will either be read from or written to the journal portion of the data set on DASD. The journal will be composed of journal records which, like the checkpoint records, are 4096 bytes in length. These records will contain only changed control blocks (each with an average size of less than 100 bytes) and identifying information. The implementation of versions of the checkpoint data set with journal, and the dual data set configuration, does not depend upon current or future DASD technology or features beyond that required for the prior art. FIG. 1b shows conceptually the checkpoint cycle (20) to include a journal (16, 18) in each version (12, 14) of the data set in accordance with the invention.

The main journal area (records which contain the journal data) is located (formatted) on DASD track 1, immediately following the MASTER record, as shown in FIG. 2, so that it can be read or written in a single I/O operation thereby eliminating the rotational and head positioning delay incurred by the "reads" and "writes" of scattered 4K pages. That is, by placing the journal on the first (DASD) track, the DASD will rotate past the journal's records (22a, 22b, 22c) on the first track in the process of reading or writing records on that track. As shown in FIG. 2, additional journal records (22d, 22e) may be placed on track 2 (and subsequent tracks of the data set) if required. The journal substantially reduces the need to read and write to the checkpoint records that are potentially scattered across multiple tracks and, possibly, cylinders throughout the data set during the non-useful time. As a result, the journal provides for a greater proportion of useful checkpoint update time by a member in the complex. Furthermore, the journal reduces the number of bytes of data that must be read and written to DASD during non-useful time (over the prior art). Therefore, a further reduction in I/O time will be realized even if future DASD technology eliminates rotational delays.

FIGS. 2 and 4 show the checkpoint data set format (data records) in which a journal begins at the end of the MASTER record on track 1 (and may continue to other tracks). In FIGS. 2 and 4, a single block represents a non-keyed data record, and a double block represents a keyed record. The size of the journal is kept in the MASTER record (28), as is the "active" size of the journal. The active size of the journal is given in bytes and is divided by 4096 and rounded up to compute the number of journal records actually in use. The MASTER record also contains a byte of data information corresponding to each record in the checkpoint data set. These bytes are called control bytes (CTLBs) and identify the records in the checkpoint that have been changed or updated since a member relinquished ownership of the data set. Also there are control bytes called CLCBs residing in the subsystem virtual storage address space which identify the changed journal records. The first track of the checkpoint data set includes the CHECK record (30), the keyed LOCK record (32) and the MASTER record (28) as well as all or part of the journal. As shown in FIG. 4 the CHECK record (30) includes the data set name (dsn) and volume number (volser) of a version. The LOCK record key (32a) contains an indication of several new states ("me" "1v") associated with the configuration of the checkpoint data set in dual mode of operation. The LOCK record key is used as a means of preventing data set access in the case of a DASD hardware lock (RESERVE) failure and as a means of determining whether or not to read the entire contents of track 1. ("Ownership" of the authority to change the checkpoint data is controlled using the RESERVE/RELEASE feature of shared DASD. RESERVE/RELEASE also ensures the consistency of the checkpoint data by preventing concurrent updates.)
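The computation of the number of journal records in use from the active size is a ceiling division by the record length. A minimal sketch in Python follows; the function and constant names are hypothetical and are used only for illustration.

```python
JOURNAL_RECORD_SIZE = 4096  # bytes per journal record, as stated in the text


def journal_records_in_use(active_size_bytes: int) -> int:
    """Return the active journal size divided by 4096 and rounded up."""
    if active_size_bytes < 0:
        raise ValueError("active size cannot be negative")
    # Negating twice turns floor division into ceiling division.
    return -(-active_size_bytes // JOURNAL_RECORD_SIZE)
```

For example, under this sketch an active size of 4097 bytes occupies two journal records, while 4096 bytes occupies exactly one.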

New records (22a, 22b, 22c) which contain the journal data are formatted onto track 1 following the MASTER record. FIGS. 2 and 4 show the records that are formatted onto the first track of a checkpoint data set in the order in which they are read. The format of journal records is shown in FIG. 5. Journal records are composed of updated control blocks (34, 36, 38, 40) preceded by identifying information in an address list entry (ALE). The control blocks can be variable in length and can span records as indicated by updated control block 38 in FIG. 5. The content of the ALE is identical to the content of the change log address list entry CALE in the change area in storage (which is discussed below).
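The journal record layout described above (identifying information followed by a variable-length control block, with blocks allowed to span records) can be sketched as follows. This Python sketch is illustrative only. The three-field ALE layout (page index, offset within the page, block length) is an assumption made here, since the actual ALE content is defined elsewhere in the specification, and the function names are hypothetical.

```python
import struct

RECORD_SIZE = 4096            # journal records are 4096 bytes long
ALE = struct.Struct(">HHH")   # ASSUMED ALE layout: page index, offset, length


def pack_journal(changes):
    """Pack (page, offset, data) changes into fixed-size journal records.

    Returns the list of records and the active size in bytes. Entries are
    laid end to end, so a control block may span two records.
    """
    stream = b"".join(ALE.pack(p, o, len(d)) + d for p, o, d in changes)
    active_size = len(stream)
    records = [
        stream[i:i + RECORD_SIZE].ljust(RECORD_SIZE, b"\0")  # pad last record
        for i in range(0, active_size, RECORD_SIZE)
    ]
    return records, active_size


def unpack_journal(records, active_size):
    """Recover the (page, offset, data) changes from journal records."""
    stream = b"".join(records)[:active_size]  # drop padding past active size
    changes, pos = [], 0
    while pos < active_size:
        page, offset, length = ALE.unpack_from(stream, pos)
        pos += ALE.size
        changes.append((page, offset, stream[pos:pos + length]))
        pos += length
    return changes
```

Because the active size is carried alongside the records, the reader knows where valid journal data ends and does not interpret the padding in the last record as entries.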

As indicated above, the checkpoint data set is divided into two versions, CKPT1 and CKPT2, each having a corresponding journal (16, 18) as shown in FIG. 1b. In this embodiment, each version and its corresponding journal resides on a DASD volume or separate storage device. Each member of the complex (shown in FIG. 1a) retains a copy of a "state", i.e. update level, of the data set in the subsystem address space, in this case, the virtual storage address space belonging to JES. The update level of the checkpoint retained by any one member of the complex is based on the status of the data in the checkpoint during the most recent member ownership period. The integrity of the data set is maintained by either of the two versions because it is always possible to reconstruct at least two coherent update levels of the checkpoint from one of the versions and its corresponding journal, and either one or two coherent update levels of the checkpoint from the other version. Furthermore, if both DASD versions are damaged, it is possible to dynamically allocate new versions and recover from the member's copy of the in-storage data set. The data set is converted to a more recent update level by "overwriting" the data in a version of the data set with the updated data in the corresponding journal. In other words, 4K pages in this version of the checkpoint data set are updated (to an update level) when a member overwrites the journaled changes to the journal's corresponding checkpoint data set version. This occurs immediately following the reading of the data in the previous update level of this version of the data set (on DASD) by a member in the complex. The actual "overwriting" is done to the I/O buffer (shown in FIGS. 10A-10D) whose changed records are written as the "new" update level of the data to the version of the checkpoint data set on DASD during the primary WRITE portion of the checkpoint cycle which is discussed below.
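The "overwriting" of journaled changes onto 4K pages in the I/O buffer can be sketched as below. This is a simplified Python illustration, not the disclosed implementation: the (page, offset, data) tuple is a hypothetical stand-in for a journal entry, and the sketch assumes that no single change crosses a page boundary.

```python
PAGE_SIZE = 4096  # each checkpoint record (4K page) is 4096 bytes


def apply_journal(pages, changes):
    """Overwrite journaled changes onto in-storage 4K pages.

    pages   -- list of bytearray objects, one per 4K page (the I/O buffer)
    changes -- iterable of (page index, offset, data) tuples (assumed form)
    Returns the set of page indexes that were changed, i.e. the pages that
    would have to be written to DASD at the next primary WRITE.
    """
    changed = set()
    for page, offset, data in changes:
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("change does not fit inside one page")
        # Equal-length slice assignment keeps the page at 4096 bytes.
        pages[page][offset:offset + len(data)] = data
        changed.add(page)
    return changed
```

The returned set plays a role loosely comparable to the control bytes that mark changed records, although the control byte encoding itself is not modeled here.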

Further, in accordance with the invention, one version of the data set is used on an equal basis as the other version, with the members "flip-flopping" between the two versions (CKPT1 and CKPT2) on alternating checkpoint cycles. Flip-flopping is controlled by using appropriate fields in the data on track 1 of the versions as explained herein. Therefore, this mode of operation is referred to as "dual mode" and the versions are alternately referred to as the "to-be-read-from" and "to-be-written-to" versions. That is, if a member uses CKPT1 as the to-be-written-to data set and CKPT2 as the to-be-read-from data set, then the next member to "own" the checkpoint data set will use CKPT1 as the to-be-read-from data set and CKPT2 as the to-be-written-to data set. In essence, the one data set that has the most recent updates will become the to-be-read-from data set. As is shown in FIG. 4 a level number (LEVEL) in the data area of the CHECK record (30) is used to indicate which data set version has the most recent updates, i.e. which data set is to-be-read-from. The other data set becomes the to-be-written-to version. The level number is incremented during each primary WRITE operation. (Primary WRITEs are described below.) The higher the number, the more recent the update. "Flip-flopping" between data set versions eliminates the need for a backup (duplex) copy of the data set.
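The selection of the to-be-read-from and to-be-written-to versions by level number can be sketched as follows. This Python sketch rests on assumptions that are not stated in the text: that the version written during a primary WRITE receives a level one higher than the version just read, and that a tie in level numbers is resolved in favor of the second argument. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Version:
    name: str    # e.g. "CKPT1" or "CKPT2"
    level: int   # LEVEL field from the CHECK record


def choose_roles(a: Version, b: Version):
    """Return (to_be_read_from, to_be_written_to).

    The version with the higher level number holds the most recent updates
    and is read from; the other is written to. Tie handling is an assumption.
    """
    return (a, b) if a.level > b.level else (b, a)


def primary_write(read_from: Version, write_to: Version) -> None:
    """ASSUMED: the written version ends up one level above the one read."""
    write_to.level = read_from.level + 1


def run_cycles(a: Version, b: Version, cycles: int):
    """Run several checkpoint cycles; return the version read in each one."""
    read_order = []
    for _ in range(cycles):
        read_from, write_to = choose_roles(a, b)
        read_order.append(read_from.name)
        primary_write(read_from, write_to)
    return read_order
```

Under these assumptions the roles alternate on every cycle, because each primary WRITE leaves the just-written version with the higher level number.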

CHECKPOINT CYCLE OPERATION

(See FIG.