WikiPatents - Community Patent Review
Incremental disk backup system for a dynamically mapped data storage subsystem
United States Patent 5210866
Link to this page: http://www.wikipatents.com/5210866.html
Inventor(s): Milligan; Charles A. (Golden, CO); Rudeseal; George A. (Boulder, CO); Belsan; Jay S. (Nederland, CO)
Abstract: The parallel disk drive array data storage subsystem dynamically maps between virtual and physical data storage devices and schedules the writing of data to these devices. The data storage subsystem functions as a conventional large form factor disk drive memory, using an array of redundancy groups, each containing N+M disk drives. The data storage subsystem does not modify data stored in a redundancy group but simply writes the modified data as a new record in available memory space on another redundancy group. The original data is flagged as obsolete. A mapping table is maintained to identify portions of these redundancy groups which contain newly written or modified virtual track instances. These marked virtual track instances are written to backup medium as a background process and the mapping table is updated to clear the flags that identify these virtual track instances as having been modified.

Owner/Assignee: Storage Technology Corporation (Louisville, CO)
Publication Date: May 11, 1993
Application Number: 07/582,260
Filing Date: September 12, 1990
US Classification: 714/6
Int'l Classification: G06F 011/20
Examiner: Atkinson; Charles E.
Attorney/Law Firm: Duft, Graziano & Forest
USPTO Field of Search: 395/575 371/10.1 364/285.1 364/944.3 364/943.91
Patent Tags: incremental disk backup dynamically mapped data storage subsystem
   

References

U.S. References
Patent No.   Inventor     U.S. Class   Date
5133065      Cheffetz     714/2        Jul. 1992
5089958      Horton       (not listed) Feb. 1992
5043871      Nishigaki    707/202      Aug. 1991
4916605      Beardsley    711/162      Apr. 1990
4819154      Stiffler     714/20       Apr. 1989
4755928      Johnson      714/6        Jul. 1988
4467421      White        711/118      Aug. 1984
Claims
 


I claim:

1. Apparatus for backing up data records in a dynamically mapped data storage subsystem that stores data records for a host processor, said apparatus comprising:

a plurality of data storage devices, a subset of said plurality of data storage devices being configured into a plurality of redundancy groups, each redundancy group consisting of at least two data storage devices;

means for writing a stream of data records received from said host processor and redundancy data associated with said received stream of data records in a first memory location in a selected one of said redundancy groups;

means for storing a data record pointer indicative of said first memory location used to store said received stream of data records;

cache memory means;

means, responsive to receipt from said host processor of a request for access to one of said data records stored in said first memory location, for writing said requested data record into said cache memory means;

means, responsive to receipt of modifications to said requested data record received from said host processor, for writing said modified data record in a second memory location in a selected one of said redundancy groups;

means, responsive to said writing means, for indicating in said data record pointer said one data record stored in said first memory location as obsolete;

means for storing a modified data record pointer indicative of said second memory location of said written modified data record;

means, responsive to creation of a modified data record pointer, for storing a copy of a data record whose identity is defined by said modified data record pointer, comprising:

means for generating a duplicate data record pointer that identifies said second memory location of said modified data record; and

means for storing said duplicate data record pointer as the identity of a backup copy of said modified data record.

2. The apparatus of claim 1 wherein said data record copy storing means further comprises:

backup memory means;

means for writing said modified data record, as identified by said duplicate data record pointer, to said backup memory means.

3. The apparatus of claim 2 wherein said backup memory means comprises at least one redundancy group in said dynamically mapped data storage subsystem.

4. The apparatus of claim 2 further comprising:

means, responsive to said host processor requesting backup of an identified modified data record, for activating said rewriting means to write said identified modified data record into said backup memory means.

5. The apparatus of claim 2 further comprising:

means, responsive to said host processor requesting backup of said modified data records, for activating said rewriting means to write said modified data records into said backup memory means.

6. The apparatus of claim 1 wherein said data record copy storing means includes:

means for writing said modified data record to a second memory location in one of said redundancy groups;

means for generating a data record pointer that identifies said second memory location in said one of said redundancy groups; and

means for storing said generated data record pointer as the identity of a backup copy of said modified data record.

7. The apparatus of claim 8 wherein said data record copy storing means further comprises:

backup memory means connected to said dynamically mapped data storage subsystem;

means for writing said modified data record from said second location to said backup memory means.

8. The apparatus of claim 7 further comprising:

means responsive to said writing means writing one of said modified data records into said backup memory means for expunging the identity of said rewritten modified data record from said storing means.

9. Apparatus for backing up files in a dynamically mapped memory system that stores data records for a host processor, comprising:

a plurality of data storage devices, a subset of said plurality of data storage devices being configured into a plurality of redundancy groups, each redundancy group consisting of at least two data storage devices;

means for writing a stream of data records received from said host processor and redundancy data associated with said received stream of data records in a first memory location in a selected one of said redundancy groups;

means for storing a data record pointer indicative of said first memory location used to store said received stream of data records;

cache memory means connected to and interconnecting said host processor and said data storage devices for storing data records transmitted therebetween;

backup memory means connected to said cache memory means for storing modified data records which are stored in said dynamically mapped memory system by said host processor;

means, responsive to receipt from said host processor of a request for access to one of said data records stored in said first memory location, for writing said requested data record into said cache memory means from said first memory location;

means, responsive to modifications to said requested data record received from said host processor for writing said modified data record from said cache memory means into a second memory location in a selected one of said redundancy groups;

means, responsive to said writing means, for indicating said one data record stored in said first location in one of said redundancy groups as obsolete;

means for storing a modified data record pointer indicative of said second memory location of said written modified data record;

means, responsive to creation of a modified data record pointer, for storing a copy of a data record whose identity is defined by said modified data record pointer into said backup memory means.

10. The apparatus of claim 9 wherein said backup memory means comprises a selected one of said redundancy groups, said data record copy storing means includes:

means for generating a data record pointer that identifies said second location of said modified data record in said redundancy groups; and

means for storing said generated data record pointer as the identity of a backup copy of said modified data record.

11. The apparatus of claim 9 further including:

means, responsive to said rewriting means writing one of said modified data records into said backup memory means, for expunging the identity of said rewritten modified data record from said storing means.

12. The apparatus of claim 9 wherein said backup memory means comprises at least one redundancy group in said dynamically mapped data storage subsystem.

13. The apparatus of claim 9 further comprising:

means, responsive to said host processor requesting backup of an identified modified data record, for activating said rewriting means to write said identified modified data record into said backup memory means.

14. The apparatus of claim 9 further comprising:

means, responsive to said host processor requesting backup of said modified data records, for activating said rewriting means to write said modified data records into said backup memory means.

15. The apparatus of claim 9, wherein said backup memory means comprises at least one tape drive, further comprising:

means connected to and interconnecting said cache memory means and said tape drive for transferring data therebetween;

wherein said writing means includes:

means for staging said modified data record from said second location in a selected one of said redundancy groups to said cache memory means,

means for transmitting said staged modified data record from said cache memory means to said transferring means.

16. The apparatus of claim 15 wherein said writing means further includes:

means, responsive to the receipt of a data record backup command from said host processor, for transmitting data to said host processor identifying all modified data records whose identity is stored in said storing means.

17. The apparatus of claim 15 wherein said writing means further includes:

means, responsive to said transmitting means, for expunging said identity of said staged modified data record from said storing means.

18. The apparatus of claim 9 wherein said backup memory means comprises a tape drive connected to said data storage system, said writing means includes:

means for generating a data record pointer that identifies said second location of said modified data record in said redundancy groups; and

means for storing said data record pointer as the identity of a backup copy of said modified data record.

19. The apparatus of claim 18 wherein said rewriting means further includes:

means for writing said modified data record, as identified by said data record pointer, to said backup memory means.

20. Apparatus for backing up files in a dynamically mapped data storage subsystem that stores data records for at least one associated data processor, said dynamically mapped data storage subsystem including a plurality of data storage devices, a subset of said plurality of said data storage devices configured into at least one redundancy group, each redundancy group consisting of n+m data storage devices, where n and m are both positive integers with n being greater than 1 and m being equal to or greater than 1, and said data storage devices each including a like plurality of physical tracks to form sets of physical tracks called logical tracks, each logical track having one physical track at the same relative address on each of said n+m data storage devices, for storing data records thereon, said dynamically mapped data storage subsystem generates m redundancy segments using n received streams of data records, selects a first one of said logical tracks in one of said redundancy groups, having at least one set of available physical tracks addressable at the same relative address for each of said n+m data storage devices and writes said n received streams of data records and said m redundancy segments on said n+m data storage devices in said selected set of physical tracks, each stream of data records and redundancy segments at said selected available physical track on a respective one of said n+m data storage devices, comprising:

cache memory means connected to and interconnecting said host processors and said data storage devices for storing data records transmitted therebetween;

backup memory means connected to said cache memory means for storing modified data records which are stored in said dynamically mapped data storage subsystem by said associated host processors;

means, responsive to the receipt from one of said host processors of a request for access to one of said data records stored in said first logical track of one of said redundancy groups, for writing said requested data record into said cache memory means from said first logical track at one physical track of one of said n+m data storage devices of one of said redundancy groups;

means, responsive to modifications to said requested data record, from said requesting host processor, for writing said modified data record from said cache memory means into a second available logical track in a selected one of said redundancy groups;

means responsive to said writing means for indicating said one data record stored in said first logical track as obsolete;

means for storing data indicative of the location of said written modified data record in said second logical track; and

means for rewriting at least one modified data record whose identity is stored in said storing means into said backup memory means.

21. The apparatus of claim 20 wherein backup memory means comprises a selected one of said redundancy groups, said rewriting means includes:

means for generating a data record pointer that identifies the physical location of said modified data records in said redundancy groups; and

means for storing said data record pointer as the identity of a backup copy of said modified data record.

22. The apparatus of claim 21 wherein said rewriting means further includes:

means for writing said modified data record, as identified by said data record pointer, to said backup memory means.

23. The apparatus of claim 20 further including:

means responsive to said rewriting means writing one of said modified data records into said backup memory means for expunging the identity of said rewritten modified data record from said storing means.

24. The apparatus of claim 20 wherein said backup memory means comprises at least one redundancy group in said dynamically mapped data storage subsystem.

25. The apparatus of claim 20 further including:

means, responsive to one of said associated host processors requesting backup of an identified modified data record, for activating said rewriting means to write said identified modified data record into said backup memory means.

26. The apparatus of claim 20 further including:

means, responsive to one of said associated host processors requesting backup of said modified data records, for activating said rewriting means to write said modified data records into said backup memory means.

27. The apparatus of claim 20, wherein said backup memory means comprises at least one tape drive, further comprising:

means connected to and interconnecting said cache memory means and said tape drive for transferring data therebetween;

wherein said writing means includes:

means for staging said modified data record from said second logical track to said cache memory means,

means for transmitting said staged modified data record from said cache memory means to said transferring means.

28. The apparatus of claim 27 wherein said writing means further includes:

means, responsive to the receipt of a data record backup command from said host processor, for transmitting data to said host processor identifying all modified data records whose identity is stored in said storing means.

29. The apparatus of claim 27 wherein said writing means further includes:

means, responsive to said transmitting means, for expunging said identity of said staged modified data record from said storing means.

30. The apparatus of claim 20 wherein said backup memory means comprises a tape drive connected to said data storage system, said writing means includes:

means for generating a data record pointer that identifies said second location of said modified data record in said redundancy groups; and

means for storing said data record pointer as the identity of a backup copy of said modified data record.

31. The apparatus of claim 30 wherein said rewriting means further includes:

means for writing said modified data record, as identified by said data record pointer, to said backup memory means.
Description
 


CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application is related to application Ser. No. 07/443,933 entitled Data Record Copy Apparatus for a Virtual Memory System, filed Nov. 30, 1989, application Ser. No. 07/443,895 entitled Data Record Move Apparatus for a Virtual Memory System, filed Nov. 30, 1989 and application Ser. No. 07/509,484 entitled Logical Track Write Scheduling System for a Parallel Disk Drive Array Data Storage Subsystem, filed Apr. 16, 1990.

FIELD OF THE INVENTION

This invention relates to cached peripheral data storage subsystems with a dynamically mapped architecture and, in particular, to a method for performing incremental disk backups in this data storage subsystem.

PROBLEM

It is a problem in the field of data storage subsystems to efficiently perform data backups. In data storage subsystems, a standard practice to reliably store data therein is to produce a backup copy of the data that is stored in the data storage subsystem and retain it on another independently operating data storage subsystem or another location within the data storage subsystem. The maintenance of dual copies of the data ensures that if one copy is inadvertently destroyed due to a failure of the data storage subsystem or an error on the part of the system operators, another copy of that data is available to the host processors. The backup of a completely redundant copy of the data stored on the data storage subsystem is an expensive proposition since this effectively doubles the cost of storing data. One method of avoiding this cost is to back up only selected volumes of the most critical data. Another alternative is to store only data that has been modified since the last backup operation, thereby retaining, on an incremental basis, an exact copy of what is stored in the data storage subsystem. Both of these alternative solutions provide a much more cost-effective way of providing reliable access to a reserve or backup copy of the data that is stored in the data storage subsystem.

Standard data backup programs that accomplish the above-stated functions in the above-stated manner are efficient because they use multi-track operations, so seeks and rotations on the disks are kept to a minimum. In a dynamically mapped subsystem, the data is spread randomly among the various disks and the standard data backup programs are therefore less efficient. There are presently no known efficient data backup systems for dynamically mapped data storage subsystems.

SOLUTION

The above described problems are solved and a technical advance achieved in the field by the incremental disk backup system for a dynamically mapped data storage subsystem. The dynamically mapped data storage subsystem consists of a parallel disk drive array data storage subsystem. The parallel disk drive array switchably interconnects a plurality of disk drives into redundancy groups that each contain n+m data and redundancy disk drives. Data records received from the associated host processors are written on logical tracks in a redundancy group that contains an empty logical cylinder. When an associated host processor modifies data records stored in a redundancy group, the data storage subsystem writes the modified data records into empty logical cylinders instead of modifying the data records at their present storage location. The modified data records are collected in a cache memory until a sufficient number of virtual tracks have been modified to write out an entire logical track, whereupon the original data records are tagged as "obsolete". All logical tracks of a single logical cylinder are thus written before any data is scheduled to be written to a different logical cylinder. Therefore, a mapping table is easily maintained in memory to indicate which of the logical cylinders contained in the data storage subsystem contain modified data records and which contain unmodified and obsolete data records. By maintaining the memory map, the data storage subsystem can easily identify which logical cylinders contained in the disk drive array contain modified data records that require backup. This system then reads the mapping table to locate logical cylinders containing modified data records that have not been backed up and writes only these modified logical cylinders to the backup medium. The backup medium can be a tape drive, optical disk with removable platters or any other such data storage device. Once the logical cylinders are backed up in this fashion, the mapping table is reset to indicate that all of the data records contained therein have been backed up.
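
The write-elsewhere behavior described above can be illustrated with a small Python sketch. It is not the patented implementation: the class name DynamicMap, the dictionary layout and the integer cylinder numbers are assumptions introduced here, and the cache that batches modified virtual tracks into a full logical track is left out for brevity.

```python
class DynamicMap:
    """Toy model of a dynamically mapped store: data is never updated in place."""

    def __init__(self, num_cylinders):
        # Hypothetical layout: each logical cylinder is a dict {virtual_track_id: data}.
        self.cylinders = [dict() for _ in range(num_cylinders)]
        self.location = {}     # virtual track id -> cylinder holding the current copy
        self.obsolete = set()  # (cylinder, virtual track id) pairs tagged obsolete
        self.modified = set()  # cylinders holding data that has not been backed up

    def write(self, track_id, data, target_cylinder):
        """Store new or modified data in target_cylinder; tag any older copy obsolete."""
        old = self.location.get(track_id)
        if old == target_cylinder:
            # Simplification for this sketch: a rewrite must land in another cylinder.
            raise ValueError("modified data must be written to a new location")
        if old is not None:
            self.obsolete.add((old, track_id))  # the old copy stays on disk, flagged only
        self.cylinders[target_cylinder][track_id] = data
        self.location[track_id] = target_cylinder
        self.modified.add(target_cylinder)

    def read(self, track_id):
        """Return the current copy by following the mapping."""
        return self.cylinders[self.location[track_id]][track_id]
```

After two writes of the same virtual track to different cylinders, read returns the newer copy while the older copy remains physically present and is listed in obsolete.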

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 illustrates in block diagram form the architecture of the parallel disk drive array data storage subsystem;

FIG. 2 illustrates the cluster control of the data storage subsystem;

FIG. 3 illustrates the disk drive manager;

FIG. 4 illustrates the disk drive manager control circuit;

FIG. 5 illustrates the disk drive manager disk control electronics;

FIGS. 6 and 7 illustrate, in flow diagram form, the operational steps taken to perform a data read and write operation, respectively;

FIG. 8 illustrates a typical free space directory used in the data storage subsystem;

FIG. 9 illustrates the format of the virtual track directory;

FIGS. 10 and 11 illustrate, in flow diagram form, the free space collection process;

FIGS. 12-15 illustrate in flow diagram form the incremental disk backup process executed by the data storage subsystem;

FIG. 16 illustrates additional details of the tape drive control unit interface;

FIGS. 17 and 18 illustrate in flow diagram form the immediate disk backup process executed by the data storage subsystem; and

FIG. 14 illustrates a typical free space directory entry.

DETAILED DESCRIPTION OF THE DRAWING

The data storage subsystem of the present invention uses a plurality of small form factor disk drives in place of a single large form factor disk drive to implement an inexpensive, high performance, high reliability disk drive memory that emulates the format and capability of large form factor disk drives. This system avoids the parity update problem of the prior art by never updating the parity. Instead, all new or modified data is written on empty logical tracks and the old data is tagged as obsolete. The resultant "holes" in the logical tracks caused by old data are removed by a background free-space collection process that creates empty logical tracks by collecting valid data into previously emptied logical tracks.

The plurality of disk drives in the parallel disk drive array data storage subsystem are configured into a plurality of variable size redundancy groups of N+M parallel connected disk drives to store data thereon. Each redundancy group, also called a logical disk drive, is divided into a number of logical cylinders, each containing i logical tracks, one logical track for each of the i physical tracks contained in a cylinder of one physical disk drive. Each logical track is comprised of N+M physical tracks, one physical track from each disk drive in the redundancy group. The N+M disk drives are used to store N data segments, one on each of N physical tracks per logical track, and to store M redundancy segments, one on each of M physical tracks per logical track in the redundancy group. The N+M disk drives in a redundancy group have unsynchronized spindles and loosely coupled actuators. The data is transferred to the disk drives via independent reads and writes since all disk drives operate independently. Furthermore, the M redundancy segments, for successive logical cylinders, are distributed across all the disk drives in the redundancy group rather than using dedicated redundancy disk drives. The redundancy segments are distributed so that every actuator in a redundancy group is used to access some of the data segments stored on the disk drives. If dedicated drives were provided for redundancy segments, then these disk drives would be inactive unless redundancy segments were being read from or written to these drives. However, with distributed redundancy all actuators in a redundancy group are available for data access. In addition, a pool of R globally switchable spare disk drives is maintained in the data storage subsystem to automatically substitute a replacement disk drive for a disk drive in any redundancy group that fails during operation. The pool of R spare disk drives provides high system reliability at low cost.
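
The distribution of redundancy segments across all drives can be sketched as follows. The text above does not state the placement rule, so the simple rotation by logical cylinder number used here is an assumption, as are the function names redundancy_drives and data_drives.

```python
def redundancy_drives(cylinder, n, m):
    """Return the indices of the drives that hold the m redundancy segments
    for the given logical cylinder.

    Assumption: placement rotates by one drive per logical cylinder.
    """
    total = n + m
    start = cylinder % total
    return [(start + k) % total for k in range(m)]


def data_drives(cylinder, n, m):
    """Return the indices of the n drives that hold data segments for the cylinder."""
    redundancy = set(redundancy_drives(cylinder, n, m))
    return [drive for drive in range(n + m) if drive not in redundancy]
```

With n=4 and m=1, successive logical cylinders place the redundancy segment on drives 0, 1, 2, 3, 4 and then wrap, so every actuator serves data accesses for most cylinders.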

Each physical disk drive is designed so that it can detect a failure in its operation, which allows the M redundancy segments per logical track to be used for multi-bit error correction. Identification of the failed physical disk drive provides information on the bit position of the errors in the logical track and the redundancy data provides information to correct the errors. Once a failed disk drive in a redundancy group is identified, a backup disk drive from the shared pool of spare disk drives is automatically switched in place of the failed disk drive. Control circuitry reconstructs the data stored on each physical track of the failed disk drive, using the remaining N-1 physical tracks of data plus the associated M physical tracks containing redundancy segments of each logical track. A failure in the redundancy segments does not require data reconstruction, but necessitates regeneration of the redundancy information. The reconstructed data is then written onto the substitute disk drive. The use of spare disk drives increases the system reliability of the N+M parallel disk drive architecture while the use of a shared pool of spare disk drives minimizes the cost of providing the improved reliability.
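
Reconstruction of a failed drive's track can be sketched for the simplest case. The passage above does not name the redundancy code, so this example assumes M=1 with byte-wise XOR parity; larger M would need a stronger code that is not shown. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_segments):
    """Compute the single redundancy segment for N data segments (XOR assumed)."""
    return xor_blocks(data_segments)


def reconstruct(segments, failed_index):
    """Rebuild the segment at failed_index from the surviving segments.

    segments holds the N data segments followed by the parity segment;
    the entry at failed_index is ignored and may be None.
    """
    survivors = [s for i, s in enumerate(segments) if i != failed_index]
    return xor_blocks(survivors)
```

Because the failed drive identifies itself, the position of the missing segment is known, and XOR of the survivors yields either the lost data segment or a regenerated parity segment.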

The parallel disk drive array data storage subsystem includes a data storage management system that provides improved data storage and retrieval performance by dynamically mapping between virtual and physical data storage devices. The parallel disk drive array data storage subsystem consists of three abstract layers: virtual, logical and physical. The virtual layer functions as a conventional large form factor disk drive memory. The logical layer functions as an array of storage units that are grouped into a plurality of redundancy groups, each containing N+M physical disk drives. The physical layer functions as a plurality of individual small form factor disk drives. The data storage management system operates to effectuate the dynamic mapping of data among these abstract layers and to control the allocation and management of the actual space on the physical devices. These data storage management functions are performed in a manner that renders the operation of the parallel disk drive array data storage subsystem transparent to the host processor which perceives only the virtual image of the disk drive array data storage subsystem.

The performance of this system is enhanced by the use of a cache memory with both volatile and nonvolatile portions and "backend" data staging and destaging processes. Data received from the host processors are stored in the cache memory in the form of modifications to data records already stored in the redundancy groups of the data storage subsystem. No data stored in a redundancy group is modified. A virtual track is staged from a redundancy group into cache. The host then modifies some, perhaps all, of the data records on the virtual track. Then, as determined by cache replacement algorithms such as Least Recently Used, etc, the modified virtual track is selected to be destaged to a redundancy group. When thus selected, a virtual track is divided (marked off) into several physical sectors to be stored on one or more physical tracks of one or more logical tracks. A complete physical track may contain physical sectors from one or more virtual tracks. Each physical track is combined with N-1 other physical tracks to form the N data segments of a logical track.
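
The staging and destaging flow can be sketched with a least-recently-used cache. This is a minimal illustration under stated assumptions: the class name TrackCache, the capacity parameter and the destaged list (a stand-in for writes to a redundancy group) are all hypothetical, and the division of a virtual track into physical sectors is not modeled.

```python
from collections import OrderedDict


class TrackCache:
    """Toy cache of virtual tracks with least-recently-used destaging."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tracks = OrderedDict()  # track_id -> (data, modified), oldest first
        self.destaged = []           # stand-in for writes to a redundancy group

    def stage(self, track_id, data):
        """Bring a virtual track into cache; a cached copy is only refreshed in LRU order."""
        if track_id in self.tracks:
            self.tracks.move_to_end(track_id)
            return
        self.tracks[track_id] = (data, False)
        self._evict()

    def modify(self, track_id, data):
        """Apply a host update to the cached copy only; nothing on disk changes."""
        self.tracks.pop(track_id, None)
        self.tracks[track_id] = (data, True)
        self._evict()

    def _evict(self):
        """Drop least recently used tracks, destaging those that were modified."""
        while len(self.tracks) > self.capacity:
            victim, (data, modified) = self.tracks.popitem(last=False)
            if modified:
                self.destaged.append((victim, data))
```

Unmodified tracks are simply dropped on eviction, while modified tracks are handed to the destage path, which in the subsystem described here would write them to an empty logical track.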

The original, unmodified data is simply flagged as obsolete. Obviously, as data is modified, the redundancy groups increasingly contain numerous virtual tracks of obsolete data. The remaining valid virtual tracks in a logical cylinder are read to the cache memory in a background "free space collection" process. They are then written to a previously emptied logical cylinder and the "collected" logical cylinder is tagged as being empty. Thus, all redundancy data creation, writing and free space collection occurs in background, rather than on-demand processes. This arrangement avoids the parity update problem of existing disk array systems and improves the response time versus access rate performance of the data storage subsystem by transferring these overhead tasks to background processes.
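
The free space collection step can be sketched as a function that moves the still-valid virtual tracks of one logical cylinder into a previously emptied one. The data layout (a list of dictionaries plus a set of obsolete markers) and the name collect_free_space are assumptions made for illustration; the pass through cache memory is omitted.

```python
def collect_free_space(cylinders, obsolete, source, empty):
    """Move valid virtual tracks from cylinder `source` into cylinder `empty`.

    cylinders: list of dicts {track_id: data}, one per logical cylinder.
    obsolete: set of (cylinder, track_id) pairs tagged obsolete.
    Returns a dict mapping each moved track id to its new cylinder.
    """
    if cylinders[empty]:
        raise ValueError("target cylinder must be empty")
    moved = {}
    for track_id, data in cylinders[source].items():
        if (source, track_id) in obsolete:
            continue  # obsolete data is not copied
        cylinders[empty][track_id] = data
        moved[track_id] = empty
    # Obsolete markers for the collected cylinder are no longer needed.
    obsolete.difference_update({(source, t) for t in cylinders[source]})
    cylinders[source] = {}  # the collected cylinder is now empty
    return moved
```

The returned mapping is what a caller would use to update its virtual-to-logical pointers for the moved tracks.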

Therefore, a mapping table is maintained in memory to indicate which of the logical cylinders contained in the data storage subsystem contain modified data records and which contain obsolete and unmodified data records. By maintaining the memory map, the data storage system can easily identify which logical cylinders contained in the disk drive array contain modified data records that require backup. This system then reads the mapping table to locate logical cylinders containing modified data records that have not been backed up and writes these modified logical cylinders to the backup medium. Once the logical cylinders are backed up in this fashion, the mapping table is reset to indicate that all of the data contained therein has been backed up.
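The backup pass described above can be sketched as a scan of the mapping table. The function `incremental_backup`, the `"modified"` key and the list standing in for the backup medium are all hypothetical names used for illustration only.

```python
def incremental_backup(mapping_table, read_cylinder, backup_medium):
    """Write every logical cylinder flagged as modified to the backup medium.

    `mapping_table` maps a logical cylinder id to {"modified": bool}.
    `read_cylinder` is a callable that returns the contents of a cylinder.
    The flag is cleared once the cylinder has been written out.
    """
    backed_up = []
    for cylinder_id in sorted(mapping_table):
        entry = mapping_table[cylinder_id]
        if entry["modified"]:
            backup_medium.append((cylinder_id, read_cylinder(cylinder_id)))
            entry["modified"] = False
            backed_up.append(cylinder_id)
    return backed_up


store = {0: b"aaa", 1: b"bbb", 2: b"ccc"}
table = {0: {"modified": False}, 1: {"modified": True}, 2: {"modified": True}}
tape = []
first_pass = incremental_backup(table, store.__getitem__, tape)
second_pass = incremental_backup(table, store.__getitem__, tape)
```

The second pass finds nothing to do because the first pass reset the flags, which is the property that makes the backup incremental.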

Data Storage Subsystem Architecture

FIG. 1 illustrates in block diagram form the architecture of the preferred embodiment of the parallel disk drive array data storage subsystem 100. The parallel disk drive array data storage subsystem 100 appears to the associated host processors 11-12 to be a collection of large form factor disk drives with their associated storage control, since the architecture of parallel disk drive array data storage subsystem 100 is transparent to the associated host processors 11-12. This parallel disk drive array data storage subsystem 100 includes a plurality of disk drives (e.g., 122-1 to 125-r) located in a plurality of disk drive subsets 103-1 to 103-i. The disk drives 122-1 to 125-r are significantly less expensive, even while providing disk drives to store redundancy information and providing disk drives for spare purposes, than the typical 14 inch form factor disk drive with an associated backup disk drive. The plurality of disk drives 122-1 to 125-r are typically the commodity hard disk drives in the 5 1/4 inch form factor.

The architecture illustrated in FIG. 1 is that of a plurality of host processors 11-12 interconnected via the respective plurality of data channels 21, 22-31, 32, respectively to a data storage subsystem 100 that provides the backend data storage capacity for the host processors 11-12. This basic configuration is well known in the data processing art. The data storage subsystem 100 includes a control unit 101 that serves to interconnect the subsets of disk drives 103-1 to 103-i and their associated drive managers 102-1 to 102-i with the data channels 21-22, 31-32 that interconnect data storage subsystem 100 with the plurality of host processors 11, 12.

Control unit 101 typically includes two cluster controls 111, 112 for redundancy purposes. Within a cluster control 111 the multipath storage director 110-0 provides a hardware interface to interconnect data channels 21, 31 to cluster control 111 contained in control unit 101. In this respect, the multipath storage director 110-0 provides a hardware interface to the associated data channels 21, 31 and provides a multiplex function to enable any attached data channel (e.g., 21) from any host processor (e.g., 11) to interconnect to a selected cluster control 111 within control unit 101. The cluster control 111 itself provides a pair of storage paths 200-0, 200-1 which function as an interface to a plurality of optical fiber backend channels 104. In addition, the cluster control 111 includes a data compression function as well as a data routing function that enables cluster control 111 to direct the transfer of data between a selected data channel 21 and cache memory 113, and between cache memory 113 and one of the connected optical fiber backend channels 104. Control unit 101 provides the major data storage subsystem control functions that include the creation and regulation of data redundancy groups, reconstruction of data for a failed disk drive, switching a spare disk drive in place of a failed disk drive, data redundancy generation, logical device space management, and virtual to logical device mapping. These subsystem functions are discussed in further detail below.

Disk drive manager 102-1 interconnects the plurality of commodity disk drives 122-1 to 125-r included in disk drive subset 103-1 with the plurality of optical fiber backend channels 104. Disk drive manager 102-1 includes an input/output circuit 120 that provides a hardware interface to interconnect the optical fiber backend channels 104 with the data paths 126 that serve control and drive circuits 121. Control and drive circuits 121 receive the data on conductors 126 from input/output circuit 120 and convert the form and format of these signals as required by the associated commodity disk drives in disk drive subset 103-1. In addition, control and drive circuits 121 provide a control signalling interface to transfer signals between the disk drive subset 103-1 and control unit 101. The data that is written onto the disk drives in disk drive subset 103-1 consists of data that is transmitted from an associated host processor 11 over data channel 21 to one of cluster controls 111, 112 in control unit 101. The data is written into, for example, cluster control 111 which stores the data in cache 113. Cluster control 111 stores N physical tracks of data in cache 113 and then generates M redundancy segments for error correction purposes. Cluster control 111 then selects a subset of disk drives (122-1 to 122-n+m) to form a redundancy group to store the received data. Cluster control 111 selects an empty logical track, consisting of N+M physical tracks, in the selected redundancy group. Each of the N physical tracks of the data is written onto one of N disk drives in the selected data redundancy group. An additional M disk drives are used in the redundancy group to store the M redundancy segments. The M redundancy segments include error correction characters and data that can be used to verify the integrity of the N physical tracks that are stored on the N disk drives as well as to reconstruct one or more of the N physical tracks of the data if that physical track were lost due to a failure of the disk drive on which that physical track is stored.
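The generation of redundancy segments and the reconstruction of a lost physical track can be illustrated for the simplest case. This sketch assumes M = 1 and a byte-wise XOR parity code, which is an illustrative stand-in: the text above does not state which error correcting code is used, and tolerating more than one failed drive would require a stronger code. All function names are hypothetical.

```python
from functools import reduce


def xor_segments(segments):
    """Byte-wise XOR of equal-length segments."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*segments))


def make_logical_track(data_tracks):
    """Return the N data tracks followed by one parity segment (M = 1, assumed)."""
    return list(data_tracks) + [xor_segments(data_tracks)]


def reconstruct(logical_track, failed_index):
    """Rebuild the segment at `failed_index` from the N surviving segments."""
    survivors = [seg for i, seg in enumerate(logical_track) if i != failed_index]
    return xor_segments(survivors)


tracks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # N = 3 physical tracks
logical = make_logical_track(tracks)              # N + M = 4 segments
rebuilt = reconstruct(logical, failed_index=1)    # drive holding track 1 is lost
```

Each of the four segments would be written to a different disk drive of the redundancy group, so the loss of any single drive leaves enough information to rebuild its segment.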

Thus, data storage subsystem 100 can emulate one or more large form factor disk drives (e.g., an IBM 3380K type of disk drive) using a plurality of smaller form factor disk drives while providing a high system reliability capability by writing the data across a plurality of the smaller form factor disk drives. A reliability improvement is also obtained by providing a pool of R spare disk drives (125-1 to 125-r) that are switchably interconnectable in place of a failed disk drive. Data reconstruction is accomplished by the use of the M redundancy segments, so that the data stored on the remaining functioning disk drives combined with the redundancy information stored in the redundancy segments can be used by control software in control unit 101 to reconstruct the data lost when one or more of the plurality of disk drives (122-1 to 122-n+m) in the redundancy group fails. This arrangement provides a reliability capability similar to that obtained by disk shadowing arrangements at a significantly reduced cost over such an arrangement.

Disk Drive

Each of the disk drives 122-1 to 125-r in disk drive subset 103-1 can be considered a disk subsystem that consists of a disk drive mechanism and its surrounding control and interface circuitry. The disk drive consists of a commodity disk drive which is a commercially available hard disk drive of the type that typically is used in personal computers. A control processor associated with the disk drive has control responsibility for the entire disk drive and monitors all information routed over the various serial data channels that connect each disk drive 122-1 to 125-r to control and drive circuits 121. Any data transmitted to the disk drive over these channels is stored in a corresponding interface buffer which is connected via an associated serial data channel to a corresponding serial/parallel converter circuit. A disk controller is also provided in each disk drive to implement the low level electrical interface required by the commodity disk drive. The commodity disk drive has an ESDI interface which must be interfaced with control and drive circuits 121. The disk controller provides this function. The disk controller provides serialization and deserialization of data, CRC/ECC generation, checking and correction, and NRZ data encoding. The addressing information, such as the head select and other types of control signals, is provided by control and drive circuits 121 to commodity disk drive 122-1. This communication path is also provided for diagnostic and control purposes. For example, control and drive circuits 121 can power a commodity disk drive down when the disk drive is in the standby mode. In this fashion, the commodity disk drive remains in an idle state until it is selected by control and drive circuits 121.

Control Unit

FIG. 2 illustrates in block diagram form additional details of cluster control 111. Multipath storage director 110 includes a plurality of channel interface units 201-0 to 201-7, each of which terminates a corresponding pair of data channels 21, 31. The control and data signals received by the corresponding channel interface unit 201-0 are output on either of the corresponding control and data buses 206-C, 206-D, or 207-C, 207-D, respectively, to either storage path 200-0 or storage path 200-1. Thus, as can be seen from the structure of the cluster control 111 illustrated in FIG. 2, there is a significant amount of symmetry contained therein. Storage path 200-0 is identical to storage path 200-1 and only one of these is described herein. The multipath storage director 110 uses two sets of data and control buses 206-D, C and 207-D, C to interconnect each channel interface unit 201-0 to 201-7 with both storage path 200-0 and 200-1 so that the corresponding data channel 21 from the associated host processor 11 can be switched via either storage path 200-0 or 200-1 to the plurality of optical fiber backend channels 104. Within storage path 200-0 is contained a processor 204-0 that regulates the operation of storage path 200-0. In addition, an optical device interface 205-0 is provided to convert between the optical fiber signalling format of optical fiber backend channels 104 and the metallic conductors contained within storage path 200-0. Channel interface control 202-0 operates under control of processor 204-0 to control the flow of data to and from cache memory 113 and the one of channel interface units 201 that is presently active within storage path 200-0. The channel interface control 202-0 includes a cyclic redundancy check (CRC) generator/checker to generate and check the CRC bytes for the received data. The channel interface control 202-0 also includes a buffer that compensates for speed mismatch between the data transmission rate of the data channel 21 and the available data transfer capability of the cache memory 113. The data that is received by the channel interface control 202-0 from a corresponding channel interface unit 201 is forwarded to the cache memory 113 via channel data compression circuit 203-0. The channel data compression circuit 203-0 provides the necessary hardware and microcode to perform compression of the channel data for the control unit 101 on a data write from the host processor 11. It also performs the necessary decompression operation for control unit 101 on a data read operation by the host processor 11.
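The CRC and compression steps on the channel side of the cache can be mimicked in software. This sketch uses Python's `zlib` module (CRC-32 and DEFLATE) purely as a stand-in; the text above does not identify the CRC polynomial or the compression algorithm implemented in the hardware and microcode. The function names `channel_write` and `channel_read` are hypothetical.

```python
import zlib


def channel_write(record):
    """Host write: compute a CRC over the record, then compress it for the cache."""
    crc = zlib.crc32(record)
    return zlib.compress(record), crc


def channel_read(compressed, expected_crc):
    """Host read: decompress the cached data and verify it against the stored CRC."""
    record = zlib.decompress(compressed)
    if zlib.crc32(record) != expected_crc:
        raise ValueError("CRC mismatch on data read")
    return record


original = b"host record " * 8
stored, crc = channel_write(original)
restored = channel_read(stored, crc)
```

Computing the check value before compression means that a read verifies the data exactly as the host supplied it, independent of how it was stored.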

As can be seen from the architecture illustrated in FIG. 2, all data transfers between a host processor 11 and a redundancy group in the disk drive subsets 103 are routed through cache memory 113. Control of cache memory 113 is provided in control unit 101 by processor 204-0. The functions provided by processor 204-0 include initialization of the cache directory and other cache data structures, cache directory searching and management, cache space management, cache performance improvement algorithms as well as other cache control functions. In addition, processor 204-0 creates the redundancy groups from the disk drives in disk drive subsets 103 and maintains records of the status of those devices. Processor 204-0 also causes the redundancy data across the N data disks in a redundancy group to be generated within cache memory 113 and writes the M segments of redundancy data onto the M redundancy disks in the redundancy group. The functional software in processor 204-0 also manages the mappings from virtual