| United States Patent | 5210866 |
| Link to this page | http://www.wikipatents.com/5210866.html |
| Inventor(s) | Milligan; Charles A. (Golden, CO);
Rudeseal; George A. (Boulder, CO);
Belsan; Jay S. (Nederland, CO) |
| Abstract | The parallel disk drive array data storage subsystem dynamically maps
between virtual and physical data storage devices and schedules the
writing of data to these devices. The data storage subsystem functions as
a conventional large form factor disk drive memory, using an array of
redundancy groups, each containing N+M disk drives. The data storage
subsystem does not modify data stored in a redundancy group but simply
writes the modified data as a new record in available memory space on
another redundancy group. The original data is flagged as obsolete. A
mapping table is maintained to identify portions of these redundancy
groups which contain newly written or modified virtual track instances.
These marked virtual track instances are written to backup medium as a
background process and the mapping table is updated to clear the flags
that identify these virtual track instances as having been modified. |
Title Information  |
Incremental disk backup system for a dynamically mapped data storage
subsystem |
| Publication Date | May 11, 1993 |
| Filing Date | September 12, 1990 |
Claims  |
I claim:
1. Apparatus for backing up data records in a dynamically mapped data
storage subsystem that stores data records for a host processor, said
apparatus comprising:
a plurality of data storage devices, a subset of said plurality of data
storage devices being configured into a plurality of redundancy groups,
each redundancy group consisting of at least two data storage devices;
means for writing a stream of data records received from said host
processor and redundancy data associated with said received stream of data
records in a first memory location in a selected one of said redundancy
groups;
means for storing a data record pointer indicative of said first memory
location used to store said received stream of data records;
cache memory means;
means, responsive to receipt from said host processor of a request for
access to one of said data records stored in said first memory location,
for writing said requested data record into said cache memory means;
means, responsive to receipt of modifications to said requested data record
received from said host processor, for writing said modified data record
in a second memory location in a selected one of said redundancy groups;
means, responsive to said writing means, for indicating in said data record
pointer said one data record stored in said first memory location as
obsolete;
means for storing a modified data record pointer indicative of said second
memory location of said written modified data record;
means, responsive to creation of a modified data record pointer, for
storing a copy of a data record whose identity is defined by said modified
data record pointer, comprising:
means for generating a duplicate data record pointer that identifies said
second memory location of said modified data record; and
means for storing said duplicate data record pointer as the identity of a
backup copy of said modified data record.
2. The apparatus of claim 1 wherein said data record copy storing means
further comprises:
backup memory means;
means for writing said modified data record, as identified by said
duplicate data record pointer, to said backup memory means.
3. The apparatus of claim 2 wherein said backup memory means comprises at
least one redundancy group in said dynamically mapped data storage
subsystem.
4. The apparatus of claim 2 further comprising:
means, responsive to said host processor requesting backup of an identified
modified data record, for activating said rewriting means to write said
identified modified data record into said backup memory means.
5. The apparatus of claim 2 further comprising:
means, responsive to said host processor requesting backup of said modified
data records, for activating said rewriting means to write said modified
data records into said backup memory means.
6. The apparatus of claim 1 wherein said data record copy storing means
includes:
means for writing said modified data record to a second memory location in
one of said redundancy groups;
means for generating a data record pointer that identifies said second
memory location in said one of said redundancy groups; and
means for storing said generated data record pointer as the identity of a
backup copy of said modified data record.
7. The apparatus of claim 6 wherein said data record copy storing means
further comprises:
backup memory means connected to said dynamically mapped data storage
subsystem;
means for writing said modified data record from said second location to
said backup memory means.
8. The apparatus of claim 7 further comprising:
means responsive to said writing means writing one of said modified data
records into said backup memory means for expunging the identity of said
rewritten modified data record from said storing means.
9. Apparatus for backing up files in a dynamically mapped memory system
that stores data records for a host processor, comprising:
a plurality of data storage devices, a subset of said plurality of data
storage devices being configured into a plurality of redundancy groups,
each redundancy group consisting of at least two data storage devices;
means for writing a stream of data records received from said host
processor and redundancy data associated with said received stream of data
records in a first memory location in a selected one of said redundancy
groups;
means for storing a data record pointer indicative of said first memory
location used to store said received stream of data records;
cache memory means connected to and interconnecting said host processor and
said data storage devices for storing data records transmitted
therebetween;
backup memory means connected to said cache memory means for storing
modified data records which are stored in said dynamically mapped memory
system by said host processor;
means, responsive to receipt from said host processor of a request for
access to one of said data records stored in said first memory location,
for writing said requested data record into said cache memory means from
said first memory location;
means, responsive to modifications to said requested data record received
from said host processor for writing said modified data record from said
cache memory means into a second memory location in a selected one of said
redundancy groups;
means, responsive to said writing means, for indicating said one data
record stored in said first location in one of said redundancy groups as
obsolete;
means for storing a modified data record pointer indicative of said second
memory location of said written modified data record;
means, responsive to creation of a modified data record pointer, for
storing a copy of a data record whose identity is defined by said modified
data record pointer into said backup memory means.
10. The apparatus of claim 9 wherein said backup memory means comprises a
selected one of said redundancy groups, said data record copy storing
means includes:
means for generating a data record pointer that identifies said second
location of said modified data record in said redundancy groups; and
means for storing said generated data record pointer as the identity of a
backup copy of said modified data record.
11. The apparatus of claim 9 further including:
means, responsive to said rewriting means writing one of said modified data
records into said backup memory means, for expunging the identity of said
rewritten modified data record from said storing means.
12. The apparatus of claim 9 wherein said backup memory means comprises at
least one redundancy group in said dynamically mapped data storage
subsystem.
13. The apparatus of claim 9 further comprising:
means, responsive to said host processor requesting backup of an identified
modified data record, for activating said rewriting means to write said
identified modified data record into said backup memory means.
14. The apparatus of claim 9 further comprising:
means, responsive to said host processor requesting backup of said modified
data records, for activating said rewriting means to write said modified
data records into said backup memory means.
15. The apparatus of claim 9, wherein said backup memory means comprises at
least one tape drive, further comprising:
means connected to and interconnecting said cache memory means and said
tape drive for transferring data therebetween;
wherein said writing means includes:
means for staging said modified data record from said second location in a
selected one of said redundancy groups to said cache memory means,
means for transmitting said staged modified data record from said cache
memory means to said transferring means.
16. The apparatus of claim 15 wherein said writing means further includes:
means, responsive to the receipt of a data record backup command from said
host processor, for transmitting data to said host processor identifying
all modified data records whose identity is stored in said storing means.
17. The apparatus of claim 15 wherein said writing means further includes:
means, responsive to said transmitting means, for expunging said identity
of said staged modified data record from said storing means.
18. The apparatus of claim 9 wherein said backup memory means comprises a
tape drive connected to said data storage system, said writing means
includes:
means for generating a data record pointer that identifies said second
location of said modified data record in said redundancy groups; and
means for storing said data record pointer as the identity of a backup copy
of said modified data record.
19. The apparatus of claim 18 wherein said rewriting means further
includes:
means for writing said modified data record, as identified by said data
record pointer, to said backup memory means.
20. Apparatus for backing up files in a dynamically mapped data storage
subsystem that stores data records for at least one associated data
processor, said dynamically mapped data storage subsystem including a
plurality of data storage devices, a subset of said plurality of said data
storage devices configured into at least one redundancy group, each
redundancy group consisting of n+m data storage devices, where n and m are
both positive integers with n being greater than 1 and m being equal to or
greater than 1, and said data storage devices each including a like
plurality of physical tracks to form sets of physical tracks called
logical tracks, each logical track having one physical track at the same
relative address on each of said n+m data storage devices, for storing
data records thereon, said dynamically mapped data storage subsystem
generates m redundancy segments using n received streams of data records,
selects a first one of said logical tracks in one of said redundancy
groups, having at least one set of available physical tracks addressable
at the same relative address for each of said n+m data storage devices and
writes said n received streams of data records and said m redundancy
segments on said n+m data storage devices in said selected set of physical
tracks, each stream of data records and redundancy segments at said
selected available physical track on a respective one of said n+m data
storage devices, comprising:
cache memory means connected to and interconnecting said host processors
and said data storage devices for storing data records transmitted
therebetween;
backup memory means connected to said cache memory means for storing
modified data records which are stored in said dynamically mapped data
storage subsystem by said associated host processors;
means, responsive to the receipt from one of said host processors of a
request for access to one of said data records stored in said first
logical track of one of said redundancy groups, for writing said requested
data record into said cache memory means from said first logical track at
one physical track of one of said n+m data storage devices of one of said
redundancy groups;
means, responsive to modifications to said requested data record, from said
requesting host processor, for writing said modified data record from said
cache memory means into a second available logical track in a selected one
of said redundancy groups;
means responsive to said writing means for indicating said one data record
stored in said first logical track as obsolete;
means for storing data indicative of the location of said written modified
data record in said second logical track; and
means for rewriting at least one modified data record whose identity is
stored in said storing means into said backup memory means.
21. The apparatus of claim 20 wherein backup memory means comprises a
selected one of said redundancy groups, said rewriting means includes:
means for generating a data record pointer that identifies the physical
location of said modified data records in said redundancy groups; and
means for storing said data record pointer as the identity of a backup copy
of said modified data record.
22. The apparatus of claim 21 wherein said rewriting means further
includes:
means for writing said modified data record, as identified by said data
record pointer, to said backup memory means.
23. The apparatus of claim 20 further including:
means responsive to said rewriting means writing one of said modified data
records into said backup memory means for expunging the identity of said
rewritten modified data record from said storing means.
24. The apparatus of claim 20 wherein said backup memory means comprises at
least one redundancy group in said dynamically mapped data storage
subsystem.
25. The apparatus of claim 20 further including:
means, responsive to one of said associated host processors requesting
backup of an identified modified data record, for activating said
rewriting means to write said identified modified data record into said
backup memory means.
26. The apparatus of claim 20 further including:
means, responsive to one of said associated host processors requesting
backup of said modified data records, for activating said rewriting means
to write said modified data records into said backup memory means.
27. The apparatus of claim 20, wherein said backup memory means comprises
at least one tape drive, further comprising:
means connected to and interconnecting said cache memory means and said
tape drive for transferring data therebetween;
wherein said writing means includes:
means for staging said modified data record from said second logical track
to said cache memory means,
means for transmitting said staged modified data record from said cache
memory means to said transferring means.
28. The apparatus of claim 27 wherein said writing means further includes:
means, responsive to the receipt of a data record backup command from said
host processor, for transmitting data to said host processor identifying
all modified data records whose identity is stored in said storing means.
29. The apparatus of claim 27 wherein said writing means further includes:
means, responsive to said transmitting means, for expunging said identity
of said staged modified data record from said storing means.
30. The apparatus of claim 20 wherein said backup memory means comprises a
tape drive connected to said data storage system, said writing means
includes:
means for generating a data record pointer that identifies said second
location of said modified data record in said redundancy groups; and
means for storing said data record pointer as the identity of a backup copy
of said modified data record.
31. The apparatus of claim 30 wherein said rewriting means further
includes:
means for writing said modified data record, as identified by said data
record pointer, to said backup memory means.
Description  |
CROSS-REFERENCE TO RELATED APPLICATIONS
This patent application is related to application Ser. No. 07/443,933
entitled Data Record Copy Apparatus for a Virtual Memory System, filed
Nov. 30, 1989, application Ser. No. 07/443,895 entitled Data Record Move
Apparatus for a Virtual Memory System, filed Nov. 30, 1989 and application
Ser. No. 07/509,484 entitled Logical Track Write Scheduling System for a
Parallel Disk Drive Array Data Storage Subsystem, filed Apr. 16, 1990.
FIELD OF THE INVENTION
This invention relates to cached peripheral data storage subsystems with a
dynamically mapped architecture and, in particular, to a method for
performing incremental disk backups in this data storage subsystem.
PROBLEM
It is a problem in the field of data storage subsystems to efficiently
perform data backups. In data storage subsystems, a standard practice to
reliably store data therein is to produce a backup copy of the data that
is stored in the data storage subsystem and retain it on another
independently operating data storage subsystem or another location within
the data storage subsystem. The maintenance of dual copies of the data
ensures that if one copy is inadvertently destroyed due to a failure of the
data storage subsystem or an error on the part of the system operators,
another copy of that data is available to the host processors. The backup
of a completely redundant copy of the data stored on the data storage
subsystem is an expensive proposition since this effectively doubles the
cost of storing data. One method of avoiding this cost is to back up only
selected volumes of the most critical data for the backup operation.
Another alternative is to store only data that has been modified since the
last backup operation, thereby retaining, on an incremental basis, an
exact copy of what is stored in the data storage subsystem. Both of these
alternative solutions provide a much more cost effective way of providing
reliable access to a reserve or backup copy of the data that is stored in
the data storage subsystem.
Standard data backup software that accomplishes the above-stated functions
in the above-stated manner is efficient because it uses multi-track
operations, so that seeks and rotations on the disks are kept
to a minimum. In a dynamically mapped subsystem, the data is spread
randomly among the various disks and the standard data backup programs are
therefore less efficient. There are presently no known efficient data
backup systems for dynamically mapped data storage subsystems.
SOLUTION
The above described problems are solved and a technical advance achieved in
the field by the incremental disk backup system for a dynamically mapped
data storage subsystem. The dynamically mapped data storage subsystem
consists of a parallel disk drive array data storage subsystem. The
parallel disk drive array switchably interconnects a plurality of disk
drives into redundancy groups that each contain n+m data and redundancy
disk drives. Data records received from the associated host processors are
written on logical tracks in a redundancy group that contains an empty
logical cylinder. When an associated host processor modifies data records
stored in a redundancy group, the data storage subsystem writes the
modified data records into empty logical cylinders instead of modifying
the data records at their present storage location. The modified data
records are collected in a cache memory until a sufficient number of
virtual tracks have been modified to write out an entire logical track,
whereupon the original data records are tagged as "obsolete". All logical
tracks of a single logical cylinder are thus written before any data is
scheduled to be written to a different logical cylinder. Therefore, a
mapping table is easily maintained in memory to indicate which of the
logical cylinders contained in the data storage subsystem contain modified
data records and which contain unmodified and obsolete data records. By
maintaining the memory map, the data storage subsystem can easily identify
which logical cylinders contained in the disk drive array contain modified
data records that require backup. This system then reads the mapping table
to locate logical cylinders containing modified data records that have not
been backed up and writes only these modified logical cylinders to the
backup medium. The backup medium can be a tape drive, optical disk with
removable platters or any other such data storage device. Once the logical
cylinders are backed up in this fashion, the mapping table is reset to
indicate that all of the data records contained therein have been backed
up.
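The backup cycle described above can be sketched in a few lines of Python. This is only an illustrative model, not the patented implementation: the names `MappingTable`, `mark_modified`, `incremental_backup` and `read_cylinder` are hypothetical, the mapping table is reduced to a set of flagged logical cylinder numbers, and the backup medium is modeled as a plain list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class MappingTable:
    """Hypothetical mapping table: the set of logical cylinders flagged as modified."""
    modified: Set[int] = field(default_factory=set)

    def mark_modified(self, cylinder: int) -> None:
        # Called when a logical cylinder receives newly written or modified data.
        self.modified.add(cylinder)


def incremental_backup(
    table: MappingTable,
    read_cylinder: Callable[[int], str],
    backup_medium: List[Tuple[int, str]],
) -> List[int]:
    """Copy only the flagged cylinders to the backup medium, then clear their flags."""
    backed_up = []
    for cylinder in sorted(table.modified):
        backup_medium.append((cylinder, read_cylinder(cylinder)))
        backed_up.append(cylinder)
    # Flags are cleared only after the copies have been written.
    table.modified.difference_update(backed_up)
    return backed_up
```

Running the function a second time without intervening writes copies nothing, which is the property that makes the backup incremental.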
BRIEF DESCRIPTION OF THE DRAWING
FIG. 1 illustrates in block diagram form the architecture of the parallel
disk drive array data storage subsystem;
FIG. 2 illustrates the cluster control of the data storage subsystem;
FIG. 3 illustrates the disk drive manager;
FIG. 4 illustrates the disk drive manager control circuit;
FIG. 5 illustrates the disk drive manager disk control electronics;
FIGS. 6 and 7 illustrate, in flow diagram form, the operational steps taken
to perform a data read and write operation, respectively;
FIG. 8 illustrates a typical free space directory used in the data storage
subsystem;
FIG. 9 illustrates the format of the virtual track directory;
FIGS. 10 and 11 illustrate, in flow diagram form, the free space
collection process;
FIGS. 12-15 illustrate in flow diagram form the incremental disk backup
process executed by the data storage subsystem;
FIG. 16 illustrates additional details of the tape drive control unit
interface;
FIGS. 17 and 18 illustrate in flow diagram form the immediate disk backup
process executed by the data storage subsystem; and
FIG. 14 illustrates a typical free space directory entry.
DETAILED DESCRIPTION OF THE DRAWING
The data storage subsystem of the present invention uses a plurality of
small form factor disk drives in place of a single large form factor disk
drive to implement an inexpensive, high performance, high reliability disk
drive memory that emulates the format and capability of large form factor
disk drives. This system avoids the parity update problem of the prior art
by never updating the parity. Instead, all new or modified data is written
on empty logical tracks and the old data is tagged as obsolete. The
resultant "holes" in the logical tracks caused by old data are removed by
a background free-space collection process that creates empty logical
tracks by collecting valid data into previously emptied logical tracks.
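A minimal sketch of this write-out-of-place policy follows, under simplifying assumptions that are not in the source: one record per logical track, no redundancy segments, and a hypothetical class name `OutOfPlaceStore`. It shows only that an update lands on an empty track and that the previous copy is flagged obsolete rather than overwritten.

```python
from typing import Dict, List, Optional


class OutOfPlaceStore:
    """Toy model (hypothetical): records are never updated in place."""

    def __init__(self, num_tracks: int) -> None:
        self.tracks: List[Optional[bytes]] = [None] * num_tracks  # track contents
        self.obsolete: List[bool] = [False] * num_tracks          # stale-data flags
        self.location: Dict[str, int] = {}                        # record id -> track

    def write(self, record_id: str, data: bytes) -> int:
        # Pick the first empty logical track; live data is never overwritten.
        if None not in self.tracks:
            raise RuntimeError("no empty logical track; free space collection needed")
        track = self.tracks.index(None)
        self.tracks[track] = data
        old = self.location.get(record_id)
        if old is not None:
            self.obsolete[old] = True  # flag the previous copy as obsolete
        self.location[record_id] = track
        return track

    def read(self, record_id: str) -> bytes:
        return self.tracks[self.location[record_id]]
```

The obsolete copies left behind are the "holes" that the background free-space collection process later reclaims.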
The plurality of disk drives in the parallel disk drive array data storage
subsystem are configured into a plurality of variable size redundancy
groups of N+M parallel connected disk drives to store data thereon. Each
redundancy group, also called a logical disk drive, is divided into a
number of logical cylinders, each containing i logical tracks, one logical
track for each of the i physical tracks contained in a cylinder of one
physical disk drive. Each logical track is comprised of N+M physical
tracks, one physical track from each disk drive in the redundancy group.
The N+M disk drives are used to store N data segments, one on each of N
physical tracks per logical track, and to store M redundancy segments, one
on each of M physical tracks per logical track in the redundancy group.
The N+M disk drives in a redundancy group have unsynchronized spindles and
loosely coupled actuators. The data is transferred to the disk drives via
independent reads and writes since all disk drives operate independently.
Furthermore, the M redundancy segments, for successive logical cylinders,
are distributed across all the disk drives in the redundancy group rather
than using dedicated redundancy disk drives. The redundancy segments are
distributed so that every actuator in a redundancy group is used to access
some of the data segments stored on the disk drives. If dedicated drives
were provided for redundancy segments, then these disk drives would be
inactive unless redundancy segments were being read from or written to
these drives. However, with distributed redundancy all actuators in a
redundancy group are available for data access. In addition, a pool of R
globally switchable spare disk drives is maintained in the data storage
subsystem to automatically substitute a replacement disk drive for a disk
drive in any redundancy group that fails during operation. The pool of R
spare disk drives provides high system reliability at low cost.
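One way to picture distributed redundancy is a rotation of the redundancy positions from one logical cylinder to the next. The text does not specify the placement rule, so the rotate-by-one scheme and the function name `redundancy_drives` below are assumptions made purely for illustration.

```python
from typing import List


def redundancy_drives(cylinder: int, n: int, m: int) -> List[int]:
    """Return the indices of the m drives holding redundancy for a logical cylinder.

    Assumed scheme (not from the source): the redundancy positions rotate by
    one drive for each successive logical cylinder of an n+m redundancy group.
    """
    total = n + m
    return [(cylinder + k) % total for k in range(m)]
```

Under this assumed rotation, every drive in the group holds data segments for at least one logical cylinder, so no actuator is reserved for redundancy alone.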
Each physical disk drive is designed so that it can detect a failure in its
operation, which allows the M redundancy segments per logical track to be
used for multi-bit error correction. Identification of the failed physical
disk drive provides information on the bit position of the errors in the
logical track and the redundancy data provides information to correct the
errors. Once a failed disk drive in a redundancy group is identified, a
backup disk drive from the shared pool of spare disk drives is
automatically switched in place of the failed disk drive. Control
circuitry reconstructs the data stored on each physical track of the
failed disk drive, using the remaining N-1 physical tracks of data plus
the associated M physical tracks containing redundancy segments of each
logical track. A failure in the redundancy segments does not require data
reconstruction, but necessitates regeneration of the redundancy
information. The reconstructed data is then written onto the substitute
disk drive. The use of spare disk drives increases the system reliability
of the N+M parallel disk drive architecture while the use of a shared pool
of spare disk drives minimizes the cost of providing the improved
reliability.
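The reconstruction step can be illustrated for the simplest case of a single redundancy segment (M = 1) computed as a bytewise XOR parity. This is an assumption for the sake of a short example: the text describes M redundancy segments in general and does not state which code is used. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Sequence


def xor_segments(segments: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*segments))


def make_parity(data_segments: Sequence[bytes]) -> bytes:
    """Compute the single redundancy segment for N data segments (assumes M = 1)."""
    return xor_segments(data_segments)


def reconstruct(stripe: List[bytes], failed_index: int) -> bytes:
    """Rebuild the segment at failed_index from the surviving segments.

    `stripe` holds the N data segments followed by the parity segment. The
    entry at failed_index is ignored, because the failed drive has already
    identified itself and its contents are not trusted.
    """
    survivors = [seg for i, seg in enumerate(stripe) if i != failed_index]
    return xor_segments(survivors)
```

Because the failed drive is known in advance, the parity only has to supply the missing values, not locate the error. If the failed position is the parity segment itself, the same XOR simply regenerates the redundancy information.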
The parallel disk drive array data storage subsystem includes a data
storage management system that provides improved data storage and
retrieval performance by dynamically mapping between virtual and physical
data storage devices. The parallel disk drive array data storage
subsystem consists of three abstract layers: virtual, logical and
physical. The virtual layer functions as a conventional large form factor
disk drive memory. The logical layer functions as an array of storage
units that are grouped into a plurality of redundancy groups, each
containing N+M physical disk drives. The physical layer functions as a
plurality of individual small form factor disk drives. The data storage
management system operates to effectuate the dynamic mapping of data among
these abstract layers and to control the allocation and management of the
actual space on the physical devices. These data storage management
functions are performed in a manner that renders the operation of the
parallel disk drive array data storage subsystem transparent to the host
processor which perceives only the virtual image of the disk drive array
data storage subsystem.
The performance of this system is enhanced by the use of a cache memory
with both volatile and nonvolatile portions and "backend" data staging and
destaging processes. Data received from the host processors are stored in
the cache memory in the form of modifications to data records already
stored in the redundancy groups of the data storage subsystem. No data
stored in a redundancy group is modified. A virtual track is staged from a
redundancy group into cache. The host then modifies some, perhaps all, of
the data records on the virtual track. Then, as determined by cache
replacement algorithms such as Least Recently Used, the modified
virtual track is selected to be destaged to a redundancy group. When thus
selected, a virtual track is divided (marked off) into several physical
sectors to be stored on one or more physical tracks of one or more logical
tracks. A complete physical track may contain physical sectors from one or
more virtual tracks. Each physical track is combined with N-1 other
physical tracks to form the N data segments of a logical track.
The original, unmodified data is simply flagged as obsolete. Obviously, as
data is modified, the redundancy groups increasingly contain numerous
virtual tracks of obsolete data. The remaining valid virtual tracks in a
logical cylinder are read to the cache memory in a background "free space
collection" process. They are then written to a previously emptied logical
cylinder and the "collected" logical cylinder is tagged as being empty.
Thus, all redundancy data creation, writing and free space collection
occur in background, rather than on-demand, processes. This arrangement
avoids the parity update problem of existing disk array systems and
improves the response time versus access rate performance of the data
storage subsystem by transferring these overhead tasks to background
processes.
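The free space collection step can be sketched as follows. The data layout (a dictionary mapping each logical cylinder to a list of `(virtual_track, obsolete)` pairs) and the victim-selection policy in `pick_victim` are assumptions introduced for this example; the text only states that valid virtual tracks are moved to a previously emptied logical cylinder and that the collected cylinder is tagged as empty.

```python
from typing import Dict, List, Optional, Tuple

Cylinder = List[Tuple[str, bool]]  # (virtual track id, obsolete flag)


def pick_victim(cylinders: Dict[int, Cylinder]) -> Optional[int]:
    """Assumed policy: collect the non-empty cylinder with the most obsolete tracks."""
    counts = {
        cyl: sum(1 for _, obsolete in tracks if obsolete)
        for cyl, tracks in cylinders.items()
        if tracks
    }
    return max(counts, key=counts.get) if counts else None


def collect_cylinder(cylinders: Dict[int, Cylinder], source: int, target: int) -> List[str]:
    """Move the still-valid virtual tracks of `source` into `target`, then empty `source`."""
    valid = [(vt, False) for vt, obsolete in cylinders[source] if not obsolete]
    cylinders[target].extend(valid)  # write into the previously emptied cylinder
    cylinders[source] = []           # tag the collected cylinder as empty
    return [vt for vt, _ in valid]
```

In a real subsystem the valid tracks would pass through the cache memory on their way to the target cylinder; the sketch collapses that into a direct copy.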
Therefore, a mapping table is maintained in memory to indicate which of the
logical cylinders contained in the data storage subsystem contain modified
data records and which contain obsolete and unmodified data records. By
maintaining the memory map, the data storage system can easily identify
which logical cylinders contained in the disk drive array contain modified
data records that require backup. This system then reads the mapping table
to locate logical cylinders containing modified data records that have not
been backed up and writes these modified logical cylinders to the backup
medium. Once the logical cylinders are backed up in this fashion, the
mapping table is reset to indicate that all of the data contained therein
has been backed up.
Data Storage Subsystem Architecture
FIG. 1 illustrates in block diagram form the architecture of the preferred
embodiment of the parallel disk drive array data storage subsystem 100.
The parallel disk drive array data storage subsystem 100 appears to the
associated host processors 11-12 to be a collection of large form factor
disk drives with their associated storage control, since the architecture
of parallel disk drive array data storage subsystem 100 is transparent to
the associated host processors 11-12. This parallel disk drive array data
storage subsystem 100 includes a plurality of disk drives (e.g., 122-1 to
125-r) located in a plurality of disk drive subsets 103-1 to 103-i. The
disk drives 122-1 to 125-r are significantly less expensive, even while
providing disk drives to store redundancy information and providing disk
drives for spare purposes, than the typical 14 inch form factor disk drive
with an associated backup disk drive. The plurality of disk drives 122-1
to 125-r are typically the commodity hard disk drives in the 5 1/4 inch
form factor.
The architecture illustrated in FIG. 1 is that of a plurality of host
processors 11-12 interconnected via the respective plurality of data
channels 21, 22-31, 32, respectively to a data storage subsystem 100 that
provides the backend data storage capacity for the host processors 11-12.
This basic configuration is well known in the data processing art. The
data storage subsystem 100 includes a control unit 101 that serves to
interconnect the subsets of disk drives 103-1 to 103-i and their
associated drive managers 102-1 to 102-i with the data channels 21-22,
31-32 that interconnect data storage subsystem 100 with the plurality of
host processors 11, 12.
Control unit 101 typically includes two cluster controls 111, 112 for
redundancy purposes. Within a cluster control 111 the multipath storage
director 110-0 provides a hardware interface to interconnect data channels
21, 31 to cluster control 111 contained in control unit 101. In this
respect, the multipath storage director 110-0 provides a hardware
interface to the associated data channels 21, 31 and provides a multiplex
function to enable any attached data channel (e.g., 21) from any host processor
(e.g., 11) to interconnect to a selected cluster control 111 within control
unit 101. The cluster control 111 itself provides a pair of storage paths
200-0, 200-1 which function as an interface to a plurality of optical
fiber backend channels 104. In addition, the cluster control 111 includes
a data compression function as well as a data routing function that
enables cluster control 111 to direct the transfer of data between a
selected data channel 21 and cache memory 113, and between cache memory
113 and one of the connected optical fiber backend channels 104. Control
unit 101 provides the major data storage subsystem control functions that
include the creation and regulation of data redundancy groups,
reconstruction of data for a failed disk drive, switching a spare disk
drive in place of a failed disk drive, data redundancy generation, logical
device space management, and virtual to logical device mapping. These
subsystem functions are discussed in further detail below.
Disk drive manager 102-1 interconnects the plurality of commodity disk
drives 122-1 to 125-r included in disk drive subset 103-1 with the
plurality of optical fiber backend channels 104. Disk drive manager 102-1
includes an input/output circuit 120 that provides a hardware interface to
interconnect the optical fiber backend channels 104 with the data paths
126 that serve control and drive circuits 121. Control and drive circuits
121 receive the data on conductors 126 from input/output circuit 120 and
convert the form and format of these signals as required by the associated
commodity disk drives in disk drive subset 103-1. In addition, control and
drive circuits 121 provide a control signalling interface to transfer
signals between the disk drive subset 103-1 and control unit 101. The data
that is written onto the disk drives in disk drive subset 103-1 consists
of data that is transmitted from an associated host processor 11 over data
channel 21 to one of cluster controls 111, 112 in control unit 101. The
data is written into, for example, cluster control 111 which stores the
data in cache 113. Cluster control 111 stores N physical tracks of data in
cache 113 and then generates M redundancy segments for error correction
purposes. Cluster control 111 then selects a subset of disk drives (122-1
to 122-n+m) to form a redundancy group to store the received data. Cluster
control 111 selects an empty logical track, consisting of N+M physical
tracks, in the selected redundancy group. Each of the N physical tracks of
the data are written onto one of N disk drives in the selected data
redundancy group. An additional M disk drives are used in the redundancy
group to store the M redundancy segments. The M redundancy segments
include error correction characters and data that can be used to verify
the integrity of the N physical tracks that are stored on the N disk
drives as well as to reconstruct one or more of the N physical tracks of
the data if that physical track were lost due to a failure of the disk
drive on which that physical track is stored.
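The N+M write and recovery sequence can be illustrated with a small sketch. The text above does not specify the redundancy code, so this example assumes the simplest case, M = 1 with byte-wise XOR parity, purely as a stand-in; the function names xor_tracks, build_logical_track, and reconstruct are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_tracks(tracks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length physical tracks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*tracks))


def build_logical_track(data_tracks: List[bytes]) -> List[bytes]:
    """Return the N data tracks followed by M = 1 parity track (assumed code)."""
    if len({len(t) for t in data_tracks}) != 1:
        raise ValueError("all physical tracks must have the same length")
    return data_tracks + [xor_tracks(data_tracks)]


def reconstruct(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing track (marked None) from the survivors."""
    missing = [i for i, t in enumerate(stripe) if t is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost track")
    rebuilt = list(stripe)
    if missing:
        survivors = [t for t in stripe if t is not None]
        rebuilt[missing[0]] = xor_tracks(survivors)
    return rebuilt


data = [b"AAAA", b"BBBB", b"CCCC"]   # N = 3 physical tracks held in cache
stripe = build_logical_track(data)   # N + M = 4 tracks, one per disk drive
damaged = list(stripe)
damaged[1] = None                    # the drive holding track 1 has failed
recovered = reconstruct(damaged)
```

With M greater than 1, a stronger code than XOR parity would be needed to survive multiple simultaneous drive failures; the sketch does not attempt that.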
Thus, data storage subsystem 100 can emulate one or more large form factor
disk drives (e.g., an IBM 3380K type of disk drive) using a plurality of
smaller form factor disk drives while providing a high system reliability
capability by writing the data across a plurality of the smaller form
factor disk drives. A reliability improvement is also obtained by
providing a pool of R spare disk drives (125-1 to 125-r) that are
switchably interconnectable in place of a failed disk drive. Data
reconstruction is accomplished by the use of the M redundancy segments, so
that the data stored on the remaining functioning disk drives combined
with the redundancy information stored in the redundancy segments can be
used by control software in control unit 101 to reconstruct the data lost
when one or more of the plurality of disk drives (122-1 to 122-n+m) in the
redundancy group fails. This arrangement provides a reliability
capability similar to that obtained by disk shadowing arrangements at a
significantly reduced cost over such an arrangement.
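The spare substitution step might be modelled as follows. This is a sketch under stated assumptions: the function replace_failed_drive, the SparePoolExhausted exception, and the representation of drives as reference-numeral strings are invented for illustration, and the rebuild of lost tracks onto the spare is left to the caller.

```python
class SparePoolExhausted(Exception):
    """Raised when no spare disk drive remains in the pool (hypothetical)."""


def replace_failed_drive(group: list, spares: list, failed: str) -> str:
    """Switch a spare into the redundancy group in place of the failed drive."""
    if failed not in group:
        raise ValueError(f"{failed} is not a member of this redundancy group")
    if not spares:
        raise SparePoolExhausted("no spare disk drives left in the pool")
    spare = spares.pop(0)
    group[group.index(failed)] = spare
    # The caller would now reconstruct the lost tracks onto this drive
    # using the surviving data and the redundancy segments.
    return spare


group = ["122-1", "122-2", "122-3", "122-4"]   # N + M drives in one group
spares = ["125-1", "125-2"]                    # pool of R spare drives
replacement = replace_failed_drive(group, spares, "122-2")
```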
Disk Drive
Each of the disk drives 122-1 to 125-r in disk drive subset 103-1 can be
considered a disk subsystem that consists of a disk drive mechanism and
its surrounding control and interface circuitry. The disk drive consists
of a commodity disk drive which is a commercially available hard disk
drive of the type that typically is used in personal computers. A control
processor associated with the disk drive has control responsibility for
the entire disk drive and monitors all information routed over the various
serial data channels that connect each disk drive 122-1 to 125-r to
control and drive circuits 121. Any data transmitted to the disk drive
over these channels is stored in a corresponding interface buffer which is
connected via an associated serial data channel to a corresponding
serial/parallel converter circuit. A disk controller is also provided in
each disk drive to implement the low level electrical interface required
by the commodity disk drive. The commodity disk drive has an ESDI
interface which must be interfaced with control and drive circuits 121.
The disk controller provides this function. The disk controller also provides
serialization and deserialization of data, CRC/ECC generation, checking,
and correction, and NRZ data encoding. Addressing information, such as
the head select and other types of control signals, is provided by control
and drive circuits 121 to commodity disk drive 122-1. This communication
path is also provided for diagnostic and control purposes. For example,
control and drive circuits 121 can power a commodity disk drive down when
the disk drive is in the standby mode. In this fashion, the commodity disk
drive remains in an idle state until it is selected by control and drive
circuits 121.
Control Unit
FIG. 2 illustrates in block diagram form additional details of cluster
control 111. Multipath storage director 110 includes a plurality of
channel interface units 201-0 to 201-7, each of which terminates a
corresponding pair of data channels 21, 31. The control and data signals
received by the corresponding channel interface unit 201-0 are output on
either of the corresponding control and data buses 206-C, 206-D, or 207-C,
207-D, respectively, to either storage path 200-0 or storage path 200-1.
Thus, as can be seen from the structure of the cluster control 111
illustrated in FIG. 2, there is a significant amount of symmetry contained
therein. Storage path 200-0 is identical to storage path 200-1 and only
one of these is described herein. The multipath storage director 110 uses
two sets of data and control busses 206-D, C and 207-D, C to interconnect
each channel interface unit 201-0 to 201-7 with both storage path 200-0
and 200-1 so that the corresponding data channel 21 from the associated
host processor 11 can be switched via either storage path 200-0 or 200-1
to the plurality of optical fiber backend channels 104. Within storage
path 200-0 is contained a processor 204-0 that regulates the operation of
storage path 200-0. In addition, an optical device interface 205-0 is
provided to convert between the optical fiber signalling format of optical
fiber backend channels 104 and the metallic conductors contained within
storage path 200-0. Channel interface control 202-0 operates under control
of processor 204-0 to control the flow of data to and from cache memory
113 and the one of channel interface units 201 that is presently active
within storage path 200-0. The channel interface control 202-0 includes a
cyclic redundancy check (CRC) generator/checker to generate and check the
CRC bytes for the received data. The channel interface control 202-0 also
includes a buffer that compensates for speed mismatch between the data
transmission rate of the data channel 21 and the available data transfer
capability of the cache memory 113. The data that is received by the
channel interface control 202-0 from a corresponding channel
interface unit 201 is forwarded to the cache memory 113 via channel
data compression circuit 203-0. The channel data compression circuit 203-0
provides the necessary hardware and microcode to perform compression of
the channel data for the control unit 101 on a data write from the host
processor 11. It also performs the necessary decompression operation for
control unit 101 on a data read operation by the host processor 11.
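The write and read paths through the CRC checker and the compression circuit can be sketched as below. The text does not name the CRC polynomial or the compression algorithm, so CRC-32 and DEFLATE from Python's zlib module are used here only as stand-ins; the dictionary named cache and the functions channel_write and channel_read are hypothetical.

```python
import zlib

cache = {}  # stand-in for cache memory 113


def channel_write(track_id: str, payload: bytes, crc: int) -> None:
    """Check the CRC of data received from the channel, then cache it compressed."""
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch on data received from the channel")
    cache[track_id] = zlib.compress(payload)


def channel_read(track_id: str):
    """Decompress cached data and generate a CRC for the transfer to the host."""
    payload = zlib.decompress(cache[track_id])
    return payload, zlib.crc32(payload)


record = b"virtual track 0 " * 64
channel_write("vt-0", record, zlib.crc32(record))
read_back, read_crc = channel_read("vt-0")
```

Checking the CRC before compression means a corrupted transfer is rejected before it can occupy cache space.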
As can be seen from the architecture illustrated in FIG. 2, all data
transfers between a host processor 11 and a redundancy group in the disk
drive subsets 103 are routed through cache memory 113. Control of cache
memory 113 is provided in control unit 101 by processor 204-0. The
functions provided by processor 204-0 include initialization of the cache
directory and other cache data structures, cache directory searching and
management, cache space management, cache performance improvement
algorithms as well as other cache control functions. In addition,
processor 204-0 creates the redundancy groups from the disk drives in disk
drive subsets 103 and maintains records of the status of those devices.
Processor 204-0 also causes the redundancy data across the N data disks in
a redundancy group to be generated within cache memory 113 and writes the
M segments of redundancy data onto the M redundancy disks in the
redundancy group. The functional software in processor 204-0 also manages
the mappings from virtual