WikiPatents - Community Patent Review
Apparatus and method for data storage and retrieval using bandwidth allocation    
United States Patent 5583995
Link to this page: http://www.wikipatents.com/5583995.html
Inventor(s): Gardner; Alan S. (Potomac, MD); McElrath; Rodney D. (Fairfax, VA); Harvey; Stephen L. (Port Haywood, VA)
Abstract: An apparatus and method is provided for allocating a data file across a plurality of media servers in a network, wherein each media server has associated therewith one or more levels of I/O devices organized in a hierarchical manner. An attempt is made to allocate the storage of data across the I/O devices in such a way that the bandwidth imposed on the devices when the data file is sequentially accessed will be balanced, and optimum use of I/O bandwidths at all points in the system is achieved. This balancing can be done by incorporating knowledge regarding various bottlenecks in the system into the decisionmaking process required for distributing the data blocks. The method and apparatus further allows bandwidths to be allocated to various clients in the system at the time a data file is opened. Various checks are provided at the time a data file is accessed to ensure that the data rates actually imposed by the requesting client do not exceed that requested by the client at the time the data file was opened. The invention allows for much more efficient use of the I/O resources in a system and ensures that a given configuration will be able to support client requests.
Owner/Assignee: MRJ, Inc. (Fairfax, VA)
Publication Date: December 10, 1996
Application Number: 08/380,657
Filing Date: January 30, 1995
US Classification: 709/219 709/231 711/114 711/148 725/87 725/92 725/115 725/116
Int'l Classification: H04N 007/173
Examiner: Kostak; Victor R.
Assistant Examiner: Flynn; Nathan J.
Attorney/Law Firm: Banner & Witcoff, Ltd.
USPTO Field of Search: 348/6 348/7 348/8 348/12 348/13 455/3.1 455/4.1 455/5.1 455/6.3 395/200.01 395/200.09 395/200.1 395/200.13 395/200.15 395/858 395/856 395/600 364/246 364/246.3 364/236.2
Claims
 


We claim:

1. A method of storing a data file across a plurality of media servers in a network, each media server having a plurality of first-level I/O devices and a plurality of second-level I/O devices, each second-level I/O device being controlled by one of the first-level I/O devices, the method comprising the steps of:

(a) dividing the data file into a plurality of data blocks;

(b) allocating the plurality of data blocks across each of the plurality of media servers;

(c) allocating each of the data blocks allocated to each media server in step (b) to first-level I/O devices in that media server according to the bandwidth availability of the first-level I/O devices in that media server;

(d) allocating each of the data blocks allocated to each first-level I/O device in step (c) to second-level I/O devices controlled by that first-level I/O device in accordance with the bandwidth availability of the second-level I/O devices; and

(e) storing each of the data blocks onto the second-level I/O devices in accordance with the allocation made in step (d).

2. The method of claim 1, wherein step (c) comprises the step of using disk controllers as the first-level I/O devices, and wherein steps (d) and (e) each comprise the step of using disks as the second-level I/O devices.

3. The method of claim 1, wherein step (a) comprises the step of generating a plurality of error-correcting code (ECC) blocks for correcting errors in the plurality of data blocks, and wherein steps (b) through (e) collectively comprise the step of allocating and storing the plurality of ECC blocks in the same manner as the plurality of data blocks.

4. The method of claim 3, wherein step (a) comprises the step of generating a number of ECC blocks which is dependent on the number of media servers over which the data blocks are allocated.

5. The method of claim 4, wherein step (a) comprises the step of using a number of ECC blocks which is approximately equal to the number of data blocks divided by the number of media servers over which the data blocks are allocated minus one.

6. The method of claim 1, wherein step (b) comprises the step of allocating the data blocks across each media server in proportion to each media server's available bandwidth.

7. The method of claim 1, wherein steps (a) and (b) are performed in a media client coupled to the network, and wherein steps (c) through (e) are performed separately in each of the media servers.

8. The method of claim 1, further comprising the steps of:

(f) from a media client, requesting that the data file be opened at a particular bandwidth;

(g) determining whether the data file can be opened at the requested bandwidth on the basis of previous bandwidth allocations made for the media server across which the data file was previously stored; and

(h) responsive to a determination that the data file cannot be opened in step (g), denying the request to open the file.

9. The method of claim 1, further comprising the step of grouping the plurality of media servers into two or more teams, and wherein step (b) comprises the step of allocating the data blocks only to media servers within one of the two or more teams.

10. The method of claim 1, wherein step (c) comprises the step of using a bandwidth availability of the first-level I/O devices which is approximately:

MINIMUM{BW(a1), [BW(d1)+BW(d2)]}, where

BW(a1) is a bandwidth sustainable by a first-level I/O device a1,

BW(d1) is a bandwidth sustainable by a second-level I/O device d1, and

BW(d2) is a bandwidth sustainable by a second-level I/O device d2.

11. Apparatus for storing a data file across a plurality of media servers in a network, each media server having a plurality of first-level I/O devices and a plurality of second-level I/O devices, each second-level I/O device being controlled by one of the first-level I/O devices, the apparatus comprising:

means for dividing the data file into a plurality of data blocks;

means for allocating the plurality of data blocks across each of the plurality of media servers;

means for allocating each of the data blocks allocated to each media server to first-level I/O devices in that media server according to the bandwidth availability of the first-level I/O devices in that media server;

means for allocating each of the data blocks allocated to each first-level I/O device to second-level I/O devices controlled by that first-level I/O device in accordance with the bandwidth availability of the second-level I/O devices; and

means for storing each of the data blocks onto the second-level I/O devices in accordance with the allocations to the second-level I/O devices.

12. The apparatus of claim 11, wherein said first-level I/O devices each comprise a disk controller, and wherein said second-level I/O devices each comprise a disk.

13. The apparatus of claim 11, further comprising means for generating a plurality of error-correcting code (ECC) blocks for correcting errors in the plurality of data blocks, wherein the plurality of ECC blocks are allocated and stored in the same manner as the plurality of data blocks.

14. The apparatus of claim 13, wherein the number of error-correcting code (ECC) blocks which are generated is dependent on the number of media servers over which the data blocks are allocated.

15. The apparatus of claim 14, wherein the number of ECC blocks generated is approximately equal to the number of data blocks divided by the number of media servers over which the data blocks are allocated minus one.

16. The apparatus of claim 11, wherein the data blocks are allocated across each media server in proportion to each media server's available bandwidth.

17. The method of claim 11, wherein the means for dividing the data file and the means for allocating the data blocks across each of the plurality of media servers is located at a media client coupled to the network, and wherein the means for allocating each of the data blocks to first-level I/O devices is replicated in each of the media servers.

18. The apparatus of claim 11, further comprising:

means for, from a media client, requesting that the data file be opened at a particular bandwidth;

means for determining whether the data file can be opened at the requested bandwidth on the basis of previous bandwidth allocations made for the media servers across which the data file was previously stored; and

means, responsive to a determination that the data file cannot be opened at the requested bandwidth, denying the request to open the file.

19. The apparatus of claim 11, wherein the plurality of media servers are grouped into two or more teams, and wherein the data blocks are allocated only to media servers within one of the two or more teams.

20. The apparatus of claim 11, wherein the bandwidth availability of at least one of the first-level I/O devices which is approximately:

MINIMUM{BW(a1), [BW(d1)+BW(d2)]}, where

BW(a1) is a bandwidth sustainable by a first-level I/O device a1,

BW(d1) is a bandwidth sustainable by a second-level I/O device d1, and

BW(d2) is a bandwidth sustainable by a second-level I/O device d2.

21. A method of retrieving a data file previously stored across a plurality of media servers, the method comprising the steps of:

(a) from a media client, requesting that the data file be opened at a particular data bandwidth;

(b) determining whether enough data bandwidth remains, on the basis of allocations previously made for the media servers across which the data file was previously stored, to satisfy the request made in step (a);

(c) responsive to a determination in step (b) that sufficient bandwidth remains to satisfy the request, allocating the requested bandwidth toward the media servers across which the data file was previously stored; and

(d) retrieving data blocks from the data file.

22. The method of claim 21, further comprising the step of denying the request made in step (a) if the determination in step (b) is negative.

23. The method of claim 21, further comprising the steps of:

(e) periodically measuring the actual data bandwidth used in retrieving the data blocks in step (d); and

(f) responsive to a determination that the actual data bandwidth in step (e) has been exceeded, delaying one or more retrieval requests.

24. Apparatus for retrieving a data file previously stored across a plurality of media servers, comprising:

means for, from a media client, requesting that the data file be opened at a particular data bandwidth;

means for determining whether enough data bandwidth remains, on the basis of allocations previously made for the media servers across which the data file was previously stored, to satisfy the request;

means for, responsive to a determination that sufficient bandwidth remains to satisfy the request, allocating the requested bandwidth toward the media servers across which the data file was previously stored; and

means for retrieving data blocks from the data file.

25. The apparatus of claim 24, further comprising means for denying the request if the determination is negative.

26. The apparatus of claim 24, further comprising:

means for periodically measuring the actual data bandwidth used in retrieving the data blocks; and

means for, responsive to a determination that the actual data bandwidth has been exceeded, delaying one or more retrieval requests.

27. The apparatus of claim 26, wherein the means for periodically measuring the actual data bandwidth used comprises means for accumulating a bill for the media client based on the actual data bandwidth used.

28. A media server, comprising:

a plurality of disk devices for storing data;

a plurality of disk controllers, each of which controls one or more of the disk devices;

a network interface for interfacing with a network;

a client/server protocol, coupled to the network interface, for communicating with media clients on the network; and

means, responsive to a request received from one of the media clients on the network, for allocating a plurality of data file blocks across the plurality of disk devices in accordance with a predetermined bandwidth availability determination, such that the number of data file blocks stored on each disk is approximately proportional to that disk's contribution to the predetermined bandwidth availability determination.

29. A video-on-demand system for storing and retrieving a video data file, comprising:

a plurality of media servers connected via a network, each media server comprising

a plurality of first-level I/O devices;

a plurality of second-level I/O devices each controlled by one of the first-level I/O devices;

means for allocating a plurality of video data blocks comprising portions of said video data file to said first-level I/O devices according to the bandwidth availability of each of the first-level I/O devices;

means for further allocating each video data block allocated to each first-level I/O device to second-level I/O devices controlled by that first-level I/O device in accordance with the bandwidth availability of each of the second-level I/O devices; and

means for storing each of the video data blocks onto the second-level I/O devices in accordance with the allocations to the second-level I/O devices; and

a plurality of media clients connected to said network, each media client comprising

means for requesting that said video data file be opened at a particular bandwidth;

means for determining whether said video data file can be opened at the requested bandwidth on the basis of previous bandwidth allocations made for all media servers across which said video data file was previously stored; and

means for, responsive to a determination that said video data file can be opened, reading said plurality of video data blocks from said plurality of media servers.
Description
 


BACKGROUND OF THE INVENTION

1. Technical Field

This invention relates generally to a multi-media data storage and retrieval system, such as a video-on-demand system, which can provide multiple data streams to one or more clients across a network. More particularly, the invention provides a system and method which allows a plurality of isochronous high bandwidth video data streams to be reliably and efficiently stored and retrieved from a plurality of media servers. In the multi-media data storage and retrieval technical field, video-on-demand represents the most technically demanding case.

2. Related Information

There is an increasing need to provide "video-on-demand" services for such applications as cable television and hotel services. Such applications require high-bandwidth, isochronous (time-guaranteed delivery) data streams over a sustained period of time with high reliability. As an example, hotels conventionally dedicate multiple analog video cassette recorders (VCRs) to provide "on-demand" movie viewing for hotel guests. Each VCR can play only one movie at a time, and thus the number of concurrent movies which can be played is limited by the number of VCRs. Not only is a separate VCR required for each movie being played, but the hotel must also estimate the number of copies of each movie required to satisfy a broad array of anticipated viewers. Additionally, the number of VCRs provided must be large enough to support the maximum number of viewers who will concurrently view different movies. This dedication of units and ancillary support is costly. Moreover, VCRs may be unavailable for periods of time during which tapes are rewound and the like.

It would be desirable to provide a video-on-demand system which makes much more efficient use of data storage devices and their interconnections. However, newer digital video-on-demand systems are expensive, lack reliability, and generally require a wasteful dedication of storage and computing facilities. For example, bottlenecks at various points in such systems render large amounts of data bandwidth unusable. Thus, they suffer from some of the same problems as conventional analog systems.

Transmitting a single digital real-time video stream which is MPEG encoded creates a requirement to provide 200 kilobyte/second (KB/sec) data transfer rates for two or more hours. Higher quality video streams require rates as high as 1 megabyte per second (MB/sec) or higher. In a network in which multiple video streams must be provided, this creates a requirement to reliably provide many megabytes per second of isochronous data. The loss of a few hundred milliseconds of data will cause an unacceptable disruption in service.
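By way of illustration only, the arithmetic behind these figures can be sketched as follows. The function names are hypothetical, decimal units (1 MB = 1,000 KB) are assumed, and the 50-viewer count is an arbitrary example rather than a figure from this document.

```python
# Back-of-the-envelope sizing for the stream rates quoted above.
# Assumption: decimal units (1 MB = 1,000 KB); names are illustrative only.

def movie_size_mb(rate_kb_per_sec: float, hours: float) -> float:
    """Total data delivered by one stream over its running time, in MB."""
    seconds = hours * 3600
    return rate_kb_per_sec * seconds / 1000


def aggregate_rate_mb_per_sec(rate_kb_per_sec: float, streams: int) -> float:
    """Sustained bandwidth needed to serve `streams` concurrent viewers."""
    return rate_kb_per_sec * streams / 1000


if __name__ == "__main__":
    # One two-hour movie at 200 KB/sec.
    print(movie_size_mb(200, 2))               # 1440.0 (MB)
    # Fifty concurrent viewers at 200 KB/sec each (hypothetical load).
    print(aggregate_rate_mb_per_sec(200, 50))  # 10.0 (MB/sec)
```

Even at the lower quoted rate, a modest number of concurrent viewers already calls for a sustained aggregate rate of several megabytes per second.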

Any video-on-demand system providing multiple data streams should preferably be able to detect and correct errors caused by failures or data aberrations (e.g., disk drive failures or parity errors). Thus, hardware and/or data redundancy and various error correcting schemes are needed to ensure data integrity and availability. However, the use of a "brute-force" disk mirroring scheme or other similarly unsophisticated method is unacceptably expensive in a video-on-demand system, because the amount of data storage could easily extend into terabytes of data, currently out of the price range for many applications. Finally, many conventional analog and newer digital systems do not "scale up" well in terms of bandwidth. Therefore, there remains a need for providing reliable, high-bandwidth video data storage and retrieval at a low price. Conventional systems have not effectively addressed these needs.

SUMMARY OF THE INVENTION

The present invention solves the aforementioned problems by distributing digital data such as video data across a plurality of media servers in a special manner, retrieving the data at one or more clients across the network, and correcting data errors in the client computers to offload processing from the media servers. The invention contemplates allocating a required data bandwidth across a plurality of media servers to provide a guaranteed data stream bandwidth to each client which requests it at delivery time. The distributed nature of the system allows it to scale up easily and avoids single points of failure, such that continued operation is possible even in the face of a media server failure or a disk failure.

Generally, a networked system includes a plurality of media servers each having a plurality of disk drives over which one or more data files are distributed. The principles of the invention superimpose a hierarchical analysis on the available bandwidths in the system in order to optimally balance bandwidth across the system at data retrieval time. This efficient balancing of I/O bandwidth eliminates "overkill" in computing resources and I/O resources which would otherwise wastefully allocate resources when they are not truly needed.

The invention further provides a means of implementing Redundant Arrays of Inexpensive Disks (RAID) techniques to distribute data blocks across a plurality of media servers so that the loss of a server is recoverable. Additionally, the invention provides a "client-centric" approach to retrieving data and recovering missing data, so that clients can regulate their own data rates as long as they do not exceed allocated bandwidths.
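The bandwidth guarantee summarized above may be pictured, purely as a minimal sketch, as a ledger that compares each open request against the bandwidth already granted. The class `BandwidthLedger`, its method names, and the numeric values are assumptions introduced here for illustration; they are not taken from the patent text.

```python
# Minimal admission-control sketch. All names and numbers are hypothetical.

class BandwidthLedger:
    """Tracks total and currently allocated bandwidth for a group of servers."""

    def __init__(self, total_mb_per_sec: float):
        self.total = total_mb_per_sec
        self.allocated = 0.0
        self.grants = {}  # client id -> bandwidth granted at open time

    def open_file(self, client: str, requested: float) -> bool:
        """Grant the request only if enough unallocated bandwidth remains."""
        if requested <= 0 or self.allocated + requested > self.total:
            return False  # request denied
        self.allocated += requested
        self.grants[client] = self.grants.get(client, 0.0) + requested
        return True

    def close_file(self, client: str) -> None:
        """Return a client's granted bandwidth to the pool."""
        self.allocated -= self.grants.pop(client, 0.0)

    def must_delay(self, client: str, measured: float) -> bool:
        """True if the measured rate exceeds what the client was granted."""
        return measured > self.grants.get(client, 0.0)


if __name__ == "__main__":
    ledger = BandwidthLedger(3.5)
    print(ledger.open_file("C1", 1.0))   # True
    print(ledger.open_file("C2", 2.0))   # True
    print(ledger.open_file("C3", 1.0))   # False: only 0.5 MB/sec remains
    print(ledger.must_delay("C1", 1.5))  # True: C1 exceeds its 1.0 grant
```

A request is refused as soon as the sum of existing grants plus the new request would exceed the total, which is what allows earlier grants to remain guaranteed.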

Other features and advantages of the invention will become apparent through the following detailed description and the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a system configured to employ the principles of the present invention, including two media clients C1 and C2, and three media servers MS1, MS2 and MS3.

FIG. 2 shows how the I/O resources of the media servers in FIG. 1 may be viewed for purposes of determining bandwidth availability.

FIG. 3 shows steps for determining the bandwidth availability for each of the media servers in FIG. 1.

FIG. 4A shows an illustrative example of determining the bandwidth availabilities for the configuration of FIG. 2, where each media server has the same availability.

FIG. 4B shows a further illustrative example of determining the bandwidth availabilities for the configuration of FIG. 2, where each media server has a different bandwidth availability.

FIG. 5 shows how file data blocks may be allocated among media servers, first-level I/O devices, and second-level I/O devices in the system.

FIG. 6 shows how media servers may be grouped into teams each having a "total available" and a "currently allocated" bandwidth.

FIG. 7 shows various steps which may be performed in order to regulate bandwidths actually used by clients which have requested them.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows a system employing the principles of the invention. In FIG. 1, media servers MS1, MS2, and MS3 are each coupled to a network, which may for example comprise an Ethernet™, Fiber Distributed Data Interface (FDDI), Asynchronous Transfer Mode (ATM), or Small Computer System Interface (SCSI) network, or any other network used for transporting data among the media servers.

Using media server MS1 as an example, each media server is coupled to the network through a network interface 1 and a client/server protocol 2, which may comprise a standard protocol such as TCP/IP. Each media server further comprises a media server directory services manager 3, a media server disk block manager 4, and one or more disk drivers 5 which handle interaction with local disks d1 through d4.

As can be seen in FIG. 1, media server MS1 includes four disks d1 through d4, coupled through two disk controllers a1 and a2, where each disk controller controls two disks. Each disk controller may comprise a Small Computer System Interface (SCSI) controller, which includes a data bus allowing data to be transferred from more than one disk. Similarly, media server MS2 includes four disks d5 through d8, coupled through three disk controllers a3 through a5 in a configuration different from that of media server MS1. Finally, media server MS3 includes five disks d9 through d13 coupled through two disk controllers a6 and a7 in yet a different configuration.

Media clients C1 and C2 are also coupled to the network as illustrated in FIG. 1, and generally "consume" (i.e., retrieve over a period of time) data which is stored on media servers MS1 to MS3. As described in more detail herein, each data file, such as a movie comprising video and audio data, is preferably distributed across a plurality of media servers in accordance with RAID (Redundant Arrays of Inexpensive Disks) principles in order to prevent reliance on any one media server. Therefore, when a media client requests video data, the "scattered" data file is retrieved from a plurality of media servers, reassembled into the correct sequence, and late or missing blocks are reconstructed using error-correcting codes as required.

Referring by way of example to media client C1, each media client may comprise a network interface 6 and client/server protocol 7 which collectively allow each client to communicate with other nodes on the network. Each media client may further comprise a client disk block manager 8, a client directory services manager 9, and a media client application 10 which actually "consumes" the data. Each of these components will be described in more detail subsequently herein. It should be noted that the client directory services manager 9 and the media server directory services manager 3 generally perform similar functions, as do the client disk block manager 8 and the media server disk block manager 4, as discussed in more detail herein.

Each media client application may request video data, decompress it, and present it on a television screen for viewing, such as to display a movie for a hotel guest. The system may also be used as a UNIX™ Network File System, and each media client could thus comprise a host computer using the file system. It is contemplated that many different media clients and different applications can be used.

Each media server and media client may comprise a CPU, data bus, memory, and other peripherals, such as various off-the-shelf Intel-based units widely available (e.g., Intel 486-based products). The specific devices selected are not important to practice the principles of the present invention.

1. System Bandwidth Availability

Viewing each media server MS1 through MS3 as a resource which may be used to supply data in the system, each media server is able to provide data from its associated disks at a sustained data rate which is dependent on a number of factors.

First, each media server generally comprises a CPU, memory, internal data bus, and one or more network interfaces across which all data must generally flow when retrieved from the disks and supplied to the network. Thus, each media server can be viewed as a node having a maximum data bandwidth which cannot be exceeded on a sustained basis. In other words, regardless of the number of disks and controllers within the media server, it has a maximum output data rate which cannot be exceeded. The bandwidth of each media server can be determined empirically by attempting to retrieve large quantities of data at a sustained rate using various I/O configurations. (It will be noted that configurations are possible in which the computer itself presents essentially no bottleneck, and the invention is not intended to be limited in this respect). Each node is indicated in FIG. 1 with the designation "N" followed by a number (e.g., N1 is a node corresponding to the data "pipe" through which media server MS1 can supply data).

Second, each media server may comprise one or more disk controllers, each of which typically has a maximum sustainable data rate at which it can provide data retrieved from all the disks it controls. For example, SCSI controllers have a typical maximum sustainable data rate of about 4 megabytes per second (4 MB/sec), regardless of how many disks are controlled by that SCSI controller. Although media server MS1 has two disk controllers a1 and a2, the bandwidth of the media server as a whole may be lower than the combined bandwidth of the two controllers, because the node itself may have a lower sustainable data rate than that of the combined controllers. Typically, each SCSI controller can control up to 7 disks, although the invention is not limited in this respect and any type of disk controller or other I/O device can be used. For the sake of clarity, a discussion of separate SCSI "chains" has been omitted, it being understood that an I/O hierarchy may exist within each controller.

Third, each disk controller may control one or more disks, each disk having a maximum sustainable data rate at which it can provide data in read or write operations. Thus, if disks d1 and d2 each can read data at a sustained rate of 3 MB/sec, the maximum data rate available from these combined disks would be 6 MB/sec. However, since SCSI controller a1 could only sustain 4 MB/sec, the total usable bandwidth from these disks collectively would be limited to 4 MB/sec. And if node N1 has a maximum sustainable data rate of 3.5 MB/sec due to its internal configuration, then the entire configuration of media server MS1 as shown would be limited to that rate. The remaining unused disk bandwidth would effectively be wasted if one were attempting to maximize bandwidth usage in the system.
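The three limits just described (node, controller, disk) can be combined, as a rough sketch, by taking the minimum at each level of the hierarchy. The function names are hypothetical. The 3 MB/sec rates for disks d3 and d4 and the 4 MB/sec limit for controller a2 are assumptions made here to complete the example; the text gives figures only for d1, d2, a1 and N1.

```python
# Usable bandwidth of a media server, taking the bottleneck at each level.
# Names are illustrative; d3/d4 rates and the a2 limit are assumed values.

def controller_bandwidth(controller_limit, disk_rates):
    """A controller delivers no more than its own limit or its disks' total."""
    return min(controller_limit, sum(disk_rates))


def server_bandwidth(node_limit, controllers):
    """controllers is a list of (controller_limit, [disk rates]) pairs."""
    total = sum(controller_bandwidth(limit, disks) for limit, disks in controllers)
    return min(node_limit, total)


if __name__ == "__main__":
    a1 = (4.0, [3.0, 3.0])  # d1, d2 at 3 MB/sec behind a 4 MB/sec controller
    a2 = (4.0, [3.0, 3.0])  # d3, d4: assumed to match d1 and d2
    print(controller_bandwidth(*a1))        # 4.0
    print(server_bandwidth(3.5, [a1, a2]))  # 3.5, limited by node N1
```

Under these assumed figures the disks could supply 12 MB/sec in aggregate, of which only 3.5 MB/sec is usable through node N1.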

The present invention contemplates the storage of data onto each disk in a fixed block size, preferably large (e.g., 32 kilobytes (KB) per block), to minimize the number of read and write operations which must occur in order to access large quantities of data. One of ordinary skill in the art will recognize that variable block sizes may be used, or fixed block sizes of larger or smaller increments may be used according to specific design objectives.

The selection of block size can impact the effective sustained data rate obtainable from a particular disk. For example, if a block size of 32 KB is used, a single read operation would result in 32 KB of data being transferred. This would probably result in a higher effective data rate than if a block size of 1 KB were used, because in order to retrieve 32 KB of data with a 1 KB block size, 32 read operations would need to be performed, and each read operation incurs overhead costs in addition to the normal data transfer costs.
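The effect of block size on effective data rate can be approximated with a simple model in which every read pays a fixed overhead in addition to its transfer time. This model, the function names, the 15 millisecond overhead and the 3,000 KB/sec raw transfer rate are all assumptions chosen for illustration; real disks behave less uniformly.

```python
import math

# Simple per-read cost model: time = fixed overhead + transfer time.
# The overhead and raw rate used below are hypothetical values.

def reads_needed(total_kb, block_kb):
    """Number of read operations required to fetch total_kb of data."""
    return math.ceil(total_kb / block_kb)


def effective_rate_kb_per_sec(block_kb, raw_rate_kb_per_sec, overhead_sec):
    """Sustained rate achieved when each read incurs a fixed overhead."""
    time_per_read = overhead_sec + block_kb / raw_rate_kb_per_sec
    return block_kb / time_per_read


if __name__ == "__main__":
    print(reads_needed(32, 32))  # 1
    print(reads_needed(32, 1))   # 32
    large = effective_rate_kb_per_sec(32, 3000, 0.015)
    small = effective_rate_kb_per_sec(1, 3000, 0.015)
    print(large > small)         # True
```

Under this model the fixed overhead dominates for small blocks, which is why the larger block size yields the higher effective rate.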

The foregoing considerations, which are factored into how a computer system should be configured, can be advantageously used to determine where and how data blocks should be distributed in the system for a particular data file in order to optimize data access and guarantee isochronous data streams for applications such as video-on-demand.

The inventors of the present invention have discovered that using a data storage and retrieval scheme which takes into account the bandwidth characterizations in a system such as that shown in FIG. 1 results in substantial increases in efficiency which can significantly reduce the number of devices which must be assembled to provide the large data storage capacity needed for video-on-demand and other applications. Such a scheme can also guarantee consumers of the stored data that they will be able to store and retrieve data at a specified bandwidth at delivery time, thus ensuring that an isochronous data stream can be provided. This is particularly important in a video-on-demand system, because multiple different isochronous video streams must be provided in varying configurations as movies are started, stopped, fast-forwarded, and the like over the network. Other features and advantages of the invention will become apparent with reference to the other figures and the following description.

FIG. 2 shows how the computer I/O resources of media servers MS1 through MS3 in FIG. 1 can be viewed as a hierarchy of levels. For example, N1 in FIG. 2 corresponds to node N1 in FIG. 1. Below node N1 are two I/O levels, L1 and M1, corresponding to the disk controllers and disks, respectively, which are part of media server MS1. As can be seen in FIG. 2, disk controller a1 in level L1 controls two disks d1 and d2 which are in level M1. Similarly, disk controller a2 in level L1 controls disks d3 and d4 in level M1.

A similar hierarchy exists with respect to nodes N2 and N3 in FIG. 2. As discussed previously, the arrangement of these devices may be selected during system design in order to optimize data rates. For example, if each SCSI controller can support a sustained data rate of only 4 MB/sec, it would be wasteful to connect 7 disks each having a 2 MB/sec sustained rate to a single SCSI controller if one wanted to maximize available bandwidth. This is because the aggregate bandwidth available from the 7 disks would be 14 MB/sec, but only 4 MB/sec could pass through that SCSI controller. Although the principles of the present invention generally seek to maximize bandwidths in the system (and would thus generally avoid such a configuration), other system designs may find such a configuration desirable because high sustained data rates may not be a primary objective in all designs.
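
The sizing consideration described above may be expressed as a short sketch. The helper names `disks_to_saturate` and `unused_disk_bandwidth` are hypothetical; the figures of 4 MB/sec per controller, 2 MB/sec per disk, and 7 disks are those of the example in the text.

```python
import math

def disks_to_saturate(controller_rate, disk_rate):
    """Smallest number of disks whose combined rate reaches the controller rate."""
    return math.ceil(controller_rate / disk_rate)

def unused_disk_bandwidth(controller_rate, disk_rate, n_disks):
    """Aggregate disk bandwidth (MB/sec) that cannot pass through the controller."""
    return max(0.0, n_disks * disk_rate - controller_rate)

print(disks_to_saturate(4.0, 2.0))          # 2 disks already saturate the controller
print(unused_disk_bandwidth(4.0, 2.0, 7))   # 10.0 MB/sec of 14 MB/sec goes unused
```

A design that seeks capacity rather than sustained data rate might nevertheless attach all 7 disks, as noted above.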

It will be recognized that different types of disk controllers and disks may be combined in the system shown in FIG. 1. For example, a SCSI controller may control disks of varying capacities and speeds, and different types of controllers may co-exist in the same media server. Similarly, each media server may have a different CPU type or clock speed, different memory capacity, and the like. Therefore, the available bandwidths for retrieving data in the system may not be readily apparent.

FIG. 3 shows how the available (i.e., "usable") data bandwidth for each media server in the system of FIG. 1 may be determined. The notation AVAIL(x) in FIG. 3 refers to the available bandwidth of x, where x is either a level in the hierarchy shown in FIG. 2 or one of the devices in such a hierarchy. Beginning at step S301 in FIG. 3, the available bandwidth of disk controller a1 is determined to be the MINIMUM of the bandwidth of the disk controller itself, and the sum of the bandwidths of the disks controlled by that controller. For example, if disk controller a1 is a SCSI controller having a maximum sustainable data rate of 4 MB/sec, and each of the two disks controlled by that controller has a maximum sustainable data rate of 2.5 MB/sec (considering such factors as block size discussed previously), for a combined 5 MB/sec, then the available bandwidth through controller a1 is 4 MB/sec, the available bandwidth being limited by the controller itself.

Similarly, in step S302, the available bandwidth of disk controller a2 is determined to be the MINIMUM of the bandwidth of the disk controller itself, and the sum of the bandwidths of the disks controlled by that controller. For example, if disk controller a2 is also a SCSI controller having a maximum sustainable data rate of 4 MB/sec, and each disk controlled by that controller has a maximum sustainable data rate of 1.5 MB/sec (again considering such factors as block size discussed previously), then the available bandwidth through controller a2 is 3 MB/sec, the disks themselves being the limiting factor.

In step S303, the available bandwidth of node N1 is determined by taking the MINIMUM of the bandwidth of the node itself, and the sum of the available bandwidths of disk controllers a1 and a2. For example, if the bandwidth of node N1 is empirically determined to be 12 MB/sec, then the available bandwidth of the node is 7 MB/sec, because the disk controllers (4 MB/sec and 3 MB/sec, respectively, as determined above) are the limiting factors. The available bandwidth of node N1 is thus the total bandwidth which can be supplied by media server MS1.
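
Steps S301 through S303 apply the same rule at each level of the hierarchy, and may therefore be sketched as a single recursive computation. The class name `Device` and the function name `avail` are hypothetical and are chosen only to mirror the AVAIL(x) notation; the rates are those of the examples given above.

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    rate: float                       # device's own maximum sustainable rate, MB/sec
    children: list = field(default_factory=list)

def avail(device):
    """AVAIL(x): own rate for a disk, else MIN(own rate, sum of children's AVAIL)."""
    if not device.children:
        return device.rate
    return min(device.rate, sum(avail(child) for child in device.children))

# Media server MS1 with the example rates from the text.
a1 = Device("a1", 4.0, [Device("d1", 2.5), Device("d2", 2.5)])
a2 = Device("a2", 4.0, [Device("d3", 1.5), Device("d4", 1.5)])
n1 = Device("N1", 12.0, [a1, a2])

print(avail(a1))  # 4.0, limited by the controller (step S301)
print(avail(a2))  # 3.0, limited by the disks (step S302)
print(avail(n1))  # 7.0, limited by the controllers (step S303)
```

Because the rule is the same at every level, additional levels such as separate SCSI chains could be represented in this sketch by nesting further `Device` entries.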

Finally, in step S304, the available bandwidth of the entire system (i.e., the combined available bandwidth from all the media servers) may be determined by adding the available bandwidths of each of the nodes; the steps for determining the available bandwidths for nodes N2 and N3 are omitted for brevity. It is assumed in this case that the network itself does not limit the available bandwidth. Ethernet has an approximate bandwidth of 10 megabits/sec (about 1.25 MB/sec), so its use is not realistic in the example presented above. However, FDDI data rates can approach 100 megabits/sec (about 12.5 MB/sec), and an FDDI network would therefore be more suitable for the example given.
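
The comparison between network and storage bandwidth requires converting megabits per second to megabytes per second. A minimal sketch follows; the function names are hypothetical, the conversion assumes 8 bits per byte and ignores protocol overhead, and only the 7 MB/sec figure for node N1 is used because the values for nodes N2 and N3 are not given.

```python
def mbit_to_mbyte(mbit_per_sec):
    """Convert megabits/sec to megabytes/sec (8 bits per byte, no protocol overhead)."""
    return mbit_per_sec / 8.0

def network_limits_system(network_mbit_per_sec, node_avail_mb_per_sec):
    """True if the network carries less than the summed node bandwidths (step S304)."""
    system_avail = sum(node_avail_mb_per_sec)
    return mbit_to_mbyte(network_mbit_per_sec) < system_avail

node_avail = [7.0]  # AVAIL(N1) from the example; N2 and N3 would be appended here

print(network_limits_system(10, node_avail))    # True: 1.25 MB/sec Ethernet is a bottleneck
print(network_limits_system(100, node_avail))   # False: 12.5 MB/sec FDDI is not
```

With further nodes added to the list, even the faster network could become the limiting factor, which is why the assumption that the network does not limit bandwidth is stated explicitly.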

2. RAID Data Storage Principles

The present invention contemplates storing data in the system in a manner which allows it to be quickly and reliably retrieved. One known technology which facilitates these objectives is RAID (Redundant Arrays of Inexpensive Disks). As can be seen in FIG. 1, systems constructed according to the invention may comprise a plurality of disks, which may be inexpensive small disks. Large data storage capacity can be achieved by "bunching" many disks onto each computer in a hierarchical manner through one or more disk controllers.

RAID technology generally includes many different schemes for replicating data and using error-correcting codes to achieve data reliability and increased data rates. A RAID system can be generally described as one that combines two or more physical disk drives into a single logical drive in order to achieve data redundancy. Briefly, the following "levels" of RAID technology have been developed:

level 0: Data striping without parity. This means that data is spread out over multiple disks for speed. Successive read operations can operate nearly in parallel over multiple disks, rather than issuing successive read operations to the same disk.

level 1: Mirrored disk array. For every data disk there is a redundant twin. Also includes duplexing, the use of dual intelligent controllers for additional speed and reliability.

level 2: Bit interleaves data across arrays of disks, and reads using only whole sectors.

level 3: Parallel disk array. Data striping with dedicated parity drives. Drives are synchronized for efficiency in large parallel data transfers.

level 4: Independent disk array. Reads and writes on independent drives in the array with dedicated parity drive using sector-level interleave.

level 5: Independent disk array. Reads and writes data and parity across all disks with no dedicated parity drive. Allows parallel transfers. Multiple controllers can be used for higher speed. Usually loses only the equivalent of one drive for redundancy. One example of this type of system which is available is the Radion LT by Micropolis, Inc.
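
Two of the ideas listed above, striping and parity, may be illustrated with a brief sketch. The function names are hypothetical; the round-robin block placement and the use of bytewise exclusive-OR parity are assumptions reflecting common RAID practice, and the sketch does not represent the allocation method of the present invention.

```python
from functools import reduce

def disk_for_block(block_index, n_disks):
    """Round-robin striping: successive blocks go to successive disks."""
    return block_index % n_disks

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks, as used for parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of equal length and their parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# If the disk holding data[1] fails, its block is the XOR of the survivors and parity.
recovered = xor_blocks([data[0], data[2], parity])

print([disk_for_block(i, 3) for i in range(6)])  # [0, 1, 2, 0, 1, 2]
print(recovered == data[1])                      # True
```

The storage cost of such parity is one block per stripe, which corresponds to the observation above that level 5 usually loses only the equivalent of one drive for redundancy.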

The present invention prefer