WikiPatents - Community Patent Review
Intelligent storage manager for data storage apparatus having simulation capability    
United States Patent 5325505
Link to this page: http://www.wikipatents.com/5325505.html
Inventor(s): Hoffecker; John C. (Berthoud, CO); McNamara; Alan R. (MacGregor, AU); Schafer; Charles P. (Louisville, CO); Smith; Harry E. (Lakewood, CO); Walsh; Nathan E. (Boulder, CO)
Abstract: The intelligent storage manager includes a number of data bases which are used by the expert system software to manage the computer system data storage devices. One element provided in this apparatus is a set of data storage device configuration data that provides a description of the various data storage devices and their interconnection in the computer system. A second element is a knowledge data base that includes a set of functional rules that describe the data storage device management function. These rules indicate the operational characteristics of the various data storage devices and the steps that need to be taken to provide the various functions required to improve the performance of the computer system memory. In addition, various mathematical models are used to determine data relating to the operation of the data storage devices. These models not only assist in the identification of conflicts, but are also used to predict the effect of proposed conflict solutions. The models are also used to manage the addition of data storage devices to the computer system memory by identifying a plan to migrate existing data sets to these additional data storage devices.
   














Inventor     Hoffecker; John C. (Berthoud, CO); McNamara; Alan R. (MacGregor, AU); Schafer; Charles P. (Louisville, CO); Smith; Harry E. (Lakewood, CO); Walsh; Nathan E. (Boulder, CO)
Owner/Assignee     Storage Technology Corporation (Louisville, CO)
Publication Date     June 28, 1994
Application Number     07/755,018
Filing Date     September 4, 1991
US Classification     707/101 706/45 706/934 707/200
Int'l Classification     G06F 012/02
Examiner     Kulik; Paul V.
Attorney/Law Firm     Duft, Graziano & Forest
USPTO Field of Search     395/425 395/600 395/50 395/60 395/934 364/478
References

U.S. References
5164909     Leonhardt     700/215     Nov. 1992
5131087     Warr     711/113     Jul. 1992
4928245     Moy     700/218     May 1990
4771375     Beglin     711/111     Sep. 1988
Claims
 


We claim:

1. A method for dynamically reorganizing placement of data in memory of a computer system to improve retrievability of data stored therein comprising the steps of:

monitoring the operation of said computer system memory;

detecting memory performance conflicts in said computer system memory;

identifying datasets in said computer system memory that must be relocated to resolve said memory performance conflicts;

validating relocation of said identified datasets to resolve said memory performance conflicts, comprising:

storing information describing a configuration of said computer system memory,

storing a set of functional rules describing a data management function,

mathematically modeling at least a portion of said computer system memory, and

collecting performance data on said identified datasets to validate relocation of said identified datasets to resolve said memory performance conflicts.

2. The method of claim 1 further comprising the step of:

generating said information describing said configuration of said computer system memory.

3. The method of claim 1 wherein said step of detecting includes:

predicting potential performance conflicts in said computer system memory.

4. The method of claim 3 wherein said step of detecting further includes:

calculating statistical data from dataset read/write activity indicative of the frequency of usage and locale of the datasets stored in said computer system memory.

5. The method of claim 4, where said computer system memory includes a plurality of DASD units, said step of identifying includes:

listing the ones of said DASD units that are in greatest and least conflict with performance objectives.

6. The method of claim 5 wherein said step of identifying further includes:

selecting datasets from the ones of said DASD units listed as being in greatest conflict and from the ones of said DASD units listed as being in least conflict for exchange therebetween to balance the activity on these listed DASD units.

7. The method of claim 6 further comprising the step of:

exchanging said selected datasets between said listed DASD units to reduce the conflict on these listed DASD units.

8. The method of claim 4 where said computer system memory includes a plurality of DASD units and a cache memory, said step of identifying includes:

classifying all the datasets on a DASD unit that are good and bad candidates for relocation to said cache memory.

9. The method of claim 8 wherein said step of identifying further includes:

listing the ones of said datasets, classified as good candidates for relocation to said cache memory, that can be stored on one volume in said DASD unit.

10. The method of claim 9 further comprising the step of:

transporting said listed datasets to said one volume in said DASD unit to resolve said memory performance conflicts.

11. The method of claim 1 wherein said step of modeling further includes:

generating data describing operational details of said configuration of said computer system memory.

12. The method of claim 1 further comprising the step of:

transporting said identified datasets to alternate memory storage locations to resolve said memory performance conflicts.

13. The method of claim 1 wherein said computer system memory further includes at least one data channel connecting said computer system to a control unit which manages a plurality of DASD units, said step of detecting includes:

monitoring channel, control unit, device and dataset read/write activity in said computer system memory.

14. The method of claim 1 wherein said step of modeling further includes:

storing a set of functional rules describing performance characteristics of data storage devices that comprise said computer system memory; and

storing a set of functional rules describing desired service levels for said data storage devices.

15. The method of claim 14 wherein said step of detecting includes:

predicting potential performance conflicts in said computer system memory; and

identifying at least one of said data storage devices subject to said predicted performance conflicts.

16. The method of claim 14 wherein said step of modeling further includes:

generating data describing operational details of said data storage devices in said computer system memory.

17. A system for dynamically reorganizing placement of data in memory of a computer system to improve retrievability of data stored therein comprising:

means for monitoring the operation of said computer system memory;

means, responsive to said monitoring means, for detecting memory performance conflicts in said computer system memory;

means, responsive to said detecting means, for identifying datasets in said computer system memory that must be relocated to resolve said memory performance conflicts;

means for validating relocation of said identified datasets to resolve said memory performance conflicts, comprising:

means for storing information describing a configuration of said computer system memory,

means for storing a set of functional rules describing a data management function,

means for mathematically modeling at least a portion of said computer system memory, and

means for collecting performance data on said identified datasets.

18. The system of claim 17 further comprising:

means for generating said information describing said configuration of said computer system memory.

19. The system of claim 17 wherein said detecting means includes:

means for predicting potential performance conflicts in said computer system memory.

20. The system of claim 19 wherein said detecting means further includes:

means for calculating statistical data from dataset read/write activity indicative of the frequency of usage and locale of the datasets stored in said computer system memory.

21. The system of claim 20, where said computer system memory includes a plurality of DASD units, said identifying means includes:

means for listing the ones of said DASD units that are in greatest and least conflict with performance objectives.

22. The system of claim 21 wherein said identifying means further includes:

means responsive to said listing means for selecting datasets from the ones of said DASD units listed as being in greatest conflict and from the ones of said DASD units listed as being in least conflict for exchange therebetween to balance the activity on these listed DASD units.

23. The system of claim 22 further comprising:

means responsive to said identifying means for exchanging said selected datasets between said listed DASD units to reduce the conflict on these listed DASD units.

24. The system of claim 20 where said computer system memory includes a plurality of DASD units and a cache memory, said identifying means includes:

means for classifying all the datasets on a DASD unit that are good and bad candidates for relocation to said cache memory.

25. The system of claim 24 wherein said identifying means further includes:

means for listing the ones of said datasets, classified as good candidates for relocation to said cache memory, that can be stored on one volume in said DASD unit.

26. The system of claim 25 further comprising:

means responsive to said identifying means for transporting said listed datasets to said one volume in said DASD unit to resolve said memory performance conflicts.

27. The system of claim 17 wherein said modeling means further includes:

means for generating data describing operational details of said configuration of said computer system memory.

28. The system of claim 17 further comprising:

means responsive to said identifying means for transporting said identified datasets to alternate memory storage locations to resolve said memory performance conflicts.

29. The system of claim 17 wherein said computer system memory further includes at least one data channel connecting said computer system to a control unit which manages a plurality of DASD units, said detecting means includes:

means for monitoring channel, control unit, device and dataset read/write activity in said computer system memory.

30. The system of claim 17 wherein said modeling means further includes:

means for storing a set of functional rules describing performance characteristics of data storage devices that comprise said computer system memory; and

means for storing a set of functional rules describing desired service levels for said data storage devices.

31. The system of claim 30 wherein said detecting means includes:

means for predicting potential performance conflicts in said computer system memory; and

means for identifying at least one of said data storage devices subject to said predicted performance conflicts.

32. The system of claim 30 wherein said modeling means further includes:

means for generating data describing operational details of said data storage devices in said computer system memory.

33. A system for dynamically reorganizing placement of data in memory of a computer system to improve retrievability of data stored therein, wherein said computer system memory is a hierarchical memory including a cache memory and a plurality of DASD units, comprising:

means for monitoring the operation of said computer system memory;

means, responsive to said monitoring means, for detecting memory performance conflicts in both said cache memory and said DASD units in said computer system memory;

means, responsive to said detecting means, for identifying datasets in said computer system memory that must be relocated to resolve said memory performance conflicts;

means for validating relocation of said identified datasets to resolve said memory performance conflicts, comprising:

means for storing information describing a configuration of said computer system memory,

means for storing a set of functional rules describing a data management function,

means for mathematically modeling at least a portion of said computer system memory, and

means for collecting performance data on said identified datasets to validate a relocation of said identified datasets to resolve said memory performance conflicts.

34. The system of claim 33 further comprising:

means responsive to said identifying means for transporting said identified datasets to alternate memory storage locations to resolve said memory performance conflicts.

35. The system of claim 33 wherein said detecting means includes:

means for monitoring dataset read/write activity in said computer system memory;

means for calculating statistical data from said monitored dataset read/write activity indicative of the frequency of usage and locale of the datasets stored in said computer system memory.

36. The system of claim 33 wherein said identifying means includes:

means for listing the ones of said DASD units that are most and least utilized;

means responsive to said listing means for selecting datasets from the ones of said DASD units listed as most utilized and from the ones of said DASD units listed as least utilized for exchange therebetween to balance the activity on these listed DASD units.

37. The system of claim 36 further comprising:

means responsive to said identifying means for exchanging said selected datasets between said listed DASD units to balance the activity on these listed DASD units.

38. The system of claim 33 wherein said identifying means includes:

means for classifying all the datasets on a DASD unit that are good and bad candidates for relocation to said cache memory;

means for listing the ones of said datasets, classified as good candidates for relocation to said cache memory, that can be stored on one volume in said DASD unit.

39. The system of claim 38 further comprising:

means responsive to said identifying means for transporting said listed datasets to said one volume in said DASD unit to resolve said memory performance conflicts.

40. The system of claim 33 wherein said modeling means further includes:

means for storing a set of functional rules describing performance characteristics of each of said DASD units; and

means for storing a set of functional rules describing desired service levels for each of said DASD units.

41. The system of claim 40 wherein said detecting means includes:

means for predicting potential performance conflicts in said computer system memory; and

means for identifying at least one of said DASD units subject to said predicted performance conflicts.

42. The system of claim 40 wherein said modeling means further includes:

means for generating data describing operational details of said DASD units.

43. A system for dynamically reorganizing the placement of data in memory of a computer system to improve the retrievability of data stored therein, wherein said computer system memory is a hierarchical memory including a cache memory and a plurality of DASD units, comprising:

means for storing information describing the configuration of said computer system memory;

means for storing a set of functional rules describing a data management function;

means for mathematically modeling at least a portion of said computer system memory;

means for detecting memory performance conflicts in said computer system memory, including:

means for monitoring the dataset read/write activity in said computer system memory;

means for calculating statistical data from said monitored dataset read/write activity indicative of the frequency of usage and locale of the datasets stored in said computer system memory;

means, responsive to said detecting means, for identifying datasets in said computer system memory that must be relocated to resolve said memory performance conflicts, including:

means for listing the ones of said DASD units that are most and least utilized;

means, responsive to said listing means, for selecting datasets from the ones of said DASD units listed as most utilized and from the ones of said DASD units listed as least utilized for exchange therebetween to balance the activity on these listed DASD units;

means for classifying all the datasets on a DASD unit that are good and bad candidates for relocation to said cache memory;

means for listing the ones of said datasets, classified as good candidates for relocation to said cache memory, that can be stored on one volume in said DASD unit;

means, responsive to said identifying means, for exchanging said selected datasets between said listed DASD units to balance the activity on these listed DASD units;

means, responsive to said identifying means, for transporting said listed datasets to said one volume in said DASD unit to resolve said memory performance conflicts; and

means for writing said one volume from said DASD unit into said cache memory.

44. A method of improving the data retrieval efficiency of a computer system memory, wherein said computer system memory is a hierarchical memory including a cache memory and a plurality of DASD units, comprising the steps of:

recording information describing the configuration of said computer system memory;

storing a set of functional rules describing a computer system memory management function;

mathematically modeling at least a portion of said computer system memory;

monitoring the operation of both said cache memory and said DASD units;

detecting memory performance conflicts as a result of said monitoring;

identifying datasets in said computer system memory that must be relocated to resolve said memory performance conflicts;

transporting said identified datasets to alternate memory storage locations to resolve said identified memory performance conflicts.
Description
 


FIELD OF THE INVENTION

This invention relates to computer system memory and, in particular, to an intelligent storage manager for the data storage apparatus comprising the computer memory system. This intelligent storage manager monitors the data retrieval efficiency of the computer system memory and, in response to memory performance conflicts detected in the computer system memory, modifies either data set storage locations or system storage configurations to optimize the level of service provided to the user.

PROBLEM

It is a problem in data processing systems to efficiently manage the storage of data sets on the computer system memory. Many large data processing systems include a hierarchy of data storage devices that are used by the computer systems contained therein. These data storage devices range from off line magnetic tape cartridge data storage systems to on line direct access storage devices which are directly connected to the computer system, cache memory and extended store (a variation of the computer's main memory). Cache and extended store are fast on line memory for use with data sets that are frequently read by the computer system. It is difficult to dynamically allocate the location of the data sets in this hierarchical arrangement of data storage devices so that the frequency of data retrieval from these data sets matches the location and timewise retrieval efficiency of the data storage device. It is advantageous for computer system performance purposes to place the most frequently retrieved data sets in cache memory and the most infrequently retrieved data sets in the off line magnetic tape cartridge data storage systems.

From surveys done in the data processing field, in 1971 the median data processing system installation had 1.2 gigabytes of direct access data storage device capacity and these data storage devices had 450 data sets. In 1980, these numbers were 23 gigabytes of direct access data storage device capacity and 10,000 data sets. In 1988, surveys revealed that the typical or median installation had 208 gigabytes of direct access data storage device capacity which devices had 45,000 data sets stored therein. There is a continuing significant growth in the capacity of direct access data storage devices in the typical computer system. However, the utilization of these direct access data storage devices has fallen from 80% in the late 1960s to approximately 45% in the present time frame. Thus, there are significant increases in the cost of direct access data storage device memory with a decreasing efficiency in the use of these direct access data storage devices in the hierarchy of data storage devices in the computer memory.

A commonly used metric in the computer system field is that managing the computer system memory requires approximately one data management employee for every 10 gigabytes of data in a computer system installation. This data management person performs the functions of I/O optimization, data set conflict avoidance, conflict resolution, data set placement, and cache tuning in order to provide improved efficiency in the use of the direct access data storage devices in the computer system. Any improvement in the management of the data storage devices therefore reduces the number of data management employees required to manage the computer system memory. Newer data storage device architectures, e.g. distributed databases, client server, etc., present new dimensions to the storage performance management problem.

In addition to the problem of managing performance on a hierarchy of storage devices locally attached to a computer system (or to a computer complex consisting of multiple computer systems), network technologies and emerging computer system architectures have added two additional dimensions to the problem: geographical location and time. In a computer network environment, data can reside on data storage devices located in different geographic locations, e.g. Chicago and New York. Computer users frequently have service requirements which may require data to be made available on the local computer memory for a specific timeframe. A common example is that of a regional sales office which needs access to their client data for periodic update. Performance considerations may dictate that this data be stored locally during update, e.g. in Chicago, and then transferred to a central site, e.g. in New York City, for corporate processing.

This increased complexity of the data processing system and all the elements contained therein renders the data storage device management task beyond the capability of data management personnel. These tasks must be automated and there presently exists no good solution to this problem.

SOLUTION

These problems are solved and a technical advance achieved in the art by the intelligent storage manager for data storage apparatus. This apparatus makes use of a knowledge based (expert) system to monitor in real time the performance of the computer system data storage devices, identify memory performance conflicts and resolve these conflicts by directing the relocation of data sets, or user workloads, to other segments of the computer system memory, and/or reconfiguring the computer system memory.

The subject apparatus includes a number of data bases which are used by the expert system software to manage the computer system data storage devices. One element provided in this apparatus is a set of data storage device configuration data that provides a description of the various data storage devices and their interconnection in the computer system. A second element is a knowledge data base that includes a set of functional rules that describe the data storage device management function. These rules indicate the operational characteristics of the various data storage devices and the steps that need to be taken to provide the various functions required to improve the performance of the computer system memory. In addition to these two data bases, this apparatus includes various mathematical models to determine data relating to the operation of the data storage devices. This additional data either cannot be directly measured or is not cost effective to measure. The additional data is used to assist in the identification of conflicts within the data storage apparatus. These models are also used to predict the effect of various data management actions and proposed solutions to identified conflicts. The models are also used to manage the addition of data storage devices to the computer system memory by identifying a plan to migrate existing data sets to these additional data storage devices in a manner to optimize the service level of the data processing system. Another element included in this intelligent storage manager is an automatic system configuration definition apparatus. This apparatus collects data generated within the data processing system to identify the data storage devices and their interconnection, without requiring user input. The determined configuration data is used to populate the configuration database.
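
The text above does not say which mathematical models are used. As one illustration of how a model can predict the effect of a proposed data set move before it is carried out, the sketch below uses a textbook M/M/1 queueing approximation. The choice of M/M/1, the function names, and all sample figures are assumptions made for illustration and are not taken from the patent.

```python
def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """Mean response time (seconds) of an M/M/1 queue.

    arrival_rate: I/O requests per second reaching the device.
    service_time: mean seconds the device needs per request.
    """
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        return float("inf")  # saturated device: the queue grows without bound
    return service_time / (1.0 - utilization)


def predict_move(rate_src, rate_dst, dataset_rate, service_time):
    """Predicted (source, destination) response times after moving one data set."""
    return (
        mm1_response_time(rate_src - dataset_rate, service_time),
        mm1_response_time(rate_dst + dataset_rate, service_time),
    )


# Hypothetical figures: 20 ms service time, one busy and one quiet device.
before = (mm1_response_time(40, 0.020), mm1_response_time(10, 0.020))
after = predict_move(40, 10, dataset_rate=15, service_time=0.020)
```

With these invented numbers, the busy device's predicted response time falls from about 100 ms to about 40 ms, while the quiet device rises from 25 ms to about 40 ms. A manager could weigh that kind of trade-off against the service levels it is asked to meet.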

Operating on these two data bases, in an interactive manner with the mathematical models, is an expert system, which is a computer program that uses explicitly represented knowledge and computational inference techniques to achieve a level of performance comparable to that of a human expert in that application area or domain. The expert system includes an inference engine, or technical equivalent, that executes the rules which comprise the basic intelligence of the expert system. The inference engine allows a virtually infinite number of rules or conditional statements to be chained together in a variety of ways. The inference engine manages the rules and the flow of data through those rules. Thus, the expert system uses the data stored in the configuration data base and the knowledge data base, as well as data from monitoring of the actual performance of the data storage devices in the computer system and data derived from the models, to analyze, on a dynamic basis, the performance of the computer system data storage devices.
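
The chaining of rules by an inference engine can be pictured with a minimal forward-chaining loop. This is only a sketch of the general technique, not the engine described in the patent. The rule names, the fact keys and the 0.7 utilization threshold are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, float]
# A rule is (name, condition over the facts, action returning new facts).
Rule = Tuple[str, Callable[[Facts], bool], Callable[[Facts], Facts]]


def run_rules(facts: Facts, rules: List[Rule]) -> List[str]:
    """Fire each rule at most once, repeating passes until a pass fires nothing."""
    fired: List[str] = []
    changed = True
    while changed:
        changed = False
        for name, condition, action in rules:
            if name in fired or not condition(facts):
                continue
            facts.update(action(facts))  # new facts may enable other rules
            fired.append(name)
            changed = True
    return fired


# Hypothetical rules, deliberately listed out of dependency order.
rules: List[Rule] = [
    ("relocate_if_conflict",
     lambda f: f.get("conflict", 0) == 1,
     lambda f: {"relocate": 1}),
    ("flag_busy_device",
     lambda f: f.get("utilization", 0) > 0.7,
     lambda f: {"conflict": 1}),
]

facts: Facts = {"utilization": 0.85}
fired = run_rules(facts, rules)
```

Here the second rule fires first and asserts a conflict, which in turn lets the first rule fire on the next pass. The order of firing comes from the data, not from the order in which the rules were written.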

The expert system identifies memory performance conflicts such as a performance degradation of the computer system data storage devices due to a plurality of computers in the computer system attempting to access a common data storage device, or access to a given data set exceeding acceptable service levels. The expert system identifies the performance conflict as well as the data sets stored on these data storage devices related to this conflict. Once the data sets and user workload involved in the performance conflict are identified, the expert system determines alternative memory storage locations for these data sets and activates various software routines to transport these conflict data sets to the alternative data storage locations. Alternatively, the expert system relocates the processing location of the user workload. The relocation of these conflict data sets and/or user workload resolves the memory performance conflict and improves the retrievability of the data stored on these data storage devices. By performing the conflict identification and resolution on a dynamic real time basis, the data storage devices of the computer system are operated in a more efficient manner and the retrievability of the data stored on these data storage devices is significantly improved without the need for data management personnel. The intelligent storage manager continuously monitors and modifies the performance of the data storage devices associated with the computer system, even if the data storage devices are geographically dispersed. The intelligent storage manager identifies and resolves conflicts in a manner that accounts for the temporal, spatial and hierarchical characteristics of the data processing system.
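
One simple way to picture the relocation step is an exchange of data sets between the most and least utilized devices, as the claims describe. The sketch below is an assumption-laden illustration. It measures load as I/O requests per second and picks the swap that leaves the two devices most evenly loaded. The device names, data set names and rates are invented, and the selection criterion is not taken from the patent.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# device -> {data set name -> I/O requests per second}; all values are invented.
Placement = Dict[str, Dict[str, float]]


def device_load(datasets: Dict[str, float]) -> float:
    return sum(datasets.values())


def pick_exchange(placement: Placement) -> Optional[Tuple[str, str, str, str]]:
    """Return (busy_device, dataset_a, quiet_device, dataset_b), or None.

    Picks the swap that most narrows the load gap between the busiest and
    quietest device; returns None if no swap narrows it.
    """
    busy = max(placement, key=lambda d: device_load(placement[d]))
    quiet = min(placement, key=lambda d: device_load(placement[d]))
    gap = device_load(placement[busy]) - device_load(placement[quiet])
    best, best_gap = None, gap
    for a, b in product(placement[busy], placement[quiet]):
        shift = placement[busy][a] - placement[quiet][b]
        new_gap = abs(gap - 2 * shift)  # load gap if a and b trade places
        if new_gap < best_gap:
            best, best_gap = (busy, a, quiet, b), new_gap
    return best


def apply_exchange(placement: Placement, busy: str, a: str, quiet: str, b: str) -> None:
    placement[quiet][a] = placement[busy].pop(a)
    placement[busy][b] = placement[quiet].pop(b)


placement = {
    "DASD-A": {"payroll": 30.0, "orders": 12.0, "mail": 4.0},
    "DASD-B": {"archive": 2.0, "logs": 6.0},
    "DASD-C": {"index": 15.0, "tmp": 5.0},
}
choice = pick_exchange(placement)
if choice:
    apply_exchange(placement, *choice)
```

With these figures the sketch swaps "payroll" and "logs", narrowing the gap between the two devices from 38 to 10 requests per second. A fuller treatment would also validate the move against a model and against service-level rules before transporting any data.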

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 illustrates, in block diagram form, the architecture of the intelligent storage manager;

FIG. 2 illustrates, in block diagram form, the architecture of the cache tuning and DASD performance optimizer elements of the intelligent storage manager;

FIGS. 3 and 4 illustrate the elements that comprise the cache tuning and DASD performance optimizer elements of the intelligent storage manager;

FIG. 5 illustrates an example of the objects used by the expert system model apparatus.

DETAILED DESCRIPTION

The intelligent storage manager for data storage apparatus functions to monitor the performance of a computer system memory and manage this computer system memory in order to maximize performance. This function is accomplished by the use of a knowledge based (expert) system that monitors the activity of the various data storage devices in the computer system memory, identifies memory performance conflicts and takes steps to resolve these conflicts in order to improve the performance of the computer system memory. This apparatus includes a number of databases which are used by the expert system software to manage the data storage devices.

DATA PROCESSING SYSTEM ARCHITECTURE

FIG. 1 illustrates a typical computer system in block diagram form, which computer system includes the intelligent storage manager for data storage apparatus. This computer system consists of a mainframe 101 which can be any of a number of large computer systems such as an IBM 390 series, or an Amdahl 5990 computer. Each of these mainframe computers can process on the order of 10-20 MIPS per processor and is typically equipped with a significant amount of disk storage memory, typically on the order of 200 gigabytes of data storage capacity. Frequently such systems are configured into multi-system, multiple CPU complexes which share disk storage among the individual systems, and/or data across networks. For the purpose of illustration, assume that mainframe 101 is an IBM 3084 type of computer running an operating system 102 that typically is the MVS/XA operating system. These products are well known in the field and need no further explanation herein. Mainframe 101 is interconnected via a plurality of channel processors 110-119 to the data storage devices associated with mainframe 101. For the purpose of illustration, a number of disk storage devices 150-0 to 150-15 are illustrated in FIG. 1. Each of these disk drives is a commercially available device such as the IBM 3390 class device disk drive system or any other one of the multitude of similar devices. In a typical configuration, sixteen disk drives 150-0 to 150-15 are connected via associated data channels 140-0 to 140-15 to a device controller 130 which is typically an IBM 3990 class device type of controller. Device controller 130 functions to multiplex the data being transferred to and from the plurality of disk drives 150-0 to 150-15 onto a channel path 120 which interconnects device controller 130 with channel processor 110. Device controller 130 includes a cache memory 160, including a non-volatile store 161 to provide greater data transfer efficiency between mainframe 101 and the data storage devices 150-*, as is well-known in disk technology.

The computer system can be equipped with a plurality of channel processors illustrated in FIG. 1 as devices 110 to 119. Each channel processor 110 is interconnected with mainframe 101 and provides mainframe 101 with access to data storage devices connected to the channel processor. A multitude of data storage devices can be connected to mainframe 101 in this manner. For example, channel processor 119 is connected to a disk array 139 data storage device. The other data storage devices include tape drives, optical disk, solid state disk and other memory devices. While this block diagram illustrates a general computer system configuration, it in no way should be construed as limiting the applicability of the present invention. In particular, various memory configurations can be used to interconnect mainframe 101 with various data storage devices. In addition, the data processing system can include a plurality of computer systems, interconnected via network 191. The intelligent storage manager for data storage apparatus 103 is included in mainframe 101 and operator terminal 170 is connected via data link 190 to mainframe 101.

MEMORY CONFIGURATION AND MANAGEMENT

From a data management standpoint, the computer memory illustrated in FIG. 1 is viewed as a data storage subsystem 180 that consists of sixteen volumes of disk storage 150-0 to 150-15. Each volume of this data storage subsystem contains a plurality of data sets or data files. These data sets or data files are stored on one or more corresponding volumes in an ordered fashion by dividing the volume up into sub-elements. For example, in the 8380E disk drive, each volume consists of 1770 cylinders each of which includes fifteen data tracks. Each data track can store 47K bytes of data.
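As a rough check on the figures above, the capacity of one volume and of the sixteen-volume subsystem can be computed directly. The following is only a sketch: it assumes that "47K bytes" means 47,000 bytes per track, and the constant and function names are illustrative and do not appear in this disclosure.

```python
# Geometry figures quoted above for the 8380E disk drive.
CYLINDERS_PER_VOLUME = 1770
TRACKS_PER_CYLINDER = 15
BYTES_PER_TRACK = 47_000  # assumption: "47K bytes" read as 47,000 bytes
VOLUMES_PER_SUBSYSTEM = 16  # disk drives 150-0 to 150-15


def volume_capacity_bytes() -> int:
    """Capacity of one volume: cylinders x tracks per cylinder x bytes per track."""
    return CYLINDERS_PER_VOLUME * TRACKS_PER_CYLINDER * BYTES_PER_TRACK


def subsystem_capacity_bytes() -> int:
    """Capacity of the sixteen-volume data storage subsystem."""
    return VOLUMES_PER_SUBSYSTEM * volume_capacity_bytes()
```

Under this assumption one volume holds about 1.25 gigabytes and the sixteen-volume subsystem about 20 gigabytes.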

In operation, mainframe 101 accesses data stored on the data volumes in data storage subsystem 180 by transmitting a data retrieval request via the channel processor 110 associated with the data volume (e.g., 150-0) that contains the requested data. This data retrieval request is transmitted by channel processor 110 over channel path 120 to device controller 130, which in turn manages the retrieval of the data from the designated data storage device 150-0. This data retrieval is implemented by device controller 130 monitoring the rotation of disk drive 150-0 to identify the beginning of the data set stored on a track of the disk drive 150-0. Once the disk has rotated sufficiently to place the beginning of the requested data set under the read/write heads of the disk drive 150-0, the data is transferred from disk drive 150-0 over data link 140-0 to device controller 130. Device controller 130 temporarily stores the retrieved data in cache memory 160 for transmission to mainframe 101 via channel path 120 and channel processor 110.

It is obvious that this is an asynchronous data transfer in that mainframe 101 transmits data retrieval requests to various device controllers, where these data retrieval requests are processed as the data sets become available. If too many data retrieval requests are made via one route, such as concentrating the requests on a specific volume or small subset of volumes, the performance of the computer system memory significantly degrades. Likewise if the inherent performance characteristics of the device are inadequate for the service level required for a given data set by a given user, a performance conflict exists. It is advantageous to distribute the data retrieval requests throughout the various channel processors, channel paths and disk drives to optimize the service time. Thus, an intelligent distribution of the data retrieval requests among the various volumes on each data storage subsystem and across all of the data storage subsystems significantly improves the memory performance. There are many choices available to improve memory performance such as moving data sets from one volume to another or moving a volume from one data storage subsystem to another, etc. It should also be noted that for purposes of this patent disclosure, the terms "string" and "subsystem" can be used interchangeably.
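The concentration of requests on a small subset of volumes described above can be illustrated with a short sketch that flags volumes receiving a disproportionate share of the data retrieval requests. The function name and the 25 percent threshold are hypothetical choices made for illustration only; the disclosure does not specify a threshold.

```python
from collections import Counter
from typing import Iterable, List


def overloaded_volumes(requests: Iterable[str], max_share: float = 0.25) -> List[str]:
    """Return the volumes that receive more than max_share of all requests.

    requests is a sequence of volume identifiers, one per data retrieval
    request. max_share is an illustrative threshold, not a disclosed value.
    """
    counts = Counter(requests)
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(v for v, n in counts.items() if n / total > max_share)
```

For example, if six of ten requests are directed to volume 150-0 and two each to volumes 150-1 and 150-2, only volume 150-0 is flagged.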

Another such improvement is the use of a cache memory, such as cache 160, which enables the more frequently accessed volumes to load data in cache memory 160 for more rapid data retrieval times. The volumes are selected as being suitable for caching based on the normal access pattern of the data sets that are stored therein. Any input and output accesses to the data that use the cache memory 160 benefit from the cache memory 160 by speeding up the data transfer time. In order to preserve data integrity, any writes that are performed cause the data to be written directly to the disk, or to a form of cache commonly called "non-volatile storage" 161 and therefore these operations gain benefit from a cache memory 160 depending on the specifics of each case. A good volume for caching is one that has a high read rate, high activity and input/output accesses concentrated in a small number of tracks rather than scattered across the entire disk. Volumes that satisfy these conditions have a high hit ratio and therefore the input/output reads often find the desired track in cache 160 and do not have to spend the time retrieving the data set from the disk as described above. Often, data storage subsystems contain no volumes that are cachable or the activity of volumes that are cachable accounts for only a fraction of the total input/output activity of the data storage subsystem. Newer storage technologies, such as disk array, data "farms", data servers, or single level storage, are represented by the disk drive array 139. These technologies present a functional image of DASD volumes to the channel 119. The physical storage of the data may vary considerably from the traditional DASD storage architecture, frequently involving a hierarchy of devices with varying performance characteristics. The performance management of these types of devices is the same as managing that of the mainframe 101. In fact the controllers for these types of devices are typically powerful computer processors. The central controller for such a device should be viewed as a mainframe 101 with attached physical storage devices.

CACHE PERFORMANCE IMPROVEMENT

In order to gain full benefit of the cache, it is possible to reorganize the data storage subsystem 180 in such a way as to increase the amount of input/output accesses that go to specific cache volumes. A significant problem is to reorganize the data storage subsystem 180 on a data set level so as to separate the possible caching data sets from data sets that would degrade the cache performance and, in the process, create entire volumes that are suitable for storage in the cache memory 160. It is obvious that restructuring the data set organization can entail a complete restructuring of the volumes of the data storage subsystem 180 which involves an enormous number of data set moves to accomplish. Instead of a complete reorganization of the data storage subsystem 180, a better approach is to identify places on the subsystem where conflicts exist between possible cache data and data that is detrimental to the cache 160. The conflict exists where there is good cache data and bad cache data on the same volume: if the volume were cached in order to obtain the benefit of caching the good data, the bad data would cause so much interference in the cache 160 that the overall performance would actually deteriorate. The intelligent storage manager locates conflicts and then moves the smallest number of data sets in order to resolve the conflict. Using this approach, a small number of moves provides a large increase in performance and better utilization of the cache memory 160.
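The conflict described above, good cache data and bad cache data on the same volume, can be sketched as follows. The sketch assumes each data set has already been classified as cache-friendly or not. The rule of moving whichever group is smaller is a simple illustrative heuristic for "moving the smallest number of data sets" and is not presented as the method of the intelligent storage manager; the function and parameter names are hypothetical.

```python
from typing import Dict, List


def plan_moves(volumes: Dict[str, Dict[str, bool]]) -> Dict[str, List[str]]:
    """Map each conflicted volume to the data sets to move off that volume.

    volumes maps a volume name to {data set name: True if cache-friendly}.
    A volume is in conflict when it holds both kinds of data set.
    """
    plan: Dict[str, List[str]] = {}
    for volume, data_sets in volumes.items():
        good = sorted(name for name, ok in data_sets.items() if ok)
        bad = sorted(name for name, ok in data_sets.items() if not ok)
        if good and bad:  # conflict: both kinds share the volume
            # Illustrative heuristic: move the smaller group; on a tie,
            # move the data that is detrimental to the cache.
            plan[volume] = bad if len(bad) <= len(good) else good
    return plan
```

A volume holding only good cache data or only bad cache data is left untouched, since no conflict exists on it.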

In order to determine which data sets are possible candidates for caching and which data sets should not be cached, the input/output activity of the data storage subsystem 180 is monitored over a period of time and data is recorded on a data set basis. This stored data consists of the input/output activity (the number of input and output operations to the data set), the read percentage (the percentage of operations that were reads), and the locality of reference measure, which is defined as the number of input and output operations to the data set divided by the number of unique tracks covered by those operations over the period monitored. Stored along with these numbers are the name of the data set, its size in bytes and a value called the reason code that flags data sets that are not suitable for moving. This data is used by the intelligent storage manager in identifying the data sets that are to be moved and the target location for these data sets.
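The three measures defined above can be computed from a recorded trace of input/output operations. The following sketch assumes a hypothetical trace format of (data set name, track number, is-read flag) tuples; the class and function names are illustrative and are not part of the disclosed apparatus.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass
class DataSetStats:
    io_count: int                  # number of input and output operations
    read_percentage: float         # percentage of operations that were reads
    locality_of_reference: float   # operations divided by unique tracks touched


def summarize(trace: Iterable[Tuple[str, int, bool]]) -> Dict[str, DataSetStats]:
    """Compute the per-data-set measures over the monitored period."""
    ios: Dict[str, int] = {}
    reads: Dict[str, int] = {}
    tracks: Dict[str, Set[int]] = {}
    for name, track, is_read in trace:
        ios[name] = ios.get(name, 0) + 1
        reads[name] = reads.get(name, 0) + (1 if is_read else 0)
        tracks.setdefault(name, set()).add(track)
    return {
        name: DataSetStats(
            io_count=ios[name],
            read_percentage=100.0 * reads[name] / ios[name],
            locality_of_reference=ios[name] / len(tracks[name]),
        )
        for name in ios
    }
```

For example, a data set with four operations, three of them reads, touching two unique tracks has an input/output activity of 4, a read percentage of 75 and a locality of reference measure of 2.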

INTELLIGENT STORAGE MANAGER FOR DATA STORAGE APPARATUS

The intelligent storage manager consists of a set of integrated data storage management functions that use knowledge or expert system techniques to monitor and automatically assist in the control of performance of the direct access data storage sub-systems. FIG. 2 illustrates the principal components of the intelligent storage manager 103. Intelligent storage manager 103 interfaces with and runs on the MVS operating system 102 of mainframe 101. In multiple system complexes, at least the monitoring activity 171 must be running on all systems. Data collected from all systems is processed on one of the systems by the complete cache volume creation 105 and DASD performance optimizer 106. The MVS operating system 102 provides device management, file management, test management, processor management, communication management and a number of other administrative functions to the intelligent storage manager 103.

In order to clarify terminology, the various aspects of knowledge systems are discussed. Knowledge systems are computer based systems that emulate human reasoning by using an inference engine or technical equivalent to interpret the encoded knowledge of human experts that is stored in a knowledge base. If the domain of the knowledge base is sufficiently narrow and a sufficiently large body of knowledge is properly coded in the knowledge base, then the knowledge system can achieve performance that matches or exceeds the ability of a human expert. In such a case, the knowledge system becomes an expert system.

One step in building an expert system is obtaining and encoding the collective knowledge of human experts into the machine readable expert system language. The specific implementation details of this encoding step are largely a function of the particular syntax of the expert system programming language selected. As an example, one such expert system programming language is PROLOG which is described in the text "Programming In Prolog" by W. F. Clocksin and C. S. Mellish, Springer Verlag Inc., New York, N.Y. (1981). Basically, the expert system contains a set of rules and instructions on how the rules are to be applied to the available facts to solve a specific problem.
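The application of rules to available facts can be illustrated with a minimal forward-chaining sketch. This is written in Python rather than PROLOG and is not the encoding used by the intelligent storage manager; the rule format, the function name and the example facts are all hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (conditions that must all hold, conclusion to add).
Rule = Tuple[FrozenSet[str], str]


def infer(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply the rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules in the spirit of the cache discussion above.
EXAMPLE_RULES: List[Rule] = [
    (frozenset({"high_read_rate", "high_locality"}), "cache_candidate"),
    (frozenset({"cache_candidate", "shares_volume_with_bad_data"}), "conflict"),
]
```

Given the facts "high_read_rate", "high_locality" and "shares_volume_with_bad_data", the first rule derives "cache_candidate" and the second then derives "conflict".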

In the intelligent storage manager 103, expert system techniques are used to monitor the performance of the computer system memory. The Configuration module 172 is a key element of the intelligent storage manager 103. The Configuration module 172 analyzes the system configuration of memory devices based on definitions in the operating system 102 on every computer in the complex(s) and on queries sent to every element in the configuration to determine its nature. It builds a model configuration which is presented to an administrator for review and approval. If the administrator desires to change the configuration, he/she can do so through an interactive computer session. The facts gathered through this monitoring operation are then used to identify modifications to the organization of the computer system memory as well as the data stored therein to improve the performance of the computer system memory. FIG. 2 illustrates some of the various routines or subsystems that are implemented in intelligent storage manager 103.

Included are cache volume creation function 105, and DASD performance optimizer function 106. Cache subsystems often end up in a state where there are no volumes on the DASD subsystem suitable for caching, or where the number of I/Os to cached volumes account for only a fraction of the total number of I/Os to the data storage subsystem. The cache volume creation function 105 creates volumes (205) on the data storage subsystem that are suitable for caching. Cache volume creation 105 monitors (201) the performance of the cache memory 160 and identifies where cache