Description
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it
appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
The present invention relates generally to information processing environments, and more particularly to systems requiring management, storage, and retrieval of diverse or non-uniform information.
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is
an organized collection of related information stored as "records" having "fields" of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee,
such as name, home address, salary, and the like. A Database Management System (DBMS) is the computer system that allows users to exploit the power of databases.
Traditional databases, such as ones employing the well-known relational database model approach, typically employ a separate database table for each kind of data to be managed. The basic assumption underlying the approach is that the data to be
managed consists of a large number of very similar types of data. A corporation may, for example, have hundreds of "Customer" records, or thousands of "Invoice" records. Still further, the "Invoice" records may beget tens of thousands of "Line Item"
records. The central notion underlying this approach is that each collection of records (i.e., "table") stores similar information--each having essentially fixed contents. In a customer table, for instance, each customer record would include a First
Name, a Last Name, an Address, and the like. Each of these information "types" is, in turn, stored in a particular "field" of the record (e.g., Address field of the Customer record). The general construction and operation of a database management
system, including relational ones, is known in the art. See e.g., Date, C., An Introduction to Database Systems, Volumes I and II, Addison Wesley, 1990; the disclosures of which are hereby incorporated by reference.
Employing the relational database approach of how information is organized or modeled, a system designer would typically implement such a system with a particular storage mechanism and optimizations thereof. Designers employing the popular
relational database model approach would, for instance, proceed by creating a separate table or "relation" for each type of information (e.g., Customer table, Invoice table, and the like) and then define "links" between the tables, for establishing how
information is related. All the while, the assumption is being made that one is dealing with a relatively small set of tables, with each table having records storing similar information.
Although the approach has served the needs of business users well, it is problematic when applied to environments requiring the modeling of non-uniform or dissimilar information. For the data storage needs of home users, for example, the problem
exists of how one may effectively store items which are not similar and may in fact be quite dissimilar. "Effectively store," in this context, means a storage mechanism which preserves the functionality/utility normally associated with traditional
systems, including searching, "browsing," performing "lookups," and the like, but provides this for items which are not similar. A home user may, for instance, want to manage information about life insurance, car loans, car service, personal health, and
so forth. For each type of information, the user may have only a few "records" and, often, only one record. Even in instances of multiple records of one type, the number of such records will generally be low, typically fewer than 20. Because
the information managed by home users is generally very diverse, the traditional approach of dividing the information to be modeled into a small number of similar-information tables simply fails. If this diverse information were to be stored by
grouping together similar information into records to be stored in a single table, the information would require dozens of tables, or more. At best, the approach is wasteful of resources.
Although the basic problem has been stated in terms of the disparate information desired to be managed by a home user, the problem has implications for other environments. Consider, for example, the task of automating a particular office
environment. The office may track information about utility bills, rent, insurance, clients, and so forth. The information managed by offices has, to a large extent, not been automated, since no practical mechanism for handling such diverse information
has existed. Instead, the office would settle for automating one or two particular aspects of its information, such as an accounts payable system for tracking invoices or a payroll system for tracking employee salaries (using the above-described
traditional storage architecture and methodology). At best, only a few types of office information have been automated, with each type typically being managed by a particular dedicated system (e.g., accounts payable system).
For environments where the different styles of records number in the dozens or even hundreds, whether it be information of a home user or office information, the traditional approach to information storage/management is totally
impractical. True "office automation" using a traditional database system would, as in the case of the home user, require a database file or table for each of these different style records--each type requiring maintenance of a separate file on the
computer's storage disk. As each table has a certain amount of overhead associated with it, managing such a large number of tables would waste system resources and degrade system performance to an unacceptable level. All told, such an approach is
highly inefficient and, thus, not practical to implement.
One approach to addressing the problem is to just store the disparate information in a single, large "BLOB" (Binary Large Object) field. This approach is also problematic, however. In a BLOB field, the stored data is completely
unstructured--simply existing as a block of bytes. In this unstructured state, the information cannot participate in conventional structured operations, such as "views" and "lookups." Admittedly, some rudimentary methods may be applied to a BLOB field,
such as generic or brute-force text searching. The results are limited. A query of Last Name='Freund' and State='California', for instance, is simply not supported. By and large, the BLOB field approach does not serve to satisfy typical data
processing needs.
What is needed are a system and methods which provide for the efficient storage of non-similar information yet, at the same time, provide the database tools that users have come to expect. In particular, such a system would store records of
various styles or formats without incurring substantial penalty in terms of system performance or use of resources. At the same time, such a system would allow the information to be processed using traditional database tools and methodology, such as
indexing/searching, browsing, performing queries, and the like. The present invention fulfills this and other needs.
SUMMARY OF THE INVENTION
The present invention recognizes a need for providing efficient storage of non-similar information. Accordingly, the present invention provides a system and methods for storing records of various styles and/or formats in a fashion which does not
incur substantial overhead or performance penalty. Additionally, traditional database functionality (e.g., lookups and queries) is preserved.
According to one embodiment, a Databank is provided as a single, central storage mechanism for all user information, regardless of a particular format (i.e., data record type) in which the information may exist. The Databank includes a Databank
table comprising a plurality of records having "static" fields and a single "dynamic" field (comprising various logical fields or "subfields"). These are used in conjunction with one or more Form Definitions having field descriptors. Using the Databank
approach of the present invention, new types of information (e.g., new styles of records) may be added to the system without restructuring the stored data.
The static fields store descriptor information common to all data records and are thus core fields necessary for each of the data records, irrespective of what type of information a given data record stores. The actual data from the non-uniform
data record, on the other hand, is stored in the "dynamic" field. Actual user data are stored in the Databank in a structured, pre-defined manner using the logical fields of the dynamic field. The dynamic field may be implemented on any database engine
which supports at least one free-form data field, such as a BLOB or memo field. Even with databases that do not support BLOBs or memo fields, the dynamic field may be implemented using a single large text string field.
In order for the system to correctly interpret the dynamic and static contents, every record type is associated with a corresponding form description (Form Definition) that contains field descriptors and other information relevant to interpreting
the dynamic and static contents. The field descriptors thoroughly describe the contents of particular dynamic logical fields and how their contents are to be interpreted. Information stored includes, for example, default values, screen presentation
information (e.g., relative size, position, orientation, and the like), field type (e.g., logical, alphanumeric, or the like), pick list information, and "hints" as to how the data should be presented. Thus, the Databank may be viewed as comprising a
central Data Repository for storing the actual user data and also comprising a table of descriptors describing how to interpret data in the Data Repository.
Methods are described for reading data from and writing data to the Databank storage mechanism. In a preferred embodiment, the methods are implemented in a fashion so that the client (i.e., software application requiring data storage and
processing) is unaware of storage of information in dynamic logical fields. The client simply requests retrieval (or storage) of information, whereupon the system automatically retrieves the information from a conventional database field (if found
there) or a dynamic logical field (if not found in the conventional fields). Thus, the Databank storage subsystem may be implemented in a manner which is transparent to clients and, at the same time, store non-uniform information in a highly efficient
manner.
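The transparent lookup just described can be sketched in a few lines. This is a minimal illustration only, not the implementation of the preferred embodiment: the record layout (a dict with "static" and "dynamic" parts) and the names get_field and put_field are assumptions introduced for the example.

```python
# Hypothetical sketch: a record holds conventional ("static") fields plus a
# set of logical fields carried inside the single dynamic field.

def get_field(record, name):
    """Return the value from a static field if present, else from the dynamic field."""
    if name in record["static"]:
        return record["static"][name]
    return record["dynamic"].get(name)  # None when the field is stored nowhere


def put_field(record, name, value):
    """Store into a static field if one exists by that name, else into the dynamic field."""
    if name in record["static"]:
        record["static"][name] = value
    else:
        record["dynamic"][name] = value


record = {
    "static": {"owner": "Fred", "form": "car_service"},
    "dynamic": {"Vehicle Name": "Footmobile"},
}
put_field(record, "Mileage", "12000")  # no static field, so it lands in the dynamic field
put_field(record, "owner", "Wilma")    # static field exists, so it is updated in place
```

The caller never states where a value lives; the two helpers decide, which is the sense in which the storage is transparent to the client.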
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a block diagram of a computer system in which the present invention may be embodied.
FIG. 1B is a block diagram of a software system of the present invention, which includes a Databank subsystem for the storage of non-uniform data records.
FIG. 2 is a block diagram illustrating the storage of non-uniform data records from separate tables as Databank data records in a single table.
FIG. 3 is a bit map screen shot illustrating an Address Manager in the system of the present invention, which allows a user to store family addresses, personal addresses, and business addresses.
FIG. 4 is a block diagram illustrating a Family Page in the system of the present invention, which employs hints stored in the Form Description (accompanying the Databank fields) to build the actual form on screen.
FIGS. 5A-E are bit map screen shots illustrating use of a single browser in the system of the present invention to find information in any data record, regardless of its particular "record type."
FIG. 6A is a block diagram illustrating the functional modules of a Databank system constructed in accordance with the present invention.
FIG. 6B is a block diagram illustrating the Databank storage module, which includes a Descriptor Table ("Form Definition") and a Data Repository.
FIG. 7 is a flow chart illustrating a GetDBankData method of the present invention for retrieving data stored in the Databank.
FIGS. 8A-B comprise a flow chart illustrating a PutDBankData method of the present invention, for storing data in the Databank.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
The following description will focus on the presently preferred embodiment of the present invention, which is operative in an end-user application running under the Microsoft® Windows environment. The present invention, however, is not
limited to any one particular application or any particular environment. Instead, those skilled in the art will find that the system and methods of the present invention may be advantageously applied to a variety of system and application software,
including database management systems, word processors, spreadsheets, and the like. Moreover, the present invention may be embodied on a variety of different platforms, including Macintosh, UNIX, NextStep, and the like. Therefore, the description of the
exemplary embodiments which follows is for purposes of illustration and not limitation.
System Hardware
The invention may be embodied on a computer system such as the system 100 of FIG. 1A, which comprises a central processor 101, a main memory 102, an input/output controller 103, a keyboard 104, a pointing device 105 (e.g., mouse, track ball, pen
device, or the like), a display device 106, and a mass storage 107 (e.g., hard or fixed disk, optical disk, magneto-optical disk, or flash memory). Processor 101 includes or is coupled to a cache memory 109 for storing frequently accessed information;
memory 109 may be an on-chip cache or external cache (as shown). Additional input/output devices, such as a printing device 108, may be included in the system 100 as desired. As shown, the various components of the system 100 communicate through a
system bus 110 or similar architecture. In a preferred embodiment, the system 100 includes an IBM PC-compatible personal computer, available from a variety of vendors (including IBM of Armonk, N.Y.).
System Software
A. Overview
Illustrated in FIG. 1B, a computer software system 120 is provided for directing the operation of the computer system 100. Software system 120, which is stored in system memory 102 and on disk memory 107, includes a kernel or operating system
(OS) 140 and a windows shell 150. One or more application programs, such as client application software 145, may be "loaded" (i.e., transferred from storage 107 into memory 102) for execution by the system 100. As shown, at least one windows application
employs a Databank storage subsystem 130 of the present invention, which includes a Databank Engine 135 for storing non-uniform data 131 in a Databank table 133. As will be described below, the Databank storage subsystem 130 provides an
improved storage mechanism of the present invention; at the same time, it preserves the full functionality of a traditional database.
System 120 includes a user interface (UI) 160, preferably a Graphical User Interface (GUI), for receiving user commands and data. These inputs, in turn, may be acted upon by the system 100 in accordance with instructions from operating module
140, windows 150, and/or client application module(s) 145. The UI 160 also serves to display the results of operation from the OS 140, windows 150, and application(s) 145, whereupon the user may supply additional inputs or terminate the session.
Although shown conceptually as a separate module, the UI is typically provided by interaction of the application modules with the windows shell, both operating under OS 140. In a preferred embodiment, OS 140 is MS-DOS and windows 150 is Microsoft®
Windows; both are available from Microsoft Corporation of Redmond, Wash. Databank 130 will now be described in further detail.
Databank storage subsystem
A. Overview
The task of storing various "pieces" of information has remained a problem for database designers. Although such information may be "lumped" into a single text field, the ability to further process the data (e.g., performing queries, indexed
lookups, and the like) is largely lost, as one must instead resort to brute-force text string searches. If the information is stored in a text file, the functionality of other database tools is lost as well. File locking and other concurrency controls,
for instance, are not available.
The Databank of the present invention provides a single, central storage for all user information, including non-uniform data records. According to one embodiment of the present invention, the Databank includes a Databank table comprising a
plurality of records having "static" fields and a single "dynamic" field (comprising various logical fields or "subfields"). These are used in conjunction with a Form Definition having field descriptors. Using the Databank approach of the present
invention, new types of information (e.g., new styles of records) may be added to the system without restructuring the stored data.
B. Static and Dynamic Fields and Field Descriptors (Form Definition)
The static fields store descriptor information common to all data records and are thus core fields necessary for each of the data records, irrespective of what type of information a given data record stores. The actual data from the non-uniform
data record, on the other hand, is stored in the "dynamic" field which functions as the Data Repository. Actual user data are stored in the Databank in a structured, pre-defined manner using the logical fields of the dynamic field. The dynamic field
may be implemented on any database engine which supports at least one free-form data field, such as a BLOB or memo field. Even with databases that do not support BLOBs or memo fields, the dynamic field may be implemented using a single large text string
field.
In order for the system to correctly interpret the dynamic and static contents, every record type is associated with a corresponding form description (Form Definition) that contains field descriptors and other information relevant to interpreting
(e.g., reading, saving, displaying, and the like) the dynamic and static contents. The field descriptors thoroughly describe the contents of particular dynamic logical fields and how their contents are to be interpreted. Information stored includes,
for example, default values, screen presentation information (e.g., relative size, position, and orientation), field type (e.g., logical, alphanumeric, or the like), pick list information, and other "hints" as to how the data should be presented. Thus,
the Databank may be viewed as comprising a central Data Repository for storing the actual underlying data and also comprising a table of descriptors describing how to interpret data in the Data Repository.
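One way to picture a Form Definition is as a list of field descriptors that tell the system how to read a raw stored value. The sketch below is illustrative only: the class name FieldDescriptor, its attribute names, and the textual encoding of logical values ("yes"/"no") are assumptions, not details given in this description.

```python
from dataclasses import dataclass, field


@dataclass
class FieldDescriptor:
    """Hypothetical descriptor for one logical field of the dynamic field."""
    name: str
    field_type: str = "alphanumeric"   # "alphanumeric" or "logical"
    default: str = ""                  # used when the record stores no value
    pick_list: tuple = ()              # allowed choices, if any
    hints: dict = field(default_factory=dict)  # presentation hints


def interpret(descriptor, raw):
    """Turn the stored text (or None if absent) into a usable value."""
    text = descriptor.default if raw is None else raw
    if descriptor.field_type == "logical":
        return text.strip().lower() in ("true", "yes", "1")
    return text


# An assumed Form Definition for a "patient history" record type.
PATIENT_HISTORY = [
    FieldDescriptor("Last Name", hints={"max_len": 30}),
    FieldDescriptor("Smoker", field_type="logical", default="no"),
    FieldDescriptor("Blood Type", pick_list=("A", "B", "AB", "O")),
]

stored = {"Last Name": "Flintstone"}   # "Smoker" and "Blood Type" were never stored
values = {d.name: interpret(d, stored.get(d.name)) for d in PATIENT_HISTORY}
```

Because the descriptors, not the stored data, carry the type and default information, a new record type can be introduced by adding a new descriptor list without touching existing records.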
FIG. 2 is a block diagram illustrating this approach for storing non-uniform data. For purposes of clarity, a simple example of storing non-uniform text is demonstrated. As shown, non-uniform data records 210 may include non-similar data, such
as personal, vehicle, and home information. Personal data record 211, for instance, may be stored as a Databank record 250, as shown. In particular, the actual user information in the personal data record 211 is stored in a single dynamic field 251.
As shown by the enlarged view 261, the dynamic field 251 comprises a text block storing the data members in a structured, contiguous fashion. For the example of FIG. 2, the format is as follows:
For text, therefore, the format is Field name, followed by "=", followed by actual data, and finally followed by a carriage return character.
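The text layout stated above (field name, "=", data, carriage return) can be written and read back with two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, and the code assumes field names contain no "=" and values contain no carriage return, since the description does not say how such characters would be escaped.

```python
def encode_dynamic(fields):
    """Pack logical fields into one text block: name, '=', data, carriage return."""
    return "".join(f"{name}={value}\r" for name, value in fields.items())


def decode_dynamic(text):
    """Unpack a text block produced by encode_dynamic into a dict of logical fields."""
    fields = {}
    for entry in text.split("\r"):
        if not entry:
            continue  # skip the empty piece after the final carriage return
        name, _, value = entry.partition("=")  # split at the first '=' only
        fields[name] = value
    return fields


block = encode_dynamic({"Last Name": "Flintstone", "City": "Bedrock"})
restored = decode_dynamic(block)
```

Splitting at the first "=" only means a value may itself contain "=" without being truncated.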
C. Generic sort fields (indexing)
Each "Form Definition" allows additional sort criteria to be applied to the data. An exemplary Databank data file includes at least one static field serving as a generic field dedicated to sorting. Multiple indexes may be maintained on the data
file, with the meaning of any particular index being interpreted based on what the Form Definition contains (i.e., the sort may be interpreted in the context of a particular form). For a particular data record, for instance, a first index may be
interpreted as an index on last name (index1=Last Name) and a second index on company name (index2=Company Name). For the next record, however, index 1 may be by service date (index1=service date) and index 2 may be by vehicle name (index2=vehicle
name). In either case, both indexes operate by matching key values, thus correctly indexing a given type of record (despite the fact that the index actually indexes a plurality of different record types). This allows the system to perform lookups and
browsing efficiently across diverse data record types, based on the indexes which are all the while being maintained by a traditional database engine.
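The idea of one physical index whose meaning depends on the Form Definition can be sketched as follows. The table contents, the form names, and the helper names are all hypothetical; a sorted list with binary search merely stands in for whatever index structure the underlying database engine maintains.

```python
import bisect

# Hypothetical Form Definitions: each gives its own meaning to the shared index slots.
FORM_DEFS = {
    "address": {"index1": "Last Name", "index2": "Company Name"},
    "car_service": {"index1": "Service Date", "index2": "Vehicle Name"},
}

# Each Databank record: (record id, form name, index1 value, index2 value).
RECORDS = [
    (1, "address", "Flintstone", "Slate Rock and Gravel"),
    (2, "car_service", "1996-09-12", "Footmobile"),
    (3, "address", "Rubble", ""),
]

# One physical index on the generic index1 field, covering every record type.
INDEX1 = sorted((rec[2], rec[0]) for rec in RECORDS)


def lookup(index, key):
    """Return ids of records whose key value matches exactly."""
    pos = bisect.bisect_left(index, (key,))
    ids = []
    while pos < len(index) and index[pos][0] == key:
        ids.append(index[pos][1])
        pos += 1
    return ids


def index_meaning(form, slot):
    """Report what a generic index slot means for a given form."""
    return FORM_DEFS[form][slot]
```

The index itself only matches key values; it is the Form Definition that says whether a given key is a last name or a service date.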
D. Cross-Reference Records
The data file or Data Repository also includes (optionally) a cross-reference field for storing a cross-reference from one record to another. This functionality is perhaps best explained by example. Consider, for instance, the storage of
information specifying a recurrent "To Do," such as a task to be performed weekly, monthly, or the like. Internally, the system maintains a base record defining the recurring "To Do" item. In addition, the system stores "instance records"
for each particular instance of the "To Do" task. An instance record is employed for each instance so that the user may modify a given instance, such as marking it as completed or rescheduling it to a later time. The individual "To Do" stores a
cross-reference to the corresponding general "To Do" record.
Consider, for instance, a base or master "To Do" record storing:
Desc: English 101
A particular instance of the "To Do" may then store:
Date: Sep. 12, 1996
together with a cross-reference back to the master. A request to retrieve the description (Desc) from the instance record is actually satisfied by referring back to the master record, via the cross-reference. Finally, the instance may store new
information for a particular field, such as:
Desc: English 101 Final Exam
Date: Sep. 12, 1996
whereupon, a request to retrieve the description (Desc) from the instance record is actually satisfied by the instance itself. There is no need to refer back to the master record.
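The fallback from an instance record to its master can be sketched as a short lookup loop. The record layout (a "fields" dict plus an "xref" entry holding the master's id) and the name get_value are assumptions made for illustration.

```python
def get_value(records, rec_id, name):
    """Look up a field, following cross-references until some record supplies it."""
    visited = set()
    while rec_id is not None and rec_id not in visited:
        visited.add(rec_id)            # guards against a cyclic cross-reference
        record = records[rec_id]
        if name in record["fields"]:
            return record["fields"][name]
        rec_id = record.get("xref")    # None for a master record
    return None


records = {
    1: {"fields": {"Desc": "English 101"}, "xref": None},   # master "To Do"
    2: {"fields": {"Date": "Sep. 12, 1996"}, "xref": 1},    # one instance
}
inherited = get_value(records, 2, "Desc")   # satisfied by the master record

records[2]["fields"]["Desc"] = "English 101 Final Exam"
overridden = get_value(records, 2, "Desc")  # satisfied by the instance itself
```

An instance therefore stores only what differs from its master, and an override takes effect simply by being present.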
E. Combining Similar Records
Referring to FIG. 3, combination of similar types of records will now be illustrated. "Similar records" are ones which have some characteristic in common, but which are not efficiently stored using traditional database storage methodology. FIG.
3 illustrates an Address Manager 300 in the system of the present invention; it allows the user to store family addresses, personal addresses, and business addresses. The address types have information in common, such as street names and telephone
numbers. They are, nevertheless, different. Family and personal addresses store, for example, birthday information, while business address does not. Business address, on the other hand, stores a business name and title, while family and personal
address do not. Family address stores relationships, while personal and business do not. Although there are differences among the address types, they are to a large extent the same.
The system of the present invention allows the combination of three different forms but still uses the same display. Thus when the user creates a new address, he or she may create a particular type. Depending on which type is chosen, the system
displays the address with a slightly changed form. The indexes employed to track the various types of addresses can nevertheless be maintained for similar information. For example, an index on Last Name applies to all records, even though the position of
the Last Name field varies from one address record type to another. Information which is not similar, such as a business name, can still be indexed. For those record types for which the index information would make no sense,
the index simply stores an empty string (i.e., index on business name stores an empty string for a personal address record). Thus not only can the database approach be applied to non-similar information, but the approach can also be advantageously
applied to similar, but different, information.
In an exemplary embodiment, the indexes are maintained on the first four "logical fields" stored in the dynamic field. This is achieved through aliasing. In the form definition, for example, Last Name is aliased into index1, First Name is
aliased into index2, and the like. For the business address, the corresponding form definition (descriptor) aliases Business Name into index3; the other two form definitions (i.e., for family and personal) do not contain an alias definition for the
third index (i.e., it is treated as an empty index). In this fashion, indexes can be re-used based on the Form Definition. Moreover, the user can perform lookups across different record types, so long as there is some commonality, such as last name,
which would form the basis for the lookup.
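The aliasing scheme can be sketched as a table that maps each form's logical fields onto the shared index slots, with unaliased slots yielding an empty string. The dictionary ALIASES, the slot names, and the function index_values are hypothetical; only the Last Name, First Name, and Business Name aliases come from the example above.

```python
# Hypothetical alias tables, one per Form Definition.
ALIASES = {
    "family":   {"index1": "Last Name", "index2": "First Name"},
    "personal": {"index1": "Last Name", "index2": "First Name"},
    "business": {"index1": "Last Name", "index2": "First Name",
                 "index3": "Business Name"},
}
SLOTS = ("index1", "index2", "index3", "index4")


def index_values(form, fields):
    """Compute the generic index values for one record; unaliased slots stay empty."""
    aliases = ALIASES[form]
    return {
        slot: fields.get(aliases[slot], "") if slot in aliases else ""
        for slot in SLOTS
    }


personal = index_values("personal", {"Last Name": "Rubble", "First Name": "Barney"})
business = index_values(
    "business",
    {"Last Name": "Slate", "First Name": "Sylvester",
     "Business Name": "Slate Rock and Gravel"},
)
```

Since index1 means Last Name for all three address forms, a single lookup on index1 finds family, personal, and business addresses alike.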
F. Use of Form Hints
Instead of hand-designing each form for a given record type, the system employs the hints stored by the Form Definition to build the actual form on screen. This is illustrated by the Family Page 400, shown in FIG. 4. As illustrated for family
member Fred Flintstone, the category is medical care and the record type is patient history. Each category has a set of record types, which correspond to form names. The medical category, for instance, includes record types of patient history, family
history, hospitalizations, doctor visits, and the like. These are different pieces of information, yet they contain a core part which is identical.
Recall that the Form Definition stores information describing the contents of the dynamic field--what are the (user) data fields of a particular record. Since the field descriptors completely characterize the user data, they may be employed to
construct on-the-fly a visible form on screen, for displaying the corresponding user data. For a Last Name field, for example, in addition to storing that the field comprises alphanumeric information, the Form Definition also stores desired display
characteristics of the user data. Thus, for instance, the Last Name field may also have associated with it a descriptor specifying minimum and maximum lengths that the field may be rendered on screen, as well as orientation information and relative
location (i.e., relative to other fields). For instance, a Last Name may include a hint specifying that it is to be displayed before (i.e., above) an Address field, displayed to the left of a First Name field, displayed with a separator (i.e.,
separating line), displayed as part of a group (i.e., personal information group), or the like.
According to the present invention, since there are so many different types of information to be modeled, it is preferable to not store forms for rendering the user data (as is conventionally done). Even without a particular form having been
designed for the data (such as a car service form having been designed for car service data), the system may render the data on screen with optimal formatting, using the hints. Using the field descriptors, the system may employ a generic engine for
rendering data on screen in a variety of forms created on-the-fly, at run time. For instance, a separator may be displayed between the address portion and phone number portion of the record rendered on screen. The hints for the address field in this
instance include a hint specifying that a certain amount of space is to be left on screen after rendering the address information. At the same time, however, it is not known beforehand exactly what information is to be rendered after the address field.
The hints for any particular user-data field of the dynamic field include a list of features that are desired for the data (not only for the field itself but relative to other fields) when it is rendered among a group of other user-data fields (which may not be
known beforehand). This allows the system the flexibility to render a set of user data on screen (e.g., given a particular size and location in which it must be rendered) by using the hints to determine on-the-fly how individual fields of the user data
are to be displayed.
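A generic renderer driven by hints might look like the following sketch. It renders to plain text lines rather than to a graphical form, and the hint names (max_len, separator_after, default) are assumptions chosen for the example; the description does not specify how hints are encoded.

```python
def render(descriptors, data, width=40):
    """Build display lines for a record using only the hints in its descriptors."""
    lines = []
    for desc in descriptors:
        value = str(data.get(desc["name"], desc.get("default", "")))
        value = value[: desc.get("max_len", width)]   # honor the maximum length hint
        lines.append(f'{desc["name"]}: {value}')
        if desc.get("separator_after"):
            lines.append("-" * width)                 # separating line after this field
    return lines


# Hypothetical descriptors: the address field asks for a separator after itself
# without knowing which field will follow it.
descriptors = [
    {"name": "Address", "max_len": 20, "separator_after": True},
    {"name": "Phone", "default": "(none)"},
]
lines = render(descriptors, {"Address": "301 Cobblestone Way, Bedrock"})
```

No form was designed for this record type; the layout follows entirely from the per-field hints, which is what lets one engine serve every record type.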
G. Browsing using generic indexes
The system of the present invention provides a single browser, implemented as a "Find" option, for browsing across different types of information in a single setting and single context. In particular, Find employs a generic index on the various
record types, such as an index on who the record "belongs to." In a preferred embodiment, the generic fields of the Databank include fields storing information about a particular user (e.g., family member), particular form (e.g., doctor visit), date
created, date modified, and the like. Using Find, the user can easily browse with a single browser across the different data records for a given user, or for a given form, or for a given date, or the like.
FIGS. 5A-E illustrate browsing of different data record types using the single browser. FIG. 5A shows patient history, which has a certain form. FIG. 5B, on the other hand, shows a form for doctor visit. As illustrated in FIG. 5C, the user can
use a single browser to find information in any record in the system, regardless of its particular "record type." In this case, all records pertaining to Fred regardless of type are displayed. Further, conventional filtering techniques may be applied.
For instance, the records could be filtered to not show Fred Flintstone, as illustrated in FIG. 5D. Or, alternatively, the records could be filtered to not show a particular record type, such as not showing vehicle registration records, as illustrated
in FIG. 5E. By using generic indexes, therefore, a single browser may be employed for browsing different record types. This allows several generic operations to be performed on the data, even though the data content itself may be non-uniform.
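Browsing and filtering on the generic fields can be sketched with one function that never inspects the type-specific data. The field names owner and form, the sample records, and the function find are hypothetical stand-ins for the generic fields described above.

```python
def find(records, owner=None, form=None, exclude_owners=(), exclude_forms=()):
    """Browse records of any type using only the generic (static) fields."""
    return [
        rec for rec in records
        if (owner is None or rec["owner"] == owner)
        and (form is None or rec["form"] == form)
        and rec["owner"] not in exclude_owners
        and rec["form"] not in exclude_forms
    ]


records = [
    {"id": 1, "owner": "Fred", "form": "patient_history"},
    {"id": 2, "owner": "Fred", "form": "vehicle_registration"},
    {"id": 3, "owner": "Wilma", "form": "doctor_visit"},
]
freds = find(records, owner="Fred")                              # every type, one owner
not_fred = find(records, exclude_owners=("Fred",))               # filter out an owner
no_vehicles = find(records, exclude_forms=("vehicle_registration",))  # filter out a type
```

A linear scan is used here for brevity; in the described system the same selections would be served by the generic indexes.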
H. Query Optimization
The Databank may be used for query optimization. Consider, for example, the task of searching for various records having Last Name=Freund. To answer such a query, the system may first look at the field descriptors for determining those record
types which include a field of "last name" (i.e., have a dynamic field which stores last name information). Next, the system may set a filter, for filtering out all records in the Databank which do not have a last name field. This filtering step may be
done in a conventional manner, such as using bitmask filters as described by Fulton et al. in International Application No. PCT/US91/07260, International Publication No. WO 92/06440, Apr. 16, 1992. After filtering out the irrelevant records, the system
may then proceed to satisfy the query by processing the information stored in the dynamic field, such as by using the above-described index lookup. The reader should note that since the storage mechanism is generic, the Databank engine itself is also
generic and thus may be easily adapted to a variety of applications and their data. This is one example of how the generic approach of the present invention may be applied for processing information which, on the other hand, is diverse.
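The two-stage query described above (consult the field descriptors, filter by record type, then examine the remaining records) can be sketched as follows. The names FORM_FIELDS and query and the sample data are hypothetical, and a plain set membership test stands in for the bitmask filter mentioned in the text.

```python
# Hypothetical descriptor table: which logical fields each record type contains.
FORM_FIELDS = {
    "address": {"Last Name", "First Name", "City"},
    "patient_history": {"Last Name", "Blood Type"},
    "car_service": {"Vehicle Name", "Service Date"},
}


def query(records, field, value):
    """Answer field=value by first discarding record types that lack the field."""
    eligible = {form for form, fields in FORM_FIELDS.items() if field in fields}
    candidates = [rec for rec in records if rec["form"] in eligible]   # filter step
    return [rec for rec in candidates if rec["data"].get(field) == value]


records = [
    {"id": 1, "form": "address", "data": {"Last Name": "Freund"}},
    {"id": 2, "form": "car_service", "data": {"Vehicle Name": "Footmobile"}},
    {"id": 3, "form": "patient_history", "data": {"Last Name": "Freund"}},
    {"id": 4, "form": "address", "data": {"Last Name": "Rubble"}},
]
matches = query(records, "Last Name", "Freund")
```

Record types without the queried field are excluded before any dynamic field is examined, so the more expensive per-record work is limited to plausible candidates.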
I. Adapting to new data types on the fly
Although the Databank system of the present invention has been illustrated in terms of database management of a plurality of diverse data records, the invention may also be advantageously applied to storage and management of more conventional
information. Regardless of information to be modeled, almost every database design includes fields which are used rarely. Conside