WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Relational database management system and method for storing, retrieving and modifying directed graph data structures    
United States Patent5201046   
Link to this pagehttp://www.wikipatents.com/5201046.html
Inventor(s)Goldberg; Robert N. (Redwood City, CA); Jirak; Gregory A. (La Honda, CA)
AbstractAn improved database management system (DBMS) stores, retrieves and manipulates directed graph data structures in a relational database. Each directed graph data structure contains one or more records of data which are interconnected by pointers. Data is stored in the database in the form of two dimensional tables, also known as flat files. The improved DBMS defines a schema for each table in the database. The schema defines the name and data type of each column in a database table. In tables used to store directed graph data structures, at least one column will be defined as having a reference data type. Non-empty entries in that column are pointers to rows in a specified table. Directed graph data structures are stored in specified tables by storing each record of the directed graph in a distinct row of one of the specified tables, with references corresponding to interconnections between records being stored in reference data type columns. Portions of a directed graph are retrieved from the specified table, in accordance with a single specified query and then the query is automatically expanded by also retrieving additional portions of the tables which are referenced by the previously retrieved portions, thereby performing a transitive closure. The retrieved data is stored in a buffer as a list of rows, and then communicated to an application process. An interface program converts the list of rows stored in the buffer into a directed graph data structure.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5201046
Relational database management system and method for storing, retrieving

     and modifying directed graph data structures - US Patent 5201046 Drawing
Relational database management system and method for storing, retrieving and modifying directed graph data structures
Inventor     Goldberg; Robert N. (Redwood City, CA); Jirak; Gregory A. (La Honda, CA)
Owner/Assignee     Xidak, Inc. (Palo Alto, CA)
Patent assignment
All assignments
Publication Date     April 6, 1993
Application Number     07/542,163
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     June 22, 1990
US Classification     707/100 707/101
Int'l Classification     G06F 015/419
Examiner     Clark; David L.
Assistant Examiner     Amsbury; Wayne
Attorney/Law Firm     Flehr, Hohbach, Test, Albritton & Herbert
Address
Parent Case    
Priority Data    
USPTO Field of Search     364/200 364/300 395/600
Patent Tags     relational database management storing, retrieving modifying directed graph data structures
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
4918593
Huber
707/200
Apr,1990

[0 after 0 votes]
4829427
Green
707/4
May,1989

[0 after 0 votes]
4468732
Raver
707/102
Aug,1984

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A computer system for storing, retrieving and modifying data stored in a database, comprising:

a database server; and

a multiplicity of application processes coupled to said database server, each application process including:

an application program that utilizes directed graph data structures in a corresponding application specific data format; each said directed graph data structure including one or more records of data interconnected by pointers, each record composed of one or more data elements having respective specified data types;

an application interface that translates directed graph data structures in said application specific data format into a predefined intermediate data format and translates directed graph data structures in said predefined intermediate data a format into said application specific data format; and

query generating means for generating and sending queries to said database server for storing, retrieving and updating specified directed graph data structures in said database;

said database server including:

schema defining means for defining a distinct schema for each of a plurality of tables in said database, each said database table having a plurality of rows and a specified number of columns, wherein each row of said table stores data values in each of said columns of said table; said schema denoting a data type for each column of data values stored in said table, each denoted data type being selected from a set of predefined data types including a reference data type;

each non-empty data value stored in a reference data type column comprising a reference to a row of one of said tables in said database; and

directed graph storage means for storing and retrieving specified directed graph data structures in and from specified tables in said database in accordance with queries received from said application processes;

said directed graph storage means including:

graph storing means for receiving a data storage query that includes a specified directed graph data structure in said predefined intermediate data format and for storing each record of said received directed graph data structure in a distinct row of a respective one of said tables in said database, said graph storing means including means for storing said data elements of said each record in corresponding columns of said one respective table and means for storing references, each reference corresponding to one of said pointers that interconnect said each record to other records of said specified directed graph data structure, in corresponding ones of said reference data type columns of said one respective table; and

directed graph retrieving means for retrieving a specified directed graph data structure from said database in accordance with a single specified query received from one of said application processes, including means for retrieving in accordance with said specified query at least one specified row from at least one respective table in said database and then retrieving additional rows of data from respective tables in said database, said additional rows of data comprising rows of data that are referenced by references in other rows of data retrieved in accordance with said specified query, for converting said retrieved rows of data into a directed graph data structure in said predefined intermediate data format, and for transmitting said retrieved directed graph data structure to said one application process;

whereby a directed graph data structure having multiple records is retrieved by said database server in response to a single query from said one application process.

2. A computer system as in claim 1,

said schema defining means denoting in each said schema a table identifier for each reference data type column, said table identifier specifying which database table is referenced by references in that column;

each said table in said database having an associated primary key comprising an ordered list of columns of said table whose values identify a particular row of said table; and

each of said references comprising a primary key value of a row in the respective database table specified by the schema for the table in which each said reference is stored.

3. A computer system as in claim 1, wherein each directed graph data structure in said intermediate data format includes rows of data values, each row of data values corresponding to a respective record of a corresponding directed graph data structure in one said application specific data format, said rows of data values including row pointers interconnecting said rows of data values, each row pointer corresponding to a respective pointer in said corresponding directed graph data structure in one said application specific data format.

4. A computer system as in claim 1,

said query generating means including means for including in said generated queries specified criteria for limiting retrieval of said additional rows of data.

5. A computer system as in claim 1,

said query generating means including means for including retrieval limiting criteria in ones of said generated queries, said retrieval limiting criteria denoting a maximum depth, said maximum depth comprising a maximum number of pointers used in sequence to interconnect one or more top-level records of said specified directed graph data structure with other records of said specified directed graph data structure; and

said directed graph retrieving means including means for limiting retrieval of said additional rows of data, when retrieving a specified directed graph data structure in accordance with a query including retrieval limiting criteria, to those of said additional rows of data that are connected to said specified at least one row of data by a sequence of references no greater in number than said maximum number denoted by said retrieval limiting criteria.

6. A computer system for storing, retrieving and modifying data stored in a database, comprising:

a database server;

a multiplicity of application processes coupled to said database server, each application process including:

an application program that utilizes directed graph data structures in a corresponding application specific data format; each said directed graph data structure including one or more records of data interconnected by pointers, each record composed of one or more data elements having respective specified data types;

query generating means for sending queries to said database server for storing, retrieving said updating specified directed graph structures in said database; and

at least one application interface that translates directed graph data structures in one respective application specific data format into a predefined intermediate data format and translates directed graph data structures in said predefined intermediate data format into said one respective application specific data format; said at least one application interface coupling said database server to a respective at least one of said application processes;

said database server including:

schema defining means for defining a distinct schema for each of a plurality of tables in said database, each said database table having a plurality of rows and a specified number of columns, wherein each row of said table stores data values in each of said columns of said table; said schema denoting a data type for each column of data values stored in said table, each denoted data type being selected from a set of predefined data types including a reference data type;

each non-empty data value stored in a reference data type column comprising a reference to a row of one of said tables in said database; and

directed graph storage means for storing and retrieving specified directed graph data structures in and from specified tables in said database in accordance with queries received from said application processes;

said directed graph storage means including:

graph storing means for receiving a data storage query that includes a specified directed graph data structure in said predefined intermediate data format and for storing each record of said received directed graph data structure in a distinct row of a respective one of said tables in said database, said graph storing means including means for storing said data elements of said each record in corresponding columns of said one respective table and means for storing references, each reference corresponding to one of said pointers that interconnect said each record to other records of said specified directed graph data structure, in corresponding ones of said reference data type columns of said one respective table; and

directed graph retrieving means for retrieving a specified directed graph from said database in accordance with a single specified query received from one of said application processes, including means for retrieving in accordance with said specified query at least one specified row from at least one respective table in said database and then retrieving additional rows of data from respective tables in said database, said additional rows of data comprising rows of data that are referenced by references in other rows of data retrieved in accordance with said specified query, for converting said retrieved rows of data into a directed graph data structure in said predefined intermediate data a format, and for transmitting said retrieved directed graph data structure to said one application process;

whereby a directed graph having multiple records is retrieved by said database server in response to a signal query from said one application process.

7. A computer system as in claim 6,

said schema defining means denoting in each said schema a table identifier for each reference data type column, said table identifier specifying which database table is referenced by references in that column;

each said table in said database having an associated primary key comprising an ordered list of columns of said table whose values identify a particular row of said table; and

each of said references comprising a primary key value of a row in the respective database table specified by the schema for the table in which each said reference is stored.

8. A computer system as in claim 6, wherein each directed graph data structure in said intermediate data format includes rows of data values, each row of data values corresponding to a respective record of a corresponding directed graph data structure in one said application specific data format, said rows of data values including row pointers interconnecting said rows of data values, each row pointer corresponding to a respective pointer in said corresponding directed graph data structure in one said application specific data format.

9. A computer system as in claim 6,

said query generating means including means for including in said generated queries specified criteria for limiting retrieval of said additional rows of data.

10. A computer system as in claim 6,

said query generating means including means for including retrieval limiting criteria in ones of said generated queries, said retrieval limiting criteria denoting a maximum depth, said maximum depth comprising a maximum number of pointers used in sequence to interconnect one or more top-level records of said specified directed graph data structure with other records of said specified directed graph data structure; and

said directed graph retrieving means including means for limiting retrieval of said additional rows of data, when retrieving a specified directed graph data structure in accordance with a query including retrieval limiting criteria, to those of said additional rows of data that are connected to said specified at least one row of data by a sequence of references no greater in number than said maximum number denoted by said retrieval limiting criteria.

11. In a computer system, a control process of storing and retrieving directed graph data structures in a data table in a computer system; said computer system having a multiplicity of application processes coupled to a database server that responds to queries from said application processes by storing, retrieving and updating data in said database; the steps of the control process comprising:

in each application process, executing an application program that utilizes directed graph data structures in a corresponding application specified data format; each said directed graph data structure including one or more records of data interconnected by pointers, each record composed of one or more data elements having respective specified data types;

each application process generating and sending queries to said database reserver for storing, retrieving and updating specified directed graph structures in said database;

when transmitting directed graph data structures from any one of said application processes to said database server, translating said transmitted directed graph data structures from the application specific data format utilized by said one of said application processes into a predefined intermediate data format,

when transmitting directed graph data structures from said database server to any one of said application processes, translating said transmitted directed graph data structures from said predefined intermediate data format into the application specific data format utilized by said one of said application processes;

in said database server, defining a distinct schema for each of a plurality of tables in said database, each said database table having a plurality of rows and a specified number of columns, wherein each row of said database table stores data values in each of said columns of said database table; said schema denoting a data type for each column of data values stored in said database table, each denoted data type being selected from a set of predefined data types including a reference data type; type;

each non-empty data value stored in a reference data type column comprising a reference to a row of one of said tables in said database;

in said database server, storing and retrieving specified directed graph data structures in and from specified tables in said database in accordance with queries received from one of said application processes; said storing and retrieving steps including:

receiving from one of said application processes a data storage query that includes a specified directed graph data structure in said predefined intermediate data format, and storing each record of said received directed graph data structure in a distinct row of a respective one of said database tables, said record storing step including storing said data elements of said each record in corresponding columns of said one respective database table and storing references, each reference corresponding to one of said pointers that interconnect said each record to other records of said specified directed graph data structure, in corresponding ones of said reference data type columns of said one respective database table; and

retrieving a specified directed graph data structure from said database in accordance with a single specified query received from one of said application processes, including retrieving in accordance with said specified query at least one specified row from at least one respective database table and then retrieving additional rows of data from respective ones of said database tables, said additional rows of data comprising rows of data that are referenced by references in other rows of data retrieved in accordance with said specified query, converting said retrieved rows of data into a directed graph data structure in said predefined intermediate data format, and transmitting said retrieved directed graph data structure to said one application process.

12. The control process of claim 11,

said schema defining step including denoting in each said schema a table identifier for each reference data type column, said table identifier specifying which database table is referenced by references in that column;

each said table in said database having an associated primary key comprising an ordered list of columns of said table whose values identify a particular row of said table; and

each of said references comprising a primary key value of a row in the respective database table specified by the schema for the table in which each said reference is stored.

13. The control process of claim 11, wherein each directed graph data structure in said intermediate data format includes rows of data values, each row of data values corresponding to a respective record of a corresponding directed graph data structure in the application specific data format utilized by one of said application processes, said rows of data values including row pointers interconnecting said rows of data values, each row pointer corresponding to a respective pointer in said corresponding directed graph data structure in said application specific data format.

14. The control process of claim 11,

said query generating step including the step of including in ones of said generated queries specified criteria for limiting retrieval of said additional rows of data.

15. The control process of claim 11,

said query generating means including means for including retrieval limiting criteria in ones of said generated queries, said retrieval limiting criteria denoting a maximum depth, said maximum depth comprising a maximum number of pointers used in sequence to interconnect one or more top-level records of said specified directed graph data structure with other records of said specified directed graph data structure; and

said step of retrieving a specified directed graph data structure including limiting retrieval of said additional rows of data, when retrieving a specified directed graph data structure in accordance with a query including retrieval limiting criteria, to those of said additional rows of data that are connected to said specified at least one row of data by a sequence of references no greater in number than said maximum number denoted by said retrieval limiting criteria.
 Description Submit all comments and votes
 


The present invention relates generally to computer database systems and particularly to relational database storage methods and systems for storing and retrieving directed graph data structures.

BACKGROUND OF THE INVENTION

A database management system (DBMS) is computer software that stores data and provides software routines for manipulating the stored data. A DBMS may be used directly (i.e., by human users), as a component of another software package, or to provide service to another software package.

A database is a collection of data which is stored and managed as a unit by a DBMS. A "relational database" is a database which contains tables that are used to store sets of data and to specify relationships between the different sets of data stored in the database. Relational databases and database management systems are widely used in the prior art. Therefore this document will describe prior art database systems only to the extent necessary to point out the differences between such prior art systems and the present invention.

Typically, databases are used to store sets of related data. For example, a database may be used to store all the seat reservations made by the customers of an airline, plus information about the airplane (e.g., seating chart information), information about the customers (e.g., address, credit card used to purchase tickets, and travel agent), and so on. This is an example of a database which is well suited for a prior art relational database management system.

The reason that the airline seat reservation database is easy to use with prior art database technology is that the data is easily organized as a set of flat records, in the form of a few tables: one for seat reservations, one for customer information, and so on.

An example of a set of data that is "difficult" to efficiently store and manipulate in a prior art relational database is shown in FIG. 1. This set of data 100, which denotes a set of automobile parts and also denotes which parts are components of other parts, is herein called a "directed graph". The data structures conventionally used to store such sets of data in computers are called "directed graph data structures". The reason that a directed graph is "difficult" to handle with a conventional database system is that while this data can be stored in and retrieved from a conventional database table, it is awkward to do so.

FIG. 2 contains a typical prior art table 110 (herein called the ContainsParts table) that would be used by a prior art database management system to store the directed graph shown in FIG. 1. FIG. 2 also shows a second table 120 (herein called the Parts table) which contains cost data for automobile parts. By using the two tables 110 and 120 together, one can determine the relative costs of manufacturing various portions of an automobile.

While table 110 in FIG. 2 contains all the data needed to reconstruct the directed graph of FIG. 1, it is very awkward for a prior art database management system to utilize data which is organized in this fashion. For example, consider the steps which would need to be performed by the prior art DBMS to generate a directed graph representing the set of all components of the engine. To do this, we would first have to examine all the records with a partName of ENGINE to generate a first list of engine parts. Then we would have to examine all the records for the parts identified in this first search (i.e., with partName equal to CAM SHAFT or WATER JACKET OR CYLINDER 1, etc.). In a real life example, we would then have to examine all the records for the parts identified in the second search, and so on.

In terms of search commands using SQL, the industry standard language for querying databases, a separate query would be required for retrieving each set of subparts. As will be explained in more detail below, to regenerate the portion of the directed graph corresponding to ENGINE, one would have to perform literally dozens of queries. TABLE 1 lists the fifty-four SQL queries which would be required to regenerate the entire directed graph for AUTOMOBILE:

TABLE 1 __________________________________________________________________________ PRIOR ART QUERIES FOR RETRIEVING DIRECTED GRAPH __________________________________________________________________________ 1) SELECT * FROM ContainsParts WHERE PARTNAME = "AUTOMOBILE" 2) SELECT * FROM ContainsParts WHERE PARTNAME = "BODY" 3) SELECT * FROM ContainsParts WHERE PARTNAME = "FRAME" 4) SELECT * FROM ContainsParts WHERE PARTNAME = "POWER TRAIN" 5) SELECT * FROM ContainsParts WHERE PARTNAME = "DASH BOARD" 6) SELECT * FROM ContainsParts WHERE PARTNAME = "SEATS" 7) SELECT * FROM ContainsParts WHERE PARTNAME = "SHELL" 8) SELECT * FROM ContainsParts WHERE PARTNAME = "WINDSHIELD" 9) SELECT * FROM ContainsParts WHERE PARTNAME = "DIFFERENTIAL" 10) SELECT * FROM ContainsParts WHERE PARTNAME = "DRIVE SHAFT" 11) SELECT * FROM ContainsParts WHERE PARTNAME = "ENGINE" 12) SELECT * FROM ContainsParts WHERE PARTNAME = "TRANSMISSION" 13) SELECT * FROM ContainsParts WHERE PARTNAME = "CAM SHAFT" 14) SELECT * FROM ContainsParts WHERE PARTNAME = "CYLINDER 1" 15) SELECT * FROM ContainsParts WHERE PARTNAME = "CYLINDER 2" 16) SELECT * FROM ContainsParts WHERE PARTNAME = "CYLINDER 3" 17) SELECT * FROM ContainsParts WHERE PARTNAME = "CYLINDER 4" 18) SELECT * FROM ContainsParts WHERE PARTNAME = "CYLINDER 5" 19) SELECT * FROM ContainsParts WHERE PARTNAME = "CYLINDER 6" 20) SELECT * FROM ContainsParts WHERE PARTNAME = "CYLINDER 7" 21) SELECT * FROM ContainsParts WHERE PARTNAME = "CYLINDER 8" 22) SELECT * FROM ContainsParts WHERE PARTNAME = "WATER JACKET" 23) SELECT * FROM ContainsParts WHERE PARTNAME = "PISTON 1" 24) SELECT * FROM ContainsParts WHERE PARTNAME = "PISTON 2" 25) SELECT * FROM ContainsParts WHERE PARTNAME = "PISTON 3" 26) SELECT * FROM ContainsParts WHERE PARTNAME = "PISTON 4" 27) SELECT * FROM ContainsParts WHERE PARTNAME = "PISTON 5" 28) SELECT * FROM ContainsParts WHERE PARTNAME = "PISTON 6" 29) SELECT * FROM ContainsParts WHERE PARTNAME = "PISTON 7" 30) SELECT * FROM ContainsParts WHERE PARTNAME = " PISTON 8" 31) SELECT * FROM ContainsParts WHERE PARTNAME = "SPARK PLUG 1" 32) SELECT * FROM ContainsParts WHERE PARTNAME = "SPARK PLUG 2" 33) SELECT * FROM ContainsParts WHERE PARTNAME = "SPARK PLUG 3" 34) SELECT * FROM ContainsParts WHERE PARTNAME = "SPARK PLUG 4" 35) SELECT * FROM ContainsParts WHERE PARTNAME = "SPARK PLUG 5" 36) SELECT * FROM ContainsParts WHERE PARTNAME = "SPARK PLUG 6" 37) SELECT * FROM ContainsParts WHERE PARTNAME = "SPARK PLUG 7" 38) SELECT * FROM ContainsParts WHERE PARTNAME = "SPARK PLUG 8" 39) SELECT * FROM ContainsParts WHERE PARTNAME = "EXHAUST VALVE 1" 40) SELECT * FROM ContainsParts WHERE PARTNAME = "EXHAUST VALVE 2" 41) SELECT * FROM ContainsParts WHERE PARTNAME = "EXHAUST VALVE 3" 42) SELECT * FROM ContainsParts WHERE PARTNAME = "EXHAUST VALVE 4" 43) SELECT * FROM ContainsParts WHERE PARTNAME = "EXHAUST VALVE 5" 44) SELECT * FROM ContainsParts WHERE PARTNAME = "EXHAUST VALVE 6" 45) SELECT * FROM ContainsParts WHERE PARTNAME = "EXHAUST VALVE 7" 46) SELECT * FROM ContainsParts WHERE PARTNAME = "EXHAUST VALVE 1" 47) SELECT * FROM ContainsParts WHERE PARTNAME = "INTAKE VALVE 1" 48) SELECT * FROM ContainsParts WHERE PARTNAME = "INTAKE VALVE 2" 49) SELECT * FROM ContainsParts WHERE PARTNAME = "INTAKE VALVE 3" 50) SELECT * FROM ContainsParts WHERE PARTNAME = "INTAKE VALVE 4" 51) SELECT * FROM ContainsParts WHERE PARTNAME = "INTAKE VALVE 5" 52) SELECT * FROM ContainsParts WHERE PARTNAME = "INTAKE VALVE 6" 53) SELECT * FROM ContainsParts WHERE PARTNAME = "INTAKE VALVE 7" 54) SELECT * FROM ContainsParts WHERE PARTNAME = "INTAKE VALVE __________________________________________________________________________ 8"

By way of comparison, the present invention allows a person or program to retrieve an entire subtree (or even a pruned subtree) of a directed graph using a single query. The single query needed to retrieve the entire directed graph in the preferred embodiment of the present invention is:

______________________________________ SELECT * FROM ContainsParts EXPAND ContainsParts(*) WHERE PARTNAME = "AUTOMOBILE" ______________________________________

The single query which would retrieve all portions of the directed graph corresponding to ENGINE is:

______________________________________ SELECT * FROM ContainsParts EXPAND ContainsParts(*) WHERE PARTNAME = "ENGINE" ______________________________________

Trees and other directed graph data structures are commonly used in scientific and engineering applications to represent and store data. Because of the limitations in the prior art, these types of scientific and engineering data are typically not stored using database management systems. As a result, all of the well developed tools associated with database management systems are generally not available to the users of scientific and engineering data. Instead, such data is typically stored and manipulated using a wide variety of special software programs. These programs vary widely in their manner of operation, how they represent data internally, and so on. Unlike relational database management systems, the programs each have a different theory of operation and each tends to be used by only a small market niche.

The primary goal of the present invention is to enable scientific and engineering data, which is normally stored in the form of tree data structures or directed graph data structures in operating system files (i.e., files directly accessed by application programs), to be easily stored and manipulated in a relational database management system. From another perspective, the primary goal of the present invention is to modify conventional relational database management systems so as to efficiently and intelligently handle data which is logically organized as a directed graph.

An important property of the present invention that is not provided by prior art relational database management systems is "transitive closure". Transitive closure means the ability to follow the links in a directed graph data structure and to process an entire or specified portion of a tree or directed graph as single entity. Database management systems which include the features of the present invention perform transitive closures, whereas prior art relational database management systems do not.

SUMMARY OF THE INVENTION

In summary, the present invention is an improved database management system (DBMS) which can store, retrieve and manipulate directed graph data structures in a relational database. Each directed graph data structure comprises one or more records of data which are interconnected by pointers. Data is stored in the database in the form of two dimensional tables, also known as flat files or base tables.

The improved DBMS defines a schema for each base table in the database. The schema defines the name and data type of each column in a database table. For tables used to store directed graph data structures, at least one column of the table will be defined as having a reference data type, which means that non-empty entries in that column contain "references" to other rows in the same or other tables in the database. A "reference" is a datum (stored in a reference column of a table) which matches the primary key of a particular row in a specified table in the database.

Directed graph data structures are stored in specified database tables by storing each record of the directed graph in a distinct row of a specified table. Interconnections between the directed graph's records are represented or denoted by references stored in reference columns, i.e., columns denoted in the table's schema as being a reference data type column. Portions of a directed graph are retrieved from base tables, in response to a query, by retrieving a portion of a first specified base table in accordance with the specified query and then expanding the retrieved data by also retrieving additional portions of base tables which are referenced by the portions of the first specified table already retrieved in accordance with the query.

In the preferred embodiment, the portions of a table retrieved in response to a query are stored in a buffer and then transmitted or communicated to an application process (i.e., to whomever sent the query to the DBMS). If the retrieved rows from the database include non-empty reference values, those reference values are automatically converted by the DBMS into pointers which point to other retrieved rows that are stored in the buffer. The application process includes an interface program for converting the retrieved portions of the specified table into a directed graph data structure.

Updates or modifications of directed graph data structures are handled in much the same way as storing a directed graph in the first place. The modified portions of the directed graph are converted into a set of rows in accordance with the specified schema for the target table or tables (i.e., the table(s) in which the directed graph is to be stored). Each resulting row of data is then used to modify or update corresponding portions of the target table(s).

BRIEF DESCRIPTION OF THE DRAWINGS

Additional objects and features of the invention will be more readily apparent from the following detailed description and appended claims when taken in conjunction with the drawings, in which:

FIG. 1 depicts an example of a directed graph.

FIG. 2 depicts two tables for storing data corresponding to the directed graph shown in FIG. 1.

FIG. 3 is a block diagram of a database management system in accordance with the present invention.

FIG. 4 illustrates a directed graph data structure.

FIG. 5 depicts the schema for a database table used to stored directed graph data structures.

FIG. 6 depicts an example of a database table used to stored directed graph data structures.

FIG. 7 depicts the data structure for a list of rows retrieved from a database using an expanded query.

FIG. 8 depicts a hash table used during data retrieval and storage.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring to FIG. 3, there is shown a multiuser database management system (DBMS) 200, drawn so as to emphasize the portions of the DBMS which are particularly relevant to the present invention. Before discussing this Figure in detail, the following glossary of terms is provided.

GLOSSARY

APPLICATION--A computer program that solves some particular problem. Applications may use a DBMS to store and manipulate information relevant to such application.

APPLICATION PROGRAMMER--The person or persons who write the source code for an application, such as a computer aided design program. Application programmers of ordinary skill in the art possess the skills necessary to implement applications within a specified application domain, and are also skilled in the use of prior art RDBMS products such as IBM's DB2, Oracle's Oracle, and Ingres (a trademark of Ingres Inc.). Application programmers of ordinary skill in the art are also skilled users of programming languages, such as C, the MAINSAIL programming language, and ADA, and can construct programming language interfaces which enable an application to send queries to a prior art relational DBMS system and to receive data and other responses generated by the prior art DBMS.

BASE TABLE--A table that corresponds to data physically stored in a permanent storage medium, as distinct from a derived table that is defined by a query used to retrieve data from a base table.

C--A programming language, often used to write engineering application programs. See for example, B. Kernighan and D. Ritchie, "The C Programming Language", 2nd Edition, Prentice Hall, New Jersey, 1988.

CYCLE--In the context of structured data such as directed graph data structures, a cycle is a closed loop consisting of a series of references from one component back to itself. For example, if the interstate highway system were represented in a database, where each highway was a row that contains references to each other row (highway) for which there is an exit, there would be many cycles, because it is possible to start on one highway, exit to another, and eventually exit back onto the original highway.

DATA ITEM--An instance of a particular data type. For example, the number 1 is an instance of the data type INTEGER.

DATA TYPE--A name for a set of possible values that may be represented in a computer's memory, along with a set of operations on those values. For example, the data type INTEGER allows for the representation of positive counting numbers (1,2,3,...), negative numbers (-1,-2,-3,...), and zero (0), and provides for arithmetic operations such as addition and multiplication. Each DBMS product supports a predefined set of data types.

DATABASE--A collection of related information, or data. Many databases are abstract representation of information pertinent to an enterprise, such as the design of integrated circuits.

DATABASE MANAGEMENT SYSTEM (DBMS)--A component of a computer system that provides support for the storage and management of databases.

DERIVED TABLE--A table that is defined by the results of a query that retrieves data.

DIRECTED GRAPH DATA STRUCTURE--A data structure that models arbitrary relationships among data objects. Directed graph data structures include, for example, trees, linked lists, and cyclical graph structures (structures with cycles). Directed graph data structures are defined and discussed in Aho, Data Structures and Algorithms, Addison-Wesley, 1983 (pp. 198-199).

ENGINEERING APPLICATIONS--Applications, such as computer-aided design (CAD), computer-aided manufacturing (CAM), and computer-aided software engineering (CASE), that must retrieve, manipulate, and store structured data that model objects of much greater complexity than typically found in business applications.

FOREIGN KEY--A foreign key in a table T1 is a list of one or more columns in table T1 that correspond to the primary key columns of a base table T2. (Table T1 and T2 may or may not be distinct.) Thus, the foreign key column values in a particular row of T1 logically refer to the row of T2 whose primary key column values matches such foreign key column values.

LIST--An ordered collection of zero or more data items, possibly containing some duplicate values.

NULL--A missing or unspecified value.

PRIMARY KEY--Associated with each base table in a relational database is an ordered list of columns whose values uniquely identify a particular row of the table. This list of columns comprises the primary key of the table. More particularly, the primary key of an individual row of the table consists of the values of the primary key columns for that row. Specification of the primary key values for a given row is the only way to identify that particular row.

PROGRAM--See application.

QUERY--An instruction (command) to a DBMS that tells the DBMS to perform a desired action. Typical actions include the retrieval of data from the database and making changes to the data stored in the database.

QUERY LANGUAGE--A computer language for expressing data retrieval questions and modification operations on a database. The query language defines the set of commands that are accepted by the DBMS, including the syntax and the actions or meaning associated with the syntax. A query is thus a particular instruction or sentence in the query language.

RECORD--A record is a memory resident aggregate data type supported by many programming environments. Each such programming environment provides a way to declare a description for each type of record used by an application (including the names and types of the constituent data types making up the record), a way to allocate new instances of each record type, and, often, a way to dispose of (recycle) records when no longer needed by the application.

REFERENCE--REFERENCE is a database column data type which is an extension of the concept of foreign key. Column that are declared in the schema to be of type REFERENCE are used to refer to individual rows by matching foreign key values to primary key values. The REFERENCE data type is used in the database representation of directed graph data structures. In a table having a reference column, each non-empty reference in the table matches the primary key value of a row in a specified table in the database. References differ from foreign keys in several ways that are explain in detail below. Most importantly, a reference is essentially a set of bits which match the primary key of a row in a base table. Each reference value is stored in a single reference column, even if the corresponding primary key occupies two or more columns of the referenced table.

RELATION--A data table of values, such data table being an open-ended and unordered collection of rows, each row consisting of an ordered and fixed list of data items.

RELATIONAL DBMS (RDBMS)--A DBMS that allows an application to define and operate on a database using the abstractions of the standard relational model, which frees the application from physical database storage considerations. In particular, data are represented using relations (tables).

RETRIEVAL QUERY--Any query that specifies that data be retrieved (passed from the DBMS to an application).

ROW--The list of related column values in a table corresponding to a particular primary key value.

SCHEMA--A description of the information that is represented by a database. More particularly for relational databases, a description of the tables that make up the database, including the table names, the column names and column types of each table, and any other information that is needed to enable an application, a DBMS, or a user to interpret the contents of the database.

SET--An unordered collection of zero or more unique data items, none of which can be null.

STRUCTURED DATA--Data that model the complex structure of real-world objects and events, including (in general) multiple links from one component of the structure to another and cycles. Structured data may conveniently be represented using a directed graph data structure.

STRUCTURED QUERY LANGUAGE (SQL)--The industry-standard query language for relational DBMS, as defined by the American National Standards Institute's standard ANSI X3.135-1986. An extension of this standard to include more powerful constructs, called SQL2, is currently being carried out in the working joint standards committee ANSI X3H2 and the International Organization for Standardization ISO DBL SYD-2. NOTE: The use of the word "structured" in the context of SQL is completely unrelated to the use of the same word in "structured data".

TABLE--Each table is described by the database schema and consists of a set of "rows", each of which contains values for each column of the table.

TOP-LEVEL RECORD--The starting point of a directed graph data structure as represented in an application's memory.

TOP-LEVEL ROW--The starting point of a directed graph data structure as represented in an extended relational database.

DATABASE MANAGEMENT SYSTEM

The database management system (DBMS) 200 shown in FIG. 3 is coupled to a number of application processes 202-204, hereinafter called applications. The DBMS 200 stores, retrieves and updates information in a database 206 on behalf of the applications. One of the primary benefits of using a DBMS 200 is that it relieves applications programmers from having to deal with data storage and retrieval, and instead allows application programmers to concentrate on solving the problems for which a specified application program is being developed. Another benefit of using a DBMS 200 is that it provides a mechanism which allows multiple applications to share the information in a shared database 206.

Physically, the data in the database is typically stored partially in high-speed random access memory (RAM) and partially on disk drives. A storage management subsystem 208 manages physical storage of the database 206, and typically includes performance enhancement features such as software for disk caching and for indexing on column values. A "b-tree" or "b+tree" physical storage technique is often used by commercial DBMS. Since storage management is not related to the present invention, and is well understood by those skilled in the art, it is not discussed any further herein.

In the preferred embodiment, each application process 202, 204 will reside on a separate computer, each of which is coupled to a host computer or network server which runs the database management system process 210. The DBMS process 210 includes a separate application task 212-214 for each application that is currently using the database. To coordinate communications with multiple applications and to allow a single computer to run multiple application tasks, the DBMS process 210 includes a multitasking transaction manager 220. In other words, the computer on which the DBMS is running includes multitasking operating system software for handling multiple tasks or execution threads.

As will be understood by those skilled in the art, there are many possible system configurations, all of which are equivalent for the purposes of the present invention. For example, in other embodiments it would possible for both the DBMS and the application programs to all be running on a single computer, or for the DBMS and some of the application programs to be running on one host computer (e.g., a mainframe or high performance minicomputer) with other application programs running in separate computers that are coupled to the host computer by a communications network.

The commands sent by application pro