WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Federated information management (FIM) system and method for providing data site filtering and translation for heterogeneous databases    
United States Patent5634053   
Link to this pagehttp://www.wikipatents.com/5634053.html
Inventor(s)Noble; William B. (Santa Monica, CA); Patel; Bhadra K. (Anaheim, CA); Wang; Jenny K. (Cerritos, CA)
AbstractA FIM system integrates data from a plurality of interconnected local databases to provide users with access to a virtual database. The FIM includes a user interface for generating a global query to search the virtual database, a smart dictionary database (SDD) that contains configuration data, a data information manager (DIM) that decomposes the global query into local queries, and a plurality of local information managers (LIMs) that execute the local queries to search for and retrieve data from the enumerated databases. A filter generates a list of those local databases that contain information relevant to the global query. As a result, the DIM only generates local queries for the enumerated local databases. An input translator converts the global query into the respective local formats for the local databases so that the FIM provides true integration of heterogeneous databases. An output translator converts the data retrieved from each local database into a uniform i/o format so that the data presented to the user is integrated. The user typically selects the i/o format as his or her local format or a global format associated with the virtual database.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5634053
Federated information management (FIM) system and method for providing

     data site filtering and translation for heterogeneous databases - US Patent 5634053 Drawing
Federated information management (FIM) system and method for providing data site filtering and translation for heterogeneous databases
Inventor     Noble; William B. (Santa Monica, CA); Patel; Bhadra K. (Anaheim, CA); Wang; Jenny K. (Cerritos, CA)
Owner/Assignee     Hughes Aircraft Company (Los Angeles, CA)
Patent assignment
All assignments
Publication Date     May 27, 1997
Application Number     08/521,340
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     August 29, 1995
US Classification     707/4 707/9
Int'l Classification     G06F 017/30 G06F 015/16
Examiner     Black; Thomas G.
Assistant Examiner     Corrielus; Jean M.
Attorney/Law Firm     Grunebach; G. S Sales; M. W ., Low; W. K ., Denson- .
Address
Parent Case    
Priority Data    
USPTO Field of Search     395/600 395/604 395/610
Patent Tags     federated information management (fim) providing data site filtering translation heterogeneous databases
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5412806
Du
707/2
May,1995

[0 after 0 votes]
5375235
Berry
707/5
Dec,1994

[0 after 0 votes]
4714995
Materna
707/201
Dec,1987

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


We claim:

1. A database controller for integrating data from a plurality of interconnected local databases to provide users with access to a virtual database, comprising:

a user interface for generating a global query to search said virtual data base, which has an associated global format, said global query including at least one data field from a set of commonly used data fields whose values are represented in an input format;

a smart data dictionary (SDD) that contains configuration data for each of said local databases including respective local formats for each of said commonly used data fields;

a selector for selecting the input format for generating the global query from one of said global and local formats;

an input translator that converts the value of said data field in said global query into local values in the respective local formats;

a data information manager (DIM) that generates local queries including said local values for said data field in response to said global query and in accordance with the respective configuration data;

a plurality of local information managers (LIMs) that execute the local queries to search for and retrieve from the respective local databases data that is associated with the local values of said data field, said LIMs passing the data back to the DIM where it is combined to present the requesting user with an integrated response; and

an output translator that converts the data passed back from said LIMs from their respective local formats into said input format so that the data can be combined to present the user with the integrated response.

2. A database controller for integrating data from a plurality of interconnected local databases to provide users with access to a virtual database, comprising:

a plurality of local databases at different remote locations, said local databases including different subsets of data fields from a set of commonly used data fields, in which a particular data field in said set has heterogeneous local formats in said local databases;

a network for interconnecting the remotely located local databases;

a user interface for generating a global query to search the interconnected local databases using a single global schema that defines a virtual database, said global query including at least one data field from the set of commonly used data fields whose values are represented in an input format;

a centralized database that interacts with the local databases over the network to retrieve data related to the value of the data field in the global query, comprising:

a smart data dictionary (SDD) that contains configuration data for each of said local databases and said global schema, said configuration data including respective local formats for each of said commonly used data fields;

a filter that enumerates the local databases that contain data related to the value of the data field in the global query to improve the efficiency of the search of the virtual database;

an input translator that converts the value of said data field in said global query into local values in the respective local formats for the enumerated local databases to facilitate a complete search of the virtual database;

a data information manager (DIM) that generates local queries including said data field's local values, for the enumerated local databases in response to said global query and in accordance with the respective configuration data, transmits the local queries to the enumerated local databases, and receives data in response to the local queries in their respective local formats; and

an output translator that converts the data passed back by the local databases from their respective local formats into said input format, said DIM combining said data into am integrated response in said local format and transmitting the response back to the requesting user; and

a plurality of local information managers (LIMs) at the respective local databases that interact with the centralized database in accordance with the global schema to execute the local queries to search for and retrieve from the respective local databases data that is associated with the local values of said data field, said LIMs passing the data back to the centralized database over the network where the data is translated into the input format and combined to present the requesting user with an integrated response.

3. The database controller of claim 2, wherein said filter is a database that, for each of the data fields in said set and for each of the values, represented in a global format, associated with each of those fields, enumerates a list of the local databases which contain the respective local values, said input translator comprising:

a global translator that converts the value of the data field in said global query into said global format, said filter thereafter selecting the list that corresponds to that value; and

a local translator that converts the value of the data field from said global format into the local values in their respective local formats for the local databases enumerated in the selected list, said DIM generating said local queries for the enumerated databases.

4. The database controller of claim 2, wherein said data fields in said set each have a name which can vary between the local and global formats, said input translator also converting the name of the data field in said global query into the local names for each of said local databases.

5. A database controller for integrating data from a network of interconnected local databases to provide users with access to a virtual database, comprising:

a smart data dictionary (SDD) containing data representing schema, data distribution, local site configuration and inter-site relationships of data among the local databases for each of said local databases and a single global schema that defines a virtual database, said local site configurations including respective local formats and a global format for each data field in a set of commonly used data fields, said SDD maintaining data consistency of the existing local databases and global schema as new local databases are added to the network;

a user interface for selecting an i/o format from among said local and global formats and for generating a global query to search said virtual data base, said global query including at least one data field from said set of commonly used data fields whose values are represented in the selected i/o format;

a global translator that converts the value of said data field in said global query into a global value in the global format;

a global filter that enumerates the local databases that contain data related to the global value;

a local translator that converts the global value into local values in their local formats for the enumerated local databases, respectively;

a data information manager (DIM) that generates local queries, which include said local values, for the enumerated local databases in response to said global query and in accordance with the SDD data;

a plurality of local information managers (LIMs) that execute the local queries to search for and retrieve from the enumerated local databases data that is associated with the local values of said data field, each of said LIMs being further adapted to generate, in accord with the data contained in said SDD, a data retrieval request for execution by another LIM and for receiving data from that local database; and

an output translator that converts the retrieved data from the local formats into the selected i/o format and passes the data back to the DIM where it is combined to present the requesting user with an integrated response.

6. The database controller of claim 5, wherein said data fields in said set each have a name which can vary between the local and global formats, said global translator also converting the name of the data field in said global query into a global name and said local translator converting the global name into local names for each of the enumerated local databases.

7. The database controller of claim 5, wherein said global translator, said filter and said local translator together comprise:

a centralized translation-filtration database which includes:

a) a global look-up-table (LUT) that maps values from any of said local formats into said global format;

b) a filter LUT that provides an enumerated list of local databases that contain information in response to values in said global format; and

c) a local LUT that maps the value from said global format into the local values in their respective local formats for said enumerated local databases.

8. In a computer data network having a plurality of interconnected local databases with a plurality of users each capable of generating a global query for accessing and retrieving data from said databases in accord with a single query protocol, said global query including at least one data field from a set of commonly used data fields, a database controller for directing the transmission of the user generated global query to individual ones of the local databases and for receiving and integrating the requested data received from the databases into a single response and for transmitting the integrated single response to the requesting user, said database controller comprising:

a central database comprising:

a smart data dictionary (SDD) containing a database of data representing schema, data distribution, local site configuration and inter-site relationships of data among the local databases in the network for each database in the network that together define a single global schema for accessing a virtual database;

a filter that enumerates the local databases having data responsive to the global query in accord with said data field and the data contained in the SDD;

an input translator that converts the data field in the global query into local data fields compatible with the enumerated local databases in accord with the local site configuration in the SDD; and

a data information manager (DIM) communicating both with said SDD to retrieve data therefrom, and with said user to receive said global query therefrom and to transmit responsive data thereto, for decomposing the global data query into a local-site execution plan that includes said local data fields for retrieval of data from the enumerated local databases, and for transmitting that portion of said local-site execution plan to be executed to the appropriate database for execution, and receiving data therefrom responsive to said local-site execution plan and said local data field;

a plurality of local information managers (LIMs), each communicating with said DIM and said SDD, for controlling data flow to and from a specified database in the network in response to that portion of said local-site execution plan received from said DIM and for transmitting retrieved data responsive to that portion of said local-site execution plan to said DIM,

each of said LIMs further adapted for generating, in accord with the data contained in said SDD, a data retrieval request for execution by another LIM and for receiving data therefrom in response thereto, in order to complete that portion of said local-site execution plan received by it for execution; and

an output translator in said central database that converts the data received from each of the LIMs in accord with its local site configuration to a single site configuration, said DIM combining said data into an integrated single response and transmitting it back to the requesting user.

9. The database controller of claim 8, further comprising:

at least one LIM controlling data flow to and from at least two local databases and adapted to decompose that portion of said local site execution plan received from said data information manager means into a sub-local site execution plan for retrieval of data responsive to that portion of said local site execution plan received from said DIM from each of said controlled local databases.

10. The database controller of claim 8, wherein said DIM comprises:

a syntactic and semantic parser interfacing with said SDD for retrieving data representing local schema information and the interrelationships among data, and for parsing and validating the syntax of the global query using such data retrieved from said SDD.

11. The database controller of claim 8, wherein said DIM comprises:

an optimizer interfacing with said SDD for retrieving data representing local schema information and the interrelationships among data, and for reducing the amount of data needed to be transferred among local site databases and for choosing the appropriate local database for processing each portion of the local site execution plan.

12. The database controller of claim 8, wherein said DIM comprises:

a local site execution plan controller interfacing with each of said local databases to send each of them that portion of said local site execution plan necessary to extract responsive data from each said local database.

13. The database controller of claim 8, wherein said LIM further includes:

a local controller for controlling the execution of that portion of the local site execution plan sent by the DIM by coordinating all internal operations.

14. A method for integrating data for a plurality of interconnected local databases to provide users with access to a virtual database, comprising:

a) providing meta-data information for each of a plurality of local databases and a virtual database in a central location, said meta-data information including respective local and global formats for each data field in a set of commonly used data fields;

b) generating a global query to search said virtual database, said global query including at least one data field from said set of commonly used data fields whose values are represented in an i/o format;

c) filtering said global query to enumerate those local databases that contain data related to said date field in accord with the meta-data information;

d) translating the value of said date field in said global query into local values in the respective local formats in accord with the meta-data information for the enumerated local databases;

e) generating local queries for the enumerated local databases that include the respective local values in accordance with the meta-data information;

f) passing the local queries to the respective local databases;

g) executing the local queries to search for and retrieve from the local databases data that is associated with the local values of said data field;

h) translating the data retrieved from the local databases from their respective local formats into said i/o format; and

i) integrating said data to present the requesting user with an integrated response.

15. The method of claim 14, further comprising:

j) selecting said i/o format from among said local and global formats.

16. A virtual database, comprising:

a plurality of local databases at different remote locations, said local databases including data fields, in which a particular data field has heterogeneous local formats in said local databases and a particular value for a data field is stored in a subset of the local databases;

a network for interconnecting the remotely located local databases;

a user interface for generating a global query to search a virtual database, said global query including at least one data field from a set of commonly used data fields whose values are represented in an input format;

a centralized database that interacts with and searches via the network the local databases that contain data related to the value of the data field in the global query, said database comprising:

a smart data dictionary (SDD) containing meta-data including schema, data distribution, local site configuration and inter-site relationships of data among the local databases in the network for each local database and a single global schema that defines a virtual database, said local site configuration including respective local formats for each of said commonly used data fields, said SDD maintaining data consistency of the existing local databases and global schema.sub.-- as new local databases are added to the network;

a filter that enumerates the local databases having data responsive to the global query in accord with the value of said data field and the meta-data contained in the SDD;

an input translator that converts the value of said data field into local values in the respective local formats for the enumerated local databases in accord with the meta-data contained in the SDD; and

a data information manager (DIM) that generates local queries, including said data field's local values, for the enumerated local databases in response to said global query and in accordance with the SDD data, transmits the local queries to the enumerated local databases, and receives data in response to the local queries in their respective local formats; and

an output translator that converts the data passed back by the local databases from their respective local formats into said input format, said DIM combining said data into an integrated response in said input format and transmitting the response back to the requesting user; and

a plurality of local information managers (LIMs) at the respective local databases that execute the local queries to search for and retrieve from the respective local databases data that is associated with the local values of said data field, each of said LIMs further adapted for generating, in accord with the data contained in said SDD, a data retrieval request for execution by another LIM and for receiving data therefrom in response thereto, to complete the execution of its local query said LIMs passing the data back to the centralized database over the network where the data is translated into the input format and combined to present the requesting user with an integrated response.
 Description Submit all comments and votes
 


CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is related to a co-pending commonly assigned U.S. patent application Ser. No. 08/064,690 filed on May 20, 1993, entitled "Federated Information Management Architecture and System" by Son K. Dao and Nader Ebeid, now U.S. Pat. No. 5,596,744.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to DataBase Management Systems (DBMS) dispersed over a long haul network, and more specifically to a federated (global) DataBase controller that provides a user access to a virtual database by integrating heterogeneous DBMSs (data sites) dispersed over a long haul network.

2. Description of the Related Art

Large scale information and DataBase Management Systems have been developed independently and without consideration that one day they may need to be integrated. DBMSs exist in fields such as earth science, computer integrated manufacturing and medicine. As a result, the systems have become more and more complex, are often incompatible and are characterized by several types of heterogeneity.

Many different DataBase Management Systems (DBMS) models may be used to represent data, such as the hierarchical, network, and relational models. Aside from databases, many software systems (such as spreadsheets, multi-media databases and knowledge bases) store other types of data, each with its own data model. Furthermore, the same data may be seen by various users at different levels of abstraction.

In addition, the exact same data may be represented by as many different names and/or values as there are databases. For example, a patient in different medical databases may be represented by his or her name, social security number, drivers license number or a personal ID number. Furthermore, the patient's height may be represented in inches, feet, centimeters or meters.

Because of such differences, users find it difficult to integrate and understand the meaning of all the types of data presented to them. Analysts, operators and current data processing technology are not able to organize, process and intelligently analyze these diverse and massive quantities of information. In order to access data, users may have to learn different operating and word processing systems. This increases training costs and reduces efficiency, which often results in late reports to decision makers, missed intelligence opportunities and unexploited data.

Another related issue is the efficiency or rather inefficiency of searching each database for a piece of information. For example, a network of medical databases may interconnect over a hundred different databases of which any one patient may be included in only two or three. As the number, size and complexity of databases grows, the inefficiency of searching each database has become a significant problem.

Past and current research and development in distributed databases allows integrated access by providing a homogenizing layer on top of the underlying information systems (UISs). Common approaches for supporting this layer focus on defining a single uniform database language and data model that can accommodate all features of the UISS. The two main approaches are known as view integration and multi-database language.

The view integration approach advocates the use of a relational, an object-oriented (00), or a logic model both for defining views (virtual or snapshot) on the schemas of more than one target database and for formulating queries against the views. The view integration approach is one mechanism for homogenizing the schema incompatibilities of the UlSs. In this framework, all UISs are converted to the equivalent schemas in the standard relational, 00, or logical data model. The choice of the uniform data model is based on its expressiveness, its representation power and its supported environment. This technique is very powerful from the user's point of view. It insulates the user from the design and changes of the underlying Information Management System (IMS) Thus, it allows the user to spend more time in an application environment. However, the view integration approach has a limited applicability (low degree of heterogeneity) because there are many situations when the semantics of the data are deeply dependent on the way in which the applications manipulate it, and are only partially expressed by the schema. Many recent applications in areas where traditional DBMSs are not usable fall into this situation (multi-media applications involving Text, Graphics and Images are typical examples), in addition, there are no available tools to semi-automate the building and the maintenance of the unified view which is vital to the success of this technique.

In the multi-database language approach, a user, or application, must understand the contents of each UIS in order to access the shared information and to resolve conflicts of facts in a manner particular to each application. There is no global schema to provide advice about the meta-data. Ease of maintenance and ability to deal with inconsistent databases make this approach very attractive. The major drawback of this approach is that the burden of understanding the underlying IMSs lies on the user. Accordingly, there is a tradeoff between this multi-database language approach and the view integration approach discussed above.

Chin-Wan Chung, "Design and Implementation of a Heterogeneous Distributed Database Management System," Proceeding of the IEEE Computer and Communications Societies, pp. 356-362, 1989 discloses a software program called DATAPLEX. DATAPLEX translates an SQL query into various other query formats so that it can query non-relational data base systems. Marjorie Templeton et al., "Mermaid-A Front-End to Distributed Heterogeneous Databases," Proceeding of the IEEE, Vol. 75, No. 5, May 1987, pp. 695-707 discloses a program called "Mermaid" that allows the user of multiple databases stored under various relational DBMSs running on different machines to manipulate the data using a common language.

There remains an urgent need to integrate these dispersed heterogeneous databases to provide uniform access to the data, to maintain integrity of the data, to control its access and use, and to improve search efficiency. Rather than requiring users to learn a variety of interfaces in order to access different databases, it is preferable that a single interface be made available which provides access to each of the DBMSs and supports queries which reference data managed by more than one information system. The single interface should provide true integration of the heterogenous databases so that the user can easily access and analyze data that is not represented uniformly across the databases. The interface should also restrict the search to only databases that potentially contain the data of interest to improve efficiency.

SUMMARY OF THE INVENTION

The present invention seeks to provide a Federated Information Management (FIM) system and method for efficiently and truly integrating data from a plurality of interconnected and heterogeneous local databases to provide users with access to a virtual database.

This is accomplished with a FIM architecture that includes a user interface for generating a global query to search the virtual database and for selecting an i/o format, a smart data dictionary (SDD) that contains configuration data for each of the local databases and the virtual data base, a data information manager (DIM) that decomposes the global query into multiple local queries, and a plurality of local information managers (LIMs) that execute the local queries to search for and retrieve data from the local databases.

To improve search efficiency, a filter generates a list of those local databases that contain information relevant to the global query. As a result, the DIM only generates local queries for the enumerated local databases. In practice, only a small percentage of the databases in a network will contain data for a particular global query. Thus, the filter substantially increases search efficiency and reduces the chance that the system will hang up.

To increase the completeness of the search, an input translator is provided which translates the global query into the respective local formats for the local databases. As a result, the local queries will find and retrieve all of the relevant data requested by the user, not just the data that is represented in the exact same format as the global query. Thus, the input translator provides true integration of heterogeneous databases.

To improve user friendliness and to facilitate analysis of the data, an output translator converts the data retrieved from each local database into the uniform i/o format. The user typically selects the i/o format as his or her local format or a global format associated with the virtual database. Alternately, the user could select a different local format or potentially a mixed format.

For a better understanding of the invention, and to show how the same may be carried into effect, reference will now be made, by way of example, to the accompanying drawings in which:

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a diagram illustrating one mode of a data communication network between individual users or applications and underlying information systems in accordance with the present invention;

FIG. 2 is a diagram showing the component architecture and operational processing flow of a distributed information manager in accordance with the present invention;

FIG. 3 is a diagram showing the component architecture and operational processing flow between a distributed information manager and an associated local information manager both in accordance with the present invention;

FIG. 4 is a diagram showing the component architecture and operational processing flow Of a limited information manager in accordance with the present invention;

FIG. 5 is a diagram showing a Bulk-Load Copy Protocol (BCP) in accordance with the present invention;

FIG. 6 is a diagram slowing the component architecture and operational processing flow between a Smart Data Dictionary Cache Memory Management and a Smart Data Dictionary Server in accordance with the present invention;

FIG. 7 is a diagram showing the component architecture and operational processing flow of a Cache Memory Management device in accordance with the present invention;

FIG. 8 is a diagram showing a Client/Server Model in accordance with the present invention;

FIG. 9 is a diagram showing a Hierarchial Star Distributed Processing Topology in accordance with the present invention;

FIG. 10 is a diagram showing a Hierarchial Distributed Processing Topology in accordance with the present invention; and,

FIG. 11 is a diagram showing a Distributed Transaction Service Hierarchy Topology in accordance with the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention allows multiple users to access through a global query geographically dispersed and heterogeneous information management systems or Relational DataBase Management Systems (RDBMS). It provides each user with a unified view of the underlying information management systems so that they appear to be a single (virtual) database. The invention includes a Federated Information Management (FIM) architecture coupled with the Inter-Site Transaction Service (ISTS) architecture to allow transparent access to a wide variety of DBMSs while maintaining the local autonomy of the underlying DBMSs. This means that the users can still use the same application to access the local databases, and only minimum change to the local database system is required for sharing and remote accessing relevant data. With this invention architecture, the FIM of the present invention can run on top of different hardware, operating systems, communication networks, and DBMSs. The federated architecture of the present invention is not limited to integrate relational DBMSs, but may also integrate legacy DBMSs such as hierarchical or network DBMSs, spatial information systems or geographical Information Systems, and text retrieval systems.

One novel aspect of the invention is the input translation of the data fields in the user's global query into the local and heterogeneous formats associated with each of the local databases. This allows the users to access all information that is related to a particular data field without missing information because the data field in a particular local database is represented in a different format.

A second novel aspect is the filtration of the local databases to identify only those databases that actually contain data related to the particular value of the data field selected by the user. This substantially improves the operating efficiency of the virtual database because, in practice, only a small percentage of all the local databases in the network will contain relevant information for a particular data field. Furthermore, reducing the number of local queries that must be executed greatly reduces the probability of an error that would hang up the system.

Another novel aspect of the invention is the output translation of the data retrieved from each of the local databases into a uniform format. The uniform format is typically the user's local format or a global format for the virtual database itself. This allows the retrieved data to be integrated or merged to present the requesting user with an integrated response. Thus, the data presented to the user appears to come from a single uniform (virtual) database.

An additional aspect is that the requesting user can select to generate the global query and receive the integrated response in either the user's local format or in the global format. This allows the user to access the virtual database and analyze the data in the language or format with which the user is most familiar.

As shown in FIG. 1, a Federated Information Management System (FIMS) architecture 10 is employed to present multiple users 12 with the illusion of a single, integrated, non-distributed (virtual) database which accesses data from multiple heterogeneous Relational DataBase Management Systems (RDBMS) 14 through a uniform user interface 16. As discussed previously, the individual RDBMSs in a large network use many different architectures, data formats and user interfaces. Each of the RDBMSs contain multiple data tables in which the data, referred to herein as data fields, are organized into records. A user can access a particular record or sets of records by specifying one of the data fields in the global query. As a result of the heterogeneous architectures, the names and format for the tables and fields, although associated with the same data field, can be very different. Similarly, the virtual database contains global names and formats for the tables and data fields, which may or may not correspond to any of the local names and formats.

For example, in a medical network the individual heterogeneous RDBMS may be located in remote doctors' offices, hospitals, pharmacies and insurance companies. Medical data can be divided into tables such as medications, treatments and surgical procedures. The table that represents medications may be referred to as med.sub.-- table, medical.sub.-- table or any number of other descriptive names in the various RDBMSs. The records in the medication table include data fields for the patient, drug, date, amount, the prescribing doctor and ailment. The data field that identifies the patient may be represented as the patient's name, drivers license, social security number, an insurance number or a local ID number. In the virtual database, the medication table may be represented as MED.sub.-- TABLE and the patient ID by the patients name; last name first and first name last (Smith, John). For example, a portion of the requesting user's RDBMS, number 34, may appear as:

______________________________________ Med.sub.-- Table ID BlueShield.123 aspirin Jan. 1 1985 500 mg Dr. Adams demurol March 5 1987 250 mg Dr. Brown ______________________________________

and a portion of another RDMBS, number 8, may appear as:

______________________________________ medical.sub.-- table SS# 617-22-837 codeine 6-3-91 800 mg Dr. Bob tylenol 8-19-93 500 mg Dr. Tom ______________________________________

The user interface 16 includes a Query Browser and Editor (QuBE) module that provides the users 12 with a uniform access to the multiple RDBMSs 14. Users can formulate their global queries using either a structured query language (SQL) or a graphical user interface (GUI). The requesting user selects an i/o format for constructing the global query and for receiving the integrated response. The user preferably selects either the user's local format or the global format. Alternately, the user may select any of the other local formats or a mixed format.

The following is an example of a global query generated in SQL:

Select ID, Medications From Med.sub.-- Table Where ID=BlueShield.123

where the Select command identifies the desired data fields, the From command identifies the table of interest, and the Where command assigns the data field a particular value for searching the databases. The above global query would retrieve all of the medication from the Med.sub.-- Table associated with the particular ID value BlueShield.123.

The FIM architecture 10 includes a Smart Data Dictionary (SDD) server 18 that contains meta-data information for each of the RDBMSs 14 and the virtual database such as: schema; data distribution; sites configuration including data field formats; domain knowledge; and, inter-site relationships. The SDD automatically maintains data consistency as new applications or databases are added. To support reasoning and problem solving capabilities in a cohesive way, the SDD uses a Multi-dimensional reference model that allows multiple integrated layers of abstractions spanning a wide variety of data types (text, spatial, etc.).

A Distributed Information Manager (DIM) 20 accesses the meta-data stored in SDD 18 to decompose the global query into multiple local queries. The DIM also provides a distributed access plan (DAP) for executing the global query. This access plan is composed of local execution plans (LEPs), one for each RDBMS. Each LEP includes the local query and control information.

The DIM 20 includes a Syntactic and Semantic Parser (SSP) that parses and validates the syntax of the global query. An optimizer provides planning to control the time that is required to access the RDBMSs and process the global query. An execution plan generator (EPG) translates the LEPs from an internal data structure to the Distributed Intermediate Structured Query Language (DISQL). A distributed processing coordinator (DPC) coordinates the execution of the local queries.

A plurality of local information managers (LIMs) 22, one for each RDBMS, execute the local execution plans to retrieve data from the respective RDBMSs. The DIM, LIMs and RDBMSs are inter-connected via an inter-site transaction service (ISTS) 24. The LIM 22 provides a mapping from the global view associated with the virtual database to the multiple local views, translates the local queries from DISQL to the local RDBMS language, and interfaces with the local RDBMS. The LIM includes the following sub-components: a local controller, a local reduction processor, a fragment replicator, a local query processor, a result transmitter, and a results integrator. These sub-components execute the local execution plan and pass the retrieved data back to the DIM, which, in turn, combines the results and presents and integrated response to the requesting user.

The FIM architecture 10 further includes an input filter 26 and input and output translators 28 and 30 that improve its performance. Specifically, the filter and translators improve the completeness of the search for data, the efficiency of the search, and the integration of the retrieved data.

The input filter 26 provides an enumerated list of those RDBMS 14 that actually contain the data field value, translated to their local formats, prescribed in the global query. This causes the SDD 18 to only pass meta-data for the enumerated RDBMS to the DIM 20. Thus, the number of local queries and local execution plans generated by the DIM and executed by the LIMs is greatly reduced. For example, in a network of 100 medical RDBMSs an individual patient may be included in only three of them. Alternately, the input filter could be applied after the DIM generates a complete set of local queries to remove those queries not included in the list. However, this approach would be less efficient and is not preferred.

Continuing the above example, the input filter 26 itself is preferably a data base of which the portion relevant to the example could look like the following:

______________________________________ MED.sub.-- TABLE PATIENT NAME SMITH, JOHN 1, 8, 34, 57 SMYTH, JIM 5, 19, 42, 84 TOWNSEND, DON 11, 32, 54 ______________________________________

where MED.sub.-- TABLE is the global name for the medication table, PATIENT NAME is the global name for the patient data field and the global format is last name first and first name last. The filter database includes three patients for which medical data is contained in the enumerated local data bases. The input filter locates the table, then the data field and finally the value specified in the global query.

The input translator 28 includes two components: a global translator 32 and a local translator 34. The global translator first translates the values of the data fields in the global query into the global format. This allows the input filter 26 to enumerate the relevant RDBMSs. Thereafter, local translator 28 converts the data field value and, when necessary, the name of the data field or table into the local formats for the RDBMSs enumerated by the filter 26. As a result, the SDD 18 passes the local names and values to the DIM so that it generates the local queries with the correct local names and values for searching the respective RDBMS. Thus, all of the relevant data will be retrieved and presented to the requesting user.

The input translator 28, as well as the output translator 30, include both algorithmic and non-algorithmic translators. The algorithmic translators include those for converting meters to miles or uppercase to lower case letters. Non-algorithmic translators use look-up-tables (LUTs) and include conversions from proper names to insurance codes or medical procedure names to numeric codes. Both types of translators can be implemented as a single database.

For the above example, the non-algorithmic global and local translators are preferably databases for which the portions relevant to the example have the following form:

______________________________________ Global Translator: RDBMS Local Value Global Value ______________________________________ 1 med.sub.-- table MED.sub.-- TABLE I.D. PATIENT NAME 418 SMITH, JOHN 8 medical.sub.-- table MED.sub.-- TABLE SS# PATIENT NAME 617-22-837 SMITH, JOHN 34 Med.sub.-- Table MED.sub.-- TABLE ID PATIENT NAME BlueShield.123 SMITH, JOHN ______________________________________ Local Translator: Global Value RDBMS1 RDBMS8 RDBMS34 ______________________________________ MED.sub.-- TABLE med.sub.-- table medical.sub.-- table Med.sub.-- Table PATIENT I.D. SS# ID NAME SMITH, JOHN 418 617-22-837 BlueShield.123 ______________________________________

The output translator 30 converts the data retrieved by the LIMs from their respective RDBMSs into the i/o format selected by the requesting user. The