Description
TECHNICAL FIELD
This invention relates generally to mapping data between two standard data
models. In particular, this invention relates to methods for mapping data
between an object-oriented data model and a relational data model.
BACKGROUND ART
The relational database model was introduced in the early 1970's by E. F.
Codd. Since then, the relational model has become the model employed by
most commercial database management systems (DBMS).
Data in a relational database is represented as a collection of relations.
Each relation can be thought of as a table.
Like the relational database model, object-oriented programming ("OOP") has
also existed since the early 1970's. In the early 1990's, object-oriented
programming gained widespread acceptance due to increased power of
workstations, proliferation of graphical user-interfaces and the
development of hybrid object-oriented languages such as C++.
The OOP paradigm provides a class construct which combines data and
procedural abstractions. The definition of a class includes a definition
of the storage requirements of the class as well as the procedures which
define how objects of the class behave.
An object is an instance of a class. Every object includes the data and
procedural characteristics of its class. In addition, new objects inherit
the storage and functionality defined by all classes used to define the
parent of the object.
The present proliferation of relational DBMSs coupled with the increasing
popularity of the OOP paradigm has resulted in a desire to map data
between data models. In particular, it is desirable to access relational
databases in OOP applications, and to access object-oriented data from
within a relational DBMS.
Commercial tools currently available for mapping object-oriented data to
relational DBMSs include Persistence, ROCK Phase II, and ObjectStore.
These tools are primarily intended to allow application objects to be
persistent. Further, these applications typically assume a straight
mapping correspondence between application objects and a database schema.
Various approaches have been considered for object-relational integration.
In most approaches, the purpose has been to interface object-oriented
applications with relational data storage. These approaches include:
The embedded database interaction in which the interaction is controlled
directly by the methods of the object class (e.g. using embedded SQL).
This approach makes the object-oriented application rather tightly coupled
to the data-storage technology. It is well suited to code generation
techniques when the mapping is straightforward. Persistence is one
commercial product incorporating this approach.
The import-export approach uses an external module which is invoked as a
conversion facility for objects. This approach has been used for
conversions between relational and object databases. It can be used for
providing persistence and object views to object-oriented applications.
The import-export module acts as an external object server. Although the
functional coupling is loose, the module requires information regarding
the models on each end and must maintain the consistency of its
representations.
The SQL gateway is a query server, which is more flexible than the
import-export approach. In its simplest version, the object methods
encapsulate some parameterized SQL statements and invoke the gateway to
handle them. The SQL gateway does the conversion between the relational
form of the data and some convenient host representation such as an array
or primitive object. The SQL gateway can be encapsulated in a single class
that is inherited by any other application object class.
The helper class can be associated with each class of an application. The
helper class includes methods which store/retrieve data. An inheritance
relationship between the object class and its helper is not required. A
handle on the object is passed to the helper, which is itself manipulated
directly as a separate object.
Each prior art solution has advantages and disadvantages that must be
weighed depending on the requirements in the following areas:
Flexibility: The solution should provide independence from the storage
technology;
Composability: The mapping operations and operators should be easy to
combine, since requests may concern aggregations of objects such as
collections and composition hierarchies;
Security: The solution should prevent the application designer from
accessing the database in an unauthorized or inefficient way;
Evolution: The mapping technique should be flexible with regard to changes
in the domain object-oriented model, in the database schema and in the
physical organization; and
"Design overhead": The mapping solution should limit the complexity added
to the application object model.
DISCLOSURE OF THE INVENTION
A need therefore exists for an improved method for mapping data between an
object oriented format and a relational format. More particularly, a need
exists for a method for mapping data between an object oriented format and
a relational format which provides an application with not only a facility
for making objects persistent but also a facility for populating objects
with data from existing relational databases.
The present invention described and disclosed herein comprises a method for
mapping data between an object oriented format and a relational format
which satisfies these needs.
It is an object of the present invention to provide a method for mapping
data between an object oriented format and a relational format using a
transit object transmitted between an object-oriented client application
and a data server managing relational databases.
It is another object of the present invention to provide a method for
mapping data between an object oriented format and a relational format
which accommodates various levels of granularity of the data flow.
It is yet another object of the present invention to provide a method for
mapping data between an object oriented format and a relational format
which makes the application code independent from the database language,
thus being transparent to the client application.
In carrying out the above objects and other objects of the present
invention, a first method is provided for mapping data from a plurality of
objects to a relational database. The method is intended for use in a data
processing system which includes a processor, a memory, a client object
broker ("COB"), a communication server and a server object broker ("SOB").
The method begins with the step of generating a transit object. The transit
object is a complex data structure that is comparable to a small size
database. The implementation name of a TO is a "dataGraph". A dataGraph
contains at least one "dataBlock".
The method continues with the step of populating at least one dataBlock
object of the transit object based on the data of the plurality of
objects. Next, the method includes the step of transmitting the transit
object from the COB to the SOB using the communication server.
The method further includes the step of populating a data structure based
on at least one dataBlock object. The method concludes with the step of
populating the relational database based on the data structure.
In carrying out the above objects and other objects of the present
invention, a second method is provided for mapping data from a relational
database to a plurality of objects. The second method begins with the step
of generating in the memory a transit object. The transit object includes
at least one dataBlock object.
The method continues with the step of generating in memory a data
structure. The method also includes the step of populating the data
structure based on the data of the relational database. The method further
includes the step of populating at least one dataBlock object of the
transit object based on the data structure.
Next, the method includes the step of transmitting the transit object from
the SOB to the COB using the communication server. Finally, the method
concludes with the step of populating at least one object based on at
least one dataBlock object.
The objects, features and advantages of the present invention are readily
apparent from the detailed description of the best mode for carrying out
the invention when taken in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
A more complete appreciation of the invention and many of the attendant
advantages thereof may be readily obtained by reference to the following
detailed description when considered with the accompanying drawings in
which reference characters indicate corresponding parts in all of the
views, wherein:
FIG. 1 is a schematic block diagram illustrating a many-to-many mapping
between object classes and relational tables;
FIG. 2 is a schematic block diagram illustrating the architecture employed
by the present invention;
FIG. 3 is a schematic block diagram illustrating a query of a relational
database;
FIG. 4 is a schematic block diagram illustrating a constructor used for
producing a customized relational query result;
FIG. 5 is a schematic block diagram illustrating the elements of the SOB;
FIG. 6 is a schematic diagram illustrating a typical object class
hierarchy;
FIG. 7 is a block diagram illustrating the contents of a TO associated with
the typical object class hierarchy;
FIG. 8 is a block diagram illustrating the TO-schema of the TO associated
with the typical object class hierarchy;
FIG. 9 is a block diagram illustrating a list-based implementation of a
datalist and references of the present invention;
FIG. 10 is a block diagram illustrating a collection of objects to be
mapped to a TO;
FIG. 11 is a flowchart illustrating the steps of the process to map a
collection of objects to a TO;
FIG. 12 is a flowchart illustrating the steps of the process to populate an
application object from a TO;
FIG. 13 is a block diagram illustrating example classes in an application
domain and corresponding tables of a relational database;
FIG. 14 is a diagram illustrating the result of an SQL retrieval and a
corresponding TO; and
FIG. 15 is a block diagram illustrating the structure of an example
relational database.
DETAILED DESCRIPTION
Referring now to the drawing figures, there is illustrated in FIG. 1 a
typical many-to-many mapping between object classes and relational tables
which can be handled using the present invention. Data of object class 110
maps to the relational tables of group 112. Group 112 includes relational
tables 122, 124 and 126.
Data of object class 114 maps to the relational tables of group 116. Group
116 includes relational tables 126 and 128. As illustrated, relational
table 126 belongs to both group 112 and group 116.
Data of object class 118 maps to the relational tables of group 120. Group
120 consists of relational table 128. Relational table 128 also belongs to
group 116.
Referring now to FIG. 2, there is illustrated a schematic block diagram of
the distributed architecture employed by the present invention. The
architecture is divided into two sides. The first side is an application
side referred to as a client object broker ("COB") 212. COB 212 is an
object-oriented application which uses data stored in application object
210.
The second side of the architecture is a data server side referred to as a
server object broker ("SOB") 216. SOB 216 is a relational server which is
responsible for relational data stored in database 218.
The mapping process of the present invention is distributed on both sides
of the client-server architecture. Each side performs a portion of the
mapping. The intermediate form of the data between the two sides is
Transit Object ("TO") 214.
When circulating over the network, the data is in TO form. The mapping
operations of COB 212 are get_CobTO and put_CobTO. The mapping operations
of SOB 216 are get_SobTO and put_SobTO.
Client-Server Considerations
Security: In an object client-server architecture, security is a major
concern. The application access to the database has to be restricted. For
this reason, as well as for the sake of independence from the storage
technology, no direct SQL capability should be made available to the
application. This restriction can also be extended to read-only
transactions since there is a need to control the cost of database
transactions at a server level. In other words, the Object Server has
control over both content and processing of the database access.
Transaction granularity: various levels of granularity in database
interaction should be accommodated (e.g. single attribute update as well
as a transaction on an entire aggregation/collection of objects), and not
restricted to object units. This is especially important when optimizing
the data flow as well as the number of transactions handled by a data
server (and therefore its performance). Such scalability
of requests for manipulating parts of objects as well as
aggregations/collections of objects requires an interface more flexible
than an import/export module.
Source transparency and Client transparency: An object server is a piece of
network infrastructure that should neither depend too heavily on its data
sources, nor have its code depend on the application objects it serves.
In other words, there should not be any shut down of the server or
recompiling/relinking of its code when changing the database or its
content, or when adding a new application. Tasks such as switching to
another database or changing the database interface can be handled at
run-time, and the code of the server is totally application independent.
Object services: one can expect an object broker or an object server to
provide some services, especially in a multi-user context, like object
caching, locking, or notification based on events related to objects (e.g.
an `object` X has been checked out by the application Y). Such services
require the data to be already in some structured form when handled by the
broker.
In addition, there should be an `object` ID for this intermediate form of
data. However, if one wants to keep the object broker independent from the
applications, the code of the intermediate data structure should be
independent from any application object model. Only the content should be
allowed to depend on the application model.
Similarly, the notion of an ID for the intermediate data structure should
not be related to the address space of an application, and rather have a
scope spanning several applications. This is required if one wants the
services mentioned above to be provided for both multi-user applications
and for multiple applications sharing data.
Referring now to FIG. 3, there is illustrated a query of relational
database 218. On the SOB 216 side, get_SobTO, a stored procedure,
is invoked by a query processor and returns the query result into an array
of tuples 310.
A TO constructor 412 can be implemented as shown in FIG. 4. The TO
constructor 412 is used for producing a customized relational query
result.
Associated with the stored procedure call is a specification called a
TO-schema. The TO-schema describes how to structure the result of the
query. The TO-schema, for example, describes how to generate the resulting
TO. The TO, therefore, reflects processing in addition to the SQL
retrieval of the data.
A default TO-schema is associated with each stored read-procedure 410 in
the SOB 216. Each application, however, can override the default TO-schema
when sending a request, thus customizing the result of the stored
procedure.
As shown in FIG. 4, TO constructor 412 customizes the default TO schema
into TO1 414, TO2 416 and TO3 418.
Note that the SOB 216 does not reflect any knowledge of a specific
application model. In fact, a single SOB 216 can serve many object models.
The TOs are implemented as instances of a single object class called
DataGraph which is orthogonal to any object of an application specific
domain model.
The structure of the DataGraph's contents can be customized for specific
client objects in order to match the object's structure. This aspect of
the mapping is comparable to the SQL gateway solution.
On the COB 212 side, put_CobTO rebuilds the resulting TO and makes
the data available through the Persistent Object class. The Persistent
Object class is inherited by all persistent object classes of the COB
application.
The TO is then accessed by the object class which initiated the request or
by the iterator of a corresponding collection class in case the DataGraph
is expected to contain several objects of same type.
Since the TO is a customized form of the retrieved data, its structure is
already much closer to the application object than, for example, the set
of tuples resulting from a complex relational join. The TO data must then
be assigned to one or several objects of the application.
At this point, several options are available to locally convert the TO. The
preferred embodiment embeds the conversion task into each application
object class. By handling a standardized intermediate object such as the
TO, the inconvenience of embedding queries and structures that are
specific to a particular database technology is avoided.
Referring now to FIG. 5, there is a more detailed illustration of the
elements of the SOB 216. As shown at 514, an application call is placed to
the DBMS. The stored procedures 510 of the DBMS process the call. In the
preferred embodiment, the call associated with a request does not
necessarily use the name of the stored procedure. The call uses a
surrogate name that is mapped to the name of the stored procedure by the
query processor 512 of the SOB 216.
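The surrogate-name lookup described above can be pictured as a simple table held by the query processor. The following is a minimal sketch under stated assumptions: the class name SurrogateTable, its methods, and the procedure names in the usage note are hypothetical and do not come from the specification.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lookup table kept by the query processor:
// surrogate name used in a request -> name of the stored procedure.
class SurrogateTable {
public:
    void Bind(const std::string& surrogate, const std::string& procedure) {
        assert(!surrogate.empty());  // a request must carry a name
        table_[surrogate] = procedure;
    }

    // Returns the stored procedure name, or an empty string when the
    // surrogate is unknown (this convention is an assumption).
    std::string Resolve(const std::string& surrogate) const {
        std::map<std::string, std::string>::const_iterator it =
            table_.find(surrogate);
        return it == table_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> table_;
};
```

With such a table, a stored procedure could be renamed or replaced on the server by rebinding the surrogate, without touching the calling application.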
If the call is a request to retrieve relational data, the stored procedures
510 produce an array of selected tuples from the stored data 218. The
query processor 512 then invokes get_SobTO to produce an output TO
as shown at 514.
If the call is a request to store relational data, the query processor
produces a set of arrays using put_SobTO. The stored procedures 510
then extract the data from the arrays and store the data at 218. As shown
at 514, two TOs are actually associated with each request: an input TO and
an output TO.
Referring now to FIG. 6, there is illustrated a typical object class
hierarchy. The example hierarchy is used to describe a customer order CO
610. The CO 610 is composed of at least one customer product CP 612 and of
at least one order item OI 614. Each OI 614 is composed of at least one
item attribute IA 616.
Referring now to FIG. 7, there is a block diagram illustrating the contents
of the TO associated with the example class hierarchy shown in FIG. 6. A
TO that maps to an object of class IA 616 contains a list of two elements:
name and value.
A TO that maps to an object of class OI 614 contains a list of two
elements: item number and action. Such a TO must also include a reference
to a list of lists having the form: name and value--one for each IA 616.
A TO that maps to an object of class CO 610 contains a list of three
elements: number, order date, and charge. A TO that maps to CO 610 must
also include two references to lists of lists. The first reference points
to a list of CPs 612. The second reference points to a list of OIs 614.
Finally, a single TO could hold a collection of COs 610. Each CO 610 being
represented with its components as previously described.
Ad-hoc persistence operations can also be handled through TOs. An object
can build a customized TO for one or a group of its attributes, therefore
avoiding the use of standard TOs associated to its class.
A list of values that holds the attribute values of an object is called its
datalist 710. The TO shown in FIG. 7 contains z CO datalists. The
k-th of these CO datalists 712 refers in turn to a list of CP
datalists 714 and to a list of OI datalists 716.
The cp_j2 element represents the value of the second attribute
("quantity") of the first CP object that is part of the k-th CO object
stored in this TO. The index j means that this datalist is the j-th of
the CP datalists.
The example illustrated in FIG. 7 shows that a single TO can store the data
of one or several objects of any type, as well as hierarchies of objects
of various types, or even a collection of such hierarchies.
It further shows that the only data structure that is needed for
representing TO data is a tree-like structure, each node of which is a
list of lists (or a set of lists) of values.
The TO-schema
In the previous examples, the TO closely matches the object data and its
composition hierarchy. One can, however, build a TO from an object where
the TO structure does not closely reflect the object structure.
For example, one could eliminate some attributes, eliminate some
components, or add some attributes in the TO that were not in the original
object. One could also reorganize the object data by flattening all its
data, or introduce some additional hierarchy. Tracking all these
modifications is greatly facilitated if the TO contains some
meta-information.
In addition to the tree-like structure that holds its TO-data, a TO also
contains its own data model known as a TO-schema. FIG. 8 illustrates the
TO-schema of the previously discussed TO. The TO-schema contains:
TO-entities, TO-relationships and TO-attributes.
A TO-entity contains TO-attributes of different types. A TO-entity has
instances, each of which is represented as a datalist.
TO-relationships are oriented, binary relationships. A TO-relationship
relates a TO-entity, called the domain TO-entity, to another TO-entity,
called the range TO-entity, in an oriented way. A TO-relationship is
described by: (1) its name, (2) a type (e.g. "association", "composition",
"inheritance"), and (3) its domain and range TO-entities.
At the TO-data level, a TO-relationship is a many-to-many relationship
between datalists. It can be represented by associating to each datalist
of the domain TO-entity, a reference to a group of datalists that are
instances of the range TO-entity.
Each TO-attribute of a TO-entity is described by: (1) a name, (2) a type,
and (3) a maximum size in bytes. Optionally, a TO-attribute can be
described by (4) a flag indicating whether the attribute can be considered
as part of the identifier for the datalist in which it is contained, and
(5) a coordinate slot that is used for mapping the TO from or to a
multi-array data structure. Such a slot can be used to store index
information such as a column number.
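The three kinds of TO-schema element can be sketched as plain C++ structures. This is an illustration only: the type and function names are hypothetical and are not those of Appendix B, the relationship names, the size limit, the identifier flag on "number" and the first CP attribute name are assumptions, and the entity contents follow the CO, CP, OI and IA example of FIG. 6 and FIG. 7.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical schema-level types (not the classes of Appendix B).
struct ToAttribute {
    std::string name;
    std::string type;
    std::size_t maxSizeBytes;
    bool partOfIdentifier;  // optional item (4)
    int coordinate;         // optional item (5), e.g. a column number
};

struct ToEntity {
    std::string name;
    std::vector<ToAttribute> attributes;
};

struct ToRelationship {
    std::string name;
    std::string type;    // "association", "composition" or "inheritance"
    std::string domain;  // name of the domain TO-entity
    std::string range;   // name of the range TO-entity
};

struct ToSchema {
    std::vector<ToEntity> entities;
    std::vector<ToRelationship> relationships;
};

// Helper: string-typed attribute with a placeholder size limit of 32 bytes.
ToAttribute Attr(const std::string& name, int column, bool isId = false) {
    return ToAttribute{name, "string", 32, isId, column};
}

// Builds a schema for the customer order example.
ToSchema MakeOrderSchema() {
    ToSchema s;
    s.entities.push_back(ToEntity{"CO", {Attr("number", 0, true),
                                         Attr("order date", 1),
                                         Attr("charge", 2)}});
    // "product" is a made-up name; the text only names "quantity".
    s.entities.push_back(ToEntity{"CP", {Attr("product", 0),
                                         Attr("quantity", 1)}});
    s.entities.push_back(ToEntity{"OI", {Attr("item number", 0),
                                         Attr("action", 1)}});
    s.entities.push_back(ToEntity{"IA", {Attr("name", 0),
                                         Attr("value", 1)}});
    s.relationships.push_back(
        ToRelationship{"CO_has_CP", "composition", "CO", "CP"});
    s.relationships.push_back(
        ToRelationship{"CO_has_OI", "composition", "CO", "OI"});
    s.relationships.push_back(
        ToRelationship{"OI_has_IA", "composition", "OI", "IA"});
    return s;
}
```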
TO-entity Instances
The preferred embodiment implements a TO in two ways depending on the
representation of the TO-data. The first method is based on arrays. The
second method is based on lists.
The array-based implementation assigns a one-dimensional array to each
TO-attribute. The array holds its instances for all TO-entity instances.
The grouping of the different TO-attributes of a TO-entity can in turn be
done by chaining the one-dimensional arrays into a list or into a
two-dimensional array.
The advantage of the array-based implementation is that it facilitates the
memory allocation in cases where the size of the TO is known or bounded in
advance. In addition, it provides control over the memory allocation. For
example, one can decide to allocate these arrays in such a way that all
the instances of a TO-attribute are contiguous in memory. This facilitates
the transfer of TO-data using communication primitives such as RPC calls.
The list-based implementation assigns a list to each TO-entity instance
(i.e. to each datalist). Therefore, it actually implements the datalist.
Each element of the list, however, is actually a pointer to the value of
the element. Thus, such a list can be heterogeneous having elements of
different sizes.
The instances of a TO-entity can then be grouped by using a list, each
element of which is a pointer to an instance-list. The advantage of this
representation is that the instances of a TO-entity can be easily updated,
removed, or inserted.
TO-relationship Instances
The implementation of a TO-relationship requires some means to associate a
TO-entity instance to zero, one or several other TO-entity instances. A
TO-relationship is implicitly considered as an oriented, binary,
many-to-many relationship. Given a TO-relationship r_12 from a
TO-entity e_1 to a TO-entity e_2, we call "r_12-reference"
the link from an instance of e_1 to an instance of e_2.
There are two ways to represent an r-reference: (1) by using a list of
pointers to the referenced datalists, or a list of indices to the array
entries that correspond to the referenced datalists, or (2) by using two
indices that represent an index range in an array.
The latter representation assumes that the referenced datalists are
consecutively stored, which means that they can be identified by a single
interval of indices. If this is the case, we call this property the "index
density property" (IDP). Although the IDP poses some constraints on the
way the TO is built, it allows for an implementation of references that
spares memory and speeds up the access to the referenced datalists.
Current Implementation
Favoring flexibility in memory allocation and TO updates over control, the
current implementation of TOs uses a list-based representation for
TO-data. The current implementation is intended to handle relational data
that results from SQL queries that perform joins across tables.
The mapping process can guarantee that the referenced datalists are
consecutively stored in the list of instances of a TO-entity. Therefore, a
pair of indices will suffice for each r-reference.
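An r-reference stored as a pair of indices can be sketched as follows. The names RangeRef and Resolve are hypothetical, and the half-open convention (first index included, last index excluded) is an assumption, since the text does not say whether the range is inclusive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A datalist holds the attribute values of one TO-entity instance.
typedef std::vector<std::string> Datalist;

// Hypothetical r-reference relying on the index density property:
// the referenced datalists occupy the index range [first, last).
struct RangeRef {
    std::size_t first;
    std::size_t last;  // one past the final referenced datalist
};

// Returns copies of the datalists of the range TO-entity that ref covers.
std::vector<Datalist> Resolve(const std::vector<Datalist>& rangeInstances,
                              RangeRef ref) {
    assert(ref.first <= ref.last && ref.last <= rangeInstances.size());
    return std::vector<Datalist>(
        rangeInstances.begin() + static_cast<std::ptrdiff_t>(ref.first),
        rangeInstances.begin() + static_cast<std::ptrdiff_t>(ref.last));
}
```

An empty range (first equal to last) represents a datalist that references no instance of the range TO-entity.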
To provide system independence, the elementary types of values in the
datalists are limited to strings of characters. Thus, some conversion,
such as string to numeric, may need to take place when mapping to and from
TOs.
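The string-to-numeric conversion mentioned above might look like the following for an integer attribute. The helper names are hypothetical, and rejecting trailing characters is a choice made for this sketch rather than a documented rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: numeric attribute -> string slot of a datalist.
std::string NumberToSlot(long value) {
    return std::to_string(value);
}

// Hypothetical helper: string slot -> numeric attribute.
// std::stol throws std::invalid_argument when the slot holds no digits.
long SlotToNumber(const std::string& slot) {
    std::size_t used = 0;
    long value = std::stol(slot, &used);
    assert(used == slot.size());  // this sketch rejects trailing characters
    return value;
}
```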
The implementation of a datalist that is an instance of the CO class,
including its two r-references, one to CP, the other to OI, is represented
in FIG. 9.
A C++ Implementation
Three major C++ classes can be used to handle TOs: dataGraph_info,
dataGraph, and dataBlock.
The objects of the dataGraph_info class contain a description of the
TO-schema as illustrated in FIG. 8. Such a TO-schema description can be
read from a text file. Appendix A illustrates the preferred format of such
a text file.
A dataGraph_info object can be dynamically extended by adding a new
TO-entity and connecting it through a TO-relationship to an existing one.
The class description of Appendix B defines a dataGraph_info class.
The main class for TOs is called dataGraph. An instance of dataGraph
actually represents a TO. One constructor of dataGraph requires a
dataGraph_info object as input.
Once the dataGraph object is built by this constructor, it contains a
representation of the TO-schema with empty TO-data. The main difference
between such an "empty" dataGraph object and the corresponding
dataGraph_info object is that the former is a sort of "compiled"
version of the latter, and therefore less easily updatable.
A dataGraph object is actually composed of a list of other objects that are
instances of the dataBlock class. A dataBlock object represents a
TO-entity (schema level description) and its instances such as the list of
CO datalists, as illustrated in FIG. 9.
For instance, in the example TO illustrated in FIG. 7, there would be
four dataBlock objects in the dataGraph object that represents this TO.
When creating an "empty" dataGraph object from a dataGraph_info
object, the TO-relationships are interpreted as connections between
"empty" dataBlock objects, thus ordering them as a tree.
Mapping to and from the transit object
An application that needs to make object data persistent by saving it into
a database has the responsibility to build its own TOs. In other words,
the persistence methods of an object should map to and from TOs.
There might be several TOs corresponding to an application object. A common
case of such multiple TOs associated with the same object occurs when there
is a need for several persistence methods, for example one for the core
part of the object and another for the object and all its components.
In the first case, a TO with one TO-entity is sufficient. In the second
case, a TO with a more complex TO-schema is required such as the one
illustrated in FIG. 8.
In the preferred embodiment, an application object class should include a
classID. A classID is an integer made accessible as a class member by any
instance of this class.
There are generally three activities an application must complete to make
an application object persistent by using TOs. First, the application must
build an empty TO or dataGraph object and access the part of this TO to be
populated (e.g. the dataBlock object of interest). Next, the application
must populate the dataBlock object. Finally, the application must send the
dataGraph object to a communication server, after having converted it into
a communication format.
Create the dataBlock(s)
A persistence method must first create a dataGraph object, unless the
constructor of the object has already built all the empty dataGraph
objects that are to be used by persistence methods. It is assumed that the
method or the constructor has access to a dataGraph_info object,
that can be a class member of this application object class (i.e. instance
independent).
Upon invocation of a persistence method, the method must access, inside the
dataGraph object, the dataBlock object that pertains to the data to be
transferred. In C++, this can be done by using two methods:
int dataGraph::GetDBindex(int classID)
and
dataBlock* dataGraph::Get_dataBlock(int i)
The first method returns an index in the list of dataBlock objects that are
components of a dataGraph object. The index identifies the dataBlock
object that corresponds to the classID argument. The second access method
returns the actual dataBlock object given the index.
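The two access methods can be sketched with minimal stand-in classes. Only the two method signatures come from the text. The AddBlock helper, the -1 and null return conventions, and the CustomerOrder class with its classID value are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for dataBlock; only the owning classID is kept.
class dataBlock {
public:
    explicit dataBlock(int classID) : classID_(classID) {}
    int classID() const { return classID_; }

private:
    int classID_;
};

// Minimal stand-in for dataGraph: a list of dataBlock objects.
class dataGraph {
public:
    // Hypothetical helper used here to populate the sketch.
    void AddBlock(int classID) {
        assert(GetDBindex(classID) == -1);  // one dataBlock per class
        blocks_.push_back(dataBlock(classID));
    }

    // Index of the dataBlock for classID, or -1 if there is none
    // (the -1 convention is an assumption).
    int GetDBindex(int classID) const {
        for (std::size_t i = 0; i < blocks_.size(); ++i) {
            if (blocks_[i].classID() == classID) return static_cast<int>(i);
        }
        return -1;
    }

    // The dataBlock at index i, or a null pointer if i is out of range.
    dataBlock* Get_dataBlock(int i) {
        if (i < 0 || static_cast<std::size_t>(i) >= blocks_.size()) {
            return nullptr;
        }
        return &blocks_[static_cast<std::size_t>(i)];
    }

private:
    std::vector<dataBlock> blocks_;
};

// Hypothetical application class exposing its classID as a class member.
struct CustomerOrder {
    static constexpr int classID = 1;
};
```

A persistence method would typically chain the two calls: look up the index for its own classID, then fetch the dataBlock at that index.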
Populate the dataBlock(s)
Once the dataBlock object is accessed, the method must build a datalist in
it. Three basic methods handle this task:
int dataBlock::OpenTuple();
/* create and open a datalist for this dataBlock: a list of n+2*r slots is
created, n being the number of datalist attributes, r the number of
references from this dataBlock. */
int dataBlock::AddSlotToTuple(char *valueptr);
/* add an element to the currently open datalist, i.e. set the next current
slotpointer to valueptr */
int dataBlock::CloseTuple();
/* close the datalist and append it as a new item in the list of datalists
of the dataBlock. Return its index */
Note that the method must be implemented with knowledge of the position at
which each attribute must be stored in the datalist. An application object
method that uses the three basic dataBlock methods above is called a
TO-write method.
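A TO-write method built on the three basic methods might look like the sketch below. The dataBlock shown is a simplified stand-in, not the class of the appendices: it omits the 2*r reference slots, and its return codes (0 on success, -1 on misuse) are assumptions. The ItemAttribute class and its WriteTo method are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for dataBlock. Each datalist is a list of pointers
// to values owned by the application object; reference slots are omitted.
class dataBlock {
public:
    int OpenTuple() {
        current_.clear();
        open_ = true;
        return 0;
    }

    int AddSlotToTuple(char* valueptr) {
        assert(valueptr != nullptr);
        if (!open_) return -1;
        current_.push_back(valueptr);
        return 0;
    }

    // Appends the open datalist to the dataBlock and returns its index.
    int CloseTuple() {
        if (!open_) return -1;
        tuples_.push_back(current_);
        open_ = false;
        return static_cast<int>(tuples_.size()) - 1;
    }

    const std::vector<std::vector<char*> >& tuples() const { return tuples_; }

private:
    std::vector<std::vector<char*> > tuples_;
    std::vector<char*> current_;
    bool open_ = false;
};

// Hypothetical application class for the IA element of FIG. 6.
class ItemAttribute {
public:
    ItemAttribute(const std::string& name, const std::string& value)
        : name_(name), value_(value) {}

    // TO-write method. Slot order is fixed: 0 = name, 1 = value.
    // The stored pointers stay valid only while this object is alive
    // and unmodified.
    int WriteTo(dataBlock& block) {
        block.OpenTuple();
        block.AddSlotToTuple(&name_[0]);
        block.AddSlotToTuple(&value_[0]);
        return block.CloseTuple();
    }

private:
    std::string name_;
    std::string value_;
};
```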
Transmit the dataGraph
Finally, the persistence method must transmit the TO over a communication
channel. This could be done by subclassing the dataGraph class in order to
add some communication methods, for example through multiple inheritance.
The dataGraph class provides a standard ASCII conversion, the protocol of
which is illustrated in Appendix C.
The methods that map a TO to and from the ASCII form are:
string dataGraph::FormatToASCII();
int dataGraph::CreateFromASCII(string datatext);
/* this method populates a dataGraph object initially created as empty; the
dataGraph may have been created by a constructor without a dataGraph_info
argument */
Mapping a collection of objects to a TO
A persistence method for mapping a collection class to a TO is slightly
more involved. The persistence method must get the dataBlock object that
is related to the collection class of application objects. For each
element of the collection class, the persistence method must call a
TO-write method of the element class that creates one datalist and adds it
to the dataBlock object. Finally, the persistence method must transmit the
dataGraph.
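The loop over a collection can be sketched as follows. Block is a simplified stand-in for a dataBlock that stores string values directly, and OrderItem, WriteTo and WriteCollection are hypothetical names. Transmission of the dataGraph is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a dataBlock: each datalist is a list of strings.
struct Block {
    std::vector<std::vector<std::string> > datalists;
};

// Hypothetical element class whose TO-write method adds one datalist
// and returns the index of that datalist.
struct OrderItem {
    std::string itemNumber;
    std::string action;

    int WriteTo(Block& block) const {
        std::vector<std::string> datalist;
        datalist.push_back(itemNumber);
        datalist.push_back(action);
        block.datalists.push_back(datalist);
        return static_cast<int>(block.datalists.size()) - 1;
    }
};

// Persistence method of a hypothetical collection class: one TO-write
// call per element. Returns the number of datalists now in the block.
std::size_t WriteCollection(const std::vector<OrderItem>& items,
                            Block& block) {
    const std::size_t before = block.datalists.size();
    for (std::size_t i = 0; i < items.size(); ++i) {
        items[i].WriteTo(block);
    }
    assert(block.datalists.size() == before + items.size());
    return block.datalists.size();
}
```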
Mapping a composition hierarchy to a TO
Consider the previous example of a Customer_Order composition
hierarchy. Each CO object may have several collections of components of
different types. Further, each component may have sub-components.
In this case, each CO has customer_product and order_item
components. In such a case, the object is going to map itself by
performing a recursive traversal of its components. Each component is
responsible for calling the TO-write method of its immediate
sub-components.
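The recursive traversal can be sketched with the CO, CP, OI and IA classes of FIG. 6. This is an illustration of the traversal order only: Graph is a simplified stand-in for a dataGraph keyed by class name, the references linking parent and child datalists are deliberately left out, and all class, member and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::vector<std::string> Datalist;
// Simplified stand-in for a dataGraph: one list of datalists per class.
typedef std::map<std::string, std::vector<Datalist> > Graph;

// Builds a two-slot datalist.
Datalist Pair(const std::string& a, const std::string& b) {
    Datalist d;
    d.push_back(a);
    d.push_back(b);
    return d;
}

struct ItemAttribute {
    std::string name, value;
    void WriteTo(Graph& g) const { g["IA"].push_back(Pair(name, value)); }
};

struct OrderItem {
    std::string itemNumber, action;
    std::vector<ItemAttribute> attributes;
    void WriteTo(Graph& g) const {
        g["OI"].push_back(Pair(itemNumber, action));
        // Each component calls only its immediate sub-components.
        for (std::size_t i = 0; i < attributes.size(); ++i) {
            attributes[i].WriteTo(g);
        }
    }
};

struct CustomerProduct {
    std::string product, quantity;  // "product" is a made-up name
    void WriteTo(Graph& g) const { g["CP"].push_back(Pair(product, quantity)); }
};

struct CustomerOrder {
    std::string number, orderDate, charge;
    std::vector<CustomerProduct> products;
    std::vector<OrderItem> items;
    void WriteTo(Graph& g) const {
        Datalist d;
        d.push_back(number);
        d.push_back(orderDate);
        d.push_back(charge);
        g["CO"].push_back(d);
        for (std::size_t i = 0; i < products.size(); ++i) products[i].WriteTo(g);
        for (std::size_t i = 0; i < items.size(); ++i) items[i].WriteTo(g);
    }
};
```

Because every component writes its sub-components immediately after itself, the datalists belonging to one parent end up stored consecutively, which is the condition a pair-of-indices reference relies on.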
Before adding a datalist d_x to the dataBlock object