Claims
We claim:
1. A system for integrating user software application programs operating on
respective nodes of a distributed processing network having at least two
of said nodes, each node of said distributed processing network
comprising:
application adapting means for providing user software application programs
operating on each of said nodes with communicative access to said
distributed processing network;
message management means for managing data transfer requests between each
user software application program operating on each of said nodes and
other user software application programs operating on other of said nodes
connected to said distributed processing network;
communications means responsive to a data transfer request from a source
user software application program operating on any of said nodes for
establishing node to node communications over said distributed processing
network between a source node and a destination node, said destination
node having a destination user software application program operating
thereon to which data are to be transferred from said source user software
application program operating on said node;
data manipulation means for manipulating, prior to transmission by said
communications means to said destination user software application program
operating on said destination node, message data from said source user
software application program into a common data representation, said
common data representation being formed in accordance with a universal
encoding scheme used by all nodes of said distributed processing network
to account for hardware differences between each of said nodes, computer
language semantic differences between said user software application
programs operating on each of said nodes, and data type formats of said
data transfer requests, whereby said message data from said source user
software application program are manipulated by said data manipulation
means only when at least one of data types, data formats, computer
languages, and physical data representations of said source and
destination user software application programs do not correspond to each
other;
means at each node for manipulating message data in said common data
representation received from said distributed processing network into the
data types, data formats, physical data representations, and computer
language of each of said nodes; and
means for forming node-specific data manipulation means at each node of
each hardware type in said distributed processing network, said
node-specific data manipulation means at each node manipulating data from
said source user software application program into said common data
representation for transmission to said destination user software
application program, and for manipulating data received from said source
user software application program in said common data representation into
data compatible with said destination user software application program
when said destination user software application program operates on any of
said nodes, said node-specific data manipulation forming means comprising:
file means for storing a high level description of user software
application programs and nodes operating on said distributed processing
network, the physical characteristics of data at each source and
destination user software application program, and the manipulations
necessary to convert data from source to destination physical
characteristics;
validation module means for generating, as source code on a node designated
as an administration node, configuration files, based on said high level
description;
configuration table compiling means and data manipulation compiling means
for generating, as source code on said administration node, manipulation
files, based on said high level description;
data manipulation module builder means for copying said manipulation files
and compiling, on each node designated as a compilation node, said
manipulation files to form node-specific manipulation modules; and
start up module means for copying said configuration files, for loading
said configuration files in memory, and for starting up said manipulation
files.
2. A system as in claim 1, wherein said common data representation is
independent of the architecture of the nodes and computer programming
languages used on said distributed processing network.
3. A method for integrating user software application programs operating on
respective nodes of a distributed processing network having at least two
of said nodes, the method at each node of said distributed processing
network comprising the steps of:
providing user software application programs operating on each of said
nodes with communicative access to said distributed processing network;
managing data transfer requests between each user software application
program operating on each of said nodes and other user software
application programs operating on another node connected to said
distributed processing network;
establishing node to node communications over said distributed processing
network between a source node and a destination node, said destination
node having a destination user software application program operating
thereon to which data are to be transferred from said source user software
application program operating on at least one of said nodes;
manipulating, prior to transmission to said destination user software
application program operating on said destination node, message data from
said source user software application program into a common data
representation, said common data representation being formed in accordance
with a universal encoding scheme used by all nodes of said distributed
processing network to account for hardware differences between each of
said nodes, computer language semantic differences between said user
software application programs operating on each of said nodes, and data
type formats of said data transfer requests, whereby said message data
from said source user software application program are manipulated by said
data manipulation means only when at least one of data types, data
formats, computer languages, and physical data representations of said
source and destination user software application programs do not
correspond to each other;
manipulating message data in said common data representation received from
said distributed processing network into said data types, data formats,
physical data representations of said destination node, and the computer
language of said destination user software application program when said
destination user software application program operates on said destination
node; and
forming node-specific data manipulation means at each node of each hardware
type in said distributed processing network, said node-specific data
manipulation step manipulating data from said source user software
application program into said common data representation for transmission
to said destination user software application program, and manipulating
data received from said source user software application program in said
common data representation into data compatible with said destination user
software application program when said destination user software
application program operates on said destination node, said node-specific
data manipulation forming step further comprising the steps of:
storing a high level description of user software application programs and
nodes operating on said distributed processing network, the physical
characteristics of data at each source and destination user software
application program, and manipulations necessary to convert data from
source to destination physical characteristics;
generating, as source code on a node designated as an administration node,
configuration files, based on said high level description;
generating, as source code on said administration node, manipulation files,
based on said high level description;
copying said manipulation files and compiling, on each node designated as a
compilation node, said manipulation files to form node-specific data
manipulation modules; and
copying said configuration files, for loading said configuration files in
memory, and for starting up said manipulation files.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a system for integrating existing
application programs in a networked environment, and more particularly, to
a system with mechanisms for transforming and manipulating data messages
for transfer between different applications on the same computer or on
different computers connected via a network or networks and having the
same or different computer architectures.
2. Description of the Prior Art
Since the beginning of the computer age, computers and, in particular,
computer software programs have been used in a variety of settings to
automate processes which were previously conducted mechanically. This
automation has typically led to improved efficiency and increased
productivity. However, because of the costs of such automation, automation
of large businesses and factories has often been conducted on a piecemeal
basis. For example, different portions of an assembly line have been
automated at different times and often with different computer equipment
as a result of the varying functionalities of the various computer systems
available at the time of purchase. As a result, many assembly lines and
businesses have developed "islands of automation" in which different
functions in the overall process are automated but do not necessarily
communicate with one another. In addition, in the office environment LANs
have been used to allow new computer equipment to communicate; however,
software applications typically may not be integrated because of data
incompatibilities.
Such heterogeneous systems pose a significant problem to the further
efficiencies of automation since these different "islands of automation"
and machines with incompatible data types connected to the same network
cannot communicate with one another very easily. As a result, it has been
difficult and expensive to control an entire assembly line process for a
large manufacturing facility from a central location except on a piecemeal
basis unless the entire factory was automated at the same time with
homogeneous equipment which can intercommunicate. Thus, those
businesses and factories which have already been automated on a piecemeal
basis are faced with the choices of eliminating all equipment so
that homogeneous equipment may be substituted therefor (with the
associated prohibitive costs) or waiting for the existing system to become
obsolete so that it can be replaced (again at significant expense).
One solution to the above problem has been to hire software programmers to
prepare custom code which allows the different "islands of automation" to
communicate with each other. However, such an approach is also quite
expensive and is rather inflexible and assumes that the overall system
remains static. In other words, when further equipment and application
software must be integrated into the overall system, the software
programmers must be called back in to rewrite the code for all
applications involved and to prepare additional custom code for interface
purposes. A more flexible and less expensive solution is needed.
The integration of existing heterogeneous applications is a problem which
has yet to be adequately solved. There are numerous major problems in such
integration of existing applications because of the differences in
hardware and their associated operating systems and because of the
differences in the applications themselves. For example, because computers
are built on proprietary hardware architectures and operating systems,
data from applications running on one system is often not usable on
another system. Also, programmers must frequently change application code
to create interfaces to different sets of network services because of the
diversity of such network services. In addition, different applications
use different data types according to their specific needs, and, as a
result, programmers must alter a receiving application's code to convert
the data from another application into types that the receiving
application can use. Moreover, incompatible data structures often result
because of the different groupings of data elements by the applications.
For example, an element with a common logical definition in two
applications may still be stored in two different physical ways (i.e.,
application A may store it in one two-dimensional array and application B
may store it in two one-dimensional arrays). Moreover, applications
written in different languages usually cannot communicate with one another
since data values are often interpreted differently. For example, C and
FORTRAN interpret logical or boolean values differently.
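The two incompatibilities just described can be illustrated with a minimal Python sketch. The function names are hypothetical, and the assumption that the sending application encodes "true" as -1 is made only for illustration; actual truth encodings depend on the language implementation.

```python
def split_columns(table):
    """Restructure one 2-D array (rows of [x, y]) into two 1-D arrays."""
    xs = [row[0] for row in table]
    ys = [row[1] for row in table]
    return xs, ys


def to_zero_one(sender_logical, sender_true=-1):
    """Map a sender's logical value onto a 0/1 convention.

    sender_true is an assumed encoding chosen for this example only.
    """
    return 1 if sender_logical == sender_true else 0
```

Without such a conversion step, a receiving application written against one layout or truth convention would have to be recoded for each new sender.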
Partial solutions to the above problems have been proposed to provide
distributed networks for allowing various applications to share data. In
doing so, these applications have relied on transparent data sharing
mechanisms such as Sun Microsystems' Network File System (NFS), AT&T's
Remote File Sharing (RFS), FTAM (as defined by the MAP/TOP
specifications), or Apollo's Domain File System. However, these systems
are limited in that they allow data sharing but do not allow true
integration of the different application programs to be accomplished.
Another example of a system for providing interprocess communication
between different computer processes connected over a distributed network
is the Process Activation and Message Support (PAMS) system from Digital
Equipment Corp. This system generally allows processes to communicate with
each other regardless of where the processes reside on a common network.
Such processes may be located on a single CPU or spread across
workstations, clusters, or local or wide area networks (LANs or WANs). The
PAMS system manages all connections over the network and provides
integration features so that processes on respective workstations,
clusters and the like may communicate. In particular, the PAMS message
processing system is a network layer which is implemented above other
networks to transparently integrate new networks and events into a common
message bus. Such a system enables network configuration to be monitored
and message flow on the message bus to be monitored from a single point.
The result is a common programming interface for all host environments to
which the computer system is connected. Thus, all host environments appear
the same to the user.
For example, an ULTRIX host environment running ULTRIX-PAMS is directly
connected to a VMS host running VAX-PAMS on its networks, and ULTRIX-PAMS
uses VAX transport processes to route all messages over the network.
Specific rules are then provided for routing messages using ULTRIX-PAMS
and VAX transport processes, where the ULTRIX-PAMS functions as a slave
transport in that it can only communicate to other PAMS processes via the
network to a full function PAMS router. As a result, the PAMS system is
limited in that there is no support for "direct" task-to-task
communications between ULTRIX processes. In addition, since all traffic
must be routed through a VAX-PAMS routing node, a single point of failure
exists for the system.
Other systems have been proposed for an information processing environment
in which various machines behave as one single integrated information
system. However, to date such systems are limited to connecting various
subroutines of homogeneous applications running on different machines
connected to a common network. For example, the Network Computing System
(NCS) of Apollo is a Remote Procedure Call (RPC) software package which
allows a process (user application) to make procedure calls to the
services exported by a remote server process. However, such RPC systems
are typically not fit for the development of a networked transaction
management system, for NCS does not provide a message and file handling
system, a data manipulation system, a local and remote process control
system and the like which allows for the integration of existing
applications. Rather, NCS allows for the building of new distributed
applications, and does not provide for the integration of existing
heterogeneous applications. RPCs instead isolate the user from networking
details and machine architectures while allowing the application developer
to define structured interfaces to services provided across the existing
network.
RPCs can be used at different levels, for the RPC model does not dictate
how they should be used. Generally, a developer can select subroutines of
a single application and run them on remote machines without changing the
application or subroutine code. The simplest use of RPCs is to provide
intrinsic access to distributed resources which are directly callable by
an application, such as printers, plotters, tape drives for backup tasks,
math processors for complex and time-consuming applications, and the like.
A more efficient use of RPC at the application level would be to partition
the application so that the software modules are co-located with the
resources that they use. For example, an application which needs to
extract data from a database could be partitioned so that the modules
which access the database could reside on the database machine.
A diagram of NCS is shown in FIG. 1. The system 100 therein shown generally
consists of three components: an RPC run time environment 132,134 which
handles packaging, transmission and reception of data and error correction
between the user and server processes; a Network Interface Definition
Compiler (NIDC) 136 which compiles high-level Network Interface Definition
Language (NIDL) into C-language code that runs on both sides of the
connection (the user and server computers); and a Location Broker 128
which lets applications determine at run time which remote computers on
the network can provide the required services to the user computer. In
particular, as shown in FIG. 1, a user application 102 interfaces with a
procedure call translator stub 104 which masquerades as the desired
subroutine on the remote computer. During operation, the RPC run time
system 106 of the user's computer and the RPC run time system 108 of the
server system communicate with each other over a standard network to allow
the remote procedure call. Stub 110 on the server side, which masquerades
as the application for the remote subroutine 112, then connects the remote
subroutine 112 across the network to the user's system.
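The stub arrangement described above can be sketched in Python as follows. This is only an illustration of the general RPC stub pattern: the JSON encoding, the function names, and the in-process "transport" are assumptions for the example and do not reflect the NCS wire format or its generated C stubs.

```python
import json


def remote_add(a, b):
    """The remote subroutine that actually performs the work."""
    return a + b


def server_stub(request_bytes):
    """Server-side stub: unpack the request, call the subroutine, pack the reply."""
    call = json.loads(request_bytes)
    result = remote_add(*call["args"])
    return json.dumps({"result": result}).encode()


def add(a, b, transport=server_stub):
    """User-side stub: has the signature of a local call but forwards it.

    transport stands in for the network; here it is a direct function call.
    """
    request = json.dumps({"proc": "add", "args": [a, b]}).encode()
    reply = transport(request)
    return json.loads(reply)["result"]
```

The caller simply invokes add(2, 3) as if it were local; the packaging and unpackaging are hidden in the two stubs.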
The NCS system functions by allowing a programmer to use a subroutine call
to define the number and type of data to be used and returned by the
remote subroutine. More particularly, NCS allows the application developer
to provide an interface definition 114 with a language called the Network
Interface Definition Language (NIDL) which is then passed through NIDL
compiler 116 to automatically generate C source code for both the user and
server stubs. In other words, the NIDL compiler 116 generates stub source
code 118 and 120 which is then compiled with RPC run time source code 122
by C compilers 124 and 126 and linked with the application 102 and
user-side stub 104 to run on the user's machine while the subroutine 112
and its server-side stub 110 are compiled and linked on the server
machine. After the application 102 has been written and distributed
throughout the network, location broker 128 containing network information
130 may then be used to allow the user to ask whether the required
services (RPC) are available on the server system.
Thus, with NCS, the NIDL compiler automatically generates the stubs that
create and interpret data passed between an application and remote
subroutines. As a result, the remote subroutine call appears as nothing
more than a local subroutine call that just happens to execute on a remote
host, and no protocol manipulations need to be performed by the
application developer. In other words, the NCS system is primarily a
remote execution service and does not need to manipulate data for transfer
by restructuring a message to allow for conversion from one data type to
another. A more detailed description of the NCS system can be found in the
article by H. Johnson entitled "Each Piece In Its Place," Unix Review,
June 1987, pages 66-75.
The RPC system of the NCS primarily provides a remote execution service
which operates synchronously in a client/server relationship in which the
client and server have agreed in advance on what the requests and replies
will be. Applications must be developed specifically to run on NCS or
substantially recoded to run on NCS. Moreover, because a remote procedure
cannot tell when it will be invoked again, it always initiates
communications at the beginning of its execution and terminates
communications at the end. The initiation and termination at every
invocation makes it very costly in performance for a remote procedure to
set up a connection with its caller. As a result, most RPC systems are
connectionless. This is why RPC systems such as NCS must build another
protocol on top of the existing protocol to ensure reliability. This
overhead causes additional processing to be performed which detracts from
performance.
Accordingly, although NCS provides a consistent method for remote execution
in a heterogeneous network environment, it is designed primarily to broker
distributable services such as printing and plotting across the network,
where the user may not care which printer prints the information as long
as it gets printed. Another type of service might be providing processing
time for applications where a small amount of data in a message can
trigger an intensive and time consuming calculation effort to achieve an
answer that can itself be turned into a message. However, the NCS system
cannot provide a truly integrated system for incompatible node type
formats and data processing languages.
None of the known prior art systems address the substantial problems of
integrating existing heterogeneous applications in a heterogeneous and/or
homogeneous network environment. Accordingly, there is a long-felt need in
the art for an integration system which provides for flexible data
transfer and transformation and manipulation of data among existing
applications programmed in a networked environment of heterogeneous and/or
homogeneous computers in a manner that is transparent to the user. The
present invention has been designed to meet these needs.
SUMMARY OF THE INVENTION
The inventors of the subject matter disclosed and claimed herein have
satisfied the above-mentioned long-felt needs in the art by developing a
software tool which enables a system integrator or end-user flexibly and
efficiently to produce run time software for integration of existing
applications in a networked environment of heterogeneous computers. In
order to achieve this goal, the present invention provides functionality
equivalent to that of a combination of a message and file handling system,
a data manipulation system, and a local and remote program control system.
From the system integrator's viewpoint, the present invention provides a
message handling system which allows data types and data formats to be
different at each end of the messaging system, while any changes in data
elements, data types or data formats of the messages will only require a
reconfiguration of the system before start-up. Since reconfiguration is an
administrative level activity, the user will not be required to change his
or her source code in the communicating applications.
Accordingly, the present invention is specialized for easy modification of
data types and data formats passed so as to allow transparent
communication between data processes of different formats on machines of
different types. As a result, the application programs which are
communicating need not be written in the same language or be downloaded
onto the same computer type. The present invention further allows users to
link existing applications with minimal or no changes to the code of the
applications, thereby reducing the amount of custom code that needs to be
written, maintained and supported for integrating existing systems.
The present invention addresses the major integration problems noted in the
background portion of this specification by providing for local and remote
inter-application data transfer whereby existing applications may be
linked with minimal or no modifications to the applications. Synchronous
and asynchronous memory-based message transfers and file transfers between
applications are also supported. In addition, language, data format and
data type differences are resolved utilizing data manipulation features
such as rearranging, adding or deleting fields and converting between data
types in accordance with differences in hardware, language alignment, and
data size. This is accomplished in a preferred embodiment by using a
Common Data Representation (CDR) for the messages to be transferred
between heterogeneous nodes.
The data manipulator (DMM) of the invention provides automatic manipulation
of data during run time so that two communicating processes can use each
other's data without having to change their own data models. The data
manipulator of the invention takes care of hardware discrepancies,
application dependencies and computer language semantic differences. It
can convert one data type to another, restructure a message format and
assign values to data items.
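The three operations named above (type conversion, message restructuring, and value assignment) might be sketched as follows. The function and its parameters are hypothetical and are not taken from the specification; they merely show one way such a manipulation could be driven by configuration rather than by application code.

```python
def manipulate(message, field_map, conversions, defaults):
    """Build the destination message from the source message.

    field_map   : source field name -> destination field name (restructuring;
                  source fields not listed are dropped)
    conversions : destination field name -> callable (type conversion)
    defaults    : destination field name -> value (assigned values)
    """
    out = dict(defaults)
    for src_name, dst_name in field_map.items():
        value = message[src_name]
        convert = conversions.get(dst_name)
        out[dst_name] = convert(value) if convert else value
    return out
```

Because the mapping, conversions, and defaults are passed in as data, a change in message layout would require only a change to those tables, not to either communicating application.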
Typically, conversion routines are only good for the two machine
architectures and/or two languages involved. With the addition of any new
language or machine architecture to this networked system, a new set of
routines must be created on all previous machine architectures in the
network to support the transfer of the data to applications on the new
machine or to applications written in the new language. The present
invention has been designed to minimize the alteration or addition of
routines that were written on older machine architectures in the network
when new machines or languages are added to the system. Also, by making
the data manipulation module node-specific, it is also possible in
accordance with the invention to cut down on the number of sets of
routines a particular machine might need to send/receive data to/from
other machines.
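The scaling argument can be made concrete with a short calculation. The counts below are a simple combinatorial illustration under the assumption of one conversion routine per ordered pair of architectures (pairwise approach) versus one routine into and one out of a common representation per architecture; they are not figures given in the specification.

```python
def pairwise_routines(n):
    """Direct conversion: one routine for each ordered (source, destination) pair."""
    return n * (n - 1)


def cdr_routines(n):
    """Common representation: one encoder and one decoder per architecture."""
    return 2 * n
```

Under these assumptions, five architectures need 20 pairwise routines but only 10 with a common representation, and adding a sixth architecture adds 10 pairwise routines (touching every existing machine) but only 2 with a common representation (touching only the new machine).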
BRIEF DESCRIPTION OF THE DRAWINGS
The objects and advantages of the invention will become more apparent and
more readily appreciated from the following detailed description of
presently preferred exemplary embodiments of the invention taken in
conjunction with the accompanying drawings of which:
FIG. 1 schematically illustrates a prior art Network Computing System (NCS)
which allows applications running in a distributed environment to share
computations as well as data;
FIG. 2 schematically illustrates the basic components of the integration
system in accordance with the invention;
FIG. 3 schematically illustrates how the invention can be used to connect
an application to others on the same system or to applications that reside
on one or more remote systems;
FIG. 4 schematically illustrates the configuration of the run time
components of the integration system in accordance with the invention;
FIG. 5 schematically illustrates the creation of a Data Manipulation Module
(DMM) of the invention through start-up by the start-up node;
FIG. 6 schematically illustrates the distribution of the configuration
information from the compilation node to each of the respective nodes of
the same computer architecture as the compilation node;
FIG. 7 is a flowchart illustrating the procedure for start-up of the
integrated system which uses this invention;
FIG. 8 schematically illustrates an example of a heterogeneous networking
system in accordance with the invention; and
FIG. 9 illustrates sample configuration files for the sample configuration
shown in FIG. 8.
DETAILED DESCRIPTION OF THE PRESENTLY PREFERRED EMBODIMENT
A system with the above-mentioned beneficial features in accordance with a
presently preferred exemplary embodiment of the invention will be
described below with reference to FIGS. 2-9. It will be appreciated by
those of ordinary skill in the art that the description given herein with
respect to those figures is for exemplary purposes only and is not
intended in any way to limit the scope of the invention. All questions
regarding the scope of the invention may be resolved by referring to the
appended claims.
As noted above, present day manufacturing computer systems have often been
built from the bottom up, thereby resulting in the creation of isolated
"islands of automation." For example, the areas of design and engineering,
manufacturing resource planning, and manufacturing have typically been
independently automated. As a result, these islands of automation consist
of heterogeneous computer systems, operating systems, and data base
systems. However, manufacturers are now looking for higher levels of
efficiency and productivity than is available in such prior art systems.
Computer Integrated Manufacturing (CIM), the integration of these islands
of automation, is a means of achieving such higher levels of efficiency
and productivity. The preferred embodiment of the present invention is
designed for facilitating the development of CIM solutions. As will be
clear from the following, this is accomplished by integrating existing
application programs in a networked environment of heterogeneous computers
by providing flexible data transfer, transformation and manipulation
mechanisms.
Generally, the present invention produces run time code that comprises
several active programs (or software modules) which are used at run time
to provide communication between applications on the same or different
nodes, where a node is any computer in the network's domain. The
applications can be on different computer systems or on the same computer
and may be in the same or different computer languages. These run time
programs handle the data transfer from the source application program to
the destination application program by transforming the data from the
source program into an architectural and language independent format using
a CDR. For example, the field and record data in the source machine's
format is converted to the destination machine's format by first
converting to a CDR using locally stored routines which convert the
internal representation of data on the source machine to the common data
representation format, and after the transfer of the CDR data, the CDR
data is converted to the internal representation for data on the
destination machine using locally stored routines. In particular,
predefined links of sources and destinations are called by software
modules of the invention to make certain that the information is correctly
routed from the source to the destination machine. This latter information
is called the configuration data and corresponds to information provided
by the system's integrator concerning the nodes, processes and possible
data manipulations to be performed on the network.
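The source-to-CDR-to-destination path described above can be sketched with Python's struct module. The node names, the record layout (one 32-bit integer and one 64-bit float), and the choice of big-endian byte order for the common representation are all assumptions for illustration; the specification does not define the CDR encoding in this passage.

```python
import struct

# Hypothetical native layouts: node_a is little-endian, node_b is big-endian.
NATIVE_FORMATS = {"node_a": "<id", "node_b": ">id"}
# Assumed common data representation: big-endian int32 followed by float64.
CDR_FORMAT = ">id"


def to_cdr(node, raw):
    """Locally stored routine on the source node: native bytes -> CDR bytes."""
    fields = struct.unpack(NATIVE_FORMATS[node], raw)
    return struct.pack(CDR_FORMAT, *fields)


def from_cdr(node, cdr):
    """Locally stored routine on the destination node: CDR bytes -> native bytes."""
    fields = struct.unpack(CDR_FORMAT, cdr)
    return struct.pack(NATIVE_FORMATS[node], *fields)
```

Each node only needs to know its own native layout and the common one; neither routine refers to the layout of the node at the other end.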
The system of the invention generally consists of run time and non-run time
components. The run time components of the invention will be described in
more detail with respect to FIGS. 2-4, while the non-run time components
will be described in more detail below with respect to FIGS. 5 and 6.
As shown in FIG. 2, the run time components 208 include a data
manipulator/translator, a data transporter, configuration tables and a
system manager. As will be described in more detail below, the data
manipulator/translator functions to transform data to and from a CDR
format acceptable to a given process of a specified host computer, while
the data transporter functions to pass data in the common data
representation to/from the network system for sending to a destination
application. However, as will be noted below, the data transporter also
functions to send unmanipulated data to the destination application when
the source and destination applications have the same type and format and
hence the message data does not need to be manipulated.
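The decision of whether to manipulate message data before transport might be sketched as below. The names Endpoint and prepare_payload, and the "raw"/"cdr" tags, are hypothetical; the sketch assumes that a type name and a format name are enough to decide whether two applications match.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of an application's message data."""
    data_type: str    # e.g. "part_record"
    data_format: str  # e.g. "little-endian"


def prepare_payload(
    message: bytes,
    source: Endpoint,
    destination: Endpoint,
    to_cdr: Callable[[bytes], bytes],
) -> Tuple[str, bytes]:
    """Return a tag and the bytes the data transporter should send."""
    if source == destination:
        # Same type and format: send the message unmanipulated.
        return ("raw", message)
    # Otherwise convert to the common data representation first.
    return ("cdr", to_cdr(message))
```

Skipping the conversion for matching endpoints avoids two unnecessary transformations per message, one on each node.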
Thus, as shown in FIG. 2, in the system of the invention a plurality of
user application programs 202 are connected via application adaptors 204
using access routines 206 so as to communicate with the run time
components 208 of the invention. The application adaptors 204 comprise
user-written programs which act as software bridges between the user's
application and the system of the invention. In particular, these
user-written application adaptor programs call access routines from an
access routine library 206 in order to send messages, copy files, control
processes, and the like on the network of the invention. These access
routines 206 are thus used to provide the user with the necessary commands
for preparing the application adaptor programs. Sample access routines are
provided below as an Appendix A hereto, and sample adaptors are attached
as an Appendix B hereto.
System manager 210 enables a user to administer operation of the system
during run time operation. For example, the system manager 210 allows the
user to access the software of the invention to validate configuration,
start up and shut down the system, and perform other system management
functions. The command processor 212, on the other hand, corresponds to a
module accessed through the system manager 210 so as to allow a user to
interactively test the messages and links to any process or node within
the domain of the invention. (As used herein "domain" means the collection
of computers that are configured to work together in accordance with the
invention.) In addition, configuration files 214 contain information about
the nodes, the structure of the data produced and consumed by the
processes, the manipulations that must be performed on data produced by an
application so that it is acceptable to different processes, and the links
binding manipulations to specific languages. Such information is provided
by the systems integrator before run time and will be discussed in more
detail below. The run time components are loaded into each node at
start-up time using the system manager 210, and these components
manipulate messages (as necessary) and transport them using the data
transporter. The manipulations are performed in accordance with data in
configuration files 214 held in memory at each node. Finally, the data
transporter communicates with the network services 216 to pass the message
to the destination node.
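As a rough illustration of the kind of information the configuration files hold, and of the configuration validation offered by the system manager, consider the sketch below. The dictionary layout, the key names, and the validate_config function are all assumptions made for illustration; the actual file format is not given in this passage.

```python
# Hypothetical in-memory form of the configuration data.
CONFIG = {
    "nodes": {"node_a": {"host": "hostA"}, "node_b": {"host": "hostB"}},
    "processes": {
        "app1": {"node": "node_a", "language": "C"},
        "app2": {"node": "node_b", "language": "Pascal"},
    },
    "manipulations": {"m1": {"description": "reorder fields"}},
    "links": [
        {"source": "app1", "destination": "app2", "manipulation": "m1"},
    ],
}


def validate_config(cfg: dict) -> list:
    """Return a list of error strings; an empty list means consistent."""
    errors = []
    # Every process must run on a configured node.
    for name, proc in cfg["processes"].items():
        if proc["node"] not in cfg["nodes"]:
            errors.append(f"process {name}: unknown node {proc['node']}")
    # Every link must join configured processes and name a known
    # manipulation (or none at all).
    for link in cfg["links"]:
        for end in ("source", "destination"):
            if link[end] not in cfg["processes"]:
                errors.append(f"link {end}: unknown process {link[end]}")
        manipulation = link.get("manipulation")
        if manipulation is not None and manipulation not in cfg["manipulations"]:
            errors.append(f"link: unknown manipulation {manipulation}")
    return errors
```

Checking the tables before the run time components are loaded means that a misnamed node or process is reported at start-up rather than when a message is first sent.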
FIG. 3 shows a system in accordance with the invention where two nodes on a
LAN have run time integration systems A and B in accordance with the
invention for integrating applications 1-4. As shown, each node is loaded
with the run time system of the invention (including all elements shown in
FIG. 2) for allowing the processes 1-4 to communicate even if they have
different data types and language formats. In FIG. 3, there are various
possibilities for communication between the processes shown. For example,
Application 1 may communicate with Application 2 through Integration
System A. Application 1 may generate data meant for Application 2 in the
shared memory of the host computer or in the form of files resident on the
host disk. This data is picked up by Application Adaptor 1 and passed to
Integration System A, which then gives it to Application Adaptor 2 for
passing on to Application 2. Integration System A optionally can be
instructed to manipulate the data before sending it to Application 2. The
data manipulation can consist of literal assignments to data, moving data
from one structure to another, and translating data from one structure to
another.
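The three kinds of manipulation named above (literal assignment, moving data, and translating data) might be sketched as a small rule interpreter. The rule format and the manipulate function are hypothetical, and "translating" is interpreted here as applying a conversion function to a field, which is an assumption.

```python
def manipulate(source: dict, rules: list) -> dict:
    """Build a destination structure from a source structure."""
    destination = {}
    for rule in rules:
        kind = rule["kind"]
        if kind == "literal":
            # Literal assignment: a fixed value, independent of the source.
            destination[rule["to"]] = rule["value"]
        elif kind == "move":
            # Move: copy a field unchanged into the destination structure.
            destination[rule["to"]] = source[rule["from"]]
        elif kind == "translate":
            # Translate: convert the field on its way to the destination.
            destination[rule["to"]] = rule["convert"](source[rule["from"]])
        else:
            raise ValueError(f"unknown manipulation kind: {kind}")
    return destination


# Usage with a hypothetical record passed from Application 1 to 2.
record = {"part_no": "17", "qty": 3}
rules = [
    {"kind": "literal", "to": "units", "value": "EA"},
    {"kind": "move", "from": "qty", "to": "quantity"},
    {"kind": "translate", "from": "part_no", "to": "part_id", "convert": int},
]
result = manipulate(record, rules)
# result == {"units": "EA", "quantity": 3, "part_id": 17}
```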
On the other hand, the data may be picked up by Application