Description
BACKGROUND OF THE INVENTION
1. Field of Invention
This invention relates in general to computer databases, and in particular
to locating and generating connections between program objects and data
objects distributed throughout a computer network.
2. Related Materials and Definitions
This application is related to the following applications which are hereby
incorporated by reference:
UNIVERSAL TAG IDENTIFIER ARCHITECTURE (application Ser. No. 07/963,885),
now U.S. Pat. No. 5,414,841, issued May 9, 1995.
METHOD FOR ASSOCIATION OF HETEROGENEOUS INFORMATION (application Ser. No.
08/262,838), pending.
FACILITY FOR THE STORAGE AND MANAGEMENT OF INFORMATION OBJECTS (NOUMENA
SERVER) (application Ser. No. 08/263,146) now U.S. Pat. No. 5,557,790.
FACILITY FOR THE INTELLIGENT RETRIEVAL OF INFORMATION OBJECTS (PERSONA)
(application Ser. No. 08/262,834), pending.
FACILITY FOR THE STORAGE AND MANAGEMENT OF CONNECTIONS (CONNECTION SERVER)
(application Ser. No. 08/267,022), pending.
METHOD FOR STORING AND RETRIEVING HETEROGENEOUS CLASSIFICATION SYSTEMS
(application Ser. No. 08/263,379), pending.
The following definitions may be helpful to the understanding of the basic
elements of each of the copending applications:
Tags: Tags are globally unique identifiers. Tags are sequentially numbered
identifiers identifying data objects (e.g., video, text, audio,
observations, opinions, etc.).
Phenomena: The logical structure of the system begins with a unit of human
perception, the "phenomena". In the universe of a computer system,
"Phenomena" is defined as a representation of phenomena which exist in the
universe of human experience. Phenomena can be ideas, written matter,
video, computer data, etc. Examples include viewing a computer file using
a word processor, watching a digital video clip or listening to a digital
audio segment.
Connections: That which gathers (or links) Phenomena into interrelated
collections. Connections are that which lead the user from one Phenomena
to another Phenomena. Connections are not simply a road-map from a
Phenomena to all other Phenomena. More specifically, Connections represent
an observation of related Phenomena made by human or by computer
observers.
Connection Attributes: In the logical structure of the system, "Connection
Attributes" allow the entire network of Phenomena and Connections to
become usable to each user of the system. Connection Attributes store the
rationale behind each connection. In fairly generic terms, Connection
Attributes describe the Who, What, Where, When and Why of a particular
observation.
Noumena: Another concept in the logical structure of the system is
"Noumena". Noumena are that which lie beyond the realm of human
perception. In computer-based systems, such as the instant invention, they
are the computer-stored data; examples are "computer files" or "datasets".
When these computer files, the Noumena, are observed in their "raw" form,
they do not resemble pictures, sounds, nor words. These Noumena resemble a
series of bits, bytes, or numbers. These computer files must be
manipulated by computer programs, "Phenominated", to become as they appear
to the observer. In the present system, Noumena are all of the generic
format computer files needed to produce a representation of a Phenomena.
This includes the computer data files as well as the computer program
files.
Grinding: Grinding is a systematic, computer-based observation of
Phenomena. This is typically done with a "narrow view". The programs are
usually looking for well defined criteria. When Phenomena are observed by
the computer programs, the programs make Connections between the observed
Phenomena and other Phenomena known by the programs. In effect, the
programs act as a human observer would when viewing a Phenomena and
manually Connecting it to other Phenomena.
Persona: A Persona is used to determine the value of information based on
each user's subjective preferences.
Capture: During knowledge capture, the human or computer observer Connects
two Phenomena and provides the rationale for the Connection by supplying
Connection Attributes. The user can also Connect a new Phenomena to
previously existing Phenomena.
Retrieve: During knowledge retrieval, an observer navigates from Phenomena
to Phenomena via Connections. Knowledge is delivered by experiencing the
reconstituted Phenomena. Which knowledge is delivered is controlled by the
Connections and the assessment of the Connection Attributes, preferably
under the auspices of a Persona.
The present invention supports the overall system of copending application
"Method for Association of Heterogeneous Information". It supports the Tag
Architecture, Connection Server, Grinding, Noumena Server and the design
and infrastructure of the overall system, but is not limited thereto. The
term "Phenomena" could be read "object", and the term "Connection" could
be read "link" in this disclosure. The distinction between Noumena and
Phenomena is made to distinguish between objects as experienced by users
(Phenomena) and objects as they are actually stored (Noumena).
The amount of information available in computer databases is expanding
rapidly. Additional information is rapidly coming on-line in the form of
images, audio and text files. One of the major problems facing the user of
a computer system, which has access to large amounts of data, is locating
relevant information. The process of locating information is at best very
time consuming, as a user requires many computer processor cycles with
frequent I/O transmissions. This problem is further exacerbated in modern
computer networks, where much if not all of the relevant data is remote
from the user's computer. Worse still, the user frequently will have
access to enormous amounts of data of which he or she is not even aware,
or which he or she does not have the capability to search. This data is
thus useless to the computer user.
In the prior art, numerous techniques have been implemented to ameliorate
the above described deficiency. The most widely used approach has been
manual, human initiated search efforts which attempt to correlate data
objects to one another. Increasingly, however, as the volume of accessible
information explodes, the human search effort required to manually
correlate large amounts of existing information is too expensive, too
error prone, and too time consuming.
To cope with the inability of manual, human searching to create the
required correlations necessary as the basis for meaningful access to
data, recent attempts have been made at automating these search processes.
These systems generally employ the generation of hyperlinks in hypermedia
using authoring tools. However, these systems suffer several shortcomings
which render them inadequate for general purpose use in a computer network
housing large quantities of distributed data. First, the volume of
potential hyperlinks is large and the manual generation process used by
these systems is slow, costly, inefficient and error prone, and thus can
only accommodate a small percentage of the available input. Second, these
systems are normally static; that is, they are executed once against the
data at a single instant in time. Thus, they are unable to respond to
updates in the data or to the existence or nonexistence of connections in
the data. Third, the manual link generation attempted by these systems is
subjective and dependent upon the ability, point of view, and value
judgements of the author. The possibility exists that an author might
forget the association between some objects over time as memory
diminishes, or miss a connection due to boredom, fatigue, distractions,
etc. Finally, these systems have no ability to record additional
information regarding the connections they create so as to enable a future
user to establish why a connection was created.
3. Discussion of Prior Art
HYPERTEXT/HYPERMEDIA
Hypertext, and its multimedia counterpart hypermedia, are methods used by
programmers to interconnect references to additional related sources.
Hypertext programmers usually store maps of selected links for a
particular application within the application itself. These are "closed"
systems with no external APIs to add links from outside their
application. Additional limitations of Hypertext are its static authoring
linking process, rapid development of large volumes of data and its
inability to crosslink easily to remotely located, and incompatible,
sources of information. The most beneficial uses of hypertext/hypermedia
are restricted to the workstation level.
Entity-Relationship model
Chen developed the "Entity-Relationship Model". Chen sought to model the
relationships universal to a class of entities. His goal was to unify data
models for the rigid, predefined, structure provided in database systems.
The system fails to provide for a dynamic individualized method to
interrelate instances of information, but rather is directed to relating
entire classes of information.
Accordingly, the prior art has heretofore failed to address the need for a
connection generating process capable of generating large numbers of
connections exhaustively, automatically and without the need for manual
intervention, in advance of queries for them. Moreover, the prior art has
failed to provide a connection generating technique which uses multiple
passes to search for additional connections whose existence is suspected
based on already located connections, or which re-searches objects for
additional connections after those objects are modified. Finally, no method has yet
been created in the prior art for storing connections in association with
the connected objects to permit future reference to and use of the
connections.
4. Objects of the Invention
It is the object of the present invention to create connections between
data objects.
It is further an object of the present invention to store said
representations of said connections.
It is further an object of the instant invention to create multiple
connections.
It is further an object of the present invention to dynamically create said
connections.
It is further an object of the invention to create said connections based
on human/computer generated search criteria.
It is further an object of the present invention to create said connections
exhaustively within a defined scope.
It is further an object of the present invention to make the present
connection processes work across networks.
It is further an object of the present invention to perform the connection
creation process with as little interruption of normal system processes as
possible.
These and other objects will be discussed hereafter as they relate to the
drawings, detailed specification and claims.
SUMMARY OF THE INVENTION
The system of the present invention provides for a method of connecting
various locally and remotely located sources of data. The
various sources of data may include, but are not limited to, reports,
articles, books, audio recordings, multi-media or computer data which may
be distributed across computer networks.
The invention uses a systematic approach of selecting a search criteria,
selecting a set of search objects, identifying connections (relationships)
between the search criteria and the search objects, recording connection
attributes (description of the relationship) and storage of the connection
and its attributes. Iterative replication of the connection process can be
used to exhaustively collect all possible relationships between objects
within selected sets. The collected knowledge created by having a body of
connections for a particular subject allows the user access to both
foreseen and unforeseen relationships creating an immensely powerful
research tool.
Each object in the network is given a unique identifier to allow for
transportability between locations and systems, allowing the present
invention to be compatible across networks. The system tracks unused cycle
time of participating systems on the network. When sufficient cycles are
present to perform connection searching, the available system performs the
connection process and stores the results, ultimately creating a massive
database of connections without the requirement of having to store
the actual data contained within the objects. Future requests for
information can be quickly satisfied as the locations and attributes of
connected data are immediately available.
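By way of illustration only, the idle-cycle dispatch described above might
be sketched as follows. This is a minimal sketch under stated assumptions,
not the disclosed implementation: the function name run_when_idle, the
idle_threshold parameter and the caller-supplied sample_idle_fraction probe
are all hypothetical, and how unused cycle time is actually measured is not
specified in this summary.

```python
from collections import deque
from typing import Callable, Deque, List


def run_when_idle(
    pending: Deque[Callable[[], str]],
    sample_idle_fraction: Callable[[], float],
    idle_threshold: float = 0.5,
    max_samples: int = 10,
) -> List[str]:
    """Dispatch queued connection searches only when spare cycles are reported.

    sample_idle_fraction is a hypothetical probe returning the fraction of
    processor time currently unused (0.0 = busy, 1.0 = fully idle).
    """
    results: List[str] = []
    for _ in range(max_samples):
        if not pending:
            break  # nothing left to search
        if sample_idle_fraction() >= idle_threshold:
            job = pending.popleft()
            results.append(job())  # perform one connection search, keep result
        # Otherwise the system is busy: leave the job queued and sample again.
    return results


# Usage: two queued searches and a scripted sequence of idle readings.
readings = iter([0.1, 0.9, 0.2, 0.8])
jobs = deque([lambda: "searched O2", lambda: "searched O3"])
done = run_when_idle(jobs, lambda: next(readings))
print(done)
```

In this sketch a search is deferred, not dropped, whenever the probe
reports insufficient idle time, so normal processing is interrupted as
little as possible.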
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating a computer system for generating
connections in accordance with the present invention.
FIG. 2 is a depiction of a connection generated between objects.
FIG. 3 is a flow chart illustrating the process for generating connections.
FIGS. 4a, 4b, 4c and 4d depict a progression of connection generating
process identifying connections between objects.
FIG. 5 is a block diagram illustrating the detailed structure of a
connection.
FIGS. 6a, 6b and 6c depict an example of the implementation of generating
connections between objects.
FIG. 7 is a block diagram illustrating a network of connection generators
interconnected for operation in accordance with the present invention.
DETAILED DESCRIPTION OF INVENTION
FIG. 1 illustrates a computer system configured for generating connections
in accordance with the present invention. The system includes processor
102, display 104, keyboard 106, storage subsystem 108, random access
memory (RAM) 110 and I/O channels 112. In operation, a user enters
commands at keyboard 106. Processor 102 executes instructions and operates
on data stored in RAM 110 and/or disk and/or tape subsystem 108. System
responses and information are presented to the user on display 104. I/O
channels 112 communicate with other systems to form a computer network, as
will be discussed in more detail subsequently.
Various objects, including computer programs, text files, image files,
audio files, all of arbitrary size, are stored in storage subsystem 108.
Other systems accessible via I/O channels 112 may also have objects housed
in their local storage banks. In accordance with the present invention, a
schema is provided, as will be described in detail below, which searches
for associations or "connections" between objects within a single system
and those distributed over the network, and records them for future
reference upon user or system request.
Shown in FIG. 2 is an illustration of the basic object connection structure
provided in accordance with the present invention. The figure includes
search basis object 202, search object 204, search criteria 206 and
connection 208. The figure illustrates a typical scenario for the
construction of a connection between objects. Initially, no connection
exists between objects 202 and 204. In the particular example shown,
object 202 consists of a search list of domesticated animals. Specific
animals included in the list are cat, dog, horse, cow and others. Object
204 may contain the entire text of the Bible, or perhaps any portion
thereof. A process or user, not shown, determines that it is desirable to
search for instances of the search list entries in the Bible. Based on
this requirement, object 202 is established as a search basis object while
object 204 is established as a search object. The distinction between the
objects is that the search basis object provides the specification or
source for the search criteria 206, while the search object is that object
to which the search criteria is applied in an attempt to locate
connections. In the example shown, the particular search criteria to be
located in search object 204 is the animal "dog", shown within search
criteria 206. The choice of this particular domesticated animal may be
reflective of a specific user request, a specific program request or
simply an iterative search through search basis object 202 starting at
cat and proceeding through cow and to domesticated animals beyond. In the
simple example depicted, search object 204 is keyword searched using well
known prior art techniques for instances of the character string dog. At
least one instance of the search term or search criteria may be found in
the Holy Bible, for instance in the story of "Noah and the Ark". Upon
finding the search criteria in the search object, in accordance with the
present invention, a connection is created between search basis object 202
and search object 204. Further examination of FIG. 2 reveals several
important aspects of the present invention which will be described in more
detail subsequently. First it is apparent that there are three scenarios
in which connections may be generated: one, where the origination set is
known and it is desired to locate connections from the origination set;
the second, where the destination set is known and it is desired to locate
connections to the destination set; and third where neither the
destination set nor the origination set is known, but it is simply desired
to search for connections between two data objects. All three of these
scenarios are addressed by the present invention.
A second aspect to be noted in FIG. 2 is that an object can be defined on
any scope. Thus, an object can contain a single term such as dog or an
entire table such as is shown in search basis object 202, or an entire
document such as shown in search object 204. Additionally, as will be seen
below, an object may consist of a process, an image file, an audio file,
or any other body of data. To provide complete flexibility, connections
may be made between objects defined on any one of the above scales. Thus,
a single term such as dog may be connected to an entire document, or two
documents may be connected to one another, or a document may be connected
to an image file, or a search list may be connected to a process or any
combination of the above. Further, because objects include processes,
connections may be made to processes which search for additional
connections. This aspect of the present invention, which will be described
in greater detail below, enables a first connection generating process to
connect a search object to a second connection generating process which is
determined to be particularly applicable to the search object. Thus, once
a first connection generating process discovers a connection to the word
dog in search object 204, it may connect search object 204 to a second
connection generating process which is designed to look for specific
information regarding "dog".
Another aspect, derived from the fact that objects may be defined at any
level of granularity, is that connections may be created to objects or
into objects so as to attach to smaller objects within larger objects. In
the example given, a connection may be created between search basis object
202 generally and search object 204 generally or between the story of
Noah's Ark which would be defined as its own object within search object
204 and search basis object 202, or between the story of Noah and the Ark
and the particular search criteria "dog", which is defined as an object
within search basis object 202, etc.
Finally, it is to be noted from FIG. 2 that connections may be generated in
either direction or in both directions between connected objects. Thus, in
the example, the connection may be established from dog to Bible or from
Bible to dog.
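The aspects of FIG. 2 discussed above (keyword location of a search
criteria, attachment to an object generally or to a point within it, and
choice of direction) can be sketched as follows. This is an illustrative
sketch only: the Connection record, the connect_keyword function and the
short sample text are hypothetical stand-ins, and plain substring matching
is assumed in place of whatever keyword technique is actually employed (it
would, for example, also match "dogma").

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Connection:
    origin: str                   # object the connection runs from
    destination: str              # object the connection runs to
    anchor: Optional[int] = None  # offset inside the search object, or None


def connect_keyword(
    criteria: str,
    basis_name: str,
    search_name: str,
    search_text: str,
    into_object: bool = True,
    reverse: bool = False,
) -> List[Connection]:
    """Create connections for each instance of criteria found in search_text."""
    connections: List[Connection] = []
    haystack = search_text.lower()
    needle = criteria.lower()
    position = haystack.find(needle)
    while position != -1:
        # Attach at the point of the match, or to the object generally.
        anchor = position if into_object else None
        if reverse:
            connections.append(Connection(search_name, basis_name, anchor))
        else:
            connections.append(Connection(basis_name, search_name, anchor))
        if not into_object:
            break  # one general connection is enough
        position = haystack.find(needle, position + 1)
    return connections


sample_text = "The dog slept. A dog barked."
into = connect_keyword("dog", "animal_list", "sample_text", sample_text)
general = connect_keyword(
    "dog", "animal_list", "sample_text", sample_text,
    into_object=False, reverse=True,
)
print(into)
print(general)
```

The first call yields one connection per instance, each attached at the
position of the match; the second yields a single connection running from
the search object back to the search basis object.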
Shown in FIG. 3 is a flow chart illustrating the process for generating
connections in accordance with the present invention. At 302, a search
criteria is selected based on some processing of a search basis object.
The processing may be a trivial selection of a search word or a term from
a list. Alternatively, it may include more sophisticated analysis such as
color extraction from an image object. Once the search criteria is
selected, at 304, a set of search objects are selected. The set may
include one object item, a document, a volume of data, multiple volumes of
data, images, audio, processes, etc. The only requirement on the set of
search objects is that it be identified as a possible basis for some
connections. Further, it is to be noted that both the search criteria and
the search objects may be provided while alternatively only one may be
provided or further alternatively neither may be provided.
Given the search criteria and the set of search objects, at 306, the search
is conducted on a search object to locate an instance of the search
criteria. The search may be trivial in nature such as a key word search or
it may be more sophisticated such as pattern matching on an image or voice
recognition on an audio file. At 308, if the search criteria is not found,
the process continues by returning to 306, selecting another search object
and once again searching for instances of the search criteria. If the
search criteria is found at 308, processing continues at 310, where a
connection is established
between the search object and either the search basis object or another
designated object. The direction of the connection may be based on a
directive from the process or user requesting the connection. Next, at
312, connection attributes may optionally be created for the connection.
The connection attributes can describe any desirable information
concerning the connection. The attributes may simply state that the search
object is an example. In the preferred embodiment, the attributes describe
the destination of the connection. At 314, the connection is stored, and
the attributes, if any, are stored in logical association with the
connection.
Next, at 316, it is determined whether the connection has invoked the need
for an additional connection to a specialty connection generating process
whose function may be to search for additional connections related to the
first connection. If no such process is implicated, processing returns to
306, where a new search object is selected. If an additional process is
implicated, processing continues at 318, where a connection is established
between the search basis object and the newly implicated process object.
Additionally or alternately, a connection is established between the
search object and the newly implicated process object. The result of
establishing the connection between the search basis object and/or the
search object and an additional connection generating process is that the
process becomes a candidate for another search. These searches may be
dispatched immediately in nested fashion, or alternatively may be
scheduled for later dispatch and execution. Also, it is to be noted that
additional or different search criteria may be used in the subsequent
search.
Finally, it is to be noted in general with reference to FIG. 3 that trivial
aspects of the algorithm have been omitted. For instance, in accordance
with normal programming fashion, it is clear that the search must be
exited when data is exhausted. Such aspects are omitted for the sake of
brevity, but are intended to be accomplished using processes well known in
the art.
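The flow of FIG. 3 can be sketched as a simple loop, with the step numerals
noted in comments. This is a minimal sketch under stated assumptions, not
the disclosed implementation: the names generate_connections,
select_criteria, matches and specialty_for are hypothetical, search objects
are assumed to be in-memory strings, and the direction chosen for the
connection to the specialty process is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Connection:
    origin: str
    destination: str
    attributes: Dict[str, str] = field(default_factory=dict)


def generate_connections(
    basis_name: str,
    basis_content: List[str],
    select_criteria: Callable[[List[str]], str],
    search_objects: Dict[str, str],
    matches: Callable[[str, str], bool],
    specialty_for: Callable[[str], Optional[str]],
) -> List[Connection]:
    """Apply one search criteria to each search object and record connections."""
    store: List[Connection] = []
    criteria = select_criteria(basis_content)            # step 302
    for name, content in search_objects.items():         # steps 304 and 306
        if not matches(criteria, content):               # step 308: not found
            continue                                     # return to 306
        connection = Connection(basis_name, name)        # step 310
        connection.attributes["destination"] = name      # step 312
        connection.attributes["criteria"] = criteria
        store.append(connection)                         # step 314
        process_name = specialty_for(criteria)           # step 316
        if process_name is not None:                     # step 318
            store.append(Connection(process_name, basis_name,
                                    {"reason": "follow-up search"}))
    return store  # the loop exits when the search objects are exhausted


result = generate_connections(
    basis_name="animals",
    basis_content=["cat", "dog", "horse", "cow"],
    select_criteria=lambda items: items[1],
    search_objects={"doc_a": "a horse and a cow", "doc_b": "the dog barked"},
    matches=lambda criteria, text: criteria in text.split(),
    specialty_for=lambda c: "canine_cgp" if c == "dog" else None,
)
for connection in result:
    print(connection)
```

Passing the criteria selection, the match test and the specialty lookup as
functions reflects the statement above that each may range from trivial
(a key word) to sophisticated (pattern matching or voice recognition).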
Shown in FIGS. 4a through 4d is an exemplary sequence of connection
generating processes being scheduled and dispatched to locate connections
between a set of data objects. With particular reference to FIG. 4a, a
connection generating process designated as CGP.sub.1 is initiated to
search for connections from object O.sub.1 to objects O.sub.2, O.sub.3,
O.sub.4, O.sub.5, O.sub.6, O.sub.7 and potentially additional objects.
Initially, at 402, object O.sub.1 is used to create a search criteria
labeled SC. This search criteria will be the basis for identifying
connections to objects O.sub.2 through O.sub.7. It is to be noted that
objects O.sub.2 through O.sub.7 may be data objects or process objects of
any kind, provided they are compatible with a connection generating
procedure such that they can be meaningfully searched. The connection
generating procedures are also considered to be objects; however, they are
shown separated from objects O.sub.1 through O.sub.7 in FIGS. 4a through
4d for ease of identification and for purposes of illustration as will
become more apparent.
FIG. 4b shows the result of CGP.sub.1 searching for instances of the search
criteria in objects O.sub.2 through O.sub.7. CGP.sub.1 creates two
connections, one from O.sub.1 to O.sub.5, labeled 406, which designates that
the search criteria was found in O.sub.5, and one from CGP.sub.2 to O.sub.1,
labeled 404, because of the subsequent scheduling of a second connection
generating process (CGP.sub.2) to search for additional connections between
O.sub.1 and O.sub.2 through O.sub.7. The arrows on 404 and 406 designate
directions of reference for the connections, indicating a connection from
O.sub.1 to O.sub.5 and from CGP.sub.2 to O.sub.1.
The connection between CGP.sub.2 and O.sub.1 is created to cause follow-up,
or secondary searching for additional connections in a specific discipline
when it is discovered that an object may contain information pertaining to
that discipline. Thus, for example, if a connection generating process
which is designed to generate connections on the basis of keyword
searching finds a connection based on the name "Abraham Lincoln", it may,
in addition to creating the keyword connection, connect the object to a
specialty connection generating process designed to perform additional
searching for connections involving U.S. Presidents. This specialty
searching may also be keyword searching, but involving different keywords
or additional objects. Alternatively, it may involve more sophisticated
searching such as knowledge-based or inference-based searching.
Shown in FIG. 4c is the result of execution of CGP.sub.2 searching O.sub.2
through O.sub.7 for additional connections. As shown, CGP.sub.2 finds two
additional connections, one to O.sub.3, labeled 410, and the other from
CGP.sub.3, labeled 408, designating another specialty connection process whose
execution is desired based on the connection found between O.sub.1 and
O.sub.3. Please note that these connections may have different attributes
than connections made by CGP.sub.1.
FIG. 4d illustrates the result of the execution of CGP.sub.3. As shown,
CGP.sub.3 creates three additional connections involving O.sub.1. The
first two, labeled 414 and 416, are additional connections. The third is
an additional connection from CGP.sub.4. Upon subsequent dispatch,
CGP.sub.4 may create still more connections to additional connection
generating processes and/or objects, or may find no further connections,
or may find additional connections involving objects to which connections
already exist. In the latter case, additional connections may be
established to the object. These connections, and indeed all connections,
may run between objects generally or may run from/to the particular point
of connection. Thus, if the objects are entire documents and a connection
involves a keyword, the connection may simply "attach" to the documents
generally, or may attach within the documents between the connected
keywords. Of course, as an object becomes smaller, the distinction narrows,
so that for an object consisting of a single word, the distinction
disappears.
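The progression of FIGS. 4a through 4d, in which one connection generating
process schedules another, might be sketched as a queue of pending
processes. This is an illustrative sketch only: the two example processes,
the registry and the dispatch function are hypothetical, the object
contents are invented sample strings, and simple substring tests stand in
for the actual search techniques.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Connection = Tuple[str, str]  # (from, to)
# A connection generating process returns the connections it found plus the
# names of any specialty processes it wants scheduled for follow-up.
CGP = Callable[[str, Dict[str, str]], Tuple[List[Connection], List[str]]]


def keyword_cgp(origin: str, objects: Dict[str, str]):
    found = [(origin, name) for name, text in objects.items()
             if "Abraham Lincoln" in text]
    follow_ups = ["presidents_cgp"] if found else []
    return found, follow_ups


def presidents_cgp(origin: str, objects: Dict[str, str]):
    found = [(origin, name) for name, text in objects.items()
             if "President" in text]
    return found, []


def dispatch(origin: str, objects: Dict[str, str],
             registry: Dict[str, CGP], first: str) -> List[Connection]:
    """Run scheduled processes in order until none remain."""
    connections: List[Connection] = []
    scheduled = deque([first])
    already_run = set()
    while scheduled:
        name = scheduled.popleft()
        if name in already_run:
            continue  # never dispatch the same process twice
        already_run.add(name)
        found, follow_ups = registry[name](origin, objects)
        connections.extend(found)
        for follow_up in follow_ups:
            if follow_up not in already_run and follow_up not in scheduled:
                # Record a connection from the follow-up process to the
                # origin object, analogous to connection 404 in FIG. 4b.
                connections.append((follow_up, origin))
                scheduled.append(follow_up)
    return connections


objects = {
    "O2": "Abraham Lincoln spoke at Gettysburg.",
    "O3": "The President signed the act.",
    "O4": "Weather report.",
}
registry = {"keyword_cgp": keyword_cgp, "presidents_cgp": presidents_cgp}
chain = dispatch("O1", objects, registry, "keyword_cgp")
print(chain)
```

Here the follow-up process is queued for later dispatch; the same loop
could instead call it immediately to obtain the nested behavior.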
FIG. 5 illustrates the detailed implementation of connection structure in
accordance with the present invention. The structure includes search basis
object 502, tag 504, search object 506, tag 508, connection 510, tag 512,
connection attribute 514 and tag 516. In the detailed implementation, the
connection structure involves the use of globally unique identifiers called
tags, which are discussed more fully in co-pending application UNIVERSAL
TAG IDENTIFIER ARCHITECTURE. Every object (including data and process) is
given a unique tag. When a connection or connection attribute is created,
it too is given a unique tag. To record the connection, tags of the search
basis object and the search object are recorded with the connection tag.
As will be discussed in greater detail below, the direction of the
connection is denoted by recording one tag as a "from" and the other as a
"to" tag. By using globally unique identifiers--tags--as the basis for
recording and managing connections, any two objects anywhere in an
arbitrarily large distributed network can be quickly, easily, and
efficiently connected. Ambiguity of reference, even for two objects with
the same name, is not a problem since every tag is unique. Objects can be
connected across a network without moving any data, since only the tag is
needed to make the connection. Enormous numbers of connections can be
recorded using little memory since only tags--not file names or memory
addresses or network locations--are recorded. This aspect, coupled with
the ability of the invention to exploit idle processor cycles as will be
discussed subsequently, enables the invention to implement an
exhaustion-based searching strategy.
If a connection searching process or a user desires to attach attributes to
a connection, the tag corresponding to the desired attribute is located or
generated for the new attributes and a reference is associated with the
tag of the connection to which the attribute applies. Thus, as shown in
FIG. 5, connection attribute 514 includes its own tag 516 and a reference
recorded with connection 510 enabling connection 510 to identify its
associated connection attribute 514.
Thus, in FIG. 5, search basis object 502 has its own associated tag 504
while search object 506 has its own tag 508 and connection 510 has its own
tag 512. The connection identifies tags 504 and 508 as handles to the
objects which they in turn identify.
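The tag-based structure of FIG. 5 might be represented by records of the
following kind. This is a minimal sketch under stated assumptions: the
class names, the field names and the site-prefix-plus-sequence tag layout
produced by new_tag are hypothetical, since the actual tag format is
defined in the co-pending UNIVERSAL TAG IDENTIFIER ARCHITECTURE application
and is not reproduced here.

```python
import itertools
from dataclasses import dataclass, field
from typing import List

_sequence = itertools.count(1)


def new_tag(site: str = "site-a") -> str:
    """Return a unique tag (assumed layout: site prefix plus sequence number)."""
    return f"{site}:{next(_sequence):06d}"


@dataclass
class TaggedObject:
    label: str                                   # human-readable description
    tag: str = field(default_factory=new_tag)    # e.g. tag 504 or tag 508


@dataclass
class ConnectionAttribute:
    value: str                                   # rationale for the connection
    tag: str = field(default_factory=new_tag)    # e.g. tag 516


@dataclass
class Connection:
    from_tag: str                                # tag of the originating object
    to_tag: str                                  # tag of the destination object
    tag: str = field(default_factory=new_tag)    # e.g. tag 512
    attribute_tags: List[str] = field(default_factory=list)


basis = TaggedObject("search basis object 502")
target = TaggedObject("search object 506")
link = Connection(from_tag=basis.tag, to_tag=target.tag)
why = ConnectionAttribute("keyword match")
link.attribute_tags.append(why.tag)  # reference recorded with the connection
print(link)
```

Only tags are held in the connection record, so the connected objects
themselves need not be copied or moved to record the connection.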
Shown in FIGS. 6a through 6c is a detailed example illustrating the
preferred embodiment for the implementation of connections. FIG. 6a shows
three related objects; image file 602, audio file 604, and text file 606.
For purposes of the example, these three objects are established as the
search objects.
An application which may have executed previously provides the search basis
object labeled 608. The application may have been designed by a custom
catalogue enterprise, for instance, to identify possible patrons to which
catalogues will be mailed and to collect information for use in
customizing catalogues. Thus, as shown in the example, the search basis
object includes a list of consumer categories: young adult, seniors,
professionals, babies, etc. In the example, the search criteria of babies
shown at 610 is established by the connection generating process of the
present invention based on search basis object 608. The search criteria is
then applied to text file 606 in an effort to identify instances of the
search criteria babies.
FIG. 6b shows the recording of a connection between search basis object 608
and search object 606. Included are tag 612 denoting the point of
connection to search basis object 608, search object connection point 614,
tag 616 identifying the connection point to the search object, connection
618 including connection tag 620, from connection tag 622, and to
connection tag 624, and attribute 626 including attribute tag 628,
attribute value tag 630, and attribute connection tag 632. Tags 612, 616,
620 and 628 are the globally unique identifiers, or handles, associated
with their objects, connections, and attributes. Thus, it can be seen from
FIG. 6b that connection 618 has a "from" tag value of "<7135BOC" which
corresponds to search basis object 608. Similarly, connection 618 has a
"to" tag value of "<0044PEJ", which corresponds to search object
connection point 614. This indicates that the particular connection has
been established from consumer categories at 608 to b.family bio at 614.
It is to be noted that the particular connection shown in FIG. 6b is
recorded with respect to