|
Description  |
|
|
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to user interfaces for data storage and
retrieval systems and more particularly to a system and method for
choosing and executing queries to such data storage and retrieval systems
and traversing databases associated therewith.
2. Description of Related Art
Many current data storage and retrieval systems are organized using a
principle commonly identified as "hypertext". Hypertext has resulted as
computer work stations and digital storage have grown cheaper, more
powerful and more available. It has become increasingly more attractive to
extend the traditional notion of "flat" text files organized
hierarchically by allowing more complex nonlinear organizations of
material. In a hypertext system, each data entity, i.e. document or node,
may be directly connected to other documents in the system by pointers, or
links. The human user of a hypertext system moves between (browses)
documents by following these links.
An essential characteristic of a hypertext database system is a machine
supported ability to efficiently traverse via these links from node to
node. In this regard, as noted in the article entitled "Hypertext: An
Introduction and Survey" by J. Conklin, COMPUTER, September 1987, pages
17-41, to qualify as hypertext, a system should require no more than a
couple of keystrokes (or mouse movements) from the user to follow a single
link. The links provided by the interface transport the user quickly and
easily to a new place in the hypertext system. Another characteristic of a
hypertext system is the speed with which the system responds to
referencing requests. Only brief delays typically occur (one or two
seconds at most).
Although hypertext systems presently provide the ability for the user to
traverse efficiently between nodes via links once he or she determines the
desired links to be utilized, the number of documents in a hypertext
system may be very large. Consequently, the number of links connected to
any document may also become very large. This leads to difficulties in
"navigating" through the database, the large number of links from each
document often resulting in confusion by the user when attempting to
select which link to follow.
One attempt to overcome this problem is by providing an overview display or
"map" of the hypertext documents and links. This technique has the
disadvantage of creating a large and complex map display when the number
of documents and links is large. There is a resultant need for further
control and display options which the user must learn, and in the
expenditure of the user's time in manipulating the map, rather than more
effective use of his time, such as reading documents.
Another solution to this navigational dilemma is to apply standard database
search and query techniques for locating documents which the user is
seeking. This involves addressing entities by content; that is, by text or
numbers stored, in addition to or rather than a user-assigned name or
symbol. This is usually executed by applying some combination, using
boolean operations of key word and full string search and predicates on
other attributes (such as author, time of creation, type, etc.) of nodes
or links. Various standard and proprietary languages exist for querying
structured databases or text retrieval systems (for example, DIALOG, SQL).
All of these languages share the drawback of being arbitrary and complex,
which poses a problem in applications where untrained users must query a
data storage system, or in educational and training uses where presuming
prior user training in the query method is inappropriate.
Experience with textual query methods has also shown that they are subject
to tradeoffs between precision (the number of retrieved entities which are
actually interesting) and recall (the fraction of total interesting
entities which are actually found). Such studies have found that, for
instance, a typical query to a legal information system produces only 20%
of those database entities which are actually relevant. (See the article
entitled "An Evaluation of Retrieval Effectiveness for a Full Text
Document Retrieval System", by D. C. Blair and M. E. Maron, COMMUNICATIONS
OF THE ACM, March 1985, Vol. 28, No. 3, Pgs. 289-299.)
Other attempts to control the complexity of linking have concentrated on
database-wide elision of sets of links. For instance, the Intermedia
system allows the separation of links into sets called webs. Only one of
these sets is visible to the user at a time. This achieves simplification
but at the expense of possibly removing valuable links from consideration
if those links are stored in the webs which are not loaded. (See the
article entitled "Intermedia: The Concept and the Construction of a
Seamless Information Environment", by N. Yankelovich, et.al., COMPUTER,
January 1988, pgs. 81-96.)
Another approach to elision is filtering, that is, database-wide selection
of documents and links based on a query, in a fashion similar to that
described above. In some systems (e.g. see the article entitled "Super
Book: An automatic tool for information exploration--hypertext?", by J. R.
Remde, et al., Bell Communications Research, HYPERTEXT '87 PAPERS,
November 1987, pgs. 175-188; and the article entitled "Searching for
Information in a Hypertext Medical Handbook", COMMUNICATIONS 0F THE ACM,
July 1988, pgs. 880-886), the pattern of links is also considered in the
decision to remove entities from the user's view. However, because such
filtering methods treat the entire database at once, they share the limit
of precision--recall tradeoff as described above, meaning that they
achieve reduction of complexity at the expense of loss of information.
With the growing use of multimedia databases containing not only textual
documents, but also data entities containing sound and graphics, and the
growing utilization of hypertext-type nodal networks within these
multimedia databases, the requirement for effective and meaningful
navigation has become even more imperative.
Utilization of a hypertext-type nodal network in conjunction with a
multimedia database may be described as a "hypermedia database". Thus, as
defined broadly herein, the term "hypermedia system" refers to a database
which may be constructed to include documents or nodes and machine
supported selected linkages or pointers which provide the user with the
ability to efficiently travel from one node to another. These nodes may
include text, sound, or graphic material. An example of a system that
supports hypermedia is Apple Computer Inc.'s system sold under the
trademark "HYPERCARD" which allows traversal through a hypertext-type
nodal network containing text, sound and graphics. HYPERCARD provides the
machine supported ability to selectively traverse in an automatic fashion
via linkages, from one item to another, the items being selectively linked
to each other in the nodal network.
In attempting to develop improved techniques for browsing through mature
HYPERCARD databases, the present inventors have discovered the novel
system and method of the present invention. As will be disclosed below, an
aspect of the present invention involves a process for selecting, ordering
and displaying a subset of the possible links from a document, resulting
in reduced user confusion while browsing through a hypermedia system.
Another aspect of the inventive technique involves the use of an intuitive
representation to the user of options or "guides", the selection thereof
effectively masking a complex indexing and query method.
SUMMARY OF THE INVENTION
A user interface system and method for traversing a database. In one aspect
the present invention includes providing a plurality of command options,
each of the command options represented by a set of descriptive option
index terms characterizing that command option. Means are provided for
comparing the set of option index terms of the command options selected by
the user with sets of document index terms. Each set of document index
terms being compared characterizes an electronic document in a
hypertext-type database which is selectively linked in that database with
the user's present position The term "hypertext-type database", as used
herein, may refer to either a hypertext database or a hypermedia database.
The comparisons result in a ranked list of the selectively linked
electronic documents. The electronic documents are ranked in accordance
with the relevancy of each document with respect to the selected command
option. The user is therefore enabled to efficiently retrieve relevant
documents in accordance with his selected command option.
In another aspect of the present invention, a plurality of command options
are generated and displayed on a CCDS, each command option being
represented by a portrayed character or personality associable to the user
as being biased toward a particular type of information. Each of the
command options represent a set of option index terms which characterize
that particular command option. Means are provided for comparing the set
of option index terms characterizing the command option presently selected
with sets of document index terms. Each set of document index terms
characterize an electronic document located within the database. The
comparisons result in a ranked list of electronic documents, the documents
being ranked in accordance with the particular bias of the portrayed
character or personality.
In a hypertext-type database the use of such portrayed character is
particularly useful, the portrayed character effectively serving as a
"guide" to the user, the guide providing an efficient mechanism for
browsing the database.
The automatic formulation of a ranked list of electronic documents may be
produced by assigning a match value to each electronic document being
compared with the selected option.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates the computer incorporating the present invention.
FIG. 2 is a system diagram of the logic associated with the operation of
the database traversal system of the present invention.
FIG. 3 is a schematic illustration of a simplified hypertext database.
FIG. 4 is a schematic illustration of the hypertext database of FIG. 3
tagged with document index terms.
FIGS. 5a-5f are screen displays of a reduction to practice of the present
invention.
FIG. 5a shows a screen display illustrating the choices of guides or
options available for traversing the subject hypermedia database.
FIG. 5b illustrates a screen display with a window showing the particular
selected guide's option index terms and first choice of article.
FIG. 5c illustrates a screen display of the selected electronic document.
FIG. 5d is a screen display with a window showing a ranked list of
preferred documents.
FIG. 5e is a screen display with a window illustrating the correspondence
between the presently selected document index terms and the index terms
characterizing the selected option.
FIG. 5f is a screen display with two windows, the top window showing the
correspondence between the document index terms of another document on the
ranked list of documents and the option index terms characterizing the
selected guide.
The same elements or parts throughout the figures of the drawings are
designated by the same reference characters, while equivalent elements
bear a prime designation.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The following detailed description is divided into several sections. The
first of these discloses a general system arrangement for storing,
retrieving, and manipulating data. Subsequent sections deal with the
traversal system including construction of the database, specification of
the criteria set, user start-up, traversing the database, and the use of a
guide in the user interface. Additionally, an example of an implementation
of this system is provided.
I. General System Configuration
FIG. 1 illustrates a typical computer controlled display system (CCDS)
implementing the present invention. Shown there is a computer 20 which
comprises three major components. The first of these is the input/output
(I/O) circuit 22 which is used to communicate information in appropriately
structured form to and from the other parts of computer 20. Also shown as
part of computer 20 is the central processing unit (CPU) 24 and memory 26.
These latter two elements are those typically found in most general
purpose computers and almost all special purpose computers. In fact, the
several elements contained within computer 20 are intended to be
representative of this broad category of data processors. Particular
examples of suitable data processors to fill the role of computer 20
include machines manufactured by Apple Computer, Inc., and International
Business Machines (IBM). Other computers having like capabilities may be
adapted in a straight forward manner to perform the several functions
described below.
Also shown in FIG. 1 is an input device 30 shown in a typical embodiment as
a keyboard. It should be understood, however, that the input device may
actually be a card reader, magnetic or paper tape reader, or other well
known input device (including, of course, another computer). Also shown is
an audio input device 31 which converts analog audio information to
digital information. The audio signal may be supplied by a microphone or
other conventional audio source such as compact disc, cassette, etc. A
mass memory device 32 coupled to the I/O circuit 22 provides additional
storage capability for the computer 20. The mass memory 32 may include
other programs and the like and may take the form of a magnetic or paper
tape reader, hard disk drive, compact laser disk, or other well known mass
storage device. It will be appreciated that the data retained within mass
memory 32, may in appropriate cases, be incorporated in standard fashion
into computer 20 as part of the memory 26.
In addition, a "mouse" input device 34 is illustrated which permits the
user to input graphic information to computer 20 through I/O circuit 22,
in a well known manner. Generally, mouse 34 provides cursor control to
identify and position a cursor on a display screen. A computer controlled
display monitor 36 is illustrated which is used to display the images
being generated by the present invention. Such a display monitor 36 may
take the form of a cathode ray tube (CRT), liquid crystal panel (LCD), or
other well known display devices.
An audio output analog signal converts digital information to analog audio
signals. The audio output is represented by numeral designation 37. The
analog output may be delivered to headphones, loudspeaker, or other well
known audio mixing and storage devices such as magnetic tape.
In the present invention, memory 26 includes a "bit map" which represents
the video memory for display monitor 36. Each bit (or groups of bits) in
the bit map 40 within memory 26 corresponds to a display element on
display monitor 36. Thus, the bit map can be described by a two
dimensional array of points having known X, Y coordinates. The display
elements comprising the display of display monitor 36 may be selectively
enabled, or disabled, as a function of the bit map of memory 26 is "on" or
"off". The use of bit maps to display images on a display monitor is well
known, and will not be further described in detail in this Specification.
II. The Database Traversal System
FIG. 2 is a diagram illustrating the logic associated with the operation of
the database traversal system. The system diagram includes a plurality of
data stores, designated generally as 40,40', 40" . . . which are shown to
appear as three-dimensional depositories. The data stores represent
collections of related data that can be stored in the computer mass memory
or main memory. The system diagram also includes a plurality of process
blocks, designated generally as 42, 42', 42" . . . , and shown to appear
as rectangles. The process blocks 42 represent interactions between the
user and the computer or interactions within the computer itself. It is
assumed that the necessary program functions are provided by the
aforementioned operating system to display characters and the like on the
screen associated with the display device 36.
A. Construction of the Database
The first data store in the system diagram is labeled "Hypertext database".
As noted, one aspect of the invention involves the utilization of any
hypertext-type database. This may include hypermedia databases, which
include graphics and sound. However, for ease in explanation, the system
will be described in connection with a hypertext database. In view of the
noted broader utility of the invention, it will be understood that this
described application involving a hypertext database is purely
illustrative and not limiting in nature.
Referring to FIG. 3, a simplified schematic of a hypertext database is
illustrated, designated generally as 40. (The corresponding hypertext
database is Data Store 1, also designated 40 in FIG. 2.) The database 40
may be located on a hard disk or optical disk in mass memory 32 or memory
26. Database 40 includes a plurality of hypertext nodes, designated
Document A through Document E. The nodes may include text. (In a
multimedia database each node might include, for example, graphics, a
collection of sounds, or a combination thereof.) Although database 40
includes only five nodes, for simplified illustrative purposes, a typical
hypertext database would include many more nodes. The nodes in the
hypertext database 40 are linked by pointers or links, designated by the
arrows in the figure. These links represent a machine supported capability
of a user to traverse from one node to another, as indicated by the
arrows, in an efficient manner. As noted in the Background of the
Invention, transport should require no more than a couple of keystrokes
(or mouse movements) and with only brief delays.
Links have two ends, and are usually considered to be directed, although it
is common to support going "backwards" along the link. The origination of
the link is called the "link source", and usually acts as a reference. The
source can logically be either a single point or a region of text. The
other end of the link, the "destination" of the link, usually functions as
the reference, and can also be either a point or a region. A region source
(or destination) is a set of contiguous characters which is displayed in
some way as being a single unit.
In forming a useful indexed hypertext database, the documents of the
database are tagged with descriptive terms or document index terms. The
document index terms may be drawn from a pool of possible index terms.
Data Store 2, containing these index terms (i.e. Index Term Set), is
designated 40' in FIG. 2. This set might, for example, be stored in the
mass storage 32 of the CCDS.
The Index Term Set may be formed manually (Process Block 3), or
automatically (Process Block 4).
In manual indexing, a human generates a set of indexing terms based on
knowledge of the database's subject area. This may be unordered, or it may
be ordered in some fashion. A typical ordering is hierarchical, with more
specific terms nested in an outline fashion below general terms.
In automatic indexing, a computer uses an algorithmic process to extract
indexing terms from the nodes of the Hypertext database. A typical
process, applied to text, is to extract every word from the document
corresponding to the node, discard common or "stop" words and remove
common suffixes. The entire residue list of unique words are then in the
indexing set for the database.
Terms from Index Term Set may then be applied to each of the nodes in the
hypertext database, by, for example, an automatic function of the CPU, in
an algorithmic fashion or manually by the human editor. This process is
represented as Process Block 5.
The resulting Indexed hypertext database (Data Store 6) is the same as that
designated as Data Store 1; however, it includes document index terms.
FIG. 4 illustrates this indexed hypertext database, generally designated
40". Thus, for example, as shown in FIG. 4, Document A might be tagged
with Index Terms 1-4, Document B tagged with Index Terms 1, 5 and 3, etc..
B. Specification of the Criteria Set
Typically, from the set of indexing terms, one or more criteria or options
are formulated manually, by the author of the database or by an end user.
This procedure corresponds to Process Block 7 in FIG. 2. Traversal from
one node of the hypertext link to a second node, using the system of the
present invention, requires that at least one index term included in a
selected criterion be from the subset of index terms contained in the
indexed hypertext database (as explained below).
The formulation of a set of index terms as a criteria or "option"
correlates to forming a query in a normal retrieval system. Relevant index
terms for a particular criterion or option can be selected. Thus, for
example, for one particular study, relevant index terms might be Index
Term 4 and Index Term 5. For another study, for example, to study a
different viewpoint, a different set of index terms would be included in
another criterion corresponding to another option. A set of criteria or
options based on index term sets is illustrated as Data Store 8.
Preferably, this aspect of the invention utilizes a portrayed character or
personality to represent a complex set of index terms. In this respect, as
will be described in detail below, this portrayed character serves as a
type of "guide" for the user in browsing through the database, the guide
representing a certain "viewpoint". As noted in the Background of the
Invention, it is difficult to traverse through a mature hypertext database
with a multiplicity of nodes and linkages. Applicants have discovered that
confusion during the traversal of the database can be greatly eased by use
of such a guide.
As used herein the term "option index term" refers to an index term
included in the set of index terms representing an option. The term
"document index term" refers to an index term included in the set of index
terms representing a document.
C. User Traversal
Now that both:
(1) a set of criteria or options based upon a set of option index terms
which contain (at least in part) a subset of the option terms in Data
Store 8, and
(2) an indexed hypertext database (e.g. Data Store 6) have been
established, the system is set up to allow the user to traverse from one
node to a second node in the database. The user selects a first command
option from the CCDS, as denoted by Process Block 9. Data store 10
represents the selected option with selected option index terms tagged
thereon.
The user, in browsing the indexed hypertext database, arrives at a
particular node in the database (see Process Block 11). This particular
node is represented by Data Store 12. The CPU computes the set of nodes
connected to this particular node by hypertext links (Process Block 13).
The set of these linked nodes is represented in Data Store 14. The
document index terms characterizing the linked nodes are then compared
with the option index terms characterizing the selected option. The
comparison involves estimating the "relevance" of each linked node to the
selected option and is preferably an automatic, algorithmic process. The
desired result is the assignment of a "match value" to each linked
document, the match value reflecting the correspondence between the
document index terms characterizing that particular linked document with
the option index terms characterizing the selected option.
A preferred method for assigning matched values to linked documents
includes utilization of the following equation:
MV.sub.d =(i.sub.od).sup.2 /i.sub.o .times.i.sub.d (1)
where,
MV.sub.d =the match value of a document, d;
i.sub.o =the number of option index terms characterizing that particular
option;
i.sub.d =the number of document index terms characterizing the document for
which a match value is being determined; and,
i.sub.od =the number of co-occurring index terms when comparing the
document index terms of document d, for which a matched value is being
determined and the option index terms characterizing that particular
option.
Thus, by computation of a match value for each linked document, the linked
documents may be ranked in accordance with their relevancy to the selected
option. This ranked list of documents is represented by Data Store 16.
Thus, each specific option has a one-to-one correspondence with a ranked
list of documents.
For example, referring to FIG. 4, assume that the user is positioned at
Document A. Also, assume that the user selects an option which is tagged
with Index Terms 2 and 3. Therefore, matched values for each document can
be computed as follows;
MV.sub.B =(i.sub.oB).sup.2 /(i.sub.o .times.i.sub.B)=1.sup.2 /2.times.3=1/6
MV.sub.C =(i.sub.oC).sup.2 /(i.sub.o .times.i.sub.C)=1.sup.2 /2.times.2=1/4
MV.sub.D =(i.sub.oD).sup.2 /(i.sub.o .times.i.sub.D)=2.sup.2 /2.times.3=2/3
Thus, the ranked list of linked documents sorted in accordance with the
relevancy of each document with respect to the criterion set (i.e. option)
comprising option index terms 2 and 3 is: D,C,B, in that order.
The above example is obviously a simplified illustration; however, the
utility of this technique with reference to a complicated database is
evident. Note that since each linked document had at least one document
index term which was also in the set of option index terms, all three
linked documents had a match value. However, in the event that there is no
correspondence, the match value would be zero and the document absent from
the list of sorted documents. It is also pointed out that since Document E
is not directly linked to Document A, it is not even considered for the
ranked list of documents.
Furthermore, depending on | | |