FIELD OF THE INVENTION
The present invention relates to methods and apparatus for automatically
analyzing and modifying documents, and more specifically for automatically
extracting key topics, such as concepts or phrases, from documents and
generating summary pages containing key topic lists and hyperlinks to the
extracted key topics, and even more specifically to automatically generating
hyperlink indexes for documents stored on CD-ROM or available over
networks such as the Internet.
BACKGROUND OF THE INVENTION
Document authors often give a reader the ability to efficiently find
and retrieve more information about any particular item in a document. One
method to provide such an ability is to give the reader a reference. An
example of such a reference is a footnote or citation in a scholarly
article or book. A reader can use the footnote to identify another book or
article, and even the page number in the book or article, in order to
obtain more detailed information. Similarly, an index entry in a book's
index points to the places in the book where more information regarding
the term in the index entry can be found. In the prior art, the author or
editor of the information must manually find the key topics which might be
of interest to readers, and then generate the footnote or index entry
which points the reader from the footnote or index entry to the point
where the topic is more fully explained.
A `hyperlink` is defined as a point-and-click mechanism implemented on a
computer which allows a viewer to link (or jump) from one screen display
where a topic is referred to (called the `hyperlink source`), to other
screen displays where more information about that topic exists (called the
`hyperlink destination`). These hyperlinked screen displays can all be of
portions of the media data (media data can include, e.g., text, graphics,
audio, video, etc.) from a single data file, or can be portions of a
plurality of different data files; these can be stored in a single
location, or at a plurality of separate locations. The hyperlink is the
combination of a display element or a (generally visual) indication that a
hyperlink is available for a particular hyperlink source, and a computer
program which finds and displays the hyperlink destination. A hyperlink
thus provides a computer-assisted way for a human user to efficiently jump
between various locations containing information which is somehow related.
A `hypermedia application` is defined as a computer application which
contains media data and hyperlinks between hyperlink sources and
destinations in the media data.
The people who provide the content (e.g., text and pictures), edit that
information, and who define the hyperlinks are called `authors.` People
who use the finished application are called `viewers.` People who have
computers which transmit information for others to view are called
`database providers.`
Prior-art FIG. 1 is a conceptual drawing of a hyperlink. A hyperlink is a
link between hyperlink source 72, which is located in a first data file,
and hyperlink destination 74, which can be located in the same or in a
second data file. Hyperlink source 72 and hyperlink destination 74 are
typically displayed on computer screen 52 at different points in time.
Three elements that comprise a hyperlink are:
(a) hyperlink source 72, which specifies a key topic to be displayed in a
hot area. A `hot area` is a portion of the display screen that, if pointed
at and clicked on, will cause the computer to execute computer code such
as a hyperlink program 79 which hyperlinks (i.e., causes a branch) to a
hyperlink destination 74. (Typically, the hot area is visually indicated
by highlighting, such as color, a bold font, blinking or underlining, but
it may contain an icon, picture graphic, or other visual indication.)
(b) hyperlink destination 74, which includes information, (e.g.,
destination location specification 73) specifying the location of the text
or picture that will be displayed if the hyperlink is taken. Destination
location specification 73 for hyperlink destination 74 is generally stored
in the data file containing hyperlink source 72. Hyperlink destination 74
itself can be either in the same or a different data file as hyperlink
source 72.
(c) hyperlink computer code 79 that, in response to a `viewer action`,
causes hyperlink destination 74 to be displayed in the context in which it
appears. Typically, that `viewer action` comprises a viewer clicking on
the hyperlink source 72. `Clicking` is defined as pointing with and
activating pointing device 54 at a hot area, such as hyperlink source 72.
A pointing device can include a mouse, joystick, or other device that is
used to select a location on a computer screen and is activated by, for
example, depressing a switch such as a mouse button 59, or otherwise
indicating that the computer should execute hyperlink code 79. Upon
activation, hyperlink code 79 uses destination location specification 73
to locate hyperlink destination 74, and to display that information.
Apple Computer's HyperCard.TM. program provided an early and widely known
program that supports the development of hypermedia applications and
hyperlinking. A HyperCard author specifies the hyperlink source 72,
including a simple program that is activated when the hot area for
hyperlink source 72 is clicked on and that hyperlinks to a hyperlink
destination. The process of identifying topics and generating and
embedding hyperlinks in the text is a manual, labor-intensive process.
The Internet, and particularly the World Wide Web protocol, has brought
hyperlinking to use over networks. A network is a collection of computers
connected by communication lines. The Internet is an international network
comprised of many heterogeneous sub-networks which link thousands of
computers which have millions of users, many of whom are authors. The
World Wide Web protocol (sometimes simply called "the Web") is an
interface and communications protocol which is sometimes used on the
Internet to make use of the Internet easier.
Prior-art FIG. 2A shows a simplified schematic of a network 400, such as
the Internet, with four computers 411, 412, 413, and 414 connected as
nodes on the network. In the embodiment shown in FIG. 2A, the nodes are
viewer's computer 411, author's computer 412, database provider's computer
413 (which presents a Web application for others, such as the human viewer
at viewer's computer 411, to use), and multi-use computer 414. A single
user on multi-use computer 414
may be a viewer, an author and a database provider at various times. There
are not necessarily any physical differences among computers 411, 412,
413, and 414; they are simply generic computers put to different uses.
Prior-art FIG. 2B shows a simple connection between viewer's computer 411
and CD-ROM drive 723. `Downloading` is defined as the transfer of data
across a network, typically from database provider's computer 413 to
viewer's computer 411. `Downloading` can also refer to transmitting a
document from a CD-ROM 723 to viewer's computer 411 as is shown in FIG.
2B. Alternatively, a CD-ROM 723 could connect to database provider's
computer 413 or multi-use computer 414 to provide CD-ROM access to other
network computers.
The term `document` is defined in a broad sense as text and other
information stored in one or more computer files. Documents include
everything from simple short text documents to large computer multi-media
databases. Examples of database-provider computers 413 containing these
documents include computers in the patent office and the Library of
Congress, organizations which have huge volumes of information, much of it
already computerized and more in the process of being computerized.
As the many different online databases of documents such as legal libraries
become available on the Web, a mechanism to automatically generate
hyperlinks to places in these databases in order to facilitate viewing
across the Internet is needed. Indeed, a major and widely recognized
problem with the Internet is that, while the Internet has a wealth of
information, most users find it difficult to access the information.
One way of organizing information on the Internet in order to minimize
download time has been to provide users with an overview interface, called
a `home page,` to the information. Although a home page is often merely
used as a visually interesting trademark, the home page typically contains
a key topic summary of the information provided by one author or database
provider, and hyperlinks which take a viewer to the information the viewer
has chosen.
At about the same time as the capabilities of the Internet have grown,
CD-ROM (Compact-Disk, Read-Only Memory) drives have become important
peripherals. CD-ROMs today typically contain up to about 680 megabytes of
information, and generally contain many of the same kinds of documents
that are accessible via the Internet. Many of these CD-ROMs also contain
documents that are, or need to be, indexed. A home-page-like interface
with hyperlinks into the information contained on a CD-ROM is needed.
Another technology which is relevant to the present invention is the
automatic semantic analysis of text to identify and extract key topics for
indexing. One exemplary kernel incorporating this semantic analysis
technology, the Syntactica Engine, does a syntactic analysis of the text
of a document to determine how each word is being used (since some words
having identical spelling have quite different meanings) and then uses a
"lexicon dictionary" (also called a "lexicon") which specifies semantic
weights assigned to the words in the text reflecting their value as index
entries. A computer program can use the synthesized values, or semantic
weights, for words to qualify phrases as key topics. A user is able to
specify a threshold value so that the computer program could select only
those phrases greater than, or equal to, that specified threshold value as
key topics. Known semantic-analysis computer programs do not generate
hyperlinks.
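The threshold-based selection described above can be illustrated with a
minimal sketch. It is not the Syntactica Engine or any actual lexicon: the
weights, the 0-100 integer scale, the averaging rule, and the names
LEXICON, phrase_weight and select_key_topics are all assumptions made for
illustration only.

```python
from typing import Dict, List

# Hypothetical lexicon: word -> semantic weight on an assumed 0-100 scale.
LEXICON: Dict[str, int] = {
    "hyperlink": 90,
    "index": 80,
    "summary": 60,
    "page": 40,
    "the": 0,
}


def phrase_weight(phrase: str, lexicon: Dict[str, int]) -> float:
    """Average the lexicon weights of the words in a phrase.

    Words missing from the lexicon count as weight 0. Averaging is an
    assumed combining rule, chosen only to keep the sketch simple.
    """
    words = phrase.lower().split()
    if not words:
        return 0.0
    return sum(lexicon.get(word, 0) for word in words) / len(words)


def select_key_topics(
    phrases: List[str], lexicon: Dict[str, int], threshold: float
) -> List[str]:
    """Keep phrases whose weight is greater than or equal to the threshold."""
    return [p for p in phrases if phrase_weight(p, lexicon) >= threshold]
```

Raising the threshold yields a shorter key-topic list, and lowering it
yields a longer one.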
A significant problem with generating information for computer-based
hyperlink systems is that the author must review the material to be
hyperlinked, must identify key topics to which to hyperlink, and must set
up the hyperlinks. This is a time-consuming and labor-intensive process.
What is needed, and what the present invention provides, is a system and
method that automatically identifies key topics and phrases in a
document's text, inserts identifying tokens for hyperlinks to those key
topics, generates one or more summary pages having key topic lists, and
automatically generates hyperlinks from the summary pages to the key
topics in the document's text. In particular what is needed is an
apparatus and method for automatically identifying semantically important
key topics. What is also needed is a system and method for automatically
generating home pages containing various types of index information and
the associated hyperlinks to other information located on the Internet and
the Web.
SUMMARY OF THE INVENTION
The present invention scans one or more documents, automatically identifies
significant key topics, concepts, and phrases in the documents, and
creates summary pages for, and hyperlinks between, these key topics. Where
the same key topic appears at several places in the documents, one
embodiment of the present invention creates hyperlinks between all of the
instances of that key topic. The present invention also provides for
segmenting of documents, in order that only the needed segment of a
hyperlinked-to document need be transferred to a viewer's display.
One embodiment of the present invention includes a process running on a
computer which (a) allows an author to select documents and then, using a
semantic analyzer program running on a computer, (b) automatically
identifies significant key topics within the selected documents, (c)
compiles those key topics into summary pages, (d) generates presentation
pages by segmenting the selected documents into smaller pieces, and (e)
embeds hyperlinks from these summary pages to the locations where key
topics appear in the presentation pages. In particular, the present
invention creates summary pages containing various abstractions of
information which is contained in selected documents, and hyperlinks into
the documents. Summary pages are pages which are typically viewed using a
web browser program and which contain lists of key topics and hyperlinks
to places in the selected documents where the key topics appear. Different
types of summary page are available, including abstract, concept, phrase,
and table-of-contents summary pages. A method of using a computer to
hyperlink through automatically generated hyperlinks and a data structure
which can be used to support that hyperlinking are described. In one
embodiment, the summary page which is generated provides an index to the
source document which is appended to the source document to provide a more
usable document which can be viewed by a document viewer program such as a
word-processor program or a web-browser program.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following detailed description of the invention, reference is made
to the accompanying drawings which form a part hereof, and in which is
shown by way of illustration only, specific exemplary embodiments in which
the invention may be practiced. It is to be understood that other
embodiments may be utilized, and structural changes may be made, without
departing from the scope of the present invention.
FIG. 1 shows a conceptual drawing of a single prior-art hyperlink.
FIG. 2A shows a prior-art network connected to a plurality of computers.
FIG. 2B shows a prior-art CD-ROM drive connected to a computer.
FIG. 3 shows the flow from source document 20 through summary page
generator 40 to resultant documents.
FIG. 4 shows a conceptual drawing of vertical hyperlinking.
FIG. 5 including FIGS. 5A-5B shows a conceptual drawing of circular
hyperlinking.
FIG. 6 including FIGS. 6A-6C shows a conceptual drawing of horizontal
hyperlinking.
FIG. 7 shows the opening screen of one embodiment of the present invention.
FIG. 8 shows a conceptual drawing of the entry page, summary pages, and
presentation pages generated by the present invention along with
hyperlinks between them.
FIG. 9A shows an example IPF data structure for a word.
FIG. 9B shows an example IPF data structure for a paragraph.
FIG. 10 shows the hyperlinking for 26 summary pages, one for each letter.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
In the following detailed description of the preferred embodiments,
reference is made to the accompanying drawings which form a part hereof,
and in which are shown, by way of illustration, specific embodiments in
which the invention may be practiced. It is to be understood that other
embodiments may be utilized and structural changes may be made without
departing from the scope of the present invention.
An `anchor` is defined as a word, phrase, or graphic (for example, one that
might likely be used to locate information of interest) which is
`anchored` to its location within the context of the file data, as opposed
to being fixed to a specific numerical address within the file. The source
and destination ends of hyperlinks for the present invention are coupled
to anchors so they are anchored to a specific portion of text or to a
specific icon or picture displayed on a computer screen, rather than being
associated with a specific address in a file. Thus, the anchor remains
with the same piece of data when information is inserted in or deleted
from the file, whereas the specific address of that piece of data may
change.
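The difference between an anchor and a numerical address can be shown
with a short sketch. It assumes, for illustration only, that an anchor is
stored as an HTML-style named marker in the text; the helper
anchor_offset and the anchor name "kt1" are hypothetical.

```python
def anchor_offset(text: str, name: str) -> int:
    """Return the current character offset of a named anchor, or -1."""
    return text.find(f'<a name="{name}">')


original = 'Intro. <a name="kt1">hyperlink</a> explained.'

# Inserting text ahead of the anchor changes its numerical address ...
preface = "New preface. "
edited = preface + original

# ... but a lookup by anchor name still lands on the same piece of data.
before = anchor_offset(original, "kt1")
after = anchor_offset(edited, "kt1")
```

Here the offset of the marker moves by the length of the inserted text,
while the name "kt1" continues to identify the same word.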
The American Heritage dictionary begins its definition of `index` as
"something that serves to guide, point out, or otherwise facilitate
reference . . . " The term `index entry` is defined to include a term or
phrase, with information as to the location where more information
regarding that index entry can be found. The term `index` is defined as a
grouping or listing of index entries. An index is often ordered in some
manner, for example by alphabetization. Hypermedia applications usually
include text, and often also include pictures, icons, graphics,
animations, sound, and video (movies).
A `web browser` is traditionally defined as a computer program which
supports the displaying of documents, which include Hypertext Markup
Language (HTML) formatting markup tokens (discussed further below), and
hyperlinking to other documents, or phrases in documents, across a
network. In particular, web browsers are used to access documents across
the Internet's World Wide Web. The discussion of the present invention defines
both `web browser` and `browser` to include browser programs which enable
accessing hyperlinked information over the Internet and other networks, as
well as from magnetic disk, CD-ROM, or other memory, and does not limit
web browsers to just use over the Internet. Several Internet web browsers
are available, some of them commercially. Two of the best known of these,
Mosaic and Netscape Navigator, are described in Internet Starter Kit by
Adam Engst, Corwin Low and Michael Simon, Second Edition, Hayden Books,
1995. Any viewer of the World Wide Web will typically use a web browser.
Indeed, a viewer viewing documents created by the present invention
normally uses a web browser to access the documents that a database
provider may make available on the network. Web browsers allow clicking on
hot areas (generated by source anchors containing a document reference
name and a hyperlink to that document) so that clicking on the hot area
causes the specified document to be downloaded over the network and
displayed for the viewer. Most web browsers also maintain a history of
previously used source anchors and display a hot area which allows
hyperlinking back to the database provider's home page (or back through
the locations the viewer has previously "visited") so the viewer can
always go back to a familiar place.
What makes a web browser on a network such as the Internet so powerful is
that any of the documents viewed with the program may be located (or
scattered in pieces) on any computer connected to network 400. The viewer
can use a mouse, or other pointing device, to click-on a hot area, such as
highlighted text or a button, and cause the relevant portion of the
referenced document to be downloaded to the viewer's computer 411 for
viewing. These downloaded documents in turn can contain hyperlinks to
other documents on the same or other computers. `Downloading` is defined
as the transmitting of a document or other information from the database
provider's computer 413 over a network 400 to the viewer's computer 411.
A `source anchor` is an anchor which is combined with a hyperlink source 72
and, typically, an index term 69. The index term 69 conveys information
regarding a key topic to a viewer. The index term 69 is generally
highlighted as a hot area to indicate to a viewer that a hyperlink is
available. Alternative embodiments replace the index term 69 with an icon
or graphic. A destination anchor 76 is an anchor placed in a file at a
hyperlink destination 74. A source anchor 75 typically contains the name
of the destination anchor 76 stored in destination location specification
73 in order that a web browser can find and hyperlink to the hyperlink
destination 74. Combination anchor 67 is an anchor which is combined with
a combination hyperlink 77 (which comprises both a hyperlink source 72 and
a hyperlink destination 74). In one embodiment, combination anchor 67 is
implemented by using a source anchor 75 in close proximity to a
destination anchor 76.
Information is presented to World Wide Web viewers as a collection of
`documents` and `pages`. As mentioned above, a `document` is defined in a
broad sense to indicate text, pictorial, audio, video and other
information stored in one or more computer files. Viewing such multimedia
files can be much like watching television. Documents include everything
from simple short text documents to large computer multi-media databases.
A `page` is defined as any discrete file which can be downloaded as a
single download segment. Technically, a web browser does not recognize or
access documents per se, but instead accesses pages. Typically, one page
is downloaded by a web browser as the result of clicking on a hot area. A
page often has several source anchors 75 with hyperlinks to various other
pages or to specific locations within pages.
One problem with accessing documents over the Internet is that many
documents are quite long, and thus can take quite some time to download
over the network. This means that viewers are often reluctant to access a
document unless they know it will be useful. The present invention
facilitates dividing documents into a plurality of pages which can be
efficiently chosen by a viewer and downloaded, one page at a time, and
only when the particular page desired is referenced. A page is thus a
document which contains a portion of a source document. A source document
is a document from which derivative documents (such as pages) are
produced. The source document could be reconstructed from the pages
generated from the source document.
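One possible way of dividing a source document into pages, such that the
source document can be reconstructed from them, is sketched below. The
use of blank lines as paragraph boundaries, the character budget
max_chars, and the function name split_into_pages are assumptions for
illustration and are not taken from the described embodiment.

```python
from typing import List

PARAGRAPH_BREAK = "\n\n"  # assumed paragraph separator


def split_into_pages(source: str, max_chars: int) -> List[str]:
    """Group consecutive paragraphs into pages of roughly max_chars each.

    A paragraph is never split, so a single long paragraph may exceed
    the budget on its own page.
    """
    pages: List[str] = []
    current: List[str] = []
    size = 0
    for paragraph in source.split(PARAGRAPH_BREAK):
        # Start a new page when adding this paragraph would exceed the budget.
        if current and size + len(paragraph) > max_chars:
            pages.append(PARAGRAPH_BREAK.join(current))
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph)
    if current:
        pages.append(PARAGRAPH_BREAK.join(current))
    return pages
```

Because pages are cut only at paragraph boundaries, joining the pages
with the same separator reproduces the source document exactly.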
A `summary page` is defined as an overview-type page containing summary
information about another document (or a set of documents, if desired) and
one or more hyperlinks to that other document.
A `presentation page` is defined as a page containing a portion or segment
of a larger source document. Presentation pages provide conveniently sized
pieces of the larger source document which are downloaded one at a time
(rather than downloading the entire source document), typically as a
result of a hyperlink the viewer wants to take into the corresponding
portion of the source document.
From the point-of-view of a web browser program, presentation pages and
summary pages are technically indistinguishable. However, summary pages
are normally documents that are designed by people to contain hyperlinks
to presentation pages (or to other summary pages), and are designed for
use on the World Wide Web. In the context of the present invention,
summary pages are also used to help navigate through information contained
on a CD-ROM.
An `entry page` is defined as a summary page that has been assembled by a
person or computer as an entry point to hyperlink to other summary pages
and presentation pages of interest. Note, however, that any page,
including summary pages and presentation pages can be accessed and/or
downloaded directly, without having to go through an entry page.
A `home page` is defined as an entry page used by a database provider to
provide an overview of other pages and/or documents available through the
system associated with the home page. A home page often contains a
trademark and other flashy pictorial or aesthetic information identifying
the database provider. The viewer normally begins by clicking on one of
the hot areas on a home page which the World Wide Web uses as an entry
page to the information a database provider presents. The viewer likely
starts to trace through a web of hyperlinks to a series of various
documents on various computers on a network. (Hence the term World Wide
Web.)
To support the Internet and the World Wide Web, a markup language called
Hypertext Markup Language (HTML) has been developed. HTML has two major
objectives. First, HTML provides a way to specify the structural elements
of text (e.g., this is a heading, this is body text, this is a list, etc.)
using tokens which are independent of the content of the text. A web
browser uses these tokens to format the displayed text for the particular
display device of a particular viewer. So, for example, HTML allows an
author to specify up to six levels of heading information bracketed by six
different heading-token pairs. Applications (e.g., web browsers) on
different computers then process the HTML documents for visual
presentation in a manner customized for particular display devices. An
application on one computer could display a level 1 heading as 14 point
bold Bodoni, while an application on another computer could display it as
20 point italic Roman. A level 1 heading is introduced with the token
<h1> and terminated with the token </h1>. Thus, a heading might be
encoded as:
<h1> This is a level 1 heading </h1>
for a level one heading or
<h6> This is a level 6 heading </h6>
for a level 6 heading. As a markup language, HTML enables a document to be
displayed within the capabilities of any particular display system even
though that display system does not support italic, or bold, color, or any
particular typeface or size. Thus, HTML supports writing documents so they
can be output to everything from simple monospaced, single-size fonts to
proportional-spaced, multiple-size, multiple-style fonts. Each computer
program that accesses an HTML document can translate that HTML document
into a display format supported by the hardware it will run on.
The second and more important aspect of HTML, for the purposes of the
present invention, is that it provides a mechanism to incorporate
hyperlinks within a single document and between documents located at
different nodes on the Internet. These hyperlinks can contain addresses of
documents anyplace on the Internet. HTML is described in The HTML Manual
of Style by Larry Aronson, Ziff Davis, 1994.
FIG. 3 shows the flow from source document 20 through summary page
generator 40 to resultant documents 64. In its most general form, summary
page generator 40 is a program running on a computer which automatically
analyzes textual data in a source document 20, and using weighting rules
determines from the textual data what are the most significant phrases
(i.e., strings of words), and generates a presentation page 150 which
contains textual data from source document 20 plus special codes embedded
in that textual data; these codes specify to another program
(generally a browser) where those significant phrases are.
Summary page generator 40 is typically a computer program that processes
one or more source documents 20 to produce one or more output summary
pages 62, and optionally, produces entry page 78 and divides source
document 20 into a plurality of presentation pages 150. In one embodiment,
summary page generator 40 runs on an IBM-compatible personal computer.
In one embodiment, a typical summary page 62 contains key-topic index
entries that include hyperlinks to destination anchors where those key
topics appear in the presentation pages 150 generated from source document
20. Various types of summary pages 62 are created; for example, separate
summary pages can be created which contain a table-of-contents, a concept
index, a phrase index, or an abstract index, respectively. In one
embodiment, summary page generator 40 also generates an entry page 78
which contains source anchors 75 having hyperlink sources 72 to the
various summary pages 62, and in one embodiment, optionally to
presentation pages 150. In an alternative embodiment, summary page
generator 40 combines all the summary pages 62 on a single summary page
62. A key topic index entry is an index term 69 for the key topic and an
associated source anchor 75 or combination anchor 67 that are typically
hyperlinked to occurrences of that key topic in the source documents 20 or
their derivative documents (i.e., presentation pages 150).
The viewer begins navigating a document or database of documents starting
at an entry page 78, and from there, hyperlinking to one of several
summary pages, which in turn hyperlink to presentation pages, where data
from the actual source documents are displayed for the viewer. In some
cases, such as a key-phrase summary page 100 which contains hyperlinks to
an abstract summary page 140, one summary page 62 will hyperlink to
another summary page 62.
A summary page 62 could fit on a single computer display screen, or could
be tens of thousands of lines of text which are scrolled, as in a word
processor. In one embodiment, presentation pages 150 are derivative
versions of source document 20 that contain embedded hyperlinks inserted
by summary page generator 40.
In another embodiment, source document 20 is its own presentation page 150,
especially if source document 20 already contains hyperlinks and/or
hyperlink destinations inserted by an author before being processed by
summary page generator 40.
There are three kinds of hyperlinks that can be generated:
Vertical hyperlinks: Vertical hyperlinks 91 are single-hop hyperlinks as
shown in FIG. 4. Each vertical hyperlink 91 hyperlinks to one instance of
a key topic in the presentation pages. As many hyperlink source anchor
entries 72 for vertical hyperlinks 91 are created in summary page 62 for a
key topic as there are instances of that key topic in the presentation
pages 150. In one embodiment, summary page generator 40 locates each
significant instance of a key topic by using semantic analysis on source
documents 20.
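The one-entry-per-instance behavior of vertical hyperlinks can be
sketched as follows. The sketch substitutes a plain case-insensitive text
match for semantic analysis, and the file names page1.html, page2.html,
the anchor names kt1, kt2, and the function build_vertical_links are
hypothetical.

```python
import re
from typing import List, Tuple


def build_vertical_links(pages: List[str], topic: str) -> Tuple[List[str], List[str]]:
    """Tag every instance of a topic and emit one summary entry per instance.

    Returns (tagged presentation pages, summary-page entries).
    """
    tagged_pages: List[str] = []
    entries: List[str] = []
    count = 0
    pattern = re.compile(re.escape(topic), re.IGNORECASE)

    for page_no, text in enumerate(pages, start=1):

        def tag(match: "re.Match[str]") -> str:
            nonlocal count
            count += 1
            name = f"kt{count}"
            # Source anchor for the summary page, pointing at this instance.
            entries.append(
                f'<a href="page{page_no}.html#{name}">{topic} ({count})</a>'
            )
            # Destination anchor wrapped around the instance itself.
            return f'<a name="{name}">{match.group(0)}</a>'

        tagged_pages.append(pattern.sub(tag, text))

    return tagged_pages, entries
```

For a topic that occurs three times across two presentation pages, this
produces three summary entries, each a single hop from one instance.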
Circular hyperlinks: When circular hyperlinks are generated, only one
combination anchor entry 67 is created in summary page 62 for each key
topic, no matter how many times that key topic appears in presentation
pages 150. That combination anchor entry 67 is circularly hyperlinked
through all the instances of that key topic in the presentation pages 150.
FIG. 5 shows a conceptual schematic of a circular hyperlink starting at
combination anchor entry 67 in summary page 62 and hyperlinking through
each combination anchor 67 in presentation page 150, each of which, being
a combination anchor 67 in a circular hyperlink chain, is both a source
anchor and a destination anchor. The combination anchor entry 67 on
summary page 62 allows vertical hyperlink 91 to the first instance of that
key topic in presentation page 150, which in turn through the key topic's
combination anchor 67 allows hyperlink 92 to the second instance which
allows hyperlink 93 to a third instance of the key topic, and so on, until
the final instance of the key topic allows hyperlink 94 back to the
combination anchor entry 67 in the summary page 62.
In an alternative embodiment, the final instance of the key topic allows a
hyperlink back to the first instance of the key topic in the presentation
page rather than to the summary page. The preferred embodiment uses
hyperlinking back through the summary page, since this function gives the
viewer visual feedback every time the viewer completes a hyperlink cycle.
One embodiment of the present invention is included in the AnchorPage.TM.
program by the assignee of the present invention.
Circular hyperlinks are an alternative to non-circular hyperlinks. They
have the advantage of making navigation to all instances of a key
topic easier and/or faster, and of reducing the number of entries in a
summary page by allowing a single entry per key topic in the summary page,
rather than one entry for each instance of the key topic in the
presentation pages.
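The circular chain described above, including the alternative embodiment
in which the final instance hyperlinks back to the first instance, can be
modeled with a small sketch. The anchor names and the function
circular_chain are hypothetical; the sketch records only which anchor
each hyperlink jumps to, not how the anchors are embedded.

```python
from typing import Dict, List


def circular_chain(
    instances: List[str], summary_entry: str, via_summary: bool = True
) -> Dict[str, str]:
    """Map each anchor name to the anchor its hyperlink jumps to.

    via_summary=True closes the cycle through the summary-page entry;
    via_summary=False closes it at the first instance instead.
    """
    if not instances:
        return {}
    # The summary entry hyperlinks to the first instance.
    links = {summary_entry: instances[0]}
    # Each instance hyperlinks to the next one.
    for here, following in zip(instances, instances[1:]):
        links[here] = following
    # The final instance closes the cycle.
    links[instances[-1]] = summary_entry if via_summary else instances[0]
    return links
```

However many instances exist, the summary page holds one entry per key
topic, and repeated clicking visits every instance in turn.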
Horizontal hyperlinks: FIG. 6 shows horizontal hyperlinks 95. Horizontal
hyperlinks 95 are hyperlinks from a key-topic entry in one summary page 62
to instances of the same key-topic entry in other summary pages 62,
typically from a brief key-topic entry such as a key-phrase entry, to a
more-detailed entry such as an abstract entry. Although horizontal
hyperlinks 95 need not be circular, the horizontal hyperlinks 95 shown in
FIG. 6 are circular hyperlinks that hyperlink through combination anchors
67. In one embodiment, summary page generator 40 scans all the summary
pages for key topics and inserts horizontal hyperlinks 95 to those key
topics in the summary pages.
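The scan of summary pages for shared key topics can be sketched as shown
below. The representation of a summary page as a list of key-topic
strings, the file names, and the function horizontal_links are
assumptions for illustration.

```python
from typing import Dict, List


def horizontal_links(summary_pages: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each key topic to the summary pages that list it.

    Only topics found on two or more summary pages are kept, since a
    topic on a single page has no other summary page to hyperlink to.
    """
    pages_by_topic: Dict[str, List[str]] = {}
    for page_name, topics in summary_pages.items():
        # dict.fromkeys drops repeated topics while preserving order.
        for topic in dict.fromkeys(topics):
            pages_by_topic.setdefault(topic, []).append(page_name)
    return {t: p for t, p in pages_by_topic.items() if len(p) > 1}
```

Each resulting list names the summary pages that a horizontal hyperlink
for that key topic would connect, for example a key-phrase entry and the
corresponding abstract entry.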
In one embodiment, only one of horizontal hyperlinks or vertical hyperlinks
can be selected as being circular hyperlinks. However, in another
embodiment both types are selected as circular, so that two sets of circular
hyperlinks can cycle through the same summary page list entry. In yet
another embodiment a separate icon for horizontal hyperlinks is eliminated
and each circular hyperlink cycles through both the presentation pages and
the summary pages.
In one embodiment in which both horizontal hyperlinks and vertical
hyperlinks are used, a horizontal hyperlink icon appears near the source
anchor so the viewer can either click on the highlighted key topic term in
the source anchor to hyperlink through vertical hyperlinks or click on the
horizontal hyperlink icon to hyperlink through horizontal hyperlinks.
When circular hyperlinks are used, a combination anchor 67, such as shown in
FIG. 5, is both a source anchor and a destination anchor, so there are, in
effect, three forms of embedded hyperlinks: Source, Destination and
Combination. A source anchor 75 specifies a hot area to be highlighted and
the name of the location in a document to which to hyperlink. A
destination anchor 76 specifies the name of a place in a document in order
that a hyperlink can go to that destination place. A combination anchor 67
contains both source anchor and destination anchor types of information
and is described now in more detail.
A combination anchor 67 is a destination anchor that ends one hyperlink
combined with a source anchor that begins another hyperlink.
The verb `to hyperlink` is defined as clicking on a source anchor to go
to a destination anchor, and includes following down a chain of hyperlinks
by continuing to click on combination anchors.
To generate a combination anchor 67, summary page generator 40 embeds a
hyperlink which (a) identifies index term 69 for the key topic in order
that the key topic term can be highlighted as in a hot area, (b) specifies
a name for destination anchor 76 for the combination anchor, in order that
the combination anchor can be found and hyperlinked to as a destination,
and (c) specifies destination location specification 73, used to find the
location in a document to which to hyperlink when the hot area is clicked
on.
For example, if the text in which a hyperlink is to be embedded is "One
should identify the essence of the idea if one wants to think clearly.",
and the key topic term is "the essence", then in order to insert a
combination anchor 67 comprising a destination anchor 76 and a source
anchor 75, summary page generator 40 changes the text to be:
##STR1##
The above is a typical HTML format. "<A...>" defines the beginning of an
anchor and "</A>" terminates that anchor and so defines the area where the
intervening text is displayed to be a hot area that should be highlighted
so that a mouse click or other input device activates the hyperlink. The
phrase NAME="DEF34876" defines the name of the destination anchor 76
within the document in order that a hyperlink to the destination can find
the destination. The phrase HREF="#GEN03789" provides the destination
location specification 73 used as a destination reference name which the
web browser will hyperlink to if the highlighted area is clicked on or
otherwise activated. Here "DEF34876" and "GEN03789" are arbitrary
generated names, but could just as well be "CAT" or "DOG". Where a
hyperlink is to another document, the name of that document precedes the
"#", so for example the hyperlink:
HREF="http://www.myserver.com/user1/project2#GEN03789" would hyperlink to
the label name GEN03789 in the document user1/project2 at the server
computer at the network address http://www.myserver.com.
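The three pieces of information carried by a combination anchor (the hot-area text, the NAME of the anchor itself, and the HREF of the next destination) can be assembled as in the following Python sketch. The function name `combination_anchor`, its parameters, and the order of the NAME and HREF attributes are assumptions for illustration; the sketch only follows the description above and is not the generator's actual code.

```python
import html


def combination_anchor(text, key_topic, anchor_name, target_name, target_document=""):
    """Wrap the first occurrence of key_topic in an anchor that is both a
    destination (NAME) and a source (HREF).

    target_document is empty for a hyperlink within the same document;
    otherwise it precedes the "#" in the HREF.
    """
    start = text.find(key_topic)
    if start < 0:
        raise ValueError(f"key topic {key_topic!r} not found in text")
    end = start + len(key_topic)
    href = f"{target_document}#{target_name}"
    anchor = (
        f'<A NAME="{html.escape(anchor_name)}" HREF="{html.escape(href)}">'
        f"{html.escape(key_topic)}</A>"
    )
    # Escape the surrounding text so it cannot be mistaken for markup.
    return html.escape(text[:start]) + anchor + html.escape(text[end:])


sentence = "One should identify the essence of the idea if one wants to think clearly."
same_doc = combination_anchor(sentence, "the essence", "DEF34876", "GEN03789")
other_doc = combination_anchor(
    sentence, "the essence", "DEF34876", "GEN03789",
    target_document="http://www.myserver.com/user1/project2",
)
print(same_doc)
print(other_doc)
```

The second call shows the cross-document case, in which the document name precedes the "#" in the HREF value.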
The just-described method is only one embodiment of hyperlinks. In another
embodiment, for example, at the beginning or end of each document there
could be a table of hyperlink numbers or names along with the location
within the document where the source anchor associated with the table
entry is located.
Entry to the plurality of presentation pages 150 that summary page
generator 40 generates is typically through a hyperlink from an entry page
78 which the database provider likely will define as their home page. The
viewer uses a mouse or other pointing device to click on a highlighted hot
area of source anchor 72 in entry page 78 that hyperlinks to one of the
summary pages 62. One embodiment, shown in FIG. 8, provides four types of
summary pages 62: an abstract summary page 140, a concept summary page
200, a key-phrase summary page 100 and a table-of-contents summary page
80. Each of these summary pages 62 contains a list of key-topic-entry
hyperlink anchors. These hyperlink anchors may be used for vertical
hyperlinks (such as vertical hyperlink 91 of FIG. 5) or horizontal
hyperlinks (such as horizontal hyperlink 95 of FIG. 6) referred to
earlier, and they may be circular or not.
Abstract summary page 140 comprises a list of abstracts (high-semantic-
content sentences are treated as `abstracts`; in one embodiment, abstracts
whose semantic content exceeds the threshold value that the author selected
are listed in the abstract summary page 140 in the order in which they
occur in the text) automatically derived by summary page generator 40.
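The threshold selection of abstracts can be sketched in Python as follows. How semantic content is actually measured is not specified in this passage, so the scoring function `toy_score` and its word-weight table are purely hypothetical stand-ins; only the filtering step (keep sentences above the threshold, in their original order) follows the description.

```python
def select_abstracts(sentences, score, threshold):
    """Return the sentences whose score exceeds threshold, in original order."""
    return [sentence for sentence in sentences if score(sentence) > threshold]


# Hypothetical semantic weights, used only to make the example runnable.
WEIGHTS = {"hyperlink": 3.0, "anchor": 2.5, "summary": 2.0}


def toy_score(sentence):
    """Sum the assumed weights of the words in a sentence."""
    words = (word.strip(".,").lower() for word in sentence.split())
    return sum(WEIGHTS.get(word, 0.0) for word in words)


sentences = [
    "A hyperlink joins a source anchor to a destination anchor.",
    "This is shown in the figure.",
    "Each summary page lists hyperlink entries.",
]
abstracts = select_abstracts(sentences, toy_score, threshold=4.0)
for abstract in abstracts:
    print(abstract)
```

Raising or lowering the threshold shortens or lengthens the resulting abstract list without changing the order of the entries.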
Concept summary page 200 comprises a list of concepts (wherein `concepts`
are noun phrases or noun-verb phrases that contain high-semantic-weight
words; in one embodiment, the above-mentioned list of abstracts is
generated; each abstract is then examined to determine all key topics; for
each determined key topic, a copy of the abstract is made and `rotated` so
the key topic appears first to make the `concept` (thus several concepts
can be generated from each abstract); the list of concepts is then
alphabetized) automatically derived by summary page generator 40.
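One possible reading of the `rotation` step is sketched below in Python: the abstract is split at the key topic, the part beginning with the key topic is placed first, and the preceding words are appended after a separator. The separator, the function names `rotate` and `build_concepts`, and the case-insensitive sort are all assumptions for illustration, since the passage does not state how the rotated text is laid out.

```python
def rotate(abstract, key_topic, separator=" / "):
    """Rotate an abstract so that key_topic appears first.

    The words that preceded the key topic are moved to the end, after
    the (assumed) separator.
    """
    start = abstract.find(key_topic)
    if start < 0:
        raise ValueError(f"key topic {key_topic!r} not found in abstract")
    head = abstract[:start].strip()
    tail = abstract[start:].strip()
    return tail + separator + head if head else tail


def build_concepts(abstracts_with_topics):
    """Make one rotated concept per (abstract, key topic) pair, alphabetized."""
    concepts = [
        rotate(abstract, topic)
        for abstract, topics in abstracts_with_topics
        for topic in topics
    ]
    return sorted(concepts, key=str.lower)


concepts = build_concepts(
    [("One should identify the essence of the idea", ["the essence", "idea"])]
)
for concept in concepts:
    print(concept)
```

The single abstract in the example yields two concepts, one per key topic, which matches the observation that several concepts can be generated from each abstract.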
Key-phrase summary page 100 comprises a list of key phrases (key phrases
are phrases with a high semantic weight; the key phrase is rotated so that