WikiPatents - Community Patent Review
Automatic summary page creation and hyperlink generation    
United States Patent 5708825
Inventor(s): Sotomayor; Bernardo Rafael (Burnsville, MN)
Abstract: Method and apparatus to enable scanning one or more documents, automatically identifying significant key topics, concepts, and phrases in the documents, and creating summary pages for, and hyperlinks between, some or all of these key topics. Optionally, documents are divided into segments so that only the needed segment of a hyperlinked-to document needs to be transferred to a viewer's display. A process running on a computer can be used which (a) allows an author to select source documents and then, using a semantic analyzer program running on a computer, (b) automatically identifies significant key topics within the selected documents, (c) compiles those key topics into summary pages, (d) generates presentation pages, optionally segmenting the selected documents into smaller pieces, and (e) embeds hyperlinks from these summary pages to the locations where key topics appear in the presentation pages. Different types of summary pages are available, including abstract, concept, phrase, and table-of-contents summary pages. A summary page provides an index into the source document, and can be appended to the source document. A method of using a computer to hyperlink through automatically generated hyperlinks, and a data structure which can be used to support such hyperlinking, are also described.
Owner/Assignee     Iconovex Corporation (Bloomington, MN)
Publication Date     January 13, 1998
Application Number     08/452,174
Filing Date     May 26, 1995
US Classification     715/501.1
Int'l Classification     G06F 015/00
Examiner     Nguyen; Phu K.
Attorney/Law Firm     Schwegman, Lundberg, Woessner, & Kluth, P.A.
USPTO Field of Search     364/419.19 364/419.01 364/419.1 395/762 395/765 395/761
Claims

What is claimed is:

1. A method for creating a hyperlink to text in a computer-readable document comprising the steps of:

identifying a first key topic in said document, wherein said identifying said first key topic step includes the step of analyzing text in said document with a computer program;

inserting a first source anchor associated with said first key topic into a list; and

creating a hyperlink between said first source anchor and said first key topic in said document.

2. The method according to claim 1 further including the steps of:

identifying a second key topic in said document, wherein said step of identifying said second key topic includes the step of analyzing text in said document with a computer program;

inserting a second source anchor associated with said second key topic into said list; and

creating a hyperlink between said second source anchor and said second key topic in said document.

3. The method according to claim 2 further including the steps of:

dividing said list into a plurality of sublists; and

outputting each one of said plurality of sublists into a respective separate segment, wherein each said separate segment is individually loadable into a computer.

4. The method according to claim 2 further including the steps of:

creating an entry page;

creating a summary page;

inserting said list into said summary page; and

creating a hyperlink from said entry page to said summary page.

5. The method according to claim 4 wherein said list comprises one or more of the following: a table of contents, a list of concepts, a list of abstracts and a list of phrases.

6. The method according to claim 2 further including the steps of:

creating a page;

selecting a summary page template;

inserting said selected summary page template in said page; and

inserting said list in said page.

7. The method according to claim 1 further including the steps of:

identifying a second key topic in said document, wherein said step of identifying said second key topic includes the steps of:

semantically analyzing text in said document with a computer program, and

determining with a computer program that said second key topic is semantically similar to said first key topic; and

creating a hyperlink between said first source anchor and said second key topic in said document.

8. The method according to claim 1 further including the steps of:

identifying a second key topic in said document, wherein said step of identifying said second key topic includes the steps of:

semantically analyzing text in said document with a computer program, and

determining with a computer program that said second key topic is semantically similar to said first key topic; and

creating a hyperlink between said first key topic and said second key topic in said document.

9. The method according to claim 1, wherein said step of analyzing text includes the steps of:

selecting a threshold weight for identifying key topics;

locating a candidate key topic;

calculating a weight for said candidate key topic; and

choosing said first key topic as a result of comparing said weight of said candidate key topic and said threshold weight.

10. The method according to claim 1 wherein said step of analyzing text includes the step of semantically analyzing text in said document with a computer program.

11. The method according to claim 10 wherein said step of semantically analyzing includes the step of using a lexicon dictionary in which semantic weights are assigned to words and wherein the values of said semantic weights can be edited.

12. The method according to claim 10 wherein said semantically analyzing step includes the step of recognizing noun phrases with a computer program.

13. The method according to claim 1 wherein said step of identifying a first key topic includes the step of determining whether said first key topic is marked as a heading.

14. The method according to claim 1 further including the steps of:

marking as excluded a portion of said document; and

suppressing insertion of anchors for key topics within said excluded portion.

15. The method according to claim 1 wherein at least part of said document is downloaded over a network.

16. The method according to claim 1 wherein at least part of said document is stored on a CD-ROM.

17. A summary page generator for creating a hyperlink to text in a computer-readable document comprising:

means for identifying a first key topic in said document, wherein said identifying said first key topic step includes the step of analyzing text in said document with a computer program;

means for inserting a first source anchor associated with said first key topic into a list; and

means for creating a hyperlink between said first source anchor and said first key topic in said document.

18. A method for navigating to, and viewing, key topics in a viewer document using a first computer comprising the steps of:

displaying a list on a computer display, wherein said list comprises a listing of key topics which was generated by a computer program that scanned a source document and identified said key topics within said source document;

selecting a key topic from said list to be a selected key topic in response to input from a human; and

using a hyperlink associated with said selected key topic to locate and display said selected key topic in context in said viewer document, wherein said viewer document was generated from said source document.

19. The method according to claim 18 further including the steps of:

loading a first segment of said viewer document from a second computer to said first computer over a network, wherein said first segment contains said list; and

displaying said first segment.

20. The method according to claim 19 further including the steps of:

determining that an end of said first segment has been displayed by said first computer;

downloading a second segment from said second computer to said first computer; and

displaying said second segment.

21. A computer document data structure comprising:

an entry page; and

a summary page, wherein said summary page includes a list of index entries including a first and a second index entry, wherein said first index entry includes a first hyperlink to a first key topic appearing in a first segment, said first segment comprising textual information, and wherein said second index entry includes a second hyperlink to a second key topic appearing in a second segment, said second segment comprising textual information, wherein said second key topic is substantially similar to said first key topic,

wherein said summary page is located in a third segment, and

wherein said entry page includes a third hyperlink to said summary page.

22. The computer document data structure of claim 21, wherein said first, second and third hyperlinks specify on which computer said first, second and third segments, respectively, are located.

23. A tool for a word-processor program, the word-processor program capable of processing a first computer-readable document, the tool comprising:

means for identifying a first key topic in said first document, wherein said means for identifying said first key topic includes means for analyzing text in said first document with a computer program;

means for inserting a first source anchor associated with said first key topic into a list; and

means for creating a hyperlink between said first source anchor and said first key topic in said first document.

24. The tool according to claim 23, wherein the means for analyzing text further comprises means for semantically analyzing text in said first document with a computer program.

25. The tool according to claim 23, wherein said list is an index.

26. The tool according to claim 25, wherein said index is appended to an end of said first document.

27. The tool according to claim 23, further comprising means for generating a second computer-readable document, wherein said second document comprises hypertext markup language (HTML) tokens and at least some textual information from said first document.

28. The tool according to claim 27, wherein said first document comprises rich text format (RTF) tokens.

29. A page template data structure for a hypertext mark-up language comprising:

a token, wherein the token is embedded within a comment code, wherein the comment code is interpreted by the hypertext markup language as a comment, and wherein the token comprises a token definition which includes a specification for a hypertext markup language command.

30. The summary-page template data structure according to claim 29, wherein the token definition comprises either a destination hyperlink anchor or a source hyperlink anchor.

31. The summary-page template data structure according to claim 29, wherein the token definition comprises a data placeholder.
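Claims 29 to 31 describe tokens hidden inside HTML comments, so a browser ignores them while a generator can expand them into anchors or data. The patent does not give a token syntax, so the sketch below assumes a hypothetical `<!-- SPG:NAME -->` form and a hypothetical `expand_template` helper; it only illustrates the idea of comment-embedded placeholders, not the patented implementation.

```python
import re

# Hypothetical token syntax: <!-- SPG:NAME --> (the patent does not specify one).
TOKEN_PATTERN = re.compile(r"<!--\s*SPG:(\w+)\s*-->")


def expand_template(template: str, values: dict[str, str]) -> str:
    """Replace each comment-embedded token with generated HTML.

    Tokens without a supplied value are left untouched, so an unexpanded
    template still renders in a browser (HTML comments are not displayed).
    """
    def substitute(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))

    return TOKEN_PATTERN.sub(substitute, template)


template = (
    "<html><body>\n"
    "<h1><!-- SPG:TITLE --></h1>\n"          # a data placeholder (cf. claim 31)
    "<ul><!-- SPG:TOPIC_LIST --></ul>\n"     # expands to source hyperlink anchors
    "</body></html>"
)
page = expand_template(template, {
    "TITLE": "Concept Summary",
    "TOPIC_LIST": '<li><a href="seg1.html#kt1">hyperlink generation</a></li>',
})
print(page)
```

Keeping tokens inside comments means the same template file stays valid, viewable HTML before and after generation.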
Description


FIELD OF THE INVENTION

The present invention relates to methods and apparatus for automatically analyzing and modifying documents, and more specifically for automatically extracting key topics, such as concepts or phrases, from documents and generating summary pages containing key topic lists and hyperlinks to the extracted key topics, and even more specifically to automatically generating hyperlink indexes for documents stored on CD-ROM or available over networks such as the Internet.

BACKGROUND OF THE INVENTION

Document authors often give readers the ability to efficiently find and retrieve more information about any particular item in a document. One way to provide such an ability is to give the reader a reference. An example of such a reference is a footnote or citation in a scholarly article or book. A reader can use the footnote to identify another book or article, and even the page number in the book or article, in order to obtain more detailed information. Similarly, an index entry in a book's index points to the places in the book where more information regarding the term in the index entry can be found. In the prior art, the author or editor of the information must manually find the key topics which might be of interest to readers, and then generate the footnote or index entry which points the reader from the footnote or index entry to the point where the topic is more fully explained.

A `hyperlink` is defined as a point-and-click mechanism implemented on a computer which allows a viewer to link (or jump) from one screen display where a topic is referred to (called the `hyperlink source`), to other screen displays where more information about that topic exists (called the `hyperlink destination`). These hyperlinked screen displays can all be of portions of the media data (media data can include, e.g., text, graphics, audio, video, etc.) from a single data file, or can be portions of a plurality of different data files; these can be stored in a single location, or at a plurality of separate locations. The hyperlink is the combination of a display element or a (generally visual) indication that a hyperlink is available for a particular hyperlink source, and a computer program which finds and displays the hyperlink destination. A hyperlink thus provides a computer-assisted way for a human user to efficiently jump between various locations containing information which is somehow related.

A `hypermedia application` is defined as a computer application which contains media data and hyperlinks between hyperlink sources and destinations in the media data.

The people who provide the content (e.g., text and pictures), edit that information, and who define the hyperlinks are called `authors.` People who use the finished application are called `viewers.` People who have computers which transmit information for others to view are called `database providers.`

Prior-art FIG. 1 is a conceptual drawing of a hyperlink. A hyperlink is a link between hyperlink source 72, which is located in a first data file, and hyperlink destination 74, which can be located in the same or in a second data file. Hyperlink source 72 and hyperlink destination 74 are typically displayed on computer screen 52 at different points in time. Three elements that comprise a hyperlink are:

(a) hyperlink source 72, which specifies a key topic to be displayed in a hot area. A `hot area` is a portion of the display screen that, if pointed at and clicked on, will cause the computer to execute computer code such as a hyperlink program 79 which hyperlinks (i.e., causes a branch) to a hyperlink destination 74. (Typically, the hot area is visually indicated by highlighting, such as color, a bold font, blinking or underlining, but it may contain an icon, picture graphic, or other visual indication.)

(b) hyperlink destination 74, which includes information (e.g., destination location specification 73) specifying the location of the text or picture that will be displayed if the hyperlink is taken. Destination location specification 73 for hyperlink destination 74 is generally stored in the data file containing hyperlink source 72. Hyperlink destination 74 itself can be either in the same or a different data file as hyperlink source 72.

(c) hyperlink computer code 79 that, in response to a `viewer action`, causes hyperlink destination 74 to be displayed in the context in which it appears. Typically, that `viewer action` comprises a viewer clicking on the hyperlink source 72. `Clicking` is defined as pointing with and activating pointing device 54 at a hot area, such as hyperlink source 72. A pointing device can include a mouse, joystick, or other device that is used to select a location on a computer screen and is activated by, for example, depressing a switch such as a mouse button 59, or otherwise indicating that the computer should execute hyperlink code 79. Upon activation, hyperlink code 79 uses destination location specification 73 to locate hyperlink destination 74, and to display that information.
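The three elements (a) to (c) can be modeled as a small data structure plus a resolver. The sketch below is illustrative only: the class names, the split of destination location specification 73 into a file name and an anchor name, and the fixed-length context window are assumptions, not details from the patent.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """A stored data file with named destination anchors (name -> character offset)."""
    name: str
    text: str
    anchors: dict[str, int]


@dataclass
class HyperlinkSource:
    """Element (a): the hot-area text, carrying destination location specification 73."""
    hot_text: str
    destination_file: str
    destination_anchor: str


def follow_hyperlink(source: HyperlinkSource, files: dict[str, DataFile],
                     context: int = 40) -> str:
    """Element (c): on a click, use the specification to locate destination (b) and show it."""
    target = files[source.destination_file]   # may be the same file or a different one
    start = target.anchors[source.destination_anchor]
    return target.text[start:start + context]


files = {"ch2.txt": DataFile("ch2.txt",
                             "Chapter 2. Hyperlinks join a source to a destination.",
                             {"hyperlinks": 11})}
link = HyperlinkSource("hyperlinks", "ch2.txt", "hyperlinks")
print(follow_hyperlink(link, files))
```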

Apple Computer's HyperCard™ program was an early and widely known program supporting the development of hypermedia applications and hyperlinking. A HyperCard author specifies the hyperlink source 72, including a simple program that is activated when the hot area for hyperlink source 72 is clicked on and that hyperlinks to a hyperlink destination. Identifying topics and generating and embedding hyperlinks in the text remains a manual, labor-intensive process.

The Internet, and particularly the World Wide Web protocol, has brought hyperlinking to use over networks. A network is a collection of computers connected by communication lines. The Internet is an international network composed of many heterogeneous sub-networks which link thousands of computers which have millions of users, many of whom are authors. The World Wide Web protocol (sometimes simply called "the Web") is an interface and communications protocol which is sometimes used on the Internet to make the Internet easier to use.

Prior-art FIG. 2A shows a simplified schematic of a network 400, such as the Internet, with four computers 411, 412, 413, and 414 connected as nodes on the network. In the embodiment shown in FIG. 2A, the nodes are viewer's computer 411; author's computer 412; database provider's computer 413, which presents a Web application to others (such as the human viewer at viewer's computer 411) to use; and multi-use computer 414. A single user on multi-use computer 414 may be a viewer, an author and a database provider at various times. There are not necessarily any physical differences among computers 411, 412, 413, and 414; they are simply generic computers put to different uses.

Prior-art FIG. 2B shows a simple connection between viewer's computer 411 and CD-ROM drive 723. `Downloading` is defined as the transfer of data across a network, typically from database provider's computer 413 to viewer's computer 411. `Downloading` can also refer to transmitting a document from a CD-ROM in drive 723 to viewer's computer 411, as shown in FIG. 2B. Alternatively, CD-ROM drive 723 could connect to database provider's computer 413 or multi-use computer 414 to provide CD-ROM access to other network computers.

The term `document` is defined in a broad sense as text and other information stored in one or more computer files. Documents include everything from simple short text documents to large computer multi-media databases. Examples of database-provider computers 413 containing these documents include computers in the patent office and the Library of Congress, organizations which have huge volumes of information, much of it already computerized and more in the process of being computerized.

As the many different online databases of documents such as legal libraries become available on the Web, a mechanism to automatically generate hyperlinks to places in these databases in order to facilitate viewing across the Internet is needed. Indeed, a major and widely recognized problem with the Internet is that, while the Internet has a wealth of information, most users find it difficult to access the information.

One way of organizing information on the Internet in order to minimize download time has been to provide users with an overview interface, called a `home page,` to the information. Although a home page is often merely used as a visually interesting trademark, the home page typically contains a key topic summary of the information provided by one author or database provider, and hyperlinks which take a viewer to the information the viewer has chosen.

As the capabilities of the Internet have grown, CD-ROM (Compact Disc Read-Only Memory) drives have become important peripherals. CD-ROMs today typically contain up to about 680 megabytes of information, and generally contain many of the same kinds of documents that are accessible via the Internet. Many of these CD-ROMs also contain documents that are, or need to be, indexed. A home-page-like interface with hyperlinks into the information contained on a CD-ROM is needed.

Another technology which is relevant to the present invention is the automatic semantic analysis of text to identify and extract key topics for indexing. One exemplary kernel incorporating this semantic analysis technology, the Syntactica Engine, performs a syntactic analysis of the text of a document to determine how each word is being used (since some words with identical spelling have quite different meanings) and then uses a "lexicon dictionary" (also called a "lexicon") which specifies semantic weights assigned to the words in the text, reflecting their value as index entries. A computer program can use the synthesized values, or semantic weights, of words to qualify phrases as key topics. A user can specify a threshold value so that the computer program selects as key topics only those phrases whose values are greater than or equal to that threshold. Known semantic-analysis computer programs do not generate hyperlinks.
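Threshold-based selection of key topics can be sketched as follows. The Syntactica Engine's actual scoring is not described here, so this sketch assumes a hypothetical editable lexicon, scores a phrase as the sum of its word weights (unknown words get a default weight of 1.0), and skips the syntactic-analysis step entirely.

```python
# Hypothetical lexicon: word -> editable semantic weight (values are illustrative).
LEXICON = {
    "hyperlink": 8.0,
    "summary": 6.0,
    "page": 3.0,
    "document": 4.0,
    "the": 0.0,
    "of": 0.0,
}


def phrase_weight(phrase: str, lexicon: dict[str, float]) -> float:
    """Combine word weights into a phrase weight (assumption: a simple sum)."""
    return sum(lexicon.get(word, 1.0) for word in phrase.lower().split())


def select_key_topics(candidates: list[str], lexicon: dict[str, float],
                      threshold: float) -> list[str]:
    """Keep candidates whose weight is greater than or equal to the threshold."""
    return [c for c in candidates if phrase_weight(c, lexicon) >= threshold]


candidates = ["summary page", "hyperlink", "the page", "of the document"]
print(select_key_topics(candidates, LEXICON, threshold=8.0))  # ['summary page', 'hyperlink']
```

Raising the threshold yields fewer, more significant key topics; editing a lexicon weight changes which phrases qualify without changing the code.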

A significant problem with generating information for computer-based hyperlink systems is that the author must review the material to be hyperlinked, must identify key topics to which to hyperlink, and must set up the hyperlinks. This is a time-consuming and labor-intensive process. What is needed, and what the present invention provides, is a system and method that automatically identifies key topics and phrases in a document's text, inserts identifying tokens for hyperlinks to those key topics, generates one or more summary pages having key topic lists, and automatically generates hyperlinks from the summary pages to the key topics in the document's text. In particular, what is needed is an apparatus and method for automatically identifying semantically important key topics. What is also needed is a system and method for automatically generating home pages containing various types of index information and the associated hyperlinks to other information located on the Internet and the Web.

SUMMARY OF THE INVENTION

The present invention scans one or more documents, automatically identifies significant key topics, concepts, and phrases in the documents, and creates summary pages for, and hyperlinks between, these key topics. Where the same key topic appears at several places in the documents, one embodiment of the present invention creates hyperlinks between all of the instances of that key topic. The present invention also provides for segmenting of documents, in order that only the needed segment of a hyperlinked-to document need be transferred to a viewer's display.

One embodiment of the present invention includes a process running on a computer which (a) allows an author to select documents and then, using a semantic analyzer program running on a computer, (b) automatically identifies significant key topics within the selected documents, (c) compiles those key topics into summary pages, (d) generates presentation pages by segmenting the selected documents into smaller pieces, and (e) embeds hyperlinks from these summary pages to the locations where key topics appear in the presentation pages. In particular, the present invention creates summary pages containing various abstractions of information which is contained in selected documents, and hyperlinks into the documents. Summary pages are pages which are typically viewed using a web browser program and which contain lists of key topics and hyperlinks to places in the selected documents where the key topics appear. Different types of summary pages are available, including abstract, concept, phrase, and table-of-contents summary pages. A method of using a computer to hyperlink through automatically generated hyperlinks and a data structure which can be used to support that hyperlinking are described. In one embodiment, the summary page which is generated provides an index to the source document which is appended to the source document to provide a more usable document which can be viewed by a document viewer program such as a word-processor program or a web-browser program.
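Steps (c) to (e) can be illustrated with a minimal sketch, assuming the source document is already split into paragraphs and a key-topic list has been produced by step (b). The segment size, the `segN.html` page names, the `ktN` anchor names, and placing each destination anchor at the start of its paragraph (rather than at the exact topic location) are all simplifying assumptions.

```python
import html


def segment_document(paragraphs: list[str], per_segment: int) -> list[list[str]]:
    """Step (d): split the source into presentation pages of at most per_segment paragraphs."""
    return [paragraphs[i:i + per_segment]
            for i in range(0, len(paragraphs), per_segment)]


def build_pages(paragraphs: list[str], key_topics: list[str],
                per_segment: int = 2) -> tuple[str, dict[str, str]]:
    """Return (summary page HTML, {presentation page name: HTML})."""
    presentation: dict[str, str] = {}
    summary_items: list[str] = []
    for seg_no, segment in enumerate(segment_document(paragraphs, per_segment), start=1):
        page_name = f"seg{seg_no}.html"  # hypothetical naming scheme
        body = []
        for para in segment:
            markup = html.escape(para)
            for topic in key_topics:
                if topic in para:
                    anchor = f"kt{len(summary_items) + 1}"
                    # Destination anchor in the presentation page (step e).
                    markup = f'<a name="{anchor}"></a>' + markup
                    # Summary-page entry hyperlinking to that anchor (steps c and e).
                    summary_items.append(
                        f'<li><a href="{page_name}#{anchor}">{html.escape(topic)}</a></li>')
            body.append(f"<p>{markup}</p>")
        presentation[page_name] = "\n".join(body)
    summary = "<ul>\n" + "\n".join(summary_items) + "\n</ul>"
    return summary, presentation


paragraphs = ["Hyperlinks join pages.", "A summary page lists topics.",
              "Segments load separately."]
summary, pages = build_pages(paragraphs, ["summary page", "Segments"])
print(summary)
```

Because each summary entry names its segment, a browser only needs to fetch the one presentation page that holds the selected key topic.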

BRIEF DESCRIPTION OF THE DRAWINGS

In the following detailed description of the invention, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration only, specific exemplary embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized, and structural changes may be made, without departing from the scope of the present invention.

FIG. 1 shows a conceptual drawing of a single prior-art hyperlink.

FIG. 2A shows a prior-art network connected to a plurality of computers.

FIG. 2B shows a prior-art CD-ROM drive connected to a computer.

FIG. 3 shows the flow from source document 20 through summary page generator 40 to resultant documents.

FIG. 4 shows a conceptual drawing of vertical hyperlinking.

FIG. 5 including FIGS. 5A-5B shows a conceptual drawing of circular hyperlinking.

FIG. 6 including FIGS. 6A-6C shows a conceptual drawing of horizontal hyperlinking.

FIG. 7 shows the opening screen of one embodiment of the present invention.

FIG. 8 shows a conceptual drawing of the entry page, summary pages, and presentation pages generated by the present invention along with hyperlinks between them.

FIG. 9A shows an example IPF data structure for a word.

FIG. 9B shows an example IPF data structure for a paragraph.

FIG. 10 shows the hyperlinking for 26 summary pages, one for each letter.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings which form a part hereof, and in which are shown, by way of illustration, specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.

An `anchor` is defined as a word, phrase, or graphic (for example, one that might likely be used to locate information of interest) which is `anchored` to its location within the context of the file data, as opposed to being fixed to a specific numerical address within the file. The source and destination ends of hyperlinks for the present invention are coupled to anchors so they are anchored to a specific portion of text or to a specific icon or picture displayed on a computer screen, rather than being associated with a specific address in a file. Thus, the anchor remains with the same piece of data when information is inserted in or deleted from the file, whereas the specific address of that piece of data may change.
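The difference between an anchor and a fixed address can be shown with HTML `name` anchors. In this sketch (the marker format and the `anchor_positions` helper are assumptions for illustration), a numeric offset recorded before an edit goes stale, while re-resolving the anchor by name still finds the same piece of text.

```python
import re

# Destination anchor marker, HTML 4 style: <a name="..."></a>
ANCHOR_PATTERN = re.compile(r'<a name="(\w+)"></a>')


def anchor_positions(text: str) -> dict[str, int]:
    """Map each anchor name to the offset just after its marker, found by content."""
    return {m.group(1): m.end() for m in ANCHOR_PATTERN.finditer(text)}


doc = 'Intro text. <a name="kt1"></a>hypermedia application defined here.'
stored_offset = anchor_positions(doc)["kt1"]   # a fixed numeric address, recorded once

edited = "A new preface paragraph. " + doc      # information inserted before the anchor

stale = edited[stored_offset:stored_offset + 10]   # old address now points elsewhere
fresh_start = anchor_positions(edited)["kt1"]      # anchor re-resolved in the edited file
fresh = edited[fresh_start:fresh_start + 10]
print(repr(stale), repr(fresh))
```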

The American Heritage dictionary begins its definition of `index` as "something that serves to guide, point out, or otherwise facilitate reference . . . " The term `index entry` is defined to include a term or phrase, with information as to the location where more information regarding that index entry can be found. The term `index` is defined as a grouping or listing of index entries. An index is often ordered in some manner, for example by alphabetization. Hypermedia applications usually include text, and often also include pictures, icons, graphics, animations, sound, and video (movies).
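An index in this sense can be built from (term, location) pairs. The sketch below assumes locations are strings such as `seg1.html#kt1` (a hypothetical naming scheme), orders entries alphabetically, and optionally partitions them by initial letter, similar to the 26 per-letter summary pages of FIG. 10.

```python
from collections import defaultdict


def build_index(occurrences: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (term, location) pairs into index entries, ordered alphabetically."""
    entries: dict[str, list[str]] = defaultdict(list)
    for term, location in occurrences:
        if location not in entries[term]:   # drop duplicate locations
            entries[term].append(location)
    return {term: entries[term] for term in sorted(entries, key=str.lower)}


def split_by_letter(index: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Partition an index by initial letter, e.g. one summary page per letter."""
    pages: dict[str, dict[str, list[str]]] = defaultdict(dict)
    for term, locations in index.items():
        pages[term[0].upper()][term] = locations
    return dict(pages)


occurrences = [("web browser", "seg3.html#kt4"), ("anchor", "seg1.html#kt1"),
               ("anchor", "seg2.html#kt2"), ("hyperlink", "seg1.html#kt3"),
               ("anchor", "seg1.html#kt1")]
index = build_index(occurrences)
print(index)
print(sorted(split_by_letter(index)))
```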

A `web browser` is traditionally defined as a computer program which supports the displaying of documents, which include Hypertext Markup Language (HTML) formatting markup tokens (discussed further below), and hyperlinking to other documents, or phrases in documents, across a network. In particular, web browsers are used to access documents across the Internet's World Wide Web. The discussion of the present invention defines both `web browser` and `browser` to include browser programs which enable accessing hyperlinked information over the Internet and other networks, as well as from magnetic disk, CD-ROM, or other memory, and does not limit web browsers to just use over the Internet. Several Internet web browsers are available, some of them commercially. Two of the best known of these, Mosaic and Netscape Navigator, are described in Internet Starter Kit by Adam Engst, Corwin Low and Michael Simon, Second Edition, Hayden Books, 1995. Any viewer of the World Wide Web will typically use a web browser. Indeed, a viewer viewing documents created by the present invention normally uses a web browser to access the documents that a database provider may make available on the network. Web browsers allow clicking on hot areas (generated by source anchors containing a document reference name and a hyperlink to that document) so that clicking on the hot area causes the specified document to be downloaded over the network and displayed for the viewer. Most web browsers also maintain a history of previously used source anchors and display a hot area which allows hyperlinking back to the database provider's home page (or back through the locations the viewer has previously "visited") so the viewer can always go back to a familiar place.

What makes a web browser on a network such as the Internet so powerful is that any of the documents viewed with the program may be located (or scattered in pieces) on any computer connected to network 400. The viewer can use a mouse, or other pointing device, to click on a hot area, such as highlighted text or a button, and cause the relevant portion of the referenced document to be downloaded to the viewer's computer 411 for viewing. These downloaded documents in turn can contain hyperlinks to other documents on the same or other computers. `Downloading` is defined as the transmitting of a document or other information from the database provider's computer 413 over a network 400 to the viewer's computer 411.

A `source anchor` is an anchor which is combined with a hyperlink source 72 and, typically, an index term 69. The index term 69 conveys information regarding a key topic to a viewer. The index term 69 is generally highlighted as a hot area to indicate to a viewer that a hyperlink is available. Alternative embodiments replace the index term 69 with an icon or graphic. A destination anchor 76 is an anchor placed in a file at a hyperlink destination 74. A source anchor 75 typically contains the name of the destination anchor 76 stored in destination location specification 73 in order that a web browser can find and hyperlink to the hyperlink destination 74. Combination anchor 67 is an anchor which is combined with a combination hyperlink 77 (which comprises both a hyperlink source 72 and a hyperlink destination 74). In one embodiment, combination anchor 67 is implemented by using a source anchor 75 in close proximity to a destination anchor 76.
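The relationships among these anchor types can be modeled as a small data structure. The following Python sketch is illustrative only: the class names are hypothetical, and it renders combination anchor 67 in the embodiment just mentioned, as a destination anchor placed immediately before a source anchor.

```python
from dataclasses import dataclass


@dataclass
class DestinationAnchor:
    name: str  # the name a hyperlink uses to find this place

    def to_html(self) -> str:
        return f'<A NAME="{self.name}"></A>'


@dataclass
class SourceAnchor:
    index_term: str   # text highlighted as the hot area
    destination: str  # destination location specification, e.g. "#T2"

    def to_html(self) -> str:
        return f'<A HREF="{self.destination}">{self.index_term}</A>'


@dataclass
class CombinationAnchor:
    """Ends one hyperlink and begins another. Rendered here as a destination
    anchor in close proximity to (directly before) a source anchor."""
    destination: DestinationAnchor
    source: SourceAnchor

    def to_html(self) -> str:
        return self.destination.to_html() + self.source.to_html()


combo = CombinationAnchor(DestinationAnchor("T1"), SourceAnchor("the essence", "#T2"))
print(combo.to_html())  # <A NAME="T1"></A><A HREF="#T2">the essence</A>
```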

Information is presented to World Wide Web viewers as a collection of `documents` and `pages`. As mentioned above, a `document` is defined in a broad sense to indicate text, pictorial, audio, video and other information stored in one or more computer files. Viewing such multimedia files can be much like watching television. Documents include everything from simple short text documents to large computer multi-media databases.

A `page` is defined as any discrete file which can be downloaded as a single download segment. Technically, a web browser does not recognize or access documents per se, but instead accesses pages. Typically, one page is downloaded by a web browser as the result of clicking on a hot area. A page often has several source anchors 75 with hyperlinks to various other pages or to specific locations within pages.

One problem with accessing documents over the Internet is that many documents are quite long, and thus can take quite some time to download over the network. This means that viewers are often reluctant to access a document unless they know it will be useful. The present invention facilitates dividing documents into a plurality of pages which can be efficiently chosen by a viewer and downloaded, one page at a time, and only when the particular page desired is referenced. A page is thus a document which contains a portion of a source document. A source document is a document from which derivative documents (such as pages) are produced. The source document could be reconstructed from the pages generated from the source document.
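The segmentation into pages, and the property that the source document can be reconstructed from its pages, can be sketched as follows. The splitting rule here (group whole paragraphs separated by a blank line, up to a character budget) is an assumption made for illustration; the text does not prescribe how page boundaries are chosen, and `split_into_pages` and `reconstruct` are hypothetical names.

```python
def split_into_pages(text: str, max_chars: int) -> list[str]:
    """Group consecutive paragraphs (separated by a blank line) into pages.
    A new page starts when adding the next paragraph would exceed max_chars;
    a paragraph longer than max_chars gets a page of its own."""
    pages: list[str] = []
    current: list[str] = []
    for para in text.split("\n\n"):
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            pages.append("\n\n".join(current))
            current = [para]
        else:
            current.append(para)
    pages.append("\n\n".join(current))
    return pages


def reconstruct(pages: list[str]) -> str:
    """Rejoin the pages; this yields the original source text exactly."""
    return "\n\n".join(pages)


doc = "Para one.\n\nPara two is longer.\n\nPara three."
pages = split_into_pages(doc, 40)
print(pages)  # ['Para one.\n\nPara two is longer.', 'Para three.']
assert reconstruct(pages) == doc
```

Because pages only ever break between paragraphs, a viewer downloads one page at a time while the source document remains exactly recoverable.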

A `summary page` is defined as an overview-type page containing summary information about another document (or a set of documents, if desired) and one or more hyperlinks to that other document.

A `presentation page` is defined as a page containing a portion or segment of a larger source document. Presentation pages provide conveniently sized pieces of the larger source document which are downloaded one at a time (rather than downloading the entire source document), typically as a result of a hyperlink the viewer wants to take into the corresponding portion of the source document.

From the point-of-view of a web browser program, presentation pages and summary pages are technically indistinguishable. However, summary pages are normally documents that are designed by people to contain hyperlinks to presentation pages (or to other summary pages), and are designed for use on the World Wide Web. In the context of the present invention, summary pages are also used to help navigate through information contained on a CD-ROM.

An `entry page` is defined as a summary page that has been assembled by a person or computer as an entry point to hyperlink to other summary pages and presentation pages of interest. Note, however, that any page, including summary pages and presentation pages, can be accessed and/or downloaded directly, without having to go through an entry page.

A `home page` is defined as an entry page used by a database provider to provide an overview of other pages and/or documents available through the system associated with the home page. A home page often contains a trademark and other flashy pictorial or aesthetic information identifying the database provider. The viewer normally begins by clicking on one of the hot areas on a home page, which serves on the World Wide Web as the entry page to the information a database provider presents. From there, the viewer typically traces a web of hyperlinks to a series of documents on various computers on a network (hence the term World Wide Web).

To support the Internet and the World Wide Web, a markup language called Hypertext Markup Language (HTML) has been developed. HTML has two major objectives. First, HTML provides a way to specify the structural elements of text (e.g., this is a heading, this is body text, this is a list, etc.) using tokens which are independent of the content of the text. A web browser uses these tokens to format the displayed text for the particular display device of a particular viewer. So, for example, HTML allows an author to specify up to six levels of heading information bracketed by six different heading-token pairs. Applications (e.g., web browsers) on different computers then process the HTML documents for visual presentation in a manner customized for particular display devices. An application on one computer could display a level 1 heading as 14 point bold Bodoni, while an application on another computer could display it as 20 point italic Roman. A level 1 heading is heralded with the start token <h1> and terminated with the end token </h1>. Thus, a heading might be encoded as:

<h1> This is a level 1 heading </h1>

for a level 1 heading, or

<h6> This is a level 6 heading </h6>

for a level 6 heading. As a markup language, HTML enables a document to be displayed within the capabilities of any particular display system, even if that display system does not support italic, bold, color, or any particular typeface or size. Thus, HTML supports writing documents so they can be output to everything from simple monospaced, single-size fonts to proportional-spaced, multiple-size, multiple-style fonts. Each computer program that accesses an HTML document can translate that HTML document into a display format supported by the hardware it runs on.
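The separation between structural tokens and device-specific presentation can be illustrated with Python's standard `html.parser` module. In this sketch, the parser only records which heading level each piece of text belongs to; a per-device style table (`DEVICE_STYLES`, a hypothetical name, as are the device keys) then decides how each level looks, mirroring the Bodoni and Roman example above.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect (level, text) pairs for <h1> through <h6> elements."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buf = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buf).strip()))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)


# Hypothetical per-device style tables: each display decides how a level looks.
DEVICE_STYLES = {
    "computer_a": {1: "14 point bold Bodoni"},
    "computer_b": {1: "20 point italic Roman"},
}


def render_headings(html, device):
    """Tag each heading's text with the style this device uses for its level."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    styles = DEVICE_STYLES[device]
    return [f"[{styles.get(level, 'plain')}] {text}" for level, text in parser.headings]


doc = "<h1> This is a level 1 heading </h1>\n<h6> This is a level 6 heading </h6>"
print(render_headings(doc, "computer_a"))
```

The same markup yields different presentations on `computer_a` and `computer_b`, while levels without a device-specific style fall back to plain text.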

The second and more important aspect of HTML, for the purposes of the present invention, is that it provides a mechanism to incorporate hyperlinks within a single document and between documents located at different nodes on the Internet. These hyperlinks can contain addresses of documents anyplace on the Internet. HTML is described in The HTML Manual of Style by Larry Aronson, Ziff Davis, 1994.

FIG. 3 shows the flow from source document 20 through summary page generator 40 to resultant documents 64. In its most general form, summary page generator 40 is a program running on a computer which automatically analyzes textual data in a source document 20 and, using weighting rules, determines from the textual data which phrases (i.e., strings of words) are the most significant. It then generates a presentation page 150 which contains the textual data from source document 20 plus special codes embedded in that textual data; these codes specify to another program (generally a browser) where those significant phrases are.
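The text does not detail the semantic analyzer's weighting rules, so the following Python sketch substitutes a deliberately naive stand-in: candidate phrases are runs of non-stopwords, and each phrase is weighted by the mean document frequency of its words. It only illustrates the shape of the step (text in, ranked significant phrases out) and is not the weighting method of the present invention; the stopword list and function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real analyzer would use a much larger one.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "in", "is",
             "are", "for", "on", "with", "by", "as", "it", "each"}


def candidate_phrases(text):
    """Return runs of consecutive non-stopwords, never crossing punctuation."""
    phrases = []
    for chunk in re.split(r"[.,;:!?()]", text.lower()):
        run = []
        for word in re.findall(r"[a-z][a-z-]*", chunk):
            if word in STOPWORDS:
                if run:
                    phrases.append(" ".join(run))
                run = []
            else:
                run.append(word)
        if run:
            phrases.append(" ".join(run))
    return phrases


def key_phrases(text, top_n=3):
    """Rank distinct phrases by the mean document frequency of their words
    (a naive stand-in weighting rule) and return the top_n."""
    phrases = candidate_phrases(text)
    freq = Counter(w for p in phrases for w in p.split())

    def weight(p):
        words = p.split()
        return sum(freq[w] for w in words) / len(words)

    return sorted(set(phrases), key=lambda p: (-weight(p), p))[:top_n]


text = ("The summary page lists key topics. Each key topic links to the "
        "presentation page. A summary page is an index.")
print(key_phrases(text))
# ['summary page', 'presentation page', 'summary page lists key topics']
```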

Summary page generator 40 is typically a computer program that processes one or more source documents 20 to produce one or more output summary pages 62, and optionally, produces entry page 78 and divides source document 20 into a plurality of presentation pages 150. In one embodiment, summary page generator 40 runs on an IBM-compatible personal computer.

In one embodiment, a typical summary page 62 contains key-topic index entries that include hyperlinks to destination anchors where those key topics appear in the presentation pages 150 generated from source document 20. Various types of summary pages 62 are created, for example, separate summary pages can be created which contain a table-of-contents, a concept index, a phrase index, or an abstract index, respectively. In one embodiment, summary page generator 40 also generates an entry page 78 which contains source anchors 75 having hyperlink sources 72 to the various summary pages 62, and in one embodiment, optionally to presentation pages 150. In an alternative embodiment, summary page generator 40 combines all the summary pages 62 on a single summary page 62. A key topic index entry is an index term 69 for the key topic and an associated source anchor 75 or combination anchor 67 that are typically hyperlinked to occurrences of that key topic in the source documents 20 or their derivative documents (i.e., presentation pages 150).

The viewer begins navigating a document or database of documents at an entry page 78 and from there hyperlinks to one of several summary pages, which in turn hyperlink to presentation pages, where data from the actual source documents are displayed for the viewer. In some cases, such as a key-phrase summary page 100 which contains hyperlinks to an abstract summary page 140, one summary page 62 will hyperlink to another summary page 62.

A summary page 62 could fit on a single computer display screen, or could be tens of thousands of lines of text which are scrolled, as in a word processor. In one embodiment, presentation pages 150 are derivative versions of source document 20 that contain embedded hyperlinks inserted by summary page generator 40.

In another embodiment, source document 20 is its own presentation page 150, especially if source document 20 already contains hyperlinks and/or hyperlink destinations inserted by an author before being processed by summary page generator 40.

There are three kinds of hyperlinks that can be generated:

Vertical hyperlinks: Vertical hyperlinks 91 are single-hop hyperlinks, as shown in FIG. 4. Each vertical hyperlink 91 hyperlinks to one instance of a key topic in the presentation pages. As many hyperlink source anchor entries 72 for vertical hyperlinks 91 are created in summary page 62 for a key topic as there are instances of that key topic in the presentation pages 150. In one embodiment, summary page generator 40 locates each significant instance of a key topic by using semantic analysis on source documents 20.
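Generating vertical hyperlinks amounts to naming each occurrence of a key topic and emitting one summary-page entry per occurrence. A minimal Python sketch follows, assuming exact, case-sensitive string matching in place of semantic analysis and a hypothetical `T1`, `T2`, ... anchor-naming scheme.

```python
import re


def add_vertical_hyperlinks(topic, pages):
    """pages: ordered {page_name: text}. Wrap every (case-sensitive) occurrence
    of `topic` in a destination anchor and return (updated_pages, entries),
    where entries holds one summary-page source anchor per occurrence."""
    pattern = re.compile(re.escape(topic))
    updated, entries = {}, []
    count = 0
    for page_name, text in pages.items():
        pieces, last = [], 0
        for match in pattern.finditer(text):
            count += 1
            name = f"T{count}"  # hypothetical anchor-naming scheme
            entries.append(f'<A HREF="{page_name}#{name}">{topic}</A>')
            pieces.append(text[last:match.start()])
            pieces.append(f'<A NAME="{name}">{match.group(0)}</A>')
            last = match.end()
        pieces.append(text[last:])
        updated[page_name] = "".join(pieces)
    return updated, entries


pages = {"p1.html": "the essence of it", "p2.html": "essence and essence"}
updated, entries = add_vertical_hyperlinks("essence", pages)
print(entries)
```

Three occurrences produce three summary entries, one per vertical hyperlink.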

Circular hyperlinks: When circular hyperlinks are generated, only one combination anchor entry 67 is created in summary page 62 for each key topic, no matter how many times that key topic appears in presentation pages 150. That combination anchor entry 67 is circularly hyperlinked through all the instances of that key topic in the presentation pages 150. FIG. 5 shows a conceptual schematic of a circular hyperlink starting at combination anchor entry 67 in summary page 62 and hyperlinking through each combination anchor 67 in presentation page 150, each of which, being a combination anchor 67 in a circular hyperlink chain, is both a source anchor and a destination anchor. The combination anchor entry 67 on summary page 62 allows vertical hyperlink 91 to the first instance of that key topic in presentation page 150, which in turn through the key topic's combination anchor 67 allows hyperlink 92 to the second instance which allows hyperlink 93 to a third instance of the key topic, and so on, until the final instance of the key topic allows hyperlink 94 back to the combination anchor entry 67 in the summary page 62.

In an alternative embodiment, the final instance of the key topic allows a hyperlink back to the first instance of the key topic in the presentation page rather than to the summary page. The preferred embodiment uses hyperlinking back through the summary page, since this function gives the viewer visual feedback every time the viewer completes a hyperlink cycle.

One embodiment of the present invention is included in the AnchorPage™ program by the assignee of the present invention.

Circular hyperlinks are an alternative to non-circular hyperlinks. Circular hyperlinks have the advantage of making navigation to all instances of a key topic easier and/or faster, and they reduce the number of entries in a summary page by allowing a single entry per key topic, rather than one entry for each instance of the key topic in the presentation pages.
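The circular chain of FIG. 5, including the alternative embodiment that returns to the first instance rather than to the summary page, can be expressed as a mapping from each anchor to the anchor it hyperlinks to next. In this hedged Python sketch, anchor names such as `summary.html#K1` are hypothetical, and `follow` simply simulates a viewer clicking repeatedly.

```python
def circular_chain(summary_anchor, instances, back_to_summary=True):
    """Map each anchor to the anchor it hyperlinks to next. The summary entry
    links to the first instance, each instance links to the next one, and the
    last instance links back to the summary entry (preferred embodiment) or,
    if back_to_summary is False, to the first instance."""
    if not instances:
        return {}
    chain = {summary_anchor: instances[0]}
    for current, nxt in zip(instances, instances[1:]):
        chain[current] = nxt
    chain[instances[-1]] = summary_anchor if back_to_summary else instances[0]
    return chain


def follow(chain, start, clicks):
    """Simulate a viewer clicking `clicks` times, starting at `start`."""
    path = [start]
    for _ in range(clicks):
        path.append(chain[path[-1]])
    return path


instances = ["p1.html#K1a", "p1.html#K1b", "p2.html#K1c"]
chain = circular_chain("summary.html#K1", instances)
print(follow(chain, "summary.html#K1", 4))
# ['summary.html#K1', 'p1.html#K1a', 'p1.html#K1b', 'p2.html#K1c', 'summary.html#K1']
```

Only one summary entry (`summary.html#K1`) is needed, however many instances the key topic has.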

Horizontal hyperlinks: FIG. 6 shows horizontal hyperlinks 95. Horizontal hyperlinks 95 are hyperlinks from a key-topic entry in one summary page 62 to instances of the same key-topic entry in other summary pages 62, typically from a brief key-topic entry such as a key-phrase entry, to a more-detailed entry such as an abstract entry. Although horizontal hyperlinks 95 need not be circular, the horizontal hyperlinks 95 shown in FIG. 6 are circular hyperlinks that hyperlink through combination anchors 67. In one embodiment, summary page generator 40 scans all the summary pages for key topics and inserts horizontal hyperlinks 95 to those key topics in the summary pages.
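Horizontal hyperlinks can be sketched the same way, except that the chain runs across summary pages instead of through presentation pages. The Python below assumes each summary page is given as an ordered list of key-topic strings and derives hypothetical anchor names by replacing spaces with underscores.

```python
def horizontal_links(summary_pages):
    """summary_pages: ordered {page_name: [key topics listed on that page]}.
    For each key topic listed on more than one summary page, link its entries
    circularly: first page -> second page -> ... -> back to the first page."""
    where = {}
    for page, topics in summary_pages.items():
        for topic in topics:
            anchor = topic.replace(" ", "_")  # hypothetical anchor naming
            where.setdefault(topic, []).append(f"{page}#{anchor}")
    links = {}
    for locations in where.values():
        if len(locations) > 1:
            for i, loc in enumerate(locations):
                links[loc] = locations[(i + 1) % len(locations)]
    return links


pages = {
    "phrases.html": ["the essence", "idea"],
    "abstracts.html": ["the essence"],
    "concepts.html": ["the essence", "idea"],
}
links = horizontal_links(pages)
print(links["phrases.html#the_essence"])  # abstracts.html#the_essence
```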

In one embodiment, only one of horizontal hyperlinks or vertical hyperlinks can be selected as being circular hyperlinks. However, in another embodiment both types are selected as circular, and two sets of circular hyperlinks could cycle through the same summary-page list entry. In yet another embodiment, a separate icon for horizontal hyperlinks is eliminated, and each circular hyperlink cycles through both the presentation pages and the summary pages.

In one embodiment, where both horizontal hyperlinks and vertical hyperlinks are used, a horizontal hyperlink icon appears near the source anchor, so the viewer can either click on the highlighted key topic term in the source anchor to hyperlink through vertical hyperlinks or click on the horizontal hyperlink icon to hyperlink through horizontal hyperlinks.

When circular hyperlinks are used, a combination anchor 67, such as the one shown in FIG. 5, is both a source anchor and a destination anchor, so there are, in effect, three forms of embedded hyperlinks: Source, Destination and Combination. A source anchor 75 specifies a hot area to be highlighted and the name of the location in a document to which to hyperlink. A destination anchor 76 specifies the name of a place in a document in order that a hyperlink can go to that destination place. A combination anchor 67 contains both source anchor and destination anchor types of information and is described now in more detail.

A combination anchor 67 is a destination anchor that ends one hyperlink combined with a source anchor that begins another hyperlink.

The verb `to hyperlink` is defined as clicking on a source anchor to go to a destination anchor, and includes following down a chain of hyperlinks by continuing to click on combination anchors.

To generate a combination anchor 67, summary page generator 40 embeds a hyperlink which (a) identifies index term 69 for the key topic, in order that the key topic term can be highlighted as a hot area, (b) specifies a name for destination anchor 76 for the combination anchor, in order that the combination anchor can be found and hyperlinked to as a destination, and (c) specifies destination location specification 73, used to find the location in a document to which to hyperlink when the hot area is clicked on.

If the text in which one wanted to embed a hyperlink is "One should identify the essence of the idea if one wants to think clearly.", and the key topic term is "the essence", then in order to insert a combination anchor 67 comprising a destination anchor 76 and a source anchor 75, summary page generator 40 changes the text to be:

One should identify <A NAME="DEF34876" HREF="#GEN03789">the essence</A> of the idea if one wants to think clearly.

The above is a typical HTML format. "<A...>" defines the beginning of an anchor and "</A>" terminates that anchor and so defines the area where the intervening text is displayed to be a hot area that should be highlighted so that a mouse click or other input device activates the hyperlink. The phrase NAME="DEF34876" defines the name of the destination anchor 76 within the document in order that a hyperlink to the destination can find the destination. The phrase HREF="#GEN03789" provides the destination location specification 73 used as a destination reference name which the web browser will hyperlink to if the highlighted area is clicked on or otherwise activated. Here "DEF34876" and "GEN03789" are arbitrary generated names, but could just as well be "CAT" or "DOG". Where a hyperlink is to another document, the name of that document precedes the "#", so for example the hyperlink:

HREF="http://www.myserver.com/user1/project2#GEN03789"

would hyperlink to the label name GEN03789 in the document user1/project2 on the server computer at the network address http://www.myserver.com.
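Both the insertion of a combination anchor and the interpretation of an HREF can be sketched with the Python standard library. `insert_combination_anchor` and `resolve_href` are hypothetical helpers: the first reproduces the single-tag NAME/HREF format described above, and the second uses `urllib.parse.urlsplit` to separate the server, the document path and the label name that follows the `#`.

```python
from urllib.parse import urlsplit


def insert_combination_anchor(text, term, own_name, next_ref):
    """Wrap the first occurrence of `term` in a single <A> tag that is both a
    destination (NAME) and a source (HREF). Returns text unchanged if absent."""
    start = text.find(term)
    if start < 0:
        return text
    end = start + len(term)
    anchor = f'<A NAME="{own_name}" HREF="{next_ref}">{text[start:end]}</A>'
    return text[:start] + anchor + text[end:]


def resolve_href(href):
    """Split an HREF into (server, document path, label name). An empty
    server and path mean the label is in the current document."""
    parts = urlsplit(href)
    return parts.netloc, parts.path, parts.fragment


sentence = "One should identify the essence of the idea if one wants to think clearly."
print(insert_combination_anchor(sentence, "the essence", "DEF34876", "#GEN03789"))
print(resolve_href("http://www.myserver.com/user1/project2#GEN03789"))
# ('www.myserver.com', '/user1/project2', 'GEN03789')
```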

The just-described method is only one embodiment of hyperlinks. In another embodiment, for example, at the beginning or end of each document there could be a table of hyperlink numbers or names along with the location within the document where the source anchor associated with the table entry is located.
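The table-based alternative can be sketched as follows: the text is left untouched, and a separate table records where each hot area begins and ends and where it hyperlinks to. This Python sketch assumes the links are listed in the order in which they appear in the text; the table layout and function names are hypothetical.

```python
def build_link_table(text, links):
    """links: (name, phrase, target) tuples in text order. Record the span of
    each hot area instead of editing the text: {name: (start, end, target)}."""
    table, pos = {}, 0
    for name, phrase, target in links:
        start = text.find(phrase, pos)
        if start < 0:
            raise ValueError(f"{phrase!r} not found after offset {pos}")
        end = start + len(phrase)
        table[name] = (start, end, target)
        pos = end  # the next phrase is searched for after this hot area
    return table


def link_at(table, offset):
    """Return the target of the hot area containing `offset`, or None."""
    for start, end, target in table.values():
        if start <= offset < end:
            return target
    return None


def serialize(table):
    """One tab-separated line per entry, e.g. for appending to the document."""
    return "\n".join(f"{name}\t{s}\t{e}\t{t}" for name, (s, e, t) in table.items())


text = "the essence of the idea and the essence again"
table = build_link_table(text, [("L1", "the essence", "#GEN1"),
                                ("L2", "the essence", "#GEN2")])
print(table)  # {'L1': (0, 11, '#GEN1'), 'L2': (28, 39, '#GEN2')}
print(link_at(table, 30))  # #GEN2
```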

Entry to the plurality of presentation pages 150 that summary page generator 40 generates is typically through a hyperlink from an entry page 78 which the database provider likely will define as their home page. The viewer uses a mouse or other pointing device to click on a highlighted hot area of source anchor 75 in entry page 78 that hyperlinks to one of the summary pages 62. One embodiment, shown in FIG. 8, provides four types of summary pages 62: an abstract summary page 140, a concept summary page 200, a key-phrase summary page 100 and a table-of-contents summary page 80. Each of these summary pages 62 contains a list of key-topic-entry hyperlink anchors. These hyperlink anchors may be used for vertical hyperlinks (such as vertical hyperlink 91 of FIG. 5) or horizontal hyperlinks (such as horizontal hyperlink 95 of FIG. 6) referred to earlier, and they may be circular or not.
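An entry page of this kind is essentially a page of source anchors pointing at the summary pages. The Python sketch below assumes hypothetical file names for the four summary page types of FIG. 8 and uses `html.escape` so that labels and file names are safe to embed in HTML.

```python
from html import escape

# Hypothetical file names for the four summary page types of FIG. 8.
SUMMARY_PAGES = [
    ("Abstracts", "abstracts.html"),
    ("Concepts", "concepts.html"),
    ("Key phrases", "phrases.html"),
    ("Table of contents", "toc.html"),
]


def entry_page(title, summary_pages):
    """Build an entry page whose source anchors hyperlink to each summary page."""
    items = "\n".join(
        f'<LI><A HREF="{escape(href)}">{escape(label)}</A></LI>'
        for label, href in summary_pages
    )
    return f"<HTML><BODY>\n<H1>{escape(title)}</H1>\n<UL>\n{items}\n</UL>\n</BODY></HTML>"


print(entry_page("Product Manuals", SUMMARY_PAGES))
```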

Abstract summary page 140 comprises a list of abstracts (high-semantic-content sentences are treated as `abstracts`; in one embodiment, abstracts whose semantic content exceeds the threshold value that the author selected will be listed in the abstract summary page 140 in the order in which they occurred in the text) automatically derived by summary page generator 40. Concept summary page 200 comprises a list of concepts (wherein `concepts` are noun phrases or noun-verb phrases that contain high-semantic-weight words; in one embodiment, the above-mentioned list of abstracts is generated; each abstract is then examined to determine all key topics; for each determined key topic, a copy of the abstract is made and `rotated` so the key topic appears first to make the `concept` (thus several concepts can be generated from each abstract); the list of concepts is then alphabetized) automatically derived by summary page generator 40. Key-phrase summary page 100 comprises a list of key phrases (key phrases are phrases with a high semantic weight; the key phrase is rotated so that