Description
FIELD OF THE INVENTION
Aspects of the present invention provide a method and system for remotely
specifying which section of a hypertext document to display on a user's
computer.
BACKGROUND OF THE INVENTION
HTML is a "markup" language which allows an author to turn a simple text
document into a hypertext document for the World Wide Web ("the web").
FIG. 1 is an example of a hypertext document from Sun Microsystems as
viewed through a browser from Netscape Communications, Inc. FIG. 2
illustrates the HTML source code which describes the hypertext document of
FIG. 1.
The HTML markup language is analogous in some ways to the formatting codes
used in word processing documents. A word processing document viewed
through a word processing program is actually a combination of the text
that you see and a series of hidden formatting codes (e.g., carriage
return, bold, underline) which instruct the word processing program to
display the word processing document in a specified way. Similarly, a
hypertext document is actually a combination of the text that you see and
a series of hidden "tags" or "anchors" (for new paragraphs, graphics
images, hypertext links, etc.) which instruct the browser program to
display the hypertext document in a specified way.
A hypertext document is usually broken down into sections, with each
section delineated by one or more HTML tags. HTML tags are formatting
codes surrounded by the characters < and > (less than and greater than
symbols). Some HTML tags have a start tag and an end tag. In general, end
tags are in the format </"symbol"> where the "symbol" is the character
string found between the characters < and > in the start tag. FIG. 3 is an
example of a series of HTML document tags forming a template for a typical
hypertext document. For example, the document of FIG. 3 is defined as an
HTML document using the tags <html> and </html>. Then the "head" of the
document, which typically includes a title, is defined using the tags
<head>, </head>, <title>, and </title>, respectively. Following the head
comes the "body" of the document which is often organized into subtopics
with different levels of headings. The body is defined by the tags <body>
and </body>. Headings are indicated by the tags <h#> and </h#>, where # is
the level of the heading. Heading levels indicate the relative size of the
heading. Heading level 1 is the largest heading size and heading level 6
is the smallest heading size. Finally, it is good practice to indicate the
author of the document at the bottom of the document using the tags
<address> and </address>. FIG. 4 summarizes this information in a table
format.
Once the HTML template has been established, text is added to create a
basic hypertext document. In order to improve readability, the author adds
HTML character and paragraph formatting tags to the document. For example,
the <p> tag instructs the browser to begin a new paragraph. If an author
wants to highlight some text in bold, the author inserts the <b> tag at
the beginning of the text to be highlighted and inserts a </b> tag at the
end of the text to be highlighted. The tags <i> and </i> indicate text to
display in italics. FIG. 5 illustrates additional tags for formatting
characters and paragraphs.
If HTML were merely made up of the document, paragraph, and character
formatting tags discussed above, it would only allow an author to define a
document which stands by itself. Fortunately, additional HTML tags allow
an author to "link" documents together. If a reader of a hypertext
document wants to know more about a topic before reading the rest of the
current hypertext document, the reader selects a "link" or "hot link",
which retrieves and displays a new document that provides related
information. FIG. 6 illustrates a hypertext document (i.e., a "source
document") on Thomas Jefferson with a hot link named "the American
Constitution". The link could take the reader to a second hypertext
document (i.e., a "destination document") which, for example, displays the
text of the American Constitution or which provides more information on
Thomas Jefferson's role in the drafting of the American Constitution.
In HTML, a hot link to a destination document is made by placing a
"reference anchor" around the text to be highlighted (e.g., "the American
Constitution") and then providing a network location where the destination
document is located. Reference anchors extend the idea of start and end
tags. A reference anchor is created when the start tag <a> and the end tag
</a> are placed around the text to be highlighted (e.g., <a> the American
Constitution </a>). Then attribute information that identifies the network
location of the destination document is inserted within the <a> reference
tag. In HTML, the "href=" attribute, followed by the network location for
the destination document, is inserted within the <a> tag. For example,
<a href="network location for the destination document"> the American
Constitution </a> illustrates the basic format for a reference anchor. On
the web, network locations of hypertext documents are provided using the
Uniform Resource Locator ("URL") naming scheme. FIG. 7 illustrates the
primary components of a URL.
A service type 701 is a required part of a URL. The service type tells the
user's browser how to contact the server for the requested data. The most
common service type is the Hypertext Transfer Protocol or http. The web
can handle several other services including gopher, wais, ftp, netnews,
and telnet and can be extended to handle new service types. A system name
703 is also a required part of a URL. The system name is the fully
qualified domain name of the server which stores the data being requested.
A port 705 is an optional part of a URL. Ports are the network socket
addresses for specific protocols. By default, http connects at port 80.
Ports are only needed when the server does not communicate on the default
port for that service. A directory path 707 is a required part of a URL.
Once connected to the system in question, a path to the file must be
specified. A filename 709 is an optional part of a URL. The file name is
the data file itself. The server can be configured so that if a filename
isn't specified, a default file or directory listing is returned. A search
component 711 is another optional part of a URL. If the URL is a request
to search a data base, the query can be embedded in the URL. The search
component is the text after the ? or # in a URL.
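As a rough illustration of the components listed above, the following Python sketch splits a URL into those parts using the standard library's urllib.parse module. The URL itself (host name, port, path, and query) is made up for this example, and the variable names are only an assumed mapping onto the components of FIG. 7, not part of the original description.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only so that every component is present.
url = "http://www.example.com:8080/dir/file.html?topic=amendments"

parts = urlsplit(url)

service_type = parts.scheme    # service type, e.g. "http"
system_name = parts.hostname   # fully qualified domain name of the server
port = parts.port              # None when the URL relies on the default port
# Split the path into the directory path and the (optional) filename.
directory_path, _, filename = parts.path.rpartition("/")
search_component = parts.query  # text after the "?" character
```

When the port is omitted from the URL, `parts.port` is `None`, which corresponds to the case where the browser falls back to the default port for the service.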
Substituting the URL "http://system/dir/file.html" into the example above,
the reference anchor:
<a href="http://system/dir/file.html"> the American Constitution </a>
identifies an HTML file to retrieve and display when a user selects the "the
American Constitution" hot link.
Sometimes an author may want to direct the reader's attention not to the
destination document as a whole but to a specific part of the destination
document. For example, instead of pointing the reader to the beginning
(i.e., the Preamble) of the American Constitution, an author may want to
point the reader directly to the 10th Amendment (i.e., Article X) of the
American Constitution. Hypertext links that point to a specific point in a
destination document are known as named anchors. Named anchors are
essentially modified reference anchors. Continuing with the example above,
if an author wants to point to the section on the 10th Amendment within a
destination document containing HTML source code for the entire American
Constitution, then the author follows a two step process. First the author
modifies the HTML source code for the destination document by inserting a
"NAME" attribute within an <a> tag which is inserted before the start of
the section on the 10th Amendment. For example, the tag
<a NAME="10th Amendment"> Article X </a>
could be inserted into the destination document's HTML source code before
the start of the section on Article X. To reference this point, the author
of the source document creates a named anchor in the source document which
uses a # character to reference the "10th Amendment" NAME attribute in the
destination document. For example, the named anchor:
<a href="http://system/dir/file.html#10th Amendment"> the 10th Amendment
</a>
identifies the section on Article X as the section to retrieve and display
when a user selects the hot link "the 10th Amendment".
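The conventional named-anchor lookup described above can be sketched as follows. This is a minimal illustration in Python, assuming the destination document is available as a string; the class name, the function name, and the sample destination text are hypothetical, and an actual browser would resolve the anchor inside its own rendering engine.

```python
from html.parser import HTMLParser


class NameAnchorFinder(HTMLParser):
    """Records the line on which <a name="..."> matches the wanted name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.line = None  # 1-based line number of the matching anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "a" and self.line is None:
            if dict(attrs).get("name") == self.wanted:
                self.line, _ = self.getpos()


def line_of_named_anchor(html_text, name):
    """Return the line holding the anchor called `name`, or None."""
    finder = NameAnchorFinder(name)
    finder.feed(html_text)
    finder.close()
    return finder.line


# Made-up destination document containing the NAME attribute.
destination = (
    "<h1>The Constitution</h1>\n"
    "<p>(text of the Preamble)</p>\n"
    '<a NAME="10th Amendment"> Article X </a>\n'
    "<p>(text of Article X)</p>\n"
)

href = "http://system/dir/file.html#10th Amendment"
_, _, fragment = href.partition("#")  # text after the # character
target_line = line_of_named_anchor(destination, fragment)
```

The lookup succeeds only because the destination document already contains the matching NAME attribute; for any name that was never inserted, the function returns `None`.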
An implicit assumption of the example set forth above is that the author of
the source document has permission to edit and modify the destination
document in order to add a "NAME" attribute before the section on Article
X. At the very least, the author of the source document has to be able to
convince the author of the destination document to add such a "NAME"
attribute before the section on Article X. However, since the web is a
distributed, network-based hypertext system, the author of the source
document may in fact not have access to the destination document. Thus, it
would be beneficial to provide a method and system which allows browsers
to automatically display sections of destination documents, even though
those sections do not include embedded NAME attributes.
SUMMARY OF THE INVENTION
Embodiments of the present invention use a new extension to the HTML
language to support remotely specified named anchors. A remotely specified
named anchor, when embedded within a source document, instructs a browser
program to access a portion of a destination document indicated in the
remotely specified named anchor. One benefit of the present invention over
previously implemented named anchors is that embodiments of the present
invention provide this functionality even when the indicated portion of
the destination document does not contain a "NAME" attribute. In this way,
an author of a source document can create a hot link which scrolls to an
indicated portion of a destination document even though the author of the
source document is unable to modify, or have modified, the source code of
the destination document to include a "NAME" attribute.
In one embodiment, when the browser program reads a remotely specified
named anchor such as:
<a href=http://foo.com/bar.html SCROLL="Some Text">
the browser program performs the following steps: 1) the browser retrieves
the file "bar.html" from the server "foo.com", 2) the browser searches the
file bar.html for "Some Text", and 3) if the browser finds the character
string being searched for, then the browser displays the file bar.html
scrolled to the line containing the first character of the character
string being searched for.
The present invention also provides graceful degradation to support legacy
browsers. If a remotely specified named anchor such as:
<a href=http://foo.com/bar.html SCROLL="Some Text">
is read by a browser program which does not support the new HTML extension
of the present invention, then the legacy browser will simply ignore the
SCROLL attribute and will instead display the destination file bar.html in
the normal fashion, i.e., scrolled to the top of the file.
NOTATIONS AND NOMENCLATURE
The detailed descriptions which follow are presented largely in terms of
methods and symbolic representations of operations on data bits within a
computer. These method descriptions and representations are the means used
by those skilled in the data processing arts to most effectively convey
the substance of their work to others skilled in the art.
A method is here, and generally, conceived to be a self-consistent sequence
of steps leading to a desired result. These steps require physical
manipulations of physical quantities. Usually, though not necessarily,
these quantities take the form of electrical or magnetic signals capable
of being stored, transferred, combined, compared, and otherwise
manipulated. It proves convenient at times, principally for reasons of
common usage, to refer to these signals as bits, values, elements,
symbols, characters, terms, numbers, or the like. It should be borne in
mind, however, that all of these and similar terms are to be associated
with the appropriate physical quantities and are merely convenient labels
applied to these quantities.
Useful machines for performing the operations of the present invention
include general purpose digital computers or similar devices. The general
purpose computer may be selectively activated or reconfigured by a
computer program stored in the computer. A special purpose computer may
also be used to perform the operations of the present invention. In short,
use of the methods described and suggested herein is not limited to a
particular computer configuration.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an example of a hypertext document from Sun Microsystems as
viewed through a browser from Netscape Communications, Inc.
FIG. 2 illustrates HTML source code which describes the hypertext document
of FIG. 1.
FIG. 3 is an example of a series of HTML document tags forming a template
for a typical hypertext document.
FIG. 4 summarizes information regarding HTML document tags.
FIG. 5 summarizes information regarding HTML character and paragraph tags.
FIG. 6 illustrates a hypertext document on Thomas Jefferson with a hot link
named "the American Constitution".
FIG. 7 illustrates the primary components of a Uniform Resource Locator
("URL").
FIG. 8 is a block diagram of a computer system for practicing the preferred
embodiment of the present invention.
FIG. 9 is a flow diagram which illustrates the preferred steps taken to
access a portion of a destination document identified in a remotely
specified named anchor, even when the destination document does not
contain a "NAME" attribute.
DETAILED DESCRIPTION
Overview Of The Preferred Method
Embodiments of the present invention use a new extension to the HTML
language to support remotely specified named anchors. A remotely specified
named anchor, when embedded within a source document, instructs a browser
program to access a portion of a destination document indicated in the
remotely specified named anchor. When the browser program reads a remotely
specified named anchor such as:
<a href=http://foo.com/bar.html SCROLL="Some Text">
from the source document, the browser program performs the following steps:
1) the browser retrieves the destination file "bar.html" from the server
"foo.com", 2) the browser searches the file bar.html for "Some Text", and
3) if the browser finds the character string being searched for, then the
browser displays the file bar.html, scrolled to the line containing the
first character of the character string being searched for.
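The first part of this process, reading the anchor and extracting its two pieces of information, might be sketched as below. The sketch uses Python's standard html.parser module as a stand-in for the browser's own parser; the class name ScrollAnchorParser and the link text are assumptions introduced for illustration only.

```python
from html.parser import HTMLParser


class ScrollAnchorParser(HTMLParser):
    """Collects (href, scroll) pairs from the <a> tags it encounters."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # attribute names arrive lowercased
        if "href" in attributes:
            # scroll is None when the anchor carries no SCROLL attribute.
            self.anchors.append((attributes["href"], attributes.get("scroll")))


parser = ScrollAnchorParser()
parser.feed('<a href=http://foo.com/bar.html SCROLL="Some Text">link text</a>')
parser.feed('<a href="http://foo.com/plain.html">ordinary link</a>')
parser.close()

href, scroll_text = parser.anchors[0]
plain_href, plain_scroll = parser.anchors[1]
```

An anchor without a SCROLL attribute yields `None` for the search string, which a caller could treat as a request to display the destination document from the top.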
One benefit of the present invention over previously implemented named
anchors is that embodiments of the present invention provide this
functionality even when the indicated portion of the destination document
does not contain a "NAME" attribute. In this way, an author of a source
document can create a hot link which scrolls to an indicated portion of a
destination document even though the author of the source document is
unable to modify, or have modified, the source code of the destination
document to include a "NAME" attribute.
Overview Of The Preferred System
FIG. 8 is a block diagram of a computer system 800 for practicing the
preferred embodiment of the present invention. The computer system 800
includes a user computer 801, a source document server computer 803, a
destination document server computer 805, and a network communications
mechanism 807.
The user computer 801 includes a processor 809, a memory 811, and an
interface 813 for facilitating input and output in the user computer 801.
The memory 811 stores a number of items, including a browser 815 and an
operating system 817. The preferred browser is a Java™-enabled browser
such as HotJava™ from Sun Microsystems, Inc., of Mountain View,
Calif.[1] The preferred operating system is the Solaris™ operating
system from Sun Microsystems, Inc.
[1] Sun and Solaris are trademarks or registered trademarks of Sun
Microsystems, Inc., in the United States and other countries.
The source document computer 803 includes a processor 819, a memory 821,
and an interface 823 for facilitating input and output in the source
computer 803. The memory 821 stores a number of items, including a source
document 825, and an operating system 827. The preferred operating system
is the Solaris™ operating system from Sun Microsystems, Inc. of
Mountain View, Calif.
The preferred source document is a text document interspersed with
constructs of the HTML markup language. Another possibility would be a
text document marked up with SGML (Standard Generalized Markup Language).
In general, this embodiment does not require that the source document be
encoded in HTML; it is preferred, however, that the document contain one
or more URLs. For example, this patent application could be a source
document, and in fact it would be quite convenient to be able to refer in
this patent application to specific examples of web pages, hot links, and
reference anchors. If the source document is not encoded in HTML, SGML, or
some other standard format, it becomes more difficult, but certainly not
impossible, to recognize the URLs.
This embodiment of the invention does rely on the information being
represented as text, though there is no requirement that the text be
encoded in ASCII. For use with other languages, text may be encoded in
Unicode (the preferred embodiment for non-European languages) or any other
text encoding scheme that has the simple property of allowing the computer
to compare a string with a substring of the entire file and determine
whether the string is identical to the substring.
The destination document computer 805 includes a processor 829, a memory
831, and an interface 833 for facilitating input and output in the
destination computer 805. The memory 831 stores a number of items,
including a destination document 835, and an operating system 837. The
preferred destination document is a text document interspersed with
constructs of the HTML markup language. The preferred operating system is
the Solaris™ operating system from Sun Microsystems, Inc. of Mountain
View, Calif. The network communications mechanism 807 provides a mechanism
for facilitating communication between the user computer 801, the source
document server 803, and the destination document server 805.
It should be noted that the user computer 801, the source document server
803, and the destination document server 805 may all contain additional
components not shown in FIG. 8. For example, each computer could also
include some combination of additional components including a video
display device, an input device, such as a keyboard, mouse, or pointing
device, a CD-ROM drive, and a permanent storage device, such as a disk
drive.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The preferred operation of the system in FIG. 8 is perhaps best described
by way of example. FIG. 9 is a flow diagram which illustrates the
preferred steps taken to access a portion of a destination document
identified in a remotely specified named anchor, even when the destination
document does not contain a "NAME" attribute. First, a browser program
reads a remotely specified named anchor from a source document. Then the
browser parses the remotely specified named anchor and retrieves the name
of the file to access, and the network location of the server that stores
the file. After retrieving the file, the browser searches the file for the
indicated text. If the indicated text is found, then the browser displays
the file starting at the first character of the indicated text.
In step 901 the browser retrieves a source document. Typically, the source
document will be identified by a URL supplied by the user in an "Open
File" dialog box displayed by the browser. In step 903 the browser
displays the source document on the user's computer. In step 905 the
browser receives input by the user on the displayed source document. In
step 907 the browser determines whether the user selected a hot link
containing a remotely specified named anchor. If the user requested an
operation other than selecting a hot link containing a remotely specified
named anchor, then the browser merely performs the requested operation using
techniques available in the prior art (step 909). If, however, the user
did select a hot link containing a remotely specified named anchor then,
in steps 911 through 923, the browser processes the remotely specified
named anchor in order to access a specified portion of a destination
document identified in the remotely specified named anchor, even though
the specified portion of the destination document does not contain an HTML
"NAME" attribute associated with it.
In step 911 the browser parses the remotely specified named anchor to
obtain the name of the file to retrieve, as well as the name of the server
storing the file. In step 913 the browser retrieves the file from the
server. In step 915 the browser parses the remotely specified named anchor
in order to retrieve the character string for which to search. Those of
ordinary skill in this art will understand that, alternatively, the
browser could, in step 911, parse the remotely specified named anchor to
retrieve the character string to search for. In step 917 the browser
searches the retrieved file for the character string. If the character
string is not found in the file (step 919) then the retrieved file is
displayed to the user starting at the top of the file (step 921). If,
however, the character string is found in the retrieved file then the
browser displays the file scrolled to the line containing the first
occurrence of the character string being searched for (step 923).
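Steps 917 through 923 can be sketched as a single function. This is a simplified illustration in Python, assuming the retrieved file is already in memory as a string and that the search runs over the raw text of the file; the function name and the sample page are hypothetical, and retrieval of the file over the network (step 913) is deliberately left out.

```python
def first_line_to_display(file_text, search_string):
    """Return the 0-based index of the line the display should start at.

    Follows steps 917-923: search for the character string and, if it is
    absent (or no string was supplied), fall back to the top of the file.
    """
    if not search_string:
        return 0
    position = file_text.find(search_string)  # first occurrence only
    if position == -1:
        return 0  # step 921: display starting at the top of the file
    # Step 923: the line index equals the number of newlines before the match.
    return file_text.count("\n", 0, position)


# Made-up file contents standing in for the retrieved destination file.
page = "Title\nIntroduction\nSome Text appears here\nClosing remarks\n"

found_line = first_line_to_display(page, "Some Text")
missing_line = first_line_to_display(page, "No Such Text")
```

Returning line 0 for a missing string gives the same result a browser without support for the SCROLL attribute would produce, so the two failure paths look identical to the user.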
In this way, a new method and system are provided which access a portion of
a destination document identified in a remotely specified named anchor,
even when the destination document does not contain a "NAME" attribute.
One weakness of the preferred embodiment is that it does not allow the
author of the source document to point to a second occurrence of a
character string. Consider as an example the following file:
1. Socrates was a man.
2. All men are mortal.
3. Therefore, Socrates was mortal.
4. So his ultimate downfall was due to the fact that Socrates was a man.
The preferred embodiment will not link to the character
string "Socrates was a man" in line 4 in response to the remotely
specified named anchor
<a href=http://foo.com/socratestory.html SCROLL="Socrates was a man">
because the preferred embodiment will instead scroll to the first
occurrence of the string "Socrates was a man" in line 1 of the file. This
limitation is not severe, however, since it will normally be the case that
the author can merely keep adding to the character string of choice until
it is uniquely identified. For example, if it indeed was desired to link
to the occurrence of "Socrates was a man" in line 4, then the author could
merely search for the string "that Socrates was a man". In this way, the
preferred embodiment would scroll the file to line 4, as desired. Though
not a perfect solution, this solution will be adequate in almost all
cases.
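The effect of lengthening the search string can be checked directly against the four-line file above. The following Python sketch is illustrative only; the helper name line_number_of is an assumption, and the line numbering follows the numbering used in the example file.

```python
story = (
    "1. Socrates was a man.\n"
    "2. All men are mortal.\n"
    "3. Therefore, Socrates was mortal.\n"
    "4. So his ultimate downfall was due to the fact that Socrates was a man.\n"
)


def line_number_of(text, search_string):
    """1-based line number of the first occurrence, or None if absent."""
    position = text.find(search_string)
    if position == -1:
        return None
    return text.count("\n", 0, position) + 1


# The short string is ambiguous: it occurs twice, and the first hit wins.
short_count = story.count("Socrates was a man")
short_line = line_number_of(story, "Socrates was a man")

# Adding one word makes the string unique, so the search lands on line 4.
long_count = story.count("that Socrates was a man")
long_line = line_number_of(story, "that Socrates was a man")
```

An author could run a similar occurrence count against the destination document before publishing a link, extending the string until the count drops to one.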
In general, embodiments of the invention apply to any system where it is
desired to be able to point to a specific part of a larger whole even when
one cannot get access to this larger whole to insert a reference marker.
There is an alternative, much simpler, way of solving one aspect addressed
by the present invention, but it is not recommended. One could simply
point to the character position (offset) of the desired scroll within the
destination file. The reason this is not recommended is that the character
position will change if the owner of the destination file edits it.
Editing the file may also change the search string but this is much less
likely to happen than the more common case where the author adds or
deletes text in another part of the document.
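The fragility of a stored character offset, compared with a search string, can be demonstrated with a short Python sketch. The document contents and the inserted note are invented for this illustration.

```python
# Made-up destination file and the string the source author wants to reach.
original = "Preamble\nArticle I\nArticle X: powers reserved\n"
search_string = "Article X"

# Offset approach: remember where the target started at authoring time.
stored_offset = original.find(search_string)

# Later, the owner of the destination file adds text in an unrelated,
# earlier part of the file.
edited = original.replace("Preamble\n", "Preamble\nA newly added note\n")

# The stored offset now points into the middle of the added note ...
offset_still_valid = edited.startswith(search_string, stored_offset)
# ... while searching for the string still locates the intended section.
string_still_found = edited.find(search_string) != -1
```

The search string fails only if the owner edits the target text itself, whereas the offset fails after any earlier insertion or deletion that changes the length of the file.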
While specific embodiments have been described herein for purposes of
illustration, various modifications may be made without departing from the
spirit and scope of the invention. Accordingly, the invention is not
limited to the above described embodiments, but instead is defined by the
claims which follow, along with their full scope of equivalents.
* * * * *