Description
TECHNICAL FIELD
The present invention relates generally to World Wide Web information
retrieval and more particularly to optimizing utilization of a
communication link during offpeak caching of Web data.
BACKGROUND OF THE INVENTION
The World Wide Web of the Internet is the most successful distributed
application in the history of computing. In the Web environment, client
machines effect transactions with Web servers using the Hypertext Transfer
Protocol (HTTP), which is a known application protocol providing users
access to files (e.g., text, graphics, images, sound, video, etc.) using a
standard page description language known as Hypertext Markup Language
(HTML). HTML provides basic document formatting and allows the developer
to specify "links" to other servers and files. In the Internet paradigm, a
network path to a server is identified by a so-called Uniform Resource
Locator (URL) having a special syntax for defining a network connection.
Use of an HTML-compatible browser (e.g., Netscape Navigator or Microsoft
Internet Explorer) at a client machine involves specification of a link
via the URL. In response, the client makes a request to the server
identified in the link and receives in return a document formatted
according to HTML.
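As a brief illustration of standard HTTP usage (not a component of the described system), the Python sketch below shows how a URL names both the server to contact and the document to request. The function name `build_get_request` is hypothetical, and the sketch handles only plain `http://` URLs and a minimal HTTP/1.0 request.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> tuple[str, int, str]:
    """Split an http:// URL into the server to contact and a minimal
    HTTP/1.0 GET request for the named document."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch handles only plain http:// URLs")
    host = parts.hostname            # server identified by the link
    port = parts.port or 80          # default HTTP port when none is given
    path = parts.path or "/"         # document path on that server
    if parts.query:
        path += "?" + parts.query
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    return host, port, request
```

For `http://www.domainname.com/path/html1` this yields host `www.domainname.com`, port 80, and the request line `GET /path/html1 HTTP/1.0`; the server answers with the HTML document.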
There has been great interest in providing Internet access at minimal
economic cost. While most computers now are pre-configured for Internet
access, a significant percentage of households still do not have a
personal computer. Thus, it has now been proposed to provide a data
processing system that, much like a VCR, may be connected to a television
set and used in lieu of a personal computer to provide Web access through
a conventional remote control device associated with the system unit. Such
a system enables the television to become, in effect, a "Web" appliance.
The viewer can rapidly switch between conventional television and Internet
access using the remote control unit. All of the conventional "Internet"
access tools and navigational functions are preferably "built into" the
system and thus hidden from the user.
One such tool is so-called "off-line" browsing. As any casual user of the
Internet can attest, interesting or attractive web sites are sometimes
difficult to access due to large traffic demands. As a result, several
companies have developed so-called "off-line" browser programs that are
designed to deliver web pages from favorite Web servers to a user's hard
drive for browsing at the user's convenience. Typically, such programs
include some form of scheduling feature that enables the user to fetch
identifiable pages at off-peak hours, saving time and connection charges.
The user may then browse the pages at his or her convenience without a
modem and even without an active connection to the Internet.
While off-line browser programs offer certain advantages, they do not have
the capability to optimize utilization of the communication link between
the client and the World Wide Web during the off-peak information
retrieval process. This problem becomes more acute if there are
constraints on the amount of time that off-peak information retrieval may
be accomplished. In the future, it is anticipated that Web appliances of
the type described above will be provided by computer network or other
service providers, who will only allow their subscribers limited periods
of time during which off-line browsing will be permitted. Thus, for
example, a network operator may restrict subscribers to off-line browsing
for just one hour per night. During this hour, a user may desire to obtain
content from numerous Web sites. Thus, it would be desirable to provide
some mechanism that could optimize retrieval of Web content during this
limited period of time.
The present invention addresses and solves this important problem.
BRIEF SUMMARY OF THE INVENTION
It is thus a primary goal of the present invention to optimize utilization
of the communication link between a Web appliance and World Wide Web
servers during off-peak caching of Web data.
It is a further object of the invention to enhance off-peak caching of Web
data when client access to the network is restricted.
It is another object of the invention to ensure that the link between a Web
client and one or more Web servers is used to its maximum bandwidth during
a time-restricted off-peak browsing session.
It is a further important object of the invention to provide equitable
caching of content from a plurality of Web sites during an automatic
download session so that a user obtains a significant percentage of the
downloads that he or she desires.
It is another object to ensure that each of a plurality of Web sites has an
opportunity to deliver content to a client during an automatic download
session.
It is still another object of the invention to improve the functionality of
off-line browsing programs to make more efficient use of limited
communication resources.
It is yet another object of the invention to ensure maximum utilization of
the Web client modem during off-peak caching of Web data from the World
Wide Web of the Internet.
Still another object of the invention is to provide efficient off-peak Web
browsing for a Web client appliance, such as a data processing system
connected to a conventional television.
It is a more general object of the invention to provide a Web appliance
with an off-line browsing capability.
These and other objects of the invention are provided in a method of
retrieving Web content from a plurality of Web servers for delivery to a
Web client connectable to the World Wide Web via a communication link. The
Web client is preferably a data processing system connectable to a
television or other conventional monitor to provide low cost Internet
access. The method begins by having the user define a set of one or more
servers from which content is desired to be retrieved and stored in the
cache. These servers are preferably identified by a "list" of favorite Web
sites. A test is then made to determine whether a given download period
has terminated. Typically, this download period occurs during an "off"
period, such as in the middle of the night, to avoid traffic congestion at
the Web server sites. If the given download period has not terminated, a
determination is then made of an activity level for the communication link
as content is being downloaded to the cache from the one or more servers.
If the activity level for the communication link is less than a given
threshold level, additional requests for content are issued to the cache
according to a so-called "fairness policy" that ensures that content from
as many sites as possible is downloaded during the download period.
Thus, for example, according to the fairness policy the additional requests
for content are issued to the set of one or more servers as one request
per server in an ordered sequence. Alternatively, the additional requests
for content are issued to the set of one or more servers up to a
predetermined number of requests per server in an ordered sequence. Or,
the additional requests are issued to the set of one or more servers based
on the number of bytes received in the cache from a particular server. The
fairness policy ensures that no one server "dominates" the download
session to the exclusion of the other servers from which the user desires
to download content. When content (i.e. a Web document) is received from a
particular server, it is stored in the cache for off-line browsing. Prior
to storage, the routine preferably removes duplicative and/or non-local
HTML links so that subsequent access requests to the same document are
handled more expediently.
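The three fairness policies can be illustrated with the minimal Python sketch below. It is an assumption-laden illustration, not the patented implementation: the `ServerQueue` class and the function names are hypothetical, servers are visited in list order, and byte counts are assumed to be updated elsewhere as documents arrive.

```python
from collections import deque
from typing import Optional


class ServerQueue:
    """Hypothetical per-server list of URLs still to be fetched."""

    def __init__(self, server: str, urls: list[str]):
        self.server = server
        self.urls = deque(urls)
        self.bytes_received = 0      # assumed to be updated as documents arrive


def ordered_pass(queues: list[ServerQueue], k: int = 1) -> list[str]:
    """Visit servers in order, taking up to k URLs from each.
    k=1 is 'one request per server'; k>1 is 'up to k requests per server'."""
    batch = []
    for q in queues:
        for _ in range(k):
            if not q.urls:
                break
            batch.append(q.urls.popleft())
    return batch


def fewest_bytes_first(queues: list[ServerQueue]) -> Optional[str]:
    """Byte-based policy: take the next URL from the server (with work left)
    that has so far delivered the fewest bytes to the cache."""
    candidates = [q for q in queues if q.urls]
    if not candidates:
        return None
    return min(candidates, key=lambda q: q.bytes_received).urls.popleft()
```

With three servers holding three, one, and two URLs, a first `ordered_pass` with k=1 takes exactly one URL from each, so no single site dominates the session.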
According to another feature of the invention, a data processing system is
provided to facilitate low cost Internet access. The system comprises two
major parts: a remote control unit, and a base unit connectable to a
monitor for providing Internet access under the control of the remote
control unit. The base unit is, in effect, a computer, and includes a
modem connected to a communication link, a processor, a memory, and
various embedded control programs. These programs include a browser
program including means responsive to commands from the remote control
unit for generating a list of Web sites, and a cache control program for
initiating download requests to the communication link based on the list
of Web sites. An optimization routine maintains maximum utilization of the
modem during a download session by issuing multiple HTTP GET requests to
the cache control program based on the fairness policy.
The foregoing has outlined some of the more pertinent objects and features
of the present invention. These objects should be construed to be merely
illustrative of some of the more prominent features and applications of
the invention. Many other beneficial results can be attained by applying
the disclosed invention in a different manner or modifying the invention
as will be described. Accordingly, other objects and a fuller
understanding of the invention may be had by referring to the following
Detailed Description of the Preferred Embodiment.
BRIEF DESCRIPTION OF THE DRAWINGS
For a more complete understanding of the present invention and the
advantages thereof, reference should be made to the following Detailed
Description taken in connection with the accompanying drawings in which:
FIG. 1A is a pictorial representation of a data processing system unit
connected to a conventional television set to form a "Web" appliance;
FIG. 1B is a pictorial representation of a front panel of the data
processing system unit of the present invention;
FIG. 1C is a pictorial representation of a rear panel of the data
processing system unit;
FIG. 1D is a pictorial representation of a remote control unit associated
with the data processing system unit of the invention;
FIG. 2 is a block diagram of the major components of the data processing
system unit in accordance with a preferred embodiment of the present
invention;
FIG. 3 is a representative "favorites" list created by the user as a result
of browsing the World Wide Web of the Internet;
FIG. 4 is a representative server URL queue list for the "favorites" list
of FIG. 3;
FIG. 5 is a flowchart of a preferred method of the present invention for
optimizing communication link activity during an off-peak caching session;
FIG. 6 is a flowchart of the process response routine that is executed for
each Web document received in response to a URL request;
FIG. 7 is a block diagram of a representative Web server platform or Web
site; and
FIG. 8 is a flowchart of the methods that are carried out by a Web server
in response to receipt of a request from an Internet client such as the
Web appliance described herein.
DETAILED DESCRIPTION
With reference now to the figures, and in particular with reference to
FIGS. 1A through 1D, various pictorial representations of a data
processing system in which a preferred embodiment of the present invention
may be implemented are depicted. FIG. 1A is a pictorial representation of
the data processing system as a whole. Data processing system 100 in the
depicted example provides, with minimal economic costs for hardware to the
user, access to the Internet. Data processing system 100 includes a data
processing unit 102. Data processing unit 102 is preferably sized to fit
in typical entertainment centers and provides all required functionality,
which is conventionally found in personal computers, to enable a user to
"browse" the Internet. Additionally, data processing unit 102 may provide
other common functions such as serving as an answering machine or
receiving facsimile transmissions.
Data processing unit 102 is connected to television 104 for display of
graphical information. Television 104 may be any suitable television,
although color televisions with an S-Video input will provide better
presentations of the graphical information. Data processing unit 102 may
be connected to television 104 through a standard coaxial cable
connection. A remote control unit 106 allows a user to interact with and
control data processing unit 102. Remote control unit 106 emits infrared
(IR) signals, preferably modulated at a different
frequency than the normal television, stereo, and VCR infrared remote
control frequencies in order to avoid interference. Remote control unit
106 provides the functionality of a pointing device (such as a mouse,
glidepoint, trackball or the like) in conventional personal computers,
including the ability to move a cursor on a display and select items.
FIG. 1B is a pictorial representation of the front panel of data processing
unit 102 in accordance with a preferred embodiment of the present
invention. The front panel includes an infrared window 108 for receiving
signals from remote control unit 106 and for transmitting infrared
signals. Data processing unit 102 may transmit infrared signals to be
reflected off objects or surfaces, allowing data processing unit 102 to
automatically control television 104 and other infrared remote controlled
devices. Volume control 110 permits adjustment of the sound level
emanating from a speaker within data processing unit 102 or from
television 104. A plurality of light-emitting diode (LED) indicators 112
provide an indication to the user of when data processing unit 102 is on,
whether the user has messages, whether the modem/phone line is in use, or
whether data processing unit 102 requires service.
FIG. 1C is a pictorial representation of the rear panel of data processing
unit 102 in accordance with a preferred embodiment of the present
invention. A three-wire (ground included) insulated power cord 114 passes
through the rear panel. Standard telephone jacks 116 and 118 on the rear
panel provide an input to a modem from the phone line and an output to a
handset (not shown). The rear panel also provides a standard computer
keyboard connection 120, mouse port 122, computer monitor port 124,
printer port 126, and an additional serial port 128. These connections may
be employed to allow data processing unit 102 to operate in the manner of
a conventional personal computer. Game port 130 on the rear panel provides
a connection for a joystick or other gaming control device (glove, etc.).
Infrared extension jack 132 allows a cabled infrared LED to be utilized to
transmit infrared signals. Microphone jack 134 allows an external
microphone to be connected to data processing unit 102.
Video connection 136, a standard coaxial cable connector, connects to the
video-in terminal of television 104 or a video cassette recorder (not
shown). Left and right audio jacks 138 connect to the corresponding
audio-in connectors on television 104 or to a stereo (not shown). If the
user has S-Video input, then S-Video connection 140 may be used to connect
to television 104 to provide a better picture than the composite signal.
If television 104 has no video inputs, an external channel 3/4 modulator
(not shown) may be connected in-line with the antenna connection.
FIG. 1D is a pictorial representation of remote control unit 106 in
accordance with a preferred embodiment of the present invention. Similar
to a standard telephone keypad, remote control unit 106 includes buttons
142 for Arabic numerals 0 through 9, the asterisk or "star" symbol (*),
and the pound sign (#). Remote control unit 106 also includes "TV" button 144
for selectively viewing television broadcasts and "Web" button 146 for
initiating "browsing" of the Internet. Pressing "Web" button 146 will
cause data processing unit 102 to initiate modem dial-up of the user's
Internet service provider and display the start-up screen for an Internet
browser. The browser includes a "Favorites" or "Bookmarks" feature that
enables the viewer to record the Uniform Resource Locator (URL) for those
Web sites that the user desires to revisit.
A pointing device 147, which is preferably a trackpoint or "button"
pointing device, is included on remote control unit 106 and allows a user
to manipulate a cursor on the display of television 104. "Go" and "Back"
buttons 148 and 150, respectively, allow a user to select an option or
return to a previous selection. "Help" button 151 causes context-sensitive
help to be displayed or otherwise provided. "Menu" button 152 causes a
context-sensitive menu of options to be displayed, and "Update" button 153
will update the options displayed based on the user's input, while home
button 154 allows the user to return to a default display of options. One
of the options is the Favorites or Bookmarks list. A representative list
is shown in FIG. 3 as a pull-down menu 155 on the television screen.
"PgUp" and "PgDn" buttons 156 and 158 allow the user to change the
context of the display in display-sized blocks rather than by scrolling.
The message button 160 allows the user to retrieve messages.
In addition to, or in lieu of, remote control unit 106, an infrared
keyboard (not shown) with an integral pointing device may be used to
control data processing unit 102. The integral pointing device is
preferably a trackpoint or button type of pointing device. A wired
keyboard (also not shown) may also be used through keyboard connection
120, and a wired pointing device such as a mouse or trackball may be used
through mouse port 122. When a user has one or more of the remote control
unit 106, infrared keyboard, wired keyboard and/or wired pointing device
operable, the active device locks out all others until a prescribed period
of inactivity has passed.
Referring now to FIG. 2, a block diagram for the major components of data
processing unit 102 in accordance with a preferred embodiment of the
present invention is portrayed. As with conventional personal computers,
data processing unit 102 includes a motherboard 202 containing a processor
204 and memory 206 connected to system bus 208. Processor 204 is
preferably at least a 486-class processor operating at or above 100 MHz.
Memory 206 may include cache memory and/or video RAM. Processor 204,
memory 206, and system bus 208 operate in the same manner as corresponding
components in a conventional data processing system.
Video/TV converter 210, located on motherboard 202 and connected to system
bus 208, generates computer video signals for computer monitors, a
composite television signal, and an S-Video signal. The functionality of
Video/TV converter 210 may be achieved through a Trident TVG9685 video
chip in conjunction with an Analog Devices AD722 converter chip. Video/TV
converter 210 may require loading of special operating system device
drivers.
Keyboard/remote control interface unit 212 on motherboard 202 receives
keyboard codes through controller 214, regardless of whether a wired
keyboard/pointing device or an infrared keyboard/remote control is being
employed. Infrared remote control unit 106 transmits signals which are
ultimately sent to the serial port as control signals generated by
conventional mouse or pointing device movements. Two buttons on remote
control unit 106 are interpreted identically to the two buttons on a
conventional mouse, while the remainder of the buttons transmit signals
corresponding to keystrokes on an infrared keyboard. Thus, remote control
unit 106 has a subset of the functions provided by an infrared keyboard.
Connectors/indicators 216 on motherboard 202 provide some of the
connections and indicators on data processing unit 102 described above.
Other connections are associated with and found on other components. For
example, telephone jacks 116 and 118 are located on modem 222. The power
indicator within connectors/indicators 216 is controlled by controller
214.
External to motherboard 202 in the depicted example are power supply 218,
hard drive 220, modem 222 and speaker 224. Power supply 218 is a
conventional power supply except that it receives a control signal from
controller 214 which effects shut down of all power to motherboard 202,
hard drive 220 and modem 222. In some recovery situations, removing power
and rebooting is the only guaranteed method of resetting all of these
devices to a known state. Thus, power supply 218, in response to a signal
from controller 214, is capable of powering down and restarting data
processing unit 102.
Controller 214 is preferably one or more of the 805x family controllers.
Controller 214 receives and processes input from infrared remote control
106, infrared keyboard, wired keyboard, or wired mouse. When one keyboard
or pointing device is used, all others are locked out (ignored) until none
have been active for a prescribed period. Then the first keyboard or
pointing device to generate activity locks out all others. Controller 214
also directly controls all LED indicators except that indicating modem
use. As part of the failure recovery system, controller 214 specifies the
boot sector selection during any power off-on cycle.
Hard drive 220 contains operating system and applications software for data
processing unit 102, which preferably includes IBM DOS 7.0, a product of
International Business Machines Corporation in Armonk, N.Y.; an operating
system such as Windows 3.1 (or higher), a product of Microsoft Corporation
in Redmond, Wash.; and Netscape Navigator (Version 1.0 or higher), a
product of Netscape Communications Corporation in Mountain View, Calif.
Minor modifications of these software packages may be desirable to
optimize performance of data processing unit 102. Also, it is highly
desirable to update one or more of these "off-the-shelf" programs as well
as the other software used by the present invention by downloading new
versions of the code via the Internet. The Web appliance includes appropriate
control software to facilitate such downloading. Hard drive 220 also
stores data, such as the list of favorite Internet sites or unviewed
downloads from one or more Internet site(s). A cache controller program
225 run by the processor is used to administer and manage these downloads
as will be described below.
Modem 222 may be any suitable modem used in conventional data processing
systems, but is preferably a 33.6 kbps modem supporting the V.42bis, V.34,
V.17 Fax, MNP 1-5, and AT command sets. To maintain the slim height of
data processing unit 102, modem 222 is preferably inserted into a slot
mounted sideways on motherboard 202. Modem 222 is connected to a physical
communication link 227, which, in turn, is connected or connectable to the
World Wide Web of the Internet (not shown). As is well known, the World
Wide Web is the Internet's multimedia information retrieval system. Web
clients and servers communicate using the Hypertext Transfer Protocol
(HTTP), which provides users access to files formatted in Hypertext Markup
Language (HTML). A link activity monitor 229
determines the extent to which the communication link 227 is being
utilized at a given point in time. The link activity monitor may be a
hardware-based controller or a software application run by the processor.
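The link activity monitor 229 is characterized only functionally. As one hedged illustration of a software version, the Python class below (its name and methods are hypothetical) counts bytes moved over the link and reports the fraction of a nominal line rate used since the previous query, the quantity read at step 254 of FIG. 5. It assumes the caller reports received bytes and that the nominal rate is the modem's 33,600 bit/s; the result is capped at 1.0 because modem data compression can make counted bytes exceed the nominal line rate.

```python
class LinkActivityMonitor:
    """Hypothetical software link monitor: utilization since last query."""

    def __init__(self, link_bps: float, start: float):
        self.link_bps = link_bps     # nominal line rate in bits per second
        self._bytes = 0              # bytes recorded in the current interval
        self._since = start          # start time of the current interval (s)

    def record(self, nbytes: int) -> None:
        """Called as data arrives over the communication link."""
        self._bytes += nbytes

    def utilization(self, now: float) -> float:
        """Fraction (0.0 to 1.0) of link capacity used since the last query;
        each call also starts a new measurement interval."""
        elapsed = now - self._since
        if elapsed <= 0:
            return 0.0
        used = min(1.0, (self._bytes * 8) / (self.link_bps * elapsed))
        self._bytes, self._since = 0, now
        return used
```

For example, 2,100 bytes received in one second on a 33,600 bit/s link gives a utilization of 0.5.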
A Web server, sometimes referred to as a "Web" site, supports hypertext
documents in directories and files accessible through the Uniform Resource
Locator. Typically, all hypertext documents available at a particular Web
site are considered part of the same "domain" (e.g., www.domainname.com).
Pages that are "local" to the domain usually are identified by a
"relative" link, which is a reference to a path and/or filename within the
domain, e.g., www.domainname.com/path/html1. A representative Web server
is illustrated in FIG. 7 below.
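Deciding whether a link is local, and discarding duplicate and non-local links before a document is cached, can be sketched with Python's standard `urllib.parse` module. The function name `local_links` is hypothetical; the sketch assumes "local" means the resolved link has the same host name as the page, so it also keeps absolute links back into the same domain, and it treats links differing only by a `#fragment` as duplicates.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def local_links(page_url: str, hrefs: list[str]) -> list[str]:
    """Resolve hrefs against page_url, keep only links within the page's
    domain, and drop duplicates while preserving their original order."""
    domain = urlsplit(page_url).hostname
    seen = set()
    result = []
    for href in hrefs:
        # Resolve a relative link into a full URL, then strip any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlsplit(absolute).hostname != domain:
            continue                 # non-local: points outside this Web site
        if absolute in seen:
            continue                 # duplicate of a link already kept
        seen.add(absolute)
        result.append(absolute)
    return result
```

Here `path/html1`, `/path/html1`, and `path/html1#top` on a page at `http://www.domainname.com/index.html` all resolve to the same local document and are kept once.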
Those skilled in the art will recognize that the components depicted in
FIGS. 1A-1D and 2 and described above may be varied for specific
applications or embodiments. Such variations in which the present
invention may be implemented are considered to be within the spirit and
scope of the present invention.
It is desired to enable a user of the data processing system to browse the
Web "off-line." This function is provided by the cache controller 225. The
cache controller may be a piece of dedicated hardware, or it may be an
application program run by the processor. In the preferred embodiment,
cache controller 225 is implemented as a software program upgradable
through Internet downloads.
As noted above, during one or more on-line browsing sessions, a viewer may
compile a list of "favorite" or "bookmark" Web sites that he or she
desires to revisit. All or any portion of this list may also be designated
for access "off-line" so that the content of such sites may be downloaded
and stored in a dedicated "cache" of the hard drive for later viewing,
preferably "off-line." The cache controller program thus includes a
control engine 231 (preferably implemented in software run by the
processor) for controlling the modem 222 to dial up and connect to the
Internet site(s) automatically (e.g., each night while the appliance is
unattended). As seen in FIG. 4, each "favorite" Web site is associated
with a server URL queue 235. A server URL queue 235 is a data structure
that identifies the URL of the Web site as well as one or more HTML links
spawned from (i.e. located within) the page. Preferably, the server URL
queue 235 includes only relative links, although this is not a limitation
of the present invention. Moreover, although the queue 235 is shown as a
dedicated portion of the memory 206, this is not a requirement, as the
queue may be a linked list or any other convenient data structure.
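Because the text leaves the representation of server URL queue 235 open, the following Python sketch shows one possible form using a double-ended queue. The class and function names are hypothetical. Each queue starts with the favorite site's own URL, links spawned from downloaded pages are appended first-in first-out, and a URL that was already queued once is not queued again.

```python
from collections import deque
from typing import Iterable, Optional


class ServerURLQueue:
    """Hypothetical server URL queue: a favorite site's URL followed by
    links spawned from its pages, served first-in first-out."""

    def __init__(self, site_url: str):
        self.site_url = site_url
        self._pending = deque([site_url])   # URLs not yet requested
        self._seen = {site_url}             # every URL ever queued

    def add_spawned(self, links: Iterable[str]) -> None:
        """Append links found in a downloaded page, skipping repeats."""
        for link in links:
            if link not in self._seen:
                self._seen.add(link)
                self._pending.append(link)

    def next_url(self) -> Optional[str]:
        """Return the next URL to request, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None


def build_queues(favorites: Iterable[str]) -> dict[str, ServerURLQueue]:
    """One queue per entry in the user's favorites list."""
    return {url: ServerURLQueue(url) for url in favorites}
```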
In a representative embodiment, the user will not have the ability to set
the time period during which the engine will cache Internet site content;
rather, this time period is predetermined by the network service provider.
Generally, this time period will be restricted, e.g., one (1) hour per
night. Therefore, according to the present invention, the cache controller
program 225 also includes an optimization routine to ensure that the modem
222 is used to its maximum capability during the restricted period of time
that Internet sites are cachable to the hard drive 220. Moreover, the
optimization routine includes a "load balancing" function to ensure that
content identified by the server URL queues is equitably cached during the
download period. As will be seen, this feature of the invention enables
the viewer to obtain a significant percentage of the downloads that he or
she desires.
This optimization routine is now described with reference to the flowchart
of FIG. 5. The primary processing of the routine begins at step 250 with a
test to determine whether it is time to stop the process, i.e. whether the
predetermined download period has expired. As discussed above, in an
exemplary embodiment, this download period is one (1) hour, although it
should be appreciated that other time period(s) may be used as well. This
period may also be selectively adjusted if desired, but typically not by
the user. If the outcome of the test at step 250 is positive, the primary
processing routine is complete at step 252. If more time is available,
then the routine continues at step 254 to query the activity on the
communication link 227 to which the modem 222 is connected. Step 254
determines how much "bandwidth" has been used since the last "iteration" or
cycle (of the routine) by receiving information from the link activity
monitor 229. The routine then continues at step 256 to test whether the
measured link activity meets some peak usage criteria.
As discussed above, a goal of the present invention is to maximize
download throughput to the cache during the download period. The peak
usage criteria generally depend on conditions on the communication
link, the modem type, or such other factors as may be predetermined or
defined. Thus, for example, the peak usage criteria may be defined by an
average link utilization over a monitored interval exceeding some preset
limit between 0% and 100%. Or, the peak usage criteria may be based on some
predefined limit on the number of outstanding HTTP GET requests that are
issued from the cache controller to the network. A given HTTP GET request is
used to request download of the content from a given Web site. Thus, for
example, the peak usage criteria may be defined to include: not less than
N total outstanding HTTP GET requests, not more than M total outstanding
GET requests (M>N), and so on. According to the present invention, any
convenient "peak usage" criteria may be used in the comparison at step
256.
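One way to combine the two example criteria in a single test is sketched below in Python. The combination itself and the default values (`util_limit=0.9`, `n_min=2`, `m_max=8`) are assumptions chosen for illustration; the text only requires that some convenient criterion be applied at step 256. Here the link counts as fully used when M requests are outstanding, never counts as fully used while fewer than N are outstanding, and otherwise depends on measured utilization.

```python
def meets_peak_usage(
    utilization: float,
    outstanding: int,
    util_limit: float = 0.9,   # hypothetical preset utilization limit
    n_min: int = 2,            # hypothetical N: minimum requests in flight
    m_max: int = 8,            # hypothetical M: maximum requests in flight
) -> bool:
    """Return True when the link counts as fully used (step 256 positive),
    so no new GET requests should be issued in this iteration."""
    if outstanding >= m_max:
        return True                   # never exceed M outstanding requests
    if outstanding < n_min:
        return False                  # always keep at least N in flight
    return utilization > util_limit   # utilization above the preset limit
```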
If the link activity meets the peak usage criteria, then the modem 222 is
being used to its maximum capacity. As a result, the outcome of the test
at step 256 is positive and the routine proceeds to step 258, which is
indicated as a delay. This box reflects that no more content requests are
submitted. The routine then returns to step 250, as previously described.
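The complete FIG. 5 loop, including steps 260 through 266 as described in the next paragraph, can be sketched in Python as follows. This is a hedged sketch, not the patented code. The function name and all parameters are hypothetical. `link_busy` stands for steps 254 and 256 and receives the outstanding-request count so it can apply either example criterion from the text. `next_url` stands for the fairness policy at step 260, and `submit` hands a URL to the cache controller. `completed` is an assumption that reports responses finished since the previous call, because this section does not say where the outstanding count is decremented. Injecting `clock` and `sleep` lets the loop be exercised without real waiting.

```python
import time
from typing import Callable, Optional


def optimize_downloads(
    period_s: float,
    link_busy: Callable[[int], bool],
    next_url: Callable[[], Optional[str]],
    submit: Callable[[str], None],
    completed: Callable[[], int],
    poll_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Keep the link busy until the download period ends or no work remains.
    Returns the number of requests still outstanding on exit."""
    deadline = clock() + period_s
    outstanding = 0
    while clock() < deadline:            # step 250: download period over?
        outstanding -= completed()       # responses that arrived meanwhile
        if link_busy(outstanding):       # steps 254-256: link fully used
            sleep(poll_s)                # step 258: submit nothing, wait
            continue
        url = next_url()                 # step 260: fairness policy
        if url is None:                  # step 262: no URL obtained
            if outstanding > 0:          # step 263: replies still pending
                sleep(poll_s)
                continue
            break                        # queues empty, nothing in flight
        submit(url)                      # step 264: hand to cache controller
        outstanding += 1                 # step 266: count the new request
    return outstanding
```

Passing the policy, link test, and cache-controller hand-off as callables keeps the loop independent of any particular fairness policy or peak usage criterion.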
If, however, the link activity does not meet the peak usage criteria, then,
in effect, the modem is not being used to its maximum capacity. This is a
negative outcome of the test at step 256. As a result, the routine
continues at step 260 to obtain another URL from a server URL queue. The
particular way in which this is accomplished will be described below, but
it will be appreciated that the present invention provides a "load
balancing" function that ensures that content identified by the server URL
queue(s) 235 is cached equitably during the session. At step 262, a test
is made to determine whether a URL was obtained from a queue. If not, the
routine continues by testing at step 263 whether the total number of
outstanding requests is greater than 0. If the result of the test at step
263 is positive, then the routine follows the path through delay 258 and
returns to step 250. If the result of the test at step 263 is negative, meaning that
no more outstanding requests exist, the routine is done and terminates.
This outcome would occur, for example, if there were no more unserviced
URLs on any server URL queue. If a URL was obtained at step 260, the
outcome of the test at step 262 is positive, and the routine continues at
step 264 to submit the URL request to the cache controller. Although not
described in detail here, it should be appreciated that the cache
controller then processes the request in a known manner to initiate the
download process. The routine then continues at step 266 to increment a count
of the number of outstanding requests. This number may be a total for the
overall session, a count per server URL queue, or both. After step
266, the routine returns a