Description
BACKGROUND OF THE INVENTION
The invention relates to a method of monitoring the use of information
provided over a computer network, and to computers for implementing the
method.
A network is a means of communicating between two or more computers or
processors, and can take many forms, including for example the internet,
infra-red, radio signals and cabling. Any medium for transporting
information from one computer to another can be regarded as a network.
Most of the existing technology in this field is based in the area of
internet or intranet Web servers. The invention will therefore be
described in the context of server logs as used in this area, but it is
applicable across many other areas of client/server interaction. Another
particular area of potential interest is in relation to e-mail.
Currently, when a computer which provides information (i.e. server) fills
an information request, it writes to a file at the server (a server log)
whatever information it has about the request. Typically this will include
the following details:
(a) what time the request reached the server;
(b) what information was requested;
(c) where the information was sent to; and
(d) how the user was referred to the information (i.e. the referrer).
Some servers, or programs designed to run in conjunction with servers, will
also send (e) an item of information to identify the recipient, and will
record this if it is subsequently included with any future information
requests. There are currently many log analysers on the market which take
the details written by the server and try to form a picture of what
information users have been looking at, when, and for how long.
Since servers only see the client requests, the information on timing that
a server log analyser provides can be inaccurate. The time spent by the
user examining the information can only be estimated by looking at the
difference in time between when one request was made by a user and when
the next request was made by the same user. If a user stops viewing the
information sent, and later returns to it to make another information
request, the time spent elsewhere may be included in the estimate for time
spent viewing the sent information. This can be extremely inaccurate, to
the extent of reporting hours spent viewing the requested information when
only seconds were actually spent.
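The estimate described above can be illustrated with a short sketch. The log entries and the function name `estimateViewingTimes` are hypothetical and are not taken from any particular log analyser; the sketch only shows how the difference between successive requests can overstate viewing time.

```javascript
// Hypothetical server-log entries for one user: request time (ms) and page.
var serverLog = [
  { time: 0, page: "index" },
  { time: 5000, page: "products" },
  // The user leaves for an hour, then returns and requests another page.
  { time: 3605000, page: "contact" }
];

// Estimate viewing time per page as the gap to the same user's next request.
// The last page gets no estimate because no later request exists.
function estimateViewingTimes(entries) {
  var estimates = [];
  for (var i = 0; i + 1 < entries.length; i++) {
    estimates.push({
      page: entries[i].page,
      estimatedMs: entries[i + 1].time - entries[i].time
    });
  }
  return estimates;
}

var estimates = estimateViewingTimes(serverLog);
```

In this example the "products" page is credited with an hour of viewing, although the server has no way of knowing how much of that hour was spent elsewhere.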
Another problem with timing is that over large networks such as the
internet, particularly when accessed over low bandwidth connections, the
time that the information takes to move from one computer to another can
also become significant, and cannot be measured by server log analysis.
When copies of the information sent by the server are stored on the client
computer and then viewed off-line, the server is not contacted and so no
record of this viewing of the copies is kept. If either the server or the
client is not connected to the network, then nothing will be seen using
log analysis even if the computers are later reconnected across the
network. This is another major deficiency in current methods.
With a Web server accessed by a standard browser, cached pages will often
be accessed whenever the `back button` or `forward button` is used. No
new request is sent to the server. Depending on individual settings, cached
pages may also be used for any page that is revisited on a Web site. Other
factors may also cause caching, but in all cases it can lead to
inaccurate timing reports being produced from server logs.
A proxy server is a server which takes a copy of information from a content
server when it is first requested, and then passes it on to each client
that requests it within a limited time period. Subsequent copies of any
information passed through a proxy server do not involve any interaction
with the content server and are therefore not recorded by the content
server.
For all these reasons current methods of server log analysis are liable to
be inaccurate and unreliable. We have appreciated that there is a need to
provide a structure which can be used to provide more effective analysis
of server usage, by overcoming at least the most significant of these problems.
The problem of proxy servers has been recognised by MatchLogic Inc., of
10333 Church Ranch Boulevard, Westminster, Colo. 80021, United States of
America, which has produced a TrueCount system with a view to ameliorating
the inaccuracies in counting resulting from proxy server use. In this
system a small element of code is added to the header on the content pages
to be counted. If the pages are cached on a proxy server, then whenever
the proxy server delivers the stored content to a subsequent user, this
added code element acts as a messenger and transmits a message to a
special server set up to receive these messages. This, however, is only a
limited solution to the problems enumerated above and does not enable the
other difficulties to be overcome. In particular, the system can take no
account of off-line viewing, and indeed has no need to, as its intended
purpose is to determine by how many users a page, or more likely an
advertising banner, has been accessed. It cannot give information on the
length of time a page was viewed, whether on or off line, or any
information of a more complex nature.
International Patent Application Publication No. WO98/10349 describes a
system for monitoring the display by a user of content (e.g.
advertisements) received from a server. The system is designed inter alia
to make it difficult for the content provider to manipulate the log file
at the server, by setting up user computers automatically to access but
not display the content, and to avoid undercounting of cached pages. This
is achieved by monitoring at the user computer the display of web pages,
rather than just requests for pages. When a page is requested, a program
is transmitted to the user which causes the user computer to determine
which part of the content is being displayed on the user's screen, and to
note either the number of times the content is displayed or the start and
finish times of such display. This information is then transmitted back to
the content provider or to another location where the monitoring
information is analysed.
There are a number of problems with this system. First, it still does not
disclose how to handle the viewing of pages off-line. Secondly, it is
limited to the monitoring of display, which is complex and may itself be
inaccurate and not properly represent the effectiveness of the content,
e.g. the advertisements, being displayed. Finally, there is no effective
way of both ensuring that the monitoring data reaches the location where
the monitoring information is analysed and also avoiding data being stored
for long and/or indefinite periods at the user computer.
Reference may also be made to International Patent Application publication
No. WO97/41673.
It is well known for web servers to send `cookies` to user computers; these
are stored on the user computer and provide information about the user to
the server.
SUMMARY OF THE INVENTION
The invention is defined in the independent claims below, to which
reference should now be made. Advantageous features of the invention are
set forth in the appendant claims.
In a preferred embodiment of the invention, described in more detail below,
a provider or sender computer transmits to a requester or receiver
computer code which causes the requester computer to monitor each time a
page is accessed or displayed, whether on-line or off-line, and to
generate a log of such usage. The log includes events which occur not only
when the requester computer is on-line to the provider computer but also
events which occur off-line. When a subsequent request for information is
made by the requester of the provider, the logged information is returned
to or accessed by the provider computer, where it can be analysed.
The provider computer also accompanies each transmitted page with a version
stamp, for example comprising a date/time code. The requester computer
stores the latest received version stamp. On each occurrence of a
specified event or events, it compares the version stamp in that page with
the stored version stamp. If the stored version stamp is older than the
version stamp of the displayed page, then it knows that the new page is
being received in response to a request it has sent to the provider
computer for information, and is not a stored page. Thus it knows that its
logged information on usage, which will have been transmitted with that
request for information, has reached the provider computer, and thus it
also knows that it can clear the log, or at least it knows what
information in the log is redundant.
Similar operations can be applied to the sending and receipt of e-mail
messages, as described below.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will now be described in more detail, by way of example, with
reference to the accompanying drawings, in which:
FIG. 1 is a flow chart illustrating the operation of a server in a
client/server system embodying the invention; and
FIG. 2 is a flow chart illustrating the corresponding operation of the
client.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
An outline description of the preferred embodiment of the invention will
first be given. The preferred embodiment takes the form of a system and
method for logging the use of a server machine (computer or processor) by
a client machine in a client/server relationship. It will be appreciated
that this represents only one possible usage of the invention and that it
is applicable to other computer networks and relationships as discussed
above. In particular, the use of the invention in monitoring e-mail is
described at the end of this description.
The preferred system comprises a method of logging that operates on both
the server machine and the client machine. In the system, the information
supplied to the client is augmented with special code which may be
executed on the client machine. This additional code, when executed,
records information about the behavior of the user, by reference to
specified events occurring at the user computer. This may include the time
the user spends accessing or looking at the information supplied. The
recorded information is stored on the client in the form of a log. The
information recorded is not limited to the time the user spends accessing
the information supplied but can include also other information about the
behavior of the user, such as how far down the page a user scrolled, or
how long they held the pointer over a particular banner or button, for
example.
The additional code is not only operational when the client is on-line to
the server but also executes when the information is being viewed
off-line.
In the augmented code is also included a version stamp, which can simply be
a date/time code. This version stamp indicates the time when, relative to
other transmissions of information to this client, the information was
transmitted by the server. This information is copied to client-side
storage. A comparison by the client computer of the version stamp which
accompanies any viewed information with the copy previously stored on the
client first confirms to the client what information has reached the
server. In particular it enables the client to determine whether the
server has been contacted in the interim, and accordingly it can also
determine how the information stored on the client should be kept. This
includes, for example, whether old information can be purged, whether new
information is now required, and whether more or less information about
each access should be stored. Examples of this could be:
(1) Purging all client side records that have been passed to the server.
(2) Deciding after a set number of accesses that a user was a regular user
and could be allowed greater access.
(3) Storing information about a user's behavior whilst viewing an
information set if they have previously viewed information designed to
influence that behavior.
The server can assign to each client a unique identifier (UID) which it
then passes to the client along with the version stamp appropriate to the
particular issue of the particular information set requested. The unique
client or user UID is then stored at the client side for subsequent use by
the server.
The system makes it possible to detect the viewing of old copies of the
information provided, stored on the user's hard disk, and, if required, to
stop them from being viewable.
It is possible to relate the amount of information stored about a set of
sent information to how often it, or any other set of sent information
from the same server, has been viewed.
The code at the client side can make decisions about information storage,
that is, it can decide the appropriate record-keeping for, and behavior of,
the information set.
A specific example will now be described with reference to the drawings, in
which FIGS. 1 and 2 are flowcharts illustrating the execution of the
method on the server and client computers respectively. The Appendix to
this description contains a sample code fragment for the client side
record keeping encoded in current JavaScript form.
The routine illustrated in FIG. 1 shows the steps which are taken at the
server following receipt of an information request from a client, step 10.
First a check is made to find out whether this client already has a unique
identifier (UID), step 12, and if not, one is assigned, step 14. Then the
requested information is gathered and collated, step 16. In the
illustrated example no additional processing of the information takes
place. However, in a more sophisticated system, additional processing
could take place which was dependent upon whether a unique identifier was
received, and if so, what it was.
Next, the date/time code, constituting a `version stamp`, is incorporated
into the information to be sent, along with the unique identifier (UID)
and the client-side code required to implement the system, step 18. As
noted above, code sent in this way and stored at the recipient is commonly
known in internet parlance as a `cookie`. As described, all the code is
sent with every page, but in a more sophisticated system the sending of
the client-side code could be dependent upon the logging information, as
just described above.
The requested information is now sent, step 20, thus achieving a reasonable
response time for the client. All recording or logging at the server of
information sent takes place after the sending of the information to the
client, and thus does not delay the sending of the information. A check is
made, step 22, as to whether the client has sent with the request any
information which needs to be logged. If the client has sent logging
information, this will mean that it has previously received the code and a
UID, and so the logging information is passed and stored as a
record of that client's behavior.
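The server routine of FIG. 1 might be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the names `handleRequest`, `behaviorLog`, `pages` and `nextUid` are hypothetical, the version stamp is passed in as a number, and the client-side code that would accompany the page is omitted.

```javascript
// Minimal in-memory sketch of the FIG. 1 routine. All names are hypothetical.
var nextUid = 1;                         // source of unique identifiers (step 14)
var behaviorLog = [];                    // server-side record of client logs
var pages = { home: "<p>Welcome</p>" };  // stand-in content store

function handleRequest(request, now) {
  // Steps 12 and 14: reuse the client's UID, or assign one if it has none.
  var uid = request.uid != null ? request.uid : nextUid++;
  // Step 16: gather the requested information.
  var content = pages[request.page];
  // Step 18: attach the version stamp (here a date/time code) and the UID.
  var response = { uid: uid, versionStamp: now, content: content };
  // Step 20 would send the response at this point; logging comes afterwards
  // so that it does not delay the client.
  // Step 22: store any log the client sent along with the request.
  if (request.log) {
    behaviorLog.push({ uid: uid, receivedAt: now, log: request.log });
  }
  return response;
}

var first = handleRequest({ page: "home" }, 1000);
var second = handleRequest({ page: "home", uid: first.uid, log: "home/" }, 2000);
```

The second request carries both the UID assigned by the first response and a log, so only the second call adds an entry to `behaviorLog`.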
What happens at the client will now be seen by reference to FIG. 2. The
client request, which is basically conventional, and which takes place
before the routine of FIG. 1, need not be shown. It may, however, include
logging information as discussed below. FIG. 2 illustrates what happens at
the client upon receipt of the information from the content server.
Information may come directly from the server or may have been stored or
cached. When information is received at the client side by the user, the
client-side code which was sent by the server first causes the client
system to check, step 30, to see if a copy of the hard-coded version stamp
(date/time code) exists, in a cookie received from the server. If it does
not exist, the procedure moves straight to step 34 and copies the
hard-coded version stamp to a cookie. If it does exist, then the procedure
determines, in step 32, whether the version stamp stored at the client is
the same as, or alternatively is older than, the currently-received version
stamp. If the stored version stamp is older than the currently-received
version stamp, indicating that new information has been received on-line
from the server, then the procedure moves to step 34, and the new version
stamp is copied to the cookie to replace the version stamp which was
previously stored there. If the version stamps are the same, then the
information being displayed is stored or cached information and has not
just been received on-line from the content server.
After step 34, the procedure moves to step 38, where the log of information
which has been sent to the server is deleted. The precise form of step 38,
however, will depend on the particular implementation. In the example just
described the only events being monitored are load events, so all
information will have been passed back to the server, as a log cookie,
when the information containing the latest version stamp was requested.
Thus all log events can be deleted. As this situation will only obtain
when the output of comparison step 32 is `yes`, this step 38 could in this
example be moved to between steps 32 and 34. In any event, the fact that
the stored version stamp is older than the version stamp of the displayed
page indicates that the new page is being received in response to an
information request the client has sent to the server. Thus the client
knows that the logged information on usage, which will have accompanied
that information request, has reached the server. This is why it knows it
can clear the log. In fact, the log may not actually be cleared, as some
of the information may be kept as a backup, but at least the client knows
what information is now redundant.
Finally, the procedure reaches step 36, either directly from step 32, if
the version stamps are the same, or via steps 34 and 38. In step 36 the
current event, that is viewing the received information, is appended to
the log cookie.
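The client routine of FIG. 2 might be sketched as follows. A plain object stands in for the browser's cookie store so that the logic can be followed in isolation; the names `onInformationDisplayed`, `store`, `versionStamp` and `log` are hypothetical, and version stamps are assumed to be numbers that increase over time.

```javascript
// Sketch of the FIG. 2 routine. A plain object stands in for the cookie
// store; all names are hypothetical.
function onInformationDisplayed(store, pageStamp, pageName) {
  var stored = store.versionStamp;
  // Step 30: is there a stored stamp?  Step 32: is it older than the page's?
  var isNew = stored === undefined || stored < pageStamp;
  if (isNew) {
    // Step 34: copy the page's version stamp to the store.
    store.versionStamp = pageStamp;
    // Step 38: the old log accompanied the request that fetched this page,
    // so it has reached the server and can be cleared.
    store.log = "";
  }
  // Step 36: append the current load event to the log.
  store.log = (store.log || "") + pageName + "/";
  return isNew;
}

var store = {};
onInformationDisplayed(store, 1000, "home");      // first, on-line view
onInformationDisplayed(store, 1000, "home");      // cached copy viewed again
var logBeforeNewFetch = store.log;
onInformationDisplayed(store, 2000, "products");  // fresh page from the server
```

Viewing the cached copy leaves the stored stamp unchanged and simply extends the log; only the arrival of a page with a newer stamp causes the log to be cleared.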
As described, the only event being monitored and logged is the loading of
information, that is to say, an entry is made only when the information is displayed, and
this represents the simplest embodiment of the invention. The power of the
system is however such that many other different events can be monitored.
Other sophistications can be introduced. For example, the appearance of
the information, or how much of it is displayed, can be changed dependent
upon the output of step 32.
The code illustrated in the Appendix can readily be adapted to monitor
other events, e.g. the length of time for which information is viewed, by including
extra code similar to the last section of the code (from `function
GCLoaded`) to monitor the additional events. The preferred solution would
be to monitor every event occurring at the client computer and analyse the
monitoring information by looking for specified events only. The
identification of relevant events could be made at the server or they
could be pre-filtered at the client. The displayed information would then
need to be examined only when the display is altered.
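Pre-filtering at the client, as mentioned above, might look like the following sketch. The event type names, the shape of the event records and the function name `prefilterEvents` are all hypothetical.

```javascript
// Sketch of pre-filtering at the client: every event is captured, but only
// the specified kinds are kept for the log. All names are hypothetical.
var specifiedEvents = ["load", "scroll"];

function prefilterEvents(allEvents, wantedTypes) {
  return allEvents.filter(function (event) {
    return wantedTypes.indexOf(event.type) !== -1;
  });
}

var captured = [
  { type: "load", time: 0 },
  { type: "mousemove", time: 120 },
  { type: "scroll", time: 900 },
  { type: "mousemove", time: 950 }
];
var toLog = prefilterEvents(captured, specifiedEvents);
```

The same function could equally be run at the server on an unfiltered log, at the cost of transmitting more data from the client.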
The form in which the log is kept can be chosen to suit any particular
application. For example, the log can contain the page name, start time,
and finish time, or the page name and viewing duration.
The log is transmitted to the server simply as a variable-length character
string. As described, the log is cleared whenever a new version stamp is
received. However, if desired, some or all of the records can be
duplicated by storing them on both the client and server.
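One possible layout for such a character-string log is sketched here. The separators, the field order and the function names `appendRecord` and `parseLog` are assumptions made for illustration, and page names are assumed not to contain either separator.

```javascript
// One possible log layout: records separated by "/", fields by ";".
// Separators, field order and function names are assumptions.
function appendRecord(log, pageName, startMs, finishMs) {
  return log + pageName + ";" + startMs + ";" + finishMs + "/";
}

// Turn the character string back into records, adding the viewing duration.
function parseLog(log) {
  return log.split("/").filter(function (record) {
    return record.length > 0;
  }).map(function (record) {
    var fields = record.split(";");
    var start = Number(fields[1]);
    var finish = Number(fields[2]);
    return { page: fields[0], start: start, finish: finish, duration: finish - start };
  });
}

var log = appendRecord("", "home", 0, 4000);
log = appendRecord(log, "products", 4000, 9500);
var records = parseLog(log);
```

Storing the start and finish times lets the duration be derived when the log is read; storing only the page name and duration would give a shorter string.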
It will be seen from the foregoing that the server transmits with each
transmission of sent information a version stamp which enables the client
code to determine whether this is a new transmission of information or
not. Furthermore, the client keeps a log of usage made and transmits this
log to the server on the next occasion when a request for information is
made of that server.
The version stamp does not have to be a date/time code, though this is a
particularly convenient form for it to take. It could in principle simply
be a serial number which is incremented on each transmission from the
server, either to that UID or generally, or any other value which changes
unidirectionally. It could be the last date/time code from the log most
recently received from the client.
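A serial-number version stamp kept per UID might be sketched as follows. The names `lastStamp` and `nextVersionStamp` are hypothetical; the only property relied upon is that each stamp issued to a given UID is greater than the one before it.

```javascript
// Sketch of a serial-number version stamp kept separately for each UID.
// All names are hypothetical.
var lastStamp = {};

function nextVersionStamp(uid) {
  // Increment on every transmission so that stamps only ever increase.
  lastStamp[uid] = (lastStamp[uid] || 0) + 1;
  return lastStamp[uid];
}

var a1 = nextVersionStamp("client-a");
var a2 = nextVersionStamp("client-a");
var b1 = nextVersionStamp("client-b");
```

Because the client only tests whether its stored stamp is older than the one in the page, any value that changes in one direction serves as well as a date/time code.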
If proxy servers are in use, the UID received from the proxy will probably
not match the UID for the client. In any event, it is as though the client
and the server were not in communication, and the system operates exactly
as it does when the client is viewing information off-line. However, in
the present system the information transmitted from the content server to
a proxy server preferably gives a zero validity time for the proxy server
to hold the information. Effectively this disables the function of the
proxy server. Setting the limited validity time period to zero in this way
is a known technique. If this is not done, the system will still operate
so long as at least occasional requests from the client get back to the
content server.
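The description does not name the mechanism used to give a zero validity time. Assuming the information is delivered over HTTP, one way of expressing it is through the standard `Cache-Control` and `Expires` response headers, as in this sketch; the function name `zeroValidityHeaders` is hypothetical.

```javascript
// Sketch: response headers that give cached copies a zero validity time.
// Assumes delivery over HTTP; the function name is hypothetical.
function zeroValidityHeaders() {
  return {
    // A maximum age of zero means a stored copy is stale at once.
    "Cache-Control": "max-age=0, must-revalidate",
    // An Expires value of "0" is treated by caches as already expired.
    "Expires": "0"
  };
}

var headers = zeroValidityHeaders();
```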
The structure of internet pages is such that although each page is
relatively simple, the links and interactions between them can rapidly
lead to a very complex decision-making tree. Keeping track of viewing is
not easy with such a structure. However, the method described of simply
sending a version stamp to identify a transmission of a desired page or
pages, and its use in logging the events which it is desired to monitor,
and particularly in clearing the log, leads to a particularly effective
way of monitoring the desired viewing and/or other events, without
excessive complexity and without having to retain a log of possibly
formidable size.
The specific embodiment of the invention described and illustrated is a Web
session tracking program to follow a visitor to a web site, and to note
their behavior patterns. The Appendix includes some sample code for
keeping track of any individual web page's timing information. It does not
include any attempt to compress the information, nor any error handling,
both of which could advantageously be added. The code included is generic
to all pages of a site, with the places where the code is unique to a
specific page being omitted for clarity.
Here a typical application would be to follow a user around a Web site by
looking at the time they spend on each page, both while on and off line.
This is one way of measuring the effectiveness of each page. Determining
the time a user spent on a given page is simple and accurate using the
client side records. The server in this example tags each set of timing
information with the unique identifier for that particular user, thus
allowing analysis of how different pages interact.
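The kind of analysis that the UID tagging allows might be sketched as follows. The record layout and the function names `timePerPage` and `pagesViewedBy` are hypothetical, and the figures are invented solely to exercise the code.

```javascript
// Hypothetical timing records as stored by the server, each tagged with
// the UID of the user they came from.
var timingRecords = [
  { uid: 1, page: "home", durationMs: 4000 },
  { uid: 1, page: "products", durationMs: 5500 },
  { uid: 2, page: "home", durationMs: 1000 }
];

// Total time spent on each page, across all users.
function timePerPage(records) {
  var totals = {};
  records.forEach(function (record) {
    totals[record.page] = (totals[record.page] || 0) + record.durationMs;
  });
  return totals;
}

// The sequence of pages viewed by one user, in recorded order.
function pagesViewedBy(records, uid) {
  return records.filter(function (record) {
    return record.uid === uid;
  }).map(function (record) {
    return record.page;
  });
}

var totals = timePerPage(timingRecords);
var userOnePath = pagesViewedBy(timingRecords, 1);
```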
In the Appendix, the hardcoded version stamp has a value which is set by
the server. The setting of this value is not included in the code
fragment.
The use of the version stamp in the manner described has the advantage that
the client can be sure that the previous monitoring information has
reached the server, and thus that the monitoring information stored at the
client no longer needs to be retained.
Similar principles to those described can be used to monitor e-mail
messages sent over the internet. This is achieved as follows. Existing
e-mail programs allow the e-mail messages to be sent as HTML code, that is
the code used for web pages. This option is selected at the sender
computer. The message automatically carries with it the additional code
necessary to undertake the monitoring at the receiving computer. The
monitoring may simply confirm the receipt of the message but may include
other information about the user or activities at the receiving computer.
It is then necessary for the receiver to contact the sender so that the
monitoring information is transmitted back to the sender. This can be
achieved in one of two basic ways. In the first of these, the message
includes something that causes the user to wish to contact the sender
again, for example by sending a reply e-mail. Alternatively, the message
may incorporate something that makes the computer contact the server
independently of the user commands. This may be achieved by including a
graphic towards the end of the message which has to be downloaded. The
graphic may be a one-pixel picture which is in fact imperceptible to the
user.
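The second approach might be sketched as follows: an HTML message body is built with a one-pixel graphic at its end, so that downloading the graphic contacts the sender's server. The address `sender.example`, the query parameter `msg` and the function name `buildTrackedMessage` are hypothetical.

```javascript
// Sketch: build an HTML e-mail body ending with a one-pixel graphic whose
// download contacts the sender's server. URL and names are hypothetical.
function buildTrackedMessage(bodyHtml, messageId) {
  var src = "https://sender.example/pixel.gif?msg=" +
    encodeURIComponent(messageId);
  return "<html><body>" + bodyHtml +
    '<img src="' + src + '" width="1" height="1" alt="">' +
    "</body></html>";
}

var message = buildTrackedMessage("<p>Hello</p>", "msg 42");
```

In the system described, any monitoring information would accompany the request for the graphic in the same way that the log accompanies a page request.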
The invention has been described in the context of a specific example, and
those skilled in the art will appreciate that many modifications may be
made in the system described and illustrated. The system enables the
monitoring of the use of information provided over a computer network and,
in the form described, can monitor use not only when the provider and
client machine are both connected to the network, but also when one or
both of them are disconnected from the network. Also, the system is able
to function in conjunction with other systems which are supplied with
other information sets from the same information provider without harmful
interference.
Appendix
Sample JavaScript code fragment for client side record keeping:
<script language="JavaScript">
<!--
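// BaseTime (the hard-coded version stamp, a Date whose value is set by the
// server) and the cookie helpers GetCookie and SetCookie are not part of
// this fragment and are assumed to be defined elsewhere on the page.
var expdate = new Date();
// Start the log cookie on the first visit.
if (GetCookie('clstr') == null) {
  SetCookie('clstr', 'Entered site;');
}
var pname = ';' + document.title + ';';
var KnownTime;
CheckTime();
expdate = new Date();
// Milliseconds elapsed since the version stamp was issued.
var RelativeTime = expdate.getTime() - BaseTime.getTime();

// If the stored stamp differs from the hard-coded one, new information has
// arrived from the server: store the new stamp and restart the log.
function CheckTime() {
  KnownTime = GetCookie('Time1');
  // The cookie holds a string, so compare against the stamp's string form.
  if (KnownTime != String(BaseTime)) {
    SetCookie('Time1', String(BaseTime));
    // Previous log, read before the reset (not used further in this fragment).
    var clstrCookie = GetCookie('clstr');
    SetCookie('clstr', 'StartStream/');
  }
}

// Append a load event for the given page to the log cookie.
function GCLoaded(PageNumberLoad) {
  expdate = new Date();
  RelativeTime = expdate.getTime() - BaseTime.getTime();
  SetCookie('clstr', GetCookie('clstr') + PageNumberLoad + '/');
}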
//-->
</script>