Claims
What is claimed is:
1. A method for identifying textual documents and multi-media files
corresponding to a search topic, comprising the steps of:
(A) storing document records each of which is representative of one of a
plurality of textual documents, said document records having text
information fields associated therewith, each of said text information
fields representing text from one of said plurality of textual documents;
(B) storing multi-media records each of which is representative of one of a
plurality of multi-media files, said multi-media records having
multi-media information fields for representing only digital video or
audio information and associated text fields, each of said associated text
fields representing text associated with one of said multi-media
information fields;
(C) receiving a single search query corresponding to said search topic;
(D) searching an index database in accordance with said single search query
to simultaneously identify document records and multi-media records
related to said single search query, said index database having a
plurality of search terms corresponding to terms represented by said text
information fields and said associated text fields, said index database
including a table for associating each of said document and multi-media
records with one or more of said search terms;
(E) generating a search result list having entries representative of both
textual documents and multi-media files related to said single search
query in accordance with said document records and said multi-media
records identified in step (D);
(F) retrieving text corresponding to said search topic by selecting entries
from said search result list representing selected document records to be
retrieved, and then retrieving text represented by text information fields
associated with said selected document records; and
(G) retrieving digital video or audio information corresponding to said
search topic by selecting entries from said search result list
representing selected multi-media records to be retrieved, and then
retrieving digital video or audio information represented by multi-media
information fields associated with said selected multi-media records.
2. The method of claim 1, wherein said document records and said
multi-media records are formed from header files stored in a single common
format on said database.
3. The method of claim 2, wherein said multi-media records include a
plurality of still image records each of which is representative of a
still image.
4. The method of claim 3, wherein said multi-media records include a
plurality of motion video records each of which is representative of a
sequence of motion video frames.
5. The method of claim 4, wherein said multi-media records include a
plurality of digital audio records each of which is representative of a
sequence of digital audio frames.
6. The method of claim 1, wherein step (E) further comprises the step of
relevance ranking said document and multi-media records identified in step
(D) by generating a relevance score corresponding to each of said entries
in said search result list.
7. The method of claim 6, wherein step (E) further comprises the step of
forming a relevance ordered search result list by ordering said entries in
said search result list in accordance with said relevance ranking such
that an entry with a highest relevance ranking represents a first entry on
said relevance ordered search result list.
8. The method of claim 7, wherein entries corresponding to said document
records identified in step (D) and entries corresponding to said
multi-media records identified in step (D) are interspersed within said
relevance ordered search result list.
9. The method of claim 1, wherein said single search query is in a natural
language format.
10. An apparatus for identifying textual documents and multi-media files
corresponding to a search topic, comprising:
(A) means for storing document records each of which is representative of
one of a plurality of textual documents and multi-media records each of
which is representative of one of a plurality of multi-media files, said
document records having text information fields associated therewith, each
of said text information fields representing text from one of said
plurality of textual documents, said multi-media records having
multi-media information fields for representing only digital video or
audio information and associated text fields, each of said associated text
fields representing text associated with one of said multi-media
information fields;
(B) means for receiving a single search query corresponding to said search
topic;
(C) searching means, coupled to an index database and said means for
receiving said single query, for searching said database in accordance
with said single search query to simultaneously identify document records
and multi-media records related to said single search query, said index
database having a plurality of search terms corresponding to terms
represented by said text information fields and said associated text
fields, said index database including a table for associating each of said
document and multi-media records with one or more of said search terms;
(D) search result list generation means, coupled to said searching means,
for generating a search result list having entries representative of both
textual documents and multi-media files related to said single search
query in accordance with said document records and said multi-media
records identified by said searching means;
(E) means for receiving signals representing selected document records and
selected multi-media records identified on said search results list;
(F) first means for retrieving, from said means for storing, text
represented by text information fields associated with said selected
document records; and
(G) second means for retrieving, from said means for storing, digital video
or audio information represented by multi-media information fields
associated with said selected multi-media records.
11. The apparatus of claim 10, wherein said document records and said
multi-media records are formed from header files stored in a single common
format on said database.
12. The apparatus of claim 11, wherein said multi-media records stored on
said database include a plurality of still image records each of which is
representative of a still image.
13. The apparatus of claim 12, wherein said multi-media records stored on
said database include a plurality of motion video records each of which is
representative of a sequence of motion video frames.
14. The apparatus of claim 13, wherein said multi-media records stored on
said database include a plurality of digital audio records each of which
is representative of a sequence of digital audio frames.
15. The apparatus of claim 10, wherein said search result list generating
means includes means for relevance ranking said document and multi-media
records identified by said searching means by generating a relevance score
corresponding to each of said entries in said search result list.
16. The apparatus of claim 15, wherein said result list generating means
further comprises means for forming a relevance ordered search result list
by ordering said entries in said search result list in accordance with
said relevance ranking such that an entry with a highest relevance ranking
represents a first entry on said relevance ordered search result list.
17. The apparatus of claim 16, wherein entries corresponding to said
document records identified by said searching means and entries
corresponding to said multi-media records identified by said searching
means are interspersed within said relevance ordered search result list.
18. The apparatus of claim 10, wherein said single search query is in a
natural language format.
Description
FIELD OF THE INVENTION
The present invention is directed to systems for identifying documents
corresponding to a search topic or query. More particularly, the present
invention is directed to an automated multi-user system for identifying
and retrieving text and multi-media files related to a search topic from a
database library composed of information from many various publisher
sources.
BACKGROUND OF THE INVENTION
Information retrieval systems are designed to store and retrieve
information provided by publishers covering different subjects. Both
static information, such as works of literature and reference books, and
dynamic information, such as newspapers and periodicals, are stored in
these systems. Information retrieval engines are provided within prior art
information retrieval systems in order to receive search queries from users
and perform searches through the stored information. It is an object of
most information retrieval systems to provide the user with all stored
information relevant to the query. However, many existing
searching/retrieval systems are not adapted to identify the best or most
relevant information yielded by the query search. Such systems typically
return query results to the user in such a way that the user must retrieve
and view every document returned by the query in order to determine which
document(s) is/are most relevant. It is therefore desirable to have a
document searching system which not only returns a list of relevant
information to the user based on a query search, but also returns the list
to the user in such a form that the user can readily identify which
information returned from the search is most relevant to the query topic.
Existing systems for searching and retrieving files from databases based on
user queries are directed primarily to the searching and retrieval of
textual documents. However, there is a growing volume of multi-media
information being published which is not textual. Such multi-media
information corresponds, for example, to still images, motion video
sequences and digital audio sequences, which may be stored and retrieved
by digital computers. It would be desirable from the point of view of an
individual using an information searching/retrieval system to be able to
query a library or database and identify not only text documents, but also
multi-media files that are relevant to the user's query.
Moreover, it would be desirable if the searching system could return to
the user not only a single list having both text and multi-media
information relevant to the query search, but also a list which enabled
the user to readily identify which of the text and multi-media files were
most relevant to the query topic.
Each different publisher providing documents that may be retrieved by
information retrieval systems typically uses its own information format to
store and transmit its information files. Thus, an information
searching/retrieval system which has a library database based upon
information from many various publishers must be compatible with many
different publisher formats. This compatibility requirement can serve to
slow the performance of an information searching/retrieval system.
It is well known in the prior art of information retrieval systems to
permit a user to specify a single subject of a number of subjects for
searching. For example, a user may wish to search only sports literature,
medical literature or art literature. This avoids unnecessary searching
through database documents that are not relevant to the subject of
interest to the user. In order to provide this capability, information
retrieval systems must categorize documents received from publishers
according to their subject prior to adding them to the database.
Subjecting of incoming documents often requires an individual to read each
incoming document and make a determination regarding its subject. This process is
very time consuming and expensive, as there is often a large number of
incoming documents to be processed. The subjecting process may be further
complicated if certain documents should properly be categorized in more
than one subject. It would be desirable to have an automated system for
processing incoming documents which categorized each incoming document
into one or more subjects, and which did not require an individual to read
each incoming document and make a separate judgment categorizing the
subject of such document.
When a user of an information searching/retrieval system enters a search
query into the system, the query must be parsed. Based on the parsed
query, a listing of stored documents relevant to the query is provided to
the user for review. In the prior art, it is known to use semantic
networks when parsing a query. Semantic networks make it possible to
identify words not appearing in the query, but which correspond to or are
associated with the words used in the query. The number of words used to
search the database is then expanded by including the corresponding words
or associated words identified by the semantic network in the search
instructions. This procedure is used to increase the number of relevant
documents located by the information searching/retrieval system. Although
semantic networks may be useful for finding additional relevant documents
responsive to a query, it is believed that use of such networks also tends
to increase the number of irrelevant documents located by the search. In
fact, it is generally believed that the number of additional relevant
documents identified through the use of semantic networks is roughly equal
to the number of irrelevant documents which are also brought into the
search results list as a result of the semantic network. It would be
desirable to have a system for implementing a semantic network which
maximized the number of relevant documents identified during the search,
without substantially increasing the number of irrelevant documents found
by the search.
Many publishers that provide documents to information retrieval systems
require record-keeping in order to ensure accurate royalty payments.
Record-keeping permits the publishers to determine the interest level in
various documents produced by the publisher, and the demographics of users
retrieving such documents. Thus, it would be desirable to have a
searching/retrieval system that tracked not only how often each document
stored in the system database was retrieved by users, but also the
demographics of the users retrieving the documents and the query searches
used to identify and retrieve such documents.
It is therefore an object of the present invention to provide a
searching/retrieval system which can query a library or database and
identify not only text documents, but also multi-media files stored on the
library or database that are relevant to the query.
It is a further object of the present invention to provide a
searching/retrieval system that accepts a query and returns a single
search results list having both text and multi-media information, which
list is presented in a format that enables the user to readily identify
which of the text and multi-media files are most relevant to the query
topic.
It is a still further object of the present invention to provide a scalable
computer architecture for implementing a searching/retrieval system which
can query a database and identify text documents and multi-media files
stored on the database that are relevant to the query.
It is a still further object of the present invention to provide an
information searching/retrieval system which has a library database based
upon information from many various publishers, and which is compatible
with many different publisher formats.
It is a still further object of the present invention to provide an
information searching/retrieval system which has a library database based
upon information from many various publishers, and wherein such
information is stored in a central database in one or more common
information formats.
It is a still further object of the present invention to provide an
automated system for processing incoming documents to be stored on a
library or database, which system categorizes each incoming document into
one or more subjects, and which does not require an individual to read
each incoming document and make a separate judgment categorizing the
subject of such document.
It is a still further object of the present invention to provide a system
for implementing a semantic network which maximizes the number of relevant
documents identified during the query search, without substantially
increasing the number of irrelevant documents found by the search.
It is a still further object of the present invention to provide a system
for using a semantic network which maximizes the number of relevant
documents identified during a query search by semantically expanding the
search in response to the part of speech associated with each query term
in the search.
It is a still further object of the present invention to provide a
searching system that queries a database to determine text documents and
multi-media files relevant to the query, wherein weightings associated
with proper nouns and slow words are adjusted prior to searching the
database.
It is a further object of the present invention to provide a
searching/retrieval system that accepts a query and returns a single
search results list including document relevance values, wherein the
document relevance values are independent of the number of terms in the
query.
It is yet a still further object of the present invention to provide a
searching/retrieval system that tracks not only how often each document
stored in the system database was retrieved by users, but also the
demographics of the users retrieving the documents and the query searches
used to identify and retrieve such documents.
These and other objects and advantages of the invention will become more
fully apparent from the description and claims which follow or may be
learned by the practice of the invention.
SUMMARY OF THE INVENTION
The present invention is directed to a method and apparatus for identifying
textual documents and multi-media files corresponding to a search topic. A
plurality of document records, each of which is representative of at least
one textual document, are stored, and a plurality of multi-media records,
each of which is representative of at least one multi-media file, are
also stored. The document records have text information fields associated
therewith, each of the text information fields representing text from one
of the plurality of textual documents. The multi-media records have
multi-media information fields for representing only digital video (i.e.,
still images or motion video image sequences), digital audio or graphics
information, and associated text fields, each of the associated text
fields representing text associated with one of the multi-media
information fields. A single search query corresponding to the search
topic is received. The single search query is preferably in a natural
language format. An index database is searched in accordance with the
single search query to simultaneously identify document records and
multi-media records related to the single search query. The index database
has a plurality of search terms corresponding to terms represented by the
text information fields and the associated text fields. The index database
also includes a table for associating each of the document and multi-media
records with one or more of the search terms. A search result list having
entries representative of both textual documents and multi-media files
related to the single search query is generated in accordance with the
document records and the multi-media records identified by the index
database search. Text corresponding to the search topic is retrieved by
selecting entries from the search result list representing document
records to be retrieved, and then retrieving text represented by the text
information fields associated with the selected document records. Digital
video, audio or graphics information corresponding to the search topic is
retrieved by selecting entries from the search result list representing
selected multi-media records to be retrieved, and then retrieving digital
video, audio or graphics information represented by multi-media
information fields associated with the selected multi-media records.
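The single-query search described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the `Record` type, the whitespace tokenizer, the `payload` field and the match-count ordering are all assumptions made for the example. The index maps each search term to the records whose text (or associated text) contains it, so that one query identifies document and multi-media records together.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    kind: str      # "document" or "multimedia"
    text: str      # document text, or the text associated with a media file
    payload: str   # hypothetical: the stored text, or a path to the media file


def build_index(records):
    """Build the table associating each search term with record ids."""
    index = defaultdict(set)
    for rec in records:
        for term in rec.text.lower().split():
            index[term].add(rec.record_id)
    return index


def search(index, records_by_id, query):
    """Return one result list mixing document and multi-media records."""
    hits = defaultdict(int)
    for term in query.lower().split():
        for record_id in index.get(term, ()):
            hits[record_id] += 1
    # Assumed ordering: most matched query terms first, then record id.
    ordered = sorted(hits, key=lambda rid: (-hits[rid], rid))
    return [records_by_id[rid] for rid in ordered]


records = [
    Record("d1", "document", "apollo moon landing history", "full text of d1"),
    Record("m1", "multimedia", "video of the moon landing", "m1.mpg"),
    Record("d2", "document", "history of jazz", "full text of d2"),
]
records_by_id = {r.record_id: r for r in records}
index = build_index(records)
results = search(index, records_by_id, "moon landing")
```

Selecting an entry from `results` and reading its `payload` corresponds to the retrieval step: text for a document record, video or audio for a multi-media record.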
In accordance with a further aspect, the present invention is directed to a
computer-implemented method and apparatus for composing a composite
document on a selected topic from a plurality of information sources by
searching the plurality of information sources and identifying, displaying
and copying files corresponding to the selected topic. A plurality of
records, each of which is representative of at least one information file,
are stored in a database. A single search query corresponding to the
search topic is received. The database is searched in accordance with the
single search query to identify records related to the single search
query. A search result list is then generated having entries
representative of information files identified during the database search,
and the search result list is displayed in a first display window open on
a user display. Signals representative of at least first and second
selected entries from the search result list are received from the user,
the first and second selected entries respectively corresponding to first
and second information files. A second display window for displaying at
least a portion of the first information file is opened on the user
display, a third display window for displaying at least a portion of the
second information file is opened on the user display, and a document
composition window for receiving portions of the first and second
information files is opened on the user display. The composite document is
then composed by copying portions of the first and second information
files from the second and third display windows, respectively, to the
document composition window.
In accordance with a still further aspect, the present invention is
directed to a split-server architecture for processing a search query
provided by a user, and identifying and retrieving documents from a
database corresponding to the search query. A session server is provided
for receiving the search query from the user. The session server has at
least a first processor coupled to the user over a communications channel.
A query server is coupled to the session server. The query server has at
least a second processor coupled to a first database having records
representative of the documents to be searched. The query server includes
means for receiving the search query from the session server, searching
means for searching the first database to identify documents responsive to
the search query, and means for sending search results information
representative of the documents identified by the searching means from the
query server to the session server. The session server includes means for
sending the search query to the query server, means for receiving the
search results information from the query server, means for sending a
search results list representative of the search results information
across the communications channel to the user, means for receiving a
document retrieval request transmitted from the user over the
communications channel, means for retrieving a document in response to the
retrieval request and transmitting a file representative of the document
to the user over the communications channel, and means for incrementing an
accounting record on an accounting database coupled to the session server,
the accounting record representing a number of retrievals of the document
by the session server.
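The division of work between the two servers might be sketched as below. The class names, the in-memory dictionaries and the term-overlap search are assumptions for illustration only; in the described architecture the servers run on separate processors and the accounting record lives in a database coupled to the session server.

```python
from collections import Counter


class QueryServer:
    """Owns the searchable records and answers search requests."""

    def __init__(self, records):
        self.records = records  # hypothetical store: {doc_id: text}

    def search(self, query):
        terms = set(query.lower().split())
        return sorted(
            doc_id for doc_id, text in self.records.items()
            if terms & set(text.lower().split())
        )


class SessionServer:
    """Faces the user: forwards queries, serves documents, counts retrievals."""

    def __init__(self, query_server, documents):
        self.query_server = query_server
        self.documents = documents
        self.accounting = Counter()  # stands in for the accounting database

    def handle_query(self, query):
        # Send the query to the query server and relay the results list.
        return self.query_server.search(query)

    def handle_retrieval(self, doc_id):
        document = self.documents[doc_id]
        self.accounting[doc_id] += 1  # one more retrieval of this document
        return document


docs = {"d1": "solar eclipse photographs", "d2": "lunar eclipse report"}
session = SessionServer(QueryServer(docs), docs)
result_list = session.handle_query("eclipse")
session.handle_retrieval("d2")
session.handle_retrieval("d2")
```

Keeping the retrieval count on the session side means the query server only ever handles search traffic, which is one way the two roles could be scaled independently.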
In accordance with a still further aspect, the present invention is
directed to a method for preparing input information having differing
input formats from different information sources for storage in an
information retrieval system having a database with a database index for
retrieval of the input information from the database. First and second
input information having differing input information formats are received.
The input information in one format is converted from the input format to
an information retrieval system format to provide reformatted information.
The information from the other information format is converted into the
information retrieval system format to provide further reformatted
information, whereby the input information in the differing input formats
is converted into a single information retrieval system format. The
reformatted information is stored in the database according to the single
information retrieval system format and retrieved from the database
according to the single information retrieval system format.
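As a rough sketch of the conversion step, each input format can be given its own converter that produces the same system-format record. The two publisher formats shown here (a pipe-delimited line and a dictionary with `headline` and `text` keys) and the `title`/`body` system format are invented for the example; the text does not specify them.

```python
def from_publisher_a(raw):
    """Hypothetical input format A: 'TITLE|BODY' on one line."""
    title, body = raw.split("|", 1)
    return {"title": title.strip(), "body": body.strip()}


def from_publisher_b(raw):
    """Hypothetical input format B: a dict with 'headline' and 'text' keys."""
    return {"title": raw["headline"].strip(), "body": raw["text"].strip()}


# One converter per input format; every converter emits the same record shape.
CONVERTERS = {"a": from_publisher_a, "b": from_publisher_b}


def ingest(database, source, raw):
    """Convert input to the single system format, then store it."""
    database.append(CONVERTERS[source](raw))


database = []
ingest(database, "a", "Tides | Tides follow the moon.")
ingest(database, "b", {"headline": "Comets", "text": "Comets have tails."})
```

Because everything stored shares one shape, indexing and retrieval code only needs to understand the system format, not each publisher's format.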
In accordance with a still further aspect, the present invention is
directed to a method for determining a part of speech of words in a
sentence or sentence fragment. A hidden Markov model for determining the
most likely part of speech for the words in the sentence or sentence
fragment is provided, wherein the hidden Markov model has an initial
transition matrix and a subsequent transition matrix for storing the
probabilities of transitions from one part of speech to another. The
initial matrix of the hidden Markov model is effectively removed by making
the probabilities therein equal to each other to provide a modified hidden
Markov model. The modified hidden Markov model is applied to the sequence
of words to determine the most likely part of speech of words within a
sentence fragment with increased accuracy.
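The effect of equalizing the initial probabilities can be illustrated with a small Viterbi decoder. This sketch reads the "initial transition matrix" as the start-of-sequence tag probabilities; the three tags and all probability values are made up for the example and are not taken from the described system.

```python
def viterbi(words, tags, transition, emission, initial=None):
    """Most likely tag sequence; omitting `initial` makes it uniform."""
    if initial is None:
        # Modified model: every tag is equally likely at the start.
        initial = {tag: 1.0 / len(tags) for tag in tags}
    # best[tag] = (probability, path) of the best path ending in `tag`
    best = {
        tag: (initial[tag] * emission[tag].get(words[0], 0.0), [tag])
        for tag in tags
    }
    for word in words[1:]:
        step = {}
        for tag in tags:
            prob, path = max(
                (best[prev][0] * transition[prev][tag], best[prev][1])
                for prev in tags
            )
            step[tag] = (prob * emission[tag].get(word, 0.0), path + [tag])
        best = step
    return max(best.values())[1]


TAGS = ["DET", "NOUN", "VERB"]
# Illustrative probabilities only.
TRANSITION = {
    "DET": {"DET": 0.0, "NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMISSION = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.7, "saw": 0.3},
    "VERB": {"saw": 0.8, "dog": 0.2},
}

fragment = ["saw", "the", "dog"]
tags_uniform = viterbi(fragment, TAGS, TRANSITION, EMISSION)
```

With these numbers, an initial distribution trained on full sentences that strongly disfavors a leading verb would pull the first word toward the noun reading, whereas the uniform start lets the word's own emission probabilities decide. That is the intuition behind applying the modified model to fragments such as short queries.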
In accordance with yet a further aspect, the present invention is directed
to a method for storing input information in an information retrieval
system database wherein a plurality of information subject categories are
provided. A plurality of subject lexicons are provided, each subject
lexicon of the plurality of subject lexicons corresponding to an
information subject category of the plurality of information subject
categories. Each subject lexicon contains information representative of
its corresponding information subject category. The input information is
compared with the subject lexicons and the input information is stored in
a selected information subject category according to the comparing of the
input information with the subject lexicons.
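A lexicon comparison of this kind could look like the following sketch. The lexicon contents, the word-overlap measure and the `min_overlap` threshold are assumptions for illustration; the text says only that the input is compared with the subject lexicons and stored accordingly.

```python
def categorize(text, lexicons, min_overlap=1):
    """Return each subject whose lexicon shares enough words with the text."""
    words = set(text.lower().split())
    return sorted(
        subject for subject, lexicon in lexicons.items()
        if len(words & lexicon) >= min_overlap
    )


# Hypothetical subject lexicons.
lexicons = {
    "sports": {"goal", "match", "league"},
    "medicine": {"injury", "surgery", "clinic"},
    "art": {"gallery", "sculpture"},
}
subjects = categorize("Striker misses league match after knee surgery", lexicons)
```

Returning a list rather than a single subject allows a document to be filed under more than one category without a person reading it.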
In accordance with yet a further aspect, the present invention is directed
to a method for storing information in an information retrieval system
having a database for retrieval of the input information in response to a
query. Text information representative of text is received for storing in
the system. Image information representative of an image is also received
for storing in the system. Additionally, image text information
representative of text associated with the image information is received.
The image information is stored in an image information format. The text
information and the image text information are stored in a common text
information format whereby the format of the stored text information is
identical to the format of the stored image text information. The text
information and image text information are searched in the common text
information format and the text information and image text information are
identified in response to a single query. The image information associated
with the retrieved image text information is selected and the selected
image information is retrieved whereby the text information and the image
information are retrieved in accordance with the same query.
In accordance with still yet a further aspect, the present invention is
directed to a method for searching a database of an information retrieval
system in response to a query having at least one query word with a part
of speech, for applying the query word to the database and selecting
information from the database according to the query word. A semantic
network is provided for determining expansion words to expand the search
of the database in response to the query word. The part of speech of the
selected query word is determined. The selected query word is applied to
the semantic network to provide one or more query expansion words in
response to the selected query word. The part of speech of the query
expansion word is determined. The query expansion word is applied to the
database in accordance with the part of speech of the selected query word
and the part of speech of the query expansion word.
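One plausible reading of this aspect is that an expansion word is used only when its part of speech agrees with that of the query word. The sketch below assumes that rule, along with a toy semantic network whose entries are invented for the example; the text itself does not state the exact rule.

```python
# Hypothetical semantic network: word -> list of (related word, part of speech).
SEMANTIC_NETWORK = {
    "run": [("sprint", "VERB"), ("jog", "VERB"), ("streak", "NOUN")],
}


def expand(query_word, query_pos, network):
    """Keep expansion words whose part of speech matches the query word's."""
    return [
        word for word, pos in network.get(query_word, [])
        if pos == query_pos
    ]


verb_expansions = expand("run", "VERB", SEMANTIC_NETWORK)
noun_expansions = expand("run", "NOUN", SEMANTIC_NETWORK)
```

Filtering in this way is intended to add related terms for the sense the user meant while leaving out terms tied to a different sense of the same word.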
In accordance with a still further aspect, the present invention is
directed to a method for performing a search of a database in an
information retrieval system in response to a query having at least one
query word with a query word weight and for applying the query word to the
database and selecting information from the information retrieval system
in accordance with the query word. A query word is selected and assigned a
weight. The weight is adjusted depending on whether the query word is a
proper noun or slow word. The adjusting can be an increase or a decrease
in the weight. Information is selected from the information retrieval
system in accordance with the adjusted weight.
In accordance with a still further aspect, the present invention is
directed to a method for searching a database of an information retrieval
system in response to a query having a query length of at least one word,
for applying the query word to the database and selecting information from
the database according to the query word. The query is received and the
length of the query is determined. Information is selected from the
database according to the query. The relevance of the selected information
is determined according to matches between the query and the information.
The determined relevance of the selected information is adjusted according
to the length of the query.
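A simple way to make relevance values comparable across queries of different lengths is to divide the match count by the query length. The sketch below assumes that form of adjustment and a whitespace tokenizer; the actual scoring formula is not given in this passage.

```python
def relevance(query, document_text):
    """Fraction of query terms found in the document (0.0 to 1.0)."""
    query_terms = query.lower().split()  # the query has at least one word
    doc_terms = set(document_text.lower().split())
    matches = sum(1 for term in query_terms if term in doc_terms)
    # Adjust by query length so long queries do not score higher by default.
    return matches / len(query_terms)


short_score = relevance("moon", "moon landing archive")
long_score = relevance("moon landing archive", "moon landing archive")
partial_score = relevance("moon rocket", "moon landing archive")
```

Under this adjustment a one-word query and a three-word query that are both fully matched receive the same score, so the value does not depend on the number of query terms.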
In accordance with a further aspect, the present invention is directed to a
method for searching an information retrieval system having a database
containing a plurality of documents from a plurality of document sources
in response to a query from a user. A document log table is provided for
tabulating document information of documents selected by the user in
response to a query from the user. The query is received from the user and
a document is selected by the user in response to the received query. The
document log table is adjusted in response to the selecting of the
document. The adjusted log table can be used to determine royalties.
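The log table and a royalty calculation based on it might be sketched as follows. The row layout, the publisher lookup and the flat per-retrieval rate (in cents) are hypothetical; the text states only that the adjusted log table can be used to determine royalties.

```python
from collections import Counter


class DocumentLog:
    """Tabulates which user retrieved which document for which query."""

    def __init__(self, publisher_of):
        self.publisher_of = publisher_of  # hypothetical: {doc_id: publisher}
        self.entries = []                 # rows of (user_id, query, doc_id)

    def record(self, user_id, query, doc_id):
        self.entries.append((user_id, query, doc_id))

    def retrievals_by_publisher(self):
        return Counter(self.publisher_of[doc_id] for _, _, doc_id in self.entries)


def royalties_in_cents(log, cents_per_retrieval):
    """Assumed flat rate: each retrieval earns the publisher the same amount."""
    return {
        publisher: count * cents_per_retrieval
        for publisher, count in log.retrievals_by_publisher().items()
    }


log = DocumentLog({"d1": "Acme Press", "d2": "Acme Press", "d3": "Beta News"})
log.record("u1", "moon landing", "d1")
log.record("u2", "moon landing", "d2")
log.record("u1", "election results", "d3")
owed = royalties_in_cents(log, 5)
```

Because each row keeps the user and the query alongside the document, the same table could also support the demographic and query reporting mentioned among the objects above.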
BRIEF DESCRIPTION OF THE DRAWINGS
In order that the manner in which the above-recited and other advantages
and objects of the invention are obtained and can be appreciated, a more
particular description of the invention briefly described above will be
rendered by reference to a specific embodiment thereof which is
illustrated in the appended drawings. Understanding that these drawings
depict only a typical embodiment of the invention and are not therefore to
be considered limiting of its scope, the invention and the presently
understood best mode thereof will be described and explained with
additional specificity and detail through the use of the accompanying
drawings.
FIG. 1 is a simplified block diagram showing an information retrieval
system in accordance with a preferred embodiment of the present invention.
FIG. 2 is a simplified process flow diagram illustrating a user session
which may be performed with the information retrieval system shown in FIG.
1, in accordance with a preferred embodiment of the present invention.
FIG. 3 is a more detailed block diagram showing an information retrieval
system in accordance with a preferred embodiment of the present invention.
FIG. 4 is a more detailed process flow diagram illustrating a user session
which may be performed with the information retrieval system shown in FIG.
3, in accordance with a preferred embodiment of the present invention.
FIG. 4A is a diagram illustrating an exemplary search results list
displayed in an open window on a user's personal computer, in accordance
with a preferred embodiment of the present invention.
FIG. 4B is an exemplary diagram illustrating first and second open windows
on a user's personal computer which respectively display text and video
information corresponding to document and multi-media files selected by
the user for retrieval, in accordance with a preferred embodiment of the
present invention.
FIG. 4C is an exemplary diagram illustrating first and second open windows
on a user's personal computer which respectively display text and video
information corresponding to document and multi-media files selected by
the user for retrieval, and a composite document window in which the user
has built a composite document based on the text and video information in
the first and second windows, in accordance with a preferred embodiment of
the present invention.
FIG. 5 is a diagram illustrating preferred data structures for storing a
document information directory table, a dependent image table, and a
publisher information table, in accordance with a preferred embodiment of
the present invention.
FIG. 5A is a diagram illustrating a preferred data structure for
implementing a document index database, in accordance with a preferred
embodiment of the present invention.
FIG. 5B is a diagram illustrating a preferred data storage format for
implementing an image/text database, in accordance with a preferred
embodiment of the present invention.
FIG. 6 is a block diagram illustrating the operation of software systems
for implementing the session and query managers shown in FIG. 4, in
accordance with a preferred embodiment of the present invention.
FIG. 6A is a state flow diagram showing the operation of a session manager
software system, in accordance with a preferred embodiment of the present
invention.
FIG. 6B is a flow diagram showing the operation of a search engine software
system, in accordance with a preferred embodiment of the present
invention.
FIG. 7A is a block diagram of a hidden Markov model suitable for parsing
full sentences.
FIG. 7B is a block diagram of a hidden Markov model for parsing sentence
fragments, in accordance with a preferred embodiment of the present
invention.
FIG. 8A is a table of relevance normalization values for normalizing
relevance scores output by a search engine, in accordance with a preferred
embodiment of the present invention.
FIG. 8B is a graph illustrating a system for normalizing relevance scores
output by a search engine, in accordance with a preferred embodiment of
the present invention.
FIG. 9 is a block diagram representation of the data preparation component
of the information retrieval system of FIG. 3, in accordance with a
preferred embodiment of the present invention.
FIG. 9A is a block diagram representation of data flows within the data
preparation component of FIG. 9, in accordance with a preferred embodiment
of the present invention.
FIG. 10 is a block diagram representation of an automatic subjecting system
for automatically determining the subject category of input documents, in
accordance with a preferred embodiment of the present invention.
FIG. 11 is a process flow representation of a method for generating subject
lexicons for use in the automatic subjecting system of FIG. 10, in
accordance with a preferred embodiment of the present invention.
FIG. 12 is a block diagram of a system for generating subject lexicons for
use in the automatic subjecting system of FIG. 10, in accordance with a
preferred embodiment of the present invention.
FIG. 13 is a representation of data structures within an accounting
database, in accordance with a preferred embodiment of the present
invention.
DETAILED DESCRIPTION OF THE INVENTION
Referring now to FIG. 1, there is shown a simplified block diagram
illustrating an information retrieval system 100, in accordance with a
preferred embodiment of the present invention. The information retrieval
system 100 includes a user station 102 for searching information files
which have been collected from various publisher sources 112 and stored in
data center 110. The user station 102 includes a personal computer (PC)
104 and user software 106 which resides on PC 104. User software 106
includes a graphical user interface (shown generally in FIGS. 4A, 4B and
4C). The user station 102 provides search queries by way of a
communications channel 108 (such as, for example, a large volume public
network or the Internet) coupled to the data center 110. The data center
110 includes session server 114 which includes means for receiving a
search query from user station 102, means for sending the search query to
a query server 116, means for receiving search results information from
the query server 116, means for sending a search results list
representative of the search results information across communications
channel 108 to the user station 102, means for receiving a document
retrieval request transmitted from user station 102 over communications
channel 108 to session server 114, and means for retrieving a document
from database 118 in response to the retrieval request and transmitting a
file representative of the document to user station 102 over
communications channel 108. The query server 116 at data center 110
includes means for receiving a search query from the session server 114,
searching means for searching a document index database 117 (shown in FIG.
3) to identify documents responsive to the search query, and means for
sending search results information representative of the documents
identified by the searching means from the query server 116 to the session
server 114. Data center 110 also includes a library database 118 for
storing text, image, audio or other multi-media information representative
of files provided by a plurality of publishers 112. As explained more
fully below, session server 114 retrieves (from library 118) documents
identified by a search query and selected by a user of user station 102
for retrieval, and then transmits the selected documents to the user
station 102 over channel 108.
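The division of labor between session server 114 and query server 116 described above can be pictured with the following minimal sketch. It is an assumption-laden illustration, not the disclosed implementation: the class and method names are hypothetical, document index database 117 and library database 118 are stood in for by in-memory dictionaries, and the any-term lookup is a simplification rather than the search method of the system.

```python
from dataclasses import dataclass


@dataclass
class QueryServer:
    """Stands in for query server 116."""
    # Hypothetical stand-in for document index database 117:
    # search term -> set of identifiers of documents containing the term.
    index: dict

    def search(self, query):
        """Return identifiers of documents matching any term of the query."""
        hits = set()
        for term in query.lower().split():
            hits |= self.index.get(term, set())
        return sorted(hits)


@dataclass
class SessionServer:
    """Stands in for session server 114."""
    query_server: QueryServer
    # Hypothetical stand-in for library database 118: identifier -> content.
    library: dict

    def handle_query(self, query):
        """Forward the query to the query server and return a results list."""
        return self.query_server.search(query)

    def handle_retrieval(self, doc_id):
        """Fetch a selected document from the library for the user station."""
        return self.library[doc_id]


index = {"whale": {"d1", "v1"}, "song": {"v1"}}
library = {"d1": "Text article about whales.", "v1": b"video-bytes"}
session = SessionServer(QueryServer(index), library)

results = session.handle_query("whale song")
selected = session.handle_retrieval(results[0])
```

In this sketch the user station only ever talks to the session server, which mirrors the description above: the query server sees search queries but never serves document content.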
Referring now to FIG. 2, there is shown a simplified process flow diagram
illustrating a user session 200 which may be performed with information
retrieval system 100 shown in FIG. 1, in accordance with a preferred
embodiment of the present invention. In step 202 of user session 200, the
user station 102 communicates to data center 110 (via channel 108) a
description of the information that a user of user station 102 would like
to identify at data center 110. More specifically, in step 202 a user
of user station 102 sends a "natural language search query" to data center
110. As described more fully below in connection with FIG. 4, the term
"natural language search query" is used to refer to a question, sentence,
sentence fragment, single word or term which describes (in natural
language form) a particular topic or issue for which a user of user
station 102 seeks to identify information. Based on the natural language
query provided by user station 102, the query server 116 in data center
110 searches a document index database 117 (shown in FIGS. 3 and 5A)
coupled to the query server, and a list of files responsive to the search
query is returned to user station 102, as shown in step 204. Next, in
step 206, the user of user station 102 may select for retrieval one of
the listed files identified by data center 110. In step 208, session
server 114 in data center 110 retrieves the full text, i