Claims
What is claimed and desired to be secured by patent is:
1. A method supporting keyword searches of data items in a structured database, the method comprising the computer-implemented steps of:
selecting at least one data item in the structured database, each selected item containing data and each selected item having a corresponding location identifier which identifies the item's location within the structured database;
documenting the selected data items by creating at least one document outside the structured database which contains a textual representation of each selected item's data; and
indexing the documents by creating an index outside the database, the index associating keywords in the textual representation of each selected item's data with that item's location identifier,
wherein the structured database includes data items organized as records in relations according to a data dictionary, the selecting step includes the step of providing a supplemental data dictionary which identifies selected records or tables,
and the indexing step only indexes records and tables that are identified by the supplemental data dictionary.
2. The method of claim 1, wherein the indexing step includes providing to a keyword search engine indexing agent both the textual representation of each selected item's data and the selected item's location identifier.
3. The method of claim 2, wherein the indexing agent produces an index that associates keywords with resource locators, and each resource locator includes a textual representation of a data item location identifier.
4. The method of claim 3, wherein the resource locator includes a URL.
5. The method of claim 3, wherein the resource locator includes a file path.
6. The method of claim 3, wherein the textual representations are comprehensive with respect to the data values of selected data items.
7. The method of claim 1, wherein the creating step creates an index containing keywords that are textual representations of data in the selected data items.
8. The method of claim 7, wherein the creating step creates an index containing keywords that are textual representations of non-numeric data in the selected data items.
9. A method supporting keyword searches of data items in a structured database, the method comprising the computer-implemented steps of:
selecting at least one data item in the structured database, each selected item containing data and each selected item having a corresponding location identifier which identifies the item's location within the structured database;
documenting the selected data items by creating at least one document outside the structured database which contains a textual representation of each selected item's data; and
indexing the documents by creating an index outside the database, the index associating keywords in the textual representation of each selected item's data with that item's location identifier,
wherein the indexing step includes providing to a keyword search engine indexing agent both the textual representation of each selected item's data and the selected item's location identifier, the indexing agent produces an index that associates
keywords with resource locators, each resource locator includes a textual representation of a data item location identifier, and the resource locator includes a distinguished name.
10. A method supporting keyword searches of data items in a structured database, the method comprising the computer-implemented steps of:
selecting at least one data item in the structured database, each selected item containing data and each selected item having a corresponding location identifier which identifies the item's location within the structured database;
documenting the selected data items by creating at least one document outside the structured database which contains a textual representation of each selected item's data; and
indexing the documents by creating an index outside the database, the index associating keywords in the textual representation of each selected item's data with that item's location identifier,
wherein the creating step creates an index containing keywords that are textual representations of data in the selected data items and also containing every keyword that is a textual representation of data in the selected data items.
11. A method supporting keyword searches of data items in a structured database, the method comprising the computer-implemented steps of:
selecting at least one data item in the structured database, each selected item containing data and each selected item having a corresponding location identifier which identifies the item's location within the structured database;
documenting the selected data items by creating at least one document outside the structured database which contains a textual representation of each selected item's data;
indexing the documents by creating an index outside the database, the index associating keywords in the textual representation of each selected item's data with that item's location identifier; and
logging changes that are made to data items after the creating step and then updating the index to reflect at least some of the changes.
12. A method supporting keyword searches of data items in a structured database, the method comprising the computer-implemented steps of:
selecting at least one data item in the structured database, each selected item containing data and each selected item having a corresponding location identifier which identifies the item's location in the structured database;
allowing a network-roaming indexing agent to create an index which associates keywords with resource locators, each keyword being a textual representation of data from a selected data item and each resource locator containing a textual
representation of the corresponding selected item's location identifier;
obtaining a keyword from a search engine interface;
using the index to obtain a resource locator associated with the keyword; and then
using the resource locator to retrieve the item's current data from the structured database.
13. The method of claim 12, wherein the resource locator includes a URL.
14. The method of claim 12, wherein the allowing step reads a data dictionary which identifies only the selected data items.
15. The method of claim 12, wherein the allowing step includes reading data from data items which are records in a relational database.
16. The method of claim 12, wherein the allowing step includes reading data from data items which are nodes in a hierarchical database.
17. The method of claim 12, wherein the allowing step includes reading data from data items which are objects in an object-oriented database.
18. The method of claim 12, wherein the step of using the resource locator comprises extracting a data item's location identifier from the resource locator, and then using the location identifier to retrieve the item's current data.
19. The method of claim 12, wherein the step of using the resource locator includes generating a request to retrieve the item's current data from the database.
20. The method of claim 19, wherein the request includes an SQL query.
21. The method of claim 12, further comprising the computer-implemented step of generating a textual document containing the retrieved data.
22. The method of claim 21, wherein the document is generated in a markup language format.
23. The method of claim 22, wherein the document is generated in HTML format.
24. A computer storage medium having a configuration that represents data and instructions which will cause at least a portion of a computer system to perform method steps for supporting keyword searches of data items in a structured database,
the method steps comprising the steps of claim 13.
25. The storage medium of claim 24, wherein the method steps comprise the steps of claim 15.
26. The storage medium of claim 24, wherein the method steps comprise the steps of claim 19.
27. The storage medium of claim 24, wherein the method steps comprise the steps of claim 20.
28. The storage medium of claim 24, wherein the method steps comprise the steps of claim 22.
29. A computer system comprising:
selecting means for selecting data items in a structured database;
retrieving means for retrieving from the database the current data of a selected data item; and
exposing means for exposing to an indexing agent information about a data item's location in the database together with information about the data item's retrieved data,
wherein the structured database includes a relational database, the data items include relational database records or tables, and the selecting means includes a selection data dictionary which specifies only selected relational database records
or tables.
30. The system of claim 29, wherein the selecting means includes a schema defining elements of the structured database.
31. The system of claim 29, further comprising an administration tool for modifying the selecting means.
32. The system of claim 31, wherein the selecting means includes a selection data dictionary which specifies only selected relational database records or tables, and the administration tool is capable of creating and modifying the selection data
dictionary.
33. The system of claim 29, wherein the retrieving means includes a database reader capable of generating requests to retrieve data from the structured database.
34. The system of claim 33, wherein the database reader is capable of generating SQL queries.
35. The system of claim 29, further comprising the indexing agent.
36. The system of claim 35, wherein the indexing agent includes a web crawler.
37. The system of claim 29, further comprising a search engine interface.
38. The system of claim 37, wherein the search engine interface and the retrieving means reside on different nodes in a network.
39. The system of claim 38, wherein the search engine interface and the retrieving means communicate with one another using a TCP/IP network protocol.
40. The system of claim 38, wherein the search engine interface and the retrieving means communicate with one another using an IPX network protocol.
41. The system of claim 29, further comprising an index produced by the indexing agent.
42. The system of claim 41, wherein the index contains keywords and corresponding resource locators for both the structured database and a textual document information source residing at a different network location than the structured database.
43. The system of claim 41, wherein the index contains keywords and corresponding resource locators for at least two structured databases residing at different network locations.
44. A computer system comprising:
selecting means for selecting data items in a structured database;
retrieving means for retrieving from the database the current data of a selected data item; and
exposing means for exposing to an indexing agent information about a data item's location in the database together with information about the data item's retrieved data, wherein the exposing means includes a page generator capable of generating a
textual document containing the retrieved data.
45. The system of claim 44, wherein the page generator is capable of generating an HTML page containing the retrieved data.

Description
FIELD OF THE INVENTION
The present invention relates to information management and retrieval in a digital system, and more particularly to the use of keyword indexes for retrieving data both from structured databases such as relational databases and from textual
documents such as web pages.
TECHNICAL BACKGROUND OF THE INVENTION
Information is stored digitally in a wide variety of formats, which are accessed with a bewildering assortment of retrieval operations. As computers containing digital information are increasingly connected with one another, the differences
between different information stores become more evident and more frustrating. Thus, many approaches have been proposed or implemented to make information more widely available.
Vast amounts of information are stored by corporations, government agencies, and other entities in structured databases, of which the most widely used are relational databases. In a typical relational database, individual pieces of data such as names, addresses, prices, and part numbers are stored in rows and columns designated by headings and organized into tables or other relations. The smallest unit of manipulation is an individual database record holding one (or perhaps a few) data values.
Indexes into the data records and tables are generated and maintained internally by database management software to make record accesses more efficient. Each database has its own set of indexes. The indexes are updated whenever a record's value
is changed, or in some cases at periodic intervals. In some relational databases, all records are indexed; in others, indexes are created only after the number of records or the importance of particular records passes a threshold or another efficiency
criterion is met. In many relational (and other) databases only primary database key values are indexed; other data values are retrieved by way of the keys and the relationships defined between key values and other (secondary) values. Information about
the data values is provided through a database query language. The various dialects of the SQL language are among the most widely used query languages.
Enormous amounts of information are also stored in textual documents using markup languages such as HTML, XML, and other variations on SGML. Markup language document stores differ from relational databases in several important ways. The
smallest unit of retrieval is typically an entire "page" (which may actually print as several pages). Each page typically contains many more words or numbers than a relational database record. The pages are not organized into tables or other relations,
but are instead connected by hyperlinks or hot links. Pages may also be grouped in a file system by directory placement and/or file naming conventions.
Web crawlers and other network-roaming agents index the pages at sporadic intervals. After a given page is posted to the network, considerable time may pass before an agent encounters and indexes the page. A given index often points to
information at numerous sites. The same page may be indexed in different ways by different agents. Sometimes all the words in a page are indexed, but more often selected words are indexed. When the indexed words are chosen by the web page author (for instance, through meta content tags),
they do not always impartially and accurately summarize the page's contents. The indexes are used by keyword search engines that provide users with an interface that is substantially simpler, but also less powerful, than typical SQL interfaces.
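The following Python sketch illustrates, under simplifying assumptions, the kind of index a network-roaming agent builds: one inverted index that maps each word to the URLs of the pages containing it, whichever site hosts them. The URLs and page texts are hypothetical, and the sketch indexes every word of each page rather than author-selected words.

```python
import re
from collections import defaultdict

# Hypothetical pages gathered by a crawler from two different sites.
PAGES = {
    "http://www.example.com/press/release1.html": "New widget pricing announced",
    "http://intranet.example.org/manuals/widgets.html": "Widget assembly manual",
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index mapping each word to the URLs of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


if __name__ == "__main__":
    index = build_index(PAGES)
    # "widget" occurs on both sites, so both URLs come back.
    print(sorted(index["widget"]))
```

Unlike a database's internal indexes, this index lives with the agent and spans both sites; it is only as fresh as the agent's last visit to each page.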
Much useful information is also stored in word processor textual documents, such as *.doc, *.pdf, *.ps, *.rtf, *.txt, and other documents. Word-processed document repositories and their associated document management systems are similar to web
sites and to relational databases in some ways, and different in others. Some repositories are organized only by placing documents in particular directories in a file system hierarchy; no indexing is provided to speed searches. Other repositories index
their documents according to the entire text of each document in the repository, but indexing is more commonly based on selected keywords provided by the document's author or by a human or automated subject matter classifier. Each repository has its own
set of indexes. The user interface may support either a keyword search of the documents or an SQL-like query of an associated structured database of document keywords, authors, dates, titles, and similar data.
Unfortunately, the differences between these various information storage and retrieval approaches make it difficult to provide a single interface that gives users access to information from all available digital sources. The attempts to bridge
differences between different sources of information are almost as varied as the sources themselves, and fully comprehensive indexes are not available.
One approach to increasing information availability involves "dynamic HTML." An SQL query embedded in an HTML web page is extracted by a web server, sent to a relational database query handler, and processed in the conventional manner by the
relational database management system. The results of the query are placed in HTML format and returned to the user. This system strikes a balance between SQL's flexibility and SQL's complexity by deciding what queries are available, expressing them in
natural language in the web page, and writing them in SQL ahead of time for the user. However, users who do a keyword search using a web browser or intranet search engine will not necessarily discover that the relational database contains relevant
information, even if the keywords searched are among the data that would have been retrieved by the dynamic HTML query, because the web crawler index is based on the text of the dynamic HTML page, not on the relational data.
Another approach uses a natural language front-end to translate an English sentence into an SQL query which is then processed in the conventional manner. The system provides greater flexibility than dynamic HTML, allowing users to write questions in
a natural language and then translating the questions into SQL queries (sometimes with varying degrees of success). As with dynamic HTML, however, users who do a keyword search using a browser or search engine will not necessarily discover relevant
information even if the keywords searched are among the data that would have been retrieved by an SQL query. The keyword search results might not even direct users to the natural language front-end.
Yet another approach proceeds as follows. The column or table heading names and relationship names used in the database are extracted from a data dictionary that defines the relational database's structure. Selected data values are
added, and then synonyms of all these terms are added, creating a list of "magnet terms." The magnet terms are placed in a web "magnet page" that also has an SQL query interface. The magnet terms will be indexed by a web crawler, so that users who do
keyword searches using the magnet terms are directed to the magnet page and its SQL query interface.
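A minimal Python sketch of the magnet-term construction just described, assuming a hypothetical data dictionary, relationship list, set of selected values, and synonym table (a real system might draw synonyms from a thesaurus):

```python
import html

# Hypothetical data dictionary: table name -> column heading names.
DATA_DICTIONARY = {
    "parts": ["part_no", "name", "price"],
    "suppliers": ["supplier_id", "city"],
}
RELATIONSHIP_NAMES = ["supplies"]  # hypothetical relationship names
SELECTED_VALUES = ["widget"]       # hypothetical data values chosen for the page
SYNONYMS = {"price": ["cost"], "city": ["town"], "widget": ["gadget"]}  # hypothetical thesaurus


def magnet_terms(dictionary, relationships, values, synonyms):
    """Collect headings, relationship names, selected values, and their synonyms."""
    terms = []
    for table, columns in dictionary.items():
        terms.append(table)
        terms.extend(columns)
    terms.extend(relationships)
    terms.extend(values)
    # Add synonyms for every term gathered so far (iterate over a snapshot).
    for term in list(terms):
        terms.extend(synonyms.get(term, []))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(terms))


def magnet_page(terms, query_url):
    """Render a minimal magnet page: the terms as text plus a link to a query interface."""
    words = " ".join(html.escape(t) for t in terms)
    link = html.escape(query_url, quote=True)
    return f'<html><body><p>{words}</p><a href="{link}">Query this database</a></body></html>'


if __name__ == "__main__":
    terms = magnet_terms(DATA_DICTIONARY, RELATIONSHIP_NAMES, SELECTED_VALUES, SYNONYMS)
    print(magnet_page(terms, "/query"))  # "/query" is a hypothetical query interface path
```

Individual record values that were not selected for the page, such as the name of a particular part, never become magnet terms, so a keyword search for them does not reach the page.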
The magnet page query interface may be a dynamic HTML interface, with prewritten SQL queries accompanied by explanatory text. The query interface may also be a natural language interface configured to receive English questions and translate them
into SQL queries. Or the query interface may simply accept SQL queries and pass them to the database management software. Of course, the query interface may also combine dynamic HTML, natural language translation, and straightforward SQL querying
capabilities.
In any case, an SQL query from the query interface is directed to the relational database, which uses its internal indexes to retrieve the data. The results are packaged as HTML and displayed to the user. This approach has the advantage that users who do a keyword search
will be directed to the magnet page for the relational database containing the relevant information if their keywords are among the magnet terms. However, users will usually not reach the query interface unless the data
they seek appears in the magnet terms. Moreover, even if they do reach the query interface they must still find or formulate an SQL query that will retrieve the relevant information from the database.
Instead of attempting to make relational database information available to web browsers, a different approach tries to make web pages accessible through a relational database interface. Text documents such as plain text files, HTML pages, word
processor documents, and the like are entered as records in a relational database. Keywords or the full text of the documents are entered in the database's internal indexes to support document retrieval through the database query interface using SQL or
another query language.
This approach has the advantage of bringing powerful and well-understood relational database software to bear on the problem of retrieving relevant text documents. But users who browse a network on which the relational database occupies only one
or a few nodes will not necessarily realize that the information they seek resides in documents indexed into the database in question, even if the keywords they use in their browsing appear in the document indexes. The indexes are internal to the
database and thus are used only in response to SQL or like queries directed specifically at the database.
Other approaches are also described in the literature and/or embodied in software currently being used. For instance, structured databases other than relational databases are sometimes used, including hierarchical, object-relational,
object-oriented, and other structured databases. Also, at least one web crawler now indexes word processor documents as well as markup language documents. But the examples above illustrate several important characteristics of different approaches to
publishing information:
the smallest unit of data retrieved (e.g., database record, web page);
the rules used to organize data (e.g., relations, file placement and naming conventions, hyperlinks);
how data is retrieved (e.g., SQL queries, keyword searches);
what data is indexed for each data unit (e.g., headings, primary database keys, author-defined keywords, selected keywords, full text);
where the indexes reside (e.g., within the database system or outside it);
which sources are indexed (e.g., the records of a given database, the web sites visited by the crawler); and
when the index is updated (e.g., when the record is entered or modified, periodically, when the crawler visits the site).
When existing approaches are viewed in the manner discussed above, it becomes apparent that improvements are possible. For instance, it would be an advancement in the art to make structured database information visible to net-wide keyword
searches when a user has not yet identified the database in question as one likely to contain relevant information.
It would be an additional advancement to provide such a method and system which do not interfere with existing retrieval mechanisms, but serve instead as additional tools for identifying and retrieving information based on keywords.
Such a method and system are disclosed and claimed herein.
BRIEF SUMMARY OF THE INVENTION
The present invention provides a method and system for supporting keyword searches of data items in a structured database, such as a relational database. One method of the invention begins with selection of at least one data item in the
structured database; each selected item contains data and has a corresponding location identifier which identifies the item's location within the structured database. For instance, a relational database record may be identified by an object class name
and one or more primary database key values.
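To make the notion of a location identifier concrete, the following Python sketch treats it as a table (object class) name plus the values of that table's primary key columns, and uses it to fetch exactly one record. SQLite, through Python's standard sqlite3 module, stands in for the structured database; the LocationId class and the order_lines table are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LocationId:
    """Hypothetical location identifier: a class (table) name plus primary key values."""
    table: str
    key_columns: Tuple[str, ...]
    key_values: Tuple[object, ...]


def fetch(conn, loc):
    """Return the one record that the location identifier names, or None."""
    # Table and column names cannot be bound as SQL parameters; this sketch
    # assumes they come from a trusted data dictionary, while the key values
    # are passed as bound parameters.
    where = " AND ".join(f"{col} = ?" for col in loc.key_columns)
    sql = f"SELECT * FROM {loc.table} WHERE {where}"
    return conn.execute(sql, loc.key_values).fetchone()


def make_demo_db():
    """Create an in-memory table whose records are keyed by (order_no, line_no)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_lines (order_no INTEGER, line_no INTEGER, item TEXT, qty INTEGER,"
        " PRIMARY KEY (order_no, line_no))"
    )
    conn.executemany(
        "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
        [(1001, 1, "widget", 10), (1001, 2, "gasket", 4)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    loc = LocationId("order_lines", ("order_no", "line_no"), (1001, 2))
    print(fetch(conn, loc))
```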
The selected data items are documented by creating at least one document, such as a web page, which resides outside the structured database as a memory stream or as a file and which contains a textual representation of each selected item's data.
The documents are then indexed by creating an index outside the database which associates keywords in the textual representation of each selected item's data with that item's location identifier. The indexed keywords are more comprehensive and accurate
than terms used in conventional magnet pages or web page meta content tags because they are generated directly from most or all of the data values.
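A minimal Python sketch of the documenting and indexing steps, assuming an SQLite stand-in database, a hypothetical parts table, and location identifiers written as table/key text. Each record becomes a short text document built from all of its values, and the keyword index is an ordinary in-memory dictionary, so it resides outside the database:

```python
import re
import sqlite3
from collections import defaultdict


def make_demo_db():
    """In-memory stand-in for the structured database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (part_no TEXT PRIMARY KEY, name TEXT, supplier TEXT)")
    conn.executemany(
        "INSERT INTO parts VALUES (?, ?, ?)",
        [("P-100", "brass widget", "Acme"), ("P-200", "steel gasket", "Acme")],
    )
    return conn


def document_items(conn, table, key_column):
    """Yield one (location_id, text) document per record, built from all its values."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cursor.description]
    key_pos = columns.index(key_column)
    for row in cursor:
        location_id = f"{table}/{row[key_pos]}"
        text = " ".join(str(value) for value in row)
        yield location_id, text


def index_documents(documents):
    """Build an index, held outside the database, from keywords to location identifiers."""
    index = defaultdict(set)
    for location_id, text in documents:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(location_id)
    return index


if __name__ == "__main__":
    conn = make_demo_db()
    index = index_documents(document_items(conn, "parts", "part_no"))
    print(sorted(index["acme"]))  # both records mention the supplier Acme
```

Because every value contributes keywords, a search for the supplier name Acme finds both records even though no author chose that word as a tag.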
If the structured database includes data items organized as records in relations according to a data dictionary, then selection may be accomplished by providing a supplemental data dictionary which identifies the selected records or tables. In
this case, the indexing step only indexes records and tables that are identified by the supplemental data dictionary. A data dictionary may also be used to identify selected data items for binary-only relational databases that have no accessible data
dictionary and for non-relational databases.
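The following Python sketch shows one way a supplemental data dictionary could restrict selection, assuming it is simply a mapping from exposed table names to their key columns. SQLite's sqlite_master catalog plays the role of the database's own data dictionary; the tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical supplemental data dictionary: exposed table -> its key column.
SUPPLEMENTAL_DICTIONARY = {"parts": "part_no"}


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (part_no TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE salaries (emp_id TEXT PRIMARY KEY, amount INTEGER)")
    conn.execute("INSERT INTO parts VALUES ('P-100', 'brass widget')")
    conn.execute("INSERT INTO salaries VALUES ('E-7', 90000)")
    return conn


def all_tables(conn):
    """Tables listed in the database's own catalog (SQLite's sqlite_master)."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


def selected_records(conn, supplemental):
    """Yield (table, key, row) only for tables named in the supplemental dictionary."""
    for table in all_tables(conn):
        key_column = supplemental.get(table)
        if key_column is None:
            continue  # not selected, so never documented or indexed
        cursor = conn.execute(f"SELECT * FROM {table}")
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            yield table, row[columns.index(key_column)], row


if __name__ == "__main__":
    conn = make_demo_db()
    print(list(selected_records(conn, SUPPLEMENTAL_DICTIONARY)))
```

Only the parts record is yielded; the salaries table exists in the database but, being absent from the supplemental dictionary, is never documented or indexed.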
Indexing may be accomplished by providing to a keyword search engine indexing agent both the textual representation of each selected item's data and the selected item's location identifier. The indexing agent produces an index that associates
keywords with resource locators, and each resource locator includes a textual representation of a data item location identifier. Suitable indexing agents include web crawlers, indexing "bots", and other text indexing tools. Suitable resource locators
include URLs, hot links, file paths, distinguished names, object class names, table names, and primary database key values, among others.
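One possible way to carry a location identifier inside a URL-style resource locator, sketched in Python with the standard urllib.parse functions. The gateway address and the /item/<table>/<key> path layout are hypothetical choices made for illustration:

```python
from urllib.parse import quote, unquote, urlsplit

BASE_URL = "http://db.example.com/item"  # hypothetical gateway address


def to_locator(table, key_value):
    """Encode a (table, primary key value) location identifier as a URL."""
    # safe="" also escapes "/", so a key containing a slash cannot split the path.
    return f"{BASE_URL}/{quote(table, safe='')}/{quote(str(key_value), safe='')}"


def from_locator(url):
    """Recover the (table, key value) pair from a URL built by to_locator."""
    path = urlsplit(url).path  # e.g. "/item/parts/P-100"
    table, key_value = path.rsplit("/", 2)[-2:]
    return unquote(table), unquote(key_value)


if __name__ == "__main__":
    url = to_locator("parts", "P/100 A")
    print(url)                # http://db.example.com/item/parts/P%2F100%20A
    print(from_locator(url))  # ('parts', 'P/100 A')
```

Percent-encoding with safe='' keeps a key that contains a slash or a space from being misread as extra path segments. Keys come back as strings, so a caller holding integer keys would need to convert them.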
Users provide keywords to a search engine interface in a system according to the invention. The system uses the index to obtain a resource locator that is associated with the keyword. The resource locator is used to retrieve the item's current
data from the structured database, using SQL queries or other structured database retrieval mechanisms. A document containing the retrieved data, such as a web page, is then generated and provided to the user.
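The retrieval path can be sketched end to end in Python: look up the keyword in an index held outside the database, turn each resource locator back into a parameterized SQL query, and wrap the current record in a generated HTML page. Everything here is an assumption made for illustration: the SQLite stand-in database, the parts table, the gateway URL, and the EXPOSED mapping that lists which tables may be queried.

```python
import html
import sqlite3
from urllib.parse import quote, unquote, urlsplit

BASE_URL = "http://db.example.com/item"  # hypothetical gateway address
EXPOSED = {"parts": "part_no"}            # hypothetical: exposed table -> key column


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (part_no TEXT PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany("INSERT INTO parts VALUES (?, ?, ?)",
                     [("P-100", "brass widget", 1.75), ("P-200", "steel gasket", 0.4)])
    return conn


def build_index(conn):
    """Keyword -> resource locators, built once and stored outside the database."""
    index = {}
    for part_no, name in conn.execute("SELECT part_no, name FROM parts"):
        url = f"{BASE_URL}/parts/{quote(part_no, safe='')}"
        for word in name.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def retrieve_current(conn, url):
    """Turn a resource locator back into an SQL query and fetch the record's current data."""
    table, key = (unquote(part) for part in urlsplit(url).path.rsplit("/", 2)[-2:])
    key_column = EXPOSED[table]  # KeyError for tables that were never exposed
    cursor = conn.execute(f"SELECT * FROM {table} WHERE {key_column} = ?", (key,))
    columns = [d[0] for d in cursor.description]
    row = cursor.fetchone()
    return None if row is None else dict(zip(columns, row))


def render_page(record):
    """Generate a small HTML page presenting the retrieved record."""
    rows = "".join(f"<tr><th>{html.escape(col)}</th><td>{html.escape(str(val))}</td></tr>"
                   for col, val in record.items())
    return f"<html><body><table>{rows}</table></body></html>"


def search(conn, index, keyword):
    """Keyword -> locators -> current records -> HTML pages."""
    pages = []
    for url in sorted(index.get(keyword.lower(), ())):
        record = retrieve_current(conn, url)
        if record is not None:  # the record may have been deleted since indexing
            pages.append(render_page(record))
    return pages


if __name__ == "__main__":
    conn = make_demo_db()
    index = build_index(conn)
    conn.execute("UPDATE parts SET price = 2.5 WHERE part_no = 'P-100'")  # change after indexing
    print(search(conn, index, "widget"))
```

Because the price changes after the index is built, the generated page shows the new value: the index only locates the record, and the data itself is read from the database at retrieval time.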
The invention bridges a gap between loosely structured textual keyword search information technologies, on the one hand, and highly structured relational/hierarchical query language search database technologies, on the other. Web pages on the
Internet or on an intranet are effective for textual information that is relatively static and unstructured, such as press releases, user guides, policy statements, and procedure manuals. Other information, such as availability, pricing, performance and
planning records, is more dynamic and has traditionally been maintained in highly structured databases such as relational or object-oriented databases.
The invention makes it possible to use a single search method--keyword searching--to locate and retrieve desired information from different types of information sources. In particular, the invention makes it possible to publish selected portions
of a relational database in a manner that allows users to retrieve relational data without knowing details of the database's internal organization. Other features and advantages of the present invention will become more fully apparent through the
following description.
BRIEF DESCRIPTION OF THE DRAWINGS
To illustrate the manner in which the advantages and features of the invention are obtained, a more particular description of the invention will be given with reference to the attached drawings. These drawings only illustrate selected aspects of
the invention and thus do not limit the invention's scope. In the drawings:
FIG. 1 is a diagram illustrating one of many networks suitable for use according to the present invention.
FIG. 2 is a block diagram further illustrating components of the network shown in FIG. 1 and other suitable systems according to the invention.
FIG. 3 is a flowchart illustrating methods of the present invention.
FIG. 4 is a data flow diagram illustrating components and methods of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention relates to a method and system for assisting keyword searches of highly structured data. Before detailing the architecture of methods and systems according to the invention, the meaning of several important terms is
clarified. Specific examples are given to illustrate aspects of the invention, but those of skill in the art will understand that other examples may also fall within the meaning of the terms used. Some terms are also defined, either explicitly or
implicitly, elsewhere herein.
Terminology
As used here, a "keyword" search is a pattern-matching search which tries to locate instances of digital data using a key word or phrase. Many conventional web search engines support keyword searches. Keywords may contain wildcards. For
instance, if the question mark is used as a wildcard capable of matching any single character and the asterisk is used as a wildcard capable of matching any zero or more characters, then the keyword "b?t*" would match the words "bat", "bet", "bit",
"bot", "but", "battle", "bitten", and "butane", among others. In some cases keywords may also contain regular expressions, such as the regular expressions used in the familiar lexical analysis program lex or the familiar text editors emacs and vi. A
keyword may contain smaller keywords connected by operators such as AND and OR.
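Python's standard fnmatch module happens to use the same two wildcards described here (? for any single character, * for zero or more characters), so it offers a quick way to check the "b?t*" example. Note that fnmatch also treats [...] as a character set, which the text does not define, and the AND/OR helpers below are a hypothetical illustration of combining keywords rather than any particular search engine's syntax.

```python
from fnmatch import fnmatchcase


def keyword_matches(keyword, words):
    """Return the words that match a keyword using ? and * wildcards (case-sensitive)."""
    return [w for w in words if fnmatchcase(w, keyword)]


def contains(text, keyword):
    """True if any whitespace-separated word of the text matches the keyword (case-insensitive)."""
    return bool(keyword_matches(keyword.lower(), text.lower().split()))


def all_of(text, *keywords):
    """Keywords joined by AND: every keyword must match some word."""
    return all(contains(text, k) for k in keywords)


def any_of(text, *keywords):
    """Keywords joined by OR: at least one keyword must match some word."""
    return any(contains(text, k) for k in keywords)


if __name__ == "__main__":
    words = ["bat", "bet", "bit", "bot", "but", "battle", "bitten", "butane", "bust", "abet"]
    print(keyword_matches("b?t*", words))
    # ['bat', 'bet', 'bit', 'bot', 'but', 'battle', 'bitten', 'butane']
```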
One alternative to keyword searching is "browsing" through the available data until values of interest are located. Browsing is available in most computer information management systems, regardless of whether keyword searches are supported. An important difference between keyword searching and browsing is that
keyword searches focus much more quickly on portions of the data that are likely to be of interest. This is particularly true if the keyword search is performed on data that is grouped by subject matter. For instance, a search using the keyword "bat"
in data classified by subject matter could lead quickly to baseball statistics rather than a discussion of flying mammals.
Many conventional structured database systems support "query" searches through SQL or another query language. An important difference between query searches and keyword searches is that query searches normally presume the existence of relations
or other structure in the data and contain assumptions about that structure. For instance, many SQL queries are of the form SELECT X FROM Y WHERE Z, with X being the heading name of a column in a table called Y, and Z being some constraint on the values
stored in the column. Such a query will be rejected if no table named Y exists, or if Y exists but has no column named X.
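The structural assumptions built into such queries are easy to observe with SQLite through Python's sqlite3 module. The parts table is hypothetical, and other database systems word their error messages differently.

```python
import sqlite3


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (name TEXT, price REAL)")
    conn.execute("INSERT INTO parts VALUES ('widget', 1.75)")
    return conn


def run(conn, sql):
    """Execute a query, returning its rows or the reason the database rejected it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    conn = make_demo_db()
    print(run(conn, "SELECT name FROM parts WHERE price < 2"))    # X = name, Y = parts
    print(run(conn, "SELECT name FROM vendors WHERE price < 2"))  # no table named Y
    print(run(conn, "SELECT color FROM parts WHERE price < 2"))   # Y has no column X
```

A keyword index, by contrast, needs no knowledge of table or column names to find the word widget.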
By contrast, keyword searches typically assume nothing about the relationships or structures that may internally connect different instances of matching data. In particular, a keyword search of a relational database according to one embodiment
of the present invention for a keyword K will identify all data values in the exposed portion of the database that match K, regardless of the table names or column names being used.
Even if a particular relational database system supported queries such as SELECT ALL FROM ALL WHERE (ENTRY CONTAINS 'K'), this would not be equivalent to a system according to the invention which assists a keyword search of all database records
for matches to the keyword K. For instance, the internal indexing and retrieval mechanisms in relational databases are optimized for selecting and combining records in rows and columns and tables according to the database structure as well as testing
data value constraints; these mechanisms are not optimized for retrieving every data value and then testing it against the key. Also, web crawlers and other keyword index builders index all data values supplied to them, while relational databases
typically index only selected columns or rows. Finally, indexes according to the invention will generally have a much broader context or scope than an internal relational database index, involving not just a single relational database but many other
information sources as well; this makes the inventive indexes more useful with all-purpose or comprehensive search efforts.
As used here, a "structured database" is a collection of data items organized primarily by rules other than those governing natural languages such as English. The data items may contain natural language text such as addresses or part names in a
relational database, but relations, tables, trees, or other structures are the primary means of organization. Structured database operations aid decision-making by allowing users to combine individual data items in various ways, as illustrated in the
SQL query above.
Relational databases are one example of structured databases; other examples include hierarchical, inverted-list, object-relational, object-oriented, and flat-file databases. Structured databases may be stored in a single location or distributed
between several machines. Regardless of the approach taken to storage, many structured databases can be accessed through a network.
As used here, "network" includes local area networks, wide area networks, metropolitan area networks, and/or various "Internet" networks such as the World Wide Web, a private Internet, a secure Internet, a value-added network, a virtual private
network, an extranet, or an intranet. One of many possible networks suitable for use according to the invention is shown in FIG. 1, as indicated by the arrow labeled 100. The network 100 includes a server 102 and several clients 104; other suitable
networks may contain other combinations of servers, clients, and/or peer-to-peer nodes, and a given computer may function both as a client and as a server. The computers connected by a suitable network may be workstations, laptop computers,
disconnectable mobile computers, servers, mainframes, so-called "network computers" or "lean clients", personal digital assistants, or a combination thereof.
The network may include communications or networking software such as the software available from Novell, Microsoft, Artisoft, and other vendors, and may operate using TCP/IP, SPX, IPX, and other protocols over twisted pair, coaxial, or optical
fiber cables, telephone lines, satellites, microwave relays, modulated AC power lines, and/or other data transmission "wires" known to those of skill in the art. The network may encompass smaller networks and/or be connectable to other networks through
a gateway or similar mechanism.
As suggested by FIG. 1, at least one of the computers is capable of using a floppy drive, tape drive, optical drive, magneto-optical drive, or other means to read a storage medium 106. A suitable storage medium 106 includes a magnetic, optical,
or other computer-readable storage device having a specific physical configuration. Suitable storage devices include floppy disks, hard disks, tape, CD-ROMs, PROMs, random access memory, and other computer system storage devices. The physical
configuration represents data and instructions which cause the computer system to operate in a specific and predefined manner as described herein. Thus, the medium 106 tangibly embodies a program, functions, and/or instructions that are executable by
computer(s) to assist keyword searches of structured data substantially as described herein.
Suitable software for implementing the invention is readily provided by those of skill in the art using the teachings presented here and programming languages and tools such as Java, Pascal, C++, C, CGI, Perl, SQL, APIs, SDKs, assembly, firmware,
microcode, and/or other languages and tools.
Overview of Components
An overview of the main components of the invention and its environment is now given with reference to FIG. 2. A system 200 according to the invention operates using the network 100 or another suitable computer system. A structured database 202
and corresponding exposure definitions 204 are part of the inventive system or accessible to the inventive system 200. The structured database 202 includes data items which have data values; suitable databases include conventional relational databases
and other conventional structured databases with the associated database management system software.
The exposure definitions 204 identify the portion(s) of the structured database 202 that will be exposed to external keyword searches; the entire database 202 is typically already searchable by SQL or other conventional query means. Those of skill in the art will appreciate that the system 200 can also be configured such that the exposure definitions 204 identify the portions of the database 202 which should NOT be exposed for keyword searching, if that approach is more efficient or convenient. In either case, the exposure definitions 204 may be in the form of a data dictionary, particularly if the structured database 202 is a relational database. However, the exposure definitions 204 may also take the form of a schema, particularly if the structured database 202 is a hierarchical database or other database defined by a schema.
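A minimal sketch of the exposure definitions 204 follows, assuming they can be reduced to a mapping from table names to sets of column names. The class and field names are hypothetical. The same structure serves both configurations described above: listing what is exposed, or listing what should NOT be exposed.

```python
from dataclasses import dataclass, field


@dataclass
class ExposureDefinitions:
    """Hypothetical supplemental data dictionary for keyword exposure.

    tables maps a table name to a set of column names, or to None meaning
    "every column of this table".
    mode "include": only the listed tables/columns are exposed.
    mode "exclude": everything is exposed except the listed tables/columns.
    """
    mode: str = "include"
    tables: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.mode not in ("include", "exclude"):
            raise ValueError(f"unknown mode: {self.mode!r}")

    def is_exposed(self, table, column):
        listed = table in self.tables
        columns = self.tables.get(table)
        matched = listed and (columns is None or column in columns)
        return matched if self.mode == "include" else not matched


# Listing what IS exposed ...
include = ExposureDefinitions("include", {"customers": {"name", "city"}, "parts": None})
# ... or listing what is NOT exposed.
exclude = ExposureDefinitions("exclude", {"payroll": None, "customers": {"credit_limit"}})

print(include.is_exposed("customers", "credit_limit"))  # False
print(exclude.is_exposed("customers", "name"))          # True
```

The exclude mode is the more convenient choice when nearly the whole database is public and only a few sensitive tables or columns must be withheld.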
In the illustrated system 200, the exposure definitions 204 are created and edited using an administration tool 206. The tool 206 may operate by extracting the definitions 204 from an existing data dictionary or schema, or it may be necessary to
build the definitions from scratch by reverse engineering the data formats used in a binary-only structured database 202 and then generating a data dictionary or schema which can be edited to eliminate portions of the database 202 that should not be
exposed.
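When the structured database 202 already has a catalog, the administration tool 206 can start from it. The sketch below assumes an SQLite database, whose catalog is readable through the sqlite_master table and the PRAGMA table_info statement; the function names, the sample tables, and the {table: [columns]} format are hypothetical. The reverse-engineering path for binary-only databases is not shown.

```python
import sqlite3


def extract_dictionary(conn):
    """Read {table: [columns]} from SQLite's own catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {table: [info[1] for info in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables}


def edit_dictionary(dictionary, hidden_tables=(), hidden_columns=()):
    """Return a copy without the hidden tables and (table, column) pairs."""
    return {table: [c for c in columns if (table, c) not in hidden_columns]
            for table, columns in dictionary.items()
            if table not in hidden_tables}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, credit_limit REAL)")
conn.execute("CREATE TABLE payroll (employee TEXT, salary REAL)")

full = extract_dictionary(conn)
exposed = edit_dictionary(full, hidden_tables={"payroll"},
                          hidden_columns={("customers", "credit_limit")})
print(exposed)  # {'customers': ['name', 'city']}
```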
A document generator 208 generates documents 210 which contain textual representations of the exposed data values in the database 202. In one embodiment, the document generator 208 generates a document, such as an HTML page, for each table in a
relational database 202, containing the table's values in ASCII form, and then locates the document 210 at a Uniform Resource Locator (URL) corresponding to the table's location in the database 202. For instance, an HTML page containing the data values
stored in a sales database table named "customers" might be generated and then stored at http://www.company.com/salesdb/customers.htm.
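The per-table embodiment can be sketched as follows, assuming an SQLite connection and the URL layout from the example above (base URL, then database name, then table name with an .htm suffix). The function name and HTML layout are illustrative only. Escaping each value with html.escape keeps data such as "AT&T" from being misread as markup.

```python
import html
import sqlite3


def generate_table_page(conn, db_name, table, base_url="http://www.company.com"):
    """Render one table's values as an HTML page; return (url, page_text)."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [desc[0] for desc in cursor.description]
    lines = [f"<html><head><title>{html.escape(db_name)}: {html.escape(table)}</title></head>",
             "<body><table>",
             "<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in columns) + "</tr>"]
    for row in cursor:
        cells = ("" if value is None else str(value) for value in row)
        lines.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>")
    lines.append("</table></body></html>")
    # The URL mirrors the table's location: <base>/<database>/<table>.htm
    url = f"{base_url}/{db_name}/{table}.htm"
    return url, "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES ('AT&T', 'Provo')")
url, page = generate_table_page(conn, "salesdb", "customers")
print(url)  # http://www.company.com/salesdb/customers.htm
```

Because the URL is derived from the table's location, a search hit on the page leads back to the corresponding table in the database 202.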
An indexing agent 212 reads the documents 210 and generates entries in an index 214. Suitable indexing agents 212 include web crawlers, spiders, indexing robots, and other indexing tools. The indexing agent 212 may be a network-roaming agent,
or it may be tied to one or a few network sites. In one embodiment of the system 200, the indexing agent 212 indexes every data value in each document 210, not just "meta tag" or other values that may or may not be representative of the actual database
contents. Unlike indexing processes running inside the structured database 202, the indexing agent 212 does not rely heavily on assumptions about the database structure but merely treats the documents 210 as sources of text which have little or no
structure except that imposed by English or another natural language.
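A minimal in-memory sketch of the indexing agent 212 follows, assuming the documents 210 have already been fetched into a {URL: HTML text} mapping. Consistent with the embodiment above, it indexes every word of visible text rather than meta tags, and it uses no structure beyond word boundaries. The ASCII-only tokenizer is a simplification; a real crawler would also fetch pages over the network, persist the index 214, and apply its own tokenization rules.

```python
import html
import re
from collections import defaultdict

TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """documents maps URL -> HTML text; returns keyword -> set of URLs.

    Markup is discarded and every remaining word is indexed, so each data
    value in a generated page becomes searchable, not just meta tags.
    """
    index = defaultdict(set)
    for url, text in documents.items():
        plain = html.unescape(TAG.sub(" ", text)).lower()
        for word in WORD.findall(plain):
            index[word].add(url)
    return index


documents = {
    "http://www.company.com/salesdb/customers.htm":
        "<table><tr><td>Acme Corp</td><td>Provo</td></tr></table>",
    "http://www.company.com/salesdb/parts.htm":
        "<table><tr><td>widget</td><td>Acme Corp</td></tr></table>",
}
index = build_index(documents)
print(sorted(index["provo"]))  # ['http://www.company.com/salesdb/customers.htm']
```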
A keyword search engine user interface 216 may be integral with the indexing agent 212, or it may be a separate program provided by a separate vendor. The user interface 216 accepts keywords (possibly including wildcards) and uses the index 214
and possibly other components of the system 200 to locate corresponding documents 210.
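The user interface 216 is vendor-specific, so the following is only a sketch. It assumes an index shaped as {keyword: set of URLs}, treats a multi-word query as an AND of its terms, and handles wildcards with the shell-style patterns of Python's fnmatch module. All three choices are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def search(index, query):
    """Return sorted URLs of documents matching every term of query.

    Terms may contain shell-style wildcards (*, ?, [...]); a multi-term
    query is treated as an AND of its terms.
    """
    results = None
    for term in query.lower().split():
        if any(ch in term for ch in "*?["):
            # Wildcard term: union the postings of every matching keyword.
            matched = set()
            for keyword, urls in index.items():
                if fnmatchcase(keyword, term):
                    matched |= urls
        else:
            matched = set(index.get(term, ()))
        results = matched if results is None else results & matched
    return sorted(results or ())


index = {
    "acme": {"/salesdb/customers.htm", "/salesdb/parts.htm"},
    "provo": {"/salesdb/customers.htm"},
    "widget": {"/salesdb/parts.htm"},
}
print(search(index, "acme prov*"))  # ['/salesdb/customers.htm']
```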
Overview of Operation
An overview of the operation of the system 200 is now given, with reference to FIGS. 2 and 3. Four main steps are shown in FIG. 3: a data selecting step 300, an index allowing step 302, a search performing step 304, and an index maintaining step
306. These steps may be grouped for ease of explanation into an indexing phase (steps 300, 302, and 306) and a searching phase (step 304). During the indexing phase, the index 214 is created or updated. During the searching phase, the index 214 is
used to respond to keyword searches directed at the database 202 (and often to other information sources as well).
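The indexing phase, including the index maintaining step, can be illustrated with a toy index. In this sketch the function names are hypothetical and text is tokenized on whitespace only. Indexing files each word of a document under its URL; maintenance handles a changed table by removing that table's stale entries before re-indexing its regenerated document, so searches stop returning values that have been removed from the database 202. The sketch does not model when step 306 runs (for example, periodically or on database updates).

```python
from collections import defaultdict


def index_document(index, url, text):
    """Indexing phase: file every whitespace-separated word under url."""
    for word in text.lower().split():
        index[word].add(url)


def reindex_document(index, url, new_text):
    """Index maintaining step: drop url's stale entries, then re-index it."""
    for keyword in list(index):
        index[keyword].discard(url)
        if not index[keyword]:
            del index[keyword]  # no document contains this keyword any longer
    index_document(index, url, new_text)


index = defaultdict(set)
url = "http://www.company.com/salesdb/customers.htm"
index_document(index, url, "Acme Corp Provo")
# The customer moves; the regenerated page for the table is re-indexed.
reindex_document(index, url, "Acme Corp Orem")
print(sorted(index))  # ['acme', 'corp', 'orem']
```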