| United States Patent | 5537586 |
| Inventor(s) | Amram; Joseph A. (Boston, MA);
Bouvard; Jacques (Wellesley, MA);
Leightheiser; James E. (Lexington, MA);
Lidington; John C. (Hull, MA);
Tomeh; Majed G. (Sudbury, MA);
Wu; Harry C. (Concord, MA) |
| Abstract | A method for extracting a preferred set of textual records from a database
includes the following features. Priority values are assigned to each of a
plurality of predefined category structures. Textual records are assigned
a relevance value with respect to each category structure. If a record's
relevance value exceeds a predetermined threshold value, that record is
associated with the category structure. Each category has a list of
associated textual records which are retrieved. Textual records are
selected from the set of retrieved textual records and assembled into a
set. Information on how the subscriber uses the set is gathered, and new
rankings for the category structure are computed. |
Title Information  |
Drawing from US Patent 5537586 |
Enhanced apparatus and methods for retrieving and selecting profiled
textural information records from a database of defined category
structures |
| Publication Date |
July 16, 1996 |
| Parent Case |
This application is a continuation-in-part of Ser. No. 07/876,328, now
abandoned, filed Apr. 30, 1992 |
References  |
U.S. References |
| Patent | Inventor | U.S. Class | Date |
| 5428778 | Brookes | 707/5 | Jun 1995 |
| 5418951 | Damashek | 707/5 | May 1995 |
| 5384703 | Withgott | 715/531 | Jan 1995 |
| 5303361 | Colwell | 707/4 | Apr 1994 |
| 5299123 | Wang | 707/2 | Mar 1994 |
| 5297027 | Morimoto | | Mar 1994 |
| 5263167 | Conner, Jr. | 707/4 | Nov 1993 |
| 5222234 | Wang | 707/3 | Jun 1993 |
| 5093918 | Heyen | 709/215 | Mar 1992 |
| 5084819 | Dewey | | Jan 1992 |
| 5077668 | Doi | | Dec 1991 |
| 5043891 | Goldstein | 715/531 | Aug 1991 |
| 4984178 | Hemphill | 704/255 | Jan 1991 |
| 4970681 | Bennett | 707/3 | Nov 1990 |
| 4744050 | Hirosawa | 704/4 | May 1988 |
| 4712174 | Minkler, II | 704/1 | Dec 1987 |
| 4674066 | Kucera | 707/5 | Jun 1987 |
| 4255796 | Gabbe | 707/3 | Mar 1981 |
| 4996642 | Hey | 705/27 | Dec 1969 |
Claims  |
What is claimed is:
1. A method of extracting a preferred set of stored textual records from a
database, comprising the steps of:
assigning, to selected ones of a plurality of predefined category
structures, a priority value, wherein said selected ones of said plurality
of predefined category structures and assigned priority values form a
profile associated with a subscriber;
assigning to each stored textual record a relevance value associated with
each category structure;
associating each stored textual record with each category structure for
which the record's relevance value associated with that category structure
exceeds a predetermined threshold;
maintaining, for each category structure, a list of associated textual
records;
retrieving from the database, for each selected category structure, the
textual records associated with that category structure;
selecting, from the set of retrieved textual records, a plurality of
preferred textual records in a manner responsive to the priority value
assigned to each category structure;
assembling the plurality of preferred textual records to form the preferred
set;
collecting usage information from the subscriber for the retrieved textual
records forming the preferred set; and
assigning a new priority value for category structures associated with said
profile based on the usage information collected for said subscriber
associated with the profile, said step of assigning a new priority value
comprising:
ranking the category structures in order of subscriber usage of textual
records associated with the category structures to determine a usage rank
for each category structure; and
comparing the usage rank with the original priority value for each category
structure to determine the new priority value for the category structures,
said step of comparing comprising:
assigning a first numerical weight to each category structure determined by
its original priority value in the associated profile;
assigning a second numerical weight to each category structure determined
by the usage of textual records associated with the category structure by
the subscriber;
assigning a third numerical weight to each category structure determined by
the usage of the textual records associated with the category structure by
other subscribers previously determined to be peers; and
assigning the new priority value for each category structure determined by
summing the first, second and third numerical weights assigned for each
category structure.
2. A method of extracting a preferred set of stored textual records from a
database, wherein the stored textual records include full textual records
and brief textual records and each brief textual record is associated with
a full textual record, comprising the steps of:
assigning to selected ones of a plurality of predefined category
structures, a priority value, wherein said selected ones of said plurality
of predefined category structures and assigned priority values form a
profile associated with a subscriber;
extracting a brief textual record from a full textual record, said
extracting step comprising:
determining the source of the full textual record;
selectively extracting portions of the full textual record to provide the
brief textual record depending on the source and the length of the full
textual record, wherein this selectively extracting step includes
extracting the entire full textual record to provide the brief textual
record if the length of the full textual record is less than a
predetermined value;
assigning to each stored textual record a relevance value associated with
each category structure;
associating each stored textual record with each category structure for
which the record's relevance value associated with that category structure
exceeds a predetermined threshold;
maintaining, for each category structure, a list of associated textual
records;
retrieving from the database, for each selected category structure, the
textual records associated with that category structure;
selecting, from the set of retrieved textual records, a plurality of
preferred textual records in a manner responsive to the priority value
assigned to each category structure;
assembling the plurality of preferred textual records to form the preferred
set;
collecting usage information from the subscriber for the retrieved textual
records forming the preferred set, the usage information including
subscriber usage of full textual records; and
assigning a new priority value for category structures associated with said
profile based on the usage information collected for said subscriber
associated with the profile.
3. A method of extracting a preferred set of stored textual records from a
database, wherein the stored textual records include full textual records
and brief textual records and each brief textual record is associated
with a full textual record, comprising the steps of:
assigning, to selected ones of a plurality of predefined category
structures, a priority value, wherein said selected ones of said plurality
of predefined category structures and assigned priority values form a
profile associated with a subscriber;
extracting a brief textual record from a full textual record, said
extracting step comprising:
determining the source of the full textual record;
identifying the location of key terms in the full textual record;
selectively extracting portions of the full textual record to provide the
brief textual record depending on the source of and the identified key
terms in the full textual record, wherein this selectively extracting step
includes extracting one or more sentences proximal to, and including, the
identified key terms to provide the brief textual record;
assigning to each stored textual record a relevance value associated with
each category structure;
associating each stored textual record with each category structure for
which the record's relevance value associated with that category structure
exceeds a predetermined threshold;
maintaining, for each category structure, a list of associated textual
records;
retrieving from the database, for each selected category structure, the
textual records associated with that category structure;
selecting, from the set of retrieved textual records, a plurality of
preferred textual records in a manner responsive to the priority value
assigned to each category structure;
assembling the plurality of preferred textual records to form the preferred
set;
collecting usage information from the subscriber for the retrieved textual
records forming the preferred set, the usage information including
subscriber usage of full textual records; and
assigning a new priority value for category structures associated with said
profile based on the usage information collected for said subscriber
associated with the profile.
4. A method of providing textual records from a database to a subscriber
comprising the steps of:
assigning a priority value to selected ones of a plurality of predefined
category structures to form a profile associated with a subscriber;
assigning to each stored textual record a relevance value associated with
each category structure;
associating each stored textual record with each category structure for
which the record's relevance value associated with that category structure
exceeds a predetermined threshold;
providing a brief textual record associated with each of the stored textual
records, wherein the brief textual record comprises an extracted portion
of the stored textual record with which it is associated;
retrieving from the database, the brief textual records associated with the
stored textual records associated with each category structure, the
selection of particular brief textual records retrieved being responsive
to the assigned priority values associated with the profile;
assembling the brief textual records retrieved from the database to form
the preferred set;
transmitting the preferred set of assembled textual records to the
subscriber;
receiving requests from the subscriber for the stored textual records
associated with one or more brief textual records of the preferred set;
and
retrieving the requested stored textual record from the database and
transmitting the retrieved stored textual record to the requesting
subscriber, this retrieving step comprising:
providing a stored textual record limit and a brief textual record limit;
retrieving a plurality of stored textual records up to the stored textual
record limit by first retrieving a plurality of stored textual records
from the associated category structures, and then, if the retrieved stored
textual records number less than the stored textual record limit, then
retrieving stored textual records from other category structures up to the
stored textual record limit; and
retrieving a plurality of brief textual records up to the brief textual
record limit.
5. A method of extracting a preferred set of stored textual records from a
database, comprising the steps of:
assigning, to selected ones of a plurality of predefined category
structures, a priority value, wherein said selected ones of said plurality
of predefined category structures and assigned priority values form a
profile associated with a subscriber;
assigning to each stored textual record a relevance value associated with
each category structure;
associating each stored textual record with each category structure for
which the record's relevance value associated with that category structure
exceeds a predetermined threshold;
maintaining, for each category structure, a list of associated textual
records;
retrieving from the database, for each category structure, the textual
records associated with that category structure;
selecting, from the set of retrieved textual records, a plurality of
preferred textual records in a manner responsive to the priority value
assigned to each category structure;
assembling the plurality of preferred textual records to form the preferred
set;
collecting usage information from the subscriber for the retrieved textual
records forming the preferred set;
defining a group of subscribers sharing a common characteristic;
compiling usage information for the subscribers of the defined group and
analyzing the compiled usage information to detect a usage pattern for the
group;
defining one or more new category structures in accordance with the
detected usage pattern; and
assigning a new priority value for the new category structures associated
with each subscriber profile for each subscriber belonging to the defined
group, this step of assigning comprising:
assigning a first numerical weight to each new category structure
determined by the original priority values for the original category
structures in the associated profile;
assigning a second numerical weight to each new category structure
determined by the usage of textual records associated with the new
category structure by the subscriber;
assigning a third numerical weight to each new category structure
determined by the usage of the textual records associated with the new
category structure by other subscribers previously determined to be peers;
and
assigning the new priority value for each new category structure determined
by summing the first, second, and third numerical weights assigned for
each new category structure.
6. The method of claim 5, wherein the defining one or more new category
structures comprises redistributing the textual records from a
pre-existing category structure into two or more new category structures.
7. The method of claim 5, wherein the defining one or more new category
structures comprises combining the textual records from at least two
pre-existing category structures in a new category structure.
8. The method of claim 5, wherein the defined group comprises all
subscribers.
9. The method of claim 5, wherein the defined group comprises subscribers
having a common profession.
10. The method of claim 5, wherein the defined group comprises subscribers
having a similar geographical location. |
Description  |
BACKGROUND OF THE INVENTION
The invention relates to the retrieval of a set of textual records from a
database and in particular to the retrieval of such records based on
category structures.
It is well known to retrieve information stored in computer databases. In
the SMART information retrieval system, described in "Introduction to
Modern Information Retrieval, The SMART and SIRE Experimental Retrieval
Systems", by Gerald Salton and Michael McGill, McGraw-Hill, New York,
1983, pages 118-156, information is retrieved based on measures of
similarity between documents searched and a given query.
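SMART is generally described as a vector-space system: documents and queries are represented as term-weight vectors and ranked by a similarity measure such as the cosine of the angle between them. The following is a minimal sketch of that idea only, assuming raw term-frequency weights and whitespace tokenization; SMART itself supports many weighting schemes, and nothing in this sketch is taken from the patent.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector over lowercase whitespace tokens (illustrative only)."""
    return Counter(text.lower().split())


def cosine_similarity(doc: str, query: str) -> float:
    """Cosine of the angle between the document and query term vectors (0.0 if either is empty)."""
    d, q = term_vector(doc), term_vector(query)
    dot = sum(d[t] * q[t] for t in d.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in d.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    for doc in ["mid-size computer systems ship", "quarterly earnings rose"]:
        print(f"{cosine_similarity(doc, 'computer systems'):.3f}  {doc}")
```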
It is also known to perform ongoing electronic searches, in which documents
in a database are periodically searched for certain words or queries. For
example, a company might want to track news items mentioning its name or
the name of competing companies.
SUMMARY OF THE INVENTION
In general, the invention features extracting a preferred set of textual
records from a database using category constructs, which act as versatile
information retrieval building blocks. Priorities are assigned to the
category structures based on a ranking, and records are associated with
the stored category structures to which they are relevant. The selection
of records retrieved for assembly into the preferred set is responsive to
the assigned priorities. New priorities may be assigned to category
structures based on an evaluation of the quality of the assembled set.
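The core loop summarized above (score each record against every category, associate it where the relevance value exceeds a threshold, then fill the preferred set in a manner responsive to priority) can be sketched as follows. This is a hypothetical illustration, not the patented implementation: `keyword_scorer` stands in for a real relevance measure, a higher priority value is assumed to mean "more preferred", and "responsive to priority" is interpreted as filling the set from the highest-priority category first.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Category:
    name: str
    relevance: Callable[[str], float]  # hypothetical per-category relevance scorer
    records: List[str] = field(default_factory=list)  # ids of associated records


def keyword_scorer(word: str) -> Callable[[str], float]:
    """Hypothetical scorer: number of times `word` appears in the record text."""
    return lambda text: float(text.lower().count(word))


def associate(records: Dict[str, str], categories: List[Category], threshold: float) -> None:
    """Attach each record to every category whose relevance value exceeds the threshold."""
    for rec_id, text in records.items():
        for cat in categories:
            if cat.relevance(text) > threshold:
                cat.records.append(rec_id)


def preferred_set(profile: Dict[str, int], categories: List[Category], size: int) -> List[str]:
    """Fill the set from categories in descending priority order, skipping repeats."""
    by_name = {c.name: c for c in categories}
    chosen: List[str] = []
    for name in sorted(profile, key=profile.get, reverse=True):
        for rec_id in by_name[name].records:
            if len(chosen) == size:
                return chosen
            if rec_id not in chosen:
                chosen.append(rec_id)
    return chosen


if __name__ == "__main__":
    recs = {"r1": "New computer systems announced", "r2": "Bank earnings", "r3": "Bank buys computer maker"}
    cats = [Category("computers", keyword_scorer("computer")), Category("banking", keyword_scorer("bank"))]
    associate(recs, cats, threshold=0.5)
    print(preferred_set({"computers": 1, "banking": 2}, cats, size=3))  # ['r2', 'r3', 'r1']
```

A record relevant to several categories (here `r3`) is associated with each of them but appears only once in the assembled set.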
In general, in another aspect, the invention features assigning priority
values to stored category structures to form a profile associated with a
subscriber, and collecting usage information from the subscriber for the
retrieved text records forming the preferred set of the subscriber's
profile. A new ranking, determined by the usage information, is assigned to
the category structures associated with each profile. In embodiments of the
invention, the textual records include full text records and brief text
records (briefs), each associated with a full text record. Usage
information can be collected for the subscriber usage of the full text
records.
In other embodiments, the invention features retrieving, assembling and
transmitting briefs to each appropriate subscriber. Requests are received
from the subscriber for the full text record associated with one or more
of the briefs. The full text record is retrieved from the database and
transmitted to the requesting subscriber. Usage information is collected
to track the full text record requests from each subscriber.
In still other embodiments, the invention features ranking the category
structures for the subscriber profiles in order of subscriber usage for
the text records associated with the category structures. The usage rank
is compared with the original rank for each category structure to
determine a new rank for the category structures. Numerical weights are
assigned to each category structure determined by its original rank, the
usage of its text records by the subscriber, and the usage of its text
records by peers. A new rank is assigned for each category structure
determined by summing the numerical weights.
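The text does not say how each numerical weight is derived from a rank, so the sketch below assumes a simple hypothetical scheme: within each ordering (original priority, the subscriber's own usage, peer usage), first place earns `top_weight` points, each later place one point less, with a floor of zero. The new priority is the sum of the three weights, as described above.

```python
from typing import Dict, List


def descending(scores: Dict[str, int]) -> List[str]:
    """Category names ordered from highest to lowest score (priority or usage count)."""
    return sorted(scores, key=scores.get, reverse=True)


def rank_weight(order: List[str], top_weight: int) -> Dict[str, int]:
    """Hypothetical weighting: first place earns top_weight, each later place one less (floor 0)."""
    return {name: max(top_weight - i, 0) for i, name in enumerate(order)}


def new_priorities(
    original: Dict[str, int],
    own_usage: Dict[str, int],
    peer_usage: Dict[str, int],
    top_weight: int = 3,
) -> Dict[str, int]:
    """Sum three weights per category: original priority, subscriber usage, peer usage."""
    w1 = rank_weight(descending(original), top_weight)
    w2 = rank_weight(descending(own_usage), top_weight)
    w3 = rank_weight(descending(peer_usage), top_weight)
    return {c: w1.get(c, 0) + w2.get(c, 0) + w3.get(c, 0) for c in original}


if __name__ == "__main__":
    print(new_priorities({"a": 3, "b": 2, "c": 1}, {"c": 10, "a": 4, "b": 0}, {"c": 7, "b": 5, "a": 1}))
    # {'a': 6, 'b': 5, 'c': 7}: heavy own and peer usage lifts category c to the top
```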
In yet other embodiments, the invention features extracting a brief from a
full text record by determining the source and editorial style of the full
text record, and selectively extracting portions of the full text record
depending on its source and editorial style, to provide the brief.
Determining the editorial style can include defining the length and
identifying the location of key terms in the full text record. The brief
can be provided by extracting the entire full text record if its length is
less than a predetermined value, or extracting one or more sentences
including identified key terms.
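A minimal sketch of the brief-extraction rule described above, under stated assumptions: the length threshold (`max_length`, in characters), the regex sentence splitter, the `context` window of neighbouring sentences, and the fallback to the lead sentence are all hypothetical choices, and the source- and style-specific rules the patent mentions are not modelled.

```python
import re
from typing import Iterable

# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def extract_brief(full_text: str, key_terms: Iterable[str], max_length: int = 400, context: int = 1) -> str:
    """Return the whole record if short; otherwise the key-term sentences plus neighbours."""
    if len(full_text) < max_length:
        return full_text
    sentences = SENTENCE_SPLIT.split(full_text.strip())
    terms = [t.lower() for t in key_terms]
    keep = set()
    for i, sentence in enumerate(sentences):
        if any(t in sentence.lower() for t in terms):
            # Keep the matching sentence and `context` sentences on either side of it.
            keep.update(range(max(i - context, 0), min(i + context + 1, len(sentences))))
    if not keep:
        return sentences[0]  # assumption: fall back to the lead sentence
    return " ".join(sentences[i] for i in sorted(keep))
```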
In still other embodiments, the invention features defining neighboring
category structures associated with each subscriber and retrieving text
records associated with the neighboring category structures. If the
collected usage information from the subscriber indicates usage of the
text records from a neighboring category structure, then a priority value
is assigned to the neighboring category structure to include the structure
in the profile associated with the subscriber.
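The neighbouring-category step can be sketched as a profile update. The neighbour map, the per-category usage counts, and the starting priority given to an adopted neighbour (`initial_priority`) are hypothetical; the text only says that a priority value is assigned once usage of a neighbour's records is observed.

```python
from typing import Dict, List


def enlighten_profile(
    profile: Dict[str, int],
    neighbours: Dict[str, List[str]],
    usage: Dict[str, int],
    initial_priority: int = 1,
) -> Dict[str, int]:
    """Return a copy of the profile that adds each neighbouring category the subscriber used."""
    updated = dict(profile)
    for category in profile:
        for neighbour in neighbours.get(category, []):
            # usage[neighbour] counts sampled records from that neighbour the subscriber used.
            if neighbour not in updated and usage.get(neighbour, 0) > 0:
                updated[neighbour] = initial_priority
    return updated
```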
In other embodiments, one or more attribute preferences are associated with
attributes of text records to be retrieved and with the subscriber
profile. If an identified text record fails to satisfy the defined
attribute preferences, and if a secondary text record related to the
identified text record exists and satisfies the attribute preferences,
then the secondary text record replaces the identified text record. The
attributes can include, for example, the source, author, cost, length and
editorial style of the text record.
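A sketch of the attribute-preference substitution above. The `Record` fields, the `related` link to a secondary record, and expressing preferences as per-attribute predicates are assumptions made for illustration. When neither record satisfies the preferences, the text does not say what happens; this sketch keeps the originally identified record.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Preferences = Dict[str, Callable[[object], bool]]  # attribute name -> acceptance test


@dataclass
class Record:
    rec_id: str
    source: str
    cost: float
    length: int
    related: Optional[str] = None  # hypothetical id of a secondary record on the same story


def satisfies(record: Record, prefs: Preferences) -> bool:
    """True if every preferred attribute passes its test."""
    return all(test(getattr(record, attr)) for attr, test in prefs.items())


def apply_preferences(selected: List[Record], library: Dict[str, Record], prefs: Preferences) -> List[Record]:
    """Swap in a related secondary record when the identified one fails the preferences."""
    result = []
    for rec in selected:
        if satisfies(rec, prefs):
            result.append(rec)
            continue
        alt = library.get(rec.related) if rec.related else None
        # Assumption: keep the original record if no acceptable secondary record exists.
        result.append(alt if alt is not None and satisfies(alt, prefs) else rec)
    return result
```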
In general, in another aspect, the invention features a method and
apparatus for providing textual records from a database to a subscriber by
transmitting a preferred set of assembled briefs to a subscriber and
receiving requests from the subscriber for full text records associated
with one or more of the briefs. The requested full text records are
retrieved from the database and transmitted to the requesting subscriber.
The transmission can be by facsimile, electronic mail, or other means.
Requests can be received by an automated interactive telephone system,
electronic mail, or other means.
Embodiments of the invention include providing a full text record limit and
a brief limit. Full text records are retrieved up to the full text record
limit and briefs are retrieved up to the brief limit. Full text
records can be retrieved up to the full text record limit by first
retrieving records from the associated category structures, and then, if
the retrieved full text records number less than the full text record
limit, retrieving full text records from other category structures to fill
the full text record limit.
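The two-pass fill of the full text record limit can be sketched as below. Category contents are modelled as ordered lists of record ids (an assumption), records already chosen are not repeated, and briefs would be capped against the brief limit in the same way.

```python
from typing import Dict, List


def fill_full_text(profile_categories: List[str], all_categories: Dict[str, List[str]], limit: int) -> List[str]:
    """Take records from the profile's categories first, then from the rest, up to `limit`."""
    chosen: List[str] = []
    others = [c for c in all_categories if c not in profile_categories]
    for category in list(profile_categories) + others:
        for rec_id in all_categories.get(category, []):
            if len(chosen) >= limit:
                return chosen
            if rec_id not in chosen:
                chosen.append(rec_id)
    return chosen
```

With categories `{"a": ["r1", "r2"], "b": ["r3"], "c": ["r4", "r5"]}`, a profile of `["a"]`, and a limit of 4, the profile category supplies `r1` and `r2`, and the remaining two slots are filled from the other categories in order.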
In general, in another aspect, the invention features defining a group of
subscribers sharing a common characteristic, compiling usage information
for the subscribers of the defined group and analyzing the compiled usage
information to detect a usage pattern for the group. New category
structures are defined in accordance with the detected usage pattern. A
new ranking is assigned for the new category structures for each
subscriber belonging to the defined group. Embodiments include
redistributing text records from a pre-existing category structure into
two or more new category structures, or combining the text records from at
least two pre-existing category structures in a new category structure.
The defined group can include, for example, all subscribers, subscribers
having a common profession, or subscribers having a similar geographical
location.
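The two restructuring operations named above, fusing two categories and sectioning one into two, can be sketched on a simple mapping from category name to record ids. The usage-pattern analysis that would trigger them is not shown. The split predicate `rule` is hypothetical, and keeping the pre-existing categories alongside the new ones is an assumption the text does not settle.

```python
from typing import Callable, Dict, List, Tuple

Categories = Dict[str, List[str]]


def fuse(categories: Categories, first: str, second: str, new_name: str) -> Categories:
    """Combine the records of two existing categories into one new category."""
    merged = list(dict.fromkeys(categories[first] + categories[second]))  # order-preserving de-duplication
    return {**categories, new_name: merged}


def section(categories: Categories, source: str, rule: Callable[[str], bool], names: Tuple[str, str]) -> Categories:
    """Redistribute one category's records into two new categories using a predicate."""
    matching, rest = names
    records = categories[source]
    return {
        **categories,
        matching: [r for r in records if rule(r)],
        rest: [r for r in records if not rule(r)],
    }
```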
In general, in still another aspect, the invention includes a method and
apparatus for on-line service providers to provide textual records to
subscribers. Text records are received from information providers, and
formatted into a common format. Tags are associated with various
components and attributes of the text records. The text records and tags
are transmitted to on-line service providers and stored on an on-line
provider database. Subscribers define a profile for selecting text records
from the on-line provider database in response to the contents of
particular tags. Text records are selected and retrieved from the on-line
provider database and transmitted to the subscriber.
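A sketch of the data-pipe idea: provider-specific fields are mapped onto a common set of tags, and a subscriber profile selects records by tag content. The field map, tag names, and substring matching are all hypothetical; the text does not define a tag vocabulary.

```python
from typing import Dict, List

TaggedRecord = Dict[str, str]

# Hypothetical mapping from provider field names to common tags.
FIELD_MAP = {"hed": "headline", "title": "headline", "body": "text", "src": "source"}


def to_common_format(raw: Dict[str, str], provider: str) -> TaggedRecord:
    """Rename provider fields to common tags and record which provider sent the item."""
    record = {FIELD_MAP.get(k, k): v for k, v in raw.items()}
    record["provider"] = provider
    return record


def select(records: List[TaggedRecord], profile: Dict[str, str]) -> List[TaggedRecord]:
    """Return records whose tags contain every value the subscriber's profile asks for."""
    return [
        r for r in records
        if all(value.lower() in r.get(tag, "").lower() for tag, value in profile.items())
    ]
```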
In general, in another aspect, the invention features a method and
apparatus for tracking text records having entity-specific data, including
attaching tags to a text record stored on a database corresponding to each
identified entity that is part of the record's contents. The text records
are sorted into category structures, each corresponding to an identified
entity, according to the attached tags. A tagged text record is excluded
from a category structure if the record fails to satisfy rules associated
with the identified entity. Retained text records are ranked within a
category structure in accordance with their relevance to the associated
entity.
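The entity-tracking steps above (sort tagged records into one category per entity, exclude records that fail that entity's rules, then rank what remains by relevance) are sketched below. The record layout, the rules as text predicates, and the relevance function are hypothetical stand-ins.

```python
from typing import Callable, Dict, List

Rule = Callable[[str], bool]


def track(
    records: Dict[str, Dict],
    entity_rules: Dict[str, List[Rule]],
    relevance: Callable[[str, str], float],
) -> Dict[str, List[str]]:
    """Build one ranked category of record ids per tracked entity."""
    categories: Dict[str, List[str]] = {entity: [] for entity in entity_rules}
    for rec_id, rec in records.items():
        for entity in rec["tags"]:
            # Exclude the record if it fails any rule attached to this entity.
            if entity in categories and all(rule(rec["text"]) for rule in entity_rules[entity]):
                categories[entity].append(rec_id)
    for entity, ids in categories.items():
        ids.sort(key=lambda rid: relevance(entity, records[rid]["text"]), reverse=True)
    return categories
```

For example, a rule such as `lambda t: "sports" not in t.lower()` could keep a company's category free of stories about a sports team that shares its name.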
The retrieval method and apparatus of the invention permit highly specific
and versatile ongoing searches based on a library of defined category
structures. These structures can substantially reduce the difficulty of
creating a search profile while improving its quality to produce a series
of ongoing profile-specific news dispatches. The retrieval process may
also be completely automated, resulting in reduced cost and the virtual
elimination of human error. User feedback permits fine tuning of the
search profile, and may also be fully automated. Duplicative but different
records may be eliminated, leaving more space for non-redundant
information in the assembled set of records.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating a textual record retrieval system
according to the invention.
FIG. 2 is a flowchart illustrating input operations performed in connection
with the textual record retrieval system of FIG. 1.
FIG. 3 is a flowchart illustrating operations performed to the database
upon the reception of records.
FIG. 4 is a flowchart illustrating the assembling operations for textual
records retrieved from the database to form a preferred set.
FIG. 5 is an expanded illustration of the "generate profile" block of FIG.
2, illustrating weighting operations in the generation of a profile.
FIG. 6 is a flowchart illustrating feedback operations performed by the
textual record retrieval system of FIG. 1.
FIG. 7 is a continuation of the flowchart of FIG. 6.
FIG. 8 is a flowchart illustrating variations on the priority schemes used
by the system of FIG. 1.
FIG. 9 is a block diagram illustrating an exemplary category structure.
FIG. 10 is a block diagram illustrating a personal computer--local area
network implementation according to the invention.
FIG. 11 is a flowchart illustrating duplicate record handling operations
performed by the textual record retrieval system of FIG. 1.
FIG. 12 is a block diagram illustrating a user manager of the textual
record retrieval system according to this invention for tuning and
redefining subscriber profiles based upon subscriber usage feedback.
FIG. 13 is a flow chart illustrating a profile tuning and redefinition
process performed by the user manager of FIG. 12.
FIG. 14 is a flow chart detailing a profile adjustment process of the
redefinition process of FIG. 13.
FIG. 15 is pseudo-code illustrating an extraction process of this invention
for extracting a brief textual record from a full textual record.
FIG. 16 is a flow chart illustrating a process for retrieving full text
records requested by subscribers.
FIG. 17 is a flow chart illustrating a process for determining the
distribution of retrieved textual records between full textual records and
brief textual records.
FIG. 18 is a flow chart illustrating a process for sectioning or fusing of
category structures dependent on usage feedback of textual records by
defined groups of subscribers.
FIG. 19 is a diagram illustrating the separation of a single category
structure into two new category structures.
FIG. 20 illustrates the fusion of two category structures into a single new
category structure.
FIG. 21 is a flow chart illustrating a process for enlightening a
subscriber profile through sampling of textual records of neighboring
category structures.
FIG. 22 is a flow chart illustrating a process for selecting textual
records in accordance with defined attribute preferences.
FIG. 23 is a flow chart illustrating a process for the delivery of data to
on-line subscribers by means of a data pipe.
FIG. 24 is a flow chart illustrating a process for rule based portfolio
tracking.
DESCRIPTION OF THE INVENTION
Referring to FIG. 1, one possible embodiment of an electronic system for
retrieving textual records on an ongoing basis 10 includes an input
processor 12, which is connected to receive information over incoming
communication channels 14, and is associated with input journal storage
16. A system controller 20 is connected to receive input queue information
from the input processor via input queue storage 18 and to provide
information to one or more record editors 22. Each editor is associated
with an input source and is responsible for converting that input format
to a canonical (standard) format. The record editor maintains a record
library in record library storage 25, and provides an output to the
associative processor 26 via processing queue storage 24. The associative
processor 26 generates measures of relevance of records using queries
stored in the user library storage 28, and may employ an associative
information retrieval system, such as the SMART system. User manager 30
receives and processes subscriber feedback 32 and user profiles 34. Output
bins 36 receive search information from the associative processor, and
provide it to the output manager 38. The output manager 38 provides output
to record journal storage 40, statistics and account data storage 42, and
output queue storage 44. An output processor 46 receives information from
the output queue storage 44 and provides information to report queue
storage 48 as well as to output journal storage 50. The output processor
46 also provides output on outgoing communication channels 52, such as
subscriber fax lines. A report generator 54 accesses statistics and
account data storage and report queue storage. It is observed that this
exemplary embodiment may be altered in a variety of ways without departing
from the spirit and scope of the invention. In particular, this embodiment
is not intended as the broadest expression of the invention, which is to
be defined by the claims.
In operation, the input processor 12 receives textual records, such as news
stories, over incoming communication channels 14, which may be newswires.
Copies of these records are maintained in the input journal storage 16, as
backup. These records are also queued in input queue storage 18 and
provided to the system controller 20. The record editor 22 maintains a
copy of the records in its record library 25 in its standard format, which
acts as the main record database. The record editor 22 also combines
record segments which are transferred from the information providers as
separate segments. The records contained in the record library 25 are the
same as the backup records maintained in the input journal 16, except that
the records maintained in the input journal 16 may be in raw
communications formats, such as facsimile pixel data, whereas the record
library 25 contains ASCII text versions of the records in a standard
format. For example, this format may clearly delineate paragraphs, tables,
and the like. The record editor 22 provides the non-duplicative records to
the processing queue storage 24.
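The record editor's two clean-up duties described above, joining records that arrive as separate segments and passing on only non-duplicative records, might look like the following sketch. The `(story_id, sequence_no, text)` segment layout is hypothetical, and the hash over whitespace- and case-normalized text catches only exact duplicates. The "duplicative but different" records the patent also aims to eliminate would need a similarity test instead.

```python
import hashlib
import re
from typing import Dict, List, Tuple


def combine_segments(segments: List[Tuple[str, int, str]]) -> Dict[str, str]:
    """Join (story_id, sequence_no, text) segments into whole records in sequence order."""
    stories: Dict[str, List[Tuple[int, str]]] = {}
    for story_id, seq, text in segments:
        stories.setdefault(story_id, []).append((seq, text))
    return {sid: " ".join(text for _, text in sorted(parts)) for sid, parts in stories.items()}


def drop_duplicates(records: Dict[str, str]) -> Dict[str, str]:
    """Keep the first record for each whitespace/case-normalized text (exact duplicates only)."""
    seen = set()
    kept: Dict[str, str] = {}
    for rec_id, text in records.items():
        normalized = re.sub(r"\s+", " ", text).strip().lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept[rec_id] = text
    return kept
```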
The user manager provides rankings of category structures and stores them
in the user library 28. Category structures 60 (see also FIG. 9) each
include a category definition 62, a query 64, and a series of pointers 66.
Initially, these pointers are vacant. For example, a certain category
structure may have a definition associated with it (e.g., mid-size
computer systems). The query will be a query designed to retrieve records
related to the category definition. The category structure illustrated in
FIG. 9 is an exemplary structure, and it will be clear to those skilled in
the art that the information maintained in such a structure may be
represented in various other forms. From the point of view of the user,
the category structures act as building blocks ("category structures" and
"building blocks" are interchangeable terms herein) that can be
manipulated to meaningfully tailor the retrieval operations. Generally,
the user only interacts with the definition of the category structures.
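The category structure of FIG. 9 (a definition, a query, and an initially vacant series of pointers) maps naturally onto a small record type. The sketch below is illustrative only: the query string syntax is hypothetical, and refusing to store the same pointer twice is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CategoryStructure:
    definition: str  # what the user sees, e.g. "mid-size computer systems"
    query: str       # what the associative processor runs against incoming records
    pointers: List[str] = field(default_factory=list)  # ids of relevant records; vacant at first

    def add_pointer(self, record_id: str) -> None:
        """Record that a textual record is relevant to this category (ignoring repeats)."""
        if record_id not in self.pointers:
            self.pointers.append(record_id)
```

Using `field(default_factory=list)` gives every category structure its own pointer list, so adding a pointer to one structure never affects another.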
The associative processor 26 accesses the queries in the user library 28,
and performs searches using those queries on queued incoming records. If
an incoming record is relevant to the query associated with a given
category structure, a pointer to that textual record will be added to that
category structu | | |