| United States Patent | 5590317 |
| Link to this page | http://www.wikipatents.com/5590317.html |
| Inventor(s) | Iguchi; Hiroaki (Yokohama, JP);
Kurosu; Yasuo (Yokosuka, JP);
Fujinawa; Masaaki (Kanagawa-ken, JP);
Yokoyama; Yoshihiro (Yokohama, JP);
Masuzaki; Hidefumi (Hadano, JP) |
| Abstract | A document information compression and retrieval system which reduces the
document data amount and shortens the retrieval time when mass document
information is registered and retrieved. A method of registering document
information in a document information retrieval system which stores
document information consisting of a large number of characters for
retrieval of the stored document information. Entered document information
is separated into words. Whether or not each of the words is a word to
which a compressed code is assigned is determined. If not already
assigned, a compressed code is assigned to the word. The words are
converted into the assigned compressed codes for storing a compressed
text. At output, retrieval information is accepted and converted into
compressed code and stored compressed texts are searched for the
compressed text matching the compressed code of the retrieval information,
then the words corresponding to the compressed codes are used to expand
the compressed text into original document information. |
Title Information  |
Document information compression and retrieval system and document
information registration and retrieval method |
| Publication Date |
December 31, 1996 |
| Priority Data |
May 27, 1992 [JP] 4-135340
May 27, 1992 [JP] 4-135341 |
References  |
U.S. References |
| Reference | Inventor | Class | Date |
| 5519857 | Kato | 707/5 | May 1996 |
| 5491760 | Withgott | 382/203 | Feb 1996 |
| 5337233 | Hofert | 715/540 | Aug 1994 |
| 5321770 | Huttenlocher | 382/174 | Jun 1994 |
| 5319779 | Chang | 707/3 | Jun 1994 |
| 5298895 | Van Maren | 341/51 | Mar 1994 |
| 5281967 | Jung | 341/55 | Jan 1994 |
| 5265242 | Fujisawa | 707/3 | Nov 1993 |
| 5239298 | Wei | 341/51 | Aug 1993 |
| 5229947 | Ross | 701/200 | Jul 1993 |
| 5168533 | Kato | 382/229 | Dec 1992 |
| 5155484 | Chambers, IV | 341/55 | Oct 1992 |
| 4899148 | Sato | 341/65 | Feb 1990 |
| 4876541 | Storer | 341/51 | Oct 1989 |
| 4843389 | Lisle | 341/106 | Jun 1989 |
| 4796003 | Bentley | 341/95 | Jan 1989 |
| 4672679 | Freeman | 382/233 | Jun 1987 |
| 3613086 | Loizides | D10/15 | Oct 1971 |
| 3593309 | Clark, IV | 546/145 | Jul 1971 |
Claims  |
What is claimed is:
1. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text analysis section which separates the document information consisting
of a large number of characters input at said input section into words
consisting of one or more characters;
a code conversion dictionary in which pairs of said words and compressed
codes corresponding to said words are stored;
a text compression section which makes reference to said code conversion
dictionary for converting said words into the compressed codes
corresponding thereto;
compressed text storage means for storing the compressed codes of words of
said document information compressed by said text compression section as a
compressed text;
retrieval information input means for entering key information used to
retrieve document information registered in said compressed text storage
means;
a text retrieval section which makes reference to said code conversion
dictionary for converting said key information into compressed key data
corresponding thereto, and retrieves compressed texts including a
compressed code identical to said compressed key data stored in said
compressed text storage means;
an expansion section which expands the compressed text retrieved by said
text retrieval section into document information;
an output section for outputting the document information restored by said
expansion section; and
character string registration means for detecting words not registered in
said code conversion dictionary from said words into which said document
information is separated by said text analysis section, and assigning
fixed-length compressed codes to said detected words not registered in
said code conversion dictionary in sequence for registering the words in
said code conversion dictionary;
wherein said text compression section makes reference to the code
conversion dictionary in which words are registered by said character
string registration means for converting said words into the compressed
codes corresponding thereto;
wherein when a compressed code registration area of said code conversion
dictionary is finite, said character string registration means assigns the
compressed codes to said detected words in sequence, and terminates
assignment of the compressed codes upon detection of said compressed code
registration area becoming full; and
wherein said text compression section, after the termination of assignment
of the compressed codes, converts the words already registered in said
code conversion dictionary into their corresponding compressed codes, and
stores words not registered in said code conversion dictionary in said
compressed text storage means without conversion into compressed codes.
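The behaviour recited in claim 1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `CodeDictionary` class, the use of Python integers to stand in for fixed-length codes, and the choice to search unregistered keys as plain words are all assumptions made for the example.

```python
from typing import Dict, List, Union

Token = Union[int, str]  # int = compressed code, str = word stored unconverted


class CodeDictionary:
    """Word <-> code table with a finite registration area (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.word_to_code: Dict[str, int] = {}
        self.code_to_word: List[str] = []

    def register(self, word: str) -> bool:
        """Assign the next sequential code; return False once the area is full."""
        if word in self.word_to_code:
            return True
        if len(self.code_to_word) >= self.capacity:
            return False
        self.word_to_code[word] = len(self.code_to_word)
        self.code_to_word.append(word)
        return True


def compress(words: List[str], d: CodeDictionary) -> List[Token]:
    """Convert registered words to codes; keep the rest as plain words."""
    out: List[Token] = []
    for w in words:
        out.append(d.word_to_code[w] if d.register(w) else w)
    return out


def matches(key: str, text: List[Token], d: CodeDictionary) -> bool:
    """Compare the key's code (or the raw key, if unregistered) against the text."""
    return d.word_to_code.get(key, key) in text


def expand(text: List[Token], d: CodeDictionary) -> List[str]:
    """Restore the original words from codes and pass-through words."""
    return [d.code_to_word[t] if isinstance(t, int) else t for t in text]
```

With a capacity of two, the words `data` and `base` receive codes 0 and 1, and a later word such as `index` is stored as-is because the registration area is already full.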
2. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text analysis section which separates the document information consisting
of a large number of characters input at said input section into words
consisting of one or more characters;
a code conversion dictionary in which pairs of said words and compressed
codes corresponding to said words are stored;
a text compression section which makes reference to said code conversion
dictionary for converting said words into the compressed codes
corresponding thereto;
compressed text storage means for storing the compressed codes of words of
said document information compressed by said text compression section as a
compressed text;
retrieval information input means for entering key information used to
retrieve document information registered in said compressed text storage
means;
a text retrieval section which makes reference to said code conversion
dictionary for converting said key information into compressed key data
corresponding thereto, and retrieves compressed texts including a
compressed code identical to said compressed key data stored in said
compressed text storage means;
an expansion section which expands the compressed text retrieved by said
text retrieval section into document information;
an output section for outputting the document information restored by said
expansion section; and
character string registration means for detecting words not registered in
said code conversion dictionary from said words into which said document
information is separated by said text analysis section, and assigning
fixed-length compressed codes to said detected words not registered in
said code conversion dictionary in sequence for registering the words in
said code conversion dictionary;
wherein said text compression section makes reference to the code
conversion dictionary in which words are registered by said character
string registration means for converting said words into the compressed
codes corresponding thereto;
wherein when a compressed code registration area of said code conversion
dictionary is finite, said character string registration means assigns the
compressed codes to said detected words in sequence, and upon detection
of said compressed code registration area becoming full, assigns
identification information for identifying said code conversion
dictionary, stores contents of said code conversion dictionary and said
identification information to identify said dictionary, stores said
identification information to identify said dictionary together with the
compressed texts in said compressed text storage means, and creates a new
code conversion dictionary for registering other words; and
wherein when said document information is output, said expansion section
uses the same code conversion dictionary that is used for compressing
texts for expanding the compressed text.
3. A document information compression and retrieval system as claimed in
claim 2, wherein the contents of said code conversion dictionary and said
identification information to identify said dictionary are stored together
with the compressed texts in said compressed text storage means.
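One possible reading of claims 2 and 3 is that each stored run of codes carries the identifier of the dictionary that produced it, so that expansion can pick the same dictionary again. The sketch below assumes that reading. The `RollingDictionaries` class, the segment layout, and the rule that words are looked up only in the current dictionary are assumptions for illustration and are not specified in the claims.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, List[int]]  # (dictionary id, codes from that dictionary)


class RollingDictionaries:
    """Keeps every filled dictionary under an id and opens a new one when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.tables: List[List[str]] = [[]]   # tables[id][code] -> word
        self.index: Dict[str, int] = {}       # word -> code in the current table

    def _code_for(self, word: str) -> int:
        if word not in self.index:
            if len(self.tables[-1]) >= self.capacity:
                self.tables.append([])        # current table is full: start a new one
                self.index = {}
            self.index[word] = len(self.tables[-1])
            self.tables[-1].append(word)
        return self.index[word]

    def compress(self, words: List[str]) -> List[Segment]:
        """Emit codes grouped into segments tagged with their dictionary id."""
        segments: List[Segment] = []
        for w in words:
            code = self._code_for(w)
            dict_id = len(self.tables) - 1
            if not segments or segments[-1][0] != dict_id:
                segments.append((dict_id, []))
            segments[-1][1].append(code)
        return segments

    def expand(self, segments: List[Segment]) -> List[str]:
        """Expand each segment with the dictionary named by its id."""
        return [self.tables[d][c] for d, codes in segments for c in codes]
```

Because earlier dictionaries are never modified after they fill up, texts compressed against them remain expandable.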
4. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text analysis section which separates the document information consisting
of a large number of characters input at said input section into words
consisting of one or more characters;
a code conversion dictionary in which pairs of said words and compressed
codes corresponding to said words are stored;
a text compression section which makes reference to said code conversion
dictionary for converting said words into the compressed codes
corresponding thereto;
compressed text storage means for storing the compressed codes of words of
said document information compressed by said text compression section as a
compressed text;
retrieval information input means for entering key information used to
retrieve document information registered in said compressed text storage
means;
a text retrieval section which makes reference to said code conversion
dictionary for converting said key information into compressed key data
corresponding thereto, and retrieves compressed texts including a
compressed code identical to said compressed key data stored in said
compressed text storage means;
an expansion section which expands the compressed text retrieved by said
text retrieval section into document information;
an output section for outputting the document information restored by said
expansion section; and
character string registration means for detecting words not registered in
said code conversion dictionary from said words into which said document
information is separated by said text analysis section, and assigning
fixed-length compressed codes to said detected words not registered in
said code conversion dictionary in sequence for registering the words in
said code conversion dictionary;
wherein said text compression section makes reference to the code
conversion dictionary in which words are registered by said character
string registration means for converting said words into the compressed
codes corresponding thereto;
wherein said document information compression and retrieval system further
comprises a compressed word determination section including:
means for counting the number of occurrences of each of words of the
document information input at said input section;
a word occurrence registration dictionary in which occurrence count
information counted by said counting means is recorded;
means for calculating the compression effect for each word by using said
occurrence count information and the character length of the word; and
means for determining words to provide an optimum compression effect for
all words of the document information from said word compression effect;
wherein when a compressed code registration area of said code conversion
dictionary is finite, said character string registration means assigns the
compressed codes to said detected words in sequence, and detects when
said compressed code registration area becomes full; and
wherein upon detection of said compressed code registration area becoming
full by said character string registration means,
said determining means replaces words having a low compression effect with
words providing an optimum compression effect for assignment of compressed
codes in response to said compression effect calculated by said means for
calculating the compression effect, and
said character string registration means reads said compressed texts in
said compressed text storage means, expands the compressed codes of said
words having the low compression effect for storage in said compressed
text storage means, and registers said words providing the optimum
compression effect determined by said determining means in said code
conversion dictionary.
5. A document information compression and retrieval system as claimed in
claim 4, wherein said compressed word determination section assigns
compressed codes to words providing a compression effect of a
predetermined threshold of compression effect from compression effects of
words of document information.
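Claims 4 and 5 derive a compression effect from the occurrence count and the character length of each word, but this text does not give the formula. The sketch below assumes one plausible definition, the number of characters saved when every occurrence is replaced by a code of `CODE_LENGTH` characters. The names `CODE_LENGTH`, `compression_effect`, and `select_words` are hypothetical. Re-expanding already stored texts when a low-effect word is replaced is not shown.

```python
from collections import Counter
from typing import Dict, List

CODE_LENGTH = 2  # assumed size of one fixed-length code, in characters


def compression_effect(word: str, count: int) -> int:
    """Characters saved if every occurrence of `word` becomes one code (assumed formula)."""
    return count * (len(word) - CODE_LENGTH)


def select_words(words: List[str], capacity: int, threshold: int = 1) -> List[str]:
    """Pick up to `capacity` words whose effect reaches `threshold`, best first."""
    counts = Counter(words)
    effects: Dict[str, int] = {w: compression_effect(w, n) for w, n in counts.items()}
    # Sort by descending effect, breaking ties alphabetically for determinism.
    ranked = sorted(effects, key=lambda w: (-effects[w], w))
    return [w for w in ranked if effects[w] >= threshold][:capacity]
```

Under this assumed formula a short but frequent word such as `of` saves nothing, so it is never selected, while a long word that occurs only a few times can still rank highly.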
6. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text analysis section which separates the document information consisting
of a large number of characters input at said input section into words
consisting of one or more characters;
a code conversion dictionary in which pairs of said words and compressed
codes corresponding to said words are stored;
a text compression section which makes reference to said code conversion
dictionary for converting said words into the compressed codes
corresponding thereto;
compressed text storage means for storing the compressed codes of words of
said document information compressed by said text compression section as a
compressed text;
retrieval information input means for entering key information used to
retrieve document information registered in said compressed text storage
means;
a text retrieval section which makes reference to said code conversion
dictionary for converting said key information into compressed key data
corresponding thereto, and retrieves compressed texts including a
compressed code identical to said compressed key data stored in said
compressed text storage means;
an expansion section which expands the compressed text retrieved by said
text retrieval section into document information;
an output section for outputting the document information restored by said
expansion section; and
character string registration means for detecting words not registered in
said code conversion dictionary from said words into which said document
information is separated by said text analysis section, and assigning
fixed-length compressed codes to said detected words not registered in
said code conversion dictionary in sequence for registering the words in
said code conversion dictionary;
wherein said text compression section makes reference to the code
conversion dictionary in which words are registered by said character
string registration means for converting said words into the compressed
codes corresponding thereto;
wherein said document information compression and retrieval system further
comprises a character string table in which specific words are prestored;
and
wherein said character string registration means detects characters of said
document information being katakana or alphanumeric, and upon detection,
determines whether or not words not registered in said code conversion
dictionary match the words stored in said character string table, and
registers matching words in said code conversion dictionary.
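The katakana and alphanumeric check in claim 6 might look like the following. It is a sketch under assumptions: katakana is taken to be the Unicode block U+30A0 to U+30FF, alphanumerics are limited to ASCII (full-width forms are not handled), and the function names are invented for the example.

```python
from typing import Iterable, List, Set


def is_katakana_or_alnum(word: str) -> bool:
    """True when every character is katakana (U+30A0..U+30FF) or ASCII alphanumeric."""
    return bool(word) and all(
        "\u30a0" <= ch <= "\u30ff" or (ch.isascii() and ch.isalnum()) for ch in word
    )


def words_to_register(
    words: Iterable[str], registered: Set[str], string_table: Set[str]
) -> List[str]:
    """Unregistered katakana/alphanumeric words that also appear in the string table."""
    result: List[str] = []
    for w in words:
        if (
            w not in registered
            and w not in result
            and is_katakana_or_alnum(w)
            and w in string_table
        ):
            result.append(w)
    return result
```

Only words that pass the character-class test are compared against the prestored character string table, which keeps the table lookup off the path for ordinary kanji and hiragana text.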
7. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text analysis section which separates the document information consisting
of a large number of characters input at said input section into words
consisting of one or more characters;
a code conversion dictionary in which pairs of said words and compressed
codes corresponding to said words are stored;
a text compression section which makes reference to said code conversion
dictionary for converting said words into the compressed codes
corresponding thereto;
compressed text storage means for storing the compressed codes of words of
said document information compressed by said text compression section as a
compressed text;
retrieval information input means for entering key information used to
retrieve document information registered in said compressed text storage
means;
a text retrieval section which makes reference to said code conversion
dictionary for converting said key information into compressed key data
corresponding thereto, and retrieves compressed texts including a
compressed code identical to said compressed key data stored in said
compressed text storage means;
an expansion section which expands the compressed text retrieved by said
text retrieval section into document information;
an output section for outputting the document information restored by said
expansion section; and
character string registration means for detecting words not registered in
said code conversion dictionary from said words into which said document
information is separated by said text analysis section, and assigning
fixed-length compressed codes to said detected words not registered in
said code conversion dictionary in sequence for registering the words in
said code conversion dictionary;
wherein said text compression section makes reference to the code
conversion dictionary in which words are registered by said character
string registration means for converting said words into the compressed
codes corresponding thereto;
wherein said document information compression and retrieval system further
comprises a text analysis dictionary in which words for separating input
document information into words are prestored;
wherein said text analysis section performs character string matching with
said text analysis dictionary as a text analysis technique of separating
said document information into words; and
wherein said text analysis section adopts the longest word registered in
said text analysis dictionary for separation when multiple match, in which
more than one word separation way is defined for said document
information, occurs in the character string matching with said text
analysis dictionary.
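The longest-word rule of claim 7 corresponds to a greedy longest-match segmentation. A minimal sketch follows, assuming a left-to-right scan and a fallback to single characters when no dictionary word matches. Both choices, and the function name, are assumptions rather than details taken from the claim.

```python
from typing import List, Set


def segment_longest_match(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right split that prefers the longest dictionary word."""
    max_len = max([1] + [len(w) for w in dictionary])
    words: List[str] = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                pos += size
                break
    return words
```

With the words `data`, `base`, and `database` all present in the dictionary, the input `databasesystem` is split at `database` and not at `data`, because the longer match wins.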
8. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text compression section which converts the document information input at
said input section into a compressed text for compression;
text storage means for storing the compressed text into which the document
information is converted by said text compression section;
retrieval information input means for entering key information and a
retrieval condition used to retrieve document information registered in
said text storage means;
retrieval expression conversion means for converting the key information
entered through said retrieval information input means into compressed key
data and for generating a retrieval condition expression from said
retrieval condition;
character string collation means for collating said compressed key data
with said compressed text stored in said text storage means and for
outputting a collation result;
retrieval condition determination means being responsive to said collation
result output from said character string collation means for determining a
compressed text of document information matching said retrieval condition
expression given from said retrieval expression conversion means;
an expansion section which expands the compressed text of document
information matching said retrieval condition expression determined by
said retrieval condition determination means into original document
information; and
output means for outputting the original document information expanded by
said expansion section;
wherein said expansion section expands the compressed text to be collated
when said character string collation means collates said compressed key
data with said compressed text; and
wherein said character string collation means collates said key information
with restored document information.
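The retrieval path shared by claims 8 onward (key conversion, collation, and condition determination) can be sketched as below. The claims do not say what form a retrieval condition takes, so the `"AND"` and `"OR"` conditions, the `-1` marker for keys without a code, and all function names are assumptions. The variant of claim 8 in which the text is expanded first and collated against the uncompressed key is not shown.

```python
from typing import Dict, List


def convert_keys(keys: List[str], word_to_code: Dict[str, int]) -> List[int]:
    """Turn key words into compressed key data; -1 marks a key with no code."""
    return [word_to_code.get(k, -1) for k in keys]


def collate(key_codes: List[int], text: List[int]) -> List[bool]:
    """One collation result per key: does its code occur in the compressed text?"""
    present = set(text)
    return [code in present for code in key_codes]


def satisfies(results: List[bool], condition: str) -> bool:
    """Evaluate the (assumed) retrieval condition over the collation results."""
    if condition == "AND":
        return all(results)
    if condition == "OR":
        return any(results)
    raise ValueError(f"unsupported condition: {condition}")


def retrieve(
    keys: List[str], condition: str, texts: List[List[int]], word_to_code: Dict[str, int]
) -> List[int]:
    """Return the indices of stored compressed texts that meet the condition."""
    key_codes = convert_keys(keys, word_to_code)
    return [i for i, t in enumerate(texts) if satisfies(collate(key_codes, t), condition)]
```

Since codes are non-negative in this sketch, a key that was never registered can never match a stored compressed text.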
9. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text compression section which converts the document information input at
said input section into a compressed text for compression;
text storage means for storing the compressed text into which the document
information is converted by said text compression section;
retrieval information input means for entering key information and a
retrieval condition used to retrieve document information registered in
said text storage means;
retrieval expression conversion means for converting the key information
entered through said retrieval information input means into compressed key
data and for generating a retrieval condition expression from said
retrieval condition;
character string collation means for collating said compressed key data
with said compressed text stored in said text storage means and for
outputting a collation result;
retrieval condition determination means being responsive to said collation
result output from said character string collation means for determining a
compressed text of document information matching said retrieval condition
expression given from said retrieval expression conversion means;
an expansion section which expands the compressed text of document
information matching said retrieval condition expression determined by
said retrieval condition determination means into original document
information;
output means for outputting the original document information expanded by
said expansion section; and
a text analysis section which separates said document information input at
said input section into words that can be assumed to be semantic units;
wherein said text compression section assigns a compressed code to each of
said words provided by said text analysis section for conversion to a
compressed text; and
wherein said text analysis section recognizes a portion of words where a
shift read occurs, in which two or more ways of separation for said
document information are available, and adds predetermined information to
said portion.
10. A document information compression and retrieval system as claimed in
claim 9, wherein said text analysis section extracts a plurality of word
groups corresponding to a plurality of separation ways when a shift read
of words occurs;
wherein said text compression section assigns compressed codes to the words
in said plurality of extracted word groups for conversion to a compressed
text; and
wherein said character string collation means collates all of said words in
said plurality of extracted word groups with the compressed key data at
retrieval.
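A "shift read" as used in claims 9 and 10 is a stretch of text that can be separated into words in more than one way. One way to detect it, and to collect the word groups for every separation, is to enumerate all complete splits against a word list. The sketch below does this recursively. The function names are hypothetical, and the enumeration can grow exponentially on long inputs, so it is meant for short ambiguous portions only.

```python
from typing import List, Set


def all_segmentations(text: str, dictionary: Set[str]) -> List[List[str]]:
    """Every way to split `text` completely into dictionary words."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in dictionary:
            for tail in all_segmentations(text[end:], dictionary):
                results.append([head] + tail)
    return results


def has_shift_read(text: str, dictionary: Set[str]) -> bool:
    """True when more than one separation exists for the portion."""
    return len(all_segmentations(text, dictionary)) > 1


def words_to_index(text: str, dictionary: Set[str]) -> List[str]:
    """Union of the words from every separation, so each can receive a code."""
    return sorted({w for group in all_segmentations(text, dictionary) for w in group})
```

Registering the words from every word group means a key that matches either reading of the ambiguous portion still finds the document.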
11. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text compression section which converts the document information input at
said input section into a compressed text for compression;
text storage means for storing the compressed text into which the document
information is converted by said text compression section;
retrieval information input means for entering key information and a
retrieval condition used to retrieve document information registered in
said text storage means;
retrieval expression conversion means for converting the key information
entered through said retrieval information input means into compressed key
data and for generating a retrieval condition expression from said
retrieval condition;
character string collation means for collating said compressed key data
with said compressed text stored in said text storage means and for
outputting a collation result;
retrieval condition determination means being responsive to said collation
result output from said character string collation means for determining a
compressed text of document information matching said retrieval condition
expression given from said retrieval expression conversion means;
an expansion section which expands the compressed text of document
information matching said retrieval condition expression determined by
said retrieval condition determination means into original document
information;
output means for outputting the original document information expanded by
said expansion section; and
a text analysis section which separates said document information input at
said input section into words that can be assumed to be semantic units;
wherein said text compression section assigns a compressed code to each of
said words provided by said text analysis section for conversion to a
compressed text;
wherein said text analysis section recognizes a portion where a shift read
of words occurs, in which two or more ways of separation for said document
information are available;
wherein said text compression section does not convert said portion into a
compressed text;
wherein said text storage means stores said portion as document information
intact; and
wherein said character string collation means also collates key information
with said document information at retrieval.
12. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text compression section which converts the document information input at
said input section into a compressed text for compression;
text storage means for storing the compressed text into which the document
information is converted by said text compression section;
retrieval information input means for entering key information and a
retrieval condition used to retrieve document information registered in
said text storage means;
retrieval expression conversion means for converting the key information
entered through said retrieval information input means into compressed key
data and for generating a retrieval condition expression from said
retrieval condition;
character string collation means for collating said compressed key data
with said compressed text stored in said text storage means and for
outputting a collation result;
retrieval condition determination means being responsive to said collation
result output from said character string collation means for determining a
compressed text of document information matching said retrieval condition
expression given from said retrieval expression conversion means;
an expansion section which expands the compressed text of document
information matching said retrieval condition expression determined by
said retrieval condition determination means into original document
information;
output means for outputting the original document information expanded by
said expansion section; and
a text analysis section which separates said document information input at
said input section into words that can be assumed to be semantic units;
wherein said text compression section assigns a compressed code to each of
said words provided by said text analysis section for conversion to a
compressed text;
wherein said retrieval expression conversion means
recognizes a portion where a shift read of words occurs, in which two or
more ways of separation for said entered key information are available,
extracts a plurality of word groups corresponding to a plurality of
separation ways when a shift read of words occurs,
assigns compressed codes to the words in said plurality of extracted word
groups for conversion to compressed key data, and
generates a retrieval condition expression from said retrieval condition;
and
wherein said character string collation means collates all of said
compressed key data with compressed text data at retrieval.
13. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text compression section which converts the document information input at
said input section into a compressed text for compression;
text storage means for storing the compressed text into which the document
information is converted by said text compression section;
retrieval information input means for entering key information and a
retrieval condition used to retrieve document information registered in
said text storage means;
retrieval expression conversion means for converting the key information
entered through said retrieval information input means into compressed key
data and for generating a retrieval condition expression from said
retrieval condition;
character string collation means for collating said compressed key data
with said compressed text stored in said text storage means and for
outputting a collation result;
retrieval condition determination means being responsive to said collation
result output from said character string collation means for determining a
compressed text of document information matching said retrieval condition
expression given from said retrieval expression conversion means;
an expansion section which expands the compressed text of document
information matching said retrieval condition expression determined by
said retrieval condition determination means into original document
information;
output means for outputting the original document information expanded by
said expansion section; and
a text analysis section which separates said document information input at
said input section into words that can be assumed to be semantic units;
wherein said text compression section assigns a compressed code to each of
said words provided by said text analysis section for conversion to a
compressed text;
wherein said document information compression and retrieval system further
comprises a code conversion dictionary in which said words and compressed
codes are stored in pairs;
wherein said text compression section makes reference to said code
conversion dictionary for conversion to a compressed text;
wherein said document information compression and retrieval system further
comprises a plurality of types of said code conversion dictionary;
wherein said retrieval information input means accepts a selection
specification of a dictionary to be used in response to the type of
document;
wherein said text compression section makes reference to the specified code
conversion dictionary for conversion to a compressed text, and adds
identification information to identify the used code conversion dictionary
to said compressed text; and
wherein said expansion section makes reference to said code conversion
dictionary identification information, and uses the code conversion
dictionary corresponding thereto for expanding the compressed text into
original document information.
14. A document information compression and retrieval system as claimed in
claim 13, wherein said retrieval expression conversion means makes
reference to said specified code conversion dictionary for converting the
entered key information into compressed key data.
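Claims 13 and 14 select a code conversion dictionary by document type and tag the compressed text with that dictionary's identifier. A small sketch of the idea follows. The dictionary names and contents, the tuple layout of a compressed text, and the function names are all invented for illustration, and a word missing from the chosen dictionary simply raises `ValueError` here.

```python
from typing import Dict, List, Tuple

CompressedText = Tuple[str, List[int]]  # (dictionary id, codes)

# Hypothetical per-type dictionaries: position in the list is the code.
DICTIONARIES: Dict[str, List[str]] = {
    "patent": ["claim", "means", "said"],
    "news": ["today", "said", "market"],
}


def compress_with(words: List[str], dict_id: str) -> CompressedText:
    """Compress with the specified dictionary and attach its identifier."""
    table = DICTIONARIES[dict_id]
    return dict_id, [table.index(w) for w in words]


def expand_with(text: CompressedText) -> List[str]:
    """Read the identifier from the compressed text and expand with that dictionary."""
    dict_id, codes = text
    table = DICTIONARIES[dict_id]
    return [table[c] for c in codes]


def key_to_code(key: str, dict_id: str) -> int:
    """Convert a key using the same specified dictionary, as in claim 14."""
    return DICTIONARIES[dict_id].index(key)
```

The same word can map to different codes in different dictionaries (`said` is code 2 in one table and code 1 in the other), which is why the identifier has to travel with the compressed text.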
15. A document information compression and retrieval system comprising:
an input section for inputting document information;
a text compression section which converts the document information input at
said input section into a compressed text for compression;
text storage means for storing the compressed text into which the document
information is converted by said text compression section;
retrieval information input means for entering key information and a
retrieval condition used to retrieve document information registered in
said text storage means;
retrieval expression conversion means for converting the key information
entered through said retrieval information input means into compressed key
data and for generating a retrieval condition expression from said
retrieval condition;
character string collation means for collating said compressed key data
with said compressed text stored in said text storage means and for
outputting a collation result;
retrieval condition determination means being responsive to said collation
result output from said character string collation means for determining a
compressed text of document information matching said retrieval condition
expression given from said retrieval expression conversion means;
an expansion section which expands the compressed text of document
information matching said retrieval condition expression determine