WikiPatents - Community Patent Review
Document information compression and retrieval system and document information registration and retrieval method    
United States Patent 5590317
Inventor(s): Iguchi; Hiroaki (Yokohama, JP); Kurosu; Yasuo (Yokosuka, JP); Fujinawa; Masaaki (Kanagawa-ken, JP); Yokoyama; Yoshihiro (Yokohama, JP); Masuzaki; Hidefumi (Hadano, JP)
Abstract: A document information compression and retrieval system reduces the amount of document data and shortens retrieval time when large volumes of document information are registered and retrieved. Also disclosed is a method of registering document information in a retrieval system that stores document information consisting of a large number of characters so that the stored information can later be retrieved. Entered document information is separated into words. For each word, the system determines whether a compressed code has already been assigned to it; if not, a compressed code is assigned. The words are then converted into their assigned compressed codes, and the result is stored as a compressed text. At retrieval time, retrieval information is accepted and converted into compressed code, and the stored compressed texts are searched for those matching the compressed code of the retrieval information. The words corresponding to the compressed codes are then used to expand each matching compressed text back into the original document information.
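The registration and retrieval flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not Hitachi's implementation. Words are split on whitespace, whereas the patent uses a text analysis dictionary suited to Japanese text. Integer indices stand in for the fixed-length compressed codes. The names `CodeDictionary`, `compress`, `expand`, `compress_key` and `search` are hypothetical. Following claim 1, registration stops once the finite code area is full, and any word not yet registered is stored uncompressed.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union

# Hypothetical token type: int = compressed code, str = word stored without a code.
Token = Union[int, str]


@dataclass
class CodeDictionary:
    """Hypothetical code conversion dictionary with a finite code registration area."""
    capacity: int
    word_to_code: dict[str, int] = field(default_factory=dict)
    code_to_word: list[str] = field(default_factory=list)

    def register(self, word: str) -> int | None:
        """Assign the next sequential code to an unseen word while space remains.

        Once the code area is full, registration stops and unseen words get None.
        """
        if word not in self.word_to_code and len(self.code_to_word) < self.capacity:
            self.word_to_code[word] = len(self.code_to_word)
            self.code_to_word.append(word)
        return self.word_to_code.get(word)


def compress(text: str, dictionary: CodeDictionary) -> list[Token]:
    """Registration: split into words, register new ones, replace words by codes."""
    tokens: list[Token] = []
    for word in text.split():  # assumption: whitespace-delimited words
        code = dictionary.register(word)
        tokens.append(code if code is not None else word)
    return tokens


def expand(tokens: list[Token], dictionary: CodeDictionary) -> str:
    """Expansion: map each code back to its word; uncompressed words pass through."""
    return " ".join(
        dictionary.code_to_word[t] if isinstance(t, int) else t for t in tokens
    )


def compress_key(key: str, dictionary: CodeDictionary) -> list[Token]:
    """Convert retrieval key words to codes by lookup only (no registration)."""
    return [dictionary.word_to_code.get(w, w) for w in key.split()]


def contains(tokens: list[Token], key_tokens: list[Token]) -> bool:
    """Check whether the key's code sequence occurs contiguously in a compressed text."""
    n = len(key_tokens)
    return any(tokens[i:i + n] == key_tokens for i in range(len(tokens) - n + 1))


def search(store: list[list[Token]], key: str, dictionary: CodeDictionary) -> list[str]:
    """Retrieval: match in the compressed domain, then expand only the hits."""
    key_tokens = compress_key(key, dictionary)
    return [expand(doc, dictionary) for doc in store if contains(doc, key_tokens)]


if __name__ == "__main__":
    d = CodeDictionary(capacity=4)  # tiny code area so the overflow path is visible
    store = [compress(t, d) for t in ["the cat sat", "the dog sat on the mat"]]
    print(store[1])                     # [0, 3, 2, 'on', 0, 'mat']
    print(search(store, "the mat", d))  # ['the dog sat on the mat']
```

The retrieval key is converted with the same dictionary as the stored texts. A search therefore compares codes rather than characters, and only the matching texts are expanded. A word stored uncompressed after the code area fills is never registered later, so the key lookup leaves it as a plain word and it still matches. This sketch rejoins words with single spaces, so it does not preserve the original whitespace.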
[Drawing from US Patent 5590317]
Owner/Assignee     Hitachi, Ltd. (Tokyo, JP)
Publication Date     December 31, 1996
Application Number     08/068,658
Filing Date     May 27, 1993
US Classification     707/2
Int'l Classification     G06F 017/30
Examiner     Amsbury; Wayne
Attorney/Law Firm     Antonelli, Terry, Stout & Kraus
Priority Data     May 27, 1992 [JP] 4-135340; May 27, 1992 [JP] 4-135341
USPTO Field of Search     395/600 364/419.13 364/419.14
References Cited

U.S. References
5519857 (Kato), class 707/5, May 1996
5491760 (Withgott), class 382/203, Feb 1996
5337233 (Hofert), class 715/540, Aug 1994
5321770 (Huttenlocher), class 382/174, Jun 1994
5319779 (Chang), class 707/3, Jun 1994
5298895 (Van Maren), class 341/51, Mar 1994
5281967 (Jung), class 341/55, Jan 1994
5265242 (Fujisawa), class 707/3, Nov 1993
5239298 (Wei), class 341/51, Aug 1993
5229947 (Ross), class 701/200, Jul 1993
5168533 (Kato), class 382/229, Dec 1992
5155484 (Chambers, IV), class 341/55, Oct 1992
4899148 (Sato), class 341/65, Feb 1990
4876541 (Storer), class 341/51, Oct 1989
4843389 (Lisle), class 341/106, Jun 1989
4796003 (Bentley), class 341/95, Jan 1989
4672679 (Freeman), class 382/233, Jun 1987
3613086 (Loizides), class D10/15, Oct 1971
3593309 (Clark, IV), class 546/145, Jul 1971
Claims

What is claimed is:

1. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text analysis section which separates the document information consisting of a large number of characters input at said input section into words consisting of one or more characters;

a code conversion dictionary in which pairs of said words and compressed codes corresponding to said words are stored;

a text compression section which makes reference to said code conversion dictionary for converting said words into the compressed codes corresponding thereto;

compressed text storage means for storing the compressed codes of words of said document information compressed by said text compression section as a compressed text;

retrieval information input means for entering key information used to retrieve document information registered in said compressed text storage means;

a text retrieval section which makes reference to said code conversion dictionary for converting said key information into compressed key data corresponding thereto, and retrieves compressed texts including a compressed code identical to said compressed key data stored in said compressed text storage means;

an expansion section which expands the compressed text retrieved by said text retrieval section into document information;

an output section for outputting the document information restored by said expansion section; and

character string registration means for detecting words not registered in said code conversion dictionary from said words into which said document information is separated by said text analysis section, and assigning fixed-length compressed codes to said detected words not registered in said code conversion dictionary in sequence for registering the words in said code conversion dictionary;

wherein said text compression section makes reference to the code conversion dictionary in which words are registered by said character string registration means for converting said words into the compressed codes corresponding thereto;

wherein when a compressed code registration area of said code conversion dictionary is finite, said character string registration means assigns the compressed codes to said detected words in sequence, and terminates assignment of the compressed codes upon detection of said compressed code registration area becoming full; and

wherein said text compression section, after the termination of assignment of the compressed codes, converts the words already registered in said code conversion dictionary into their corresponding compressed codes, and stores words not registered in said code conversion dictionary in said compressed text storage means without conversion into compressed codes.

2. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text analysis section which separates the document information consisting of a large number of characters input at said input section into words consisting of one or more characters;

a code conversion dictionary in which pairs of said words and compressed codes corresponding to said words are stored;

a text compression section which makes reference to said code conversion dictionary for converting said words into the compressed codes corresponding thereto;

compressed text storage means for storing the compressed codes of words of said document information compressed by said text compression section as a compressed text;

retrieval information input means for entering key information used to retrieve document information registered in said compressed text storage means;

a text retrieval section which makes reference to said code conversion dictionary for converting said key information into compressed key data corresponding thereto, and retrieves compressed texts including a compressed code identical to said compressed key data stored in said compressed text storage means;

an expansion section which expands the compressed text retrieved by said text retrieval section into document information;

an output section for outputting the document information restored by said expansion section; and

character string registration means for detecting words not registered in said code conversion dictionary from said words into which said document information is separated by said text analysis section, and assigning fixed-length compressed codes to said detected words not registered in said code conversion dictionary in sequence for registering the words in said code conversion dictionary;

wherein said text compression section makes reference to the code conversion dictionary in which words are registered by said character string registration means for converting said words into the compressed codes corresponding thereto;

wherein when a compressed code registration area of said code conversion dictionary is finite, said character string registration means assigns the compressed codes to said detected words in sequence, and upon detection of said compressed code registration area becoming full, assigns identification information for identifying said code conversion dictionary, stores contents of said code conversion dictionary and said identification information to identify said dictionary, stores said identification information to identify said dictionary together with the compressed texts in said compressed text storage means, and creates a new code conversion dictionary for registering other words; and

wherein when said document information is output, said expansion section uses the same code conversion dictionary that is used for compressing texts for expanding the compressed text.

3. A document information compression and retrieval system as claimed in claim 2, wherein the contents of said code conversion dictionary and said identification information to identify said dictionary are stored together with the compressed texts in said compressed text storage means.

4. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text analysis section which separates the document information consisting of a large number of characters input at said input section into words consisting of one or more characters;

a code conversion dictionary in which pairs of said words and compressed codes corresponding to said words are stored;

a text compression section which makes reference to said code conversion dictionary for converting said words into the compressed codes corresponding thereto;

compressed text storage means for storing the compressed codes of words of said document information compressed by said text compression section as a compressed text;

retrieval information input means for entering key information used to retrieve document information registered in said compressed text storage means;

a text retrieval section which makes reference to said code conversion dictionary for converting said key information into compressed key data corresponding thereto, and retrieves compressed texts including a compressed code identical to said compressed key data stored in said compressed text storage means;

an expansion section which expands the compressed text retrieved by said text retrieval section into document information;

an output section for outputting the document information restored by said expansion section; and

character string registration means for detecting words not registered in said code conversion dictionary from said words into which said document information is separated by said text analysis section, and assigning fixed-length compressed codes to said detected words not registered in said code conversion dictionary in sequence for registering the words in said code conversion dictionary;

wherein said text compression section makes reference to the code conversion dictionary in which words are registered by said character string registration means for converting said words into the compressed codes corresponding thereto;

wherein said document information compression and retrieval system further comprises a compressed word determination section including:

means for counting the number of occurrences of each of words of the document information input at said input section;

a word occurrence registration dictionary in which occurrence count information counted by said counting means is recorded;

means for calculating the compression effect for each word by using said occurrence count information and the character length of the word; and

means for determining words to provide an optimum compression effect for all words of the document information from said word compression effect;

wherein when a compressed code registration area of said code conversion dictionary is finite, said character string registration means assigns the compressed codes to said detected words in sequence, and detects when said compressed code registration area becomes full; and

wherein upon detection of said compressed code registration area becoming full by said character string registration means,

said determining means replaces words having a low compression effect with words providing an optimum compression effect for assignment of compressed codes in response to said compression effect calculated by said means for calculating the compression effect, and

said character string registration means reads said compressed texts in said compressed text storage means, expands the compressed codes of said words having the low compression effect for storage in said compressed text storage means, and registers said words providing the optimum compression effect determined by said determining means in said code conversion dictionary.

5. A document information compression and retrieval system as claimed in claim 4, wherein said compressed word determination section assigns compressed codes to those words of the document information whose compression effect reaches a predetermined threshold.

6. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text analysis section which separates the document information consisting of a large number of characters input at said input section into words consisting of one or more characters;

a code conversion dictionary in which pairs of said words and compressed codes corresponding to said words are stored;

a text compression section which makes reference to said code conversion dictionary for converting said words into the compressed codes corresponding thereto;

compressed text storage means for storing the compressed codes of words of said document information compressed by said text compression section as a compressed text;

retrieval information input means for entering key information used to retrieve document information registered in said compressed text storage means;

a text retrieval section which makes reference to said code conversion dictionary for converting said key information into compressed key data corresponding thereto, and retrieves compressed texts including a compressed code identical to said compressed key data stored in said compressed text storage means;

an expansion section which expands the compressed text retrieved by said text retrieval section into document information;

an output section for outputting the document information restored by said expansion section; and

character string registration means for detecting words not registered in said code conversion dictionary from said words into which said document information is separated by said text analysis section, and assigning fixed-length compressed codes to said detected words not registered in said code conversion dictionary in sequence for registering the words in said code conversion dictionary;

wherein said text compression section makes reference to the code conversion dictionary in which words are registered by said character string registration means for converting said words into the compressed codes corresponding thereto;

wherein said document information compression and retrieval system further comprises a character string table in which specific words are prestored; and

wherein said character string registration means detects whether characters of said document information are katakana or alphanumeric, and upon detection, determines whether or not words not registered in said code conversion dictionary match the words stored in said character string table, and registers matching words in said code conversion dictionary.

7. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text analysis section which separates the document information consisting of a large number of characters input at said input section into words consisting of one or more characters;

a code conversion dictionary in which pairs of said words and compressed codes corresponding to said words are stored;

a text compression section which makes reference to said code conversion dictionary for converting said words into the compressed codes corresponding thereto;

compressed text storage means for storing the compressed codes of words of said document information compressed by said text compression section as a compressed text;

retrieval information input means for entering key information used to retrieve document information registered in said compressed text storage means;

a text retrieval section which makes reference to said code conversion dictionary for converting said key information into compressed key data corresponding thereto, and retrieves compressed texts including a compressed code identical to said compressed key data stored in said compressed text storage means;

an expansion section which expands the compressed text retrieved by said text retrieval section into document information;

an output section for outputting the document information restored by said expansion section; and

character string registration means for detecting words not registered in said code conversion dictionary from said words into which said document information is separated by said text analysis section, and assigning fixed-length compressed codes to said detected words not registered in said code conversion dictionary in sequence for registering the words in said code conversion dictionary;

wherein said text compression section makes reference to the code conversion dictionary in which words are registered by said character string registration means for converting said words into the compressed codes corresponding thereto;

wherein said document information compression and retrieval system further comprises a text analysis dictionary in which words for separating input document information into words are prestored;

wherein said text analysis section performs character string matching with said text analysis dictionary as a text analysis technique of separating said document information into words; and

wherein said text analysis section adopts the longest word registered in said text analysis dictionary for separation when a multiple match, in which more than one way of separating said document information into words is possible, occurs in the character string matching with said text analysis dictionary.

8. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text compression section which converts the document information input at said input section into a compressed text for compression;

text storage means for storing the compressed text into which the document information is converted by said text compression section;

retrieval information input means for entering key information and a retrieval condition used to retrieve document information registered in said text storage means;

retrieval expression conversion means for converting the key information entered through said retrieval information input means into compressed key data and for generating a retrieval condition expression from said retrieval condition;

character string collation means for collating said compressed key data with said compressed text stored in said text storage means and for outputting a collation result;

retrieval condition determination means being responsive to said collation result output from said character string collation means for determining a compressed text of document information matching said retrieval condition expression given from said retrieval expression conversion means;

an expansion section which expands the compressed text of document information matching said retrieval condition expression determined by said retrieval condition determination means into original document information; and

output means for outputting the original document information expanded by said expansion section;

wherein said expansion section expands the compressed text to be collated when said character string collation means collates said compressed key data with said compressed text; and

wherein said character string collation means collates said key information with restored document information.

9. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text compression section which converts the document information input at said input section into a compressed text for compression;

text storage means for storing the compressed text into which the document information is converted by said text compression section;

retrieval information input means for entering key information and a retrieval condition used to retrieve document information registered in said text storage means;

retrieval expression conversion means for converting the key information entered through said retrieval information input means into compressed key data and for generating a retrieval condition expression from said retrieval condition;

character string collation means for collating said compressed key data with said compressed text stored in said text storage means and for outputting a collation result;

retrieval condition determination means being responsive to said collation result output from said character string collation means for determining a compressed text of document information matching said retrieval condition expression given from said retrieval expression conversion means;

an expansion section which expands the compressed text of document information matching said retrieval condition expression determined by said retrieval condition determination means into original document information;

output means for outputting the original document information expanded by said expansion section; and

a text analysis section which separates said document information input at said input section into words that can be assumed to be semantic units;

wherein said text compression section assigns a compressed code to each of said words provided by said text analysis section for conversion to a compressed text; and

wherein said text analysis section recognizes a portion of words where a shift read occurs, that is, a portion for which two or more ways of separating said document information are available, and adds predetermined information to said portion.

10. A document information compression and retrieval system as claimed in claim 9, wherein said text analysis section extracts a plurality of word groups corresponding to the plurality of ways of separation when a shift read of words occurs;

wherein said text compression section assigns compressed codes to the words in said plurality of extracted word groups for conversion to a compressed text; and

wherein said character string collation means collates all of the words in said plurality of word groups with the compressed key data at retrieval.

11. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text compression section which converts the document information input at said input section into a compressed text for compression;

text storage means for storing the compressed text into which the document information is converted by said text compression section;

retrieval information input means for entering key information and a retrieval condition used to retrieve document information registered in said text storage means;

retrieval expression conversion means for converting the key information entered through said retrieval information input means into compressed key data and for generating a retrieval condition expression from said retrieval condition;

character string collation means for collating said compressed key data with said compressed text stored in said text storage means and for outputting a collation result;

retrieval condition determination means being responsive to said collation result output from said character string collation means for determining a compressed text of document information matching said retrieval condition expression given from said retrieval expression conversion means;

an expansion section which expands the compressed text of document information matching said retrieval condition expression determined by said retrieval condition determination means into original document information;

output means for outputting the original document information expanded by said expansion section; and

a text analysis section which separates said document information input at said input section into words that can be assumed to be semantic units;

wherein said text compression section assigns a compressed code to each of said words provided by said text analysis section for conversion to a compressed text;

wherein said text analysis section recognizes a portion where a shift read of words occurs, that is, a portion for which two or more ways of separating said document information are available;

wherein said text compression section does not convert said portion into a compressed text;

wherein said text storage means stores said portion as document information intact; and

wherein said character string collation means also collates key information with said document information at retrieval.

12. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text compression section which converts the document information input at said input section into a compressed text for compression;

text storage means for storing the compressed text into which the document information is converted by said text compression section;

retrieval information input means for entering key information and a retrieval condition used to retrieve document information registered in said text storage means;

retrieval expression conversion means for converting the key information entered through said retrieval information input means into compressed key data and for generating a retrieval condition expression from said retrieval condition;

character string collation means for collating said compressed key data with said compressed text stored in said text storage means and for outputting a collation result;

retrieval condition determination means being responsive to said collation result output from said character string collation means for determining a compressed text of document information matching said retrieval condition expression given from said retrieval expression conversion means;

an expansion section which expands the compressed text of document information matching said retrieval condition expression determined by said retrieval condition determination means into original document information;

output means for outputting the original document information expanded by said expansion section; and

a text analysis section which separates said document information input at said input section into words that can be assumed to be semantic units;

wherein said text compression section assigns a compressed code to each of said words provided by said text analysis section for conversion to a compressed text;

wherein said retrieval expression conversion means

recognizes a portion where a shift read of words occurs, that is, a portion for which two or more ways of separating said entered key information are available,

extracts a plurality of word groups corresponding to the plurality of ways of separation when a shift read of words occurs,

assigns compressed codes to the words in said plurality of extracted word groups for conversion to compressed key data, and

generates a retrieval condition expression from said retrieval condition; and

wherein said character string collation means collates all of said compressed key data with compressed text data at retrieval.

13. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text compression section which converts the document information input at said input section into a compressed text for compression;

text storage means for storing the compressed text into which the document information is converted by said text compression section;

retrieval information input means for entering key information and a retrieval condition used to retrieve document information registered in said text storage means;

retrieval expression conversion means for converting the key information entered through said retrieval information input means into compressed key data and for generating a retrieval condition expression from said retrieval condition;

character string collation means for collating said compressed key data with said compressed text stored in said text storage means and for outputting a collation result;

retrieval condition determination means being responsive to said collation result output from said character string collation means for determining a compressed text of document information matching said retrieval condition expression given from said retrieval expression conversion means;

an expansion section which expands the compressed text of document information matching said retrieval condition expression determined by said retrieval condition determination means into original document information;

output means for outputting the original document information expanded by said expansion section; and

a text analysis section which separates said document information input at said input section into words that can be assumed to be semantic units;

wherein said text compression section assigns a compressed code to each of said words provided by said text analysis section for conversion to a compressed text;

wherein said document information compression and retrieval system further comprises a code conversion dictionary in which said words and compressed codes are stored in pairs;

wherein said text compression section makes reference to said code conversion dictionary for conversion to a compressed text;

wherein said document information compression and retrieval system further comprises a plurality of types of said code conversion dictionary;

wherein said retrieval information input means accepts a specification selecting the dictionary to be used according to the type of document;

wherein said text compression section makes reference to the specified code conversion dictionary for conversion to a compressed text, and adds identification information to identify the used code conversion dictionary to said compressed text; and

wherein said expansion section makes reference to said code conversion dictionary identification information, and uses the code conversion dictionary corresponding thereto for expanding the compressed text into original document information.

14. A document information compression and retrieval system as claimed in claim 13, wherein said retrieval expression conversion means makes reference to said specified code conversion dictionary for converting the entered key information into compressed key data.

15. A document information compression and retrieval system comprising:

an input section for inputting document information;

a text compression section which converts the document information input at said input section into a compressed text for compression;

text storage means for storing the compressed text into which the document information is converted by said text compression section;

retrieval information input means for entering key information and a retrieval condition used to retrieve document information registered in said text storage means;

retrieval expression conversion means for converting the key information entered through said retrieval information input means into compressed key data and for generating a retrieval condition expression from said retrieval condition;

character string collation means for collating said compressed key data with said compressed text stored in said text storage means and for outputting a collation result;

retrieval condition determination means being responsive to said collation result output from said character string collation means for determining a compressed text of document information matching said retrieval condition expression given from said retrieval expression conversion means;

an expansion section which expands the compressed text of document information matching said retrieval condition expression determine