WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Stem for dynamically compressing and decompressing electronic data    
United States Patent4876541   
Link to this pagehttp://www.wikipatents.com/4876541.html
Inventor(s)Storer; James A. (Lincoln, MA)
AbstractA data compression system for encoding and decoding textual data, including an encoder for encoding the data and for a decoder for decoding the encoded data. Both encoder and decoder have dictionaries for storing frequently-appearing strings of characters. Each string is identified by a unique pointer. The input data stream is parsed and matched with strings in the encoder dictionary using a novel matching algorithm. The pointer associated with the matched string is then transmitted to a remote location for storage or decoding. Thereafter, using a novel update algorithm the encoder dictionary is updated to include new strings of data based on the matched string of data. If required, a novel deletion algorithm is employed to remove infrequently used strings of data to provide room for the newly generated strings of data. The strings of data may be arranged using a modified least recently used queue. The decoder matches each unique pointer in the stream of compressed input data with a corresponding pointer in the decoder dictionary. The decoder then transmits the string of character data associated with the matched pointer, thereby providing textual data in original, uncompressed form. Thereafter, using the novel update and deletion algorithms, new strings of data are added to, and old strings of data are deleted from, the decoder dictionary, so as to ensure both encoder and decoder dictionaries contain identical strings of data.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 4876541
Stem for dynamically compressing and decompressing electronic data - US Patent 4876541 Drawing
Stem for dynamically compressing and decompressing electronic data
Inventor     Storer; James A. (Lincoln, MA)
Owner/Assignee     Data Compression Corporation (Lexington, MA)
Patent assignment
All assignments
Publication Date     October 24, 1989
Application Number     07/108,929
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     October 15, 1987
US Classification     341/51 341/67 341/106
Int'l Classification     H03M 007/42
Examiner     Shoop Jr.; William M.
Assistant Examiner     Hoff; Marc S.
Attorney/Law Firm     Schiller, Pandiscio & Kusmer
Address
Parent Case    
Priority Data    
USPTO Field of Search     340/347 DD 358/261 341/51 341/65 341/67 341/69 341/106
Patent Tags     stem dynamically compressing decompressing electronic data
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
4612532
Bacon
341/51
Sep,1986

[0 after 0 votes]
4558302
Welch
341/51
Dec,1985

[0 after 0 votes]
4464650
Eastman
341/51
Aug,1984

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. An apparatus for dynamically compressing and decompressing an input stream of data characters into a compressed stream code and for decompressing said compressed stream of code into uncompressed data characters, the apparatus comprising:

first dictionary means for storing a first plurality of strings of data characters, said first dictionary means comprising a plurality of unique codes each for identifying a corresponding respective one of said first plurality of strings, wherein said first plurality of strings comprises an initial set of unique data characters, with each character in said initial set being positioned in a corresponding respective one of said first plurality of strings;

second dictionary means for storing a second plurality of strings of data characters, said second dictionary means comprising a plurality of unique codes each for identifying a corresponding respective one of said second plurality of strings, wherein said second plurality of strings comprises an initial set of unique data characters that is identical to said initial set of unique data characters in said first plurality of strings, with each character in said initial set in said second plurality of strings being positioned in the one of said second plurality of strings that is identified by the same unique code as the string in said first plurality of strings in which the identical character from said initial set of characters in said first plurality of strings is positioned;

first match means for receiving and parsing said input stream of data characters into parsed strings of data characters and for comparing each of said parsed strings of data characters with said first plurality of strings so as to locate the one of said first plurality of strings that matches a corresponding one of said parsed strings, wherein the most recently matched one of said parsed strings is identified as the current match;

first transmitting means for transmitting the one of said plurality of unique codes identifying the one of said first plurality of stored strings that matches said current match, for each current match;

second match means for receiving said one unique code transmitted by said first transmitting means for each current match, and for comparing each said one unique code with said plurality of unique codes identifying said second plurality of strings so as to locate the one of said plurality of unique codes in said second dictionary means that matches said each said one unique code received from said first transmitting means;

second transmitting means for transmitting the string of character data in said second plurality of strings identified by the one of said plurality of unique codes in said second dictionary means that matches said one unique code received from said first transmitting means, for each string of character data matched by said second match means;

first update means for adding N new strings of data characters to said first dictionary means for each current match, wherein N equals the number of characters in said current match, said N new strings comprising the last current match concatenated with each non-empty prefix of said current match, and including means for assigning one of said plurality of unique codes to each of said new strings; and

second update means for adding said N new strings of data characters to said second dictionary means for each current match, including means for assigning the same one of said plurality of unique codes to each of said N new strings added to said second dictionary means that is assigned to corresponding respective ones of said N new strings added to said first dictionary means.

2. An apparatus according to claim 1, further wherein said first match means parses said input data stream so that each of said parsed strings is the longest string possible that will match a corresponding one of said first plurality of stored strings.

3. An apparatus according to claim 1, further comprising:

input buffer means for storing a portion of said input data stream; and

limited lookahead means for comparing said portion of said input data stream with said first plurality of strings stored in said first dictionary means so as to choose a parsing of said portion that, over time, allows said input data stream to be represented with substantially the fewest number of said plurality of unique codes possible, wherein a string from said portion chosen by said parsing of said portion is used as the current match.

4. An apparatus according to claim 3, further wherein:

said input buffer means comprises a looksize portion sized to hold from between 2 to 100 characters of input data and a maxmatch portion adjacent said looksize portion sized to hold from between 1 to 1000 characters of input data, wherein said maxmatch portion contains only character data forming part of a string of data that begins in the looksize portion and extends into the maxmatch portion; and

said limited lookahead means comprises comparison means for:

(a) matching various strings of characters at least partially contained in said looksize portion with said first plurality of strings in said first dictionary means, and determining the number of bits required to represent each of said various strings of characters in compressed form by (1) determining the number of bits required to represent each of said plurality of unique codes identifying the ones of said first plurality of strings that match corresponding respective ones of said various strings of characters, and (2) determining the number of bits required to represent each of said various strings of characters in uncompressed form;

(b) calculating a compression ratio for each of said various strings of characters by dividing the number of bits required to represent said each of said various strings in compressed form by the number of bits required to represent said each of said various string in uncompressed form;

(c) comparing the compression ratios for each of said various strings of characters with others of said various strings so as to determine a parsing of said various strings that requires the fewest possible of said plurality of unique codes to represent said various strings in compressed form; and

(d) using one of said various strings of characters as said current match.

5. An apparatus according to claim 4, wherein said looksize portion holds about 8 characters and said maxmatch portion holds about 25 characters.

6. An apparatus according to claim 1,

wherein said strings in said first plurality of strings containing said characters of the initial set together define an array, and wherein said array comprises a plurality of unique root nodes and each of said unique root nodes contains a character from said initial set, and wherein each of said first plurality of strings depends from one of said plurality of unique root nodes, further wherein the number of said first plurality of strings that depend from a single root node is equal to about the number of characters in said initial character set; and

wherein said strings in said second plurality of strings containing said characters of the initial sot together define an array, and wherein said array comprises a plurality of unique root nodes and each of said unique root nodes contains a character from said initial set, and wherein each of said second plurality of strings depends from one of said plurality of unique root nodes, further wherein the number of said second plurality of strings that depend from a single root node is equal to about the number of characters in said initial set of characters.

7. An apparatus according to claim 6, wherein said arrays in said first and second dictionary means each comprise about 256 root nodes.

8. An apparatus according to claim 6, wherein each character in said first and second plurality of strings, except for the characters contained in the root nodes of said first and second plurality of strings, is positioned at a descendant node, further wherein the number of descendant nodes that are permitted to depend from any given descendant node is equal to about 5 to 10 percent of the total number of root nodes in said array.

9. An apparatus according to claim 8, only about 12 to 25 descendant nodes are permitted to depend from any given descendant node.

10. An apparatus according to claim 1, wherein said first and second update means each comprise deletion means for deleting strings of data from said first and second dictionary means, respectively, so as to provide room for said N new strings of data added to said first and second dictionary means.

11. An apparatus according to claim 10, wherein said first and second dictionary means are each initialized with a set of characters prior to the receipt by said first match means of the first string in said input stream, and further wherein said deletion means of said first and second dictionary means never delete said initialized set of characters from said first and second dictionary means, respectively.

12. An apparatus according to claim 10, wherein said first and second plurality of strings are arranged in said first and second dictionary means, respectively, in a least recently used queue by successively increasing frequency of use, with the least frequently used ones of said first and second plurality of strings being positioned at the end of the queue and the most frequently used ones of said first and second plurality of strings being positioned at the front of the queue.

13. An apparatus according to claim 10, wherein said first and second plurality of strings each comprise a selected number of strings and wherein each of said first and second plurality of strings is arranged in a modified least recently used queue that is organized so that:

(a) said last current match and current match prefix concatenations are positioned after said prefixes of said current match;

(b) said last current match and current match prefix concatenations are arranged by length, so that the shortest prefix appears before the next longest prefix, and successively longer prefixes appear thereafter, with the longest prefix being positioned after all shorter prefixes, and said current match prefixes are arranged by length, so that the shortest prefix appears before the next longest prefix, and successively longer prefixes appear thereafter, with the longest prefix being positioned after all shorter prefixes;

(c) said current match prefixes are positioned at the beginning of said queue, except when said current match follows one of said initial set of characters in which case said current match prefixes are positioned after said last match and current match prefix concatenations;

(d) said characters of the initial set are not included in said queue;

(e) if a prefix of a last match and current match concatenation or a prefix of a current match is already in said queue said prefix is deleted from said queue when said N new strings are added to said first and second dictionary means; and

(f) strings are deleted from the end of said queue to provide room for N new strings whenever the total number of strings in said first and second dictionary means exceeds said selected number.

14. An apparatus according to claim 10, wherein each character in said first and second plurality of strings, except for the characters contained in the root nodes of said first and second plurality of strings, is positioned at a descendant node, and wherein each of said plurality of strings is arranged in a modified least recently used queue that is organized so that the ancestors of any given descendant node positioned closer to the front of said modified least recently used queue than said given descendant node.

15. An apparatus according to claim 1, wherein each of said current matches is represented by data bits, and wherein each of said unique codes in said first and second dictionary means is represented by a selected number of data bits and said selected number is related to the number of strings in said first and second plurality of strings, said first and second plurality of strings each having an identical number of strings, further wherein said apparatus comprises:

means for computing the rate of data compression achieved by said apparatus by dividing, for each current match, the number of bits in said unique code identifying the one of said first plurality of strings that matches said each current match by the number of bits in said each current match;

first means for varying the number of strings in said first plurality of strings pursuant to said rate of data compression achieved by said apparatus so as to maximize said rate of data compression achieved by said apparatus; and

second means for varying the number of strings in said second plurality of strings whenever said first means varies the number of strings in said first plurality of strings so that said first and second plurality of strings comprise identical numbers of strings.

16. An apparatus according to claim 1 further comprising means for increasing or decreasing the number of strings in said first and second plurality of strings so as to minimize the number of said unique codes required to represent a given number of input stream data characters.

17. An apparatus according to claim 1 further comprising:

a first downsize dictionary that is identical to and performs the same functions as said first dictionary means, except that said first downsize dictionary contains half as many strings in its first plurality of strings as said first dictionary means;

a first upsize dictionary that is identical to and performs the same functions as said first dictionary means, except that said first upsize dictionary contains twice as many strings in its first plurality of strings as said first dictionary means;

a second downsize dictionary that is identical to and performs the same functions as said second dictionary means, except that said second downsize dictionary contains half as many strings in its second plurality of strings as said second dictionary means;

a second upsize dictionary that is identical to and performs the same functions as said second dictionary means, except that said second upsize dictionary contains twice as many strings in its second plurality of strings as said second upsize dictionary means;

further wherein:

(a) said first match means additionally compares each of said parsed strings with the first plurality of strings stored in said first downsized dictionary and said first upsize dictionary so as to locate that one of said stored strings, for both said first downsize dictionary and said first upsize dictionary, that matches a corresponding one of said parsed strings, and wherein the most recently matched one of said parsed strings in said first downsized dictionary is identified as the downsized current match and wherein the most recently matched one of said parsed strings in said first upsize dictionary is identified as the upsize current match;

(b) said first transmitting means transmit the unique codes identifying the three ones of said first plurality of strings stored in said first dictionary means, said first downsize dictionary, and said first upsize dictionary that match, respectively, said current match, said downsize current match, and said upsize current match, wherein said unique codes identifying said three ones of said first plurality of strings stored in said first dictionary means, said first downsize dictionary, and said first upsize dictionary are together identified as the match code;

(c) said second match means compares, respectively, said three ones of said strings in said match code with said unique codes identifying said second plurality of strings in said second dictionary means, said second downsize dictionary, and said second upsize dictionary, so as to locate, respectively, the three ones of said unique codes in said second dictionary means, said second downsize dictionary, and said second upsize dictionary that match, respectively, said three ones of said first plurality of strings in said match code received from said first transmitting means;

(d) said second transmitting means transmits the strips of character data identified by said three ones of said unique code in said second dictionary means, said second downsize dictionary, and said second upsize dictionary, that match, respectively, said three ones of said first plurality of strings in said match code;

(e) said first update means additionally adds said N new strings of data to said first downsize dictionary, and said first upsize dictionary;

(f) said second update means additionally adds said N new strings of data to said second downsize dictionary and said second upsize dictionary;

said apparatus additionally comprising:

dictionary comparison means for comparing the number of characters in said current match, said downsize current match, and said upsize current match, with, respectively, (a) the number of characters in the unique code identifying the one of said first plurality of strings in said first dictionary means that matches said current match, (b) the number of characters in the unique code identifying the one of said first plurality of strings in said first downsize dictionary that matches said downsize current match, and (c) the number of characters in the unique code identifying the one of said first plurality of strings in said first upsize dictionary that matches said upsize current match, so as to determine the rates of data compression achieved using, respectively, said first dictionary means, said first downsize dictionary, and said first upsize dictionary;

means for reducing the number of strings in said first plurality of strings in said first dictionary means by one half if a higher rate of data compression is achieved using said first downsize dictionary than is achieved using said first dictionary means;

means for doubling the number of strings in said first plurality of strings in said first dictionary means if a higher rate of compression is achieved using said first upsize dictionary than is achieved using said first dictionary means;

means for changing the number of strings in said second dictionary means whenever said means for reducing or said means for doubling changes the number of strings in said first plurality of strings in said first dictionary means, so that said second dictionary means contains the same number of strings in its second plurality of strings as said first dictionary means.

18. An apparatus for dynamically compressing an input stream of data characters into a stream of compressed code, and for dynamically decompressing said stream of compressed code into said input stream of data characters, the apparatus comprising:

encoder module means for encoding said stream of characters and for transmitting said stream of characters in encoded form;

decoder module means for decoding said encoded stream of characters and for transmitting said stream of characters in uncoded form;

wherein said encoder module means and said decoder module means each comprise:

dictionary means for storing a plurality of strings of said characters, wherein each of said plurality of strings has a unique address;

first match means (1) for receiving said input stream of data characters, (2) for matching strings of characters from said input stream of characters with ones of said plurality of strings of characters in said dictionary means in said encoder module, and (3) for transmitting the address of matched ones of said plurality of strings;

second match means for (1) receiving the addresses of strings with which matches have been made, (2) matching said each of said addresses with the addresses of said plurality of strings in said dictionary means in said decoder module, and (3) transmitting the characters in those of said plurality of strings having addresses that match said addresses received by said second match means;

update means for updating said dictionary means in said encoder module and said decoder module so that said plurality of strings stored therein contain strings of characters frequently present in said stream of characters, and for concatenating the last matched string with the currently matched string; and

deletion means for deleting from said plurality of strings of characters in said dictionary means in said encoder module and in said dictionary means in said decoder module those strings with which strings of characters from said input stream are approximately least frequently matched by arranging said plurality of strings in said dictionary means in said encoder and decoder modules in a reverse order arrangement with the currently matched strings appearing after said concatenation of the last matched string and the currently matched string.

19. An apparatus for dynamically compressing an input stream of data characters into a compressed stream of code, the system comprising:

dictionary means for storing a plurality of strings of data characters, said dictionary means comprising a plurality of unique codes each for identifying a corresponding respective one of said plurality of strings, wherein said plurality of strings comprises an initial set of unique data characters, with each character in said initial set being positioned in a corresponding respective one of said plurality of strings;

match means for receiving and parsing said input stream of data characters into parsed strings and for comparing each of said parsed strings of data characters with said plurality of strings in said dictionary means so as to locate the one of said stored plurality of strings that matches a corresponding one of said parsed strings, wherein the most recently matched one on said parsed strings is identified as the current match;

transmitting means for transmitting said unique code identifying the one of said plurality of stored strings that matches said current match; and

update means for adding N new strings of data characters to said dictionary means, wherein N equals the number of characters current match concatenated with each non-empty prefix of said current match, and including means for assigning one of said plurality of unique address codes to each of said N new strings.

20. An apparatus according to claim 19, further wherein said match means parses said input data stream so that each of said parsed strings is the longest string possible that will match a corresponding one of said plurality of stored strings.

21. An apparatus according to claim 19, further comprising:

input buffer means for storing a portion of said input data stream; and

limited lookahead means for comparing said portion of said input data stream with said plurality of strings stored in said dictionary means so as to choose a parsing of said portion that, over time, allows said input data stream to be represented with substantially the fewest number of said plurality of unique codes possible, wherein a string from said portion closed by said parsing of said portion is used as the current match.

22. An apparatus according to claim 21, further wherein:

said input buffer means comprises a looksize portion sized to hold from between 2 to 100 characters of input data and a maxmatch portion adjacent said looksize portion sized to hold from between 1 to 1000 characters of input data, wherein said maxmatch portion contains only character data forming part of a string of data that begins in the looksize portion and extends into the maxmatch portion; and

said limited lookahead means comprises comparison means for:

(a) matching various strings of characters at least partially contained in said looksize portion with said plurality of stored strings in said dictionary means, and determining the number of bits required to represent each of the various strings of characters in compressed form by (1) determining the number of bits required to represent each of the plurality of unique codes associated with the ones of said plurality of stored strings that match corresponding respective ones of said various strings of characters, and (2) determining the number of bits required to represent each of said various strings of characters in uncompressed form;

(b) calculating a compression ratio for each of said various strings of characters by dividing the number of bits required to represent said each of said various strings in compressed form by the number of bits required to represent said each of said various string in uncompressed form;

(c) comparing the compression ratios for each of said various strings of characters with others of said various strings so as to determine a parsing of said various strings that requires the fewest possible of said plurality of unique codes to represent said various strings in compressed form; and

(d) using one of said various strings of characters as said current match.

23. An apparatus according to claim 22, wherein said looksize portion holds about 8 characters and said maxmatch portion holds about 25 characters.

24. An apparatus according to claim 19,

wherein said strings in said plurality of strings containing said characters of the initial set in said dictionary means together define an array, and wherein said array comprises a plurality of unique root nodes and each of said unique root nodes contains a character from said initial set, and wherein each of said plurality of strings depends from one of said plurality of unique root nodes, further wherein the number of said plurality of strings that depend from a single root node is equal to about the number of characters in said initial character set.

25. An apparatus according to claim 24, wherein said array in said dictionary means comprises about 256 root nodes.

26. An apparatus according to claim 24, wherein each character in said plurality of strings in said dictionary means, except for the characters contained in the root node of said plurality of strings, is positioned at a descendant node, further wherein the number of descendant nodes that are permitted to depend from any given descendant node is equal to about 5 to 10 percent of the total number of root nodes in said array.

27. An apparatus according to claim 26, wherein only about 12 to 25 descendant nodes are permitted to depend from any given descendant node.

28. An apparatus according to claim 19, wherein said update means comprises deletion means for deleting strings of data from said dictionary means so as to provide room for said N new strings of data added to said dictionary means.

29. An apparatus according to claim 28, wherein said dictionary means is initialized with a set of characters prior to the receipt by said match means of the first string in said input stream, and further wherein said deletion means never deletes said initialized set of characters from said dictionary means.

30. An apparatus according to claim 28, wherein said plurality of strings in said dictionary means are arranged in a least recently used queue, with said plurality of strings being arranged in the queue by successively increasing frequency of use, with the least frequently used one of said plurality of strings being positioned at the end of the queue and the most frequently used one of said plurality of strings being positioned at the front of the queue.

31. An apparatus according to claim 28, wherein said plurality of strings in said dictionary means comprises a selected number of strings and wherein said plurality of strings is arranged in a modified least recently used queue that is organized so that;

(a) said last current match and current match prefix concatenations are positioned after said prefixes of said current match;

(b) said last current match and current match prefix concatenations are arranged by length, so that the shortest prefix appears before the next longest prefix, and successively longer prefixes appear thereafter, with the longest prefix being positioned after all shorter prefixes, and said current match prefixes are arranged by length, so that the shortest prefix appears before the next longest prefix, and successively longer prefixes appear thereafter, with the longest prefix being positioned after all shorter prefixes;

(c) said current match prefixes are positioned at the beginning of said queue, except when said current match follows one of said initial set of characters in which case said current match prefixes are positioned after said last current match and current match prefix concatenations;

(d) said characters of the initial set are not included in said queue;

(e) if a prefix of a last current match and current match concatenation or a prefix of a current match is already in said queue said prefix is deleted from said queue when said N new strings are added to said dictionary means; and

(f) strings are deleted from the end of said queue to provide room for N new strings whenever the total number of strings in said dictionary means exceeds said selected number.

32. An apparatus according to claim 28, wherein each character in said plurality of strings in said dictionary means is positioned at a descendant node, and wherein each of said plurality of strings is arranged in a modified least recently used queue that is organized so that the ancestors of any given descendant node are positioned closer to the front of said modified least recently used queue than said given descendant node.

33. An apparatus according to claim 19, wherein each of said current matches is represented by data bits, and wherein each of said unique codes in said dictionary means is represented by a selected number of data bits and said selected number is related to the number of strings in said plurality of strings in said dictionary means, further wherein said apparatus comprises:

means for computing the rate of data compression achieved by said apparatus by dividing, for each current match, the number of bits in said unique code identifying the one of said plurality of strings in said dictionary means that matches said each current match by the number of bits in said each current match;

means for varying the number of strings in said plurality of strings in said dictionary means pursuant to said rates of data compression achieved by the apparatus so as to maximize the rates of data compression achieved by the apparatus

34. An apparatus according to claim 19, further comprising means for increasing or decreasing the number of said plurality of strings in said dictionary means so as to minimize the number of said unique codes required to represent a given number of input stream data characters.

35. A system for dynamically decompressing a compressed stream of code into uncompressed data characters, the system comprising:

dictionary means for storing a plurality of strings of data characters, said dictionary means comprising a plurality of unique codes for identifying a corresponding respective one of said plurality of strings, wherein said plurality of strings comprises an initial set of unique data characters, with each character in said initial set being positioned in a corresponding respective one of said plurality of strings;

match means for receiving said compressed stream code, and for comparing said stream of code with said plurality of unique codes identifying said plurality of strings in said dictionary means so as to locate the ones of said plurality of unique codes in said dictionary means that match corresponding ones of code in said stream of code received by said match means, wherein the most recently matched one of said plurality of unique codes in said dictionary means is identified as the current match;

transmitting means for transmitting the strings of character data identified by the ones of said plurality of unique codes in said dictionary that match said corresponding ones of said stream of code received by said match means; and

update means for adding N new strings of data characters to said dictionary means for each string of characters transmitted by said transmitting means, wherein N equals the number of characters in a current match, said N new strings comprising the last current match concatenated with each non-empty prefix of said current match, and including means for assigning a unique address code to each of said new strings.

36. An apparatus according to claim 35, wherein said update means comprises deletion means for deleting strings of data from said dictionary means so as to provide room for said N new strings of data added to said dictionary means.

37. An apparatus according to claim 36, wherein said dictionary means is initialized with a set of characters prior to the receipt by said match means of the first string in said compressed stream of code, and further wherein said deletion means never deletes said initialized set of characters from said dictionary means.

38. An apparatus according to claim 35, wherein said plurality of strings in said dictionary means are arranged in a least recently used queue, with the strings being arranged in the queue by successively increasing frequency of use, with the least frequently used strings being positioned at the end of the queue and the most frequently used strings being positioned at the front of the queue.

39. An apparatus according to claim 35, wherein said plurality of strings in said dictionary means comprises a selected number of strings and wherein said plurality of strings is arranged in a modified least recently used queue that is organized so that:

(a) said last current match and current match prefix concatenations are positioned after said prefixes of said current match;

(b) said last current match and current match prefix concatenations are arranged by length, so that the shortest prefix appears before the next longest prefix, and successively longer prefixes appear thereafter, with the longest prefix being positioned after all shorter prefixes, and said current match prefixes are arranged by length, so that the shortest prefix appears before the next longest prefix, and successively longer prefixes appear thereafter, with the longest prefix being positioned after all shorter prefixes;

(c) said current match prefixes are positioned at the beginning of said queue, except when said current match follows one of said initial set of characterizing which case said current match prefixes are positioned after said last match and current match prefix concatenations;

(d) said characters of the initial set are not included in said queue;

(e) if a prefix of a last match and current match concatenation or a prefix of a current match is already in said queue said prefix is deleted from said queue when said N new strings are added to said dictionary means; and

(f) strings are deleted from the end of said queue to provide room for N new strings whenever the total number of strings in said dictionary means exceeds said selected number.

40. An apparatus according to claim 35, wherein each character in said plurality of strings in said dictionary means is positioned at a descendant node, and wherein each of said plurality of strings is arranged in a modified least recently used queue that is organized so that the ancestors of any given descendant node are positioned closer to the front of said modified least recently used queue than said given descendant node.

41. An apparatus according to claim 35, wherein each of said current matches is represented by data bits, and wherein each of said plurality unique codes in said dictionary means is represented by a selected number of data bits and said selected number is related to the number of strings in said plurality of strings in said dictionary means, further wherein said apparatus comprises:

means for computing the rate of data compression achieved by said apparatus by dividing, for each current match, the number of bits in said unique code identifying the one of said plurality of strings in said dictionary means that matches said each current match by the number of bits in said each current match;

means for varying the number of strings in said plurality of strings in said dictionary means pursuant to said rates of data compression achieved by the apparatus so as to maximize the rates of data compression achieved by the apparatus.

42. An apparatus according to claim 35, further comprising means for increasing or decreasing the number of said plurality of strings in said dictionary means so as to minimize the number of said unique codes required to represent a given number of input stream data characters.

43. An apparatus according to claim 1, further comprising bypass means for passing said input stream of data characters through said apparatus so that said data is neither compressed or decompressed.

44. An apparatus according to claim 19, further comprising bypass means for passing said input stream of data characters through said apparatus so that said data is neither compressed or decompressed.

45. An apparatus according to claim 35, further comprising bypass means for passing said input stream of data characters through said apparatus so that said data is neither compressed or decompressed.

46. An apparatus according to claim 1, wherein said deletion means of said first and second dictionary means each comprise means for selecting a special set of characters, wherein characters in said special set of characters are never deleted from said first and second dictionary means.

47. An apparatus according to claim 1, said first update means and said second update means each comprising maxmatch means for limiting the length of each of said N new strings of data characters to maxmatch characters long, wherein maxmatch ranges from 1 to the number of strings in said plurality of strings in said first and second dictionary means.

48. An apparatus according to claim 47, wherein maxmatch ranges from 10 to 100.

49. An apparatus according to claim 19, said update means further comprising maxmatch means for limiting the length of each of said N new strings of data to maxmatch characters long, wherein maxmatch ranges from 1 to the number of strings in said plurality of strings in said first and second dictionary means.

50. An apparatus according to claim 49, wherein maxmatch ranges from 10 to 100.

51. An apparatus according to claim 35, said update means further comprising maxmatch means for limiting the length of each of said N new strings of data to maxmatch characters long, wherein maxmatch ranges from 1 to the number of strings in said plurality of strings in said first and second dictionary means.

52. An apparatus according to claim 51, wherein maxmatch ranges from 10 to 100.

53. An apparatus according to claim 1 wherein said apparatus is designated to operate in either a compression mode or a decompression mode, further wherein said apparatus comprises means for switching said apparatus (a) to said compression mode upon receipt of a first signal in said input stream of data characters and (b) to said decompression mode upon receipt of a second signal in said input stream.

54. An apparatus according to claim 18 wherein said apparatus is designated to operate in either a compression mode or a decompression mode, further wherein said apparatus comprises means for switching said apparatus (a) to said compression mode upon receipt of a first signal in said input stream of data characters and (b) to said decompression mode upon receipt of a second signal in said input stream.

55. An apparatus for dynamically compressing and decompressing an input stream of data characters into a compressed stream of code and for decompressing said compressed stream of code into uncompressed data characters, the apparatus comprising:

first dictionary means for storing a first plurality of strings of data characters, said first dictionary means comprising a plurality of unique codes each for identifying a corresponding respective one of said first plurality of strings;

second dictionary means for storing a second plurality of strings of data characters, said second dictionary means comprising a plurality of unique codes each for identifying a corresponding respective one of said second plurality of strings;

first match means for receiving and parsing said input stream of data characters into parsed strings of data characters and for comparing each of said parsed strings of data characters with said first plurality of strings so as to locate the one of said first plurality of strings that matches a corresponding one of said parsed strings, wherein the most recently matched one of said parsed strings is identified as the current match;

first transmitting means for transmitting the one of said plurality of unique codes identifying the one of said first plurality of stored strings that matches said current match, for each current match;

second match means for receiving said one unique code transmitted by said first transmitting means for each current match, and for comparing each said one unique code with said plurality of unique codes identifying said second plurality of strings as to locate the one of said plurality of unique codes in said second dictionary means that matches said each one unique code received from said first transmitting means;

second transmitting means for transmitting the string of character data in said second plurality of strings identified by the one of said plurality of unique codes in said second dictionary means that matches said one unique code received from said first transmitting means, for each string of character data matched by said second match means;

first update means for adding N new strings of data characters to said first dictionary means for each current match, wherein N equals the number of characters in said current match, said N new strings comprising the last current match concentrated with each non-empty prefix of said current match, and including means for assigning one of said plurality of unique codes to each of said new strings; and

second update means for adding said N new strings of data characters to said second dictionary means for each current match, including means for assigning the same one of said plurality of unique codes to each of said N new strings added to aid second dictionary means that is assigned to corresponding respective ones of said N new strings added to said first dictionary means.
 Description Submit all comments and votes
 


The present invention pertains to systems for encoding and decoding electronic data, and more particularly to systems for providing on-line, loss less compression and decompression of character data.

Practical data compression methods have been known for several decades. For instance, an early method of compressing data was described by D. Huffman in the article "A Method for the Construction of Minimum-Redundancy Codes", published in Proceedings of the IRE, vol. 40, pages 1098-1101, 1952. Traditional off-line data compression methods typically involve (1) scanning the data to gather statistical information about the nature of the data and (2) compressing the data based upon the statistical information gained during the scanning step. In traditional on-line data compression methods, a single pass is made over the data and compression is effected based on information obtained during this single pass. If, in performing this single pass, compression is based only on some fixed infor