|
Claims  |
|
|
We claim:
1. A method of compressing digital data of bytes of K bits each received
from a data stream to form and to output a compressed data stream,
comprising the steps of:
(a) receiving the first byte of the data stream and storing the first byte
in a compressor window;
(b) receiving the next byte of the data stream, storing the byte in a
compressor window, and hashing the byte with the immediately preceding
byte to obtain a hash value;
(c) addressing a reference table with the hash value obtained in step (b)
to obtain any reference pointers to bytes within the compressor window
stored at that reference-table address and to store at that
reference-table address a new reference pointer to the current byte in the
compressor window;
(d) for each reference pointer, if any, obtained in step (c), comparing the
current byte with the byte in the compressor window pointed to by that
reference pointer;
(e) in the event no reference pointers were obtained in step (c) or the
current byte does not match the byte in the compressor window pointed to
by any of the reference pointers obtained in step (c), then outputting
within the compressed data stream a flag indicating an alphabet token and
a compressed representation of the byte preceding the current byte;
(f) in the event the current byte matches the byte in the compressor window
pointed to by any of the reference pointers addressed in step (c), then
(1) receiving one or more additional bytes from the data stream and, for
each, comparing the additional byte with the byte in the compressor window
immediately after each match to determine if any match is extended by the
additional byte, until the additional byte does not further extend any
match, and
(2) outputting, as part of the compressed data stream, a flag indicating
string-substitution compression and a representation of the location
within the compressor window where the match was found and a
representation of the number of bytes in the match;
(g) repeating steps (b) through (f) to receive each byte of the data stream
subsequent to the first byte.
2. The method of claim 1 further comprising the steps of calculating a
parity or checksum for the data stream prior to compressing the data
stream and appending the parity or checksum to the compressed data stream,
whereby the parity or checksum of the decompressed data stream may be
verified upon decompression of the compressed data stream.
3. The method of any of claims 1-2 wherein the maximum number of pointers
which may be stored at each reference-table address is two.
4. An adaptive, lossless, single-pass, general-purpose compression
apparatus for compressing an input stream of a finite number of K-bit
digital bytes, comprising:
window memory means of size M bytes to store the up-to-M most recent bytes
of said input stream;
hash means to generate hash values, each based on the H most-recent bytes
of said input stream;
reference table means, addressable by said hash values, to hold pointers
into said window memory means, each pointer pointing to a sequence of H
bytes within said window memory means that generates the hash value that
is the reference-table address of that pointer;
comparator means to verify matches between said H most-recent bytes and the
window byte sequences pointed to by any pointers having a reference-table
address given by the hash value of said H most-recent bytes,
longest-match determining means, operable in the case where one or more
said matches are verified, to identify which of said window byte sequences
results in the longest match between the current input-byte sequence and
that window byte sequence, and to determine the length of said longest
match;
copy-token encoding means, operable whenever said longest-match determining
means has identified said longest match and has determined its length,
said copy-token encoding means to output from said compression apparatus a
copy token comprising a copy-token identifier field, a displacement field
pointing to said longest match within said window memory means and a
copy-length field representing the length of said longest match;
alphabet-token encoding means, operable in the case where no matches are
verified by said comparator means, to output from said compression
apparatus an alphabet token comprising an alphabet-token identifier field
and a field of variable length representing the oldest byte within said H
most-recent bytes.
5. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 4 wherein said displacement field of said copy token is
of variable length.
6. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 4 wherein said length field of said copy token is of
variable length.
7. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 4 wherein said displacement field of said copy token is
of variable length and said length field of said copy token is of variable
length.
8. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 4-7 wherein the maximum number of pointers
which may be stored at each said hash-value reference table address is
two.
9. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 4-7 wherein the field of variable length to
represent the oldest byte within said H most-recent bytes output by said
alphabet-token encoding means comprises a prefix code.
10. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 4-7 wherein the field of variable length to
represent the oldest byte within said H most-recent bytes output by said
alphabet-token encoding means comprises a unary-length code.
11. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 4-7 wherein the field of variable length to
represent the oldest byte within said H most-recent bytes output by said
alphabet-token encoding means comprises a Huffman code.
12. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 4-7 wherein said alphabet-token encoding means
remaps the alphabet to reduce the average length of said alphabet tokens
for a predetermined set of typical data.
13. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 4-7 wherein said alphabet-token encoding means
remaps the alphabet to reduce the average length of said alphabet tokens
for a predetermined set of typical data by remapping the three
most-significant bits of each input byte.
14. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 4-7 wherein said reference table means has 256
reference-table addresses.
15. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 4-7 wherein said reference table means has 256
reference-table addresses and K equals 8.
16. In an adaptive, lossless, single-pass, general-purpose compression
apparatus for compressing an input stream of a finite number of K-bit
digital bytes, an improvement comprising:
window memory means of size M bytes to store up-to-M most recent bytes of
said input stream;
hash means to generate hash values, each based on the H most-recent bytes
of said input stream;
reference table means, addressable by said hash values, to hold pointers
into the window memory means, each pointer pointing to a sequence of H
bytes within said window memory means that generates the hash value that
is the reference-table address of that pointer;
comparator means to verify matches between said H most-recent bytes and the
window byte sequences pointed to by any pointers having a reference-table
address given by the hash value of said H most-recent bytes,
longest-match determining means, operable in the case where one or more
said matches are verified, to identify which of said window byte sequences
results in the longest match between the current input-byte sequence and
that window byte sequence, and to determine the length of said matches;
copy-token encoding means, operable whenever said longest-match determining
means has identified said longest match, has determined its length, and
has determined that said longest match either is the only verified match
or is the match closest in input sequence to the most-recent byte of said
input stream, said copy-token encoding means to output from said
compression apparatus a copy token comprising a copy-token identifier
field, a displacement field pointing to said longest match within said
window memory means and a copy-length field representing the length of
said longest match;
differential copy-token encoding means, operable whenever said comparator
means verifies more than one match and said longest-match determining
means identifies said longest match as a match further in input sequence
from the most-recent input byte of said input stream than one or more
other verified matches, said differential copy-token encoding means to
output from said compression apparatus a copy token comprising a
copy-token identifier field, a displacement field pointing to said longest
match within said window memory means, and a copy-length field
representing the difference between the length of said longest match and
the length of the longest of said one or more other verified matches.
17. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 16 wherein said displacement field of said copy token
is of variable length.
18. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 16 wherein said length field of said copy token is of
variable length.
19. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 16 wherein said displacement field of said copy token
is of variable length and said length field of said copy token is of
variable length.
20. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein the maximum number of pointers
which may be stored at each said hash-value reference table address is
two.
21. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein the field of variable length to
represent the oldest byte within said H most-recent bytes output by said
alphabet-token encoding means comprises a prefix code.
22. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein the field of variable length to
represent the oldest byte within said H most-recent bytes output by said
alphabet-token encoding means comprises a unary-length code.
23. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein the field of variable length to
represent the oldest byte within said H most-recent bytes output by said
alphabet-token encoding means comprises a Huffman code.
24. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein said alphabet-token encoding
means remaps the alphabet to reduce the average length of said alphabet
tokens for a predetermined set of input streams.
25. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein said alphabet-token encoding
means remaps the alphabet to reduce the average length of said alphabet
tokens for a predetermined set of typical data by remapping the three
most-significant bits of each input byte.
26. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein said reference table means has
256 reference-table addresses.
27. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein said reference table means has
256 reference-table addresses and K equals 8.
28. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein said copy token encoding means
and said differential copy token encoding means each encode copy-length
fields using the same code.
29. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein said copy token encoding means
and said differential copy token encoding means each encode copy-length
fields using the same code and said code comprises a prefix code.
30. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein said copy token encoding means
and said differential copy token encoding means each encode copy-length
fields using the same code and said code comprises a unary-length code.
31. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein said copy token encoding means
and said differential copy token encoding means each encode copy-length
fields using the same code and said code comprises a Huffman code.
32. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 16-19 wherein said copy token encoding means
and said differential copy token encoding means each encode copy-length
fields using the same code and an offset is applied to the copy lengths
within either said copy token encoding means or said differential copy
token encoding means.
33. An adaptive, lossless, single-pass, general-purpose compression
apparatus for compressing an input stream of a finite number of K-bit
digital bytes, comprising:
window memory means of size M bytes to store up-to-M most recent bytes of
said input stream;
hash means to generate hash values each based on the H most-recent bytes of
said input stream;
reference table means, addressable by said hash values, to hold pointers
into said window memory means, each pointer pointing to a sequence of H
bytes within said window memory means that generates the hash value that
is the reference-table address of that pointer;
comparator means to verify matches between said H most-recent bytes and the
window byte sequences pointed to by any pointers having a reference table
address given by the hash value of said H most-recent bytes,
longest-match determining means, operable in the case where one or more
said matches are verified, to identify which of said window byte sequences
results in the longest match between the current input-byte sequence and
that window byte sequence and to determine the length of said matches;
copy-token encoding means, operable whenever said longest-match determining
means has identified said longest match, has determined its length, and
has determined that said longest match either is the only verified match
or is the match closest in input sequence to the most-recent byte of said
input stream, said copy-token encoding means to output from said
compression apparatus a copy token comprising a copy-token identifier
field, a displacement field pointing to said longest match within said
window memory means and a copy-length field representing the length of
said longest match;
differential copy-token encoding means, operable whenever said comparator
means verifies more than one match and said longest-match determining
means identifies said longest match as a match further in input sequence
from the most-recent input byte of said input stream than one or more
other verified matches, said differential copy-token encoding means to
outputs from said compression apparatus a copy token comprising a
copy-token identifier field, a displacement field pointing to the longest
match within said window, and a copy-length field representing the
difference between the length of said longest match and the length of the
longest of said one or more other verified matches; and
alphabet-token encoding means, operable in the case where no said matches
are verified by said comparator means, to output from said compression
apparatus an alphabet token comprising an alphabet-token identifier field
and a field of variable length representing the oldest byte within said H
most-recent bytes.
34. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 33 wherein the maximum number of pointers which may be
stored at each said hash-value reference table address is two.
35. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 33 wherein said window memory means is a circular
buffer.
36. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 33 wherein said copy-token encoding means outputs from
said compression apparatus a copy token comprising in part a displacement
field of variable length for pointing to the longest match, and said
differential copy-token encoding means outputs from said compression
apparatus a copy token comprising in part a displacement field of variable
length for pointing to the longest match.
37. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 33 wherein said reference table means has 256
reference-table addresses.
38. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 33 wherein said reference table means has 256
reference-table addresses and K equals 8.
39. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 33-38 wherein H is 2 and said hash means
generates eight-bit hash values by exclusive ORing the first of the two
most-recent bytes received with the second of the two most-recent bytes
received after first flipping one of said two bytes end for end to reverse
the order of the bits therein.
40. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 33-38 wherein H is 2 and said hash means
generates hash values comprising a number of bits between 9 and 15
inclusive by exclusive ORing the first of the two most-recent bytes
received with the second of the two most-recent bytes received after first
flipping one of the two bytes end for end to reverse the order of the bits
therein and next shifting either the resulting flipped byte or the
remaining un-flipped byte, said exclusive ORing performed on the bits
which align after said shifting.
41. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 33-38 wherein said copy token encoding means
and said differential copy token encoding means each encode the
copy-length fields of their respective copy tokens using the same code.
42. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 33-38 wherein said copy token encoding means
and said differential copy token encoding means each encode the
copy-length fields of their respective copy tokens using the same code,
said code being a prefix code.
43. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 33-38 wherein said copy token encoding means
and said differential copy token encoding means each encode the
copy-length fields of their respective copy tokens using the same code,
said code being a unary-length code.
44. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 33-38 wherein said copy token encoding means
and said differential copy token encoding means each encode the
copy-length fields of their respective copy tokens using the same code,
said code being a Huffman code.
45. In an adaptive, lossless, single-pass, general-purpose compression
apparatus for compressing an input stream of a finite number of K-bit
digital bytes, an improvement comprising:
window memory means of size M bytes to store up-to-M most recent bytes of
said input stream;
hash means to generate hash values, each based on the H most-recent bytes
of said input stream;
reference table means, addressable by said hash values, to hold pointers
into the window memory means, each pointer pointing to a sequence of H
bytes within said window memory means that generates the hash value that
is the reference-table address of that pointer;
comparator means to verify one or more matches between said current
input-byte sequence and the window byte sequences pointed to by any
pointers having a reference table address given by the hash value of said
two most-recent bytes;
said hash means generating hash values having the characteristic that for
all possible values of any particular one of the H bytes, every possible
value of the remaining bytes generates a unique hash value;
said comparator means verifying matches between said H most-recent bytes
and the window byte sequences pointed to by the pointers having a
reference-table address given by the hash value of said H most-recent
bytes by comparing one less than H bytes.
46. The compression apparatus of claim 45 wherein H is 2 and said
particular one of the H bytes is the first in input sequence of said two
bytes.
47. The compression apparatus of claim 45 wherein H is 2 and said
particular one of the H bytes is the second in input sequence of said two
bytes.
48. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 2 wherein H is 2 and said hash means generates eight
bit hash values by exclusive ORing the first of the two most-recent bytes
with the second of the two most-recent bytes after first flipping one of
the two bytes end for end to reverse the order of the bits therein.
49. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 45-48 wherein said reference table means has
256 reference-table addresses.
50. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 45-48 wherein said reference table means has
256 reference-table addresses and K equals 8.
51. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 45-48 wherein H is 2 and said hash means
generates hash values comprising a number of bits between 9 and 15
inclusive by flipping one of the two most-recent bytes end for end to
reverse the order of the bits therein, then shifting either the flipped
byte or the un-flipped byte, and then exclusive ORing the bits of the
flipped byte with those of the unflipped byte that align after said
shifting.
52. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claims 45-48 wherein said window memory means is a
circular buffer.
53. In an adaptive, lossless, single-pass, general-purpose compression
apparatus for compressing an input stream of a finite number of K-bit
digital bytes, an improvement comprising:
window memory means of size M bytes to store up-to-M most recent bytes of
said input stream;
hash means to generate hash values, each based on the H most-recent bytes
of said input stream;
reference table means, addressable by said hash values, to hold pointers
into the window memory means, each pointer pointing to a sequence of H
bytes within said window memory means that generates the hash value that
is the reference-table address of that pointer, said reference table means
holding a predetermined maximum number of said pointers per reference
table address;
for each new byte of said input stream, said hash means generates a new
hash value to address said reference table and said reference table means
is updated to hold a new pointer into said window memory means, and if
said predetermined maximum number of pointers per reference-table address
is exceeded, then said new pointer overwrites a pointer previously stored
at that reference-table address on a first in, first out basis.
54. The adaptive, lossless, single-pass, general-purpose compression
apparatus of claim 53 wherein the maximum number of pointers which may be
stored at each said hash-value reference table address is two.
55. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claim 4, claim 16, or claim 33 wherein said reference
table means holds a predetermined maximum number of said pointers per
reference-table address and for each new byte of said input stream, said
hash means generates a new hash value to address said reference table and
said reference table means is updated to hold a new pointer into said
window memory means, and if said predetermined maximum number of pointers
per reference-table address is exceeded, then said new pointer overwrites
a pointer previously stored at that reference-table address on a first in,
first out basis.
56. The adaptive, lossless, single-pass, general-purpose compression
apparatus of any of claim 4, claim 16, claim 33, or claim 53, wherein H is
3 and said hash means generates hash values by splitting each of the three
most-recent bytes into a most significant nibble, of K/2 bits, and a least
significant nibble, of K/2 bits, flipping one of the two nibbles for each
of the three bytes end for end to reverse the order of the bits therein,
then exclusive ORing the reversed nibble of each byte with the unreversed
nibble of one of the other two bytes to form three new nibbles, and then
combining the three new nibbles to form the hash value.
57. A method of decompressing a compressed data stream of alphabet tokens
based on a prefix code and copy tokens into an output data stream of
uncompressed K-bit bytes, comprising:
parsing the compressed data stream into copy tokens and alphabet tokens;
decompressing each parsed token by:
(i) for each alphabet token, discarding the alphabet-token identifier
field, decoding the alphabet token using a prefix-code decoder to obtain
an output byte, storing the output byte at the current byte within an
output memory means, advancing the current byte, and outputting the output
byte to the output data stream;
(ii) for each copy token, discarding its copy-token identifier field,
decoding its displacement field and its copy-length field to obtain
copy-length and displacement values, generating a sequence of output bytes
responsive to the displacement value, the copy-length value, and the
contents of the output memory means, storing the sequence of output bytes
starting at the current byte within the output memory means, advancing the
current byte, and outputting the sequence of output bytes to the output
data stream.
58. A method of decompressing a compressed data stream of alphabet tokens
based on a unary-length code and copy tokens into an output data stream of
uncompressed K-bit bytes, comprising:
parsing the compressed data stream into copy tokens and alphabet tokens;
decompressing each parsed token by:
(i) for each alphabet token, decoding the alphabet token using a
unary-length decoder to obtain an output byte, storing the output byte at
the current byte within an output memory means, advancing the current
byte, and outputting the output byte to the output data stream;
(ii) for each copy token, decoding its displacement field and its
copy-length field to obtain copy-length and displacement values,
generating a sequence of output bytes responsive to the displacement
value, the copy-length value, and the contents of the window memory means,
storing the sequence of output bytes starting at the current byte within
the output memory means, advancing the current byte, and outputting the
sequence of output bytes to the output data stream.
59. A method of decompressing a compressed data stream of alphabet tokens
based on a Huffman code and copy tokens into an output data stream of
uncompressed K-bit bytes, comprising:
parsing the compressed data stream into copy tokens and alphabet tokens;
decompressing each parsed token by:
(i) for each alphabet token, decoding the alphabet token using a Huffman
decoder to obtain an output byte, storing the output byte at the current
byte within an output memory means, advancing the current byte, and
outputting the output byte to the output data stream;
(ii) for each copy token, decoding its displacement field and its
copy-length field to obtain copy-length and displacement values,
generating a sequence of output bytes responsive to the displacement
value, the copy-length value, and the contents of the window memory means,
storing the sequence of output bytes starting at the current byte within
the output memory means, advancing the current byte, and outputting the
sequence of output bytes to the output data stream.
60. The method of any of claims 57-59 wherein said alphabet-token decoding
further comprises remapping the decoded alphabet token to recover the
K-bit byte.
61. The method of any of claims 57-59 wherein said alphabet-token decoding
further comprises remapping the decoded alphabet token by remapping the
three most significant bits of the byte.
62. A method of decompressing a compressed data stream of alphabet tokens,
copy tokens, and differential copy tokens into an output data stream of
uncompressed K-bit bytes, comprising:
parsing said compressed data stream into copy tokens and alphabet tokens;
decompressing each parsed token by:
(i) for each alphabet token, decoding said alphabet token to obtain an
output byte, storing said output byte at a current byte within an output
memory means, advancing said current byte, outputting said output byte to
said output data stream, hashing the H most-recent bytes of said output
stream to generate a hash value, and storing in a reference table means at
an address responsive to said hash value a pointer to said H most-recent
output bytes within said output memory means;
(ii) for each copy token, decoding its displacement field and its
copy-length field to obtain copy-length and displacement values;
generating a sequence of output bytes in response to said displacement
value, to said copy-length value, to the contents of said output memory
means, and to the contents of said reference table means; storing said
output-byte sequence starting at said current byte within said output
memory means; advancing said current byte; and, for each byte of said
output-byte sequence, outputting that byte to said output stream and
hashing the H most-recent bytes of said output stream to generate a hash
value and storing in said reference table means, at an address responsive
to said hash value, a pointer to said H most-recent output bytes within
said output memory means;
said output-byte-sequence generating comprising:
(a) locating in said output memory means in response to said displacement
value a target sequence of H bytes;
(b) hashing said target sequence to obtain a hash value;
(c) addressing said reference table means using said hash value to obtain
any pointers stored at that reference-table address pointing to any
sequences of H bytes within said output memory means that are closer to
said current byte than is said target sequence;
(d) for each pointer obtained in step (c), comparing the sequence of H
bytes pointed to by that pointer with said target sequence;
(e) for each sequence of H bytes found to be identical in step (d),
comparing the bytes adjacent to said target sequence with the bytes
adjacent to that identical sequence to determine the length of the match
and to determine the length of the longest such match;
(f) if said longest match length is determined in step (e), then modifying
said copy-length value responsive to said length of said longest such
match;
(g) generating a sequence of output bytes that includes said target
sequence and that is responsive to said displacement value, said
copy-length value, and the contents of said output memory means.
63. The method of claim 62 wherein said comparing in step (d) is performed
by comparing one less than H bytes.
64. An adaptive, lossless, single-pass, general-purpose compression method
for compressing an input stream of a finite number of | | |