|
|
|
| United States Patent | 5479654 |
| Link to this page | http://www.wikipatents.com/5479654.html |
| Inventor(s) | Squibb; Mark (Kingston, NY) |
| Abstract | Invention maintains duplicate files in safe places. A SCAN computer program
creates a TOKEN Table of an earlier file. The TOKEN Table reflects the
indices of successive segments of the file and the exclusive-or (XR) and
Cyclic redundancy check (CRC) products of the characters in each segment.
An updated file is compared to the earlier file by comparing the XR and
CRC products of segments in the updated file to the XR and CRC products in
the TOKEN Table. On detecting matching products for identical segments,
the next segments are compared. On mismatch, the segment (window) for the
updated file is bumped one character and new XR and CRC products generated
and compared. The indices of the TOKEN Table and the offsets from the
start of the file of the first characters of the updated file matching
segments are set forth in a Match Table. Next the updated file is scrolled
through for the non-matching information determined by acting on the
indices and offsets of the Match Table to form the TRANSITION Table which
is the Match Table and the updated file non-matching information. The
TRANSITION Table contains the delta information which may be sent to
another location having a copy of the earlier file thereat: the whole
updated file need not be sent there. A reconstruction program at the
location looks at the TRANSITION Table to determine where to get the
characters for the copy of the updated file it is creating. |
|
|
|
Title Information  |
|
|
|
|
|
Drawing from US Patent 5479654 |
|
|
Apparatus and method for reconstructing a file from a difference
signature and an original file |
|
|
|
|
|
| Publication Date |
December 26, 1995 |
|
|
|
|
|
| Filing Date |
March 30, 1993 |
|
|
|
|
|
|
|
|
|
|
|
| Parent Case |
This application is a continuation of application No. 07/515,164, filed
Apr. 26, 1990, now abandoned. |
|
|
|
|
|
|
|
|
|
|
|
|
|
Title Information  |
|
|
References  |
|
|
| *references marked with an asterisk below are user-added references |
|
U.S. References |
|
|
|
|
|
|
U.S. References |
|
|
Foreign References |
|
|
|
|
|
|
Foreign References |
|
|
Other References |
|
|
|
|
|
|
Other References |
|
|
|
|
|
References  |
|
|
|
|
|
| Market Size |
|
Estimate the gross annual revenues of the relevant market
sector:
|
| | |
| |
|
|
| Market Share |
|
Estimate the percentage of the relevant market sector this invention will capture:
|
| | |
| |
|
|
| Reasonable Royalty |
|
What percentage of gross sales should the inventor or assignee be paid?
|
| | |
| |
|
|
|
Public's "Guesstimation" of Royalty Value
|
| Market Size | N/A | [No votes] | | x | Market Share | N/A | [No votes] | | x | Reasonable Royalty | N/A | [No votes] |
| | N/A | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
Market Review  |
|
|
Technical Review  |
|
|
Claims  |
|
|
What is claimed is:
1. A method for producing a difference signature of differences between an
original file and an updated version of the original file, comprising
(1) creating a token table from an original file in a first storage device
by producing a token set for each equal sized contiguous segment of said
original file, each token set comprising a primary exclusive-or based
token and at least one order sensitive secondary token or cyclic
redundancy check product term; and
(2) generating a difference signature, using the token table and an updated
file, by:
(a) defining a window of consideration for the updated file, said window
being of a size equivalent to the segment size used to create the token
set for the original file and comprising successive characters in the
updated file;
(b) calculating a primary exclusive-or based token for the window of
consideration;
(c) searching the token table for a primary token which matches the primary
token for the window and advancing to step 2(g) if said matching primary
token is not found in the token table;
(d)
(i) generating a secondary token for said window in response to finding in
the token table a primary token which matches the primary token from the
window and comparing the secondary token to the secondary token in the
corresponding token set in the token table; and
(ii) advancing to step 2(e) if the secondary tokens match;
(e) logging the offset of the current window to the difference signature to
correlate the relative locations of the matching segment in the original
and updated files, in response to finding a match between the secondary
token from the window and the corresponding secondary token from the token
set;
(f) advancing the window of consideration by the segment size to the next
segment after the matched text, if there are any remaining segments in the
updated file, and resuming the method at step (2b) above;
(g) advancing the window of consideration by at least one character to
create the next window, which includes the characters of the previous
window and at least one character in the updated file following the
previous window minus the equivalent number of characters at the beginning
of the previous window, in response to a failed token search for the
previous window, which occurs where either said primary token for the
previous window of said updated file is not found in the token table for
said original file or where at least one matching primary token is found
in the token table but no matching secondary token corresponding to said
at least one matching primary token is found in the token table;
(h) generating a primary token for said next window of consideration by
adjusting the primary token from the previous window and
(j) repeating the cycle of steps (2b) through (2i) until the updated file
is exhausted.
2. A method for producing a difference signature of differences between an
original file and an updated version of the original file when, in
updating the original file, the majority of insertions and deletions of
characters in segments of the original file are known to change the
offsets of only those segments where said insertions and deletions have
been made but do not change the offsets of adjacent segments, comprising;
(1) creating a token table from an original file in a first storage device
by
producing a token set for each equal sized contiguous segment of said
original file, each token set comprising a primary exclusive-or based
token and at least one order sensitive secondary token or cyclic
redundancy check product term; and
(2) generating a difference signature, using the token table and an updated
file, by;
(a) defining a window of consideration for the updated file, said window
being of a size equivalent to the segment size used to create the token
set for the original file and comprising successive characters in the
updated file;
(b) calculating a primary token for the window of consideration;
(c) searching the token table for a primary token which matches the primary
token for the window and advancing to step 2(g) if said matching primary
token is not found in the token table;
(d)
(i) generating a secondary token for said window in response to finding in
the token table a primary token which matches the primary token for the
window and comparing the secondary token to the secondary token in the
corresponding token set in the token table; and
(ii) advancing to step 2(e) if the secondary tokens match;
(e) logging the offset of the current window to the difference signature to
correlate the relative locations of the matching segment in the original
and updated files, in response to finding a match between the secondary
token from the window and the corresponding secondary token from the token
set;
(f) advancing the window of consideration by the segment size to the next
segment after the matched text, if there are any remaining segments in the
updated file, and resuming the method at step (2b) above;
(g) advancing the window of consideration by the segment size to the next
segment to create the next window in response to a failed token search for
the previous window, which occurs where either said primary token for the
previous window of said updated file is not found in the token table for
said original file or where at least one matching primary token is found
in the token table but no matching secondary token corresponding to said
at least one matching primary token is found in the token table; and
(h) repeating the cycle of steps (2b) through (2g) until the updated file
is exhausted.
3. The method of claim 1 or 2 comprising constraining the search for the
primary token for the window to the token set representing a corresponding
offset from the updated file in the token table.
4. The method of claim 1 or 2 comprising limiting the search for the
primary token in the token table to fewer than all token sets in the token
table.
5. The method of claim 4 further comprising creating lower and upper bounds
of said token table and limiting the search for the primary token in the
token table to token sets within said lower and upper bounds, said bounds
being created by dividing the offset of a character of the window of
consideration by the segment size to produce an index value equal to an
index value in the token table, subtracting a first preselected value from
the index value to determine the lower bound and adding a second
preselected value to the index value to determine the upper bound.
6. The method of claim 1 further comprising generating the primary
exclusive-or token for each said token set and for the window of
consideration by dividing each segment into sets, generating an
exclusive-or product of each set, and concatenating the exclusive-or
products of at least one of said sets and at least another of said sets,
and wherein the primary token from the previous window is adjusted in step
2(h) by dividing the primary token into components corresponding to the
sets, adjusting each component by exclusive-oring those characters which
are leaving the previous window or entering said next window of
consideration with the corresponding components which comprise those
characters and subsequently recomposing the primary token for said next
window of consideration.
7. The method of claim 1 further comprising including characters from the
previous window not included in the next window in the difference
signature as part of a non-matching segment before generating said primary
token for said next window of consideration.
8. The method of claim 1 further comprising including characters from the
previous window not included in the next window in the difference
signature as part of a non-matching segment after completing step 2(j) of
claim 1.
9. The method of claim 2 comprising generating the primary exclusive-or
token for each said token set and for the window of consideration by
dividing each segment into sets, generating an exclusive-or product of
each set, and concatenating the exclusive-or products of at least one of
said sets and at least another of said sets.
10. The method of claim 2 further comprising causing the contents of the
entire window to be included in the difference signature before advancing
the window of consideration by the segment size in step 2(g).
11. The method of claim 2 further comprising causing the contents of the
entire window to be included in the difference signature after completing
step 2(h).
12. The method of claim 1 or 2 wherein step 2(d) further comprises (iii)
searching the token table for another matching primary token if the
secondary token for said window and the secondary token for the
corresponding token set in the token table do not match and the end of the
token table has not been reached, and advancing to step 2(g) if said
another matching primary token is not found in the token table;
(iv) comparing said secondary token for said window to the secondary token
in the corresponding token set in the token table in response to finding
in the token table said another matching primary token which matches the
primary token from said window; and
(v) returning to step 2(d)(ii)
13. A method for producing a duplicate copy of an updated filed from said
difference signature produced by the method of claim 1 or 2 and from
either an original file or a duplicate of said original file, further
comprising using the difference signature and the original file or
duplicate thereof to assemble a duplicate of the updated file by:
(a) using the original file as the source for matching segments;
(b) using the difference signature as the source for non-matching segments;
and
(c) assembling the matching and non-matching segments.
14. The method of claim 1 or 2 wherein said original file from which said
token table is created is stored in said first storage device, said second
storage device has a duplicate of said original file stored in a second
storage device, and said duplicate copy of said updated file is produced
in said second storage device, further comprising deleting or otherwise
modifying said original file in said first storage device at any time
after said token table is created by step 1 of claims 1 or 2 without
affecting the ability to produce said duplicate copy of said updated file
in said second storage device from said duplicate of said original file
and said difference signature.
15. The invention of claim 9 wherein the second set is the entire segment
to which said first set belongs.
16. A method for recording differences between first and second computer
data files in a memory media associated with a programmable data
processor, said files having a plurality of fixed length segments,
comprising the steps of:
(1) generating a token table in said memory media by
(a) reading a fixed length segment of said data file into said memory
media;
(b) generating a primary exclusive-or term for the segment;
(c) generating a secondary order sensitive term for the segment;
(d) concatenating said primary exclusive-or term and said secondary order
sensitive term into a token;
(e) recording said token in said memory media; and
(f) repeating steps (a)-(e) for each of said plurality of fixed length
segments until all of said segments in said data file have been read and
the token table contains one token for each segment in said data file; and
(2) recording differences between the first and second computer data files,
using the token table, by
(a) defining a window of consideration for said second data file starting
at the first character of said second data file, said window having the
same number of characters as each of said plurality of fixed length
segments of said first data file;
(b) generating a window exclusive-or term for the window of consideration
in the same manner that the primary exclusive-or term for each segment is
generated;
(c) searching the token table for a primary exclusive-or term matching the
window exclusive-or term;
(d) if a matching primary exclusive-or term is found in the token table,
(i) generating a window order sensitive term for the characters in the
window of consideration in the same manner that the secondary order
sensitive term for each segment is generated;
(ii) comparing said window order sensitive term with the secondary order
sensitive term which forms part of the token corresponding to the matching
primary exclusive-or term; and
(iii) when the window exclusive-or term and the order sensitive term match
the primary exclusive-or term and secondary order sensitive term of a
respective token in the token table, recording information identifying the
respective token and recording the offset of the window of consideration
and the number of characters in the window of consideration into a
difference signature in said memory media, and advancing the window of
consideration by the length of a segment to beyond the last character in
the current window of consideration and returning to step (2b);
(e) if no match for said window exclusive-or term is found or if no match
for the secondary order sensitive term for the window of consideration is
found after completion of step (2d), then,
(i) adjusting said window exclusive-or term to remove the exclusive-or
representation of the first character of the window of consideration in
said window exclusive-or term and to add the exclusive-or representation
of the next character beyond the last character in the current window of
consideration to the window exclusive-or term and advancing the window of
consideration forward by one character; and
(f) repeating steps (2c) through (2e) until the window of consideration
reaches the end of the second data file.
17. A method according to claim 16 wherein said secondary order sensitive
term comprises a cyclic redundancy product term.
18. The method of claim 16 further comprising, after step (2d)(iii) and
before step (2e) of claim 16, when the window order sensitive term for the
window of consideration does not match the corresponding secondary order
sensitive term for the respective token in the token table, resuming the
search of the token table at step (2c) of claim 16 until the token table
is exhausted.
19. A method according to claim 16 comprising
(1) generating said primary exclusive-or term by
(a) calculating the exclusive-or product of each character in the segment;
(b) dividing each of said plurality of fixed length segments into equal
length subsets;
(c) generating an exclusive-or product for at least one of said subsets of
a respective segment; and
(d) including the exclusive-or product for at least one of said subsets in
said token for said respective segment by concatenating the exclusive-or
product for at least one of said subsets with the primary exclusive-or
term to form said primary exclusive-or term;
(2) generating said window exclusive-or term by
(a) dividing the window into subwindows; and
(b) generating an exclusive-or product for at least one of said subwindows
of a respective window; and including the exclusive-or product for at
least one of said subwindows in the window exclusive-or term by
concatenating the exclusive-or product for at least one of said subwindows
with the window exclusive-or term to form a concatenated window
exclusive-or term; and
(3) said step of adjusting said window exclusive-or term comprises
exclusive-oring both the first character of the window of consideration
and the next character beyond the last character in the current window of
consideration with the exclusive-or product of the segment,
exclusive-oring the first character of the window of consideration with
all of the subsets included in the primary exclusive-or term which
comprise the first character and exclusive-oring the next character beyond
the last character in the current window of consideration with all of the
subsets included in the primary exclusive-or term which comprise the next
character.
20. The method of claim 16 further comprising, in step (e) of claim 80,
before adjusting said window exclusive-or term, recording the first
character of the window of consideration into said difference signature in
said memory media.
21. The method of claim 16 wherein said step of recording information
identifying the respective token in step (2d)(iii) of claim 16 comprises
recording an index of the segment in the token table which corresponds to
the respective token.
22. The method of claim 16 wherein said step of recording information
identifying the respective token in step (2d)(iii) of claim 16 comprises
recording the offset of the segment in the first data file which
corresponds to the respective token.
23. A method for quickly recording differences between first and second
data files having a plurality of fixed length segments in a memory media
associated with a programmable data processor, when the majority of
insertions in and deletions of characters in segments of the first file
are known to change the offsets of only those segments where said
insertions and deletion have been made but do not change the offsets of
adjacent segments, comprising the steps of:
(1) generating a token table in said memory media by
(a) reading a fixed length segment of said data file into said memory
media;
(b) generating a primary exclusive-or term for the segment;
(c) generating a secondary order sensitive term;
(d) concatenating said primary exclusive-or term and said secondary order
sensitive term into a token;
(e) recording said token in said memory media; and
(f) repeating steps (a)-(e) for each of said plurality of fixed length
segments until all of said segments in said data file have been read and
the token table contains one token for each segment in said data file; and
(2) recording differences between the first and second computer data files,
using the token table, by
(a) defining a window of consideration for said second data file starting
at the first character of said second file, said windows having the same
number of characters as each of said plurality of fixed length segments of
said first data file;
(b) generating a window exclusive-or term for the window of consideration
in the same manner that the primary exclusive-or term for each segment is
generated;
(c) selecting a token from the token table at an index in the token table
which is determined by dividing the offset of the window of consideration
by the number of characters in a segment to obtain an index value and
comparing said window exclusive-or term with the primary exclusive-or term
of the selected token in the token table corresponding to the index for
the segment containing said index value;
(d) if a matching primary exclusive-or term is found in the token table,
(i) generating a window order sensitive term for the characters in the
window of consideration in the same manner that the secondary order
sensitive term for each segment is generated;
(ii) comparing said window order sensitive term with the secondary order
sensitive term which forms part of the token corresponding to the matching
primary exclusive-or term; and
(iii) when the window exclusive-or term and the window order sensitive term
match the primary exclusive-or term and secondary order sensitive term of
a respective token in the token table, recording the index of the
respective token and the offset of the first character of the window of
consideration in a difference signature in said memory media;
(e) advancing the window of consideration by the length of a segment to
beyond the last character in the current window of consideration and
returning to step (2b); and
(f) repeating steps (2c) through (2e) until the window of consideration
reaches the end of the second data file.
24. The method of claim 23 wherein the secondary order sensitive term
comprises a cyclic redundancy check product term.
25. The method according to claim 85 further comprising, after step (2d)
and before step (2e) of claim 24, when the window exclusive-or term or the
secondary cyclic redundancy check product term for the window of
consideration does not match the corresponding primary exclusive-or term
or secondary cyclic redundancy check product term for the respective token
in the token table, recording all of the characters within the window of
consideration into said difference signature in said memory media.
26. A method according to claim 16 or 23 for using said difference
signature and said first data file to construct said second data file
comprising the steps of:
(a) reading a section of said difference signature;
(b) if said section indicates a match between segments of said first and
second data files, determining the corresponding offset in said first data
file from the token table, and read the character segment at said offset
into said second data file,
(c) if said section indicates non-matching characters, reading said
characters into said second data file from said difference signature; and
(d) repeat steps (a)-(c) until all sections of the difference signature are
read.
27. A method according to claim 16 or 23 to efficiently store multiple
versions of a file by storing a copy of said original data file and said
difference signature in a memory media.
28. A method according to claim 27 for backing up said second file wherein
said programmable data processor for updating said original file is in a
first memory device and said copy of said original file and said
difference signature is stored in a second memory device. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to techniques for representing file differences
useful in computer file protect systems and other systems, and more
particularly to file transfer techniques useful in an electronic data
backup system wherein only changes in a file are periodically sent to the
backup system and in other systems.
2. Discussion of Prior Information
It is well known to off-load computers at the end of a work day to secure
the data file against computer failure. It is also known to transmit the
file to an off-site location for additional file security.
What is now known is the generation of a set of representations of the
changes in a file, and the periodic relocation of that set of
representations and its use to update the previous version of the file.
SUMMARY OF THE INVENTION
Accordingly it is an object of the invention to generate a set of
representations of the changes made in a computer file during a period of
time.
Another object of the invention is to generate a set of representatives of
the changes made in a computer file which can be used to update an earlier
version of the file, or to create a previous version of an updated file.
Still another object of the invention is to generate and to use such a set
of representations in a cost and time effective manner.
The objects of the invention are achieved through computer programs
designed to run on a micro- and mini- computers. A first or SCAN program
is designed to create a TOKEN Table (or file signature) of mathematical
representations of segments of the file as it exists at the start of a
period (earlier file (EF)). The TOKEN Table reflects the indices (ordinal
numbers) for all of the segments in the earlier file, and the exclusive-or
(XR) and cyclic redundancy check (CRC) products of the set of characters
for each segment. Actually, two CRC products are generated for each
segment; a sixteen bit one and a thirty-two bit one. The three products,
XR and two CRC, are generated for speed in comparisons: the XR product is
first compared because it is the fastest comparison; then the slower
sixteen bit CRC one if necessary; and finally the still slower thirty-two
bit CRC if necessary.
A second program is used at the end of the period to create a MATCH Table
setting forth the location of segments in the current file that are
identical to those in the earlier file. The MATCH Table lists the indices
of all of the segments in the earlier file and the file offsets of the
first character of the corresponding identical segment in the updated
file. The second program calculates the mathematical representations of
the first segment (window) in the updated, revised or current file, first
calculating only the XR product and comparing it to the XR product for the
first earlier-file segment in the TOKEN Table and noting whether a match
exists. If so, it then calculates the sixteen bit CRC product and compares
it to the sixteen bit early file CRC product and notes whether a match
exists; if so, it finally calculates the more time consuming but more
reliable thirty-two bit CRC product and compares it to the thirty-two
early file CRC product and notes whether a match exists; and if so, makes
an index and offset entry in the MATCH Table for the identical segments;
the offset entry being the ordinal number of the first character in the
current file segment string of characters. (The earlier file segments are
numbered (indexed) sequentially.). If a segment match is obtained, the
second program calculates one or more mathematical representations for the
next segment in the current file, and compares them to the products
associated with the next index in the TOKEN Table and representing the
second segment of the earlier file. However, if a mismatch obtained, the
window (which retains segment size) is bumped along one character, new
product(s) calculated for the window characters and comparison(s) again
made with the same representations of the earlier file segments in the
TOKEN Table. This continues until a match obtains at which time the index
for the earlier file segment and the offset of the first character in the
matching current file window (segment) are recorded in the MATCH Table.
The process then continues as above to the end of the current file. Only
the XR product is calculated in the event of an XR product mismatch; the
sixteen bit and the thirty-two bit CRC products being generated
respectively only in the event of earlier matches of the XR and sixteen
bit CRC products.
A third program creates a TRANSITION Table that reflects what's in the
current file that's not in the earlier table, and where. It scrolls
through the list of indices and offsets in the MATCH Table, to see if each
offset number differs from the previous one by the segment size. When such
an offset differs from the previous one by more than the segment size, it
adds the segment size to the first offset to determine the file ordial
number of the first character in the nonmatching information, subtracts
one from the second offset to determine the last character, goes to the
current file and lifts therefrom that set of characters beginning with
that ordinal number and stopping with the character preceding the
extra-spaced offset, and adds them to the MATCH Table to create with the
index a TRANSITION Table.
Thus creation of the Transition Table involves assuring that every
character in current file is accounted for in the TRANSITION Table. The
MATCH Table provides all of the information necessary for this accounting.
Each entry in the beginning column represents a match in the early file of
segment characters to the current file characters at location beginning.
The matching segment in the early file is located at that offset, which is
equal to the index times the segment size in early file.
Essentially the same process is followed for a deletion. The second
program, if no match obtained for an earlier file segment by the end of
the updated file (or over a predetermined number of segments as
conditioned by the character of the file), would have proceeded to
endeavor to match the next index mathematical representations in the TOKEN
Table with a current file segment, with no offset entry having been made
in the MATCH Table for the index of the segment that was unmatched. On
proceeding with the index and representations of the next earlier-file
segment, the window of the current file would be bumped along, and the
index and offset number entered in the MATCH Table when the match of the
mathematical representations occurred. The third program on scrolling
through the MATCH Table offsets, notes the missing offset, notes the
preceding offset, adds the segment size to the previous offset and copies
from that number forward the reduced characters if any in the current file
before the next offset, into the TRANSITION Table and in association with
the index number of the unmatched segment.
The TRANSITION Table is used to update a copy of the earlier file.
Typically, a fourth program and the earlier version of the file are on an
off-site location and the TRANSITION Table representations are
electronically transmitted thereto. The fourth program will examine the
indexes and offsets of the TRANSITION Table, copying segments from the
earlier file where the succeeding offset just differs by the segment size,
into what is to be a duplicate version of the updated file, making
additions where the offset numbers differ from the preceding ones by more
than the segment size with the information provided in the TRANSITION
Table, and substitutions from the TRANSITION Table where the offset
numbers are missing.
As observed earlier, the TOKEN.sub.-- Table mathematical representations of
file segments may be the products of exclusive-oring of the characters in
successive earlier file segments and of generating two cyclic redundancy
check (CRC) products for each earlier file segment. Corresponding XR
products are most quickly generated, but do not detect character order
differentiating; a sixteen bit CRC will catch most of these
transpositions; a relatively slowly generated thirty-two bit CRC product
will detect essentially all of them.
As observed earlier the MATCH Table is generated by the second program
generating mathematical representations of the segment sized windows of
the current file, and comparing the representations of a window with an
index's associated mathematical representations in the TOKEN Table. As
long as matches obtain, successive window sized segments of the current
file are addressed and a MATCH Table listing reflecting the early file
segment index and the current segment first character offset is generated.
Normally three mathematical representations of each segment obtain--an
exclusive-or (XR) one and sixteen bit and thirty-two bit cyclic redundancy
check (CRC) ones. In the interests of speed, the XR products are compared
first, and if a mismatch occurs in them, it is clear that the segments are
unmatched. However, even if the XR products match, the segments may not
match because the XR operation is not sensitive to the transposition of
characters. Accordingly, it is also necessary on XR match, to compare the
sixteen bit CRC product. On sixteen bit CRC match, it is desirable to do a
thirty-two bit CRC match for most applications to achieve practically one
hundred percent certainty. The generation of the CRC product is a
relatively slow process and is avoided where possible as on XR mismatch.
However, the great benefit of avoiding CRC calculations occurs in
operations subsequent to segment mismatch.
As observed earlier, upon detection of a mismatch, a segment sized window
representing only a one character displacement of the window in the
current file is operated upon to determine its mathematical
representations and compare them with the representations of the just
compared TOKEN Table representations, then on mismatch upon successor
windows until a match obtains or the end of file is reached. By generating
first the quickly generated exclusive- or (XR) products, and only on match
generating the more slowly generated CRC products, a significant amount of
time can be saved.
Applicant has further discovered that even the exclusive-oring process can
be expedited on a one-character shift of the window under consideration.
Thus the next XR product need not involve the exclusive-oring of each of
the characters of the new window: rather only the exiting character and
the entering character need be exclusive-ored with the existing XR product
of the just tested segment. The second exclusive-oring of the exiting
character amounts to a subtraction of it from the segment product.
Another feature of the invention is that the amount of updating material
that must be transmitted to the off-site is minimal; normally being less
than five percent (5%) of the current file.
An advantage of the invention is that it provides an easy way to secure a
user's data from fire, theft and tampering.
Another advantage is that is provides an inexpensive disaster recovery
insurance.
A further advantage is that it eliminates the tedious chore of computer
backup, and allows the user's office time to be dedicated more fully to
the productivity and profitability of his or her business.
Yet another advantage of the invention is that programs embodying the
invention can be incorporated in larger programs for handling large model
files which are immune to character insertions and deletions and grow in
size to accommodate new records. Thus under certain circumstances, it is
possible to skip creation of MATCH and TRANSITION Tables by windowing
techniques.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other objects, features, and advantages of the invention will
become apparent from a reading of the following specification when
considered with the appended drawings wherein:
FIG. 1 is a diagram of a system according to the invention;
FIG. 2 is a representation of the contents of a user's earlier file;
FIG. 3 is a representation of the contents of the user's updated file;
FIG. 4 sets forth a TOKEN Table which consists of the indices and the
exclusive-or (XR) and cyclic redundancy check (CRC) products of successive
segments of the earlier program;
FIG. 5 sets forth a MATCH Table reflecting a comparison of the TOKEN Table
contents with the identical segments of the current program;
FIG. 6 sets forth a TRANSITION Table reflecting the differences in the two
files of FIGS. 2 and 3;
FIG. 7 is a flow chart setting forth the method of the first or TOKEN Table
generating program;
FIG. 8 is a flow chart setting forth the method of the second or MATCH
Table generating program;
FIG. 8A is a somewhat more detailed flow chart of the method shown in FIG.
8;
FIG. 9 is a flow chart setting forth the method of the third or TRANSITION
Table generating program;
FIG. 9A shows an obvious variation of the method shown in FIG. 8A;
FIG. 10 is a flow chart setting forth the method of the fourth or
reconstruction program;
FIG. 11 sets forth a MATCH Table having an alternate format to that of FIG.
5;
FIG. 12 sets forth a TRANSITION Table having an alternate (IBE) format; and
FIG. 13 sets forth a TRANSITION Table having another (IBC) format.
FIG. 14 is a flow chart of the program for creating the MATCH Table for
large model files.
FIG. 15 is a flow chart of the steps for generating the TRANSITION Table
for large model files without creating the intermediate MATCH Table.
DETAILED DESCRIPTION OF EMBODIMENT
The system concept of the invention is shown in FIG. 1. A user maintaining
a data file 10 (FIGS. 1 and 2) such as "This is a test file." in a memory
generally indicated by the number 12 of a computer 14, would at the start
of the workday, activate a first program 16 (FIG. 7) also in the computer
memory to partition the earlier file into five characters segments,
generate XR and CRC products for each segment, and list each segment by
its index (ordinal number) and its products in a TOKEN Table 18 (FIG. 1
and 4) in the memory 12 and that he might care to store for the workday on
a disk drive (not shown) to maximize available memory space. During the
day, the user would update, as by inserting the word "radical", the file
10 so that it reads "This is a radical test file." (FIG. 3), using a
conventional data base program 20 also in the memory. At the end of the
workday, the user would activate a second program 22 (FIG. 8), then
located in the memory 12, to create in the memory 12 a MATCH Table 24
(FIGS. 1 and 5) consisting of indices from the TOKEN Table and the offsets
of the first characters of segments (windows) of the updated file that
result from the matching of the exclusive-or (XR) and cyclic redundancy
check (CRC) products of updated file segments with the products associated
with the indices.
The third program 28 in the memory 12 (FIG. 9) works in conjunction with
the MATCH Table or "difference signature", to develop the TRANSITION Table
which succinctly defines what is or is not in the current file that was
not or was in the earlier file. It does this by scrolling through the
offsets in the MATCH Table. It looks at the offsets for successive
indices, checking to see if it differs from the previous offset by the
segment size. When it doesn't, it copies the current file material between
the end of the last segment and the start of the segment with the greater
than segment size offset, into the TRANSITION Table, there to be
associated with the index of the greater than segment size offset.
It results that the TRANSITION Table reflects the changes obtaining in the
current program over the earlier file.
The TRANSITION Table is then electronically sent, using conventional modems
and communication programs, to the off-site computer 30 over telephone
wire 32. Computer 30 has a memory generally indicated by the number 34
which receives the TRANSITION Table in section 36. The earlier file would
normally already be resident in section 38 of the memory 34, representing
the file as it was updated at the end of the previous day. The fourth
program (FIG. 10) creates a duplicate of the current file by inserting or
delecting information according to the dictates of the TRANSITION Table in
memory section 36 and the contents of the earlier file in memory section
38. As long as the offsets for successive indices differ by the segment
size, the program copies the segments for the indices from the earlier
file into the memory section 40. When an addition is indicated as at index
2 because the offset FIG. (18) is larger than the normal segment size (5)
over the previous offset FIG. (5), the fourth program looks for the
additional information (here "radical") in the related area of the
TRANSITION Table and inserts it in the duplicate file, after the tenth
character (number 9). The fourth program then continues reviewing the
TRANSITIO | | |