WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Computerized index file interrogation and comparison    
United States Patent5659730   
Link to this pagehttp://www.wikipatents.com/5659730.html
Inventor(s)Kelley; Edward Emile (Wappingers Falls, NY); Paleveda; John Frederick (Pleasant Valley, NY)
AbstractA method of comparing files takes three passes through data tables in memory to generate tables containing pointers to matches and mismatches by employing a method of keyword-index translation in which a keyword is taken from a first data table in a first pass and used as the index in loading an index table containing a pointer to the record containing that keyword. In a symmetric operation, a keyword is fetched from one table and used to interrogate the corresponding index table in a second pass. If there is a match, the record in the index table contains the pointer; and if there is a mismatch, the record contains a null. In an optional third pass, keywords are fetched from the other table and used to find records in the other table that are mismatches.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5659730
Computerized index file interrogation and comparison - US Patent 5659730 Drawing
Computerized index file interrogation and comparison
Inventor     Kelley; Edward Emile (Wappingers Falls, NY); Paleveda; John Frederick (Pleasant Valley, NY)
Owner/Assignee     International Business Machines Corporation (Armonk, NY)
Patent assignment
All assignments
Publication Date     August 19, 1997
Application Number     08/729,710
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     October 7, 1996
US Classification     707/3 707/7
Int'l Classification     G06F 017/30
Examiner     Black; Thomas G.
Assistant Examiner     Lintz; Paul R.
Attorney/Law Firm    
Address
Parent Case     CROSS-REFERENCE This patent application is a Divisional patent application of U.S. patent application Ser. No. 08/538,377, filed on Oct. 3, 1995, now U.S. Pat. No. 5,604,901, which was a Continuation of U.S. patent application Ser. No. 07/867,950, filed on Apr. 13, 1992, now abandoned.
Priority Data    
USPTO Field of Search     395/603 395/607
Patent Tags     computerized index file interrogation comparison
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5377348
Lau
707/3
Dec,1994

[0 after 0 votes]
5369762
Wolf
707/7
Nov,1994

[0 after 0 votes]
5341498
Connor
707/104.1
Aug,1994

[0 after 0 votes]
5274802
Altine
707/202
Dec,1993

[0 after 0 votes]
5261091
Yuyama
707/4
Nov,1993

[0 after 0 votes]
5261087
Mukaino
707/3
Nov,1993

[0 after 0 votes]
5237678
Kuechler
707/5
Aug,1993

[0 after 0 votes]
5218696
Baird

Jun,1993

[0 after 0 votes]
5129082
Tirfing
707/3
Jul,1992

[0 after 0 votes]
5072367
Clayton

Dec,1991

[0 after 0 votes]
4912637
Sheedy
707/203
Mar,1990

[0 after 0 votes]
4807182
Queen
715/511
Feb,1989

[0 after 0 votes]
4641274
Swank
715/531
Feb,1987

[0 after 0 votes]
4524427
Vidalin
707/6
Jun,1985

[0 after 0 votes]
4469139
Seiler
139/65
Sep,1984

[0 after 0 votes]
4443860
Vidalin
707/1
Apr,1984

[0 after 0 votes]
4408273
Plow
707/202
Oct,1983

[0 after 0 votes]
4053871
Vidalin
340/146.2
Oct,1977

[0 after 0 votes]
3670310
Bharwani
707/3
Jun,1972

[0 after 0 votes]
3609703
Peacock
379/362
Sep,1971

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


We claim:

1. An article of manufacture comprising:

a computer usable medium having computer readable program code means embodied therein for comparing in a data processing system a first file of data and a second file of data located in non-volatile storage media, each of said first and second files comprising a set of records, each record including a key, the computer readable program code means in said article of manufacture comprising:

computer readable program code means for causing a computer to effect initializing and loading a set of data tables in memory including first and second key-index tables, a match table for storing data representative of records in said first file having a counterpart record in said second file, a first mismatch table for storing data representative of records in said first file lacking a counterpart record in said second file and a second mismatch table for storing data representative of records in said second file lacking a counterpart record in said first file;

computer readable program code means for causing a computer to effect sequentially fetching a second-file keyword from a record in said second file, interrogating a corresponding record in said first key-index table having an index element equal to said second-file keyword, and adding a record containing identifying data to one of said match table when said corresponding record has a non-null entry and adding a record containing identifying data to said second mismatch table when said corresponding record has a null entry; and

computer readable program code means for causing a computer to effect sequentially fetching a first-file keyword from a record in said first file, interrogating a corresponding record in said second key-index table having an index element equal to said first-file keyword, and adding a record containing identifying data to said first mismatch table when said corresponding record has a null entry.

2. An article according to claim 1, further having computer readable program code means for causing a computer to effect reading said first and second files of data from nonvolatile storage into corresponding first and second data tables in memory, each record of said first and second data tables containing at least said key of a corresponding file record;

said computer readable program code means for causing a computer to effect loading said first key-index table includes means for reading said key for said first data table in a first pass through one of said set of data tables in memory;

said computer readable program code means for causing a computer to effect sequentially fetching a second-file keyword and interrogating a corresponding record in said first key-index file includes means for reading through said second data table in a second pass through one of said set of data tables in memory; and

said computer readable program code means for causing a computer to effect sequentially fetching a first-file keyword and interrogating a corresponding record in said second key-index file includes means for reading through said first data table in a third pass through one of said set of data tables in memory.

3. An article according to claim 2, in which said computer readable program code means for causing a computer to effect interrogating said first key-index file and said second key-index file include computer readable program code means for causing a computer to effect interrogating a table with an alphanumeric index.

4. An article according to claim 2, including computer readable program code means for causing a computer to effect forming at least one of said first-file keyword and second-file keyword from at least two non-contiguous characters selected from records in said first file or said second file.

5. An article according to claim 1, in which said computer readable program code means for causing a computer to effect interrogating said first key-index file and said second key-index file include computer readable program code means for causing a computer to effect interrogating a table with an alphanumeric index.

6. An article according to claim 1, including computer readable program code means for causing a computer to effect forming at least one of said first-file keyword and said second-file keyword from at least two non-contiguous characters selected from records in said first file or said second file.

7. An article of manufacture comprising:

a computer usable medium having computer readable program code means embodied therein for comparing in a data processing system a first and a second file of data, each of said first and second files comprising a set of records, each record including a key, comprising:

computer readable program code means for causing a computer to effect initializing and loading a set of data tables in memory in a first data pass including at least computer readable program code means for causing a computer to effect loading a first-key index table with a first key-index record corresponding to each record in said first file, said first key-index record having an interrogation index related in a one-to-one correspondence to a first-file key of said first file;

computer readable program code means for causing a computer to effect sequentially fetching, in a second data pass, a second-file key from a record in said second file;

computer readable program code means for causing a computer to effect interrogating a corresponding record in said first key-index table having an interrogation index element related to said second-file key;

computer readable program code means for causing a computer to effect storing data representative of the relationship between said record of said first file and said record of said second file;

computer readable program code means for causing a computer to effect initializing and loading, in a third data pass, a set of at least one data table including at least computer readable program code means for causing a computer to effect loading a second key-index table with a second key-index record corresponding to each record in said second file, said second key-index record having an interrogation index related in a one-to-one correspondence to a second-file key of said second file;

computer readable program code means for causing a computer to effect sequentially fetching, in a fourth data pass, a first-file key from a record in said first file;

computer readable program code means for causing a computer to effect interrogating a corresponding record in said second key-index table having an interrogation index element related to said first-file key; and

computer readable program code means for causing a computer to effect storing data representative of the relationship between said record of said first file and said record of said second file.

8. An article according to claim 7, further having computer readable program code means for causing a computer to effect reading in said second data pass said second file from non-volatile storage to fetch said second-file key; and

computer readable program code means for causing a computer to effect reading in said fourth data pass said first file from non-volatile storage to fetch said first-file key, whereby said first and second files are not loaded into said memory.

9. An article according to claim 8, further having computer readable program code means for causing a computer to effect reading in said first data pass said first file from non-volatile storage to load said first key-index table; and

computer readable program code means for causing a computer to effect reading in said third data pass said second file from non-volatile storage to load said second key-index table, whereby said first and second files are not loaded into said memory.

10. An article according to any of claims 7-9, further having computer readable program code means for causing a computer to effect processing alphanumeric interrogation indices, first-file keys, and second-file keys.

11. An article of manufacture comprising:

a computer usable medium having computer readable program code means embodied therein for causing a computer to effect comparing in a data processing system a first and a second file of data, each of said first and second files comprising a set of records, each record including a key, comprising:

computer readable program code means for causing a computer to effect reading in a first data pass said first file from non-volatile storage and loading a set of at least one data table in memory including at least a first key-index table with a first key-index record corresponding to each record in said first file, said first key-index record having an interrogation index related in a one-to-one correspondence to a first-file key of said first file;

computer readable program code means for causing a computer to effect reading in a second data pass said second file from non-volatile storage and loading a set of at least one data table in memory including at least a second key-index table with a second key-index record corresponding to each record in said second file, said second key-index record having an interrogation index related in a one-to-one correspondence to a second-file key of said second file;

computer readable program code means for causing a computer to effect sequentially fetching, in a third data pass, from non-volatile storage a first-file key from a record in said first file;

computer readable program code means for causing a computer to effect interrogating a corresponding record in said second key-index table having an interrogation index element related to said first-file key; and

computer readable program code means for causing a computer to effect storing data representative of the relationship between said record of said first file and said record of said second file;

computer readable program code means for causing a computer to effect sequentially fetching, in a fourth data pass r from non-volatile storage a second-file key from a record in said second file;

computer readable program code means for causing a computer to effect interrogating a corresponding record in said first key-index table having an interrogation index element related to said second-file key; and

computer readable program code means for causing a computer to effect storing data representative of the relationship between said record of said first file and said record of said second file.

12. An article according to claim 11, further having computer readable program code means for causing a computer to execute said second data pass by reading said second file from non-volatile storage to fetch said second-file key; and

computer readable program code means for causing a computer to execute said fourth data pass by reading first file from non-volatile storage to fetch said first-file key, whereby said first and second files are not loaded into said memory.

13. An article according to claim 12, further having computer readable program code means for causing a computer to execute said first data pass by reading said first file from non-volatile storage to load said first key-index table; and

computer readable program code means for causing a computer to execute said third data pass by reading said second file from non-volatile storage to load said second key-index table, whereby said first and second files are not loaded into said memory.

14. An article according to any of claims 11-13, further having computer readable program code means for causing a computer to effect processing alphanumeric interrogation indices, first-file keys, and second-file keys.
 Description Submit all comments and votes
 


FIELD OF THE INVENTION

A field of the invention is that of data processing in general purpose computers, in particular the field of comparing data in different files to find matches and mismatches.

BACKGROUND ART

Many techniques have been developed in the art in order to compare files of data that are nominally the same. A brute force technique requires that each record in a first file be compared with every record in a corresponding second file. The time required to make this comparison is on the order of N.sup.2 where N is the number of records in the files.

One technique known in the art is that illustrated in the IBM Technical Disclosure Bulletin 06-77, pp 387-388, which uses the technique of multiple pointers to determine the minimum difference between files. Next points are compared with current points to decide Insert, Delete, or Replace action.

SUMMARY OF THE INVENTION

The invention relates to a method of comparing data files that requires only two passes through one file and one pass through the other file. In a first pass a key word from a source record in the first file is used to define the index element of a record in a corresponding key-index file, the record element of which is a pointer to the source record.

In the second pass of the operation, a key word is sequentially read from the second file; a corresponding second key-index file is defined; an index translation or substitution process is performed in which the keyword from the second file is used to interrogate the key-index file derived from the first data file; and the record element labeled by that keyword index is tested. If the record is null, then a mismatch has been identified; i.e. there is no record in the first file corresponding to the record being processed in the second file. If the contents are not null, then there has been a match. In a third pass, the index lookup is repeated by reading keywords from the second file and interrogating a key-index file based on the first file.

An advantage of the method is that it employs the fast built-in routines that are provided with the compiler or interpreter for the high level language being used and/or by the operating system of the general purpose computer being used.

An advantageous feature of the invention is that the range of the index need not be defined in advance.

A feature of the invention is that it takes advantage of a feature provided in some high level languages that permits the use of an alphanumeric index to a table or vector. This permits the use of alphanumeric keywords as the index in the key-index table and broadens the class of files that can be compared with this technique.

Another feature of the invention is that the tables need not be sorted.

Another feature of the invention is that a large key may be used.

Another feature of the invention is that the key may be selected from free form text.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 illustrates an overall flow chart of a method according to the invention.

FIG. 1A is a general purpose computer on which the invention is carried out.

FIGS. 2A-AC illustrate details of the flow chart of FIG. 1.

FIGS. 3 and 4 illustrate an alternative version of the portion illustrated in FIG. 2.

FIGS. 5 and 6 illustrate another alternative method of practicing the invention.

BEST MODE OF CARRYING OUT THE INVENTION

In the field of data processing, it is important for efficiency to reuse routines, especially fast routines that have been optimized. One such routine is the utility routine that accepts as input the index value of some ith element in an array, N(i), N.sub.i, N.sup.i, say, and returns the contents of the array element pointed to by that index value. The term index will be used here to refer to a subscript, superscript, or other pointer to one element in an ordered set of elements. The terms index element or index value will be used to refer to a particular element of the set of indices and to the value of a particular element. When the expression Test=N(4) is used in a program, it means that a utility routine takes the value of the 4th element of vector N and places it in the memory location reserved for test. This is a convenient way to program a lookup function that will be executed more quickly than a corresponding routine written in a high level language.

In the invention disclosed here, the utility routine written for the language REXX, an interpreted or compiled language available from IBM for use in general purpose computers, is used to advantage in a file comparison routine. As will be disclosed below, a value "test" is fetched from a corresponding file using a built in lookup routine that operates on a keyword that uniquely specifies a record; i.e. test=N(Keyword), fetching a value for test from that element of array N pointed to by the value of the symbol "Keyword".

In operation, the method is carried out by a comparison that involves four tables in the computer memory. The first two tables, [TABL1(i) and TABL2(i)] contain the data being compared. They are formed by reading in at least the keyword of the raw first and second data files being compared. TABL1 may illustratively be a conventional table for which the index element is the number of the record in a set of records that are sequentially stored in memory. The corresponding "key-index" files (KEY1(key) and KEY2(key)) have one record for each record in the data file that they correspond to [KEY1 has one record for every record in TABL1, etc.]. The content of that record in the key-index file is the record number of the corresponding table record. The index "key" to the record in question is the keyword of the corresponding table record. For example, if the 27th record in TABL1 has the keyword SAM, the corresponding record in KEY1 has the value 27 and is pointed to by the index SAM.

There is symmetry between KEY1 and TABL1: the index element in TABL1 is the record element in KEY1 and the subset of the record element that is the keyword in TABL1 is the index element in KEY1. There need not be symmetry in record location. In the preceding example, there is no need for the record in KEY1 that has the index SAM to be the 27th in a sequence of record elements. As will be discussed below, the correlation between index element and memory location may be carried out in more than one way.

Referring now to FIG. 1, there is shown an overall flow chart illustrating the practice of the invention. FIG. 1A shows a schematic representation of a general purpose computer system for operating software to perform the method. In the system, CPU 1 passes data to and from tape drive 3 and/or disk drive 4, or non-volatile memory 5, such as ROM or EPROM (collectively called non-volatile storage) and into and out of memory 2.

The first block, labeled 10, represents an initialization step in which the first and second files to be compared are read from mechanical storage, such as a tape or disk, into first and second file tables (TABL1 AND TABL2) in the computer memory. Both files need not have the same structure. There may be a corresponding keyword that uniquely identifies a particular record, but it need not be located in the same location in both files; e.g. the key may be a surname, street and zip code, which may be located in different portions of the records in the two files and pulled out in the process of defining the keyword.

In the block denoted by the numeral 100, the data in the first file table is looped through in the first pass. The keyword of the nth file record in the first file table is assigned as the index of an nth element of a key-index table or list, KEY1, in which nth element the value n is placed as the data, or the record element.

The second pass through the data is through the second file table, in the course of which, the keyword of the nth record from the second file is used as the index value in the first key-index table. If the record element in the first key-index table pointed to by that index value is null, it indicates that there is no record in the first table corresponding to the current record from the second table. In that case, a mismatch table (TMIS2) is loaded with relevant data. If there is a non-null entry, then there is a match and a similar table (MATCH) is loaded, typically with the record number of the matching records.

The third pass is through the first table again, looking for mismatches in which a first-table entry has no counterpart in the second table. The same process is followed using a second keyword table derived from TABL2. At the end of the third pass, there are three tables, the match table and the first and second mismatch tables, that contain the data on the relation between the files. A final step uses the data generated in the preceding steps as is appropriate for the task at hand. If the purpose is to identify new records that have been added in the two files, then the records added in the second mismatch table can be added to the first file (or vice versa). If the purpose is identification of errors, then the records in the mismatch files will be checked for typographical errors, etc. before a final file is generated. If the purpose is to find common records in the two files, only the MATCH table will be of interest.

Referring now to FIGS. 2A-2C, the steps of FIG. 1 are shown in more detail. The block of steps denoted with the numeral 10 in FIG. 2A is an initializing step in which the files are read from external storage into tables in main memory. It is not necessary to load the entire file into memory if the comparison can be made by a smaller keyword that is a subset of a record. It is not necessary that the keyword be a defined field in a record. Keywords could be made on the fly by defining certain characters or fields of a record as a keyword and reading only the necessary data into memory.

In the loop denoted by the numeral 100, the data in the first file table is looped through. The keyword of the nth file record in the first file table (read in step 112) is assigned as the index of an element of KEY1 (step 114), in which element the value n is placed as the data, or record element (step 116). The method of assigning that number to a memory location does not matter for the practice of the invention and those skilled in the art will readily be able to devise satisfactory methods. For example, in an interpreted language, KEY1 could be treated as a group of variables, and a memory location could be assigned to each element of KEY1 as it is encountered in the course of program execution, without regard for any relationship between the members of the group. For example, KEY1(SAM) need not be between KEY1(SAK) and KEY1(SAP) and the locations may be assigned by whatever memory allocation algorithm is used for variables that are not part of a "table". In particular, there is no need for the elements of KEY1 to be located contiguously in memory, as was the case for vectors in FORTRAN and other languages of that generation. On the other hand, if the designers of a compiler or interpreter prefer, a memory block may be set aside and elements of KEY1 placed there in sequence according to the sequence in which they appear in the course of running the program, with a lookup table to take care of the link between SAM and the correct memory location. The invention may be practiced with any language that meets the requirements of returning a data element in response to a pointer or index element.

Referring now to FIG. 2B, blocks 200 and 250 show the second pass through the data, in which each record of the second file table is tested against the first file table. For convenience in terminology, the term "repetitively fetching" will be used to refer to this process. The records of the second file are stepped through in any convenient order and the keyword of the each record in the second file is fetched from TABL2 and used to fetch the contents of a corresponding record in the first key-index file; e.g. KEY1(keywordn) is the data stored in the element of KEY1 that has the index value given by the nth keyword (keywordn) in the second file table. The contents of that element of the first key-index file table are tested to determine if there is a match or a mismatch between the two files. In the setup procedure, the KEY1 table is first loaded with a flag such as null. In an earlier step in FIG. 2A, the record number of a record (i, say) in the first file was loaded into the element of KEY1 that has as index element value the keyword of the ith record in the first file table. In the case of a match between the ith record of TABL1 and the nth element of TABL2, the value of KEY1(Keyindexn) will be i. If there is a mismatch, then a query as to the value of KEY1(keyindexn) will return the value null that was loaded in the initialization step. An IF statement may be used to query whether the value of the element in the key-index table being queried is equal to the flag.

For example, if the keyword of a personnel record is the Social Security number of an employee and the record having keyword (123-45-6789) is the 7th record of TABL1 and the 9th record in TABL2, then KEY1(Keyindexn)=KEY1(Keyindex9) since the current record is the 9th record of TABL2. Plugging in the value of the 9th index, we interrogate KEY1 and find that tho contents of KEY1(123-45-6789)=7. Thus, we know that there is a match between the 7th record of TAXBL1 and the 9th record of TABL2. If there has been a mismatch in that there is no corresponding element in the first file that has the keyword (123-45-6789), then the contents are equal to the null flag that was loaded into the KEY1 array in the initialization procedure.

Effectively, there has been a translation or substitution in which the keyword (which is part of the data) from the first file is translated to give an index value for an element in a corresponding table of data taken from a second file. The preceding statement may be paraphrased as that there has been a role shift--a portion of the data in one file or table is used as the pointer or index element in another table.

Referring again to FIG. 2B, the contents of the key-index record are tested in block 220 and which of the two alternative steps following the test on the key-index table is used depends on whether there has been a match or not. If there has been a match, then the steps in block 230 are followed; a match counter is increased by 1 (to k, say) and the kth record in a match table is loaded with the record numbers (in the file table or in the tape or disk file) of the matched data (n and i), as pointers to the matching records. If the element was a null, indicating a mismatch, then a corresponding second mismatch counter (indicating a record in the second file that did not have a matching record in the first file) is incremented and a corresponding second mismatch table is loaded with the number of the record in the second file that did not have a corresponding record in the first file, as a pointer to the non-matching record. Since the relevant data is at hand, this is a convenient time to load KEY2, the second key-index file, in block 250. The computer then loops back to the start of block 200 and continues until it reaches the end of the first second table.

The third pass is taken through the first file table to look for mismatches i.e., elements of the first file that do not have corresponding data in the second file. Since the matches have been identified on the second pass, it is not necessary to look at them a second time. Similar steps are performed in block 300 of FIG. 2C to those of block 200, with corresponding steps being indicated by corresponding numerals. At the end of this pass, three new tables have been generated; a match table MATCH and two mismatch tables TMIS1 and TMIS2. These tables are then stored for further processing, typically an examination of the mismatched data and a merge of the two files into a final corrected file. Data stored contain a representation of the relationship between a record in the first and second files. A match is represented by the number in the MATCH file and a mismatch is represented by a number in the mismatch table TMIS1 or TMIS2.

It has been assumed that there is enough memory to hold the first and second data tables, TABL1 and TABL2, the first and second key-index tables, KEY1 and KEY2, and the three match tables. Referring now to FIGS. 3 and 4, there is illustrated an alternative version of the invention. In this alternative, it is assumed that the limitation is computer memory, and the two data tables TABL1 and TABL2 are not used. The two key-index tables are formed in memory, but the tape or disk will have to be read again in order to make the comparison.

Referring now to FIG. 3, a corresponding section 100' corresponding to section 100 of FIG. 2A is illustrated. Corresponding substeps are indicated by the same numeral as that of FIG. 2A and are not described further. After KEY1 has been loaded, the data from the second file is read in sequentially (205') in a second pass. The keyword is retrieved; the same keyword to index translation is performed; the Match counter or the First mismatch counter is incremented; and the appropriate data are stored in MATCH and TMIS2. At the end of the second pass, the method proceeds to the steps illustrated in FIG. 4, in which a third pass is perfo