WikiPatents - Community Patent Review
Data processing system and method for selecting customized character recognition processes and coded data repair processes for scanned images of document forms    
United States Patent 5305396
Link to this page: http://www.wikipatents.com/5305396.html
Inventor(s): Betts; Timothy S. (Germantown, MD); Carras; Valerie M. (Kensington, MD); Knecht; Lewis B. (Olney, MD)
Abstract: A data processing method, system and computer program repairs character recognition errors for digital images of document forms. A document form processing template is provided which specifies the identity and preferred sequence for selected, customized character recognition processes and selected, customized coded data error correction processes which are reasonably likely to be needed to automatically process a selected batch of document forms whose fields have certain, anticipated, uniform characteristics.

Title Information
Owner/Assignee: International Business Machines Corporation (Armonk, NY)
Publication Date: April 19, 1994
Application Number: 07/870,507
Filing Date: April 17, 1992
US Classification: 382/175; 382/310
Int'l Classification: G06K 009/62
Examiner: Mancuso; Joseph
Attorney/Law Firm: Hoel; John E.
USPTO Field of Search: 382/36; 382/38; 382/40; 382/61; 382/9
Claims

What is claimed is:

1. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting a character recognition process from said first plurality specified in said template and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process; and

selecting a coded data repair process from said second plurality specified in said template and operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process.
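
The flow recited in claim 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Template` class, the toy recognizer and repair functions, the use of a string with `?` marks as a stand-in for a field image, and the rule of simply taking the first entry of each sequence.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types: a recognizer maps a field image to (coded data, error positions);
# a repair process maps (coded data, error positions) to repaired coded data.
Recognizer = Callable[[str], Tuple[str, List[int]]]
Repairer = Callable[[str, List[int]], str]


@dataclass
class Template:
    recognition_sequence: List[Recognizer]  # stands in for the first sequence specification
    repair_sequence: List[Repairer]         # stands in for the second sequence specification


def toy_ocr(field_image: str) -> Tuple[str, List[int]]:
    """Stand-in recognizer: a '?' in the toy image is an unreadable character."""
    errors = [i for i, ch in enumerate(field_image) if ch == "?"]
    return field_image, errors


def state_repair(coded: str, errors: List[int]) -> str:
    """Stand-in repair: use the one known value that agrees outside the error positions."""
    known = ["MD", "NY", "VA"]
    matches = [
        k for k in known
        if len(k) == len(coded)
        and all(k[i] == coded[i] for i in range(len(coded)) if i not in errors)
    ]
    # Leave the data unchanged when the match is missing or ambiguous.
    return matches[0] if len(matches) == 1 else coded


def process_field(template: Template, field_image: str) -> str:
    recognize = template.recognition_sequence[0]  # select a recognition process
    coded, errors = recognize(field_image)        # recognition coded data + error data
    repair = template.repair_sequence[0]          # select a repair process
    return repair(coded, errors)                  # repaired coded data


template = Template([toy_ocr], [state_repair])
result = process_field(template, "M?")
```

In this toy run the unreadable second character is resolved because only one known value starts with `M`; an input with no readable characters is returned unchanged.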

2. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting a character recognition process from said first plurality specified in said template and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process; and

selecting a coded data repair process from said second plurality specified in said template and operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process;

said form processing template including a first process step definition for a first character recognition process of a first field and a second process step definition for a second character recognition process of a second field on said document form;

said form processing template including a third process step definition for determining whether said second field is related to said first field and whether character recognition of said second field should be omitted; and

omitting character recognition processing of said second field in response to said third process step definition.

3. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting a character recognition process from said first plurality specified in said template and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process; and

selecting a coded data repair process from said second plurality specified in said template and operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process;

said form processing template including a first process step definition for a first character recognition process of a first field and a second process step definition for a second character recognition process of a second field on said document form;

said form processing template including a third process step definition for determining whether said second field is related to said first field and whether character recognition of said second field should be omitted;

performing said first character recognition process of said first field and generating first recognition coded data;

determining that said first recognition coded data has a first certainty value; and

omitting character recognition processing of said second field in response to said third process step definition and said first certainty value.
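
One way to picture the conditional omission in claims 2 and 3 is the sketch below. It is only an illustration under stated assumptions: the `Step` class, the `skip_if_certain` and `min_certainty` attributes, the ZIP/state example and the certainty measure (fraction of readable characters) are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    field: str
    recognize: Callable[[str], Tuple[str, float]]  # returns (coded data, certainty in [0, 1])
    skip_if_certain: Optional[str] = None          # hypothetical: name of a related field
    min_certainty: float = 0.9                     # hypothetical threshold


def toy_recognize(image: str) -> Tuple[str, float]:
    """Stand-in recognizer: certainty is the fraction of characters that are not '?'."""
    readable = sum(ch != "?" for ch in image)
    return image, (readable / len(image) if image else 0.0)


def run(steps: List[Step], images: Dict[str, str]) -> Dict[str, Tuple[str, float]]:
    results: Dict[str, Tuple[str, float]] = {}
    for step in steps:
        related = step.skip_if_certain
        # Omit this field when its related field was already recognized with enough certainty.
        if related in results and results[related][1] >= step.min_certainty:
            continue
        results[step.field] = step.recognize(images[step.field])
    return results


steps = [
    Step("zip", toy_recognize),
    Step("state", toy_recognize, skip_if_certain="zip"),
]
clean = run(steps, {"zip": "20874", "state": "M?"})
smudged = run(steps, {"zip": "2?8?4", "state": "MD"})
```

With a fully readable ZIP code the state field is never recognized; with a smudged ZIP code the certainty falls below the threshold and the state field is processed as usual.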

4. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting a character recognition process from said first plurality specified in said template and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process;

selecting a coded data repair process from said second plurality specified in said template and operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process;

said form processing template including a first process step definition for a first coded data repair process of a first field and a second process step definition for a second coded data repair process of a second field on said document form;

said form processing template including a third process step definition for determining whether said second field is related to said first field and whether coded data repair of said second field should be omitted; and

omitting coded data repair processing of said second field in response to said third process step definition.

5. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting a character recognition process from said first plurality specified in said template and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process;

selecting a coded data repair process from said second plurality specified in said template and operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process;

said form processing template including a first process step definition for a first coded data repair process of a first field and a second process step definition for a second coded data repair process of a second field on said document form;

said form processing template including a third process step definition for determining whether said second field is related to said first field and whether coded data repair of said second field should be omitted;

performing said first coded data repair process of said first field and generating first repaired coded data;

determining that said first repaired coded data has a first certainty value; and

omitting coded data repair processing of said second field in response to said third process step definition and said first certainty value.

6. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting a character recognition process from said first plurality specified in said template and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process;

selecting a coded data repair process from said second plurality specified in said template and operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process;

said form processing template including a first process step definition to search for a first coded data repair process which is related to said selected character recognition process, and

said step of selecting a coded data repair process further comprises reading said first process step definition and in response thereto, identifying said selected coded data repair process as being related to said selected character recognition process and performing said step of operating on said recognition coded data with said selected coded data repair process.

7. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting a character recognition process from said first plurality specified in said template and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process;

selecting a coded data repair process from said second plurality specified in said template and operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process;

said form processing template including a first process step definition for a first coded data repair process of a first field and a second process step definition for a second coded data repair process of a second field on said document form;

said form processing template including a third process step definition for determining whether said second field is related to said first field and whether repaired coded data of said first field should be cross-checked with repaired coded data of said second field; and

cross-checking repaired coded data of said first field with repaired coded data of said second field in response to said third process step definition.
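
The cross-check of claim 7 might look like the following sketch. The lookup table, the field names and the rule that an unknown ZIP prefix counts as consistent are assumptions chosen for illustration; the claim itself does not say how two related fields are compared.

```python
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny lookup from ZIP prefix to state.
ZIP_PREFIX_TO_STATE = {"208": "MD", "105": "NY"}


def consistent(zip_code: str, state: str) -> bool:
    """Treat an unknown prefix as consistent, since the toy table cannot contradict it."""
    expected = ZIP_PREFIX_TO_STATE.get(zip_code[:3])
    return expected is None or expected == state


def cross_check(
    repaired: Dict[str, str],
    related: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the related (first field, second field) pairs whose repaired values disagree."""
    return [(a, b) for a, b in related if not consistent(repaired[a], repaired[b])]


related_pairs = [("zip", "state")]  # stands in for the third process step definition
agreeing = cross_check({"zip": "20874", "state": "MD"}, related_pairs)
conflicting = cross_check({"zip": "20874", "state": "NY"}, related_pairs)
```

A non-empty result would be the signal to route the pair to a further repair step or to a human operator.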

8. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting a character recognition process from said first plurality specified in said template and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process;

selecting a coded data repair process from said second plurality specified in said template and operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process;

said form processing template including a first process step definition for a first coded data repair process in said second sequence specification and a second process step definition for a second coded data repair process in said second sequence specification;

said form processing template including a third process step definition for selectively changing the order of performing said first coded data repair process and said second coded data repair process; and

changing the order of performing said first coded data repair process and said second coded data repair process in response to said third process step definition.

9. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting a character recognition process from said first plurality specified in said template and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process;

selecting a coded data repair process from said second plurality specified in said template and operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process;

said first sequence specification of said form processing template including a first process step definition for a first character recognition process of a first field and a second process step definition for a second character recognition process of a second field on said document form;

said form processing template including a third process step definition for selectively changing the order of performing said first character recognition process on said first field and said second character recognition process on said second field; and

changing the order of performing said first character recognition process on said first field and said second character recognition process on said second field in response to said third process step definition.
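
Claims 8 and 9 both recite a step definition that selectively changes the order of two processes. A minimal sketch of that idea follows, assuming the processes are identified by name and assuming a hypothetical trigger (`handprinted`); the claims do not specify what condition triggers the reordering.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SequenceSpec:
    steps: List[str]  # process names in their default order
    # Hypothetical stand-in for the third process step definition:
    # the two processes to swap when the trigger condition holds.
    swap_when_handprinted: Optional[Tuple[str, str]] = None


def effective_order(spec: SequenceSpec, handprinted: bool) -> List[str]:
    order = list(spec.steps)  # copy so the template itself is never mutated
    if handprinted and spec.swap_when_handprinted:
        a, b = spec.swap_when_handprinted
        i, j = order.index(a), order.index(b)
        order[i], order[j] = order[j], order[i]
    return order


spec = SequenceSpec(
    steps=["dictionary_lookup", "checksum", "manual_keying"],
    swap_when_handprinted=("dictionary_lookup", "checksum"),
)
default_order = effective_order(spec, handprinted=False)
swapped_order = effective_order(spec, handprinted=True)
```

Keeping the reorder rule in the template, rather than in the processing code, matches the claims' framing of the template as the place where sequencing decisions are recorded.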

10. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes including first occurring and second occurring character recognition processes, said template including a second sequence specification for a second plurality of coded data repair processes, including first occurring and second occurring coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting said first occurring character recognition process specified in said first plurality in said template and generating first recognition coded data from said extracted field image and generating first recognition error data using said first occurring character recognition process;

determining that said first recognition error data is greater than a first predetermined value, and in response thereto, selecting said second occurring character recognition process specified in said first plurality in said template and generating second recognition coded data from said extracted field image and generating second recognition error data using said second occurring character recognition process;

selecting said first occurring coded data repair process specified in said second plurality in said template and operating on said second recognition coded data and said second recognition error data to generate first repaired coded data using said first occurring coded data repair process; and

determining that said first repaired coded data has less certainty than a second predetermined value, and in response thereto, selecting said second occurring coded data repair process specified in said second plurality in said template and operating on said second recognition coded data and said second recognition error data to generate second repaired coded data using said second occurring coded data repair process.
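
The two-stage fallback in claim 10 can be sketched as two loops over the sequences in the template. This is a generalization to any number of processes, not the claim's exact two-process wording, and the toy recognizers, toy repair functions, error counts and thresholds are all assumed values for illustration.

```python
from typing import Callable, List, Tuple

Recognizer = Callable[[str], Tuple[str, int]]       # -> (coded data, error count)
Repairer = Callable[[str, int], Tuple[str, float]]  # -> (repaired data, certainty)


def fast_ocr(image: str) -> Tuple[str, int]:
    """Toy recognizer that misreads '0' as 'O' and cannot read '7'."""
    coded = image.replace("0", "O").replace("7", "?")
    return coded, coded.count("O") + coded.count("?")


def careful_ocr(image: str) -> Tuple[str, int]:
    """Toy recognizer that only misreads '0' as 'O'."""
    coded = image.replace("0", "O")
    return coded, coded.count("O")


def passthrough_repair(coded: str, errors: int) -> Tuple[str, float]:
    return coded, 0.5  # changes nothing, so it reports low certainty


def numeric_repair(coded: str, errors: int) -> Tuple[str, float]:
    return coded.replace("O", "0"), 0.95  # a numeric field cannot contain the letter O


def recognize_with_fallback(seq: List[Recognizer], image: str, max_errors: int):
    if not seq:
        raise ValueError("recognition sequence is empty")
    for recognize in seq:
        coded, errors = recognize(image)
        if errors <= max_errors:  # error data not greater than the first predetermined value
            break
    return coded, errors


def repair_with_fallback(seq: List[Repairer], coded: str, errors: int, min_certainty: float):
    if not seq:
        raise ValueError("repair sequence is empty")
    for repair in seq:
        # Every repair attempt starts from the same recognition output.
        repaired, certainty = repair(coded, errors)
        if certainty >= min_certainty:  # certainty reaches the second predetermined value
            break
    return repaired, certainty


coded, errors = recognize_with_fallback([fast_ocr, careful_ocr], "20874", max_errors=1)
repaired, certainty = repair_with_fallback(
    [passthrough_repair, numeric_repair], coded, errors, min_certainty=0.9
)
```

Here the first recognizer exceeds the error threshold, so the second one is used; the first repair process reports too little certainty, so the second repair process is applied to the same recognition output.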

11. In a data processing system, a method for repairing character recognition errors for digital images of document forms, comprising:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes including first occurring and second occurring character recognition processes, said template including a second sequence specification for a second plurality of coded data repair processes, including first occurring and second occurring coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting said first occurring character recognition process specified in said first plurality in said template and generating first recognition coded data from said extracted field image and generating first recognition error data using said first occurring character recognition process;

determining that said first recognition error data is greater than a first predetermined value, and in response thereto, selecting said second occurring character recognition process specified in said first plurality in said template and generating second recognition coded data from said extracted field image and generating second recognition error data using said second occurring character recognition process;

assembling a machine generated data structure (MGDS) which includes a field data segment including a coded data buffer portion and an error buffer portion for said extracted field image;

inserting said second recognition coded data into said coded data buffer portion and inserting said second recognition error data into said error buffer portion of said field data segment;

transferring said MGDS to said second plurality of coded data repair processes, for repairing said second recognition coded data;

augmenting said MGDS with a repair segment which includes a repair data buffer portion;

selecting said first occurring coded data repair process from said second plurality in said template and operating on said second recognition coded data and said second recognition error data to generate first repaired coded data using said first occurring coded data repair process;

determining that said first repaired coded data has less certainty than a second predetermined value, and in response thereto, selecting said second occurring coded data repair process specified in said second plurality in said template and operating on said second recognition coded data and said second recognition error data to generate second repaired coded data using said second occurring coded data repair process;

inserting said second repaired coded data into said coded data buffer portion of said field data segment and inserting said second recognition coded data into said repair data buffer portion of said repair segment; and

transferring said MGDS to a utilization device and accessing the contents of said coded data buffer portion of said field data segment for use as a corrected form of coded data representing said extracted field image.
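
The data structure handling in claim 11 can be pictured as follows. The class and attribute names are hypothetical, and the layout is only one plausible reading of the claim: the field data segment always holds the best current value, while the repair segment preserves the value that the repair replaced.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldSegment:
    name: str
    coded_data: str = ""                                 # coded data buffer portion
    error_data: List[int] = field(default_factory=list)  # error buffer portion


@dataclass
class RepairSegment:
    field_name: str
    repair_data: str = ""  # repair data buffer portion: the pre-repair recognition output


@dataclass
class MGDS:
    fields: List[FieldSegment] = field(default_factory=list)
    repairs: List[RepairSegment] = field(default_factory=list)


def record_recognition(mgds: MGDS, name: str, coded: str, errors: List[int]) -> FieldSegment:
    """Insert recognition output into a new field data segment."""
    segment = FieldSegment(name, coded, list(errors))
    mgds.fields.append(segment)
    return segment


def record_repair(mgds: MGDS, segment: FieldSegment, repaired: str) -> None:
    """Augment the MGDS with a repair segment, then overwrite the coded data buffer."""
    mgds.repairs.append(RepairSegment(segment.name, segment.coded_data))
    segment.coded_data = repaired


mgds = MGDS()
zip_segment = record_recognition(mgds, "zip", "2O874", [1])
record_repair(mgds, zip_segment, "20874")
```

A downstream consumer only needs to read `coded_data` from the field segment, while the repair segment keeps an audit trail of what recognition originally produced.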

12. In a data processing system, a method for assembling a document form processing template for defining a processing sequence to convert a document image into corrected coded data, comprising:

inputting document form data specifying a plurality of fields on a document form;

defining a first character recognition process sequence using said document form data, specifying a first plurality of character recognition processes for generating first recognition coded data for a first field of said plurality of fields; and

defining a first coded data repair process sequence using said document form data, specifying a second plurality of coded data repair processes, for use in repairing character recognition errors of said first recognition coded data for said first field and generating first corrected coded data for said first field.
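
Template assembly as recited in claim 12 can be sketched as a function from form data to per-field process sequences. The field kinds, the process names and the default tables below are invented for illustration; the claim only requires that a recognition sequence and a repair sequence be defined from the document form data.

```python
from typing import Dict, List

# Hypothetical defaults keyed by field kind; none of these names come from the patent.
DEFAULT_RECOGNITION: Dict[str, List[str]] = {
    "numeric": ["digit_ocr", "general_ocr"],
    "alpha": ["general_ocr", "handprint_ocr"],
}
DEFAULT_REPAIR: Dict[str, List[str]] = {
    "numeric": ["range_check", "manual_keying"],
    "alpha": ["dictionary_lookup", "manual_keying"],
}


def assemble_template(form_fields: List[Dict[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Build a per-field recognition sequence and repair sequence from form data."""
    template: Dict[str, Dict[str, List[str]]] = {}
    for form_field in form_fields:
        kind = form_field["kind"]
        template[form_field["name"]] = {
            # Copy the defaults so later per-field edits do not leak between fields.
            "recognition": list(DEFAULT_RECOGNITION[kind]),
            "repair": list(DEFAULT_REPAIR[kind]),
        }
    return template


form_data = [
    {"name": "zip", "kind": "numeric"},
    {"name": "last_name", "kind": "alpha"},
]
template = assemble_template(form_data)
```

Because each field gets its own copies of the sequences, a template author could then reorder or trim the processes for one field without affecting the others.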

13. In a data processing system, a method for assembling a document form processing template for defining a processing sequence to convert a document image into corrected coded data, comprising:

inputting document form data specifying a plurality of fields on a document form;

defining a first character recognition process sequence using said document form data, specifying a first plurality of character recognition processes for generating first recognition coded data for a first field of said plurality of fields;

defining a first coded data repair process sequence using said document form data, specifying a second plurality of coded data repair processes, for use in repairing character recognition errors of said first recognition coded data for said first field and generating first corrected coded data for said first field;

defining a second character recognition process sequence using said document form data, specifying a third plurality of character recognition processes for generating second recognition coded data for a second field of said plurality of fields;

defining a second coded data repair process sequence using said document form data, specifying a fourth plurality of coded data repair processes, for use in repairing character recognition errors of said second recognition coded data for said second field and generating second corrected coded data for said second field; and

said second character recognition process sequence including a process step definition for determining whether said second field is related to said first field and whether character recognition of said second field should be omitted.

14. In a data processing system, a method for assembling a document form processing template for defining a processing sequence to convert a document image into corrected coded data, comprising:

inputting document form data specifying a plurality of fields on a document form;

defining a first character recognition process sequence using said document form data, specifying a first plurality of character recognition processes for generating first recognition coded data for a first field of said plurality of fields;

defining a first coded data repair process sequence using said document form data, specifying a second plurality of coded data repair processes, for use in repairing character recognition errors of said first recognition coded data for said first field and generating first corrected coded data for said first field;

defining a second character recognition process sequence using said document form data, specifying a third plurality of character recognition processes for generating second recognition coded data for a second field of said plurality of fields;

defining a second coded data repair process sequence using said document form data, specifying a fourth plurality of coded data repair processes, for use in repairing character recognition errors of said second recognition coded data for said second field and generating second corrected coded data for said second field; and

said second coded data repair process sequence including a process step definition for determining whether said second field is related to said first field and whether coded data repair of said second field should be omitted.

15. In a data processing system, a method for assembling a document form processing template for defining a processing sequence to convert a document image into corrected coded data, comprising:

inputting document form data specifying a plurality of fields on a document form;

defining a first character recognition process sequence using said document form data, specifying a first plurality of character recognition processes for generating first recognition coded data for a first field of said plurality of fields;

defining a first coded data repair process sequence using said document form data, specifying a second plurality of coded data repair processes, for use in repairing character recognition errors of said first recognition coded data for said first field and generating first corrected coded data for said first field;

defining a second character recognition process sequence using said document form data, specifying a third plurality of character recognition processes for generating second recognition coded data for a second field of said plurality of fields;

defining a second coded data repair process sequence using said document form data, specifying a fourth plurality of coded data repair processes, for use in repairing character recognition errors of said second recognition coded data for said second field and generating second corrected coded data for said second field; and

said second coded data repair process sequence including a process step definition for determining whether said second field is related to said first field and whether repaired coded data of said first field should be cross-checked with repaired coded data of said second field.

16. A computer program for execution in a data processing system, to perform a process for repairing character recognition errors for digital images of document forms, the program when executed, performing the steps of:

inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selecting a character recognition process from said first plurality specified in said template and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process; and

selecting a coded data repair process from said second plurality specified in said template and operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process.

17. A computer program for execution in a data processing system, to perform a process for assembling a document form processing template for defining a processing sequence to convert a document image into corrected coded data, the program when executed, performing the steps of:

inputting document form data specifying a plurality of fields on a document form;

defining a first character recognition process sequence using said document form data, specifying a first plurality of character recognition processes for generating first recognition coded data for a first field of said plurality of fields;

defining a first coded data repair process sequence using said document form data, specifying a second plurality of coded data repair processes, for use in repairing character recognition errors of said first recognition coded data for said first field and generating first corrected coded data for said first field;

defining a second character recognition process sequence using said document form data, specifying a third plurality of character recognition processes for generating second recognition coded data for a second field of said plurality of fields;

defining a second coded data repair process sequence using said document form data, specifying a fourth plurality of coded data repair processes, for use in repairing character recognition errors of said second recognition coded data for said second field and generating second corrected coded data for said second field; and

said second character recognition process sequence including a process step definition for determining whether said second field is related to said first field and whether character recognition of said second field should be omitted.
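A minimal sketch of the template assembly in claim 17 follows. It assumes the document form data arrives as a list of per-field dictionaries. The key names, the `FieldPlan` class, and the rule that recognition is omitted once the related field has been resolved are hypothetical illustrations, not details drawn from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class FieldPlan:
    recognition_steps: List[str]          # names of recognition processes, in order
    repair_steps: List[str]               # names of coded data repair processes, in order
    related_to: Optional[str] = None      # name of a related field, if any
    omit_if_related_resolved: bool = False


def assemble_template(form_fields: List[dict]) -> Dict[str, FieldPlan]:
    """Build one recognition sequence and one repair sequence per field."""
    template: Dict[str, FieldPlan] = {}
    for spec in form_fields:
        template[spec["name"]] = FieldPlan(
            recognition_steps=list(spec["recognizers"]),
            repair_steps=list(spec["repairs"]),
            related_to=spec.get("related_to"),
            omit_if_related_resolved=spec.get("omit_if_related_resolved", False),
        )
    return template


def should_omit_recognition(
    template: Dict[str, FieldPlan], field_name: str, resolved: Set[str]
) -> bool:
    """Process step: skip recognition when the related field is already resolved."""
    plan = template[field_name]
    return (
        plan.omit_if_related_resolved
        and plan.related_to is not None
        and plan.related_to in resolved
    )


# Hypothetical form data: the city can be derived once the zip code is known.
form_fields = [
    {"name": "zip", "recognizers": ["numeric_ocr"], "repairs": ["zip_lookup"]},
    {
        "name": "city",
        "recognizers": ["alpha_ocr"],
        "repairs": ["city_dictionary"],
        "related_to": "zip",
        "omit_if_related_resolved": True,
    },
]
form_template = assemble_template(form_fields)
```

With this layout, `should_omit_recognition(form_template, "city", {"zip"})` evaluates to true, while the same call with an empty set of resolved fields evaluates to false.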

18. A data processing system, for repairing character recognition errors for digital images of document forms, comprising:

first input means for inputting a document form processing template including a first sequence specification for a first plurality of character recognition processes and a second sequence specification for a second plurality of coded data repair processes;

second input means for inputting a digital document image of a document form and extracting a field image from said document image, forming a corresponding extracted field image;

selection means coupled to said first input means, for selecting a character recognition process from said first plurality specified in said template;

recognition means coupled to said selection means and said second input means and generating recognition coded data from said extracted field image and generating recognition error data using said selected character recognition process; and

said selection means selecting a coded data repair process from said second plurality specified in said template; repair means coupled to said selection means and said recognition means for operating on said recognition coded data and said recognition error data to generate repaired coded data using said selected coded data repair process.

19. A data processing system for assembling a document form processing template for defining a processing sequence to convert a document image into corrected coded data, comprising:

input means for inputting document form data specifying a plurality of fields on a document form;

processor means coupled to said input means, for defining a first character recognition process sequence using said document form data, specifying a first plurality of character recognition processes for generating first recognition coded data for a first field of said plurality of fields;

said processor means defining a first coded data repair process sequence using said document form data, specifying a second plurality of coded data repair processes, for use in repairing character recognition errors of said first recognition coded data for said first field and generating first corrected coded data for said first field;

said processor means defining a second character recognition process sequence using said document form data, specifying a third plurality of character recognition processes for generating second recognition coded data for a second field of said plurality of fields;

said processor means defining a second coded data repair process sequence using said document form data, specifying a fourth plurality of coded data repair processes, for use in repairing character recognition errors of said second recognition coded data for said second field and generating second corrected coded data for said second field; and

said second character recognition process sequence including a process step definition for determining whether said second field is related to said first field and whether character recognition of said second field should be omitted.

Description

BACKGROUND OF THE INVENTION

1. Technical Field

The invention disclosed broadly relates to data processing systems and methods and more particularly relates to techniques for the repair of character recognition information derived from scanned document images.

2. Related Patent Applications

This patent application is related to the co-pending U.S. patent application Ser. No. 07/870,129, filed Apr. 15, 1992, entitled "Data Processing System and Method for Sequentially Repairing Character Recognition Errors for Scanned Images of Document Forms," by T. S. Betts, V. M. Carras, L. B. Knecht, T. L. Paulson, and G. R. Anderson, the application being assigned to the IBM Corporation and incorporated herein by reference.

This patent application is also related to the co-pending U.S. patent application Ser. No. 07/305,828, filed Feb. 2, 1989, entitled "A Computer Implemented Method for Automatic Extraction of Data From Printed Forms," by R. G. Casey and D. R. Ferguson, the application being assigned to the IBM Corporation and incorporated herein by reference.

3. Background Art

Document forms can be filled out in a variety of ways. The examples of writing methods can include hand printing of block letters, cursive hand writing of characters, impact typing, and printing with a dot matrix printer. There can be a variety of character styles and alphabets u