Claims
What is claimed is:
1. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes and
a second sequence specification for a second plurality of coded data
repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting a character recognition process from said first plurality
specified in said template and generating recognition coded data from said
extracted field image and generating recognition error data using said
selected character recognition process; and
selecting a coded data repair process from said second plurality specified
in said template and operating on said recognition coded data and said
recognition error data to generate repaired coded data using said selected
coded data repair process.
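One way to picture the method of claim 1 is as a template holding two ordered lists of processes. The following is a minimal sketch under stated assumptions, not the patented implementation: the names Template, toy_recognizer, dictionary_repair and process_field are hypothetical, the field image is modeled as a string, and a '?' stands for a character the recognizer could not read.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A recognizer maps a field image to (coded data, per-character error flags).
Recognizer = Callable[[str], Tuple[str, List[bool]]]
# A repair process maps (coded data, error flags) to repaired coded data.
Repairer = Callable[[str, List[bool]], str]


@dataclass
class Template:
    """Hypothetical form processing template with the two sequence specifications."""
    recognition_sequence: List[Recognizer]
    repair_sequence: List[Repairer]


def toy_recognizer(field_image: str) -> Tuple[str, List[bool]]:
    # Stand-in for OCR: '?' marks a character that could not be read.
    coded = field_image.upper()
    errors = [ch == "?" for ch in coded]
    return coded, errors


def dictionary_repair(coded: str, errors: List[bool]) -> str:
    # Illustrative repair: return the first known word that agrees with
    # every character not flagged as an error.
    known_words = ["SMITH", "JONES"]
    for word in known_words:
        if len(word) == len(coded) and all(
            flagged or a == b for a, b, flagged in zip(coded, word, errors)
        ):
            return word
    return coded  # nothing matched; leave the coded data unchanged


def process_field(template: Template, field_image: str,
                  rec_index: int = 0, rep_index: int = 0) -> str:
    # Select a recognition process from the first sequence and run it.
    coded, errors = template.recognition_sequence[rec_index](field_image)
    # Select a repair process from the second sequence and run it on both
    # the coded data and the error data.
    return template.repair_sequence[rep_index](coded, errors)


template = Template([toy_recognizer], [dictionary_repair])
result = process_field(template, "sm?th")
```

Here the repair process receives both the recognition coded data and the recognition error data, mirroring the two inputs named in the claim.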
2. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes and
a second sequence specification for a second plurality of coded data
repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting a character recognition process from said first plurality
specified in said template and generating recognition coded data from said
extracted field image and generating recognition error data using said
selected character recognition process; and
selecting a coded data repair process from said second plurality specified
in said template and operating on said recognition coded data and said
recognition error data to generate repaired coded data using said selected
coded data repair process;
said form processing template including a first process step definition for
a first character recognition process of a first field and a second
process step definition for a second character recognition process of a
second field on said document form;
said form processing template including a third process step definition for
determining whether said second field is related to said first field and
whether character recognition of said second field should be omitted; and
omitting character recognition processing of said second field in response
to said third process step definition.
3. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes and
a second sequence specification for a second plurality of coded data
repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting a character recognition process from said first plurality
specified in said template and generating recognition coded data from said
extracted field image and generating recognition error data using said
selected character recognition process; and
selecting a coded data repair process from said second plurality specified
in said template and operating on said recognition coded data and said
recognition error data to generate repaired coded data using said selected
coded data repair process;
said form processing template including a first process step definition for
a first character recognition process of a first field and a second
process step definition for a second character recognition process of a
second field on said document form;
said form processing template including a third process step definition for
determining whether said second field is related to said first field and
whether character recognition of said second field should be omitted;
performing said first character recognition process of said first field and
generating first recognition coded data;
determining that said first recognition coded data has a first certainty
value; and
omitting character recognition processing of said second field in response
to said third process step definition and said first certainty value.
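The omission step of claims 2 and 3 can be sketched as follows. This is an illustration only: the rule format (second field mapped to a first field and a minimum certainty), the field names, and the toy certainty measure are all assumptions introduced here.

```python
from typing import Callable, Dict, Optional, Tuple

Result = Tuple[str, float]  # (recognition coded data, certainty in [0, 1])


def toy_recognizer(field_image: str) -> Result:
    # Assumed certainty measure: fraction of characters that are not '?'.
    certainty = 1.0 - field_image.count("?") / len(field_image)
    return field_image, certainty


def run_recognition(
    images: Dict[str, str],
    recognizer: Callable[[str], Result],
    omit_rules: Dict[str, Tuple[str, float]],
) -> Dict[str, Optional[Result]]:
    """Recognize fields in order, omitting a related field when allowed.

    omit_rules plays the role of the third process step definition: it maps
    a second field to (first field, minimum certainty of the first field).
    """
    results: Dict[str, Optional[Result]] = {}
    for name, image in images.items():
        rule = omit_rules.get(name)
        if rule is not None:
            first_field, min_certainty = rule
            first = results.get(first_field)
            if first is not None and first[1] >= min_certainty:
                results[name] = None  # recognition of this field is omitted
                continue
        results[name] = recognizer(image)
    return results


omit_rules = {"city": ("zip", 0.9)}
confident = run_recognition(
    {"zip": "20850", "city": "R?CKVILLE"}, toy_recognizer, omit_rules)
doubtful = run_recognition(
    {"zip": "2?850", "city": "ROCKVILLE"}, toy_recognizer, omit_rules)
```

In the first run the zip field is read with full certainty, so the related city field is skipped; in the second run the zip field falls below the threshold and the city field is recognized as usual.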
4. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes and
a second sequence specification for a second plurality of coded data
repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting a character recognition process from said first plurality
specified in said template and generating recognition coded data from said
extracted field image and generating recognition error data using said
selected character recognition process;
selecting a coded data repair process from said second plurality specified
in said template and operating on said recognition coded data and said
recognition error data to generate repaired coded data using said selected
coded data repair process;
said form processing template including a first process step definition for
a first coded data repair process of a first field and a second process
step definition for a second coded data repair process of a second field
on said document form;
said form processing template including a third process step definition for
determining whether said second field is related to said first field and
whether coded data repair of said second field should be omitted; and
omitting coded data repair processing of said second field in response to
said third process step definition.
5. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes and
a second sequence specification for a second plurality of coded data
repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting a character recognition process from said first plurality
specified in said template and generating recognition coded data from said
extracted field image and generating recognition error data using said
selected character recognition process;
selecting a coded data repair process from said second plurality specified
in said template and operating on said recognition coded data and said
recognition error data to generate repaired coded data using said selected
coded data repair process;
said form processing template including a first process step definition for
a first coded data repair process of a first field and a second process
step definition for a second coded data repair process of a second field
on said document form;
said form processing template including a third process step definition for
determining whether said second field is related to said first field and
whether coded data repair of said second field should be omitted;
performing said first coded data repair process of said first field and
generating first repaired coded data;
determining that said first repaired coded data has a first certainty
value; and
omitting coded data repair processing of said second field in response to
said third process step definition and said first certainty value.
6. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes and
a second sequence specification for a second plurality of coded data
repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting a character recognition process from said first plurality
specified in said template and generating recognition coded data from said
extracted field image and generating recognition error data using said
selected character recognition process;
selecting a coded data repair process from said second plurality specified
in said template and operating on said recognition coded data and said
recognition error data to generate repaired coded data using said selected
coded data repair process;
said form processing template including a first process step definition to
search for a first coded data repair process which is related to said
selected character recognition process, and
said step of selecting a coded data repair process further comprises
reading said first process step definition and in response thereto,
identifying said selected coded data repair process as being related to
said selected character recognition process and performing said step of
operating on said recognition coded data with said selected coded data
repair process.
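The search described in claim 6 might look like the sketch below. The process names, the relation table, and the fallback to the first process in the sequence are hypothetical choices made for illustration; the claim does not specify them.

```python
from typing import Dict, List

# Hypothetical relation between recognition processes and repair processes.
RELATED_REPAIR: Dict[str, str] = {
    "handprint_ocr": "name_dictionary_repair",
    "machine_print_ocr": "checksum_repair",
}


def select_repair(selected_recognizer: str,
                  repair_sequence: List[str],
                  related: Dict[str, str]) -> str:
    """Search the repair sequence for the process related to the recognizer."""
    wanted = related.get(selected_recognizer)
    for name in repair_sequence:
        if name == wanted:
            return name
    # Assumption: with no related process listed, use the first in sequence.
    return repair_sequence[0]


sequence = ["checksum_repair", "name_dictionary_repair", "operator_keying"]
chosen = select_repair("handprint_ocr", sequence, RELATED_REPAIR)
```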
7. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes and
a second sequence specification for a second plurality of coded data
repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting a character recognition process from said first plurality
specified in said template and generating recognition coded data from said
extracted field image and generating recognition error data using said
selected character recognition process;
selecting a coded data repair process from said second plurality specified
in said template and operating on said recognition coded data and said
recognition error data to generate repaired coded data using said selected
coded data repair process;
said form processing template including a first process step definition for
a first coded data repair process of a first field and a second process
step definition for a second coded data repair process of a second field
on said document form;
said form processing template including a third process step definition for
determining whether said second field is related to said first field and
whether repaired coded data of said first field should be cross-checked
with repaired coded data of said second field; and
cross-checking repaired coded data of said first field with repaired coded
data of said second field in response to said third process step
definition.
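A cross-check between two related fields, as in claim 7, could be as simple as the following sketch. The zip-to-state table, the field names, and the rule format are assumptions chosen for illustration only.

```python
from typing import Dict, Tuple

# Small illustrative lookup table; a real system would use a full database.
ZIP_PREFIX_TO_STATE: Dict[str, str] = {"208": "MD", "100": "NY"}


def cross_check(repaired: Dict[str, str], rule: Tuple[str, str]) -> bool:
    """Return True when the two related fields agree with each other.

    rule stands in for the third process step definition and names the
    first field (a zip code) and the second field (a state) to compare.
    """
    zip_field, state_field = rule
    expected_state = ZIP_PREFIX_TO_STATE.get(repaired[zip_field][:3])
    return expected_state is not None and expected_state == repaired[state_field]


rule = ("zip", "state")
consistent = cross_check({"zip": "20850", "state": "MD"}, rule)
inconsistent = cross_check({"zip": "20850", "state": "NY"}, rule)
```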
8. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes and
a second sequence specification for a second plurality of coded data
repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting a character recognition process from said first plurality
specified in said template and generating recognition coded data from said
extracted field image and generating recognition error data using said
selected character recognition process;
selecting a coded data repair process from said second plurality specified
in said template and operating on said recognition coded data and said
recognition error data to generate repaired coded data using said selected
coded data repair process;
said form processing template including a first process step definition for
a first coded data repair process in said second sequence specification
and a second process step definition for a second coded data repair
process in said second sequence specification;
said form processing template including a third process step definition for
selectively changing the order of performing said first coded data repair
process and said second coded data repair process; and
changing the order of performing said first coded data repair process and
said second coded data repair process in response to said third process
step definition.
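The selective reordering of claim 8 can be illustrated with a small sketch. The shape of the reorder step (a condition plus a pair of positions to swap) and the process names are hypothetical; the claim only requires that a step definition can change the order.

```python
from typing import Callable, Dict, List, Tuple


def plan_repairs(
    repair_sequence: List[str],
    condition: Callable[[Dict[str, str]], bool],
    swap: Tuple[int, int],
    field_info: Dict[str, str],
) -> List[str]:
    """Return the order in which repair processes will run for a field.

    condition and swap together stand in for the third process step
    definition: when the condition holds, two processes trade places.
    """
    plan = list(repair_sequence)  # copy so the template itself is unchanged
    if condition(field_info):
        i, j = swap
        plan[i], plan[j] = plan[j], plan[i]
    return plan


sequence = ["dictionary_repair", "checksum_repair"]


def is_numeric(info: Dict[str, str]) -> bool:
    return info["kind"] == "numeric"


numeric_plan = plan_repairs(sequence, is_numeric, (0, 1), {"kind": "numeric"})
text_plan = plan_repairs(sequence, is_numeric, (0, 1), {"kind": "text"})
```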
9. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes and
a second sequence specification for a second plurality of coded data
repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting a character recognition process from said first plurality
specified in said template and generating recognition coded data from said
extracted field image and generating recognition error data using said
selected character recognition process;
selecting a coded data repair process from said second plurality specified
in said template and operating on said recognition coded data and said
recognition error data to generate repaired coded data using said selected
coded data repair process;
said first sequence specification of said form processing template
including a first process step definition for a first character
recognition process of a first field and a second process step definition
for a second character recognition process of a second field on said
document form;
said form processing template including a third process step definition for
selectively changing the order of performing said first character
recognition process on said first field and said second character
recognition process on said second field; and
changing the order of performing said first character recognition process
on said first field and said second character recognition process on said
second field in response to said third process step definition.
10. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes
including first occurring and second occurring character recognition
processes, said template including a second sequence specification for a
second plurality of coded data repair processes, including first occurring
and second occurring coded data repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting said first occurring character recognition process specified in
said first plurality in said template and generating first recognition
coded data from said extracted field image and generating first
recognition error data using said first occurring character recognition
process;
determining that said first recognition error data is greater than a first
predetermined value, and in response thereto, selecting said second
occurring character recognition process specified in said first plurality
in said template and generating second recognition coded data from said
extracted field image and generating second recognition error data using
said second occurring character recognition process;
selecting said first occurring coded data repair process specified in said
second plurality in said template and operating on said second recognition
coded data and said second recognition error data to generate first
repaired coded data using said first occurring coded data repair process;
and
determining that said first repaired coded data has less certainty than a
second predetermined value, and in response thereto, selecting said second
occurring coded data repair process specified in said second plurality in
said template and operating on said second recognition coded data and said
second recognition error data to generate second repaired coded data using
said second occurring coded data repair process.
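The two threshold tests of claim 10 can be sketched as a pair of fallbacks. Everything concrete below is an assumption made for illustration: the toy image encoding (lowercase letters are hard to read, '#' is unreadable), the error measure (a count of '?' characters), the certainty measure, and the threshold values.

```python
from typing import Tuple

KNOWN_WORDS = ["SMITH", "JONES"]  # illustrative dictionary


def fast_ocr(image: str) -> Tuple[str, int]:
    # First occurring recognizer: reads only clean uppercase characters.
    coded = "".join(ch if ch.isupper() else "?" for ch in image)
    return coded, coded.count("?")


def careful_ocr(image: str) -> Tuple[str, int]:
    # Second occurring recognizer: also reads lowercase, but not '#'.
    coded = "".join("?" if ch == "#" else ch.upper() for ch in image)
    return coded, coded.count("?")


def passthrough_repair(coded: str, errors: int) -> Tuple[str, float]:
    # First occurring repair: keeps the data, certainty drops with errors.
    return coded, 1.0 - errors / max(len(coded), 1)


def dictionary_repair(coded: str, errors: int) -> Tuple[str, float]:
    # Second occurring repair: match against known words, '?' is a wildcard.
    for word in KNOWN_WORDS:
        if len(word) == len(coded) and all(
            c == "?" or c == w for c, w in zip(coded, word)
        ):
            return word, 1.0
    return coded, 0.0


def process(image: str, max_errors: int = 1,
            min_certainty: float = 0.9) -> Tuple[str, float]:
    coded, errors = fast_ocr(image)
    if errors > max_errors:  # first predetermined value exceeded
        coded, errors = careful_ocr(image)
    repaired, certainty = passthrough_repair(coded, errors)
    if certainty < min_certainty:  # below second predetermined value
        repaired, certainty = dictionary_repair(coded, errors)
    return repaired, certainty


outcome = process("SMiT#")
```

For the sample image, the first recognizer reports two errors, so the second recognizer runs; its output still carries one error, so the first repair falls below the certainty threshold and the second repair is applied.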
11. In a data processing system, a method for repairing character
recognition errors for digital images of document forms, comprising:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes
including first occurring and second occurring character recognition
processes, said template including a second sequence specification for a
second plurality of coded data repair processes, including first occurring
and second occurring coded data repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting said first occurring character recognition process specified in
said first plurality in said template and generating first recognition
coded data from said extracted field image and generating first
recognition error data using said first occurring character recognition
process;
determining that said first recognition error data is greater than a first
predetermined value, and in response thereto, selecting said second
occurring character recognition process specified in said first plurality
in said template and generating second recognition coded data from said
extracted field image and generating second recognition error data using
said second occurring character recognition process;
assembling a machine generated data structure (MGDS) which includes a field data
segment including a coded data buffer portion and an error buffer portion
for said extracted field image;
inserting said second recognition coded data into said coded data buffer
portion and inserting said second recognition error data into said error
buffer portion of said field data segment;
transferring said MGDS to said second plurality of coded data repair
processes, for repairing said second recognition coded data;
augmenting said MGDS with a repair segment which includes a repair data
buffer portion;
selecting said first occurring coded data repair process from said second
plurality in said template and operating on said second recognition coded
data and said second recognition error data to generate first repaired
coded data using said first occurring coded data repair process;
determining that said first repaired coded data has less certainty than a
second predetermined value, and in response thereto, selecting said second
occurring coded data repair process specified in said second plurality in
said template and operating on said second recognition coded data and said
second recognition error data to generate second repaired coded data using
said second occurring coded data repair process;
inserting said second repaired coded data into said coded data buffer
portion of said field data segment and inserting said second recognition
coded data into said repair data buffer portion of said repair segment;
and
transferring said MGDS to a utilization device and accessing the contents
of said coded data buffer portion of said field data segment for use as a
corrected form of coded data representing said extracted field image.
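The buffer handling in claim 11 can be pictured with a small data structure. This is a sketch under assumptions: the class and attribute names are hypothetical, the buffers are modeled as plain Python values, and the actual MGDS layout is not given in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldDataSegment:
    coded_data: str = ""                                   # coded data buffer portion
    error_data: List[int] = field(default_factory=list)    # error buffer portion


@dataclass
class RepairSegment:
    repair_data: str = ""                                  # repair data buffer portion


@dataclass
class MGDS:
    field_segment: FieldDataSegment
    repair_segments: List[RepairSegment] = field(default_factory=list)


def apply_repair(mgds: MGDS, repaired: str) -> MGDS:
    """Augment the MGDS with a repair segment and store the repaired data.

    The recognition coded data being replaced is preserved in the repair
    data buffer; the repaired coded data goes into the coded data buffer.
    """
    previous = mgds.field_segment.coded_data
    mgds.repair_segments.append(RepairSegment(repair_data=previous))
    mgds.field_segment.coded_data = repaired
    return mgds


# Recognition output "SMIT?" with an error flagged at character position 4.
mgds = MGDS(FieldDataSegment(coded_data="SMIT?", error_data=[4]))
apply_repair(mgds, "SMITH")
```

A utilization device would then read the coded data buffer for the corrected value, while the repair segment keeps the earlier recognition output available.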
12. In a data processing system, a method for assembling a document form
processing template for defining a processing sequence to convert a
document image into corrected coded data, comprising:
inputting document form data specifying a plurality of fields on a document
form;
defining a first character recognition process sequence using said document
form data, specifying a first plurality of character recognition processes
for generating first recognition coded data for a first field of said
plurality of fields; and
defining a first coded data repair process sequence using said document
form data, specifying a second plurality of coded data repair processes,
for use in repairing character recognition errors of said first
recognition coded data for said first field and generating first corrected
coded data for said first field.
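Template assembly as in claim 12 can be sketched as a function from document form data to per-field process sequences. The field kinds, process names, and dictionary layout below are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List

# Hypothetical catalogs of processes available for each kind of field.
RECOGNITION_BY_KIND: Dict[str, List[str]] = {
    "handprint": ["handprint_ocr", "generic_ocr"],
    "machine": ["machine_print_ocr", "generic_ocr"],
}
REPAIR_BY_KIND: Dict[str, List[str]] = {
    "handprint": ["name_dictionary_repair", "operator_keying"],
    "machine": ["checksum_repair", "operator_keying"],
}


def assemble_template(
    form_data: List[Dict[str, str]],
) -> Dict[str, Dict[str, List[str]]]:
    """Define a recognition sequence and a repair sequence for each field."""
    template: Dict[str, Dict[str, List[str]]] = {}
    for field_def in form_data:
        kind = field_def["kind"]
        template[field_def["name"]] = {
            "recognition_sequence": list(RECOGNITION_BY_KIND[kind]),
            "repair_sequence": list(REPAIR_BY_KIND[kind]),
        }
    return template


form_data = [
    {"name": "last_name", "kind": "handprint"},
    {"name": "account_number", "kind": "machine"},
]
template = assemble_template(form_data)
```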
13. In a data processing system, a method for assembling a document form
processing template for defining a processing sequence to convert a
document image into corrected coded data, comprising:
inputting document form data specifying a plurality of fields on a document
form;
defining a first character recognition process sequence using said document
form data, specifying a first plurality of character recognition processes
for generating first recognition coded data for a first field of said
plurality of fields;
defining a first coded data repair process sequence using said document
form data, specifying a second plurality of coded data repair processes,
for use in repairing character recognition errors of said first
recognition coded data for said first field and generating first corrected
coded data for said first field;
defining a second character recognition process sequence using said
document form data, specifying a third plurality of character recognition
processes for generating second recognition coded data for a second field
of said plurality of fields;
defining a second coded data repair process sequence using said document
form data, specifying a fourth plurality of coded data repair processes,
for use in repairing character recognition errors of said second
recognition coded data for said second field and generating second
corrected coded data for said second field; and
said second character recognition process sequence including a process step
definition for determining whether said second field is related to said
first field and whether character recognition of said second field should
be omitted.
14. In a data processing system, a method for assembling a document form
processing template for defining a processing sequence to convert a
document image into corrected coded data, comprising:
inputting document form data specifying a plurality of fields on a document
form;
defining a first character recognition process sequence using said document
form data, specifying a first plurality of character recognition processes
for generating first recognition coded data for a first field of said
plurality of fields;
defining a first coded data repair process sequence using said document
form data, specifying a second plurality of coded data repair processes,
for use in repairing character recognition errors of said first
recognition coded data for said first field and generating first corrected
coded data for said first field;
defining a second character recognition process sequence using said
document form data, specifying a third plurality of character recognition
processes for generating second recognition coded data for a second field
of said plurality of fields;
defining a second coded data repair process sequence using said document
form data, specifying a fourth plurality of coded data repair processes,
for use in repairing character recognition errors of said second
recognition coded data for said second field and generating second
corrected coded data for said second field; and
said second coded data repair process sequence including a process step
definition for determining whether said second field is related to said
first field and whether coded data repair of said second field should be
omitted.
15. In a data processing system, a method for assembling a document form
processing template for defining a processing sequence to convert a
document image into corrected coded data, comprising:
inputting document form data specifying a plurality of fields on a document
form;
defining a first character recognition process sequence using said document
form data, specifying a first plurality of character recognition processes
for generating first recognition coded data for a first field of said
plurality of fields;
defining a first coded data repair process sequence using said document
form data, specifying a second plurality of coded data repair processes,
for use in repairing character recognition errors of said first
recognition coded data for said first field and generating first corrected
coded data for said first field;
defining a second character recognition process sequence using said
document form data, specifying a third plurality of character recognition
processes for generating second recognition coded data for a second field
of said plurality of fields;
defining a second coded data repair process sequence using said document
form data, specifying a fourth plurality of coded data repair processes,
for use in repairing character recognition errors of said second
recognition coded data for said second field and generating second
corrected coded data for said second field; and
said second coded data repair process sequence including a process step
definition for determining whether said second field is related to said
first field and whether coded data repair of said first field should be
cross-checked with repaired coded data on said second field.
16. A computer program for execution in a data processing system, to
perform a process for repairing character recognition errors for digital
images of document forms, the program when executed, performing the steps
of:
inputting a document form processing template including a first sequence
specification for a first plurality of character recognition processes and
a second sequence specification for a second plurality of coded data
repair processes;
inputting a digital document image of a document form and extracting a
field image from said document image, forming a corresponding extracted
field image;
selecting a character recognition process from said first plurality
specified in said template and generating recognition coded data from said
extracted field image and generating recognition error data using said
selected character recognition process; and
selecting a coded data repair process from said second plurality specified
in said template and operating on said recognition coded data and said
recognition error data to generate repaired coded data using said selected
coded data repair process.
17. A computer program for execution in a data processing system, to
perform a process for assembling a document form processing template for
defining a processing sequence to convert a document image into corrected
coded data, the program when executed, performing the steps of:
inputting document form data specifying a plurality of fields on a document
form;
defining a first character recognition process sequence using said document
form data, specifying a first plurality of character recognition processes
for generating first recognition coded data for a first field of said
plurality of fields;
defining a first coded data repair process sequence using said document
form data, specifying a second plurality of coded data repair processes,
for use in repairing character recognition errors of said first
recognition coded data for said first field and generating first corrected
coded data for said first field;
defining a second character recognition process sequence using said
document form data, specifying a third plurality of character recognition
processes for generating second recognition coded data for a second field
of said plurality of fields;
defining a second coded data repair process sequence using said document
form data, specifying a fourth plurality of coded data repair processes,
for use in repairing character recognition errors of said second
recognition coded data for said second field and generating second
corrected coded data for said second field; and
said second character recognition process sequence including a process step
definition for determining whether said second field is related to said
first field and whether character recognition of said second field should
be omitted.
18. A data processing system, for repairing character recognition errors
for digital images of document forms, comprising:
first input means for inputting a document form processing template
including a first sequence specification for a first plurality of
character recognition processes and a second sequence specification for a
second plurality of coded data repair processes;
second input means for inputting a digital document image of a document
form and extracting a field image from said document image, forming a
corresponding extracted field image;
selection means coupled to said first input means, for selecting a
character recognition process from said first plurality specified in said
template;
recognition means coupled to said selection means and said second input
means and generating recognition coded data from said extracted field
image and generating recognition error data using said selected character
recognition process;
said selection means selecting a coded data repair process from said second
plurality specified in said template; and
repair means coupled to said selection means and said recognition means for
operating on said recognition coded data and said recognition error data to
generate repaired coded data using said selected coded data repair process.
19. A data processing system for assembling a document form processing
template for defining a processing sequence to convert a document image
into corrected coded data, comprising:
input means for inputting document form data specifying a plurality of
fields on a document form;
processor means coupled to said input means, for defining a first character
recognition process sequence using said document form data, specifying a
first plurality of character recognition processes for generating first
recognition coded data for a first field of said plurality of fields;
said processor means defining a first coded data repair process sequence
using said document form data, specifying a second plurality of coded data
repair processes, for use in repairing character recognition errors of
said first recognition coded data for said first field and generating
first corrected coded data for said first field;
said processor means defining a second character recognition process
sequence using said document form data, specifying a third plurality of
character recognition processes for generating second recognition coded
data for a second field of said plurality of fields;
said processor means defining a second coded data repair process sequence
using said document form data, specifying a fourth plurality of coded data
repair processes, for use in repairing character recognition errors of
said second recognition coded data for said second field and generating
second corrected coded data for said second field; and
said second character recognition process sequence including a process step
definition for determining whether said second field is related to said
first field and whether character recognition of said second field should
be omitted.
Description
BACKGROUND OF THE INVENTION
1. Technical Field
The invention disclosed broadly relates to data processing systems and
methods and more particularly relates to techniques for the repair of
character recognition information derived from scanned document images.
2. Related Patent Applications
This patent application is related to the co-pending U.S. patent
application Ser. No. 07/870,129, filed Apr. 15, 1992, entitled "Data
Processing System and Method for Sequentially Repairing Character
Recognition Errors for Scanned Images of Document Forms," by T. S. Betts,
V. M. Carras, L. B. Knecht, T. L. Paulson, and G. R. Anderson, the
application being assigned to the IBM Corporation and incorporated herein
by reference.
This patent application is also related to the co-pending U.S. patent
application Ser. No. 07/305,828, filed Feb. 2, 1989, entitled "A Computer
Implemented Method for Automatic Extraction of Data From Printed Forms,"
by R. G. Casey and D. R. Ferguson, the application being assigned to the
IBM Corporation and incorporated herein by reference.
3. Background Art
Document forms can be filled out in a variety of ways. Writing methods
include hand printing of block letters, cursive handwriting, impact
typing, and printing with a dot matrix printer. There can be a variety of
character styles and alphabets u