| United States Patent | 5594809 |
| Link to this page | http://www.wikipatents.com/5594809.html |
| Inventor(s) | Kopec; Gary E. (Belmont, CA);
Chou; Philip A. (Menlo Park, CA);
Niles; Leslie T. (Palo Alto, CA) |
| Abstract | A technique for automatically producing, or training, a set of bitmapped
character templates defined according to the sidebearing model of
character image positioning uses as input a text line image of unsegmented
characters, called glyphs, as the source of training samples. The training
process also uses a transcription associated with the text line image, and
an explicit, grammar-based text line image source model that describes the
structural and functional features of a set of possible text line images
that may be used as the source of training samples. The transcription may
be a literal transcription of the line image, or it may be nonliteral, for
example containing logical structure tags for document formatting and
layout, such as found in markup languages. Spatial positioning information
modeled by the text line image source model and the labels in the
transcription are used to determine labeled image positions identifying
the location of glyph samples occurring in the input line image, and the
character templates are produced using the labeled image positions. In
another aspect of the technique, a set of character templates defined by
any character template model, such as a segmentation-based model, is
produced using the grammar-based text line image source model and
specifically using a tag transcription containing logical structure tags
for document formatting and layout. Both aspects of the training technique
may represent the text line image source model and the transcription as
finite state networks. |
Title Information  |
| Title | Automatic training of character templates using a text line image, a
text line transcription and a line image source model |
| Publication Date | January 14, 1997 |
| Filing Date | April 28, 1995 |
Claims  |
What is claimed:
1. A method of operating a machine to train a set of bitmapped character
templates for use in a recognition system; each of the bitmapped character
templates being based on a character template model defining character
image positioning referred to as a sidebearing model of character image
positioning; the machine including a processor and a memory device for
storing data; the data stored in the memory device including instruction
data the processor executes to operate the machine; the processor being
connected to the memory device for accessing the data stored therein; the
method comprising:
operating the processor to receive and store an image definition data
structure defining an image including a plurality of glyphs indicating a
single line of text, hereafter referred to as a text line image source of
glyph samples; each glyph occurring in the text line image source of glyph
samples being an image instance of a respective one of a plurality of
characters in a character set, hereafter referred to as a glyph sample
character set; each one of the set of bitmapped character templates being
trained representing a respective one of the plurality of characters in
the glyph sample character set;
operating the processor to receive and store in the memory device a text
line image source model data structure, hereafter referred to as a text
line image source model; the text line image source model modeling as a
grammar a spatial image structure of a set of text line images; the text
line image source of glyph samples being one of the set of text line
images modeled by the text line image source model; the text line image
source model including spatial positioning data modeling spatial
positioning of the plurality of glyphs occurring in the text line image
source of glyph samples;
operating the processor to determine, for each respective glyph occurring
in the text line image source of glyph samples, an image coordinate
position in the text line image source of glyph samples indicating an
image origin position of the respective glyph using the spatial
positioning data included in the text line image source model; each image
coordinate position being hereafter referred to as a glyph sample image
origin position;
operating the processor to produce a glyph label data item paired with each
glyph sample image origin position determined for the respective glyphs
occurring in the text line image source of glyph samples; each glyph label
data item being hereafter referred to as a respectively paired glyph
label; each respectively paired glyph label indicating the character in
the glyph sample character set represented by the respective glyph; the
processor, in producing each respectively paired glyph label, using
mapping data included in the text line image source model mapping
respective ones of the glyphs occurring in the text line image source of
glyph samples to respectively paired glyph labels; the processor, further
in producing each respectively paired glyph label, using a text line
transcription data structure associated with the text line image source of
glyph samples, hereafter referred to as a transcription, including an
ordered arrangement of transcription label data items; the processor using
the transcription and the mapping data to pair each glyph label with the
respective glyph sample image origin position of a respective glyph
occurring in the text line image source of glyph samples; and
operating the processor to produce the set of bitmapped character templates
using the text line image source of glyph samples, the glyph sample image
origin positions and the respectively paired glyph labels; the processor
determining, in each bitmapped character template produced, an image pixel
position included therein indicating a template image origin position;
each bitmapped character template produced having a characteristic image
positioning property such that, when a second bitmapped character template
is positioned in an image with the template image origin position thereof
displaced from the template image origin position of a preceding first
bitmapped character template by a character set width thereof, and when a
first bounding box entirely containing the first bitmapped character
template overlaps in the image with a second bounding box entirely
containing the second bitmapped character template, the first and second
bitmapped character templates have substantially nonoverlapping foreground
pixels.
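
The positioning property recited at the end of claim 1 can be illustrated with a small sketch. It assumes each template is stored as a set of foreground pixel offsets relative to its template image origin; the helper names (`place`, `bounding_box`, `boxes_overlap`), the toy shapes and the set width of 2 are hypothetical and are not taken from the patent.

```python
# Illustrative sketch only: templates are sets of (dx, dy) foreground offsets
# measured from the template image origin (y grows downward).

def place(template, origin_x, origin_y=0):
    """Return absolute foreground pixel coordinates for a template placed at an origin."""
    return {(origin_x + dx, origin_y + dy) for dx, dy in template}

def bounding_box(pixels):
    """Smallest inclusive axis-aligned box (x0, y0, x1, y1) containing all pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    """True when two inclusive boxes share at least one pixel position."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical shapes: the first has a top stroke overhanging to the right,
# the second has a tail reaching left of its own origin.
first = {(0, 0), (0, -1), (0, -2), (1, -3), (2, -3)}
second = {(0, 0), (0, -1), (-1, 1)}
first_set_width = 2  # assumed character set width of the first template

first_pixels = place(first, 0)
second_pixels = place(second, first_set_width)

overlap = boxes_overlap(bounding_box(first_pixels), bounding_box(second_pixels))
disjoint = first_pixels.isdisjoint(second_pixels)
```

In this toy case the two bounding boxes overlap while the foreground pixels stay disjoint, which is the behavior the claim attributes to sidebearing-model templates.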
2. The method of claim 1 of operating the machine to train bitmapped
character templates wherein operating the processor to produce the set of
bitmapped character templates includes
determining, for each bitmapped character template, a collection of sample
image regions, referred to as a glyph sample collection, occurring in the
text line image source of glyph samples using the glyph sample image
origin positions and the respectively paired glyph labels; each sample
image region including one of the glyph sample image origin positions; and
producing the set of bitmapped character templates contemporaneously using
the glyph sample collections by assigning foreground pixel color values to
selected template pixel positions in respective ones of the bitmapped
character templates; one of the selected template pixel positions in a
first one of the set of bitmapped character templates being selected for
assigning a foreground pixel color value thereto on the basis of template
contribution measurements computed using sample pixel positions included
in the glyph sample collections identified for the character represented
by the first bitmapped character template.
3. The method of claim 2 wherein
each bitmapped character template is represented as a template image region
data structure, referred to as a template image region, including a
template pixel position designated as a template origin pixel position;
the template image region indicating a respective one of the characters in
the glyph sample character set being paired with the glyph sample
collection identified for the respective character, and being referred to
as a respectively paired template image region;
each sample image region in one of the glyph sample collections has a
vertical and horizontal size dimension determined relative to the template
origin pixel position in the respectively paired template image region so
that the glyph sample image origin position determined for each glyph
sample is positioned in the respective sample image region at the sample
pixel position identical in pixel location to the template origin pixel
position in the respectively paired template image region; whereby the
sample image regions in each glyph sample collection are effectively
aligned at respective glyph sample image origin positions and are
hereafter referred to as aligned sample image regions; each sample pixel
position in a first one of the aligned sample image regions being
respectively paired with the sample pixel position in the same pixel
location in a second one of the aligned sample image regions; and
the step of assigning foreground pixel color values to template pixel
positions uses respectively paired sample pixel positions included in the
aligned sample image regions for all of the glyph sample collections and
further includes
(a) computing the template contribution measurement for each template pixel
position using each respectively paired sample pixel position included in
the sample image region;
(b) selecting the template pixel position having the highest positive
template contribution measurement as a selected template pixel position;
(c) assigning a foreground pixel color value to the selected template
pixel;
(d) modifying each sample pixel position paired with the selected template
pixel position to indicate a background pixel color value; and
(e) repeating steps (a) through (d) while at least one of the template
contribution measurements being computed is positive.
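
Steps (a) through (e) of claim 3 can be sketched as a greedy loop. This is a minimal sketch under stated assumptions: the template contribution measurement is taken to be the fraction of aligned samples that are foreground at a pixel, minus a hypothetical `threshold`; the claim itself does not give a formula. The function name `train_templates` and the data layout are likewise illustrative.

```python
def train_templates(image, samples, offsets, threshold=0.5):
    """Greedy template construction over aligned sample regions.

    image:   list of rows of 0/1 pixels (the text line image)
    samples: list of (label, origin_x, origin_y) glyph sample origins
    offsets: candidate template pixel positions (dx, dy) relative to the origin
    threshold: assumed bias; a pixel contributes positively only when more than
               this fraction of its paired sample pixels are foreground
    """
    img = [row[:] for row in image]  # work on a copy, since step (d) clears pixels
    height, width = len(img), len(img[0])
    by_label = {}
    for label, ox, oy in samples:
        by_label.setdefault(label, []).append((ox, oy))
    templates = {label: set() for label in by_label}

    def pixel(x, y):
        # Positions outside the image count as background.
        return img[y][x] if 0 <= x < width and 0 <= y < height else 0

    while True:
        best = None
        # (a) compute a contribution measurement for every unassigned template pixel
        for label, origins in by_label.items():
            for dx, dy in offsets:
                if (dx, dy) in templates[label]:
                    continue
                frac = sum(pixel(ox + dx, oy + dy) for ox, oy in origins) / len(origins)
                score = frac - threshold
                # (b) remember the highest measurement seen so far
                if best is None or score > best[0]:
                    best = (score, label, dx, dy)
        # (e) stop once no measurement is positive
        if best is None or best[0] <= 0:
            break
        _, label, dx, dy = best
        templates[label].add((dx, dy))  # (c) assign foreground to the selected pixel
        for ox, oy in by_label[label]:  # (d) clear the paired sample pixels
            x, y = ox + dx, oy + dy
            if 0 <= x < width and 0 <= y < height:
                img[y][x] = 0
    return templates

# Toy line image containing "a", "b", "a" with origins on the bottom row.
line = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 1],
]
origins = [("a", 0, 2), ("b", 3, 2), ("a", 6, 2)]
canvas = [(dx, dy) for dx in (0, 1) for dy in (-2, -1, 0)]
trained = train_templates(line, origins, canvas)
```

Because step (d) clears pixels in the shared line image, a pixel claimed by one template no longer counts as evidence for a neighboring template whose sample region overlaps it.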
4. The method of claim 1 of operating the machine to train bitmapped
character templates further including operating the processor to
determine, for each bitmapped character template produced, a character set
width using the text line image source of glyph samples, the glyph sample
image origin positions and the respectively paired glyph labels; the
character set width indicating an image distance measurement between the
template image origin position of the bitmapped character template and a
template image origin position of a next adjacent bitmapped character
template.
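
Claim 4 states what the character set width measures but not how it is estimated. One possible estimate, shown here as an assumption rather than the patented procedure, is the median distance from each labeled glyph origin to the next origin on the line. The function name `estimate_set_widths` is hypothetical.

```python
from statistics import median

def estimate_set_widths(labeled_origins):
    """Estimate a set width per character from one text line.

    labeled_origins: list of (label, x) pairs ordered left to right, where x is
    the horizontal glyph sample image origin position.
    Assumption: every glyph on the line, including spaces, appears in the list,
    so the distance to the next origin reflects the set width alone.
    """
    gaps = {}
    for (label, x), (_, next_x) in zip(labeled_origins, labeled_origins[1:]):
        gaps.setdefault(label, []).append(next_x - x)
    # The median is an illustrative choice that tolerates a few misplaced origins.
    return {label: median(values) for label, values in gaps.items()}

widths = estimate_set_widths([("t", 0), ("o", 7), ("t", 17), ("o", 24)])
```

The last glyph on the line has no following origin, so it contributes no distance sample.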
5. The method of claim 1 of operating the machine to train bitmapped
character templates wherein
the transcription associated with the text line image source of glyph
samples is a tag transcription including at least one nonliteral
transcription label, hereafter referred to as a tag, indicating at least
one character code representing a character with which a respective glyph
in the text line image source of glyph samples cannot be paired by visual
inspection thereof; the at least one character code indicated by the tag
indicating markup information about the text line image source of glyph
samples; the markup information, when interpreted by a document processing
operation, producing at least one display feature included in the text
line image source of glyph samples perceptible as a visual formatting
characteristic of the text line image source of glyph samples; and
the processor, in producing the respectively paired glyph labels using the
transcription and the mapping data,
uses the spatial positioning information about the plurality of glyphs to
identify at least one glyph sample in the text line image of glyph samples
related to the tag, and
uses the at least one character code indicated by the tag in producing the
respectively paired glyph label paired with the at least one glyph sample
identified.
6. The method of claim 1 wherein
the text line image source model is represented as a stochastic finite
state network data structure indicating a regular grammar, hereafter
referred to as a text line image source network; the text line image
source network modeling the text line image source of glyph samples as a
series of nodes and transitions between pairs of the nodes; the text line
image source network representing the mapping data mapping a respective
one of the glyphs occurring in the text line image source of glyph samples
to a glyph label as at least one sequence of transitions from a first node
to a final node, called a path, through the text line image source
network; the path indicating path data items associated therewith and
accessible by the processor; the path data items associated with the path
indicating a pairing between substantially each one of the plurality of
glyphs occurring in the text line image source of glyph samples and a
glyph label indicating a character in the glyph sample character set;
the transcription data structure associated with the text line image source
of glyph samples is represented as a finite state network data structure,
hereafter referred to as a transcription network, modeling a set of
transcriptions as a series of transcription nodes and a sequence of
transcription transitions between pairs of the transcription nodes; each
transcription transition having a transcription label associated
therewith; at least one sequence of transcription transitions, called a
transcription path, through the transcription network indicating the
ordered arrangement of the transcription labels in one of the
transcriptions included in the set of transcriptions; and
the processor, in determining the glyph sample image origin positions and
the respectively paired glyph labels, merges the series of nodes of the
text line image source network with the series of transcription nodes of
the transcription network to produce a transcription-image network
indicating modified mapping data mapping a respective one of the
transcription labels included in the transcription to a respective glyph
sample image origin position and to a respectively paired glyph label
indicating the character in the glyph sample character set; the
transcription-image network representing the modified mapping data as at
least one complete transcription-path through the transcription-image
network; the at least one complete transcription-path indicating the path
data items; performs a decoding operation on the text line image source of
glyph samples using the transcription-image network to produce the at
least one complete transcription-path; and obtains the glyph sample image
origin positions and the respectively paired glyph labels using the path
data items indicated by the at least one complete transcription-path.
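
The merging step of claim 6 resembles a product construction over two finite state networks. The sketch below makes two simplifying assumptions that the claim does not require: the transcription network is a single linear chain of labels, and a source transition with an empty message string consumes no transcription label. The name `merge_networks` and the tuple layout are hypothetical.

```python
def merge_networks(source_transitions, transcription):
    """Build transitions of a transcription-image network.

    source_transitions: list of (src, dst, message, payload) for the text line
        image source network; payload stands in for the template and displacement.
    transcription: list of transcription labels, treated as a linear network.
    Nodes of the merged network are pairs (source_node, transcription_index).
    """
    merged = []
    for i in range(len(transcription) + 1):
        for src, dst, message, payload in source_transitions:
            if message == "":
                # No label consumed: stay at the same transcription index.
                merged.append(((src, i), (dst, i), message, payload))
            elif i < len(transcription) and transcription[i] == message:
                # Message matches the next transcription label: advance the index.
                merged.append(((src, i), (dst, i + 1), message, payload))
    return merged

# Single-node toy source network: two characters and a one-pixel spacing transition.
source = [
    ("n", "n", "a", ("a", 6)),
    ("n", "n", "b", ("b", 7)),
    ("n", "n", "", (None, 1)),
]
merged = merge_networks(source, ["a", "b"])
```

Only paths from `("n", 0)` to `("n", 2)` survive in the merged network, so any complete path is constrained to spell the transcription in order.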
7. The method of claim 6 wherein
the path data items associated with the complete transcription-path include
a message string, a character template and an image displacement;
the transcription labels associated with transitions in the transcription
network are message strings in the transcription-image network such that
the transcription-image network models a relationship between each
transcription label and a glyph occurring in the text line image of glyph
samples; and
the processor determines, for transitions in the transcription-path having
non-null character templates associated therewith, the glyph sample image
origin position of each glyph by using the image displacement associated
with the respective transition, and determines the respectively paired
glyph label using a character label indicated by the non-null character
template indicating the character in the glyph sample character set
represented by the respective glyph sample.
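
The way claim 7 derives origin positions and labels from path data items can be sketched as a walk along the path. This sketch assumes that a template is positioned at the current image position before the transition's displacement is applied; the `Transition` type and the function `labeled_origins` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple

class Transition(NamedTuple):
    message: str                   # message string (transcription label)
    template: Optional[str]        # character label of the template, None if null
    displacement: Tuple[int, int]  # image displacement (dx, dy)

def labeled_origins(path, start=(0, 0)):
    """Return (glyph label, origin position) pairs for non-null templates on a path."""
    x, y = start
    result = []
    for t in path:
        if t.template is not None:
            # Assumed order: record the origin, then advance by the displacement.
            result.append((t.template, (x, y)))
        x, y = x + t.displacement[0], y + t.displacement[1]
    return result

path = [
    Transition("<b>", None, (0, 0)),  # tag with a null template and no displacement
    Transition("a", "a", (6, 0)),
    Transition(" ", None, (4, 0)),    # spacing with a null template
    Transition("b", "b", (7, 0)),
]
pairs = labeled_origins(path)
```

Transitions with null templates still move the image position, which is how spacing and tags affect where later glyph samples are located without producing samples themselves.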
8. The method of claim 6 wherein operating the processor to perform the
decoding operation on the text line image source of glyph samples to
produce the complete transcription-path includes
producing a plurality of complete transcription-paths through the
transcription-image network; each complete transcription-path indicating a
target text line ideal image;
computing a target image value for each one of the plurality of target text
line ideal images by comparing, for each target text line ideal image,
color values indicated by pixels defining the text line image source of
glyph samples with color values of respectively paired pixels defining the
target text line ideal image; and
determining one of the plurality of complete transcription-paths as a best
complete transcription-path using the target image values.
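
The comparison in claim 8 can be illustrated by scoring each candidate ideal image against the observed line image. The measure used below, a count of agreeing pixels, is an assumption chosen for simplicity; the claim only requires that color values of paired pixels be compared. The names `target_image_value` and `best_path` are hypothetical.

```python
def target_image_value(observed, ideal):
    """Count pixel positions where the observed and ideal images agree."""
    return sum(
        o == i
        for observed_row, ideal_row in zip(observed, ideal)
        for o, i in zip(observed_row, ideal_row)
    )

def best_path(observed, candidates):
    """Pick the candidate path whose ideal image best matches the observed image.

    candidates: dict mapping a path identifier to its target text line ideal image.
    """
    return max(candidates, key=lambda name: target_image_value(observed, candidates[name]))

observed = [[1, 0, 1],
            [1, 1, 0]]
candidates = {
    "path_a": [[1, 0, 1], [1, 0, 0]],
    "path_b": [[0, 0, 0], [1, 1, 1]],
}
winner = best_path(observed, candidates)
```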
9. The method of claim 6 wherein operating the processor to perform a
decoding operation on the text line image source of glyph samples to
produce the complete transcription-path includes
performing a dynamic programming based decoding operation to compute an
optimum score at each of a plurality of lattice nodes in a decoding
lattice data structure representing the transcription-image network; the
dynamic programming based decoding operation producing and storing an
optimizing transition identification data item for each lattice node in
the decoding lattice; the optimizing transition identification data item
being produced as a result of computing the optimum score and indicating
one of a plurality of possible transitions into a respective one of the
lattice nodes that optimizes the score for the respective lattice node;
and
performing a backtracing operation to retrieve a sequence of transitions
indicating a decoding lattice path; the backtracing operation starting
with a final lattice node and ending with a first lattice node in the
decoding lattice path; the sequence of transitions being retrieved using
the optimizing transition identification data item produced for each
lattice node as a result of computing the optimum scores; the decoding
lattice path indicating the complete transcription-path through the
transcription-image network.
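
The dynamic programming and backtracing operations of claim 9 follow the general pattern of Viterbi decoding. The sketch below is a simplified one-dimensional version under stated assumptions: lattice nodes are pairs of a horizontal position and a network node, every transition moves right by at least one pixel, and `score(label, x)` is a hypothetical caller-supplied match score for placing a label at position `x`.

```python
NEG_INF = float("-inf")

def decode(transitions, start, final, width, score):
    """Find the best-scoring path from (0, start) to (width, final).

    transitions: list of (src, dst, dx, label) with dx >= 1 (assumed)
    Returns a list of (label, x) placements, or None if no path exists.
    """
    nodes = {start, final} | {t[0] for t in transitions} | {t[1] for t in transitions}
    best = [{n: NEG_INF for n in nodes} for _ in range(width + 1)]
    back = [{n: None for n in nodes} for _ in range(width + 1)]
    best[0][start] = 0.0
    for x in range(1, width + 1):
        for t in transitions:
            src, dst, dx, label = t
            if dx < 1 or x - dx < 0 or best[x - dx][src] == NEG_INF:
                continue
            candidate = best[x - dx][src] + score(label, x - dx)
            if candidate > best[x][dst]:
                best[x][dst] = candidate
                back[x][dst] = t  # optimizing transition for this lattice node
    if best[width][final] == NEG_INF:
        return None
    # Backtrace from the final lattice node to the first one.
    path, x, node = [], width, final
    while back[x][node] is not None:
        src, _, dx, label = back[x][node]
        path.append((label, x - dx))
        x, node = x - dx, src
    path.reverse()
    return path

# Toy example: one node with self-loops for "a" (width 2), "b" (width 3), space (width 1).
network = [("n", "n", 2, "a"), ("n", "n", 3, "b"), ("n", "n", 1, " ")]
matches = {("a", 0): 5.0, ("b", 2): 6.0, (" ", 5): 0.5}
decoded = decode(network, "n", "n", 6, lambda label, x: matches.get((label, x), -1.0))
```

Storing one optimizing transition per lattice node keeps the backtrace linear in the path length, since no scores need to be recomputed.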
10. The method of claim 6 wherein
the character template associated with a transition in the
transcription-image network is one of a plurality of initial character
templates representing a respective character in the glyph sample
character set;
the decoding operation uses the plurality of initial character templates to
produce the complete transcription-path; and
after producing the set of bitmapped character templates using the text
line image source of glyph samples, the glyph sample image origin
positions and the respectively paired glyph labels thereof, performing at
least one additional iteration of the steps of performing the decoding
operation, obtaining the glyph sample image origin positions and
respectively paired glyph labels, and producing the set of bitmapped
character templates; wherein the at least one additional iteration of the
decoding operation uses the set of bitmapped character templates produced
in a prior iteration as the plurality of initial character templates.
11. The method of claim 1 wherein the processor, prior to determining the
glyph sample image origin positions and the respectively paired glyph
labels, produces the text line image source of glyph samples by performing
a text-line segmentation operation on an input two-dimensional (2D) image
source of glyph samples.
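
Claim 11 does not specify how the text-line segmentation is performed. As one common possibility, offered only as an assumption, lines can be separated wherever a row of the 2D image contains no foreground pixels. The function name `segment_text_lines` is hypothetical.

```python
def segment_text_lines(image):
    """Split a 2D binary image into text lines using empty rows as separators.

    image: list of rows of 0/1 pixels.
    Returns inclusive (top, bottom) row ranges, one per run of rows with foreground.
    """
    lines, start = [], None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the current text line ends
            start = None
    if start is not None:
        lines.append((start, len(image) - 1))
    return lines

page = [
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
]
line_ranges = segment_text_lines(page)
```

This simple approach assumes the page is not skewed and that adjacent lines do not touch.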
12. A method of operating a machine to train a set of bitmapped character
templates for use in a recognition system; the machine including a
processor and a memory device for storing data; the data stored in the
memory device including instruction data the processor executes to operate
the machine; the processor being connected to the memory device for
accessing the data stored therein; the method comprising:
operating the processor to receive and store an image definition data
structure defining an image including a plurality of glyphs occurring
therein indicating a single line of text, hereafter referred to as a text
line image source of glyph samples; each glyph occurring in the text line
image source of glyph samples being an image instance of a respective one
of a plurality of characters in a character set, hereafter referred to as
a glyph sample character set; each one of the set of bitmapped character
templates being trained representing a respective one of the plurality of
characters in the glyph sample character set;
operating the processor to receive and store in the memory device a text
line image source model data structure, hereafter referred to as a text
line image source model; the text line image source model modeling the
text line image source of glyph samples as a grammar and including spatial
positioning data modeling spatial positioning of the plurality of glyphs
occurring in the text line image source of glyph samples;
operating the processor to determine a plurality of glyph samples occurring
in the text line image source of glyph samples using the spatial
positioning data included in the text line image source model;
operating the processor to produce a glyph label data item, hereafter
referred to as a respectively paired glyph label, paired with each glyph
sample occurring in the text line image source of glyph samples; the
respectively paired glyph label indicating the respective one of the
characters in the glyph sample character set represented by the glyph
sample; the processor, in producing each respectively paired glyph label,
using mapping data included in the text line image source model mapping a
respective one of the glyphs occurring in the text line image source of
glyph samples to a glyph label indicating the character in the glyph
sample character set represented by the respective glyph; the processor,
further in producing each respectively paired glyph label, using a text
line transcription data structure associated with the text line image
source of glyph samples including an ordered arrangement of transcription
label data items; the text line transcription data structure including at
least one nonliteral transcription label, hereafter referred to as a tag,
indicating at least one character code representing a character with which
a respective glyph in the text line image source of glyph samples cannot
be paired by visual inspection thereof; the at least one character code
indicated by the tag indicating markup information about the text line
image source of glyph samples; the markup information, when interpreted by
a document processing operation, producing at least one display feature
included in the text line image source of glyph samples perceptible as a
visual formatting characteristic of the text line image source of glyph
samples; the text line transcription data structure being hereafter
referred to as a tag transcription; the processor, in producing the
respectively paired glyph label using the tag transcription and the
mapping data,
using the spatial positioning information about the plurality of glyphs to
identify the glyph sample in the text line image of glyph samples related
to the tag, and
using the tag in producing the respectively paired glyph label paired with
the glyph sample identified; and
operating the processor to produce the set of bitmapped character templates
indicating the characters in the glyph sample character set using the
glyph samples identified by the respectively paired glyph labels.
13. The method of claim 12 wherein
the text line image source model is represented as a stochastic finite
state network data structure indicating a regular grammar, hereafter
referred to as a text line image source network; the text line image
source network modeling the text line image source of glyph samples as a
series of nodes and transitions between pairs of the nodes; the text line
image source network representing the mapping data mapping a respective
one of the glyphs occurring in the text line image source of glyph samples
to a glyph label as a sequence of transitions from a first node to a final
node, called a path, through the text line image source network; each
transition having path data items accessible by the processor associated
therewith; the path data items including a message string, a character
template and an image displacement; the path data items indicating a
pairing between substantially each one of the plurality of glyphs
occurring in the text line image source of glyph samples and a glyph label
indicating a character in the glyph sample character set;
the tag transcription associated with the text line image source of glyph
samples is represented as a finite state network data structure, hereafter
referred to as a tag transcription network, modeling a set of tag
transcriptions, produced as an output of a recognition operation performed
on the text line image source of glyph samples, as a series of
transcription nodes and a sequence of transcription transitions between
pairs of the transcription nodes; each transition having a transcription
label associated therewith; one transcription transition having the tag
associated therewith; at least one sequence of transcription transitions,
called a transcription path, through the tag transcription network
indicating the ordered arrangement of the transcription labels in one of
the tag transcriptions included in the set of tag transcriptions; and
the processor, in determining the glyph samples and the respectively paired
glyph labels, merges the series of nodes of the text line image source
network with the series of transcription nodes of the tag transcription
network to produce a transcription-image network indicating modified
mapping data mapping a respective one of the transcription labels included
in the tag transcription to a respective glyph sample and to a
respectively paired glyph label indicating the character in the glyph
sample character set; transcription labels associated with transcription
transitions in the tag transcription network becoming message strings
associated with transitions in the transcription-image network; the tag
associated with the transcription transition in the tag transcription
network becoming a message string associated with a transition included in
the transcription-image network such that the transcription-image network
models a relationship between the tag and at least one glyph occurring in
the text line image of glyph samples; the transcription-image network
representing the modified mapping data as a complete transcription-path
through the transcription-image network; the complete transcription-path
indicating the path data items; performs a decoding operation on the text
line image source of glyph samples using the transcription-image network
to produce the complete transcription-path; and determines, for each
respective one of the transitions in the transcription-path having a
non-null character template associated therewith, the glyph sample
indicated by the transition using the image displacement associated
therewith, and determines the respectively paired glyph label using a
character label indicated by the non-null character template indicating
the character in the glyph sample character set represented by a
respective glyph sample.
14. The method of claim 13 wherein operating the processor to perform the
decoding operation on the text line image source of glyph samples to
produce the complete transcription-path includes
producing a plurality of complete transcription-paths through the
transcription-image network; each complete transcription-path indicating a
target text line ideal image;
computing a target image value for each of the plurality of target text
line ideal images by comparing, for each target text line ideal image,
color values indicated by pixels defining the text line image source of
glyph samples with color values of respectively paired pixels defining the
target text line ideal image; and
determining one of the plurality of complete transcription-paths as a best
complete transcription-path using the target image values.
15. The method of claim 13 wherein operating the processor to perform a
decoding operation on the text line image source of glyph samples to
produce the complete transcription-path includes
performing a dynamic programming based decoding operation to compute an
optimum score at each of a plurality of lattice nodes in a decoding
lattice data structure representing the transcription-image network; the
dynamic programming based decoding operation producing and storing an
optimizing transition identification data item for each lattice node in
the decoding lattice; the optimizing transition identification data item
being produced as a result of computing the optimum score and indicating
one of a plurality of possible transitions into a respective one of the
lattice nodes that optimizes the score for the respective lattice node;
and
performing a backtracing operation to retrieve a sequence of transitions
indicating a decoding lattice path; the backtracing operation starting
with a final lattice node and ending with a first lattice node in the
decoding lattice path; the sequence of transitions being retrieved using
the optimizing transition identification data item produced for each
lattice node as a result of computing the optimum scores; the decoding
lattice path indicating the complete transcription-path through the
transcription-image network.
16. The method of claim 13 wherein
the character template associated with a transition in the
transcription-image network is one of a plurality of initial character
templates representing a respective character in the glyph sample
character set;
the decoding operation uses the plurality of initial character templates to
produce the complete transcription-path; and
after producing the set of bitmapped character templates using the glyph
samples and the respectively paired glyph labels thereof, performing at
least one additional iteration of the steps of performing the decoding
operation, determining the glyph samples and respectively paired glyph
labels, and producing the set of bitmapped character templates; wherein
the at least one additional iteration of the decoding operation uses the
set of bitmapped character templates produced in a prior iteration as the
plurality of initial character templates.
17. The method of claim 12 of operating the machine to train bitmapped
character templates wherein the processor, prior to determining the glyph
samples occurring in the text line image source of glyph samples and
respectively paired glyph labels thereof, produces the text line image
source of glyph samples by performing a text-line segmentation operation
on an input two-dimensional (2D) image source of glyph samples.
18. The method of claim 12 of operating the machine to train bitmapped
character templates wherein
each of the set of bitmapped character templates is based on a character
template model having a characteristic image positioning property such
that when a first rectangular bounding box entirely contains a first
character image, and a second rectangular bounding box entirely contains a
second character image adjacent to the first character image, the first
rectangular bounding box does not substantially overlap with the second
rectangular bounding box; and
the step of operating the processor to determine the glyph samples
occurring in the text line image source of glyph samples includes
determining, for each glyph sample, image coordinates of a glyph sample
bounding box in the text line image source of glyph samples that entirely
defines image dimensions of a respective glyph sample.
19. The method of claim 18 wherein the step of operating the processor to
produce the set of bitmapped character templates includes producing the
bitmapped character templates from the text line image source of glyph
samples using the image coordinates of glyph sample bounding boxes to
define the image dimensions of the respective glyph samples.
20. The method of claim 18 wherein
the step of operating the processor to determine the glyph samples further
includes, for each glyph sample, producing an image definition data
structure defining an isolated glyph sample using the image coordinates of
the glyph sample bounding box of the respective glyph sample; and
the step of operating the processor to produce the set of bitmapped
character templates includes, for each respective one of the set of
bitmapped character templates, identifying the image definition data
structures defining the isolated glyph samples as samples of the character
in the glyph sample character set indicated by the respective bitmapped
character template using respectively paired glyph labels, and assigning a
foreground pixel color value to selected ones of a plurality of pixel
positions included in the respective bitmapped character template using
pixel color values included in the isolated glyph samples identified.
21. The method of claim 12 of operating the machine to train bitmapped
character templates wherein
the bitmapped character templates are based on a character template model
having a characteristic image positioning property such that, when a
second bitmapped character template is positioned in an image with a
template image origin position thereof displaced from a template image
origin position of a preceding first bitmapped character template by a
character set width thereof, and when a first bounding box entirely
containing the first bitmapped character template overlaps in the image
with a second bounding box entirely containing the second bitmapped
character template, the first and second bitmapped character templates
have substantially nonoverlapping foreground pixels;
the step of operating the processor to determine the glyph samples
occurring in the text line image source of glyph samples and respectively
paired glyph labels thereof includes determining an image position in the
text line image source of glyph samples indicating a glyph sample image
origin position of each glyph sample; and
the step of operating the processor to produce the bitmapped character
templates includes using the glyph sample image origin positions to
determine sample image regions in the text line image source of glyph
samples for use in producing the bitmapped character templates; the
processor identifying a template image origin position for each bitmapped
character template produced.
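As an informal illustration of claims 18 to 20 (not part of the claims), the sketch below crops isolated glyph samples out of a text line image using bounding box coordinates, groups them by their paired glyph labels, and assigns a foreground value to each template pixel. The names `crop` and `train_templates`, the toy line image, and the per-pixel majority vote are assumptions of this sketch; the claims only require that foreground values be assigned using the pixel values of the identified samples. The labeled boxes are supplied by hand here, whereas the claimed method obtains them from the decoding operation.

```python
from collections import defaultdict

def crop(line_image, box):
    """Isolate one glyph sample; box is (left, top, right, bottom), right/bottom exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in line_image[top:bottom]]

def train_templates(line_image, labeled_boxes):
    """Build one bitmapped template per label by per-pixel majority vote (an assumption)."""
    samples = defaultdict(list)
    for label, box in labeled_boxes:
        samples[label].append(crop(line_image, box))

    templates = {}
    for label, glyphs in samples.items():
        # This sketch assumes every sample of a label has the same dimensions.
        height, width = len(glyphs[0]), len(glyphs[0][0])
        templates[label] = [
            [1 if 2 * sum(g[r][c] for g in glyphs) > len(glyphs) else 0
             for c in range(width)]
            for r in range(height)
        ]
    return templates

# Hypothetical text line image holding three samples of "I"; the middle one is noisy.
LINE = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 1, 0, 1, 0],
]
LABELED_BOXES = [("I", (0, 0, 3, 3)), ("I", (3, 0, 6, 3)), ("I", (6, 0, 9, 3))]
TEMPLATES = train_templates(LINE, LABELED_BOXES)
```

In this toy input the single noisy pixel is outvoted by the two clean samples, so the resulting template for "I" is the clean vertical stroke.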
Description
CROSS REFERENCE TO OTHER APPLICATIONS
The invention of the present application is related to several other
inventions that are the subject matter of copending, commonly assigned
U.S. patent applications, respectively identified as Ser. No. 08/431,223,
"Automatic Training of Character Templates Using a Transcription and a
Two-Dimensional Image Source Model"; Ser. No. 08/431,714, "Method of
Producing Character Templates Using Unsegmented Samples"; Ser. No.
08/430,635, "Unsupervised Training of Character Templates Using
Unsegmented Samples"; and Ser. No. 08/460,454, "Method and System for
Automatic Transcription Correction".
FIELD OF THE INVENTION
The present invention relates generally to the field of
computer-implemented methods of and systems for pattern recognition, and
more particularly to a method of, and machine for, training bitmapped
character templates for use in computer-implemented systems for document
image decoding and character recognition.
BACKGROUND
Information in the form of language symbols (i.e., characters) or other
symbolic notation that is visually represented to a human in an image on a
marking medium, such as a computer display screen or paper, is capable of
manipulation for its semantic content by a processor included in a
computer system when the information is accessible to the processor in an
encoded form, such as when each of the language symbols is available to
the processor as a respective character code selected from a predetermined
set of character codes (e.g., ASCII code) that represent the symbols to the
processor. An image is typically represented in a computer system as a
two-dimensional array of image data, with each item of data in the array
providing a value indicating the color (typically black or white) of a
respective location of the image. An image represented in this manner is
frequently referred to as a bitmapped or binary image. Each location in a
binary image is conventionally referred to as a picture element, or pixel.
Sources of bitmapped images include images produced by scanning a paper
form of a document using an optical scanner, or by receiving image data
via facsimile transmission of a paper document. When manipulation of the
semantic content of the characters in an image by a processor is
desirable, a process variously called "recognition," or "character
recognition," or "optical character recognition" must be performed on the
image in order to produce, from the images of characters, a sequence of
character codes that is capable of being manipulated by the processor.
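The distinction between the bitmapped form and the encoded form can be made concrete with a small sketch. The 5x5 image, the convention that 1 means black, and the helper names `foreground_pixels` and `render` are all assumptions chosen for illustration.

```python
# Hypothetical 5x5 binary image of the letter "T": 1 = black (foreground), 0 = white.
IMAGE = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

def foreground_pixels(image):
    """Return the (row, column) location of every black pixel."""
    return [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v == 1]

def render(image):
    """Draw the bitmap as text, one character per pixel."""
    return "\n".join("".join("#" if v else "." for v in row) for row in image)

# The encoded form that a processor can manipulate for its semantic content
# is a sequence of character codes, not an array of pixels.
ENCODED = [ord(ch) for ch in "T"]
```

Character recognition is the step that maps data shaped like `IMAGE` to data shaped like `ENCODED`.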
Character recognition systems typically include a process in which the
appearance of an isolated, input character image, or "glyph," is analyzed
and, in a decision-making process, classified as a distinct character in a
predetermined set of characters. The term "glyph" refers to an image that
represents a realized instance of a character. The classification analysis
typically includes comparing characteristics of the isolated input glyph
(e.g., its pixel content or other characteristics) to units of reference
information about characters in the character set, each of which defines
characteristics of the "ideal" visual representation of a character in its
particular size, font and style, as it would appear in an image if there
were no noise or distortion introduced by the image creation process. The
unit of reference information for each character, typically called a
"character template," "template" or "prototype," includes identification
information, referred to as a "character label," that uniquely identifies
the character as one of the characters in the character set. The character
label may also include such information as the character's font, point
size and style. A character label is output as the identification of the
input glyph when the classification analysis determines that a sufficient
match between the glyph and the reference information indicating the
character label has been made.
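A minimal sketch of this classification step follows, assuming equally sized bitmaps, a pixel-agreement score, and a fixed acceptance threshold. None of these choices comes from the text above, which does not fix a particular matching measure; `match_score`, `classify`, and the 3x3 templates are hypothetical.

```python
def match_score(glyph, template):
    """Fraction of pixel positions at which the glyph and the template agree."""
    total = 0
    agree = 0
    for g_row, t_row in zip(glyph, template):
        for g, t in zip(g_row, t_row):
            total += 1
            agree += int(g == t)
    return agree / total

def classify(glyph, templates, threshold=0.8):
    """Return the character label of the best match, or None if no match is sufficient."""
    best_label, best_score = None, 0.0
    for label, template in templates.items():
        score = match_score(glyph, template)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None

# Hypothetical templates keyed by character label.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
NOISY_I = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]   # one pixel of noise
BLOB = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]      # matches no template well
```

With these inputs, `classify(NOISY_I, TEMPLATES)` yields the label "I" because eight of nine pixels agree, while `classify(BLOB, TEMPLATES)` yields no label because neither template reaches the threshold.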
The representation of the reference information that comprises a character
template may be referred to as its model. Character template models are
broadly identifiable as being either bitmapped, or binary, images of
characters, or lists of high-level "features" of binary character images.
"Features" are measurements of a character image that are derived from the
binary image and are typically much fewer in number than the number of
pixels in the character image. Examples of features include a character's
height and width, and the number of closed loops in the character. Within
the category of binary character template models, at least two different
types of models have been defined: one model may be called the
"segmentation-based" model, and describes a character template as fitting
entirely within a rectangular region, referred to as a "bounding box," and
describes the combining of adjacent character templates as being
"disjoint"--that is, requiring nonoverlapping bounding boxes. U.S. Pat.
No. 5,321,773 discloses another binary character template model that is
based on the sidebearing model of letterform shape description and
positioning used in the field of digital typography. The sidebearing
model, described in more detail below in the discussion accompanying FIG.
1, describes the combining of templates to permit overlapping rectangular
bounding boxes as long as the foreground (typically black) pixels of
one template are not shared with, or common with, the foreground pixels of
an adjacent template; this is described as requiring the templates to have
substantially "disjoint supports."
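The difference between the two binary template models can be checked mechanically. The sketch below represents each positioned template as a set of (x, y) foreground pixel coordinates and tests both conditions. The function names and the two slanted strokes are assumptions for illustration, and the strict test used here is stronger than the "substantially" disjoint condition stated above.

```python
def bounding_box(pixels):
    """Smallest rectangle (x0, y0, x1, y1), inclusive, containing all foreground pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    """True if two inclusive rectangles share at least one pixel location."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def supports_disjoint(p, q):
    """True if the two templates share no foreground pixel."""
    return not (set(p) & set(q))

# Two hypothetical slanted strokes, as in an italic font, placed side by side.
GLYPH_A = {(0, 3), (1, 2), (2, 1), (3, 0)}
GLYPH_B = {(2, 3), (3, 2), (4, 1), (5, 0)}

BOXES_OVERLAP = boxes_overlap(bounding_box(GLYPH_A), bounding_box(GLYPH_B))
SUPPORTS_DISJOINT = supports_disjoint(GLYPH_A, GLYPH_B)
```

For this pair the bounding boxes overlap while the foreground pixels do not, so the placement is acceptable under the sidebearing model but not under the segmentation-based model.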
1. Overview of Training Character Templates For Recognition Systems
Training character templates is the process of using training data to
create, produce or update the templates used for the recognition process.
Training data can be broadly defined as a collection of character image
samples, each with an assigned character label identifying the character
in the character set that it represents, that provide the information
necessary to produce templates according