Description
CROSS REFERENCES TO RELATED APPLICATIONS
1. "Method and Apparatus for Character Recognition Employing a Dead-Band
Correlator," Ser. No. 470,241, filed Feb. 28, 1983, abandoned in favor of
continuation application Ser. No. 902,071, filed Aug. 27, 1986, now Pat.
No. 4,700,401, issued Oct. 13, 1987.
2. "Optical Character Isolation System, Apparatus and Method," Ser. No.
535,410, filed Sept. 23, 1983.
BACKGROUND OF THE INVENTION
The present invention relates to an optical character recognition system
and method. More particularly, the present invention relates to an
OCR-type system which provides recognition of video character data
representing text on a document having proportional spacing and/or fixed
pitch formats, as well as recognition of text having accented characters.
OCR-type systems have long been known in the art in conjunction with
sophisticated word processing systems implemented in the business
environment as well as in the personal user environment. Many such
systems support a proportional spacing format, in which each line of text
is proportionally spaced.
Proportional spacing is a desirable feature in many word processing
applications, such as the preparation of legal briefs and memoranda,
marketing projections and the like.
One problem with prior art OCR-type systems is that, in general, they
cannot distinguish between proportional spacing formats and fixed pitch
formats, such as may be used with older equipment. There are many
applications where it would be desirable to recognize both proportionally
spaced and fixed pitch formats. Prior art systems have, in general, not
provided a recognition technique that allows either type of format to be
utilized in a word processing system.
In addition, OCR scanning systems have not in general been able to
accurately recognize accented characters, such as appear in many Western
European languages. As word processing capability is expanded to include
Western European text, the inability of prior art systems to recognize
accented characters is a serious limitation.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide an apparatus and
method for recognizing proportionally spaced text.
In one embodiment, the present invention is utilized in an optical
character recognition system. The invention provides recognition of video
data representative of proportional spacing or fixed pitch formats, and
can convert between the different types of formats.
The present invention also provides for recognition and processing of
accented characters, which are common in Western European texts.
Other objects and features of the present invention will become apparent
from the following detailed description when taken in conjunction with the
accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts a representation of a video buffer.
FIG. 2 depicts a representation of a recognition buffer storing the letter
"S".
FIG. 3 depicts an illustration of page image records (PIR's).
FIG. 4 depicts a representation of two accented characters, where the
placement of the accent above the character varies.
FIG. 5 depicts a representation of an oversized accented character.
FIG. 6 depicts representations of a base character buffer and a vertical
remnant buffer.
FIG. 7 depicts a flow chart illustrating the sequence of steps for
recognizing an accented character.
FIG. 8 depicts a representation of a cloud mask for an accented character.
FIGS. 9A and B depict the segmentation of an accented character into a base
character and an accent portion.
FIG. 10 depicts a flow chart illustrating the process of recombining
accented characters.
FIG. 11 depicts a flow chart illustrating the process of pitch
determination.
FIG. 12 depicts a representation of touching characters "Th".
FIG. 13 depicts a representation of the segmented characters of FIG. 12.
FIG. 14 depicts a flow chart illustrating the process of recognition of
touching characters.
FIG. 15 depicts a representation of the segmented characters of FIG. 13.
FIG. 16 depicts a flow chart illustrating the process of recognizing
multiple characters within one character buffer.
FIG. 17 depicts a representation of multiple characters which can be stored
in one character buffer.
FIG. 18 depicts a representation of separated multiple characters of FIG.
17.
FIG. 19 depicts a representation of a proportional space input and an
adjusted fixed pitch output.
DETAILED DESCRIPTION OF THE DRAWINGS
The present invention relates to optical character recognition systems.
Particularly desirable OCR isolation and recognition techniques are
described in the two cross-referenced applications identified above, the
details of which are hereby incorporated by reference.
In general, an optical character recognition (OCR) system can be subdivided
into three major subsystems. These are:
1. Character Segmentation--handles page detection, video acquisition,
pre-recognition noise filtering, the identification of character fields
suitable for recognition and the creation of character image buffers for
use by character recognition.
2. Character Recognition--attempts to identify an unknown character image
provided by character segmentation. Recognition technology may include
some combination of correlation against known masks, feature analysis,
decision trees or other recognition techniques.
3. Page Composition--attempts to reconstruct the page into ASCII coded
lines of text (American Standard Code for Information
Interchange--ASCII--is a standard code that assigns specific bit patterns
to each sign, symbol, numeral, letter and operation in a specific set).
Included in this subsystem may be contextual and positional analysis,
post-recognition noise filtering, text spacing and skew adjustment.
The relative complexity of these subsystems will vary depending on the
nature of the text read and the accuracy desired. Early OCR systems for
typed pages used OCR-specific fonts, such as OCR-A, and tolerated little
or no random noise such as dirt, paper imperfections, copier marks, etc.
For these systems, simple character segmentation, recognition and page
composition techniques yielded acceptable results. Later OCR systems used
improved character recognition schemes to allow the use of standard office
fonts, such as Prestige Elite, but were still intolerant of problems
affecting character segmentation, such as noise, skew and touching
characters. These systems were quite suitable in an environment of
carefully prepared input documents, but were unacceptable when the input
documents took the form of standard archived or correspondence pages. To
account for these problems, a system using character segmentation by
iterative page decomposition and page composition by character baseline
analysis was developed. These two additional techniques served to
eliminate such problems as skew and noise. The following describes the
methodology used to implement character segmentation, character
recognition and page composition in one of these later systems.
Briefly, character segmentation by iterative page decomposition works as
follows:
Input: Character segmentation assumes as input a video buffer acting as a
window onto an input document. The representation of a video buffer
illustrated in FIG. 1 contains a digitized image of a portion of the
document, which contains recognizable typed text as well as unrecognizable
features such as noise, forms features, letterheads, logos, underlines and
signatures. The digitized image in the video buffer must be tall enough to
hold the tallest recognizable character and as wide as the document
currently being processed. The top pixel row of the video buffer may be
discarded and a new pixel row added to the bottom of the buffer, in effect
moving the buffer window down the document being processed. Any pixel in
the video buffer may be examined or altered by the software or hardware
implementing this technique.
Output: The primary output of character segmentation is a series of
recognition input buffers, each containing a digitized image of a
localized feature assumed to be a single character, such as the letter "S"
illustrated in FIG. 2, as well as the horizontal and vertical pixel
location of the feature. Secondary outputs of character segmentation
include the location and length of any underlines detected, as well as the
location and size of features too tall or too wide to be considered
recognizable characters.
Method: The video buffer is moved down the document until the top row of
the buffer contains some black. If the feature containing the leftmost
black pixel is small isolated noise, it is erased from the video buffer,
and processing continues with the search for black. If the feature is
large enough, it is assumed to be part of a character which is itself a
part of a word of text. The leftmost edge of the word is located by
searching to the left until a tall, wide white region is found. Individual
characters within the word are then segmented out, left to right, copied
into recognition buffers and erased from the video buffer. During the
segmentation process, overlapping characters, touching characters,
underlines and oversized features are detected and processed. After each
noise feature, word or oversized feature is processed, it is erased from
the video buffer and the buffer is again moved down until the top row
contains black. The entire page is thus decomposed into a series of noise,
oversized and character features without any requirement to locate word or
line baselines or skew angles. Features are erased from the video buffer
after they are processed to ensure that each feature is processed once and
only once. Isolating text by word instead of by line or character provides
these basic advantages:
1. Segmentation problems caused by text skew can normally be ignored within
a word.
2. A smaller video buffer can be used, since a skewed line requires a much
taller buffer than an unskewed line, but a skewed word is not
significantly taller than an unskewed word.
3. Individual words do not normally touch other words, whereas individual
characters often touch other characters within a word; therefore,
segmentation by word is less prone to error than segmentation by
character.
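The video buffer described above behaves like a fixed-height window that slides down the page one pixel row at a time. The following is a minimal Python sketch of that window, assuming a binary image delivered row by row (1 = black). The names `VideoBuffer`, `advance`, `scroll_to_black` and `erase` are hypothetical. Noise tests, the search for a word's left edge and character isolation are omitted. It shows only the window mechanics: the top row is discarded as a new bottom row is added, the window moves down until its top row contains black, and processed features are erased so that each one is seen only once.

```python
from collections import deque
from typing import Deque, Iterator, List

Row = List[int]  # one pixel row of the document; 1 = black, 0 = white


class VideoBuffer:
    """Hypothetical sliding window onto a document, one pixel row at a time."""

    def __init__(self, source: Iterator[Row], height: int) -> None:
        # height must cover the tallest recognizable character
        self._source = source
        self.rows: Deque[Row] = deque(maxlen=height)
        for _ in range(height):
            row = next(self._source, None)
            if row is None:
                break
            self.rows.append(list(row))

    def advance(self) -> bool:
        """Discard the top row and add the next document row at the bottom."""
        row = next(self._source, None)
        if row is not None:
            self.rows.append(list(row))  # maxlen drops the top row
        elif self.rows:
            self.rows.popleft()          # end of document: window just shrinks
        return bool(self.rows)

    def scroll_to_black(self) -> bool:
        """Move the window down until its top row contains black."""
        while self.rows and not any(self.rows[0]):
            self.advance()
        return bool(self.rows)

    def erase(self, row: int, col: int) -> None:
        """Whiten one pixel so a processed feature is never seen again."""
        self.rows[row][col] = 0
```

For example, given a five-row page, `VideoBuffer(iter(page), height=2).scroll_to_black()` stops with the first row that contains black at the top of the window. After that feature is erased, the next call continues down the page.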
The character recognition technique works as follows:
Input: The primary input of the character recognition technique is a series
of recognition input buffers, each containing a digitized image of a
localized feature assumed by the character segmentation software to be a
single character, as well as the horizontal and vertical pixel location of
the feature, as illustrated in FIG. 2.
Output: The primary output of the character recognition subsystem is the
input recognition buffers, each with an associated best fit and next best
fit ASCII code for the character in question, along with correlation
scores for the best fit ASCII code.
Method: The character recognition subsystem may be subdivided into two
major functions: iterative processing, in which the best match for an
unknown character is determined, and typestyle recognition, whereby the
proper typestyle for a region of text is determined, thereby increasing
throughput and accuracy. These two functions are described in detail
below.
During iterative processing, the unknown character is passed through a
dead-band correlator and compared with a series of masks for known
characters. The results of this correlation are then evaluated using tight
threshold and separation criteria. If these results are acceptable with
tight requirements, then the character is considered recognized with high
confidence. If the character fails to meet these tight threshold and
separation requirements, then the character undergoes a series of retries
to attempt to make the character tightly acceptable. The first retry
attempts to ensure that the character is properly centered in the
recognition buffer. The second level of retry processing attempts to
filter small isolated noise from the character field. The third level of
retry processing attempts to remove larger isolated noise. The fourth
level of retry processing attempts to filter out any portions of an
underline which may have been missed by the character segmentation
subsystem. The fifth level of retry is a series of stroke width
normalizations. These normalizers are "burn," which attempts to reduce the
stroke width of very dark characters, and "regrow," which attempts to
widen the stroke width of very light characters. If any of these retries
causes the unknown character to become tightly acceptable based on
threshold and separation, then recognition for the character is complete.
If not, then following all retries, a loosely acceptable character can now
be accepted with high confidence. Any other characters will be rejected.
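The retry cascade can be summarized as a loop over image-conditioning functions, each followed by a new correlation. The following is a minimal sketch under several assumptions. The correlator is a caller-supplied function returning the best label, its score and the next-best score, with lower scores meaning closer matches. "Separation" is taken to mean the gap between the best and next-best scores. Each retry (recentering, noise filtering, underline removal, burn, regrow) is applied to the original image, because the text does not say whether retries are cumulative. All names here are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[int]]                  # 1 = black pixel
Match = Tuple[str, int, int]             # (best label, best score, next-best score)
Correlator = Callable[[Image], Match]    # lower score = closer match
Retry = Callable[[Image], Image]         # e.g. recenter, denoise, burn, regrow


def acceptable(match: Match, threshold: int, separation: int) -> bool:
    """Threshold test on the best score plus separation from the runner-up."""
    _, best, runner_up = match
    return best <= threshold and runner_up - best >= separation


def recognize(image: Image, correlate: Correlator, retries: Sequence[Retry],
              tight: Tuple[int, int],
              loose: Tuple[int, int]) -> Tuple[Optional[str], str]:
    """Return (label, 'tight' | 'loose') or (None, 'reject')."""
    best_match = correlate(image)
    if acceptable(best_match, *tight):
        return best_match[0], "tight"
    for retry in retries:
        match = correlate(retry(image))   # each retry starts from the original
        if acceptable(match, *tight):
            return match[0], "tight"
        if match[1] < best_match[1]:
            best_match = match            # remember the closest attempt
    # After all retries, fall back to the looser criteria.
    if acceptable(best_match, *loose):
        return best_match[0], "loose"
    return None, "reject"
```

Passing the (threshold, separation) pairs for the tight and loose criteria as arguments keeps the sketch independent of any particular font or scoring scale.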
Typestyle recognition is initiated at the beginning of each page. Each
character is run through each typestyle in the system. Following
recognition of each character, the results from each typestyle are
evaluated. A character that is rejected in all typestyles is simply
rejected. A character that is accepted in at least one typestyle is passed
on for further processing. The best typestyle is determined by comparing
the level of acceptances for each typestyle. If more than one typestyle
recognized the character at the best level of threshold acceptance, the
ASCII codes from those typestyles are compared. If the typestyles do not
all agree on the ASCII code designation, then the character is saved in a
holding buffer to await typestyle determination. If the ASCII codes are
all equal, the character is accepted, and the low score from each
typestyle is added to that typestyle's score counter. Any typestyle that
rejects the character is eliminated, as is any typestyle whose score
counter exceeds a threshold value. The correct typestyle for the page is
recognized when only one typestyle is still enabled.
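The typestyle elimination rules can be expressed as a small per-page state machine. The following is a minimal sketch, assuming each character arrives with one result per typestyle in the form (code or None if rejected, acceptance level, low score). A higher level means a stronger acceptance, and a lower score means a better match. The class `TypestyleSelector` and the counter limit are hypothetical. The sketch does not reprocess the holding buffer once a typestyle is chosen.

```python
from typing import Dict, List, Optional, Set, Tuple

# (ASCII code or None if rejected, acceptance level, low score)
# Higher level = stronger acceptance (e.g. 2 tight, 1 loose); lower score = better.
Result = Tuple[Optional[str], int, int]


class TypestyleSelector:
    """Hypothetical per-page typestyle elimination."""

    def __init__(self, typestyles: List[str], counter_limit: int) -> None:
        self.enabled: Set[str] = set(typestyles)
        self.counters: Dict[str, int] = {t: 0 for t in typestyles}
        self.counter_limit = counter_limit
        self.holding: List[Dict[str, Result]] = []  # awaiting a decision

    def decided(self) -> Optional[str]:
        """The page's typestyle once only one remains enabled."""
        return next(iter(self.enabled)) if len(self.enabled) == 1 else None

    def observe(self, results: Dict[str, Result]) -> Optional[str]:
        """Process one character's results; return its code if accepted."""
        live = {t: r for t, r in results.items() if t in self.enabled}
        accepted = {t: r for t, r in live.items() if r[0] is not None}
        if not accepted:
            return None                          # rejected in every typestyle
        top = max(level for _, level, _ in accepted.values())
        codes = {code for code, level, _ in accepted.values() if level == top}
        if len(codes) > 1:
            self.holding.append(results)         # best typestyles disagree
            return None
        for t, (code, _, score) in live.items():
            if code is None:
                self.enabled.discard(t)          # rejected the character
                continue
            self.counters[t] += score
            if self.counters[t] > self.counter_limit:
                self.enabled.discard(t)          # too many poor matches
        return codes.pop()
```

Each call to `observe` either accepts, holds or rejects one character. `decided()` becomes non-None once the page's typestyle has been isolated.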
Briefly, page composition by character baseline analysis works as follows:
Input: The technique assumes as primary input a series of character
recognition buffers, each containing a best fit and next best fit ASCII
value and correlation score (less likely characters and scores may also be
used) as well as the X-Y page pixel origin of the character. Secondary
inputs include records describing underlines and forms/logo features, as
seen, for example, in FIG. 2.
Output: The primary output of the technique is an ASCII data string which
may be displayed on a teletype compatible display or printer to recreate
an accurate (to within a character) image of the original input document.
Secondary outputs include escape sequences embedded in the data stream,
describing underline and forms/logo feature origin and size, exact line
X-Y origin, exact page length and rejected and post-processed characters.
Method: The method is broken down into two steps: word reconstruction and
line reconstruction.
In word reconstruction, characters arriving in recognition buffers from the
character recognition process are built back into words and stored in page
image records (PIR's). This is done primarily to allow more efficient
usage of system memory: since every PIR must contain several bytes of
position and linkage information, it is desirable to store as much ASCII
information as possible in each PIR. Since characters are normally
isolated from left to right within a word and processed by recognition in
the same order, word reconstruction is merely a process of testing whether each
successive input character is to be appended to the end of the current
input PIR. When a word ends or a PIR fills, the old PIR is linked into the
active PIR chain in ascending horizontal word origin order, and a new PIR
is allocated from a free PIR chain, as illustrated in FIG. 3.
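Word reconstruction reduces to appending characters to a current record and linking each finished record into an origin-ordered chain. The following is a minimal Python sketch, assuming a fixed pool of PIRs with a hypothetical capacity of eight characters. The names `PIR`, `WordBuilder` and `PIR_CAPACITY` are illustrative. When the free chain runs out, the text invokes line reconstruction; this sketch only signals that condition.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional

PIR_CAPACITY = 8  # hypothetical number of ASCII characters per PIR


@dataclass
class PIR:
    x: int = 0                                   # horizontal word origin
    y: int = 0                                   # vertical word origin
    text: List[str] = field(default_factory=list)


class WordBuilder:
    """Hypothetical word reconstruction into a fixed pool of PIRs."""

    def __init__(self, pool_size: int) -> None:
        self.free: List[PIR] = [PIR() for _ in range(pool_size)]
        self.active: List[PIR] = []              # ascending x order
        self.current: Optional[PIR] = None

    def add_char(self, ch: str, x: int, y: int, new_word: bool) -> None:
        """Append to the current PIR unless a word ends or the PIR is full."""
        full = self.current is not None and len(self.current.text) >= PIR_CAPACITY
        if self.current is None or new_word or full:
            self.close_current()
            if not self.free:
                # The text invokes line reconstruction here; omitted in this sketch.
                raise RuntimeError("free PIR chain is empty")
            self.current = self.free.pop()
            self.current.x, self.current.y, self.current.text = x, y, []
        self.current.text.append(ch)

    def close_current(self) -> None:
        """Link the finished PIR into the active chain by word origin."""
        if self.current is not None:
            origins = [p.x for p in self.active]
            self.active.insert(bisect.bisect_right(origins, self.current.x), self.current)
            self.current = None
```

A word longer than `PIR_CAPACITY` simply spills into a second PIR whose origin is that of its first character. Both PIRs are linked into the active chain in origin order.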
Line reconstruction is invoked when two or more lines of text are
contained in the active PIR chain shown in FIG. 3, or when the free PIR
chain is empty. Line reconstruction makes two passes through the active
PIR chain.
The first pass determines the leftmost PIR of the topmost line in the
active PIR chain, and the second pass starts from that leftmost PIR and
outputs and restores to the free chain all PIR's on the topmost line.
As each ASCII character is added to a PIR (word), a baseline adjustment
appropriate to that ASCII character is applied to the character's base
point to yield a character baseline. Character baselines within a word are
averaged to form a word baseline. This ASCII calculated baseline is far
more accurate than a baseline derived from examination of raw digitized
video, since it is resistant to errors caused by random noise and extreme
cases (such as "soggy gypsum," whose many descending characters may cause
a false baseline). This accurate word baseline allows for accurate
positional post-processing of characters such as "P" and "p", as well as
accurate detection of subscripts and superscripts.
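The ASCII-based baseline calculation can be illustrated with a small numeric example. The following is a minimal sketch, assuming pixel y coordinates that grow downward, a character's base point taken as the bottom row of its image, and a hypothetical adjustment table in which only the descender letters g, j, p, q and y are raised by a fixed 6 pixels. A real system would hold a tuned adjustment for every character in each font. `resolve_p` and its tolerance are likewise hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical baseline adjustments in pixels: distance from the bottom of a
# character's image (its base point) up to the typographic baseline.
DESCENDER_DEPTH = 6
BASELINE_ADJUST: Dict[str, int] = {ch: DESCENDER_DEPTH for ch in "gjpqy"}


def char_baseline(ch: str, bottom_y: int) -> int:
    """y grows downward, so a descender's baseline lies above its bottom pixel."""
    return bottom_y - BASELINE_ADJUST.get(ch, 0)


def word_baseline(chars: List[Tuple[str, int]]) -> float:
    """Average of the character baselines of one word (PIR)."""
    return mean(char_baseline(ch, bottom) for ch, bottom in chars)


def resolve_p(bottom_y: int, baseline: float, tolerance: int = 2) -> str:
    """Positional post-processing: a "P"/"p" shape that descends is lowercase."""
    return "p" if bottom_y > baseline + tolerance else "P"
```

For "soggy" with bottoms at 100, 100, 106, 106 and 106, averaging the raw bottoms would place the baseline at 103.6. The adjusted character baselines all equal 100, so `word_baseline` returns 100. A "p"-shaped character whose bottom is at 106 is then classified as lowercase.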
A number of problems arise when attempting to perform OCR on text
containing accented characters, as are typically found in Western European
languages, and on text created on office equipment utilizing proportional
space fonts. The major problems are outlined below, followed by a series
of techniques for solving them.
Multi-strike Accented Characters: On many print sources, accented
characters are formed by more than one stroke. As seen in FIG. 4, the base
character is typed first, the carriage is backed up, and the accent is
typed above the character. Because printers and typewriters handle these
overstrikes differently, the placement of the accent above the character
varies, and accented characters are not easily recognized by template
matching alone. Another method for recognizing these characters is needed
so that accurate results may be obtained.
Proportionally Spaced Text: A number of problems arise when current OCR
technology is used to process proportionally spaced text. First, because
characters are packed closely together, two narrow characters, such as the
"11" illustrated in FIG. 17, may be segmented as one character, which
current systems may mistakenly recognize as a "U" or an "H" instead of
rejecting it. Second, a much larger proportion of characters touch in
proportionally spaced text than in fixed pitch text, so a much higher
reject rate is encountered, as seen in FIG. 13. Lastly, because the text
is output to a device which does not have proportional space capability,
the integrity of columns and underlines is lost with the change of
spacing, as seen in FIG. 19.
Accented characters represent a special case of problems for character
segmentation and recognition. Character segmentation is affected since
many accented characters are too tall to fit into one recognition buffer.
Truncation of the character so that it fits is not a valid solution, since
too much of the data in the accent mark would be lost. Therefore, a
technique has been developed whereby anything taller than the recognition
buffer is saved in two recognition buffers, a base character and a
vertical remnant, thus conserving all data in the character field for
recognition. When recognition encounters an accented character, signalled
by recognition against one of the "cloud" masks which are generic accented
character masks, the base character is recombined with the vertical
remnant. The character is then vertically segmented into an accent mark
and a base character, which are recognized separately, so as to eliminate
any problems with placement of the accent mark. Following recognition of
the accent mark and the base character, page composition will attempt to
recombine the two ASCII codes into one code for the accented character. If
an invalid combination of accent and base character is encountered, the
accent is thrown away, which eliminates noise from some characters. For
example, any accent above an "s" would be illegal, so noise above that
character is filtered out.
The processing necessary for character segmentation of accented characters
is as follows. At the point where a new localized feature which is assumed
to be a character is to be moved into a recognition input buffer, the
character height is checked against the height of the recognition buffer.
If the character is too tall, as seen in FIG. 5, then vertical remnant
processing is initiated. The base character is set up to be the height of
a recognition buffer and moved into one. Anything left over is moved into
a new recognition buffer and both buffers are linked vertically, as seen
in FIG. 6.
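Vertical remnant processing splits one tall feature into two vertically linked buffers without losing any pixels. The following is a minimal sketch, assuming image rows are ordered top to bottom and the base character occupies the bottom `buffer_height` rows, since the accent sits above it. The class `RecognitionBuffer` and the function `load_feature` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

Image = List[List[int]]  # rows top to bottom; 1 = black


@dataclass
class RecognitionBuffer:
    image: Image
    x: int                                       # pixel origin of the top-left corner
    y: int
    above: Optional["RecognitionBuffer"] = None  # vertical link to a remnant


def load_feature(image: Image, x: int, y: int, buffer_height: int) -> RecognitionBuffer:
    """Split a feature taller than one buffer into base + vertical remnant."""
    if len(image) <= buffer_height:
        return RecognitionBuffer(image, x, y)
    cut = len(image) - buffer_height             # rows that do not fit
    remnant = RecognitionBuffer(image[:cut], x, y)
    return RecognitionBuffer(image[cut:], x, y + cut, above=remnant)
```

Because the remnant keeps its own origin, concatenating `remnant.image + base.image` recreates the original feature exactly. This is what the recognition steps below rely on when they rebuild the character in the recombination buffer.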
Recognition processing of accented characters takes as input a recognition
buffer which has already been run through the correlation process.
Following the correlation process previously described, the following
steps are taken to attempt to recognize an accented character.
Step 1. Does character recognize as a "cloud" mask?
If the character in question recognizes as a "cloud" mask (see FIG. 8),
then this is the signal that the character is an accented character. A
cloud mask is a composite of all possible accent marks for the current
character. Its main purpose is to offset the base character downwards and
to provide enough pixels above the base character that the accent mark
does not force too high a mis-score. In any multilingual font, there will
be cloud masks for the following base characters: a, e, i, n, o, u, A, E,
I, N, O, U.
Steps 2, 3, and 4. Is the character an oversized character? Recreate
oversized character in recombination buffer. Move single character into
recombination buffer.
These steps create an exact copy of what the character in question looked
like in the video buffer. This copy is created in a special buffer known
as the recombination buffer which is tall enough to hold the maximum
height character.
Step 5. Initial separation of accent and base character.
This step looks at the image of the character and determines the most
likely place to separate the accent from the base character in the
recombination buffer. The initial separation point is determined by white
space within the character or a position with small density in the
horizontal direction if the accent is touching the base character. The
accent and the base character are then moved into two separate recognition
buffers, as seen in FIG. 9.
Step 6. Run accent and base character through correlation.
The accent and base character recognition buffers created in either Step 5
or Step 8 are run against the current font to search for the best match for
each. The fonts are arranged such that the accent will only be run against
accents, thus eliminating the possibility of an accent being mistakenly
recognized as a punctuation symbol, such as a ",".
Step 7. Did the base character meet the tight threshold?
This step tests to see if the base character is recognized with a high
level of confidence.
Step 8. Try next separation point.
If the base character did not recognize, the next possible position to
separate the accent and the base character in the recombination buffer is
found, and the accent and the base character are resegmented and moved
into their associated recognition buffers.
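Steps 5 through 8 amount to trying candidate split rows, best first, until the base character is tightly recognized. The following is a minimal sketch, assuming that candidates are ranked by horizontal density (white rows first, then the sparsest rows, for accents that touch the base). It also assumes that the two matchers are caller-supplied functions returning (label, score) with lower scores better, and that `match_accent` correlates only against accent masks. The functions `candidate_cuts` and `split_accented` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]                 # rows top to bottom; 1 = black
Match = Tuple[str, int]                 # (label, score); lower score = better
Matcher = Callable[[Image], Match]


def candidate_cuts(image: Image) -> List[int]:
    """Rows at which to split accent (above) from base (below), best first.

    White rows (zero horizontal density) come first; then rows of increasing
    density, for accents that touch the base character. Ties go top-down.
    """
    return sorted(range(1, len(image)), key=lambda r: (sum(image[r]), r))


def split_accented(image: Image, match_accent: Matcher, match_base: Matcher,
                   tight: int) -> Optional[Tuple[Match, Match]]:
    """Steps 5-8: try separation points until the base is tightly recognized."""
    for cut in candidate_cuts(image):
        accent, base = image[:cut], image[cut:]
        # Step 6: correlate both parts (match_accent sees accent masks only).
        accent_match, base_match = match_accent(accent), match_base(base)
        if base_match[1] <= tight:          # Step 7: tight threshold met
            return accent_match, base_match
    return None                             # no separation point worked
```

Ranking all rows once and walking the list keeps Step 8 ("try next separation point") a simple iteration. A real implementation would probably limit the candidates to the upper part of the recombination buffer.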
Page composition processing of accented characters involves the
recombination of the vertically linked pair of character buffers created
by recognition (ASCII codes for the accent mark and the base character)
into one accented character ASCII code. The steps taken to perform this
are as follows (see FIG. 10):
Step 1. Current recognition buffer vertically linked with another or
stand-alone accent positionally above a base character?
Steps 2, 3 and 4. Did top recognition buffer recognize as an accent mark?
Is the combination of accent and base character valid? Discard top
character.
These steps are used to determine if the vertically linked pair of
recognition buffers form a valid accented character. The first test is a
check for a well recognized accent mark in the top character buffer. If
this test passes, the pair is then tested to see if together they form an
accented character (an umlaut above an "a" would be valid, whereas a grave
above an "s" would be invalid). If either of these tests fails, then the
accent is thrown away as noise.
Step 5. Recombine accent and base character to yield accented character.
This step changes the ASCII code of the base character to be the resultant
accented character. Following this recombination, the top buffer is
discarded.
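The validity test and recombination can be sketched with Unicode combining marks instead of the extended ASCII codes the text refers to. This is a swapped-in representation, used here only for illustration. The following is a minimal sketch, assuming hypothetical accent labels, a base set taken from the cloud-mask list above, and validity approximated by whether Unicode normalization (`unicodedata.normalize` with "NFC") yields a single precomposed character. A production table would list exactly the combinations the target languages use.

```python
import unicodedata
from typing import Optional

# Hypothetical accent labels mapped to Unicode combining marks.
ACCENTS = {"grave": "\u0300", "acute": "\u0301", "circumflex": "\u0302",
           "tilde": "\u0303", "umlaut": "\u0308"}
# Base characters for which the text says cloud masks exist.
CLOUD_BASES = set("aeinouAEINOU")


def recombine(accent: Optional[str], base: str) -> str:
    """Return the accented character, or the bare base if the pair is invalid.

    accent is None when the top buffer did not recognize as an accent mark;
    in every invalid case the accent is simply discarded as noise.
    """
    if accent not in ACCENTS or base not in CLOUD_BASES:
        return base
    composed = unicodedata.normalize("NFC", base + ACCENTS[accent])
    return composed if len(composed) == 1 else base
```

With these rules, an umlaut over "a" yields "ä" and a tilde over "n" yields "ñ". A grave over "s", or an umlaut over "n", leaves only the base character.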
For the recognition and page composition subsystems to work most
effectively, they must have some indication of the pitch of the text being
processed. That is, whether the text is typed in fixed pitch or
proportional space (P.S.). The following describes how the character
segmentation subsystem computes this pitch information.
In order to determine the pitch of a text region, the pitch of each word is
determined using several sources of information: (1) the variation in the
spacing between the centers of the characters of the word, (2) the
difference in the mean of the character spacing between the word and its
neighbors, and (3) the differences in the pitch determined for the word
and that of its two neighboring words. Using these three sources of
information, the pitch of the text can be determined with an estimated
confidence.
Two scores are established for each word, a fixed pitch score and a P.S.
score, the values of which depend on the first two information sources
mentioned above. The final scores for a word are tallied and the pitch
with the larger score value is selected as the correct pitch. Next, the
confidence measure of this pitch is determined using the third information
source above. The confidence measure can be one of three levels: no
confidence, some confidence, or high confidence.
Procedure (see FIG. 11).
Each time a word has been completed by character segmentation, i.e., has
been broken up into separate characters, four steps are executed to
compute the pitch and its associated confidence level.
Step 1. Two statistics on that word are computed: the mean distance between
the centers of the characters, and the deviations per character from this
mean.
Step 2. Using the statistics from Step 1, the scores for fixed pitch and
P.S. are computed as follows. If the mean distance between characters is
within the range established for fixed pitch, a small weight is added to
the fixed pitch score; if it is out of that range, a large weight is added
to the P.S. score. In the same way, if the deviation per character from
this mean is less than a certain threshold, then additional weight is
given to the fixed pitch score, and if it is greater, then weight is given
to the P.S. score. Based on the values of these two scores, a confidence
measure is given to the estimated pitch.
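Steps 1 and 2 can be illustrated with a short scoring function. The following is a minimal sketch in which every number is a hypothetical placeholder rather than a value from the text: the fixed-pitch range of 24 to 31 pixels between character centres, the deviation threshold, the weights (small = 1, large = 3, deviation = 2), and the mapping from score margin to confidence. The "deviation per character" is taken here as the mean absolute deviation of the centre-to-centre gaps.

```python
from statistics import mean
from typing import List, Tuple

FIXED_RANGE = (24.0, 31.0)          # hypothetical fixed-pitch centre spacing, px
DEVIATION_LIMIT = 1.5               # hypothetical threshold, px
SMALL, LARGE, DEV_WEIGHT = 1, 3, 2  # hypothetical weights


def spacing_stats(centers: List[float]) -> Tuple[float, float]:
    """Step 1: mean centre-to-centre distance, and mean deviation from it.

    Needs at least two character centres.
    """
    gaps = [b - a for a, b in zip(centers, centers[1:])]
    m = mean(gaps)
    return m, mean(abs(g - m) for g in gaps)


def word_pitch(centers: List[float]) -> Tuple[str, str]:
    """Step 2: score fixed pitch vs P.S. and grade the confidence."""
    m, dev = spacing_stats(centers)
    fixed = ps = 0
    if FIXED_RANGE[0] <= m <= FIXED_RANGE[1]:
        fixed += SMALL
    else:
        ps += LARGE
    if dev < DEVIATION_LIMIT:
        fixed += DEV_WEIGHT
    else:
        ps += DEV_WEIGHT
    pitch = "fixed" if fixed > ps else "P.S."
    margin = abs(fixed - ps)
    confidence = "high" if margin >= 3 else "some" if margin >= 1 else "none"
    return pitch, confidence
```

Evenly spaced centres such as 0, 25, 50, 75 score as fixed pitch with high confidence. Centres at 0, 12, 40, 52, 70 (mean gap 17.5, mean deviation 5.5) score as P.S. Spacing that is in range but irregular lands in between, with only some confidence.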
Step 3. In addition to the statistics generated for the given word, those
of the previous and subsequent words are used as well, in particular the
difference in the mean values of the character to character distances, as
well as the actual value of this mean distance. If the difference in the
means is small, then weight is added to the fixed pitch score; if the
difference in the means is large, then weight is added to the P.S. score.
In addition, if the actual mean of the previous or subsequent word is out
of the fixed pitch range, weight is added to the P.S. score; and if either
is within the range, weight is added to the fixed pitch score.
Step 4. The greater of the two scores is chosen as the pitch for the given
word, and the last step is to determine the confidence in this pitch. This
is done using the pitch estimated for the given word, in context with the
pitch of the previous and subsequent words and their associated
confidences as computed in Step 2. Based on the similarity, or difference
of the neighboring pitches and their respective confidences, the final
determination of the pitch and its confidence is made.
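Steps 3 and 4 fold in the neighbouring words. The following is a minimal sketch under stated assumptions. All thresholds and weights are hypothetical. Step 3 applies the two neighbour tests once per neighbour. For Step 4, the text does not give a rule, so a simple agreement count stands in for it and should not be read as the described method: high confidence when every neighbour with some confidence agrees, some confidence when at least one agrees, and none otherwise.

```python
from typing import List, Tuple

FIXED_RANGE = (24.0, 31.0)   # hypothetical fixed-pitch centre spacing, px
MEAN_DIFF_LIMIT = 2.0        # hypothetical: "small" difference in means, px
WEIGHT = 1                   # hypothetical weight per piece of evidence


def add_context(fixed: int, ps: int, word_mean: float,
                neighbour_means: List[float]) -> Tuple[int, int]:
    """Step 3: adjust a word's two scores using its neighbours' mean spacing."""
    for nm in neighbour_means:
        if abs(word_mean - nm) <= MEAN_DIFF_LIMIT:
            fixed += WEIGHT
        else:
            ps += WEIGHT
        if FIXED_RANGE[0] <= nm <= FIXED_RANGE[1]:
            fixed += WEIGHT
        else:
            ps += WEIGHT
    return fixed, ps


def final_pitch(fixed: int, ps: int,
                neighbours: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Step 4: pick the larger score, then grade confidence by agreement.

    neighbours holds the (pitch, confidence) pairs computed in Step 2 for the
    previous and subsequent words. A neighbour with no confidence is ignored.
    """
    pitch = "fixed" if fixed > ps else "P.S."
    agree = sum(1 for p, c in neighbours if p == pitch and c != "none")
    if neighbours and agree == len(neighbours):
        return pitch, "high"
    return pitch, "some" if agree else "none"
```

Keeping Step 3 and Step 4 as separate functions mirrors the text: the scores are adjusted first, and only then is the pitch chosen and graded against the neighbours.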
By using this method, changes in pitch may be detected as soon as they
occur on the page, thus avoiding possible errors.
When two or more characters are touching so that no white path (vertical or
kerned) can be found, character segmentation makes a guess as to the best
location to separate the character images. This guess works well when the
cut is between two serifs. However, in some cases (as in FIG. 13), the
selected cut position is wrong. The following details a technique whereby
these mis-cut characters can be corrected in character recognition by
iteratively making cuts and decisions based on the recognition results.
The input to this process is a recognition buffer which has already been
run through the correlation process.
Process (see FIG. 14):
Step 1. Did character recognize with tight threshold acceptance?
This test checks whether the current character failed to be tightly
accepted by the correlation process with regard to threshold (low score).
Step 2. Is character touching character to its right?
All recognition buffers which are sent from character segmentation to
recognition are linked to the recognition buffers which contain the
characters to the left and the right.
Each of these links contains a flag which specifies whether the two
characters were touching before segmentation was performed (see FIG. 13).
This step tests the current character to see if it was touching the
character to its right.
Step 3. Recombine characters and correlate character to right.
The first portion of this step is a re-creation of the exact appearance of
the touching character pair prior to segmentation in the recombination
buffer. The right character of the pair is then run through the
correlation process to set up its scores for the following steps.
Step 4. Determine new segmentation point.
Based on the scores of the touching character pair, a new segmentation
point is determined. If neither character recognized at all, then the new
segmentation point will be three pixels to the right of the original one.
If one or both were loosely acceptable based on threshold, then the new
segmentation point will be one pixel to the right of the original one.
If processing is currently on a later pass, then the scores are compared
against the previous scores. If there was an improvement in the scores,
then the new segmentation point will be in the same direction as the
previous one, but one pixel further out. If the scores got worse, then the
previous cut direction is investigated. If the previous cut was to the
left, then the new cut will be to the right of the best previous cut
point. If the previous cut was to the right, then processing is done, the
best scores to date are used for the touching character pair, and an
invalid segmentation point indication is sent on to Step 5. Once the new
segmentation point is determined, the new recognition buffers are formed
from the recombination buffer, separated at the new segmentation point.
For the touching character example given in FIG. 13, the result will
finally be the two recognition buffers shown in FIG. 15.
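The search in Step 4 is a local search over cut columns. The following is a minimal sketch of that idea, assuming a caller-supplied `pair_score(cut)` that returns the combined mismatch score (lower is better) of the two images produced by cutting the recombination buffer at a given column. The sketch first probes to the right, using the 3-pixel or 1-pixel opening step from the text, and keeps moving one pixel further while the scores improve. Only if that never helped does it probe to the left. This direction order is a simplification and does not reproduce every branch of the rule described above. `search_cut` and `PairScore` are hypothetical names.

```python
from typing import Callable, Tuple

# pair_score(cut) -> combined mismatch score of the left and right images
# obtained by cutting the recombination buffer at column `cut`; lower is better.
PairScore = Callable[[int], int]


def search_cut(pair_score: PairScore, start: int, first_step: int,
               width: int) -> Tuple[int, int]:
    """Local search for a better cut column between two touching characters.

    first_step is 3 when neither character recognized and 1 when at least one
    was loosely acceptable. Returns (best_cut, best_score); best_cut == start
    means no better segmentation point was found.
    """
    best_cut, best = start, pair_score(start)
    for direction, step in ((+1, first_step), (-1, 1)):
        if direction == -1 and best_cut != start:
            break                      # moving right already helped
        cut = start + direction * step
        while 0 < cut < width:         # both halves must be non-empty
            score = pair_score(cut)
            if score >= best:
                break                  # scores got worse: stop this direction
            best_cut, best = cut, score
            cut += direction           # improved: one pixel further out
    return best_cut, best
```

Because the opening step can be 3 pixels, a better cut lying just one pixel to the right of the original can be skipped. Returning the original cut unchanged plays the role of the "invalid segmentation point" indication checked in Step 5.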
Step 5. New segmentation point valid?
This step checks that the new segmentation point determined in
Step 4 is valid based on the indicator sent from that step.
Step 6. Correlate left and right characters.
The new pair of recognition buffers is then run through the correlation
process to see what kind of improvement, if any, was made.
Step 7. Do both characters now meet the tight threshold requirement?
This step tests to see if at this point, touching character processing has
yielded a pair of tightly acceptable characters based on their low scores.
Step 8. Label left character for page composition.
Touching character processing was not able to resolve the pair of
characters to a high degree of satisfaction, due to factors such as
overlapped characters, noise, etc. The left character of the pair is
labelled as an accepted or rejected character, based on threshold and
separation, and passed on to page composition.
Step 9. Does right character have tight acceptance?
Following the release of the left character, the right character is tested
for tight acceptance based on threshold.
Step 10. Is right character touching character to its right?
If the right character is not acceptable at this time, then it is tested to
see if it is touching the character to its right.
Step 11. Set up for new left and right characters.
If the right character was not acceptable and was touching the character to
its right, then the current right character becomes the new left
character, and the character to its right becomes the new right character.
Thus, strings of touching characters can be resolved into recognizable
characters. Processing continues with Step 3.
Steps 12, 13 and 14. Label characters as accept or reject. Label both
characters. Label right character.
The character in question is labelled as an accepted or rejected character
based on threshold and separation and passed on to page composition.
When dealing with proportionally spaced text, two adjacent small
characters often touch and are therefore segmented as one character by
the character segmentation subsystem. These multiple characters must be
resolved by the recognition subsystem. When a character does not
recognize, one of the additional character retries involves dividing a
single character feature from character segmentation into more than one
character when the page is known to be typed in P.S., either from the font
or from the pitch determined in character segmentation. See FIGS. 17 and 18.
The input to this process is a recognition buffer which has already been
run through the correlation process.
Process (see FIG. 16):
Step 1. Did character recognize with tight threshold acceptance?
This test checks whether the current character failed to be tightly
accepted by the correlation process with regard to threshold (low score).
Step 2. Is page proportionally spaced?
This test determines whether or not the character in question is from a
proportionally spaced page. The determining factors in this decision are
either the font (each P.S. font contains a flag which identifies it as
being proportional space) and/or the pitch determined by the character
segmentation process (previously described).
Step 3. Perform double character test.
This test investigates