|
|
|
| United States Patent | 5583949 |
| Link to this page | http://www.wikipatents.com/5583949.html |
| Inventor(s) | Smith; Raymond W. (Bristol, GB2);
Robson; Christopher J. (Bristol, GB2) |
| Abstract | Optical character recognition is achieved by a system which comprises a
scanner for scanning a document, an edge extractor for identifying edges
in the image produced by the scanner to produce an outline of each object
identified in the image, a segmentation facility for grouping the object
outlines into blocks, means for identifying features of the outlines, and
a final classification stage for providing data in an appropriate format
representative of the characters in the image. Also disclosed are a novel
edge extractor, a novel page segmentation facility and a novel feature
extraction facility. |
|
|
|
Title Information  |
|
|
|
|
|
Drawing from US Patent 5583949 |
|
|
Apparatus and method for use in image processing |
|
|
|
|
|
| Publication Date |
December 10, 1996 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Parent Case |
This is a continuation of application Ser. No. 07/956,593, filed Oct. 5,
1992 now abandoned, which is a continuation of application Ser. No.
373,137, filed Jun. 28, 1989, now abandoned. |
|
| Priority Data |
Mar 03, 1989[EP]89302122 |
|
|
|
|
|
|
|
|
|
|
|
Title Information  |
|
|
References  |
|
|
| *references marked with an asterisk below are user-added references |
|
U.S. References |
|
|
| Add a new US reference: |
| | Reference | Relevancy | Comments | Reference | Relevancy | Comments | 5185809 Kennedy 382/131 Feb,1993 |      Your vote accepted [0 after 0 votes] | | 5131053 Bernzott 382/176 Jul,1992 |      Your vote accepted [0 after 0 votes] | | 4899225 Sasuga 358/448 Feb,1990 |      Your vote accepted [0 after 0 votes] | | 4833721 Okutomi 382/197 May,1989 |      Your vote accepted [0 after 0 votes] | | 4754489 Bokser 382/230 Jun,1988 |      Your vote accepted [0 after 0 votes] | | 4712248 Hongo 382/199 Dec,1987 |      Your vote accepted [0 after 0 votes] | | 4680805 Scott 382/197 Jul,1987 |      Your vote accepted [0 after 0 votes] | | 4566124 Yamamoto 382/185 Jan,1986 |      Your vote accepted [0 after 0 votes] | | 4504969 Suzuki 382/175 Mar,1985 |      Your vote accepted [0 after 0 votes] | | 4213150 Robinson 348/26 Jul,1980 |      Your vote accepted [0 after 0 votes] | | 3925760 Mason 382/310 Dec,1975 |      Your vote accepted [0 after 0 votes] | | | | | |
|
|
|
|
U.S. References |
|
|
Foreign References |
|
|
|
|
|
|
Foreign References |
|
|
Other References |
|
|
|
|
|
|
Other References |
|
|
|
|
|
References  |
|
|
|
|
|
| Market Size |
|
Estimate the gross annual revenues of the relevant market
sector:
|
| | |
| |
|
|
| Market Share |
|
Estimate the percentage of the relevant market sector this invention will capture:
|
| | |
| |
|
|
| Reasonable Royalty |
|
What percentage of gross sales should the inventor or assignee be paid?
|
| | |
| |
|
|
|
Public's "Guesstimation" of Royalty Value
|
| Market Size | N/A | [No votes] | | x | Market Share | N/A | [No votes] | | x | Reasonable Royalty | N/A | [No votes] |
| | N/A | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
Market Review  |
|
|
Technical Review  |
|
|
Claims  |
|
|
We claim:
1. An optical character recognition system comprising:
scanning means for optically scanning a document to produce a grey level
image thereof;
edge extractor means comprising:
identifier means for identifying points along an edge within said grey
level image using grey level values so that said points so identified
represent substantially the strongest edge;
tracking means for automatically tracking the edge using grey level values
to determine if the edge forms a closed loop and if so defining the edge
as an outline,
said identifier means identifying alternate points of the edge if the edge
does not form a closed loop and said tracking means automatically tracking
an alternate edge associated with said alternate points together with at
least some of said points on said strongest edge and determining whether
the alternate edge forms a closed loop and if so defining the alternate
edge as the outline; and
means for producing data indicative of an object based on at least one
outline identified in said image, each outline comprising at least a part
of one character; and
processing means for processing the data provided by said edge extractor
means to produce an output representative of the characters in said image.
2. The system as claimed in claim 1, wherein the outlines identified by
said edge extractor means are subjected to polygonal approximation means
for processing the data prior to further processing by (1) following the
outline of each object to identify a series of points on said outline
having a predetermined characteristic, (2) measuring the depth of the
concavity between each adjacent pair of said points, and (3) removing
concavities having depths less than a predetermined value, whereby the
output of said polygonal approximation means approximates the outlines of
the characters in said image by a sequence of straight lines.
3. The system as claimed in claim 1, wherein the grey level image comprises
a pixel array, said edge extractor being arranged to locate edges by means
of an edge operator which generates edge values for each pixel from the
grey level value of pixels adjacent to said pixel.
4. The system as claimed in claim 3, wherein the edge operator process a
plurality of grey level values and is defined with respect to a 3.times.3
matrix neighborhood of x as follows:
______________________________________
5 6 7
4 X 0
3 2 1
______________________________________
in which the numbers represent the chain codes of adjacent points, the edge
value of an adjacent point with chain code n being:
grey level (n+1) (modulo 8)-grey level (n-1) (modulo 8).
5. The system as claimed in claim 4, wherein each of the edges is traced in
accordance with an algorithm in which, for a point with chain code n, edge
values are derived for points with chain codes n-2, n-1, n, n+1, and n+2
from the current position X, the highest edge value is marked as the
current position X and defines the strongest point, and the process is
repeated to identify and mark successive strongest points.
6. The system as claimed in claim 1, wherein said processing means includes
a page segmentation facility for arranging the data indicative of the
outlines into a desired order and means for identifying particular
features of said outlines.
7. The system as claimed in claim 6, wherein said page segmentation
facility identifies groups of said outlines to form said outlines into
blocks of outlines and arranges said blocks into said desired order.
8. The system as claimed in claim 7, wherein the page segmentation facility
can identify blocks as text or non-text blocks by determining whether the
proportion of spurious characters in the block exceeds a defined threshold
value.
9. The system as claimed in claim 8, wherein the block type is determined
on the basis of a classification of each outline.
10. The system as claimed in claim 9, wherein data representative of the
classification of each outline is provided by an object outline
classification process.
11. The system as claimed in claim 8, wherein the block order is determined
in accordance with the manner in which the blocks nest.
12. The system as claimed in claim 6, wherein the output of said page
segmentation facility comprises data representative of at least one
possible class to which the object represented by said at least one
outline belongs together with a weighting value, said weighting value
indicating the probability that the outline is associated with a
particular class.
13. The system as claimed in claim 1, wherein said edge extractor means
further comprises a feature extraction facility for processing said data
to identify particular features of each outline and to classify the
outlines into classes in accordance with the identified features.
14. The system as claimed in claim 13, wherein the features comprise any
one of concavity, closure, axis, symmetry or line either singly or in
combination.
15. The system as claimed in claim 6, wherein said processing means
includes a final classification stage for making the final classification
of each of said outlines, wherein said final classification stage operates
on the data output by the page segmentation facility using lexical rules,
Markov chain analysis and dictionary search techniques.
16. The system as claimed in claim 15, wherein the output of said final
classification stage is in ASCII code.
17. The system as claimed in claim 1, wherein the operation of the edge
extractor derives an edge value for a pixel based on the grey level values
of pixels immediately adjacent to the pixel based on the direction of
tracking.
18. An image processing system comprising an edge extractor for
automatically producing data indicative of the outline of objects in a
grey level image generated by an optical scanning device, wherein said
outline comprises at least part of one or more characters, said grey level
image comprising a pixel array, said edge extractor comprising means for
sequentially scanning said pixel array to locate a first strongest edge
and for tracing the outline of one of the objects starting with said first
strongest edge and by locating a plurality of additional strongest edges
contiguous with one another to determine whether or not the entire outline
can be traced using only said first and additional strongest edges and
automatically locating alternate edges that are used to complete the
entire outline when said additional strongest edges do not complete the
outline, said edge extractor further comprising edge operator means for
generating edge values for each pixel using grey level values, wherein the
operation of the edge extractor is such that the edge value for a current
pixel is derived from the relative values of the grey level values of the
pixels which are immediately adjacent to the current pixel based on the
direction of tracing.
19. The system as claimed in claim 18, wherein the edge operator processes
a plurality of grey level values and is defined with respect to a
3.times.3 neighborhood matrix of x as follows:
##STR1##
in which the members represent the chain codes of adjacent points, the
edge value of an adjacent point with chain code n being:
grey level (n+1) (modulo 8)-grey level (n-1) (modulo 8).
20. The system as claimed in claim 19, wherein said outline is traced in
accordance with an algorithm in which for a point with chain code n, edge
values are derived for points with chain codes (n-2), (n-1), n, (n+1), and
(n+2) from the current position, the point with the strongest edge values
is marked, and the process is repeated to identify and mark successive
strongest edge values.
21. The system according to claim 18, wherein the edge extractor further
comprises a polygonal approximation facility for approximating the outline
of an object generated by the edge extractor by at least one polygon.
22. An edge extractor according to claim 18, wherein the edge outline is
represented by a closed loop.
23. An image processing system comprising means for scanning a document to
generate an image represented by grey level values and automatically
producing data representative of an outline of an object in the image
comprising an edge extractor for tracing said outline by generating an
edge value for a pixel derived from the grey level values of the pixels
which are immediately adjacent to said pixel based on the direction of
tracing, said edge extractor locating a strongest edge based on said edge
values and tracing said outline by locating additional strongest edges if
said additional strongest edges in combination complete the outline, said
edge extractor automatically locating an alternate edge to complete the
outline when said additional strongest edges do not complete the outline,
said system further comprising a page segmentation facility for
identifying groups of outlines to form said groups into blocks of outlines
and for arranging said blocks into order, wherein said outlines are
representative of at least a portion of one or more characters.
24. The system as claimed in claim 23, wherein the page segmentation
facility can identify blocks as text or non-text by determining whether a
proportion of spurious characters in the block exceeds a defined threshold
value.
25. The system as claimed in claim 23, wherein the block type is determined
on the basis of a classification of each outline.
26. The system as claimed in claim 25, wherein the data representing the
classification of each outline is provided by an object outline
classification process.
27. The system as claimed in any claim 24, wherein the block order is
determined in accordance with the manner in which the blocks nest.
28. An image processing system comprising means for scanning a document to
generate an image represented by grey level values and automatically
producing output data representative of an outline of an object in the
image, the system comprising an edge extractor for tracing said outline by
generating an edge value for a pixel derived from the grey level values of
pixels immediately adjacent to said pixel relative to the direction of
tracing, said edge extractor locating a strongest edge based on said edge
values and tracing said outline by locating additional strongest edges if
said additional strongest edges in combination complete the outline, said
edge extractor automatically locating an alternate edge to complete the
outline when said additional strongest edges do not complete the outline,
said system further comprising a facility for processing said data to
identify particular features of each outline and to classify the outlines
into classes of characters likely to represent the character for each
outline in accordance with the identified features, and for classifying as
spurious outlines those outlines not representing a character, wherein the
features identified comprise one or more of concavity, closure, axis,
symmetry and line.
29. The system as claimed in claim 28, wherein said output data is produced
by said edge extractor.
30. The system as claimed in claim 28, wherein the classification data
includes a rating or weighting value indicative of the strength of the
match of the outline to the class with which the object has been
associated.
31. A method of processing a document to produce data representative of
characters on said document, said method comprising the steps of:
scanning said document to produce a pixel array of grey level values
representing a grey level image of the document;
scanning the grey level values and for each grey level value comparing the
grey level values of pixels immediately on either side of the pixel being
scanned;
locating a first edge associated with one of the pixels based on the grey
level values so compared;
automatically locating additional edges and tracing a strongest edge in
said grey level image representing an outline comprising at least a
portion of one or more characters in said image by comparing the grey
level values immediately adjacent the pixel associated with each said
additional edge; and
determining whether or not said strongest edge forms a complete outline;
automatically identifying an alternate edge that does form a complete
outline when said strongest edge does not form a complete outline, wherein
the alternate edge is identified by comparing grey level values;
processing data representing said strongest and said alternate edges that
form complete outlines to provide data representative of the characters in
said image, said processing step comprising automatically segmenting and
classifying the data into blocks representative of a plurality of
outlines, each outline comprising at least a portion of one or more
characters.
32. The method as claimed in claim 31, wherein the data is subjected to a
polygonal approximation step.
33. The method as claimed in claim 31, wherein said processing step
comprises identifying a feature or features of said complete outlines and
providing data representative of a class to which the object represented
by said complete outlines belong.
34. A method as claimed in claim 33, wherein said features include
concavities and closures. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
This invention relates generally to the field of optical character
recognition.
BACKGROUND OF THE INVENTION
Optical character recognition devices read text from a printed page and
produce an output representative of the text in an appropriate format,
e.g. ASCII code. There are two basic techniques available in optical
character recognition. One is based on matrix matching and the other on
feature extraction. Matrix matching involves comparing a bit mapped image
of a character with a set of templates. A major disadvantage of matrix
matching is that it is limited to certain fonts and is very susceptible to
character spacing. This is a particular drawback in relation to
proportionally spaced characters.
Feature extraction involves recognition of selected features of characters
and comparison of those features with stored versions of the features on
which the system has been trained. Techniques based on feature extraction
have the advantage that they are better able to read many different fonts.
However known feature extraction methods are complex to implement and have
a slow operating speed.
The present invention provides novel aspects which have particular
application to optical character recognition based on feature extraction.
Some of these methods and apparatus have more general application in the
field of image processing.
SUMMARY OF THE INVENTION
A first aspect of the present invention provides an optical character
recognition system comprising scanning means for optically scanning a
document to produce an image thereof, means for identifying edges within
said image and for identifying edge paths to produce data indicative of
the outline of each object identified in said image, and means for
processing the data provided by said edge identifying means and arranged
to provide data, in an appropriate format, representative of the
characters in said image.
A second aspect of the present invention provides an edge extractor for
producing data indicative of the outline of objects in a grey level image
generated by an optical scanning device and which comprises a pixel array,
said extractor being arranged to sequentially scan said image array for an
edge and for each edge located trace an outline of the object associated
with said edge, said edge extractor being arranged to locate edges by
means of an edge operator which generates edge values for each pixel.
A third aspect of the present invention provides an image processing system
of the type which scans a document and which produces data representative
of the outline of each object identified on the document, said system
including a page segmentation facility for identifying groups of said
object outlines to form said outlines into blocks of outlines and for
arranging said blocks into order.
A fourth aspect of the present invention provides an image processing
system of the type which scans a document and which produces data
representative of the outline of each object identified on the document,
said system including a facility for processing said data to identify
particular features of each outline and to classify the outlines into
classes in accordance with the identified features.
A fifth aspect of the present invention provides a system for processing a
document to produce data representative of characters on said document in
which said document is scanned to produce an image thereof and an edge
identifying means processes the image data to produce data representative
of the outline object identified in the image, the system including a
polygonal approximation facility for reducing the number of irregularities
in each outline said polygonal approximation facility operating by
following the outline of each object to identify a series of points on
said outline having a predetermined characteristic, measuring the depth of
the concavity between each adjacent pair of points and removing
concavities having depths less than a given value.
A sixth aspect of the present invention provides a method of processing a
document to produce data representative of characters on said document,
said method comprising scanning said document to produce an image thereof,
identifying edge paths in said image, and processing data representing
said edge paths to provide data, in an appropriate format, representative
of the characters in said image.
A seventh aspect of the present invention provides a method for processing
data representing the edge outlines in an image of a document to produce
polygonal approximation of said outline, said method comprising following
each outline to identify a series of points along said outline which have
a predetermined characteristic, measuring the depth of the concavity
between each adjacent pair of points and removing concavities having
depths less than a given value.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be described now by way of example only with particular
reference to the accompanying drawings. In the drawings:
FIG. 1 is a block schematic diagram of an optical character recognition
system in accordance with the present invention;
FIGS. 2(A), 2(b), 3(a), 3(b) and 4 illustrate the operation of the system
of FIG. 1;
FIG. 5 is a block schematic diagram of an edge extractor which is used in
the system of FIG. 1;
FIGS. 6(a)-6(c), 7(a)-7(d), 8(a)-8(d), 9(a)-9(d), 10(a), 10(b), 11(a) and
11(b) illustrate the operation of the edge extractor of FIG. 5;
FIG. 12 is a block schematic of the text ordering part of the system of
FIG. 1;
FIGS. 13 and 14 illustrate the operation of the polygonal approximation
facility;
FIGS. 15(a), 15(b), 169(a), 16(b), 17(a), 17(b), 18(a), 18(b), 19(a),
19(b), 20, 21(a) and 21C. illustrate the way in which character features
are identified;
FIG. 22 is a block schematic diagram of the feature extraction, text
ordering and segmentation, and final classification part of the system of
FIG. 1;
FIGS. 23(a)-23(d) show several examples of the way in which a letter "E"
can be represented;
FIG. 24 is a functional representation of the template generation facility;
FIGS. 25(a), 25(b), 26, 27(a)-27(d) illustrate the operation of the
template generation facility;
FIGS. 28(a), 28(b), 29, 30 and 31 illustrate the operation of the
segmentation and final classification stages.
DETAILED DESCRIPTION
Referring to FIG. 1 an optical character recognition system comprises a
scanner and scanner interface 10, an edge extractor 11, a polygonal
approximation means 13, a text ordering facility 12 which operates in
conjunction with a feature extractor 14 and a merger/segmenter 15 and a
final classifier 16. The scanner can be any suitable commercially
available device capable of producing a grey level image. One such device
which can be used is a Hewlett-Packard HP9I90A scan-jet device. This
device produces a 16 grey level image with a resolution of 300 pixels per
inch. At this resolution each A4 page can be read in approximately 30
seconds and requires a storage capability in excess of 4 megabytes
(Mbytes). This type of device makes use of a linear charge couple device
array as the light sensing element and a typical grey level image for the
letter "a" is shown in FIG. 2. FIG. 2A shows the image form and FIG. 2B
the form of the image in hexadecimal values. In these values 0 represents
black and F represents white, the values 1 to 9, A to E in between
representing various levels of grey.
The edge extractor 11 operates by raster scanning the grey level image to
locate edges. When an edge is found it is followed all the way around
until a closed loop is formed thereby forming an outline of an object,
e.g. character associated with the edge. If the edge does not form a
closed loop it is ignored. A significant feature of the present optical
character recognition arrangement is the use of a grey level image in
conjunction with an edge extractor 11. The following are important
advantages provided by this arrangement.
1. Looking for edges rather than black regions provides the ability to read
any color of text on the background of any color provided that the
contrast is sufficiently great. This is not possible by thresholding
without manually selecting for positive or negative print.
2. In threshold type arrangements, when an image is thresholded non-text
portions of the image form the same kind of black regions on a
white-background as text. This causes the character cell location
algorithm to be confused. With a grey level image, the vast majority of
edges in the non-text region do not form closed loops and are therefore
rejected. Any edges which do form closed loops will either not look like
any known character, or will fail to form a reasonable text line. Hence
grey level images provide discrimination between text and non-text.
3. Thresholded images have inadequate quality for use either in document
storage or in new documents. Half-toning algorithms for scanning destroy
text; therefore to obtain the text and images from a page in one scan,
grey scale is essential.
4. Apart from a simple raster scan of the entire image from the scanner,
the edge extractor examines only the edges of characters. The number of
pixels in the image that are analyzed in detail is thus reduced by about
90%, thereby making it possible to process a page in a reasonable time.
The output of the edge extractor is a long sequence of steps between
neighboring pixels representing the outlines of the characters on a page
which have been scanned. Prior to transmission to the text ordering
facility this output from the edge extractor 11 is subjected to a
polygonal approximation step in which the outline of the characters are
approximated by sequences of straight line segments, i.e. polygons. There
are two main reasons for approximating the outlines prior to passing them
on to the next stage.
1. Both polygonal approximation and the processes used in feature
extraction consume linear time with respect to the number of points on the
input polygon. The constant of proportionality however is much larger in
the case of feature extraction. Consequently, polygonal approximation
results in considerable time saving.
2. It will become apparent that one of the most important features used in
character recognition is the presence of concave regions in the outline.
Polygonal approximation straightens out the zig-zags in the outline which
would otherwise produce false concavities and confuse the recognition
process.
An example of the polygonal approximation is shown in FIG. 3A and 3B. FIG.
3A represents the typical edge path of a character "a" and FIG. 3B is the
equivalent outline after It has been subjected to polygonal approximation.
The output of the edge extractor 11 can be considered as a series of image
blobs each of which represents the edge path of an object typically a
character together with information on its bounding rectangle. This is the
main input to the text ordering process 12, feature extractor 14, and
merger segmenter 15. The function of the feature extractor 14 is to
extract from the outline of the characters certain fundamental features.
The particular features which have been selected for extraction by the
present arrangement are as follows:
1. Concavities. In general terms a concavity is an area that can be filled
and which may include some convex points. Three concavities are
illustrated by circles 20 in FIG. 4.
2. Closure. A closure is a region of the background which is completely
enclosed by the character. In FIG. 4 a closure is shown by circle 21. In
certain fonts characters such as "p" and "e", are printed with an
incomplete closure. It is the presence of functional closure rather than
physical closure which is to be detected.
3. Lines. A line is a straight part of the outline. Line detection is
complicated by the polygonal approximation since curves are converted into
a small number of straight lines. They are thus rendered less distinct
from actual straight lines.
4. Axes. An axis is sought for objects which have no concavity or closure.
The axis feature measures the ratio of the lengths of the major and minor
axes of a convex object. It is used to distinguish between characters such
as a dot and a straight comma.
5. Symmetry. Symmetry is measured by comparing parts of the character on
either side of an axis. Measurement of symmetry is difficult in the case
of italicized text, where the axis of symmetry is not perpendicular to the
direction of measurement.
In broad outline, using the features set out above, an unknown character is
matched by the feature extractor 14 with a generic character class, i.e.
it produces a class or set of ASCII characters which the blob could be.
The mapping of generic character classes to ASCII characters is not
one-to-one. A small number of characters appear in more than one
fundamental shape, for example "g". Such characters must have more than
one class which map to the same ASCII character.
A more common case is the single class which can map one of several
characters. In certain fonts for instance, the characters "1" (one), "l"
(lower case l) and "I" (capital I) are identical. The only way to
distinguish them is by context. At the feature extraction stage there will
be several generic classes to cover the various physical shapes of these
three characters. Most of these classes will map to one of two or three
ASCII characters, depending on the particular shape.
The main function of the text ordering process 12 is to identify the
document structure and extract the correct ordering of the text within the
document. The principal steps in text ordering are as follows:
A. Identify separate regions (blocks) within the document.
B. Classify the blocks into text or graphics.
C. Order text blocks into logical reading sequence.
D. Order text blocks into character order.
E. Deduce white space characters.
F. Associate parts of deliberately broken characters.
G. Identify potentially merged characters.
H. Resolve intrinsic character ambiguities
(those which are a feature of the font) by geometry.
The merger segmenter 15 makes an attempt to merge the outlines of two or
more blobs which have been identified as parts of an unintentionally
broken character or to segment the outline of a blob which has been
identified as the ligature of two or more characters. This process gives
the system the capability of reading relatively poor copies such as that
output from a photocopier.
The final classifier 16 makes the final decision as to the ASCII character
associated with each blob by resolving any outstanding ambiguities, for
example, the final classification can discriminate upper and lower case
characters by using contextual information. It can also use context to
resolve difficult ambiguities such as distinguishing between the number
"1" a small letter "l" and a capital "I". The techniques available to the
final classifier 16 include lexical rules (such as no numbers within
words), Markov chain analysis and dictionary search.
The above is a general outline of the overall structure of the optical
character recognition system. Certain elements of this overall structure
will now be described in more detail.
As has been explained earlier an important element in the present system is
the edge extractor. The edge extractor is represented functionally in FIG.
5 of the drawings. The input to the edge extractor is a grey level image
which comprises an array of pixels which are stored as shown at 28. This
array is raster scanned as indicated at 30 to locate an edge, that is a
significant difference in grey level between neighboring pixels. Once an
edge is located the edge extractor operates to trace an outline as
indicated at 31. The tracing continues until a closed loop edge path has
been located as indicated at 32. This closed loop path is approximated as
indicated at 33 to produce a polygonal approximation. The outlines are
associated as indicated at 34 to produce a blob outline of an object which
is fed to the text ordering process. The edge extractor 11 uses a novel
edge operator to locate and follow edges around outlines until they form a
closed loop. This operator is designed to find the edge strength
perpendicular to the direction of travel between neighboring pixels. FIG.
6 illustrates the way in which the edge extractor 11 works. FIG. 6A shows
a 3.times.3 array of pixels it being understood that this is a small part
of the image of a document which has been scanned which will typically be
2,400.times.3,300 pixels. As has been indicated the edge extractor raster
scans the image and "X" represents the current point on the scan. It will
be seen that on FIG. 6A the chain code of each point in the neighborhood
of "X" is given by a number in the box corresponding to that pixel. If "X"
is the present position then the edge value at the next position with
chain code "N" is the difference between the grey levels at points with
chain codes (N+1) (modulo 8) and (N-1) (modulo 8). Thus referring to FIG.
6A the edge value at the point with chain code 0 is found by subtracting
the grey level at 7 from the grey level at 1. This step is illustrated in
FIG. 6B; FIG. 6C shows a similar representation of the operator for chain
code 7. It should be noted that there is only a single operator for each
direction and that the result is signed. An edge is indicated when there
is an edge value greater than a specific threshold.
Once an edge is located an edge following algorithm comes into effect. The
main process is to apply an edge operator corresponding to chain code 4 in
a raster scan of the image. The raster scan thus locates the top edge of a
new object which typically will be a character of text. When such a point
is located with an edge value greater than a specific starting threshold
and of either sign, the raster scan repeats a check one pixel lower in the
image. When an edge is located there must be an edge of the same sign and
strength greater than the starting threshold over a 2 by 2 cell-to ensure
that the edge is significant. The light and dark grey levels from the
strongest of these edges are used to determine a center value which is the
average of the light and dark grey levels. On detection of an edge raster
scanning is suspended and the line tracking process is started from the
point with the higher edge value. The double check is provided to ensure
that the edge is significant and that tracking starts at the strongest
edge value. A region of grey values centered on the center value will be
regarded during outline following as "grey", those above as "white" and
those below as "black".
The tracking algorithm operates essentially as follows. Line tracking
commences by tracking left from the starting point, i.e. from the point
shown as chain code 4 on FIG. 6 and attempts to complete an anti-counter
track of an outline. Given the chain code N of the last direction taken,
edge values are found at each of the five points with chain codes N-2,
N-1, N, N+1 and N+2 (modulo 8) from the current position. We define the
five or less points which have the correct sign and are above the
threshold, the edge set corresponding to this position and direction. From
the edge set the point with the strongest edge value is found. The choice
of the next point is non-trivial, but in most cases is the point with the
strongest edge value.
This is illustrated in FIG. 7 of the drawings. In FIG. 7A there is shown a
5.times.5 array of pixels and the values in each square of the array
represents the actual grey level values for part the character "a" which
was illustrated in FIG. 2 of the drawings. Assuming that the current point
is that circled, i.e. point 80 and the last move was in direction 4 in
FIG. 6, then the edge values which would be calculated at the circle 80
are shown in FIG. 7B. The next step is to the rounded square 81
corresponding to edge value 13 shown in FIG. 7B. The edge values shown in
FIG. 7C are those calculated at this square. FIG. 7D demonstrates how the
calculated edge values change with the last chain code and represent the
edge values at the diamond 82.
FIG. 7 shows that at each step there are several useful edge values. As
long as the edge following process is in progress, points on the edge path
are marked current to indicate that they form part of the outline being
processed. If only one point is marked at each step the raster scan will
later find unused points on the same edge and attempt to follow the
outline again. Therefore as many points as possible (up to a limit of
three) are marked subject to the following constraints:
A. The strongest edge is always marked.
B. All points to be marked must be in the edge set and have edge values
over the tracking threshold.
C. All marked points must be adjacent.
Thus in FIG. 7 the points marked will be 13, 10 in FIG. 7B, 11, 12 in FIG.
7C, and 12 in FIG. 7D assuming a threshold of 4.
If at any time no points are found over the tracking threshold then the
edge is considered to have faded, the object is discarded and the whole
edge is remarked as deleted. The tracking process continues until the edge
fades, a closed loop is completed, or the edge coincides with a previously
completed outline. When an edge closes a satisfactory loop, all points on
the loop are remarked used. The reason for changing all the marks when an
edge is complete is to enable the discrimination between collisions with
other lines and closing up the loop.
FIG. 7 also illustrates that there are instances when the next step is not
an obvious choice. For example at the rounded square in FIG. 7C the
highest edge values are 12 and 11. In many cases the choice has an
insignificant impact on the result, although this is not necessarily
always the case. The choice can have an effect in the following
situations.
A. Line cross over: In some fonts lines are of variable width. At the
thinnest part of the line it may be possible to follow an edge path across
the line and thus incorrectly break-up the character.
B. Object merger: Some characters may be printed in very close
juxtaposition. At the point when the two characters almost touch, there
can be a two-way decision to be made when traversing either character.
Both decisions must be correct otherwise the outline of one will collide
with the other, resulting in the loss of one or both characters.
C. Fading: Where an edge is blurred it can be necessary to take a path
corresponding to a weaker edge value in order to avoid the edge fading at
a subsequent step.
The problems are exemplified in FIG. 8 of the drawings. FIG. 8A shows a
portion of the character "a" which is illustrated in FIG. 2 of the
drawings, particularly that part of the "a" at the point where the top of
the loop meets the stem. The darkest pixels are shaded. FIG. 8B gives the
edge values at the circled point 90, showing two points with an edge value
11. The edge sets from each of these points are shown in FIGS. 8C and 8D
with the possible destinations indicated by a rounded square and a diamond
in FIG. 8A. The rounded square is the correct path and the diamond is on
the wrong side of the loop.
Problem B referred to above is illustrated in FIG. 9. FIG. 9A shows the
correct outlines for a pair of closely spaced characters. The starting
point for following each outline is marked with a +sign. In FIG. 9B a
different choice was made while tracing the "t" resulting in a merging of
the characters. The same path was followed in FIG. 9C, but with a
different decision on the second path through the difficult region. The
result is that the "i" is completed, but the edge did not close at the
start of the loop and there fore both characters were lost. FIG. 9D shows
the situation where the "t" was followed correctly, but the same route was
taken for the "i" as in FIG. 90 so that it collided with the complete "t"
and was deleted. A solution to this problem is to look ahead several steps
to edge values beyo | | |