|
Claims  |
|
|
What is claimed is:
1. A method for processing a scanned first image in a digital computer to
identify a location of non-transparent hand marks and hand marked regions,
the method comprising the steps of:
(a) performing a plurality of morphological OPENING operations on the first
image using a first set of structuring elements, each OPENING operation
using a different structuring element from said first set, and taking a
UNION of the results of said plurality of OPENING operations to form a
second image; and
(b) creating bounding boxes around mark-up lines in the second image, said
bounding boxed identifying the hand marked regions.
2. The method as recited in claim 1, further comprising decreasing the
resolution of said first image prior to performing step (a).
3. The method as recited in claim 2, wherein step (a) further comprising
the steps of:
reducing an intermediate image formed by taking the UNION of said plurality
of OPENING operations by two times, using a threshold of 1;
performing a second plurality of morphological OPENINF operations on the
intermediate using the first set of structuring elements; and
taking a UNION of the results of said second plurality of OPENING
operations to form said second image.
4. The method as recited in claim 1 wherein some of the bounding boxes
having less than a predetermined size are classified as small bounding
boxes, and wherein the small bounding boxes are eliminated.
5. A method for processing a scanned first image in a digital computer to
identify a location of non-transparent hand marks and marked regions, the
method comprising the steps of:
(a) CLOSING the first image with a first horizontal structuring element,
forming a second image;
(b) XORing the first image with the second image, forming a third image;
(c) DILATING the third image with a solid square structuring element,
ANDing the DILATION of the third image with the first image, forming a
fourth image;
(d) XORing the fourth image with the first image, forming a fifth image;
(e) performing a plurality of morphological OPENING operations on the fifth
image using a first set of structuring elements, each OPENING operations
using a different structuring element from said first set, and taking a
UNION of the results of said plurality of OPENING operations, forming a
sixth image;
(f) OPENING the sixth image with a second horizontal structuring element,
forming a seventh image;
(g) reducing the seventh image by a first predetermined factor, forming an
eighth image;
(h) CLOSING the eighth image with the first set of structuring elements
forming a ninth image;
(i) performing a plurality of morphological OPENING operations on the ninth
image using the first set of structuring elements, each OPENING operation
using a different structuring element from said first set, and taking a
UNION of the results of said plurality of OPENING operations, forming a
tenth image;
(j) reducing the tenth image by a second predetermined factor, forming an
eleventh image;
(k) filling bounding boxes of the eleventh image, forming a twelfth image;
(l) expanding the twelfth image to a full scale; and
(m) ANDing the twelfth image with the first image, extrapolating the hand
marked regions.
6. A topological method for processing an image in a digital computer for
extraction of regions of a document image entirely encircled by
non-transparent hand marks, the method comprising the steps of:
(a) flood filling the document image from edges of said document image;
(b) bitwise inverting the flood-filled document image;
(c) flood filling the inverted document image produced in step (b) from
edges of said inverted document image; and
(d) bitwise inverting the document image produced in step (c).
7. A semitopological method for processing an image in a digital computer
for extracting regions of a document image entirely encircled by
non-transparent hand marks, the method comprising the steps of:
(a) flood filling the document image from edges of said document image;
(b) bitwise inverting the flood-filled document image;
(c) OPENING the document image produced in step (b) using a solid
structuring element of a first predetermined size;
(d) CLOSING the document image produced in step (c) using a solid
structuring element of a second predetermined size; and
(e) ANDing the CLOSED document image produced in step (d) with the original
document image.
8. The method as recited in claim 7 wherein the first predetermined size is
a size less than a size of a character.
9. The method as recited in claim 7, wherein the second predetermined size
is a size at least as large as a size of a character.
10. The method of claim 7, wherein prior to performing step (e) the CLOSED
document image produced in step (d) is ERODED with a structuring element.
11. A method for processing a document image in a digital computer for
identification in the document image of hand drawn lines comprising the
steps of:
(a) OPENING the document image using at least one structuring element;
(b) finding bounding boxes around image units in the OPENED document image
produced in step (a); and
(c) testing the document image produced in step (b) to identify bouding
boxes containing hand drawn lines.
12. The method as recited in claim 11 further comprising reducing the
document image prior to performing step (a).
13. The method of claim 11, wherein said at least one structuring element
used in step (a) is a horizontal structuring element so that horizontal
lines are identified in step (c).
14. The method of claim 11, wherein said at least one structuring element
used in step (a) is a vertical structuring element so that vertical lines
are identified in step (c).
15. The method of claim 11, further comprising the steps of:
(d) performing a fill operation by using a result of step (c) as a seed and
the original image as a clipping mask; and
(e) XORing a result of step (d) with the original document image so as to
remove hand drawn marks from the document.
16. The method of claim 11, wherein said testing performed in step (c)
includes comparing an asperity ratio of said bounding boxes to a threshold
asperity ratio.
17. The method of claim 11, wherein said testing performing in step (c)
includes comparing a fraction of ON pixels in each bounding box with a
threshold fraction.
18. A method for processing a document image in a digital computer for
identifying hand drawn encircled regions of the document image, the method
comprising the steps of:
(a) performing an OPENING operation on the document image;
(b) finding bounding boxes of image units in the OPENED document image
produced in step (a);
(c) testing the document image produced in step (b) to identifying hand
drawn lines; and
(d) performing a fill operation by using a result of step (c) as a seed and
the original image as a clipping mask.
19. The method as recited in claim 18, further comprising decreasing the
resolution of said document image prior to performing step (a).
20. The method as recited in claim 18 wherein the OPENING operation in step
(a) uses horizontal structuring elements.
21. The method as recited in claim 18 wherein the OPENING operation in step
(a) uses vertical structuring elements.
22. The method of claim 18, further comprising the steps of:
(e) finding bounding boxes of the encircled hand drawn regions produced in
step (d);
(f) extracting the encircled regions from the original document image so as
to identify contents of the encircled regions; and
(g) XORing a result of step (d) with a result of step (f) so as to extract
the contents of the encircled regions. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
A portion of the disclosure of this patent document contains material which
is subject to copyright protection. The copyright owner has no objection
to the facsimile reproduction by anyone of the patent disclosure, as it
appears in the U.S. Patent and Trademark Office patent files or records,
but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
Field of the Invention
This invention relates generally to image processing and more specifically
to a morphological technique and apparatus for: discriminating between
regions of a document which have been hand marked with an ordinary pen or
other ordinary writing utensil and, regions of a document which have not
been marked; and extracting a desired region.
Description of the Related Art
The ability to identify hand marks made from ordinary writing utensils, and
the regions to which they are meant to refer, is commercially useful for
many applications in which an electronic image of a paper document is
produced with an optical image scanner. For example, regions may be marked
for the purposes of:
identifying text to be sent to an optical character recognition (OCR)
system, for the purpose of retrieval of the ASCII representation and
identification of fields or key words for database filing;
identifying parts of an image that are not to be sent to an optical
character recognition system;
identifying parts of an image that are to be stored as a bitmap image; and
identifying a region of a form for which some action is to be taken.
Identification of certain portions of a document for image processing has
been accomplished in the prior art by using a highlighter pen which
provides a discriminated gray-scale reading between the highlighted
region, the dark letter type and the light page background. However, only
bright, transparent highlighter type pens can be used which provide the
proper reflective characteristics to distinguish the highlighting from
other marks on the document. For example, in U.S. Pat. No. 5,048,109, to
Bloomberg et al., a method was disclosed for detecting regions of a
document image that have been highlighted with a transparent color
highlighter pen. The method requires the use of a grayscale scanner, a
bandpass and a threshold filter, and binary image processing. One major
drawback of the invention disclosed in U.S. Pat. No. 5,048,109 is that the
image must be marked with a color highlighter pen. In the present
invention, a method and apparatus are described for detecting regions of a
document image that have been marked with an ordinary pen or pencil.
U.S. Pat. No. 5,029,224 to Fujisawa describes a marked region recognition
apparatus which recognizes an arbitrary marked region of a document image
from a mark signal which indicates whether or not there exists a mark for
delineating the marked region. The marked region recognition apparatus
comprises a first storing part for storing a mark signal for at least one
scanning line, a second storing part for storing a coordinate in a main
scanning direction where the mark region ends for each scanning line based
on the mark signal stored in the first storing part, and a recognition
part for recognizing an inside and an outside of the marked region and for
producing a marked region signal which indicates the inside or the outside
of the marked region for a present scanning line contiguous with the
marked region signal of a previous scanning line, where a state of the
marked region signal of a previous scanning line is obtained from the
first and second storing parts.
U.S. Pat. No. 5,016,096 to Matsunawa et al. describes an apparatus capable
of detecting color marking on a multicolored document, and then of
performing specific image processing on the inside or outside of the
region designated by the color marking. A region extraction circuit
detects a region marked by a specific color marking by sending a pulse
when the color marking is detected during a scan. The duration between
pulses thus provides the width of the marked region.
U.S. Pat. No. 4,720,750 to Watanabe describes an image forming apparatus
for an electronic copying machine which is capable of designating and
erasing any part of an original document. A masking area is drawn on an
area designation sheet which is then read and stored by the copying
machine. The original document is then placed on the copying machine and
the marked/mask area is erased, i.e., not copied, in accordance with the
stored mask specification from the area designation sheet.
U.S. Pat. No. 4,908,716 to Sakano describes an image processing apparatus
in which an area of a document is designated by a marking entered on the
document and a portion encircled by the marking is treated as a marked
area which is the subject of a trimming or a masking process. A color felt
pen or the like is used to encircle a designated area of a document. Then,
a mark detection circuit can detect the marking by detecting the tone of
the image. The disparate reflectivity or tone of the marker pen allows
marker area detection, whereupon, the marked area can be erased or
maintained as desired.
SUMMARY OF THE INVENTION
It is an object of the invention to overcome the above and other
disadvantages of the prior art by providing improvements to methods and
apparatus for image markup detection by hand marking using a pen, pencil
or other ordinary writing utensil.
It is another object of the invention to provide improvements to methods
and apparatus for image markup detection of hand marks which work on
binary scanned images and which utilize binary morphological image
processing techniques to expedite the detection process.
It is yet another object of the invention to provide improvements to
methods and apparatus for image markup detection of hand marks which
utilize detection without the use of either a highlighter pen or a
grayscale scanner.
It is further an object of the invention to provide improvements to methods
and apparatus for image markup detection of hand marks which do not
require extraneous detection circuitry to operate properly.
A method for processing a scanned first image in a digital computer for
differentiating machine marks from hand marks and identifying a location
of non-transparent hand marks and hand marked regions, includes the steps
of: identifying and differentiating markings on the scanned first image as
hand and machine marks using characteristics of the markings. The
characteristics utilized include horizontal, vertical, oblique, curved and
irregular shapes of the markings. Regions of the scanned first image to
which the hand marks refer are identified and then reproduced without
interference from the hand marks.
In one embodiment, hand marks are located by performing a plurality of
morphological OPENING operations on an image using a set of structuring
elements. Each OPENING operation uses a different structuring element from
the set. A UNION is then taken of the results of the OPENING operations to
produce an image containing substantially only the hand marks. Bounding
boxes can be created around the hand marked regions of the document using
the identified hand marks, or further processing can be performed to
further clarify the hand marks.
In another embodiment, regions of an image entirely encircled by hand-marks
are extracted from the image by using a series of flood-filling and
bitwise inversion operations on the encircled regions. Alternatively, the
image can be OPENED, and then CLOSED using solid structuring elements of
different sizes after the image is flood-filled and bitwise inverted once.
The CLOSED document image produced is then ANDed with the original image
to extract the encircled portions.
Another embodiment identifies hand drawn lines in an image by OPENING the
document using a structuring elements, finding bounding boxes around image
units in the opened document image, and then testing the bounding boxes to
identify bounding boxes which correspond to hand drawn lines. The
orientation of the structuring elements can be varied depending on the
type of line (e.g., vertical or horizontal) desired to be identified.
Further processing can then be performed to locate and extract the test
associated with the hand drawn lines.
The scope of the present invention and the manner in which it addresses the
problems associated with prior art will become more readily apparent from
the following detailed description when taken in conjunction with the
accompanying drawings and claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be described in detail with reference to the following
drawings in which like reference numerals refer to like elements. The
drawings are not drawn to scale, rather, they illustrate the sequential
image processing of a scanned first image according to various embodiments
of the claimed invention.
FIG. 1 is a flowchart of a first preferred embodiment according to the
invention showing the steps in a direct approach to a method of
identification and extraction of hand markup lines in an optically scanned
first image.
FIG. 2 is a flowchart of a second preferred embodiment according to the
invention showing the steps of another direct approach to a method of
identification and extraction of hand markup lines in an optically scanned
first image.
FIG. 3 is a flowchart utilizing binary logic symbols for a third preferred
embodiment according to the invention showing the steps of an indirect
approach to a method of identification and extraction of hand markup lines
in an optically scanned first image.
FIG. 4 is a block diagram of a preferred embodiment of an apparatus which
identifies and extracts hand markup lines in an optically scanned first
image according to the invention.
FIG. 5 is an example of a scanned first image of a page of a document with
markup lines hand drawn with an ordinary pen.
FIG. 6 is an example of a second image formed by of the indirect markup
detection method of FIG. 3, the second image resulting from a
morphological CLOSING operation of the first image of FIG. 5.
FIG. 7 is an example of a third image formed by the indirect hand markup
detection method of FIG. 3, the third image resulting from XORing the
first image of FIG. 5 with the second image of FIG. 6.
FIG. 8 is an example of a fourth image formed by the indirect hand markup
detection method of FIG. 3, the fourth image resulting from logically
ANDing the DILATION of the third image of FIG. 7 with the first image of
FIG. 5.
FIG. 9 is an example of a fifth image formed by the indirect hand markup
detection method of FIG. 3, the fifth image resulting from XORing the
fourth image of FIG. 8 with the first image of FIG. 5.
FIG. 10 is an example of a sixth image formed by the indirect hand markup
detection method of FIG. 3, the sixth image resulting from an UNION of
OPENINGS from the fifth image of FIG. 9.
FIG. 11 is an example of a seventh image formed by the indirect hand markup
detection method of FIG. 3, the seventh image resulting from morphological
OPENINGS of the sixth image of FIG. 10.
FIG. 12 is an example of an eighth image formed by the indirect hand markup
detection method of FIG. 3, the eighth image resulting from a reduction of
the seventh image of FIG. 11 by a factor of four.
FIG. 13 is an example of a ninth image formed by the indirect hand markup
detection method of FIG. 3, the ninth image resulting from the CLOSING of
the eighth image of FIG. 12.
FIG. 14 is an example of a tenth image formed by the indirect hand markup
detection method of FIG. 3, the tenth image resulting from an UNION of
OPENINGS on the ninth image of FIG. 13.
FIG. 15 is an example of an eleventh image formed by the indirect hand
markup detection method of FIG. 3, the eleventh image resulting from a
reduction of the tenth image of FIG. 14 by a factor of two.
FIG. 16 is an example of a twelfth image formed by the indirect hand markup
detection method of FIG. 3, the twelfth image resulting from filling the
bounding boxes of the eleventh image of FIG. 15.
FIG. 17 is an example of a second image formed by the direct hand markup
detection method of FIGS. 1 and 2, the second image resulting from an
UNION of OPENINGS of the first image of FIG. 5.
FIG. 18 is an example of a third image formed by the direct hand markup
detection method of FIG. 2, the third image resulting from a reduction and
UNION of OPENINGS of the second image of FIG. 17.
FIG. 19 is an example of a fourth image formed by the direct hand markup
detection method of FIG. 2, the fourth image resulting from a reduction of
the third image of FIG. 18.
FIG. 20 is an example of a fifth image (mask) formed by the direct hand
markup detection method of FIG. 2, the fifth image resulting from a
bounding box fill of the fourth image of FIG. 19.
FIG. 21 exemplifies a horizontal structuring element of length 8.
FIG. 22 exemplifies a horizontal structuring element of length 2.
FIG. 23 exemplifies a horizontal structuring element of length 5.
FIG. 24 exemplifies a 5.times.5 structuring element with ON pixels running
diagonally from the lower left corner to the upper right corner.
FIG. 25 exemplifies a 5.times.5 structuring element with ON pixels running
diagonally from the upper left corner to the lower right corner.
FIG. 26 exemplifies a vertical structuring element of length 5.
FIGS. 27A-H exemplify a set of eight structuring elements of various
configurations, each of length 9.
FIG. 28 is an example of a first image of another preferred embodiment of a
direct approach for hand markup detection.
FIG. 29 is an example of a second image of the other preferred embodiment
of a direct approach for hand markup detection.
FIG. 30 is an example of a third image of the other preferred embodiment of
a direct approach for hand markup detection.
FIG. 31 is an example of a fourth image of the other preferred embodiment
of a direct approach for hand markup detection.
FIG. 32 is an example of a fifth image of the other preferred embodiment of
a direct approach for hand markup detection.
FIG. 33 is an example of a sixth image of the other preferred embodiment of
a direct approach for hand markup detection.
FIG. 34 is an example of a seventh image of the other preferred embodiment
of a direct approach for hand markup detection.
FIG. 35 is an example of a first image of a second indirect approach for
hand markup detection.
FIG. 36 is an example of a second image of the second indirect approach for
hand markup detection.
FIG. 37 is an example of a third image of the second indirect approach for
hand markup detection.
FIG. 38 is an example of a fourth image of the second indirect approach for
hand markup detection.
FIG. 39 is an example of a fifth image of the second indirect approach for
hand markup detection.
FIG. 40 is an example of a sixth image of the second indirect approach for
hand markup detection.
FIG. 41 is an example of the FIG. 38 image the second after an OPENING
operation is performed on that image.
FIG. 42 is an example of the FIG. 41 image after a CLOSE operation is
performed on that image.
FIG. 43 is an example of the FIG. 42 image after an EROSION operation is
performed on that image.
FIG. 44 is a flowchart of a topological method for extraction of regions of
a document image encircled by non-transparent hand marks.
FIG. 45 is a flowchart of a semi-topological method for extracting regions
of a document image encircled by non-transparent hand marks.
FIG. 46 is a flowchart of a method for extracting regions of a document
image encircled by non-transparent hand marks.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
I. Definitions and Terminology
The present discussion deals with binary images. In this context, the term
"image" refers to a representation of a two-dimensional data structure
composed of pixels. A binary image is an image where a given pixel is
either "ON" of "OFF". Binary images are manipulated according to a number
of operations wherein one or more source images are mapped onto a
destination image. The results of such operations are generally referred
to as images. The image that is the starting point of processing will
sometimes be referred to as the original image or source image.
A "morphological operation" refers to an operation on a pixelmap image (a
"source" image), that uses a local rule at each pixel to create another
pixelmap image, the "destination" image. This rule depends both on the
type of the desired operation to be performed as well as on the chosen
"structuring element".
Pixels are defined to be ON if they are black and OFF if they are white. It
should be noted that the designation of black as ON and white as OFF
reflects the fact that most documents of interest have a black foreground
and a white background. The techniques of the present invention could be
applied to negative images as well. The discussion will be in terms of
black on white, but the references to ON or OFF apply equally well to
images which have been inverted and, therefore, the roles of these two
states are reversed. In some cases the discussion makes reference to a
"don't care" pixel which may be either an ON or an OFF pixel.
A "structuring element" (SE) refers to an image object of typically (but
not necessarily) small size and simple shape that probes the source image
and extracts various types of information from it via the chosen
morphological operation. In the attached figures that show, SEs, a solid
circle is a "hit", and an open circle is a "miss". The center position is
denoted by a cross. Squares that have neither solid nor open circles are
"don't cares"; their value in the image (ON or OFF) is not probed. A
binary SE is used to probe binary images in a binary morphological
operation that operates on binary input images and creates an output
binary image. The SE is defined by a center location and a number of pixel
locations, each normally having a defined value (ON or OFF). The pixels
defining the SE do not have to be adjacent each other. The center location
need not be at the geometrical center of the pattern; indeed it need not
even be inside the pattern. A "solid" SE refers to an SE having a
periphery within which all pixels are ON. For example, a solid 2.times.2
SE is a 2.times.2 square of ON pixels. A solid SE need not be rectangular.
A horizontal SE is generally one row of ON pixels and a vertical SE is
generally one column of ON pixels of selected size. A "hit-miss" SE refers
to an SE that specifies at least one ON pixel and at least one OFF pixel.
AND, OR and XOR are logical operations carried out between two images on a
pixel-by-pixel basis.
NOT is a logical operation carried out on a single image on a
pixel-by-pixel basis.
"EXPANSION" is scale operation characterized by a scale factor N, wherein
each pixel in a source image becomes an N.times.N square of pixels, all
having the same value as the original pixel.
"REDUCTION" is a scale operation characterized by a scale factor N in a
threshold level M. Reduction with scale=N entails dividing the source
image into N.times.N squares of pixels, mapping each such square in the
source image to a single pixel on the destination image. The value for the
pixel in the destination image is determined by the threshold level M,
which is a number between 1 and N.sup.2. If the number of ON pixels in the
pixel square is greater or equal to M, the destination pixel is ON,
otherwise it is OFF.
"EROSION" is a morphological operation wherein a given pixel in the
destination image is turned ON if and only if the result of superimposing
the SE center on the corresponding pixel location in the source image
results in a match between all ON and OFF pixels in the SE and the
underlying pixels in the source image. An EROSION will give one pixel in
the destination image for every match. That is, at each pixel, it outputs
1 if the SE (shifted and centered at that pixel) is totally contained
inside the original image foreground, and outputs 0 otherwise. Note that
EROSION usually refers to operations using a structuring element(s) with
only hits, and more generally, matching operations with both hits and
misses (often called a "hit-miss transform". The term EROSION is used
herein to include matching operations with both hits and misses, thus the
hit-miss transform is the particular type of EROSION used herein.
"DILATION" is a morphological operation wherein a given pixel in the source
image being ON causes the SE to be written into the destination image with
the SE center at the corresponding location in the destination image. The,
SEs used for DILATION typically have no OFF pixels. The DILATION draws the
SE as a set of pixels in the destination image for each pixel in the
source image. Thus, the output image is the union of all shifted versions
of the SE translated at all 1-pixels of the original image.
A "seed fill" is an operation taking as input two images, and generating a
third image as the result. One of the input images is the "seed", which
may be composed of a single ON pixel or of many ON pixels. The other input
image is the "mask", which is typically composed of more than one image
component. The two images are aligned. The result of the seed fill is to
produce an image that has only those image components in which at least
one seed pixel was present in the seed image. The result image is formed
by starting with the seed pixels and growing each image region until it
has filled the corresponding image component in the mask. This can be done
morphologically (the "fillclip" operation, where the result image is
formed by starting with the seed and alternatively dilating it and ADDing
it with the "mask", until it stops changing) or by seed fill or "flood
fill" techniques (where those image components containing a seed are
erased--by converting ON pixels to OFF pixels--and then reconstructed
using XOR with the original image).
"FillClip" is a morphological operation where one image is used as a seed
and is grown morphologically, clipping it at each growth step to the
second image. For example, a fillClip could include a DILATION followed by
logically ANDing the DILATION result with another image.
"OPENING" is a morphological operation that uses an image and a structuring
element and consists of an EROSION followed by a DILATION. The result is
to replicate the structuring element in the destination image for each
match in the source image.
"CLOSING" is a morphological operation using an image and a structuring
element. It includes a DILATION followed by an EROSION of the image by a
structuring element. A CLOSE of an image is equivalent to the bit inverse
of an OPEN on the (bit inverse) background.
An UNION is a bitwise OR between two images. An "intersection" is a bitwise
AND between two images.
"BLURRING" is a DILATION using a structuring element(s) composed of two or
more hits.
A "mask" refers to an image, normally derived from an original or source
image, that contains substantially solid regions of ON pixels
corresponding to regions of interest in the original image. The mask may
also contain regions of ON pixels that do not correspond to regions of
interest.
"Text" refers to portions of a document or image which comprises letters,
numbers, or other language symbols including non-alphabetic linguistic
characters such as ideograms and syllabry in the oriental languages.
The various operations defined above are sometimes referred to in noun,
adjective, and verb forms. For example, references to DILATION (noun form)
may be in terms of DILATING the image or the image being DILATED (verb
forms) or the image being subjected to a DILATION operation (adjective
form). No difference in meaning is intended.
Morphological operations have several specific properties that simplify
their use in the design of appropriate algorithms. First, they are
translationally invariant. A sideway shift of the image before
transforming does not change the result, except to shift the result as
well. Operations that are translationally invariant can be implemented
with a high degree of parallelism, in that each point in the image is
treated using the same rule. In addition, morphological operations satisfy
two properties that make it easy to visualize their geometrical behavior.
First, EROSION, DILATION, OPEN and CLOSE are "increasing", which means
that if image 1 is contained in image 2, then any of these morphological
operations on image 1 will also be contained in the morphological
operation on image 2. Second, a CLOSE is extensive and OPEN is
antiextensive. This means that the original image is contained in the
image transformed by CLOSE and the image transformed by OPEN is contained
in the original image. The DILATION and EROSION operations are also
extensive and antiextensive, respectively, if the center of the
structuring element is contained within the original image.
The OPEN and CLOSE operations also satisfy two more morphological
properties:
(1) The result of the operation is independent of the position of the
center of the structuring element.
(2) The operation is idempotent, which means that reapplying the OPEN or
CLOSE to the resulting image will not change it.
An "image unit" means an identifiable segment of an image such as a word,
number, character, glyph or other unit that can be extracted reliably and
have an underlying linguistic structure.
II. Overview of the Method
One problem addressed by the invention is identifying regions (i.e., image
segments) on a page that have been marked by hand with an ordinary pen (or
pencil). The markings can consist of horizontal or vertical lines, or of
"circular" marks (either open lines, segments of curved lines, or
combinations of the two). Since all markings are by hand, the straight
lines will not have the same straightness or smoothness of machine-printed
rules, or hand markings using a straight-edge.
The image interpretation problem can be broken into several sub-problems;
(1) identifying the markings themselves;
(2) identifying the regions of the image to which these markings refer; and
(3) reproducing those regions without interference from the markings
themselves.
A method for finding word boxes or bounding boxes around image units is to
close the image with a horizontal SE that joins characters but not words,
followed by an operation that labels the bounding boxes of the connected
image components (which in this case are words). The process can be
greatly accelerated by using 1 or more threshold reductions (with
threshold value 1), that have the effect both of reducing the image and of
closing the spacing between the characters. The threshold reduction(s) are
typically followed by closing with a small horizontal SE. The connected
components labeling operation is also done at the reduced scale, and the
results are scaled up to full size. The disadvantage of operating at
reduced scale is that the word bounding boxes are only approximate;
however, for many applications the accuracy is sufficient. The method
described above works fairly well for arbitrary test fonts, but in extreme
cases, such as large fixed width fonts that have large inter-character
separation or small variable width fonts that have small inter-word
separation, mistakes can occur. The most robust method chooses a SE for
closing based on a measurement of specific image characteristics. This
requires adding the following two steps:
(1) Order the image components in the original or reduced (but not closed)
image in line order, left to right and top to bottom.
(2) Build a histogram of the horizontal intercomponent spacing. This
histogram should naturally divide into the small inter-character spacing
and the larger inter-word spacings. Then use the valley between these
peaks to determine the size of the SE to use for closing the image to
merge characters but not join words.
A. Identifying the Hand Markings
Sub-problem (1) of the image interpretation problem involves identifying
the markings. Several salient characteristics of the hand markings can be
used to identify the markings. The characteristics include:
(i) long horizontal, vertical, and oblique straight line segments, where
"long" is relative to the size of machine marks, such as text characters;
(ii) segments that are not exactly straight, having some curviness; and
(iii) segments that are not horizontal or vertical, relative to the text in
the image.
If the document consists only of text, without rules or line graphics, then
it is not necessary to distinguish between hand markings and machine
lines, and a probing of the image based on the length of the straight line
segments is adequate to separate the hand markings from text. ("Probing"
is typically done morphologically, optionally with reduction beforehand,
using either an EROSION or an OPENING).
If the image may contain horizontal or vertical les, it is necessary to
distinguish the machine marks from the hand marks. In this case, the best
results are obtained by utilizing all of the above characteristics. One
method for distinguishing machine marks from hand marks is as follows:
(a) Deskew the image as described in a copending patent application
entitled "Method and Apparatus for Identification of Document Skew" to
Bloomberg et al., Ser. No. 448,774, filed Dec. 8, 1989, said copending
patent application herein being incorporated in its entirety.
(b) Do an OPENING of the image for long horizontal line segments. This will
project out both machine-printed horizontal lines, and nearly horizontal
handwritten line segments.
(c) For each connected component thus extracted, determine the width W,
height H, and number N of ON pixels within a bounding box.
(d) Using the width, height and number of ON pixels within the bounding
box, determine if the image segment is machine or hand made. This can be
done by constructing factors such as: the ratio W/H (for horizontal
segments); the ratio N/(WH) (which designates a fractional area of ON
pixels within the bounding box); the ratio N/(W*(H-c)) (for horizontal
segments), where c is a constant with workable values of about 2; the
ratio N/(H*(W-c)) (for vertical segments), where workable values for c are
about 2; and comparing these with thresholds. If the constant c is 0, the
special case occurs where the factor is N/(WH). The reason for
generalizing the factor N(WH) with the constant c is to compensate for
jagged marks and slight misalignment on machine printed lines. For
example, by removing 2 or so lines from the width, N/(W*(H-c)) should be
approximately 1.0 for machine printed lines, whereas it would be
significantly smaller than 1.0 for handwritten marks.
B. Identifying the Marked Regions
Sub-problem (2) of the image interpretation problem provides for
identifying regions of the document image to which the handwritten marks
refer. The handwritten marks identified in sub-problem (1) are further
processed to identify a target part of the document image.
A fill operation, starting with the identified segments as "seeds", and
filling into the connected component of the image of which the identified
segments are a part, will provide the connected hand marking. This marking
can be an underline, a sideline, a circle, etc.
The asperity ratio (width to width) of the bounding box of the (filled)
connected component can then be compared with thresholds to determine if
the marks are underlines (large width/height), sidelines (large
height/width), or circles (both width and height are larger than a minimum
threshold value).
Underlines typically refer to the image units directly above them; image
unit segmentation with association of the neighboring underline is
appropriate. Thus, for example, the document image can be horizontally
CLOSED (or DILATED) so that the letters within the image unit are merged;
thus, when extracting by connected components, the entire image unit is
obtained.
Sidelines typically refer to a block of the document image to the right of
the sideline if the sideline is on the left side of the image, and vice
versa.
Circles typically refer to a part of the document image that is encircled.
Encircled means any marking that is intended to demarcate by enclosure, or
near enclosure. The circle need not be closed, since the demarcated
region, including the hand marking, may be determined by several methods
as follows.
| | |