BACKGROUND OF THE INVENTION
The invention relates generally to image processing and more specifically
to a technique for registering images, typically for further processing.
In general, the more image there is to process, the greater the time (and
expense) to process it. In certain cases, it is known in advance that
information of interest is located in specific portions of the image. An
example is a preprinted form with spaces or boxes that have been filled
in, either by hand or by a computer. Clearly, significant time savings can
be realized if only the regions of interest need to be processed.
However, between the time it was printed and the time it is analyzed, the
form may have been photocopied, and it more likely than not has passed
through an optical scanner or the like. Thus, despite remarkable advances
in paper handling and optical technologies, there is a reasonable chance
that the document will have been skewed (perhaps by a few degrees), scaled
(perhaps by a few percent), and translated. Thus, the regions of interest
may well not be where they are supposed to be.
SUMMARY OF THE INVENTION
The present invention provides a technique for rapidly and efficiently
registering binary images, thereby facilitating further image processing.
The invention contemplates incorporating one or more reference features,
referred to as fiducials, into the binary image at a known displacement
from a feature of interest in the image, subjecting the image to an
operation (typically a morphological operation and possibly a thresholded
reduction) that projects out the fiducial(s), determining the position of
the fiducial(s), and thereby determining the position of the feature of
interest. The fiducial(s) must have at least one characteristic that is
absent from the remaining (or at least from neighboring) portions of the
image. In general it is preferred to provide a number of spatially
separated fiducials so that small amounts of skew and
reduction/enlargement can be determined and taken into account.
Thresholded reductions and morphological operations will be defined and
discussed in detail below. A thresholded reduction entails mapping a
rectangular array of pixels onto a single pixel, whose value depends on
the number of ON pixels in the rectangular array and a threshold level.
Morphological operations use a pixel pattern called a structuring element
(SE) to erode, dilate, open, or close an image.
In one set of embodiments, each fiducial includes horizontal and vertical
line segments (preferably in a corner or crossing configuration) that are
longer than any line segments expected to be found in the binary image.
Projecting out the fiducial entails erosions or open operations using
hit-miss structuring elements.
In another embodiment, each fiducial is a small finely textured region. The
image can be subjected to a sequence of morphological or other operations
that have the effect of blackening the textured region and eliminating ON
pixels in all other regions. Alternatively the image can be eroded with a
hit-miss structuring element that corresponds to the repeating pattern in
the textured region. This can then be followed by dilation or a close
operation.
A further understanding of the nature and advantages of the present
invention may be realized by reference to the remaining portions of the
specification and the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an image scanning and processing system
incorporating the present invention;
FIGS. 2A and 2B show documents with specific fiducial line patterns
provided thereon;
FIG. 3 is a flow diagram illustrating a technique for determining the
location of the fiducials;
FIG. 4 is a flow diagram illustrating a technique for filling 8-connected
regions to rectangles;
FIG. 5 is a flow diagram illustrating a technique for extracting corner
coordinates;
FIG. 6 is a flow diagram illustrating a technique for thinning a rectangle
down to a single pixel;
FIG. 7 shows a document with fiducials defined by finely textured regions;
FIG. 8 is a flow diagram illustrating a technique for determining the
location of the textured fiducials;
FIG. 9 is a flow diagram of an alternative technique for determining the
location of the textured fiducials;
FIGS. 10A and 10B are flow diagrams illustrating alternatives to the
thresholded reductions used to fill the textured fiducials;
FIG. 11 is a flow diagram illustrating a technique for filling 4-connected
regions to rectangles;
FIG. 12 is a block diagram of special purpose hardware for performing image
reductions and expansions.
DESCRIPTION OF SPECIFIC EMBODIMENTS
DEFINITIONS AND TERMINOLOGY
The present discussion deals with binary images. In this context, the term
"image" refers to a representation of a two-dimensional data structure
composed of pixels. A binary image is an image where a given pixel is
either "ON" or "OFF." Binary images are manipulated according to a number
of operations wherein one or more source images are mapped onto a
destination image. The results of such operations are generally referred
to as images. The image that is the starting point for processing will
sometimes be referred to as the original image.
Pixels are defined to be ON if they are black and OFF if they are white. It
should be noted that the designation of black as ON and white as OFF
reflects the fact that most documents of interest have a black foreground
and a white background. While the techniques of the present invention
could be applied to negative images as well, the discussion will be in
terms of black on white.
A "solid region" of an image refers to a region extending many pixels in
both dimensions within which substantially all the pixels are ON.
A "textured region" of an image refers to a region that contains a
relatively fine-grained pattern. Examples of textured regions are
halftoned or stippled regions.
"Text" refers to portions of a document or image containing letters,
numbers, or other symbols including non-alphabetic linguistic characters.
"Line graphics" refers to portions of a document or image composed of
graphs, figures, or drawings other than text, generally composed of
horizontal, vertical, and skewed lines having a substantial run length as
compared to text. Graphics could range from horizontal and vertical lines
in an organization chart to more complicated horizontal, vertical, and
skewed lines in engineering drawings.
A "mask" refers to an image, normally derived from an original image, that
contains substantially solid regions of ON pixels corresponding to regions
of interest in the original image. The mask may also contain regions of ON
pixels that don't correspond to regions of interest.
AND, OR, and XOR are logical operations carried out between two images on a
pixel-by-pixel basis.
NOT is a logical operation carried out on a single image on a
pixel-by-pixel basis.
"Expansion" is a scale operation characterized by a SCALE factor N, wherein
each pixel in a source image becomes an N×N square of pixels, all
having the same value as the original pixel.
"Reduction" is a scale operation characterized by a SCALE factor N and a
threshold LEVEL M. Reduction with SCALE=N entails dividing the source
image into N×N squares of pixels, mapping each such square in the
source image to a single pixel on the destination image. The value for the
pixel in the destination image is determined by the threshold LEVEL M,
which is a number between 1 and N². If the number of ON pixels in the
pixel square is greater than or equal to M, the destination pixel is ON;
otherwise it is OFF.
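The expansion and reduction definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation contemplated by the specification: the function names `expand` and `reduce_threshold` are hypothetical, images are assumed to be lists of rows of 0/1 values, and partial tiles at the right and bottom edges are assumed to be dropped.

```python
def expand(img, n):
    """Replace each pixel with an n-by-n square of the same value."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(n)]
        out.extend([list(wide) for _ in range(n)])
    return out


def reduce_threshold(img, n, level):
    """Map each n-by-n tile to one pixel: ON if the tile holds >= level ON pixels."""
    if not 1 <= level <= n * n:
        raise ValueError("level must be between 1 and n*n")
    # Assumption: rows/columns that do not fill a whole tile are dropped.
    rows, cols = len(img) // n, len(img[0]) // n
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            count = sum(img[r * n + i][c * n + j]
                        for i in range(n) for j in range(n))
            out_row.append(1 if count >= level else 0)
        out.append(out_row)
    return out


img = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
low = reduce_threshold(img, 2, 1)    # one ON pixel in a tile is enough
high = reduce_threshold(img, 2, 4)   # all four pixels in a tile must be ON
doubled = expand([[1, 0]], 2)
```

With SCALE=2, LEVEL=1 behaves like an OR over each tile and LEVEL=4 like an AND, which is why the low threshold darkens sparse textures while the high threshold removes thin features.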
"Subsampling" is an operation wherein the source image is subdivided into
smaller (typically square) elements, and each element in the source image
is mapped to a smaller element in the destination image. The pixel values
for each destination image element are defined by a selected subset of the
pixels in the source image element. Typically, subsampling entails mapping
to single pixels, with the destination pixel value being the same as a
selected pixel from the source image element. The selection may be
predetermined (e.g. upper left pixel) or random.
A "4-connected region" is a set of ON pixels wherein each pixel in the set
is laterally or vertically adjacent to at least one other pixel in the
set.
An "8-connected region" is a set of ON pixels wherein each pixel in the set
is laterally, vertically, or diagonally adjacent to at least one other
pixel in the set.
A number of morphological operations map a source image onto an equally
sized destination image according to a rule defined by a pixel pattern
called a structuring element (SE). The SE is defined by a center location
and a number of pixel locations, each having a defined value (ON or OFF).
Other pixel positions, referred to as "don't care", are ignored. The
pixels defining the SE do not have to be adjacent each other. The center
location need not be at the geometrical center of the pattern; indeed it
need not even be inside the pattern.
A "solid" SE refers to an SE having a periphery within which all pixels are
ON. For example, a solid 2×2 SE is a 2×2 square of ON pixels.
A solid SE need not be rectangular.
A "hit-miss" SE refers to an SE that specifies at least one ON pixel and at
least one OFF pixel.
"Erosion" is a morphological operation wherein a given pixel in the
destination image is turned ON if and only if the result of superimposing
the SE center on the corresponding pixel location in the source image
results in a match between all ON and OFF pixels in the SE and the
underlying pixels in the source image.
"Dilation" is a morphological operation wherein a given pixel in the source
image being ON causes the SE to be written into the destination image with
the SE center at the corresponding location in the destination image. The
SE's used for dilation typically have no OFF pixels.
"Opening" is a morphological operation that consists of an erosion followed
by a dilation. The result is to replicate the SE in the destination image
for each match in the source image.
"Closing" is a morphological operation consisting of a dilation followed by
an erosion.
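The four morphological operations defined above can be illustrated with a short Python sketch. The representation is an assumption made for illustration only: an image is a list of rows of 0/1 values, an SE is a list of hypothetical `(d_row, d_col, value)` triples measured from the SE center, unlisted positions are "don't care", and pixels outside the image are treated as OFF.

```python
def erode(img, se):
    """ON at (r, c) only where every SE pixel matches the underlying image pixel."""
    h, w = len(img), len(img[0])

    def pixel(r, c):
        # Assumption: anything outside the image counts as OFF.
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    return [[1 if all(pixel(r + dr, c + dc) == v for dr, dc, v in se) else 0
             for c in range(w)] for r in range(h)]


def dilate(img, se):
    """Write the ON pixels of the SE around every ON pixel of the source."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr, dc, v in se:
                    rr, cc = r + dr, c + dc
                    if v and 0 <= rr < h and 0 <= cc < w:
                        out[rr][cc] = 1
    return out


def opening(img, se):
    return dilate(erode(img, se), se)


def closing(img, se):
    return erode(dilate(img, se), se)


SOLID_2X2 = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
LEFT_EDGE = [(0, -1, 0), (0, 0, 1)]           # hit-miss: OFF to the left, ON at center
HORIZONTAL_PAIR = [(0, 0, 1), (0, 1, 1)]

img = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
]
opened = opening(img, SOLID_2X2)              # the isolated pixel disappears
left_edges = erode(img, LEFT_EDGE)            # hit-miss erosion marks left edges
gap = [[0, 0, 0, 0, 0], [0, 1, 0, 1, 0], [0, 0, 0, 0, 0]]
closed = closing(gap, HORIZONTAL_PAIR)        # the one-pixel gap is bridged
```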
The various operations defined above are sometimes referred to in noun,
adjective, and verb forms. For example, references to dilation (noun form)
may be in terms of dilating the image or the image being dilated (verb
forms) or the image being subjected to a dilation operation (adjective
form). No difference in meaning is intended.
SYSTEM OVERVIEW
FIG. 1 is a block diagram of an image analysis system 1 within which the
present invention may be embodied. The basic operation of system 1 is to
extract or eliminate certain characteristic portions of a document 2. To
this end, the system includes a scanner 3 which digitizes the document on
a pixel basis, and provides a resultant data structure, typically referred
to as an image. Depending on the application, the scanner may provide a
binary image (a single bit per pixel) or a gray scale image (a plurality
of bits per pixel). The image contains the raw content of the document, to
the precision of the resolution of the scanner. The image may be sent to a
memory 4 or stored as a file in a file storage unit 5, which may be a disk
or other mass storage device.
A processor 6 controls the data flow and performs the image processing.
Processor 6 may be a general purpose computer, a special purpose computer
optimized for image processing operations, or a combination of a general
purpose computer and auxiliary special purpose hardware. If a file storage
unit is used, the image is transferred to memory 4 prior to processing.
Memory 4 may also be used to store intermediate data structures and
possibly a final processed data structure.
The result of the image processing, of which the present invention forms a
part, can be a derived image, numerical data (such as coordinates of
salient features of the image) or a combination. This information may be
communicated to application specific hardware 8, which may be a printer or
display, or may be written back to file storage unit 5.
SPECIFIC EMBODIMENTS
FIG. 2A is a schematic representation of an image 10 to be processed. By
way of example, image 10 includes a feature of interest 12 whose position
must be determined, possibly for further processing of portions of the
image. In accordance with the invention, image 10 is provided with a
number of fiducials 15a-d, which are reference marks located at nominally
known locations relative to the feature of interest. FIG. 2B shows an
image 10 (with a feature of interest 12) having different fiducials 17a-d.
Fiducials 15a-d and 17a-d are distinguished by a characteristic that is not
shared by remaining portions of the image. In the particular examples
fiducials 15a-d are corners formed by the meeting of two perpendicular
lines at respective end points of each, while fiducials 17a-d are
crossings formed by the intersection of two perpendicular lines. These
fiducial patterns are appropriate so long as the line segments are longer
than the line segments adjacent to other corners of the image.
According to the invention, image 10 is subjected to a series of operations
that project out the fiducials and determine their positions. This allows
the position of the feature of interest to be determined. It is in general
not necessary to subject the entire image to these processing steps. For
example, the fiducials will be in positions that are generally known, and
therefore it may only be necessary to process the regions reasonably
likely to be occupied by the fiducials. In the cases illustrated, the
fiducials are generally near the corners of the image, and therefore
rectangular areas generally near the corners are all that need to be
processed. In the event that fiducials are searched for in limited
portions of the image, the distinguishing feature of the fiducials need
not be absent from all other portions of the image. It need only be absent
from the portions near the fiducials.
FIG. 3 is a flow diagram illustrating a sequence of operations for
extracting the positions of the fiducials. The image is subjected to an
erosion (step 23) with a structuring element (SE) 25a for fiducial 15a or
an SE 27a for fiducial 17a. The result of erosion step 23 is an image with
ON pixels in only those positions where the SE matches the image. These ON
pixels should be relatively few in number and closely clustered. This
resulting image is subjected to an operation that expands the pixel
regions to the smallest bounding rectangle (step 30). The rectangle is
then processed in one of two ways to determine its location: (a) it may be
subjected to an operation to extract its corner coordinates (step 32), or
(b) it may be thinned in order to result in a single pixel (step 33).
The coordinates of three fiducials are theoretically sufficient to compute
translation, rotation (skew), and scale factors in two orthogonal
directions so long as the fiducials are not collinear. However, for
robustness, it is preferred to use the coordinates of four fiducials,
which also serves as a consistency check.
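As an illustration of how three fiducial coordinates determine the registration, the following Python sketch fits a general affine map from the nominal fiducial positions to the measured ones and applies it to a nominal feature location. This is one possible formulation, not necessarily the computation intended by the specification: a general affine map has six parameters and therefore also admits shear, and the names `fit_affine` and `apply_affine` and the sample coordinates are hypothetical.

```python
def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_affine(nominal, measured):
    """Return (a, b, tx, c, d, ty) with x' = a*x + b*y + tx and y' = c*x + d*y + ty."""
    m = [[float(x), float(y), 1.0] for x, y in nominal]
    det = det3(m)
    if abs(det) < 1e-12:
        raise ValueError("fiducials are collinear")
    coeffs = []
    for k in (0, 1):                      # solve for x' first, then for y'
        rhs = [p[k] for p in measured]
        for col in range(3):              # Cramer's rule, one unknown per column
            mk = [row[:] for row in m]
            for r in range(3):
                mk[r][col] = rhs[r]
            coeffs.append(det3(mk) / det)
    return tuple(coeffs)


def apply_affine(t, point):
    a, b, tx, c, d, ty = t
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


# Hypothetical data: nominal fiducial positions and where they were found.
nominal = [(0, 0), (100, 0), (0, 200)]
measured = [(5, -3), (105, -13), (25, 197)]
transform = fit_affine(nominal, measured)
feature = apply_affine(transform, (50, 50))   # where a nominal feature actually lies
```

When a fourth fiducial is available, applying the fitted map to its nominal position and comparing the result with its measured position gives the consistency check mentioned above.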
SE 25a is a hit-miss SE suitable for extracting fiducial 15a, which has two
perpendicular line segments meeting at an upper left corner. Hit-miss SE
25a comprises a number of hits (ON pixels) 42 in an upper left corner
configuration, a number of misses (OFF pixels) 45 along the lines beyond
the corner, and a small number of "don't care" pixels 47 immediately
adjacent the corner. The center position of this SE is the ON pixel at the
corner. Pixel positions 47 allow for the possibility that there might be
some noise in the image. Thus, erosion of fiducial 15a by SE 25a will tend
to yield a small group of pixels, clustered at the corner location, and
generally corresponding in size to the thickness of the lines in the
fiducial. Corresponding SE's, rotated by 90° increments, are used
to determine the locations of fiducials 15b-d.
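A hit-miss SE of the kind described for fiducial 15a can be sketched as follows. The geometry is assumed for illustration: the arm length, the number of "don't care" positions, and the number of misses (parameters `arm`, `gap`, and `misses` of the hypothetical `corner_se` function) are not given in the text, and the lines are taken to be one pixel thick.

```python
def erode(img, se):
    """Hit-miss erosion; pixels outside the image are assumed OFF."""
    h, w = len(img), len(img[0])

    def pixel(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    return [[1 if all(pixel(r + dr, c + dc) == v for dr, dc, v in se) else 0
             for c in range(w)] for r in range(h)]


def corner_se(arm=4, gap=1, misses=2):
    """Upper-left corner SE as (d_row, d_col, value) triples; center is the corner hit."""
    se = [(0, 0, 1)]
    for k in range(1, arm + 1):
        se.append((0, k, 1))       # hits along the horizontal line
        se.append((k, 0, 1))       # hits along the vertical line
    # Positions 1..gap beyond the corner are left unlisted ("don't care").
    for k in range(gap + 1, gap + misses + 1):
        se.append((0, -k, 0))      # misses beyond the corner, to the left
        se.append((-k, 0, 0))      # misses beyond the corner, above
    return se


def make_image(size, on_pixels):
    img = [[0] * size for _ in range(size)]
    for r, c in on_pixels:
        img[r][c] = 1
    return img


def matches(img, se):
    out = erode(img, se)
    return [(r, c) for r, row in enumerate(out) for c, p in enumerate(row) if p]


corner = [(3, c) for c in range(3, 11)] + [(r, 3) for r in range(3, 11)]
tee = [(3, c) for c in range(0, 11)] + [(r, 3) for r in range(3, 11)]

found = matches(make_image(12, corner), corner_se())
found_noisy = matches(make_image(12, corner + [(3, 2)]), corner_se())
found_tee = matches(make_image(12, tee), corner_se())
```

The corner is found at its vertex, a stray pixel in a "don't care" position does not disturb the match, and a T-junction is rejected because the horizontal line continues into the miss positions.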
SE 27a is suitable for projecting out fiducial 17a, which has two
intersecting line segments. The SE includes two rows of hits 52 arranged
in a cruciform pattern. Additionally, in order to exclude a match on a
large region of ON pixels, the SE contains four misses 55 surrounding the
cross center a few pixels out. The center position of this SE is the ON
pixel at the intersection. This same SE is suitable for extracting
fiducials 17b-d, which are the same as fiducial 17a.
FIG. 4 is an expanded flow diagram illustrating the steps within step 30
(filling to a rectangle). A presently preferred technique for filling all
8-connected regions to the smallest possible enclosing rectangle utilizes
an iterated sequence of erosions and dilations using two diagonal SE's 62
and 63. SE 62 has two ON pixels, one to the immediate right of the center
and one immediately beneath the center. SE 63 has two ON pixels, one at
the center and one diagonally down to the right.
The input image (containing the small regions of pixels resulting from the
erosions) is copied (step 65), with one copy reserved for later use and
one copy being a work copy subject to succeeding operations. The work copy
is first eroded (step 67) with SE 62, and then dilated (step 70) with SE
63. The result of this erosion and dilation is subjected to a logical OR
(step 75) with the copy reserved at copy step 65. The result of the
logical OR is copied (step 77), with one copy being reserved for use and
the other being a work copy. The work copy is eroded (step 80) with SE 63,
and dilated (step 82) with SE 62. The resulting image and the copy
reserved at copy step 77 are subjected to a logical OR (step 85). The
resultant iterated image and the copy of the input image reserved at step
65 are subjected to an exclusive OR (step 87). If the iterated image has not changed
(the XOR of the two images contains no ON pixels), the process is
complete. If the iterated image has changed (the XOR of the two images
contains at least one ON pixel), the iterated image is communicated back
and subjected to steps 65 through 87. The cycle repeats until the iterated
image agrees with the last version reserved at copy step 65.
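The iterated fill can be sketched in Python as follows. The sketch assumes that the second erosion and dilation pass exchanges the roles of the two diagonal SE's, so that both diagonal orientations are handled; it also assumes that images are lists of rows of 0/1 values and that the SE's contain only ON pixels. The function name `fill_to_rectangles` is hypothetical.

```python
SE_62 = [(0, 1), (1, 0)]   # ON right of center and below center (row, col offsets)
SE_63 = [(0, 0), (1, 1)]   # ON at center and diagonally down to the right


def erode(img, se):
    h, w = len(img), len(img[0])

    def pixel(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0  # outside is OFF

    return [[1 if all(pixel(r + dr, c + dc) for dr, dc in se) else 0
             for c in range(w)] for r in range(h)]


def dilate(img, se):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr, dc in se:
                    if 0 <= r + dr < h and 0 <= c + dc < w:
                        out[r + dr][c + dc] = 1
    return out


def union(a, b):
    return [[p | q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def fill_to_rectangles(img):
    current = [row[:] for row in img]
    while True:
        # First pass: an anti-diagonal pair completes its 2x2 square.
        step = union(current, dilate(erode(current, SE_62), SE_63))
        # Second pass (assumed SE roles exchanged): a main-diagonal pair does the same.
        step = union(step, dilate(erode(step, SE_63), SE_62))
        if step == current:   # equivalent to the XOR having no ON pixels
            return step
        current = step


l_shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_to_rectangles(l_shape)
```

Each pass turns any 2×2 square containing a diagonal pair of ON pixels fully ON, so repeated passes grow an 8-connected region until it has no concave corners, which is the enclosing rectangle.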
FIG. 5 is a flow diagram illustrating the steps within step 32 (extracting
the coordinates of the solid regions). The locations of the corners of
each solid rectangular region are extracted by a series of erosion steps
95(ULC), 95(URC), 95(LLC), and 95(LRC), using respective SE's 100(ULC),
100(URC), 100(LLC), and 100(LRC). SE 100(ULC) is a 2×2 array
including an ON pixel in the lower right corner and OFF pixels in the
other three corners. It thus operates to pick out the upper left corner
when it is used to erode a rectangle. The other SE's pick out the other
corners. This series of erosions results in four pixel locations for each
fiducial region. The pixel locations for each fiducial region can be
averaged (step 105) to specify that fiducial's center.
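The corner extraction and averaging can be sketched as below for an image holding a single solid rectangle. Only SE 100(ULC) is described in the text; the other three SE's are assumed here to be its mirror images, and the SE center is assumed to be the ON pixel. The dictionary `CORNER_SES` and function `rectangle_center` are hypothetical names.

```python
# Each SE is a list of (d_row, d_col, value) triples relative to its ON pixel.
CORNER_SES = {
    "ULC": [(0, 0, 1), (-1, -1, 0), (-1, 0, 0), (0, -1, 0)],
    "URC": [(0, 0, 1), (-1, 0, 0), (-1, 1, 0), (0, 1, 0)],   # assumed mirror image
    "LLC": [(0, 0, 1), (0, -1, 0), (1, -1, 0), (1, 0, 0)],   # assumed mirror image
    "LRC": [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0)],     # assumed mirror image
}


def erode(img, se):
    h, w = len(img), len(img[0])

    def pixel(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0  # outside is OFF

    return [[1 if all(pixel(r + dr, c + dc) == v for dr, dc, v in se) else 0
             for c in range(w)] for r in range(h)]


def rectangle_center(img):
    """Average the four corner pixels of the single rectangle in img."""
    corners = []
    for se in CORNER_SES.values():
        out = erode(img, se)
        corners += [(r, c) for r, row in enumerate(out)
                    for c, p in enumerate(row) if p]
    rows = sum(r for r, _ in corners) / len(corners)
    cols = sum(c for _, c in corners) / len(corners)
    return corners, (rows, cols)


img = [[1 if 2 <= r <= 4 and 3 <= c <= 7 else 0 for c in range(10)]
       for r in range(8)]
corners, center = rectangle_center(img)
```

An image with several fiducial rectangles would additionally need the corner pixels grouped by rectangle before averaging; that grouping is omitted from this sketch.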
FIG. 6 is a flow diagram illustrating a technique for thinning a solid
rectangular region to a single ON pixel. The basic technique is to remove
pixels along the rectangle's edges until a single pixel remains, using a
set of four hit-miss SE's 110(LE), 110(TE), 110(RE), and 110(BE). SE's
110(LE) and 110(RE) are 1×3 horizontal arrays. SE 110(LE) has an OFF
pixel at the left position and ON pixels at the center and right
positions. SE 110(RE) has ON pixels at the left and center positions and
an OFF pixel at the right position. SE's 110(TE) and 110(BE) are 3×1
vertical arrays. SE 110(TE) has an OFF pixel at the top position and ON
pixels at the center and bottom positions. SE 110(BE) has ON pixels at the
top and center positions and an OFF pixel at the bottom position. All the
SE's have the center pixel as the center location for the SE.
The input image is subjected to an alternating series of erosions and set
subtractions. The image is first copied (step 111(LE)), with one copy
being reserved and one copy being a work copy. The work copy is then
eroded (step 112(LE)) with SE 110(LE). The result is to project out the
pixels along the left edge of the rectangle. A set subtraction step
115(LE), which entails ANDing the reserved copy with the complement of the
eroded work copy, removes these projected pixels from the original image,
thus resulting in a rectangle having its left edge removed. This is
followed by a copy step 111(TE), an erosion 112(TE) using SE 110(TE), and
a set subtraction 115(TE), which removes the pixels along the top edge; a
copy step 111(RE), an erosion 112(RE) with SE 110(RE), and a set
subtraction 115(RE), which removes the pixels along the right edge; and a
copy step 111(BE), an erosion 112(BE) with SE 110(BE), and a set
subtraction 115(BE), which removes the pixels along the bottom edge. It is
noted that the erosion will only project out pixels along the edge if the
rectangle has at least two pixels along the long dimension of the SE.
Thus, if the rectangle has been thinned to a horizontal line, erosion by
the vertical SE's followed by set subtraction will have no effect. Once
all four edges have been processed, the result is tested (step 120) to
determine whether only a single pixel remains. If not, the entire
sequence is repeated. If so, the coordinates of the single pixel are saved
(step 125).
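The thinning loop can be sketched in Python for an image holding one solid rectangle. The representation (lists of rows of 0/1 values, SE's as `(d_row, d_col, value)` triples about the center pixel, OFF outside the image) and the name `thin_to_pixel` are assumptions; the extra check that stops when a full cycle changes nothing is a safeguard added for the sketch and is not part of the described flow.

```python
EDGE_SES = [
    [(0, -1, 0), (0, 0, 1), (0, 1, 1)],   # 110(LE): OFF, ON, ON left to right
    [(-1, 0, 0), (0, 0, 1), (1, 0, 1)],   # 110(TE): OFF, ON, ON top to bottom
    [(0, -1, 1), (0, 0, 1), (0, 1, 0)],   # 110(RE): ON, ON, OFF left to right
    [(-1, 0, 1), (0, 0, 1), (1, 0, 0)],   # 110(BE): ON, ON, OFF top to bottom
]


def erode(img, se):
    h, w = len(img), len(img[0])

    def pixel(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0  # outside is OFF

    return [[1 if all(pixel(r + dr, c + dc) == v for dr, dc, v in se) else 0
             for c in range(w)] for r in range(h)]


def thin_to_pixel(img):
    work = [row[:] for row in img]
    while True:
        before = work
        for se in EDGE_SES:
            edge = erode(work, se)                      # project out one edge
            work = [[p & (1 - e) for p, e in zip(pr, er)]
                    for pr, er in zip(work, edge)]      # set subtraction
        on = [(r, c) for r, row in enumerate(work)
              for c, p in enumerate(row) if p]
        if len(on) == 1:
            return on[0]
        if work == before:
            # Safeguard for inputs that are empty or hold several regions.
            raise ValueError("image does not thin to a single pixel")


rect = [[1 if 2 <= r <= 4 and 3 <= c <= 7 else 0 for c in range(10)]
        for r in range(8)]
pixel_location = thin_to_pixel(rect)
small = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
small_location = thin_to_pixel(small)
```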
FIG. 7 shows an image 10 (with a feature of interest 12) having fiducials
130a-d in the form of small, preferably rectangular regions having a
finely textured pattern. Although shown as cross-hatched in the drawing,
it is to be understood that the pattern within the rectangular regions is
a stippled or halftoned pattern of uniform intensity, consisting of a
number of black dots on a white background. The pattern is characterized
by a period (dot separation) and angle.
FIG. 8 is a flow diagram illustrating one technique for converting
fiducials 130a-d to rectangles of solid black, which can then be processed
to determine their positions as discussed above in connection with steps
32 and 33. In brief, the image is first subjected to a set of operations
that eliminate OFF pixels that are near ON pixels. While text and lines in
the image are thickened, they tend to retain their general character.
However, as the small dots in the textured regions expand, they coalesce
to form large masses and thereby solidify the formerly textured area.
Subsequent processing can reverse the thickening of characters and lines,
but not the solidification of the now solid regions.
The image is twice reduced with SCALE=2 and LEVEL=1 (steps 132 and 133).
The result is an image reduced by a linear factor of 4 and having the
textured regions darkened. The reduced image is then subjected to a close
operation (step 135) to finish the solidification of the textured region.
The close operation consists of a dilation and an erosion, preferably with
a solid 2×2 SE. The result of the close operation is invariant as to
which of the pixels in the SE is designated as the center.
The resulting image is then twice reduced (steps 137 and 138) with SCALE=2
and LEVEL=4. The resulting image, now reduced by a linear factor of 16,
contains only a few isolated ON pixels within the regions outside the once
textured (now solid) fiducial regions. The image is then subjected to an
open operation (step 139), preferably with the same solid 2×2 SE
used in close operation 135, to eliminate the ON pixels outside the
fiducial regions.
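The sequence of steps 132 through 139 can be sketched end to end in Python. The test image is hypothetical: a 64×64 image with a 32×32 stippled block (one ON pixel in every 2×2 cell) and a thin horizontal line. Images are assumed to be lists of rows of 0/1 values, pixels outside the image are treated as OFF, and the function name `solidify_texture` is illustrative.

```python
SOLID_2X2 = [(0, 0), (0, 1), (1, 0), (1, 1)]   # (row, col) offsets of ON pixels


def reduce2(img, level):
    """Thresholded reduction with SCALE=2."""
    return [[1 if (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
                   + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) >= level
             else 0
             for c in range(len(img[0]) // 2)] for r in range(len(img) // 2)]


def erode(img, se):
    h, w = len(img), len(img[0])

    def pixel(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    return [[1 if all(pixel(r + dr, c + dc) for dr, dc in se) else 0
             for c in range(w)] for r in range(h)]


def dilate(img, se):
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr, dc in se:
                    if 0 <= r + dr < h and 0 <= c + dc < w:
                        out[r + dr][c + dc] = 1
    return out


def solidify_texture(img):
    img = reduce2(reduce2(img, 1), 1)              # steps 132, 133: darken texture
    img = erode(dilate(img, SOLID_2X2), SOLID_2X2)  # step 135: close
    img = reduce2(reduce2(img, 4), 4)              # steps 137, 138: drop thin features
    return dilate(erode(img, SOLID_2X2), SOLID_2X2)  # step 139: open


# Hypothetical test image: stippled block at upper left, thin line elsewhere.
image = [[0] * 64 for _ in range(64)]
for r in range(0, 32, 2):
    for c in range(0, 32, 2):
        image[r][c] = 1
for c in range(36, 56):
    image[48][c] = 1

mask = solidify_texture(image)   # 4x4 result, 16x smaller in each dimension
```

Each pixel of the result stands for a 16×16 block of the original, so coordinates found at the reduced scale are multiplied by 16 to refer back to the original image.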
The result is then optionally filled (step 140) to a rectangle in the
manner discussed above in connection with step 30. The result of the
previous operations is an image at reduced scale consisting only of solid
black rectangular fiducials. The fiducial positions can be obtained by
extracting the corner coordinates or thinning the solid rectangles to a
respective single pixel. This can be done at the reduced scale, and the
coordinates scaled accordingly.
In the event that it is desired to expand the processed image to original
size, account may be t