Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a ruled line extracting apparatus for
extracting a ruled line portion from an arbitrary document image read by a
photoelectric converter, etc., and method thereof.
2. Description of the Related Art
In recent years, the demand for an electronic filing system which converts
a paper document into an electronic form, and stores it on an optical
disc, etc., has increased, in order to improve the efficiency of
operations performed within a company. With a conventional electronic
filing system, a paper document is converted into an image by a
photoelectric converter such as an image scanner, etc., and the image with
a search keyword attached is stored on an optical disc or on a hard disk.
However, since the keyword must be input from a keyboard, the input
operation is troublesome.
To overcome this troublesome operation, the present applicant filed a
former application, "Title Extracting Apparatus for Extracting Title from
Document Image and Method Thereof" (U.S. patent application Ser. No.
08/694,503, Japanese patent application H7-341983).
With this method, a document title included in an image is automatically
extracted and registered as a keyword. Additionally, management
information such as a title, destination, transmitting source etc., can be
automatically extracted from various document images including a table
format document. For example, it has been shown that a title outside a
table can be extracted with approximately 90% accuracy.
A title inside a table, however, can be extracted with only 55% accuracy,
which is insufficient to be put into practical use. To extract a keyword
such as a title from inside a table with high accuracy, ruled lines
structuring the table must be accurately extracted. The technique for
extracting a ruled line has been developed mainly for a spreadsheet in
which characters, etc. are regularly lined up.
As the conventional techniques for extracting a ruled line, "Image
Extracting Method" (Japanese patent laid-open H6-309498) and "Image
Extracting Apparatus" (Japanese patent laid-open H7-28937) can be referred
to. With these techniques, a frame can be extracted or removed without
requiring an input of information such as a frame position etc., in a
spreadsheet. A spreadsheet which can be processed is a sheet composed of
one-character frames, block frames (horizontal one-line frames, or free
format frames), or a sheet having a structure in which the shape of a
frame is rectangular, and horizontal frame lines are regularly arranged.
Additionally, as the techniques for extracting a ruled line according to
former applications in Japan by the present applicant, "Frame Extracting
Apparatus and Rectangle Extracting Apparatus" (Japanese patent application
H7-203259), "Pattern Area Extracting Apparatus and Pattern Extracting
Apparatus" (Japanese patent application H7-282171), and "Pattern
Extracting Apparatus and Pattern Area Extracting Method" (Japanese patent
application H8-107568) can be referred to.
With these techniques, a frame can be extracted/removed even if the outer
periphery of frames is rectangular as shown in FIG. 1A, or not rectangular
as shown in FIG. 1B. Furthermore, the frame of a table structured by a
rectangle which is surrounded by a frame, and partitioned into smaller
portions, can also be extracted and removed, like the shaded portion shown
in FIG. 1B. Provided below is the outline of this process.
(1) thinning: With a mask process, horizontal and vertical segments are
made thinner, and the difference between the thickness of a character and
that of a frame is eliminated.
(2) segment extraction: a relatively long straight line is extracted with
the adjacency projection method according to the "Image Extracting Method"
(Japanese patent laid-open H6-309498). The adjacency projection method is
a method for recognizing the result of adding the projection value of
pixels included in rows or columns around a specific row or column, to the
projection value of pixels in the specific row or column, as the final
projection value of the specific row or column. With this method, pixel
distribution around a particular row or column can be globally identified.
(3) straight line extraction: extracted segments are sequentially searched,
and it is examined whether or not there is an empty space of a
predetermined length between segments. If there is no such empty space,
the segments are sequentially linked, so that a long straight line is
extracted.
(4) straight line integration: extracted straight lines are again
integrated. Straight lines separated into two or more portions due to a
blur are integrated into one straight line.
(5) straight line extension: a straight line which is made shorter due to a
blur is extended, and restored to its original length, only when a
spreadsheet is proved to be regular.
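The adjacency projection described in step (2) can be illustrated with a
short sketch. This is only an illustration under stated assumptions, not the
implementation of the cited laid-open application: the function name
adjacency_projection and the parameter reach (how many neighboring rows are
added on each side) are hypothetical, and the image is assumed to be a binary
array in which 1 denotes a black pixel.

```python
from typing import List


def adjacency_projection(image: List[List[int]], reach: int = 1) -> List[int]:
    """Row-wise adjacency projection of a binary image (1 = black pixel).

    The plain projection of a row is its number of black pixels. The
    adjacency projection of row i adds the plain projections of the rows
    within `reach` rows of i. `reach` is an assumed parameter.
    """
    plain = [sum(row) for row in image]
    n = len(plain)
    result = []
    for i in range(n):
        lo = max(0, i - reach)
        hi = min(n - 1, i + reach)
        result.append(sum(plain[lo:hi + 1]))
    return result


# A slightly skewed horizontal line that occupies two adjacent rows.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
projection = adjacency_projection(image, reach=1)
```

In this example the plain projection of each line row is only 3, whereas the
adjacency projection reaches the full line length of 6 on both rows, which is
why a skewed line can still be detected as one long segment.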
However, the above described techniques have the following problems.
According to the techniques disclosed in the former applications, whether
the shape of a frame of a spreadsheet is regular or irregular, it can be
processed as long as it is a table frame composed of rectangular regions.
Whether a ruled line to be targeted is a solid or dotted line, it can be
processed regardless of the existence of a blur. Furthermore, a straight
line which is made shorter due to an extreme blur is extended only when a
table is proved to be regular.
A normal input image may sometimes include characters of a thick font, or a
shaded portion in a table, as shown in FIG. 1C. In such a case, a ruled
line is erroneously extracted from a defaced character string in which
characters touch one another, and ruled lines which are erroneously
extracted may sometimes be integrated with correct ruled lines.
Additionally, a ruled line which touches a group of black pixels such as a
shaded portion, or a ruled line which touches a character cannot be
extracted. To overcome these problems, it is desirable that a table
document such as a spreadsheet whose ruled-line structure is known
beforehand should be a process target.
However, since it is unknown beforehand what type of table a normal
document handled by electronic filing includes, the probability that
various images including a defaced character etc., are input, is high.
Accordingly, a ruled line is not necessarily extracted correctly by the
techniques of the former applications as they stand.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a ruled line extracting
apparatus and method thereof, which allow a ruled line portion to be
extracted from a normal document image whose ruled-line structure cannot
be predicted.
The ruled-line extracting apparatus according to the present invention
comprises an estimating unit, storing unit, segment extracting unit,
calculating unit, straight line extracting unit, graph generating unit,
straight line processing unit, straight line integrating unit and a
straight line deleting unit.
In a first aspect of the present invention, the estimating unit estimates
the size of a standard pattern included in an input image; and the
straight line extracting unit sets a threshold value based on the
information about the size of the standard pattern, and extracts the
information of one or more straight line patterns from the input image
using the threshold value.
In a second aspect of the present invention, the straight line extracting
unit extracts the information about one or more straight line patterns
from an input image; the calculating unit obtains a representative value
of the sizes of the one or more straight line patterns; and the straight
line processing unit sets a threshold value based on the representative
value, and processes the information of the one or more straight line
patterns using the threshold value.
In a third aspect of the present invention, the straight line extracting
unit extracts the information of one or more straight line patterns from
an input image; the calculating unit obtains a representative value of the
sizes of one or more segment patterns structuring the one or more straight
line patterns; and the straight line processing unit sets a threshold
value based on the representative value, and processes the information of
the one or more straight line patterns using the threshold value.
In a fourth aspect of the present invention, the segment extracting unit
extracts the information of one or more segment patterns from an input
image; the storing unit classifies the information of one or more segment
patterns into the information of a large segment pattern and the
information of a small segment pattern, and stores them; and the straight
line extracting unit examines a link state of the one or more segment
patterns, and, when a large segment pattern is linked to small segment
patterns, extracts a straight line pattern composed of the small segment
patterns regardless of the size of the large segment pattern.
In a fifth aspect of the present invention, the straight line extracting
unit extracts the information about one or more straight line patterns
from an input image; and the straight line integrating unit integrates two
straight line patterns, included in the one or more straight line
patterns, into one, if they almost overlap.
In a sixth aspect of the present invention, the straight line extracting
unit extracts the information of one or more straight line patterns from
an input image; and the straight line deleting unit determines whether or
not to delete one of the straight line patterns using at least either of
the information about the shape of one pattern among the one or more
straight line patterns, and the information about a distance between two
straight line patterns included in the one or more straight line patterns.
In a seventh aspect of the present invention, the straight line extracting
unit extracts the information of one or more straight line patterns from
an input image; and the straight line deleting unit determines whether or
not to delete either of a horizontal straight line pattern and a vertical
straight line pattern included in the one or more straight line patterns
based on a link relationship between these patterns.
In an eighth aspect of the present invention, the straight line extracting
unit extracts the information of one or more straight line patterns from
an input image; and the straight line deleting unit deletes the shorter
of two straight line patterns which almost overlap and are included
in the one or more straight line patterns.
In a ninth aspect of the present invention, the straight line extracting
unit extracts the information of one or more straight line patterns from
an input image; and the straight line integrating unit recognizes an
integrated straight line pattern as a ruled line candidate when the size
of the straight line pattern, generated by integrating two straight line
patterns which partially overlap and are included in the one or more
straight line patterns, is approximately a predetermined value.
In a tenth aspect of the present invention, the straight line extracting
unit extracts the information of one or more straight line patterns from
an input image; and the straight line deleting unit deletes a straight
line pattern composed of segment patterns larger than a threshold value
among the one or more straight line patterns.
In an eleventh aspect of the present invention, the straight line
extracting unit extracts the information of a straight line pattern from
an input image; the graph generating unit obtains the number of pixels
included in a segment pattern of a standard size among one or more segment
patterns structuring the straight line pattern, and generates a graph
representing the number of pixels around the straight line pattern; and
the straight line deleting unit determines whether or not to delete the
straight line pattern based on the shape of the graph.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A shows a simple table frame;
FIG. 1B shows a complicated table frame;
FIG. 1C shows a table frame from which it is difficult to extract a ruled
line;
FIG. 2A is a block diagram showing the principle of a ruled line extracting
apparatus according to the present invention;
FIG. 2B is a functional block diagram showing the ruled line extracting
apparatus;
FIG. 3 is a block diagram showing the configuration of an information
processing device;
FIG. 4 shows the structure of data;
FIG. 5 is a schematic diagram showing a labelling process;
FIG. 6 shows a histogram of heights;
FIG. 7 shows a histogram for obtaining the most frequent value of height;
FIG. 8 shows a table of rectangle heights;
FIG. 9 shows a histogram corresponding to the contents of the table of
rectangle heights;
FIG. 10 is a schematic diagram showing a mask process;
FIG. 11 is a schematic diagram showing a segment detection process;
FIG. 12 is a schematic diagram showing a first segment integration process;
FIG. 13 is a schematic diagram showing a second segment integration
process;
FIG. 14 is a schematic diagram showing a straight line search process;
FIG. 15 is a schematic diagram showing a process for integrating straight
lines which completely overlap;
FIG. 16 is a schematic diagram showing a first straight line deletion
process;
FIG. 17 is a schematic diagram showing a second straight line deletion
process;
FIG. 18 is a schematic diagram showing a straight line which must not be
deleted;
FIG. 19 is a schematic diagram showing a third straight line deletion
process;
FIG. 20 shows a process for integrating straight lines which partially
overlap;
FIG. 21 is a schematic diagram showing the inside of straight lines which
partially overlap;
FIG. 22 is a schematic diagram showing a fourth straight line deletion
process;
FIG. 23 is a schematic diagram showing how to obtain the value of a
distance between two straight lines;
FIG. 24 is a schematic diagram showing a fifth straight line deletion
process;
FIG. 25 is a schematic diagram showing an image after a process for
integrating horizontal segments is performed;
FIG. 26 is a schematic diagram showing an image before a process for
integrating straight lines which completely overlap is performed;
FIG. 27 is a schematic diagram showing an image after the process for
integrating straight lines which completely overlap is performed;
FIG. 28 is a schematic diagram showing an image after the deletion process
based on the shape and position of a straight line, and a link
relationship between vertical and horizontal straight lines, is performed;
FIG. 29 shows an image before the process for integrating straight lines
which partially overlap is performed;
FIG. 30 shows an image after the process for integrating straight lines
which partially overlap is performed;
FIG. 31 shows an image before a process for deleting a straight line which
almost completely overlaps is performed;
FIG. 32 shows an image after the process for deleting a straight line which
almost completely overlaps is performed;
FIG. 33 shows an image before the process for deleting a straight line
composed of only large segments is performed;
FIG. 34 shows an image after the process for deleting a straight line
composed of only large segments is performed;
FIG. 35 shows an image before a process for checking/deleting a straight
line using a segment shift is performed;
FIG. 36 shows an image after the process for checking/deleting a straight
line using the segment shift is performed;
FIG. 37 is a flowchart 1 showing the process for integrating segments;
FIG. 38 is a flowchart 2 showing the process for integrating segments;
FIG. 39 is a flowchart 3 showing the process for integrating segments;
FIG. 40 is a flowchart 4 showing the process for integrating segments;
FIG. 41 is a flowchart 5 showing the process for integrating segments;
FIG. 42 is a flowchart 1 showing the process for checking/deleting a
straight line;
FIG. 43 is a flowchart 2 showing the process for checking/deleting a
straight line;
FIG. 44 is a flowchart 3 showing the process for checking/deleting a
straight line;
FIG. 45 is a flowchart 4 showing the process for checking/deleting a
straight line;
FIG. 46 is a flowchart 5 showing the process for checking/deleting a
straight line; and
FIG. 47 is a flowchart 6 showing the process for checking/deleting a
straight line.
DESCRIPTION OF THE PREFERRED EMBODIMENT
Provided below is the explanation about the details of the preferred
embodiment according to the present invention, by referring to the
drawings.
FIG. 2A is a block diagram showing the principle of a ruled line extracting
apparatus according to the present invention. The ruled line extracting
apparatus shown in FIG. 2A includes the first, second, third, fourth,
fifth, sixth, seventh, eighth, ninth, tenth and eleventh principles, and
comprises an estimating unit 1, storing unit 2, segment extracting unit 3,
calculating unit 4, straight line extracting unit 5, graph generating unit
6, straight line processing unit 7, straight line integrating unit 8 and a
straight line deleting unit 9.
According to the first principle, the estimating unit 1 estimates the size
of a standard pattern included in an input image. The straight line
extracting unit 5 sets a threshold value based on the information about
the size of the standard pattern, and extracts the information of one or
more straight line patterns from the input image using the threshold
value.
The standard pattern corresponds to a pattern of a character or the like of
a standard size, which appears most often in an input image. For example,
a pixel concatenation region representing a character is used as the
standard pattern. For example, the height or the width of a rectangle
circumscribed about that region is used as the size information.
A straight line pattern corresponds to a horizontally or vertically long
pattern extracted from an input image by a mask process using a
horizontally or vertically long mask, and a segment integration process.
The information of a straight line pattern includes, for example,
coordinate values of a rectangle which circumscribes a plurality of
segment patterns structuring the straight line pattern. The segment
pattern corresponds to a pixel region in a segment shape, which is
extracted from an image by the mask process.
The straight line extracting unit 5 determines each threshold value
based on the size of the standard pattern, and classifies straight line
patterns in an image based on the threshold values. With this process, a
straight line pattern deriving from a shaded portion or a character which
touches another character, etc. is excluded from ruled line candidates,
and a correct ruled line candidate can be extracted.
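The first principle can be sketched as follows. This is a minimal sketch
under assumptions: the names standard_height and is_ruled_line_candidate, the
two ratio parameters and their default values are hypothetical and do not
appear in the specification. The sketch takes the most frequent height of the
circumscribed rectangles as the size of the standard pattern, and derives
thickness and length thresholds for a horizontal straight line pattern from
that size.

```python
from collections import Counter
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive corners


def standard_height(rects: List[Rect]) -> int:
    """Most frequent height among circumscribed rectangles."""
    heights = [y2 - y1 + 1 for (_x1, y1, _x2, y2) in rects]
    counts = Counter(heights)
    best = max(counts.values())
    # Assumption: ties are resolved toward the smaller height.
    return min(h for h, c in counts.items() if c == best)


def is_ruled_line_candidate(line: Rect, std: int,
                            max_thickness_ratio: float = 0.5,
                            min_length_ratio: float = 2.0) -> bool:
    """Classify a horizontal straight line pattern using thresholds
    derived from the standard size. Both ratios are assumed values."""
    x1, y1, x2, y2 = line
    thickness = y2 - y1 + 1
    length = x2 - x1 + 1
    return (thickness <= std * max_thickness_ratio
            and length >= std * min_length_ratio)


# Four character-sized rectangles and one large shaded region.
patterns = [(0, 0, 7, 9), (10, 0, 17, 9), (20, 0, 27, 9),
            (30, 0, 39, 11), (0, 20, 99, 49)]
std = standard_height(patterns)
thin_line_ok = is_ruled_line_candidate((0, 60, 99, 61), std)
shaded_ok = is_ruled_line_candidate((0, 20, 99, 49), std)
```

Because the thresholds scale with the standard size, the same rule can be
applied to images scanned at different resolutions without a fixed pixel
constant.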
According to the second principle, the straight line extracting unit 5
extracts the information of one or more straight line patterns from an
input image; the calculating unit 4 obtains the representative value of
the sizes of the one or more straight line patterns. The straight line
processing unit 7 sets a threshold value based on the representative
value, and processes the information of the one or more straight line
patterns using the threshold value.
The calculating unit 4 obtains the representative size of straight line
patterns, for example, based on a histogram of heights or widths of a
plurality of straight line patterns. The straight line processing unit 7
performs the operations such as setting a threshold value close to the
representative value, and excluding a straight line pattern whose size is
larger than the threshold value, etc., thereby extracting a correct ruled
line candidate.
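As an illustration of the second principle, the following sketch obtains the
representative thickness of straight line patterns from a histogram and
excludes patterns thicker than a threshold set close to that value. The
function name filter_by_representative_thickness and the parameter margin are
hypothetical; the specification does not state how close to the
representative value the threshold is set.

```python
from collections import Counter
from typing import List, Tuple


def filter_by_representative_thickness(
        thicknesses: List[int], margin: int = 2) -> Tuple[int, List[int]]:
    """Return the representative thickness and the indices of the straight
    line patterns kept as candidates. `margin` is an assumed allowance."""
    histogram = Counter(thicknesses)
    representative = histogram.most_common(1)[0][0]
    threshold = representative + margin
    kept = [i for i, t in enumerate(thicknesses) if t <= threshold]
    return representative, kept


# Thicknesses of extracted straight line patterns; 15 and 40 stand for
# patterns deriving from a defaced character string and a shaded portion.
thicknesses = [2, 3, 2, 2, 15, 3, 40]
representative, kept = filter_by_representative_thickness(thicknesses)
```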
According to the third principle, the straight line extracting unit 5
extracts the information of one or more straight line patterns from an
input image. The calculating unit 4 obtains the representative value of
the sizes of one or more segment patterns structuring the one or more
straight line patterns. The straight line processing unit 7 sets a
threshold value based on the representative value, and processes the
information of the one or more straight line patterns using the threshold
value.
A segment pattern corresponds to a pixel region in a segment shape, which
is extracted from an image by the mask process, as described above. The
calculating unit 4 obtains the representative size of segment patterns,
for example, based on a histogram of the heights or widths of a plurality
of segment patterns. The straight line processing unit 7 can extract a
correct ruled line candidate by performing the operations such as
excluding a straight line pattern composed of only segment patterns whose
sizes are larger than the threshold value based on the representative
value.
According to the fourth principle, the segment extracting unit 3 extracts
the information of one or more segment patterns from an input image. The
storing unit 2 classifies the information of one or more segment patterns
into the information of a large segment pattern and the information of a
small segment pattern, and stores them. The straight line extracting unit
5 examines a link state of the one or more segment patterns, and, when a
large segment pattern is linked to small segment patterns, extracts a
straight line pattern composed of the small segment patterns regardless of
the size of the large segment pattern.
The information of a segment pattern includes, for example, the coordinate
values of a rectangle which circumscribes a segment pattern, etc.
The storing unit 2 attaches, for example, particular attribute information
to the information of a segment pattern whose size is larger than an
appropriate threshold value, makes a distinction between the information
of the large segment pattern and the information of a small segment
pattern, and stores the results. The straight line extracting unit 5
ignores a large segment pattern and suitably links small segment patterns
on both sides of the large segment pattern, for example, when it
integrates a plurality of segment patterns which overlap and extracts a
rectangle which circumscribes the patterns as a straight line pattern.
With this process, from an image including a ruled line which contacts a
large pixel region such as a shaded portion, character, etc., a straight
line pattern which is not affected by the size of that region can be
extracted as a correct ruled line candidate.
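The handling of a large segment pattern according to the fourth principle can
be sketched as follows. This is a simplified illustration, not the
integration process of the embodiment: the class Segment, the functions
mark_large and extract_line, and the threshold value are hypothetical, and
the segments passed to extract_line are assumed to have already been found to
be linked to one another.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    x1: int
    y1: int
    x2: int
    y2: int
    large: bool = False  # attribute attached to a large segment pattern


def mark_large(segments: List[Segment], threshold: int) -> None:
    """Attach the 'large' attribute to segments thicker than the threshold."""
    for s in segments:
        s.large = (s.y2 - s.y1 + 1) > threshold


def extract_line(linked: List[Segment]) -> Optional[Tuple[int, int, int, int]]:
    """Rectangle circumscribing only the small segments of a linked chain.

    A large segment may act as a bridge between small segments, but its
    own size does not affect the extracted rectangle.
    """
    small = [s for s in linked if not s.large]
    if not small:
        return None
    return (min(s.x1 for s in small), min(s.y1 for s in small),
            max(s.x2 for s in small), max(s.y2 for s in small))


# A thin ruled line interrupted by a tall region such as a touching character.
segments = [Segment(0, 10, 20, 11), Segment(21, 5, 40, 30),
            Segment(41, 10, 60, 12)]
mark_large(segments, threshold=4)
line = extract_line(segments)
```

The extracted rectangle spans the full length of the line (x from 0 to 60)
while keeping the thickness of the small segments (y from 10 to 12), even
though the middle segment is 26 pixels tall.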
According to the fifth principle, the straight line extracting unit 5
extracts the information of one or more straight line patterns from an
input image. The straight line integrating unit 8 integrates two straight
line patterns included in the one or more straight line patterns into one,
if they almost overlap.
The straight line integrating unit 8 reduces redundant straight line
information by integrating two straight line patterns which almost
overlap, thereby extracting a correct ruled line candidate.
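One possible reading of "almost overlap" is sketched below. The criterion
used here, namely the ratio of the intersection area to the area of the
smaller rectangle, and the default value of min_ratio are assumptions made
for illustration; the specification does not define the criterion at this
point. The names overlap_ratio and integrate_if_almost_overlapping are
hypothetical.

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive corners


def area(r: Rect) -> int:
    return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)


def overlap_ratio(a: Rect, b: Rect) -> float:
    """Intersection area divided by the area of the smaller rectangle."""
    w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / min(area(a), area(b))


def integrate_if_almost_overlapping(a: Rect, b: Rect,
                                    min_ratio: float = 0.9) -> Optional[Rect]:
    """Return the circumscribing rectangle if a and b almost overlap."""
    if overlap_ratio(a, b) >= min_ratio:
        return (min(a[0], b[0]), min(a[1], b[1]),
                max(a[2], b[2]), max(a[3], b[3]))
    return None


merged = integrate_if_almost_overlapping((0, 10, 100, 12), (2, 10, 98, 12))
separate = integrate_if_almost_overlapping((0, 10, 100, 12), (0, 50, 100, 52))
```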
According to the sixth principle, the straight line extracting unit 5
extracts the information of one or more straight line patterns from an
input image. The straight line deleting unit 9 determines whether or not
to delete one among the one or more straight line patterns using at least
either of the information about the shape of one pattern among the one or
more straight line patterns, and the information about a distance between
two straight line patterns included in the one or more straight line
patterns.
The straight line deleting unit 9 determines how likely a straight line
pattern is to be a ruled line, and deletes a straight line pattern
which does not look like a ruled line. With this process, a straight line
pattern deriving from a shaded portion or a defaced character string, etc.
is excluded from ruled line candidates, and a correct ruled line candidate
can be extracted.
According to the seventh principle, the straight line extracting unit 5
extracts the information of one or more straight line patterns from an
input image. The straight line deleting unit 9 determines whether or not
to delete either of a horizontal straight line pattern and a vertical
straight line pattern included in the one or more straight line patterns,
based on the link relationship between the horizontal straight line
pattern and the vertical straight line pattern.
The straight line deleting unit 9 excludes, for example, a vertical
straight line pattern which does not touch any horizontal straight line
pattern, and a horizontal straight line pattern which does not touch any
vertical straight line, from ruled line candidates. With this process, a
straight line pattern deriving from a defaced character string, etc. can
be excluded from ruled line candidates, and a correct ruled line candidate
can be extracted.
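The deletion based on the link relationship can be sketched as follows. The
function names touches and drop_isolated and the tolerance tol are
hypothetical; the sketch assumes that two straight line patterns touch when
their circumscribed rectangles intersect after being widened by tol pixels.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive corners


def touches(h: Rect, v: Rect, tol: int = 1) -> bool:
    """True if the two rectangles intersect within a tolerance of tol pixels."""
    return not (h[2] + tol < v[0] or v[2] + tol < h[0]
                or h[3] + tol < v[1] or v[3] + tol < h[1])


def drop_isolated(horizontals: List[Rect], verticals: List[Rect],
                  tol: int = 1) -> Tuple[List[Rect], List[Rect]]:
    """Keep only lines that touch at least one line of the other direction."""
    kept_h = [h for h in horizontals
              if any(touches(h, v, tol) for v in verticals)]
    kept_v = [v for v in verticals
              if any(touches(h, v, tol) for h in horizontals)]
    return kept_h, kept_v


# Top and bottom frame lines, plus a short horizontal pattern inside the
# frame that stands for a defaced character string.
horizontals = [(0, 0, 100, 1), (0, 50, 100, 51), (10, 25, 40, 26)]
verticals = [(0, 0, 1, 51), (99, 0, 100, 51)]
kept_h, kept_v = drop_isolated(horizontals, verticals)
```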
According to the eighth principle, the straight line extracting unit 5
extracts the information of one or more straight line patterns from an
input image. The straight line deleting unit 9 deletes the shorter of
two straight line patterns which almost overlap and are included in the
one or more straight line patterns.
The straight line deleting unit 9 reduces redundant straight line
information by deleting a shorter pattern of two straight line patterns
which almost overlap, thereby extracting a correct ruled line candidate.
According to the ninth principle, the straight line extracting unit 5
extracts the information of one or more straight line patterns from an
input image. If the size of a straight line pattern generated by
integrating two straight line patterns which partially overlap among the
one or more straight line patterns, becomes approximately a predetermined
value, the straight line integrating unit 8 recognizes the straight line
pattern after being integrated as a ruled line candidate.
If the thickness of a straight line pattern to be generated by integrating
two straight line patterns is approximately the representative thickness
of straight line patterns, the straight line integrating unit 8 performs
its integration process. As a result, redundant straight line information
can be reduced, thereby extracting a correct ruled line candidate.
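The condition on the thickness after integration can be sketched for
horizontal straight line patterns as follows. The function name
integrate_partial and the tolerance tol are hypothetical, and the partial
overlap is tested here only along the horizontal direction, which is a
simplifying assumption.

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive corners


def integrate_partial(a: Rect, b: Rect, rep_thickness: int,
                      tol: int = 1) -> Optional[Rect]:
    """Integrate two horizontal lines that overlap in x, but only if the
    thickness of the result is approximately the representative thickness."""
    x_overlap = min(a[2], b[2]) - max(a[0], b[0]) + 1
    if x_overlap <= 0:
        return None
    merged = (min(a[0], b[0]), min(a[1], b[1]),
              max(a[2], b[2]), max(a[3], b[3]))
    thickness = merged[3] - merged[1] + 1
    if abs(thickness - rep_thickness) <= tol:
        return merged
    return None


# Two pieces of the same ruled line, offset by one pixel vertically.
same_line = integrate_partial((0, 10, 60, 11), (50, 11, 120, 12),
                              rep_thickness=3)
# Two different lines 10 pixels apart: integration would be too thick.
different = integrate_partial((0, 10, 60, 11), (50, 20, 120, 21),
                              rep_thickness=3)
```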
According to the tenth principle, the straight line extracting unit 5
extracts the information of one or more straight line patterns from an
input image. The straight line deleting unit 9 deletes a straight line
pattern composed of segment patterns whose sizes are larger than a
threshold value among the one or more straight line patterns.
The straight line deleting unit 9 excludes, for example, a straight line
pattern composed of only segment patterns whose thicknesses are much
greater than the representative thickness of segment patterns, from ruled
line candidates. With this process, a straight line pattern deriving from a
defaced character string, etc. is excluded from ruled line candidates,
thereby extracting a correct ruled line candidate.
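A minimal sketch of this deletion is given below. The function name
composed_only_of_large_segments and the parameter factor, which expresses
"much greater than the representative thickness" as a multiple, are
hypothetical; the value 2.0 is an assumption.

```python
from typing import Dict, List


def composed_only_of_large_segments(segment_thicknesses: List[int],
                                    representative: int,
                                    factor: float = 2.0) -> bool:
    """True if every segment of the line is much thicker than the
    representative segment thickness. An empty list yields False."""
    return bool(segment_thicknesses) and all(
        t > representative * factor for t in segment_thicknesses)


# Segment thicknesses per straight line pattern. The ruled line touches one
# thick region but still contains thin segments, so it is kept.
lines: Dict[str, List[int]] = {
    "ruled": [2, 2, 20, 2],
    "defaced": [18, 22, 19],
}
kept = [name for name, ts in lines.items()
        if not composed_only_of_large_segments(ts, representative=2)]
```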
According to the eleventh principle, the straight line extracting unit 5
extracts the information of a straight line pattern from an input image.
The graph generating unit 6 obtains the number of pixels included in a
segment pattern of a standard size among one or more segment patterns
structuring the straight line pattern, and generates a graph representing
the number of pixels in the neighborhood of the straight line pattern. The
straight line deleting unit 9 determines whether or not to delete the
straight line pattern based on the shape of the graph.
The graph generating unit 6 generates, for example, a set of segment
patterns of a standard size by excluding a large segment pattern from a
set of segment patterns structuring a straight line pattern. Then, the
graph generating unit 6 shifts it to the region around the straight line
pattern, and generates a graph representing the relationship between the
amount of shift and the number of pixels. Furthermore, if the shape of the
graph is gentle and the maximum value is unclear, the straight line
deleting unit 9 deletes the straight line pattern from ruled line
candidates.
For a straight line pattern extracted from the inside of a shaded portion
or a defaced character string, pixels often exist all around the straight
line pattern. In such a case, the shape of the graph becomes gentle, and
the straight line pattern is excluded from ruled line candidates. As a
result, a correct ruled line candidate can be extracted.
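The shift-based check can be sketched as follows for a horizontal straight
line pattern. This is an illustration under assumptions: the names
pixel_count, shift_graph and has_clear_peak are hypothetical, and the test
for a "gentle" graph (the maximum must be at least ratio times the mean of
the remaining values) is an assumed criterion, since the specification does
not define one at this point.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive corners


def pixel_count(image: List[List[int]], segments: List[Rect], dy: int) -> int:
    """Black pixels covered by the segments after shifting them by dy rows."""
    height = len(image)
    total = 0
    for (x1, y1, x2, y2) in segments:
        for y in range(y1 + dy, y2 + dy + 1):
            if 0 <= y < height:
                total += sum(image[y][x1:x2 + 1])
    return total


def shift_graph(image: List[List[int]], segments: List[Rect],
                max_shift: int) -> List[int]:
    """Number of pixels as a function of the amount of shift."""
    return [pixel_count(image, segments, dy)
            for dy in range(-max_shift, max_shift + 1)]


def has_clear_peak(graph: List[int], ratio: float = 2.0) -> bool:
    """Assumed criterion: the maximum stands out against the other values."""
    if len(graph) < 2:
        return False
    peak = max(graph)
    rest = sorted(graph)[:-1]
    mean_rest = sum(rest) / len(rest)
    return peak > 0 and peak >= ratio * mean_rest


standard_segments = [(0, 3, 9, 3)]  # segments of a standard size only
ruled = [[1] * 10 if y == 3 else [0] * 10 for y in range(7)]
shaded = [[1] * 10 for _ in range(7)]
ruled_graph = shift_graph(ruled, standard_segments, max_shift=2)
shaded_graph = shift_graph(shaded, standard_segments, max_shift=2)
```

For the isolated ruled line the graph has a single sharp maximum at zero
shift, whereas inside the shaded region the count is the same for every
shift, so the pattern would be deleted from the ruled line candidates.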
For example, the storing unit 2 shown in FIG. 2A corresponds to a memory 32
shown in FIG. 3, to be described later. The estimating unit 1, segment
extracting unit 3, calculating unit 4, straight line extracting unit 5,
graph generating unit 6, straight line processing unit 7, straight line
integrating unit 8 and the straight line deleting unit 9 correspond to a
CPU (Central Processing Unit) 31 and the memory 32.
Currently, a form learning system of a table format document has been
developed in order to automatically extract a keyword such as a title from
a table, etc. with high accuracy. With this system, a document including a
table is registered beforehand, and thereafter a correct keyword can be
extracted from the registered document with high accuracy. The present
invention can be applied in order to correctly extract a ruled line from a
document image, when the form of a table format document is learned.
The present invention, which improves the technique for extracting a ruled
line in a spreadsheet according to the conventional technique or
techniques in former applications, makes a distinction between a straight
line extracted from an original ruled line and a straight line erroneously
extracted from a character string by taking full advantage of the
information of small segments structuring a ruled line. As a result, a
ruled line can be correctly extracted even if a character touches a ruled
line.
Furthermore, even if there is a segment extracted from a defaced portion of
a table, ruled line candidates are obtained by targeting only segments
extracted from an original ruled line. Then, a correct ruled line is
extracted based on the shape and position relationship of a ruled line,
and the distribution state of black pixels in a segment of the ruled line.
The following embodiment targets a document in which various characters
exist such as a character touching a frame, or a character beyond a frame,
when there is one or a plurality of frames such as a frame whose size,
position, or slope is unknown. Now, we shall consider the case in which a
frame is extracted from such a document image.
FIG. 2B is a functional block diagram showing a ruled line extracting
apparatus according to this embodiment. In this figure, an input pattern
11 to be targeted is a binary image in which an extreme slope or a
rotation is corrected. The shaded process blocks indicate the processes
mainly different from those according to the former applications,
including the application "Pattern Extracting Apparatus and Pattern Region
Extracting Method" (Japanese patent application H8-107568), etc.
After a reduction processing unit 12 reduces an image, and a concatenation
pattern extracting unit 13 extracts a concatenation pattern, the ruled
line extracting apparatus calculates the most frequent value of height of
rectangles (process P1), and a mask processing unit 14 performs thinning
operations.
Then, a horizontal straight line extracting unit 15 performs horizontal
adjacency projection (process P2), horizontal segment detection (process
P3), horizontal segment integration (process P4), and a horizontal
straight line search (process P5). Next, the ruled line extracting
apparatus performs horizontal dotted line detection (process P6). After a
vertical straight line extracting unit 16 performs vertical adjacency
projection (process P7), vertical segment detection (process P8), vertical
segment integration (process P9), and a vertical straight line search
(process P10), the ruled line extracting apparatus performs vertical
dotted line detection (process P11).
Next, the ruled line extracting apparatus calculates the most frequent
value of height of horizontal straight lines (process P12), calculates the
most frequent value of width of vertical straight lines (process P13),
calculates the most frequent value of height of horizontal segments
(process P14), and calculates the most frequent value of width of vertical
segments (process P15). Then, the apparatus integrates straight lines
which completely overlap (process P16), and deletes an unnecessary
straight line based on the shape of a straight line rectangle and the
distance to the next straight line rectangle (process P17). Next, the
apparatus deletes an unnecessary straight line based on the link
relationship between vertical and horizontal straight lines (process P18),
and integrates straight lines which partially overlap (process P19).
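Process P16 can be pictured with the following minimal sketch. It assumes, as an interpretation not confirmed by this passage, that "completely overlap" means that the rectangle of one straight line lies entirely inside the rectangle of another, and that integration simply keeps the enclosing rectangle. The function names and the (x1, y1, x2, y2) tuple layout are hypothetical.

```python
def contains(outer, inner):
    """True if rectangle `inner` lies entirely within rectangle `outer`."""
    ox1, oy1, ox2, oy2 = outer
    ix1, iy1, ix2, iy2 = inner
    return ox1 <= ix1 and oy1 <= iy1 and ix2 <= ox2 and iy2 <= oy2


def integrate_contained(rects):
    """Drop every rectangle that is fully covered by another one.

    For exact duplicates, only the first occurrence is kept.
    """
    kept = []
    for i, r in enumerate(rects):
        absorbed = False
        for j, other in enumerate(rects):
            if i == j:
                continue
            # A strictly larger rectangle absorbs r; among identical
            # rectangles, the earlier one absorbs the later one.
            if contains(other, r) and (other != r or j < i):
                absorbed = True
                break
        if not absorbed:
            kept.append(r)
    return kept
```

The quadratic loop is chosen for clarity; the number of straight line rectangles in one table is normally small enough that this is not a concern.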
The ruled line extracting apparatus excludes a straight line which almost
completely overlaps with another (process P20), and deletes a straight
line composed of only segments whose sizes are larger than a predetermined
threshold value (process P21). The apparatus attaches a mark to a segment
whose size is larger than the threshold value (process P22), checks a
straight line while shifting a segment to be targeted and deletes an
unnecessary straight line (process P23), and outputs the remaining
straight lines (process P24).
The ruled line extracting apparatus according to this embodiment is
implemented by, for example, an information processing device (computer)
shown in FIG. 3. The information processing device shown in FIG. 3
comprises a CPU 31, memory 32, input device 33, output device 34, external
storage device 35, medium driving device 36, network connecting device 37,
and a photoelectric converter 38, all of which are interconnected via a
bus 39.
The CPU 31 executes a program stored in the memory 32, and performs each of
the processes shown in FIG. 2B. As the memory 32, for example, a ROM (Read
Only Memory), RAM (Random Access Memory), etc. are employed.
The input device 33 corresponds to, for example, a keyboard, pointing
device, etc., and is used to input a request or instruction from a user.
The output device 34 corresponds to a display device, printer, etc., and is
used to output the result of a process, etc.
The external storage device 35 is, for example, a magnetic disk device,
optical disk device, or a magneto-optical disk device, etc., and can store
a program and data. It is used as a database of an electronic filing
system, which is intended for storing images, keywords, etc.
The medium driving device 36 drives a portable storage medium 40, and can
access its stored contents. As the portable storage medium 40, an
arbitrary computer-readable storage medium such as a memory card, floppy
disk, CD-ROM (Compact Disc-Read Only Memory), optical disk,
magneto-optical disk, etc. can be used. The portable storage medium 40
stores the program for performing the processes shown in FIG. 2B in
addition to data.
The network connecting device 37 is connected to an arbitrary
communications network such as a LAN (Local Area Network), etc., and
performs a data conversion, etc. accompanying a communication. The ruled
line extracting apparatus can receive required data or programs from an
external database, etc. via the network connecting device 37. The
photoelectric converter 38 is, for example, an image scanner, and is
intended to input a normal document image to be processed.
In the memory 32, data required for the processes is managed, for example,
as the structure shown in FIG. 4. In this figure, information 41 of one
input image is composed of the number of tables (table format frames)
included in an image, and information 42 of each table.
The information 42 of each table is composed of the coordinate values of a
circumscribed rectangle of a table, the number of cells included in the
table, information 43 of each cell, the number of horizontal straight
lines included in the table, information 44 of each horizontal straight
line, the number of vertical straight lines included in the table, and
information 44 of each vertical straight line. Here, a cell indicates a
region surrounded by ruled lines.
The information 43 of each cell includes the coordinate values of a cell,
and the information 44 of each straight line is composed of the coordinate
values of a rectangle representing a straight line, the attribute
information of the straight line, the number of small segments included in
the straight line, information 45 of each small segment, and a serial
number of the straight line in the entire image. The information 45 of
each small segment includes the attribute information of a small segment,
and the coordinate values of a rectangle representing the small segment.
The attribute information of a straight line and a small segment is used
to make a distinction, for example, between a solid line and a dotted
line, and between a wild card segment whose height or width exceeds a
predetermined value and another segment, etc.
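The hierarchy of FIG. 4 described above can be sketched as nested records. The following is only an illustrative model: the class names, field names, and attribute strings such as "solid", "dotted", and "wild_card" are hypothetical, and the counts that the embodiment stores explicitly (number of tables, cells, straight lines, and small segments) are derived here from list lengths.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed rectangle layout: (x1, y1, x2, y2) in pixel coordinates.
Rect = Tuple[int, int, int, int]


@dataclass
class SmallSegment:  # corresponds to information 45
    attribute: str   # e.g. "normal" or "wild_card" (hypothetical labels)
    rect: Rect


@dataclass
class StraightLine:  # corresponds to information 44
    rect: Rect
    attribute: str   # e.g. "solid" or "dotted" (hypothetical labels)
    serial_number: int  # serial number within the entire image
    segments: List[SmallSegment] = field(default_factory=list)


@dataclass
class Cell:  # corresponds to information 43
    rect: Rect


@dataclass
class Table:  # corresponds to information 42
    rect: Rect  # circumscribed rectangle of the table
    cells: List[Cell] = field(default_factory=list)
    horizontal_lines: List[StraightLine] = field(default_factory=list)
    vertical_lines: List[StraightLine] = field(default_factory=list)


@dataclass
class ImageInfo:  # corresponds to information 41
    tables: List[Table] = field(default_factory=list)
```

For example, a table with one horizontal straight line made of a single small segment could be built as follows:

```python
seg = SmallSegment(attribute="normal", rect=(0, 0, 99, 2))
line = StraightLine(rect=(0, 0, 99, 2), attribute="solid",
                    serial_number=0, segments=[seg])
image = ImageInfo(tables=[Table(rect=(0, 0, 99, 49),
                                horizontal_lines=[line])])
```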
Each of the processes shown in FIG. 2B is explained next with reference to
FIGS. 5 through 24.
If the resolution of an image of the input pattern 11 is a predetermined
resolution or greater, and the size of the image is relatively large, the
reduction processing unit 12 performs a process for reducing an image in
order to improve the efficiency of the process. The input original image