WikiPatents - Community Patent Review
Ruled line extracting apparatus for extracting ruled line from normal document image and method thereof    

United States Patent     6754385
Inventor(s)     Katsuyama; Yutaka (Kawasaki, JP)
Abstract
A ruled line extracting apparatus obtains the circumscribed rectangles of the pixel concatenation regions included in an input pattern and calculates the most frequent value of their heights. The apparatus also integrates segments while ignoring wild card segments, and calculates the most frequent height/width values of the extracted straight lines and of the segments constituting them. Next, it integrates and deletes straight lines using threshold values based on these most frequent values. Finally, it checks each straight line against the distribution of black pixels around it, deletes lines accordingly, and recognizes the remaining straight lines as ruled line candidates.
Title Information
Owner/Assignee     Fujitsu Limited (Kawasaki, JP)
Publication Date     June 22, 2004
Application Number     09/755,036
Filing Date     January 8, 2001
US Classification     382/171 382/177 382/199 382/202
Int'l Classification     G06K 009/00 G06K 009/34 228 282-283
Examiner     Johnson; Timothy M.
Attorney/Law Firm     Staas & Halsey LLP
Parent Case     CROSS-REFERENCE TO RELATED APPLICATION This application is a division of application Ser. No. 08/909,137, filed Aug. 11, 1997, now U.S. Pat. No. 6,226,402.
Priority Data     Dec 20, 1996 [JP] 8-342185
USPTO Field of Search     382/164 382/165 382/170 382/171 382/172 382/173 382/190 382/192 358/538 358/453 358/462 358/464
References
U.S. References
5757957     Tachikawa     May 1998
5307422     Wang     382/177     Apr 1994
5191612     Katsuyama     382/171     Mar 1993
Claims
What is claimed is:

1. A ruled line extracting apparatus, comprising:

straight line extracting means for extracting information of a straight line pattern from an input image;

graph generating means for obtaining the number of pixels included in a segment pattern of a standard size among one or more segment patterns structuring the straight line pattern, and generating a graph representing the number of pixels around the straight line pattern; and

straight line deleting means for determining whether or not to delete the straight line pattern based on a shape of the graph.

2. The ruled line extracting apparatus according to claim 1, further comprising:

storing means for attaching a mark to information of a large segment pattern among the one or more segment patterns, and storing information of the one or more segment patterns, wherein said graph generating means recognizes a segment pattern to which the mark is not attached among the one or more segment patterns as the segment pattern of the standard size.

3. The ruled line extracting apparatus according to claim 1, wherein:

said graph generating means shifts the segment pattern of the standard size in a direction perpendicular to a direction of a length of the straight line pattern, and generates the graph representing a relationship between an amount of shift and the number of pixels; and

said straight line deleting means deletes the straight line pattern if the shape of the graph is gentle.

4. A computer-readable storage medium, when used by a computer, to direct the computer to perform the functions of:

extracting information of a straight line pattern from an input image;

obtaining the number of pixels included in a segment pattern of a standard size among one or more segment patterns structuring the straight line pattern, and generating a graph representing the number of pixels around the straight line pattern; and

determining whether or not to delete the straight line pattern based on a shape of the graph.

5. A ruled line extracting method, comprising the steps of:

extracting information of a straight line pattern from an input image;

obtaining the number of pixels included in a segment pattern of a standard size among one or more segment patterns structuring the straight line pattern, and generating a graph representing the number of pixels around the straight line pattern; and

determining whether or not to delete the straight line pattern based on a shape of the graph.
Description
BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a ruled line extracting apparatus, and a method thereof, for extracting ruled-line portions from an arbitrary document image read by a photoelectric converter or a similar device.

2. Description of the Related Art

In recent years, to improve the efficiency of in-house operations, demand has increased for electronic filing systems that convert paper documents into electronic form and store them on optical discs or similar media. In a conventional electronic filing system, a paper document is converted into an image by a photoelectric converter such as an image scanner, and the image is stored on an optical disc or a hard disk together with a search keyword. However, because the keyword must be typed in from a keyboard, the input operation is troublesome.

To eliminate this troublesome operation, the present applicant previously filed "Title Extracting Apparatus for Extracting Title from Document Image and Method Thereof" (U.S. patent application Ser. No. 08/694,503; Japanese patent application H7-341983). With that method, a document title included in an image is automatically extracted and registered as a keyword. Management information such as a title, destination and sender can also be extracted automatically from various document images, including documents in table format. For example, it has been confirmed that a title outside a table can be extracted with approximately 90% accuracy.

However, a title inside a table can be extracted with only 55% accuracy, which is insufficient for practical use. To extract a keyword such as a title from inside a table with high accuracy, the ruled lines that make up the table must be extracted accurately. Until now, ruled-line extraction techniques have been developed mainly for spreadsheets in which characters and the like are regularly arranged.

Conventional ruled-line extraction techniques include "Image Extracting Method" (Japanese patent laid-open H6-309498) and "Image Extracting Apparatus" (Japanese patent laid-open H7-28937). With these techniques, the frames of a spreadsheet can be extracted or removed without inputting information such as frame positions in advance. The spreadsheets that can be processed are sheets composed of one-character frames or block frames (horizontal one-line frames or free-format frames), and sheets in which the frames are rectangular and the horizontal frame lines are regularly arranged.

In addition, the present applicant has filed the following Japanese applications on ruled-line extraction: "Frame Extracting Apparatus and Rectangle Extracting Apparatus" (Japanese patent application H7-203259), "Pattern Area Extracting Apparatus and Pattern Extracting Apparatus" (Japanese patent application H7-282171), and "Pattern Extracting Apparatus and Pattern Area Extracting Method" (Japanese patent application H8-107568).

With these techniques, frames can be extracted or removed whether the outer periphery of the frames is rectangular, as shown in FIG. 1A, or not, as shown in FIG. 1B. Furthermore, the frame of a table made up of a rectangle that is enclosed by a frame and subdivided into smaller portions, like the shaded portion shown in FIG. 1B, can also be extracted and removed. The process is outlined below.

(1) thinning: With a mask process, horizontal and vertical segments are made thinner, and the difference between the thickness of a character and that of a frame is eliminated.

(2) segment extraction: relatively long straight lines are extracted with the adjacency projection method described in the "Image Extracting Method" (Japanese patent laid-open H6-309498). In the adjacency projection method, the final projection value of a specific row or column is obtained by adding the projection values of the pixels in the surrounding rows or columns to the projection value of the pixels in that row or column itself. With this method, the pixel distribution around a particular row or column can be captured globally.

(3) straight line extraction: the extracted segments are searched sequentially, and it is checked whether there is a gap of a predetermined length between segments. If there is no such gap, the segments are linked one after another, so that a long straight line is extracted.

(4) straight line integration: the extracted straight lines are integrated once more, so that a straight line split into two or more pieces by blurring is merged into one straight line.

(5) straight line extension: a straight line shortened by blurring is extended and restored to its original length, but only when the spreadsheet is confirmed to be regular.
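Step (2) can be illustrated with a minimal Python sketch of the row-wise adjacency projection. It assumes the image is a list of 0/1 rows (1 = black) and uses a hypothetical neighbourhood `radius`; the referenced laid-open publication may define the neighbourhood differently. Column projections are obtained by transposing the image.

```python
def adjacency_projection(image, radius=1):
    """Row-wise adjacency projection of a binary image.

    image: list of rows, each a list of 0/1 pixels (1 = black)
    radius: hypothetical number of neighbouring rows added on each side
    """
    # Ordinary projection: number of black pixels in each row.
    plain = [sum(row) for row in image]
    n = len(plain)
    result = []
    for i in range(n):
        # Final value = own projection + projections of the neighbouring rows.
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        result.append(sum(plain[lo:hi]))
    return result


def column_adjacency_projection(image, radius=1):
    # Columns are handled by transposing the image and reusing the row version.
    return adjacency_projection([list(col) for col in zip(*image)], radius)
```

Because neighbouring rows are added in, a line whose pixels are spread over a few adjacent rows still produces a high value, which is the "global" view of the pixel distribution described in step (2).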

However, the techniques described above have the following problems.

The techniques disclosed in the former applications can process a spreadsheet frame, whether its shape is regular or irregular, as long as it is a table frame composed of rectangular regions. They can process both solid and dotted ruled lines, with or without blurring. Furthermore, a straight line shortened by extreme blurring is extended only when the table is confirmed to be regular.

An ordinary input image, however, may include characters in a thick font or shaded portions in a table, as shown in FIG. 1C. In such cases, a ruled line may be erroneously extracted from a defaced character string in which characters touch one another, and such erroneously extracted ruled lines may be integrated with correct ones.

In addition, a ruled line that touches a cluster of black pixels, such as a shaded portion, or a ruled line that touches a character cannot be extracted. To avoid these problems, the processing target should preferably be a table document, such as a spreadsheet, whose ruled-line structure is known in advance.

However, because it is not known in advance what kinds of tables an ordinary document handled by electronic filing will contain, it is highly likely that various images, including ones with defaced characters, will be input. Consequently, the techniques of the former applications, applied as they are, cannot always extract ruled lines correctly.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a ruled line extracting apparatus and method thereof, which allow a ruled line portion to be extracted from a normal document image whose ruled-line structure cannot be predicted.

The ruled-line extracting apparatus according to the present invention comprises an estimating unit, storing unit, segment extracting unit, calculating unit, straight line extracting unit, graph generating unit, straight line processing unit, straight line integrating unit and a straight line deleting unit.

In a first aspect of the present invention, the estimating unit estimates the size of a standard pattern included in an input image; and the straight line extracting unit sets a threshold value based on the information about the size of the standard pattern, and extracts the information of one or more straight line patterns from the input image using the threshold value.

In a second aspect of the present invention, the straight line extracting unit extracts the information about one or more straight line patterns from an input image; the calculating unit obtains a representative value of the sizes of the one or more straight line patterns; and the straight line processing unit sets a threshold value based on the representative value, and processes the information of the one or more straight line patterns using the threshold value.

In a third aspect of the present invention, the straight line extracting unit extracts the information of one or more straight line patterns from an input image; the calculating unit obtains a representative value of the sizes of one or more segment patterns structuring the one or more straight line patterns; and the straight line processing unit sets a threshold value based on the representative value, and processes the information of the one or more straight line patterns using the threshold value.

In a fourth aspect of the present invention, the segment extracting unit extracts the information of one or more segment patterns from an input image; the storing unit classifies the information of one or more segment patterns into the information of a large segment pattern and the information of a small segment pattern, and stores them; and the straight line extracting unit examines a link state of the one or more segment patterns, and, when a large segment pattern is linked to small segment patterns, extracts a straight line pattern composed of the small segment patterns regardless of the size of the large segment pattern.

In a fifth aspect of the present invention, the straight line extracting unit extracts the information about one or more straight line patterns from an input image; and the straight line integrating unit integrates two straight line patterns, included in the one or more straight line patterns, into one, if they almost overlap.

In a sixth aspect of the present invention, the straight line extracting unit extracts the information of one or more straight line patterns from an input image; and the straight line deleting unit determines whether or not to delete one of the straight line patterns using at least one of the following: information about the shape of one of the one or more straight line patterns, and information about the distance between two straight line patterns included in the one or more straight line patterns.

In a seventh aspect of the present invention, the straight line extracting unit extracts the information of one or more straight line patterns from an input image; and the straight line deleting unit determines whether or not to delete either of a horizontal straight line pattern and a vertical straight line pattern included in the one or more straight line patterns based on a link relationship between these patterns.

In an eighth aspect of the present invention, the straight line extracting unit extracts the information of one or more straight line patterns from an input image; and the straight line deleting unit deletes the shorter of two straight line patterns that are included in the one or more straight line patterns and almost overlap.

In a ninth aspect of the present invention, the straight line extracting unit extracts the information of one or more straight line patterns from an input image; and, when two straight line patterns included in the one or more straight line patterns partially overlap, the straight line integrating unit integrates them and recognizes the integrated straight line pattern as a ruled line candidate if its size is approximately a predetermined value.

In a tenth aspect of the present invention, the straight line extracting unit extracts the information of one or more straight line patterns from an input image; and the straight line deleting unit deletes a straight line pattern composed of segment patterns larger than a threshold value among the one or more straight line patterns.

In an eleventh aspect of the present invention, the straight line extracting unit extracts the information of a straight line pattern from an input image; the graph generating unit obtains the number of pixels included in a segment pattern of a standard size among one or more segment patterns structuring the straight line pattern, and generates a graph representing the number of pixels around the straight line pattern; and the straight line deleting unit determines whether or not to delete the straight line pattern based on the shape of the graph.
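The graph-based check of the eleventh aspect can be sketched as follows. This is a minimal Python illustration, not the patented implementation. It assumes horizontal segments given as inclusive rectangles `(x1, y1, x2, y2)` that have already been restricted to standard-size segments. The shift range `max_shift` and the "gentleness" test, which compares the ends of the graph with its peak, are hypothetical choices.

```python
def shift_graph(image, segments, max_shift=3):
    """Black-pixel counts of segments shifted perpendicular to the line.

    image: list of rows of 0/1 pixels (1 = black)
    segments: standard-size horizontal segments as (x1, y1, x2, y2), inclusive
    max_shift: hypothetical shift range in pixels (both directions)
    Returns one total per shift d = -max_shift .. +max_shift.
    """
    height, width = len(image), len(image[0])
    graph = []
    for d in range(-max_shift, max_shift + 1):
        total = 0
        for x1, y1, x2, y2 in segments:
            # Move the segment rectangle vertically by d and count black pixels inside.
            for y in range(y1 + d, y2 + d + 1):
                if 0 <= y < height:
                    total += sum(image[y][max(0, x1):min(width, x2 + 1)])
        graph.append(total)
    return graph


def is_gentle(graph, ratio=0.5):
    """Hypothetical criterion: the graph is gentle if its ends stay close to its peak."""
    peak = max(graph)
    return peak == 0 or min(graph[0], graph[-1]) >= ratio * peak
```

An isolated ruled line gives a sharp peak at zero shift. A line extracted from a shaded area keeps finding black pixels as it is shifted, so its graph stays flat and, as in claim 3, the line would be deleted.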

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A shows a simple table frame;

FIG. 1B shows a complicated table frame;

FIG. 1C shows a table frame from which it is difficult to extract ruled lines;

FIG. 2A is a block diagram showing the principle of a ruled line extracting apparatus according to the present invention;

FIG. 2B is a functional block diagram showing the ruled line extracting apparatus;

FIG. 3 is a block diagram showing the configuration of an information processing device;

FIG. 4 shows the structure of data;

FIG. 5 is a schematic diagram showing a labelling process;

FIG. 6 shows a histogram of heights;

FIG. 7 shows a histogram for obtaining the most frequent value of height;

FIG. 8 shows a table of rectangle heights;

FIG. 9 shows a histogram corresponding to the contents of the table of rectangle heights;

FIG. 10 is a schematic diagram showing a mask process;

FIG. 11 is a schematic diagram showing a segment detection process;

FIG. 12 is a schematic diagram showing a first segment integration process;

FIG. 13 is a schematic diagram showing a second segment integration process;

FIG. 14 is a schematic diagram showing a straight line search process;

FIG. 15 is a schematic diagram showing a process for integrating straight lines which completely overlap;

FIG. 16 is a schematic diagram showing a first straight line deletion process;

FIG. 17 is a schematic diagram showing a second straight line deletion process;

FIG. 18 is a schematic diagram showing a straight line which must not be deleted;

FIG. 19 is a schematic diagram showing a third straight line deletion process;

FIG. 20 shows a process for integrating straight lines which partially overlap;

FIG. 21 is a schematic diagram showing the inside of straight lines which partially overlap;

FIG. 22 is a schematic diagram showing a fourth straight line deletion process;

FIG. 23 is a schematic diagram showing how to obtain the value of a distance between two straight lines;

FIG. 24 is a schematic diagram showing a fifth straight line deletion process;

FIG. 25 is a schematic diagram showing an image after a process for integrating horizontal segments is performed;

FIG. 26 is a schematic diagram showing an image before a process for integrating straight lines which completely overlap is performed;

FIG. 27 is a schematic diagram showing an image after the process for integrating straight lines which completely overlap is performed;

FIG. 28 is a schematic diagram showing an image after the deletion process based on the shape and position of a straight line, and a link relationship between vertical and horizontal straight lines, is performed;

FIG. 29 shows an image before the process for integrating straight lines which partially overlap is performed;

FIG. 30 shows an image after the process for integrating straight lines which partially overlap is performed;

FIG. 31 shows an image before a process for deleting a straight line which almost completely overlaps is performed;

FIG. 32 shows an image after the process for deleting a straight line which almost completely overlaps is performed;

FIG. 33 shows an image before the process for deleting a straight line composed of only large segments is performed;

FIG. 34 shows an image after the process for deleting a straight line composed of only large segments is performed;

FIG. 35 shows an image before a process for checking/deleting a straight line using a segment shift is performed;

FIG. 36 shows an image after the process for checking/deleting a straight line using the segment shift is performed;

FIG. 37 is a flowchart (part 1 of 5) showing the process for integrating segments;

FIG. 38 is a flowchart (part 2 of 5) showing the process for integrating segments;

FIG. 39 is a flowchart (part 3 of 5) showing the process for integrating segments;

FIG. 40 is a flowchart (part 4 of 5) showing the process for integrating segments;

FIG. 41 is a flowchart (part 5 of 5) showing the process for integrating segments;

FIG. 42 is a flowchart (part 1 of 6) showing the process for checking/deleting a straight line;

FIG. 43 is a flowchart (part 2 of 6) showing the process for checking/deleting a straight line;

FIG. 44 is a flowchart (part 3 of 6) showing the process for checking/deleting a straight line;

FIG. 45 is a flowchart (part 4 of 6) showing the process for checking/deleting a straight line;

FIG. 46 is a flowchart (part 5 of 6) showing the process for checking/deleting a straight line; and

FIG. 47 is a flowchart (part 6 of 6) showing the process for checking/deleting a straight line.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The preferred embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 2A is a block diagram showing the principle of a ruled line extracting apparatus according to the present invention. The ruled line extracting apparatus shown in FIG. 2A embodies the first through eleventh principles, and comprises an estimating unit 1, storing unit 2, segment extracting unit 3, calculating unit 4, straight line extracting unit 5, graph generating unit 6, straight line processing unit 7, straight line integrating unit 8 and a straight line deleting unit 9.

According to the first principle, the estimating unit 1 estimates the size of a standard pattern included in an input image. The straight line extracting unit 5 sets a threshold value based on the information about the size of the standard pattern, and extracts the information of one or more straight line patterns from the input image using the threshold value.

The standard pattern is a pattern, such as a character of standard size, that appears most often in the input image. For example, a pixel concatenation region (a connected region of black pixels) representing a character is used as the standard pattern, and the height or width of the rectangle circumscribing that region is used as its size information.
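As an illustration of the first principle, the following minimal Python sketch labels the black regions of a binary image, takes the height of each circumscribed rectangle, and returns the most frequent height as the standard-pattern size. The 8-connectivity, the list-of-rows image format and the function names are assumptions. The labelling process of FIG. 5 may differ in detail.

```python
from collections import Counter, deque


def circumscribed_rectangles(image):
    """Label 8-connected black regions and return their circumscribed rectangles.

    image: list of rows of 0/1 pixels (1 = black)
    Returns a list of (x1, y1, x2, y2) inclusive rectangles in scan order.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    rects = []
    for y in range(height):
        for x in range(width):
            if not image[y][x] or seen[y][x]:
                continue
            # Breadth-first labelling of one pixel concatenation region.
            seen[y][x] = True
            queue = deque([(x, y)])
            x1, y1, x2, y2 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x1, y1 = min(x1, cx), min(y1, cy)
                x2, y2 = max(x2, cx), max(y2, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append((x1, y1, x2, y2))
    return rects


def most_frequent_height(rects):
    """Most frequent circumscribed-rectangle height (the standard-pattern size)."""
    heights = Counter(y2 - y1 + 1 for _, y1, _, y2 in rects)
    return heights.most_common(1)[0][0]
```

Thresholds for the straight line extracting unit 5 would then be set relative to this height. The multipliers involved are not given here and would have to be chosen separately.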

A straight line pattern is a horizontally or vertically long pattern extracted from the input image by a mask process using a horizontally or vertically long mask and then a segment integration process. The information of a straight line pattern includes, for example, the coordinates of the rectangle circumscribing the segment patterns that make up the straight line pattern. A segment pattern is a segment-shaped pixel region extracted from the image by the mask process.

The straight line extracting unit 5 determines threshold values based on the size of the standard pattern and classifies the straight line patterns in the image using these threshold values. With this process, straight line patterns derived from a shaded portion, characters touching one another, and the like are excluded from the ruled line candidates, so that correct ruled line candidates can be extracted.

According to the second principle, the straight line extracting unit 5 extracts the information of one or more straight line patterns from an input image; the calculating unit 4 obtains the representative value of the sizes of the one or more straight line patterns. The straight line processing unit 7 sets a threshold value based on the representative value, and processes the information of the one or more straight line patterns using the threshold value.

The calculating unit 4 obtains the representative size of the straight line patterns, for example, from a histogram of the heights or widths of the straight line patterns. The straight line processing unit 7 extracts correct ruled line candidates by performing operations such as setting a threshold value close to the representative value and excluding straight line patterns larger than that threshold value.
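A minimal Python sketch of the second principle follows. It assumes that horizontal straight line patterns are stored as inclusive rectangles `(x1, y1, x2, y2)`, that thickness is the rectangle height, and that the most frequent thickness serves as the representative value. The threshold of `factor` times that value is a hypothetical choice.

```python
from collections import Counter


def filter_thick_lines(lines, factor=2.0):
    """Remove horizontal straight lines much thicker than the typical line.

    lines: list of (x1, y1, x2, y2) inclusive rectangles
    factor: hypothetical multiplier applied to the representative thickness
    """
    if not lines:
        return []
    # Histogram of thicknesses; the most frequent one is the representative value.
    thickness = Counter(y2 - y1 + 1 for _, y1, _, y2 in lines)
    representative = thickness.most_common(1)[0][0]
    threshold = representative * factor
    return [line for line in lines if line[3] - line[1] + 1 <= threshold]
```

With most lines two pixels thick, for example, a ten-pixel band such as one produced by a shaded cell exceeds the threshold of four and is dropped.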

According to the third principle, the straight line extracting unit 5 extracts the information of one or more straight line patterns from an input image. The calculating unit 4 obtains the representative value of the sizes of one or more segment patterns structuring the one or more straight line patterns. The straight line processing unit 7 sets a threshold value based on the representative value, and processes the information of the one or more straight line patterns using the threshold value.

As described above, a segment pattern is a segment-shaped pixel region extracted from the image by the mask process. The calculating unit 4 obtains the representative size of the segment patterns, for example, from a histogram of the heights or widths of the segment patterns. The straight line processing unit 7 can extract correct ruled line candidates by performing operations such as excluding a straight line pattern composed only of segment patterns larger than a threshold value derived from the representative value.
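The third principle, which the tenth principle also relies on, can be sketched in Python as follows. The sketch assumes that each straight line is a non-empty list of horizontal segment rectangles `(x1, y1, x2, y2)`, that segment size is measured as height, and that the most frequent height is the representative value. The `factor` multiplier is hypothetical.

```python
from collections import Counter


def drop_lines_of_large_segments(lines, factor=2.0):
    """Delete straight lines built only from unusually thick segments.

    lines: list of straight lines; each line is a non-empty list of
           segment rectangles (x1, y1, x2, y2), inclusive
    factor: hypothetical multiplier applied to the representative segment height
    """
    heights = Counter(s[3] - s[1] + 1 for line in lines for s in line)
    if not heights:
        return list(lines)
    representative = heights.most_common(1)[0][0]
    threshold = representative * factor

    def is_large(segment):
        return segment[3] - segment[1] + 1 > threshold

    # Keep a line as long as at least one of its segments has a normal size.
    return [line for line in lines if not all(is_large(s) for s in line)]
```

A line that contains at least one normal-size segment survives even if part of it runs through a thick region. Only lines made up entirely of large segments are removed.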

According to the fourth principle, the segment extracting unit 3 extracts the information of one or more segment patterns from an input image. The storing unit 2 classifies the information of one or more segment patterns into the information of a large segment pattern and the information of a small segment pattern, and stores them. The straight line extracting unit 5 examines a link state of the one or more segment patterns, and, when a large segment pattern is linked to small segment patterns, extracts a straight line pattern composed of the small segment patterns regardless of the size of the large segment pattern.

The information of a segment pattern includes, for example, the coordinate values of a rectangle which circumscribes a segment pattern, etc.

The storing unit 2, for example, attaches particular attribute information to the information of each segment pattern larger than an appropriate threshold value, thereby distinguishing the information of large segment patterns from that of small segment patterns, and stores the results. When integrating overlapping segment patterns and extracting the rectangle circumscribing them as a straight line pattern, the straight line extracting unit 5, for example, ignores a large segment pattern and links the small segment patterns on both sides of it.

With this process, from an image including a ruled line which contacts a large pixel region such as a shaded portion, character, etc., a straight line pattern which is not affected by the size of that region can be extracted as a correct ruled line candidate.
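The linking behaviour of the fourth principle can be sketched in Python as follows. In this sketch, large segments act as "wild cards" that bridge neighbouring small segments but do not enlarge the resulting rectangle. Several assumptions apply: the segments already belong to one horizontal row band, so vertical-overlap checks are omitted; the `Segment` class, the function names and the `max_gap` tolerance are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    x1: int
    y1: int
    x2: int
    y2: int
    large: bool = False  # mark attached by the storing unit to large segments


def _circumscribe(chain):
    # Rectangle circumscribing the small segments of one straight line.
    return (min(s.x1 for s in chain), min(s.y1 for s in chain),
            max(s.x2 for s in chain), max(s.y2 for s in chain))


def link_segments(segments, max_gap=1):
    """Link horizontal segments of one row band into straight lines.

    Large (marked) segments bridge the small segments on either side but are
    left out of the circumscribing rectangle. max_gap is a hypothetical
    tolerance, in pixels, for the horizontal gap between linked segments.
    """
    lines, chain, reach = [], [], None
    for seg in sorted(segments, key=lambda s: s.x1):
        if reach is not None and seg.x1 > reach + max_gap:
            # Gap too wide: close the current straight line.
            if chain:
                lines.append(_circumscribe(chain))
            chain, reach = [], None
        reach = seg.x2 if reach is None else max(reach, seg.x2)
        if not seg.large:
            chain.append(seg)
    if chain:
        lines.append(_circumscribe(chain))
    return lines
```

If the tall segment in the middle were not marked, the first line's rectangle would grow to that segment's full height. With the mark, the rectangle keeps the thickness of the ruled line itself.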

According to the fifth principle, the straight line extracting unit 5 extracts the information of one or more straight line patterns from an input image. The straight line integrating unit 8 integrates two straight line patterns included in the one or more straight line patterns into one, if they almost overlap.

The straight line integrating unit 8 reduces redundant straight line information by integrating two straight line patterns which almost overlap, thereby extracting a correct ruled line candidate.
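A minimal Python sketch of the fifth principle follows. It assumes rectangles `(x1, y1, x2, y2)` with inclusive coordinates and defines "almost overlap" as the intersection area being at least a hypothetical `ratio` of the smaller rectangle's area. The greedy single pass is also a simplification chosen for the illustration.

```python
def _area(r):
    return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)


def overlap_ratio(a, b):
    """Intersection area of two inclusive rectangles divided by the smaller area."""
    ix = min(a[2], b[2]) - max(a[0], b[0]) + 1
    iy = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if ix <= 0 or iy <= 0:
        return 0.0
    return ix * iy / min(_area(a), _area(b))


def integrate_overlapping(lines, ratio=0.9):
    """Merge each line into the first already-kept line it almost overlaps.

    ratio is a hypothetical cut-off for "almost overlap"; one greedy pass is used.
    """
    merged = []
    for line in lines:
        for i, kept in enumerate(merged):
            if overlap_ratio(line, kept) >= ratio:
                # Replace the kept line by the rectangle circumscribing both.
                merged[i] = (min(kept[0], line[0]), min(kept[1], line[1]),
                             max(kept[2], line[2]), max(kept[3], line[3]))
                break
        else:
            merged.append(line)
    return merged
```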

According to the sixth principle, the straight line extracting unit 5 extracts the information of one or more straight line patterns from an input image. The straight line deleting unit 9 determines whether or not to delete one of the straight line patterns using at least one of the following: information about the shape of one of the one or more straight line patterns, and information about the distance between two straight line patterns included in the one or more straight line patterns.

The straight line deleting unit 9 evaluates how closely each straight line pattern resembles a ruled line and deletes straight line patterns that do not look like ruled lines. With this process, straight line patterns derived from a shaded portion, a defaced character string or the like are excluded from the ruled line candidates, so that correct ruled line candidates can be extracted.

According to the seventh principle, the straight line extracting unit 5 extracts the information of one or more straight line patterns from an input image. The straight line deleting unit 9 determines whether or not to delete either of a horizontal straight line pattern and a vertical straight line pattern included in the one or more straight line patterns, based on the link relationship between the horizontal straight line pattern and the vertical straight line pattern.

The straight line deleting unit 9 excludes from the ruled line candidates, for example, a vertical straight line pattern which does not touch any horizontal straight line pattern, and a horizontal straight line pattern which does not touch any vertical straight line pattern. With this process, a straight line pattern deriving from a defaced character string, etc. can be excluded from the ruled line candidates, and a correct ruled line candidate can be extracted.
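The link check of the seventh principle might look like the sketch below. It assumes that "touch" means the two rectangles intersect once the first is grown by a hypothetical `margin` of one pixel on every side. Every pattern is checked against the full set of patterns of the other orientation, and the check is not repeated after deletions.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x1: int
    y1: int
    x2: int
    y2: int


def touches(a: Rect, b: Rect, margin: int = 1) -> bool:
    """True if `a`, grown by `margin` pixels on every side, intersects `b`."""
    return (a.x1 - margin <= b.x2 and b.x1 <= a.x2 + margin and
            a.y1 - margin <= b.y2 and b.y1 <= a.y2 + margin)


def delete_unlinked(horizontals: list[Rect], verticals: list[Rect],
                    margin: int = 1) -> tuple[list[Rect], list[Rect]]:
    # Keep a horizontal line only if it touches at least one vertical line,
    # and a vertical line only if it touches at least one horizontal line.
    kept_h = [h for h in horizontals if any(touches(h, v, margin) for v in verticals)]
    kept_v = [v for v in verticals if any(touches(v, h, margin) for h in horizontals)]
    return kept_h, kept_v
```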

According to the eighth principle, the straight line extracting unit 5 extracts the information of one or more straight line patterns from an input image. The straight line deleting unit 9 deletes the shorter of two straight line patterns that almost overlap among the one or more straight line patterns.

The straight line deleting unit 9 reduces redundant straight line information by deleting the shorter of two straight line patterns that almost overlap, thereby extracting a correct ruled line candidate.
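A sketch of the eighth principle for horizontal patterns only is given below. The overlap test is an assumption: the shorter pattern must share at least one row with the longer one and have at least a hypothetical fraction `ratio` of its length inside the longer one's column range. Processing patterns from longest to shortest ensures that each pattern is only compared with patterns at least as long.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x1: int
    y1: int
    x2: int
    y2: int


def covered_fraction(short: Rect, longer: Rect) -> float:
    """Fraction of `short`'s columns lying inside `longer`, or 0.0 if no rows are shared."""
    if min(short.y2, longer.y2) < max(short.y1, longer.y1):
        return 0.0
    shared = min(short.x2, longer.x2) - max(short.x1, longer.x1) + 1
    return max(0, shared) / (short.x2 - short.x1 + 1)


def delete_shorter(lines: list[Rect], ratio: float = 0.9) -> list[Rect]:
    # Longest first, so every line is compared only with lines kept before it,
    # which are at least as long.
    ordered = sorted(lines, key=lambda r: r.x2 - r.x1, reverse=True)
    kept: list[Rect] = []
    for line in ordered:
        if all(covered_fraction(line, k) < ratio for k in kept):
            kept.append(line)
    return kept
```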

According to the ninth principle, the straight line extracting unit 5 extracts the information of one or more straight line patterns from an input image. If a straight line pattern generated by integrating two partially overlapping straight line patterns among the one or more straight line patterns has a size approximately equal to a predetermined value, the straight line integrating unit 8 recognizes the integrated straight line pattern as a ruled line candidate.

If the thickness of a straight line pattern to be generated by integrating two straight line patterns is approximately the representative thickness of straight line patterns, the straight line integrating unit 8 performs its integration process. As a result, redundant straight line information can be reduced, thereby extracting a correct ruled line candidate.
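The ninth principle can be illustrated with a function that merges two partially overlapping horizontal patterns only when the merged height stays close to the representative thickness. The requirement that both the column ranges and the row ranges intersect, and the one-pixel `tolerance`, are assumptions made for this sketch.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x1: int
    y1: int
    x2: int
    y2: int


def merge_if_thin(a: Rect, b: Rect, representative: int,
                  tolerance: int = 1) -> Optional[Rect]:
    """Merged rectangle of two partially overlapping horizontal lines, or None
    if they do not overlap or the result would be too thick."""
    # Partial overlap (assumed): both the column and the row ranges intersect.
    if min(a.x2, b.x2) < max(a.x1, b.x1) or min(a.y2, b.y2) < max(a.y1, b.y1):
        return None
    merged = Rect(min(a.x1, b.x1), min(a.y1, b.y1), max(a.x2, b.x2), max(a.y2, b.y2))
    merged_height = merged.y2 - merged.y1 + 1
    # Accept only if the result is about as thick as a typical straight line.
    if abs(merged_height - representative) <= tolerance:
        return merged
    return None
```

With a representative thickness of 3, two slightly offset 3-pixel lines merge into a 4-pixel line, whereas merging with an 9-pixel pattern is refused.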

According to the tenth principle, the straight line extracting unit 5 extracts the information of one or more straight line patterns from an input image. The straight line deleting unit 9 deletes a straight line pattern composed of segment patterns whose sizes are larger than a threshold value among the one or more straight line patterns.

The straight line deleting unit 9 excludes from the ruled line candidates, for example, a straight line pattern composed only of segment patterns whose thicknesses are much greater than the representative thickness of segment patterns. With this process, a straight line pattern deriving from a defaced character string, etc. is excluded from the ruled line candidates, thereby extracting a correct ruled line candidate.
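The tenth principle can be sketched by taking the most frequent segment height as the representative thickness and deleting any straight line whose segments all exceed a threshold. The threshold of `factor` times the representative thickness (2.0 here) and the dictionary layout, which maps a hypothetical line id to the heights of its segments, are assumptions.

```python
from collections import Counter


def delete_thick_only(lines: dict[str, list[int]],
                      factor: float = 2.0) -> dict[str, list[int]]:
    # Representative thickness: the most frequent segment height over all lines.
    counts = Counter(h for heights in lines.values() for h in heights)
    representative = counts.most_common(1)[0][0]
    threshold = factor * representative
    # Keep a line only if at least one of its segments is no thicker than the
    # threshold; a line with no segments is deleted as well.
    return {lid: hs for lid, hs in lines.items() if any(h <= threshold for h in hs)}
```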

According to the eleventh principle, the straight line extracting unit 5 extracts the information of a straight line pattern from an input image. The graph generating unit 6 obtains the number of pixels included in a segment pattern of a standard size among one or more segment patterns structuring the straight line pattern, and generates a graph representing the number of pixels in the neighborhood of the straight line pattern. The straight line deleting unit 9 determines whether or not to delete the straight line pattern based on the shape of the graph.

The graph generating unit 6 generates, for example, a set of segment patterns of a standard size by excluding a large segment pattern from a set of segment patterns structuring a straight line pattern. Then, the graph generating unit 6 shifts it to the region around the straight line pattern, and generates a graph representing the relationship between the amount of shift and the number of pixels. Furthermore, if the shape of the graph is gentle and the maximum value is unclear, the straight line deleting unit 9 deletes the straight line pattern from ruled line candidates.

For a straight line pattern extracted from the inside of a shaded portion or a defaced character string, pixels often exist all around the straight line pattern. In such a case, the shape of the graph becomes gentle, and the straight line pattern is excluded from ruled line candidates. As a result, a correct ruled line candidate can be extracted.
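The following sketch illustrates the idea for a horizontal straight line in a binary image stored as a list of rows of 0/1 values. It assumes that the standard-size segments have already been selected and are given as (x1, y1, x2, y2) rectangles; it shifts them vertically by `d` rows and counts the black pixels they cover. The peak test, which compares the count at zero shift with the largest count more than `margin` rows away, and its `ratio`, are hypothetical choices for what a "gentle" graph means.

```python
def shift_profile(image: list[list[int]], segments: list[tuple[int, int, int, int]],
                  max_shift: int) -> dict[int, int]:
    """Black-pixel count under the segments for each vertical shift d in
    [-max_shift, max_shift]."""
    height = len(image)
    profile = {}
    for d in range(-max_shift, max_shift + 1):
        total = 0
        for x1, y1, x2, y2 in segments:
            for y in range(y1 + d, y2 + d + 1):
                if 0 <= y < height:  # rows shifted outside the image count as empty
                    total += sum(image[y][x1:x2 + 1])
        profile[d] = total
    return profile


def has_clear_peak(profile: dict[int, int], margin: int, ratio: float = 2.0) -> bool:
    """True if the count at zero shift clearly exceeds the counts away from the line."""
    off_line = [count for d, count in profile.items() if abs(d) > margin]
    return profile[0] > ratio * max(off_line, default=0)
```

An isolated ruled line yields a sharp spike at `d = 0`, while a line found inside a fully shaded area yields a flat profile and would be deleted.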

For example, the storing unit 2 shown in FIG. 2A corresponds to a memory 32 shown in FIG. 3, to be described later. The estimating unit 1, segment extracting unit 3, calculating unit 4, straight line extracting unit 5, graph generating unit 6, straight line processing unit 7, straight line integrating unit 8 and the straight line deleting unit 9 correspond to a CPU (Central Processing Unit) 31 and the memory 32.

A form learning system for table format documents has been developed in order to automatically extract keywords, such as a title, from a table, etc. with high accuracy. With this system, a document including a table is registered beforehand, and thereafter a correct keyword can be extracted from the registered document with high accuracy. The present invention can be applied to correctly extract ruled lines from a document image when the form of a table format document is learned.

The present invention improves on the conventional techniques, including those of the former applications, for extracting ruled lines from a table. By taking full advantage of the information about the small segments that structure a ruled line, it distinguishes a straight line extracted from an original ruled line from a straight line erroneously extracted from a character string. As a result, a ruled line can be correctly extracted even if a character touches it.

Furthermore, even if some segments are extracted from a defaced portion of a table, ruled line candidates are obtained by targeting only the segments extracted from original ruled lines. Then, correct ruled lines are extracted based on the shapes and positional relationships of the ruled lines, and on the distribution of black pixels in the segments of each ruled line.

The following embodiment targets documents that contain one or more frames whose size, position, or slope is unknown, together with various characters, such as a character touching a frame or a character extending beyond a frame. The case in which frames are extracted from such a document image is considered below.

FIG. 2B is a functional block diagram showing a ruled line extracting apparatus according to this embodiment. In this figure, the input pattern 11 to be targeted is a binary image in which any extreme slope or rotation has been corrected. The shaded process blocks indicate the processes that mainly differ from those of the former applications, including the application "Pattern Extracting Apparatus and Pattern Region Extracting Method" (Japanese patent application H8-107568), etc.

After a reduction processing unit 12 reduces an image and a concatenation pattern extracting unit 13 extracts concatenation patterns, the ruled line extracting apparatus calculates the most frequent value of the heights of their circumscribed rectangles (process P1), and a mask processing unit 14 performs thinning operations.
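Process P1 (obtaining the circumscribed rectangles of pixel concatenation regions and the most frequent value of their heights) can be sketched as follows. The use of 8-connectivity, breadth-first labeling, and a plain list-of-rows binary image are assumptions made for illustration; in the apparatus, the rectangles come from the concatenation pattern extracting unit 13.

```python
from collections import Counter, deque


def circumscribed_rects(image: list[list[int]]) -> list[tuple[int, int, int, int]]:
    """Bounding boxes (x1, y1, x2, y2) of 8-connected black-pixel regions."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for sy in range(h):
        for sx in range(w):
            if not image[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first search over one concatenation region.
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x1 = x2 = sx
            y1 = y2 = sy
            while queue:
                x, y = queue.popleft()
                x1, x2 = min(x1, x), max(x2, x)
                y1, y2 = min(y1, y), max(y2, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < w and 0 <= ny < h and image[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append((x1, y1, x2, y2))
    return rects


def most_frequent_height(rects: list[tuple[int, int, int, int]]) -> int:
    counts = Counter(y2 - y1 + 1 for _, y1, _, y2 in rects)
    return counts.most_common(1)[0][0]
```

In a typical document most regions are characters, so the most frequent height approximates the character height, which is what makes it useful as a reference value for later thresholds.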

Then, a horizontal straight line extracting unit 15 performs horizontal adjacency projection (process P2), horizontal segment detection (process P3), horizontal segment integration (process P4), and a horizontal straight line search (process P5). Next, the ruled line extracting apparatus performs horizontal dotted line detection (process P6). After a vertical straight line extracting unit 16 performs vertical adjacency projection (process P7), vertical segment detection (process P8), vertical segment integration (process P9), and a vertical straight line search (process P10), the ruled line extracting apparatus performs vertical dotted line detection (process P11).

Next, the ruled line extracting apparatus calculates the most frequent value of height of horizontal straight lines (process P12), calculates the most frequent value of width of vertical straight lines (process P13), calculates the most frequent value of height of horizontal segments (process P14), and calculates the most frequent value of width of vertical segments (process P15). Then, the apparatus integrates straight lines which completely overlap (process P16), and deletes an unnecessary straight line based on the shape of a straight line rectangle and the distance to the next straight line rectangle (process P17). Next, the apparatus deletes an unnecessary straight line based on the link relationship between vertical and horizontal straight lines (process P18), and integrates straight lines which partially overlap (process P19).

The ruled line extracting apparatus excludes a straight line which almost completely overlaps with another (process P20), and deletes a straight line composed only of segments whose sizes are larger than a predetermined threshold value (process P21). The apparatus attaches a mark to each segment whose size is larger than the threshold value (process P22), checks each straight line while shifting the targeted segments and deletes unnecessary straight lines (process P23), and outputs the remaining straight lines (process P24).

The ruled line extracting apparatus according to this embodiment is implemented by, for example, an information processing device (computer) shown in FIG. 3. The information processing device shown in FIG. 3 comprises a CPU 31, memory 32, input device 33, output device 34, external storage device 35, medium driving device 36, network connecting device 37, and a photoelectric converter 38, all of which are interconnected via a bus 39.

The CPU 31 executes a program stored in the memory 32, and performs each of the processes shown in FIG. 2B. The memory 32 is, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), etc.

The input device 33 corresponds to, for example, a keyboard, pointing device, etc., and is used to input a request or instruction from a user. The output device 34 corresponds to a display device, printer, etc., and is used to output the result of a process, etc.

The external storage device 35 is, for example, a magnetic disk device, optical disk device, or a magneto-optical disk device, etc., and can store a program and data. It is used as a database of an electronic filing system, which is intended for storing images, keywords, etc.

The medium driving device 36 drives a portable storage medium 40, and can access its stored contents. As the portable storage medium 40, an arbitrary computer-readable storage medium such as a memory card, floppy disk, CD-ROM (Compact Disc-Read Only Memory), optical disk, magneto-optical disk, etc. can be used. The portable storage medium 40 stores the program for performing the processes shown in FIG. 2B in addition to data.

The network connecting device 37 is connected to an arbitrary communications network such as a LAN (Local Area Network), etc., and performs data conversion, etc. associated with communication. The ruled line extracting apparatus can receive required data or programs from an external database, etc. via the network connecting device 37. The photoelectric converter 38 is, for example, an image scanner, and is used to input a normal document image to be processed.

In the memory 32, the data required for the processes is managed, for example, in the structure shown in FIG. 4. In this figure, the information 41 of one input image is composed of the number of tables (table format frames) included in the image, and the information 42 of each table.

The information 42 of each table is composed of the coordinate values of a circumscribed rectangle of a table, the number of cells included in the table, information 43 of each cell, the number of horizontal straight lines included in the table, information 44 of each horizontal straight line, the number of vertical straight lines included in the table, and information 44 of each vertical straight line. Here, a cell indicates a region surrounded by ruled lines.

The information 43 of each cell includes the coordinate values of a cell, and the information 44 of each straight line is composed of the coordinate values of a rectangle representing a straight line, the attribute information of the straight line, the number of small segments included in the straight line, information 45 of each small segment, and a serial number of the straight line in the entire image. The information 45 of each small segment includes the attribute information of a small segment, and the coordinate values of a rectangle representing the small segment. The attribute information of a straight line and a small segment are used to make a distinction, for example, between a solid line and a dotted line, and between a wild card segment whose height or width exceeds a predetermined value and another segment, etc.
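The per-image information of FIG. 4 could be modeled with Python dataclasses roughly as follows. The class and field names are hypothetical; the reference numerals in the comments tie each class to the information blocks described above. The counts that FIG. 4 stores explicitly (numbers of tables, cells, straight lines, and small segments) are derived here from list lengths instead, which is a design choice of this sketch rather than part of the described structure.

```python
from dataclasses import dataclass, field
from enum import Enum

Rect = tuple[int, int, int, int]  # (x1, y1, x2, y2) coordinate values


class LineKind(Enum):
    SOLID = "solid"
    DOTTED = "dotted"


@dataclass
class SmallSegment:                # information 45
    rect: Rect
    is_wild_card: bool = False     # height or width exceeds the predetermined value


@dataclass
class StraightLine:                # information 44
    rect: Rect
    kind: LineKind                 # attribute information of the straight line
    serial_number: int             # serial number in the entire image
    segments: list[SmallSegment] = field(default_factory=list)

    @property
    def num_segments(self) -> int:
        return len(self.segments)


@dataclass
class Cell:                        # information 43
    rect: Rect


@dataclass
class Table:                       # information 42
    rect: Rect                     # circumscribed rectangle of the table
    cells: list[Cell] = field(default_factory=list)
    horizontal_lines: list[StraightLine] = field(default_factory=list)
    vertical_lines: list[StraightLine] = field(default_factory=list)


@dataclass
class ImageInfo:                   # information 41
    tables: list[Table] = field(default_factory=list)
```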

Each of the processes shown in FIG. 2B is explained next with reference to FIGS. 5 through 24.

If the resolution of the image of the input pattern 11 is equal to or greater than a predetermined resolution and the image is relatively large, the reduction processing unit 12 reduces the image in order to improve the efficiency of processing. The input original image