|
References  |
|
|
| *references marked with an asterisk below are user-added references |
|
U.S. References |
|
|
| Add a new US reference: |
| | Reference | Relevancy | Comments | Reference | Relevancy | Comments | 2905927
|      Your vote accepted [0 after 0 votes] | | 3133266
|      Your vote accepted [0 after 0 votes] | | 3295105
|      Your vote accepted [0 after 0 votes] | | 5305389 Palmer 382/305 Apr,1994 |      Your vote accepted [0 after 0 votes] | | 5291560 Daugman 382/117 Mar,1994 |      Your vote accepted [0 after 0 votes] | | 5272764 Bloomberg 382/164 Dec,1993 |      Your vote accepted [0 after 0 votes] | | 5216725 McCubbrey 382/102 Jun,1993 |      Your vote accepted [0 after 0 votes] | | 5214719 Budd 382/201 May,1993 |      Your vote accepted [0 after 0 votes] | | 5179596 Weingard 382/158 Jan,1993 |      Your vote accepted [0 after 0 votes] | | 5179419 Palmquist 356/73.1 Jan,1993 |      Your vote accepted [0 after 0 votes] | | 5142589 Lougheed 382/102 Aug,1992 |      Your vote accepted [0 after 0 votes] | | 4977603 Irie 382/218 Dec,1990 |      Your vote accepted [0 after 0 votes] | | 4956869 Miyatake 382/242 Sep,1990 |      Your vote accepted [0 after 0 votes] | | 4949392 Barski 382/283 Aug,1990 |      Your vote accepted [0 after 0 votes] | | 4949281 Hillenbrand 345/442 Aug,1990 |      Your vote accepted [0 after 0 votes] | | 4933977 Ohnishi 382/178 Jun,1990 |      Your vote accepted [0 after 0 votes] | | 4926490 Mano 382/177 May,1990 |      Your vote accepted [0 after 0 votes] | | 4918740 Ross 382/177 Apr,1990 |      Your vote accepted [0 after 0 votes] | | 4864628 Scott 382/197 Sep,1989 |      Your vote accepted [0 after 0 votes] | | 4821333 Gillies 382/308 Apr,1989 |      Your vote accepted [0 after 0 votes] | | 4809344 Peppers 382/173 Feb,1989 |      Your vote accepted [0 after 0 votes] | | 4764972 Yoshida 382/187 Aug,1988 |      Your vote accepted [0 after 0 votes] | | 4747148 Watanabe
May,1988 |      Your vote accepted [0 after 0 votes] | | 4731857 Tappert 382/178 Mar,1988 |      Your vote accepted [0 after 0 votes] | | 4701960 Scott 382/122 Oct,1987 |      Your vote accepted [0 after 0 votes] | | 4644585 Crimmins 382/283 Feb,1987 |      Your vote accepted [0 after 0 votes] | | 4558461 Schlang 382/290 Dec,1985 |      Your vote accepted [0 after 0 votes] | | 4495644 Parks 382/123 Jan,1985 |      Your vote accepted [0 after 0 votes] | | 4481664 Linger 382/149 Nov,1984 |      Your vote accepted [0 after 0 votes] | | 4400828 Pirz 704/238 Aug,1983 |      Your vote accepted [0 after 0 votes] | | 4326190 Borland 382/197 Apr,1982 |      Your vote accepted [0 after 0 votes] | | 4155072 Kawa 382/186 May,1979 |      Your vote accepted [0 after 0 votes] | | 4010445 Hoshino 382/231 Mar,1977 |      Your vote accepted [0 after 0 votes] | | 4477926 Linger 382/149 Dec,1969 |      Your vote accepted [0 after 0 votes] | | |
|
|
|
|
U.S. References |
|
|
Foreign References |
|
|
|
|
|
|
Foreign References |
|
|
Other References |
|
|
|
|
|
|
Other References |
|
|
|
|
|
References  |
|
|
Claims  |
|
|
We claim:
1. A method performed by a computer for comparing at least two image sections having a plurality of image signals, each image section representing a token, to identify similar tokens,
each token representing a unit of semantic understanding comprising the steps of:
(a) rasterizing, using source image derivation system, a document to produce an image section representing a token;
(b) storing image signals of an image section representing a first token in a first model memory;
(c) dilating the image signals representing the first token to produce a dilated representation of the first token and storing the detailed representation of the first token in first image memory;
(c) storing image signals of an image section representing a second token in second model memory;
(e) dilating the image signals of an image section representing the second token to produce a dilated representation of the second token and storing the dilated representation of the second token in a second image memory;
(f) comparing the image signals stored in the first model memory with the images signals stored in the second image memory to determine a first similarity metric;
(g) comparing the image signals store in the second model memory with the image signals stored in the first image memory to determine a second similarity metric; and
(h) indicating whether the first token is similar to the second token in response to the first and second similarity metrics.
2. The method of claim 1, wherein the steps of producing a dilated representation of the token comprises the steps of:
(a) copying the image signals represented in the model memory into the image memory; and
(b) dilating the image signals in the image memory by a predetermined dilation radius so as to produce a dilated representation of the image signals therein.
3. The method of claim 2, wherein the step of dilating the image signals further comprises the step of using a predetermined dilation radius not greater than 1.4 to produce the dilated representation of the image signals.
4. The method of claim 1, wherein the step of comparing the image signals stored in the model memory with the image signals stored in the image memory to determine a similarity metric includes the steps of:
(a) counting the total number of black image signals in the model memory;
(b) logically ANDing the image section stored in the model memory with the image section stored in the image memory;
(c) counting the number of black image signals which are present in the logically ANDed image sections to determine the number of matching black signals; and
(d) dividing the number of matching black signals by the total number of black image signals to determine the similarity metric.
5. The method of claim 1, wherein the step of indicating whether the first token is similar to the second token includes the step of identifying the first and second tokens as being similar whenever both the first similarity metric and the
second similarity metric are greater than a predefined threshold.
6. The method of claim 1, further comprising the step of extracting image signals representing the first token from rasterized data defining an image.
7. The method of claim 6, further comprising the step of extracting image signals representing the second token from rasterized data defining the image.
8. The method of claim 6, wherein the step of extracting image signals representing the first token from rasterized data defining the image comprises the steps of:
(a) finding connected components within the image;
(b) identifying boundaries about each group of connected components within the image;
(c) locating text rows using the boundaries identified in step (b); and
(d) combining adjacent groups of connected components within the text rows located in step (c), based upon a relationship between the boundaries of adjacent groups, so as to segment the image by word tokens.
9. The method of claim 1, wherein the image signals representing a first token are identified as falling within a first bounding box and the image signals representing a second token are identified as falling within a second bounding box,
further comprising the steps of:
(a) comparing a common dimension of the first and second bounding boxes; and
(b) determining that the first and second tokens may be similar if the common dimensions of the first and second bounding boxes are within a predefined tolerance, otherwise determining that the first and second tokens are dissimilar.
10. The method of claim 1, further comprising the step of extracting image signals representing the first token from rasterized data defining an image wherein the first token is defined as a string of symbols.
11. The method of claim 10, further comprising the step of extracting image signals representing the second token from rasterized data defining an image wherein the second token is defined as a string of symbols.
12. A method performed in a programmable computer for comparing at least two image sections, each image section consisting of a plurality of image signals, wherein a first image section represents a token from an unknown image object and a
second image section represents a token from a known image object stored in a dictionary of images and where a token represents a unit of semantic understanding, to identify the unknown token as matching the token previously stored in the dictionary of
images, comprising the steps of:
a) storing image signals of an image section representing an unknown token in a first model memory;
(b) dilating the image signals representing the unknown token to produce a dilated representation of the unknown token and storing the dilated representation of the unknown token in a first image memory;
(c) replicating image signals of an image section representing a known token stored in a dictionary of images to a second model memory;
(d) dilating the image signals representing the known token to produce a dilated representation of the second token and storing the dilated representation of the second token in a second image memory;
(e) comparing the image signals stored in the first model memory with the image signals stored in the first image memory to determine a second similarity metric;
(f) comparing the image signals stored in the second model memory with the image signals stored in the first image memory to determine a second similarity metric; and
(g) indicating whether the unknown token is similar to the known token in response to the first and second similarity metrics.
13. The method of claim 12, wherein the steps of producing a dilated representation of the token comprises the steps of:
(a) replicating the image signals represented in the model memory in the image memory; and
(b) dilating the image signals in the image memory by a predetermined dilation radius so as to produce a dilated representation of the image signals therein.
14. The method of claim 13, wherein the step of dilating the image signals further comprises the step of using a predetermined dilation radius not greater than 1.4 to produce the dilated representation of the image signals.
15. The method of claim 12, wherein the step of comparing the image signals stored in the model memory with the image signals stored in the image memory to determine a similarity metric includes the steps of:
(a) counting the total number of black image signals in the model memory;
(b) logically ANDing the image section stored in the model memory with the image section stored in the image memory;
(c) counting the number of black image signals which are present in the logically ANDed image section to determine the number of matching black signals; and
(d) dividing the number of matching black signals by the total number of black image signals to determine the similarity metric.
16. The method of claim 12, wherein the step of indicating whether the unknown token is similar to the known token includes the step of identifying the unknown and known tokens as being similar whenever both the first similarity metric and the
second similarity metric are greater than a predefined threshold.
17. The method of claim 12, further comprising the step of extracting image signals representing the unknown token from rasterized data defining the image.
18. The method of claim 17, wherein the step of extracting image signals representing the unknown token from rasterized data defining the image comprises the steps of:
(a) finding connected components within the image;
(b) identifying boundaries about each group of connected components within the image;
(c) locating text rows using the boundaries identified in step (b); and
(d) combining adjacent groups of connected components within the text rows located in step (c), based upon a relationship between the boundaries of adjacent groups, so as to segment the image by tokens.
19. The method of claim 12, wherein the image signals representing an unknown token are identified as falling within a first bounding box and the image signals representing a known token are identified as falling within a second bounding box,
further comprising the steps of:
(a) comparing a common dimension of the first and second bounding boxes; and
(b) determining that the tokens may be similar if the common dimensions of the first and second bounding boxes are within a predefined tolerance, otherwise determining that the first and known tokens are dissimilar.
20. The method of claim 12, further comprising the step of extracting image signals representing the unknown token from rasterized data defining an image, wherein the first token is defined as a string of symbols within the image.
21. The method of claim 20, further comprising the step of rasterizing image signals representing the known token as a function of a known string of symbols that are to be represented by the known token. |
|
|
|
|
Claims  |
|
|
Description  |
|
|
This invention relates to a method of comparing image tokens forming words, a plurality of connected components, or similar units of semantic understanding represented within an array of image data by shape,
without a requirement for individually detecting and/or identifying the characters, symbols, glyphs, or components making up the tokens.
CROSS REFERENCE
The following related applications are hereby incorporated by reference for their teachings:
"Optical Word Recognition by Examination of Word Shape," Huttenlocher et al., Ser. No. 07/796,119, filed Nov. 19, 1991, now abandoned;
"Method for Comparing Word Shapes," Huttenlocher et al., Ser. No. 07/795,169, filed Nov. 19, 1991 now abandoned;
"Method of Determining Boundaries of Words in Text," Huttenlocher et al., Ser. No. 07/794,392, file Nov. 19, 1991, now U.S. Pat. No. 5,321,770;
"A Method of Deriving Wordshapes for Subsequent Comparison," Huttenlocher et al., Ser. No. 07/794,391, filed Nov. 19, 1991 , now abandoned;
"Methods and Apparatus for Automatic Modification of Semantically Significant Portions of a Document Without Document Image Decoding," Huttenlocher et al., Ser. No. 07/795,174, filed Nov. 19, 1991 , now U.S. Pat. No.5,384,863;
"Method and Apparatus for Summarizing a Document Without Document Image Decoding," Without et al., Ser. No. 07/794,543, filed Nov. 19, 1991 now abandoned;
"Method and Apparatus for Supplementing Significant Portions of a Document Selected Without Document Image Decoding With Retrieved Information" to Withgott et al., Ser. No. 07/795,419, filed Nov. 19, 1991 now abandoned; and
"Method for Identifying Word Bounding Boxes in Text," Huttenlocher et al., Ser. No. 08/169,949, field Feb. 2, 1993, now U.S. Pat. No. 5,410,611.
COPYRIGHT NOTIFICATION
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owners have no objection to the facsimile reproduction, by anyone, of the patent document or the patent disclosure, as
it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
MICROFICHE APPENDIX
An appendix comprising 3 microfiche having a total of 260 frames thereon is included as part of this application.
BACKGROUND OF THE INVENTION
Text in electronically encoded documents (electronic documents) tends to be found in either of two formats, each distinct from the other. In a first format, the text may be in a bitmap format, in which text is defined only in terms of an array
of image data or pixels, essentially indistinguishable from adjacent images which are similarly represented. In this format, text is generally incapable of being subjected to processing by a computer based on textual content alone and must be segmented
into image units for processing as described, for example, in the copending application for "Methods and Apparatus for Automatic Modification of Semantically Significant Portions of a Document Without Document Image Decoding," Huttenlocher et al., Ser.
No. 07/795,174, filed Nov. 19, 1991. In a second format, hereinafter referred to as a character code format, the text is represented as a string of character codes (e.g. ASCII code). In the character code format, the image or bitmap of the text is not
available.
Conversion from bitmap to character code format using an optical character recognition (OCR) process carries a significant cost in terms of time and processing effort. Each bitmap of a character must be distinguished from its neighbors, its
appearance analyzed, and in a decision making process, identified as a distinct character in a predetermined set of characters. As examples of OCR techniques, U.S. Pat. No. 4,864,628 to Scott discloses a method for reading data which circumnavigates a
character image. U.S. Pat. No. 4,326,190 to Borland et al. teaches a character feature detection system for reading alphanumeric characters. In addition, U.S. Pat. No. 4,956,869 to Miyatake et al. suggests a more efficient method for tracing
contour lines to prepare contour coordinates of a figure within an image consisting of a plurality of lines.
When the electronic document has been derived by scanning an original, however, image quality and noise in its reproduction contribute to uncertainty in the actual appearance of the bitmap. A degraded bitmap appearance may be caused by an
original document of poor quality, by scanning error, or by similar factors affecting the digitized representation of the image. Therefore, the decision process employed in identifying a character has an inherent uncertainty about it. A particular
problem in this regard is the tendency of characters in text to blur, or merge. Most character identifying processes commence with an assumption that a character is an independent set of connected pixels. When this assumption fails, due to the quality
of the input image, character identification also fails.
The following patents illustrate particularly relevant approaches to improving character detection. U.S. Pat. No. 4,926,490 to Mano discloses a method and apparatus for recognizing skewed characters on a document. A rectangle is created
around each character image, oriented with the detection orientation rather than the image orientation, and position data for each rectangle is stored in a table. The rectangle is created by detecting a character's outline. U.S. Pat. No. 4,558,461 to
Schlang discloses a text line bounding system wherein skewed text is adjusted by analyzing vertical patches of a document. After the skew has been determined, each text line is bounded by determining a top, bottom, left, and right boundary of the text
line. U.S. Pat. No. 3,295,105 to Gray et al. discloses a scan controller for normalizing a character in a character recognition apparatus wherein a character is analyzed by determining certain character characteristics including top, bottom, right and
left character boundaries. U.S. Pat. No. 4,918,740 to Ross discloses a processing means for use in an optical character recognition system wherein sub-line information is used to analyze a character and identify it. U.S. Pat. No. 4,949,392 to
Barski et al. discloses a document recognition system which recognizes an unknown document form by comparison against a library of templates, thus allowing for the intelligent association of text characters in certain locations of the unknown document to
aid in the recognition thereof. U.S. Pat. No. 5,142,589 to Lougheed et al. discloses a system for repairing digital images of broken characters which first dilates the character strokes to fill small gaps therein and then erodes the image to conform
to the original strokes, thereby producing recognizable characters before separation into individual digits for recognition. U.S. Pat. No. 5,214,719 to Budd et al. teaches a character recognition system and method for teaching and recognizing
characters. The method obtains an image, identifies a character, samples the character, and then does a vector correlation of the sample points to stored points of known characters to recognize the character.
OCR methods have sought to segment images in various fashions. For example, U.S. Pat. No. 4,558,461 to Schlang suggests a text line bounding system for nonmechanically adjusting for skewed text in scanned text. The skew angle of the text is
then established, following which the text lines are statistically bounded. The actual text data is then rotated according to the orientation established for conventional processing. U.S. Pat. No. 4,809,344 to Peppers et al. teaches preprocessing of
character recognition so as to obtain data necessary for character recognition. Page segmentation is performed by simultaneously extracting a plurality of features, separation between lines, separation between characters, and separation between the
lines and the characters are simultaneously performed, and a calculation time for no | | |