Circumnavigation using an increased test cell domain may be used to collect data from discontinuous characters stored in a bit map, such as those printed by a dot matrix printer. The test cell domain is increased so that a pixel is located and treated as part of the same character if it lies within the minimum permissible pixel gap range. The sequence of testing must be designed to read each possible pixel without entering an endless loop during the circumnavigation. One sequence of testing begins with the cells in the column adjacent to the reference cell, starting at the cell in the row adjacent to the reference cell and testing cells adjacent to the most recently tested cell along the column until the desired number of cells in that column, according to the minimum permissible pixel gap range, have been tested. Each column adjacent to the most recently tested column is tested in the same sequence until the desired number of columns have been tested to permit the minimum permissible gap spacing. The last set of cells tested is in the same row as the reference cell, beginning at the cell adjacent to the reference cell and testing each cell adjacent to the most recently read cell. By this method, discontinuous text such as dot matrix print or poor quality print may be recognized. An apparatus for carrying out the method of this invention includes an optical scanner, memory, template reference characters and an output device to provide the ASCII code of the recognized sample character.
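The testing sequence described in the abstract above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `test_offsets` and `find_next_pixel` are hypothetical, the bit map is assumed to be a list of rows of 0/1 values, and the search is assumed to run rightward and downward from the reference cell, whereas a full circumnavigation would presumably rotate this pattern with the direction of travel.

```python
from typing import Iterator, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)


def test_offsets(gap: int) -> Iterator[Cell]:
    """Yield (row offset, column offset) pairs in the assumed testing order."""
    # Columns are visited starting with the one adjacent to the reference cell.
    for dcol in range(1, gap + 1):
        # Within a column, start in the row adjacent to the reference cell
        # and step to the neighbouring cell until `gap` cells are tested.
        for drow in range(1, gap + 1):
            yield (drow, dcol)
    # Last, test the cells in the reference cell's own row, nearest first.
    for dcol in range(1, gap + 1):
        yield (0, dcol)


def find_next_pixel(bitmap: List[List[int]], ref: Cell, gap: int) -> Optional[Cell]:
    """Return the first set pixel within `gap` cells of `ref`, or None."""
    if not bitmap or not bitmap[0]:
        return None
    rows, cols = len(bitmap), len(bitmap[0])
    for drow, dcol in test_offsets(gap):
        r, c = ref[0] + drow, ref[1] + dcol
        # Cells outside the bit map are skipped rather than treated as errors.
        if 0 <= r < rows and 0 <= c < cols and bitmap[r][c]:
            return (r, c)
    return None


# A dot two columns away is reached only when the permitted gap is 2.
dots = [
    [1, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
]
print(find_next_pixel(dots, (0, 0), gap=1))
print(find_next_pixel(dots, (0, 0), gap=2))
```

Because the offsets form a finite, fixed list, each candidate cell is tested exactly once per reference cell, which is one simple way to avoid repeated testing within a single step.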
An output apparatus which includes a font information memory for outputting a pattern on the basis of information from a host computer, a selector for designating an attribute of fonts, and a dot printer for outputting a table of the fonts having the attribute designated by the selector.
A pattern generator comprising a memory for storing a plurality of regular character and sign patterns, an input device for receiving as image data a character and sign pattern in a document, a pattern recognition unit for recognizing the character and sign pattern in the image data that the input device has received, a converter for converting the character and sign pattern recognized by the recognition unit into the corresponding character and sign code, and an output device for reading from the memory a character and sign pattern corresponding to the character and sign code generated by the converter and outputting the read pattern to a recording medium.
A threshold-free algorithm is used to extract text in a region which has been circled with any hand-drawn shape of any size that constitutes a closed curve. Use of this technique allows an operator to select intensity regions in text material in a paper-based document and automate the extraction of the enclosed text in the digitized image of the document.
Optical character recognition is achieved by a system which comprises a scanner for scanning a document, an edge extractor for identifying edges in the image produced by the scanner to produce an outline of each object identified in the image, a segmentation facility for grouping the object outlines into blocks, means for identifying features of the outlines, and a final classification stage for providing data in an appropriate format representative of the characters in the image. Also disclosed are a novel edge extractor, a novel page segmentation facility and a novel feature extraction facility.
This invention features a new computer system architecture comprising a variable comparison of input data with programmable template data. The system has the capability to provide an "almost" condition, which is a similarity match between actual data that is near or close to the "exact" data stored in template memory. Different degrees of "almost" are possible with the inventive system.
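The "almost" condition can be illustrated in software with a short sketch. The abstract does not state how similarity is measured, so the use of Hamming distance (the number of differing bits) is an assumption made here for illustration, and the names `hamming` and `almost_matches` are hypothetical. The `tolerance` parameter stands in for the degree of "almost": 0 demands an exact match, and larger values admit progressively less similar data.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


def almost_matches(data: int, templates: List[int], tolerance: int) -> List[int]:
    """Return indices of templates within `tolerance` differing bits of `data`."""
    return [
        index
        for index, template in enumerate(templates)
        if hamming(data, template) <= tolerance
    ]


templates = [0b1011, 0b1001, 0b0100]
print(almost_matches(0b1011, templates, tolerance=0))  # exact matches only
print(almost_matches(0b1011, templates, tolerance=1))  # one differing bit allowed
```

Raising `tolerance` widens the set of accepted templates without changing the stored template data, which loosely mirrors the idea of a variable comparison against programmable templates.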