A method and apparatus for classifying scanned symbols into equivalence classes, as may be used for image data compression. The present invention performs run-length symbol extraction and classifies symbols based on both horizontal and vertical run-length information. An equivalence class is represented by an exemplar. Feature-based classification criteria for matching an exemplar are defined by a corresponding exemplar template. The feature-based classification criteria all use quantities, including the stroke width of symbols, that can be readily computed from the run endpoints. The number of equivalence classes is reduced through a process called equivalence class consolidation, which uses the symbol classifier to identify matched exemplars indicating equivalence classes that may be merged. For a consolidated equivalence class, the exemplar matching the most symbols is selected as the representative for the class.
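To make the idea of computing features from run endpoints concrete, the following is a minimal sketch, not the method of the invention. It extracts horizontal and vertical runs of foreground pixels from a small binary bitmap and estimates stroke width as the most common run length. The function names (`horizontal_runs`, `vertical_runs`, `estimate_stroke_width`) and the use of the modal run length as the stroke-width estimate are assumptions made for illustration only.

```python
from collections import Counter


def horizontal_runs(bitmap):
    """Return (row, start_col, end_col) for each maximal run of 1-pixels.

    Both endpoints are inclusive.
    """
    runs = []
    for y, row in enumerate(bitmap):
        start = None
        for x, pixel in enumerate(row):
            if pixel and start is None:
                start = x                      # a run begins here
            elif not pixel and start is not None:
                runs.append((y, start, x - 1))  # the run ended at the previous pixel
                start = None
        if start is not None:                  # run reaches the right edge
            runs.append((y, start, len(row) - 1))
    return runs


def vertical_runs(bitmap):
    """Return (col, start_row, end_row) runs by scanning the transposed bitmap."""
    transposed = [list(col) for col in zip(*bitmap)]
    return horizontal_runs(transposed)


def estimate_stroke_width(bitmap):
    """Hypothetical estimator: the most common run length over both directions."""
    lengths = [end - start + 1 for _, start, end in horizontal_runs(bitmap)]
    lengths += [end - start + 1 for _, start, end in vertical_runs(bitmap)]
    if not lengths:
        return 0
    return Counter(lengths).most_common(1)[0][0]


# An "L"-shaped symbol drawn with a two-pixel stroke.
L_SHAPE = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]

print(estimate_stroke_width(L_SHAPE))  # 2
```

Only run endpoints are consulted once the runs are extracted, which is why such features are cheap to compute for every candidate symbol.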
Histograms are widely used to explore data, to present data, and to persuade with data. A method and system for identifying all possible histograms from a data sample using histogram appearances. The method and system include determining all possible one-dimensional histogram appearances for a transformed data sample using constant-width intervals, and all possible multidimensional appearances with data cell boundaries that are either parallel or not parallel to the data space axes, together with multiple conditions on the interval widths for each dimension or transformed dimension.
A method and system for determining histograms and histogram appearances from small data samples. The method and system determine relevant histogram appearances (i.e., bin frequency lists) for uniform-bin-width sample histograms, exactly determine error-minimizing histogram density estimators, and determine histogram appearance reversals and mode inversions.
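As an illustration of what a histogram appearance is, the sketch below computes the bin frequency list of a sample for a given bin origin and uniform bin width, then collects the distinct appearances seen over a grid of origins. This is a brute-force approximation under stated assumptions, not the exact determination the abstract describes; the names `appearance` and `sampled_appearances` and the parameter `shifts_per_width` are hypothetical.

```python
import math


def appearance(sample, origin, width):
    """Bin frequency list for bins [origin + k*width, origin + (k+1)*width).

    The list spans the first through the last non-empty bin.
    """
    indices = [math.floor((x - origin) / width) for x in sample]
    lo, hi = min(indices), max(indices)
    counts = [0] * (hi - lo + 1)
    for i in indices:
        counts[i - lo] += 1
    return tuple(counts)


def sampled_appearances(sample, widths, shifts_per_width=50):
    """Collect distinct appearances over a grid of bin origins (approximate)."""
    seen = set()
    smallest = min(sample)
    for w in widths:
        for k in range(shifts_per_width):
            # Slide the origin leftwards across one full bin width.
            origin = smallest - w * k / shifts_per_width
            seen.add(appearance(sample, origin, w))
    return seen


sample = [1.0, 2.0, 2.5, 4.0]
print(appearance(sample, 0.0, 2.0))  # (1, 2, 1)
print(appearance(sample, 0.5, 2.0))  # (2, 2)
```

The two printed results use the same bin width and differ only in bin origin, which shows how a small sample can yield different histogram shapes from an arbitrary choice of bin location.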
According to the present invention, the quality of small characters can be prevented from degrading when an image in an original is copied. An image in an original is read by a scanner, and a recognition unit detects the character size and character position and performs character recognition. A CPU reads a font from a dictionary in accordance with the character recognized by the recognition unit, and an image is generated based on the character size and the character position detected by the recognition unit and a copy magnification set by an MMI.
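The final step, generating the output image from the detected size and position and the copy magnification, can be sketched as a simple scaling of each recognized character's layout. This is a minimal sketch under assumptions: the `RecognizedChar` record, its fields, and the `place_for_copy` function are hypothetical, and uniform scaling about the page origin is assumed. Rendering the glyph from the font dictionary is left out.

```python
from dataclasses import dataclass


@dataclass
class RecognizedChar:
    """Hypothetical record of one recognized character."""
    code: str     # recognized character
    x: float      # detected horizontal position in the original
    y: float      # detected vertical position in the original
    size: float   # detected character size in the original


def place_for_copy(chars, magnification):
    """Scale position and size by the copy magnification.

    The glyph would then be drawn from the font at the scaled size,
    rather than by resampling the scanned pixels.
    """
    return [
        RecognizedChar(c.code, c.x * magnification, c.y * magnification,
                       c.size * magnification)
        for c in chars
    ]


original = [RecognizedChar("A", 10.0, 20.0, 8.0)]
print(place_for_copy(original, 0.5))
```

Because the glyph is regenerated from the font at the target size, a reduced copy does not inherit the blur that comes from shrinking scanned pixels.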