A multi-channel, multi-genre character recognition discriminator is disclosed which performs the decision-making process between strings of characters coming from a multi-channel (i.e., three or more channels) alphanumeric output optical character reader (OCR) system, for use in such applications as text processing and mail processing. The multi-channel output OCR uses a separate recognition process for each genre, or character set indicative of a distinct group with respect to style (i.e., font) or form, and attempts to recognize each character independently as belonging to each respective genre. For example, in a three-channel output OCR for reading mixed numeric, English and Russian Cyrillic character sets, the English alphabetic interpretation of a scanned word is outputted as an English alphabetic subfield on a first OCR output line, the Cyrillic interpretation of the scanned word is outputted as a Cyrillic subfield on a second OCR output line, and the numeric interpretation of the scanned word is outputted as a numeric subfield on a third OCR output line. The discriminator analyzes these three subfield character streams by calculating: a first conditional probability that, given that the OCR has scanned and recognized an English alphabetic character E_i, the numeric N_K and Cyrillic C_J characters were misrecognized by their respective recognition channels; a second conditional probability that, given that the OCR has scanned and recognized a Cyrillic character C_J, the numeric N_K and English E_i characters were misrecognized by their respective recognition channels; and a third conditional probability that, given that the OCR has scanned and recognized a numeric character N_K, the English E_i and Cyrillic C_J characters were misrecognized by their respective recognition channels.
These conditional probabilities are developed character by character for each character within a character string or word. A first product of all the conditional probabilities of the first type is calculated over all of the characters in a word (which may, of course, contain only a single character); similarly, second and third products are calculated for the second and third conditional probabilities, respectively. The magnitudes of these products are then compared in an N-channel comparator, and the subfield with the highest probability is selected as the most probable interpretation of the word scanned by the OCR.
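The product-and-compare step described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name `select_subfield`, the channel labels, and the probability values are all hypothetical, and the per-character conditional probabilities are assumed to have been looked up already.

```python
from math import prod


def select_subfield(char_probs):
    """Pick the channel whose per-character probabilities have the largest product.

    char_probs maps a channel label to the list of conditional probabilities
    for the characters of one word (hypothetical input format).
    """
    # One product per channel, taken over all characters of the word.
    products = {channel: prod(probs) for channel, probs in char_probs.items()}
    # The comparator step: keep the channel with the highest product.
    best = max(products, key=products.get)
    return best, products


# Made-up probabilities for a three-character word on three channels.
word = {
    "english": [0.90, 0.80, 0.85],
    "cyrillic": [0.30, 0.40, 0.20],
    "numeric": [0.05, 0.10, 0.02],
}
best_channel, channel_products = select_subfield(word)
print(best_channel, channel_products)
```

For long words the products become very small, so a practical variant might compare sums of logarithms instead; that substitution is not part of the abstract.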
In order to classify fingerprint images with high precision by integrating the classification results of different classification means, and their respective merits, making use of their probability data, a fingerprint image classification system of the invention includes: a plurality of classification units (12 and 15), each of which generates an individual probability data set (17 or 18) indicating the probability that a fingerprint image (16) is to be classified into each of the categories; a probability estimation unit (13) for estimating an integrated probability data set (19) from each of the individual probability data sets (17 and 18); and a category decision unit (14) for outputting a classification result of the fingerprint image (16) according to the integrated probability data set (19).
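The flow from individual probability data sets to a category decision can be sketched as below. The abstract does not state how the probability estimation unit combines its inputs, so the weighted average used here is purely an assumption; the category names, weights, and function names are hypothetical as well.

```python
def integrate(prob_sets, weights=None):
    """Combine per-classifier probability sets into one integrated set.

    ASSUMPTION: a weighted average stands in for the probability
    estimation unit, whose actual method the abstract does not give.
    """
    if weights is None:
        weights = [1.0] * len(prob_sets)
    total = sum(weights)
    categories = prob_sets[0].keys()
    return {
        c: sum(w * ps[c] for w, ps in zip(weights, prob_sets)) / total
        for c in categories
    }


def decide(integrated):
    """Category decision: return the category with the highest integrated probability."""
    return max(integrated, key=integrated.get)


# Made-up outputs of two classification units for one fingerprint image.
unit_a = {"arch": 0.6, "loop": 0.3, "whorl": 0.1}
unit_b = {"arch": 0.2, "loop": 0.7, "whorl": 0.1}
integrated = integrate([unit_a, unit_b])
print(decide(integrated), integrated)
```

Unequal weights would let a more reliable unit dominate the integrated set, which is one simple way to reflect the differing merits of the classification means.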
A logic state analyzer stores into a data acquisition memory only state data meeting preselected qualification state criteria chosen to weed out state data not of interest among the totality of states occurring within a collection of digital signals. The data acquisition memory retains only the last m states stored therein. A selectable integer k, 0 ≤ k ≤ m, determines how many additional storage operations are performed for qualified state data following the detection of a preselected trigger condition. The actual number of states occurring in the collection of digital signals after the trigger condition but before the storage of the kth qualified data state can be many times the value of k. Qualifying the state data prior to storage allows a modest-size data acquisition memory to do the work of a much larger memory and spares the user the task of sorting through much state data known not to be of interest. The preselected qualification criteria may include don't-cares in the definition of the qualification state, as well as the logical OR'ing of a plurality of such qualification states.
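A software model of this qualified, post-trigger storage scheme might look like the sketch below. It is an illustration under stated assumptions, not the analyzer's circuitry: states are modeled as integers, each qualification state is a hypothetical (mask, value) pair whose cleared mask bits are don't-cares, the trigger is modeled as a single exact state value, and the trigger state itself does not count toward k.

```python
from collections import deque


def qualifies(state, patterns):
    """True if the state matches any (mask, value) pattern (logical OR).

    Bits cleared in a mask are don't-cares.
    """
    return any((state & mask) == (value & mask) for mask, value in patterns)


def acquire(states, patterns, trigger, m, k):
    """Keep the last m qualified states, stopping k qualified stores after the trigger."""
    if not 0 <= k <= m:
        raise ValueError("k must satisfy 0 <= k <= m")
    memory = deque(maxlen=m)  # oldest entries drop out once m states are held
    remaining = None          # stays None until the trigger is seen
    for state in states:
        if remaining == 0:
            break             # the kth post-trigger store is done
        qualified = qualifies(state, patterns)
        if qualified:
            memory.append(state)
        if remaining is None:
            if state == trigger:
                remaining = k  # ASSUMPTION: counting starts after the trigger state
        elif qualified:
            remaining -= 1
    return list(memory)


stream = [0x10, 0x21, 0x11, 0x31, 0xFF, 0x41, 0x02, 0x51, 0x61, 0x71]
# Qualify only states whose low nibble is 1; the high nibble is don't-care.
print([hex(s) for s in acquire(stream, [(0x0F, 0x01)], trigger=0xFF, m=4, k=2)])
```

In this run an unqualified state (0x02) arrives between the trigger and the second post-trigger store, showing how more than k states can elapse after the trigger while only k are stored.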
The disclosed invention utilizes a dictionary-based approach to identify languages within different zones in a multi-lingual document. As a first step, a document image is segmented into various zones, regions and word tokens, using suitable geometric properties. Within each zone, the word tokens are compared to dictionaries associated with various candidate languages, and the language that exhibits the highest confidence factor is initially identified as the language of the zone. Subsequently, each zone is further split into regions. The language for each region is then identified, using the confidence factors for the words of that region. For any language determination having a low confidence value, the previously determined language of the zone is employed to assist the identification process.
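The zone-then-region procedure can be sketched as follows, assuming segmentation has already produced lists of word tokens. The confidence factor is modeled here as the fraction of tokens found in a language's dictionary, and a low-confidence region simply falls back to the zone language; both choices, the 0.5 threshold, the tiny dictionaries, and the function names are assumptions made for illustration.

```python
def identify(tokens, dictionaries):
    """Return (language, confidence) for the best-matching dictionary.

    ASSUMPTION: confidence is the fraction of tokens found in the dictionary.
    """
    n = max(len(tokens), 1)
    scores = {
        lang: sum(t.lower() in words for t in tokens) / n
        for lang, words in dictionaries.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def identify_regions(zone_regions, dictionaries, threshold=0.5):
    """Identify the zone language first, then each region within the zone."""
    zone_tokens = [t for region in zone_regions for t in region]
    zone_lang, _ = identify(zone_tokens, dictionaries)
    region_langs = []
    for region in zone_regions:
        lang, confidence = identify(region, dictionaries)
        # Low-confidence regions inherit the previously determined zone language.
        region_langs.append(lang if confidence >= threshold else zone_lang)
    return zone_lang, region_langs


dictionaries = {
    "en": {"the", "cat", "sat", "on", "mat"},
    "fr": {"le", "chat", "est", "sur", "la"},
}
zone = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["le", "chat"],
    ["xyz", "qqq", "la"],
]
print(identify_regions(zone, dictionaries))
```

The third region matches French on only one of three tokens, so under the assumed threshold it is assigned the zone's language rather than its own weak best guess.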
An automatic language-determining apparatus automatically determines the particular European language of the text image of a document when the gross script type is known to be, or is determined to be, a European script type. A word token generating means generates word tokens from the text image. A feature determining means determines the frequency of appearance of word tokens of the text portion which correspond to predetermined word tokens. A language determining means converts the determined frequency-of-appearance rates to a point in a new coordinate space, then determines which predetermined region of the new coordinate space the point is closest to, in order to determine the language of the text portion.
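The three means described above can be sketched as a frequency vector, a coordinate transform, and a nearest-region lookup. The abstract does not specify the transform or how regions are represented, so this sketch assumes a linear projection matrix and one centroid per language with Euclidean distance; the reference tokens, matrix, centroid values, and function names are all hypothetical.

```python
import math


def frequency_vector(tokens, reference_tokens):
    """Rate of appearance of each predetermined token among the text's tokens."""
    n = max(len(tokens), 1)
    return [tokens.count(ref) / n for ref in reference_tokens]


def project(vector, matrix):
    """Map the frequency rates to a point in the new coordinate space.

    ASSUMPTION: the conversion is a linear map given by `matrix`.
    """
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def nearest_language(point, centroids):
    """Pick the language whose region (modeled as a centroid) is closest."""
    return min(centroids, key=lambda lang: math.dist(point, centroids[lang]))


reference = ["the", "der", "le"]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # placeholder transform
centroids = {
    "English": [0.07, 0.0, 0.0],
    "German": [0.0, 0.08, 0.0],
    "French": [0.0, 0.0, 0.06],
}
tokens = ["the", "dog", "and", "the", "cat"]
point = project(frequency_vector(tokens, reference), identity)
print(nearest_language(point, centroids))
```

In practice the projection and the regions would be derived from training text in each candidate language; the identity matrix here only keeps the example easy to follow.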
A selection agent within a symbol determination system receives input character codes (ICCs) each with an associated confidence factor (CF) from a plurality of OCR processors. The selection agent selects the mathematically most probable character (MPC) from among the ICCs based on the relative values of the joint confidence factor (JCF) in accordance with the relationship: for n like ICCs from N OCR processors.