A speech recognition method for detecting and recognizing one or more keywords in a continuous audio signal is disclosed. Each keyword is represented by a keyword template representing one or more target patterns, and each target pattern comprises statistics of each of at least one spectrum selected from plural short-term spectra generated according to a predetermined system for processing of the incoming audio. The spectra are processed by a frequency equalization and normalizing method to enhance the separation between the spectral pattern classes during later analysis. The processed audio spectra are grouped into spectral patterns, are transformed to reduce dimensionality of the patterns, and are compared by means of likelihood statistics with the target patterns of the keyword templates. A concatenation technique employing a loosely set detection threshold makes it very unlikely that a correct pattern will be rejected.
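The likelihood comparison and loose threshold described above can be sketched as follows. This is a minimal illustration only: the abstract does not say which statistics or likelihood form are used, so the diagonal-Gaussian model, the function names, and the threshold value are all assumptions.

```python
import math

def log_likelihood(pattern, means, variances):
    """Log-likelihood of one transformed spectral pattern.

    Assumption: the target-pattern statistics are a per-dimension mean and
    variance, scored with a diagonal Gaussian (not stated in the abstract).
    """
    total = 0.0
    for x, mu, var in zip(pattern, means, variances):
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total

def is_candidate(pattern, means, variances, loose_threshold):
    """Keep the pattern as a candidate when its score clears the threshold.

    A low (loose) threshold admits more false candidates but makes it
    unlikely that a correct pattern is rejected at this stage.
    """
    return log_likelihood(pattern, means, variances) >= loose_threshold
```

With a loose threshold, most of the discrimination is left to the later concatenation stage, which sees only the candidates that survive this test.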
An electron gun of a flat plate type image display apparatus for use in the field of image information apparatus is described. The rear electrode part of the electron gun is formed by a flat plate type rear electrode which has a conductive film on its surface and is arranged at a constant distance from the plural line electrodes. Plural spacers are disposed between the plural line electrodes, with one end of each spacer fixed on the rear electrode and conductive films formed on the surfaces. Fabrication of the electron gun is simplified and, furthermore, the electric field is stabilized by preventing the generation of electric charge and of uneven luminance on the surface of the anode.
According to this invention, in image data A and B which are respectively constituted by pixel groups $\Sigma A_{ij}$ and $\Sigma B_{ij}$ consisting of N pixels $A_{ij}$ and $B_{ij}$ (N is a positive integer; i and j respectively indicate a row position and a column position), and in which density data $A_{dij}$ and $B_{dij}$ of the pixels are expressed by n(2m)-bit data (n and m are positive integers), designated 2m-bit portion $a_{ij}$ of each density data $A_{dij}$ of image data A is divided into upper m-bit portion $a_{uij}$ and lower m-bit portion $a_{Lij}$. A histogram processor calculates $\sum a_{ij} \cdot b_{ij}$ between 2m-bit portions $a_{ij}$ and $b_{ij}$ using a histogram obtained by the upper or lower m-bit portion $a_{uij}$ or $a_{Lij}$ and designated 2m-bit portion $b_{ij}$ of each density data $B_{dij}$ of pixel group $\Sigma B_{ij}$. An average can also be calculated by a calculation section using the histogram. Therefore, a covariance can be calculated at high speed.
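One way the sum of products might be obtained from histograms is sketched below. It assumes each histogram accumulates the B-side values into bins indexed by the upper or lower m-bit portion of the A-side value; the abstract does not spell out the histogram layout, so this layout and the function name are illustrative assumptions.

```python
def sum_of_products_via_histograms(a_vals, b_vals, m):
    """Compute sum(a * b) for 2m-bit values a using two 2**m-bin histograms.

    Assumption: each bin k accumulates the b values of all pixels whose
    upper (or lower) m-bit portion of a equals k.
    """
    mask = (1 << m) - 1
    hist_upper = [0] * (1 << m)
    hist_lower = [0] * (1 << m)
    for a, b in zip(a_vals, b_vals):
        hist_upper[a >> m] += b      # bin chosen by the upper m bits of a
        hist_lower[a & mask] += b    # bin chosen by the lower m bits of a
    upper_sum = sum(k * h for k, h in enumerate(hist_upper))
    lower_sum = sum(k * h for k, h in enumerate(hist_lower))
    # a = (upper << m) + lower, so sum(a*b) = 2**m * sum(upper*b) + sum(lower*b)
    return (upper_sum << m) + lower_sum
```

The per-pixel work is a table update rather than a multiplication; only 2 × 2^m multiplications remain at the end. Given the two averages, the covariance then follows from the usual identity: the sum of products divided by N, minus the product of the averages.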
A speech recognizer for recognizing unknown utterances in isolated-word, small-vocabulary speech has improved rejection of out-of-vocabulary utterances. Both a usual spectral representation including a dynamic component and an equalized representation are used to match unknown utterances to templates for in-vocabulary words. In a preferred embodiment, the representations are mel-based cepstral, with the dynamic components being signed vector differences between pairs of primary cepstra and the equalized representation being each cepstral coefficient less an average value of the coefficients. Factors are generated from the ordered lists of templates to determine the probability of the top choice being a correct acceptance, with different methods being applied when the usual and equalized representations yield different matches. For additional enhancement, the rejection method may use templates corresponding to non-vocabulary utterances, or decoys. If the top choice corresponds to a decoy, the input is rejected.
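The two representations might be derived from a sequence of cepstral frames roughly as follows. The abstract leaves open which average is subtracted and which frame pairs are differenced; this sketch assumes a per-coefficient average over the whole utterance and a symmetric frame gap, and both function names are hypothetical.

```python
def equalize(cepstra):
    """Equalized representation: each coefficient less its average value.

    Assumption: the average is taken per coefficient over all frames of
    the utterance.
    """
    n_frames = len(cepstra)
    n_coeffs = len(cepstra[0])
    means = [sum(frame[k] for frame in cepstra) / n_frames for k in range(n_coeffs)]
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in cepstra]

def dynamic_components(cepstra, gap=1):
    """Signed vector difference between the cepstra `gap` frames after and
    before each frame t (the pairing and the gap are assumptions)."""
    out = []
    for t in range(gap, len(cepstra) - gap):
        later, earlier = cepstra[t + gap], cepstra[t - gap]
        out.append([l - e for l, e in zip(later, earlier)])
    return out
```

Subtracting a per-coefficient average removes any constant offset in the cepstrum, such as one introduced by a fixed channel response, which is presumably why the equalized match can disagree with the usual one.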
A method of speech analysis calculates one or more difference parameters for each of a sequence of acoustic frames, where each difference parameter is a function of the difference between an acoustic parameter in one frame and an acoustic parameter in a nearby frame. The method is used in speech recognition which compares the difference parameters of each frame against acoustic models representing speech units, where each speech-unit model has a model of the difference parameters associated with the frames of its speech unit. The difference parameters can be slope parameters or energy difference parameters. Slope parameters are derived by finding the difference between the energy of a given spectral parameter of a given frame and the energy, in a nearby frame, of a spectral parameter associated with a different frequency band. The resulting parameter indicates the extent to which the frequency of energy in the part of the spectrum represented by the given parameter is going up or going down. Energy difference parameters are calculated as a function of the difference between a given spectral parameter in one frame and a spectral parameter in a nearby frame representing the same frequency band. In one embodiment of the invention, dynamic programming compares the difference parameters of a sequence of frames to be recognized against a sequence of dynamic programming elements associated with each of a plurality of speech-unit models. In another embodiment of the invention, each speech-unit model represents one phoneme, and the speech-unit models for a plurality of phonemes are compared against individual frames, to associate with each such frame the one or more phonemes whose models compare most closely with it.
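The two kinds of difference parameter can be illustrated with a short sketch. Here `frames[t][band]` is assumed to hold the energy of one frequency band in frame `t`; the frame offset, the band shift, the sign convention, and the function names are assumptions, since the abstract says only that the parameters are functions of such differences.

```python
def energy_difference(frames, t, band, offset=1):
    """Same band, nearby frame: how the band's energy changes over time."""
    return frames[t + offset][band] - frames[t][band]

def slope(frames, t, band, offset=1, band_shift=1):
    """Given band in frame t versus a different band in a nearby frame.

    Comparing against a neighbouring band across frames gives an indication
    of whether energy is moving up or down in frequency.
    """
    return frames[t][band] - frames[t + offset][band + band_shift]
```

For example, with `frames = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]`, `energy_difference(frames, 0, 1)` compares band 1 across the two frames, while `slope(frames, 0, 1)` compares band 1 of the first frame with band 2 of the second.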
In a feature data processing apparatus, the one of two designated reference density vectors di and dj to which a feature vector x corresponding to a feature element is closer is determined from dij of the equation: This value is calculated for all the combinations of two reference feature vectors di and dj selected from a reference feature vector group dk (k=0 to n-1), thereby obtaining classification data. The classification data is determined using a logical formula or a reference table. An average value of the components is calculated from the classification data and the components of all the feature vectors. The above operation is repeated until the calculated average and the components of the reference density vector converge within a predetermined allowance, thereby precisely classifying feature data.
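The classify, average, and repeat loop described above resembles an iterative clustering procedure and can be sketched as follows. The equation for dij is not reproduced in the text, so this sketch substitutes a comparison of squared Euclidean distances for the pairwise test. It also chains the pairwise tests instead of evaluating every pair and combining them by logical formula or table. Both choices, and all names, are assumptions.

```python
def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def closer_of(x, refs, i, j):
    """Pairwise test (assumed form): index of the reference nearer to x."""
    return i if sq_dist(x, refs[i]) <= sq_dist(x, refs[j]) else j

def classify(x, refs):
    """Chain pairwise tests so the surviving index is the nearest reference."""
    best = 0
    for k in range(1, len(refs)):
        best = closer_of(x, refs, best, k)
    return best

def refine(features, refs, tolerance=1e-6, max_iters=100):
    """Repeat classification and averaging until the references settle."""
    refs = [list(r) for r in refs]
    labels = []
    for _ in range(max_iters):
        labels = [classify(x, refs) for x in features]
        new_refs = []
        for k, ref in enumerate(refs):
            members = [x for x, lab in zip(features, labels) if lab == k]
            if members:
                new_refs.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_refs.append(ref)  # keep a reference that attracted no features
        shift = max(abs(a - b) for r, n in zip(refs, new_refs) for a, b in zip(r, n))
        refs = new_refs
        if shift <= tolerance:  # converged within the allowance
            break
    return refs, labels
```

The `tolerance` argument plays the role of the predetermined allowance: iteration stops once no component of any reference vector moves by more than that amount.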