In an articulated phonetic sound wherein the information-bearing group of formant resonances varies within different frequency regions in the sound spectrum, such as a phoneme spoken by differently pitched speakers, the filter-separated signals derived from said group of resonances are regrouped (shifted) in prearranged combinations sequentially until a reference regrouping is established for neutralizing (normalizing) the undesired effect of said variations for adaptation to standard analysis. A specific group of signals from said regrouped signals is then selected, and their amplitude ratios one with respect to another are further matched with standard groups of amplitude-ratio measuring means for obtaining a null output representing a good amplitude match.
Identification of complex signals such as a phonetic sound in speech is accomplished by combining ratio values between both the amplitudes and the frequencies of the resonances in the complex sound wave. The amplitude ratio between the signals detected from two resonances (e.g. formants) is derived as a null signal when they match a pair of signal-gain preadjustments during a given time period. This null signal is also obtained when both input signals are absent during that time period. In order to avoid a false null-signal indication, a sensing signal is derived from said input signals to indicate that the null signal is not false.
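The null-signal test and its sensing signal can be illustrated with a minimal sketch. The function name `amplitude_ratio_match`, the `tolerance` and `min_level` parameters, and the rule of comparing gain-weighted amplitudes are assumptions made for illustration; the abstract does not specify how the circuit derives either signal.

```python
def amplitude_ratio_match(a1, a2, gain1, gain2, tolerance=0.05, min_level=1e-3):
    """Return (null, sensed) for two non-negative resonance amplitudes.

    null   -- True when the gain-weighted amplitudes cancel, i.e. when
              a1 / a2 is close to gain2 / gain1 (the preadjusted ratio).
    sensed -- True when at least one input is actually present, so a
              null caused by two absent inputs can be rejected.
    All thresholds here are hypothetical values.
    """
    weighted1 = gain1 * a1
    weighted2 = gain2 * a2
    # Scale the tolerance by the larger weighted amplitude, with a floor
    # so that two silent inputs still produce a (false) null.
    reference = max(weighted1, weighted2, min_level)
    null = abs(weighted1 - weighted2) <= tolerance * reference
    sensed = (a1 + a2) > min_level
    return null, sensed


def is_true_match(a1, a2, gain1, gain2):
    """A match counts only when the null is accompanied by the sensing signal."""
    null, sensed = amplitude_ratio_match(a1, a2, gain1, gain2)
    return null and sensed
```

With gains preadjusted to a 2:1 amplitude ratio, `is_true_match(2.0, 1.0, 1.0, 2.0)` accepts the pair, while two absent inputs produce a null that the sensing signal rejects.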
4039754 - Speech analyzer - Owned by The United States of America as represented by the Administrator of the (Washington, DC)
A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having a pulse rate approximately representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform. A first level of the first pulse train is derived while the first formant signal has an amplitude lying in even numbered ones of the bands; a second level is derived while the first formant signal has an amplitude lying in odd numbered ones of the bands. The pulse trains representing the first and third formant signals are normalized relative to the second formant pulse train. Normalization is attained in each instance by counting the number of pulses in the first and third pulse trains over the interval required for the pulses in the second train to reach a predetermined number. The resulting normalized pulse trains are supplied to a memory to identify a phoneme in the speech signal or are transmitted as narrow-bandwidth signals.
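The normalization step described above can be sketched as a gated count. The helper names `pulse_train` and `normalize_counts`, the use of integer microsecond timestamps, and the example pulse periods are all assumptions chosen for illustration; only the counting rule itself comes from the abstract.

```python
def pulse_train(period_us, duration_us):
    """Hypothetical helper: pulse times (in microseconds) of a uniform train."""
    return list(range(period_us, duration_us + 1, period_us))


def normalize_counts(f1_pulses, f2_pulses, f3_pulses, target_count):
    """Count first- and third-formant pulses over the interval the
    second-formant train needs to reach target_count pulses.

    Returns (n1, n3), or None if the second train is too short.
    """
    if len(f2_pulses) < target_count:
        return None
    gate_end = f2_pulses[target_count - 1]  # time of the Nth second-formant pulse
    n1 = sum(1 for t in f1_pulses if t <= gate_end)
    n3 = sum(1 for t in f3_pulses if t <= gate_end)
    return n1, n3


# One speaker: pulse periods of 2000, 800 and 400 microseconds.
speaker_a = normalize_counts(pulse_train(2000, 40000),
                             pulse_train(800, 40000),
                             pulse_train(400, 40000), 25)

# A second speaker whose pulse rates are all 1.25 times higher.
speaker_b = normalize_counts(pulse_train(1600, 40000),
                             pulse_train(640, 40000),
                             pulse_train(320, 40000), 25)
```

Because the counting interval is set by the second train itself, both speakers in this example yield the same pair of counts: scaling all three pulse rates by a common factor shortens the gate by the same factor.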
An apparatus and method are provided for recognizing speech produced at various vocal pitches, capable of recognizing and classifying speech articulation in real time. The apparatus and method assume that articulation of a given sound in an individual's speech can be approximated as the output of a specific linear filter, corresponding to the condition of the individual's vocal tract at the time of articulation, in response to an input of one or more source impulses. The invention selects one of a library of sounds, in response to a speech waveform input, by means of a bank of vocal tract inverse filters, each of which is connected to the speech waveform input. Each vocal tract inverse filter has a complex Fourier transfer function that is the reciprocal of a particular vocal tract transfer function corresponding to a specific speech sound. Thus there is one vocal tract inverse filter for each speech sound as spoken by the particular individual. The assumed vocal tract filters are thus effectively in cascade with the vocal tract inverse filters of the invention so as to form an all-pass filter which can derive the original source waveform.
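The cascade of a vocal tract filter with its inverse can be illustrated with a small sketch. Several things here are assumptions rather than details from the abstract: the vocal tract is modeled as an all-pole recursive filter, the library entries `sound_A` and `sound_B` and their coefficients are hypothetical, and the selection rule (pick the inverse filter whose output has the least energy) is one plausible criterion, not necessarily the one the invention uses.

```python
def vocal_tract(source, coeffs):
    """Assumed all-pole model: y[n] = x[n] + sum_k coeffs[k] * y[n-1-k]."""
    y = []
    for n, x in enumerate(source):
        acc = x
        for k, c in enumerate(coeffs):
            if n - 1 - k >= 0:
                acc += c * y[n - 1 - k]
        y.append(acc)
    return y


def inverse_filter(speech, coeffs):
    """Reciprocal of the model above: e[n] = y[n] - sum_k coeffs[k] * y[n-1-k]."""
    e = []
    for n, y in enumerate(speech):
        acc = y
        for k, c in enumerate(coeffs):
            if n - 1 - k >= 0:
                acc -= c * speech[n - 1 - k]
        e.append(acc)
    return e


def select_sound(speech, library):
    """Run every inverse filter in the bank; return the name whose output
    has the least energy (an assumed selection criterion)."""
    energy = {name: sum(v * v for v in inverse_filter(speech, coeffs))
              for name, coeffs in library.items()}
    return min(energy, key=energy.get)


# Hypothetical two-sound library with stable second-order filters.
library = {"sound_A": [1.2, -0.8], "sound_B": [0.3, -0.5]}

impulse = [1.0] + [0.0] * 63              # a single source impulse
speech = vocal_tract(impulse, library["sound_A"])
recovered = inverse_filter(speech, library["sound_A"])
chosen = select_sound(speech, library)
```

Here `recovered` reproduces the source impulse up to rounding error, since the matching filter pair cancels, and `chosen` names the library entry that generated the waveform.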
A signal encoder and classifier particularly adapted to speech recognition includes a buffer which is independently addressed by a new data writing address system and a buffered data reading system so that writing and reading of data may be accomplished on a time shared basis. This time shared operation permits serial writing and reading of the pattern data without interrupting incoming signal storage. The reading data address system utilizes stored addresses identifying the beginning and end of the signal patterns for addressing sequential patterns from the buffer.
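The independently addressed buffer can be sketched in software as follows. The class name `PatternBuffer`, its method names, and the circular (wrap-around) addressing are assumptions for illustration; the abstract states only that write and read addresses are independent and that pattern start and end addresses are stored. Overrun, where new data overwrites an unread pattern, is not handled in this sketch.

```python
class PatternBuffer:
    """Buffer with separate write and read addressing (illustrative sketch)."""

    def __init__(self, size):
        self.size = size
        self.data = [None] * size
        self.write_addr = 0          # advanced only by write()
        self.patterns = []           # stored (start, end) addresses, end exclusive
        self._pattern_start = None

    def write(self, sample):
        """Store one incoming sample at the current write address."""
        self.data[self.write_addr] = sample
        self.write_addr = (self.write_addr + 1) % self.size

    def begin_pattern(self):
        """Remember the address where the current pattern starts."""
        self._pattern_start = self.write_addr

    def end_pattern(self):
        """Record the start and end addresses of the finished pattern."""
        self.patterns.append((self._pattern_start, self.write_addr))
        self._pattern_start = None

    def read_next_pattern(self):
        """Read the oldest stored pattern without touching the write address."""
        if not self.patterns:
            return None
        start, end = self.patterns.pop(0)
        out = []
        addr = start
        while addr != end:
            out.append(self.data[addr])
            addr = (addr + 1) % self.size
        return out


buf = PatternBuffer(8)
buf.begin_pattern()
for s in (1, 2, 3):
    buf.write(s)
buf.end_pattern()
buf.begin_pattern()
buf.write(4)
first = buf.read_next_pattern()   # reading interleaves with ongoing writes
buf.write(5)
buf.end_pattern()
second = buf.read_next_pattern()
```

Reading the first pattern between two writes of the second pattern mirrors the time-shared operation: the read uses only the stored boundary addresses, so incoming storage continues undisturbed.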
A simplified, speaker independent, selected vocabulary, word recognizing microcomputer functions without the use of a typical front end filtering network. The microcomputer identifies vowel-like, fricative-like, and silence signal states within a word or phrase by counting speech pattern zero crossings during sequential time periods. Variable zero crossing count thresholds are used to identify states based upon previously identified states, and hysteresis is provided, through the use of state time measurement, to prevent state oscillations which would result in erroneous state sequences. The microcomputer, by monitoring zero crossings, defines words as a sequence of vowel-like, fricative-like, and silence states. By limiting the recognizable vocabulary to words which have dissimilar sequences, the incoming speech pattern may be recognized by comparison with state templates defining the limited vocabulary stored in the microcomputer's memory.
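The state-sequence approach can be sketched as below. The threshold values, the `min_frames` dwell time used for hysteresis, the mapping of low counts to silence and high counts to fricative-like sound, and the two template words are all hypothetical; the abstract does not give concrete numbers or vocabulary.

```python
def count_zero_crossings(frame):
    """Number of sign changes between consecutive samples in one time period."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def classify(count, previous):
    """Map a zero crossing count to a state. The thresholds vary with the
    previously identified state (assumed values)."""
    silence_max = 3 if previous == "silence" else 1
    fricative_min = 30 if previous == "fricative" else 40
    if count <= silence_max:
        return "silence"
    if count >= fricative_min:
        return "fricative"
    return "vowel"


def state_sequence(counts, min_frames=2):
    """Return the sequence of states entered. A new state is accepted only
    after it persists for min_frames consecutive periods (hysteresis)."""
    states = []
    current = "silence"
    candidate, run = None, 0
    for c in counts:
        raw = classify(c, current)
        if raw == current:
            candidate, run = None, 0       # glitch ended, stay in current state
        elif raw == candidate:
            run += 1
        else:
            candidate, run = raw, 1
        if candidate is not None and run >= min_frames:
            current = candidate
            candidate, run = None, 0
            states.append(current)
    return states


# Hypothetical templates with dissimilar state sequences.
TEMPLATES = {
    "see": ["fricative", "vowel", "silence"],
    "us": ["vowel", "fricative", "silence"],
}


def recognize(counts):
    """Return the template word whose state sequence matches, else None."""
    sequence = state_sequence(counts)
    for word, template in TEMPLATES.items():
        if sequence == template:
            return word
    return None


counts = [0, 0, 50, 52, 51, 10, 12, 11, 45, 10, 9, 0, 0, 0]
word = recognize(counts)
```

In the example counts, the isolated value 45 in the middle of the vowel-like run lasts only one period, so the dwell-time check discards it instead of inserting a spurious fricative-like state.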