Synthesized speech is produced by a vocal tract model corresponding functionally to the human vocal tract and constructed of a linear digital filter. The parameters of the synthesis vocal tract model are determined by an analysis operation on an original speech signal using an identical vocal tract model, which may be the same model as used for synthesis. The analysis vocal tract model has its parameters adjusted according to a comparison between the original speech signal and the output signal of the analysis model so as to minimize the deviation between these two signals. Those parameters for which the deviation falls below a threshold value are used directly as parameters of the synthesis vocal tract model. The parameter adjustments are determined by a parameter computer that operates on the results of this comparison and may itself contain the vocal tract model.
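The adjust-and-compare loop described in this abstract can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not the patented method: the vocal tract model is replaced by a short linear predictor, the "parameter computer" by a plain LMS update, and the names `analyze`, `mu`, `frame_len` and `threshold` are hypothetical.

```python
import math

def analyze(signal, order=2, mu=0.1, frame_len=100, threshold=1e-6):
    """Adapt predictor coefficients until the deviation falls below a threshold.

    Assumption: an LMS-adapted linear predictor stands in for the analysis
    vocal tract model; the abstract does not name an adaptation rule.
    """
    w = [0.0] * order          # model parameters being adjusted
    accepted = []              # parameter sets whose frame deviation was small enough
    err_sum, count = 0.0, 0
    for n in range(order, len(signal)):
        past = [signal[n - 1 - k] for k in range(order)]
        model_out = sum(wk * pk for wk, pk in zip(w, past))
        e = signal[n] - model_out                        # comparison with the original
        w = [wk + mu * e * pk for wk, pk in zip(w, past)]  # parameter adjustment
        err_sum += e * e
        count += 1
        if count == frame_len:
            # Only parameters with a small mean-square deviation are passed on.
            if err_sum / count < threshold:
                accepted.append(list(w))
            err_sum, count = 0.0, 0
    return w, accepted

# A pure sinusoid obeys x[n] = 2*cos(omega)*x[n-1] - x[n-2] exactly.
signal = [math.cos(math.pi * n / 3) for n in range(3000)]
final_w, accepted = analyze(signal)
```

With this test signal the coefficients should approach 2*cos(pi/3) = 1 and -1, and the parameter sets in `accepted` are the ones that would be handed to the synthesis model.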
To produce a signal simulating the characteristics of the average human voice, a basic periodic waveform with generally sinusoidal sections separated by level sections is passed through a first filter for substantially equalizing its frequency components and is then shaped in a second filter whose transfer function approximates that of the vocal tract in a frequency band of 0 to 4 kHz. The basic waveform fed to the first filter may be modulated in amplitude and/or recurrence period by a pseudorandom signal from an ancillary generator.
A speech synthesis circuit, implementable in an integrated circuit device, converts frames of data into analog signals representative of human speech. The frames of data comprise digital representations of values of pitch, energy and certain filter coefficients, which are stored in non-volatile memory. The filter coefficients control a linear predictive filter which is excited by voiced and unvoiced excitations stored in non-volatile memory. A control circuit coupling the excitation signals to the linear predictive filter allows the operator to select an external excitation signal rather than the precalculated stored excitation signals. Thus, the synthesizer may be utilized in a vocoder application, wherein a residual excitation signal transmitted from an analysis circuit serves as the excitation signal.
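As a rough software illustration of the frame-driven synthesis described above, the sketch below feeds either a stand-in stored excitation or an externally supplied one through an all-pole filter. Several things here are assumptions rather than details from the abstract: the direct-form filter structure, the dictionary frame layout, the impulse-train and noise excitations, and the names `stored_excitation`, `synthesize`, `frame_len` and `external`.

```python
import random

def stored_excitation(pitch, n, rng):
    """Stand-in for the stored excitations: impulse train if voiced, noise if not."""
    if pitch > 0:
        # Simplification: the pulse phase restarts at every frame boundary.
        return [1.0 if i % pitch == 0 else 0.0 for i in range(n)]
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def synthesize(frames, frame_len=80, external=None, seed=0):
    """Run each frame's excitation through an all-pole filter.

    frames:   list of dicts with 'pitch', 'energy' and 'coeffs' (assumed layout)
    external: optional sample list used instead of the stored excitation,
              as in the vocoder use case (e.g. a transmitted residual)
    """
    rng = random.Random(seed)
    out, state = [], []        # state holds past outputs, most recent first
    for idx, frame in enumerate(frames):
        coeffs = frame["coeffs"]
        if external is not None:
            exc = external[idx * frame_len:(idx + 1) * frame_len]
        else:
            exc = stored_excitation(frame["pitch"], frame_len, rng)
        for e in exc:
            y = frame["energy"] * e + sum(a * p for a, p in zip(coeffs, state))
            state = ([y] + state)[:len(coeffs)]
            out.append(y)
    return out

frames = [{"pitch": 4, "energy": 1.0, "coeffs": [0.5]}]
voiced = synthesize(frames, frame_len=8)
residual_driven = synthesize(frames, frame_len=8, external=[1.0] + [0.0] * 7)
```

Passing `external` bypasses the stored excitation entirely, which mirrors the operator-selectable excitation path in the abstract.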
An elevator call entry system comprises microphones mounted on the floors or in the car, oral message recognizing units producing an oral message signal indicative of a voice calling for a call entry with statement of the floor of destination, or of a voice calling for its cancellation, when said voices are supplied to said microphones, voiceprint recognizing units operable to recognize voiceprints of the voice supplied to said microphone and to produce an output signal when the voiceprint of the voice calling for a call entry is coincident with that of the voice calling for its cancellation, and call effecting units responsive to the output of the oral message recognizing units to issue a command to enter a call to the floor of destination and operable upon reception of an output signal from the voiceprint recognizing units to issue a command to cancel the call entry.
An electronic teaching aid which enables a student viewing a visual display containing text material being read or studied to designate any words or portion of said text for immediate audible vocalization or alternatively to designate any word of said text for immediate visual display of the definition of said designated word.
A method of converting speech, in which reflection coefficients are calculated from a speech signal of a speaker. From these coefficients, characteristics of cross-sectional areas of cylinder portions of a lossless tube modelling the speaker's vocal tract are calculated. Sounds are identified from those characteristics of the speaker and provided with respective identifiers. Subsequently, differences between the stored characteristics representing at least one sound and respective characteristics representing the same at least one sound are calculated. A second speaker's speaker-specific characteristics modelling that speaker's vocal tract for the same at least one sound are searched for in a memory on the basis of the identifier of the respective identified sound. A sum is formed by summing the differences and the second speaker's speaker-specific characteristics modelling that second speaker's vocal tract for the respective same sound. New reflection coefficients are calculated from that sum, and a new speech signal is produced from the new reflection coefficients.
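The chain from reflection coefficients to tube areas and back can be sketched as follows. This is a minimal illustration under stated assumptions: the sign convention A[i+1]/A[i] = (1 - k[i])/(1 + k[i]) is one of several in use, the difference is taken as current frame minus stored source characteristics (the abstract does not fix the direction), and the names `reflection_to_areas`, `areas_to_reflection`, `convert_frame`, `source_chars` and `target_chars` are hypothetical.

```python
def reflection_to_areas(k, first_area=1.0):
    """Cross-sectional areas of the lossless-tube sections (assumed convention)."""
    areas = [first_area]
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return areas

def areas_to_reflection(areas):
    """Inverse mapping: k[i] = (A[i] - A[i+1]) / (A[i] + A[i+1])."""
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]

def convert_frame(k_frame, sound_id, source_chars, target_chars):
    """Map one frame of the first speaker onto the second speaker's tract.

    source_chars / target_chars: dicts from sound identifier to stored area
    characteristics for that sound (hypothetical in-memory lookup tables).
    """
    areas = reflection_to_areas(k_frame)
    # Deviation of this frame from the first speaker's stored characteristics.
    diff = [a - s for a, s in zip(areas, source_chars[sound_id])]
    # Add the deviation to the second speaker's characteristics for the same sound.
    new_areas = [d + t for d, t in zip(diff, target_chars[sound_id])]
    if any(a <= 0.0 for a in new_areas):
        raise ValueError("summed areas must stay positive")
    return areas_to_reflection(new_areas)

source_chars = {"a": reflection_to_areas([0.1, 0.0, 0.2])}
target_chars = {"a": reflection_to_areas([0.3, -0.2, 0.1])}
new_k = convert_frame([0.1, 0.0, 0.2], "a", source_chars, target_chars)
```

In this usage the input frame coincides with the first speaker's stored characteristics, so the differences are zero and `new_k` should reproduce the second speaker's coefficients for the same sound.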