A speech digitizer is disclosed including an analyzer for generating power and filter coefficient parameters representative of an analog speech waveform. The digitizer also includes a pitch detector for generating a digital pitch parameter substantially representing the fundamental periodicity of the waveform and including range restrictor means for restricting the pitch signal to a range of pitches within a predetermined tolerance if the average pitch of the periodicity signal is below a predetermined level. The pitch detector also includes means for determining the number of extreme maximum and minimum points within a predetermined range of an absolute magnitude difference function thereby generating a structure number signal representing a voiced event. The digitizer includes a voicing detector for generating a three-level voicing/unvoicing parameter representing whether the speech waveform is voiced or unvoiced.
A speech analyzer for extracting spectrum information and pitch information from natural speech wherein an accuracy of pitch extraction is enhanced by sampling pitch at a sampling frequency which is higher than a sampling frequency for analyzing the spectrum information.
Systems for the recognition of speech by computer have usually required that the speech be of a strictly standardized nature, free of such features as regional accent, and unaccompanied by background noise. The problem has been to produce a system providing information which will allow accurate recognition in the presence of noise, and with non-standard speech sounds. The invention provides to a computer in a speech recognition system information in parallel streams on a number of factors, viz., the existence, in any signal which may contain speech sounds, of a fundamental repetitive structure and its periodicity; the existence of a high frequency component having a wide frequency band; the existence of a component having energy relatively stable with respect to time and characteristic of background noise; the peak frequency, peak amplitude and band width of sounds lacking a low frequency component; and the frequency of resonant content of any component having a fundamental repetitive structure and the level of correlation.
An absolute magnitude difference function (AMDF) generator for a linear predictive coding (LPC) system including a high speed low pass filter, with the AMDF generator formed on a single semiconductor chip including a data bus, control bus, memory, and a plurality of arithmetic logic units (ALU) for performing a plurality of functions in a reduced number of steps.
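As a rough software illustration of what an AMDF generator computes, the following minimal sketch assumes the common mean-absolute-difference definition and picks the lag with the deepest valley as the pitch period. The function names `amdf` and `estimate_period` and the lag bounds are hypothetical and do not come from the abstract, which describes a single-chip hardware implementation rather than this code.

```python
import math


def amdf(frame, max_lag):
    """Return the mean absolute difference between the frame and a copy of
    itself shifted by each lag in 0..max_lag (assumed AMDF definition)."""
    n = len(frame)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the frame length")
    values = []
    for lag in range(max_lag + 1):
        count = n - lag  # number of overlapping sample pairs at this lag
        total = sum(abs(frame[i] - frame[i + lag]) for i in range(count))
        values.append(total / count)
    return values


def estimate_period(frame, min_lag, max_lag):
    """Pick the lag in [min_lag, max_lag] where the AMDF is smallest."""
    d = amdf(frame, max_lag)
    return min(range(min_lag, max_lag + 1), key=lambda lag: d[lag])


# Usage: a synthetic tone that repeats every 40 samples.
tone = [math.sin(2 * math.pi * i / 40) for i in range(200)]
period = estimate_period(tone, min_lag=20, max_lag=60)
```

Because the AMDF needs only subtractions, absolute values and additions, it suits the kind of ALU-based hardware the abstract describes; the search range here is restricted so that the trivial minimum at lag 0 is excluded.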
Speech parameter signals may be coded to consist of fewer bits for data reduction in memory. In this system both coded and uncoded signals are stored with an associated control signal which controls selection of corresponding direct or decoder circuit coupling of memory output to a speech synthesizer.
This invention presents a voicing determination algorithm for classification of a speech signal segment as voiced or unvoiced. The algorithm is based on a normalized autocorrelation where the length of the window is proportional to the pitch period. The speech segment to be classified is further divided into a number of sub-segments, and the normalized autocorrelation is calculated for each sub-segment. If a certain number of the normalized autocorrelation values are above a predetermined threshold, the speech segment is classified as voiced. To improve the performance of the voicing determination algorithm in unvoiced-to-voiced transients, the normalized autocorrelations of the last sub-segments are emphasized. The performance of the voicing decision algorithm can be further enhanced by also using lookahead information, where available.
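The decision rule in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the window is taken to be exactly one pitch period long, only the final sub-segment is emphasized, and the names and default values (`n_sub`, `threshold`, `min_votes`, `last_weight`) are hypothetical rather than taken from the abstract. Lookahead is not modeled.

```python
import math


def normalized_autocorr(x, start, lag, window):
    """Correlate x[start:start+window] with the window `lag` samples later,
    normalized by the energies of both windows (result lies in [-1, 1])."""
    a = x[start:start + window]
    b = x[start + lag:start + lag + window]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def is_voiced(segment, pitch_period, n_sub=4, threshold=0.7,
              min_votes=3, last_weight=1.2):
    """Classify a segment as voiced when enough sub-segments correlate
    strongly with themselves one pitch period later."""
    window = pitch_period  # assumed proportionality factor of 1
    usable = len(segment) - pitch_period - window
    if usable < 0:
        raise ValueError("segment must span at least two pitch periods")
    step = usable // (n_sub - 1) if n_sub > 1 else 0
    votes = 0
    for k in range(n_sub):
        r = normalized_autocorr(segment, k * step, pitch_period, window)
        if k == n_sub - 1:
            r *= last_weight  # emphasize the most recent sub-segment
        if r > threshold:
            votes += 1
    return votes >= min_votes


# Usage: a periodic tone versus a signal that anti-correlates at the given lag.
tone = [math.sin(2 * math.pi * i / 40) for i in range(200)]
alternating = [(-1) ** i for i in range(200)]
tone_voiced = is_voiced(tone, pitch_period=40)
alternating_voiced = is_voiced(alternating, pitch_period=41)
```

Weighting the last sub-segment makes it easier for a segment that becomes periodic only near its end to collect the required votes, which is one plausible reading of how the emphasis helps at unvoiced-to-voiced transients.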