Apparatus and methods for shift invariant speech recognition
Document Number
US Patent 5956671
Issued Date
September 21, 1999
Abstract
The present invention includes a method of generating a set of substantially shift invariant acoustic features from an input speech signal which comprises the steps of: splitting the input speech signal into a plurality of input speech signals; respectively delaying a majority of the input speech signals by a successively incrementing time interval; respectively extracting a plurality of sets of acoustic features from the plurality of input speech signals; summing the plurality of sets of acoustic features to form a set of summed acoustic features; and dividing the set of summed acoustic features by a number equivalent to the number of sets of acoustic features summed in the summing step thereby forming a set of averaged acoustic features which are substantially shift invariant. Further, the present invention may include a method for generating at least one substantially shift invariant speech recognition model from speech training data which comprises the steps of: inputting the speech training data a first time; extracting acoustic features from the speech training data input the first time; inputting the speech training data a plurality of times thereafter, each time respectively delaying the input speech training data by a successively incrementing time interval; respectively extracting acoustic features from each delayed speech training data input each time; and utilizing at least the acoustic features extracted in the extracting steps to form the at least one speech recognition model which is substantially shift invariant. Still further, the present invention may include a synchrosqueezing process in the feature extraction steps. Also, the invention contemplates implementing these processes individually, in combination with another of the processes, and a combination of all the processes.
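The two methods in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the abstract does not specify the acoustic features, the number of delayed copies, or the delay increment, so `extract_features` (a per-frame log-magnitude spectrum), `n_copies`, `step`, `frame_len`, and `hop` are all hypothetical choices. The synchrosqueezing variant of feature extraction is not shown.

```python
import numpy as np


def extract_features(signal, frame_len=256, hop=128):
    """Per-frame log-magnitude spectrum.

    Assumption: a stand-in for the acoustic features, which the
    abstract leaves unspecified.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)


def delay(signal, d):
    """Delay the signal by d samples by prepending zeros (length is preserved)."""
    if d == 0:
        return signal.copy()
    return np.concatenate([np.zeros(d), signal[:-d]])


def shift_invariant_features(signal, n_copies=4, step=32):
    """Average the features of successively delayed copies of the input.

    Steps mirror the abstract: split into copies, delay all but the first
    by an incrementing interval, extract features from each copy, sum,
    then divide by the number of feature sets.
    """
    delays = [k * step for k in range(n_copies)]  # 0, step, 2*step, ...
    feature_sets = [extract_features(delay(signal, d)) for d in delays]
    summed = np.sum(feature_sets, axis=0)
    return summed / len(feature_sets)


def delayed_training_features(signal, n_copies=4, step=32):
    """Feature sets from the undelayed input and each delayed re-input.

    Assumption: the returned sets would be pooled as training data for a
    recognition model; model training itself is outside this sketch.
    """
    return [extract_features(delay(signal, k * step)) for k in range(n_copies)]
```

In this sketch the delays span most of one frame hop (0, 32, 64, and 96 samples against a hop of 128), so the averaged features blend several frame alignments of the same signal. Whether that spacing matches the patent's intent is an assumption.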
Number of Claims:
18
Published
September 21, 1999
Application Number
08/868,860
Filed
June 4, 1997
US Classification
704/203; 704/231
Int'l Classification
G10L 15/00 (20060101); G10L 15/02 (20060101)
USPTO Field of Search
704/203; 704/231
Related Patents
6253175 - Wavelet-based energy binning cepstal features for automatic speech recognition - Owned by International Business Machines Corporation (Armonk, NY)

Systems and methods for processing acoustic speech signals which utilize the wavelet transform (and alternatively, the Fourier transform) as a fundamental tool. The method essentially involves "synchrosqueezing" spectral component data obtained by performing a wavelet transform (or Fourier transform) on digitized speech signals. In one aspect, spectral components of the synchrosqueezed plane are dynamically tracked via a K-means clustering algorithm. The amplitude, frequency and bandwidth of each of the components are, thus, extracted. The cepstrum generated from this information is referred to as "K-mean Wastrum." In another aspect, the result of the K-means clustering process is further processed to limit the set of primary components to formants. The resulting features are referred to as "formant-based wastrum." Formants are interpolated in unvoiced regions and the contribution of the unvoiced turbulent part of the spectrum is added. This method requires adequate formant tracking. The resulting robust formant extraction has a number of applications in speech processing and analysis including vocal tract normalization.
