Speech and speaker recognition using factor analysis to model covariance structure of mixture components
   
Document Number
US Patent 5946656
Issued Date
August 31, 1999
Abstract
Hidden Markov models (HMMs) rely on high-dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. These correlations are modeled using factor analysis, a statistical method for dimensionality reduction. Factor analysis is used to model acoustic correlation in automatic speech recognition by introducing a small number of parameters to model the covariance structure of a speech signal. The parameters are estimated by an Expectation Maximization (EM) technique that can be embedded in the training procedures for the HMMs, and then further adjusted using Minimum Classification Error (MCE) training, which yields better discrimination and produces more accurate recognition models.
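The abstract describes, in words, modeling a covariance with a small number of extra parameters and fitting them by EM. A minimal sketch of that idea for a single Gaussian follows, assuming Python with NumPy. It uses the textbook factor-analysis model x = mu + Lam z + eps, whose covariance is Lam Lam^T + diag(psi), and the standard closed-form EM updates. It is not the patent's procedure: there are no HMM states, mixture weights, or MCE step, and every function name below is hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating a factor-analyzed covariance; not the patent's code.

def fa_covariance(Lam, psi):
    """Covariance implied by x = mu + Lam z + eps: Lam Lam^T + diag(psi)."""
    return Lam @ Lam.T + np.diag(psi)


def gaussian_loglik(X, mu, Sigma):
    """Average log-likelihood of the rows of X under N(mu, Sigma)."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    mahal = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
    return float(np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + mahal)))


def fit_factor_analysis(X, k, n_iter=200, seed=0):
    """EM for one Gaussian with covariance Lam Lam^T + diag(psi), Lam of shape (d, k)."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)                  # ML mean does not depend on Lam or psi
    diff = X - mu
    S = diff.T @ diff / N                # sample covariance (sufficient statistic)
    Lam = 0.1 * rng.standard_normal((d, k))
    psi = np.diag(S).copy()
    history = []
    for _ in range(n_iter):
        Sigma = fa_covariance(Lam, psi)
        # E-step: beta = Lam^T Sigma^{-1} maps a centered frame to E[z | x].
        beta = np.linalg.solve(Sigma, Lam).T
        # Average posterior second moment of the hidden factors, E[z z^T | x].
        Ezz = np.eye(k) - beta @ Lam + beta @ S @ beta.T
        # M-step: closed-form updates for the loadings and the diagonal noise.
        Lam = S @ beta.T @ np.linalg.inv(Ezz)
        psi = np.maximum(np.diag(S - Lam @ beta @ S), 1e-6)  # floor keeps psi positive
        history.append(gaussian_loglik(X, mu, fa_covariance(Lam, psi)))
    return mu, Lam, psi, history


def covariance_param_counts(d, k):
    """Stored covariance parameters per Gaussian for each modeling choice."""
    return {"full": d * (d + 1) // 2, "diagonal": d, "factor_analysis": d * k + d}
```

The recorded log-likelihoods should be non-decreasing, as EM guarantees. For d = 39 and k = 3 (example values, not taken from the patent), covariance_param_counts reports 780 stored parameters for a full covariance, 156 for the factor-analyzed one, and 39 for a diagonal one, which illustrates the "small number of parameters" the abstract refers to.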
Number of Claims:
28
Owner
AT & T Corp. (Middletown, NJ)
Published
August 31, 1999
Application Number
08/971,838
Filed
November 17, 1997
US Classification
704/256.2   704/240
Int'l Classification
G10L   15/00   (20060101)   G01L   9/16   (20060101)  
USPTO Field of Search
704/236   704/240   704/256  
Related Patents
6148284 - Method and apparatus for automatic speech recognition using Markov processes on curves - Owned by AT&T Corporation (New York, NY)

A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.

6691090 - Speech recognition system including dimensionality reduction of baseband frequency signals - Owned by Nokia Mobile Phones Limited (Espoo, FI)

A method for use in a speech recognition system in which a speech waveform to be modelled is represented by a set of feature extracted parameters in the time domain, the method comprising dividing individual ones of one or more of said feature extracted parameters to provide for each divided feature extracted parameter a plurality of frequency channels, and demodulating at least one of the plurality of frequency channels to provide at least one corresponding baseband frequency signal.

6301561 - Automatic speech recognition using multi-dimensional curve-linear representations - Owned by AT&T Corporation (New York, NY)

A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.

6401064 - Automatic speech recognition using segmented curves of individual speech components having arc lengths generated along space-time trajectories - Owned by AT&T Corp. (New York, NY)

A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.

7295978 - Systems and methods for using one-dimensional gaussian distributions to model speech - Owned by Verizon Corporate Services Group Inc. (New York, NY) BBN Technologies Corp. (Cambridge, MA)

A system for recognizing speech receives an input speech vector and identifies a Gaussian distribution. The system determines an address from the input speech vector (610) and uses the address to retrieve a distance value for the Gaussian distribution from a table (620). The system then determines the probability of the Gaussian distribution using the distance value (630) and recognizes the input speech vector based on the determined probability (640).
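The abstracts of 6148284, 6301561, and 6401064 state that feature vectors contribute to the recognition score in proportion to their arc length. A minimal sketch of that weighting follows, assuming Python with NumPy. The Euclidean segment length, the even (trapezoidal) split of each segment between its two endpoint frames, and the stand-in per-frame score are all assumptions; the further weighting of arc lengths to minimize recognition errors is not shown, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch of arc-length weighting; not the method claimed in 6148284.

def arc_length_weights(X):
    """Per-frame weights from the arc length of the feature trajectory X (T x d).

    Each segment between consecutive frames is split evenly between its two
    endpoints (a trapezoidal rule), so the weights sum to the total arc length.
    """
    seg = np.linalg.norm(np.diff(X, axis=0), axis=1)  # lengths of the T-1 segments
    w = np.zeros(len(X))
    w[:-1] += 0.5 * seg
    w[1:] += 0.5 * seg
    return w


def weighted_score(X, frame_loglik):
    """Score in which each frame counts in proportion to its arc-length weight."""
    return float(np.dot(arc_length_weights(X), frame_loglik(X)))


def unit_gaussian_loglik(X):
    """Stand-in per-frame score: log N(x; 0, I) up to an additive constant."""
    return -0.5 * np.sum(X ** 2, axis=1)
```

Repeating every frame, as np.repeat(X, 2, axis=0) does to mimic the same utterance spoken at half speed, doubles the unweighted sum of frame scores but leaves weighted_score unchanged, because each duplicated frame adds only a zero-length segment. This is one concrete sense in which weighting by arc length reduces sensitivity to speaking rate.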
