Systems and methods for extracting meaning from multimodal inputs using finite-state devices
   
Document Number
US Patent 7069215
Issued Date
June 27, 2006
Abstract
Finite-state systems and methods allow multiple input streams to be parsed and integrated by a single finite-state device. These systems and methods not only address multimodal recognition, but are also able to encode semantics and syntax into a single finite-state device. The finite-state device provides models for recognizing multimodal inputs, such as speech and gesture, and composes the meaning content from the various input streams into a single semantic representation. Compared to conventional multimodal recognition systems, finite-state systems and methods allow for compensation among the various input streams. Finite-state systems and methods allow one input stream to dynamically alter a recognition model used for another input stream, and can reduce the computational complexity of multidimensional multimodal parsing. Finite-state devices provide a well-understood probabilistic framework for combining the probability distributions associated with the various input streams and for selecting among competing multimodal interpretations.
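As a rough illustration of the idea in the abstract, the sketch below walks a single finite-state device whose arcs are labeled with a gesture symbol, a spoken word, and a meaning symbol, and composes one semantic string from the two input streams. It is a minimal toy under stated assumptions, not the patented implementation: the arc table, the symbols `Gp` and `e1`, the `email(person(...))` meaning format, and the function name `integrate` are all hypothetical.

```python
EPS = None  # epsilon: the arc consumes or emits nothing on that tape

# Each arc: (source state, gesture symbol, speech word, meaning output, target state).
# Hypothetical toy grammar for "email this person" plus one pointing gesture.
ARCS = [
    (0, EPS, "email", "email(", 1),
    (1, EPS, "this", EPS, 2),
    (2, "Gp", "person", "person(", 3),  # pointing gesture aligned with the word "person"
    (3, "e1", EPS, "e1", 4),            # gesture stream supplies the selected entity ID
    (4, EPS, EPS, "))", 5),             # close the brackets of the meaning string
]
FINAL = {5}


def integrate(gesture, speech):
    """Return every meaning string the device accepts for the two input streams.

    The search is a depth-first walk over (state, gesture position, speech
    position). The toy arc table is acyclic, so the walk always terminates.
    """
    results = []
    stack = [(0, 0, 0, [])]
    while stack:
        state, gi, si, out = stack.pop()
        if state in FINAL and gi == len(gesture) and si == len(speech):
            results.append("".join(out))
        for src, g, w, m, dst in ARCS:
            if src != state:
                continue
            # An arc with a gesture label must match the next gesture symbol.
            if g is not EPS and (gi >= len(gesture) or gesture[gi] != g):
                continue
            # An arc with a word label must match the next spoken word.
            if w is not EPS and (si >= len(speech) or speech[si] != w):
                continue
            new_out = out + [m] if m is not EPS else out
            stack.append((dst, gi + (g is not EPS), si + (w is not EPS), new_out))
    return results


print(integrate(["Gp", "e1"], ["email", "this", "person"]))
```

With the pointing gesture present, the walk reaches the final state and yields `email(person(e1))`; with the same speech and no gesture, no path is accepted, which shows both streams being parsed by the one device.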
Number of Claims:
53
Owner
AT&T Corp. (New York, NY)
Published
June 27, 2006
Application Number
09/904,253
Filed
July 12, 2001
US Classification
704/255 382/187 382/190 382/198 382/228 704/231 704/251 704/256 704/270 704/275
Int'l Classification
G10L 15/28 (20060101)
USPTO Field of Search
704/270 704/271 704/272 704/273 704/274 704/275 704/231 704/251 704/256 704/243 704/9 704/257 704/255 704/260 382/187 382/190 382/198 382/228
Related Patents
7295975 - Systems and methods for extracting meaning from multimodal inputs using finite-state devices - Owned by AT&T Corp. (New York, NY)

Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more of the other modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
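The gesture-to-speech flow described in this abstract can be sketched as follows. This is a simplified illustration only: real recognition lattices and language models are weighted graphs, whereas here the gesture lattice is a plain dictionary of scores and the language model is a dictionary of allowed phrases. The grammar entries, the scores, and the names `GRAMMAR`, `language_model_from_gestures`, and `recognize` are hypothetical.

```python
# Hypothetical multimodal grammar: each rule pairs a gesture symbol with a phrase.
GRAMMAR = [
    ("point_person", ("email", "this", "person")),
    ("point_org", ("email", "this", "organization")),
    ("circle_area", ("zoom", "in", "here")),
]


def language_model_from_gestures(gesture_lattice):
    """Build a phrase -> weight table from the recognized gestures.

    Only phrases whose gesture symbol appears in the gesture lattice are kept,
    and each phrase inherits the score of its gesture.
    """
    lm = {}
    for symbol, phrase in GRAMMAR:
        if symbol in gesture_lattice:
            lm[phrase] = max(lm.get(phrase, 0.0), gesture_lattice[symbol])
    return lm


def recognize(speech_hyps, lm):
    """Pick the speech hypothesis with the best (speech score * LM weight).

    Hypotheses that the gesture-derived language model does not allow are dropped.
    """
    scored = [
        (score * lm[tuple(words)], words)
        for words, score in speech_hyps
        if tuple(words) in lm
    ]
    if not scored:
        return None
    return max(scored, key=lambda item: item[0])[1]


# Assumed toy inputs: the gesture recognizer favors a pointing gesture at a person,
# while the speech recognizer alone would prefer "zoom in here".
gesture_lattice = {"point_person": 0.7, "point_org": 0.3}
speech_hyps = [
    (["zoom", "in", "here"], 0.6),
    (["email", "this", "organization"], 0.5),
    (["email", "this", "person"], 0.4),
]

lm = language_model_from_gestures(gesture_lattice)
best = recognize(speech_hyps, lm)
print(best)
```

In this toy run the gesture evidence removes "zoom in here" from consideration and tips the choice toward "email this person", which is the kind of cross-mode compensation the abstract describes.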

7430324 - Method and apparatus for classifying and ranking interpretations for multimodal input fusion - Owned by Motorola, Inc. (Schaumburg, IL)

A method used in an electronic equipment (100) generates a set of joint multimodal interpretations (125) from a set of multimodal interpretations (115) generated by one or more modalities (105) during a turn, generates a set of integrated multimodal interpretations (135) including an integrated multimodal interpretation formed from each joint multimodal interpretation by unifying the type feature structure of each multimodal interpretation in the joint multimodal interpretation, and generates a multilevel confidence score for each integrated multimodal interpretation based on at least one of a context score, a content score, and a confidence score of the integrated multimodal interpretation. The method classifies the multimodal interpretations and generates a set of joint multimodal interpretations that comprises essentially all possible joint multimodal interpretations. The multilevel confidence scoring is based on up to eleven factors, which provides for an accurate ranking of the integrated multimodal interpretations.

7451125 - System and method for compiling rules created by machine learning program - Owned by AT&T Intellectual Property II, L.P. (New York, NY)

A system, a method, and a machine-readable medium are provided. A group of linear rules and associated weights are provided as a result of machine learning. Each one of the group of linear rules is partitioned into a respective one of a group of types of rules. A respective transducer for each of the linear rules is compiled. A combined finite state transducer is created from a union of the respective transducers compiled from the linear rules.

7587318 - Correlating video images of lip movements with audio signals to improve speech recognition - Owned by Broadcom Corporation (Irvine, CA)

A speech recognition device can include an audio signal receiver configured to receive audio signals from a speech source, a video signal receiver configured to receive video signals from the speech source, and a processing unit configured to process the audio signals and the video signals. In addition, the speech recognition device can include a conversion unit configured to convert the audio signals and the video signals to recognizable speech, and an implementation unit configured to implement a task based on the recognizable speech.
