Method for guiding text-to-speech output timing using speech recognition markers
   
Document Number
US Patent 7010489
Issued Date
March 7, 2006
Inventors
Lewis; James R. (Delray Beach, FL)
Wang; Huifang (Boynton Beach, FL)
Abstract
A method for guiding text-to-speech (TTS) output timing with speech recognition markers can include the following steps. First, tokens can be retrieved in a TTS system. The tokens can include words, phrase markers, punctuation marks, and meta-tags. Second, phrase markers can be identified among the retrieved tokens. Third, words can be identified among the retrieved tokens. Fourth, the TTS system can play back the identified words as synthesized speech. Finally, during playback of the words, the TTS system can pause in response to the identified phrase markers.
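The abstract's steps can be sketched in Python as a minimal illustration, not the patented implementation. Everything here is an assumption: the token representation (`Kind`, `Token`), the fixed 300 ms pause (`PHRASE_PAUSE_MS`), and the `speak` callback, which is a hypothetical stand-in for a real TTS engine call. The sketch folds the separate "identify phrase markers" and "identify words" steps into one pass, and it skips punctuation and meta-tags because the abstract does not say how they are voiced.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class Kind(Enum):
    """Token categories named in the abstract."""
    WORD = "word"
    PHRASE_MARKER = "phrase_marker"
    PUNCTUATION = "punctuation"
    META_TAG = "meta_tag"


@dataclass(frozen=True)
class Token:
    kind: Kind
    text: str = ""


# Hypothetical pause length; the abstract does not give one.
PHRASE_PAUSE_MS = 300

Event = Tuple[str, object]


def plan_playback(tokens: Iterable[Token], pause_ms: int = PHRASE_PAUSE_MS) -> List[Event]:
    """Map retrieved tokens to an ordered list of playback events.

    Words become ("speak", word); each phrase marker becomes ("pause", pause_ms).
    Punctuation and meta-tags are skipped (an assumption: the abstract does not
    say how they are handled during playback).
    """
    events: List[Event] = []
    for tok in tokens:
        if tok.kind is Kind.PHRASE_MARKER:
            events.append(("pause", pause_ms))
        elif tok.kind is Kind.WORD:
            events.append(("speak", tok.text))
    return events


def play(events: Iterable[Event],
         speak: Callable[[str], None],
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Run the events; `speak` stands in for a real TTS engine call (hypothetical)."""
    for action, value in events:
        if action == "speak":
            speak(value)
        else:
            sleep(value / 1000.0)  # milliseconds -> seconds


if __name__ == "__main__":
    tokens = [
        Token(Kind.WORD, "send"),
        Token(Kind.WORD, "the"),
        Token(Kind.WORD, "report"),
        Token(Kind.PHRASE_MARKER),
        Token(Kind.WORD, "to"),
        Token(Kind.WORD, "Jim"),
        Token(Kind.PUNCTUATION, "."),
    ]
    events = plan_playback(tokens)
    print(events)
    # [('speak', 'send'), ('speak', 'the'), ('speak', 'report'), ('pause', 300), ('speak', 'to'), ('speak', 'Jim')]
    play(events, speak=print)
```

Separating planning from playback keeps the timing decisions testable: `play` accepts an injectable `sleep`, so the pause behaviour can be checked without waiting in real time.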
Number of Claims:
20
Published
March 7, 2006
Application Number
09/521,593
Filed
March 9, 2000
US Classification
704/260   704/258
Int'l Classification
G10L   13/08   (20060101)  
USPTO Field of Search
704/270   704/260   704/258   704/256   704/257   704/255  
Related Patents
7263488 - Method and apparatus for identifying prosodic word boundaries - Owned by Microsoft Corporation (Redmond, WA)

A method and computer-readable medium are provided that identify prosodic word boundaries for a text. If the text is unsegmented, it is first segmented into lexical words. The lexical words are then converted into prosodic words using an annotated lexicon to divide large lexical words into smaller words and a model to combine the lexical words and/or the smaller words into larger prosodic words. The boundaries of the resulting prosodic words are used to set the prosody for the synthesized speech.
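The two-stage conversion described above can be outlined in a rough Python sketch, under stated assumptions. The input is taken as already segmented into lexical words. The annotated lexicon (`LEXICON`) and its single entry are invented for illustration. The trained combining model is replaced by a simple greedy length rule (`combine_greedy`, capped by a hypothetical `MAX_PROSODIC_LEN`). Strings are joined without spaces, as in unsegmented scripts such as Chinese.

```python
from typing import Dict, List

# Hypothetical annotated lexicon: lexical words that should be divided into
# smaller words. The entry below is invented purely for illustration.
LEXICON: Dict[str, List[str]] = {"abcd": ["ab", "cd"]}

# Hypothetical cap on prosodic word length, in characters (for Chinese text,
# roughly one syllable per character).
MAX_PROSODIC_LEN = 3


def split_with_lexicon(lexical_words: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Divide large lexical words into smaller words using the lexicon."""
    smaller: List[str] = []
    for word in lexical_words:
        smaller.extend(lexicon.get(word, [word]))
    return smaller


def combine_greedy(words: List[str], max_len: int = MAX_PROSODIC_LEN) -> List[str]:
    """Merge adjacent words into prosodic words.

    Stand-in for the trained model: keep appending words to the current
    prosodic word while its length stays within max_len.
    """
    prosodic: List[str] = []
    current = ""
    for word in words:
        if current and len(current) + len(word) <= max_len:
            current += word  # unsegmented script, so no space is inserted
        else:
            if current:
                prosodic.append(current)
            current = word
    if current:
        prosodic.append(current)
    return prosodic


def boundary_offsets(prosodic_words: List[str]) -> List[int]:
    """Character offsets at which each prosodic word ends."""
    offsets: List[int] = []
    position = 0
    for word in prosodic_words:
        position += len(word)
        offsets.append(position)
    return offsets


if __name__ == "__main__":
    lexical = ["abcd", "e", "f", "gh"]          # already segmented input
    smaller = split_with_lexicon(lexical, LEXICON)
    prosodic = combine_greedy(smaller)
    print(prosodic)                              # ['ab', 'cde', 'fgh']
    print(boundary_offsets(prosodic))            # [2, 5, 8]
```

The boundary offsets are the part a synthesizer would consume when setting prosody; swapping `combine_greedy` for a statistical model would not change that interface.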

7127396 - Method and apparatus for speech synthesis without prosody modification - Owned by Microsoft Corporation (Redmond, WA)

A speech synthesizer is provided that concatenates stored samples of speech units without modifying the prosody of the samples. The present invention is able to achieve a high level of naturalness in synthesized speech with a carefully designed training speech corpus by storing samples based on the prosodic and phonetic context in which they occur. In particular, some embodiments of the present invention limit the training text to those sentences that will produce the most frequent sets of prosodic contexts for each speech unit. Further embodiments of the present invention also provide a multi-tier selection mechanism for selecting a set of samples that will produce the most natural sounding speech.

7496498 - Front-end architecture for a multi-lingual text-to-speech system - Owned by Microsoft Corporation (Redmond, WA)

A text processing system for processing multi-lingual text for a speech synthesizer includes a first language dependent module for performing at least one of text and prosody analysis on a portion of input text comprising a first language. A second language dependent module performs at least one of text and prosody analysis on a second portion of input text comprising a second language. A third module is adapted to receive outputs from the first and second dependent module and performs prosodic and phonetic context abstraction over the outputs based on multi-lingual text.
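The module layout described above can be shown as a toy Python sketch built on assumptions throughout. Language detection is a crude script check (CJK ideographs treated as Chinese, everything else as English). The two language-dependent modules (`english_module`, `chinese_module`) only tokenize, standing in for real text and prosody analysis. The third module (`merge_module`) merely flags where the language switches, a placeholder for the prosodic and phonetic context abstraction the summary mentions.

```python
from typing import Callable, Dict, List, Tuple

Unit = Tuple[str, str]            # (language code, text unit)
Tagged = Tuple[str, str, bool]    # (language code, text unit, starts a language switch)


def detect_lang(ch: str) -> str:
    """Crude script-based guess (assumption): CJK Unified Ideographs -> "zh", else "en"."""
    return "zh" if "\u4e00" <= ch <= "\u9fff" else "en"


def split_by_language(text: str) -> List[Tuple[str, str]]:
    """Cut input text into maximal runs of a single language."""
    runs: List[Tuple[str, str]] = []
    for ch in text:
        lang = detect_lang(ch)
        if ch.isspace() and runs:
            lang = runs[-1][0]  # whitespace stays with the preceding run
        if runs and runs[-1][0] == lang:
            runs[-1] = (lang, runs[-1][1] + ch)
        else:
            runs.append((lang, ch))
    return runs


# Hypothetical language-dependent modules; real ones would do text and prosody analysis.
def english_module(segment: str) -> List[Unit]:
    return [("en", word) for word in segment.split()]


def chinese_module(segment: str) -> List[Unit]:
    return [("zh", ch) for ch in segment if not ch.isspace()]


MODULES: Dict[str, Callable[[str], List[Unit]]] = {
    "en": english_module,
    "zh": chinese_module,
}


def merge_module(outputs: List[List[Unit]]) -> List[Tagged]:
    """Third module: join per-language outputs and flag cross-language boundaries.

    The switch flag is only a placeholder for context abstraction.
    """
    merged: List[Tagged] = []
    previous = None
    for output in outputs:
        for lang, unit in output:
            merged.append((lang, unit, previous is not None and lang != previous))
            previous = lang
    return merged


def front_end(text: str) -> List[Tagged]:
    """Route each language run to its module, then merge the results."""
    outputs = [MODULES[lang](segment) for lang, segment in split_by_language(text)]
    return merge_module(outputs)


if __name__ == "__main__":
    print(front_end("Hello world 你好"))
    # [('en', 'Hello', False), ('en', 'world', False), ('zh', '你', True), ('zh', '好', False)]
```

Keeping the per-language modules behind a dictionary means adding a language is a matter of registering another function, while the merging stage stays language-independent.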
