WikiPatents - Community Patent Review
Automated voice synthesis employing enhanced prosodic treatment of text, spelling of text and rate of annunciation    
United States Patent 5652828
Link to this page: http://www.wikipatents.com/5652828.html
Inventor(s): Silverman; Kim Ernest Alexander (Danbury, CT)
Abstract: Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.
Title Information
Automated voice synthesis employing enhanced prosodic treatment of text, spelling of text and rate of annunciation
Inventor     Silverman; Kim Ernest Alexander (Danbury, CT)
Owner/Assignee     NYNEX Science & Technology, Inc. (White Plains, NY)
Publication Date     July 29, 1997
Application Number     08/641,480
Filing Date     March 1, 1996
US Classification     704/260 704/258 704/266 704/267
Examiner     Hafiz; Tariq R.
Attorney/Law Firm     Straub; Michael P. | Michaelson & Wallace | Michaelson; Peter L.
Parent Case     RELATED APPLICATIONS: This application is a continuation of U.S. patent application Ser. No. 08/460,030, filed Jun. 2, 1995, which is a continuation of U.S. patent application Ser. No. 08/033,528, filed Mar. 19, 1993, now abandoned, both of which are titled "IMPROVED AUTOMATED VOICE SYNTHESIS EMPLOYING ENHANCED PROSODIC TREATMENT OF TEXT, SPELLING OF TEXT AND RATE OF ANNUNCIATION".
Priority Data    
USPTO Field of Search     395/2.1 395/2.67 395/2.69 395/2.76
References
U.S. References
Patent No.     Inventor     Date
5384893     Hutchins     Jan 1995
5212731     Zimmermann     May 1993
5040218     Vitale et al.     Aug 1991
4979216     Malsheen et al.     Dec 1990
4964167     Kunizawa et al.     Oct 1990
4908867     Silverman     Mar 1990
4907279     Higuchi et al.     Mar 1990
4896359     Yamamoto     Jan 1990
4831654     Dick     May 1989
4829580     Church     May 1989
4783811     Fisher et al.     Nov 1988
4783810     Kroon     Nov 1988
4695962     Goudie     Sep 1987
4692941     Jack et al.     Sep 1987
4685135     Lin et al.     Aug 1987
4689817     Kroon     Aug 1987
4470150     Ostrowski     Sep 1984
3704345     Coker et al.     Nov 1972
Claims

What is claimed is:

1. A method of synthesizing human audible speech from restricted text having a predetermined information content and predetermined format characteristics, the method comprising the steps of:

generating prosody indica for the restricted text as a function of the predetermined information content and predetermined format characteristics by performing the steps of:

a) identifying major prosodic groupings within the restricted text by utilizing major demarcation features which are a function of the predetermined format characteristics to define the beginning and end of the major prosodic groupings;

b) identifying prosodic subgroupings within the major prosodic groupings according to prosodic rules for analyzing the restricted text as a function of the predetermined information content for predetermined textual markers indicative of prosodically isolatible subgroupings not delineated by the major demarcations dividing the prosodic major groupings;

c) identifying within the prosodic subgroupings prosodically separable subgroup components;

d) generating prosodic indica which include salience signifiers, the salience signifiers controlling the salience of segments of the synthesized speech, the step of generating the prosodic indica including the steps of:

(i) generating salience signifiers within the prosodic subgroupings in accordance with predetermined salience placement rules relating to the components of the subgroupings themselves;

(ii) modifying the salience at the beginning and end of each prosodic subgroup; and

(iii) modifying the salience at the beginning and end of each major prosodic grouping; and

generating and outputting audible speech from the restricted text and prosodic indica.
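
The grouping and salience steps of claim 1 can be pictured with a small sketch. This is an illustration only, not the patented implementation: it assumes that each line of the restricted text is a major prosodic grouping, that commas mark the subgroupings, and that words are the subgroup components. The function name, the integer salience scale, and the sample text are all hypothetical, and the direction of the boundary adjustments (raise at the start, lower at the end) is borrowed from claim 7 rather than required by claim 1.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    salience: int


def assign_salience(text, base=10, sub_delta=2, major_delta=3):
    """Return major groupings -> subgroupings -> Word objects with salience."""
    majors = []
    # (a) Major groupings: one per non-empty line (assumed format characteristic).
    for line in text.splitlines():
        subgroups = []
        # (b) Subgroupings: comma-separated fields (assumed textual marker).
        for field in line.split(","):
            # (c) Subgroup components: whitespace-separated words.
            # (d)(i) Placement rule (assumed): every word starts at the base value.
            words = [Word(w, base) for w in field.split()]
            if not words:
                continue
            # (d)(ii) Modify salience at the beginning and end of the subgroup.
            words[0].salience += sub_delta
            words[-1].salience -= sub_delta
            subgroups.append(words)
        if not subgroups:
            continue
        # (d)(iii) Modify salience at the beginning and end of the major grouping.
        subgroups[0][0].salience += major_delta
        subgroups[-1][-1].salience -= major_delta
        majors.append(subgroups)
    return majors


# Made-up sample record: a name field followed by an address field.
for sub in assign_salience("Kim Silverman, 12 Main Street")[0]:
    print([(w.text, w.salience) for w in sub])
```

In this sketch the first word of the record ends up with the highest value and the last word with the lowest, because the subgroup and major-grouping adjustments stack at the outer edges.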

2. The method of claim 1,

wherein the predetermined information content includes a carrier phrase including word strings that have a structuring purpose and information words;

wherein the step of identifying major prosodic groupings includes the step of identifying the carrier phrase.

3. The method of claim 2, wherein the information words include names with prefixed titles and wherein the method further comprises the steps of:

increasing a speaking rate of the word strings that have a structuring purpose relative to a speaking rate of the information words.

4. The method of claim 3, wherein the information words include names which include prefixed titles followed by a word of the name, the method further comprising the step of:

modifying the generated salience indicators to assign less salience to the prefixed title than the word following the prefixed title.

5. The method of claim 4, wherein a first time speech is generated from a word it is assigned greater salience than when speech is subsequently generated from the same word.
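
Claims 2 through 5 layer three rules on top of the basic scheme: carrier-phrase words are spoken faster than information words, a prefixed title gets less salience than the word after it, and a word gets more salience the first time it is spoken. The sketch below shows one way these rules might combine. The title list, the rate values in words per minute, and the three-level salience scale are assumptions made for illustration and do not come from the patent.

```python
TITLES = {"mr.", "mrs.", "ms.", "dr."}  # hypothetical title list


def annotate(carrier_words, info_words, seen, info_rate=150, carrier_rate=180):
    """Attach a speaking rate (words per minute) and a salience level to each word.

    `seen` is a set of lower-cased words already spoken; it is updated in place.
    """
    out = []
    # Claim 3: structuring (carrier) words are spoken faster than information words.
    for w in carrier_words:
        out.append({"word": w, "rate": carrier_rate, "salience": 1})
    for w in info_words:
        key = w.lower()
        if key in TITLES:
            salience = 1  # claim 4: the title gets less salience than the next word
        elif key in seen:
            salience = 2  # claim 5: repeated word, reduced salience
        else:
            salience = 3  # claim 5: first mention, greater salience
        seen.add(key)
        out.append({"word": w, "rate": info_rate, "salience": salience})
    return out


spoken = set()
first = annotate(["The", "name", "is"], ["Dr.", "Kim", "Silverman"], spoken)
again = annotate(["The", "name", "is"], ["Kim"], spoken)
```

Passing the same `spoken` set to successive calls is what lets the second mention of a word come out with lower salience than the first.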

6. The method of claim 5, further comprising the steps of:

repeatedly outputting the audible speech corresponding to a first segment of text;

decreasing a rate of annunciation of the first segment of text after a first number of successive repeats of the audible speech corresponding to the first segment of text.

7. The method of claim 6,

wherein the step of modifying the salience at the beginning and end of each prosodic subgroup includes the steps of:

modifying the generated salience signifiers to increase the salience at the beginning of each prosodic subgroup; and

modifying the generated salience signifiers to decrease the salience at the end of each prosodic subgroup; and

wherein the step of modifying the salience at the beginning and end of each major prosodic grouping includes the steps of:

modifying the generated salience signifiers to increase the salience at the beginning of each major prosodic grouping; and

modifying the generated salience signifiers to decrease the salience at the end of each prosodic subgroup.

8. The method of claim 6, wherein each word of a name includes a plurality of letters, the method further comprising the steps of:

arranging the letters of a word of a name into groups; and

generating indica of prosodic boundaries between the groups of letters to insert a slight pause between the groups of letters when audible speech is generated therefrom.
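
The letter-grouping idea in claim 8 can be sketched as follows. The claim does not say how large the groups are or how a boundary is encoded, so the group size of three and the " | " pause marker are assumptions made for illustration; a real synthesizer would use its own boundary markup.

```python
def spell_in_groups(name, group_size=3, pause=" | "):
    """Spell a name letter by letter, marking a pause between groups of letters."""
    letters = [c.upper() for c in name if c.isalpha()]
    # Arrange the letters into consecutive groups of at most `group_size`.
    groups = [letters[i:i + group_size] for i in range(0, len(letters), group_size)]
    # Join letters within a group with spaces and groups with the pause marker.
    return pause.join(" ".join(g) for g in groups)


print(spell_in_groups("Silverman"))  # S I L | V E R | M A N
```

Chunking a spelled name this way mirrors how people read out long strings such as telephone numbers, which is presumably why the claim asks for a slight pause at each group boundary.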

9. The method of claim 8, further comprising the step of:

generating audible speech representing the spelling of the name following the generation of audible speech from the groups of letters.

10. The method of claim 9, further comprising the steps of:

allowing users to obtain repeats of audible speech segments generated from text segments;

changing the rate of annunciation of a first audible speech segment after a first number of successive repeats of the first audible speech segment for the first user;

decreasing the rate of annunciation of a second audible speech segment generated from a second text segment for the first user after the first number of successive repeats of the first audible speech segment; and

increasing the rate of annunciation for a third audible speech segment generated from a third text segment if the first user does not obtain repeats of the second audible speech segment.

11. The method of claim 10, further comprising the step of:

adjusting the initial annunciation rate for subsequent users as a function of the number of consecutive prior users for whom the rate of annunciation has been altered.
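
The rate adaptation in claims 10 and 11 behaves like a small state machine: slow down once a listener has asked for enough repeats, recover when a later segment is heard without repeats, and carry what was learned over to later callers. The sketch below is one possible reading under stated assumptions. The class and function names, the rates in words per minute, the step sizes, the thresholds, and the choice to recover only up to the starting rate are all hypothetical.

```python
class RateController:
    """Tracks the speaking rate for one user across successive text segments."""

    def __init__(self, initial_rate=180, step=20, repeat_threshold=2, min_rate=100):
        self.initial_rate = initial_rate
        self.rate = initial_rate
        self.step = step
        self.repeat_threshold = repeat_threshold
        self.min_rate = min_rate
        self.repeats_this_segment = 0
        self.altered = False  # whether the rate was ever changed for this user

    def on_repeat(self):
        """The user asked to hear the current segment again."""
        self.repeats_this_segment += 1
        # Slow down once, when the repeat count reaches the threshold.
        if self.repeats_this_segment == self.repeat_threshold:
            self.rate = max(self.min_rate, self.rate - self.step)
            self.altered = True
        return self.rate

    def next_segment(self):
        """The user moved on; return the rate to use for the next segment."""
        # No repeats on the segment just heard: recover toward the starting rate.
        if self.repeats_this_segment == 0:
            self.rate = min(self.initial_rate, self.rate + self.step)
        self.repeats_this_segment = 0
        return self.rate


def initial_rate_for_next_user(default_rate, consecutive_altered_users,
                               step=10, trigger=3, min_rate=100):
    """Claim 11 sketch: start slower if several users in a row needed a change."""
    if consecutive_altered_users >= trigger:
        return max(min_rate, default_rate - step)
    return default_rate
```

A caller would create one `RateController` per user, call `on_repeat()` for each repeat request and `next_segment()` when the user moves on, then count how many consecutive controllers ended with `altered` set before choosing the next user's starting rate.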

12. The method of claim 1, wherein the step of identifying within the prosodic subgroupings prosodically separable subgroup components includes the steps of:

a) identifying predetermined textual indicators which mark divisions of text groupings around them;

b) utilizing the predetermined textual indicators to separate the text within the prosodic subgrouping into units of nominal text which do not include said predetermined textual indicators; and

c) identifying within the units of nominal text other indicators of textual groupings that are not predetermined textual indicators.
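
Steps (a) through (c) of claim 12 amount to a two-pass split. The sketch below assumes a small hypothetical set of predetermined indicators ("and", "&", "c/o") and treats a trailing comma as an example of an "other" indicator found inside a unit; neither choice comes from the claim text.

```python
INDICATORS = {"and", "&", "c/o"}  # hypothetical predetermined textual indicators


def split_nominal_units(subgroup_text):
    """Steps (a) and (b): split on the indicators and drop them from the units."""
    units, current = [], []
    for token in subgroup_text.split():
        if token.lower() in INDICATORS:
            # An indicator closes the unit collected so far.
            if current:
                units.append(current)
                current = []
        else:
            current.append(token)
    if current:
        units.append(current)
    return units


def secondary_breaks(unit):
    """Step (c): positions of other grouping cues, here tokens ending in a comma."""
    return [i for i, token in enumerate(unit) if token.endswith(",")]


units = split_nominal_units("John Smith and Mary Jones")
# units == [["John", "Smith"], ["Mary", "Jones"]]
```

Keeping the two passes separate matches the claim's ordering: the predetermined indicators carve out the nominal units first, and only then is each unit inspected for weaker cues.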

13. The method of claim 12, further comprising the steps of:

repeatedly outputting the audible speech corresponding to a first segment of text;

decreasing a rate of annunciation of the first segment of text after a first number of successive repeats of the audible speech corresponding to the first segment of text.

14. The method of claim 13,

wherein the prosodic indica are generated by a set of prosody rules with predetermined discourse constraints which are a function of the context of the synthesis of the restricted text; and

wherein the restricted text includes name and address information.

15. The method of claim 14,

wherein a major prosodic grouping is a sentence, a prosodic subgrouping is a name including a plurality of words, and a subgroup component is a word in a name.

16. The method of claim 15, wherein the salience signifiers are indica of pitch.

17. The method of claim 16, further comprising the step of:

arranging letters of a name into groups;

generating indica of prosodic boundaries between the groups of letters.

18. The method of claim 17, wherein the generated indica of prosodic boundaries between groups of letters results in the insertion of a slight pause between the groups of letters when audible speech is generated therefrom.

19. The method of claim 18, further comprising the step of:

generating audible speech representing the spelling of the name following the generation of audible speech from the groups of letters.

20. The method of claim 16, further comprising the step of:

generating audible speech representing the spelling of a name.

21. The method of claim 1, wherein the audible speech is generated for a plurality of users, the method further comprising the steps of:

outputting at a first annunciation rate and to a first user, a first segment of audible speech corresponding to a first segment of text;

repeatedly outputting to the first user the first segment of audible speech; and

decreasing a rate of annunciation of the first segment of audible speech after a first number of successive repeats of the first segment of audible speech.

22. The method of claim 21, further comprising the step of:

outputting the first segment of audible speech corresponding to the first segment of text to a second user at a second annunciation rate which is determined as a function of the number of times the first segment of audible speech was output to the first user.

23. The method of claim 22, wherein the second annunciation rate is lower than the first annunciation rate.

24. The method of claim 1, further comprising the steps of:

allowing users to obtain repeats of audible speech segments generated from text segments;

changing the rate of annunciation of a first audible speech segment after a first number of successive repeats of the first audible speech segment for the first user;

decreasing the rate of annunciation of a second audible speech segment generated from a second text segment for the first user after the first number of successive repeats of the first audible speech segment; and

increasing the rate of annunciation for a third audible speech segment generated from a third text segment if the first user does not obtain repeats of the second audible speech segment.

25. The method of claim 24, further comprising the step of:

adjusting the initial annunciation rate for subsequent users as a function of the number of consecutive prior users for whom the rate of annunciation has been altered.

26. A method of synthesizing human audible speech from text including a predetermined information content and having predetermined format characteristics, the method comprising the steps of:

generating prosody indica for the text as a function of the predetermined information content and predetermined format characteristics of the text by performing the steps of:

a) identifying major prosodic groupings within the restricted text by utilizing major demarcation features which are a function of the predetermined format characteristics to define the beginning and end of the major prosodic groupings;

b) identifying prosodic subgroupings within the major prosodic groupings according to prosodic rules for analyzing the restricted text as a function of the predetermined information content for predetermined textual markers indicative of prosodically isolatible subgroupings not delineated by the major demarcations dividing the prosodic major groupings;

c) identifying within the prosodic subgroupings prosodically separable subgroup components, at least one subgroup component being a word in the name;

d) generating prosodic indica which include salience signifiers, the salience signifiers controlling the salience of segments of the synthesized speech, the step of generating the prosodic indica including the steps of:

(i) generating salience signifiers within the prosodic subgroupings in accordance with salience placement rules solely relating to the components of the subgroupings themselves;

(ii) modifying the generated salience signifiers to increase the salience at the start of each prosodic subgroup and to further signify the salience at the end of each prosodic subgroup; and

(iii) further modifying the salience signifiers to further increase the salience of the beginning of the major prosodic grouping and further signify the salience of the end of the major prosodic grouping.

27. The method of claim 26, further comprising the steps of:

arranging letters of the name into groups;

generating indica of prosodic boundaries between the groups of letters, the generated indica of prosodic boundaries between groups of letters resulting in the insertion of a slight pause between the groups of letters when audible speech is generated therefrom.

28. The method of claim 27, wherein the audible speech is generated for a plurality of users, the method further comprising the steps of:

outputting to a first user at a first annunciation rate a first segment of audible speech corresponding to a first segment of text;

repeatedly outputting to the first user the first segment of audible speech; and

decreasing the rate of annunciation of the first segment of audible speech after a first number of successive repeats of the first segment of audible speech.

29. An apparatus for synthesizing human audible speech from a machine readable representation of restricted text having a predetermined information content and predetermined format characteristics, comprising:

prosody preprocessor means for receiving the restricted text and for generating prosody indica by assigning the prosody indica on the basis of the predetermined informational content of the restricted text, means for:

a) identifying major prosodic groupings by utilizing major demarcation features to define the beginning and end of the major prosodic groupings;

b) identifying prosodic subgroupings within the major prosodic groupings according to prosodic rules for analyzing the text for predetermined textual markers indicative of prosodically isolatible subgroupings not delineated by the major demarcations dividing the prosodic major groupings;

c) identifying within the prosodic subgroupings prosodically separable subgroup components; and

d) generating prosodic indicia which include salience signifiers utilizable by the speech synthesizer means to vary the salience of segments of the synthesized speech such that:

(i) the salience signifiers within the prosodic subgroupings are first generated in accordance with predetermined salience placement rules solely relating to the components themselves,

(ii) thereafter the first generated salience signifiers are modified to increase the salience at the start of the prosodic subgroup and further signify the salience at the end of the prosodic subgroup, and

(iii) the salience signifiers are subsequently further modified to further increase the salience of the beginning of the major prosodic grouping and further signify the salience of the end of the major prosodic grouping; and

speech synthesizer means for synthesizing human audible speech from text, the speech synthesizer means including means for generating prosody indica on unrestricted text and for interpreting and executing prosody indica received from the prosody preprocessor means, the prosody indica from the prosody preprocessor means being used to override and supplement the prosody indica generated by the internal prosody indica generating means.
Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to automated synthesis of human speech from computer readable text, such as that stored in databases or generated by data processing systems automatically or via a user. Such systems are currently under consideration and are being placed in use, for example, by banks or telephone companies, to enable customers to readily access information about accounts, telephone numbers, addresses and the like.

Text-to-speech synthesis is seen to be potentially useful to automate or create many information services. Unfortunately, to date most commercial systems for automated synthesis remain too unnatural