WikiPatents - Community Patent Review
Text-to-speech synthesis with controllable processing time and speech quality    
United States Patent 5615300
Link to this page: http://www.wikipatents.com/5615300.html
Inventor(s): Hara; Yoshiyuki (Tokyo, JP); Nitta; Tsuneo (Yokohama, JP)
Abstract: Synthesized speech is generated by a software-implemented system with a programmed central processing unit. Phonetic parameters are generated from a series of phonetic symbols of an input text to be converted into synthesized speech, and prosodic parameters are also generated from prosodic information of the input text. The activity ratio of the central processing unit is determined, and the order of phonetic parameters or the arrangement of a synthesis unit or filter for speech synthesis is determined depending on the determined activity ratio of the central processing unit. Synthesized speech sounds are generated and filtered based on the phonetic and prosodic parameters according to the determined order of phonetic parameters or the determined arrangement of the filter.
   














Text-to-speech synthesis with controllable processing time and speech quality
Inventor     Hara; Yoshiyuki (Tokyo, JP); Nitta; Tsuneo (Yokohama, JP)
Owner/Assignee     Toshiba Corporation (Kanagawa-ken, JP)
Publication Date     March 25, 1997
Application Number     08/067,079
Filing Date     May 26, 1993
US Classification     704/260 704/267 704/268
Int'l Classification     G10L 009/00
Examiner     MacDonald; Allen R.
Assistant Examiner     Sartori; Michael A.
Attorney/Law Firm     Finnegan, Henderson, Farabow, Garrett & Dunner, L.L.P.
Priority Data     May 28, 1992 [JP] 4-137177
USPTO Field of Search     395/2.67 395/2.69 395/2.76 395/2.77 395/2 395/2.4 395/2.44 395/2.75 381/40 381/51 381/52

References

U.S. References
Reference   Inventor   US Classification   Date
4896359     Yamamoto   704/260             Jan. 1990
4817161     Kaneko     704/267             Mar. 1989
4709340     Capizzi    704/268             Nov. 1987
4618936     Shiono     704/267             Oct. 1986
4581757     Cox        704/258             Apr. 1986
4296279     Stork      704/264             Oct. 1981
Claims
 


What is claimed is:

1. A method of synthesizing speech with a system having a programmed central processing unit, comprising the steps of:

generating phonetic parameters from a series of phonetic symbols of an input text;

generating prosodic parameters from prosodic information of the input text;

detecting an activity rate of the central processing unit;

determining a degree number of at least one particular phonetic parameter, each particular phonetic parameter having a different degree number in different contexts, depending on the detected activity rate of the central processing unit; and

generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters including adapting the filtering according to the determined degree number of the particular phonetic parameter.

2. A method according to claim 1, wherein said input text has frames, accent phrases, pauses, sentences, and paragraphs, said activity rate of the central processing unit being detected in every frame, every accent phrase, every pause, every sentence, or every paragraph of the input text, or once at the beginning of the input text.

3. A method of synthesizing speech with a system having a programmed central processing unit, comprising the steps of:

generating phonetic parameters from a series of phonetic symbols of an input text;

generating prosodic parameters from prosodic information of the input text;

detecting an activity rate of the central processing unit;

determining a degree number of at least one particular phonetic parameter, each particular phonetic parameter having a different degree number in different contexts, said degree number depending on said detected activity rate;

determining a synthesis unit from a plurality of synthesis units according to said particular phonetic parameter; and

generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the determined synthesis unit.

4. A method according to claim 3, wherein said input text has frames, accent phrases, pauses, sentences, and paragraphs, said activity rate of the central processing unit being detected in every frame, every accent phrase, every pause, every sentence, or every paragraph of the input text, or once at the beginning of the input text.

5. A method of synthesizing speech with a system having a programmed central processing unit, comprising the steps of:

generating phonetic parameters from a series of phonetic symbols of an input text;

generating prosodic parameters from prosodic information of the input text;

detecting an activity rate of the central processing unit;

inputting information representative of a quality of synthesized speech sounds to be generated depending on the activity rate of the central processing unit;

determining a degree number of at least one particular phonetic parameter, each particular phonetic parameter having a different degree number in different contexts, said degree number being determined according to the input information; and

selecting a synthesis unit from among a plurality of synthesis units according to said degree number of said particular phonetic parameter during each one of a plurality of different periods of time; and

generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters employing said selected synthesis unit.

6. A method according to claim 5, wherein said input text has frames, accent phrases, pauses, sentences, and paragraphs, said activity rate of the central processing unit being detected in every frame, every accent phrase, every pause, every sentence, or every paragraph of the input text, or once at the beginning of the input text.

7. An apparatus for synthesizing speech with a system having a programmed central processing unit, comprising:

means for generating phonetic parameters from a series of phonetic symbols of an input text;

means for generating prosodic parameters from prosodic information of the input text;

detector means for detecting an activity rate of the central processing unit;

control means for determining a degree number of at least one particular phonetic parameter, each particular phonetic parameter having a different degree number in different contexts, said degree number being determined depending on the detected activity rate of the central processing unit; and

speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters including adaptable filtering means and means for adapting the adaptable filtering means according to the determined degree number of the particular phonetic parameter.

8. An apparatus according to claim 7, wherein said input text has frames, accent phrases, pauses, sentences, and paragraphs, said detector means comprising means for detecting the activity rate of the central processing unit in every frame, every accent phrase, every pause, every sentence, or every paragraph of the input text, or once at the beginning of the input text.

9. An apparatus for synthesizing speech with a system having a programmed central processing unit, comprising:

means for generating phonetic parameters from a series of phonetic symbols of an input text;

means for generating prosodic parameters from prosodic information of the input text;

a plurality of synthesis units for effecting filtering during synthesis of speech sounds;

detector means for detecting an activity rate of the central processing unit;

means for determining a degree number of at least one particular phonetic parameter, each particular phonetic parameter having a different degree number in different contexts, said particular degree number depending on said detected activity rate;

selector means for selecting a respective one of said plurality of synthesis units according to said degree number of said particular phonetic parameter during each one of a plurality of different periods of time, a plurality of phonetic parameters and a plurality of prosodic parameters being generated during each one of said plurality of different periods of time; and

means including the selected synthesis unit for applying all said phonetic and prosodic parameters generated during each said one period of time to the respective one synthesis unit which is selected by said selector means to generate synthesized speech sounds.

10. An apparatus according to claim 9, wherein said input text has frames, accent phrases, pauses, sentences, and paragraphs, said detector means comprising means for detecting the activity rate of the central processing unit in every frame, every accent phrase, every pause, every sentence, or every paragraph of the input text, or once at the beginning of the input text.

11. An apparatus for synthesizing speech with a system having a programmed central processing unit, comprising:

means for generating phonetic parameters from a series of phonetic symbols of an input text;

means for generating prosodic parameters from prosodic information of the input text;

a plurality of synthesis units for effecting filtering during synthesis of speech sounds;

detector means for detecting an activity rate of the central processing unit;

means for determining a degree number of at least one particular phonetic parameter, each particular phonetic parameter having a different degree number in different contexts, said degree number depending on one of said detected activity rate and a quality of synthesized speech sounds to be generated;

selector means for selecting a respective one of said plurality of synthesis units according to said degree number of said particular phonetic parameter during each one of a plurality of different periods of time, a plurality of phonetic parameters and a plurality of prosodic parameters being generated during each one of said plurality of different periods of time; and

means including the selected synthesis unit for applying all of said phonetic and prosodic parameters generated during each said one period of time to the respective one synthesis unit that is selected by said selector means to generate synthesized speech sounds.

12. An apparatus according to claim 11, wherein said input text has frames, accent phrases, pauses, sentences, and paragraphs, said detector means comprising means for detecting the activity rate of the central processing unit in every frame, every accent phrase, every pause, every sentence, or every paragraph of the input text, or once at the beginning of the input text.

13. An apparatus for synthesizing speech with a system having a programmed central processing unit, comprising:

means for generating phonetic parameters from a series of phonetic symbols of an input text;

means for generating prosodic parameters from prosodic information of the input text;

input means for inputting information representative of a degree number of phonetic parameters in a first mode;

detector means for detecting an activity rate of the central processing unit in a second mode;

mode selector means for selecting one of said first and second modes;

control means for determining information representative of a degree number of at least one particular phonetic parameter, each particular phonetic parameter having a different degree number in different contexts, depending on the detected activity rate of the central processing unit; and

speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the information input by said input means in said first mode selected by said mode selector means and according to the information determined by said control means in said second mode selected by said mode selector means.

14. An apparatus for synthesizing speech with a system having a programmed central processing unit, comprising:

means for generating phonetic parameters from a series of phonetic symbols of an input text;

means for generating prosodic parameters from prosodic information of the input text;

input means for inputting information representative of a quality of synthesized speech sounds to be generated in a first mode;

detector means for detecting an activity rate of the central processing unit in a second mode;

mode selector means for selecting one of said first and second modes;

control means for determining information representative of depending on the detected activity rate of the central processing unit; and

control means for determining information representative of a degree number of at least one particular phonetic parameter, each particular phonetic parameter having a different degree number in different contexts, depending on the detected activity rate of the central processing unit; and

speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the information input by said input means in said first mode selected by said mode selector means and according to the information including said degree number determined by said control means in said second mode selected by said mode selector means.

15. An apparatus for synthesizing speech with a system having a programmed central processing unit, comprising:

means for generating phonetic parameters from a series of phonetic symbols of an input text;

means for generating prosodic parameters from prosodic information of the input text;

a plurality of synthesis units for effecting filtering on synthesized speech sounds for respective different periods of time;

input means for inputting information representative of one of said synthesis units in a first mode;

detector means for detecting an activity rate of the central processing unit in a second mode;

mode selector means for selecting one of said first and second modes;

means for generating a degree number of at least one particular phonetic parameter, each particular phonetic parameter having a different degree number in differing contexts, the degree number of said particular phonetic parameter depending on the detected activity rate of the central processing unit;

synthesis unit selector means for selecting said one of the synthesis units which is represented by the information input by said input means in said first mode selected by said mode selector means and for selecting one of the synthesis units depending on the degree number of said particular phonetic parameter in said second mode selected by said mode selector means; and

speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters with said selected synthesis unit.
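
The dependent claims above (claims 2, 4, 6, 8, 10 and 12) let the activity rate be detected at several granularities, from every frame to once at the beginning of the text. The following is a minimal Python sketch of that idea only, not the patented implementation. The function name `synthesize`, the `probe` callback, and the three granularity labels are hypothetical, and only three of the listed granularities are modeled.

```python
def synthesize(sentences, probe, granularity="sentence"):
    """Pair every frame with the most recently detected CPU activity rate.

    sentences:   list of sentences, each a list of frames (any objects).
    probe:       hypothetical zero-argument callable returning the activity rate.
    granularity: "text" (once at the start), "sentence", or "frame".
    """
    if granularity not in ("text", "sentence", "frame"):
        raise ValueError("unsupported granularity: %r" % granularity)
    results = []
    # Detect once at the beginning of the input text.
    activity = probe() if granularity == "text" else None
    for sentence in sentences:
        if granularity == "sentence":
            activity = probe()  # detect once per sentence
        for frame in sentence:
            if granularity == "frame":
                activity = probe()  # detect for every frame
            results.append((frame, activity))
    return results
```

Finer granularity tracks load changes more closely at the cost of more frequent detection; which trade-off suits a given system is left open here.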
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention:

The present invention relates to a method of and an apparatus for generating synthesized speech from either a sequence of character codes or a series of phonetic symbols and prosodic information associated therewith.

2. Description of the Prior Art:

Recently, there have been developed various speech synthesizers for analyzing Japanese sentences composed of a mixture of Kanji (Chinese) characters and Kana (Japanese syllabary) characters and generating synthesized speech from phonetic and prosodic information represented by the analyzed sentences according to the synthesis-by-rule process. Such speech synthesis systems are finding wide use in telephone information services in the banking business, newspaper revising systems, document readers, and other apparatus employing synthesized speech.

Basically, the speech synthesizer based on the synthesis-by-rule process operates as follows: The speech synthesizer has a speech segment file which stores phonetic information that has been obtained by the LSP (line spectrum pair) analysis or the cepstrum analysis from each unit of human speech which may be of a syllable structure CV (consonant-vowel), a syllable structure CVC (consonant-vowel-consonant), a syllable structure VCV (vowel-consonant-vowel), or a syllable structure VC (vowel-consonant). When a text is inputted to the speech synthesizer, the speech synthesizer analyzes the text, produces phonetic and prosodic parameters for the text by referring to the speech segment file, and generates and filters sound sources based on the phonetic and prosodic parameters for generating synthesized speech of the text.
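
As a rough illustration of the lookup step described above, the sketch below models the speech segment file as a dictionary from speech units to stored parameter frames and concatenates the frames for a sequence of units. The name `SEGMENT_FILE`, the two CV units, and every numeric value are invented for illustration; a real file would hold LSP or cepstrum coefficients obtained by analysis of human speech.

```python
# Toy stand-in for a speech segment file: unit name -> list of parameter frames.
# All values are made up; they are not real LSP or cepstrum coefficients.
SEGMENT_FILE = {
    "ka": [[0.9, 0.1], [0.7, 0.3]],
    "na": [[0.2, 0.8], [0.4, 0.6]],
}


def phonetic_parameters(units):
    """Concatenate the stored frames for each speech unit, in order."""
    frames = []
    for unit in units:
        frames.extend(SEGMENT_FILE[unit])
    return frames


frames = phonetic_parameters(["ka", "na"])
```

The resulting frame sequence is what a synthesis filter would then consume, together with prosodic parameters such as pitch and duration, which this sketch leaves out.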

It has heretofore been customary to construct the speech synthesizer of dedicated hardware components that are required for real-time data processing. There are primarily two system designs available for the dedicated-hardware speech synthesizer. According to one system, a host computer such as a personal computer converts a sentence of Kanji and Kana characters into phonetic and prosodic information, and a dedicated hardware device generates phonetic and prosodic parameters based on the converted phonetic and prosodic information, generates and filters sound sources, and converts the filtered sound sources into an analog speech signal for generating synthesized speech. According to the other system, all the above processing steps are executed by a dedicated hardware device. Usually, the dedicated hardware device of each of the above systems comprises an LSI circuit called a DSP (digital signal processor) which is capable of high-speed logic operations including ANDing and ORing, and a general-purpose MPU (microprocessor unit).

Recent years have seen another approach: implementing the above processing in software on a real-time basis. The software-implemented system has been made possible by a personal computer or an engineering work station having a high processing capability combined with a D/A converter, an analog output device, and a loudspeaker.

The software-implemented system is free of problems with respect to speech synthesis while it is processing relatively few tasks. However, when many tasks must be processed simultaneously by the system, the system may not be able to generate real-time synthesized speech. If the system fails to generate real-time synthesized speech, then unvoiced intervals are inserted in synthesized words, making it difficult for the user to hear the synthesized words clearly. Specifically, a certain constant period of time is needed for the CPU (central processing unit) of the system to carry out the process of speech synthesis. Therefore, insofar as the CPU of the system operates to process a relatively small number of tasks, it can produce synthesized speech on a real-time basis. However, when the CPU of the system is required to process an increased number of tasks, the CPU requires a longer execution time to process those tasks, possibly failing to generate real-time synthesized speech.
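
The real-time constraint described above can be made concrete with a small back-of-the-envelope model. The sketch below assumes, purely for illustration, that the synthesizer receives whatever share of the CPU other tasks leave free, so that a frame needing `compute_ms` on an idle CPU takes `compute_ms / (1 - other_load)` of wall-clock time. This proportional-share model, the function name, and the numbers are assumptions and do not come from the patent.

```python
def is_real_time(compute_ms, frame_ms, other_load):
    """Return True if one frame can be synthesized within its own duration.

    compute_ms: CPU time needed per frame on an otherwise idle CPU (assumed).
    frame_ms:   duration of audio covered by one frame.
    other_load: fraction of the CPU, 0 <= other_load < 1, used by other tasks.
    """
    if not 0 <= other_load < 1:
        raise ValueError("other_load must be in [0, 1)")
    # Simple proportional-share assumption: less free CPU, longer wall time.
    effective_ms = compute_ms / (1.0 - other_load)
    return effective_ms <= frame_ms
```

For example, a frame that needs 4 ms of computation for 10 ms of audio keeps up when other tasks use half the CPU (8 ms effective) but falls behind at 70% load (about 13.3 ms effective), which is when gaps would appear in the output.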

The present speech synthesizer that operates according to the synthesis-by-rule process can produce synthesized speech in different patterns that reflect such differences as sex, age, pronunciation rate, pitch, and stress. The user of the speech synthesizer can select any one of the different speech patterns according to his preference. However, the user cannot change the quality of the synthesized speech.

Most speech synthesizers that are available today generate crisp synthesized speech sounds that can be heard clearly. If the user of the speech synthesizer hears such crisp synthesized speech sounds for the first time, then the user will find them acceptable as they are sharp and clear. However, if the user who has become accustomed to synthesized speech hears crisp synthesized speech sounds for a continued period of time, then the user finds them physically and mentally fatiguing. Since the quality of synthesized speech, i.e., the quality of being crisp, cannot be changed, the conventional speech synthesizer does not lend itself to continuous usage for a long period of time.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a method of and an apparatus for generating synthesized speech while allowing a period of time required for speech synthesis and the quality of synthesized speech to be varied by varying the order of filtering for speech synthesis.

Another object of the present invention is to provide a method of and an apparatus for generating synthesized speech while allowing a period of time required for speech synthesis and the quality of synthesized speech to be varied by varying the arrangement of a synthesis unit used for filtering for speech synthesis.

Still another object of the present invention is to provide a method of and an apparatus for generating high-quality synthesized speech on a real-time basis by varying the order of filtering for speech synthesis or the arrangement of a synthesis unit depending on the activity ratio of a central processing unit that is programmed for speech synthesis.
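
The text does not say at this point how the activity ratio of the central processing unit is obtained. On a present-day Unix-like system, one rough proxy is the 1-minute load average divided by the number of CPUs, as in the Python sketch below. The function name `cpu_activity_ratio` and the choice of load average as a stand-in for "activity ratio" are assumptions, not the patent's method.

```python
import os


def cpu_activity_ratio():
    """Return a rough CPU activity estimate in the range [0.0, 1.0].

    Uses the 1-minute load average per CPU as a proxy. Falls back to 0.0
    where os.getloadavg() is unavailable (for example on Windows).
    """
    try:
        load1, _, _ = os.getloadavg()
    except (AttributeError, OSError):
        return 0.0
    cpus = os.cpu_count() or 1
    # Load average can exceed the CPU count, so clamp to 1.0.
    return min(1.0, load1 / cpus)
```

A load average lags behind instantaneous load, so a system that adapts every frame would likely need a finer-grained measurement than this.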

According to the present invention, there is provided a method of synthesizing speech, comprising the steps of generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, generating prosodic parameters from prosodic information of the input text, supplying information representative of the order of phonetic parameters, and generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the supplied information.

According to the present invention, there is also provided a method of synthesizing speech, comprising the steps of generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, generating prosodic parameters from prosodic information of the input text, supplying information representative of the quality of synthesized speech sounds to be generated, and generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the supplied information.

According to the present invention, there is also provided a method of synthesizing speech, comprising the steps of generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, generating prosodic parameters from prosodic information of the input text, supplying information representative of the arrangement of a synthesis unit to be used, and generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters with a synthesis unit which is arranged according to the supplied information.

According to the present invention, there is also provided a method of synthesizing speech with a system having a programmed central processing unit, comprising the steps of generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, generating prosodic parameters from prosodic information of the input text, determining the activity ratio of the central processing unit, determining the order of phonetic parameters depending on the determined activity ratio of the central processing unit, and generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the determined order of phonetic parameters.
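
One way to picture "determining the order of phonetic parameters depending on the determined activity ratio" is a threshold table that maps higher CPU activity to a lower parameter order, followed by truncation of each frame to that order. The sketch below assumes cepstral coefficients, where keeping fewer leading coefficients gives a smoother spectral envelope at lower cost. The table `ORDER_TABLE`, its thresholds, and its orders are hypothetical values chosen only for illustration.

```python
# (upper bound on activity ratio, parameter order to use); values are invented.
ORDER_TABLE = [(0.3, 20), (0.6, 14), (1.0, 8)]


def choose_order(activity_ratio):
    """Pick a parameter order: the busier the CPU, the lower the order."""
    for upper_bound, order in ORDER_TABLE:
        if activity_ratio <= upper_bound:
            return order
    return ORDER_TABLE[-1][1]  # anything above 1.0 is treated as fully busy


def truncate_frame(cepstrum, order):
    """Keep only the first `order` coefficients of one parameter frame."""
    return cepstrum[:order]
```

With these assumed thresholds, an activity ratio of 0.5 selects order 14, so a 20-coefficient frame is reduced to its first 14 coefficients before filtering.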

According to the present invention, there is also provided a method of synthesizing speech with a system having a programmed central processing unit, comprising the steps of generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, generating prosodic parameters from prosodic information of the input text, determining the activity ratio of the central processing unit, determining the arrangement of a synthesis unit to be used depending on the activity ratio of the central processing unit, and generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the determined arrangement of a synthesis unit to be used.

According to the present invention, there is also provided a method of synthesizing speech with a system having a programmed central processing unit, comprising the steps of generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, generating prosodic parameters from prosodic information of the input text, determining the activity ratio of the central processing unit, supplying information representative of the quality of synthesized speech sounds to be generated depending on the activity ratio of the central processing unit, and generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the supplied information.

According to the present invention, there is also provided an apparatus for synthesizing speech, comprising means for generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, means for generating prosodic parameters from prosodic information of the input text, means for supplying information representative of the order of phonetic parameters, and means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the supplied information.

According to the present invention, there is also provided an apparatus for synthesizing speech, comprising means for generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, means for generating prosodic parameters from prosodic information of the input text, means for supplying information representative of the quality of synthesized speech sounds to be generated, and means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the supplied information.

According to the present invention, there is also provided an apparatus for synthesizing speech, comprising means for generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, means for generating prosodic parameters from prosodic information of the input text, a plurality of synthesis units for effecting filtering on synthesized speech sounds for respective different periods of time, input means for supplying information representative of one of said synthesis units, selector means for selecting one of said synthesis units according to the information supplied by said input means, and speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters with said one of the synthesis units which is selected by said selector means.

According to the present invention, there is also provided an apparatus for synthesizing speech with a system having a programmed central processing unit, comprising means for generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, means for generating prosodic parameters from prosodic information of the input text, extractor means for determining the activity ratio of the central processing unit, control means for determining the order of phonetic parameters depending on the determined activity ratio of the central processing unit, and speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the determined order of phonetic parameters.

According to the present invention, there is also provided an apparatus for synthesizing speech with a system having a programmed central processing unit, comprising means for generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, means for generating prosodic parameters from prosodic information of the input text, a plurality of synthesis units for effecting filtering on synthesized speech sounds for respective different periods of time, extractor means for determining the activity ratio of the central processing unit, control means for determining the arrangement of a synthesis unit according to the determined activity ratio of the central processing unit, selector means for selecting one of said synthesis units which has the determined arrangement, and speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters with said one of the synthesis units which is selected by said selector means.
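
The arrangement with a plurality of synthesis units and a selector can be sketched as follows, under stated assumptions. Each synthesis unit is modeled here as an all-pole filter of a fixed order, and the selector picks the cheaper unit when the CPU is busy. The unit names, their orders, the 0.5 threshold, and the direct-form all-pole filter are illustrative choices; the passage above does not specify the filter structure.

```python
def all_pole_filter(excitation, coeffs):
    """Direct-form all-pole filter: y[n] = x[n] - sum(a[k] * y[n-k])."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


# Hypothetical synthesis units, identified by name and filter order.
SYNTHESIS_UNITS = {"full": 4, "reduced": 2}


def select_unit(activity_ratio, threshold=0.5):
    """Selector: use the cheaper unit when the CPU is busy."""
    return "reduced" if activity_ratio > threshold else "full"


def synthesize_frame(excitation, coeffs_by_unit, activity_ratio):
    """Apply one frame's parameters to whichever unit the selector picks."""
    name = select_unit(activity_ratio)
    coeffs = coeffs_by_unit[name]
    if len(coeffs) != SYNTHESIS_UNITS[name]:
        raise ValueError("coefficient count does not match unit order")
    return name, all_pole_filter(excitation, coeffs)
```

The inner loop runs once per coefficient per sample, so halving the order roughly halves the filtering work per frame, which is the lever the selector pulls.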

According to the present invention, there is also provided an apparatus for synthesizing speech with a system having a programmed central processing unit, comprising means for generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, means for generating prosodic parameters from prosodic information of the input text, a plurality of synthesis units for effecting filtering on synthesized speech sounds for respective different periods of time, extractor means for determining the activity ratio of the central processing unit, input means for supplying information representative of the quality of synthesized speech sounds to be generated according to the determined activity ratio of the central processing unit, and speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the supplied information.

According to the present invention, there is also provided an apparatus for synthesizing speech with a system having a programmed central processing unit, comprising means for generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, means for generating prosodic parameters from prosodic information of the input text, input means for supplying information representative of the order of phonetic parameters in a first mode, extractor means for determining the activity ratio of the central processing unit in a second mode, mode selector means for selecting one of said first and second modes, control means for determining information representative of the order of phonetic parameters depending on the determined activity ratio of the central processing unit, and speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the information supplied by said input means in said first mode and according to the information determined by said control means in said second mode.

According to the present invention, there is also provided an apparatus for synthesizing speech with a system having a programmed central processing unit, comprising means for generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, means for generating prosodic parameters from prosodic information of the input text, input means for supplying information representative of the quality of synthesized speech sounds to be generated in a first mode, extractor means for determining the activity ratio of the central processing unit in a second mode, mode selector means for selecting one of said first and second modes, control means for determining information representative of the quality of synthesized speech sounds to be generated depending on the determined activity ratio of the central processing unit, and speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters according to the information supplied by said input means in said first mode and according to the information determined by said control means in said second mode.

According to the present invention, there is also provided an apparatus for synthesizing speech with a system having a programmed central processing unit, comprising means for generating phonetic parameters from a series of phonetic symbols of an input text to be converted into synthesized speech, means for generating prosodic parameters from prosodic information of the input text, a plurality of synthesis units for effecting filtering on synthesized speech sounds for respective different periods of time, input means for supplying information representative of one of said synthesis units in a first mode, extractor means for determining the activity ratio of the central processing unit in a second mode, mode selector means for selecting one of said first and second modes, control means for determining information representative of one of said synthesis units depending on the determined activity ratio of the central processing unit, selector means for selecting one of the synthesis units which is represented by the information supplied by said input means in said first mode and one of the synthesis units which is represented by the information determined by said control means in said second mode, and speech synthesizer means for generating and filtering synthesized speech sounds based on the phonetic and prosodic parameters with said selected one of the synthesis units.
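The two-mode arrangements summarized above can be illustrated with a minimal sketch. It assumes that the "first mode" takes the order directly from the input means and that the "second mode" derives it from the measured activity ratio of the central processing unit. The function names, the 0.5 threshold, and the rule that a busier CPU receives the lower order are hypothetical choices made only for illustration; the orders 6 and 10 are taken from the first embodiment described later.

```python
def order_from_activity(activity_ratio):
    """Hypothetical control means: map a CPU activity ratio (0.0-1.0) to an order.

    The threshold and the direction of the mapping are assumptions.
    """
    if not 0.0 <= activity_ratio <= 1.0:
        raise ValueError("activity_ratio must lie between 0.0 and 1.0")
    return 6 if activity_ratio >= 0.5 else 10


def select_order(mode, user_order=None, activity_ratio=None):
    """Hypothetical mode selector returning the order used for synthesis."""
    if mode == "first":
        # First mode: use the information supplied by the input means.
        if user_order is None:
            raise ValueError("first mode requires user_order")
        return user_order
    if mode == "second":
        # Second mode: use the information determined from the activity ratio.
        if activity_ratio is None:
            raise ValueError("second mode requires activity_ratio")
        return order_from_activity(activity_ratio)
    raise ValueError("mode must be 'first' or 'second'")
```

In this sketch the measured activity ratio is simply ignored in the first mode, which mirrors the way the summary keeps the supplied information and the determined information separate.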

The above and other objects, features, and advantages of the present invention will become apparent from the following description when taken in conjunction with the accompanying drawings which illustrate preferred embodiments of the present invention by way of example.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a speech synthesizing apparatus according to a first embodiment of the present invention;

FIG. 2 is a flowchart of a processing sequence of a speech synthesizer of the speech synthesizing apparatus shown in FIG. 1;

FIG. 3 is a block diagram of a speech synthesizing apparatus according to a second embodiment of the present invention;

FIG. 4 is a flowchart of a processing sequence of a speech synthesizer of the speech synthesizing apparatus shown in FIG. 3;

FIG. 5 is a flowchart of a subroutine (A) in the processing sequence shown in FIG. 4;

FIGS. 6A and 6B are diagrams showing examples of information stored in a rate information file of the speech synthesizing apparatus shown in FIG. 3;

FIGS. 7A through 7G are diagrams showing a specific example of speech synthesis as well as an input text during operation of the speech synthesizing apparatus shown in FIG. 3;

FIG. 8 is a block diagram of a filter arrangement which may be employed in the speech synthesizer of the speech synthesizing apparatus shown in FIG. 3;

FIG. 9 is a diagram showing examples of information stored in the rate information file which are necessary to vary the processing time with filter switching in the speech synthesizing apparatus shown in FIG. 3;

FIG. 10 is a flowchart of a processing sequence of a speech synthesizer of the speech synthesizing apparatus shown in FIGS. 3 and 8; and

FIG. 11 is a flowchart of a subroutine (B) in the processing sequence shown in FIG. 10.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the description that follows, reference is made to a certain Japanese text which is given as an example in conversion from text to speech. For an easier understanding, the Japanese sentences are fully transliterated and their meaning is fully given in English. It should be understood that the example in Japanese is employed only for a better description of the present invention and that the principles of the present invention are not limited to the Japanese language, but also applicable to other languages including English.

1ST EMBODIMENT:

As shown in FIG. 1, a speech synthesizing apparatus according to a first embodiment of the present invention includes an input unit 1 for entering a series of character codes representing a mixture of Kanji and Kana characters to be converted into synthesized speech and control information for controlling the synthesized speech. The control information comprises order information to select the order N of synthetic parameters to be supplied to a filter in a speech synthesizer 6 (described later).

The speech synthesizing apparatus also has a word dictionary 2 storing registered accent types, pronunciations, and parts of speech of words and phrases to be converted into speech, and a linguistic processor 3 for analyzing a series of character codes entered from the input unit 1 with the information stored in the word dictionary 2 and generating a series of phonetic symbols and prosodic information associated therewith.

The speech synthesizing apparatus further includes a speech segment file 4 which stores a group of cepstral parameters that have been determined by analyzing units of input speech and information indicative of the orders of the cepstral parameters, and a synthetic parameter generator 5 for generating phonetic parameters, i.e., phonetic cepstral parameters, according to the series of phonetic symbols generated by the linguistic processor 3 and the order information from the input unit 1. The synthetic parameter generator 5 also serves to generate prosodic parameters according to the prosodic information generated by the linguistic processor 3.

The speech synthesizing apparatus also has a speech synthesizer 6 for generating a sound source based on the phonetic parameters generated by the synthetic parameter generator 5, the order information, and the prosodic parameters generated by the synthetic parameter generator 5, and filtering the generated sound source with an Nth-order filter to generate synthesized speech, and a loudspeaker 7 for outputting the generated synthesized speech. The speech synthesizer 6 includes a D/A converter (not shown) for converting the synthesized speech into an analog signal.

The speech synthesizing apparatus shown in FIG. 1 is realized by a personal computer (PC) or an engineering work station (EWS) which is capable of executing multiple tasks at the same time. The input unit 1, the linguistic processor 3, the synthetic parameter generator 5, and the speech synthesizer 6 are functional blocks whose functions are performed by a programmed sequence of a CPU of the personal computer or the engineering work station, i.e., by the execution of a speech synthesis task.

The speech synthesizing apparatus shown in FIG. 1 operates as follows:

A series of character codes representing a sentence of mixed Kanji and Kana characters to be converted into synthesized speech, and order information indicative of an order N are entered into the speech synthesizing apparatus through the input unit 1. The linguistic processor 3 compares the entered series of character codes with the word dictionary 2 to determine accent types, pronunciations, and parts of speech of words and phrases represented by the series of character codes, determines accent types and boundaries according to the parts of speech, and converts the sentence of mixed Kanji and Kana characters into a pronunciation format, for thereby generating a series of phonetic symbols and prosodic information.

The series of phonetic symbols and prosodic information generated by the linguistic processor 3 are then supplied to the synthetic parameter generator 5, which is also supplied with the order information from the input unit 1.

The synthetic parameter generator 5 extracts phonetic cepstral parameters corresponding to the series of phonetic symbols from the speech segment file 4 with respect to the order N represented by the order information from the input unit 1, for thereby generating phonetic parameters. At the same time, the synthetic parameter generator 5 generates prosodic parameters according to the prosodic information.
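The extraction of parameters "with respect to the order N" can be pictured as keeping only the coefficients C0 through CN of each stored frame. The following is a minimal sketch under that assumption; the list-of-lists layout of the speech segment file and the function name are hypothetical and are not taken from the patent.

```python
def phonetic_parameters(stored_frames, order_n):
    """Keep coefficients C0..CN of each stored cepstral frame.

    stored_frames: list of frames, each a list [C0, C1, ..., Cmax]
                   (a hypothetical layout for the speech segment file).
    order_n:       order N selected by the order information.
    """
    selected = []
    for frame in stored_frames:
        if order_n < 0 or order_n >= len(frame):
            raise ValueError("order N exceeds the stored order of a frame")
        # Index 0 holds C0, so N + 1 values are kept.
        selected.append(frame[: order_n + 1])
    return selected
```

For example, calling `phonetic_parameters(frames, 6)` on frames stored at order 10 would return seven values (C0 through C6) per frame.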

The synthetic parameter generator 5 supplies the phonetic parameters and the prosodic parameters to the speech synthesizer 6, which temporarily holds the supplied phonetic and prosodic parameters together with the order information supplied from the input unit 1. Then, based on the phonetic and prosodic parameters and the order information, the speech synthesizer 6 generates a sound source and effects digital filtering on the sound source to generate synthesized speech representing the entered series of character codes. The generated synthesized speech is converted by the D/A converter into an analog speech signal, which is applied to the loudspeaker 7. The loudspeaker 7 now produces synthesized speech sounds corresponding to the entered sentence of mixed Kanji and Kana characters.

The processing sequence of the speech synthesizer 6 will be described in detail below with reference to FIG. 2.

The speech synthesizer 6 sets a counter variable j indicating a frame number to an initial value of "1" in a step S1 and also sets a counter variable i indicating the remaining number of samples to be processed per frame to an initial value of "P" (= frame period/sampling period) in a step S2. The sampling period is the same as the period of a clock signal supplied to the D/A converter (not shown).
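The initialization in steps S1 and S2 can be sketched as follows. The numeric values in the usage note (a 10 ms frame period and a 125 microsecond sampling period) are assumptions chosen for illustration, as are the function names. How the counters are updated is also an assumption inferred from their descriptions as a frame number and a remaining sample count.

```python
def samples_per_frame(frame_period_us, sampling_period_us):
    """Return P = frame period / sampling period (both in microseconds)."""
    if sampling_period_us <= 0 or frame_period_us % sampling_period_us != 0:
        raise ValueError("frame period must be a positive multiple of the sampling period")
    return frame_period_us // sampling_period_us


def iterate_samples(num_frames, p):
    """Yield (j, i) pairs: frame number j and remaining sample count i.

    j starts at 1 (step S1) and i starts at P for each frame (step S2).
    Counting i down to 1 and then advancing j is an assumed update rule.
    """
    j = 1
    while j <= num_frames:
        i = p
        while i > 0:
            yield j, i
            i -= 1
        j += 1
```

With the assumed values, `samples_per_frame(10000, 125)` gives P = 80 samples per frame.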

Thereafter, the speech synthesizer 6 selectively enters, in a step S3, synthetic parameters Rj composed of one frame (whose frame number is "j") of phonetic parameters C0 through CN and prosodic parameters, which one frame corresponds to the order N indicated by the order information supplied from the input unit 1, from the phonetic and prosodic parameters which have been supplied from the synthetic parameter generator 5 and held therein.

Then, the speech synthesizer 6 generates one sample of speech waveform data, i.e., a sound source, using the phonetic parameter C0 and the prosodic parameters in a step S4. After the step S4, the speech synthesizer 6 effects filtering, i.e., digital filtering, on the generated sample of speech waveform data using the phonetic parameters C1 through C6 in a step S5.

Thereafter, the speech synthesizer 6 determines whether the order N indicated by the order information supplied from the input unit 1 is "6" or not in a step S6. If the order N is "6" then the speech synthesizer 6 outputs the filtered sample of speech waveform data in a step S10.

If the order N is not "6" in the step S6, then the speech synthesizer 6 effects filtering on the sample of speech waveform data generated in the step S5, using the phonetic parameters C7 through C10 in a step S7. The speech synthesizer 6 then determines whether the order N is "10" or not in a step S8.
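The staged filtering of steps S5 through S8 can be sketched as a cascade that stops as soon as the stage matching the order N has been applied. In this minimal sketch the per-stage filter is a stand-in (it merely adds the coefficients to the sample) and does not represent the actual digital filter; the function names are hypothetical. Only the orders 6 and 10 covered by the steps described so far are handled.

```python
def placeholder_stage(sample, coeffs):
    """Stand-in for one filter stage; NOT the actual digital filter."""
    return sample + sum(coeffs)


def filter_sample(sample, c, order_n, stage=placeholder_stage):
    """Apply the filter stages in sequence, stopping at the order N.

    c: list of phonetic parameters with c[0] = C0, c[1] = C1, and so on.
    """
    sample = stage(sample, c[1:7])    # step S5: filter with C1 through C6
    if order_n == 6:                  # step S6: order 6, output the sample
        return sample
    sample = stage(sample, c[7:11])   # step S7: filter with C7 through C10
    if order_n == 10:                 # step S8: order 10, output the sample
        return sample
    raise NotImplementedError("orders other than 6 and 10 are outside this sketch")
```

The point of the arrangement is that a lower order skips the later stages entirely, so the per-sample processing time shrinks with the order.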

If the order N is "10", then control jumps from the step S8 to the step S10. If the order N is other than "10", then the speech synthes