Claims
What is claimed is:
1. A method of synthesizing speech with a system having a programmed
central processing unit, comprising the steps of:
generating phonetic parameters from a series of phonetic symbols of an
input text;
generating prosodic parameters from prosodic information of the input text;
detecting an activity rate of the central processing unit;
determining a degree number of at least one particular phonetic parameter,
each particular phonetic parameter having a different degree number in
different contexts, depending on the detected activity rate of the central
processing unit; and
generating and filtering synthesized speech sounds based on the phonetic
and prosodic parameters including adapting the filtering according to the
determined degree number of the particular phonetic parameter.
2. A method according to claim 1, wherein said input text has frames,
accent phrases, pauses, sentences, and paragraphs, said activity rate of
the central processing unit being detected in every frame, every accent
phrase, every pause, every sentence, or every paragraph of the input text,
or once at the beginning of the input text.
3. A method of synthesizing speech with a system having a programmed
central processing unit, comprising the steps of:
generating phonetic parameters from a series of phonetic symbols of an
input text;
generating prosodic parameters from prosodic information of the input text;
detecting an activity rate of the central processing unit;
determining a degree number of at least one particular phonetic parameter,
each particular phonetic parameter having a different degree number in
different contexts, said degree number depending on said detected activity
rate;
determining a synthesis unit from a plurality of synthesis units according
to said particular phonetic parameter; and
generating and filtering synthesized speech sounds based on the phonetic
and prosodic parameters according to the determined synthesis unit.
4. A method according to claim 3, wherein said input text has frames,
accent phrases, pauses, sentences, and paragraphs, said activity rate of
the central processing unit being detected in every frame, every accent
phrase, every pause, every sentence, or every paragraph of the input text,
or once at the beginning of the input text.
5. A method of synthesizing speech with a system having a programmed
central processing unit, comprising the steps of:
generating phonetic parameters from a series of phonetic symbols of an
input text;
generating prosodic parameters from prosodic information of the input text;
detecting an activity rate of the central processing unit;
inputting information representative of a quality of synthesized speech
sounds to be generated depending on the activity rate of the central
processing unit;
determining a degree number of at least one particular phonetic parameter,
each particular phonetic parameter having a different degree number in
different contexts, said degree number being determined according to the
input information;
selecting a synthesis unit from among a plurality of synthesis units
according to said degree number of said particular phonetic parameter
during each one of a plurality of different periods of time; and
generating and filtering synthesized speech sounds based on the phonetic
and prosodic parameters employing said selected synthesis unit.
6. A method according to claim 5, wherein said input text has frames,
accent phrases, pauses, sentences, and paragraphs, said activity rate of
the central processing unit being detected in every frame, every accent
phrase, every pause, every sentence, or every paragraph of the input text,
or once at the beginning of the input text.
7. An apparatus for synthesizing speech with a system having a programmed
central processing unit, comprising:
means for generating phonetic parameters from a series of phonetic symbols
of an input text;
means for generating prosodic parameters from prosodic information of the
input text;
detector means for detecting an activity rate of the central processing
unit;
control means for determining a degree number of at least one particular
phonetic parameter, each particular phonetic parameter having a different
degree number in different contexts, said degree number being determined
depending on the detected activity rate of the central processing unit;
and
speech synthesizer means for generating and filtering synthesized speech
sounds based on the phonetic and prosodic parameters including adaptable
filtering means and means for adapting the adaptable filtering means
according to the determined degree number of the particular phonetic
parameter.
8. An apparatus according to claim 7, wherein said input text has frames,
accent phrases, pauses, sentences, and paragraphs, said detector means
comprising means for detecting the activity rate of the central processing
unit in every frame, every accent phrase, every pause, every sentence, or
every paragraph of the input text, or once at the beginning of the input
text.
9. An apparatus for synthesizing speech with a system having a programmed
central processing unit, comprising:
means for generating phonetic parameters from a series of phonetic symbols
of an input text;
means for generating prosodic parameters from prosodic information of the
input text;
a plurality of synthesis units for effecting filtering during synthesis of
speech sounds;
detector means for detecting an activity rate of the central processing
unit;
means for determining a degree number of at least one particular phonetic
parameter, each particular phonetic parameter having a different degree
number in different contexts, said particular degree number depending on
said detected activity rate;
selector means for selecting a respective one of said plurality of
synthesis units according to said degree number of said particular
phonetic parameter during each one of a plurality of different periods of
time, a plurality of phonetic parameters and a plurality of prosodic
parameters being generated during each one of said plurality of different
periods of time; and
means including the selected synthesis unit for applying all said phonetic
and prosodic parameters generated during each said one period of time to
the respective one synthesis unit which is selected by said selector means
to generate synthesized speech sounds.
10. An apparatus according to claim 9, wherein said input text has frames,
accent phrases, pauses, sentences, and paragraphs, said detector means
comprising means for detecting the activity rate of the central processing
unit in every frame, every accent phrase, every pause, every sentence, or
every paragraph of the input text, or once at the beginning of the input
text.
11. An apparatus for synthesizing speech with a system having a programmed
central processing unit, comprising:
means for generating phonetic parameters from a series of phonetic symbols
of an input text;
means for generating prosodic parameters from prosodic information of the
input text;
a plurality of synthesis units for effecting filtering during synthesis of
speech sounds;
detector means for detecting an activity rate of the central processing
unit;
means for determining a degree number of at least one particular phonetic
parameter, each particular phonetic parameter having a different degree
number in different contexts, said degree number depending on one of said
detected activity rate and a quality of synthesized speech sounds to be
generated;
selector means for selecting a respective one of said plurality of
synthesis units according to said degree number of said particular
phonetic parameter during each one of a plurality of different periods of
time, a plurality of phonetic parameters and a plurality of prosodic
parameters being generated during each one of said plurality of different
periods of time; and
means including the selected synthesis unit for applying all of said
phonetic and prosodic parameters generated during each said one period of
time to the respective one synthesis unit that is selected by said
selector means to generate synthesized speech sounds.
12. An apparatus according to claim 11, wherein said input text has frames,
accent phrases, pauses, sentences, and paragraphs, said detector means
comprising means for detecting the activity rate of the central processing
unit in every frame, every accent phrase, every pause, every sentence, or
every paragraph of the input text, or once at the beginning of the input
text.
13. An apparatus for synthesizing speech with a system having a programmed
central processing unit, comprising:
means for generating phonetic parameters from a series of phonetic symbols
of an input text;
means for generating prosodic parameters from prosodic information of the
input text;
input means for inputting information representative of a degree number of
phonetic parameters in a first mode;
detector means for detecting an activity rate of the central processing
unit in a second mode;
mode selector means for selecting one of said first and second modes;
control means for determining information representative of a degree number
of at least one particular phonetic parameter, each particular phonetic
parameter having a different degree number in different contexts,
depending on the detected activity rate of the central processing unit;
and
speech synthesizer means for generating and filtering synthesized speech
sounds based on the phonetic and prosodic parameters according to the
information input by said input means in said first mode selected by said
mode selector means and according to the information determined by said
control means in said second mode selected by said mode selector means.
14. An apparatus for synthesizing speech with a system having a programmed
central processing unit, comprising:
means for generating phonetic parameters from a series of phonetic symbols
of an input text;
means for generating prosodic parameters from prosodic information of the
input text;
input means for inputting information representative of a quality of
synthesized speech sounds to be generated in a first mode;
detector means for detecting an activity rate of the central processing
unit in a second mode;
mode selector means for selecting one of said first and second modes;
control means for determining information representative of a quality of
synthesized speech sounds to be generated depending on the detected
activity rate of the central processing unit;
control means for determining information representative of a degree number
of at least one particular phonetic parameter, each particular phonetic
parameter having a different degree number in different contexts,
depending on the detected activity rate of the central processing unit;
and
speech synthesizer means for generating and filtering synthesized speech
sounds based on the phonetic and prosodic parameters according to the
information input by said input means in said first mode selected by said
mode selector means and according to the information including said degree
number determined by said control means in said second mode selected by
said mode selector means.
15. An apparatus for synthesizing speech with a system having a programmed
central processing unit, comprising:
means for generating phonetic parameters from a series of phonetic symbols
of an input text;
means for generating prosodic parameters from prosodic information of the
input text;
a plurality of synthesis units for effecting filtering on synthesized
speech sounds for respective different periods of time;
input means for inputting information representative of one of said
synthesis units in a first mode;
detector means for detecting an activity rate of the central processing
unit in a second mode;
mode selector means for selecting one of said first and second modes;
means for generating a degree number of at least one particular phonetic
parameter, each particular phonetic parameter having a different degree
number in differing contexts, the degree number of said particular
phonetic parameter depending on the detected activity rate of the central
processing unit;
synthesis unit selector means for selecting said one of the synthesis units
which is represented by the information input by said input means in said
first mode selected by said mode selector means and for selecting one of
the synthesis units depending on the degree number of said particular
phonetic parameter in said second mode selected by said mode selector
means; and
speech synthesizer means for generating and filtering synthesized speech
sounds based on the phonetic and prosodic parameters with said selected
synthesis unit.
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention:
The present invention relates to a method of and an apparatus for
generating synthesized speech from either a sequence of character codes or
a series of phonetic symbols and prosodic information associated
therewith.
2. Description of the Prior Art:
Recently, there have been developed various speech synthesizers for
analyzing Japanese sentences composed of a mixture of Kanji (Chinese)
characters and Kana (Japanese syllabary) characters and generating
synthesized speech from phonetic and prosodic information represented by
the analyzed sentences according to the synthesis-by-rule process. Such
speech synthesis systems are finding wide use in telephone information
services in the banking business, newspaper revising systems, document
readers, and other apparatus employing synthesized speech.
Basically, the speech synthesizer based on the synthesis-by-rule process
operates as follows: The speech synthesizer has a speech segment file
which stores phonetic information that has been obtained by the LSP (line
spectrum pair) analysis or the cepstrum analysis from each unit of human
speech which may be of a syllable structure CV (consonant-vowel), a
syllable structure CVC (consonant-vowel-consonant), a syllable structure
VCV (vowel-consonant-vowel), or a syllable structure VC (vowel-consonant).
When a text is inputted to the speech synthesizer, the speech synthesizer
analyzes the text, produces phonetic and prosodic parameters for the text
by referring to the speech segment file, and generates and filters sound
sources based on the phonetic and prosodic parameters for generating
synthesized speech of the text.
It has heretofore been customary to construct the speech synthesizer of
dedicated hardware components that are required for real-time data
processing. There are primarily two system designs available for the
dedicated-hardware speech synthesizer. According to one system, a host
computer such as a personal computer converts a sentence of Kanji and Kana
characters into phonetic and prosodic information, and a dedicated
hardware device generates phonetic and prosodic parameters based on the
converted phonetic and prosodic information, generates and filters sound
sources, and converts the filtered sound sources into an analog speech
signal for generating synthesized speech. According to the other system,
all the above processing steps are executed by a dedicated hardware
device. Usually, the dedicated hardware device of each of the above
systems comprises an LSI circuit called a DSP (digital signal processor)
which is capable of high-speed logic operations including ANDing and
ORing, and a general-purpose MPU (microprocessor unit).
Recent years have seen another system approach to software-implementation
of the above processing on a real-time basis. The software-implemented
system has been made possible by a personal computer or an engineering
work station having a high processing capability combined with a D/A
converter, an analog output device, and a loudspeaker.
The software-implemented system is free of problems with respect to speech
synthesis while it is processing relatively few tasks. However, when
many tasks must be processed simultaneously by the system, the
system may not be able to generate real-time synthesized speech. If the
system fails to generate real-time synthesized speech, then unvoiced
intervals are inserted in synthesized words, making it difficult for the
user to hear the synthesized words clearly. Specifically, a certain
constant period of time is needed for the CPU (central processing unit) of
the system to carry out the process of speech synthesis. Therefore,
insofar as the CPU of the system operates to process a relatively small
number of tasks, it can produce synthesized speech on a real-time basis.
However, when the CPU of the system is required to process an increased
number of tasks, the CPU requires a longer execution time to process those
tasks, possibly failing to generate real-time synthesized speech.
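The real-time condition described above can be illustrated with a short sketch. The frame duration and the per-frame computation times are hypothetical numbers chosen only for illustration and do not come from this specification. The sketch assumes that each frame is synthesized in turn and played as soon as it is ready; whenever a frame is not ready by the time the previously queued audio runs out, a silent gap is inserted.

```python
def underrun_gaps(frame_ms, compute_ms_per_frame):
    """Return the silent gap (in ms) inserted before each frame.

    frame_ms: playback duration of one synthesized frame (assumed constant).
    compute_ms_per_frame: hypothetical wall-clock time needed to synthesize
    each frame; it grows when the CPU is shared with other tasks.
    """
    gaps = []
    ready_at = 0.0      # time at which the current frame finishes computing
    playback_end = 0.0  # time at which the already queued audio runs out
    for index, compute_ms in enumerate(compute_ms_per_frame):
        ready_at += compute_ms
        if index == 0:
            gap = 0.0  # playback simply starts when the first frame is ready
        else:
            gap = max(0.0, ready_at - playback_end)
        gaps.append(gap)
        playback_end = max(playback_end, ready_at) + frame_ms
    return gaps


# Lightly loaded CPU: each 10 ms frame takes 5 ms to compute, so no gaps.
light = underrun_gaps(10, [5, 5, 5])
# Heavily loaded CPU: later frames take 15 ms each, so gaps appear.
heavy = underrun_gaps(10, [5, 15, 15])
```

In this model a gap appears as soon as the computation time per frame exceeds the playback duration of the frame, which corresponds to the unvoiced intervals mentioned above.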
The present speech synthesizer that operates according to the
synthesis-by-rule process can produce synthesized speech in different
patterns that reflect such differences as sex, age, pronunciation rate,
pitch, and stress. The user of the speech synthesizer can select any one
of the different speech patterns according to his preference. However, the
user cannot change the quality of the synthesized speech.
Most speech synthesizers that are available today generate crisp
synthesized speech sounds that can be heard clearly. If the user of the
speech synthesizer hears such crisp synthesized speech sounds for the
first time, then the user will find them acceptable as they are sharp and
clear. However, if the user who has become accustomed to synthesized
speech hears crisp synthesized speech sounds for a continued period of
time, then the user finds them physically and mentally fatiguing. Since
the quality of synthesized speech, i.e., the quality of being crisp,
cannot be changed, the conventional speech synthesizer does not lend
itself to continuous usage for a long period of time.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a method of
and an apparatus for generating synthesized speech while allowing a period
of time required for speech synthesis and the quality of synthesized
speech to be varied by varying the order of filtering for speech
synthesis.
Another object of the present invention is to provide a method of and an
apparatus for generating synthesized speech while allowing a period of
time required for speech synthesis and the quality of synthesized speech
to be varied by varying the arrangement of a synthesis unit used for
filtering for speech synthesis.
Still another object of the present invention is to provide a method of and
an apparatus for generating high-quality synthesized speech on a real-time
basis by varying the order of filtering for speech synthesis or the
arrangement of a synthesis unit depending on the activity ratio of a
central processing unit that is programmed for speech synthesis.
According to the present invention, there is provided a method of
synthesizing speech, comprising the steps of generating phonetic
parameters from a series of phonetic symbols of an input text to be
converted into synthesized speech, generating prosodic parameters from
prosodic information of the input text, supplying information
representative of the order of phonetic parameters, and generating and
filtering synthesized speech sounds based on the phonetic and prosodic
parameters according to the supplied information.
According to the present invention, there is also provided a method of
synthesizing speech, comprising the steps of generating phonetic
parameters from a series of phonetic symbols of an input text to be
converted into synthesized speech, generating prosodic parameters from
prosodic information of the input text, supplying information
representative of the quality of synthesized speech sounds to be
generated, and generating and filtering synthesized speech sounds based on
the phonetic and prosodic parameters according to the supplied
information.
According to the present invention, there is also provided a method of
synthesizing speech, comprising the steps of generating phonetic
parameters from a series of phonetic symbols of an input text to be
converted into synthesized speech, generating prosodic parameters from
prosodic information of the input text, supplying information
representative of the arrangement of a synthesis unit to be used, and
generating and filtering synthesized speech sounds based on the phonetic
and prosodic parameters with a synthesis unit which is arranged according
to the supplied information.
According to the present invention, there is also provided a method of
synthesizing speech with a system having a programmed central processing
unit, comprising the steps of generating phonetic parameters from a series
of phonetic symbols of an input text to be converted into synthesized
speech, generating prosodic parameters from prosodic information of the
input text, determining the activity ratio of the central processing unit,
determining the order of phonetic parameters depending on the determined
activity ratio of the central processing unit, and generating and
filtering synthesized speech sounds based on the phonetic and prosodic
parameters according to the determined order of phonetic parameters.
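The step of determining the order from the activity ratio may be pictured as a table lookup. The following is a minimal sketch only: the thresholds, the orders, and the function name `order_for_activity` are assumptions made for illustration and are not values taken from this specification.

```python
# Hypothetical lookup table: (upper bound of activity ratio, filter order).
# A busier CPU is given a lower order so that less time is spent filtering.
DEFAULT_TABLE = ((0.30, 12), (0.60, 10), (0.85, 8))


def order_for_activity(activity_ratio, table=DEFAULT_TABLE, floor_order=6):
    """Map a CPU activity ratio in [0, 1] to an order of phonetic parameters."""
    if not 0.0 <= activity_ratio <= 1.0:
        raise ValueError("activity_ratio must lie between 0 and 1")
    for upper_bound, order in table:
        if activity_ratio <= upper_bound:
            return order
    # Above the last bound, fall back to the lowest order in this sketch.
    return floor_order
```

With such a table, the order decreases monotonically as the activity ratio rises, trading speech quality for a shorter synthesis time.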
According to the present invention, there is also provided a method of
synthesizing speech with a system having a programmed central processing
unit, comprising the steps of generating phonetic parameters from a series
of phonetic symbols of an input text to be converted into synthesized
speech, generating prosodic parameters from prosodic information of the
input text, determining the activity ratio of the central processing unit,
determining the arrangement of a synthesis unit to be used depending on
the activity ratio of the central processing unit, and generating and
filtering synthesized speech sounds based on the phonetic and prosodic
parameters according to the determined arrangement of a synthesis unit to
be used.
According to the present invention, there is also provided a method of
synthesizing speech with a system having a programmed central processing
unit, comprising the steps of generating phonetic parameters from a series
of phonetic symbols of an input text to be converted into synthesized
speech, generating prosodic parameters from prosodic information of the
input text, determining the activity ratio of the central processing unit,
supplying information representative of the quality of synthesized speech
sounds to be generated depending on the activity ratio of the central
processing unit, and generating and filtering synthesized speech sounds
based on the phonetic and prosodic parameters according to the supplied
information.
According to the present invention, there is also provided an apparatus for
synthesizing speech, comprising means for generating phonetic parameters
from a series of phonetic symbols of an input text to be converted into
synthesized speech, means for generating prosodic parameters from prosodic
information of the input text, means for supplying information
representative of the order of phonetic parameters, and means for generating
and filtering synthesized speech sounds based on the phonetic and prosodic
parameters according to the supplied information.
According to the present invention, there is also provided an apparatus for
synthesizing speech, comprising means for generating phonetic parameters
from a series of phonetic symbols of an input text to be converted into
synthesized speech, means for generating prosodic parameters from prosodic
information of the input text, means for supplying information
representative of the quality of synthesized speech sounds to be
generated, and means for generating and filtering synthesized speech
sounds based on the phonetic and prosodic parameters according to the
supplied information.
According to the present invention, there is also provided an apparatus for
synthesizing speech, comprising means for generating phonetic parameters
from a series of phonetic symbols of an input text to be converted into
synthesized speech, means for generating prosodic parameters from prosodic
information of the input text, a plurality of synthesis units for
effecting filtering on synthesized speech sounds for respective different
periods of time, input means for supplying information representative of
one of said synthesis units, selector means for selecting one of said
synthesis units according to the information supplied by said input means,
and speech synthesizer means for generating and filtering synthesized
speech sounds based on the phonetic and prosodic parameters with said one
of the synthesis units which is selected by said selector means.
According to the present invention, there is also provided an apparatus for
synthesizing speech with a system having a programmed central processing
unit, comprising means for generating phonetic parameters from a series of
phonetic symbols of an input text to be converted into synthesized speech,
means for generating prosodic parameters from prosodic information of the
input text, extractor means for determining the activity ratio of the
central processing unit, control means for determining the order of
phonetic parameters depending on the determined activity ratio of the
central processing unit, and speech synthesizer means for generating and
filtering synthesized speech sounds based on the phonetic and prosodic
parameters according to the determined order of phonetic parameters.
According to the present invention, there is also provided an apparatus for
synthesizing speech with a system having a programmed central processing
unit, comprising means for generating phonetic parameters from a series of
phonetic symbols of an input text to be converted into synthesized speech,
means for generating prosodic parameters from prosodic information of the
input text, a plurality of synthesis units for effecting filtering on
synthesized speech sounds for respective different periods of time,
extractor means for determining the activity ratio of the central
processing unit, control means for determining the arrangement of a
synthesis unit according to the determined activity ratio of the central
processing unit, selector means for selecting one of said synthesis units
which has the determined arrangement, and speech synthesizer means for
generating and filtering synthesized speech sounds based on the phonetic
and prosodic parameters with said one of the synthesis units which is
selected by said selector means.
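The selection of one synthesis unit out of a plurality may be sketched as follows. The unit names, the relative costs, and the headroom rule are all hypothetical and serve only to illustrate one possible selection policy; the specification does not state this rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisUnit:
    name: str
    relative_cost: float  # hypothetical share of CPU time needed, 0 to 1


def select_unit(units, activity_ratio):
    """Pick the costliest unit that fits in the CPU headroom.

    If no unit fits, fall back to the cheapest one so that speech is
    still produced, at reduced quality.
    """
    headroom = 1.0 - activity_ratio
    affordable = [u for u in units if u.relative_cost <= headroom]
    if affordable:
        return max(affordable, key=lambda u: u.relative_cost)
    return min(units, key=lambda u: u.relative_cost)


# Three hypothetical arrangements of the synthesis unit.
UNITS = [
    SynthesisUnit("high", 0.60),
    SynthesisUnit("medium", 0.35),
    SynthesisUnit("low", 0.15),
]
```

Here a lightly loaded CPU leads to the "high" unit, while a nearly saturated CPU leads to the "low" unit.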
According to the present invention, there is also provided an apparatus for
synthesizing speech with a system having a programmed central processing
unit, comprising means for generating phonetic parameters from a series of
phonetic symbols of an input text to be converted into synthesized speech,
means for generating prosodic parameters from prosodic information of the
input text, a plurality of synthesis units for effecting filtering on
synthesized speech sounds for respective different periods of time,
extractor means for determining the activity ratio of the central
processing unit, input means for supplying information representative of
the quality of synthesized speech sounds to be generated according to the
determined activity ratio of the central processing unit, and speech
synthesizer means for generating and filtering synthesized speech sounds
based on the phonetic and prosodic parameters according to the supplied
information.
According to the present invention, there is also provided an apparatus for
synthesizing speech with a system having a programmed central processing
unit, comprising means for generating phonetic parameters from a series of
phonetic symbols of an input text to be converted into synthesized speech,
means for generating prosodic parameters from prosodic information of the
input text, input means for supplying information representative of the
order of phonetic parameters in a first mode, extractor means for
determining the activity ratio of the central processing unit in a second
mode, mode selector means for selecting one of said first and second
modes, control means for determining information representative of the
order of phonetic parameters depending on the determined activity ratio of
the central processing unit, and speech synthesizer means for generating
and filtering synthesized speech sounds based on the phonetic and prosodic
parameters according to the information supplied by said input means in
said first mode and according to the information determined by said
control means in said second mode.
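The two modes may be pictured as a simple switch between a user-supplied order and an order derived from the activity ratio. The sketch below is illustrative only; the names `Mode`, `resolve_order`, and `order_from_activity`, and the threshold inside the latter, are assumptions and do not appear in this specification.

```python
from enum import Enum


class Mode(Enum):
    MANUAL = 1     # first mode: order supplied through the input means
    AUTOMATIC = 2  # second mode: order derived from the CPU activity ratio


def order_from_activity(activity_ratio):
    """Hypothetical control policy: use a lower order when the CPU is busy."""
    return 12 if activity_ratio < 0.5 else 8


def resolve_order(mode, user_order, activity_ratio):
    """Return the order handed to the speech synthesizer in the given mode."""
    if mode is Mode.MANUAL:
        return user_order
    return order_from_activity(activity_ratio)
```

In the first mode the activity ratio is ignored, and in the second mode the user-supplied value is ignored.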
According to the present invention, there is also provided an apparatus for
synthesizing speech with a system having a programmed central processing
unit, comprising means for generating phonetic parameters from a series of
phonetic symbols of an input text to be converted into synthesized speech,
means for generating prosodic parameters from prosodic information of the
input text, input means for supplying information representative of the
quality of synthesized speech sounds to be generated in a first mode,
extractor means for determining the activity ratio of the central
processing unit in a second mode, mode selector means for selecting one of
said first and second modes, control means for determining information
representative of the quality of synthesized speech sounds to be generated
depending on the determined activity ratio of the central processing unit,
and speech synthesizer means for generating and filtering synthesized
speech sounds based on the phonetic and prosodic parameters according to
the information supplied by said input means in said first mode and
according to the information determined by said control means in said
second mode.
According to the present invention, there is also provided an apparatus for
synthesizing speech with a system having a programmed central processing
unit, comprising means for generating phonetic parameters from a series of
phonetic symbols of an input text to be converted into synthesized speech,
means for generating prosodic parameters from prosodic information of the
input text, a plurality of synthesis units for effecting filtering on
synthesized speech sounds for respective different periods of time, input
means for supplying information representative of one of said synthesis
units in a first mode, extractor means for determining the activity ratio
of the central processing unit in a second mode, mode selector means for
selecting one of said first and second modes, control means for
determining information representative of one of said synthesis units
depending on the determined activity ratio of the central processing unit,
selector means for selecting one of the synthesis units which is
represented by the information supplied by said input means in said first
mode and one of the synthesis units which is represented by the
information determined by said control means in said second mode, and
speech synthesizer means for generating and filtering synthesized speech
sounds based on the phonetic and prosodic parameters with said selected
one of the synthesis units.
The above and other objects, features, and advantages of the present
invention will become apparent from the following description when taken
in conjunction with the accompanying drawings which illustrate preferred
embodiments of the present invention by way of example.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a speech synthesizing apparatus according to a
first embodiment of the present invention;
FIG. 2 is a flowchart of a processing sequence of a speech synthesizer of
the speech synthesizing apparatus shown in FIG. 1;
FIG. 3 is a block diagram of a speech synthesizing apparatus according to a
second embodiment of the present invention;
FIG. 4 is a flowchart of a processing sequence of a speech synthesizer of
the speech synthesizing apparatus shown in FIG. 3;
FIG. 5 is a flowchart of a subroutine (A) in the processing sequence shown
in FIG. 4;
FIGS. 6A and 6B are diagrams showing examples of information stored in a
rate information file of the speech synthesizing apparatus shown in FIG.
3;
FIGS. 7A through 7G are diagrams showing a specific example of speech
synthesis as well as an input text during operation of the speech
synthesizing apparatus shown in FIG. 3;
FIG. 8 is a block diagram of a filter arrangement which may be employed in
the speech synthesizer of the speech synthesizing apparatus shown in FIG.
3;
FIG. 9 is a diagram showing examples of information stored in the rate
information file which are necessary to vary the processing time with
filter switching in the speech synthesizing apparatus shown in FIG. 3;
FIG. 10 is a flowchart of a processing sequence of a speech synthesizer of
the speech synthesizing apparatus shown in FIGS. 3 and 8; and
FIG. 11 is a flowchart of a subroutine (B) in the processing sequence shown
in FIG. 10.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
In the description that follows, reference is made to a certain Japanese
text which is given as an example in conversion from text to speech. For
an easier understanding, the Japanese sentences are fully transliterated
and their meaning is fully given in English. It should be understood that
the example in Japanese is employed only for a better description of the
present invention and that the principles of the present invention are not
limited to the Japanese language, but are also applicable to other languages
including English.
1ST EMBODIMENT:
As shown in FIG. 1, a speech synthesizing apparatus according to a first
embodiment of the present invention includes an input unit 1 for entering
a series of character codes representing a mixture of Kanji and Kana
characters to be converted into synthesized speech and control information
for controlling the synthesized speech. The control information comprises
order information to select the order N of synthetic parameters to be
supplied to a filter in a speech synthesizer 6 (described later).
The speech synthesizing apparatus also has a word dictionary 2 storing
registered accent types, pronunciations, and parts of speech of words and
phrases to be converted into speech, and a linguistic processor 3 for
analyzing a series of character codes entered from the input unit 1 with
the information stored in the word dictionary 2 and generating a series of
phonetic symbols and prosodic information associated therewith.
The speech synthesizing apparatus further includes a speech segment file 4
which stores a group of cepstral parameters that have been determined by
analyzing units of input speech and information indicative of the orders
of the cepstral parameters, and a synthetic parameter generator 5 for
generating phonetic parameters, i.e., phonetic cepstral parameters,
according to the series of phonetic symbols generated by the linguistic
processor 3 and the order information from the input unit 1. The synthetic
parameter generator 5 also serves to generate prosodic parameters
according to the prosodic information generated by the linguistic
processor 3.
The speech synthesizing apparatus also has a speech synthesizer 6 for
generating a sound source based on the phonetic parameters generated by
the synthetic parameter generator 5, the order information, and the
prosodic parameters generated by the synthetic parameter generator 5, and
filtering the generated sound source with an Nth-order filter to generate
synthesized speech, and a loudspeaker 7 for outputting the generated
synthesized speech. The speech synthesizer 6 includes a D/A converter (not
shown) for converting the synthesized speech into an analog signal.
The speech synthesizing apparatus shown in FIG. 1 is realized by a personal
computer (PC) or an engineering work station (EWS) which is capable of
executing multiple tasks at the same time. The input unit 1, the
linguistic processor 3, the synthetic parameter generator 5, and the
speech synthesizer 6 are functional blocks whose functions are performed
by a programmed sequence of a CPU of the personal computer or the
engineering work station, i.e., by the execution of a speech synthesis
task.
The speech synthesizing apparatus shown in FIG. 1 operates as follows:
A series of character codes representing a sentence of mixed Kanji and Kana
characters to be converted into synthesized speech, and order information
indicative of an order N are entered into the speech synthesizing
apparatus through the input unit 1. The linguistic processor 3 compares
the entered series of character codes with the word dictionary 2 to
determine accent types, pronunciations, and parts of speech of words and
phrases represented by the series of character codes, determines accent
types and boundaries according to the parts of speech, and converts the
sentence of mixed Kanji and Kana characters into a pronunciation format,
thereby generating a series of phonetic symbols and prosodic
information.
The series of phonetic symbols and prosodic information generated by the
linguistic processor 3 are then supplied to the synthetic parameter
generator 5, which is also supplied with the order information from the
input unit 1.
The synthetic parameter generator 5 extracts phonetic cepstral parameters
corresponding to the series of phonetic symbols from the speech segment
file 4 with respect to the order N represented by the order information
from the input unit 1, thereby generating phonetic parameters. At the
same time, the synthetic parameter generator 5 generates prosodic
parameters according to the prosodic information.
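As a rough illustration of the extraction step, the sketch below keeps only the parameters C0 to CN of each stored frame. The in-memory dictionary `SEGMENT_FILE`, the helper `make_segment`, the stored order of 20, and the numeric values are all hypothetical stand-ins for the speech segment file 4, whose actual layout is not given here.

```python
def make_segment(stored_order, n_frames):
    """Build a made-up segment: n_frames frames of C0..C[stored_order]."""
    frames = [
        [round(1.0 / (k + 1 + f), 3) for k in range(stored_order + 1)]
        for f in range(n_frames)
    ]
    return {"order": stored_order, "frames": frames}


# Hypothetical stand-in for the speech segment file 4, keyed by phonetic symbol.
SEGMENT_FILE = {"a": make_segment(20, 2), "k": make_segment(20, 1)}


def phonetic_parameters(symbols, order_n):
    """Return one list [C0, ..., CN] per frame for the given phonetic symbols."""
    result = []
    for symbol in symbols:
        segment = SEGMENT_FILE[symbol]
        if order_n > segment["order"]:
            raise ValueError("requested order exceeds the stored order")
        for frame in segment["frames"]:
            result.append(frame[: order_n + 1])  # keep C0 through CN only
    return result
```

Calling `phonetic_parameters(["k", "a"], 6)` on this made-up data yields three frames of seven values each, that is, C0 to C6 per frame.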
The synthetic parameter generator 5 supplies the phonetic parameters and
the prosodic parameters to the speech synthesizer 6, which temporarily
holds the supplied phonetic and prosodic parameters together with the
order information supplied from the input unit 1. Then, based on the
phonetic and prosodic parameters and the order information, the speech
synthesizer 6 generates a sound source and effects digital filtering on
the sound source to generate synthesized speech representing the entered
series of character codes. The generated synthesized speech is converted
by the D/A converter into an analog speech signal, which is applied to the
loudspeaker 7. The loudspeaker 7 now produces synthesized speech sounds
corresponding to the entered sentence of mixed Kanji and Kana characters.
The processing sequence of the speech synthesizer 6 will be described in
detail below with reference to FIG. 2.
The speech synthesizer 6 sets a counter variable j indicating a frame
number to an initial value of "1" in a step S1 and also sets a counter
variable i indicating the remaining number of samples to be processed per
frame to an initial value of "P" (= frame period / sampling period) in a step
S2. The sampling period is the same as the period of a clock signal
supplied to the D/A converter (not shown).
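The initial value "P" can be worked through with example numbers. The frame period of 10 ms and the sampling period of 0.1 ms used below are assumptions chosen only for the arithmetic, and the function name `samples_per_frame` is hypothetical; integer microseconds are used to avoid rounding effects.

```python
def samples_per_frame(frame_period_us, sampling_period_us):
    """Return P = frame period / sampling period, both given in microseconds."""
    if sampling_period_us <= 0:
        raise ValueError("sampling period must be positive")
    if frame_period_us % sampling_period_us != 0:
        raise ValueError("frame period must be a whole number of samples")
    return frame_period_us // sampling_period_us


# Assumed example: 10 ms frames sampled every 0.1 ms (a 10 kHz D/A clock).
P = samples_per_frame(10_000, 100)
```

With these assumed values the counter variable i would start at 100 for every frame.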
Thereafter, in a step S3, the speech synthesizer 6 selects synthetic
parameters Rj for one frame (whose frame number is "j") from the phonetic
and prosodic parameters which have been supplied from the synthetic
parameter generator 5 and held therein. The synthetic parameters Rj are
composed of the phonetic parameters C0 to CN, which correspond to the order
N indicated by the order information supplied from the input unit 1, and
the prosodic parameters.
Then, the speech synthesizer 6 generates one sample of speech waveform
data, i.e., a sound source, using the phonetic parameter C0 and the
prosodic parameters in a step S4. After the step S4, the speech
synthesizer 6 effects filtering, i.e., digital filtering, on the generated
sample of speech waveform data using the phonetic parameters C1 to C6
in a step S5.
Thereafter, the speech synthesizer 6 determines whether the order N
indicated by the order information supplied from the input unit 1 is "6"
or not in a step S6. If the order N is "6" then the speech synthesizer 6
outputs the filtered sample of speech waveform data in a step S10.
If the order N is not "6" in the step S6, then the speech synthesizer 6
effects further filtering on the sample of speech waveform data filtered in
the step S5, using the phonetic parameters C7 to C10, in a step S7. The
speech synthesizer 6 then determines whether the order N is "10" or not in
a step S8.
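The control flow of the steps S5 through S8 can be sketched as a chain of filter stages with an early exit. This is a sketch under stated assumptions: the class `Stage` is a hypothetical placeholder (a simple delay-line sum), not the cepstral synthesis filter of the speech synthesizer 6, and only the orders 6 and 10 are handled.

```python
class Stage:
    """Placeholder filter stage that uses the coefficients C[lo] to C[hi]."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi
        self.delay = [0.0] * (hi - lo + 1)  # past inputs of this stage

    def process(self, x, c):
        # Stand-in filter: y[n] = x[n] + sum_k C[lo+k] * x[n-1-k]
        y = x + sum(c[self.lo + k] * d for k, d in enumerate(self.delay))
        self.delay = [x] + self.delay[:-1]
        return y


def filter_sample(x, c, order_n, stages):
    """Filter one sample, skipping the second stage when the order N is 6."""
    if order_n not in (6, 10):
        raise ValueError("this sketch handles only the orders 6 and 10")
    y = stages[0].process(x, c)   # step S5: parameters C1 to C6
    if order_n == 6:              # step S6
        return y                  # go straight to the output step S10
    return stages[1].process(y, c)  # step S7: C7 to C10, then S8 leads to S10
```

The point of the arrangement is that a lower order N skips the later stage entirely, so fewer arithmetic operations are spent per sample.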
If the order N is "10", then control jumps from the step S8 to the step
S10. If the order N is other than "10", then the speech synthes