Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a voice recognition apparatus which can
extract feature variables from an input voice independently of speakers
and languages, and which can absorb fluctuations dependent on speakers and
effectively reduce an amount of calculations in matching for voice
recognition.
2. Description of the Related Art
Voice recognition apparatus are generally divided into two systems. One
system is a word voice recognition system in which word voices are
recognized through matching by the use of reference patterns composed of
words as units. The other system is a phoneme recognition system in which
word phonemes are recognized through matching by the use of standard
patterns composed of phonemes or syllables, smaller than words, as units.
The word voice recognition system is free from false recognition due to
articulatory coupling and can provide a high recognition rate. However, it
has the problem that the number of reference patterns grows with the size
of the vocabulary, which requires a large memory capacity and a large
amount of calculation in matching. In particular, when recognizing many
unspecified speakers, a plurality of reference patterns (multi-templates)
is needed for each word, because voices fluctuate greatly from speaker to
speaker. Such voice fluctuations are attributable to various factors.
Since speakers have their own physiological factors such as sex, age, and
length of the vocal tract, voices fluctuate as speakers change. Even for a
single speaker, voice fluctuations arise if the speaker phonates in a
different manner (loudness, speaking rate, etc.) depending on
circumstances, or if the surrounding noise varies.
The problem arising from an enlarged vocabulary has therefore been dealt
with as follows. In order to reduce the number of reference patterns used
in matching, a preliminary selection of the reference patterns is
performed before the principal matching, based on the intermediate result
of DP matching among the reference patterns, durations, global features
and local features of the input voices.
However, no approach has yet been found that completely eliminates voice
fluctuations due to a change of speakers.
Applicants know that, among the fluctuations depending on speakers, sound
source characteristics can be compensated to some extent by passing voices
through first- to third-order adaptive inverse filters of the critical
damping type. It has also been attempted to normalize differences between
individual speakers by subjecting the voice signal to a simple conversion
using the first through third formants.
In the case of recognizing an input voice signal by a voice recognition
apparatus of the phoneme recognition system, the input voice signal is
frequency-analyzed by a feature extracting device to extract several
feature variables of phonemes relating to the recognized object in
advance. These plural feature variables of phoneme are stored in a storage
section as reference patterns for the respective phonemes. Then, each
word is expressed by a series of such phoneme reference patterns, and the
resulting series of phoneme reference patterns is stored word by word in a
storage device, in association with the phoneme series of the word, to
form a word dictionary. When an unknown voice is input, the aforesaid
feature extracting device extracts feature variables from the input voice
for each frame in the same manner as described above. The similarity
between the feature variables extracted for each frame of the unknown
voice and the phoneme reference patterns stored in the storage section is
then checked. As a result, the phoneme
corresponding to the phoneme reference pattern with the maximum similarity
is determined as a phoneme of that frame. Likewise, phonemes of subsequent
frames are determined successively to express the unknown voice as a
series of phonemes. Afterward, the similarity is checked between the
phoneme series obtained from the unknown voice and the series of phoneme
reference patterns for the respective words in the word dictionary stored
in the storage section. As a result, the word corresponding to the
series of phoneme reference patterns with the maximum similarity is
determined as a word of the input voice.
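The frame-by-frame phoneme decision described above can be sketched as follows. This is only an illustration under stated assumptions, not the method of any particular apparatus: the similarity measure (negative squared Euclidean distance), the two-dimensional feature values, and the names `similarity` and `label_frames` are all hypothetical.

```python
def similarity(a, b):
    """Assumed similarity measure: negative squared Euclidean distance."""
    return -sum((p - q) ** 2 for p, q in zip(a, b))


def label_frames(frames, reference_patterns):
    """Assign each frame the phoneme whose reference pattern is most similar."""
    labels = []
    for frame in frames:
        best = max(reference_patterns,
                   key=lambda ph: similarity(frame, reference_patterns[ph]))
        labels.append(best)
    return labels


# Hypothetical two-dimensional reference patterns for three phonemes.
reference_patterns = {"a": (0.8, 1.2), "i": (0.3, 2.3), "u": (0.3, 1.3)}
frames = [(0.75, 1.25), (0.35, 2.2), (0.3, 1.4)]
phoneme_series = label_frames(frames, reference_patterns)
```

The resulting phoneme series would then be compared with the phoneme series of each word in the word dictionary in the same maximum-similarity fashion.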
In acoustic analysis and feature extraction, a voice can be expressed with
fewer parameters through linear prediction analysis (LPC) by assuming the
voice to follow an all-pole model. There has been proposed an attempt to
use such a model approach to directly express the structure of the
articulatory organs and their motional characteristics, thereby
effectively describing the cross-sectional area function of the vocal
tract with the aid of a model. This is called an articulatory model using
an articulatory parameter x (Shirai and Honda: "Estimation of Articulatory
Parameters from Speech Waves", Trans. IECE Japan, 61-A, 5, pp. 409-416,
1978). The articulatory parameter x is composed of an opening/closing
angle of the lower jaw: X(J), an antero-posterior (longitudinal)
deformation of the tongue surface: X(T1), a vertical deformation of the
tongue: X(T2), an opening area/extension of the lip: X(L), a shape of the
glottis: X(G), and an opening of the velum (degree of nasalization): X(N).
Thus, the articulatory parameter can be expressed by: x=[X(T1), X(T2),
X(J), X(L), X(G), X(N)]. Assuming that a non-linear articulatory model for
converting the articulatory parameter x to an acoustic parameter is given,
the articulatory parameter x can be derived in the reverse direction from
the acoustic parameter by solving a non-linear optimization problem.
While the number of parameter dimensions is normally 12-20 in the
aforementioned LPC, the number of dimensions of the articulatory
parameter x is 6. This means that, when the articulatory parameter x is
used, the information is compressed to half or less of that of the LPC
parameters.
Meanwhile, a narrow degree C at a point of articulation in the vocal tract
is difficult to express with high accuracy using the articulatory
parameter x, but it is closely related to the type of articulation, such
as vowel, fricative and closure. For this reason, the narrow degree is
extracted separately from the articulatory parameter x and the coordinates
(x, y) of a narrowed position so that it can be used for voice recognition
and the like. Further, both the narrow degree C and the vector (x, y)
of the narrowed position can be calculated simply from the acoustic
parameter by using a neural network, which avoids the non-linear
optimization problem associated with the articulatory parameter x.
However, the above-mentioned conventional voice recognition apparatus has
problems as follows. In the method based on the phoneme reference
patterns, the feature variable of the phoneme extracted by the feature
extracting device may be different depending on not only a physiological
difference (e.g., a length difference in the vocal tract) between
individual speakers but also an influence of articulatory coupling in the
successive phonemic environment in the case of vowel(s) in a word, even if
the voice is produced to express a phoneme symbol of the same
representation. Stated otherwise, if voice recognition is performed using
the feature variable of a phoneme, even a voice produced to express the
same phoneme symbol may be determined as a different phoneme, and is thus
rejected or incorrectly recognized. Accordingly, a high recognition rate
cannot be obtained. This problem is attributable to the fact that voice
recognition is performed using feature variables of phonemes which may
fluctuate depending on speakers and phonemic environment.
Speaker independent word recognition has a problem, as mentioned above,
that an amount of calculations necessary for matching between the feature
patterns of an input voice and the reference patterns is increased.
Further, the method of predicting an articulatory parameter from an
acoustic parameter using an articulatory model is also problematic in that
a non-linear optimization problem must be solved, which is
disadvantageous in the amount of calculation and the stability of
convergence. To avoid this problem, several methods have been attempted,
such as taking into account the fluctuation range and continuity of the
parameter, using a table lookup, etc. However, the amount of calculation
remains essentially large. Another problem is that the articulatory
parameter x is directed to specified speakers, and prediction succeeds
only in steady vowel portions.
Various methods using formant frequencies have also been proposed for
adaptation to many unspecified speakers, but no decisive method has been
found.
SUMMARY OF THE INVENTION
An object of this invention is to provide a voice recognition apparatus
which can extract a position of articulation (i.e., a narrowed position
formed in a vocal tract) as a feature variable specific to phonation of a
voice, which position is not dependent on speakers and languages.
Another object of this invention is to provide a voice recognition
apparatus in which when making speaker independent voice recognition,
feature variables of smaller dimensions capable of removing voice
fluctuations caused by and dependent on a physiological difference between
individual speakers are set by further developing feature variables of the
narrow degree C and the coordinates (x, y) of a point of articulation,
thereby to reduce the number of reference patterns and an amount of
calculations necessary for matching by using the feature variables thus
set.
The object of the invention can be achieved by a voice recognition
apparatus for analyzing frequencies of an input voice inputted from an
input device, and for extracting feature variables of the input voice from
the analyzed frequencies to recognize the input voice, the apparatus
including:
a unit for analyzing frequencies of the input voice;
a unit coupled to the analyzing unit for determining a vowel and a
consonant zone of the analyzed input voice; and
a unit for determining a position of articulation of a member of the input
voice determined as a vowel zone by calculating from frequency components
of the input voice in accordance with a predetermined algorithm based on
frequency components of monophthongs having known phonation contents and
positions of articulation.
Preferably, the vowel position of articulation determining unit has a
neural network capable of self-producing rules by learning processes using
the error back propagation method, and the position of articulation of the
vowel is calculated from the frequency components of the member in
accordance with the rules.
The vowel position of articulation determining unit further has a storage
unit for storing transform equations used for the algorithm so as to
calculate the vowel position of articulation.
The recognition apparatus may further include a unit coupled to the
vowel/consonant zone determining unit for comparing the input voice
determined as a consonant zone with memorized consonant patterns so as to
perform matching of consonants.
Preferably, the comparing unit includes a storage unit for storing the
consonant patterns.
The voice recognition apparatus may further include a unit coupled to the
determining unit for discriminating the determined input voice outputted
from the determining unit by comparing it with reference patterns stored
therein.
The discriminating unit preferably includes a matching unit coupled to the
determining unit for comparing the input voice outputted from the
determining unit with reference patterns memorized therein so as to
calculate the similarity therebetween, and a display unit for displaying
the calculated similarity obtained by the matching unit.
The analyzing unit is preferably formed of either a group of band-pass
filters or a fast Fourier transformer.
In a voice feature extracting system of the first invention, a
vowel/consonant zone determining device and a position of articulation
extracting device are provided. Based on frequency components of a
plurality of monophthongs whose phonation contents and positions of
articulation are known, the input phonation content determined by the
vowel/consonant zone determining device as a vowel zone is processed by
the position of articulation extracting device to calculate a position of
articulation of the vowel in question from frequency components thereof.
This allows the position of articulation to be derived as a feature
variable of the voice independent of speakers and languages. In accordance
with the present invention, therefore, the position of articulation can be
extracted with high accuracy through simple processing.
Alternatively, a vowel/consonant zone determining device and a position of
articulation extracting device which includes a neural network are
provided. Based on frequency characteristics of the zones which have been
determined by the vowel/consonant zone determining device as vowel zones,
the neural network creates by itself rules to derive positions of
articulation of vowels, through learning. In accordance with those rules,
a position of articulation of an input vowel is derived by the position of
articulation extracting device from frequency components in the zone
determined as a vowel zone. This allows the position of articulation to be
derived, based on the frequency components of the vowel, as a feature
variable of the voice independent of speakers and languages. With the
present invention, therefore, the position of articulation can be
extracted with high accuracy through simple processing.
The other object of the invention can be achieved by a voice recognition
apparatus for analyzing frequencies of an input voice inputted from an
input device, for determining feature variables of the input voice from
the analyzed frequencies to recognize the input voice, and for indicating
the recognized input voice, the apparatus including:
a unit for analyzing frequencies of the input voice so as to derive
acoustic parameters from the input voice;
a pattern converting unit coupled to the analyzing unit and having a neural
network for converting the acoustic parameters to articulatory vectors,
the neural network being capable of learning by the error back propagation
method using target data produced by a predetermined sequence based on the
acoustic parameters so as to create rules for converting the acoustic
parameters of the input voice to the articulatory vector having at least
two vector elements;
a recognizing unit coupled to the pattern converting unit for recognizing
the input voice by comparing a feature pattern of the analyzed input voice
having the articulatory vector with reference patterns in predetermined
sequence; and
a storage unit coupled to the recognizing unit for storing the reference
patterns having the articulatory vectors created by the pattern converting
unit.
Preferably, the two vector elements are selected from among respective
positions of a point of articulation in the antero-posterior and vertical
directions, a degree of narrowness of the vocal tract at the point of
articulation, presence or absence of vibrations of the vocal cords, a
degree of nasalization, and a rounded degree.
Furthermore, the pattern converting unit converts the acoustic parameters
to the articulatory vector frame by frame.
The neural network is preferably a multi-layered perceptron.
The voice recognition apparatus further includes a unit coupled to the
recognizing unit for displaying the similarity between the analyzed input
voice and the reference patterns obtained by the recognizing unit.
Preferably, the storage unit is to memorize the reference patterns each of
which has a time series of the articulatory vector created by the pattern
converting unit for an acoustic sample.
Furthermore, the recognizing unit includes a preliminary selecting unit
coupled to the pattern converting unit for selecting patterns from the
reference patterns, and a discriminating unit coupled to the preliminary
selecting unit for discriminating the input voice vector in accordance
with a distance between respective time series of the articulatory vectors
for the feature pattern and the reference pattern.
The analyzing unit is preferably formed of either a group of band-pass
filters or a fast Fourier transformer.
In a voice recognition system of a second invention, an acoustic parameter
is first derived from an input voice signal by an acoustic analyzing
device. From this acoustic parameter of the input voice, an articulatory
vector is then produced which includes as its elements at least two among
antero-posterior/vertical positions of a point of articulation, a narrow
degree of the vocal tract, presence or absence of vibrations of the vocal
cords, a degree of nasalization, and a rounded degree (or degree of
labialization). A feature pattern of the input voice represented by a time
series of the above articulatory vector and a reference pattern of a voice
sample represented by a time series of the articulatory vector thereof are
subjected to matching in a discriminating section by using a distance
between those two time series of the articulatory vectors. Therefore, by
expressing a voice by the articulatory vector, it becomes possible to
eliminate voice fluctuations caused by physiological differences between
individual speakers, and to reduce the number of templates for the
reference patterns, thereby reducing the amount of calculation necessary
for matching when voice recognition is performed for many unspecified
speakers.
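The text does not state here how the distance between two time series of articulatory vectors is computed. One common choice, shown purely as an assumption, is dynamic time warping, a form of the DP matching mentioned in the related art. The function names and the sample (x, y) trajectories below are hypothetical.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two articulatory vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(pattern, reference):
    """Accumulated distance along the best monotonic alignment of two series."""
    n, m = len(pattern), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(pattern[i - 1], reference[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical (x, y) trajectories; "stretched" is a slower copy of "reference".
reference = [(2, 7), (5, 4), (6, 2)]
stretched = [(2, 7), (2, 7), (5, 4), (6, 2), (6, 2)]
other = [(1, 4), (2, 2), (2, 2)]
```

Because the alignment absorbs differences in speaking rate, the stretched copy matches the reference at zero distance while a different trajectory does not.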
When producing the above articulatory vector, a parameter converting device
performs learning with the error back propagation method by using target
data which has been formed in advance through a predetermined sequence, so
that the acoustic parameter of the input voice is converted to the
articulatory vector by a neural network which has by itself created rules
for this conversion. Therefore, the articulatory vector can be produced
through only simple multiplication/summation operations, and the amount of
calculation required for producing the articulatory vector can be reduced.
In addition, by training the neural network using target data obtained
from voice samples of many speakers, the articulatory vector can be stably
produced for all sorts of input voices.
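The point that a trained network needs little more than multiplications and summations can be illustrated with the forward pass of a small multi-layered perceptron. The sketch below is an assumption-laden illustration: the sigmoid unit function, the layer sizes, and every weight value are hypothetical and are not taken from the text.

```python
import math


def sigmoid(z):
    """Assumed unit nonlinearity applied after each weighted sum."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sums followed by a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(acoustic, hidden_w, hidden_b, out_w, out_b):
    """Map an acoustic parameter vector to an output vector with values in (0, 1)."""
    return layer(layer(acoustic, hidden_w, hidden_b), out_w, out_b)


# Hypothetical weights: 3 acoustic inputs, 2 hidden units, 2 output elements.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
out_w = [[1.0, -1.0], [0.5, 0.5]]
out_b = [0.0, -0.2]
vector = forward([0.2, 0.7, 0.1], hidden_w, hidden_b, out_w, out_b)
```

In an actual apparatus the weights would come from training with the error back propagation method, and the outputs would be scaled to the ranges of the articulatory vector elements.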
Further objects and advantages of the present invention will be apparent
from the following description, reference being had to the accompanying
drawings wherein preferred embodiments of the present invention are
clearly shown.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of one embodiment of a voice recognition
apparatus according to this invention;
FIG. 2 is a representation showing positions of articulation of various
vowels;
FIG. 3 is a representation showing the relationship between a first formant
frequency and a second formant frequency for Japanese vowels;
FIG. 4 is a representation showing the relationship between a first formant
frequency and a second formant frequency for other various vowels phonated
by some speaker;
FIG. 5 is a flowchart of the position of articulation calculating operation
of one vowel in a word;
FIG. 6 is a flowchart of a position of articulation calculating routine for
a vowel /a/ in the word of FIG. 5;
FIG. 7 is a flowchart of a position of articulation calculating routine for
a vowel /i/ in the word of FIG. 5;
FIG. 8 is a flowchart of a position of articulation calculating routine for
a vowel /u/ in the word of FIG. 5;
FIG. 9 is a flowchart of a position of articulation calculating routine for
a vowel /e/ in the word of FIG. 5;
FIG. 10 is a flowchart of a position of articulation calculating routine
for a vowel /o/ in the word of FIG. 5;
FIG. 11 is an illustration for explaining the structure of a neural
network;
FIG. 12 is a block diagram showing another embodiment of a voice
recognition apparatus according to the present invention;
FIG. 13a is a chart showing one example of the waveform of an input voice;
FIG. 13b is a chart showing a time series of phoneme symbols corresponding
to the voice waveform of FIG. 13a;
FIG. 13c is a chart showing a time series of an element C of the
articulatory vector corresponding to the voice waveform of FIG. 13a;
FIG. 13d is a group of charts showing time series of elements x, y, n, g
and l of the articulatory vector corresponding to the voice waveform of
FIG. 13a;
FIG. 14 is an illustration showing a neural network;
FIG. 15a is a sectional view of the mouth (or oral cavity) of a human being
for explaining points of articulation of consonants; and
FIG. 15b is a view showing values of the element x for consonants
corresponding to points of articulation in the section of the mouth shown
in FIG. 15a.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Hereinafter, the invention will be described in detail with reference to
the illustrated embodiments.
FIG. 1 is a block diagram of a voice recognition apparatus according to a
first invention. A voice signal input from a microphone 1 is amplified by
an amplifier 2 and applied to an acoustic analyzing device 3. The acoustic
analyzing device 3 performs frequency analysis of the input voice signal
through a group of band-pass filters (hereinafter referred to as BPF) or
a fast Fourier transform (applied to the values obtained by multiplying
the voice waveform data by a window function).
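The windowed Fourier analysis performed by the acoustic analyzing device 3 can be sketched as follows. The Hamming window, the frame length of 64 samples and the function names are assumptions made for illustration, and a direct discrete Fourier transform is written out for brevity; a fast Fourier transform yields the same values more efficiently.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (an assumed choice of window)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def power_spectrum(frame):
    """Window one frame and return the power at each DFT bin up to Nyquist."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# Hypothetical test frame: a sinusoid completing 8 cycles in 64 samples.
frame = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
spectrum = power_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

Formant frequencies would then be estimated from the peaks of such spectra, frame by frame.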
A vowel/consonant zone determining device 4 determines whether the input
voice signal is a vowel zone or a consonant zone. This determination is
carried out by referring to changes in the power and spectra of the input
voice, or the like. If the input voice is
determined as a vowel zone, a position of articulation of the vowel is
extracted by a position of articulation extracting device 5. This
extraction of the position of articulation of the vowel is carried out by
reading transform equations and rules necessary for calculating the
position of articulation from frequency components of the voice from a
transform equation storage device 6, and then employing those transform
equations and rules thus read in. On the other hand, if the input voice is
determined as a consonant zone by the vowel/consonant zone determining
device 4, matching between frequency components of the consonant zone and
consonant patterns stored in a consonant pattern storage device 8 is made
in a consonant pattern converting device 7 to output a candidate of the
consonant pattern. In this manner, the input voice is converted to time
series of the position of articulation of the vowels and the consonant
patterns.
A pattern matching device 9 calculates the similarity between the time
series of the positions of articulation of the vowels and the consonant
patterns derived from the input voice as mentioned above and each of the
reference patterns for words, which are derived by a similar method for
respective known words and stored in a reference pattern storage device 10.
Based on the result of the similarity calculation, the word is recognized
and the recognition result is indicated on a result display section 11.
A first embodiment of the first invention will be described below in
detail.
This embodiment is concerned with the position of articulation extracting
device 5 for vowels, which calculates positions of articulation of vowels
in a word from frequency components of the voice in accordance with a
predetermined algorithm, based on frequency components of monophthongs
whose phonation contents and positions of articulation are known. In this
embodiment, Japanese vowels (/a/, /i/, /u/, /e/, /o/) are used
as monophthongs whose positions of articulation are known. FIG. 2 is a
representation showing positions of articulation of various vowels. In
FIG. 2, x represents the position of articulation in an antero-posterior
(longitudinal) direction, with a larger value being nearer to the
anterior (front) end. Also, y represents the position of articulation in a
vertical direction, with a larger value being nearer to the lower side.
In FIG. 2, each Katakana character enclosed in a circle indicates one of
the aforesaid Japanese vowels. The position of articulation is now
represented by the coordinates (x, y) within the following range:
1 ≤ x ≤ 7, 1 ≤ y ≤ 7
where x, y: integers
In this embodiment, it is assumed that each position of articulation is
located on a lattice point of the coordinates. This is reasonable from the
standpoint of auditory accuracy of a human being. Specifically, it is here
assumed that the monophthong /a/ has a position of articulation (2, 7),
the monophthong /i/ has a position of articulation (6, 2), the
monophthong /u/ has a position of articulation (2, 2), the monophthong
/e/ has a position of articulation (5, 4), and the monophthong /o/ has a
position of articulation (1, 4). Based on the positions of articulation of
the monophthongs thus set, positions of articulation of vowels in a word
are each expressed by the coordinates (x, y).
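The lattice of positions of articulation can be represented compactly as below. The five monophthong positions are those assumed in this embodiment; the helper functions `snap_to_lattice` and `nearest_monophthong` are hypothetical additions that only illustrate how integer lattice coordinates in the range 1 to 7 might be handled.

```python
# Positions of articulation (x, y) of the five monophthongs, as assumed in the text.
MONOPHTHONG_POSITION = {
    "a": (2, 7),
    "i": (6, 2),
    "u": (2, 2),
    "e": (5, 4),
    "o": (1, 4),
}


def snap_to_lattice(x, y, low=1, high=7):
    """Round a real-valued position to the nearest lattice point inside the range."""
    def clamp(v):
        return max(low, min(high, int(round(v))))
    return clamp(x), clamp(y)


def nearest_monophthong(x, y):
    """Return the monophthong whose lattice position is closest to (x, y)."""
    return min(MONOPHTHONG_POSITION,
               key=lambda v: (MONOPHTHONG_POSITION[v][0] - x) ** 2
                             + (MONOPHTHONG_POSITION[v][1] - y) ** 2)
```

For example, a real-valued estimate such as (5.4, 3.6) falls on the lattice point (5, 4), which is the position assumed for the monophthong /e/.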
The relationship between positions of articulation of vowels and frequency
components of vowels phonated at those positions of articulation will now
be described. FIG. 3 is a representation which indicates respective ranges
of a first formant frequency (hereinafter expressed by F(1)) and a second
formant frequency (hereinafter expressed by F(2)) of the Japanese vowels
shown in FIG. 2 for males and females. FIG. 4 is a representation showing
the relationship between F(1) and F(2) for various vowels other than the
Japanese vowels, which are phonated by a specific speaker. From FIGS. 2, 3
and 4, it is found that the relationship between the formant frequencies
and the positions of articulation is generally given by proportional
relations between F(1) and y and between F(2) and x. For some vowels (/i/
and /e/), the values of x and y are also affected by an increase or
decrease in a third formant frequency (hereinafter expressed by F(3)).
Based on those relationships, positions of articulation of vowels in a
word are predicted from frequency components of the vowels in the word.
Next, a method of predicting positions of articulation of vowels in a word
will be explained in more detail.
As described in connection with FIG. 1, the waveform of an input voice is
sectioned in advance into vowel zones and consonant zones, which are
labeled by the acoustic analyzing device 3 and the vowel/consonant zone
determining device 4, and is also subjected to acoustic analysis to
extract formant frequencies. In this embodiment, the formant frequencies
of the phoneme zones thus labeled as vowels are used for prediction.
FIG. 5 is a flowchart of the position of articulation calculating operation
of one vowel in a word to be executed in the position of articulation
extracting device 5 of FIG. 1.
In step S1, the formant frequencies of the zone determined by the
vowel/consonant zone determining device 4 as a vowel zone are input and
the kind of label (i.e., the phonation content) which is added to the
formant frequencies of the input vowel zone is determined. In accordance
with the label determined, the process goes to any one of steps S2, S3,
S4, S5 and S6.
In step S2, a position of articulation calculating routine for a vowel
/a/, described later in detail, is executed to complete the position of
articulation calculating operation for one vowel.
In step S3, a position of articulation calculating routine for a vowel
/i/, described later in detail, is executed to complete the position of
articulation calculating operation for one vowel.
In step S4, a position of articulation calculating routine for a vowel
/u/, described later in detail, is executed to complete the position of
articulation calculating operation for one vowel.
In step S5, a position of articulation calculating routine for a vowel
/e/, described later in detail, is executed to complete the position of
articulation calculating operation for one vowel.
In step S6, a position of articulation calculating routine for a vowel
/o/, described later in detail, is executed to complete the position of
articulation calculating operation for one vowel.
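The branching of FIG. 5 amounts to selecting one of five routines from the label attached to the vowel zone. A minimal sketch follows; the routine bodies are placeholders rather than the routines of FIGS. 6 to 10, and the names `make_routine`, `ROUTINES` and `calculate_position` as well as the formant values are hypothetical.

```python
def make_routine(vowel):
    """Build a placeholder routine; the real routines are those of FIGS. 6 to 10."""
    def routine(formants):
        # Placeholder: a real routine would return a position of articulation (x, y).
        return {"vowel": vowel, "formants": formants}
    return routine


# Steps S2 to S6: one routine per vowel label.
ROUTINES = {label: make_routine(label) for label in ("a", "i", "u", "e", "o")}


def calculate_position(label, formants):
    """Step S1: select the routine from the label attached to the vowel zone."""
    try:
        routine = ROUTINES[label]
    except KeyError:
        raise ValueError(f"not a vowel label: {label!r}")
    return routine(formants)


# Hypothetical formant frequencies (F(1), F(2), F(3)) in Hz for a zone labeled /e/.
result = calculate_position("e", (480.0, 1900.0, 2500.0))
```

A table of routines keeps step S1 independent of the details of each vowel, so a routine can be replaced without touching the dispatch.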
The position of articulation calculating routines for respective vowels
executed in steps S2 to S6 will be explained below in more detail.
(A) Position of Articulation Calculating Routine for Vowel /a/
In the vicinity of the position of articulation of the monophthong, F(1)
and F(2) vary non-linearly with changes in the position of
articulation. Therefore, a table for directly converting the values of
F(1), F(2) of the vowel /a/ in the word to a position of articulation
(hereinafter referred to as a conversion table) is prepared (one example
is shown in Table 1 below) and stored in the transform equation
storage device 6.
TABLE 1
___________________________________________________________________
I\J   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
___________________________________________________________________
  1  23  23  24  24  24  25  25  26  26  27  27  27  27  28  28  28
  2  23  23  24  24  24  25  25  26  26  27  27  27  27  28  28  28
  3  30  30  24  24  24  25  25  26  26  27  27  27  27  28  28  28
  4  30  30  31  31  31  32  32  33  33  34  34  34  34  35  35  28
  5  30  30  31  31  31  32  32  33  33  34  34  34  34  35  35  35
  6  30  30  31  31  31  39  39  40  40  40  41  41  41  42  35  35
  7  30  30  38  38  38  39  39  40  40  41  41  41  41  42  42  35
  8  37  37  38  38  38  39  39  40  40  48  48  48  42  42  42  35
  9  37  37  38  38  38  38  39  40  47  47  48  48  42  42  42  42
 10  37  37  38  38  38  38  39  47  47  47  48  48  48  49  49  42
 11  45  45  45  45  45  45  45  46  47  47  48  48  48  48  49  49
 12  45  45  45  45  45  45  45  45  46  47  47  48  48  48  48  49
___________________________________________________________________
This conversion table was prepared by asking many speakers to produce
voices at various positions of articulation, and then considering the
relationship between the positions of articulation and the formant
frequencies. The coordinates of the monophthongs on the conversion table
(hereinafter referred to as table positions) are expressed by (I, J) as
follows. The table position of the monophthong /a/ is given by (8, 11),
the table position of the monophthong /e/ is given by (2, 4), and the
table position of the monophthong /o/ is given by (2, 15). I increases and
decreases with F(1) (i.e., with y), while J increases and decreases with
F(2) (i.e., with x).
The positions of articulation of vowels in a word are calculated from F(1),
F(2) thereof using the above conversion table in the following manner.
When F(2) of the vowel /a/ in the word is higher than F(2) of the
monophthong /a/, the former's position of articulation is shifted toward
the position of articulation of the monophthong /e/. Accordingly, F(1) of
the vowel /a/ in the word is normalized from F(1) of the monophthong /a/
and F(1) of the monophthong /e/ to derive I of the table position (I, J)
of the vowel /a/ in the word. Then, F(2) of the vowel /a/ in the word is
normalized from F(2) of the monophthong /a/ and F(2) of the monophthong
/e/ to derive J of the table position (I, J) of the vowel /a/ in the
word. The table position (I, J) of the vowel /a/ in the word is thus
calculated. When F(2) of the vowel /a/ in the word is lower than F(2) of
the monophthong /a/, the former's position of articulation is shifted
toward the position of articulation of the monophthong /o/. Accordingly,
F(1) of the vowel /a/ in the word is normalized from F(1) of the
monophthong /a/ and F(1) of the monophthong /o/ to derive I of the
table position (I, J) of the vowel /a/ in the word. Then, F(2) of the
vowel /a/ in the word is normalized from F(2) of the monophthong /a/
and F(2) of the monophthong /o/ to derive J of the table position (I, J)
of the vowel /a/ in the word. The table position (I, J) of the vowel
/a/ in the word is thus calculated.
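The exact form of the normalization is not spelled out in this passage. One plausible reading, shown here only as an assumption, is linear interpolation between the formants of the monophthong /a/ and those of the neighboring monophthong, mapped onto their table positions. The formant values in Hz and the function names are hypothetical.

```python
# Table positions (I, J) of the monophthongs, as given in the text.
TABLE_POSITION = {"a": (8, 11), "e": (2, 4), "o": (2, 15)}


def interpolate(f, f_from, f_to, index_from, index_to):
    """Linearly map f from [f_from, f_to] onto [index_from, index_to] and round."""
    ratio = (f - f_from) / (f_to - f_from)
    return int(round(index_from + ratio * (index_to - index_from)))


def table_position_for_a(f1, f2, monophthong_formants):
    """Table position (I, J) of a vowel /a/ in a word, under the linear assumption."""
    f1_a, f2_a = monophthong_formants["a"]
    # Higher F(2) than the monophthong /a/: shift toward /e/; otherwise toward /o/.
    target = "e" if f2 > f2_a else "o"
    f1_t, f2_t = monophthong_formants[target]
    i_a, j_a = TABLE_POSITION["a"]
    i_t, j_t = TABLE_POSITION[target]
    i = interpolate(f1, f1_a, f1_t, i_a, i_t)
    j = interpolate(f2, f2_a, f2_t, j_a, j_t)
    return i, j


# Hypothetical monophthong formants (F(1), F(2)) in Hz for one speaker.
formants = {"a": (800.0, 1200.0), "e": (500.0, 1900.0), "o": (500.0, 800.0)}
```

With these values, formants equal to those of a monophthong reproduce its table position, and intermediate formants land between the two anchors. The sketch does not clamp results to the bounds of the table, which a practical routine would need to do.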
Afterward, the value on the conversion table at the thus-calculated table
position (I, J) (hereinafter referred to as TABLE (I, J)) is read from the
conversion table. Based on TABLE (I, J), the position of articulation
(x, y) of the vowel /a/ in the word is calculated using the relational
equation (1) between TABLE (I, J) and the position of articulation (x, y)
below:
##EQU1##
where [ ] denotes the greatest integer not exceeding the enclosed value,
and N=[(TABLE (I, J)-1)/7].
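Equation (1) itself is not legible in this text, so the decoding below is an assumption rather than the equation of the specification. It was chosen because, taking I as the row and J as the column of Table 1, it reproduces the stated positions of articulation of the three monophthongs from their stated table positions: /a/ at (8, 11) gives (2, 7), /e/ at (2, 4) gives (5, 4), and /o/ at (2, 15) gives (1, 4).

```python
# Conversion table transcribed from Table 1 (assumed: 12 rows I, 16 columns J).
TABLE = [
    [23, 23, 24, 24, 24, 25, 25, 26, 26, 27, 27, 27, 27, 28, 28, 28],
    [23, 23, 24, 24, 24, 25, 25, 26, 26, 27, 27, 27, 27, 28, 28, 28],
    [30, 30, 24, 24, 24, 25, 25, 26, 26, 27, 27, 27, 27, 28, 28, 28],
    [30, 30, 31, 31, 31, 32, 32, 33, 33, 34, 34, 34, 34, 35, 35, 28],
    [30, 30, 31, 31, 31, 32, 32, 33, 33, 34, 34, 34, 34, 35, 35, 35],
    [30, 30, 31, 31, 31, 39, 39, 40, 40, 40, 41, 41, 41, 42, 35, 35],
    [30, 30, 38, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 35],
    [37, 37, 38, 38, 38, 39, 39, 40, 40, 48, 48, 48, 42, 42, 42, 35],
    [37, 37, 38, 38, 38, 38, 39, 40, 47, 47, 48, 48, 42, 42, 42, 42],
    [37, 37, 38, 38, 38, 38, 39, 47, 47, 47, 48, 48, 48, 49, 49, 42],
    [45, 45, 45, 45, 45, 45, 45, 46, 47, 47, 48, 48, 48, 48, 49, 49],
    [45, 45, 45, 45, 45, 45, 45, 45, 46, 47, 47, 48, 48, 48, 48, 49],
]


def table_value(i, j):
    """TABLE(I, J) with 1-based row I and column J."""
    return TABLE[i - 1][j - 1]


def position_from_value(value):
    """Assumed decoding of a table value into a position of articulation (x, y)."""
    n = (value - 1) // 7           # N = [(TABLE(I, J) - 1) / 7]
    y = n + 1                      # assumed: y counts blocks of seven values
    x = 7 * (n + 1) - value + 1    # assumed: x runs backward within a block
    return x, y
```

Under this reading each table value encodes one lattice point, with consecutive blocks of seven values corresponding to successive values of y.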
FIG. 6 is a flowchart of a position of articulation calculating routine for
a vowel /a/ in the word employed in the flowchart of FIG. 5. The
variables used in the following explanation of the position of
articulation calculating routines for respective vowels are defined below:
F^V(n) (V=a, i, u, e, o, n=1, 2, 3) . . . n-th formant frequency of a
vowel V in the word
F^V_{lV}(n) (V=a, i, u, e, o, n=1, 2, 3) . . . n-th formant
frequency of a monophthong V
(I, J) (V=a, i, u, e, o) . . . table position of the vowel V in the word
(I_V, J_V) (V=a, i, u, e, o) . . . table position of the
monophthong V
The position of articulation calculating routine for a vowel /a/ in the
word will be described below in more detail with reference to FIG. 6.
Step S11 determines whether