United States Patent: 4850022
Link to this page: http://www.wikipatents.com/4850022.html
Inventor(s): Honda; Masaaki (Kodaira, JP); Moriya; Takehiro (Soka, JP)
Abstract: A speech signal processing system in which the correlation is removed from
the sample values of a speech waveform supplied to an inverse-filter for
obtaining sample values of a prediction residual waveform,
phase-equalizing filter coefficients are determined to have
phase-characteristic inverse to that of the prediction residual waveform
at each pitch position of the speech waveform, the phase-equalizing filter
coefficients are set as filter coefficients of the phase-equalizing
filter, and the speech waveform or the prediction residual waveform is
passed through the phase-equalizing filter, thereby zero-phasing the
prediction residual waveform or the prediction residual waveform component
in the speech waveform and concentrating energy around the pitch position.
Publication Date: July 18, 1989
Filing Date: October 11, 1988
Parent Case: This application is a continuation of Ser. No. 712,811, filed on Mar. 18,
1985, now abandoned.
Priority Data:
Mar. 21, 1984 [JP] 59-53757
Aug. 20, 1984 [JP] 59-173903
Claims
What is claimed is:
1. A speech signal processing system comprising:
an input terminal for receiving successive sample values of a speech
waveform S(n) at successive time points n, where n=0, 1, 2, . . . ;
inverse-filter means connected to said input terminal for obtaining
successive sample values of a prediction residual waveform e(n) by
removing a short-time correlation from the speech waveform S(n);
phase-equalizing filter means connected to said input terminal for
receiving the speech waveform S(n) therefrom and producing successive
samples of a phase-equalized speech waveform Sp(n) in the time domain by
zero-phasing a prediction residual waveform component in the speech
waveform in accordance with successive sets of M+1 phase-equalizing filter
coefficients h(m,n) supplied thereto as filter coefficients thereof, where
m=0, 1, 2, . . . , M, and M is a positive integer; and
filter coefficient determining means connected to the output of said
inverse-filter means for determining said phase-equalizing filter
coefficients h(m,n) on the basis of said prediction residual waveform
e(n), said filter coefficient determining means including voiced/unvoiced
sound discriminator means connected to the output of said inverse-filter
means for discriminating whether said speech waveform is a voiced sound or
an unvoiced sound based on whether a computed value of an auto-correlation
function on said prediction residual waveform during an analysis window of
a length N at said filter coefficient determining means is above or below
a threshold value, pitch position detecting means connected to the outputs
of said inverse-filter means and said voiced/unvoiced sound discriminator
means for detecting, when said speech waveform is discriminated as a
voiced sound, pitch positions n.sub.l from said prediction residual
waveform e(n), and filter coefficient computing means connected to the
outputs of said inverse-filter means, said voiced/unvoiced sound
discriminator means and said pitch position detecting means, respectively,
for computing, when said speech waveform is discriminated as a voiced
sound, a set of the M+1 phase-equalizing filter coefficients h(m,n) for a
time point n of each pitch position n=n.sub.l by solving the following
simultaneous equations given for k=0, 1, . . . M,
##EQU26##
where L is the number of the pitch positions n.sub.l in the analysis
window and V(m) is an auto-correlation function of said prediction
residual waveform e(n) given by:
##EQU27##
and for setting, when said speech waveform is discriminated as an
unvoiced sound, a particular one order of coefficient of said
phase-equalizing filter coefficients to a certain value and the other
orders thereof to zero;
the output of said filter coefficient determining means being connected to
said phase-equalizing filter means so that successive sets of said
phase-equalizing filter coefficients h(m,n.sub.l) determined by said
filter coefficient determining means are supplied to said phase-equalizing
filter means as the filter coefficients thereof, whereby said
phase-equalizing filter means outputs the phase-equalized speech waveform
Sp(n) as the output of said system representing the input speech waveform.
2. The speech signal processing system according to claim 1 wherein the
analysis window length N is selected comparable to a pitch period so that
the number L of said pitch positions n.sub.l is one, and said filter
coefficient computing means computes filter coefficients h*(m,n.sub.l)
instead of the coefficients h(m,n.sub.l) when the speech waveform is
discriminated as a voiced sound by said voiced/unvoiced sound
discriminating means, where
##EQU28##
and e(n.sub.l +M/2-m) denotes a sample value of said prediction residual
waveform at the pitch position n.sub.l.
3. The speech signal processing system according to claim 1 or 2 wherein
said pitch position detecting means comprises a second phase equalizing
filter means connected to the output of said inverse-filter means for
phase-equalizing said prediction residual waveform e(n) from said
inverse-filter means to produce a phase-equalized prediction residual
waveform ep(n), filter coefficients of said second phase-equalizing filter
means being controlled by the phase-equalizing filter coefficients
determined by said filter coefficient determining means, and amplitude
comparing means connected to the output of said second phase-equalizing
means for detecting, as the pitch positions, time points at which relative
amplitude values of the phase-equalized prediction residual waveform ep(n)
within the analysis window are over a predetermined value.
4. The speech signal processing system according to claim 3 wherein said
system further comprises:
pulse-processing means for detecting an amplitude m.sub.l of said
phase-equalized prediction residual waveform ep(n) at the pitch position
n.sub.l obtained by said pitch position detecting means; and
quantizing means connected to the output of said pulse-processing means for
quantizing said detected pulse amplitude and producing quantized pulse
amplitude c(n);
the quantized pulse amplitude c(n), the pitch position n.sub.l, a voiced
or unvoiced sound discriminating value from said discriminator means and
filter coefficients a(k) of said inverse-filter means being output as the
output of the system representing the input speech signal.
5. The speech signal processing system according to claim 4 wherein said
quantizing means comprises quantization step computing means connected to
the output of said phase-equalizing filter means for computing the
electric power v of said phase-equalized prediction residual waveform
ep(n) supplied from said phase-equalizing filter means and a quantization
step size from the computed electric power v, and adaptively varying a
quantization step size of said quantizing means in accordance with the
computed step size, the electric power of said phase-equalized prediction
residual waveform being output as part of the output of said system
representing the input speech waveform.
6. The speech signal processing system according to claim 1 or 2 wherein
said filter coefficient determining means comprises filter coefficient
interpolating means connected to the output of said filter coefficient
computing means for interpolating the phase-equalizing filter coefficients
for a time point between the computations of two successive sets of the
phase-equalizing filter coefficients by said filter coefficient computing
means so that the output of said filter coefficient determining means
includes the interpolated phase-equalizing filter coefficients.
7. The speech signal processing system according to claim 1 or 2 wherein
said system includes coding-processing means connected to the output of
said phase-equalizing filter means for coding said phase-equalized speech
waveform and outputting the coded phase-equalized speech waveform as the
output of said system representing the input speech waveform.
8. The speech signal processing system according to claim 7 wherein said
coding-processing means comprises:
a second phase-equalizing filter means connected to the output of said
inverse-filter means for receiving therefrom the prediction residual
waveform e(n) and producing a phase-equalized prediction residual waveform
ep(n) in accordance with the phase-equalizing filter coefficients
h(m,n.sub.l) supplied from said filter coefficient determining means as
filter coefficients of said second phase-equalizing filter means;
tree code generating means connected to the output of said second
phase-equalizing filter means for producing a series of sample values q(n)
along a path of successive branches in a tree of codes defined in
accordance with quantizing bit numbers R(n) for quantization of the
phase-equalized prediction residual waveform ep(n), said path of
successive branches being selected in accordance with a sequence of tree
codes c(n);
prediction filter means connected to the output of said tree code
generating means for receiving therefrom the sample values q(n) and
producing a local decoded speech waveform Sp(n), said prediction filter
means being controlled by the same filter coefficients as those of said
inverse-filter means;
difference detecting means connected to the outputs of said first mentioned
phase-equalizing filter means and said second phase-equalizing filter
means for detecting the difference between said phase-equalized speech
waveform Sp(n) and the local decoded speech waveform Sp(n); and
code sequence optimizing means connected to said tree code generating
means for generating and supplying thereto sequences of tree codes, said
code sequence optimizing means being connected to the output of said
difference detecting means for receiving therefrom the detected difference
and searching an optimum sequence of the tree codes which minimizes the
detected difference produced by said difference detecting means;
the optimum code sequence c(n) obtained by said code sequence optimizing
means and the filter coefficients for said inverse-filter means being
outputted as the coded phase-equalized speech waveform.
9. The speech signal processing system according to claim 8 wherein said
tree code generating means comprises:
subinterval setting means connected to the output of said second
phase-equalizing filter means for receiving therefrom the phase-equalized
prediction residual waveform ep(n) and determining an energy-concentrated
position Td and a pitch period Tp of the phase-equalized prediction
residual waveform and corresponding residual power u.sub.i of each
subinterval within the pitch period from the phase-equalized prediction
residual waveform;
bit allocating means connected to the output of said subinterval setting
means for receiving therefrom the residual power u.sub.i and computing the
quantizing bit number R(n) as the number of branches at each node in said
tree code based on the residual power u.sub.i, said number of branches
representing the number of bits to be allocated to encode samples of the
phase-equalized prediction residual waveform in the corresponding
subinterval; and
step size computing means connected to the output of said subinterval
setting means for receiving therefrom the residual power u.sub.i and
computing, based on the residual power, a quantization step size
.DELTA.(n) for quantizing the phase-equalized prediction residual
waveform;
said tree of codes being defined by the computed number of branches R(n) at
each node of the tree and said tree code generating means being operative
to produce the sample value q(n) as a decoded value from the computed step
size .DELTA.(n) and the tree code c(n) on each selected branch, and the
pitch period Tp, the pitch position Td and the residual power u.sub.i
being outputted in codes from said coding-processing means as the output
of said system representing the input speech waveform.
10. The speech signal processing system according to claim 7 wherein said
coding-processing means comprises:
multi-pulse coding means connected to said filter coefficient determining
means for determining pulse positions t.sub.i and pulse amplitudes m.sub.i
with respect to the pitch position n.sub.l received from said filter
coefficient determining means;
multi-pulse generating means connected to the output of said multi-pulse
coding means for receiving therefrom the pulse positions t.sub.i and the
pulse amplitudes m.sub.i and generating a multi-pulse signal e(n) composed
of a train of pulses having the amplitudes m.sub.i at the respective pulse
positions t.sub.i ;
prediction filter means connected to the output of said multi-pulse coding
means for producing a local decoded waveform Sp(n) by passing said
multi-pulse signal through said prediction filter means while said
prediction filter means is controlled by the same filter coefficients as
those for said inverse-filter means; and
difference detecting means connected to the outputs of said first mentioned
phase-equalizing filter means and said second phase-equalizing filter
means for receiving therefrom said phase-equalized speech waveform Sp(n)
and said local decoded waveform Sp(n) and detecting the difference
therebetween;
the output of said difference detecting means being connected to said
multi-pulse coding means to supply thereto the detected difference, and
said multi-pulse coding means determining the pulse positions t.sub.i and
the pulse amplitudes m.sub.i so as to minimize the detected difference and
being operative to output, as part of the coded speech waveform,
the determined pulse positions t.sub.i and pulse amplitudes m.sub.i along
with the filter coefficients a(k).
11. A speech signal processing system comprising:
an input terminal for receiving successive sample values of a speech
waveform S(n) at successive time points n, where n=0, 1, 2, . . . ;
inverse-filter means connected to said input terminal for obtaining
successive sample values of a prediction residual waveform e(n) by
removing a short-time correlation from the speech waveform S(n);
phase-equalizing filter means connected to the output of said
inverse-filter means for obtaining a phase-equalized residual waveform
ep(n) in the time domain by zero-phasing the prediction residual waveform
e(n) from said inverse-filter means in accordance with successive sets of
M+1 phase-equalizing filter coefficients h(m,n) supplied thereto as filter
coefficients thereof, where m=0, 1, 2, . . . , M and M is a positive
integer; and
filter coefficient determining means connected to the output of said
inverse-filter means for determining said phase-equalizing filter
coefficients h(m,n) on the basis of said prediction residual waveform
e(n), said filter coefficient determining means including voiced/unvoiced
sound discriminator means connected to the output of said inverse-filter
means for discriminating whether said speech waveform is a voiced sound or
an unvoiced sound based on whether a computed value of an auto-correlation
function on said prediction residual waveform during an analysis window of
a length N at said filter coefficient determining means is above or below
a threshold value, pitch position detecting means connected to the outputs
of said inverse-filter means and said voiced/unvoiced sound discriminator
means for detecting, when said speech waveform is discriminated as a
voiced sound, pitch positions n.sub.l from said prediction residual
waveform e(n), and filter coefficient computing means connected to the
outputs of said inverse-filter means, said voiced/unvoiced sound
discriminator means and said pitch position detecting means, respectively,
for computing, when said speech waveform is discriminated as a voiced
sound, a set of the M+1 phase-equalizing filter coefficients h(m,n) for a
time point n of each pitch position n=n.sub.l by solving the following
simultaneous equations given for k=0, 1, . . . M,
##EQU29##
where L is the number of the pitch positions n.sub.l in the analysis
window and V(m) is an auto-correlation function of said prediction
residual waveform e(n) given by:
##EQU30##
and for setting, when said speech waveform is discriminated as an
unvoiced sound, a particular one order of coefficient of said
phase-equalizing filter coefficients to a certain value and the other
orders thereof to zero;
the output of said filter coefficient determining means being connected to
said phase-equalizing means so that successive sets of said
phase-equalizing filter coefficients h(m,n.sub.l) determined by said
filter coefficient determining means are supplied to said phase-equalizing
filter means as filter coefficients thereof, whereby said phase-equalizing
filter means outputs the phase-equalized prediction residual waveform
ep(n) as the output of said system representing the input speech waveform.
12. The speech signal processing system according to claim 11 wherein the
analysis window length N is selected comparable to a pitch period so that
the number L of said pitch positions n.sub.l is one, and said filter
coefficient computing means computes filter coefficients h*(m,n.sub.l)
instead of the coefficients h(m,n.sub.l) when the speech waveform is
discriminated as a voiced sound by said voiced/unvoiced sound
discriminating means, where
##EQU31##
and e(n.sub.l +M/2-m) denotes a sample value of said prediction residual
waveform at the pitch position n.sub.l.
13. The speech signal processing system according to claim 11 or 12 wherein
said pitch position detecting means comprises a second phase equalizing
filter means connected to the output of said inverse-filter means for
phase-equalizing the prediction residual waveform e(n) from said
inverse-filter means to produce a phase-equalized prediction residual waveform
ep(n), filter coefficients of said second phase-equalizing filter means
being controlled by the phase-equalizing filter coefficients determined by
said filter coefficient determining means, and amplitude comparing means
connected to the output of said second phase equalizing filter means for
detecting, as the pitch positions, time points at which relative amplitude
values of the phase-equalized prediction residual waveform ep(n) within
the analysis window are over a predetermined value.
14. The speech signal processing system according to claim 11 or 12 wherein
said filter coefficient determining means comprises filter coefficient
interpolating means connected to the output of said filter coefficient
computing means for interpolating the phase-equalizing filter coefficients
for a time point between the computations of two successive sets of the
phase-equalizing filter coefficients by said filter coefficient computing
means so that the output of said filter coefficient determining means
includes the interpolated phase-equalizing filter coefficients.
15. The speech signal processing system according to claim 11 wherein said
system further comprises coding-processing means connected to the output
of said phase-equalizing filter means for coding the phase-equalized
prediction residual waveform and outputting the coded phase-equalized
prediction residual waveform as the output of said system representing the
input speech waveform.
16. The speech signal processing system according to claim 15 wherein said
coding processing means includes energy-concentrated portion coding means
connected to the output of said phase-equalizing means for detecting a
position t.sub.i of each energy-concentrated portion in said
phase-equalized residual waveform and coding the energy-concentrated
portion to produce a code Pc representing the energy concentrated portion,
the code of the energy-concentrated portion Pc and a code showing the
energy-concentrated position t.sub.i being outputted along with codes of
said filter coefficients a(k) of said inverse-filter means as the output
of said system representing the input speech waveform.
17. The speech signal processing system according to claim 16 wherein said
energy-concentrated portion coding means comprises pulse pattern
generating means for reproducing a pulse pattern signal P(n) composed of a
train of the energy-concentrated portions each centered at the respective
energy-concentrated positions t.sub.i of said phase-equalized prediction
residual waveform, and said coding processing means further comprises
difference signal coding means connected to the output of said
energy-concentrated portion coding means for generating a difference code
c(n) representing a difference between said pulse pattern signal P(n) and
said phase-equalized prediction residual waveform, said difference code
c(n) being outputted as part of the output of said system representing the
input speech waveform.
18. The speech signal processing system according to claim 17 wherein said
pulse pattern generating means produces the pulse pattern signal P(n) by
vector-quantizing a waveform of plural samples of each said
energy-concentrated portion.
19. The speech signal processing system according to claim 17 wherein said
difference signal coding means comprises subtraction means connected to
the outputs of said phase-equalizing filter means and said pulse pattern
generating means for receiving the phase-equalized prediction residual
waveform ep(n) and the pulse pattern signal P(n) and producing a
difference therebetween as a difference signal V(n), and spectrum
quantizing means connected to the output of said subtraction means for
quantizing frequency components of said difference signal V(n) to produce
a spectrum envelope code as the difference code c(n) representing said
difference signal.
20. The speech signal processing system according to claim 17 wherein said
difference signal coding means comprises vector code generating means for
producing said difference code c(n) and a decoded vector value Vc(n) based
on said difference code c(n), adder means connected to the outputs of said
pulse pattern generating means and said vector code generating means for
adding said pulse pattern signal P(n) and said decoded vector value Vc(n)
received therefrom to produce a local decoded residual waveform ep(n),
first prediction filter means connected to the output of said adder means
for receiving therefrom the local decoded residual waveform ep(n) and
producing a local decoded speech waveform Sp(n) by controlling filter
coefficients of said prediction filter means with the same filter
coefficients as those for said inverse-filter means, second prediction
filter means connected to the output of said phase-equalizing filter means
for regenerating a phase-equalized speech waveform Sp(n) from said
phase-equalized prediction residual waveform ep(n), subtraction means
connected to the outputs of said first and second prediction filter means
for producing a difference between said regenerated phase-equalized speech
waveform Sp(n) and said local decoded speech waveform Sp(n), and path
search means connected to receive the difference and to control successive
selections of said difference codes in said vector code generating means
so that said difference becomes minimum.
21. The speech signal processing system according to claim 17 wherein said
difference signal coding means comprises means for determining as the
difference code c(n) a code of an optimum vector-tree value Vc(n)
representing the difference between said phase-equalized residual waveform
and said pulse pattern signal P(n).
22. The speech signal processing system according to claim 17 wherein said
difference signal coding means comprises means for quantizing frequency
components of the difference between said phase-equalized residual
waveform and said pulse pattern signal and outputting the quantized
results as the difference code c(n).
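The voiced/unvoiced discrimination and pitch position detection recited in claims 1 and 3 can be illustrated with a minimal sketch. It is not the claimed implementation: the function names, the lag range, the normalization by r(0) and the threshold values 0.3 and 0.5 are all assumptions chosen for illustration. The claims only state that an auto-correlation value of the residual is compared with a threshold and that pitch positions are time points where the relative amplitude exceeds a predetermined value.

```python
def normalized_autocorr_peak(e, min_lag, max_lag):
    """Largest value of r(lag) / r(0) over the given lag range (assumed form)."""
    r0 = sum(v * v for v in e)
    if r0 == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(e[i] * e[i + lag] for i in range(len(e) - lag))
        best = max(best, r / r0)
    return best


def is_voiced(e, min_lag, max_lag, threshold=0.3):
    """Voiced if the residual auto-correlation peak is above the threshold.

    The threshold 0.3 is a hypothetical value, not taken from the patent.
    """
    return normalized_autocorr_peak(e, min_lag, max_lag) > threshold


def pitch_positions(ep, relative_level=0.5):
    """Time points where |ep[n]| relative to the window peak exceeds a level.

    The level 0.5 is a hypothetical value, not taken from the patent.
    """
    peak = max(abs(v) for v in ep)
    if peak == 0.0:
        return []
    return [n for n, v in enumerate(ep) if abs(v) / peak > relative_level]
```

A periodic residual gives a strong auto-correlation peak at the pitch lag and is classified as voiced, while a residual with no repetition in the lag range is classified as unvoiced.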
Description
BACKGROUND OF THE INVENTION
The present invention relates to a speech signal processing system wherein
the prediction residual waveform is obtained by removing the short-time
correlation from the speech waveform and the prediction residual waveform
is used for coding, for example, a speech waveform.
Prior art speech signal coding systems fall into two classes: waveform
coding systems and analysis-synthesis systems (vocoders). In a linear
predictive coding (LPC) vocoder, which belongs to the latter class,
coefficients of an all-pole filter (prediction filter) representing the
speech spectrum envelope are obtained by linear prediction analysis of an
input speech waveform. The input speech waveform is then passed through an
all-zero filter (inverse-filter), whose characteristics are inverse to
those of the prediction filter, to obtain a prediction residual waveform.
A parameter extracting part extracts the parameters characterizing the
residual waveform: its periodicity (discrimination of voiced or unvoiced
sound), its pitch period and its average power. These extracted parameters
and the prediction filter coefficients are sent out. In the synthesizing
part, an excitation source generating part outputs, in place of the
prediction residual waveform, a train of periodic pulses of the received
pitch period in the case of a voiced sound, or a noise waveform in the
case of an unvoiced sound. This excitation is supplied to a prediction
filter whose filter coefficients are set to the received filter
coefficients, and the prediction filter outputs a speech waveform.
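The analysis chain described above (linear prediction analysis, inverse-filter, prediction filter) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of any particular vocoder: it assumes the autocorrelation method solved by the Levinson-Durbin recursion, a non-silent input (so that r(0) is nonzero), and zero initial filter state. All function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Return [r(0), ..., r(max_lag)] with r(k) = sum_n x[n] * x[n + k]."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations by Levinson-Durbin recursion.

    Returns [a(1), ..., a(order)] so that x[n] is predicted by
    sum_k a(k) * x[n - k]. Assumes r[0] > 0.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def inverse_filter(x, a):
    """All-zero filter: e[n] = x[n] - sum_k a(k) * x[n - k]."""
    return [x[n] - sum(a[k - 1] * x[n - k]
                       for k in range(1, len(a) + 1) if n - k >= 0)
            for n in range(len(x))]


def prediction_filter(e, a):
    """All-pole filter: x[n] = e[n] + sum_k a(k) * x[n - k]."""
    x = []
    for n in range(len(e)):
        x.append(e[n] + sum(a[k - 1] * x[n - k]
                            for k in range(1, len(a) + 1) if n - k >= 0))
    return x
```

With the same coefficients in both filters, the prediction filter undoes the inverse-filter, so feeding the true residual back reproduces the input. A vocoder replaces that residual with a pulse train or noise, which is where the quality loss discussed in this section comes from.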
On the other hand, in an adaptive predictive coding (APC) system, which
belongs to the former class of waveform coding, a prediction residual
waveform is obtained in a manner similar to that of the vocoder, and the
sampled values of this residual waveform are then directly quantized
(coded) and sent out along with the coefficients of a prediction filter.
In the synthesizing section, the received coded residual waveform is
decoded and supplied to a prediction filter whose filter coefficients are
set to the received prediction filter coefficients, and the prediction
filter generates a speech waveform.
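As a rough illustration of coding the residual sample by sample, the sketch below quantizes the residual of a fixed one-coefficient predictor with a uniform step and decodes it through the matching prediction filter. It is a deliberately simplified, open-loop sketch: the fixed coefficient, the uniform quantizer, the test signal and the function names are assumptions for illustration, and practical APC coders adapt the predictor and typically place the quantizer inside the prediction loop.

```python
import math


def quantize(e, step):
    """Uniform quantizer: each residual sample becomes an integer code."""
    return [round(v / step) for v in e]


def apc_encode(x, a1, step):
    """Inverse-filter x with one coefficient a1, then quantize the residual."""
    e = [x[n] - (a1 * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
    return quantize(e, step)


def apc_decode(codes, a1, step):
    """Decode the residual and pass it through the prediction filter."""
    y = []
    for n, c in enumerate(codes):
        y.append(c * step + (a1 * y[n - 1] if n > 0 else 0.0))
    return y


def max_error(x, a1, step):
    """Largest absolute difference between x and its decoded version."""
    y = apc_decode(apc_encode(x, a1, step), a1, step)
    return max(abs(u - v) for u, v in zip(x, y))


# Hypothetical test signal: a sampled sinusoid.
signal = [math.sin(0.3 * n) for n in range(100)]
fine_error = max_error(signal, 0.9, 0.01)    # small step, many code levels
coarse_error = max_error(signal, 0.9, 0.5)   # large step, few code levels
```

A smaller step needs more bits per sample but keeps the decoded waveform close to the input, while a coarse step saves bits at the cost of visible quantization distortion.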
The difference between these two conventional systems resides in the method
of coding the prediction residual waveform. The LPC vocoder can achieve a
large reduction in bit rate in comparison with the APC system, which
transmits a quantized value of each sample of the residual waveform,
because the LPC vocoder is required to transmit only the characterizing
parameters of the residual waveform (periodicity, pitch period and average
electric power). On the other hand, the LPC vocoder cannot avoid the
degradation in speech quality caused by replacing the residual waveform
with a pulse train or noise, which results in what is called a mechanical
synthesized voice. Even if the bit rate is increased, the enhancement in
quality saturates at about 6 kb/s. As a result, the LPC vocoder has the
disadvantage that it cannot provide natural voice quality. Another factor
lowering the quality is that the timing for controlling the prediction
filter coefficients cannot be suitably determined relative to each pulse
position (phase) in the pulse train supplied to the prediction filter,
because information indicating each pitch position is lacking. Further,
the LPC vocoder has the disadvantage that quality is lowered when
erroneous characterizing parameters are extracted from the residual
waveform. The APC system, by contrast, has the advantage that speech
quality very close to the original speech can be obtained by increasing
the number of quantizing bits for the residual waveform, but it has the
disadvantage that when the bit rate is lowered below 16 kb/s, quantization
distortion increases and abruptly degrades the speech quality.
Moreover, in the prior art systems, operations such as altering the pitch
of a speech signal or combining speech signal frames may happen to be
carried out at time locations where signal energy is concentrated,
resulting in the generation of unnatural speech.
Furthermore, in the prior art as disclosed in U.S. Pat. No. 4,214,125,
F. S. MOZER, "Method and apparatus for speech synthesizing" or U.S. Pat.
No. 3,892,919, A. ICHIKAWA, "Speech synthesizing system", it has been
proposed to carry out the following processing procedure. The Fourier
transform is carried out on the samples in each waveform section of one
pitch length cut out from a speech waveform, and the resultant sine
component is set to zero, that is, the phase of each harmonic component is
set to zero. The result is subjected to the inverse Fourier transform to
zero-phase the cut-out speech waveform, thereby temporally concentrating
the signal energy into a pulse-like waveform. Each zero-phased waveform of
the pitch length is coded. In the synthesizing part, the resultant codes
are decoded and the zero-phased waveform sections, each having a pitch
period duration, are concatenated to one another to restore the speech
waveform. In this method, erroneous extraction of a pitch period greatly
influences the speech quality, and processing distortion is caused by
applying the zero-phasing process to the speech waveform itself.
Furthermore, in this method, the location of energy concentration (pulse)
caused by the zero-phasing has nothing to do with the portion where the
energy of the original speech waveform in each pitch length is
comparatively concentrated, that is, the pitch location. The restored
speech waveform synthesized by successively concatenating zero-phased
speech waveform sections is therefore far from the original speech
waveform, and excellent speech quality cannot be obtained.
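The Fourier-domain zero-phasing described above can be sketched with a plain discrete Fourier transform. This is one reading of the procedure, assumed for illustration: each harmonic keeps its magnitude and its phase is set to zero. The function names and the example segment are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT, keeping only the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def zero_phase(segment):
    """Keep the magnitude of each harmonic and set its phase to zero."""
    return idft_real([abs(c) for c in dft(segment)])


# Hypothetical one-pitch-length section whose energy starts at sample 2.
segment = [0.0, 0.0, 1.0, 0.6, 0.3, 0.1, 0.0, 0.0]
zero_phased = zero_phase(segment)
```

The zero-phased section is symmetric and always peaks at sample 0, wherever the energy sat in the original section. That is the mismatch between the pulse location and the pitch location that the text criticizes.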
Further, in J. IECE Jpn. Trans. A, vol. 62-t. No. 3, March 1979, "Function
and basic characteristics of SPAC" by Takasugi, the following method is
proposed: the auto-correlation function of a speech waveform is obtained,
a certain kind of zero-phasing operation is conducted on the speech
waveform, and each speech waveform section of a pitch length is coded. In
the decoding part, the decoded waveform sections are successively
concatenated to one another. Moreover, the operation of obtaining the
auto-correlation function is somewhat similar to performing a square
operation, so that the low frequency components with large energy are
emphasized, resulting in square-law distortion in the spectrum of the
processed signal. In this case, the zero-phasing serves to concentrate
energy in the form of a pulse in each pitch period of the auto-correlation
function, but the pulse location does not necessarily coincide with the
location where the energy in each pitch period of the speech waveform is
concentrated. Therefore, when the decoded waveform sections are
connected to one another to reconstruct a speech waveform, the
reconstructed speech waveform may be far from the original speech
waveform.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a speech signal processing
system which can maintain comparatively excellent speech quality even in
the case of a bit rate lower than 16 kb/s.
Another object of the present invention is to provide a speech signal
processing system which makes it possible to obtain a natural
characteristic when concatenating pieces of speech waveform.
According to the present invention, a speech waveform is, for example,
subjected to linear-predictive-analysis, and a short-time correlation of
the speech waveform is removed from the waveform by an inverse-filter so
as to obtain a prediction residual waveform. Then a filter coefficient
computing part
determines filter coefficients of a phase-equalizing (linear) filter which
has reverse phase characteristics to the short-time (for example, shorter
than a pitch period) phase characteristics of said prediction residual
waveform. The determined filter coefficients are set to a phase-equalizing
filter. The above-stated speech waveform or prediction residual waveform
is passed through the phase-equalizing filter so as to zero-phase, that
is, phase-equalize the prediction residual waveform components of said
speech waveform or said prediction residual waveform. This phase-equalized
prediction residual waveform (components) has a temporal energy
concentration in the form of an impulse in every pitch of the speech
waveform and the impulse position almost coincides with the pitch position
of the speech waveform (the portion where the energy is concentrated). For
example, the concatenation of the speech waveforms is accomplished at the
portions where the energy is not concentrated so as to obtain a speech
waveform having excellent naturalness. Furthermore, since the prediction
residual waveform (components) is phase-equalized instead of
phase-equalizing the speech waveform, the spectrum distortion caused
thereby can be made smaller.
Moreover, when the above-stated phase-equalized speech waveform or
prediction residual waveform is coded, efficient coding can be attained by
adaptively allocating more bits to, for example, the portions where the
energy is concentrated than elsewhere. In this case, it is possible to
obtain relatively excellent speech quality even with a bit rate less than
16 kb/s.
In addition, when the above-stated determination of filter coefficients
is performed adaptively, still better speech quality can be realized.
THEORY OF THE INVENTION
Now, the theory of the speech signal processing system according to the
present invention will be described. As described above, in the
conventional LPC vocoder, a pitch period and average electric power of a
residual waveform of a voiced sound are transmitted and on the decoding
side, a pulse train having the pitch period is generated and passed
through a prediction filter. Accordingly, the pitch positions of the
original speech waveform (the positions where the energy is concentrated
and much information is included) do not respectively correspond to the
pulse positions of a regenerated speech and thus the speech quality is
poor. On the other hand, in the present invention, the time axis of the
residual waveform within one pitch period is reversed at the pitch
position regarded as the time origin and sample values of the
time-reversed residual waveform are used as filter coefficients of a
phase-equalizing filter; therefore, the output of this phase-equalizing
filter is ideally made to be the impulses whose energy is concentrated on
the pitch positions of the speech waveform. Consequently, by passing the
output pulse train from the phase-equalizing filter through a prediction
filter, a waveform whose pitch positions agree with those of the original
speech waveform can be obtained, resulting in excellent speech quality.
Further, in the case where the speech waveform is passed through said
phase-equalizing filter, the residual waveform components are zero-phased
and thus the output of the filter has energy concentrated on each pitch
position of the speech waveform. Therefore, by allocating more information
bits to the residual waveform samples where energy is concentrated and
fewer information bits to the other portions, it is possible to enhance the
quality of decoded speech even when a small number of information bits are
used in total.
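The bit-allocation idea above can be illustrated with a minimal sketch. It is not the allocation rule of the embodiments: the greedy rule, the function name allocate_bits, the per-sub-interval energies, and the assumption that each added bit lowers quantization noise power by a factor of about 4 (roughly 6 dB) are all introduced here for illustration only.

```python
def allocate_bits(energies, total_bits):
    """Greedy allocation (illustrative assumption, not the patent's rule).

    Each bit goes to the sub-interval whose estimated remaining distortion,
    energy / 4**bits, is currently the largest.
    """
    bits = [0] * len(energies)
    for _ in range(total_bits):
        worst = max(range(len(energies)),
                    key=lambda j: energies[j] / 4 ** bits[j])
        bits[worst] += 1
    return bits


# Hypothetical energies of four sub-intervals within one pitch period;
# the second sub-interval contains the energy concentration (pulse).
energies = [0.1, 5.0, 0.5, 0.1]
bits = allocate_bits(energies, total_bits=8)
print(bits)
```

With such a rule the sub-interval around the pulse receives the largest share of the bit budget, while the total number of bits stays fixed.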
Next, the theory of the invention will be explained with reference to
formulas. Letting a sample value of the speech waveform be denoted by S(n)
and the prediction coefficients obtained by a linear-prediction-analysis of
the speech waveform by a(k) (k=1, 2, ..., p), a sample value e(n) of a
prediction residual waveform is given by the following equation;
##EQU1##
where a(0)=1. Since the residual waveform e(n) is one which is obtained by
removing the spectrum envelope components from the speech waveform, that
is, one obtained by removing the correlation between the sample values of
the speech waveform, the residual waveform has a flat spectrum envelope
and, in the case of voiced sound, has pitch period components of the
speech. Thus, the characteristics of this residual waveform are idealized
and expressed by the following pulse train;
##EQU2##
where δ(n) is the Kronecker delta function defined by δ(0)=1
and δ(n)=0 (n≠0). n_l represents a pulse position (i.e.
pitch position) and n_l - n_{l-1} corresponds to a pitch period of
the speech. Thus, this pulse train function e_M(n) has a pulse only
at each pitch position n_l and is zero at the other positions. Since
both the residual waveform e(n) and the pulse train e_M(n) have a
flat spectrum envelope and the same pitch period components, the
difference between both waveforms is based on the difference between the
phase-characteristics thereof in a short-time, that is, a time which is
shorter than the pitch period. Thus, representing an impulse response of a
linear-filter which has characteristics inverse to short-time phase
characteristics of the residual waveform by h(n), the following equation
(3) allows computation of the phase-equalized (zero-phased) residual
waveform e_p(n) which would be obtained by passing the residual
waveform e(n) through the linear-filter (phase-equalizing filter) to
phase-equalize all the spectrum components;
##EQU3##
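The equation images for (1) to (3) are not reproduced in this text. Read
from the surrounding definitions, they plausibly take the following form.
The summation limits are assumptions inferred from a(0)=1, from the later
single-pulse case with position n_0, and from the tap number M+1 of the
phase-equalizing filter.

```latex
\begin{align}
e(n)   &= \sum_{k=0}^{p} a(k)\, S(n-k), \qquad a(0) = 1 \tag{1} \\
e_M(n) &= \sum_{l=0}^{L-1} \delta(n - n_l) \tag{2} \\
e_p(n) &= \sum_{m=0}^{M} h(m)\, e(n-m) \tag{3}
\end{align}
```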
This impulse response h(m) can be given by minimizing the mean square
error between e_p(n) and e_M(n). The mean square error is given
by the following equation;
##EQU4##
By substituting equations (2) and (3) into equation (4), partially
differentiating the result with respect to h(m), and setting the
derivative to 0, the impulse response h(m) can be given as
a solution of the following simultaneous equations;
##EQU5##
where v(k) is an auto-correlation function and is computed by the
following equation;
##EQU6##
In the case where the time corresponding to the tap number M+1 of the
phase-equalizing filter, that is, the response time is shorter than the
pitch period, the auto-correlation function can be approximated by
v(k) ≈ v_0 δ(k) because the residual waveform has a flat spectrum. In
short, the auto-correlation function has a value only in the case of
k=0. Thus, equation (5) assumes a value only in the
case of m=k, and can be simplified as follows;
##EQU7##
Further, if the analysis window length N is shorter than a pitch period,
the value of L would be one, allowing only one pulse to be present. Thus,
the impulse response can be computed by the following equation;
##EQU8##
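The equation images for (4) to (8) are likewise missing here. A
reconstruction that is consistent with the derivation described in the text
is sketched below. The symbol J for the mean square error, the omission of
any normalizing factor, and the summation ranges are assumptions.

```latex
\begin{align}
J &= \sum_{n} \bigl\{ e_M(n) - e_p(n) \bigr\}^{2} \tag{4} \\
\sum_{k=0}^{M} h(k)\, v(m-k) &= \sum_{l=0}^{L-1} e(n_l - m),
  \qquad m = 0, 1, \dots, M \tag{5} \\
v(k) &= \sum_{n} e(n)\, e(n-k) \tag{6} \\
h(m) &= \frac{1}{v_0} \sum_{l=0}^{L-1} e(n_l - m) \tag{7} \\
h(m) &= \frac{1}{v_0}\, e(n_0 - m) \tag{8}
\end{align}
```

Equation (5) follows from setting the derivative of (4) with respect to h(m)
to zero after substituting (2) and (3). Equation (7) follows from (5) with
v(k) ≈ v_0 δ(k), and (8) is the case L=1.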
Thus, the impulse response h(m) is equivalent to one that is obtained by
reversing the residual waveform in the time domain at the time point
n.sub.0. Moreover, in case the power spectrum is completely white (the
amplitudes of all the frequency components are constant), the Fourier
transform of the impulse response h(m) can be expressed by the following
equation (9) in which the gain is normalized;
##EQU9##
where E(k) denotes a Fourier transform of the residual waveform e(n).
Accordingly, since the Fourier transform E_p(k) of the
phase-equalized residual waveform e_p(n) is E_p(k) = H(k)·E(k)
in the light of equation (3) and E(k) is
E(k) = |E(k)| exp{j arg E(k)}, the following equation is
obtained by substituting equation (9) into E_p(k);
##EQU10##
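The images for equations (9) and (10) are also not reproduced. Under the
assumptions that the transform is an (M+1)-point discrete Fourier transform
and that the gain is normalized to unity, the text is consistent with the
following form.

```latex
\begin{align}
H(k)   &= \exp\{-j \arg E(k)\}\,
          \exp\Bigl\{-j\,\frac{2\pi k n_0}{M+1}\Bigr\} \tag{9} \\
E_p(k) &= H(k)\, E(k)
        = |E(k)|\, \exp\Bigl\{-j\,\frac{2\pi k n_0}{M+1}\Bigr\} \tag{10}
\end{align}
```

The factor exp{-j arg E(k)} cancels the phase of E(k), so only the magnitude
and the linear phase term, which corresponds to a shift to the position n_0,
remain.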
From equation (10), it will be understood that the phase-equalized residual
waveform e_p(n) is one that is obtained by making the residual
waveform e(n) zero-phased (all spectrum components are made to have the
same zero phase) except for a linear phase component
exp{-j2πkn_0/(M+1)}. If it ideally holds that
|E(k)| = E_0 (constant), then e_p(n) has
zero phase and thus is a single pulse waveform. In summary, when the
residual waveform e(n) is passed through the phase-equalizing filter
having the filter coefficients h(m) as mentioned above, the output
waveform becomes one that has energy concentrated mainly at a pitch
position, that is, the output waveform takes a shape of a single pulse.
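The summary above can be checked numerically with a small sketch. It
assumes that equation (8) has the form h(m) = e(n_0 - m)/v_0 for
m = 0, ..., M, which is how the text describes it (the residual reversed in
time about n_0), and that the residual is zero outside the window
n_0 - M, ..., n_0. The function names and the toy residual values are
hypothetical.

```python
def phase_equalizer_taps(e, n0, M):
    """Taps h(m) = e(n0 - m) / v0 for m = 0..M (assumed form of eq. (8)).

    Samples at negative indices are treated as zero; v0 is the energy of
    the windowed residual.
    """
    window = [e[n0 - m] if n0 - m >= 0 else 0.0 for m in range(M + 1)]
    v0 = sum(x * x for x in window)
    if v0 == 0.0:
        raise ValueError("residual window has no energy")
    return [x / v0 for x in window]


def fir_filter(h, x):
    """y(n) = sum over m of h(m) * x(n - m), with x(n) = 0 for n < 0."""
    return [
        sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)
        for n in range(len(x))
    ]


# Toy residual: one dispersed pitch pulse occupying samples 5..10.
residual = [0.0] * 5 + [0.3, -0.7, 1.0, 0.5, -0.2, 0.4] + [0.0] * 8
n0, M = 10, 5

h = phase_equalizer_taps(residual, n0, M)
e_p = fir_filter(h, residual)

# Index of the largest-magnitude output sample.
peak = max(range(len(e_p)), key=lambda n: abs(e_p[n]))
print(peak, e_p[peak])
```

In this toy case the output at n_0 is the normalized energy of the window,
and every other output sample is a normalized correlation between the window
and a shifted copy of it, which is smaller in magnitude. The energy is thus
concentrated at the pitch position, as the text states.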
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram showing a speech signal processing system of the
present invention, particularly an example of the arrangement of an
adaptive phase-equalizing processing system.
FIG. 2 is a block diagram showing the internal arrangement example of a
pitch position detecting part 25 in FIG. 1.
FIG. 3 is a block diagram showing an example of a basic arrangement for
speech coding by utilizing the phase-equalizing processing.
FIG. 4 is a block diagram showing an example of an arrangement for
variable-rate tree-coding of a speech waveform.
FIG. 5 is an explanatory diagram in relation to the setting of
sub-intervals.
FIG. 6 is an explanatory diagram showing an arrangement for variable-rate
tree coding.
FIGS. 7A to 7G are diagrams showing the waveform examples at respective
parts in the speech signal processing system.
FIG. 8 is a block diagram showing an example of an arrangement of a speech
signal multi-pulse-coding utilizing the phase-equalizing processing.
FIG. 9 is a block diagram showing an example of an arrangement of a speech
analysis-synthesizing system on the basis of a zero-phased residual
waveform.
FIG. 10 is a block diagram showing an example of an arrangement of a speech
analysis-synthesizing system utilizing the phase-equalizing processing.
FIG. 11 is a block diagram showing another arrangement of the speech
analysis-synthesizing system.
FIG. 12 is a graph showing comparison in effects of quantization of samples
neighboring the pulse depending on the presence or absence of the
phase-equalization.
FIG. 13 is a graph showing comparison in quantization performance between
the embodiment shown in FIG. 10 and a tree coding of an ordinary vector
unit.
FIG. 14 is a graph showing comparison in quantization performance between
the embodiment shown in FIG. 11 and an ordinary adaptive
transform-coding method utilizing vector quantization.
FIGS. 15A to 15E are diagrams respectively showing examples of waveforms in
the process of obtaining filter coefficients h(m,n) in FIG. 1.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Next, a concrete embodiment of the speech signal processing system of this
invention will be described with reference to FIG. 1. Sample values S(n)
of a speech waveform are inputted at an input terminal 11 and are supplied
to a linear prediction analysis part 21 and an inverse-filter 22. The
linear prediction analysis part 21 serves to compute prediction
coefficients a(k) in equation (1) on the basis of a speech waveform S(n)
by means of the linear prediction analysis. The prediction coefficients
a(k) are set as filter coefficients of the inverse-filter 22. Thus, the
inverse-filter 22 serves to accomplish a filtering operation expressed by
equation (1) on the basis of the input of the speech waveform S(n) and
then to output a prediction residual waveform e(n), which is identical
to the waveform obtained by removing from the input speech
waveform its short-time correlation (correlation among sample values).
This prediction residual waveform e(n) is supplied to a
voiced/unvoiced sound discriminating part 24, a pitch position detecting
part 25 and a filter coefficient computing part 26 in a filter coefficient
determining part 23. The voiced/unvoiced sound discriminating part 24
serves to obtain an | | |