WikiPatents - Community Patent Review
Speech signal processing system    
United States Patent 4850022
Inventor(s): Honda; Masaaki (Kodaira, JP); Moriya; Takehiro (Soka, JP)
Abstract: A speech signal processing system in which an inverse-filter removes the short-time correlation from the sample values of a speech waveform to obtain sample values of a prediction residual waveform. Phase-equalizing filter coefficients are determined so as to have a phase characteristic inverse to that of the prediction residual waveform at each pitch position of the speech waveform, and are set as the filter coefficients of a phase-equalizing filter. The speech waveform or the prediction residual waveform is passed through the phase-equalizing filter, thereby zero-phasing the prediction residual waveform (or the prediction residual waveform component in the speech waveform) and concentrating its energy around the pitch position.
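The central step of the abstract can be illustrated with the simplest filter whose phase characteristic is inverse to that of the residual around one pitch position: the time-reversed residual segment e(n_l + M/2 - m), m = 0, ..., M, which is the form claim 2 names for h*(m, n_l). Up to a pure delay, its frequency response is the complex conjugate of the segment's spectrum, so the cascade has zero phase and the filter output is the segment's autocorrelation, peaking at n_l + M/2. This is a minimal sketch; the function names, the unit-energy scaling and the test kernel are assumptions, not taken from the patent.

```python
import math

def matched_phase_eq(e, nl, M):
    """Taps h(m) = e(nl + M/2 - m), m = 0..M: the residual segment around pitch
    position nl, time-reversed. Unit-energy scaling is an assumption of this sketch."""
    taps = [e[nl + M // 2 - m] for m in range(M + 1)]
    norm = math.sqrt(sum(t * t for t in taps)) or 1.0
    return [t / norm for t in taps]

def fir(x, h):
    """Causal FIR filter with zero initial state: y(n) = sum_m h(m) x(n - m)."""
    return [sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0) for n in range(len(x))]

# A dispersed (non-zero-phase) residual pulse whose largest sample sits at nl = 30.
kernel = [0.3, -0.6, 1.0, 0.7, -0.4]
e = [0.0] * 60
for j, k in enumerate(kernel):
    e[28 + j] = k

h = matched_phase_eq(e, nl=30, M=8)
ep = fir(e, h)                      # phase-equalized residual: symmetric, energy-compacted
peak = max(range(len(ep)), key=lambda n: ep[n])
print(peak)  # 34, i.e. nl + M/2
```

The output around the peak is symmetric (ep(34 + s) = ep(34 - s)), which is the zero-phase property the abstract relies on.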
Title Information
Owner/Assignee: Nippon Telegraph and Telephone Public Corporation (Tokyo, JP)
Publication Date: July 18, 1989
Application Number: 07/255,566
Filing Date: October 11, 1988
US Classification: 704/207; 704/214
Int'l Classification: G10L 005/00
Examiner: Harkcom; Gary V.
Assistant Examiner: Anderson; Lawrence E.
Attorney/Law Firm: Pollock, Vande Sande and Priddy
Parent Case: This application is a continuation of Ser. No. 712,811, filed on Mar. 18, 1985, now abandoned.
Priority Data: Mar. 21, 1984 [JP] 59-53757; Aug. 20, 1984 [JP] 59-173903
USPTO Field of Search: 381/36; 381/37; 381/38; 381/39; 381/40; 381/51; 364/513.5

References

U.S. References
Patent No.   Inventor   US Class   Date
4742550      Fette      704/219    May 1988
4672670      Wang       704/217    Jun 1987
4561102      Prezas     704/207    Dec 1985
4472832      Atal       704/221    Sep 1984
4458110      Mozer      704/211    Jul 1984
4214125      Mozer      704/268    Jul 1980
Claims

What is claimed is:

1. A speech signal processing system comprising:

an input terminal for receiving successive sample values of a speech waveform S(n) at successive time points n, where n=0, 1, 2, . . . ;

inverse-filter means connected to said input terminal for obtaining successive sample values of a prediction residual waveform e(n) by removing a short-time correlation from the speech waveform S(n);

phase-equalizing filter means connected to said input terminal for receiving the speech waveform S(n) therefrom and producing successive samples of a phase-equalized speech waveform Sp(n) in the time domain by zero-phasing a prediction residual waveform component in the speech waveform in accordance with successive sets of M+1 phase-equalizing filter coefficients h(m,n) supplied thereto as filter coefficients thereof, where m=0, 1, 2, . . . , M, and M is a positive integer; and

filter coefficient determining means connected to the output of said inverse-filter means for determining said phase-equalizing filter coefficients h(m,n) on the basis of said prediction residual waveform e(n), said filter coefficient determining means including voiced/unvoiced sound discriminator means connected to the output of said inverse-filter means for discriminating whether said speech waveform is a voiced sound or an unvoiced sound based on whether a computed value of an auto-correlation function on said prediction residual waveform during an analysis window of a length N at said filter coefficient determining means is above or below a threshold value, pitch position detecting means connected to the outputs of said inverse-filter means and said voiced/unvoiced sound discriminator means for detecting, when said speech waveform is discriminated as a voiced sound, pitch positions n_l from said prediction residual waveform e(n), and filter coefficient computing means connected to the outputs of said inverse-filter means, said voiced/unvoiced sound discriminator means and said pitch position detecting means, respectively, for computing, when said speech waveform is discriminated as a voiced sound, a set of the M+1 phase-equalizing filter coefficients h(m,n) for a time point n of each pitch position n=n_l by solving the following simultaneous equations given for k=0, 1, . . . , M, ##EQU26## where L is the number of the pitch positions n_l in the analysis window and V(m) is an auto-correlation function of said prediction residual waveform e(n) given by: ##EQU27## and for setting, when said speech waveform is discriminated as an unvoiced sound, a particular one order of coefficient of said phase-equalizing filter coefficients to a certain value and the other orders thereof to zero;

the output of said filter coefficient determining means being connected to said phase-equalizing filter means so that successive sets of said phase-equalizing filter coefficients h(m,n_l) determined by said filter coefficient determining means are supplied to said phase-equalizing filter means as the filter coefficients thereof, whereby said phase-equalizing filter means outputs the phase-equalized speech waveform Sp(n) as the output of said system representing the input speech waveform.

2. The speech signal processing system according to claim 1 wherein the analysis window length N is selected comparable to a pitch period so that the number L of said pitch positions n_l is one, and said filter coefficient computing means computes filter coefficients h*(m,n_l) instead of the coefficients h(m,n_l) when the speech waveform is discriminated as a voiced sound by said voiced/unvoiced sound discriminating means, where ##EQU28## and e(n_l+M/2-m) denotes a sample value of said prediction residual waveform at the pitch position n_l.

3. The speech signal processing system according to claim 1 or 2 wherein said pitch position detecting means comprises a second phase-equalizing filter means connected to the output of said inverse-filter means for phase-equalizing said prediction residual waveform e(n) from said inverse-filter means to produce a phase-equalized prediction residual waveform ep(n), filter coefficients of said second phase-equalizing filter means being controlled by the phase-equalizing filter coefficients determined by said filter coefficient determining means, and amplitude comparing means connected to the output of said second phase-equalizing filter means for detecting, as the pitch positions, time points at which relative amplitude values of the phase-equalized prediction residual waveform ep(n) within the analysis window are over a predetermined value.

4. The speech signal processing system according to claim 3 wherein said system further comprises:

pulse-processing means for detecting an amplitude m_l of said phase-equalized prediction residual waveform ep(n) at the pitch position n_l obtained by said pitch position detecting means; and

quantizing means connected to the output of said pulse-processing means for quantizing said detected pulse amplitude and producing quantized pulse amplitude c(n);

the quantized pulse amplitude c(n), the pitch position n_l, a voiced or unvoiced sound discriminating value from said discriminator means and filter coefficients a(k) of said inverse-filter means being output as the output of the system representing the input speech signal.

5. The speech signal processing system according to claim 4 wherein said quantizing means comprises quantization step computing means connected to the output of said phase-equalizing filter means for computing the electric power v of said phase-equalized prediction residual waveform ep(n) supplied from said phase-equalizing filter means and a quantization step size from the computed electric power v, and adaptively varying a quantization step size of said quantizing means in accordance with the computed step size, the electric power of said phase-equalized prediction residual waveform being output as part of the output of said system representing the input speech waveform.

6. The speech signal processing system according to claim 1 or 2 wherein said filter coefficient determining means comprises filter coefficient interpolating means connected to the output of said filter coefficient computing means for interpolating the phase-equalizing filter coefficients for a time point between the computations of two successive sets of the phase-equalizing filter coefficients by said filter coefficient computing means so that the output of said filter coefficient determining means includes the interpolated phase-equalizing filter coefficients.

7. The speech signal processing system according to claim 1 or 2 wherein said system includes coding-processing means connected to the output of said phase-equalizing filter means for coding said phase-equalized speech waveform and outputting the coded phase-equalized speech waveform as the output of said system representing the input speech waveform.

8. The speech signal processing system according to claim 7 wherein said coding-processing means comprises:

a second phase-equalizing filter means connected to the output of said inverse-filter means for receiving therefrom the prediction residual waveform e(n) and producing a phase-equalized prediction residual waveform ep(n) in accordance with the phase-equalizing filter coefficients h(m,n_l) supplied from said filter coefficient determining means as filter coefficients of said second phase-equalizing filter means;

tree code generating means connected to the output of said second phase-equalizing filter means for producing a series of sample values q(n) along a path of successive branches in a tree of codes defined in accordance with quantizing bit numbers R(n) for quantization of the phase-equalized prediction residual waveform ep(n), said path of successive branches being selected in accordance with a sequence of tree codes c(n);

prediction filter means connected to the output of said tree code generating means for receiving therefrom the sample values q(n) and producing a local decoded speech waveform Sp(n), said prediction filter means being controlled by the same filter coefficients as those of said inverse-filter means;

difference detecting means connected to the outputs of said first mentioned phase-equalizing filter means and said second phase-equalizing filter means for detecting the difference between said phase-equalized speech waveform Sp(n) and the local decoded speech waveform Sp(n); and

code sequence optimizing means connected to said tree code generating means for generating and supplying thereto sequences of tree codes, said code sequence optimizing means being connected to the output of said difference detecting means for receiving therefrom the detected difference and searching an optimum sequence of the tree codes which minimizes the detected difference produced by said difference detecting means;

the optimum code sequence c(n) obtained by said code sequence optimizing means and the filter coefficients for said inverse-filter means being outputted as the coded phase-equalized speech waveform.

9. The speech signal processing system according to claim 8 wherein said tree code generating means comprises:

subinterval setting means connected to the output of said second phase-equalizing filter means for receiving therefrom the phase-equalized prediction residual waveform ep(n) and determining an energy-concentrated position Td and a pitch period Tp of the phase-equalized prediction residual waveform and corresponding residual power u_i of each subinterval within the pitch period from the phase-equalized prediction residual waveform;

bit allocating means connected to the output of said subinterval setting means for receiving therefrom the residual power u_i and computing the quantizing bit number R(n) as the number of branches at each node in said tree code based on the residual power u_i, said number of branches representing the number of bits to be allocated to encode samples of the phase-equalized prediction residual waveform in the corresponding subinterval; and

step size computing means connected to the output of said subinterval setting means for receiving therefrom the residual power u_i and computing, based on the residual power, a quantization step size Δ(n) for quantizing the phase-equalized prediction residual waveform;

said tree of codes being defined by the computed number of branches R(n) at each node of the tree and said tree code generating means being operative to produce the sample value q(n) as a decoded value from the computed step size Δ(n) and the tree code c(n) on each selected branch, and the pitch period Tp, the pitch position Td and the residual power u_i being outputted in codes from said coding-processing means as the output of said system representing the input speech waveform.

10. The speech signal processing system according to claim 7 wherein said coding-processing means comprises:

multi-pulse coding means connected to said filter coefficient determining means for determining pulse positions t_i and pulse amplitudes m_i with respect to the pitch position n_l received from said filter coefficient determining means;

multi-pulse generating means connected to the output of said multi-pulse coding means for receiving therefrom the pulse positions t_i and the pulse amplitudes m_i and generating a multi-pulse signal e(n) composed of a train of pulses having the amplitudes m_i at the respective pulse positions t_i;

prediction filter means connected to the output of said multi-pulse coding means for producing a local decoded waveform Sp(n) by passing said multi-pulse signal through said prediction filter means while said prediction filter means is controlled by the same filter coefficients as those for said inverse-filter means; and

difference detecting means connected to the outputs of said first mentioned phase-equalizing filter means and said second phase-equalizing filter means for receiving therefrom said phase-equalized speech waveform Sp(n) and said local decoded waveform Sp(n) and detecting the difference therebetween;

the output of said difference detecting means being connected to said multi-pulse coding means to supply thereto the detected difference, and said multi-pulse coding means determining the pulse positions t_i and the pulse amplitudes m_i so as to minimize the detected difference and being operative to output, as part of the coded speech waveform, the determined pulse positions t_i and pulse amplitudes m_i along with the filter coefficients a(k).

11. A speech signal processing system comprising:

an input terminal for receiving successive sample values of a speech waveform S(n) at successive time points n, where n=0, 1, 2, . . . ;

inverse-filter means connected to said input terminal for obtaining successive sample values of a prediction residual waveform e(n) by removing a short-time correlation from the speech waveform S(n);

phase-equalizing filter means connected to the output of said inverse-filter means for obtaining a phase-equalized residual waveform ep(n) in the time domain by zero-phasing the prediction residual waveform e(n) from said inverse-filter means in accordance with successive sets of M+1 phase-equalizing filter coefficients h(m,n) supplied thereto as filter coefficients thereof, where m=0, 1, 2, . . . , M and M is a positive integer; and

filter coefficient determining means connected to the output of said inverse-filter means for determining said phase-equalizing filter coefficients h(m,n) on the basis of said prediction residual waveform e(n), said filter coefficient determining means including voiced/unvoiced sound discriminator means connected to the output of said inverse-filter means for discriminating whether said speech waveform is a voiced sound or an unvoiced sound based on whether a computed value of an auto-correlation function on said prediction residual waveform during an analysis window of a length N at said filter coefficient determining means is above or below a threshold value, pitch position detecting means connected to the outputs of said inverse-filter means and said voiced/unvoiced sound discriminator means for detecting, when said speech waveform is discriminated as a voiced sound, pitch positions n_l from said prediction residual waveform e(n), and filter coefficient computing means connected to the outputs of said inverse-filter means, said voiced/unvoiced sound discriminator means and said pitch position detecting means, respectively, for computing, when said speech waveform is discriminated as a voiced sound, a set of the M+1 phase-equalizing filter coefficients h(m,n) for a time point n of each pitch position n=n_l by solving the following simultaneous equations given for k=0, 1, . . . , M, ##EQU29## where L is the number of the pitch positions n_l in the analysis window and V(m) is an auto-correlation function of said prediction residual waveform e(n) given by: ##EQU30## and for setting, when said speech waveform is discriminated as an unvoiced sound, a particular one order of coefficient of said phase-equalizing filter coefficients to a certain value and the other orders thereof to zero;

the output of said filter coefficient determining means being connected to said phase-equalizing filter means so that successive sets of said phase-equalizing filter coefficients h(m,n_l) determined by said filter coefficient determining means are supplied to said phase-equalizing filter means as filter coefficients thereof, whereby said phase-equalizing filter means outputs the phase-equalized prediction residual waveform ep(n) as the output of said system representing the input speech waveform.

12. The speech signal processing system according to claim 11 wherein the analysis window length N is selected comparable to a pitch period so that the number L of said pitch positions n_l is one, and said filter coefficient computing means computes filter coefficients h*(m,n_l) instead of the coefficients h(m,n_l) when the speech waveform is discriminated as a voiced sound by said voiced/unvoiced sound discriminating means, where ##EQU31## and e(n_l+M/2-m) denotes a sample value of said prediction residual waveform at the pitch position n_l.

13. The speech signal processing system according to claim 11 or 12 wherein said pitch position detecting means comprises a second phase-equalizing filter means connected to the output of said inverse-filter means for phase-equalizing the prediction residual waveform e(n) from said inverse-filter means to produce a phase-equalized prediction residual waveform ep(n), filter coefficients of said second phase-equalizing filter means being controlled by the phase-equalizing filter coefficients determined by said filter coefficient determining means, and amplitude comparing means connected to the output of said second phase-equalizing filter means for detecting, as the pitch positions, time points at which relative amplitude values of the phase-equalized prediction residual waveform ep(n) within the analysis window are over a predetermined value.

14. The speech signal processing system according to claim 11 or 12 wherein said filter coefficient determining means comprises filter coefficient interpolating means connected to the output of said filter coefficient computing means for interpolating the phase-equalizing filter coefficients for a time point between the computations of two successive sets of the phase-equalizing filter coefficients by said filter coefficient computing means so that the output of said filter coefficient determining means includes the interpolated phase-equalizing filter coefficients.

15. The speech signal processing system according to claim 11 wherein said system further comprises coding-processing means connected to the output of said phase-equalizing filter means for coding the phase-equalized prediction residual waveform and outputting the coded phase-equalized prediction residual waveform as the output of said system representing the input speech waveform.

16. The speech signal processing system according to claim 15 wherein said coding processing means includes energy-concentrated portion coding means connected to the output of said phase-equalizing filter means for detecting a position t_i of each energy-concentrated portion in said phase-equalized residual waveform and coding the energy-concentrated portion to produce a code Pc representing the energy-concentrated portion, the code of the energy-concentrated portion Pc and a code showing the energy-concentrated position t_i being outputted along with codes of said filter coefficients a(k) of said inverse-filter means as the output of said system representing the input speech waveform.

17. The speech signal processing system according to claim 16 wherein said energy-concentrated portion coding means comprises pulse pattern generating means for reproducing a pulse pattern signal P(n) composed of a train of the energy-concentrated portions each centered at the respective energy-concentrated positions t_i of said phase-equalized prediction residual waveform, and said coding processing means further comprises difference signal coding means connected to the output of said energy-concentrated portion coding means for generating a difference code c(n) representing a difference between said pulse pattern signal P(n) and said phase-equalized prediction residual waveform, said difference code c(n) being outputted as part of the output of said system representing the input speech waveform.

18. The speech signal processing system according to claim 17 wherein said pulse pattern generating means produces the pulse pattern signal P(n) by vector-quantizing a waveform of plural samples of each said energy-concentrated portion.

19. The speech signal processing system according to claim 17 wherein said difference signal coding means comprises subtraction means connected to the outputs of said phase-equalizing filter means and said pulse pattern generating means for receiving the phase-equalized prediction residual waveform ep(n) and the pulse pattern signal P(n) and producing a difference therebetween as a difference signal V(n), and spectrum quantizing means connected to the output of said subtraction means for quantizing frequency components of said difference signal V(n) to produce a spectrum envelope code as the difference code c(n) representing said difference signal.

20. The speech signal processing system according to claim 17 wherein said difference signal coding means comprises vector code generating means for producing said difference code c(n) and a decoded vector value Vc(n) based on said difference code c(n), adder means connected to the outputs of said pulse pattern generating means and said vector code generating means for adding said pulse pattern signal P(n) and said decoded vector value Vc(n) received therefrom to produce a local decoded residual waveform ep(n), first prediction filter means connected to the output of said adder means for receiving therefrom the local decoded residual waveform ep(n) and producing a local decoded speech waveform Sp(n) by controlling filter coefficients of said prediction filter means with the same filter coefficients as those for said inverse-filter means, second prediction filter means connected to the output of said phase-equalizing filter means for regenerating a phase-equalized speech waveform Sp(n) from said phase-equalized prediction residual waveform ep(n), subtraction means connected to the outputs of said first and second prediction filter means for producing a difference between said regenerated phase-equalized speech waveform Sp(n) and said local decoded speech waveform Sp(n), and path search means connected to receive the difference and to control successive selections of said difference codes in said vector code generating means so that said difference becomes minimum.

21. The speech signal processing system according to claim 17 wherein said difference signal coding means comprises means for determining as the difference code c(n) a code of an optimum vector-tree value Vc(n) representing the difference between said phase-equalized residual waveform and said pulse pattern signal P(n).

22. The speech signal processing system according to claim 17 wherein said difference signal coding means comprises means for quantizing frequency components of the difference between said phase-equalized residual waveform and said pulse pattern signal and outputting the quantized results as the difference code c(n).
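Claims 3 to 5 describe detecting pitch positions by comparing relative amplitudes of the phase-equalized residual ep(n), measuring the pulse amplitude at each pitch position, and quantizing it with a step size derived from the residual power v. The following is a minimal sketch under stated assumptions: a uniform mid-tread quantizer, a step size of alpha * sqrt(v), and a rule that merges detections closer than `min_gap` samples. All function names, thresholds and constants are hypothetical.

```python
import math

def detect_pitch_positions(ep, threshold=0.5, min_gap=1):
    """Amplitude comparator: time points whose |ep(n)| relative to the window's largest
    |ep| exceeds `threshold`. Candidates closer than `min_gap` samples are merged,
    keeping the larger one (the merging rule is an assumption)."""
    peak = max((abs(v) for v in ep), default=0.0)
    if peak == 0.0:
        return []
    positions = []
    for n, v in enumerate(ep):
        if abs(v) / peak <= threshold:
            continue
        if positions and n - positions[-1] < min_gap:
            if abs(v) > abs(ep[positions[-1]]):
                positions[-1] = n          # keep the stronger of two nearby candidates
        else:
            positions.append(n)
    return positions

def step_from_power(v, alpha=0.5):
    """Quantization step derived from the residual power v; alpha is hypothetical."""
    return alpha * math.sqrt(v) if v > 0 else 1.0

def encode_pulses(ep, threshold=0.5, min_gap=3, bits=4):
    """Return pitch positions n_l, quantized pulse amplitudes c and the power v."""
    v = sum(x * x for x in ep) / len(ep)           # power of the phase-equalized residual
    step = step_from_power(v)
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    positions = detect_pitch_positions(ep, threshold, min_gap)
    codes = [max(lo, min(hi, round(ep[n] / step))) for n in positions]
    return positions, codes, v

def decode_pulses(positions, codes, v, length):
    """Rebuild a pulse train with the dequantized amplitudes at the pitch positions."""
    step = step_from_power(v)
    out = [0.0] * length
    for n, c in zip(positions, codes):
        out[n] = c * step
    return out
```

Because v is transmitted (claim 5), the decoder can recompute the same step size without any side information beyond the codes.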
Description

BACKGROUND OF THE INVENTION

The present invention relates to a speech signal processing system in which a prediction residual waveform is obtained by removing the short-time correlation from a speech waveform, and the prediction residual waveform is used, for example, for coding the speech waveform.

Prior art speech signal coding systems fall into two classes: waveform coding systems and analysis-synthesis systems (vocoders). In a linear predictive coding (LPC) vocoder, which belongs to the latter class, the coefficients of an all-pole filter (prediction filter) representing the speech spectrum envelope are obtained by linear prediction analysis of the input speech waveform. The input speech waveform is then passed through an all-zero filter (inverse-filter), whose characteristic is the inverse of that of the prediction filter, to obtain a prediction residual waveform. A parameter extracting part extracts the parameters characterizing the residual waveform, namely its periodicity (voiced/unvoiced discrimination), its pitch period and its average power, and these parameters are sent out together with the prediction filter coefficients. In the synthesizing part, an excitation source generating part outputs, in place of the prediction residual waveform, a train of periodic pulses at the received pitch period for a voiced sound, or a noise waveform for an unvoiced sound. This excitation is supplied to a prediction filter whose coefficients are set to the received filter coefficients, and the prediction filter outputs the speech waveform.
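The analysis and synthesis chain described above can be sketched in Python. This is a minimal, textbook-style sketch assuming the autocorrelation method of linear prediction solved by the Levinson-Durbin recursion, a standard choice that this patent does not itself specify; the function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Short-time autocorrelation r(k) = sum_n x(n) x(n+k), k = 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve the LPC normal equations for predictor coefficients a(1..p), where S(n)
    is predicted as sum_k a(k) S(n-k). Returns (a, final prediction error)."""
    a = [0.0] * (p + 1)                      # a[0] is unused
    err = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err   # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def inverse_filter(s, a):
    """All-zero inverse-filter: e(n) = S(n) - sum_k a(k) S(n-k), zero initial state."""
    return [s[n] - sum(a[k] * s[n - k - 1] for k in range(len(a)) if n - k - 1 >= 0)
            for n in range(len(s))]

def prediction_filter(e, a):
    """All-pole prediction (synthesis) filter: S(n) = e(n) + sum_k a(k) S(n-k)."""
    s = []
    for n in range(len(e)):
        s.append(e[n] + sum(a[k] * s[n - k - 1] for k in range(len(a)) if n - k - 1 >= 0))
    return s

# A first-order decaying signal: the order-1 predictor should recover a(1) close to 0.9.
s = [0.9 ** n for n in range(200)]
a, _ = levinson_durbin(autocorr(s, 1), 1)
print(round(a[0], 3))  # 0.9
```

Because the prediction filter is the exact inverse of the inverse-filter, passing e(n) back through it restores S(n). A vocoder instead drives the prediction filter with a pulse train or noise, which is the source of the quality loss discussed below.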

On the other hand, in an adaptive predictive coding (APC) system, which belongs to the former class (waveform coding), a prediction residual waveform is obtained in the same manner as in the vocoder, and the sample values of this residual waveform are directly quantized (coded) and sent out along with the prediction filter coefficients. In the synthesizing part, the received coded residual waveform is decoded and supplied to a prediction filter whose coefficients are set to the received prediction filter coefficients, and the prediction filter generates the speech waveform.
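A minimal sketch of residual quantization in an APC-style coder, assuming a uniform quantizer with a fixed step and prediction coefficients a(k) already known at both ends. The encoder here predicts from previously decoded samples (closed-loop), a common design choice that keeps the decoder's prediction filter in step with the encoder; in an open-loop arrangement that quantizes the residual of the original samples, quantization error can accumulate through the prediction filter. Function names are hypothetical.

```python
def apc_encode(s, a, step):
    """Closed-loop encoder: quantize e(n) = S(n) - prediction from previously decoded samples."""
    p = len(a)
    codes, recon = [], []
    for n in range(len(s)):
        pred = sum(a[k] * recon[n - k - 1] for k in range(p) if n - k - 1 >= 0)
        code = round((s[n] - pred) / step)   # uniform mid-tread quantizer index
        codes.append(code)
        recon.append(pred + code * step)      # track exactly what the decoder will produce
    return codes

def apc_decode(codes, a, step):
    """Decoder: dequantize the residual and pass it through the all-pole prediction filter."""
    p = len(a)
    out = []
    for n, code in enumerate(codes):
        pred = sum(a[k] * out[n - k - 1] for k in range(p) if n - k - 1 >= 0)
        out.append(pred + code * step)
    return out
```

With this arrangement the reconstruction error per sample never exceeds half the step size, so quality is governed directly by the step size, that is, by the number of quantizing bits spent on each residual sample.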

The difference between these two conventional systems lies in how the prediction residual waveform is coded. The LPC vocoder achieves a much larger reduction in bit rate than the APC system, which transmits a quantized value for every sample of the residual waveform, because the LPC vocoder needs to transmit only the characterizing parameters of the residual (periodicity, pitch period and average power). However, the LPC vocoder cannot avoid the degradation in speech quality caused by replacing the residual waveform with a pulse train or noise, which produces a so-called mechanical synthesized voice. Even if the bit rate is increased, the improvement in quality saturates at about 6 kb/s, so the LPC vocoder cannot provide natural voice quality. Another cause of the quality loss is that, for lack of information indicating each pitch position, the timing for controlling the prediction filter coefficients cannot be suitably aligned with each pulse position (phase) in the pulse train supplied to the prediction filter. The LPC vocoder also suffers quality loss when erroneous characterizing parameters are extracted from the residual waveform. The APC system, by contrast, can bring the speech quality very close to that of the original speech by increasing the number of quantizing bits for the residual waveform, but when the bit rate is lowered below 16 kb/s, quantization distortion increases and the speech quality degrades abruptly.

Moreover, in the prior art systems, operations such as altering the pitch of a speech signal or concatenating speech signal frames may happen to be carried out at time locations where the signal energy is concentrated, resulting in unnatural speech.

Furthermore, the prior art, as disclosed in U.S. Pat. No. 4,214,125 (F. S. MOZER, "Method and apparatus for speech synthesizing") and U.S. Pat. No. 3,892,919 (A. ICHIKAWA, "Speech synthesizing system"), has proposed the following procedure. Samples in each waveform section of one pitch length cut out from a speech waveform are Fourier transformed, and the resulting sine components are set to zero, that is, the phase of each harmonic component is set to zero. The result is inverse Fourier transformed to zero-phase the cut-out speech waveform, thereby temporally concentrating the signal energy into a pulse-like waveform. Each zero-phased waveform of pitch length is coded. In the synthesizing part, the codes are decoded and the zero-phased waveform sections, each one pitch period long, are concatenated to restore the speech waveform. In this method, erroneous extraction of the pitch period greatly affects the speech quality, and the zero-phasing applied directly to the speech waveform causes processing distortion. Moreover, the location of the energy concentration (pulse) produced by zero-phasing has nothing to do with the location where the energy of the original speech waveform is comparatively concentrated within each pitch length, that is, the pitch location. Consequently, the speech waveform restored by concatenating the zero-phased sections differs greatly from the original speech waveform, and excellent speech quality cannot be obtained.
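A minimal sketch of the zero-phasing step, assuming one reading of the procedure: each DFT bin keeps its magnitude and its phase is set to zero, which for a real-valued section yields a real, even waveform. The function names and the test segment are hypothetical, and a plain O(N^2) DFT keeps the sketch dependency-free.

```python
import cmath
import math

def dft(x):
    """Plain discrete Fourier transform X(k) = sum_n x(n) exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]

def zero_phase(segment):
    """Keep the magnitude of every DFT bin, set its phase to zero, and inverse-transform.
    Since |X(k)| = |X(N-k)| for real input, the inverse DFT reduces to a cosine sum:
    the result is real, even (y(n) = y(N-n)) and has its maximum at n = 0."""
    N = len(segment)
    mags = [abs(X) for X in dft(segment)]
    return [sum(mags[k] * math.cos(2 * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

seg = [0.2, -0.4, 1.0, 0.6, -0.3, 0.1, -0.2, 0.05]   # one hypothetical pitch-length section
y = zero_phase(seg)
```

The pulse always lands at n = 0 of the cut-out section, regardless of where the energy of the original pitch period was concentrated (index 2 in this segment), which is the mismatch criticized above. The section's energy is unchanged, since only phases were altered.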

Further, in J. IECE Jpn. Trans. A, vol. 62-t. No. 3, March 1979, "Function and basic characteristics of SPAC" by Takasugi, the following method is proposed: the auto-correlation function of a speech waveform is obtained, a kind of zero-phasing operation is applied to the speech waveform, and each pitch-length section of the speech waveform is coded. In the decoding part, the decoded waveform sections are concatenated one after another. However, obtaining the auto-correlation function is akin to a squaring operation, so the low-frequency components with large energy are emphasized, producing square-law distortion in the spectrum of the processed signal. Moreover, although this zero-phasing concentrates energy into a pulse in each pitch period of the auto-correlation function, the pulse location does not necessarily coincide with the location where the energy of the speech waveform is concentrated in each pitch period. Therefore, when the decoded waveform sections are connected to one another to reconstruct a speech waveform, the reconstructed waveform may be far from the original speech waveform.
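The square-law effect follows from the Wiener-Khinchin relation: the DFT of a circular auto-correlation equals the squared magnitude spectrum of the signal. The short sketch below checks this numerically; the test signal and the helper names are made up for illustration and are not part of the SPAC method itself.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def circular_autocorrelation(x):
    """r(k) = sum_n x(n) x(n + k), indices taken modulo N."""
    N = len(x)
    return [sum(x[n] * x[(n + k) % N] for n in range(N)) for k in range(N)]


x = [1.0, 0.6, -0.2, -0.7, -0.4, 0.3, 0.5, 0.1]
X = dft(x)
R = dft(circular_autocorrelation(x))
# Each spectral component of the auto-correlation is the squared magnitude
# of the corresponding component of x, so strong components are emphasized.
for k in range(len(x)):
    print(k, round(abs(X[k]), 3), round(R[k].real, 3))
```

A component ten times stronger than another in x is a hundred times stronger in the auto-correlation, which is the emphasis of the high-energy components referred to above.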

SUMMARY OF THE INVENTION

An object of the present invention is to provide a speech signal processing system which can maintain comparatively excellent speech quality even at bit rates lower than 16 kb/s.

Another object of the present invention is to provide a speech signal processing system which makes it possible to obtain a natural characteristic in the case of concatenating pieces of, for example, subjected to linear-predictive-analysis and a short-time correlation of the speech waveform is removed from the waveform by an inverse-filter so as to obtain a prediction residual waveform. Then a filter coefficient computing part determines the filter coefficients of a phase-equalizing (linear) filter whose phase characteristics are the reverse of the short-time (for example, shorter than a pitch period) phase characteristics of the prediction residual waveform. The determined filter coefficients are set in the phase-equalizing filter. The speech waveform or the prediction residual waveform is passed through the phase-equalizing filter so as to zero-phase, that is, phase-equalize, the prediction residual waveform components of the speech waveform or the prediction residual waveform itself. The phase-equalized prediction residual waveform (components) has its energy concentrated in time in the form of an impulse in every pitch period of the speech waveform, and the impulse position almost coincides with the pitch position of the speech waveform (the portion where its energy is concentrated). Thus, for example, speech waveforms can be concatenated at portions where the energy is not concentrated, yielding a natural-sounding speech waveform. Furthermore, since it is the prediction residual waveform (components) rather than the speech waveform itself that is phase-equalized, the resulting spectral distortion can be made smaller.
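The text only says that waveforms are joined at the portions where the energy is not concentrated. One hypothetical way to pick such points, assuming the pitch positions of the phase-equalized waveform are already known, is to search each interval between consecutive pitch pulses for the sample with the least local energy. The function `splice_points`, its `half_window` parameter and the energy criterion are illustrative assumptions, not taken from the patent.

```python
def splice_points(x, pitch_positions, half_window=2):
    """For each pair of consecutive pitch positions, return the index strictly
    between them whose local energy (sum of x^2 over n - half_window ..
    n + half_window) is smallest, i.e. a point away from the energy peaks."""
    points = []
    for left, right in zip(pitch_positions, pitch_positions[1:]):
        best, best_energy = None, float("inf")
        for n in range(left + 1, right):
            lo, hi = max(0, n - half_window), min(len(x), n + half_window + 1)
            energy = sum(v * v for v in x[lo:hi])
            if energy < best_energy:
                best, best_energy = n, energy
        points.append(best)
    return points


# Toy phase-equalized waveform: a decaying burst starting at each pitch position.
x = [0.0] * 30
for p in (0, 10, 20):
    for i in range(10):
        x[p + i] = 0.5 ** i
print(splice_points(x, [0, 10, 20]))  # [7, 17]
```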

Moreover, when the phase-equalized speech waveform or prediction residual waveform is coded, efficient coding can be attained by adaptively allocating more bits to, for example, the portions where the energy is concentrated than to other portions. In this case, relatively excellent speech quality can be obtained even at a bit rate below 16 kb/s.
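A minimal sketch of such adaptive allocation, assuming the waveform has already been split into sub-intervals and their energies measured: the greedy rule below, a standard textbook heuristic and not necessarily the allocation rule used in this patent, gives more bits to the high-energy sub-interval around the pitch pulse. `allocate_bits` is a hypothetical name.

```python
def allocate_bits(energies, total_bits):
    """Greedy bit allocation over sub-intervals.

    Assumes each extra bit given to a sub-interval divides its quantization
    noise power by 4 (about 6 dB), and hands out bits one at a time to the
    sub-interval whose current noise estimate is largest.
    """
    bits = [0] * len(energies)
    for _ in range(total_bits):
        noisiest = max(range(len(energies)), key=lambda j: energies[j] / 4 ** bits[j])
        bits[noisiest] += 1
    return bits


# Sub-interval energies of a phase-equalized residual: the first sub-interval
# contains the pitch pulse, the others only low-level samples.
print(allocate_bits([20.0, 2.0, 1.0, 0.5], total_bits=5))  # [3, 1, 1, 0]
```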

In addition, if the filter coefficients are determined adaptively, even better speech quality can be realized.

THEORY OF THE INVENTION

The theory of the speech signal processing system according to the present invention will now be described. As described above, in the conventional LPC vocoder, the pitch period and average power of the residual waveform of a voiced sound are transmitted, and on the decoding side a pulse train having that pitch period is generated and passed through a prediction filter. Consequently, the pitch positions of the original speech waveform (the positions where the energy is concentrated and much information is contained) do not correspond to the pulse positions of the regenerated speech, and the speech quality is poor. In the present invention, by contrast, the time axis of the residual waveform within one pitch period is reversed about the pitch position, taken as the time origin, and the sample values of the time-reversed residual waveform are used as the filter coefficients of a phase-equalizing filter. The output of this phase-equalizing filter is therefore, ideally, a train of impulses whose energy is concentrated at the pitch positions of the speech waveform. Consequently, by passing the output pulse train of the phase-equalizing filter through a prediction filter, a waveform whose pitch positions agree with those of the original speech waveform can be obtained, resulting in excellent speech quality. Further, when the speech waveform itself is passed through the phase-equalizing filter, its residual waveform components are zero-phased, so the filter output has its energy concentrated at each pitch position of the speech waveform. Therefore, by allocating more information bits to the residual waveform samples where the energy is concentrated and fewer bits to the other portions, the quality of the decoded speech can be enhanced even when a small total number of information bits is used.
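The time-reversal idea can be checked on a toy residual. The Python sketch below assumes a voiced residual in which every pitch period contains the same dispersed pulse shape; it reverses one such segment about a reference position n0 to obtain M + 1 filter taps, and scales the taps by the segment energy (our choice, so that the ideal output pulse has unit height). The pulse values, the choice of n0 as the last sample of the first pulse, and the helper names are all made up for illustration.

```python
def phase_equalizing_coefficients(e, n0, M):
    """Taps h(m) = e(n0 - m) / energy, m = 0..M: the residual segment ending at
    n0, reversed in time and scaled by its energy."""
    segment = [e[n0 - m] for m in range(M + 1)]
    energy = sum(v * v for v in segment)
    return [v / energy for v in segment]


def fir_filter(h, x):
    """y(n) = sum_m h(m) x(n - m), with x(n) = 0 for n < 0."""
    return [sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(x))]


# Toy residual: the same dispersed pulse shape in every 20-sample pitch period.
wavelet = [0.6, -0.8, 0.5, -0.3, 0.2]
e = [0.0] * 80
for start in (5, 25, 45, 65):
    e[start:start + len(wavelet)] = wavelet

h = phase_equalizing_coefficients(e, n0=9, M=4)   # n0 = last sample of first pulse
y = fir_filter(h, e)
print([n for n, v in enumerate(y) if abs(v - 1.0) < 1e-9])  # [9, 29, 49, 69]
```

Every pitch period yields an output sample of exactly 1 at the same offset, while all other output samples stay below about 0.8 in magnitude.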

Next, the theory of the invention will be explained with reference to formulas. Let S(n) denote a sample value of the speech waveform and a(k) (k = 1, 2, ..., p) the prediction coefficients obtained by linear prediction analysis of the speech waveform. A sample value e(n) of the prediction residual waveform is then given by the following equation: ##EQU1## where a(0) = 1. Since the residual waveform e(n) is obtained by removing the spectral envelope components from the speech waveform, that is, by removing the correlation between the sample values of the speech waveform, it has a flat spectral envelope and, for voiced sound, contains the pitch period components of the speech. The characteristics of this residual waveform are therefore idealized and expressed by the following pulse train: ##EQU2## where $\delta(n)$ is the Kronecker delta function defined by $\delta(0) = 1$ and $\delta(n) = 0$ ($n \neq 0$), $n_l$ represents a pulse position (i.e., pitch position), and $n_l - n_{l-1}$ corresponds to a pitch period of the speech. Thus the pulse train $e_M(n)$ has a pulse only at each pitch position $n_l$ and is zero elsewhere. Since both the residual waveform e(n) and the pulse train $e_M(n)$ have a flat spectral envelope and the same pitch period components, the difference between the two waveforms lies in their short-time phase characteristics, that is, over a time shorter than the pitch period.
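Equation (1) can be sketched in Python as below. The patent does not say at this point how a(k) is computed; this sketch assumes the common autocorrelation method with the Levinson-Durbin recursion, and the sign convention e(n) = sum over k = 0..p of a(k) S(n - k) with a(0) = 1. All function names are hypothetical.

```python
def autocorrelation(s, p):
    """r(k) = sum_n s(n) s(n + k) for k = 0..p (autocorrelation method)."""
    N = len(s)
    return [sum(s[n] * s[n + k] for n in range(N - k)) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the normal equations for a(0..p) with a(0) = 1, so that the
    prediction residual is e(n) = sum_k a(k) s(n - k)."""
    a = [1.0] + [0.0] * p
    err = r[0]                      # prediction error power at order 0
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err              # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def inverse_filter(s, a):
    """e(n) = sum_{k=0}^{p} a(k) s(n - k), with s(n) = 0 for n < 0."""
    return [sum(a[k] * s[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(s))]


# Test signal: impulse response of the all-pole filter 1 / (1 - 1.5 z^-1 + 0.7 z^-2).
s = [1.0, 1.5]
for n in range(2, 200):
    s.append(1.5 * s[-1] - 0.7 * s[-2])

a = levinson_durbin(autocorrelation(s, 2), 2)
e = inverse_filter(s, a)
print([round(c, 6) for c in a])   # close to [1.0, -1.5, 0.7]
```

Because the test signal is exactly the impulse response of a second-order all-pole filter, the inverse filter reduces it to a single unit impulse: all of the short-time correlation has been removed.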
Thus, letting h(n) denote the impulse response of a linear filter whose characteristics are inverse to the short-time phase characteristics of the residual waveform, the phase-equalized (zero-phased) residual waveform $e_p(n)$, obtained by passing the residual waveform e(n) through this linear filter (the phase-equalizing filter) so as to phase-equalize all spectral components, is computed by the following equation (3): ##EQU3## The impulse response h(m) can be obtained by minimizing the mean square error between $e_p(n)$ and $e_M(n)$, which is given by the following equation: ##EQU4## Substituting equations (2) and (3) into equation (4), partially differentiating the result with respect to h(m), and setting the derivative to 0, the impulse response h(m) is obtained as the solution of the following simultaneous equations: ##EQU5## where v(k) is the auto-correlation function computed by the following equation: ##EQU6## When the time corresponding to the tap number M+1 of the phase-equalizing filter, that is, its response time, is shorter than the pitch period, the auto-correlation function can be approximated by $v(k) \approx v_0\,\delta(k)$ because the residual waveform has a flat spectrum; in short, the auto-correlation function has a value only at k = 0. Equation (5) then retains a term only for m = k and simplifies as follows: ##EQU7## Further, if the analysis window length N is shorter than a pitch period, L is one, so that only one pulse is present, and the impulse response can be computed by the following equation: ##EQU8## That is, the impulse response h(m) is equivalent to the residual waveform reversed in the time domain about the time point $n_0$.
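The derivation above can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions: it minimizes the squared error between e_p(n) and a unit-amplitude pulse train directly over one analysis window (taking e(n) = 0 outside the window and solving the simultaneous equations by plain Gaussian elimination), then compares the result with the simplified time-reversal solution, assuming that solution takes the form h(k) = (sum over l of e(n_l - k)) / v0 with v0 the window energy. The test residual and all function names are hypothetical.

```python
import random


def sample(e, n):
    """e(n) inside the analysis window, 0 outside it."""
    return e[n] if 0 <= n < len(e) else 0.0


def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def least_squares_taps(e, pulses, M):
    """Minimize sum_n (sum_m h(m) e(n - m) - e_M(n))^2 over h(0..M)."""
    N = len(e)
    phi = [[sum(sample(e, n - m) * sample(e, n - k) for n in range(N))
            for m in range(M + 1)] for k in range(M + 1)]
    rhs = [sum(sample(e, nl - k) for nl in pulses) for k in range(M + 1)]
    return solve(phi, rhs)


def time_reversed_taps(e, pulses, M):
    """Approximation for a white residual: h(k) = sum_l e(n_l - k) / v0."""
    v0 = sum(v * v for v in e)
    return [sum(sample(e, nl - k) for nl in pulses) / v0 for k in range(M + 1)]


def squared_error(h, e, pulses):
    """Squared error between e_p(n) and a unit-amplitude pulse train."""
    total = 0.0
    for n in range(len(e)):
        ep = sum(h[m] * sample(e, n - m) for m in range(len(h)))
        total += (ep - (1.0 if n in pulses else 0.0)) ** 2
    return total


rng = random.Random(7)
e = [0.1 * rng.gauss(0.0, 1.0) for _ in range(60)]
for start in (8, 38):                      # dispersed pulse ending at 12 and 42
    for i, v in enumerate([0.6, -0.8, 0.5, -0.3, 0.2]):
        e[start + i] += v
pulses = [12, 42]
h_ls = least_squares_taps(e, pulses, M=4)
h_tr = time_reversed_taps(e, pulses, M=4)
print(squared_error(h_ls, e, pulses), squared_error(h_tr, e, pulses))
```

Because the least-squares taps are optimal for this error measure, their error can never exceed that of the time-reversed approximation; how close the two come depends on how nearly white the residual is within the window.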
Moreover, when the power spectrum is completely white (all frequency components have the same amplitude), the Fourier transform of the impulse response h(m), with its gain normalized, can be expressed by the following equation (9): ##EQU9## where E(k) denotes the Fourier transform of the residual waveform e(n). Since, from equation (3), the Fourier transform $E_p(k)$ of the phase-equalized residual waveform $e_p(n)$ is $E_p(k) = H(k)\,E(k)$, and $E(k) = |E(k)|\exp\{j \arg E(k)\}$, substituting equation (9) into $E_p(k)$ gives: ##EQU10## Equation (10) shows that the phase-equalized residual waveform $e_p(n)$ is the residual waveform e(n) made zero-phase (all spectral components given the same zero phase) except for a linear phase component $\exp\{-j 2\pi k n_0/(M+1)\}$. Ideally, if $|E(k)| = E_0$ (a constant), $e_p(n)$ has zero phase and is therefore a single pulse. In summary, when the residual waveform e(n) is passed through the phase-equalizing filter having the filter coefficients h(m) described above, the output waveform has its energy concentrated mainly at a pitch position, that is, it takes the shape of a single pulse.
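Equations (9) and (10) can be illustrated with the DFT. The sketch below assumes a circular window of N = M + 1 samples and implements the filter as H(k) = exp{-j arg E(k)} exp{-j 2 pi k n0 / N}, which is our reading of the gain-normalized form rather than a transcription of equation (9). With a flat magnitude spectrum the output collapses to a single unit pulse at n0; in general it is even-symmetric about n0 and peaks there. Helper names and test signals are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse of dft()."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def phase_equalize_dft(e, n0):
    """E_p(k) = H(k) E(k) with H(k) = exp(-j arg E(k)) exp(-j 2 pi k n0 / N):
    removes the phase of every component and leaves only a linear phase."""
    N = len(e)
    E = dft(e)
    H = [cmath.exp(-1j * cmath.phase(Ek) - 2j * math.pi * k * n0 / N)
         for k, Ek in enumerate(E)]
    return [v.real for v in idft([Hk * Ek for Hk, Ek in zip(H, E)])]


# White case: |E(k)| = 1 with an arbitrary conjugate-symmetric phase,
# so e(n) is real but dispersed in time.
phases = [0.0, 1.1, -0.4, 2.0, 0.0, -2.0, 0.4, -1.1]
e_white = [v.real for v in idft([cmath.exp(1j * p) for p in phases])]
print([round(v, 6) for v in phase_equalize_dft(e_white, n0=3)])
```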

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing a speech signal processing system of the present invention, particularly an example of the arrangement of an adaptive phase-equalizing processing system.

FIG. 2 is a block diagram showing an example of the internal arrangement of the pitch position detecting part 25 in FIG. 1.

FIG. 3 is a block diagram showing an example of a basic arrangement for speech coding by utilizing the phase-equalizing processing.

FIG. 4 is a block diagram showing an example of an arrangement for variable-rate tree-coding of a speech waveform.

FIG. 5 is an explanatory diagram in relation to the setting of sub-intervals.

FIG. 6 is an explanatory diagram showing an arrangement for variable-rate tree coding.

FIGS. 7A to 7G are diagrams showing example waveforms at various parts of the speech signal processing system.

FIG. 8 is a block diagram showing an example of an arrangement for multi-pulse coding of a speech signal utilizing the phase-equalizing processing.

FIG. 9 is a block diagram showing an example of an arrangement of a speech analysis-synthesizing system on the basis of a zero-phased residual waveform.

FIG. 10 is a block diagram showing an example of an arrangement of a speech analysis-synthesizing system utilizing the phase-equalizing processing.

FIG. 11 is a block diagram showing another arrangement of the speech analysis-synthesizing system.

FIG. 12 is a graph comparing the effect of quantizing the samples neighboring the pulse with and without phase-equalization.

FIG. 13 is a graph showing comparison in quantization performance between the embodiment shown in FIG. 10 and a tree coding of an ordinary vector unit.

FIG. 14 is a graph comparing the quantization performance of the embodiment shown in FIG. 11 with that of an ordinary adaptive transform coding method utilizing vector quantization.

FIGS. 15A to 15E are diagrams respectively showing examples of waveforms in the process of obtaining filter coefficients h(m,n) in FIG. 1.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Next, a concrete embodiment of the speech signal processing system of this invention will be described with reference to FIG. 1. Sample values S(n) of a speech waveform are input at an input terminal 11 and supplied to a linear prediction analysis part 21 and an inverse-filter 22. The linear prediction analysis part 21 computes the prediction coefficients a(k) in equation (1) from the speech waveform S(n) by linear prediction analysis. The prediction coefficients a(k) are set as the filter coefficients of the inverse-filter 22. The inverse-filter 22 thus performs the filtering operation expressed by equation (1) on the input speech waveform S(n) and outputs a prediction residual waveform e(n), which is the waveform obtained by removing the short-time correlation (correlation among sample values) from the input speech waveform. This prediction residual waveform e(n) is supplied to a voiced/unvoiced sound discriminating part 24, a pitch position detecting part 25 and a filter coefficients computer part 26 in a filter coefficient determining part 23. The voiced/unvoiced sound discriminating part 24 serves to obtain an