WikiPatents - Community Patent Review
Method and apparatus for speech compression using multi-mode code excited linear predictive coding    
United States Patent 5602961
Link to this page: http://www.wikipatents.com/5602961.html
Inventor(s): Kolesnik; Victor D. (St. Petersburg, RU); Trofimov; Andrey N. (St. Petersburg, RU); Bocharova; Irina E. (St. Petersburg, RU); Krachkovsky; Victor Y. (St. Petersburg, RU); Kudryashov; Boris D. (St. Petersburg, RU); Ovsjannikov; Eugeny P. (St. Petersburg, RU); Trojanovsky; Boris K. (St. Petersburg, RU); Kovalov; Sergei I. (St. Petersburg, RU)
Abstract: An apparatus and method of coding speech. The apparatus includes a first circuit being coupled to receive a first signal, the first signal corresponds to the speech signal. The first circuit is for generating a first set of parameters corresponding to the first frame. The apparatus includes a second circuit, being coupled to receive a second signal and the first set of parameters, the second signal corresponding to the speech signal, and the second circuit is for generating a third signal. The apparatus further includes a pulse train analyzer, being coupled to the second circuit, for generating a third match value, a third set of parameters, and a third excitation value. The apparatus further including a fourth circuit, being coupled to the second circuit, for generating a fourth match value, a fourth set of parameters, and a fourth excitation value. The apparatus further including a fifth circuit, being coupled to the third circuit and the fourth circuit, for selecting a mode corresponding to a match value. The apparatus further including a sixth circuit, being coupled to the fifth circuit, for selecting a selected set of parameters and a selected excitation corresponding to the mode. The apparatus further including a seventh circuit, being coupled to the first circuit and the sixth circuit, for generating an encoded signal responsive to the selected set of parameters and the mode.
   














Title Information
Owner/Assignee     Alaris, Inc. (Fremont, CA); GT Technology (Saratoga, CA)
Publication Date     February 11, 1997
Application Number     08/251,471
Filing Date     May 31, 1994
US Classification     704/223 704/219 704/221 704/225 704/229 704/230 704/262 704/264
Int'l Classification     G10L 003/02
Examiner     Tung; Kee M.
Attorney/Law Firm     Blakely, Sokoloff, Taylor & Zafman
USPTO Field of Search     381/29 381/30 381/36 381/37 381/38 381/51 395/2..39 395/2.67 395/2..74

References

U.S. References

Patent No.   Inventor      US Class     Date
5414796      Jacobs        704/221      May 1995
5394508      Lim           704/200.1    Feb 1995
5388181      Anderson      704/200.1    Feb 1995
5369724      Lim           704/200.1    Nov 1994
5255339      Fette         704/200      Oct 1993
5235671      Mazor                      Aug 1993
5233659      Ahlberg                    Aug 1993
5222189      Fielder       704/229      Jun 1993
5199076      Taniguchi     704/207      Mar 1993
5195137      Swaminathan   704/222      Mar 1993
5187745      Yip           704/219      Feb 1993
5177799      Naitoh        704/204      Jan 1993
5073940      Zinser                     Dec 1991
5060269      Zinser        704/220      Oct 1991
5012518      Liu           704/222      Apr 1991
4980916      Zinser        704/207      Dec 1990
4969192      Chen          704/222      Nov 1990
4944013      Gouvianakis   704/219      Jul 1990
4932061      Kroon         704/223      Jun 1990
4924508      Crepy         704/207      May 1990
4914701      Zibman        704/203      Apr 1990
4912764      Hartwell      704/261      Mar 1990
4868867      Davidson      704/200.1    Sep 1989
4817157      Gerson        704/230      Mar 1989
4790016      Mazor         704/203      Dec 1988
4736428      Deprettere    704/264      Apr 1988
4472832      Atal          704/221      Sep 1984
Claims
 


What is claimed is:

1. An apparatus for processing an input signal, said input signal including a frame, said apparatus comprising:

a first circuit coupled to receive a first signal, said first signal corresponding to said input signal, said first circuit for generating a first set of parameters corresponding to said frame;

a second circuit coupled to receive said first signal and said first set of parameters, said second circuit for generating a second signal;

a pulse train analyzer, coupled to said second circuit, said pulse train analyzer for generating a first match value, a second set of parameters, and a first excitation value;

a fourth circuit, coupled to said second circuit, said fourth circuit for generating a second match value, a third set of parameters, and a second excitation value, said fourth circuit including an adaptive codebook and an adaptive codebook analyzer, said adaptive codebook being coupled to said adaptive codebook analyzer;

a fifth circuit, coupled to said pulse train analyzer and said fourth circuit, for determining a set of admissible excitation search modes based upon a prior excitation search mode, and said fifth circuit further for selecting an excitation search mode from said set of admissible excitation search modes;

a sixth circuit, coupled to said fifth circuit, for selecting a selected set of parameters and a selected excitation corresponding to said excitation search mode, and

a seventh circuit, coupled to said first circuit and said sixth circuit, for generating an encoded signal responsive to said selected set of parameters and said excitation search mode.

2. The apparatus of claim 1 further comprising:

an eighth circuit, coupled to said second circuit, said eighth circuit for generating a third match value, a fourth set of parameters, and a third excitation value, and

wherein, said fifth circuit is coupled to said eighth circuit.

3. The apparatus of claim 2 wherein said eighth circuit further includes a stochastic codebook analyzer for generating said fourth set of parameters.

4. The apparatus of claim 2 wherein said eighth circuit includes a trellis codebook analyzer for generating said fourth set of parameters.

5. The apparatus of claim 2 wherein said first set of parameters includes linear prediction coefficients (LPCs) corresponding to said frame, and wherein said second circuit is coupled to receive said LPCs and is for performing ringing removal and perceptual weighting of said first signal to generate said second signal.

6. The apparatus of claim 3 wherein each of said second, third, and fourth set of parameters includes an index parameter and a gain parameter.

7. The apparatus of claim 4 wherein said frame includes a subframe, and wherein said second set of parameters corresponds to said subframe.

8. The apparatus of claim 7 wherein said second set of parameters include a pitch parameter, an index parameter, and a phase parameter, and wherein the index parameter includes an index to a shape pulse.

9. The apparatus of claim 7 wherein an index parameter of said third set of parameters includes an index to said adaptive codebook.

10. The apparatus of claim 7 wherein said eighth circuit includes a short adaptive codebook.

11. The apparatus of claim 7 wherein said fifth circuit is for weighting said first, second and third match values prior to selecting said excitation search mode.

12. The apparatus of claim 11 wherein said first match value is weighted by an amount between 0.7-0.9, wherein said second match value is weighted by an amount between 1.1-1.3, and wherein said third match value is weighted by an amount between 0.8-1.0.

13. The apparatus of claim 7 wherein said input signal includes a previous subframe, said previous subframe having said previous excitation search mode, and said fifth circuit is for selecting said excitation search mode responsive to said previous subframe.

14. The apparatus of claim 7 wherein said input signal includes digitized speech.

15. The apparatus of claim 7 further comprising a filter circuit coupled to receive said input signal and for generating said first signal.

16. The apparatus of claim 7 further comprising a line spectrum pair circuit, being coupled to said first circuit and said seventh circuit, for generating line spectrum pair parameters from said first set of parameters, wherein said seventh circuit includes a multiplexing circuit, and wherein said seventh circuit is for multiplexing said line spectrum pair parameters with said selected set of parameters and said selected excitation.

17. The apparatus of claim 2 wherein said fifth circuit is further configured to select said excitation search mode corresponding to one of said set of admissible excitation search modes requiring the least number of bits and complying with a predetermined error threshold.

18. A multi-mode linear predictive coder for processing digital speech signals, said digital speech signals being partitioned into frames of a first predetermined length, where each frame is partitioned into subframes of a second predetermined length, said coder comprising:

a short-term prediction analyzer responsive to said digital speech signals, said short-term prediction analyzer for generating linear prediction parameters and line spectrum parameters;

a variable rate encoder, coupled to said short-term prediction analyzer, for coding differences of said line spectrum parameters by a predetermined variable rate code;

a ringing removal and perceptual weighting circuit for ringing removal and perceptual weighting said digital speech signals to produce predistorted speech vectors for successive subframes;

a multi-mode excitation analyzer, coupled to said ringing removal and perceptual weighting circuit, for generating a set of excitations, a set of match values, and a set of parameters, each excitation in said set of excitations corresponding to a maximal value of a match function in said set of match values;

a pause analyzer, responsive to said digital speech signals, for pause detecting and producing a pause mode signal;

a comparator and controller, coupled to said multi-mode excitation analyzer and said pause analyzer, for weighting and comparing said match function values for each of a plurality of excitation search modes, and for generating a current excitation search mode corresponding to one of said plurality of excitation search modes with a maximal weighted match function value;

a selector of parameters, coupled to said multi-mode excitation analyzer, for generating selected parameters from said set of parameters corresponding to said current excitation search mode; and

a selector of excitations, coupled to said multi-mode excitation analyzer, for selecting a current excitation from said set of excitations corresponding to said current excitation search mode.

19. The multi-mode linear predictive coder as recited in claim 18, wherein said multi-mode excitation analyzer further comprises:

an adaptive codebook (ACB) analyzer, coupled to said ringing removal and perceptual weighting circuit, for generating an ACB excitation, an ACB match function and ACB parameters for each subframe in said frame;

a pulse train analyzer, coupled to said ringing removal and perceptual weighting circuit, for generating a pulse excitation, a pulse match function and pulse parameters;

a shortened adaptive codebook (SACB) analyzer, coupled to said ringing removal and perceptual weighting circuit, for generating a SACB codebook excitation and SACB parameters; and

a stochastic analyzer, coupled to said ringing removal and perceptual weighting circuit, said stochastic analyzer for generating a stochastic gain, a stochastic codeword index, a stochastic excitation, and a stochastic match function, said stochastic excitation corresponding to said SACB excitation.

20. The multi-mode linear predictive coder of claim 19 wherein said stochastic analyzer is a trellis analyzer, and wherein said stochastic gain is a trellis gain, said stochastic codeword index is a trellis codeword index, said stochastic excitation is a trellis excitation, and said stochastic match function is a trellis match function.

21. A method of selecting encoding parameters, said method for use in a speech synthesizer to improve the subjective speech quality, said method comprising the steps of:

constructing a pulse based upon the time inversion of a pulse response of a response filter;

generating an excitation vector in the form of multiple pitch spaced pulses using a set of pitch values, a set of phase values, and said pulse, said set of pitch values and said set of phase values derived from a perceptually weighted speech signal;

computing energy values and correlation values, said energy values determined using a filtered vector, said correlation values representing the correlation between said filtered vector and said perceptually weighted speech signal, said filtered vector corresponding to said excitation vector; and

selecting the pulse excitation from said excitation vector corresponding to correlation values and energy values that maximize a pulse mode match function.

22. The method of claim 21 wherein said method further comprises the step of receiving a set of linear prediction coefficients (LPCs), said LPCs defining a linear prediction (LP) analysis filter of order m, and said step of constructing a pulse uses the following equations:

$A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - \ldots - a_m z^{-m}$;

$U(z) = (1 - \delta z^{-1}) / A(\alpha z)$;

$V_{0,n-1}(z) = z^{n-1} U_{0,n-1}(z^{-1})$;

$W(z) = (V_{n-m,n-1}(z) + z^{-n} U_{0,d}(z)) A(\beta z)$; and

$V_{n,m-1}(z) = W_{n,M-1}(z)$; where $X_{i,j}(z)$ represents the polynomial $X_{i,j}(z) = x_i z^{-i} + x_{i+1} z^{-(i+1)} + \ldots + x_j z^{-j}$, $j > i$, where $A(z)$ denotes the Z-transform for the LP analysis filter, where $a_i$ represents one linear prediction coefficient of said set of LPCs, where samples of said pulse are represented by $V_i(z)$, where $n < M$, where $\alpha$ and $\delta$ are empirically chosen constants, $0 \le \alpha, \delta \le 1$, where $\beta$ is an empirically chosen constant, $0 \le \beta \le 1$, and where $d$, $d \ge 0$, is a fixed constant.

23. The method of claim 22 wherein $\alpha$ is in the range 0.9 to 0.98, $\delta$ is in the range 0.55 to 0.75, and $\beta$ is in the range 0.6 to 0.8.

24. A pulse train analyzer for use in a speech synthesizer comprising:

a pulse generator coupled to receive a set of pitch values, a set of phase values, and a set of linear prediction coefficients (LPCs), said set of pitch values and said set of phase values derived from a perceptually weighted speech signal, said set of LPCs derived from an input speech signal, said pulse generator producing an excitation vector based upon said set of pitch values, said set of phase values, and said set of LPCs;

a correlation circuit coupled to said pulse generator and further coupled to receive said perceptually weighted speech signal, said correlation circuit using a pulse mode match function to determine a set of match values, said set of match values based upon said excitation vector and said perceptually weighted speech signal; and

a pulse train selector coupled to receive said set of match values, said pulse train selector selecting the excitation from said excitation vector that corresponds to the maximal value in said set of match values as a selected pulse excitation.

25. The pulse train analyzer of claim 24 said correlation circuit further comprising:

a response filter coupled to said pulse generator producing a pulse response corresponding to said excitation vector;

a correlator coupled to receive said perceptually weighted speech signal and coupled to said response filter, said correlator computing correlation values between said pulse response and said perceptually weighted speech signal;

an energy calculator coupled to said response filter computing energy values using said pulse response; and

a match function calculator coupled to said correlator and said energy calculator to produce said set of match values using said pulse mode match function, said set of match values based upon applying said pulse mode match function to said correlation values and said energy values.

26. The pulse train analyzer of claim 25 said pulse generator further comprising:

a pulse train generator coupled to receive said set of pitch values and said set of phase values, said set of pitch values and said set of phase values derived from said perceptually weighted speech signal, said pulse train generator producing said excitation vector in the form of multiple pitch spaced pulses based upon said set of pitch values, said set of phase values, and a pulse; and

a pulse shape generator coupled to said pulse train generator, said pulse shape generator producing a pulse using a formula corresponding to the time inversion of the pulse response.
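
As an illustration only, the weighted mode selection recited in claims 1, 11, 12 and 18 can be sketched in Python. The mode names, the choice of the midpoint of each weight range from claim 12, and the admissibility table are all assumptions made for this sketch; the claims state only that the admissible set depends on the prior excitation search mode and do not define the rule.

```python
# Weights taken as the midpoints of the ranges in claim 12 (an assumption;
# the claim gives ranges only). Keys are hypothetical mode names for the
# pulse train analyzer, the fourth circuit (adaptive codebook) and the
# eighth circuit.
MODE_WEIGHTS = {
    "pulse_train": 0.8,        # first match value, range 0.7-0.9
    "adaptive_codebook": 1.2,  # second match value, range 1.1-1.3
    "eighth_circuit": 0.9,     # third match value, range 0.8-1.0
}

# Hypothetical admissibility table, invented for illustration: maps the prior
# mode to the set of modes allowed for the current subframe.
ADMISSIBLE_AFTER = {
    None: {"pulse_train", "adaptive_codebook", "eighth_circuit"},
    "pulse_train": {"pulse_train", "adaptive_codebook", "eighth_circuit"},
    "adaptive_codebook": {"pulse_train", "adaptive_codebook", "eighth_circuit"},
    "eighth_circuit": {"pulse_train", "eighth_circuit"},
}


def select_mode(match_values, prior_mode=None):
    """Return the admissible mode with the largest weighted match value."""
    admissible = ADMISSIBLE_AFTER[prior_mode]
    weighted = {
        mode: MODE_WEIGHTS[mode] * value
        for mode, value in match_values.items()
        if mode in admissible
    }
    return max(weighted, key=weighted.get)
```

For example, `select_mode({"pulse_train": 10.0, "adaptive_codebook": 8.0, "eighth_circuit": 10.0})` weights the three values to roughly 8.0, 9.6 and 9.0 and so picks the adaptive codebook mode, even though its raw match value is the smallest.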
Description
 


BACKGROUND OF THE INVENTION

1. Field of Invention

The present invention generally relates to speech coding at low bit rates (in a range 2.4-4.8 kb/s). In particular, the present invention relates to improving excitation generating and linear predicting coefficient coding directed at the reduction of the number of data bits for coded speech.

2. Description of Related Art

Digital speech communication systems including voice storage and voice response facilities utilize signal compression to reduce the bit rate needed for storage and/or transmission. As is well known in the art, a speech pattern contains redundancies that are not essential to its apparent quality. Removal of redundant components of the speech pattern significantly lowers the number of bits required to synthesize the speech signal. A goal of effective digital speech coding is to provide an acceptable subjective quality of synthesized speech at low bit rates. However, the coding must also be fast enough to allow for real time implementation.

One method used to partially achieve these goals is based on the standard Linear Prediction (LP) technique. The characteristic features of this technique are the following. The sampled and quantized speech signal is partitioned into successive intervals (frames), then a set of parameters representative of the interval speech is generated. The parameter set includes linear prediction coefficients (LPCs) which determine an LP filter, and the best excitation signal. The best LPCs and excitation are then used to produce a synthesized signal close to the original speech signal. This is done on a per frame basis.
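
As a rough illustration of the per-frame LP analysis described above, the following Python sketch estimates LPCs for one frame with the autocorrelation method and the Levinson-Durbin recursion. The text does not say which estimation method is used, so this choice is an assumption, as are the function names. The sign convention follows $A(z) = 1 - a_1 z^{-1} - \ldots - a_m z^{-m}$, as used in the claims.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] for one frame of samples."""
    n = len(frame)
    return [sum(frame[t] * frame[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations by the Levinson-Durbin recursion.

    Sample s[t] is predicted as sum_i a_i * s[t - i], so the analysis filter
    is A(z) = 1 - a_1 z^-1 - ... - a_m z^-m.
    Returns ([a_1, ..., a_order], residual prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame: leave remaining taps at 0
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err
```

A decaying exponential such as `frame = [0.9 ** t for t in range(200)]` is almost perfectly predicted from its previous sample, so `levinson_durbin(autocorrelation(frame, 2), 2)` returns a first coefficient very close to 0.9 and a second close to 0.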

The best excitation is typically found through a look-up in a table, or codebook. The codebook includes vectors whose components are consecutive excitation samples. Each vector contains the same number of excitation samples as there are speech samples in a frame.

One of the most effective approaches of this type is the Code Excited Linear Prediction (CELP) method which was disclosed in "Predictive Coding of Speech at Low Bit Rates", Atal B.S., IEEE Transactions on Communications, vol. COM-30, No. 4, (April, 1982), 600-614.

FIG. 1 illustrates how a CELP implementation generates the best excitation for an LP filter such that the output of the filter closely approximates input speech.

In each frame the input speech signal is pre-filtered by a fixed digital pre-filter 100. Next, the pre-filtered speech is processed by linear prediction analyzer 101 to estimate the linear predictive filter $A(z)$ of a prescribed order. Each frame is broken into a predetermined number of subframes. This allows excitations to be generated for each subframe. Each speech vector, for a given subframe, is passed through the ringing removal and perceptual weighting module 102. The speech signal is perceptually predistorted by a linear filter with the transfer function $W(z) = A(z)/A(\gamma z)$ for some $\gamma$. The output $w$ of module 102 is analyzed by the long-term prediction analyzer 103 to obtain a periodic (pitch) component $p$ relating to the excitation. The best pitch excitation is found by searching the index (code word number) $I_A$ in an adaptive codebook (ACB) and computing the optimal gain factor $g_A$. These jointly minimize the squared norm $\|d\|^2$ of the vector $d = w - b g_A$, where $b$ denotes the response of the synthesis filter $1/A(\gamma z)$ 104 excited by $p$. For this purpose, an exhaustive search in an ACB is performed to find the maximal value of the match function:

$M = (w, b)^2 / (b, b).$

The optimal gain value is determined as follows:

$g_A = (w, b) / (b, b).$
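
The link between the match function and the squared error is not spelled out in the text. A short derivation, assuming $(\cdot,\cdot)$ denotes the ordinary real inner product, shows why maximizing $M$ over the codebook is equivalent to minimizing $\|d\|^2$:

```latex
\begin{align*}
\|d\|^2 &= \|w - b\,g\|^2 = (w,w) - 2g\,(w,b) + g^2\,(b,b), \\
\frac{\partial \|d\|^2}{\partial g} &= -2\,(w,b) + 2g\,(b,b) = 0
  \quad\Longrightarrow\quad g_A = \frac{(w,b)}{(b,b)}, \\
\|d\|^2_{\min} &= (w,w) - \frac{(w,b)^2}{(b,b)} = (w,w) - M .
\end{align*}
```

Since $(w,w)$ does not depend on the code word, the code word with the largest $M$ gives the smallest error once the optimal gain is applied.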

The residual vector $u = w - b g_A$ from the output of adder 105 enters the stochastic codebook analyzer 108. Here the best residual excitation index $I_S$ and the optimal gain factor $g_s$ are found. These jointly minimize the squared norm $\|d\|^2$ of the error vector $d = u - r g_s$, where $r$ denotes the response of the stochastic codebook analyzer 108's synthesis filter excited by the code word $c$ from the precomputed stochastic codebook 109. Using the multiplier 106, multiplier 110, and adder 107, we obtain the resulting excitation vector $e$ for a given subframe as the following sum:

$e = p g_A + c g_s.$
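
The two-stage search described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the coder of FIG. 1: the weighted synthesis filter is modeled by a truncated impulse response `h`, the codebooks are plain lists of vectors, and all function names are hypothetical.

```python
def dot(x, y):
    """Inner product (x, y) of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def synthesize(excitation, h):
    """Zero-state filter response: convolve with h, truncated to the subframe."""
    return [sum(h[k] * excitation[t - k] for k in range(min(t + 1, len(h))))
            for t in range(len(excitation))]


def best_codeword(target, codebook, h):
    """Exhaustive search for the code word maximizing (t, r)^2 / (r, r).

    Returns (index, optimal gain, match value). Assumes at least one code
    word has a non-zero filtered response.
    """
    best = (None, 0.0, 0.0)
    for index, code in enumerate(codebook):
        r = synthesize(code, h)
        energy = dot(r, r)
        if energy == 0.0:
            continue  # skip code words with no filtered energy
        corr = dot(target, r)
        match = corr * corr / energy
        if best[0] is None or match > best[2]:
            best = (index, corr / energy, match)
    return best


def two_stage_excitation(w, adaptive_cb, stochastic_cb, h):
    """Adaptive codebook search, then stochastic search on the residual u."""
    i_a, g_a, _ = best_codeword(w, adaptive_cb, h)
    p = adaptive_cb[i_a]
    b = synthesize(p, h)
    u = [wi - g_a * bi for wi, bi in zip(w, b)]  # u = w - b*g_A
    i_s, g_s, _ = best_codeword(u, stochastic_cb, h)
    c = stochastic_cb[i_s]
    e = [g_a * pi + g_s * ci for pi, ci in zip(p, c)]  # e = p*g_A + c*g_s
    return i_a, g_a, i_s, g_s, e
```

The search is sequential rather than joint: the stochastic stage only sees what the adaptive stage left unexplained, which keeps the cost at one pass over each codebook.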

For the CELP speech coding technique, the synthesized speech quality rapidly degrades as data rates are reduced. For example, at 4.8 kb/s, a 10-bit codebook is generally used. However, at 2.4 kb/s, the number of bits of the codebook must be decreased to 5. Since 5 bits are too few to cover many types of speech signals, the speech quality is abruptly degraded at bit rates lower than 4.8 kb/s.

Various improvements of the CELP technique exist. These techniques attempt to provide acceptable speech compression at data rates below 4.8 kb/s. Such techniques are reported in the following references:

Zinser R. L., Koch S. R. "CELP coding at 4.0 kb/sec and below: improvements to FS-1016." Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. I-313 through I-316, March 1992;

Wang S., Gersho A. "Improved phonetically-segmented vector excitation coding at 3.4 kb/s." Proceedings of the 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. I-349 through I-352, March 1992;

J. Ha