WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Efficient codebook search for CELP vocoders    
United States Patent5187745   
Link to this pagehttp://www.wikipatents.com/5187745.html
Inventor(s)Yip; William C. (Scottsdale, AZ); Barron; David L. (Scottsdale, AZ)
AbstractA new way of determining correlation coefficients for stochastic codebook vectors for CELP coding of speech takes advantage of the sparsely populated nature of stochastic codebook vectors. N valued input signals (e.g., convolution vectors) to be correlated with N valued codebook vectors are fed to an N by N multiplexer or other selection means and the signal values either passed to an accumulator or not according to the state of N select inputs or other identification means determined from a memory store (e.g., an EPROM) whose entries correspond to the non-zero values of the codebook vectors. The accumulator output is the correlation of the codebook vector with the input signal. A sequencer steps through the entire codebook to provide correlation values for each vectors. The results are used to determine the optimum stochastic codebook vector for replicating the particular speech frame being analyzed.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5187745
Efficient codebook search for CELP vocoders - US Patent 5187745 Drawing
Efficient codebook search for CELP vocoders
Inventor     Yip; William C. (Scottsdale, AZ); Barron; David L. (Scottsdale, AZ)
Owner/Assignee     Motorola, Inc. (Schaumburg, IL)
Patent assignment
All assignments
Publication Date     February 16, 1993
Application Number     07/722,572
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     June 27, 1991
US Classification     704/219
Int'l Classification     G10L 005/00
Examiner     Kemeny; Emanuel S.
Assistant Examiner    
Attorney/Law Firm     Handy; Robert M.
Address
Parent Case    
Priority Data    
USPTO Field of Search     381/29 381/30 381/31 381/32 381/33 381/34 381/35 381/36 381/37 381/38 381/39 381/40
Patent Tags     efficient codebook search celp vocoders
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
4910781
Ketchum
704/223
Mar,1990

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A method for CELP coding speech by combining a first vector with a set of second vectors identified by index k, wherein the first and second vectors have values identified by indices n running from n=1 to N, comprising:

providing an N by N multiplexer having n=1 to N outputs, n=1 to N first inputs, second inputs, and n=1 to N select means, wherein a first logic level presented to n.sup.th select means couples the n.sup.th output to the n.sup.th first input and a second logic level presented to the n.sup.th select means couples the n.sup.th output to the second input;

supplying n=1 to N values of the first vector to the n=1 to N first inputs of the multiplexer;

presenting n=1 to N values of the second vector of index k=1 to n=1 to N select means of the multiplexer, the second vector providing at the n=1 to N select means the first logic level for some values of n and the second logic level for other values of n;

adding together values of the first vector coupled to the multiplexer output to provide a sum;

repeating the presenting and adding steps for further values of k; and

synthesizing speech based on whichever sum identifies a second vector giving the closest match to target speech.

2. The method of claim 1 further comprising determining which vector k=j has the largest sum.

3. The method of claim 1 wherein the supplying, presenting and adding steps are repeated for other first vectors.

4. A method for CELP coding speech by combining a first vector with a set of second vectors identified by index k, wherein the first and second vectors have values identified by indices n running from n=1 to N, comprising:

dividing each second vector into two portion, a first portion having values 0, +1 corresponding to the location of values of 0, +1 of the second vector and a second portion having values 0, +1 corresponding to the location of values 0, -1 of the second vector, comprising:

providing first and second N by N multiplexers each having n=1 to N outputs, n=1 to N first inputs, second inputs, and n=1 to N select means, wherein a first logic level presented to n.sup.th select means couples the n.sup.th output to the n.sup.th first input and a second logic level presented to the n.sup.th select means couples the n.sup.th output to the second input;

supplying n=1 to N values of the first vector to the n=1 to N first inputs of the first and second multiplexers;

presenting n=1 to N values of the k=1 first portion of the second vector to n=1 to N select means of the first multiplexer and presenting the n=1 to N values of the k=1 second portion of the second vector to n=1 to N select means of the second multiplexer;

adding together values of the first vector coupled to the output of the first multiplexer to provide a first sum and adding together values of the first vector at the output of the second multiplexer to provide a second sum;

combining the first and second sums to provide a result;

repeating the presenting, adding and combining steps for further values of k; and

synthesizing speech based on whichever result identifies a second vector giving the closest match to target speech.

5. The method of claim 4 further comprising comparing the results for each value of k to determine the value of k having the largest result.

6. The method of claim 4 wherein the supplying, presenting, adding and combining steps are repeated for other first vectors.

7. An apparatus for CELP coding speech by combining a first vector with a set of second vectors identified by an index k, wherein the first and second vectors have values identified by indices n running from n=1 to N, comprising:

an N by N multiplexer having n=1 to N outputs, n=1 to N first inputs, second inputs, and n=1 to N select means, wherein a first logic level presented to n.sup.th select means couples the n.sup.th output to the n.sup.th first input and a second logic level presented to the n.sup.th select means couples the n.sup.th output to the second input;

means for supplying n=1 to N values of the first vector to the n=1 to N first inputs of the multiplexer;

means for presenting n=1 to N values of the second vector of index k=1 to the n=1 to N select means of the multiplexer, the second vector providing at the n=1 to N select means the first logic level for some values of n and the second logic level for other values of n;

means coupled to the multiplexer output for adding together values of the first vector transferred to the outputs of the multiplexer to provide a sum;

means for indexing k from k=1 to k=K; and

means for synthesizing speech based on whichever sum identifies a second vector giving the closest match to target speech.

8. The apparatus of claim 7 further comprising means for determining which vector k=j has the largest sum.

9. The apparatus of claim 7 further comprising means for supplying other first vectors.

10. An apparatus for CELP coding speech by combining a first vector with a set of second vectors identified by index k and having values identified by indices n=1 to N, comprising:

memory means for storing first portions of the second vectors, and having values 0, +1 corresponding to the locations of values 0, +1 of the second vectors;

memory means for storing second portions of the second vectors, and having values 0, +1 corresponding to the locations of values 0, -1 of the second vectors, comprising:

first and second N by N multiplexers each having n=1 to N outputs, n=1 to N first inputs, second inputs, and n=1 to N select means, wherein a +1 presented to n.sup.th select means couples the n.sup.th output to the n.sup.th first input and a 0 presented to the n.sup.th select means couples the n.sup.th output to the second input, the first multiplexer coupled to the first memory means and the second multiplexer coupled to the second memory means;

means for supplying n=1 to N values of the first vector to the n=1 to N first inputs of the first and second multiplexers;

means for presenting n=1 to N values of the k=1 first portion of the second vector to n=1 to N select means of the first multiplexer;

means for presenting the n=1 to N values of the k=1 second portion of the second vector to n=1 to N select means of the second multiplexer;

first adder coupled to the outputs of the first multiplexer for summing values of the first vector appearing at outputs of the first multiplexer to produce a first sum;

second adder coupled to the outputs of the second multiplexer for summing values of the first vector appearing at outputs of the second multiplexer to produce a second sum;

means for combining the first and second sums to produce a result;

means for indexing k to load first and second portions of other second vectors into the memory means and for multiplexing, adding and combining to produce other results; and

means for synthesizing speech based on whichever result identifies a second vector giving the closest match to target speech.

11. The apparatus of claim 10 further comprising means for comparing the results for each value of k to determine the value of k having the largest result.

12. The apparatus of claim 10 further comprising means for providing other first vectors.

13. A method for CELP coding speech using a combination of a first vector V(n) having values identified by index n running from n=1 to N, and a set of the second vectors S.sub.k (n) wherein each of the second vectors is identified by index k and wherein each of the second vectors has up to N values which are either zero or non-zero and are identified by index n from n=1 to N, comprising:

identifying indices n.sub.k,i of S.sub.k (n) for different k wherein S.sub.k (n.sub.i) are non-zero;

adding values of the V(n) corresponding to indices n.sub.k,i to form sums Q(k);

identifying the value k=j corresponding to the largest value Q(k=j); and

synthesizing.speech using S.sub.k =j(n).

14. The method of claim 13 wherein successive vectors of the set of second vectors are determined by overlap of the preceding second vector according to an overlap amount .DELTA.k,.DELTA.n, wherein the identifying and adding steps comprise:

identifying for k=1 indices n.sub.1,i of S.sub.k (n) wherein S.sub.1 (n.sub.i) are non-zero:

starting from n.sub.1,i and using the overlap amount .DELTA.k,.DELTA.n, determining further indices n.sub.k,i' for k>1 wherein S.sub.k (n.sub.i') are non-zero; and

adding values of the V(n) for such indices and further indices to form sums Q(k).

15. The method of claim 14 further comprising identifying for k.gtoreq.2, a first index n.sub.k,i" not previously identified wherein S.sub.k (n.sub.i") is non-zero, and then, starting from index n.sub.k,i" determining still further indices n.sub.k,i"' for k.gtoreq.3 wherein S.sub.k (n.sub.i"') are non-zero using the overlap amount, and adding values of v(n) for such still further indices to further form sums Q(k).

16. An apparatus for CELP coding speech using a combination of a first vector V(n) having values identified by index n running from n=1 to N, and a set of the second vectors S.sub.k (n) wherein each of the second vectors is identified by index k and wherein each of the second vectors has up to N values which are either zero or non-zero and are identified by index n from n=1 to N, comprising:

means for identifying indices n.sub.k,i of S.sub.k (n) for different k wherein S.sub.k (n.sub.i) are non-zero;

means for adding values of the V(n) corresponding to indices n.sub.k,i to form sums Q(k);

means for identifying the value k=j corresponding to the largest value Q(k=j); and

means for synthesizing.speech using S.sub.k =j(n).

17. The apparatus of claim 16 wherein successive vectors of the set of second vectors are determined by overlap of the preceding second vector according to an overlap amount .DELTA.k,.DELTA.n, wherein the means for identifying and adding comprise:

means for identifying for k=1 indices n.sub.1,i of S.sub.k (n) wherein S.sub.1 (n.sub.i) are non-zero:

means for determining further indices n.sub.k,i' for k>1 wherein S.sub.k (n.sub.i) are non-zero, starting from n.sub.1,i and using the overlap amount .DELTA.k,.DELTA.n; and

means for adding values of the V(n) for such indices and further indices to form sums Q(k).

18. The apparatus of claim 17 wherein the means for identifying, determining and adding comprise, means for identifying for k.gtoreq.2, a first index n.sub.k,i" not previously identified wherein S.sub.k (n.sub.i") is non-zero, means for determining still further indices n.sub.k,i"' for k.gtoreq.3 wherein S.sub.k (n.sub.i"') are non-zero starting from index n.sub.k,i" and using the overlap amount, and means for adding values of V(n) for such still further indices to further form sums Q(k).
 Description Submit all comments and votes
 


FIELD OF THE INVENTION

The present invention concerns an improved means and method for digital coding of speech or other analog signals and, more particularly, code excited linear predictive coding.

BACKGROUND OF THE INVENTION

Code Excited Linear Predictive (CELP) coding is a well-known stochastic coding technique for speech communication In CELP coding, the short-time spectral and long-time pitch are modeled by a set of time-varying linear filters. In a typical speech coder based communication system, speech is sampled by an A/D converter at approximately twice the highest frequency desired to be transmitted, e.g., an 8 KHz sampling frequency is typically used for a 4 KHz voice bandwidth. CELP coding synthesizes speech by utilizing encoded excitation information to excite a linear predictive (LPC) filter. The excitation, which is used as inputs to the filters, is modeled by a codebook of white Gaussian signals. The optimum excitation is found by searching through a codebook of candidate excitation vectors on a frame-by-frame basis.

LPC analysis is performed on the input speech frame to determine the LPC parameters. Then the analysis proceeds by comparing the output of the LPC filter with the digitized input speech, when the LPC filter is excited by various candidate vectors from the table, i.e., the code book. The best candidate vector is chosen based on how well speech synthesized using the candidate excitation vector matches the input speech. This is usually performed on several subframes of speech

After the best match has been found, information specifying the best codebook entry, the LPC filter coefficients and the gain coefficients are transmitted to the synthesizer. The synthesizer has the same copy of the codebook and accesses the appropriate entry in that codebook, using it to excite the same LPC filter.

The codebook is made up of vectors whose components are consecutive excitation samples. Each vector contains the same number of excitation samples as there are speech samples in the subframe or frame. The excitation samples can come from a number of different sources, Long term pitch coding is determined by the proper selection of a code vector from an adaptive codebook. The adaptive codebook is a set of different pitch periods of the previously synthesized speech excitation waveform.

The optimum selection of a code vector, either from the stochastic or the adaptive codebooks, depends on minimizing the perceptually weighted error function. This error function is typically derived from a comparison between the synthesized speech and the target speech for each vector in the codebook. These exhaustive comparison procedures require a large amount of computation and are usually not practical for a single Digital Signal Processor (DSP) to implement in real time. The ability to reduce the computation complexity without sacrificing voice quality is important in the digital communications environment.

The error function, codebook vector search, calculations are performed using vector and matrix operations of the excitation information and the LPC filter. The Problem is that a large number of calculations, for example, approximately 5.times.10.sup.8 multiply-add operations per second for a 4.8 Kbps vocoder, must be performed. Prior art arrangements have not been entirely successful in reducing the number of calculations that must be performed. Thus, a need continues to exist for improved CELP coding means and methods that reduce the computational burden without sacrificing voice quality.

A prior art 4.8 k bit/second CELP coding system is described in Federal Standard FED-STD-1016 issued by the General Services Administration of the United States Government. Prior art CELP vocoder systems are described for example in U.S. Pat. Nos. 4,899,385 and 4,910,781 to Ketchum et al., 4,220,819 to Atal, 4,797,925 to Lin, and 4,817,157 to Gerson, which are incorporated herein by reference.

Typical prior art CELP vocoder systems use an 8 kHz sampling rate and a 30 millisecond frame duration divided into four 7.5 millisecond subframes. Prior art CELP coding consists of three basic functions: (1) short delay "spectrum" prediction, (2) long delay "pitch" search, and (3) residual "code book" search.

While the present invention is described for the case of analog signals representing human speech, this is merely for convenience of explanation and, as used herein, the word "speech" is intended to include any form of analog signal of bandwidth within the sampling capability of the system.

SUMMARY OF THE INVENTION

A new way of CELP coding speech simplifies the recursive loop used to poll stochastic code book vectors by more quickly and easily determining the correlation coefficients of stochastic codebook vectors with other vectors generated by the CELP codingprocess in order to identify the optimum stochastic codebook vector for replicating the target speech.

There is provided in general a method for CELP coding speech by using a combination of a first vector V(n) having values identified by index n running from n=1 to N, and a set of the second vectors S.sub.k (n) wherein each of the second vectors is identified by index k and wherein each of the second vectors has up to N values which are either zero or non-zero and are identified by index n from n=1 to N, comprising, identifying indices n.sub.k,i of S.sub.k (n) for different k wherein S.sub.k (ni) are non-zero, adding values of the V(n) corresponding to indices n.sub.k,i to form sums Q(k), identifying the value k=j corresponding to the largest value Q(k=j), and synthesizing.speech using S.sub.k=j (n).

In a preferred embodiment, successive vectors of the set of second vectors are determined by overlap of the preceding second vector according to an overlap amount .DELTA.k,.DELTA.n, and the identifying and adding steps comprise, identifying for k=1 indices n.sub.1,i of S.sub.k (n) wherein S.sub.1 (n.sub.i) are non-zero, starting from n.sub.1,i and using the overlap amount .DELTA.k,.DELTA.n, determining further indices n.sub.k,i' for k>1 wherein S.sub.k (n.sub.i') are non-zero, and adding values of the V(n) for such indices and further indices to form sums Q(k). This procedure using the overlap amount.

In another embodiment an N by N multiplexer having n=1 to N outputs, n=1 to N first inputs, second inputs, and n=1 to N select means is used to combine the vectors, wherein a first logic level presented to nth select means couples the n.sup.th output to the n.sup.th first input and a second logic level presented to the n.sup.th select means couples the n.sup.th output to the second input, supplying n=1 to N values of the first vector to the n=1 to N first inputs of the multiplexer, presenting n=1 to N values of the second vector of index k=1 to n=1 to N select means of the multiplexer, the second vector providing at the n=1 to N select means the first logic level for some values of n and the second logic level for other values of n, adding together values of the first vector coupled to the multiplexer output to provide a sum, repeating the presenting, and adding steps for further values of k, and synthesizing speech based on whichever second vector has the sum giving the closest match to target speech. Desirably, the method further comprises determining which vector k=j has the largest sum.

In a still further embodiment, each second vector is divided into two portion, a first portion having values 0, +1 corresponding to the location of values of 0, +1 of the second vector and a second portion having values 0, +1 corresponding to the location of values 0, -1 of the second vector, and the steps of providing first and second N by N multiplexers each having n=1 to N outputs, n=1 to N first inputs, second inputs, and n=1 to N select means, wherein a first logic level presented to n.sup.th select means couples the n.sup.th output to the n.sup.th first input and a second logic level presented to the n.sup.th select means couples the n.sup.th output to the second input, supplying n=1 to N values of the first vector to the n=1 to N first inputs of the first and second multiplexers, presenting n=1 to N values of the k=1 first portion of the second vector to n=1 to N select means of the first multiplexer, adding together values of the first vector coupled to the output of the first multiplexer to provide a first sum, presenting the n=1 to N values of the k=1 second portion of the second vector to n=1 to N select means of the second multiplexer, adding together values of the first vector at the output of the second multiplexer to provide a second sum, combining the first and second sums to provide a result, repeating the presenting, adding and combining steps for further values of k, and synthesizing speech based on whichever second vector has the result giving the closest match to target speech.

There is provided an apparatus for CELP coding speech by combining a first vector and with a set of second vectors identified by an index k, wherein the first and second vectors having values identified by an indices n running from n=1 to N, comprising, an N by N multiplexer having n=1 to N outputs, n=1 to N first inputs, second inputs, and n=1 to N select means, wherein a first logic level presented to n.sup.th select means couples the n.sup.th output to the n.sup.th first input and a second logic level presented to the n.sup.th select means couples the n.sup.th output to the second input, means for supplying n=1 to N values of the first vector to the n=1 to N first inputs of the multiplexer, means for presenting n=1 to N values of the second vector of index k=1 to the n=1 to N select means of the multiplexer, the second vector providing at the n=1 to N select means the first logic level for some values of n and the second logic level for other values of n, means coupled to the multiplexer output for adding together values of the first vector to provide a sum, means for indexing k from k=1 to k=K, and means for synthesizing speech based on whichever sum identifies a second vector giving the closest match to target speech. It is preferred to further have a means for determining which vector k=j has the largest sum.

In a preferred embodiment there is provided, memory means for storing first portions of the second vectors having values 0, 1 corresponding to the locations of values of 0, +1 of the second vectors, memory means for storing second portions of the second vectors having values 0, +1 corresponding to the locations of values 0, -1 of the second vectors, first and second N by N multiplexers each having n=1 to N outputs, n=1 to N first inputs, second inputs, and n=1 to N select means, wherein a+1 presented to n.sup.th select means couples the n.sup.th output to the n.sup.th first input and a 0 presented to the n.sup.th select means couples the n.sup.th output to the second input, the first multiplexer coupled to the first memory means and the second multiplexer coupled to the second memory means, supplying n=1 to N values of the first vector to the n=1 to N first inputs of the first and second multiplexers, presenting n=1 to N values of the k=1 first portion of the second vector to n=1 to N select means of the first multiplexer, first adder coupled to the outputs of the first multiplexer for summing values of the first vector appearing at outputs of the first multiplexer to produce a first sum, second adder coupled to the outputs of the second multiplexer for summing values of the first vector appearing at outputs of the second multiplexer to produce a second sum, means for combining the first and second sums to produce a result, means for indexing k to load first and second portions of other second vectors into the memory means, multiplex, add and combine to produce other results, and means for synthesizing speech based on whichever result identifies a second vector giving the closest match to target speech. It is desirable to have means for comparing the results for each value of k to determine the value of k having the largest result.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 illustrates in simple block diagram and generalized form a CELP vocoder system;

FIGS. 2A-B illustrates, in simplified block diagram form, a CELP coder according a preferred embodiment of the present invention;

FIG. 3 illustrates, in greater detail, a portion of the coder of FIG. 2B, according to a first embodiment;

FIG. 4 illustrates, in greater detail, a portion of the coder of FIG. 2B, according to a preferred embodiment of the present invention;

FIG. 5 illustrates an apparatus for providing autocorrelation coefficients of the adaptive codebook vectors according to a preferred embodiment of the present invention;

FIG. 6 illustrates the content of a small stochastic codebook of a type used for CELP coding;

FIG. 7 is a simplified block diagram of a cross-section function according to the present invention;

FIG. 8 is a schematic diagram showing further details of the multiplexers used in FIG. 7; and

FIGS. 9-10 illustrate the content of first and second memory means whose entries correspond to non-zero entries of the codebook of FIG. 6.

DETAILED DESCRIPTION

FIG. 1 illustrates, in simplified block diagram form, a vocoder transmission system utilizing CELP coding. CELP coder 100 receives incoming speech 102 and produces CELP coded output signal 104. CELP coded signal 104 is sent via transmission path or channel 106 to CELP decoder 300 where facsimile 302 of original speech signal 102 is reconstructed by synthesis. Transmission channel 106 may have any form, but typically is a wired or radio communication link of limited bandwidth. CELP coder 100 is frequently referred to as an "analyzer" because its function is to determine CELP code parameters 104 (e.g., code book vectors, gain information, LPC filter parameters, etc.) which best represent original speech 102. CELP decoder 300 is frequently referred to as a synthesizer because its function is to recreate output synthesized speech 302 based on incoming CELP coded signal 104. CELP decoder 300 is conventional and is not a part of the present invention and will not be discussed further.

FIGS. 2A-B show CELP coder 100 in greater detail and according to a preferred embodiment of the present invention. Incoming analog speech signal 102 is first bandpassed by filter 110 to prevent aliasing. Band-passed analog speech signal 111 is then sampled by analog to digital (A/D) converter 112. Sampling is usually at the Nyquist rate, for example at 8 KHz for a 4 KHz CELP vocoder. Other sampling rates may also be used. Any suitable A/D converter may be used. Digitized signal 113 from A/D converter 112 comprises a train of samples, e.g., a train of narrow pulses whose amplitudes correspond to the envelop of the speech waveform

Digitized speech signal 113 is then divided into frames or blocks, that is, successive time brackets containing a predetermined number of digitized speech samples, as for example, 60, 180 or 240 samples per frame. This is customarily referred to as the "frame rate" in CELP processing. Other frame rates may also be used. This is accomplished in framer 114. Means for accomplishing this are well known in the art. Successive speech frames 115 are stored in frame memory 116. Output 117 of frame memory 116 sends frames 117 of digitized speech 115 to blocks 122, 142, 162 and 235 whose function will be presently explained.

Those of skill in the art understand that frames of digitized speech may be further divided into subframes and speech analysis and synthesis performed using subframes. As used herein, the word "frame", whether singular or plural, is intended to refer to both frames and subframes of digitized speech.

CELP coder 100 uses two code books, i.e., adaptive codebook 155 and stochastic codebook 180 (see FIG. 2B). For each speech frame 115, coder 100 calculates LPC coefficients 123 representing the formant characteristics of the vocal tract. Coder 100 also searches for entries (vectors) from both stochastic codebook 180 and adaptive codebook 155 and associated scaling (gain) factors that, when used to excite a filter with LPC coefficients 123, best approximates input speech frame 117. The LPC coefficients, the codebook vectors and the scaling (gain coefficient) information are processed and sent to channel coder 210 where they are combined to form coded CELP signal 104 which is transmitted by path 106 to CELP decoder 300. The process by which this is done will now be explained in more detail.

Referring now to data path 121 containing blocks 122, 125, 130 and 135, LPC analyzer 122 is responsive to incoming speech frames 117 to determine LPC coefficients 123 using well-known techniques. LPC coefficients 123 are in the form of Line Spectral Pairs (LSPs) or Line Spectral Frequencies (LSFs), terms which are well understood in the art. LSPs 123 are quantized by coder 125 and quantized LPC output signal 126 sent to channel coder 210 where it forms a part (i.e., the LPC filter coefficients) of CELP signal 104 being sent via transmission channel 106 to decoder 300.

Quantized LPC coefficients 126 are decoded by decoder 130 and the decoded LSPs sent via output signals 131, 132 respectively, to spectrum inverse filters 145 and 170, which are described in connection with data paths 141 and 161, and via output signal 133 to bandwidth expansion weighting generator 135. Signals 131, 132 and 133 contain information on decoded quantized LPC coefficients. Means for implementing coder 125 and decoder 130 are well known in the art.

Bandwidth expansion weighting generator 135 provides a scaling factor (typically =0.8) and performs the function of bandwidth expansion of the formants, producing output signals 136, 137 containing information on bandwidth expanded LPC filter coefficients. Signals 136, 137 are sent respectively, to cascade weighting filters 150 and 175 whose function will be explained presently.

Referring now to data path 141 containing blocks 142, 145 and 150, spectral predictor memory subtracter 142 subtracts previous states 196 (i.e., left by the immediately preceding frame) in short term spectrum predictor filter 195 (see FIG. 2B) from input sampled speech 115 arriving from frame memory 116 via 117. Subtractor 142 provides speech residual signal 143 which is digitized input speech 115 minus what is referred to in the art as the filter ringing signal or the filter ringdown. The filter ringing signal arises because an impulse used to excite a filter (e.g., LPC filter 195 in FIG. 2B) in connection with a given speech frame does not completely dissipate by the end of that frame, but may cause filter excitation (i.e., "ringing") extending into a subsequent frame. This ringing signal appears as distortion in the subsequent frame, since it is unrelated to the speech content of that frame. If the ringing signal is not removed, it affects the choice of code parameters and degrades the quality of the speech synthesized by decoder 300.

Speech residual signal 143 containing information on speech 115 minus filter ringing signal 196 is fed into spectrum inverse filter 145 along with signal 131 from decoder 130. Filter 145 is typically implemented as a zero filter (i.e. A(z) =A.sub.o +A.sub.1 z.sup.-1 +. . .+A.sub.n z.sup.-n where the A's are LPC filter coefficients and z is "Z transform" of the filter), but other means well known in the art may also be used. Signals 131 and 143 are combined in filter 145 by convolution to create LPC inverse-filtered speech. Output signal 146 of filter 145 is sent to cascade weighting filter 150. Filter 150 is typically implemented as a pole filter (i.e., 1/A(z/r), where A(z/r)=A.sub.o +A.sub.1 rz.sup.-1 +. . .+A.sub.n r.sup.n z.sup.-n, and the A's are LPC filter coefficients and r is an expansion factor and z is "Z transform" of the filter), but other means well known in the art may also be used.

Output signal 152 from block 150 is perceptually weighted LPC impulse function H(n) derived from the convolution of an impulse function (e.g., 1, 0, 0, ... , 0) with bandwidth expanded LPC coefficient signal 136 arriving from block 135. Signal 136 is also combined with signal 146 in block 150 by convolution to create at output 151, perceptually weighted short delay target speech signal X(n) derived from path 141.

Outputs 151 and 152 of weighting filter 150 are fed to adaptive codebook searcher 220. Target speech signal 151 (i.e., X(n)) and perceptually weighted impulse function signal 152 (i.e., H(n)) are used by the searcher 220 and adaptive codebook 155 to determine the pitch period (i.e., the excitation vector for filter 195) and the gain therefore which most closely corresponding to digitized input speech frame 117. The manner in which this is accomplished is explained in more detail in connection with FIGS. 3-4.

Referring now to data path 161 which contains blocks 162, 165, 170 and 175, pitch predictor memory subtractor 162 subtracts previous filter states 192 in long delay pitch Predictor filter 190 from digitized input sampled speech 115 received from memory 116 via 117 to give output signal 163 consisting of sampled speech minus the ringing of long delay pitch predictor filter 190. Output signal 163 is fed to spectrum predictor memory subtractor 165.

Spectral memory subtractor 165 performs the same function as described in connection with block 142 and subtracts out short delay spectrum predictor ("spectral") filter ringing or ringdown signal 196 from digitized input speech frame 117 transmitted via pitch subtracter 162. This produces remainder output signal 166 consisting of current frame sampled speech 117 minus the ringing of long delay ("pitch") filter 190 and short delay ("spectral") filter 195 left over from the previous frame. Remainder signal 166 is fed to spectrum inverse filter 170 which is analogous to block 145.

Inverse filter 170 receives remainder signal 166 and output 132 of decoder 130. Signal 132 contains information on decoded quantized LPC coefficients. Filter 170 combines signals 166 and 132 by convolution to create output signal 171 comprising LPC inverse-filtered speech. Output signal 171 is sent to cascade weighting filter 175 analogous to block 150.

Weighting filter 175 receives signal 171 from filter 170 and signal 137 from bandwidth expansion weighting generator 135. Signal 137 contains information on bandwidth expanded LPC coefficients. Cascade weighting filter 175 produces output signals 176, 177. Filter 175 is typically implemented as a pole filter (i.e. only poles in the complex plane), but other means well known in the art may also be used.

Signals 137, 171 are combined in filter 175 by convolution to create at output 177, perceptually weighted LPC impulse function H(n) derived from path 121, and create at output 176, perceptually weighted long delay and short delay target speech signal Y(n) derived from path 161. Output signals 176, 177 are sent to stochastic searcher 225.

Stochastic searcher 225 uses stochastic codebook 180 to select an optimum white noise vector and a optimum scaling (gain) factor which, when applied to pitch and LPC filters 190, 195 of predetermined coefficients, provide the best match to input digitized speech frame 117. Stochastic searcher 225 performs operations well known in the art and generally analogous to those performed by adaptive searcher 220 described more fully in connection with FIGS. 3-4.

In summary, in chain 141, spectrum inverse filter 145 receives LSPs 131 and residual 143 and sends its output 146 to cascade weighting filter 150 to generate perceptually weighted LPC impulse function response H(n) at output 152 and perceptually weighted short delay target speech signal X(n) at output 151. In chain 161, spectrum inverse filter 170 receives LSPs 132 and short delay and long delay speech residual 166, and sends its output 171 to weighting filter 175 to generate perceptually weighted LPC impulse function H(n) at output 177 and perceptually weighted short and long term delay target speech signal Y(n) at output 176.

Blocks 135, 150, 175 collectively labelled 230 provide the perceptual weighting function. The decoded LSPs from chain 121 are used to generate the bandwidth expand weighting factor at outputs 136, 137 in block 135. Weighting factors 136, 137 are used in cascade weighting filters 150 and 175 to generate perceptually weighted LPC impulse function H(n). The elements of perceptual weighting block 230 are responsive to the LPC coefficients to calculate spectral weighting information in the form of a matrix that emphasizes those portions of speech that are known to have important speech content. This spectral weighting information 1/A(z/r) is based on finite impulse response H(n) of cascade weighting filters 150, and 175. The utilization of finite impulse response function H(n) greatly reduces the number of calculations which codebook searchers 220 and 225 must perform. The spectral weighting information is utilized by the searchers in order to determine the best candidate for the excitation information from the codebooks 155 and 180.

Continuing to refer to FIGS. 2A-B, adaptive codebook searcher 220 generates optimum adaptive codebook vector index 221 and associated gain 222 to be sent to channel coder 210. Stochastic codebook searcher 225 generates optimum stochastic codebook vector index 226, and associated gain 227 to be sent to channel coder 210. These signals are encoded by channel coder 210.

Channel coder 210 receives five signals: quantized LSPs 126 from coder 125, optimum stochastic codebook vector index 226 and gain setting 227 therefore, and optimum adaptive codebook vector index 221 and gain setting 222 therefore. The output of channel coder 210 is serial bit stream 104 of the encoded parameters. Bit stream 104 is sent via channel 106 to CELP decoder 300 (see FIG. 1) where, after decoding, the recovered LSPs, codebook vectors and gain settings are applied to identical filters and codebooks to produce synthesized speech 302.

As has already been explained, CELP coder 100 determines the optimum CELP parameters to be transmitted to decoder 300 by a process of analysis, synthesis and comparison. The results of using trial CELP parameters must be compared to the input speech frame by frame so that the optimum CELP parameters can be selected. Blocks 190, 195, 197, 200, 205, and 235 are used in conjunction with the blocks already described in FIGS. 2A-B to accomplish this. The selected CELP parameters (LSP coefficients, codebooks vectors and gain, etc.) are passed via output 211 to decoder 182 from whence they are distributed to blocks 190, 195, 197, 200, 205, and 235 and thence back to blocks 142, 145, 150, 162, 165, 170 and 175 already discussed.

Block 182 is identified as a "channel decoder" having the function of decoding signal 211 from coder 210 to recover signals 126, 221, 222, 226, 227. However, those of skill in the art will understand that the code-decode operation indicated by blocks 210-182 may be omitted and signals 126, 221, 222, 226, 227 fed in uncoded form to block 182 with block 182 merely acting as a buffer for distributing the signals to blocks 190, 195, 197, 200, 205, and 235. Either arrangement is satisfactory, and the words "channel coder 182", "coder 182" or "block 182" are intended to indicate either arrangement or any other means for passing such information.

The output signals of decoder 182 are quantized LSP signal 126 which is sent to block 195, adaptive codebook index signal 221 which is sent to block 190, adaptive codebook vector gain index signal 222 which is sent to block 190, stochastic codebook index signal 226 which is sent to block 180, and stochastic codebook vector gain index signal 227 which is sent to block 197. These signals excite filter 190 thereby producing output 191 which is fed to to adaptive codebook 155 and to filter 195. Output 191 in combination with output 126 of coder 182, further excites filter 195 to produce synthesized speech 196.

Synthesizer 228 comprises gain multiplier 197, long delay pitch predictor 190, and short delay spectrum predictor 195, subtractor 235, spectrum inverse filter 200 and cascade weighting filter 205. Using the decoded parameters 126, 221, 222, 226 and 227, stochastic code vector 179 is selected and sent to gain multiplier 197 to be scaled by gain parameter 226. Output 198 of gain multiplier 197 is used by long delay pitch predictor 190 to generate speech residual 191. Filter state output information 192, also referred to in the art as the speech residual of predictor filter 190, is sent to pitch memory subtracter 162 for filter memory update. Short delay spectrum predictor 195, which is an LPC filter whose parameters are set by incoming LPC parameter signal 126, is excited by speech residual 191 to produce synthesized digital speech output 196. The same speech residual signal 191 is used to update adaptive codebook 155.

Synthesized speech 196 is subtracted from digitized input speech 117 by subtracter 235 to produce digital speech remainder output signal 236. Speech remainder 236 is fed to the spectrum inverse filter 200 to generate residual error signal 202. Output signal 202 is fed to the cascade weighting filter 205, and output filter state information 206, 207 is used to update cascade weighting filters 150 and 175 as previously described in connection with signal paths 141 and 161. Output signal 201, 203, which is the filter state information of spectrum inverse filter 200, is used to update the spectrum inverse filters 145 and 170 as previously described in connection with blocks 145, 170.

FIGS. 3-4 are simplified block diagrams of adaptive codebook searcher 220. FIG. 3 shows a suitable arrangement for adaptive codebook searcher 220 and FIG. 4 shows a further improved arrangement. The arrangement of FIG. 4 is preferred.

Referring now to FIGS. 3-4 generally, the information in adaptive codebook 155 is excitation information from previous frames For each frame, the excitation information consists of the same number of samples as the sampled original speech. Codebook 155 is conveniently organized as a circular list so that a new set of samples is simply shifted into codebook 155 replacing the earliest samples presently in the codebook. The new excitation samples are provided by output 191 of long delay pitch predictor 190.

When utilizing excitation information out of codebook 155, searcher 220 deals in sets, i.e., subframes and does not treat the vectors as disjointed samples. Searcher 220 treats the samples in codebook 155 as a linear array. For example, for 60 sample frames, searcher 220 forms the first candidate set of information by utilizing samples 1 through sample 60 from codebook 155, and the second set of candidate information by using samples 2 through 61 and so on. This type of codebook searching is often referred to as an overlapping codebook search. The present invention is not concerned with the structure and function of codebook 155, but with how codebook 155 is searched to identify the optimum codebook vector.

Adaptive codebook searcher 220 accesses previously synthesized pitch information 156 already stored in adaptive codebook 155 from output 191 in FIG. 2B, and utilizes each such set of information 156 to minimize an error criterion between target excitation 151 received from block 150 and accessed excitation 156 from codebook 155. Scaling factor or gain index 222 is also calculated for each accessed set of information 156 since the information stored in adaptive codebook 155 does not allow for the changes in dynamic range of human speech or other input signal.

The preferred error criterion used is the Minimum Squared Prediction Error (MPSE), which is the square of the difference between the original speech frame 115 from frame memory output 117 and synthetic speech 196 produced at the output of block 195 of FIG. 2B. Synthetic speech 196 is calculated in terms of trial excitation information 156 obtained from the codebook 155. The error criterion is evaluated for each candidate vector or set of excitation information 156 obtained from codebook 155, and the particular set of excitation information 156' giving the lowest error value is the set of information utilized for the present frame (or subframe).

After searcher 220 has determined the best match set of excitation information 156' to be utilized along with a corresponding best match scaling factor or gain 222', vector index output signal 221 corresponding to best match index 156' and scaling factor 222 corresponding to the best match scaling factor 222' are transmitted to channel encoder 210.

FIG. 3 shows a block diagram of adaptive searcher 220 according to a first embodiment and FIG. 4 shows adaptive searcher 220' according to a further improved and preferred embodiment. Adaptive searchers 220, 220' perform a sequential search through the adaptive codebook 155 vectors indices C.sub.1 (n). . . C.sub.K (n). During the sequential search operation, searchers 220, 220' accesses each candidate excitation vector C.sub.k (n) from the codebook 155 where k is an index running from 1 to K identifying the particular vector in the codebook and where n is a further index running from n=1 to n=N where N is the number of samples within a given frame. In a typical CELP application K=256 or 512 or 1024 and N=60 or 120 or 240, however, other values of K and N may also be used.

Adaptive codebook 155 contains sets of different pitch periods determined from the previously synthesized speech waveform. The first sample vector starts from the Nth sample of the synthesized speech waveform C.sub.k (N) which is located from the current last sample of the synthesized speech waveform back N samples In human voice, the pitch frequency is generally around 40 Hz to 500 Hz. This translates to about 200 to 16 samples. If fractional pitch is involved in the calculation, K can be 256 or 512 in order to represent the pitch range. Therefore, the adaptive codebook contains a set of K vectors C.sub.k (n) which are basically samples of one or more pitch periods of a particular frequency.

Referring now to FIG. 3, convolution generator 510 of adaptive codebook searcher 220 convolves each codebook vector C.sub.k (n), i.e., signal 156, with perceptually weighted LPC impulse response function H(n), i.e., signal 152 from cascade weighted filter 150. Output 512 of convolution generator 510 is then cross-correlated with target speech residual signal X(n) (i.e., signal 151 of FIGS. 2A-B) in cross-correlator 520. The convolution and correlation are done for each codebook vector C.sub.k (n) where n=1, . . . , N. The operation performed by convolution generator 510 is expressed mathematically by equation (1) below: ##EQU1## The operation performed by cross correlation generator 520 is expressed mathematically by equation (2) below: ##EQU2## Output 512 of convolution generator 510 is also fed to energy calculator 535 comprising squarer 552 and accumulator 553 (accumulator 553 provides the sum of the squares determined by squarer 552). Output 554 is delivered to divider 530 which calculates the ratio of signals 551 and 554. Output 521 of cross-correlator 520 is fed to squarer 525 whose output 551 is also fed to divider 530. Output 531 of divider 530 is fed to peak selector circuit 570 whose function is to determine which value C.sub.k (m) of C.sub.k (n) produces the best match, i.e., the greatest cross-correlation. This can be expressed mathematically by equations (3a) and (3b). Equation (3a) expresses the error E. ##EQU3## To minimize error E is to maximize the cross-correlation expressed by equation (3b) below, where G.sub.k is defined by equation (4): ##EQU4## The identification (index) of the optimum vector index C.sub.k (m) is delivered to output 221. Output 571 of peak selector 570 carries the gain scaling information associated with best match pitch vector C.sub.k (m) to gain calculator 580 which provides gain index output 222. The operation performed by gain calc