WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Source dependent channel coding with error protection    
United States Patent5091945   
Link to this pagehttp://www.wikipatents.com/5091945.html
Inventor(s)Kleijn; Willem B. (Batavia, IL)
AbstractA parameter communication arrangement where a parameter that is transmitted over a channel using m-bit codewords or labels is quantized before transmission as one of only p levels, where, significantly, p<k=2.sup.m. Since only p labels are needed to transmit the p levels, the unused k-p labels are advantageously available to provide redundancy. The receiver decodes the redundant labels in accordance with an error routine. An encoding table mapping from the p levels to p labels and a decoding table inverse mapping from the p labels to p levels are obtained using an optimization procedure to minimize the effect of channels errors. The optimization is based on the probability distribution for the p levels such that a relatively high proportion of the error protection made available by having redundant labels inures to the benefit of parameter levels which are more likely to be transmitted. The optimization procedure is a well known technique referred to as simulated annealing which is for the first time applied to source dependent channel coding.
   














 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 5091945
Source dependent channel coding with error protection - US Patent 5091945 Drawing
Source dependent channel coding with error protection
Inventor     Kleijn; Willem B. (Batavia, IL)
Owner/Assignee     AT&T Bell Laboratories (Murray Hill, NJ)
Patent assignment
All assignments
Publication Date     February 25, 1992
Application Number     07/414,155
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     September 28, 1989
US Classification     704/219
Int'l Classification     G10L 009/18
Examiner     Shaw; Dale M.
Assistant Examiner     Knepper; David D.
Attorney/Law Firm     Watland; Ross T.
Address
Parent Case    
Priority Data    
USPTO Field of Search     381/29 381/30 381/31 381/32 381/33 381/34 381/35 381/36 381/37 381/38 381/39 381/40 364/513.5
Patent Tags     source dependent channel coding error protection
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
4899385
Ketchum
704/223
Feb,1990

[0 after 0 votes]
4718087
Papamichalis
704/222
Jan,1988

[0 after 0 votes]
4291405
Jayant
714/747
Sep,1981

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


I claim:

1. Speech processing apparatus comprising

speech analyzer means responsive to input speech signals from a source of input speech signals for generating a plurality of parameter signals representing said input speech signals in accordance with a speech model, at least one of said parameter signals being quantized as one of p levels,

channel encoder means comprising

encoder memory means for storing an encoding table defining a mapping from each of said p levels to a unique one of p, m-bit label signals, where p<k=2.sup.m, and

means responsive to said speech analyzer means for transmitting, over a channel to a destination, the one of said p label signals that is associated with said one of said p levels in said encoding table,

channel decoder means comprising

decoder memory means for storing a decoding table defining the inverse of said encoding table mapping and

means, responsive to a label signal received at said destination from said channel, for decoding said received label signal as the one of said p levels associated with said received label signal in said decoding table inverse mapping when said received label signal is one of said p label signals, and for decoding said received label signal in accordance with an error routine when said received label signal is one of the k-p, m-bit label signals other than said p label signals, and

speech synthesizer means responsive to said decoding means for synthesizing speech based at least in part on said decoded label signal,

wherein said encoding table mapping stored by said encoder memory means and said decoding table inverse mapping stored by said decoder memory means are obtained to minimize the effect of channel errors and are obtained using simulated annealing based on a probability distribution of said p levels for said one parameter signal.

2. Speech processing apparatus in accordance with claim 1 wherein said encoder memory means further stores other mappings and said decoder memory means further stores other inverse mappings, said other mappings and said other inverse mappings being for use in communication of other parameter signals from said source over said channel to said destination, said other mappings and said other inverse mappings being obtained, concurrently with said encoding table mapping and said decoding table inverse mapping, using said simulated annealing.

3. Speech processing apparatus in accordance with claim 2 wherein said simulated annealing minimizes an overall error measure for said one parameter signal and said other parameter signals.

4. Speech processing apparatus in accordance with claim 1 wherein

said decoding table stored by said decoder memory means defines an additional mapping from each of said k-p label signals,

said decoding means decodes said received label signal in accordance with said additional mapping when said received label signal is one of said k-p label signals, and

said decoding table additional mapping is also obtained using said simulated annealing to minimize the effect of channel errors based on said probability distribution.

5. Speech processing apparatus in accordance with claim 4 wherein said inverse and additional mappings are obtained concurrently using said simulated annealing.

6. Speech processing apparatus in accordance with claim 1 wherein

said decoding means decodes said received label signal as a default level when said received label signal is one of said k-p label signals.

7. Speech processing apparatus in accordance with claim 1 wherein

said decoding means decodes said received label signal based on information received over said channel other than said received label signal when said received label signal is one of said k-p label signals.

8. Speech processing apparatus in accordance with claim 1 wherein

said decoding means decodes said received label signal as the same level that was obtained from a previous communication of said one parameter signal over said channel when said received label signal is one of said k-p label signals.

9. Speech processing apparatus in accordance with claim 1

said decoding table stored by said decoder memory means defines an additional mapping from each of certain ones of said k-p label signals,

said decoding means decodes said received label signal in accordance with said additional mapping when said received label signal is one of said certain ones of said k-p label signals, and

said decoding means decodes said received label signal as a default level when said received label signal is one of said k-p label signals other said certain ones.

10. Speech processing apparatus in accordance with claim 1

said decoding table stored by said decoder memory means defines an additional mapping from each of certain ones of said k-p label signals,

said decoding means decodes said received label signal in accordance with said additional mapping when said received label signal is one of said certain ones of said k-p label signals, and

said decoding means decodes said received label signal based on information received over said channel other than said received label signal when said received label signal is one of said k-p label signals other said certain ones.

11. Speech processing apparatus in accordance with claim 1

said decoding table stored by said decoder memory means defines an additional mapping from each of certain ones of said k-p label signals,

said decoding means decodes said received label signal in accordance with said additional mapping when said received label signal is one of said certain ones of said k-p label signals, and

said decoding means decodes said received label signal as the same level that was obtained from a previous communication of said one parameter signal over said channel when said received label signal is one of said k-p label signals other said certain ones.

12. Speech processing apparatus in accordance with claim 1 wherein said p label signals are selected from k, m-bit label signals as a result of said simulated annealing.

13. Speech processing apparatus in accordance with claim 1 wherein said model is a code excited linear prediction model.

14. Speech processing apparatus in accordance with claim 1 wherein said encoding table mapping and said decoding table mapping are obtained to minimize distortion in said synthesized speech.

15. Speech processing apparatus in accordance with claim 1 wherein p>2.sup.m-1.

16. Speech processing apparatus comprising

speech analyzer means responsive to input speech signals from a source of input speech signals for generating a plurality of parameter signals representing said input speech signals in accordance with a speech model, at least one of said parameter signals being quantized as one of p levels,

channel encoder means comprising

encoder memory means for storing an encoding table defining a mapping from each of said p levels to a unique one of p, m-bit label signals, where p<k=2.sup.m, and

means responsive to said speech analyzer means for transmitting, over a channel to a destination, the one of said p label signals that is associated with said one of said p levels in said encoding table,

channel decoder means comprising

decoder memory means for storing a decoding table defining the inverse of said encoding table mapping and defining an additional mapping from each of certain ones of the k-p, m-bit label signals other than said p label signals and

means, responsive to a label signal received at said destination from said channel, for decoding said received label signal as the one of said p levels associated with said received label signal in said decoding table inverse mapping when said received label signal is one of said p label signals, and for decoding said received label signal as defined by said additional mapping when said received label signal is one of said certain ones of said k-p label signals, and

speech synthesizer means responsive to said decoding means for synthesizing speech based at least in part on said decoded label signal,

wherein said encoding table mapping stored by said encoder memory means and said decoding table inverse mapping stored by said decoder memory means are obtained to minimize the effect of channel errors and are obtained based on a probability distribution of said p levels for said one parameter signal,

wherein said inverse and additional mappings are such that at least one of said p label signals differs in b bits, 1<=b<m, from a label signal which maps into the same level as said at least one of said p label signals and which also differs in b bits from a label signal which maps into a level other than said same level.

17. Speech processing apparatus in accordance with claim 16 wherein

said decoding means decodes said received label signal as a default level when said received label signal is one of said k-p label signals other than said certain ones.

18. Speech processing apparatus in accordance with claim 16 wherein

said decoding means decodes said received label signal based on information received over said channel other than said received label signal when said received label signal is one of said k-p label signals other than said certain ones.

19. Speech processing apparatus in accordance with claim 16 wherein

said decoding means decodes said received label signal as the same level that was obtained from a previous communication of said one parameter signal over said channel when said received label signal is one of said k-p label signals other than said certain ones.

20. Speech processing apparatus in accordance with claim 16 where p>2.sup.m-1.

21. Speech processing apparatus in accordance with claim 16 wherein said model is a code excited linear prediction model.

22. Speech processing apparatus in accordance with claim 16 wherein said encoding table mapping and said decoding table mapping are obtained to minimize distortion in said synthesized speech.

23. Speech processing apparatus comprising

speech analyzer means responsive to input speech signals from a source of input speech signals for generating a plurality of parameter signals representing said input speech signals in accordance with a speech model, at least one of said parameter signals being quantized as one of p levels,

channel encoder means comprising

encoder memory means for storing an encoding table defining a mapping from each of said p levels to a unique one of p, m-bit label signals, where p<=k=2.sup.m, and

means responsive to said speech analyzer means for transmitting, over a channel to a destination, the one of said p label signals that is associated with said one of said p levels in said encoding table,

channel decoder means comprising

decoder memory means for storing a decoding table defining the inverse of said encoding table mapping and

means, responsive to a label signal received at said destination from said channel, for decoding said received label signal as the one of said p levels associated with said received label signal in said decoding table inverse mapping when said received label signal is one of said p label signals, and

speech synthesizer means responsive to said decoding means for synthesizing speech based at least in part on said decoded label signal,

wherein said encoding table mapping stored by said encoder memory means and said decoding table inverse mapping stored by said decoder memory means are obtained to minimize the effect of channel errors and are obtained using simulated annealing based on a probability distribution of said p levels for said one parameter signal.

24. Speech processing apparatus in accordance with claim 23 wherein said model is a code excited linear prediction model.

25. Speech processing apparatus in accordance with claim 23 wherein said encoding table mapping and said decoding table mapping are obtained to minimize distortion in said synthesized speech.
 Description Submit all comments and votes
 


TECHNICAL FIELD

This invention relates to information processing and communication.

BACKGROUND AND PROBLEM

The code excited linear predictive (CELP) speech compression procedure has been shown to provide excellent speech quality at low bit rates. Since its original introduction in 1984, much effort has been spent to make the procedure feasible for commercial applications. Thus, while the original procedure was computationally extremely expensive, many different techniques are now available to reduce the computational effort. Its current level of maturity makes the CELP procedure desirable for many applications where bandwidth is at a premium, such as voice mail/storage, secure telephony and mobile telephony.

In some applications the CELP procedure will encounter channel errors. Efforts to minimize the effect of channel errors on speech compression procedures can be divided into methods which change the robustness of the source coder, by taking advantage of redundancies in the transmitted information, and methods which add error correction and/or error detection by means of a separate channel coder. Conventional implementations of the latter approach add a channel coder which maps selected bits of the quantization indices of a compression procedure into generic error-correction/detection codes which do not depend on the source. That this procedure is not optimal is suggested by the fact that the bits to be protected by the error correcting codes are hand picked, based on a judgement of their sensitivity. The separation between source and channel coders is justified if an arbitrarily complex coder-decoder design is optimized for a channel of a particular capacity (usually a worst case channel). Then the source coder rate can be matched to the capacity of this channel, resulting in suboptimal performance for channels of higher or lower capacity (or equal capacity, but with different characteristics). Speech coders usually encounter a variety of error conditions, and in many cases low error rates are prevalent. It is desirable to have a speech coder which exploits maximally the prevalent channels and decreases minimally in performance with diminishing channel capacity. To obtain this behavior, the source distortion must be considered in the design of the channel coder.

As an illustration that the source distortion should be considered in optimizing a channel code which is used in channels of various error rates, consider the example of Table 1. A four level scalar quantizer, of which each level has identical a-priori probability (no redundancy in the transmitted bit stream), is encoded with three different encoding schemes. Assume that virtually all channels are without errors, except a few in which a significant random error rate occurs. Table 1 shows the well known L1 and L2 error criteria for single bit errors (two bit errors per code word are exceedingly unlikely at low error rates) per codeword per error for the three encoding schemes. All codes are optimized for channels with zero error rate and have zero redundancy, but code 1 will result in the lowest L2 distortion, and code 1 and code 2 result in the lowest L1 distortion for noisy channels.

TABLE 1 ______________________________________ Four-Level Quantizer Example code 1 code 2 code 3 ______________________________________ quantizer level 0.0 00 01 10 1.0 01 00 01 4.0 10 10 00 9.0 11 11 11 error criterion L1 4.5 4.5 6.0 L2 26.5 29.0 42.5 ______________________________________

This example makes clear that a coder optimized for a certain channel (a channel with no bit errors in this case) can be further optimized to enhance performance for channels of lower quality by considering the source quality.

A technique known as pseudo-Gray coding, described in J-H. Chen, G. Davidson, A. Bersho, and K. Zeger, "Speech Coding for the Mobile Satellite Experiment", Proc. IEEE Int. Conf. on Communications, 756-763, (June 1987), is used to optimize the arrangement of a codebook to protect against the effects of channel errors. The Chen procedure takes as input a codebook and yields a rearrangement of the codevectors that minimizes the expected time average bit-error distortion. The utility of the Chen procedure is somewhat limited however because it does not include the effects of redundancy in the optimization. This is a serious limitation since in most applications where channel errors are at all significant, some redundancy is desirable despite the typically low bit rates, e.g., 4.8 kilobits per second. Furthermore, the Chen procedure uses a gradient optimization technique which involves iteratively switching the positions of codevectors to reduce the expected value of the bit-error distortion until a locally optimal state is reached. However, since the function being optimized typically has more than one local minimum, the Chen procedure will frequently result in sub-optimum performance.

In view of the foregoing, a recognized need exists in the art for an optimized, source dependent channel coder where the error protective effects of redundancy are included in the optimization and where the resulting code is more than locally optimal.

SOLUTION

This need is met and a technical advance is achieved in accordance with the principles of the invention in a parameter communication arrangement where a parameter that is transmitted over a channel using m-bit codewords or labels is quantized before transmission as one of only p levels, where, significantly, p<k=2.sup.m. Since only p labels are needed to transmit the p levels, the unused k-p labels are advantageously available to provide redundancy. The receiver decodes the redundant labels in accordance with an error routine. An encoding table mapping from the p levels to p labels and a decoding table inverse mapping from the p labels to p levels are obtained using an optimization procedure to minimize the effect of channel errors. The optimization is based on the probability distribution for the p levels such that a relatively high proportion of the error protection made available by having redundant labels inures to the benefit of parameter levels which are more likely to be transmitted. The optimization procedure is a well known technique referred to as simulated annealing which is for the first time applied to source dependent channel coding and which provides a degree of randomness in the perturbation of labels which is gradually reduced to obtain a code which is globally optimum rather than only locally optimum. Since low bit rates are desirable in many applications, a degree of redundancy is afforded by having the number of quantized levels, p, between 2.sup.m-1 and 2.sup.m in illustrative embodiments herein. The expense of such an arrangement in terms of transmitted bits is less than that of simple parity error detection.

A method in accordance with the invention is used to communicate a parameter from a source over a channel to a destination. The parameter is quantized at the source as one of p levels. The term quantization level as used herein refers to either a scalar quantization value, described by a single number, or a quantized vector value, described by an ordered set of numbers. The label that is transmitted over the channel is the one of p, m-bit labels that is associated with the quantized level in an encoding table defining a mapping from each of the p levels to a unique one of the p labels, where p<k=2.sup.m. When the m-bit label received at the destination is one of the p labels, it is decoded as the level associated with that label in a decoding table defining the inverse of the encoding table mapping. When the received label is one of the k-p labels other than the p labels, it is decoded in accordance with an error routine. The mapping of the encoding table and the inverse mapping of the decoding table are obtained to minimize the effect of channel errors and are obtained using simulated annealing based on a probability distribution of the p levels for the parameter.

In one illustrative embodiment, the error routine comprises error correction and the received label is decoded as defined by an additional mapping of the decoding table from each of the k-p redundant labels. The encoding table mapping and the decoding table inverse and additional mappings are obtained concurrently as the result of a single, simulated annealing optimization.

In other illustrative embodiments, the error routine involves error detection and the substitution of another level, for example, a default level or a level based on information received over the channel other than the received label, e.g., the same level obtained from a previous communication of the parameter.

In a further illustrative embodiment, the error routine is a combination of the above error correction and error detection and substitution methods. Certain of the redundant labels are decoded using an additional mapping of the encoding table and the other redundant labels are decoded as substitute levels. The selections of which redundant labels result in error correction and which ones result in error detection and substitution are obtained as a result of the single, simulated annealing optimization.

In the exemplary embodiments herein, the parameter is obtained at the source by analyzing input speech in accordance with a code excited linear prediction (CELP) model; the result obtained by decoding the received label is used at the destination to generate synthetic speech also in accordance with the CELP model. Example parameters are the gain factors and indices for the adaptive and stochastic codebooks used in an illustrative CELP speech processing arrangement. The encoding table and decoding table mappings are obtained to minimize distortion in the synthetic speech generated at the destination.

Another alternative embodiment uses a single, simulated annealing procedure to obtain optimized encoding and decoding tables for each of a number of parameters, where the error measure used in the optimization is an overall error measure.

In accordance with another aspect of the invention, a parameter is quantized at the source as one of p levels. The label that is transmitted over the channel is the one of p, m-bit labels that is associated with the quantized level in an encoding table defining a mapping from each of the p levels to a unique one of the p labels, where p<k=2.sup.m. When the m-bit label received at the destination is one of the p labels, it is decoded as the level associated with that label in a decoding table defining the inverse of the encoding table mapping. When the received label is one of the k-p labels other than the p labels, it is decoded in accordance with an error routine. When the received label is one of at least certain ones of the k-p other labels, it is decoded as defined by an additional mapping of the decoding table. The mapping of the encoding table and the inverse mapping of the decoding table are obtained to minimize the effect of channel errors and are obtained based on a probability distribution of the p levels for the parameter. The inverse and additional mappings are such that at least one of the p labels differs in b bits, 1<=b<m, from a label which maps into the same level as the one of the p labels and which also differs in b bits from a label which maps into a level other than that same level.

In accordance with still another aspect of the invention, a parameter is quantized at the source as one of p levels. The label that is transmitted over the channel is the one of p, m-bit labels that is associated with the quantized level in an encoding table defining a mapping from each of the p levels to a unique one of the p labels, where p<=k=2.sup.m. When the m-bit label received at the destination is one of the p labels, it is decoded as the level associated with that label in a decoding table defining the inverse of the encoding table mapping. The mapping of the encoding table and the inverse mapping of the decoding table are obtained to minimize the effect of channel errors using simulated annealing based on a probability distribution of the p levels for the parameter.

DRAWING DESCRIPTION

FIG. 1 is a block diagram of an exemplary speech coding arrangement using the channel coding method of the present invention;

FIG. 2 illustrates the quantization of an arbitrary parameter X of the type generated by the speech analyzer of FIG. 1;

FIG. 3 is a probability distribution for the parameter X;

FIG. 4 is an encoding table mapping for parameter X as obtained from a simulated annealing optimization procedure and used in the channel encoder of FIG. 1;

FIG. 5 is a decoding table inverse mapping for parameter X as obtained from the simulated annealing procedure and used in the channel decoder of FIG. 1;

FIG. 6 is a decoding table additional mapping for parameter X as obtained from the simulated annealing procedure and used in the channel decoder of FIG. 1 for the case where error correction is performed on redundant labels;

FIGS. 7 and 8 are diagrams depicting the inputs, outputs, and associated error routines for simulated annealing procedures for a single parameter and multiple parameters respectively, which procedures are described in detail with reference to Tables 2-4 herein, and

FIGS. 9 through 15 are data curves used in describing the performance of channel codes illustrating the present invention.

DETAILED DESCRIPTION

1. Introduction

An illustrative speech processing arrangement in accordance with the invention is shown in block diagram form in FIG. 1. Incoming analog speech signals are converted to digitized speech samples by an A/D converter 50. The digitized speech samples from converter 50 are processed by speech analyzer 100, which in the present example uses the CELP speech model for analysis. The results obtained by analyzer 100 are a number of parameters which are transmitted to a channel encoder 200 for encoding and transmission over a channel 300. Advantageously, channel 300 may be a communication transmission path or may be storage media so that voice synthesis may be provided for various applications at a later point in time. A channel decoder 400 receives the quantized parameters from channel 300, decodes them, and transmits the decoded parameters to a speech synthesizer 500. Synthesizer 500 processes the parameters using the CELP speech model to generate digital, synthetic speech samples which are in turn processed by a D/A converter 550 to reproduce the incoming analog speech signals. The present invention focuses on the channel encoding and decoding functions. An encoding table 210 within encoder 200 and a decoding table 410 within decoder 400 are obtained as the result of an optimization procedure referred to as simulated annealing to minimize the effect of channel errors in a manner described in detail herein.

In the present example, speech analyzer 100 and speech synthesizer 500 implement a particular CELP procedure referred to as stochastically excited linear prediction (SELP) as described in W. B. Kleijn, D. J. Krasinski, and R. H. Ketchum, "An Efficient Stochastically Excited Linear Predictive Coding Algorithm for High Quality Low Bit Rate Transmission of Speech", Speech Communication, Vol. VII, 305-316, 1988. The SELP procedure for speech coding offers good performance at bit rates as low as 4.8 kbit/s. Linear predictive coding (LPC) techniques remove the short-term correlation from the speech. A pitch loop removes long-term correlation, producing a noise-like residual, which is vector quantized. Parameters describing the LPC filter coefficients, the long-term predictor, and the vector quantization are obtained by analyzer 100. Several improvements to the SELP procedure are implemented which result in better speech quality and higher computational efficiency. In its closed-loop form, the pitch loop can be interpreted as a vector quantization of the desired excitation signal with an adaptive codebook populated by previous excitation sequences. To better model the non-stationarity of speech, the adaptive codebook is extended with a special set of candidate vectors which are transforms of other codebook entries. The second stage vector quantization is performed using a fixed stochastic codebook. In its original form, the SELP procedure requires a large computational effort. A recursive procedure is employed which performs a very fast search through the adaptive codebook. In this method, the error criterion is modified and th resulting symmetries are exploited. The same fast vector quantization procedure is applied to the stochastic codebook.

As mentioned previously, this invention relates to optimized channel encoding and decoding of parameters such as the codebook indices and gain factors obtained by speech analyzer 100. FIG. 2 illustrates the quantization of an arbitrary parameter, X, as one of six levels 0, 1, 2, 3, 4, and 5. At time t.sub.1, for example, X is quantized as level 5, at time t.sub.2 as level 2, and at time t.sub.3 as level 4. Since X is to be transmitted using a three-bit label and since only six of the possible eight labels are needed to transmit the six levels, two labels are available to provide redundancy. The probability distribution for parameter X is given in FIG. 3, where the levels 0, 1, 2, 3, 4, and 5 have finite probabilities, P(0), P(1), P(2), P(3), P(4), and P(5) and levels 6 and 7 each have zero probability. In a first exemplary embodiment, the redundant labels are used to provide error correction. As functionally depicted in FIG. 7, the probability distribution of parameter X is provided as input to a simulated annealing procedure described in detail herein. The simulated annealing procedure produces as its output the mappings given for example by FIGS. 4, 5, and 6. FIG. 4 illustrates a particular mapping for parameter X in encoding table 210 where the levels 0, 1, 2, 3, 4, and 5 are mapped into the three-bit lables 010, 110, 111, 001, 000, and 100 respectively. FIG. 5 illustrates the inverse mapping for parameter X in decoding table 410 where the labels 010, 110, 111, 001, 000, and 100 are mapped back into the levels 0, 2, 3, 4, and 5 respectively. Since the error routine used in this first exemplary embodiment is error correction, an additional mapping as given by FIG. 6 is included in decoding table 410 for parameter X. When the labels 011 and 101 are received, channel decoder 400 knows that a channel error was made since the labels 011 and 101 are redundant and are not transmitted by encoder 200. The result of the simulated annealing procedure in this embodiment is that the label 011 is mapped into level 0 and the label 101 is mapped in level 5.

In a second exemplary embodiment, the error routine comprises error detection and the substitution of the level obtained during the previous communication of parameter X. In a third exemplary embodiment, the error routine comprises error detection and the substitution of a default level, e.g., level 0. In both the second and third embodiments, no additional mapping for parameter X is required in decoding table 410 like that of FIG. 6 when the error routine was error correction. However, the error routine operation when a redundant label is received is used in determining the error measure which is minimized by the simulated annealing optimization.

In a fourth exemplary embodiment, the error routine is a combination of the above error correction and error detection and substitution methods. Certain of the redundant labels, for example label 011 in the simple, three-bit label case described, are decoded using an additional mapping of the encoding table and the other redundant labels, label 101 in the example, are decoded as substitute levels. The selections of which redundant labels result in error correction and which ones result in error detection and substitution are obtained as a result of the single, simulated annealing optimization.

In a fifth exemplary embodiment, a single, simulated annealing procedure is used to obtain optimized encoding and decoding tables for each of a number of parameters as functionally depicted in FIG. 8, where the error measure used in the optimization is an overall error measure.

The next section describes in detail the method of measuring the immediate effect of decoding errors in the excitation function of CELP caused by channel errors. For reference purposes this section includes a brief description of the CELP procedure used. In section 3 a description of simulated annealing for the optimization of a source-dependent channel encoding is provided. The presented simulated annealing procedures are applicable to the coding of parameters of many (speech) compression procedures, but the focus here is their application to the CELP speech coding procedure. Section 4 studies the error sensitivity of the codebook gains. It applies the simulated annealing procedures to the channel coding of these parameters. In section 5 the focus shifts to the channel encoding of the codebook indices, to which the simulated annealing procedure is applied. Included with the discussion of the codebook indices is an example of the effect that the probability distribution has on the performance of the annealing procedures. This is followed by a conclusion section. Finally, several appendices with tables containing optimal codes for some of the CELP parameters are provided.

2. An Error Criterion for the Effect of Channel Errors on CELP

2.1. Description of the CELP Procedure

The CELP procedure used here is identical to that described in W. B. Kleijn, D. J. Krasinski, and R. H. Ketchum, "An Efficient Stochastically Excited Linear Predictive Coding Algorithm for High Quality Low Bit Rate Transmission of Speech", Speech Communication, Vol. VII, 305-316, 1988. It efficiently encodes a digitized (usually sampled at a rate of 8000 Hz) speech signal on a frame by frame basis. Synthetic speech is generated by filtering an excitation signal. The filter adds the short term correlation to the signal, roughly modeling the effect of the vocal tract and the mouth. It is determined from a linear predictive (LP) analysis of the original speech signal. For transmission, the filter coefficients, are quantized with 35 bits using absolute line spectral frequencies (this method exhibits low sensitivity to channel errors). The ideal excitation signal segment which renders synthetic speech identical to the original speech for the present frame is vector quantized to facilitate transmission. The LP-analysis window length and the update intervals are 240 samples while a frame length of 60 samples is used for the vector quantization of the ideal excitation vector.

The target (or ideal) excitation vector for a frame, which results in a perfect match of the original speech (when it is filtered through the inverse LPC filter) is vector quantized, using a shape-gain vector quantizer, in two sequential stages. The candidate vectors of the two codebooks are selected to minimize a squared error criterion on the synthetic speech. Because of the finite size of a frame, the impulse response of the inverse LPC filter can be truncated and described by a finite impulse response (FIR) filter. The FIR filtering operation can be written as a matrix multiplication of a Toeplitz matrix H, which describes the filter, and a vector describing the excitation. If t is the target excitation vector, s the candidate excitation vector, and .lambda. the scaling of the excitation vector, then the mismatch of speech and synthetic speech is the vector H(.lambda.s-t). Thus, the error criterion to be minimized can be written as (.lambda.s-t).sup.T H.sup.T H(.lambda.s-t). The square matrix H.sup.T H is referred to as the spectral weighting matrix.

For the first stage an "adaptive" codebook is used which contains synthetic excitation functions of the recent past. It uses 4 bits for the gain and 7 or 8 bits for the index. The adaptive codebook is updated after each frame, and allows the excitation to become periodic in nature, facilitating the description of voiced speech. The second stage consists of a search through a fixed codebook, which further refines the excitation function resulting from the search through the adaptive codebook. The stochastic codebook consists of overlapping entries, with adjacent candidates separated by a shift of two samples. Its samples have a Gaussian distribution, center clipped at 1.5 standard deviation. Four bits are used for the gain and 8 bits for the indices. To improve the coding efficiency, the dynamic range of the stochastic codebook gain is reduced by multiplying the stochastic codebook by a scale factor prior to calculating the gain factor. The scale factor is based on energy of the contribution of the adaptive codebook to the present frame.

Thus, without error protection bits, the procedure used in the following sections requires 4233 or 4366 bits/second for a 7 or 8 bit adaptive codebook, respectively.

2.2. Definition of the Error Criterion

In this description, the effect of channel errors on the parameters describing the CELP excitation function and methods for performance improvement are discussed. To evaluate the performance of a CELP procedure under such conditions, an appropriate error criterion must be defined. A natural method would be to compare the signal to noise ratios (using the original speech as reference) of the synthetic speech with and without channel errors, or to compute the signal to noise ratio of the synthetic speech with channel errors using the synthetic speech without channel errors as reference. To evaluate the effect of channel errors on particular parameters, those parameters could be perturbed in a systematic way, while the other parameters are left untouched.

A difficulty with measuring the channel errors on synthetic speech which is corrupted by channel errors is that the result is dependent both on the size of the decoding error, and on the attenuation rate of the resulting distortion over the following frames. However, it may be argured that this method does provide a good measure of the overall channel error performance of a CELP procedure, provided that the errors are introduced such that they do not interfere with each other. Thus, one can evaluate the channel error performance of a particular encoding bit by systematically disturbing that bit in frames which are sufficiently far apart. (Note that at a 1% error rate the probability that the same parameter will be disturbed in adjacent frames is negligibly small. This can be expected to provide a better measure of the performance than perturbing the same bit in every successive frame, which may result in interaction of the errors in successive frames.

However, although systematically disturbing particular bits in frames sufficiently far apart may provide a satisfactory error criterion, it results in very laborious evaluations. This makes this error criterion unsuitable for the combinatorial optimization required to find a good channel code. Instead of using an evaluation criterion which operates on the output speech, including distortion over following frames, a criterion is introduced which maintains the features of the distance measure used in the CELP procedure to select the best candidate from the codebooks. By evaluating synthetic speech quality on a per frame basis the effect of the persistence of the distortion over time is eliminated.

The focus here will be on errors at low channel error rates (i.e. 1% or less); thus, it can safely be assumed that multiple bit errors are highly improbable in the description of a single parameter describing the CELP excitation. However, the following procedures are easily generalized to include more bit errors per parameter.

Let us define c.sup.(i) to be the channel code of a particular excitation parameter (codebook index or codebook gain), with quantization index i. The value or vector associated with the quantization index i is denoted as r.sub.i,i. (The meaning of the double subscript will become clear later.) The probability that channel code c.sup.(i) occurs is denoted as P(c.sup.(i)). Note that, in the case of no redundant codes, P(c.sup.(i)) is the probability that index i provides the parameter with its best fit. The target (optimal) parameter or vector is denoted as t. At the analyzer t is matched by the quantized parameter r.sub.i,i. In general, assume that r.sub.i,i can be the result of multiple quantizations; for example a vector may have a shape index as well as a gain index. If the quantization index is changed from the transmitted value i to j, then denote the resulting parameter or vector at the receive end by r.sub.i,j. Thus, in the parameter r.sub.i,j the index i indicates that the other quantizations describing r.sub.i,j were associated with the index i and not with the received index j.

In this description the target t is always the target excitation vector for the particular codebook search. If the codebook candidate index is considered, then r.sub.i,j is the excitation shape vector of index j but with the gain properly quantized for the excitation vector i. If the codebook gain is considered, r.sub.i,j denotes the gain indexed j with the codebook index obtained from the search. Let us denote by D.sub.i (r.sub.i,j) the mean distance between the parameter r.sub.i,j and the target (optimal) value of the parameter t. Note that D.sub.i (r.sub.i,j) describes a function of j. That is, D.sub.i (r.sub.i,j) is a penalty function for changing the transmission index from i to j under the constraint that the quantization index i is associated with the parameter level or codebook entry of best fit. Further, N is the number of entries or levels, and M denotes the number of bits of a transmission label. An appropriate error criterion, which describes the sensitivity of the parameter encoding to single bit errors is now: ##EQU1## where f(c.sup.(i),k) is the index of the parameter associated with the code word obtained by flipping the k'th bit of code word c.sup.(i). During optimization of the encodings {c}, one compares the criterion .epsilon. for various encoding configuration. Thus, the reference level D.sub.i (r.sub.i,i) is of no significance, and can be omitted from criterion (1).

The error criterion used in the codebook selection process of the CELP procedure cannot be used directly for the penalty function D.sub.i () of equation (1) because the latter is a statistical average of the performance, while the former is evaluated for individual frames only. However, the CELP error criterion can be used as a starting point in the selection of a proper penalty function, which can be evaluated quickly. The selection process of the CELP procedure uses a least-squares error distance measure. The selection process will give identical results if the least-squares criterion is replaced by a signal to noise ratio, or its logarithm. This is important since the least-squares error criterion is not appropriate for averaging over a large number of frames; it weighs frames with large absolute error unduly heavily. To eliminate this problem, the mean logarithmic segmental signal to noise ratio is commonly used to evaluate the objective performance of the CELP and multi-pulse procedures. Thus, it is reasonable to choose D.sub.i (r.sub.i,j) to be the mean logarithmic segmental signal to noise ratio of the distorted speech signal generated with the parameter value or vector r.sub.i,j.

The criterion used for the vector quantization of CELP is commonly modified to better model the perceived error. Due to masking, errors in spectral regions with high