|
Description  |
|
|
BACKGROUND OF THE INVENTION
The present invention generally relates to digital speech coding at low bit
rates, and more particularly, is directed to an improved method for coding
the excitation information for code-excited linear predictive speech
coders.
Code-excited linear prediction (CELP) is a speech coding technique which
has the potential of producing high quality synthesized speech at low bit
rates, i.e., 4.8 to 9.6 kilobits-per-second (kbps). This class of speech
coding, also known as vector-excited linear prediction or stochastic
coding, will most likely be used in numerous speech communications and
speech synthesis applications. CELP may prove to be particularly
applicable to digital speech encryption and digital radiotelephone
communication systems wherein speech quality, data rate, size, and cost
are significant issues.
In a CELP speech coder, the long term ("pitch") and short term ("formant")
predictors which model the characteristics of the input speech signal are
incorporated in a set of time-varying linear filters. An excitation signal
for the filters is chosen from a codebook of stored innovation sequences,
or code vectors. For each frame of speech, the speech coder applies each
individual code vector to the filters to generate a reconstructed speech
signal, and compares the original input speech signal to the reconstructed
signal to create an error signal. The error signal is then weighted by
passing it through a weighting filter having a response based on human
auditory perception. The optimum excitation signal is determined by
selecting the code vector which produces the weighted error signal with
the minimum energy for the current frame
The term "code-excited" or "vector-excited" is derived from the fact that
the excitation sequence for the speech coder is vector quantized, i.e., a
single codeword is used to represent a sequence, or vector, of excitation
samples. In this way, data rates of less than one bit per sample are
possible for coding the excitation sequence. The stored excitation code
vectors generally consist of independent random white Gaussian sequences.
One code vector from the codebook is used to represent each block of N
excitation samples. Each stored code vector is represented by a codeword,
i.e., the address of the code vector memory location. It is this codeword
that is subsequently sent over a communications channel to the speech
synthesizer to reconstruct the speech frame at the receiver. See M. R.
Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP):
High-Quality Speech at Very Low Bit Rates", Proceedings of the IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP), Vol. 3, pp. 937-40, March 1985, for a detailed explanation of
CETP.
The difficulty of the CETP speech coding technique lies in the extremely
high computational complexity of performing an exhaustive search of all
the excitation code vectors in the codebook. For example, at a sampling
rate of 8 kilohertz (kHz), a 5 millisecond (msec) frame of speech would
consist of 40 samples. If the excitation information were coded at a rate
of 0.25 bits per sample (corresponding to 2 kbps), then 10 bits of
information are used to code each frame. Hence, the random codebook would
then contain 2.sup.10, or 1024, random code vectors. The vector search
procedure requires approximately 15 multiply-accumulate (MAC) computations
(assuming a third order long-term predictor and a tenth order short-term
predictor) for each of the 40 samples in each code vector. This
corresponds to 600 MACs per code vector per 5 msec speech frame, or
approximately 120,000,000 MACs per second (600 MACs/5 msec
frame.times.1024 code vectors). One can now appreciate the extraordinary
computational effort required to search the entire codebook of 1024
vectors for the best fit--an unreasonable task for real-time
implementation with today's digital signal processing technology.
Moreover, the memory allocation requirement to store the codebook of
independent random vectors is also exorbitant. For the above example, a
640 kilobit read-only-memory (ROM) would be required to store all 1024
code vectors, each having 40 samples, each sample represented by a 16-bit
word. This ROM size requirement is inconsistent with the size and cost
goals of many speech coding applications. Hence, prior art code excited
linear prediction is presently not a practical approach to speech coding.
One alternative for reducing the computational complexity of this code
vector search process is to implement the search calculations in a
transform domain. Refer to I. M. Trancoso and B. S. Atal, "Efficient
Procedures for Finding the Optimum Innovation in Stochastic Coders", Proc.
ICASSP, Vol. 4, pp. 2375-8, April 1986, as an example of such a procedure.
Using this approach, discrete Fourier transforms (DFT's) or other
transforms may be used to express the filter response in the transform
domain such that the filter computations are reduced to a single MAC
operation per sample per code vector. However, an additional 2 MACs per
sample per code vector are also required to evaluate the code vector, thus
resulting in a substantial number of multiply-accumulate operations, i.e.,
120 per code vector per 5 msec frame, or 24,000,000 MACs per second in the
above example. Still further, the transform approach requires at least
twice the amount of memory, since the transform of each code vector must
also be stored. In the above example, a 1.3 Megabit ROM would be required
for implementing CELP using transforms.
A second approach for reducing the computational complexity is to structure
the excitation codebook such that the code vectors are no longer
independent of each other. In this manner, the filtered version of a code
vector can be computed from the filtered version of the previous code
vector, again using only a single filter computation MAC per sample. This
approach results in approximately the same computational requirements as
transform techniques, i.e., 24,000,000 MACs per second, while
significantly reducing the amount of ROM required (16 kilobits in the
above example). Examples of these types of codebooks are given in the
article entitled "Speech Coding Using Efficient Pseudo-Stochastic Block
Codes", Proc. ICASSP, Vol. 3, pp. 1354-7, April 1987, by D. Lin.
Nevertheless, 24,000,000 MACs per second is presently beyond the
computational capability of a single DSP. Moreover, the ROM size is based
on 2.sup.M .times.#bits/word, where M is the number of bits in the
codeword such that the codebook contains 2.sup.M code victors. Therefore,
the memory requirements still increase exponentially with the number of
bits used to encode the frame of excitation information. For example, the
ROM requirements increase to 64 kilobits when using 12 bit codewords.
A need, therefore, exists to provide an improved speech coding technique
that addresses both the problems of extremely high computational
complexity for exhaustive codebook searching, as well as the vast memory
requirements for storing the excitation code vectors.
SUMMARY OF THE INVENTION
Accordingly, a general object of the present invention is to provide an
improved digital speech coding technique that produces high quality speech
at low bit rates.
Another object of the present invention is to provide an efficient
excitation vector generating technique having reduced memory requirements.
A further object of the present invention is to provide an improved
codebook searching technique having reduced computation complexity for
practical implementation in real time utilizing today's digital signal
processing technology.
These and other objects are achieved by the present invention, which,
briefly described, is an improved excitation vector generation and search
technique for a speech coder using a codebook having stored excitation
code vectors. In accordance with the invention, a set of basis vectors are
used along with the excitation signal codewords to generate the codebook
of excitation vectors according to a novel "vector sum" technique.
Apparatus which provides the set of 2.sup.M codebook vectors comprises a
memory which stores a set of selector codewords formed by . . .,
converting the selector codewords into a plurality of interim data
signals, generally based upon the value of each bit of each selector
codeword; inputting a set of M basis vectors, typically stored in memory
in place of storing the entire codebook; multiplying the set of M basis
vectors by the plurality of interim data signals to produce a plurality of
interim vectors; and summing the plurality of interim vectors to produce
the set of 2.sup.M code vectors means for addressing the memory with a
particular codeword, and means for outputting a particular codebook vector
from the memory when address with the particular codeword.
The "vector sum" codebook generation approach of the present invention
permits faster implementation of CELP speech coding while retaining the
advantages of high quality speech at low bit rates. More specifically, the
present invention provides an effective solution to the problems of
computational complexity and memory requirements. For example, the vector
sum approach disclosed herein requires only M+3 MACs for each codework
evaluation. In terms of the previous example, this corresponds to only 13
MACs, as opposed to 600 MACs for standard CELP or 120 MACs using the
transform approach. This improvement translates into a reduction in
complexity of approximately 10 times, resulting in approximately 2,600,000
MACs per second. This reduction in computational complexity makes possible
practical real-time implementation of CELP using a single DSP.
Furthermore, only M basis vectors need to be stored in memory, as opposed
to all 2.sup.M code vectors. Hence, the ROM requirements for the above
example are reduced from 640 kilobits to 6.4 kilobits for the present
invention. Still another advantage to the present speech coding technique
is that it is more robust to channel bit errors than standard CELP. Using
the vector sum excited speech coder of the present invention, a single bit
error in the received codeword will result in an excitation vector similar
to the desired one Under the same conditions, standard CELP, using a
random codebook, would yield an arbitrary excitation vector--entirely
unrelated to the desired one.
BRIEF DESCRIPTION OF THE DRAWINGS
The features of the present invention which are believed to be novel are
set forth with particularity in the appended claims. The invention,
together with further objects and advantages thereof, may best be
understood by reference to the following description taken in conjunction
with the accompanying drawings, in the several figures of which
like-referenced numerals identify like elements, and in which:
FIG. 1 is a general block diagram of a code excited linear predictive
speech coder utilizing the vector sum excitation signal generation
technique in accordance with the present invention;
FIGS. 2A/2B is a simplified flowchart diagram illustrating the general
sequence of operations performed by the speech coder of FIG. 1;
FIG. 3 is a detailed block diagram of the codebook generator block of FIG.
1, illustrating the vector sum technique of the present invention;
FIG. 4 is a general block diagram of a speech synthesizer using the present
invention;
FIG. 5 is a partial block diagram of the speech coder of FIG. 1,
illustrating the improved search technique according to the preferred
embodiment of the present invention;
FIGS. 6A/6B is a detailed flowchart diagram illustrating the sequence of
operations performed by the speech coder of FIG. 5, implementing the gain
calculation technique of the preferred embodiment; and
FIGS. 7A/7B/7C is a detailed flowchart diagram illustrating the sequence of
operations performed by an alternate embodiment of FIG. 5, using a
pre-computed gain technique.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring now to FIG. 1, there is shown a general block diagram of code
excited linear predictive speech coder 100 utilizing the excitation signal
generation technique according to the present invention. An acoustic input
signal to be analyzed is applied to speech coder 100 at microphone 102.
The input signal, typically a speech signal, is then applied to filter
104. Filter 104 generally will exhibit bandpass filter characteristics.
However, if the speech bandwidth is already adequate, filter 104 may
comprise a direct wire connection.
The analog speech signal from filter 104 is then converted into a sequence
of N pulse samples, and the amplitude of each pulse sample is then
represented by a digital code in analog-to-digital (A/D) converter 108, as
known in the art. The sampling rate is determined by sample clock SC,
which represents an 8.0 kHz rate in the preferred embodiment. The sample
clock SC is generated along with the frame clock FC via clock 112.
The digital output of A/D 108, which may be represented as input speech
vector s(n), is then applied to coefficient analyzer 110. This input
speech vector s(n) is repetitively obtained in separate frames, i.e.,
blocks of time, the length of which is determined by the frame clock FC.
In the preferred embodiment, input speech vector s(n),
1.ltoreq.n.ltoreq.N, represents a 5 msec frame containing N=40 samples,
wherein each sample is represented by 12 to 16 bits of a digital code. For
each block of speech, a set of linear predictive coding (LPC) parameters
are produced in accordance with prior art techniques by coefficient
analyzer 110. The short term predictor parameters STP, long term predictor
parameters LTP, weighting filter parameters WFP, and excitation gain
factor .gamma., (along with the best excitation codeword I as described
later) are applied to multiplexer 150 and sent over the channel for use by
the speech synthesizer. Refer to the article entitled "Predictive Coding
of Speech at Low Bit Rates," IEEE Trans. Commun., Vol. COM-30, pp. 600-14,
April 1982, by B. S. Atal, for representative methods of generating these
parameters. The input speech vector s(n) is also applied to subtractor
130, the function of which will subsequently be described.
Basis vector storage block 114 contains a set of M basis vectors v.sub.m
(n), wherein 1.ltoreq.m.ltoreq.M, each comprised of N samples, wherein
1.ltoreq.n.ltoreq.N. These basis vectors are used by codebook generator
120 to generate a set of 2.sup.M pseudo-random excitation vectors u.sub.i
(n), wherein 0.ltoreq.i.ltoreq.2.sup.m -1. Each of the M basis vectors are
comprised of a series of random white Gaussian samples, although other
types of basis vectors may be used with the present invention.
Codebook generator 120 utilizes the M basis vectors vm(n) and a set of
2.sup.M excitation codewords I.sub.i, where 0.ltoreq.i.ltoreq.2.sup.M -1,
to generate the 2.sup.M excitation vectors u.sub.i (n). In the present
embodiment, each codeword I.sub.i is equal to its index i, that is,
I.sub.i =i. If the excitation signal were coded at a rate of 0.25 bits per
sample for each of the 40 samples (such that M=10), then there would be 10
basis vectors used to generate the 1024 excitation vectors. These
excitation vectors are generated in accordance with the vector sum
excitation technique, which will subsequently be described in accordance
with FIGS. 2 and 3.
For each individual excitation vector u.sub.i (n), a reconstructed speech
vector s'.sub.i (n) is generated for comparison to the input speech vector
s(n). Gain block 122 scales the excitation vector u.sub.i (n) by the
excitation gain factor .gamma., which is constant for the frame The
excitation gain factor .gamma. may be precomputed by coefficient analyzer
110 and used to analyze all excitation vectors as shown in FIG. 1, or may
be optimized jointly with the search for the best excitation codeword I
and generated by codebook search controller 140. This optimized gain
technique will subsequently be described in accordance with FIG. 5.
The scaled excitation signal .gamma.u.sub.i (n) is then filtered by long
term predictor filter 124 and short term predictor filter 126 to generate
the reconstructed speech vector s'.sub.i (n). Filter 124 utilizes the long
term predictor parameters LTP to introduce voice periodicity, and filter
126 utilizes the short term predictor parameters STP to introduce the
spectral envelope. Note that blocks 124 and 126 are actually recursive
filters which contain the long term predictor and short term predictor in
their respective feedback paths. Refer to the previously mentioned article
for representative transfer functions of these time-varying recursive
filters.
The reconstructed speech vector s'.sub.i (n) for the i-th excitation code
vector is compared to the same block of the input speech vector s(n) by
subtracting these two signals in subtractor 130. The difference vector
e.sub.i (n) represents the difference between the original and the
reconstructed blocks of speech. The difference vector is perceptually
weighted by weighting filter 132, utilizing the weighting filter
parameters WTP generated by coefficient analyzer 110. Refer to the
preceding referency for a representative weighting filter transfer
function. Perceptual weighting accentuates those frequencies where the
error is perceptually more important to the human ear, and attenuates
other frequencies.
Energy calculator 134 computes the energy of the weighted difference vector
e'.sub.i (n), and applies this error signal E.sub.i to codebook search
controller 140. The search controller compares the i-th error signal for
the present excitation vector u.sub.i (n) against previous error signals
to determine the excitation vector producing the minimum error. The code
of the i-th excitation vector having a minimum error is then output over
the channel as the best excitation code I. In the alternative, search
controller 140 may determine a particular codeword which provides an error
signal having some predetermined criteria, such as meeting a predefined
error threshold.
The operation of speech coder 100 will now be described in accordance with
the flowchart of FIG. 2. Starting at step 200, a frame of N samples of
input speech vector s(n) are obtained in step 202 and applied to
subtractor 130. In the preferred embodiment, N=40 samples. In step 204,
coefficient analyzer 110 computes the long term predictor parameters LTP,
short term predictor parameters STP, weighting filter parameters WTP, and
excitation gain factor 7. The filter states FS of long term predictor
filter 124, short term predictor filter 126, and weighting filter 132, are
then saved in step 206 for later use. Step 208 initializes variables i,
representing the excitation codeword index, and E.sub.b, representing the
best error signal, as shown.
Continuing with step 210, the filter states for the long and short term
predictors and the weighting filter are restored to those filter states
saved in step 206. This restoration ensures that the previous filter
history is the same for comparing each excitation vector. In step 212, the
index i is then tested to see whether or not all excitation vectors have
been compared. If i is less than 2.sup.M, then the operation continues for
the next code vector. In step 214, the basis vectors v.sub.m (n) are used
to compute the excitation vector u.sub.i (n) via the vector sum technique.
FIG. 3, illustrating a representative hardware configuration for codebook
generator 120, will now be used to describe the vector sum technique.
Generator block 320 corresponds to codebook generator 120 of FIG. 1, while
memory 314 corresponds to basis vector storage 114. Memory block 314
stores all of the M basis vectors v.sub.1 (n) through v.sub.M (n), wherein
1.ltoreq.m.ltoreq.M, and wherein 1.ltoreq.n.ltoreq.N. All M basis vectors
are applied to multipliers 361 through 364 of generator 320.
The i-th excitation codeword is also applied to generator 320. This
excitation information is then converted into a plurality of interim data
signals .theta..sub.i1 through .theta..sub.iM, wherein
1.ltoreq.m.ltoreq.M, by converter 360. In the preferred embodiment, the
interim data signals are based on the value of the individual bits of the
selector codeword i, such that each interim data signal .theta..sub.im
represents the sign corresponding to the m-th bit bit of the i-th
excitation codeword. For example, if bit one of excitation codeword i is
0, then .theta..sub.i1 would be -1. Similarly, if the second bit of
excitation codeword i is 1, then .theta..sub.i2 would be +1. It is
contemplated, however, that the interim data signals may alternatively be
any other transformation from i to .theta..sub.im, e.g., as determined by
a ROM look-up table. Also note that the number of bits in the codeword do
not have to be the same as the number of basis vectors. For example,
codeword i could have 2M bits where each pair of bits defines 4 values for
each .theta..sub.im, i.e., 0, 1, 2, 3, or +1, -1 , +2, -2, etc.
The interim data signals are also applied to multipliers 361 through 364.
The multipliers are used to multiply the set of basis vectors v.sub.m (n)
by the set of interim data signals .theta..sub.im to produce a set of
interim vectors which are then summed together in summation network 365 to
produce the single excitation code vector u.sub.i (n). Hence, the vector
sum technique is described by the equation:
##EQU1##
where u.sub.i (n) is the n-th sample of the i-th excitation code vector,
and where 1.ltoreq.n.ltoreq.N.
Continuing with step 216 of FIG. 2A, the excitation vector u.sub.i (n) is
then multiplied by the excitation gain factor .gamma. via gain block 122.
This scaled excitation vector .gamma.u.sub.i (n) is then filtered in step
218 by the long term and short term predictor filters to compute the
reconstructed speech vector s'.sub.i (n). The difference vector ei(n) is
then calculated in step 220 by subtractor 130 such that:
e.sub.i (n)=S(n)-s'.sub.i (n) (2)
for all N samples, i.e., 1.ltoreq.n.ltoreq.N.
In step 222, weighting filter 132 is used to perceptually weight the
difference vector e.sub.i (n) to obtain the weighted difference vector
e'.sub.i (n). Energy calculator 134 then computes the energy E.sub.i of
the weighted difference vector in step 224 according to the equation:
##EQU2##
Step 226 compares the i-th error signal to the previous best error signal
E.sub.b to determine the minimum error. If the present index i corresponds
to the minimum error signal so far, then the best error signal E.sub.b is
updated to the value of the i-th error signal in step 228, and,
accordingly, the best codeword I is set equal to i in step 230. The
codeword index i is then incremented in step 240, and control returns to
step 210 to test the next code vector.
When all 2.sup.M code vectors have been tested, control proceeds from step
212 to step 232 to output the best codeword I. The process is not complete
until the actual filter states are updated using the best codeword I.
Accordingly, step 234 computes the excitation vector u.sub.I (n) using the
vector sum technique as was done in step 216, only this time utilizing the
best codeword I. The excitation vector is then scaled by the gain factor
.gamma. in 236, and filtered to compute reconstructed speech vector
s'.sub.I (n) in step 238. The difference signal e.sub.I (n) is then
computed in step 242, and weighted in step 244 so as to update the
weighting filter state. Control is then returned to step 202.
Referring now to FIG. 4, a speech synthesizer block diagram is illustrated
also using the vector sum generation technique according to the present
invention. Synthesizer 400 obtains the short term predictor parameters
STP, long term predictor parameters LTP, excitation gain factor .gamma.,
and the codeword I received from the channel, via de-multiplexer 450. The
codeword I is applied to codebook generator 420 along with the set of
basis vectors v.sub.m (n) from basis vector storage 414 to generate the
excitation vector u.sub.i (n) as described in FIG. 3. The single
excitation vector u.sub.I (n) is then multiplied by the gain factor
.gamma. in block 422, filtered by long term predictor filter 424 and short
term predictor filter 426 to obtain reconstructed speech vector s'.sub.I
(n). This vector, which represents a frame of reconstructed speech, is
then applied to analog-to-digital (A/D) convertor 408 to produce a
reconstructed analog signal, which is then low pass filtered to reduce
aliasing by filter 404, and applied to an output transducer such as
speaker 402. Clock 412 generates the sample clock and the frame clock for
synthesizer 400.
Referring now to FIG 5, a patial block diagram of an alternate embodiment
of the speech coder of FIG. 1 is shown so as to illustrate the preferred
embodiment of the invention. Note that there are two important differences
from speech coder 100 of FIG. 1. First, codebook search controller 540
computes the gain factor .gamma. itself in conjunction with the optimal
codeword I search and the excitation gain factor .gamma. generation will
be described in the corresponding flowchart of FIG. 6 Secondly, note that
a further alternate embodiment would be to use predetermined gains
calculated by coefficient analyzer 510. The flowchart of FIG. 7 describes
such an embodiment. FIG. 7 may be used to describe that block diagram of
FIG. 5 if the additional gain block 542 and gain factor output of
coefficient analyzer 510 are inserted, as shown in dotted lines.
Before proceeding with the detailed description of the operation of speech
coder 500, it may prove helpful to provide an explanation of the basic
search approach taken by th present invention. In the standard CELP speech
coder, the difference vector from equation {2}:
e.sub.i (n)=s(n)-s'.sub.i (n) {2}
was weighted to yield e'.sub.i (n), which was then used to calculate the
error signal according to the equation:
##EQU3##
which was minimized in order to determine the desired codeword I. All
2.sup.M excitation vectors had to be evaluated to try and find the best
match to s(n). This was the basis of the exhaustive search strategy.
In the preferred embodiment, it is necessary to take into account the
decaying response of the filters. This is done by initializing the filters
with filter states existing at the start of the frame, and letting the
filters decay with no external input. The output of the filters with no
input is called the zero input response Furthermore, the weighting filter
function can be moved from its conventional location at the output of the
subtractor to both input paths of the subtractor. Hence, if d(n) is the
zero input response vector of the filters, and if y(n) is the weighted
input speech vector, then the difference vector p(n) is:
p(n)=y(n)-d(n). {4}
Thus, the initial filter states are totally compensated for by subtracting
off the zero input response of the filters.
The weighted difference vector e'.sub.i (n) becomes:
e'.sub.i (n)=p(n)-s'.sub.i (n). {5}
However, since the gain factor .gamma. is to be optimized at the same time
as searching for the optimum codeword, the filtered excitation vector
f.sub.i (n) must be multiplied by each codeword's gain factor
.gamma..sub.i to replace s'.sub.i (n) in equation {5}, such that it
becomes:
e'.sub.i (n)=p(n)-.gamma..sub.i f.sub.i (n). {6}
The filtered excitation vector f.sub.i (n) is the filtered version of
u.sub.i (n) with the gain factor .gamma. set to one, and with the filter
states initialized to zero. In other words, f.sub.i (n) is the zero state
response of the filters excited by code vector u.sub.i (n). The zero stats
response is used since the filter state information was already
compensated for by the zero input response vector d(n) in equation {4}.
Using the value for e'.sub.i (n) from equation {6} in equation {3} gives:
##EQU4##
Expanding equation {7} produces:
##EQU5##
Defining the cross-correlation between f.sub.i (n) and p(n) as:
##EQU6##
and defining the energy in the filtered code vector f.sub.i (n) as:
##EQU7##
permits simplying equation {8} as:
##EQU8##
We now want to determine the optimal gain factor .gamma..sub.i which will
minimize E.sub.i in equation {11}. Taking the partial derivative of
E.sub.i with respect to .gamma..sub.i and setting it equal to zero permits
solving for the optimal gain factor .gamma..sub.i. This procedure yields:
.gamma..sub.i =C.sub.i /G.sub.i { 12}
which, when substituted into equation {11} gives:
##EQU9##
It can now be seen that to minimize the error E.sub.i in equation {13},
the term [C.sub.i ].sup.2 /G.sub.i must be maximized. The technique of
codebook searching which maximizes [C.sub.i ].sup.2 /G.sub.i will be
described in the flowchart of FIG. 6.
If the gain factor .gamma. is pre-calculated by coefficient analyzer 510,
then equation {7} can be rewritten as:
##EQU10##
where y'.sub.i (n) is the zero state response of the filters to excitation
vector u.sub.i (n) multiplied by the predetermined gain factor .gamma.. If
the second and third terms of equation {14} are re-defined as:
##EQU11##
and: respectively, then equation {14} can be reduced to:
##EQU12##
In order to minimize E.sub.i in equation {17} for all codewords, the term
[-2C.sub.i +G.sub.i ] must be minimized. This is the codebook searching
technique which will be described in the flowchart of FIG. 7.
Recalling that the present invention utilizes the concept of basis vectors
to generate u.sub.i (n), the vector sum equation:
##EQU13##
can be used for the substitution of u.sub.i as will be shown later. The
essence of this substitution is that the basis vectors v.sub.m (n) can be
utilized once each frame to directly pre-compute all of the terms required
for the search calculations. This permits the present invention to
evaluate each of the 2.sup.M codewords by performing a series of
multiply-accumulate operations that is linear in M. In the preferred
embodiment, only M+3 MACs are required.
FIG. 5, using optimized gains, will now be described in terms of its
operation, which is illustrated in the flowchart of FIGS. 6A and 6B.
Beginning at start 600, one frame of N input speech samples s(n) is
obtained in step 602 from the analog-to-digital converter, as was done in
FIG. 1. Next, the input speech vector s(n) is applied to coefficient
analyzer 510, and is used to compute the short term predictor parameters
STP, long term predictor parameters LTP, and weighting filter parameters
WFP in step 604. Note that coefficient analyzer 510 does not compute a
predetermined gain factor .gamma. in this embodiment, as illustrated by
the dotted arrow. The input speech vector s(n) is also applied to initial
weighting filter 512 so as to weight the input speech frame to generate
weighted input speech vector y(n) in step 606. As mentioned above, the
weighting filters perform the same function as weighting filter 132 of
FIG. 1, except that they can be moved from the conventional location at
the output of subtractor 130 to both inputs of the subtractor. Note that
vector y(n) actually represents a set of N weighted speech vectors,
wherein 1.ltoreq. n.ltoreq.N and wherein N is the number of samples in the
speech frame.
In step 608, the filter states FS are transferred from the first long term
predictor filter 524 to second long term predictor filter 525, from first
short term predictor filter 526 to second short term predictor filter 527,
and from first weighting filter 528 to second weighting filter 529. These
filter states are used in step 610 to compute the zero input response d(n)
of the filters. The vector d(n) represents the decaying filter state at
the beginning of each frame of speech. The zero input response vector d(n)
is calculated by applying a zero input to the second filter string 525,
527, 529, each having the respective filter states of their associated
filters 524, 526, 528, of the first filter string. Note that in a typical
implementation, the function of the long term predictor filters, short
term predictor filters, and weighting filters can be combined to reduce
complexity.
In step 612, the difference vector p(n) is calculated in subtractor 530.
Difference vector p(n) represents the difference between the weighted
input speech vector y(n) and the zero input response vector d(n),
previously described by equation {4}:
p(n)=y(n)-d(n). {4}
The difference vector p(n) is then applied to the first cross-correlator
533 to be used in the codebook searching process.
In terms of achieving the goal of maximizing [C.sub.i ].sup.2 /G.sub.i as
stated above, this term must be evaluated for each of the 2.sup.M codebook
vectors--not the M basis vectors. However, this parameter can be
calculated for each codeword based on parameters associated with the M
basis vectors rather than the 2.sup.M code vectors. Hence, the zero state
response vector q.sub.m (n) must be computed for each basis vector v.sub.m
(n) in step 614. Each basis vector v.sub.m (n) from basis vector storage
block 514 is applied directly to third long term predictor filter 544
(without passing through gain block 542 in this embodiment). Each basis
vector is then filtered by filter series #3, comprising long term
predictor filter 544, short term predictor filter 546, and weighting
filter 548. Zero state response vector q.sub.m (n), produced at the output
of filter series #3, is applied to first cross-correlator 533 as well as
second cross-correlator 535.
In step 616, the first cross-correlator computes cross-correlation array
R.sub.m according to the equation:
##EQU14##
Array R.sub.m represents the cross-correlation between the m-th filtered
basis vector qm(n) and p(n). Similarly, the second cross-correlator
computes cross-correlation matrix D.sub.mj in step 618 according to the
equation:
##EQU15##
where 1.ltoreq.m.ltoreq.j.ltoreq.M. Matrix D.sub.mj represents the
cross-correlation between pairs of individual filtered basis vectors. Note
that D.sub.mj is a symmetric matrix. Therefore, approximately half of the
terms need only be evaluated as shown by the limits of the subscripts.
The vector sum equation from above:
##EQU16##
can be used to derive f.sub.i (n) as follows:
##EQU17##
where f.sub.i (n) is the zero state response of the filters to excitation
vector u.sub.i (n), and where q.sub.m (n) is the zero state response of
the filters to basis vector v.sub.m (n). Equation }9}:
##EQU18##
can be rewritten using equation {20} as:
##EQU19##
Using equation {18}, this can be simplified to:
##EQU20##
For the first codework, where i=0, all bits are zero. Therefore,
.theta..sub.Om for 1.ltoreq.m.ltoreq.M equals -1 as previously discussed.
The first correlation CO, which is just C.sub.i from equation {22} where
i=0, then becomes:
##EQU21##
which is computed in step 620 of the flowchart. Using q.sub.m (n) and
equation {20}, the energy term G.sub.i may also be rewritten from equation
{10}:
##EQU22##
into the following:
##EQU23##
which may be expanded to be:
##EQU24##
Substituting by using equation {19} yields:
##EQU25##
By noting that a codeword and its complement, i.e., wherein all the
codeword bits are inverted, both have the same value of [C.sub.I ].sup.2
/G.sub.i, both code vectors can be evaluated at the same time. The
codeword computations are then halved Thus, using equation {26} evaluated
for i=0, the first energy term G.sub.0 becomes
##EQU26##
which is computed in step 622 Hence, up to this step, we have computed the
correlation term C.sub.0 and the energy term G.sub.0 for codeword zero.
Continuing with step 624, the parameters .theta..sub.im are initialized to
-1 for 1.ltoreq.m.ltoreq.M. These .theta..sub.im was parameters represent
the M interim data signals which would be used to generate the current
code vector as described by equation {1}. (The i subscript in
.theta..sub.im was dropped in the figures for simplicity.) Next, the best
correlation term C.sub.b is set equal to the pre-calculated correlation
C.sub.0, and the best energy term G.sub.b is set equal to the
pre-calculated G.sub.0. The codeword I, which represents the codeword for
the best excitation vector u.sub.I (n) for the particular input speech
frame s(n), is set equal to 0. A counter variable k is initialized to
zero, and is then incremented in step 626.
In FIG. 6B, the counter k is tested in step 628 to see if all 2.sup.M
combinations of basis vectors have been tested. Note that the maximum
value of k is 2.sup.M-1, since a codeword and its complement are evaluated
at the same time as described above. If k is less than 2.sup.M-1, then
step 630 proceeds to define a function "flip" wherein the variable l
represents the location of the next bit to flip in codeword i. This
function is performed since the present invention utilizes a Gray code to
sequence through the code vectors changing only one bit at a time.
Therefore, it can be assumed that each successive codeword differs from
the previous codeword in only one bit position. In other words, if each
successive codeword evaluated differs from the previous codeword by only
one bit, which can be accomplished by using a binary Gray code approach,
then only M add or subtract operations are needed to evaluate the
correlation term and energy term. Step 630 also sets .theta..sub.l to
-.theta..sub.l to reflect the change of bit l in the codeword.
Using this Gray code assumption, the new correlation term C.sub.k is
computed in step 632 according to the equation:
c.sub.k =c.sub.k-1 +2.theta..sub.l R.sub.l { 28}
This was derived from equation {22} by substituting -.theta..sub.l for
.theta..sub.l.
Next, in step 634, the new energy term G.sub.k is computed according to the
equation:
##EQU27##
which assumes that D.sub.jk is stored as a symmetric matrix with only
values for j.ltoreq.k being stored. Equation {29} was derived from
| | |