DESCRIPTION
1. Technical Field
The present invention relates to digital coding techniques for a speech
signal mainly intended for transmission.
2. Background Art
Digital conversion (coding) of a signal varying with time is performed by
sampling and quantizing the samples. This involves prior division of the
signal amplitude measuring scale into segments and assigning a unique
digital value to each one of said segments. During the quantizing
operation, that is the conversion of the signal samples into digital
values, all signal samples whose amplitude falls within the limits
of one segment are coded with the same digital value. Naturally, this
results in an inaccurate transcription leading to an error between the
original signal and its coded expression. The operations performed are
said to generate a quantizing noise. Reducing the segment widths reduces
said noise. However, for a given amplitude range of the measuring scale,
this increases the number of segments, hence the number of digital values
required to code said segments, and consequently the number of bits
required to digitally define said values. This makes
the devices used to subsequently process the digitally expressed signal
more complex and, if the signal is to be transmitted from an emitter to a
receiver station, results in a congestion of the signal transmission
channels. It has therefore been necessary to try to reduce the bit rate
required for coding while ensuring an appropriate signal/noise ratio. Or
inversely, once the total bit rate assigned to the quantization has been
defined, attempts have been made to optimize the use of the available bits
so as to minimize the noise.
These findings are at the origin of the so-called differential or delta PCM
encoding where the quantizing bits are used to code the signal changes
only between two consecutive sampling times and not for the whole
amplitude of each sample. This results in a lower voltage swing to be
quantized and therefore in a better use of the quantizing bits owing to
the division of the measuring scale into thinner segments than those
obtained if the same number of bits had been used to directly code the
entire amplitude of the samples of the originally supplied signal.
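By way of illustration only, the benefit of coding signal changes rather
than whole amplitudes can be sketched as follows. The sketch is not taken
from the cited art: the 200 Hz test tone, the 4-bit budget and the helper
name step_size are assumptions chosen for the example. It compares the
segment width obtained when the same number of bits covers the full
amplitude range and when it covers only the range of the differences
between consecutive samples.

```python
import math

def step_size(peak, bits):
    """Width of one segment when [-peak, +peak] is split into 2**bits segments."""
    return 2.0 * peak / (2 ** bits)

# Hypothetical test signal: a 200 Hz tone sampled at 8 kHz, amplitude 1.0.
samples = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(256)]
# Changes between two consecutive sampling times.
diffs = [b - a for a, b in zip(samples, samples[1:])]

bits = 4
full_step = step_size(max(abs(s) for s in samples), bits)  # direct coding
diff_step = step_size(max(abs(d) for d in diffs), bits)    # differential coding
```

For this slowly varying tone the differences span a much smaller range
than the samples, so diff_step is several times smaller than full_step.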
These methods have then been further improved by taking into consideration
the statistical characteristics of the signal to be coded. For example,
since speech-originating signals occupy a relatively limited frequency
range and their energy is in general concentrated in the low voice
frequency regions, it has been proposed to split the original speech
frequency range into several bands and to allocate more bits for
quantizing the low frequency bands, hence to code the signals in these
bands more accurately than those in the higher bands. An example of an
embodiment of this type
is described by Crochiere et al. in an article published in the Bell System
Technical Journal of October 1976. The speech signal there is first
filtered by a set of adjacent band pass filters covering the whole
telephone range. The frequency spectra of the resulting signals are then shifted
into the base band frequency range by modulation, and sampled at their
Nyquist frequency. Then each signal (or sub-band) is separately quantized
in a non-uniform manner, that is by allocating more bits to the lower
bands than to the higher bands. A statistical study permits choosing
several quantizing bit rates and defining an appropriate distribution of
said bits over the subbands. This type of coding is, however, based on
statistical data and not on actual conditions and therefore does not
ensure optimum coding.
In other systems, the signal coding quality has been improved by basing the
bit allocation no longer on statistical results but on real data directly
obtained from the characteristics of the signal to be coded.
In this case, in order to prevent the coder from becoming too complex and
the coding/decoding system from becoming impractical, it was necessary to
use techniques such as those described in the U.S. Pat. No. 4,142,071
granted to the applicant of this application, "Quantizing Process with
Dynamic Allocation of the Available bit Resources, and Device for
Implementing Said Process". This process is essentially applied to the
well-known BCPCM coding technique, where the signal is coded by segments
of predetermined duration (K samples by segment). The signal is
furthermore split into p sub-bands in the frequency range and each subband
is separately coded according to its own characteristics. More precisely,
the number of bits n.sup.i to be allocated for quantizing the signal of
the ith sub-band is derived for K samples of a given block or segment from
the characteristics of said K samples. In other words, the process
described in the U.S. Pat. No. 4,142,071 allows optimizing to a certain
extent the distribution of the coding system resources and economically
using these resources based on the characteristics of the signal to be
coded.
For more information on the BCPCM-type coding, reference can be made to the
article by A. Croisier, relating to a presentation made in the
International Seminar of Digital Communications 1974 in Zurich and
entitled "Progress in PCM and Delta Modulation: Block Companded Coding of
Speech Signal". The method described in the article by A. Croisier can be
summarized as follows: the signal to be coded is first sampled and then
the sample flow obtained is divided into consecutive segments
of a given duration, or into blocks of K samples, each of said blocks
being then quantized. For this, each of the blocks is assigned a scale
factor "C" so that the biggest sample of the block cannot fall outside the
coding limits. Then the scale factor and the K samples of the block are
quantized. The scale factor C (or block characteristic) together with the
K samples supplies, after quantizing, the digital data which completely
define the sample block.
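The block companding principle summarized above can be sketched as
follows. This is a minimal illustration under stated assumptions, not the
procedure of the cited article: the function names, the symmetric signed
code range and the rounding rule are choices made for the example, and the
scale factor is left unquantized for brevity.

```python
def bcpcm_encode(block, bits):
    """Quantize one block of samples against its own scale factor."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits per sample")
    scale = max(abs(s) for s in block)      # scale factor C of the block
    levels = 2 ** (bits - 1) - 1            # largest code magnitude (assumed)
    if scale == 0:
        return 0, [0] * len(block)
    # The biggest sample maps to +/-levels, so no sample leaves the coding limits.
    codes = [round(s * levels / scale) for s in block]
    return scale, codes

def bcpcm_decode(scale, codes, bits):
    """Rebuild approximate samples from the scale factor and the codes."""
    levels = 2 ** (bits - 1) - 1
    return [c * scale / levels for c in codes]

scale, codes = bcpcm_encode([100, -50, 25, 0], bits=4)
rebuilt = bcpcm_decode(scale, codes, bits=4)
```

The scale factor and the codes together define the block; the
reconstruction error of each sample stays within half a quantizing step.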
The U.S. Pat. No. 4,142,071 herein incorporated by reference describes how
the speech coding quality can be improved and the quantizing noise of the
speech-originating signal reduced by dynamically and efficiently
distributing the bits available for said quantization. For this, the whole
speech signal is distributed over several subbands in the frequency range,
and the content of each subband is BCPCM coded. This method allows better
use of the quantizing bits. But whether the signal be distributed over
several subbands or not, BCPCM coding does not permit all available coding
bits to be assigned to quantizing the signal samples. In effect, the scale
factors associated with the sample blocks must also be quantized.
Furthermore, the scale factor is so important for signal decoding that it
is necessary to protect it by associating it with one (or several) parity
bits. This further reduces the bits available for quantizing the samples.
It is of course possible to reduce the impact of the scale factors on the
number of bits remaining available for quantizing the signal samples
themselves by prolonging the duration of each
segment or, in other words, by processing blocks including a larger number
of samples. For example, instead of processing blocks representing a 16 ms
signal, 32 ms blocks could be chosen which would reduce the number of
scale factors to be quantized by a factor of two. However, this solution
has secondary effects which, during decoding, produce parasitic noise
resembling low-level echoes.
SUMMARY OF THE INVENTION
An object of the present invention is to improve the so-called BCPCM coding
method.
Another object of the invention is to provide a BCPCM coding method for a
signal (especially of speech origin) the spectrum of which covers a
predetermined and relatively limited frequency band, said method allowing
the increase of the number of bits available for quantizing the samples of
the signal properly speaking while minimizing as far as possible the
negative secondary effects which might result therefrom.
A further object of the invention is to design a device for implementing
the above method.
More precisely, said invention relates to a speech signal BCPCM coding
method which includes an analysis of the characteristics of the signal to
be coded. This analysis qualifies each sample block to be processed as a
transient block or a non-transient block. From said qualification is
deduced the number of scale factors to be associated with said block, and
therefrom the number of bits required for the quantization of said scale
factor(s). The samples of the signal are then quantized by means of the
quantizing bits remaining available for the considered block.
The foregoing and other objects, features and advantages of the invention
will be apparent from the following, more particular description of a
preferred embodiment of the invention, as illustrated in the accompanying
drawings.
DESCRIPTION OF THE DRAWINGS
FIGS. 1 and 4 are block diagrams of transmission devices for implementing
the method according to the invention.
FIG. 2 is a block diagram of an embodiment of one of the elements of FIG.
1.
FIG. 3 is a schematic diagram of a circuit used in FIG. 2.
FIG. 5 is a schematic diagram of a circuit constructed according to the
invention.
FIG. 6 is a logarithmic plot which may be used for transcoding twelve bit
encoded scale factors into four bit words.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
FIG. 1 illustrates a device for subband coding of the speech-originating
signal for use in a multiplex transmission system. A system somewhat
similar to that of FIG. 1 has already been described in an article
published by D. Esteban and C. Galand based on a presentation at the 1978
IEEE ICASSP held at Tulsa, Oklahoma Apr. 10-12, 1978, and entitled: "32
KBps CCITT Compatible Split Band Coding Scheme". The speech-originating
signal which covers a low frequency band of up to about 4 KHz is applied
to input IN. It is A/D converted at its Nyquist frequency, that is, at 8
KHz, and coded in conventional PCM at 12 bits by means of the same A/D
device. The digital samples X.sub.s are transmitted to a bank of filters
FB. This bank distributes the signal over p contiguous frequency subbands
containing samples S.sub.j.sup.i (where i=1, 2, 3, . . . , p designates
the rank of the subband to which the samples belong, and where j=1, 2, . .
. , K' designates a parameter defined hereafter). Samples S.sub.j.sup.i,
initially quantized at 12 bits, are then requantized at lower bit rates.
However, these new bit rates are dynamically adjusted to the
characteristics (energy) of the signal contained in the subband considered
during a time interval of predetermined duration (block coding). For this,
the digital information associated with the subbands and originating from
filter bank FB is transmitted to a parameter generator PAR and to a
requantizing device DQ. Parameter generator PAR supplies to device DQ
parameters n.sup.i and C.sup.i. Parameter n.sup.i defines the bit rate to
be allocated to requantizing the i.sup.th subband for said predetermined
duration. Said bit rate is governed by the relation:
##EQU1##
where N is the total number of bits provided for requantizing the samples
of the set of p subbands, and where coefficients C.sup.i designate the
so-called scale factors defined hereafter.
Values n.sup.i and C.sup.i are used to adjust in DQ the requantizing step
size of the i.sup.th subband so that:
##EQU2##
(In practice, as explained hereafter, an estimated term derived from
C.sup.i is used instead of C.sup.i itself).
Thus, requantizing device DQ provides the requantized samples
S.sub.j.sup.i.
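The equations referenced as (1) and (2) are not reproduced in this text.
As an assumption, the sketch below uses the form commonly given for this
kind of dynamic allocation in split-band coding, namely
n.sup.i = (N - sum over j of log2 C.sup.j)/p + log2 C.sup.i for the bit
rate, and Q.sup.i = C.sup.i / 2**(n.sup.i - 1) for the step size. Both
forms, the function names and the numerical values are illustrative and
should be checked against the printed equations.

```python
import math

def allocate_bits(scale_factors, total_bits):
    """Assumed form of relation (1): n_i = (N - sum_j log2 C_j) / p + log2 C_i."""
    p = len(scale_factors)
    logs = [math.log2(c) for c in scale_factors]   # requires every C_i > 0
    common = (total_bits - sum(logs)) / p
    return [common + lc for lc in logs]

def step_size(scale_factor, n_bits):
    """Assumed form of relation (2): Q_i = C_i / 2**(n_i - 1)."""
    return scale_factor / 2 ** (n_bits - 1)

# Hypothetical scale factors for p = 4 subbands and N = 16 bits.
scale_factors = [64, 16, 4, 1]
bits = allocate_bits(scale_factors, total_bits=16)
steps = [step_size(c, n) for c, n in zip(scale_factors, bits)]
```

With these assumed forms the allocations always sum to N and every
subband receives the same step size, so that the quantizing noise is
spread evenly over the subbands. A practical allocator would in addition
round each n.sup.i to a non-negative integer.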
The scale factor of the i.sup.th subband is defined by means of the
relation (3) given hereafter according to the BCPCM-type methods:
$$C^{i} = \max_{j=1,\ldots,K'} \left| S_{j}^{i} \right| \qquad (3)$$
where K' designates the number of samples contained in the i.sup.th
subband during said time interval of predetermined duration and j
designates the rank of the sample in a sample block obtained in said
i.sup.th subband during the same time interval.
It has been decided to divide the frequency band of 0 to 4 KHz into 16
adjacent subbands (p=16). (As can be seen hereafter, the three highest
sub-bands can be ignored since 13 subbands are sufficient to cover the
telephone frequency range of 0 to 3200 Hz). Furthermore, said
predetermined duration has been fixed at 32 ms. If the input signal IN is
sampled at the Nyquist frequency, that is at 8 KHz, each 32 ms sample
block contains 256 samples. After coding by the A/D device at 12 bits,
these samples pass into filter FB and are distributed over the 16
subbands. Such a filter has been described in the above-mentioned U.S.
patent. In addition to the filtering function per se, this filter provides
a so-called decimation operation. As a result, for each time interval of
predetermined duration considered, the number of samples available on each
of the 16 subbands after passing filter FB is:
$$K' = \frac{256}{16} = 16$$
The quantized (or requantized) scale factors C.sup.i and the requantized
samples S.sub.j.sup.i are multiplexed on a digital transmission line by
means of the multiplexor MPX. In this case, block synchronization
characters must be added so that, at the other end of the transmission
line, the receiver can identify the received block samples and restore the
speech signal. When transmitting at 16 Kbps, 512 bits are available for 32
ms. The more bits are assigned to the data other than the signal samples
properly speaking (synchronization characters and scale factors mainly),
the fewer bits remain for said samples. One therefore tries to reduce the
number of bits for coding the scale factors while ensuring high-quality
coding. To avoid secondary effects such as echoes probably due to the fact
that the 32 ms blocks are too long, as indicated above, the length of each
block is first adapted to the characteristics of the scale factor
concerned. In other words, the value of C.sup.i is changed more or less
often depending on whether its variation is slow (non-transient block) or
fast (transient block). In the present case, it has been decided to
transmit a maximum of two values C.sup.i per subband, which limits the
block considered to two 16 ms blocks if the initial 32 ms block is of the
transient type, and to transmit only one C.sup.i per sub-band, in 32 ms, if
the sample block is non-transient. For each sample block, two values
C.sup.i are determined:
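$$^{1}C^{i} = \max_{j=1,\ldots,K'/2} \left| S_{j}^{i} \right| \qquad (4)$$
$$^{2}C^{i} = \max_{j=K'/2+1,\ldots,K'} \left| S_{j}^{i} \right| \qquad (5)$$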
In other words (see FIG. 2), the samples of each subband are submitted to
a sort operation to select the samples having the largest amplitude during
the first 16 ms and during the last 16 ms of the duration of the sample
block considered.
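The sort operation described above can be sketched for one subband as
follows. The function name and the sample values are hypothetical; the
only point illustrated is that one maximum is taken over the first half
of the block and one over the second half.

```python
def scale_factor_pair(subband_block):
    """Largest sample magnitude in the first and in the second half of a block."""
    half = len(subband_block) // 2
    c1 = max(abs(s) for s in subband_block[:half])   # first 16 ms
    c2 = max(abs(s) for s in subband_block[half:])   # last 16 ms
    return c1, c2

# K' = 16 samples of one subband for a 32 ms block (made-up values).
block = [1, -9, 3, 4, 0, 2, -1, 5, 7, -2, 3, 12, -4, 0, 1, 2]
c1, c2 = scale_factor_pair(block)
```

The scale factor of the whole 32 ms block is then simply the greater of
the two values.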
The values .sup.1 C.sup.i and .sup.2 C.sup.i are then recoded with four
instead of 12 bits by means of a logarithmic scale (represented in FIG. 6)
supplying the corresponding 4-bit coded expressions of .sup.1 C.sup.i and .sup.2 C.sup.i.
The transcoding operation of the 12-bit C.sup.i into a 4-bit C.sup.i can
be performed by means of a so-called T.L.U. table representing a storage
which, when addressed by 12-bit words, supplies an output of 4-bit words.
(In practice, the transcoding operation can be performed more economically
by means of a conventional successive test method). For example, if
C.sup.i =60, it is coded 000000111100 at 12 bits. For a 4-bit transcoding
operation, it is assimilated to C.sup.i =64, that is, the seventh level of
decimal-coded binary values at 12 bits and it is represented at 4 bits by
C.sup.i =0110 (see FIG. 6).
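The successive test method mentioned above can be sketched as follows.
FIG. 6 is not reproduced here, so the sketch assumes that the logarithmic
scale has one level per power of two and that a scale factor is rounded
up to the next level; this assumption agrees with the example given
(60 is assimilated to 64 and coded 0110) but is not otherwise confirmed.
The function name is hypothetical.

```python
def transcode_12_to_4(c):
    """Return the smallest k such that 2**k >= c, for a 12-bit magnitude c."""
    if not 0 <= c <= 4095:
        raise ValueError("expected a 12-bit magnitude")
    k = 0
    while (1 << k) < c:     # successive tests against increasing levels
        k += 1
    return k

code = transcode_12_to_4(60)
word = format(code, "04b")  # 4-bit word as a string of binary digits
```

Under this assumption a 12-bit value needs at most the code 12, which
fits in four bits.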
For each 32 ms sample block, there are thus two values C.sup.i by signal
subband, representing 16 C.sup.i pairs for all p subbands. Each of said
pairs is then used to determine the transient or non-transient type of the
signal segment represented by the sample block being processed. For this,
the increments .DELTA.C.sup.i are determined as follows:
.DELTA.C.sup.i =.sup.1 C.sup.i -.sup.2 C.sup.i for i=1, 2, . . . , p (6)
While p=16 has been chosen to cover the frequency band (0-4000 Hz), the
last three sub-bands can be ignored when the speech signal to be coded is
to be transmitted in the telephone band (300-3200 Hz), thus preserving
only the sub-bands for which i=1, 2, 3, . . . , 13.
The preserved values .DELTA.C.sup.i are then compared with predetermined
thresholds or limiting values, for example +3 and -4 which are
binary-coded with three bits. Any sample block is called transient for
which one of the values .DELTA.C.sup.i is:
.DELTA.C.sup.i >3 (7)
or
.DELTA.C.sup.i <-4 (8)
That is, .DELTA.C.sup.i outside the limits defined by the +3 and -4
thresholds.
If one of the conditions (7) or (8) is fulfilled, the two corresponding
values .sup.1 C.sup.i and .sup.2 C.sup.i are transmitted to multiplexor
MPX. Otherwise, only the greater one of the two values C.sup.i, which is
automatically determined because the sign of .DELTA.C.sup.i is already
known, is transmitted. Device PAR of FIG. 1 implemented according to the
aforementioned U.S. patent and ICASSP article is modified according to
FIG. 2 taking into consideration the above expression.
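The decision expressed by relations (6), (7) and (8) can be sketched as
follows. The function name and the data layout (one list of coded scale
factors per half block) are assumptions made for the example. The sketch
also assumes that the comparison is made on the 4-bit coded values, which
the three-bit thresholds suggest, and that a transient block transmits
both scale factors for every retained subband.

```python
def classify_block(c1_codes, c2_codes, upper=3, lower=-4):
    """Return (transient, scale factors to transmit) for one sample block."""
    deltas = [a - b for a, b in zip(c1_codes, c2_codes)]      # relation (6)
    transient = any(d > upper or d < lower for d in deltas)   # (7) or (8)
    if transient:
        # Both scale factors are kept for every subband.
        factors = list(zip(c1_codes, c2_codes))
    else:
        # Only the greater scale factor of each pair is kept.
        factors = [max(a, b) for a, b in zip(c1_codes, c2_codes)]
    return transient, factors

steady = classify_block([6, 5, 4], [6, 5, 4])
attack = classify_block([10, 5, 4], [6, 5, 4])
```

A single subband exceeding one of the thresholds is enough to qualify the
whole block as transient.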
FIG. 2 includes a maximum generator (MAX) performing the operations (4) and
(5). (MAX) can be any sorting device operating in parallel on the
different subbands (for example, i=1, 2, . . . , 13). For each subband
considered, two 12-bit coded values of C.sup.i, that is .sup.1 C.sup.i and
.sup.2 C.sup.i, can be derived because the values S.sub.j.sup.i used to
determine C.sup.i are coded with 12 bits. The output of generator MAX is
sent to a device T.L.U. which contains either a read only memory or a
device implementing a successive test algorithm. Device T.L.U. provides
on the one hand 4-bit coded C.sup.i pairs and on the other hand 12-bit
coded C.sup.i pairs, each pair comprising the values .sup.1 C.sup.i and
.sup.2 C.sup.i. The expressions .sup.1,2 C.sup.i are transmitted to a
set of gates Go and to a comparator COMP. Comparator COMP performs
the operations according to (6), (7) and (8) and supplies a 1-bit output G
indicating whether the processed sample block is transient or
non-transient. This bit G is a so-called MODE bit which activates all
gates Go so that, for each subband, either the two values of each pair
pass, in 4-bit and in 12-bit coded form (if the processed sample block is
transient), or only the greater one of values .sup.1 C.sup.i and .sup.2
C.sup.i passes, again in both coded forms (if the block is
non-transient).
FIG. 3 represents a logic circuit performing the operations of comparator
COMP. This circuit comprises subtractors 20 to 32 determining the values
.DELTA.C.sup.i. A set of comparators 33 to 58 compare the values
.DELTA.C.sup.i with the predetermined thresholds (-4 and +3). Then the
logic OR circuits referenced from O.sup.1 to O.sup.14 combine the outputs
of comparators 33 to 58 to determine if any one of values .DELTA.C.sup.i
is greater than 3 or less than -4. If so, the output of O.sup.14 would
indicate this by means of the MODE bit G. This bit G is transmitted to the
multiplexor at the same time as the selected scale factors C.sup.i. The
importance of bit G is such that in practice it is useful to protect it by
associating it with one or two protection bits. Thus, a so-called 2- or
3-bit MODE character is obtained.
In order to facilitate decoding operations at the other end of the
transmission line, the receiver must be able to relocate the blocks within
the received bit train. For this, the multiplexor MPX associates a
predetermined so-called synchronization character with each sample block.
Under these conditions, the message defining a sample block has the
following format:
______________________________________
Samples    C.sup.i    MODE    SYNCHRO
                              <-- time
______________________________________
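The effect of the block type on the bits left for the samples can be
estimated with the sketch below. Only the 16 Kbps rate, the 32 ms block,
the 13 retained subbands, the 4-bit scale factors and the 2- or 3-bit
MODE character come from the text. The length of the synchronization
character is not stated, so the 8 bits used here are purely hypothetical,
as is the function name.

```python
def sample_bits_available(transient, line_rate_bps=16000, block_ms=32,
                          subbands=13, scale_factor_bits=4,
                          mode_bits=3, sync_bits=8):
    """Bits left for the requantized samples of one block."""
    total = line_rate_bps * block_ms // 1000        # 512 bits per 32 ms
    factors_per_subband = 2 if transient else 1
    overhead = (subbands * scale_factor_bits * factors_per_subband
                + mode_bits + sync_bits)
    return total - overhead

non_transient_bits = sample_bits_available(transient=False)
transient_bits = sample_bits_available(transient=True)
```

Under these assumptions a non-transient block keeps 52 more bits for the
samples than a transient block, which is the saving sought by sending a
single scale factor per subband whenever the signal allows it.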
FIG. 4 illustrates a receiver located at the end of the transmission line
opposite to that connected to the output of multiplexor MPX. This receiver
has the task of restoring the original speech signal. A similar receiver
has already been described in the above-mentioned IEEE ICASSP publication.
It should be noted that the bit train received at input IN connected to
the transmission line is demultiplexed in DMPX. This means that
demultiplexor DMPX identifies the data blocks received by means of the
synchronization character (SYNCHRO) which it retrieves from the message
received. It also suppresses in this message the so-called protection
bits, and it separates the values C.sup.i from the values S.sub.j.sup.i.
Values C.sup.i are transmitted to an inverse parameter generator PAR.
Demultiplexor DMPX, with the help of bit G, also recognizes whether it has
received a 32 ms block or two 16 ms blocks. In other words, it
distinguishes between non-transient and transient blocks and decoding is
organized accordingly. The values C.sup.i are transmitted to the inverse
generator PAR and the values S.sub.j.sup.i are sent to an inverse
requantizer DQ. The inverse generator PAR uses a table such as T.L.U., but
inverted, to transcode the four bits into 12 bits. It has been decided to
code at an average value all expressions C.sup.i lying within two limits,
and the transcoding operation thus supplies estimated values of C.sup.i.
(This explains why, at the input of DQ, the estimated value has been
represented instead of C.sup.i itself). Having obtained the estimated
values C.sup.i, generator PAR derives therefrom the values n.sup.i using
expression (1). By means of the values n.sup.i and C.sup.i, the inverse
requantizer DQ determines the values Q.sup.i (expression (2)). These are
used to process the received values S.sub.j.sup.i to derive therefrom the
reconstructed subband samples which, when
supplied to inverse filter bank FB, allow the samples X.sub.s to be
rebuilt. The latter are sent to the digital/analog converter D/A which
supplies the reconstructed speech signal.
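The inverted table can be sketched as follows. The sketch assumes that
each 4-bit code k stands for the scale factors lying between the limits
2**(k-1) and 2**k (between 0 and 1 for k = 0) and that the average value
is the midpoint of these limits. Neither the limits nor the averaging
rule is specified in the text, and the function name is hypothetical.

```python
def inverse_transcode(k):
    """Estimated scale factor for a 4-bit code k, taken as an interval midpoint."""
    if not 0 <= k <= 12:
        raise ValueError("code outside the range produced by a 12-bit input")
    lower = 0 if k == 0 else 2 ** (k - 1)
    upper = 2 ** k
    return (lower + upper) / 2

estimate = inverse_transcode(6)   # code 0110
```

Whatever rule is retained, the coder and the decoder must derive the
values n.sup.i and Q.sup.i from the same estimate, since only the 4-bit
code is transmitted.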
FIG. 5 illustrates a device for further improving the distribution of the
coding bits. It is to be noted that bit G distinguishes between
non-transient and transient blocks. This bit will be used to control the
coding type to be used for the scale factors. When two non-transient
blocks follow each other, which indicates that the scale factor changes
very slowly (and if this variation takes place between two predetermined
thresholds, for example +1 and -2), the scale factor C.sup.i will be
delta-coded thus yielding .DELTA.C.sup.i. Other situations can be
envisaged; for example, when a non-transient block follows or precedes a
transient block, a delta-type coding could also be used for the values
C.sup.i and .sup.1 C.sup.i or .sup.2 C.sup.i. By way of example, consider
the case where delta-coding is only used if two non-transient blocks
follow each other. If a block is non-transient, the consecutive values
C.sup.i are sent to the input (+) of a subtractor 50 whose input (-)
receives the values C.sup.i of the preceding block. These preceding values
C.sup.i originate from a 32 ms delay line DL. An adder 52 located at the
input of delay line DL adds the output of 50 and the DL output. Thus, the
output of subtractor 50 provides the variation of the values C.sup.i which
can be requantized, provided said variation lies between -1 and
+2. This can be checked by means of a circuit similar to that of FIG. 3,
which would supply a control signal G' instead of G. This control should
open the line located at the DL output (switch I on zero position).
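The selection between full coding and delta coding of a scale factor can
be sketched as follows. The thresholds are quoted in the text both as +1
and -2 and as -1 and +2, so they are left as parameters here. The function
name, the tuple returned and the default values are assumptions made for
the example and do not describe the circuit of FIG. 5 in detail.

```python
def encode_scale_factor(current, previous, previous_non_transient,
                        current_non_transient, lower=-1, upper=2):
    """Return ("delta", variation) when delta coding applies, else ("full", value)."""
    if previous_non_transient and current_non_transient and previous is not None:
        delta = current - previous          # role of subtractor 50
        if lower <= delta <= upper:
            return ("delta", delta)
    return ("full", current)

slow = encode_scale_factor(7, 6, True, True)
jump = encode_scale_factor(10, 6, True, True)
after_transient = encode_scale_factor(7, 6, False, True)
```

With four admissible variations, a delta-coded scale factor could be
carried by two bits instead of four.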
While the invention has been particularly illustrated in the drawings and
described with reference to a preferred embodiment thereof, it will be
understood by those skilled in the art that numerous changes in form and
detail may be made therein without departing from the spirit and scope of
this invention. Those skilled in the art can choose the coding method
described here, for example, for storing speech information instead of
transmitting it. In such a case, they can delete the so-called
synchronization characters added by the multiplexor and use the system
without major modifications.
* * * * *