Claims
We claim:
1. An encoder for the encoding of digital information, said digital
information comprising signal sample blocks representing analog audio
signals, comprising
means for generating subband information blocks, each subband information
block comprising a set of digital words generated in response to a signal
sample block, said means comprising means for applying a discrete
transform function to each of said signal sample blocks, and
means for quantizing with a fixed number of bits one or more digital words
corresponding to at least the lowest frequency spectral component of said
analog audio signals, but not quantizing all digital words with a fixed
number of bits.
2. An encoder for the encoding of digital information, said digital
information comprising signal sample blocks representing analog audio
signals, comprising
means for defining subbands and for generating subband information blocks,
each subband information block comprising a set of digital words generated
in response to a signal sample block, and
means for quantizing with a fixed number of bits one or more digital words
corresponding to at least the lowest frequency spectral component of said
analog audio signals, and for quantizing one or more digital words with a
fixed number of bits and an adaptive number of bits.
3. An encoder according to claims 1 or 2 further comprising means for
grouping a plurality of subband information blocks, and for representing a
plurality of said digital words in block-floating-point form comprising a
mantissa associated with an exponent, said exponent being shared by
mantissas from a plurality of subbands in said plurality of subband
information blocks.
4. An encoder according to claims 1 or 2 further comprising means for
representing one or more of said digital words in block-floating-point
form comprising a mantissa associated with an exponent, for normalizing at
least one mantissa, and for dropping a sign bit for a single normalized
mantissa uniquely associated with an exponent whenever the value of said
sign bit can be established from the value of a most significant data bit
in said normalized mantissa.
5. An encoder according to claims 1 or 2, wherein said analog audio signals
represent a plurality of audio channels, further comprising means for
grouping a plurality of subband information blocks, each block in said
plurality of subband information blocks corresponding to a respective one
channel of said plurality of audio channels, and for representing a
plurality of said digital words in block-floating-point form comprising a
mantissa associated with an exponent, said exponent being shared by
mantissas from a plurality of subbands in said plurality of subband
information blocks.
6. A decoder for the recovery of digital information from a coded signal,
said digital information representing analog audio signals, comprising
means for reconstructing digital words from said coded signal, said means
reconstructing with a fixed number of bits one or more digital words
corresponding to at least the lowest frequency spectral component of said
analog audio signals, but not reconstructing all digital words with a
fixed number of bits, and
means for generating signal sample blocks in response to said digital
words, said means comprising means for applying an inverse discrete
transform function to the reconstructed digital words.
7. A decoder for the recovery of digital information from a coded signal,
said digital information representing analog audio signals, comprising
means for reconstructing digital words from said coded signal, said means
reconstructing with a fixed number of bits one or more digital words
corresponding to at least the lowest frequency spectral component of said
analog audio signals, and for reconstructing one or more digital words
with a fixed number of bits and an adaptive number of bits, and
means for generating signal sample blocks in response to said digital
words.
8. A decoder according to claims 6 or 7, wherein said coded signal
comprises mantissas associated with exponents, further comprising means
for reconstructing a plurality of digital words from a plurality of
mantissas sharing an exponent.
9. A decoder according to claims 6 or 7 further comprising means for
reconstructing any missing sign bit for any of said digital words
comprising a normalized mantissa uniquely associated with an exponent.
10. A decoder according to claim 8, wherein said analog audio signals
represent a plurality of audio channels, each digital word in said
plurality of digital words corresponding to a respective one of said
plurality of audio channels, further comprising means for generating a
plurality of signal sample blocks in response to said plurality of digital
words, each one of said plurality of signal sample blocks corresponding to
digital information representing a respective one of said plurality of
audio channels.
11. A system comprising an encoder according to claim 1 and a decoder
according to claim 6.
12. A system comprising an encoder according to claim 2 and a decoder
according to claim 7.
13. An encoding method for the encoding of digital information, said
digital information comprising signal sample blocks representing analog
audio signals, comprising
generating subband information blocks, each subband information block
comprising a set of digital words generated in response to a signal sample
block, by applying a discrete transform function to each of said signal
sample blocks, and
quantizing with a fixed number of bits one or more digital words
corresponding to at least the lowest frequency spectral component of said
analog audio signals, but not quantizing all digital words with a fixed
number of bits.
14. An encoding method for the encoding of digital information, said
digital information comprising signal sample blocks representing analog
audio signals, comprising
defining subbands and generating subband information blocks, each subband
information block comprising a set of digital words generated in response
to a signal sample block, and
quantizing with a fixed number of bits one or more digital words
corresponding to at least the lowest frequency spectral component of said
analog audio signals, and quantizing one or more digital words with a
fixed number of bits and an adaptive number of bits.
15. An encoding method according to claims 13 or 14 further comprising
grouping a plurality of subband information blocks, and representing a
plurality of said digital words in block-floating-point form comprising a
mantissa associated with an exponent, said exponent being shared by
mantissas from a plurality of subbands in said plurality of subband
information blocks.
16. An encoding method according to claims 13 or 14 further comprising
representing one or more of said digital words in block-floating-point
form comprising a mantissa associated with an exponent, normalizing at
least one mantissa, and dropping a sign bit for a single normalized
mantissa uniquely associated with an exponent whenever the value of said
sign bit can be established from the value of a most significant data bit
in said normalized mantissa.
17. An encoding method according to claims 13 or 14, wherein said analog
audio signals represent a plurality of audio channels, further comprising
grouping a plurality of subband information blocks, each block in said
plurality of subband information blocks corresponding to a respective one
channel of said plurality of audio channels, and representing a plurality
of said digital words in block-floating-point form comprising a mantissa
associated with an exponent, said exponent being shared by mantissas from
a plurality of subbands in said plurality of subband information blocks.
18. A decoding method for the recovery of digital information from a coded
signal, said digital information representing analog audio signals,
comprising
reconstructing with a fixed number of bits one or more digital words
corresponding to at least the lowest frequency spectral component of said
analog audio signals, but not reconstructing all digital words with a
fixed number of bits, and
generating signal sample blocks in response to said digital words by
applying an inverse discrete transform function to the reconstructed
digital words.
19. A decoding method for the recovery of digital information from a coded
signal, said digital information representing analog audio signals,
comprising
reconstructing with a fixed number of bits one or more digital words
corresponding to at least the lowest frequency spectral component of said
analog audio signals, and reconstructing one or more digital words with a
fixed number of bits and an adaptive number of bits, and
generating signal sample blocks in response to said digital words.
20. A decoding method according to claims 18 or 19, wherein said coded
signal comprises mantissas associated with exponents, further comprising
reconstructing a plurality of digital words from a plurality of mantissas
sharing an exponent.
21. A decoding method according to claims 18 or 19 further comprising
reconstructing any missing sign bit for any of said digital words
comprising a normalized mantissa uniquely associated with an exponent.
22. A decoding method according to claim 20, wherein said analog audio
signals represent a plurality of audio channels, each digital word in said
plurality of digital words corresponding to a respective one of said
plurality of audio channels, further comprising generating a plurality of
signal sample blocks in response to said plurality of digital words, each
one of said plurality of signal sample blocks corresponding to digital
information representing a respective one of said plurality of audio
channels.
23. A system method comprising an encoding method according to claim 13 and
a decoding method according to claim 18.
24. A system method comprising an encoding method according to claim 14 and
a decoding method according to claim 19.
Description
BACKGROUND OF THE INVENTION
The invention relates in general to high-quality low bit-rate digital
signal processing of audio signals, such as music signals.
There is considerable interest among those in the field of signal
processing to discover methods which minimize the amount of information
required to represent adequately a given signal. By reducing required
information, signals may be transmitted over communication channels with
lower bandwidth, or stored in less space. With respect to digital
techniques, minimal informational requirements are synonymous with minimal
binary bit requirements.
Two factors limit the reduction of bit requirements:
(1) A signal of bandwidth W may be accurately represented by a series of
samples taken at a frequency no less than 2·W. This is the
Nyquist sampling rate. Therefore, a signal T seconds in length with a
bandwidth W requires at least 2·W·T number of samples
for accurate representation.
(2) Quantization of signal samples which may assume any of a continuous
range of values introduces inaccuracies in the representation of the
signal which are proportional to the quantizing step size or resolution.
These inaccuracies are called quantization errors. These errors become
smaller as more bits are made available to represent each quantized
signal sample.
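
The two limits above can be illustrated with a short numerical sketch. It
assumes a uniform quantizer, which the text does not specify, and the
function names min_samples and quantizing_step are hypothetical names
chosen for this illustration only.

```python
def min_samples(bandwidth_hz, duration_s):
    """Minimum sample count 2*W*T for a signal of bandwidth W lasting T seconds."""
    return 2 * bandwidth_hz * duration_s


def quantizing_step(full_scale_range, bits):
    """Step size of a uniform quantizer (an assumption) with 2**bits levels."""
    return full_scale_range / (2 ** bits)


# A signal of 20 kHz bandwidth lasting one second needs at least 40,000 samples.
print(min_samples(20_000, 1))

# With rounding, the worst-case error is half a step; under the uniform
# quantizer assumed here it halves with every added bit.
for bits in (8, 12, 16):
    print(bits, quantizing_step(2.0, bits) / 2)
```

Under these assumptions, reducing the bit count by one doubles the
worst-case quantization error, which is why bit-rate reduction and
quantizing noise must be traded against each other.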
If coding techniques are applied to the full bandwidth, all quantizing
errors, which manifest themselves as noise, are spread uniformly across
the bandwidth. Techniques which may be applied to selected portions of the
spectrum can limit the spectral spread of quantizing noise. Two such
techniques are subband coding and transform coding. By using these
techniques, quantizing errors can be reduced in particular frequency bands
where quantizing noise is especially objectionable by quantizing that band
with a smaller step size.
Subband coding may be implemented by a bank of digital bandpass filters.
Transform coding may be implemented by any of several time-domain to
frequency-domain transforms which simulate a bank of digital bandpass
filters. Although transforms are easier to implement and require less
computational power and hardware than digital filters, they have less
design flexibility in the sense that each bandpass filter "frequency bin"
represented by a transform coefficient has a uniform bandwidth. By
contrast, a bank of digital bandpass filters can be designed to have
different subband bandwidths. Transform coefficients can, however, be
grouped together to define "subbands" having bandwidths which are
multiples of a single transform coefficient bandwidth. The term "subband"
is used hereinafter to refer to selected portions of the total signal
bandwidth, whether implemented by a subband coder or a transform coder. A
subband as implemented by a transform coder is defined by a set of one or
more adjacent transform coefficients or frequency bins. The bandwidth of a
transform coder frequency bin depends upon the coder's sampling rate and
the number of samples in each signal sample block (the transform length).
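
The dependence of bin bandwidth on sampling rate and transform length can
be sketched as follows. The sketch assumes bins spaced uniformly at the
sampling rate divided by the block length, as in a DFT; the block lengths
shown and both function names are hypothetical and are not taken from the
embodiments described herein.

```python
def bin_bandwidth_hz(sampling_rate_hz, block_length):
    """Nominal width of one frequency bin, assuming uniform spacing fs / N."""
    return sampling_rate_hz / block_length


def subband_bandwidth_hz(sampling_rate_hz, block_length, bins_in_subband):
    """Width of a subband formed by grouping adjacent transform coefficients."""
    return bins_in_subband * bin_bandwidth_hz(sampling_rate_hz, block_length)


# Halving the block length doubles the width of every bin.
for block_length in (512, 256, 128):
    print(block_length, bin_bandwidth_hz(48_000, block_length))

# A subband built from four adjacent bins is four bin-widths wide.
print(subband_bandwidth_hz(48_000, 128, 4))
```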
Two characteristics of subband bandpass filters are particularly critical
to the performance of high-quality music signal processing systems. The
first is the bandwidth of the regions between the filter passband and
stopbands (the transition bands). The second is the attenuation level in
the stopbands. As used herein, the measure of filter "selectivity" is the
steepness of the filter response curve within the transition bands
(steepness of transition band rolloff), and the level of attenuation in
the stopbands (depth of stopband rejection).
These two filter characteristics are critical because the human ear
displays frequency-analysis properties resembling those of highly
asymmetrical tuned filters having variable center frequencies. The
frequency-resolving power of the human ear's tuned filters varies with
frequency throughout the audio spectrum. The ear can discern signals
closer together in frequency at frequencies below about 500 Hz, with the
resolvable spacing widening as the frequency progresses upward to the
limits of audibility.
The effective bandwidth of such an auditory filter is referred to as a
critical band. An important quality of the critical band is that
psychoacoustic-masking effects are most strongly manifested within a
critical band--a dominant signal within a critical band can suppress the
audibility of other signals anywhere within that critical band. Signals at
frequencies outside that critical band are not masked as strongly. See
generally, the Audio Engineering Handbook, K. Blair Benson ed.,
McGraw-Hill, San Francisco, 1988, pages 1.40-1.42 and 4.8-4.10.
Psychoacoustic masking is more easily accomplished by subband and transform
coders if the subband bandwidth throughout the audible spectrum is about
half the critical bandwidth of the human ear in the same portions of the
spectrum. This is because the critical bands of the human ear have
variable center frequencies that adapt to auditory stimuli, whereas
subband and transform coders typically have fixed subband center
frequencies. To optimize the opportunity to utilize psychoacoustic-masking
effects, any distortion artifacts resulting from the presence of a
dominant signal should be limited to the subband containing the dominant
signal. If the subband bandwidth is about half or less than half of the
critical band (and if the transition band rolloff is sufficiently steep
and the stopband rejection is sufficiently deep), the most effective
masking of the undesired distortion products is likely to occur even for
signals whose frequency is near the edge of the subband passband
bandwidth. If the subband bandwidth is more than half a critical band,
there is the possibility that the dominant signal will cause the ear's
critical band to be offset from the coder's subband so that some of the
undesired distortion products outside the ear's critical bandwidth are not
masked. These effects are most objectionable at low frequencies where the
ear's critical band is narrower.
Transform coding performance depends upon several factors, including the
signal sample block length, transform coding errors, and aliasing
cancellation.
Block Length
As block lengths become shorter, transform encoder and decoder performance
is adversely affected not only by the consequential widening of the
frequency bins, but also by degradation of the response characteristics of
the bandpass filter frequency bins: (1) decreased rate of transition band
rolloff, and (2) reduced level of stopband rejection. This degradation in
filter performance results in the undesired creation of or contribution to
transform coefficients in nearby frequency bins in response to a desired
signal. These undesired contributions are called sidelobe leakage.
Thus, depending on the sampling rate, a short block length may result in a
nominal filter bandwidth exceeding the ear's critical bandwidth at some or
all frequencies, particularly low frequencies. Even if the nominal subband
bandwidth is narrower than the ear's critical bandwidth, degraded filter
characteristics manifested as a broad transition band and/or poor stopband
rejection may result in significant signal components outside the ear's
critical bandwidth. In such cases, greater constraints are ordinarily
placed on other aspects of the system, particularly quantization accuracy.
Another disadvantage resulting from short sample block lengths is the
exacerbation of transform coding errors, described in the next section.
Transform Coding Errors
Discrete transforms do not produce a perfectly accurate set of frequency
coefficients because they work on only a finite segment of the signal.
Strictly speaking, discrete transforms produce a time-frequency
representation of the input time-domain signal rather than a true
frequency-domain representation which would require infinite transform
lengths. For convenience of discussion here, however, the output of
discrete transforms will be referred to as a frequency-domain
representation. In effect, the discrete transform assumes the sampled
signal only has frequency components whose periods are a submultiple of
the finite sample interval. This is equivalent to an assumption that the
finite-length signal is periodic. The assumption in general is not true.
The assumed periodicity creates discontinuities at the edges of the finite
time interval which cause the transform to create phantom high-frequency
components.
One technique which minimizes this effect is to reduce the discontinuity
prior to the transformation by weighting the signal samples such that
samples near the edges of the interval are close to zero. Samples at the
center of the interval are generally passed unchanged, i.e., weighted by a
factor of one. This weighting function is called an "analysis window" and
may be of any shape, but certain windows contribute more favorably to
subband filter performance.
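
The effect of an analysis window on phantom components can be illustrated
with a small experiment. The sketch below is an assumption-laden
illustration rather than the window of the invention: it uses a plain DFT,
a Hann window, a 64-sample block and a tone of 10.5 cycles per block, all
chosen only for brevity.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2, computed directly (O(N^2))."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def leakage_energy(samples, first_bin, last_bin):
    """Energy found in a range of bins far from the tone's own bin."""
    mags = dft_magnitudes(samples)
    return sum(m * m for m in mags[first_bin:last_bin + 1])


N = 64
# 10.5 cycles per block: the tone is not periodic within the block, so the
# assumed periodicity creates a discontinuity at the block edges.
tone = [math.sin(2 * math.pi * 10.5 * i / N) for i in range(N)]
# Hann window (an assumption): near zero at the edges, near one at the center.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]
windowed = [t * w for t, w in zip(tone, hann)]

rect_leak = leakage_energy(tone, 20, 32)
hann_leak = leakage_energy(windowed, 20, 32)
print(rect_leak, hann_leak)
```

In this sketch the windowed block places far less energy in the distant
bins than the unwindowed block does, at the cost of a wider main lobe
around the tone itself.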
As used herein, the term "analysis window" refers merely to the windowing
function performed prior to application of the forward transform. As will
be discussed below, the design of an analysis window used in the invention
is constrained by synthesis window design considerations. Therefore,
design and performance properties of an "analysis window" as that term is
commonly used in the art may differ from such analysis windows as
implemented in this invention.
While there is no single criterion which may be used to assess a window's
quality, general criteria include steepness of transition band rolloff and
depth of stopband rejection. In some applications, the ability to trade
steeper rolloff for deeper rejection level is a useful quality.
The analysis window is a time-domain function. If no other compensation is
provided, the recovered or "synthesized" signal will be distorted
according to the shape of the analysis window. There are several
compensation methods. For example:
(a) The recovered signal interval or block may be multiplied by an inverse
window, one whose weighting factors are the reciprocal of those for the
analysis window. A disadvantage of this technique is that it clearly
requires that the analysis window not go to zero at the edges.
(b) Consecutive input signal blocks may be overlapped. By carefully
designing the analysis window such that two adjacent windows add to unity
across the overlap, the effects of the window will be exactly compensated.
(But see the following paragraph.) When used with certain types of
transforms such as the Discrete Fourier Transform (DFT), this technique
increases the number of bits required to represent the signal since the
portion of the signal in the overlap interval must be transformed and
transmitted twice. For these types of transforms, it is desirable to
design the window with an overlap interval as small as possible.
(c) The synthesized output from the inverse transform may also need to be
windowed. Some transforms, including one used in the current invention,
require it. Further, quantizing errors may cause the inverse transform to
produce a time-domain signal which does not go to zero at the edges of the
finite time interval. Left alone, these errors may distort the recovered
time-domain signal most strongly within the window overlap interval. A
synthesis window can be used to shape each synthesized signal block at its
edges. In this case, the signal will be subjected to an analysis and a
synthesis window, i.e., the signal will be weighted by the product of the
two windows. Therefore, both windows must be designed such that the
product of the two will sum to unity across the overlap. See the
discussion in the previous paragraph.
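
The requirement that the product of the analysis and synthesis windows sum
to unity across the overlap can be checked numerically. The sketch below
assumes a sine window used for both analysis and synthesis and a 50%
overlap; this window is chosen only because it satisfies the condition in
closed form and is not asserted to be the window of the invention. The
function names are hypothetical.

```python
import math


def sine_window(length):
    """Sine window (an assumption): sin(pi * (n + 0.5) / length)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def overlap_sums(analysis, synthesis):
    """Sum of the analysis*synthesis product for two blocks overlapped by half.

    In the overlap, the second half of one block coincides with the first
    half of the next, so the two product-window halves are added.
    """
    half = len(analysis) // 2
    product = [a * s for a, s in zip(analysis, synthesis)]
    return [product[n + half] + product[n] for n in range(half)]


window = sine_window(16)
sums = overlap_sums(window, window)
# Largest deviation from unity anywhere in the overlap interval.
print(max(abs(s - 1.0) for s in sums))
```

Neither window alone sums to unity here; only the product of the two
does, which is the constraint that ties analysis window design to
synthesis window design.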
Short transform sample blocks impose greater compensation requirements on
the analysis and synthesis windows. As the transform sample blocks become
shorter there is more sidelobe leakage through the filter's transition
band and stopband. A well shaped analysis window reduces this leakage.
Sidelobe leakage is undesirable because it causes the transform to create
spectral coefficients which misrepresent the frequency of signal
components outside the filter's passband. This misrepresentation is a
distortion called aliasing.
Aliasing Cancellation
The Nyquist theorem holds that a signal may be accurately recovered from
discrete samples when the interval between samples is no larger than
one-half the period of the signal's highest frequency component. When the
sampling rate is below this Nyquist rate, higher-frequency components are
misrepresented as lower-frequency components. The lower-frequency
component is an "alias" for the true component.
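
The frequency at which an undersampled component reappears can be computed
by folding about one-half the sampling rate. The following sketch is
illustrative only; the function name alias_frequency is hypothetical.

```python
def alias_frequency(frequency_hz, sampling_rate_hz):
    """Apparent frequency of a real sinusoid after sampling.

    The spectrum of a sampled signal repeats every sampling_rate_hz and is
    mirrored about half the sampling rate, so the component is folded back
    into the range 0 .. sampling_rate_hz / 2.
    """
    folded = frequency_hz % sampling_rate_hz
    if folded > sampling_rate_hz / 2:
        folded = sampling_rate_hz - folded
    return folded


# A 30 kHz component sampled at 44.1 kHz is misrepresented as 14.1 kHz.
print(alias_frequency(30_000, 44_100))
# A 1 kHz component is below half the sampling rate and is unaffected.
print(alias_frequency(1_000, 44_100))
```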
Subband filters and finite digital transforms are not perfect passband
filters. The transition between the passband and stopband is not
infinitely sharp, and the attenuation of signals in the stopband is not
infinitely great. As a result, even if a passband-filtered input signal is
sampled at the Nyquist rate suggested by the passband cut-off frequency,
frequencies in the transition band above the cutoff frequency will not be
faithfully represented.
It is possible to design the analysis and synthesis filters such that
aliasing distortion is automatically cancelled by the inverse transform.
Quadrature Mirror Filters in the time domain possess this characteristic.
Some transform coder techniques, including one used in the present
invention, also cancel alias distortion.
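
Alias cancellation by the inverse transform can be demonstrated with a
small sketch. It assumes the widely published form of the modified
Discrete Cosine Transform with a sine window, 50% overlap and no
quantization; these are assumptions for illustration and are not asserted
to match the transform, window or block length of the invention. All
function names are hypothetical.

```python
import math


def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Forward transform: 2N windowed samples -> N coefficients."""
    n_coef = len(block) // 2
    shift = 0.5 + n_coef / 2
    return [
        sum(x * math.cos(math.pi / n_coef * (n + shift) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_coef)
    ]


def imdct(coefs):
    """Inverse transform: N coefficients -> 2N time samples containing aliasing."""
    n_coef = len(coefs)
    shift = 0.5 + n_coef / 2
    return [
        (2.0 / n_coef) * sum(c * math.cos(math.pi / n_coef * (n + shift) * (k + 0.5))
                             for k, c in enumerate(coefs))
        for n in range(2 * n_coef)
    ]


def analyze_synthesize(signal, n_coef):
    """Window, transform, inverse transform, window again, overlap and add."""
    window = sine_window(2 * n_coef)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coef + 1, n_coef):
        block = [signal[start + n] * window[n] for n in range(2 * n_coef)]
        aliased = imdct(mdct(block))
        for n in range(2 * n_coef):
            output[start + n] += aliased[n] * window[n]
    return output


N_COEF = 8
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(4 * N_COEF)]
recovered = analyze_synthesize(signal, N_COEF)
# Only samples covered by two overlapping blocks have their aliasing cancelled.
interior = range(N_COEF, 3 * N_COEF)
worst = max(abs(recovered[t] - signal[t]) for t in interior)
print(worst)
```

Each inverse-transformed block on its own contains time-domain aliasing;
it is the overlap-add of adjacent blocks that cancels it. Once the
coefficients are quantized, the cancellation is only approximate, which is
the residual distortion discussed in the following paragraph.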
Suppressing the audible consequences of aliasing distortion in transform
coders becomes more difficult as the sample block length is made shorter.
As explained above, shorter sample blocks degrade filter performance: the
passband bandwidth increases, the passband-stopband transition becomes
less sharp, and the stopband rejection deteriorates. As a result, aliasing
becomes more pronounced. If the alias components are coded and decoded
with insufficient accuracy, these coding errors prevent the inverse
transform from completely cancelling aliasing distortion. The residual
aliasing distortion will be audible unless the distortion is
psychoacoustically masked. With short sample blocks, however, some
transform frequency bins may have a wider passband than the auditory
critical bands, particularly at low frequencies where the ear's critical
bands have the greatest resolution. Consequently, alias distortion may not
be masked. One way to minimize the distortion is to increase quantization
accuracy in the problem subbands, but that increases the required bit
rate.
Bit-rate Reduction Techniques
The two factors listed above (Nyquist sample rate and quantizing errors)
should dictate the bit-rate requirements for a specified quality of signal
transmission or storage. Techniques may be employed, however, to reduce
the bit rate required for a given signal quality. These techniques exploit
a signal's redundancy and irrelevancy. A signal component is redundant if
it can be predicted or otherwise provided by the receiver. A signal
component is irrelevant if it is not needed to achieve a specified quality
of representation. Several techniques used in the art include:
(1) Prediction: a periodic or predictable characteristic of a signal
permits a receiver to anticipate some component based upon current or
previous signal characteristics.
(2) Entropy coding: components with a high probability of occurrence may be
represented by abbreviated codes. Both the transmitter and receiver must
have the same code book. Entropy coding and prediction have the
disadvantages that they increase computational complexity and processing
delay. Also, they inherently provide a variable rate output, thus
requiring buffering if used in a constant bit-rate system.
(3) Nonuniform coding: representations by logarithms or nonuniform
quantizing steps allow coding of large signal values with fewer bits at
the expense of greater quantizing errors.
(4) Floating point: floating-point representation may reduce bit
requirements at the expense of lost precision. Block-floating-point
representation uses one scale factor or exponent for a block of
floating-point mantissas, and is commonly used in coding time-domain
signals. Floating point is a special case of nonuniform coding.
(5) Bit allocation: the receiver's demand for accuracy may vary with time,
signal content, strength, or frequency. For example, lower frequency
components of speech are usually more important for comprehension and
speaker recognition, and therefore should be transmitted with greater
accuracy than higher frequency components. Different criteria apply with
respect to music signals. Some general bit-allocation criteria are:
(a) Component variance: more bits are allocated to transform coefficients
with the greatest level of AC power.
(b) Component value: more bits are allocated to transform coefficients
which represent frequency bands with the greatest amplitude or energy.
(c) Psychoacoustic masking: fewer bits are allocated to signal components
whose quantizing errors are masked (rendered inaudible) by other signal
components. This method is unique to those applications where audible
signals are intended for human perception. Masking is understood best with
respect to single-tone signals rather than multiple-tone signals and
complex waveforms such as music signals.
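
The block-floating-point representation mentioned in item (4) above can be
sketched as follows. The sketch assumes input values of magnitude less
than one, two's-complement mantissas, and an exponent that counts
left-shifts; the function names, the mantissa length and the exponent
limit are hypothetical and are not the formats of the invention.

```python
def bfp_encode(block, mantissa_bits, max_exponent=15):
    """Encode a block as one shared exponent plus fixed-length mantissas."""
    peak = max(abs(v) for v in block)
    exponent = 0
    # Shift left while the largest value would still stay below full scale.
    while exponent < max_exponent and peak * 2 ** (exponent + 1) < 1.0:
        exponent += 1
    scale = 2 ** exponent * 2 ** (mantissa_bits - 1)
    lowest, highest = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(lowest, min(highest, round(v * scale))) for v in block]
    return exponent, mantissas


def bfp_decode(exponent, mantissas, mantissa_bits):
    """Recover approximate values from the shared exponent and mantissas."""
    scale = 2 ** exponent * 2 ** (mantissa_bits - 1)
    return [m / scale for m in mantissas]


block = [0.01, -0.02, 0.005, 0.03]
exponent, mantissas = bfp_encode(block, mantissa_bits=8)
decoded = bfp_decode(exponent, mantissas, mantissa_bits=8)
print(exponent, mantissas)
print(max(abs(d - v) for d, v in zip(decoded, block)))
```

Because the exponent is transmitted once for the whole block, small
values in a block dominated by one large value are represented with
fewer significant bits, which is the loss of precision noted in item (4).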
SUMMARY OF THE INVENTION
It is an object of this invention to provide for the digital processing of
wideband audio information, particularly music, using an encode/decode
apparatus and method which provides high subjective sound quality at an
encoded bit rate as low as 128 kilobits per second (kbs).
It is a further object of this invention to provide such an encode/decode
apparatus and method suitable for the high-quality transmission or storage
and reproduction of music, wherein the quality of reproduction is
suitable, for example, for broadcast audio links.
It is a further object of the invention to provide a quality of
reproduction subjectively as good as that obtainable from Compact Discs.
It is a further object of the invention to provide such an encode/decode
apparatus and method embodied in a digital processing system having a high
degree of immunity against signal corruption by transmission paths.
It is yet a further object of the invention to provide such an
encode/decode apparatus and method embodied in a digital processing system
requiring a small amount of space to store the encoded signal.
Another object of the invention is to provide improved
psychoacoustic-masking techniques in a transform coder processing music
signals.
It is still another object of the invention to provide techniques for
psychoacoustically compensating for otherwise audible distortion artifacts
in a transform coder.
Further details of the above objects and still other objects of the
invention are set forth throughout this document, particularly in the
Detailed Description of the Invention, below.
In accordance with the teachings of the present invention, an encoder
provides for the digital encoding of wideband audio information. The
wideband audio signals are sampled and quantized into time-domain sample
blocks. Each sample block is then modulated by an analysis window.
Frequency-domain spectral components are then generated in response to the
analysis-window weighted time-domain sample block. A transform coder
having adaptive bit allocation nonuniformly quantizes each transform
coefficient, and those coefficients are assembled into a digital output
having a format suitable for storage or transmission. Error correction
codes may be used in applications where the transmitted signal is subject
to noise or other corrupting effects of the communication path.
Also in accordance with the teachings of the present invention, a decoder
provides for the high-quality reproduction of digitally encoded wideband
audio signals encoded by the encoder of the invention. The decoder
receives the digital output of the encoder via a storage device or
transmission path. It derives the nonuniformly coded spectral components
from the formatted digital signal and reconstructs the frequency-domain
spectral components therefrom. Time-domain signal sample blocks are
generated in response to frequency-domain spectral components by means
having characteristics inverse to those of the means in the encoder which
generated the frequency-domain spectral components. The sample blocks are
modulated by a synthesis window. The synthesis window has characteristics
such that the product of the synthesis-window response and the response of
the analysis-window in the encoder produces a composite response which
sums to unity for two adjacent overlapped sample blocks. Adjacent sample
blocks are overlapped and added to cancel the weighting effects of the
analysis and synthesis windows and recover a digitized representation of
the time-domain signal which is then converted to a high-quality analog
output.
Further in accordance with the teachings of the present invention, an
encoder/decoder system provides for the digital encoding and high-quality
reproduction of wideband audio information. In the encoder portion of the
system, the analog wideband audio signals are sampled and quantized into
time-domain sample blocks. Each sample block is then modulated by an
analysis window. Frequency-domain spectral components are then generated
in response to the analysis-window weighted time-domain sample block.
Nonuniform spectral coding, including adaptive bit allocation, quantizes
each spectral component, and those components are assembled into a digital
format suitable for storage or transmission over communication paths
susceptible to signal corrupting noise. The decoder portion of the system
receives the digital output of the encoder via a storage device or
transmission path. It derives the nonuniformly coded spectral components
from the formatted digital signal and reconstructs the frequency-domain
spectral components therefrom. Time-domain signal sample blocks are
generated in response to frequency-domain transform coefficients by means
having characteristics inverse to those of the means in the encoder which
generated the frequency-domain transform coefficients. The sample blocks
are modulated by a synthesis window. The synthesis window has
characteristics such that the product of the synthesis-window response and
the response of the analysis-window in the encoder produces a composite
response which sums to unity for two adjacent overlapped sample blocks.
Adjacent sample blocks are overlapped and added to cancel the weighting
effects of the analysis and synthesis windows and recover a digitized
representation of the time-domain signal which is then converted to a
high-quality analog output.
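By way of illustration only, the unity-sum condition on the composite
window response may be checked numerically. The following Python sketch
assumes a sine window used for both analysis and synthesis and a 50%
overlap between adjacent sample blocks; neither assumption is taken from
the description above, and the names sine_window and
composite_overlap_sum are hypothetical.

```python
import math


def sine_window(n_len):
    """Sine window; an assumed stand-in, not a window specified in the text."""
    return [math.sin(math.pi * (n + 0.5) / n_len) for n in range(n_len)]


def composite_overlap_sum(analysis, synthesis):
    """Sum the analysis*synthesis product over the overlap of two adjacent
    blocks, assuming the blocks overlap by half their length."""
    half = len(analysis) // 2
    product = [a * s for a, s in zip(analysis, synthesis)]
    # The second half of one block overlaps the first half of the next.
    return [product[half + n] + product[n] for n in range(half)]


N = 512  # block length used in the preferred embodiment
w = sine_window(N)
sums = composite_overlap_sum(w, w)
```

Under these assumptions every entry of sums equals one to within rounding
error, which is the condition that allows overlap-add to cancel the
weighting of the two windows.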
In an embodiment of the encoder of the present invention, a discrete
transform generates frequency-domain spectral components in response to
the analysis-window weighted time-domain sample blocks. Preferably, the
discrete transform has a function equivalent to the alternate application
of a modified Discrete Cosine Transform (DCT) and a modified Discrete Sine
Transform (DST). In an alternative embodiment, the discrete transform is
implemented by a single modified Discrete Cosine Transform (DCT); however,
virtually any time-domain to frequency-domain transform can be used.
In a preferred embodiment of the invention, a single FFT is utilized to
simultaneously calculate the forward transform for two adjacent signal
sample blocks in a single-channel system, or one signal sample block from
each channel of a two-channel system. In a preferred embodiment of the
invention for the decoder, a single FFT is utilized to simultaneously
calculate the inverse transform for two transform blocks.
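The general idea of obtaining two transforms from a single FFT may be
sketched as follows. This Python sketch shows only the well-known
technique of packing two real blocks into the real and imaginary parts of
one complex FFT input and separating the two spectra by conjugate
symmetry; it uses a plain DFT rather than the modified DCT and DST, and
is not represented to be the procedure of the preferred embodiment. The
function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def two_real_ffts(a, b):
    """Return the spectra of real blocks a and b using one complex FFT."""
    n = len(a)
    z = fft([complex(p, q) for p, q in zip(a, b)])
    spec_a, spec_b = [], []
    for k in range(n):
        zc = z[(-k) % n].conjugate()
        spec_a.append((z[k] + zc) / 2)    # part contributed by a
        spec_b.append((z[k] - zc) / 2j)   # part contributed by b
    return spec_a, spec_b


block_a = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]
block_b = [0.25, -0.5, 1.0, 1.0, -3.0, 0.0, 2.0, 0.75]
spec_a, spec_b = two_real_ffts(block_a, block_b)
```

The two blocks may be adjacent blocks of one channel or one block from
each of two channels; the separation step is the same in either case.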
In the preferred embodiments of the encoder and decoder, the sampling rate
is 44.1 kHz. While the sampling rate is not critical, 44.1 kHz is a
suitable sampling rate and it is convenient because it is also the
sampling rate used for Compact Discs. An alternative embodiment employs a
48 kHz sampling rate. In the preferred embodiment employing the 44.1 kHz
sampling rate, the nominal frequency response extends to 15 kHz and the
time-domain sample blocks have a length of 512 samples. In the preferred
embodiment of the invention, music coding at subjective quality levels
suitable for professional broadcasting applications may be achieved using
serial bit rates as low as 128 kBits per second (including overhead
information such as error correction codes). Other bit rates yielding
varying levels of signal quality may be used without departing from the
basic spirit of the invention.
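The figures given above imply a rough coding budget, which the following
Python arithmetic illustrates. It assumes that 128 kBits per second means
128,000 bits per second and that the rate applies to a single channel;
both are assumptions made only for this illustration.

```python
SAMPLE_RATE_HZ = 44_100
BLOCK_LENGTH = 512
BIT_RATE_BPS = 128_000  # assumed meaning of "128 kBits per second"

# Duration of one 512-sample block, in milliseconds.
block_duration_ms = 1000 * BLOCK_LENGTH / SAMPLE_RATE_HZ

# Average number of transmitted bits per input sample, overhead included.
bits_per_sample = BIT_RATE_BPS / SAMPLE_RATE_HZ
```

Under these assumptions a block spans about 11.6 milliseconds and the
coder has on average about 2.9 bits available per input sample.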
In a preferred embodiment of the encoder, the nonuniform transform coder
computes a variable bit-length code word for each transform coefficient,
which code-word bit length is the sum of a fixed number of bits and a
variable number of bits determined by adaptive bit allocation based on
whether, because of current signal content, noise in the subband is less
subject to psychoacoustic masking than noise in other subbands. The fixed
number of bits is assigned to each subband based on empirical
observations regarding psychoacoustic-masking effects of a single-tone
signal in the subband under consideration. The assignment of fixed bits
takes into consideration the poorer subjective performance of the system
at low frequencies due to the greater selectivity of the ear at low
frequencies. Although masking performance in the presence of complex
signals ordinarily is better than in the presence of single tone signals,
masking effects in the presence of complex signals are not as well
understood nor are they as predictable. The system is not aggressive in
the sense that most of the bits are fixed bits and relatively few bits
are adaptively assigned. This approach has several advantages. First, the
fixed bit assignment inherently compensates for the undesired distortion
products generated by the inverse transform because the empirical
procedure which established the required fixed bit assignments included
the inverse transform process. Second, the adaptive bit-allocation
algorithm can be kept relatively simple. In addition, adaptively-assigned
bits are more sensitive to signal transmission errors occurring between the
encoder and decoder since such errors can result in incorrect assignment
as well as incorrect values for these bits in the decoder.
The empirical technique for allocating bits in accordance with the
invention may be better understood by reference to FIG. 13 which shows
critical band spectra of the output noise and distortion (i.e., the noise
and distortion shown are with respect to auditory critical bands) resulting
from a 500 Hz tone (sine wave) for three different bit allocations
compared to auditory masking. The Figure is intended to demonstrate an
empirical approach rather than any particular data.
Allocation A (the solid line) is a reference, showing the noise and
distortion products produced by the 500 Hz sine wave when an arbitrary
number of bits are allocated to each of the transform coefficients.
Allocation B (the short dashed line) shows the noise and distortion
products for the same relative bit allocation as allocation A but with 2
fewer bits per transform coefficient. Allocation C (the long dashed line)
is the same as allocation A for frequencies in the lower part of the audio
band up to about 1500 Hz. Allocation C is then the same as allocation B
for frequencies in the upper part of the audio band above about 1500 Hz.
The dotted line shows the auditory masking curve for a 500 Hz tone.
It will be observed that audible noise is present at frequencies below the
500 Hz tone for all three cases of bit allocation due to the rapid fall
off of the masking curve: the noise and distortion product curves are
above the masking threshold from about 100 Hz to 300 or 400 Hz. The
removal of two bits (allocation A to allocation B) exacerbates the audible
noise and distortion; adding back the two bits over a portion of the
spectrum including the region below the tone, as shown in allocation C,
restores the original audible noise and distortion levels. Audible noise
is also present at high frequencies, but does not change as substantially
when bits are removed and added because at that extreme portion of the
audio spectrum the noise and distortion products created by the 500 Hz
tone are relatively low.
By observing the noise and distortion created in response to tones at
various frequencies for various bit allocations, bit lengths for the
various transform coefficients can be allocated that result in acceptable
levels of noise and distortion with respect to auditory masking throughout
the audio spectrum. With respect to the example in FIG. 13, in order to
lower the level of the noise and distortion products below the masking
threshold in the region from about 100 Hz to 300 to 400 Hz, additional
bits could be added to the reference allocation for the transform
coefficient containing the 500 Hz tone and nearby coefficients until the
noise and distortion dropped below the masking threshold. Similar steps
would be taken for other tones throughout the audio spectrum until the
overall transform-coefficient bit-length allocation resulted in acceptably
low audible noise in the presence of tones, taken one at a time,
throughout the audio spectrum. This is most easily done by way of computer
simulations. The fixed bit allocation is then set somewhat lower than this
result by removing one or more bits from each transform coefficient across
the spectrum (such as allocation B). Adaptively allocated bits are added
to reduce the audible noise to acceptable levels in the problem regions as
required (such as allocation C). Thus, empirical observations regarding
the increase and decrease of audible noise with respect to bit allocation
such as in the example of FIG. 13 form the basis of the fixed and adaptive
bit allocation scheme of the present invention.
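The step of adding bits until the noise falls below the masking threshold
may be approximated as in the following Python sketch. It assumes that
each added bit lowers quantizing noise by about 6.02 dB, which holds for
an ideal uniform quantizer and is only a simplification of the empirical
procedure described above, since that procedure also accounts for
distortion produced by the inverse transform. The function name and the
decibel values passed to it are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # assumed noise reduction per added bit (ideal quantizer)


def bits_needed(noise_db, mask_db):
    """Number of added bits that brings noise_db down to mask_db or below."""
    deficit_db = noise_db - mask_db
    return max(0, math.ceil(deficit_db / DB_PER_BIT))


# Illustrative values only: noise 10 dB above the masking threshold.
extra_bits = bits_needed(noise_db=-60.0, mask_db=-70.0)
```

In an actual design the noise and masking levels would come from
measurements or simulations for each test tone, not from a fixed rule.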
In a preferred embodiment of the encoder, the nonuniformly quantized
transform coefficients are expressed by a block-floating-point
representation comprised of block exponents and variable-length code
words. As described above, the variable-length code words are further
comprised of a fixed bit-length portion and a variable length portion of
adaptively assigned bits. The encoded signal for a pair of transform
blocks is assembled into frames composed of exponents and the fixed-length
portion of the code words followed by a string of all adaptively allocated
bits. The exponents and fixed-length portion of code words are assembled
separately from adaptively allocated bits to reduce vulnerability to noise
burst errors.
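A block-floating-point representation of the general kind referred to
above may be sketched in Python as follows. The sketch assumes
coefficient values scaled to lie between -1 and 1, one exponent shared by
all values in the block, and a maximum exponent of 15; the function names
and these parameters are hypothetical and do not reflect the exact word
formats of the preferred embodiment.

```python
def block_floating_point(values, mantissa_bits, max_exponent=15):
    """Share one exponent across `values` and quantize the mantissas."""
    peak = max(abs(v) for v in values)
    exponent = 0
    # Count the left shifts that keep the largest magnitude below 1.0.
    while exponent < max_exponent and peak * 2 < 1.0:
        peak *= 2
        exponent += 1
    scale = 1 << (mantissa_bits - 1)
    mantissas = [
        max(-scale, min(scale - 1, round(v * (1 << exponent) * scale)))
        for v in values
    ]
    return exponent, mantissas


def reconstruct(exponent, mantissas, mantissa_bits):
    """Invert block_floating_point up to quantizing error."""
    scale = 1 << (mantissa_bits - 1)
    return [m / scale / (1 << exponent) for m in mantissas]


coefficients = [0.1, -0.05, 0.02]
exponent, mantissas = block_floating_point(coefficients, mantissa_bits=8)
restored = reconstruct(exponent, mantissas, mantissa_bits=8)
```

Because the exponent is shared, small values in a block that also contains
a large value are represented with fewer significant bits than they would
receive with an exponent of their own.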
Unlike many coders in the prior art, an encoder conforming to this
invention need not transmit side information regarding the assignment of
adaptively allocated bits in each frame. The decoder can deduce the
correct assignment by applying the same allocation algorithm to the
exponents as that used by the encoder.
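The reason no side information is needed may be illustrated by the
following Python sketch, in which the allocation is a deterministic
function of the exponents alone. The allocation rule shown (giving each
bit of a pool to the subband with the smallest exponent, with a cap per
subband) is a hypothetical placeholder and is not the allocation
algorithm of the invention; all names and parameters are assumptions.

```python
def allocate_adaptive_bits(exponents, fixed_bits, pool, max_extra=3):
    """Return code-word lengths (fixed + adaptive bits) for each subband.

    Hypothetical rule: hand out `pool` bits one at a time to the subband
    whose exponent plus bits already received is smallest.
    """
    extra = [0] * len(exponents)
    for _ in range(pool):
        candidates = [i for i in range(len(exponents)) if extra[i] < max_extra]
        if not candidates:
            break
        best = min(candidates, key=lambda i: (exponents[i] + extra[i], i))
        extra[best] += 1
    return [f + e for f, e in zip(fixed_bits, extra)]


exponents = [2, 0, 5, 1]   # transmitted in every frame
fixed_bits = [4, 4, 3, 3]  # known in advance to encoder and decoder

encoder_lengths = allocate_adaptive_bits(exponents, fixed_bits, pool=4)
decoder_lengths = allocate_adaptive_bits(exponents, fixed_bits, pool=4)
```

Since the decoder receives the same exponents and runs the same rule, it
arrives at the same code-word lengths and can parse the string of
adaptively allocated bits without being told how they were assigned.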
In applications where frame synchronization is required, the encoder
portion of the invention appends the formatted data to frame
synchronization bits. The formatted data bits are first randomized to
reduce the probability of long sequences of bits with values of all ones
or zeroes. This is necessary in many environments such as T-1 carrier,
which will not tolerate such sequences beyond specified lengths. In
asynchronous applications, randomization also reduces the probability that
valid data within the frame will be mistaken for the block synchronization
sequence. In the decoder portion of the invention, the formatted data bits
are recovered by removing the frame synchronization bits and applying an
inverse randomization process.
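One common way to randomize data bits, offered here only as an
illustration, is to combine them by exclusive-OR with a pseudo-random bit
sequence; applying the same operation again restores the data. The
following Python sketch assumes a 7-bit linear feedback shift register
with polynomial x^7 + x^6 + 1 and an arbitrary seed. The description
above does not specify the randomizer, so the polynomial, the seed, and
the function names are all assumptions.

```python
def prbs_bits(count, state=0x5A):
    """Yield `count` bits from an assumed 7-bit LFSR (x^7 + x^6 + 1)."""
    state &= 0x7F
    for _ in range(count):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        yield bit


def randomize(bits, seed=0x5A):
    """XOR the data bits with the pseudo-random sequence.

    The operation is its own inverse when the same seed is used.
    """
    return [b ^ p for b, p in zip(bits, prbs_bits(len(bits), seed))]


data = [0] * 32 + [1] * 32   # long runs of identical bits
scrambled = randomize(data)
recovered = randomize(scrambled)
```

With this register a run of 64 zeroes at the input cannot produce more
than seven identical bits in a row at the output.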
In applications where the encoded signal is subject to corruption, error
correction codes are utilized to protect the most critical information,
that is, the exponents and possibly the fixed portions of the
lowest-frequency coefficient code words. Error codes and the protected
data are scattered throughout the formatted frame to reduce sensitivity to
noise burst errors, i.e., to increase the length of a noise burst required
before critical data cannot be corrected.
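The benefit of scattering protected data may be illustrated with a simple
block interleaver, sketched below in Python. The row-in, column-out
arrangement, the depth of 4, and the code-word length of 6 symbols are
assumptions chosen for the illustration and are not the frame layout of
the preferred embodiment; the function names are hypothetical.

```python
def interleave(symbols, depth):
    """Row-in, column-out block interleaver.

    len(symbols) must be a multiple of depth.
    """
    width = len(symbols) // depth
    return [symbols[r * width + c] for c in range(width) for r in range(depth)]


def deinterleave(symbols, depth):
    """Inverse of interleave for the same depth."""
    width = len(symbols) // depth
    return [symbols[c * depth + r] for r in range(depth) for c in range(width)]


# Four code words of six symbols each, numbered 0..23 for clarity.
symbols = list(range(24))
transmitted = interleave(symbols, depth=4)

# A noise burst corrupting four consecutive transmitted symbols.
burst = transmitted[8:12]
codewords_hit = {s // 6 for s in burst}
```

In this illustration the four-symbol burst touches each of the four code
words exactly once, so a code able to correct one symbol per code word
could recover all of the protected data.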
The various features of the invention and its preferred embodiments are set
forth in greater detail in the following Detailed Description of the
Invention and in the accompanying drawings.
BRIEF DESCRIPTION OF DRAWINGS
FIGS. 1a and 1b are functional block diagrams illustrating the basic
structure of the invention.
FIGS. 2a through 2e are block diagrams showing the hardware architecture
for one embodiment of the invention.
FIGS. 3a and 3b are block diagrams showing in greater detail the
serial-communications interface of the processor for a two-channel
embodiment of the invention.
FIG. 4 is a hypothetical graphical representation showing a time-domain
signal sample block.
FIG. 5 is a further hypothetical graphical representation of a time-domain
signal sample block showing discontinuities at the edges of the sample
block caused by a discrete transform assuming the signal within the block
is periodic.
FIG. 6a is a functional block diagram showing the modulation of a function
X(t) by a function W(t) to provide the resulting function Y(t).
FIGS. 6b through 6d are further hypothetical graphical representations
showing the modulation of a time-domain signal sample block by an analysis
window.
FIG. 7 is a flow chart showing the high level logic for the nonuniform
quantizer utilized in the invention.
FIG. 8 is a flow chart showing more detailed logic for the adaptive bit
allocation process utilize | | |