Description
FIELD OF THE INVENTION
The present invention relates to audio compression and decompression
systems.
BACKGROUND OF THE INVENTION
While digital audio recordings provide many advantages over analog systems,
the data storage requirements for high-fidelity recordings are
substantial. A high fidelity recording typically requires more than one
million bits per second of playback time. The total storage needed for
even a short recording is too high for many computer applications. In
addition, the digital bit rates inherent in non-compressed high fidelity
audio recordings make the transmission of such audio tracks over limited
bandwidth transmission systems difficult. Hence, systems for compressing
audio sound tracks to reduce the storage and bandwidth requirements are in
great demand.
One class of prior art audio compression systems divides the sound track
into a series of segments. Over the time interval represented by each
segment, the sound track is analyzed to determine the signal components in
each of a plurality of frequency bands. The measured components are then
replaced by approximations requiring fewer bits to represent, but which
preserve features of the sound track that are important to a human
listener. At the receiver, an approximation to the original sound track is
generated by reversing the analysis process with the approximations in
place of the original signal components.
The analysis and synthesis operations are normally carried out with the aid
of perfect, or near perfect, reconstruction filter banks. The systems in
question include an analysis filter bank which generates a set of
decimated subband outputs from a segment of the sound track. Each
decimated subband output represents the signal in a predetermined
frequency range. The inverse operation is carried out by a synthesis
filter bank which accepts a set of decimated subband outputs and generates
therefrom a segment of audio sound track. In practice, the synthesis and
analysis filter banks are implemented on digital computers which may be
general purpose computers or special computers designed to more
efficiently carry out the operations. If the analysis and synthesis
operations are carried out with sufficient precision, the segment of audio
sound track generated by the synthesis filter bank will match the original
segment of audio sound track that was inputted to the analysis filter
bank. The differences between the reconstructed audio sound track and the
original sound track can be made arbitrarily small. In this case, the
specific filter bank characteristics such as the length of the segment
analyzed, the number of filters in the filter bank, and the location and
shape of filter response characteristics would be of little interest,
since any set of filter banks satisfying the perfect, or near-perfect,
reconstruction condition would exactly regenerate the audio segment.
Unfortunately, the replacement of the frequency components generated by the
analysis filter bank with a quantized approximation thereto results in
artifacts that do depend on the detailed characteristics of the filter
banks. There is no single segment length for which the artifacts in the
reconstructed audio track can be minimized. Hence, the length of the
segments analyzed in prior art systems is chosen to be a compromise. When
the frequency components are replaced by approximations, an error is
introduced in each component. An error in a given frequency component
produces an acoustical effect which is equivalent to the introduction of a
noise signal with frequency characteristics that depend on filter
characteristics of the corresponding filter in the filter bank. The noise
signal will be present over the entire segment of the reconstructed sound
track. Hence, the length of the segments is reflected in the types of
artifacts introduced by the approximations. If the segment is short, the
artifacts are less noticeable. Hence, short segments are preferred.
However, if the segment is too short, there is insufficient spectral
resolution to acquire information needed to properly determine the minimum
number of bits needed to represent each frequency component. On the other
hand, if the segment is too long, the temporal resolution of the human
auditory system is fine enough to detect the artifacts.
Prior art systems also utilize filter banks in which the frequency bands
are uniform in size. Systems with a few (16-32) sub-bands in a 0-22 kHz
frequency range are generally called "subband coders" while those with a
large number of sub-bands (≥64) are called "transform coders". It
is known from psychophysical studies of the human auditory system that
there are critical bandwidths which vary with frequency. The information
in a critical band may be approximated by a component representing the
time averaged signal amplitude in the critical band.
In addition, the ear's sensitivity to a noise source in the presence of a
localized frequency component such as a sine tone depends on the relative
levels of the signals and on the relation of the noise spectral components
to the tone. The errors introduced by approximating the frequency
components may be viewed as "noise". The noise becomes significantly less
audible if its spectral energy is within one critical bandwidth of the
tone. Hence, it is advantageous to use frequency decompositions which
approximate the critical band structure of the auditory system.
Uniform frequency bands are poorly suited to systems designed to take
advantage of this type of approximation. In principle,
each audio segment can be analyzed to generate a large number of uniform
frequency bands, and then, several bands at the higher frequencies could
be merged to provide a decomposition into critical bands. This approach
imposes the same temporal constraints on all frequency bands. That is, the
time window over which the low frequency data is generated for each band
is the same as the time window over which each high-frequency band is
generated. To provide accuracy in the low frequency ranges, the time
window must be very long. This leads to temporal artifacts that become
audible at higher frequencies. Hence, systems in which the audio segment
is decomposed into uniform sub-bands with adequate low-frequency
resolution cannot take full advantage of the critical band properties of
the auditory system.
Prior art systems that recognize this limitation have attempted to solve
the problem by utilizing analysis and synthesis filter banks based on QMF
filter banks that analyze a segment of an audio sound track to generate
frequency components in two frequency bands. To obtain a decomposition of
the segment into frequency components representing the amplitudes of the
signal in critical bands, these two frequency band QMF filters are
arranged in a tree-structured configuration. That is, each of the outputs
of the first level filter becomes the input to another filter bank at
least one of whose two outputs is fed to yet another level, and so on. The
leaf nodes of this tree provide an approximation to a critical band
analysis of the input audio track. It can be shown that this type of
filter bank uses different-length audio segments to generate the different
frequency components. That is, a low frequency component represents the
signal amplitude in an audio segment that is much longer than a
high-frequency component. Hence, the need to choose a single compromise
audio segment length is eliminated.
While tree structured filter banks having many layers may be used to
decompose the frequency spectrum into critical bands, such filter banks
introduce significant aliasing artifacts that limit their utility. In a
multilevel filter bank, the aliasing artifacts are expected to increase
exponentially with the number of levels. Hence, filter banks with large
numbers of levels are to be avoided. Unfortunately, filter banks based on
QMF filters which divide the signal into two bandlimited signals require
large numbers of levels.
Prior art audio compression systems are also poorly suited to applications
in which the playback of the material is to be carried out on a digital
computer. The use of audio for computer applications is increasingly in
demand. Audio is being integrated into multimedia applications such as
computer based entertainment, training, and demonstration systems. Over
the course of the next few years, many new personal computers will be
outfitted with audio playback and recording capability. In addition,
existing computers will be upgraded for audio with the addition of plug-in
peripherals.
Computer based audio and video systems have been limited to the use of
costly outboard equipment such as an analog laser disc player for playback
of audio and video. This has limited the usefulness and applicability of
such systems. With such systems it is necessary to provide a user with a
highly specialized playback configuration, and there is no possibility of
distributing the media electronically. However, personal computer based
systems using compressed audio and video data promise to provide
inexpensive playback solutions and allow distribution of program material
on digital disks or over a computer network.
Until recently, the use of high quality audio on computer platforms has
been limited due to the enormous data rate required for storage and
playback. Quality has been compromised in order to store the audio data
conveniently on disk. Although some increase in performance and some
reduction in bandwidth has been gained using conventional audio
compression methods, these improvements have not been sufficient to allow
playback of high fidelity recordings on the commonly used computer
platforms without the addition of expensive special purpose hardware.
One solution to this problem would be to use lower quality playback on
computer platforms that lack the computational resources to decode
compressed audio material at high fidelity quality levels. Unfortunately,
this solution requires that the audio material be coded at various quality
levels. Hence, each audio program would need to be stored in a plurality
of formats. Different types of users would then be sent the format suited
to their application. The cost and complexity of maintaining such
multi-format libraries makes this solution unattractive. In addition, the
storage requirements of the multiple formats partially defeat the basic
goal of reducing the amount of storage needed to store the audio material.
Furthermore, the above discussion assumes that the computational resources
of a particular playback platform are fixed. This assumption is not always
true in practice. The computational resources of a computing system are
often shared among a plurality of applications that are running in a
time-shared environment. Similarly, communication links between the
playback platform and shared storage facilities also may be shared. As the
playback resources change, the format of the audio material must change in
systems utilizing a multi-format compression approach. This problem has
not been adequately solved in prior art systems.
Broadly, it is the object of the present invention to provide an improved
audio compression system.
It is a further object of the present invention to provide an audio
compression system which utilizes a frequency decomposition system that
has good frequency resolution at low frequencies and good temporal
resolution at high frequencies without utilizing tree structured filter
banks having large numbers of levels.
It is yet another object of the present invention to provide an audio
compression system that allows the compressed material to be played back on a
variety of playback platforms with different computational capabilities
without maintaining multiple copies of the compressed material.
It is a still further object of the present invention to provide an audio
compression system in which the bandwidth needed to transmit the audio
material may be varied in response to changes in the available bandwidth.
These and other objects of the present invention will become apparent from
the following detailed description of the invention and the accompanying
drawings.
SUMMARY OF THE INVENTION
The present invention comprises audio compression and decompression
systems. An audio compression system according to the present invention
converts an audio signal into a series of sets of frequency components.
Each frequency component represents an approximation to the audio signal
in a corresponding frequency band over a time interval that depends on the
frequency band. The received audio signal is analyzed in a tree-structured
sub-band analysis filter bank. The sub-band analysis filter bank comprises a
tree-structured array of sub-band filter banks, the audio signal forming the
input of the root node of the tree-structured array and the frequency
components being generated at the leaf nodes of the tree-structured array.
Each of the sub-band filter banks comprises a plurality of FIR filters
having a common input for receiving an input audio signal. Each filter
generates an output signal representing the input audio signal in a
corresponding frequency band. The number of FIR filters in at least one of
the sub-band filter banks is greater than two, and the number of FIR
filters in at least one of the sub-band filter banks is different from the
number of FIR filters in another of the sub-band filter banks. The frequency
components generated by the sub-band analysis filter bank are then quantized
using information about the masking features of the human auditory system.
A decompression system according to the present invention regenerates a
time-domain audio signal from the sets of frequency components such as
those generated by a compression system according to the present
invention. The decompression system receives a compressed audio signal
comprising sets of frequency components, the number of frequency
components in each set being M. The decompression apparatus synthesizes M
time domain audio signal values from each received set of frequency
components. The synthesis sub-system generates 2M polyphase components
from the set of frequency components. It then generates a W-entry array
from the polyphase components and multiplies each entry in the array
by a corresponding weight value derived from a prototype filter. The time
domain audio samples are then generated from the weighted array. The
generated samples are stored in a FIFO buffer and output to a D/A
converter. The FIFO buffer generates a signal indicative of the number of
time domain audio signal values stored therein. The rate at which these
sample values are output to the D/A converter is determined by a clock.
The preferred embodiment of the decompression system includes a controller
that uses the level indicator in the FIFO buffer or other operating system
loading parameter to adjust the computational complexity of the algorithm
used to synthesize the time domain samples. When the level indicator
indicates that the number of time domain samples stored in the FIFO buffer
is less than a first predetermined value, the normal synthesis operation
is replaced by one that generates an approximation to the time domain
samples. This approximation requires a smaller number of computations than
would be required to generate the exact time domain audio signal values. The
approximation may be generated by substituting a truncated or shorter
prototype filter or by eliminating the contributions of selected frequency
components from the computation of the polyphase components. In
stereophonic systems, the controller may also switch the synthesis system
to a monaural mode based on average frequency components which are
obtained by averaging corresponding frequency components for the left and
right channels.
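The controller policy described above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the controller of the preferred embodiment: the function names are hypothetical, the document names only a first predetermined FIFO threshold (the second, "mono" threshold is an illustrative assumption), and dropping the highest bands is just one hypothetical choice of "selected frequency components" to eliminate.

```python
from typing import List, Sequence


def select_synthesis_mode(fifo_level: int, first_threshold: int,
                          second_threshold: int) -> str:
    """Choose how much synthesis work to do from the FIFO fill level.

    Hypothetical policy: only the first threshold comes from the text;
    the second (mono) threshold is an illustrative assumption.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be below first_threshold")
    if fifo_level >= first_threshold:
        return "full"      # buffer healthy: exact synthesis
    if fifo_level >= second_threshold:
        return "reduced"   # e.g. shorter prototype or dropped bands
    return "mono"          # stereo only: synthesize one averaged channel


def drop_upper_bands(components: Sequence[float], keep: int) -> List[float]:
    """Zero all but the lowest `keep` frequency components.

    A real implementation would skip these terms in the polyphase
    computation instead of multiplying by zero; zeroing only marks them.
    """
    return [c if i < keep else 0.0 for i, c in enumerate(components)]


def average_channels(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Average corresponding left/right frequency components for mono synthesis."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of components")
    return [(l + r) / 2.0 for l, r in zip(left, right)]
```

Averaging in the frequency domain means the mono path runs the synthesis once instead of twice, which is where the saving in computation comes from.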
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an audio compression system.
FIG. 2 is a block diagram of a sub-band decomposition filter according to
the present invention.
FIG. 3 illustrates the relationship between the length of the segment of
the original audio signal used to generate the frequency of each sub-band
and the bandwidth of each band.
FIG. 4 illustrates the relationship between successive overlapping segments
of an audio signal.
FIG. 5(a) is a block diagram of an audio filter based on a low-frequency
filter and a modulator.
FIG. 5(b) is a block diagram of a sub-band analysis filter for generating a
set of frequency components.
FIG. 6 illustrates the manner in which a sub-band analysis filter may be
utilized to obtain the frequency information needed for psycho-acoustical
analysis of the audio signal prior to quantization.
FIG. 7 is a block diagram of an audio decompression system for
decompressing the compressed audio signals generated by a compression
system.
FIG. 8 is a block diagram of a synthesizer according to the present
invention.
FIG. 9 is a block diagram of an audio decompression system utilizing the
variable computational load techniques of the present invention.
FIG. 10 is a block diagram of a stereophonic decompression system according
to the present invention.
FIG. 11 is a block diagram of a stereophonic decompression system according
to the present invention using a serial computation system.
FIG. 12 is a block diagram of an audio compression apparatus utilizing
variable computational complexity.
DETAILED DESCRIPTION OF THE INVENTION
The manner in which the present invention obtains its advantages over prior
art audio compression systems may be more easily understood with reference
to the manner in which a conventional audio compression system operates.
FIG. 1 is a block diagram of an audio compression system 10 using a
conventional sub-band analysis system. The audio compression system
accepts an input signal 11 which is divided into a plurality of segments
19. Each segment is analyzed by a filter bank 12 which provides the
frequency components for the segment. Each frequency component is a time
average of the amplitude of the signal in a corresponding frequency band.
The time average is, in general, a weighted average. The frequencies of
the sub-bands are uniformly distributed between a minimum and maximum
value which depend on the number of samples in each segment 19 and the
rate at which samples are taken. The input signal is preferably digital in
nature; however, it will be apparent to those skilled in the art that an
analog signal may be used by including an analog-to-digital converter
prior to filter bank 12.
The component waveforms generated by filter bank 12 are replaced by digital
approximations by quantizer 14. The number of bits assigned to each
amplitude is determined by a psycho-acoustic analyzer 16 which utilizes
information about the auditory system to minimize the distortions
introduced by the quantization. The quantized frequency components are
then further coded by coder 18 which makes use of the redundancy in the
quantized components to further reduce the number of bits needed to
represent the coded coefficients. Coder 18 does not introduce further
errors into the frequency components. Coding algorithms are well known to
those skilled in the signal compression arts, and hence, will not be
discussed in more detail here.
The quantization process introduces errors into the frequency coefficients.
A quantization scheme replaces the amplitude of each frequency component
by an integer having a finite precision. The number of bits used to
represent the integers will be denoted by P. The integers in question are
then transmitted in place of the individual frequency components. At the
receiver, the inverse of the mapping used to assign the integer values to
the frequency components is used to produce amplitudes that are used in
place of the original amplitudes for the frequency components. There are
at most 2^P distinct values that can be represented; hence, if there
are more than 2^P different frequency component values, at least some
of the frequency components will not be exactly recovered. The goal of the
quantization algorithm is to minimize the overall effect of the
quantization errors on the listener.
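The P-bit mapping and its inverse can be illustrated with a simple uniform quantizer. This is a minimal sketch assuming a fixed full-scale range and midpoint reconstruction; the document does not specify the mapping, and in the system described the bit count per component would come from psycho-acoustic analyzer 16 rather than being fixed. The function names are hypothetical.

```python
import math


def quantize(x: float, p: int, full_scale: float) -> int:
    """Map an amplitude in [-full_scale, full_scale] to one of 2**p integer codes."""
    levels = 2 ** p
    step = 2.0 * full_scale / levels
    code = math.floor((x + full_scale) / step)
    return min(max(code, 0), levels - 1)  # clamp out-of-range inputs


def dequantize(code: int, p: int, full_scale: float) -> float:
    """Receiver-side inverse mapping: return the midpoint of the code's interval."""
    step = 2.0 * full_scale / 2 ** p
    return -full_scale + (code + 0.5) * step


# With P = 3 bits, only 8 distinct amplitudes can be recovered.
samples = [i / 100 - 1 for i in range(201)]
codes = [quantize(x, 3, 1.0) for x in samples]
print(sorted(set(codes)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

All 201 input amplitudes collapse onto 8 reconstruction levels, and each recovered amplitude differs from the original by at most half a quantization step.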
The errors introduced by the quantization algorithm affect the
reconstructed audio track for a time period equal to the length of the
segment analyzed to calculate the frequency components. The artifacts
introduced by these errors are particularly noticeable in regions of the
audio track in which the sound increases or decreases in amplitude over a
period of time which is short compared to the length of the segments being
analyzed. Because of the rapid rise, the set of frequency components of
the audio track in the segment will have a number of high-frequency components
of significant amplitude which are not present in the segments on either
side of the segment in question. Consider a quantization error in one of
these high-frequency components. The error is equivalent to adding noise
to the original signal. The amplitude of the noise will be determined by
the quantization error. This noise will be present for the entire length
of the segment in the reconstructed audio track. The noise resulting from
the quantization error commences at the boundary of the segment even
though the attack begins in the middle of the segment. The amplitude of
the noise in the early part of the segment may be of the same order of
magnitude as the reconstructed audio track; hence, the noise will be
particularly noticeable. Since the noise precedes the actual rise in
intensity of the audio track, it is perceived as a "pre-echo". If the
segment duration is long compared to the rise time of the audio signal,
the pre-echo is particularly noticeable. Hence, it would be advantageous
to choose filter bands in which the high-frequency components are
calculated from segments that are shorter than those used to calculate the
low-frequency components. This arrangement avoids the situation in which
the segment used to compute high-frequency components is long compared to
the rate of change of the component being computed.
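The pre-echo mechanism can be reproduced numerically. The sketch below is a minimal illustration, not the system described here: it swaps in a single orthonormal DCT over one 64-sample segment as a stand-in for a uniform analysis filter bank, coarsely quantizes the coefficients, and checks that the reconstruction is no longer silent before an attack placed in the middle of the segment. The segment length, attack position, tone frequency and quantizer step (`N`, `ATTACK`, `STEP`) are arbitrary illustrative choices.

```python
import math


def _scale(k: int, n_len: int) -> float:
    """Orthonormal DCT scaling factor for coefficient k."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n_len)


def dct_ii(x):
    """Orthonormal DCT-II, used here as a stand-in uniform analysis filter bank."""
    n_len = len(x)
    return [_scale(k, n_len) * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_len)
                                   for n in range(n_len))
            for k in range(n_len)]


def idct_ii(c):
    """Exact inverse of dct_ii (orthonormal DCT-III)."""
    n_len = len(c)
    return [sum(_scale(k, n_len) * c[k] * math.cos(math.pi * (n + 0.5) * k / n_len)
                for k in range(n_len))
            for n in range(n_len)]


N, ATTACK, STEP = 64, 32, 0.5
# Silence, then a tone that starts in the middle of the segment.
signal = [0.0 if n < ATTACK else math.sin(2 * math.pi * 0.2 * n) for n in range(N)]
coeffs = dct_ii(signal)
quantized = [STEP * round(c / STEP) for c in coeffs]  # coarse uniform quantizer
rebuilt = idct_ii(quantized)

# Quantization error spreads over the whole segment, including before the attack.
pre_echo_peak = max(abs(v) for v in rebuilt[:ATTACK])
print(f"peak reconstructed value before the attack: {pre_echo_peak:.3f}")
```

Without quantization the transform pair reconstructs the silent first half exactly; the nonzero values there come entirely from the coefficient errors, which is the "pre-echo" described above.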
Low bit rate audio compression systems operate by distributing the noise
introduced by quantization so that it is masked by the signal. The ear's
sensitivity to a noise source in the presence of a localized frequency
component such as a sine tone depends on the relative levels of the
signals and on the relation of the noise spectral components to the tone.
The noise becomes significantly less audible if its spectral energy is
within one critical bandwidth of the tone. Hence, it would be advantageous
to choose filter bands that more closely match the critical bands of the
human auditory system.
The present invention utilizes a filter bank in which different frequency
bands utilize different segment lengths. In prior art systems, each
segment is analyzed in a bank of finite impulse response filters. The
number of samples in the input segment over which each frequency component
is computed is the same. The present invention uses different width
segments for different frequency components. Ideally, an audio
decomposition should exhibit a time and frequency dependency similar to
that of human hearing. This may be accomplished by relating the frequency
divisions or sub-bands of the decomposition to the critical bandwidths of
human hearing. The resulting decomposition has fine frequency resolution
with relatively poor temporal resolution at low frequencies, and coarse
frequency resolution with fine temporal resolution at high frequencies. As
a result, the segment length corresponding to high-frequency components
does not greatly exceed the rise time of attacks in the audio track. This
reduces the pre-echo artifacts discussed above.
In one embodiment of the present invention, a tree structured decomposition
which approximates the ear's time and frequency sensitivity is utilized.
This filter may be used to replace sub-band analysis filter bank 12 shown
in FIG. 1. A block diagram of a sub-band decomposition filter for carrying
out this decomposition is shown at 30 in FIG. 2. Filter 30 includes two
levels of filter banks. The manner in which the filter banks are
constructed will be discussed in more detail below. For the purposes of
the present discussion, it is important to note that the decomposition is
carried out with only two levels of filters, and hence, avoids the
aliasing problems inherent in QMF filter banks that require many levels.
The aliasing problems encountered with QMF filter banks become significant
when the number of levels exceeds 4.
The first level of filter 30 consists of a filter bank 31 which divides the
input signal into eight sub-bands of equal size. The second level
sub-divides the lowest three frequency bands from filter bank 31 into
finer sub-divisions. The second level consists of three filter banks
32-34. Filter bank 32 divides the lowest sub-band from filter bank 31 into
8 equal sub-bands. Filter banks 33 and 34 divide the second and third
sub-bands created by filter bank 31, respectively, into four sub-bands each. The
combination of the two levels generates 21 frequency sub-bands. The
relationship between the length of the segment of the original audio
signal used to generate the frequency and phase of each sub-band and the
bandwidth of each band is shown schematically in FIG. 3. The lower
frequencies, bands 1-8, have the finest frequency resolution, but the
poorest temporal resolution. The highest frequencies, bands 17-21, have
the poorest frequency resolution, but the finest time resolution. This
arrangement more nearly approximates the ear's sensitivity than systems
utilizing filter banks in which all bands have the same temporal
resolution, while avoiding the aliasing problems inherent in
tree-structured filters having many levels of filters.
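The 21-band layout of filter 30 can be enumerated directly. This is a sketch under assumptions not stated in this section: a 44.1 kHz sampling rate (so the analyzed range is 0 to 22.05 kHz, matching the 0-22 kHz range mentioned earlier), ideal equal-width splits, and maximal decimation, so that each leaf band's total decimation is the product of the split factors along its path. The function name and band tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Band = Tuple[float, float, int]  # (low edge Hz, high edge Hz, total decimation)


def two_level_layout(nyquist_hz: float = 22050.0, first_split: int = 8,
                     second_splits: Sequence[int] = (8, 4, 4)) -> List[Band]:
    """Leaf bands of a two-level tree, lowest frequency first.

    second_splits[i] is how many sub-bands first-level band i is divided into;
    first-level bands beyond len(second_splits) are left undivided.
    """
    width = nyquist_hz / first_split
    bands: List[Band] = []
    for parent in range(first_split):
        n_sub = second_splits[parent] if parent < len(second_splits) else 1
        sub_width = width / n_sub
        for j in range(n_sub):
            low = parent * width + j * sub_width
            bands.append((low, low + sub_width, first_split * n_sub))
    return bands


bands = two_level_layout()
for number, (low, high, decimation) in enumerate(bands, start=1):
    # One output sample per `decimation` input samples: a larger decimation
    # means finer frequency resolution and coarser time resolution.
    print(f"band {number:2d}: {low:8.1f}-{high:8.1f} Hz, decimation {decimation}")
```

Bands 1-8 are decimated by 64, bands 9-16 by 32 and bands 17-21 by 8; the ratios 64 : 32 : 8 match the T, T/2 and T/8 segment lengths quoted for the low, medium and high bands in the discussion of quantization noise.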
While quantization errors in each of the amplitudes still introduce noise,
the noise spectrum obtained with this embodiment of the present invention
is less objectionable to a human listener than that obtained with prior
art systems. As noted above, prior art systems tend to have a noise
spectrum which changes abruptly at the segment boundaries. In the present
invention, the amplitude of the quantization noise can switch more rapidly
at higher frequencies. If the length of the low frequency segments is
denoted by T, then the medium frequencies are measured on segments that
are T/2, and the highest frequencies are measured on segments that are T/8
in length. The quantization noise is the sum of all of the quantization
errors in all of the frequency bands. As a result, the quantization noise
changes every T/8. To obtain the same resolution in the low frequency
components, a conventional filter system would measure all of the
frequency components on segments of length T. Hence, the prior art would
introduce quantization noise which changes abruptly every T samples. The
present invention introduces a more gradual change in the noise level in
the T/8 interval for the high and medium sub-bands thus giving less
perceptible distortion at higher frequencies.
The manner in which the input signal is divided into segments can affect
the quality of the regenerated audio signal. Consider the case in which
the signal is analyzed on segments that do not overlap. This analysis is
equivalent to employing a model in which the regenerated signal is
produced by summing the signals of a number of harmonic oscillators whose
amplitudes remain constant over the duration of the segment on which each
amplitude was calculated. In general, this model is a poor approximation
to an actual audio track, since the amplitudes of the various
frequency components would be expected to change over the duration of the
segments in question. Models that do not take this change into account
will have significantly greater distortions than models in which the
amplitudes can change over the duration of the segment, since there will
be abrupt changes in the amplitudes of the frequency components at each
segment boundary.
One method for reducing the discontinuities in the frequency component
amplitudes at the segment boundaries is to employ a sub-band analysis
filter that utilizes overlapping segments to generate successive frequency
component amplitudes. The relationship of the segments is shown in FIG. 4
for a signal 301. The sub-band analysis filter generates M frequency
components for signal 301 for each M signal values. However, each
frequency component is generated over a segment having a duration much
greater than M. Each component is generated over a segment having a length
of W sample values, where W>M. Typical segments are shown at 312 and 313.
It should be noted that successive segments overlap by (W-M) samples.
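The segment bookkeeping can be checked with a few lines of code. This is a minimal sketch; the values M = 32 and W = 512 are illustrative assumptions (the section does not fix them), and the function name is hypothetical.

```python
from typing import List, Tuple


def analysis_segments(num_samples: int, m: int, w: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) sample ranges of successive W-sample segments.

    Each new set of M frequency components consumes M new samples, so the
    window advances by M and consecutive segments share W - M samples.
    """
    if not 0 < m < w:
        raise ValueError("require 0 < M < W")
    return [(start, start + w) for start in range(0, num_samples - w + 1, m)]


segments = analysis_segments(num_samples=2048, m=32, w=512)
first, second = segments[0], segments[1]
print(first, second, "overlap:", first[1] - second[0])  # (0, 512) (32, 544) overlap: 480
```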
In the preferred embodiment of the present invention, the various frequency
bands in a sub-band analysis filter bank have the same shape but are
shifted relative to one another. This arrangement guarantees that all
frequency bands have the same aliasing properties. Such a filter bank can
be constructed from a single low frequency band pass filter having the
desired band shape. The manner in which the various filter bands are
constructed is most easily understood with reference to FIG. 5(a) which is
a block diagram of a single filter constructed from a low-frequency
bandpass filter 377 and a mixer 376. Assume that bandpass filter 377
has a center frequency of Fc and that the desired center frequency of
filter 350 is to be F. Then by shifting the input audio signal by a
frequency of F-Fc prior to analyzing the signal with low-frequency
bandpass filter 377, the output of low-frequency bandpass filter 377 will
be the amplitude of the audio signal in a band having a center frequency
of F. Modulator 376 accomplishes this frequency shift.
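The modulate-then-filter idea can be sketched with a complex exponential as the modulator and a moving average as the prototype. Both are assumptions made for brevity: the sketch takes Fc = 0 (a plain low-pass prototype), uses complex modulation rather than a real mixer, and stands in a 64-tap moving average for a properly designed prototype filter 377. The function name and all numeric parameters are hypothetical.

```python
import cmath
import math
from typing import Sequence


def band_amplitude(signal: Sequence[float], fs: float, f_target: float,
                   taps: int = 64) -> complex:
    """Estimate the signal content near f_target with a shared low-pass prototype.

    Plays the role of modulator 376 followed by filter 377 with Fc = 0,
    so the required shift F - Fc is simply f_target.
    """
    # Modulator: multiplying by exp(-j*2*pi*f*n/fs) moves f_target down to 0 Hz.
    shifted = [x * cmath.exp(-2j * math.pi * f_target * n / fs)
               for n, x in enumerate(signal)]
    # Prototype low-pass (hypothetical): mean of the most recent `taps` samples.
    return sum(shifted[-taps:]) / taps


fs = 16000.0
in_band = [math.cos(2 * math.pi * 3000 * n / fs) for n in range(256)]
out_of_band = [math.cos(2 * math.pi * 5000 * n / fs) for n in range(256)]
print(abs(band_amplitude(in_band, fs, 3000)))      # about 0.5 (half of a real cosine)
print(abs(band_amplitude(out_of_band, fs, 3000)))  # about 0
```

The tone frequencies are multiples of fs/taps = 250 Hz, which places every unwanted component on a null of the moving average; a realistic prototype trades that idealization for a designed stopband. Reusing one prototype with different shifts is what lets a whole filter bank share a single band shape.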
A filter bank can then be constructed from a single prototype low-frequency
bandpass filter by using different modulation frequencies to shift the
incoming audio signal prior to analysis by the prototype filter. While
such a filter bank can be constructed from analog circuit components, it
is difficult to obtain filter performance of the type needed. Hence, the
preferred embodiment of the present invention utilizes digital filter
techniques.
A block diagram of a sub-band analysis filter 350 for generating a set of M
frequency components, S_i, from a W-sample window is shown in FIG.
5(b). M new audio samples are clocked into a W-sample shift register 320
by controller 325, and the oldest M samples in shift register 320 are
shifted out the end of the shift register and discarded. The contents of the shift
register are then used to generate 2M polyphase components P_k, for
k = 0 to 2M-1. The polyphase components are generated by a windowing
operation followed by partial summation. The windowing operation generates
a W-component array Z_i from the contents of shift register 320 by
multiplying each entry in the shift register by a corresponding weight,
i.e.,
$$Z_i = h_i \, x_i \qquad (1)$$
where the x_i, for i = 0, ..., W-1, are the values stored in shift
register 320, and the h_i are coefficients of a low pass prototype
filter which are stored in controller 325. For those wishing a more
detailed explanation of the process for generating sets of filter
coefficients, see J. Rothweiler, "POLYPHASE QUADRATURE FILTERS--A NEW
SUB-BAND CODING TECHNIQUE" IEEE Proceedings of the 1983 ICASSP Conference,
pp. 1280-1283. The polyphase components are then generated from the Z_i
by the following summing operations:
##EQU1##
The frequency components, S_i, are obtained via the following matrix
multiplication from the polyphase components:
##EQU2##
This operation is equivalent to passing the polyphase components through M
finite impulse response filters of length 2M. The cosine modulation of the
polyphase components shown in Eq. (3a) may be replaced by other such
modulation terms. The form shown in Eq. (3a) leads to near-perfect
reconstruction. An alternative modulation scheme which allows for perfect
reconstruction is as follows:
##EQU3##
It can be seen by comparison to FIG. 5(a) that the matrix multiplication
provides an operation analogous to the modulation of the incoming audio
signal. The windowing operation performs the analysis with the low-pass
prototype filter.
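The matrix multiplication can be sketched in the same way. This sketch assumes the MPEG-1-style cosine modulation $\cos\big((2i+1)(k - M/2)\pi/(2M)\big)$ in place of Eq. (3a), whose exact form is not reproduced in this text. `modulate` is a hypothetical name. Because each of the M outputs takes 2M multiplies, the sketch also shows where the $2M^2$ term in the workload estimate comes from.

```python
import math
from typing import List, Sequence


def modulate(P: Sequence[float], M: int) -> List[float]:
    """Map 2M polyphase components P_k to M frequency components S_i.

    Assumed MPEG-1-style cosine modulation, standing in for Eq. (3a):
        S_i = sum_{k=0}^{2M-1} cos((2i + 1)(k - M/2) * pi / (2M)) * P_k
    Each of the M outputs costs 2M multiplies, i.e. 2M^2 in total.
    """
    if len(P) != 2 * M:
        raise ValueError("expected 2M polyphase components")
    return [
        sum(math.cos((2 * i + 1) * (k - M / 2) * math.pi / (2 * M)) * P[k]
            for k in range(2 * M))
        for i in range(M)
    ]


# Usage with M = 2: a unit impulse in P_0 picks out the first matrix column.
print(modulate([1.0, 0.0, 0.0, 0.0], 2))  # approximately [0.7071, -0.7071]
```

A perfect-reconstruction variant would use a different modulation term inside the same loop. The structure of the computation would not change.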
As will be discussed in more detail below, the computational workload of
analyzing and synthesizing audio tracks is of great importance in
providing systems that can operate on general purpose computing platforms.
It will be apparent from the above discussion that the computational
workload inherent in generating M frequency components from a window of W
audio sample values is approximately $W + 2M^2$ multiplies and adds (W for
the windowing operation and $2M^2$ for the $M \times 2M$ matrix
multiplication). In
this regard, it should be noted that a two-level filter bank of the type
used in the present invention significantly reduces the overall
computational workload even in situations in which the frequency spectrum
is to be divided into uniform bands. For example, consider a system in
which the frequency spectrum is to be divided into 64 bands utilizing a
window of 512 samples. If a prior art one-level filter bank is utilized,
the workload will be approximately $512 + 2 \cdot 64^2 = 8{,}704$
multiplies and adds. If the filter bank is replaced by a two-level filter
bank according to the present invention, then the filter bank will
consist of 9 filter banks (one first-level bank and eight second-level
banks), each dividing its input into 8 bands. The computational workload
inherent in this arrangement is only $9 \times (512 + 2 \cdot 8^2) = 5{,}760$
multiplies and adds.
Hence, a filter bank according to the present invention typically requires
less computational capability than a one level filter bank according to
the prior art. In addition, a filter bank according to the present
invention also makes it possible to provide a non-uniform band
structure.
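The workload figures in the example above can be checked with a few lines of Python. The helper names are hypothetical. The two-level helper follows the text's tally, which counts each of the nine banks once, each with a 512-sample window.

```python
def one_level_workload(W: int, M: int) -> int:
    """Approximate multiply-adds for one analysis step: W (windowing)
    plus 2M^2 (the M x 2M modulation matrix)."""
    return W + 2 * M * M


def two_level_workload(W: int, bands_per_bank: int) -> int:
    """Tally used in the text's example: one first-level bank plus one
    second-level bank per first-level band, each counted once and each
    using a W-sample window."""
    n_banks = 1 + bands_per_bank
    return n_banks * one_level_workload(W, bands_per_bank)


print(one_level_workload(512, 64))  # 8704
print(two_level_workload(512, 8))   # 5760
```

The saving comes from the quadratic $2M^2$ term. Splitting 64 bands into two stages of 8 replaces one $2 \cdot 64^2$ matrix with nine much smaller $2 \cdot 8^2$ matrices.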
The transformation of the audio signal into sets of frequency components as
described above does not, in itself, result in a decrease in the number of
bits needed to represent the audio signal. For each M audio samples
received by a sub-band analysis filter, M frequency components are
generated. The actual signal compression results from the quantization of
the frequency components. As noted above, the number of bits that must be
allocated to each frequency component is determined by a phenomenon known
as "masking". Consider a tone at a frequency f. The ability of the ear to
detect a signal at frequency f' depends on the energy of the tone and on
the difference in frequency, (f - f'), between the signal and the tone.
Research in human hearing has led to measurements of a threshold function
T(E,f,f'), which gives the minimum energy at which a signal component at
frequency f' can be detected in the presence of a tone at frequency f
with energy E. In general, the threshold function varies in shape with
frequency.
The threshold function is used to construct a masking function as follows.
Consider a segment of the incoming audio signal, and denote the energy as a
function of frequency in this segment by E(f). A mask level, L(f), is then
constructed by convolving E(f) with the threshold function, i.e.,
$$L(f) = \int T\big(E(f'), f, f'\big)\, E(f')\, df' \qquad (4)$$
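A discrete version of Eq. (4) replaces the integral with a sum over frequency bins. In the sketch below, `mask_level` is a hypothetical name. `example_threshold` is a purely hypothetical stand-in for the measured threshold function T, using a fixed 10 dB offset plus 10 dB per bin of separation. Measured threshold data would take its place in practice.

```python
from typing import Callable, List, Sequence


def mask_level(E: Sequence[float],
               T: Callable[[float, int, int], float],
               df: float = 1.0) -> List[float]:
    """Discrete form of Eq. (4): L[f] = sum over f' of T(E[f'], f, f') E[f'] df.

    E holds the segment's energy per frequency bin; T is supplied by the caller.
    """
    n = len(E)
    return [sum(T(E[fp], f, fp) * E[fp] for fp in range(n)) * df
            for f in range(n)]


def example_threshold(energy: float, f: int, fp: int) -> float:
    """HYPOTHETICAL threshold factor, not measured data: 10 dB below the
    masker at f' and a further 10 dB down per bin of separation.
    This toy version ignores `energy`; a real T would depend on it."""
    return 10.0 ** (-(10.0 + 10.0 * abs(f - fp)) / 10.0)


# Usage: a single masker in bin 2 spreads its mask into neighbouring bins.
print(mask_level([0.0, 0.0, 1.0, 0.0, 0.0], example_threshold))
# approximately [0.001, 0.01, 0.1, 0.01, 0.001]
```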
Consider the filtered signal value in a band $f_0 \pm \Delta f$. Denote
the minimum value of L in this frequency band by $L_{\min}$. It should be
noted that $L_{\min}$ may depend on frequency components outside the band
in question, since a peak in an adjacent band may mask a signal in the
band in question.
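Given the mask sampled on a frequency grid, $L_{\min}$ for a band $f_0 \pm \Delta f$ can be taken as the smallest mask sample inside the band. The mask already includes contributions from every frequency, so energy in adjacent bands is accounted for automatically. `band_min` is a hypothetical helper.

```python
from typing import Sequence


def band_min(L: Sequence[float], freqs: Sequence[float],
             f0: float, delta_f: float) -> float:
    """L_min: smallest mask sample whose frequency lies in f0 +/- delta_f."""
    in_band = [v for v, f in zip(L, freqs) if f0 - delta_f <= f <= f0 + delta_f]
    if not in_band:
        raise ValueError("no mask samples fall inside the band")
    return min(in_band)


# Usage: mask sampled every 100 Hz; the band 150 +/- 50 Hz covers 100 and 200 Hz.
print(band_min([5.0, 3.0, 4.0, 1.0, 6.0], [0, 100, 200, 300, 400], 150, 50))  # 3.0
```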
According to the masking model, any noise in this frequency band that has
an energy less than $L_{\min}$ will not be perceived by the listener. In
particular, the noise introduced by replacing the measured signal
amplitude in this band with a quantized approximation will not be
perceived if the quantization error is less than $L_{\min}$. The noise in
question will be less than $L_{\min}$ if the signal amplitude is quantized
to an accuracy equal to $S/L_{\min}$, where S is the energy of the signal
in the band in question.
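One way to convert the ratio $S/L_{\min}$ into a bit allocation is sketched below. It relies on an assumption that the text does not state: a uniform quantizer whose noise energy falls by roughly a factor of 4 (about 6.02 dB) per added bit, so that b bits leave noise of roughly $S/4^b$, with constant factors ignored. `bits_for_band` is a hypothetical name.

```python
import math


def bits_for_band(S: float, L_min: float) -> int:
    """Smallest bit count b for which the quantization noise stays below L_min.

    ASSUMPTION (not from the text): a uniform quantizer whose noise energy
    falls by about a factor of 4 (~6.02 dB) per added bit, so b bits leave
    noise of roughly S / 4**b. Solving S / 4**b <= L_min gives
    b >= log4(S / L_min) = log2(S / L_min) / 2.
    """
    if S <= 0 or L_min <= 0:
        raise ValueError("S and L_min must be positive energies")
    ratio = S / L_min
    if ratio <= 1.0:
        return 0  # the whole signal already lies below the mask
    return math.ceil(math.log2(ratio) / 2.0)


print(bits_for_band(16.0, 1.0))  # 2
print(bits_for_band(17.0, 1.0))  # 3
```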
The above-described quantization procedure requires knowledge of the
frequency spectrum of the incoming audio signal at a resolution which is
significantly greater than that of the sub-band analysis of the incoming
signal. In general, the minimum value of the mask function L will depend
on the precise location of any peaks in the frequency spectrum of the
audio signal. The signal amplitude provided by the sub-band analysis
filter measures the average energy in the frequency band; however, it does
not provide any information about the specific location of any spectral
peaks within the band.
Hence, a more detailed frequency analysis of the incoming audio signal is
required. This can be accomplished by defining a time window about each
filtered signal component and performing a frequency analysis of the audio
samples in this window to generate an approximation to E(f). In prior art
systems, the frequency analysis is typically performed by calculating an
FFT of the audio samples in the time window.
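For comparison, the prior-art approach of estimating E(f) from a windowed transform can be sketched as follows. To keep the sketch self-contained, it evaluates the DFT directly in O(N^2) time. This gives the same bin values an FFT would, only more slowly. The Hann window is an assumed choice, and `power_spectrum` is a hypothetical name.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(samples: Sequence[float]) -> List[float]:
    """Estimate E(f) on bins 0..N/2 from a Hann-windowed DFT of the window.

    Direct DFT for clarity; an FFT computes the same values in O(N log N).
    """
    N = len(samples)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # Hann
    xw = [s * wn for s, wn in zip(samples, w)]
    spectrum = []
    for k in range(N // 2 + 1):
        X = sum(xw[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        spectrum.append(abs(X) ** 2)
    return spectrum


# Usage: a cosine exactly on bin 5 of a 64-sample window.
spec = power_spectrum([math.cos(2 * math.pi * 5 * n / 64) for n in range(64)])
print(max(range(len(spec)), key=spec.__getitem__))  # 5
```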
In one embodiment of a quantization sub-component according to the present
invention, this is accomplished by further subdividing each sub-band using
another layer of filter banks. The output of each of the sub-band filters
in the analysis filter bank is input to another sub-band analysis
filter which splits the original sub-band into a plurality of finer
sub-bands. These finer sub-bands provide a more detailed spectral
measurement of the audio signal in the frequency band in question, and
hence, can be used to compute the overall mask function L discussed above.
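To use the finer sub-bands in the mask computation, each one must be associated with a range of audio frequencies. The sketch below assumes ideal, uniform, critically decimated bands. Under that assumption, decimation reverses the spectrum of odd-indexed coarse bands, so their finer sub-bands run from the top of the coarse band downward. `fine_band_range` is a hypothetical helper.

```python
from typing import Tuple


def fine_band_range(fs: float, M: int, N: int, i: int, k: int) -> Tuple[float, float]:
    """Frequency range (Hz) of fine sub-band k (of N) inside coarse band i (of M).

    Assumes ideal, uniform, critically decimated bands. After decimation by M,
    coarse bands with odd index i are spectrally inverted, so fine band 0 lies
    at the top of such a band rather than at the bottom.
    """
    coarse_width = fs / 2 / M
    fine_width = coarse_width / N
    lo_edge = i * coarse_width
    if i % 2 == 0:
        lo = lo_edge + k * fine_width
    else:
        lo = lo_edge + (N - 1 - k) * fine_width  # inverted band
    return lo, lo + fine_width


# Usage: 48 kHz audio, 8 coarse bands, each split into 8 finer sub-bands.
print(fine_band_range(48000, 8, 8, 0, 0))  # (0.0, 375.0)
print(fine_band_range(48000, 8, 8, 1, 0))  # (5625.0, 6000.0)
```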
While a separate $L_{\min}$ value may be calculated for each filtered signal
value from each sub-band filter, the preferred embodiment of the present
invention operates on blocks of filtered signal values. If a separate
quantization step size is used for each filtered value, then the step size
would need to be communicated with each filtered value. The bits needed to
specify the step size reduce the degree of compression. To reduce this
"overhead", a block of samples is quantized using the same step size. This
approach reduces the number of overhead bits/sample, since the step size
need only be communicated once. The blocks of filtered samples utilized
consist of a sequential set of filtered signa