| United States Patent | 5479562 |
| Link to this page | http://www.wikipatents.com/5479562.html |
| Inventor(s) | Fielder; Louis D. (Millbrae, CA);
Davidson; Grant A. (Oakland, CA) |
| Abstract | The invention relates to formatting encoded audio information in a form
suitable for transmission or storage. Audio information is encoded into a
binary form, using an invariant number of bits to represent at least some
but not all of the encoded information. The information represented by an
invariant number of bits is assembled into pre-established positions
within a formatted frame. |
| Title | Method and apparatus for encoding and decoding audio information |
| Publication Date | December 26, 1995 |
| Filing Date | June 18, 1993 |
| Parent Case |
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is:
a continuation-in-part of application Ser. No. 07/582,956 filed Sep. 26,
1990, now in issue as U.S. Pat. No. 5,222,189, which is a
continuation-in-part of application Ser. No. 07/439,868 filed Nov. 20,
1989, now abandoned, which was a continuation-in-part of application Ser.
No. 07/303,714 filed Jan. 27, 1989, now abandoned; and
a continuation-in-part of application Ser. No. 07/787,665 filed Nov. 4,
1991, now in issue as U.S. Pat. No. 5,230,038, which is a divisional of
application Ser. No. 07/458,894 filed Dec. 29, 1989, now U.S. Pat. No.
5,109,417, which was a continuation-in-part of application Ser. No.
07/303,714 filed Jan. 27, 1989, now abandoned. |
Claims  |
|
|
We claim:
1. An encoder for the encoding of audio information comprising signal
samples, said encoder comprising
means for receiving said signal samples,
subband means, including adaptive bit allocation means, for generating
subband information comprising digital words in response to said signal
samples, wherein said adaptive bit allocation means allocates an adaptive
number of bits to represent at least a portion of at least some of said
digital words, and wherein at least a portion of some of said digital
words is represented by an invariant number of bits, and
formatting means for assembling digital information including said digital
words into a digital output having a format suitable for transmission or
storage, wherein said formatting means places said portion of digital
words represented by an invariant number of bits into one or more
pre-established positions within a frame of said digital output.
2. An encoder according to claim 1 wherein said formatting means assembles
said digital information such that said portion of digital words placed
into said one or more pre-established positions is placed ahead of other
portions of said digital words within said frame.
3. An encoder according to claim 1 wherein said formatting means places
said portion of digital words represented by an invariant number of bits
into adjacent positions within said frame.
4. A decoder of a formatted signal comprising digital words, said decoder
comprising
deformatting means for obtaining said digital words from said formatted
signal, wherein at least a portion of some of said digital words are
represented by an invariant number of bits and at least a portion of at
least some of said digital words are represented by an adaptive number of
bits, said portion of digital words represented by an invariant number of
bits is obtained from one or more pre-established positions within a frame
of said formatted signal, and said portion of digital words represented by
an adaptive number of bits is obtained from one or more positions within
said frame established in response to said portion of subband signals
represented by an invariant number of bits,
inverse subband means for generating signal samples in response to said
digital words, and
means for sending said signal samples.
5. A decoder according to claim 4 wherein said deformatting means obtains
said portion of digital words from said one or more pre-established
positions which are ahead of other portions of said digital words within
said frame.
6. A decoder according to claim 4 wherein said deformatting means obtains
said portion of digital words represented by an invariant number of bits
from adjacent positions within said frame.
7. An encoder for the encoding of audio information comprising signal
samples, said encoder comprising
means for receiving said signal samples,
subband means, including adaptive bit allocation means, for generating
subband information comprising digital words in response to said signal
samples, wherein said adaptive bit allocation means allocates an adaptive
number of bits to represent at least a portion of at least some of said
digital words, and wherein at least a portion of some of said digital
words is represented by an invariant number of bits, and
formatting means for assembling digital information including said digital
words into a digital output having a format suitable for transmission or
storage, wherein said formatting means places said portion of digital
words represented by an invariant number of bits into adjacent positions
within a frame of said digital output.
8. An encoder according to claim 7 wherein said formatting means assembles
said digital information such that said portion of digital words placed
into said adjacent positions is placed ahead of other portions of said
digital words within said frame.
9. An encoder according to any one of claims 1, 2, 7 or 8 wherein said
digital words comprise scaling factors and scaled values, and wherein said
portion of digital words to which an adaptive number of bits is allocated
constitutes at least a portion of at least some of said scaled values.
10. A decoder of a formatted signal comprising digital words, said decoder
comprising
deformatting means for obtaining said digital words from said formatted
signal, wherein at least a portion of some of said digital words are
represented by an invariant number of bits and at least a portion of at
least some of said digital words are represented by an adaptive number of
bits, said portion of digital words represented by an invariant number of
bits is obtained from one or more adjacent positions within a frame of
said formatted signal, and said portion of digital words represented by an
adaptive number of bits is obtained from one or more positions within said
frame established in response to said portion of subband signals
represented by an invariant number of bits,
inverse subband means for generating signal samples in response to said
digital words, and
means for sending said signal samples.
11. A decoder according to claim 10 wherein said deformatting means obtains
said portion of digital words from said one or more adjacent positions
which are ahead of other portions of said digital words within said frame.
12. A decoder according to any one of claims 4, 5, 10 or 11 wherein said
digital words comprise scaling factors and scaled values, and wherein said
portion of digital words represented by an adaptive number of bits
constitutes at least a portion of at least some of said scaled values.
13. An encoder for the encoding of audio information comprising signal
samples, said encoder comprising
means for receiving said signal samples,
subband means for generating, in response to said signal samples, subband
signals comprising scaling factors and associated scaled values, and
formatting means for assembling digital information including said subband
signals into a digital output having a format suitable for transmission or
storage, wherein said formatting means places said scaling factors into
adjacent positions within a frame of said digital output.
14. An encoder according to claim 13 wherein said subband means generates
subband signals comprising one or more blocks of scaled values, each block
comprising one or more scaled values associated with a respective scaling
factor.
15. An encoder according to claim 14 wherein said subband means generates
subband signals comprising two or more sets of respective blocks of scaled
values, wherein a respective block of scaled values in each set is
associated with a respective common scaling factor.
16. An encoder according to claim 13 wherein said subband means generates
subband signals comprising two or more sets of respective scaled values,
wherein a respective scaled value in each set is associated with a
respective common scaling factor.
17. An encoder for the encoding of audio information comprising signal
samples, said encoder comprising
means for receiving said signal samples,
subband means for generating, in response to said signal samples, subband
signals comprising scaling factors and associated scaled values, and
formatting means for assembling digital information including said subband
signals into a digital output having a format suitable for transmission or
storage, wherein said formatting means places said scaling factors into
one or more pre-established positions within a frame of said digital
output.
18. An encoder according to claim 17 wherein said subband means generates
subband signals comprising one or more blocks of scaled values, each block
comprising one or more scaled values associated with a respective scaling
factor.
19. An encoder according to claim 18 wherein said subband means generates
subband signals comprising two or more sets of respective blocks of scaled
values, wherein a respective block of scaled values in each set is
associated with a respective common scaling factor.
20. An encoder according to claim 17 wherein said subband means generates
subband signals comprising two or more sets of respective scaled values,
wherein a respective scaled value in each set is associated with a
respective common scaling factor.
21. An encoder according to any one of claims 13 through 20 wherein said
subband means generates subband signals represented in a floating-point
form, wherein said scaling factors are exponents and said scaled values
are mantissas.
22. An encoder according to any one of claims 13 through 20 wherein said
formatting means assembles said digital information such that scaling
factors placed into said adjacent positions are placed ahead of said
scaled values within said frame.
23. A decoder of a formatted signal including subband signals comprising
scaling factors and scaled values, said decoder comprising
deformatting means for deriving said subband signals by obtaining said
scaling factors from adjacent positions within a frame of said formatted
signal and by obtaining said scaled values from said formatted signal,
inverse subband means for generating signal samples in response to said
derived subband signals, and
means for sending said signal samples.
24. A decoder according to claim 23 wherein said deformatting means obtains
subband signals comprising one or more blocks of scaled values, each block
comprising one or more scaled values associated with a respective scaling
factor.
25. A decoder according to claim 24 wherein said deformatting means obtains
subband signals comprising two or more sets of respective blocks of scaled
values, wherein a respective block of scaled values in each set is
associated with a respective common scaling factor.
26. A decoder according to claim 23 wherein said deformatting means obtains
subband signals comprising two or more sets of respective scaled values,
wherein a respective scaled value in each set is associated with a
respective common scaling factor.
27. A decoder of a formatted signal including subband signals comprising
scaling factors and scaled values, said decoder comprising
deformatting means for deriving said subband signals by obtaining said
scaling factors from one or more pre-established positions within a frame
of said formatted signal and by obtaining said scaled values from said
formatted signal,
inverse subband means for generating signal samples in response to said
derived subband signals, and
means for sending said signal samples.
28. A decoder according to claim 27 wherein said deformatting means obtains
subband signals comprising one or more blocks of scaled values, each block
comprising one or more scaled values associated with a respective scaling
factor.
29. A decoder according to claim 28 wherein said deformatting means obtains
subband signals comprising two or more sets of respective blocks of scaled
values, wherein a respective block of scaled values in each set is
associated with a respective common scaling factor.
30. A decoder according to claim 27 wherein said deformatting means obtains
subband signals comprising two or more sets of respective scaled values,
wherein a respective scaled value in each set is associated with a
respective common scaling factor.
31. A decoder according to any one of claims 23 through 30 wherein said
deformatting means obtains subband signals represented in a floating-point
form, wherein said scaling factors are exponents and said scaled values
are mantissas.
32. A decoder according to any one of claims 23 through 30 wherein said
deformatting means obtains said scaling factors from said adjacent
positions ahead of said scaled values within said frame.
33. An encoding method for the encoding of audio information comprising
signal samples, said encoding method comprising
receiving said signal samples,
generating subband information comprising digital words in response to said
signal samples, and allocating an adaptive number of bits to represent at
least a portion of at least some of said digital words, and wherein at
least a portion of some of said digital words is represented by an
invariant number of bits, and
assembling digital information including said digital words into a digital
output having a format suitable for transmission or storage, and placing
said portion of digital words represented by an invariant number of bits
into one or more pre-established positions within a frame of said digital
output.
34. An encoding method according to claim 33 wherein said assembling places
said portion of digital words represented by an invariant number of bits
into adjacent positions within said frame.
35. A decoding method of a formatted signal comprising digital words, said
decoding method comprising
obtaining said digital words from said formatted signal, wherein at least a
portion of some of said digital words are represented by an invariant
number of bits and at least a portion of at least some of said digital
words are represented by an adaptive number of bits, said portion of
digital words represented by an invariant number of bits is obtained from
one or more pre-established positions within a frame of said formatted
signal, and said portion of digital words represented by an adaptive
number of bits is obtained from one or more positions within said frame
established in response to said portion of subband signals represented by
an invariant number of bits,
generating signal samples in response to said digital words, and
sending said signal samples.
36. A decoding method according to claim 35 wherein said portion of digital
words represented by an invariant number of bits is obtained from adjacent
positions within said frame.
37. An encoding method for the encoding of audio information comprising
signal samples, said encoding method comprising
receiving said signal samples,
generating subband information comprising digital words in response to said
signal samples, and allocating an adaptive number of bits to represent at
least a portion of at least some of said digital words, and wherein at
least a portion of some of said digital words is represented by an
invariant number of bits, and
assembling digital information including said digital words into a digital
output having a format suitable for transmission or storage, and placing
said portion of digital words represented by an invariant number of bits
into adjacent positions within a frame of said digital output.
38. An encoding method according to claim 33 or 37 wherein said generating
generates digital words comprising scaling factors and scaled values, and
wherein said portion of digital words to which an adaptive number of bits
is allocated constitutes at least a portion of at least some of said
scaled values.
39. A decoding method of a formatted signal comprising digital words, said
decoding method comprising
obtaining said digital words from said formatted signal, wherein at least a
portion of some of said digital words are represented by an invariant
number of bits and at least a portion of at least some of said digital
words are represented by an adaptive number of bits, said portion of
digital words represented by an invariant number of bits is obtained from
one or more adjacent positions within a frame of said formatted signal,
and said portion of digital words represented by an adaptive number of
bits is obtained from one or more positions within said frame established
in response to said portion of subband signals represented by an invariant
number of bits,
generating signal samples in response to said digital words, and
sending said signal samples.
40. A decoding method according to claim 35 or 39 wherein said digital
words comprise scaling factors and scaled values, and wherein said portion
of digital words represented by an adaptive number of bits constitutes at
least a portion of at least some of said scaled values.
41. An encoding method for the encoding of audio information comprising
signal samples, said encoding method comprising
receiving said signal samples,
generating, in response to said signal samples, subband signals comprising
scaling factors and associated scaled values, and
assembling digital information including said subband signals into a
digital output having a format suitable for transmission or storage, and
placing said scaling factors into adjacent positions within a frame of
said digital output.
42. An encoding method for the encoding of audio information comprising
signal samples, said encoding method comprising
receiving said signal samples,
generating, in response to said signal samples, subband signals comprising
scaling factors and associated scaled values, and
assembling digital information including said subband signals into a
digital output having a format suitable for transmission or storage, and
placing said scaling factors into one or more pre-established positions
within a frame of said digital output.
43. An encoding method according to claim 41 or 42 wherein said generating
generates subband signals comprising one or more blocks of scaled values,
each block comprising one or more scaled values associated with a
respective scaling factor.
44. An encoding method according to claim 41 or 42 wherein said generating
generates subband signals comprising two or more sets of respective scaled
values, wherein a respective scaled value in each set is associated with a
respective common scaling factor.
45. A decoding method of a formatted signal including subband signals
comprising scaling factors and scaled values, said decoding method
comprising
deriving said subband signals by obtaining said scaling factors from
adjacent positions within a frame of said formatted signal and by
obtaining said scaled values from said formatted signal,
generating signal samples in response to said derived subband signals, and
sending said signal samples.
46. A decoding method of a formatted signal including subband signals
comprising scaling factors and scaled values, said decoding method
comprising
deriving said subband signals by obtaining said scaling factors from one or
more pre-established positions within a frame of said formatted signal and
by obtaining said scaled values from said formatted signal,
generating signal samples in response to said derived subband signals, and
sending said signal samples.
47. A decoding method according to claim 45 or 46 wherein said subband
signals comprise one or more blocks of scaled values, each block
comprising one or more scaled values associated with a respective scaling
factor.
48. A decoding method according to claim 45 or 46 wherein said subband
signals comprise two or more sets of respective scaled values, wherein a
respective scaled value in each set is associated with a respective common
scaling factor. |
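
As a rough illustration of the formatting recited in claims 13, 21 and 22 (scaling factors in the form of exponents placed in adjacent positions ahead of the mantissas), the following Python sketch packs and unpacks one frame. The 4-bit exponent width, the `mantissa_bits` allocation rule and every function name are hypothetical choices made for this example; none of them is taken from the patent.

```python
# Illustrative sketch only: field widths and the allocation rule are assumptions.
EXP_BITS = 4  # invariant number of bits per exponent (assumed)


def mantissa_bits(exponent):
    """Hypothetical allocation rule: a smaller exponent receives more mantissa bits."""
    return max(2, 8 - exponent // 2)


def pack_frame(exponents, mantissas):
    """Write all fixed-width exponents first, then the variable-width mantissas.

    Mantissas are assumed to be unsigned integers that fit in their allocated width.
    """
    bits = ""
    for e in exponents:
        bits += format(e, "0{}b".format(EXP_BITS))
    for e, m in zip(exponents, mantissas):
        bits += format(m, "0{}b".format(mantissa_bits(e)))
    return bits


def unpack_frame(bits, n_bands):
    """Read exponents from known positions, then derive each mantissa position."""
    exponents = [int(bits[i * EXP_BITS:(i + 1) * EXP_BITS], 2)
                 for i in range(n_bands)]
    pos = n_bands * EXP_BITS
    mantissas = []
    for e in exponents:
        width = mantissa_bits(e)  # width follows from the exponent already read
        mantissas.append(int(bits[pos:pos + width], 2))
        pos += width
    return exponents, mantissas


frame = pack_frame([0, 3, 15], [200, 100, 3])
print(frame)
print(unpack_frame(frame, 3))
```

Because every exponent occupies a known position at the head of the frame, the decoder in this sketch can read all of them before it needs to know the length of any mantissa.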
Description  |
|
|
BACKGROUND OF THE INVENTION
The invention relates in general to high-quality low bit-rate encoding and
decoding of signals carrying information intended for human perception
such as audio signals, and more particularly music signals.
There is considerable interest among those in the field of signal
processing to discover methods which minimize the amount of information
required to represent adequately a given signal. By reducing required
information, signals may be transmitted over communication channels with
lower bandwidth, or stored in less space. With respect to digital
techniques, minimal informational requirements are synonymous with minimal
binary bit requirements.
Two factors limit the reduction of bit requirements:
(1) A signal of bandwidth W may be accurately represented by a series of
samples taken at a frequency no less than 2·W. This is the
Nyquist sampling rate. Therefore, a signal T seconds in length with a
bandwidth W requires at least 2·W·T samples
for accurate representation.
(2) Quantization of signal samples which may assume any of a continuous
range of values introduces inaccuracies in the representation of the
signal which are proportional to the quantizing step size or resolution.
These inaccuracies are called quantization errors. These errors are
inversely proportional to the number of bits available to represent the
signal sample quantization.
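
The two limits above can be made concrete with a short calculation. The sketch below assumes a uniform quantizer spanning a full-scale range of 2.0, which is an assumption made only for illustration since the text does not specify a quantizer; the function names are likewise hypothetical.

```python
def min_samples(bandwidth_hz, duration_s):
    """Minimum sample count 2*W*T implied by the Nyquist sampling rate."""
    return 2 * bandwidth_hz * duration_s


def quantizer_step(n_bits, full_scale=2.0):
    """Step size of a uniform quantizer with 2**n_bits levels (assumed model)."""
    return full_scale / (2 ** n_bits)


def max_rounding_error(n_bits, full_scale=2.0):
    """Worst-case error when rounding to the nearest level: half a step."""
    return quantizer_step(n_bits, full_scale) / 2


# Example figures (assumed): 20 kHz bandwidth, 1 second of signal.
print(min_samples(20000, 1))
for bits in (8, 12, 16):
    print(bits, quantizer_step(bits), max_rounding_error(bits))
```

Under this assumed model, each additional bit halves the step size and therefore the worst-case rounding error.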
If coding techniques are applied to the full bandwidth, all quantizing
errors, which manifest themselves as noise, are spread uniformly across
the bandwidth. Split-band techniques which may be applied to selected
portions of the spectrum can limit the spectral spread of quantizing
noise. Two known split-band techniques, subband coding and transform
coding, are discussed in Tribolet and Crochiere, "Frequency Domain Coding
of Speech," IEEE Trans. on Acoust., Speech, Signal Proc., vol. ASSP-27,
October, 1979, pp. 512-30. By using subband coding or transform coding,
quantizing errors can be reduced in particular frequency bands where
quantizing noise is especially objectionable by quantizing that band with
a smaller step size.
Subband coding may be implemented by a bank of digital bandpass filters.
Transform coding may be implemented by any of several time-domain to
frequency-domain transforms which simulate a bank of digital bandpass
filters. Although transforms are easier to implement and require less
computational power and hardware than digital filters, they have less
design flexibility in the sense that each bandpass filter "frequency bin"
represented by a transform coefficient has a uniform bandwidth. By
contrast, a bank of digital bandpass filters can be designed to have
different subband bandwidths. Transform coefficients can, however, be
grouped together to define "subbands" having bandwidths which are
multiples of a single transform coefficient bandwidth. The term "subband"
is used hereinafter to refer to selected portions of the total signal
bandwidth, whether implemented by a subband coder or a transform coder.
The term is used in this manner because, as discussed by Tribolet and
Crochiere, the mathematical bases of subband coders and transform coders
are interchangeable; therefore the two coding methods are potentially
capable of duplicating each other. A subband as implemented by a transform
coder is defined by a set of one or more adjacent transform coefficients
or frequency bins. The bandwidth of a transform coder frequency bin
depends upon the coder's sampling rate and the number of samples in each
signal sample block (the transform length).
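
As a simple numeric illustration of grouping adjacent bins into a subband, the sketch below assumes DFT-style bins spaced at the sampling rate divided by the block length, and treats bin k as covering the range from k to k+1 times that spacing. Both conventions, the function names and the 48 kHz / 512-sample figures are assumptions made for the example; other transforms may differ by a constant factor.

```python
def bin_spacing_hz(sample_rate_hz, block_length):
    """Assumed DFT-style spacing: sampling rate divided by block length."""
    return sample_rate_hz / block_length


def subband_edges_hz(sample_rate_hz, block_length, first_bin, n_bins):
    """Lower and upper edge of a subband built from n_bins adjacent bins."""
    spacing = bin_spacing_hz(sample_rate_hz, block_length)
    return first_bin * spacing, (first_bin + n_bins) * spacing


# Example figures (assumed): 48 kHz sampling, 512-sample blocks, bins 8 through 11.
low, high = subband_edges_hz(48000, 512, first_bin=8, n_bins=4)
print(low, high, high - low)
```

The resulting subband bandwidth is always a whole multiple of the single-bin spacing, which is the constraint the paragraph above describes.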
Tribolet and Crochiere observed that two characteristics of subband
bandpass filters are particularly critical to the performance of subband
coder systems because they affect the amount of signal leakage between
subbands. The first is the bandwidth of the regions between the filter
passband and stopbands (the transition bands). The second is the
attenuation level in the stopbands. As used herein, the measure of filter
"selectivity" is the steepness of the filter response curve within the
transition bands (steepness of transition band rolloff), and the level of
attenuation in the stopbands (depth of stopband rejection).
It is known from Tribolet and Crochiere that reducing leakage between
subbands is important to subband coder performance because such leakage
distorts the results of spectral analysis, and therefore adversely affects
coding decisions made in response to the derived spectral shape. Such
leakage can also cause frequency-domain aliasing. These effects are
discussed in more detail below.
The two filter characteristics, steepness of transition band rolloff and
depth of stopband rejection, are also critical because the human auditory
system displays frequency-analysis properties resembling those of highly
asymmetrical tuned filters having variable center frequencies. The ability
of the human auditory system to detect distinct tones generally increases
as the difference in frequency between the tones increases; however, the
frequency resolution of the human auditory system remains substantially
constant for frequency differences less than the bandwidth of the above
mentioned filters. The effective bandwidth of these filters, which is
referred to as a critical band, varies throughout the audio spectrum. A
dominant signal within a critical band is more likely to mask or render
inaudible other signals anywhere within that critical band than other
signals at frequencies outside that critical band. A dominant signal may
mask other signals which occur not only at the same time as the masking
signal, but also which occur before and after the masking signal. The
duration of pre- and post-masking effects within a critical band depends
upon the magnitude of the masking signal, but pre-masking effects are
usually of much shorter duration than post-masking effects. See generally,
the Audio Engineering Handbook, K. Blair Benson ed., McGraw-Hill, San
Francisco, 1988, pages 1.40-1.42 and 4.8-4.10.
Psychoacoustic masking is more easily accomplished by subband and transform
coders if the subband bandwidth throughout the audible spectrum is less
than the critical bandwidth of the human auditory system in the same
portions of the spectrum. This is because the critical bands of the human
auditory system have variable center frequencies that adapt to auditory
stimuli, whereas subband and transform coders typically have fixed subband
center frequencies. To optimize the opportunity to utilize
psychoacoustic-masking effects, any distortion artifacts resulting from
the presence of a dominant signal should be limited to the subband
containing the dominant signal. If the subband bandwidth is about half or
less than half of the critical band (and if the transition band rolloff is
sufficiently steep and the stopband rejection is sufficiently deep), the
most effective masking of the undesired distortion products is likely to
occur even for signals whose frequency is near the edge of the subband
passband bandwidth. If the subband bandwidth is more than half a critical
band, there is the possibility that the dominant signal will cause the
human auditory system's critical band to be offset from the coder's
subband so that some of the undesired distortion products outside the
critical bandwidth are not masked. These effects are most objectionable at
low frequencies where the critical band is narrower.
Transform coding performance depends upon several factors, including the
signal sample block length, transform coding errors, and aliasing
cancellation.
Block Length
The signal sample block length affects the temporal and frequency
resolution of a transform coder. As block lengths become longer, transform
encoder and decoder temporal resolution is adversely affected. Encoder
quantization errors may be manifested as audible artifacts of signal
transients caused by the "smearing" or temporal spreading of the transient
across the length of the sample block recovered by the decoder. Such
artifacts are usually manifested as pre- and post-transient ringing.
Unless other remedial steps are taken, the block length in high-quality
coding systems is usually chosen such that temporal smearing does not
exceed the pre- and post-masking intervals of the human auditory system.
As block lengths become shorter, on the other hand, transform encoder and
decoder frequency resolution is adversely affected not only by the
consequential widening of the frequency bins, but also by degradation of
the response characteristics of the bandpass filter frequency bins: (1)
decreased rate of transition band rolloff, and (2) reduced level of
stopband rejection. This degradation in filter performance results in the
undesired creation of or contribution to transform coefficients in nearby
frequency bins in response to a desired signal. These undesired
contributions are called sidelobe leakage.
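
The opposing effects of block length on temporal and frequency resolution can be tabulated with a few lines of Python. This is a sketch under stated assumptions: a 48 kHz sampling rate chosen only as an example, bin spacing taken as the sampling rate divided by the block length, and hypothetical helper names.

```python
SAMPLE_RATE_HZ = 48000  # assumed example rate, not taken from the text


def block_duration_s(sample_rate_hz, block_length):
    """Time spanned by one block of samples."""
    return block_length / sample_rate_hz


def bin_width_hz(sample_rate_hz, block_length):
    """Nominal frequency-bin width under the assumed DFT-style spacing."""
    return sample_rate_hz / block_length


for n in (128, 512, 2048):
    duration_ms = 1000 * block_duration_s(SAMPLE_RATE_HZ, n)
    width_hz = bin_width_hz(SAMPLE_RATE_HZ, n)
    print(n, round(duration_ms, 2), "ms", round(width_hz, 2), "Hz")
```

Under these assumptions the product of block duration and bin width is constant, so narrowing the bins necessarily lengthens the interval over which a transient can be smeared.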
Depending on the sampling rate, a short block length may result in a
nominal filter bandwidth exceeding the critical bandwidth at some or all
frequencies, particularly low frequencies. Even if the nominal subband
bandwidth is narrower than the critical bandwidth, degraded filter
characteristics manifested as a broad transition band and/or poor stopband
rejection may result in significant signal components outside the critical
bandwidth. In such cases, greater constraints are ordinarily placed on
other aspects of the system, particularly quantization accuracy.
Another disadvantage resulting from short sample block lengths is the
exacerbation of transform coding errors, described in the next section.
Transform Coding Errors
Discrete transforms do not produce a perfectly accurate set of frequency
coefficients because they work with only a finite segment of the signal.
Strictly speaking, discrete transforms produce a time-frequency
representation of the input time-domain signal rather than a true
frequency-domain representation which would require infinite transform
lengths. For convenience of discussion here, however, the output of
discrete transforms will be referred to as a frequency-domain
representation. In effect, the discrete transform assumes the sampled
signal only has frequency components whose periods are a submultiple of
the finite sample interval. This is equivalent to an assumption that the
finite-length signal is periodic. The assumption in general is not true.
The assumed periodicity creates discontinuities at the edges of the finite
time interval which cause the transform to create phantom high-frequency
components.
One technique which minimizes this effect is to reduce the discontinuity
prior to the transformation by weighting the signal samples such that
samples near the edges of the interval are close to zero. Samples at the
center of the interval are generally passed unchanged, i.e., weighted by a
factor of one. This weighting function is called an "analysis window" and
may be of any shape, but certain windows contribute more favorably to
subband filter performance.
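The effect of an analysis window on these phantom components can be illustrated numerically. The following is a minimal sketch only, not part of any embodiment: the 64-sample block, the test tone of 10.5 cycles per block, the Hann window shape, and the names dft_magnitudes, hann, and leakage_db are all assumptions chosen here for illustration.

```python
import cmath
import math

def dft_magnitudes(block):
    """Return |X[k]| for k = 0..N/2 using a direct O(N^2) DFT."""
    n = len(block)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(block)))
        for k in range(n // 2 + 1)
    ]

def hann(n):
    """Hann window: zero at the start of the block, one at the center."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def leakage_db(mag, far_bin, peak_bin):
    """Level of a distant bin relative to the bin nearest the tone, in dB."""
    return 20 * math.log10(max(mag[far_bin], 1e-12) / mag[peak_bin])

N = 64
# 10.5 cycles per block: the tone is not periodic within the block,
# so the implied periodic extension is discontinuous at the block edges.
tone = [math.sin(2 * math.pi * 10.5 * i / N) for i in range(N)]
windowed = [w * x for w, x in zip(hann(N), tone)]

raw_far = leakage_db(dft_magnitudes(tone), far_bin=25, peak_bin=10)
win_far = leakage_db(dft_magnitudes(windowed), far_bin=25, peak_bin=10)

print(f"unwindowed leakage at bin 25: {raw_far:.1f} dB")
print(f"windowed leakage at bin 25:   {win_far:.1f} dB")
```

Under these assumptions the windowed block shows markedly lower energy in bins far from the tone, at the cost of a wider main lobe around the tone itself.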
As used herein, the term "analysis window" refers merely to the windowing
function performed prior to application of the forward transform. As will
be discussed below, the design of an analysis window is constrained by
synthesis window design considerations. Therefore, design and performance
properties of an "analysis window" as that term is commonly used in the
art may differ from such analysis windows as discussed herein. While there
is no single criterion which may be used to assess a window's quality,
general criteria include steepness of transition band rolloff and depth of
stopband rejection. In some applications, the ability to trade steeper
rolloff for deeper rejection level is a useful quality.
The analysis window is a time-domain function. If no other compensation is
provided, the recovered or "synthesized" signal will be distorted
according to the shape of the analysis window. There are several
compensation methods. For example:
(a) The recovered signal interval or block may be multiplied by an inverse
window, one whose weighting factors are the reciprocal of those for the
analysis window. A disadvantage of this technique is that it clearly
requires that the analysis window not go to zero at the edges.
(b) Consecutive input signal blocks may be overlapped. By carefully
designing the analysis window such that two adjacent windows add to unity
across the overlap, the effects of the window will be exactly compensated.
(But see the following paragraph.) When used with certain types of
transforms such as the Discrete Fourier Transform (DFT), this technique
increases the number of bits required to represent the signal since the
portion of the signal in the overlap interval must be transformed and
transmitted twice. For these types of transforms, it is desirable to
design the window with an overlap interval as small as possible.
(c) Signal synthesis or decoding performed in a decoder may also require
synthesis filtering. As discussed in Crochiere, "A Weighted Overlap-Add
Method of Short-Time Fourier Analysis/Synthesis," IEEE Trans. Acoust.,
Speech, and Signal Proc., vol. ASSP-28, February, 1980, pp. 99-102,
synthesis interpolation filtering can be implemented more efficiently by a
synthesis-window weighted overlap-add method. Thus, some subband coders
implemented with transforms, including one used in an embodiment discussed
in more detail below, use synthesis windowing with overlap-add. Further,
quantizing errors may cause the inverse transform to produce a time-domain
signal which does not go to zero at the edges of the finite time interval.
Left alone, these errors may distort the recovered time-domain signal most
strongly within the window overlap interval. A synthesis window can be
used to shape each synthesized signal block at its edges. In this case,
the signal will be subjected to an analysis and a synthesis window, i.e.,
the signal will be weighted by the product of the two windows. Therefore,
both windows must be designed such that the product of the two will sum to
unity across the overlap. See the discussion in the previous paragraph.
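The requirement that the product of the analysis and synthesis windows sum to unity across the overlap can be checked with a short sketch. It assumes a sine window used for both analysis and synthesis, a 50% overlap, and an ideal (lossless) transform pair that is simply omitted; these choices and the function names are illustrative assumptions, not the windows of any embodiment.

```python
import math

def sine_window(n):
    """Sine window; its square plus the square of its half-block shift is 1."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]

def overlap_add_roundtrip(signal, block_len):
    """Apply analysis and synthesis windows with 50% overlap-add."""
    hop = block_len // 2
    w = sine_window(block_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - block_len + 1, hop):
        block = signal[start:start + block_len]
        analyzed = [wi * x for wi, x in zip(w, block)]        # analysis window
        # A forward transform, quantizer, and inverse transform would sit
        # here; they are assumed ideal and omitted in this sketch.
        synthesized = [wi * y for wi, y in zip(w, analyzed)]  # synthesis window
        for i, y in enumerate(synthesized):
            out[start + i] += y
    return out

N = 16
hop = N // 2
w = sine_window(N)
# Product window (w * w) summed across the overlap of adjacent blocks.
product_sum = [w[i] ** 2 + w[i + hop] ** 2 for i in range(hop)]

signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(8 * N)]
recovered = overlap_add_roundtrip(signal, N)
# Only samples covered by two blocks are fully compensated.
interior_error = max(abs(a - b)
                     for a, b in zip(signal[hop:-hop], recovered[hop:-hop]))
print(max(abs(s - 1.0) for s in product_sum), interior_error)
```

The first and last half-blocks are covered by only one window and are therefore excluded from the comparison.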
Short transform sample blocks impose greater compensation requirements on
the analysis and synthesis windows. As the transform sample blocks become
shorter there is more sidelobe leakage through the filter's transition
band and stopband. A well shaped analysis window reduces this leakage.
Sidelobe leakage is undesirable because it causes the transform to create
spectral coefficients which misrepresent the frequency of signal
components outside the filter's passband. This misrepresentation is a
distortion called aliasing.
Aliasing Cancellation
The Nyquist theorem holds that a signal may be accurately recovered from
discrete samples when the interval between samples is no larger than
one-half the period of the signal's highest frequency component. When the
sampling rate is below this Nyquist rate, higher-frequency components are
misrepresented as lower-frequency components. The lower-frequency
component is an "alias" for the true component.
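As a small numerical illustration of an alias, the sketch below samples two cosines at the same rate. The 8 kHz sampling rate and the 5 kHz and 3 kHz tones are assumed values chosen here for illustration, and the helper names are hypothetical.

```python
import math

def sample_cosine(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a unit-amplitude cosine."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency in [0, fs/2] onto which a real tone at freq_hz folds."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

FS = 8000    # sampling rate; components above 4000 Hz are undersampled
TRUE = 5000  # true component, above FS / 2

alias = alias_frequency(TRUE, FS)
high = sample_cosine(TRUE, FS, 32)
low = sample_cosine(alias, FS, 32)
max_difference = max(abs(a - b) for a, b in zip(high, low))
print(alias, max_difference)
```

The two sample sequences agree to within floating-point rounding, so once sampled the 5 kHz component cannot be distinguished from a 3 kHz component.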
Subband filters and finite digital transforms are not perfect passband
filters. The transition between the passband and stopband is not
infinitely sharp, and the attenuation of signals in the stopband is not
infinitely great. As a result, even if a passband-filtered input signal is
sampled at the Nyquist rate suggested by the passband cut-off frequency,
frequencies in the transition band above the cutoff frequency will not be
faithfully represented.
It is possible to design the analysis and synthesis filters such that
aliasing distortion is automatically cancelled by the inverse filter.
Quadrature Mirror Filters in the time domain possess this characteristic.
The transform coder technique discussed in Johnson and Bradley, "Adaptive
Transform Coding Incorporating Time Domain Aliasing Cancellation," Speech
Communications, vol. 6, North Holland: Elsevier Science Publishers, 1987,
pp. 299-308, also cancels aliasing distortion.
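Time-domain aliasing cancellation can be sketched as follows. This is an illustrative assumption rather than the transform of the cited paper or of any embodiment: it uses the oddly-stacked modified DCT (MDCT) form with a sine window for both analysis and synthesis, no quantization, and hypothetical function names. Each block of 2M windowed samples yields only M coefficients, so the inverse transform of a single block contains time-domain aliasing; the aliasing cancels when adjacent blocks are overlap-added.

```python
import math

def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """2M windowed samples -> M coefficients (critically sampled)."""
    m = len(block) // 2
    return [
        sum(x * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(m)
    ]

def imdct(coeffs):
    """M coefficients -> 2M samples that still contain time-domain aliasing."""
    m = len(coeffs)
    return [
        (2.0 / m) * sum(c * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                        for k, c in enumerate(coeffs))
        for n in range(2 * m)
    ]

def tdac_roundtrip(signal, m):
    """Window, transform, inverse transform, window, and overlap-add."""
    w = sine_window(2 * m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        block = [wi * x for wi, x in zip(w, signal[start:start + 2 * m])]
        aliased = imdct(mdct(block))
        for i, (wi, y) in enumerate(zip(w, aliased)):
            out[start + i] += wi * y
    return out

M = 8
signal = [math.sin(0.37 * n) + 0.25 * math.cos(1.9 * n) for n in range(10 * M)]

# One block alone: the inverse transform does not return the windowed input.
block0 = [wi * x for wi, x in zip(sine_window(2 * M), signal[:2 * M])]
alias_error = max(abs(a - b) for a, b in zip(block0, imdct(mdct(block0))))

# Overlap-added blocks: aliasing cancels wherever two blocks overlap.
recovered = tdac_roundtrip(signal, M)
interior_error = max(abs(a - b)
                     for a, b in zip(signal[M:-M], recovered[M:-M]))
print(alias_error, interior_error)
```

In this sketch the cancellation is exact only because the coefficients are passed through unquantized; coding errors in the coefficients would leave residual aliasing in the output.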
Suppressing the audible consequences of aliasing distortion in transform
coders becomes more difficult as the sample block length is made shorter.
As explained above, shorter sample blocks degrade filter performance: the
passband bandwidth increases, the passband-stopband transition becomes
less sharp, and the stopband rejection deteriorates. As a result, aliasing
becomes more pronounced. If the alias components are coded and decoded
with insufficient accuracy, these coding errors prevent the inverse
transform from completely cancelling aliasing distortion. The residual
aliasing distortion will be audible unless the distortion is
psychoacoustically masked. With short sample blocks, however, some
transform frequency bins may have a wider passband than the auditory
critical bands, particularly at low frequencies where the critical bands
have the greatest resolution. Consequently, alias distortion may not be
masked. One way to minimize the distortion is to increase quantization
accuracy in the problem subbands, but that increases the required bit
rate.
Bit-rate Reduction Techniques
The two factors listed above (Nyquist sample rate and quantizing errors)
should dictate the bit-rate requirements for a specified quality of signal
transmission or storage. Techniques may be employed, however, to reduce
the bit rate required for a given signal quality. These techniques exploit
a signal's redundancy and irrelevancy. A signal component is redundant if
it can be predicted or otherwise provided by the receiver. A signal
component is irrelevant if it is not needed to achieve a specified quality
of representation. Several techniques used in the art include:
(1) Prediction: a periodic or predictable characteristic of a signal
permits a receiver to anticipate some component based upon current or
previous signal characteristics.
(2) Entropy coding: components with a high probability of occurrence may be
represented by abbreviated codes. Both the transmitter and receiver must
have the same code book. Entropy coding and prediction have the
disadvantages that they increase computational complexity and processing
delay. Also, they inherently provide a variable rate output, thus
requiring buffering if used in a constant bit-rate system.
(3) Nonuniform coding: representations by logarithms or nonuniform
quantizing steps allow coding of large signal values with fewer bits at
the expense of greater quantizing errors.
(4) Floating point: floating-point representation may reduce bit
requirements at the expense of lost precision. Block-floating-point
representation uses one scale factor or exponent for a block of
floating-point mantissas, and is commonly used in coding time-domain
signals. Floating point is a special case of nonuniform coding.
(5) Bit allocation: the receiver's demand for accuracy may vary with time,
signal content, strength, or frequency. For example, lower frequency
components of speech are usually more important for comprehension and
speaker recognition, and therefore should be transmitted with greater
accuracy than higher frequency components. Different criteria apply with
respect to music signals. Some general bit-allocation criteria are:
(a) Component variance: more bits are allocated to transform coefficients
with the greatest level of AC power.
(b) Component value: more bits are allocated to transform coefficients
which represent frequency bands with the greatest amplitude or energy.
(c) Psychoacoustic masking: fewer bits are allocated to signal components
whose quantizing errors are masked (rendered inaudible) by other signal
components. This method is unique to those applications where audible
signals are intended for human perception. Masking is understood best with
respect to single-tone signals rather than multiple-tone signals and
complex waveforms such as music signals.
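The block-floating-point representation mentioned in item (4) can be sketched briefly. The 8-bit mantissa width, the sample values, and the function names below are assumptions made for illustration only and do not describe the format used by any embodiment.

```python
import math

def bfp_encode(block, mantissa_bits):
    """Return (exponent, mantissas) using one shared exponent per block."""
    peak = max(abs(x) for x in block)
    # frexp gives peak = f * 2**e with 0.5 <= f < 1, so peak < 2**e.
    exponent = 0 if peak == 0 else math.frexp(peak)[1]
    scale = 2.0 ** (mantissa_bits - 1 - exponent)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Signed integer mantissas, clamped to the representable range.
    mantissas = [max(-limit, min(limit, round(x * scale))) for x in block]
    return exponent, mantissas

def bfp_decode(exponent, mantissas, mantissa_bits):
    """Rebuild approximate sample values from the shared exponent."""
    step = 2.0 ** (exponent - (mantissa_bits - 1))
    return [m * step for m in mantissas]

BITS = 8
block = [0.8, -0.05, 0.3, 0.002]
exponent, mantissas = bfp_encode(block, BITS)
decoded = bfp_decode(exponent, mantissas, BITS)
step = 2.0 ** (exponent - (BITS - 1))
max_error = max(abs(a - b) for a, b in zip(block, decoded))
print(exponent, mantissas, decoded)
```

Because the exponent is set by the largest value in the block, the smallest value here (0.002) falls below one quantizing step and decodes to zero, which illustrates the loss of precision noted above.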
The foregoing discussion applies to subband coders implemented with either
a true subband filter bank or with a time-domain to frequency-domain
transform. Most of the following discussion pertains more particularly to
transform-implemented coders in order to simplify the discussion of
embodiments of the present invention.
SUMMARY OF THE INVENTION
In accordance with the teachings of the present invention, an encoder
provides for the digital encoding of wideband audio information. The
wideband audio signals are sampled and quantized into time-domain sample
blocks. Each sample block is then modulated by an analysis window.
Frequency-domain spectral components are then generated in response to the
analysis-window weighted time-domain sample block. A transform coder
having adaptive bit allocation nonuniformly quantizes each transform
coefficient, and those coefficients are assembled into a digital output
having a format suitable for storage or transmission. Error correction
codes may be used in applications where the transmitted signal is subject
to noise or other corrupting effects of the communication path.
Also in accordance with the teachings of the present invention, a decoder
provides for the high-quality reproduction of digital