WikiPatents - Community Patent Review
Method and apparatus for encoding and decoding audio information    
United States Patent 5479562
Link to this page: http://www.wikipatents.com/5479562.html
Inventor(s): Fielder; Louis D. (Millbrae, CA); Davidson; Grant A. (Oakland, CA)
Abstract: The invention relates to formatting encoded audio information in a form suitable for transmission or storage. Audio information is encoded into a binary form, using an invariant number of bits to represent at least some but not all of the encoded information. The information represented by an invariant number of bits is assembled into pre-established positions within a formatted frame.
   














Title Information
 
Owner/Assignee     Dolby Laboratories Licensing Corporation (San Francisco, CA)
Publication Date     December 26, 1995
Application Number     08/079,169
Filing Date     June 18, 1993
US Classification     704/229 704/200.1 704/205
Int'l Classification     G10L 009/00
Examiner     Knepper; David D.
Attorney/Law Firm     Gallagher; Thomas A.; Lathrop; David N.
Parent Case     CROSS-REFERENCE TO RELATED APPLICATIONS This application is: a continuation-in-part of application Ser. No. 07/582,956 filed Sep. 26, 1990, now in issue as U.S. Pat. No. 5,222,189, which is a continuation-in-part of application Ser. No. 07/439,868 filed Nov. 20, 1989, now abandoned, which was a continuation-in-part of application Ser. No. 07/303,714 filed Jan. 27, 1989, now abandoned; and a continuation-in-part of application Ser. No. 07/787,665 filed Nov. 4, 1991, now in issue as U.S. Pat. No. 5,230,038, which is a divisional of application Ser. No. 07/458,894 filed Dec. 29, 1989, now U.S. Pat. No. 5,109,417, which was a continuation-in-part of application Ser. No. 07/303,714 filed Jan. 27, 1989, now abandoned.
USPTO Field of Search     395/2 395/2.1 395/2.12 395/2.15 395/2.33 395/2.34 395/2.38 395/2.39 381/29 381/30 381/31 381/32 381/33 381/34 381/35 381/36 381/37 381/38 381/39 381/40
Patent Tags     encoding decoding audio information
   

References
U.S. References
Reference     Inventor     Class       Date
8709685
5301205       Tsutsui      375/340     Apr. 1994
5294925       Akagiri                  Mar. 1994
5264846       Oikawa       341/76      Nov. 1993
5222189       Fielder      704/229     Jun. 1993
5142656       Fielder      704/229     Aug. 1992
Claims
 


We claim:

1. An encoder for the encoding of audio information comprising signal samples, said encoder comprising

means for receiving said signal samples,

subband means, including adaptive bit allocation means, for generating subband information comprising digital words in response to said signal samples, wherein said adaptive bit allocation means allocates an adaptive number of bits to represent at least a portion of at least some of said digital words, and wherein at least a portion of some of said digital words is represented by an invariant number of bits, and

formatting means for assembling digital information including said digital words into a digital output having a format suitable for transmission or storage, wherein said formatting means places said portion of digital words represented by an invariant number of bits into one or more pre-established positions within a frame of said digital output.

2. An encoder according to claim 1 wherein said formatting means assembles said digital information such that said portion of digital words placed into said one or more pre-established positions is placed ahead of other portions of said digital words within said frame.

3. An encoder according to claim 1 wherein said formatting means places said portion of digital words represented by an invariant number of bits into adjacent positions within said frame.

4. A decoder of a formatted signal comprising digital words, said decoder comprising

deformatting means for obtaining said digital words from said formatted signal, wherein at least a portion of some of said digital words are represented by an invariant number of bits and at least a portion of at least some of said digital words are represented by an adaptive number of bits, said portion of digital words represented by an invariant number of bits is obtained from one or more pre-established positions within a frame of said formatted signal, and said portion of digital words represented by an adaptive number of bits is obtained from one or more positions within said frame established in response to said portion of subband signals represented by an invariant number of bits,

inverse subband means for generating signal samples in response to said digital words, and

means for sending said signal samples.

5. A decoder according to claim 4 wherein said deformatting means obtains said portion of digital words from said one or more pre-established positions which are ahead of other portions of said digital words within said frame.

6. A decoder according to claim 4 wherein said deformatting means obtains said portion of digital words represented by an invariant number of bits from adjacent positions within said frame.

7. An encoder for the encoding of audio information comprising signal samples, said encoder comprising

means for receiving said signal samples,

subband means, including adaptive bit allocation means, for generating subband information comprising digital words in response to said signal samples, wherein said adaptive bit allocation means allocates an adaptive number of bits to represent at least a portion of at least some of said digital words, and wherein at least a portion of some of said digital words is represented by an invariant number of bits, and

formatting means for assembling digital information including said digital words into a digital output having a format suitable for transmission or storage, wherein said formatting means places said portion of digital words represented by an invariant number of bits into adjacent positions within a frame of said digital output.

8. An encoder according to claim 7 wherein said formatting means assembles said digital information such that said portion of digital words placed into said adjacent positions is placed ahead of other portions of said digital words within said frame.

9. An encoder according to any one of claims 1, 2, 7 or 8 wherein said digital words comprise scaling factors and scaled values, and wherein said portion of digital words to which an adaptive number of bits is allocated constitutes at least a portion of at least some of said scaled values.

10. A decoder of a formatted signal comprising digital words, said decoder comprising

deformatting means for obtaining said digital words from said formatted signal, wherein at least a portion of some of said digital words are represented by an invariant number of bits and at least a portion of at least some of said digital words are represented by an adaptive number of bits, said portion of digital words represented by an invariant number of bits is obtained from one or more adjacent positions within a frame of said formatted signal, and said portion of digital words represented by an adaptive number of bits is obtained from one or more positions within said frame established in response to said portion of subband signals represented by an invariant number of bits,

inverse subband means for generating signal samples in response to said digital words, and

means for sending said signal samples.

11. A decoder according to claim 10 wherein said deformatting means obtains said portion of digital words from said one or more adjacent positions which are ahead of other portions of said digital words within said frame.

12. A decoder according to any one of claims 4, 5, 10 or 11 wherein said digital words comprise scaling factors and scaled values, and wherein said portion of digital words represented by an adaptive number of bits constitutes at least a portion of at least some of said scaled values.

13. An encoder for the encoding of audio information comprising signal samples, said encoder comprising

means for receiving said signal samples,

subband means for generating, in response to said signal samples, subband signals comprising scaling factors and associated scaled values, and

formatting means for assembling digital information including said subband signals into a digital output having a format suitable for transmission or storage, wherein said formatting means places said scaling factors into adjacent positions within a frame of said digital output.

14. An encoder according to claim 13 wherein said subband means generates subband signals comprising one or more blocks of scaled values, each block comprising one or more scaled values associated with a respective scaling factor.

15. An encoder according to claim 14 wherein said subband means generates subband signals comprising two or more sets of respective blocks of scaled values, wherein a respective block of scaled values in each set is associated with a respective common scaling factor.

16. An encoder according to claim 13 wherein said subband means generates subband signals comprising two or more sets of respective scaled values, wherein a respective scaled value in each set is associated with a respective common scaling factor.

17. An encoder for the encoding of audio information comprising signal samples, said encoder comprising

means for receiving said signal samples,

subband means for generating, in response to said signal samples, subband signals comprising scaling factors and associated scaled values, and

formatting means for assembling digital information including said subband signals into a digital output having a format suitable for transmission or storage, wherein said formatting means places said scaling factors into one or more pre-established positions within a frame of said digital output.

18. An encoder according to claim 17 wherein said subband means generates subband signals comprising one or more blocks of scaled values, each block comprising one or more scaled values associated with a respective scaling factor.

19. An encoder according to claim 18 wherein said subband means generates subband signals comprising two or more sets of respective blocks of scaled values, wherein a respective block of scaled values in each set is associated with a respective common scaling factor.

20. An encoder according to claim 17 wherein said subband means generates subband signals comprising two or more sets of respective scaled values, wherein a respective scaled value in each set is associated with a respective common scaling factor.

21. An encoder according to any one of claims 13 through 20 wherein said subband means generates subband signals represented in a floating-point form, wherein said scaling factors are exponents and said scaled values are mantissas.

22. An encoder according to any one of claims 13 through 20 wherein said formatting means assembles said digital information such that scaling factors placed into said adjacent positions are placed ahead of said scaled values within said frame.

23. A decoder of a formatted signal including subband signals comprising scaling factors and scaled values, said decoder comprising

deformatting means for deriving said subband signals by obtaining said scaling factors from adjacent positions within a frame of said formatted signal and by obtaining said scaled values from said formatted signal,

inverse subband means for generating signal samples in response to said derived subband signals, and

means for sending said signal samples.

24. A decoder according to claim 23 wherein said deformatting means obtains subband signals comprising one or more blocks of scaled values, each block comprising one or more scaled values associated with a respective scaling factor.

25. A decoder according to claim 24 wherein said deformatting means obtains subband signals comprising two or more sets of respective blocks of scaled values, wherein a respective block of scaled values in each set is associated with a respective common scaling factor.

26. A decoder according to claim 23 wherein said deformatting means obtains subband signals comprising two or more sets of respective scaled values, wherein a respective scaled value in each set is associated with a respective common scaling factor.

27. A decoder of a formatted signal including subband signals comprising scaling factors and scaled values, said decoder comprising

deformatting means for deriving said subband signals by obtaining said scaling factors from one or more pre-established positions within a frame of said formatted signal and by obtaining said scaled values from said formatted signal,

inverse subband means for generating signal samples in response to said derived subband signals, and

means for sending said signal samples.

28. A decoder according to claim 27 wherein said deformatting means obtains subband signals comprising one or more blocks of scaled values, each block comprising one or more scaled values associated with a respective scaling factor.

29. A decoder according to claim 28 wherein said deformatting means obtains subband signals comprising two or more sets of respective blocks of scaled values, wherein a respective block of scaled values in each set is associated with a respective common scaling factor.

30. A decoder according to claim 27 wherein said deformatting means obtains subband signals comprising two or more sets of respective scaled values, wherein a respective scaled value in each set is associated with a respective common scaling factor.

31. A decoder according to any one of claims 23 through 30 wherein said deformatting means obtains subband signals represented in a floating-point form, wherein said scaling factors are exponents and said scaled values are mantissas.

32. A decoder according to any one of claims 23 through 30 wherein said deformatting means obtains said scaling factors from said adjacent positions ahead of said scaled values within said frame.

33. An encoding method for the encoding of audio information comprising signal samples, said encoding method comprising

receiving said signal samples,

generating subband information comprising digital words in response to said signal samples, and allocating an adaptive number of bits to represent at least a portion of at least some of said digital words, and wherein at least a portion of some of said digital words is represented by an invariant number of bits, and

assembling digital information including said digital words into a digital output having a format suitable for transmission or storage, and placing said portion of digital words represented by an invariant number of bits into one or more pre-established positions within a frame of said digital output.

34. An encoding method according to claim 33 wherein said assembling places said portion of digital words represented by an invariant number of bits into adjacent positions within said frame.

35. A decoding method of a formatted signal comprising digital words, said decoding method comprising

obtaining said digital words from said formatted signal, wherein at least a portion of some of said digital words are represented by an invariant number of bits and at least a portion of at least some of said digital words are represented by an adaptive number of bits, said portion of digital words represented by an invariant number of bits is obtained from one or more pre-established positions within a frame of said formatted signal, and said portion of digital words represented by an adaptive number of bits is obtained from one or more positions within said frame established in response to said portion of subband signals represented by an invariant number of bits,

generating signal samples in response to said digital words, and

sending said signal samples.

36. A decoding method according to claim 35 wherein said portion of digital words represented by an invariant number of bits is obtained from adjacent positions within said frame.

37. An encoding method for the encoding of audio information comprising signal samples, said encoding method comprising

receiving said signal samples,

generating subband information comprising digital words in response to said signal samples, and allocating an adaptive number of bits to represent at least a portion of at least some of said digital words, and wherein at least a portion of some of said digital words is represented by an invariant number of bits, and

assembling digital information including said digital words into a digital output having a format suitable for transmission or storage, and placing said portion of digital words represented by an invariant number of bits into adjacent positions within a frame of said digital output.

38. An encoding method according to claim 33 or 37 wherein said generating generates digital words comprising scaling factors and scaled values, and wherein said portion of digital words to which an adaptive number of bits is allocated constitutes at least a portion of at least some of said scaled values.

39. A decoding method of a formatted signal comprising digital words, said decoding method comprising

obtaining said digital words from said formatted signal, wherein at least a portion of some of said digital words are represented by an invariant number of bits and at least a portion of at least some of said digital words are represented by an adaptive number of bits, said portion of digital words represented by an invariant number of bits is obtained from one or more adjacent positions within a frame of said formatted signal, and said portion of digital words represented by an adaptive number of bits is obtained from one or more positions within said frame established in response to said portion of subband signals represented by an invariant number of bits,

generating signal samples in response to said digital words, and

sending said signal samples.

40. A decoding method according to claim 35 or 39 wherein said digital words comprise scaling factors and scaled values, and wherein said portion of digital words represented by an adaptive number of bits constitutes at least a portion of at least some of said scaled values.

41. An encoding method for the encoding of audio information comprising signal samples, said encoding method comprising

receiving said signal samples,

generating, in response to said signal samples, subband signals comprising scaling factors and associated scaled values, and

assembling digital information including said subband signals into a digital output having a format suitable for transmission or storage, and placing said scaling factors into adjacent positions within a frame of said digital output.

42. An encoding method for the encoding of audio information comprising signal samples, said encoding method comprising

receiving said signal samples,

generating, in response to said signal samples, subband signals comprising scaling factors and associated scaled values, and

assembling digital information including said subband signals into a digital output having a format suitable for transmission or storage, and placing said scaling factors into one or more pre-established positions within a frame of said digital output.

43. An encoding method according to claim 41 or 42 wherein said generating generates subband signals comprising one or more blocks of scaled values, each block comprising one or more scaled values associated with a respective scaling factor.

44. An encoding method according to claim 41 or 42 wherein said generating generates subband signals comprising two or more sets of respective scaled values, wherein a respective scaled value in each set is associated with a respective common scaling factor.

45. A decoding method of a formatted signal including subband signals comprising scaling factors and scaled values, said decoding method comprising

deriving said subband signals by obtaining said scaling factors from adjacent positions within a frame of said formatted signal and by obtaining said scaled values from said formatted signal,

generating signal samples in response to said derived subband signals, and

sending said signal samples.

46. A decoding method of a formatted signal including subband signals comprising scaling factors and scaled values, said decoding method comprising

deriving said subband signals by obtaining said scaling factors from one or more pre-established positions within a frame of said formatted signal and by obtaining said scaled values from said formatted signal,

generating signal samples in response to said derived subband signals, and

sending said signal samples.

47. A decoding method according to claim 45 or 46 wherein said subband signals comprise one or more blocks of scaled values, each block comprising one or more scaled values associated with a respective scaling factor.

48. A decoding method according to claim 45 or 46 wherein said subband signals comprise two or more sets of respective scaled values, wherein a respective scaled value in each set is associated with a respective common scaling factor.
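
As an illustration only of the frame layout recited in the claims above, the following Python sketch places scaling factors (here, exponents) of invariant width at the start of a frame and follows them with scaled values (mantissas) of adaptive width. The 4-bit exponent width, the allocation rule in `bits_for_exponent`, the use of unsigned mantissas, and every function name are hypothetical assumptions made for the example and are not taken from the patent. The sketch is meant to show only that a decoder can locate the adaptive-width words once it has read the invariant-width words from their pre-established positions.

```python
EXP_BITS = 4  # assumed invariant width of each scaling factor (hypothetical)


def bits_for_exponent(exp):
    """Hypothetical allocation rule: larger exponents get fewer mantissa bits."""
    return max(2, 8 - exp // 2)


def pack_frame(exponents, mantissas):
    """Build a frame as a bit string: fixed-width exponents first, then mantissas."""
    # Invariant part: every exponent occupies EXP_BITS at a known position.
    bits = "".join(format(e, f"0{EXP_BITS}b") for e in exponents)
    # Adaptive part: each mantissa width is derived from its exponent.
    for e, m in zip(exponents, mantissas):
        width = bits_for_exponent(e)
        if not 0 <= m < (1 << width):
            raise ValueError("mantissa does not fit in its allocated width")
        bits += format(m, f"0{width}b")
    return bits


def unpack_frame(bits, count):
    """Recover exponents from fixed positions, then use them to find the mantissas."""
    exponents = [int(bits[i * EXP_BITS:(i + 1) * EXP_BITS], 2) for i in range(count)]
    pos = count * EXP_BITS
    mantissas = []
    for e in exponents:
        width = bits_for_exponent(e)  # same rule as the encoder, no side information
        mantissas.append(int(bits[pos:pos + width], 2))
        pos += width
    return exponents, mantissas


frame = pack_frame([0, 3, 10], [200, 100, 5])
decoded = unpack_frame(frame, 3)
```

In this sketch the decoder needs no explicit bit-allocation side information, because the mantissa widths are recomputed from the exponents it has already read.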
Description
 


BACKGROUND OF THE INVENTION

The invention relates in general to high-quality low bit-rate encoding and decoding of signals carrying information intended for human perception such as audio signals, and more particularly music signals.

There is considerable interest among those in the field of signal processing to discover methods which minimize the amount of information required to represent adequately a given signal. By reducing required information, signals may be transmitted over communication channels with lower bandwidth, or stored in less space. With respect to digital techniques, minimal informational requirements are synonymous with minimal binary bit requirements.

Two factors limit the reduction of bit requirements:

(1) A signal of bandwidth W may be accurately represented by a series of samples taken at a frequency no less than 2·W. This is the Nyquist sampling rate. Therefore, a signal T seconds in length with a bandwidth W requires at least 2·W·T samples for accurate representation.

(2) Quantization of signal samples which may assume any of a continuous range of values introduces inaccuracies in the representation of the signal which are proportional to the quantizing step size or resolution. These inaccuracies are called quantization errors. These errors are inversely proportional to the number of bits available to represent the signal sample quantization.
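
The two limits above can be sketched numerically. The following Python example is a minimal illustration, assuming a uniform quantizer over the range [-1, 1); the function names and the test grid are hypothetical and do not come from the patent. In this particular sketch each additional bit halves the step size, and the worst-case error is bounded by half a step.

```python
import math


def min_samples(bandwidth_hz, duration_s):
    """Minimum number of samples, 2*W*T, for a signal of bandwidth W and length T."""
    return math.ceil(2 * bandwidth_hz * duration_s)


def step_size(bits):
    """Step size of an assumed uniform quantizer covering [-1, 1)."""
    return 2.0 / (2 ** bits)


def quantize(x, bits):
    """Round x to the nearest multiple of the step size."""
    step = step_size(bits)
    return step * math.floor(x / step + 0.5)


def max_error(bits, n=1000):
    """Largest quantization error observed on an evenly spaced test grid."""
    xs = [-1 + 2 * k / n for k in range(n)]
    return max(abs(x - quantize(x, bits)) for x in xs)


samples_needed = min_samples(20000, 1.0)  # 20 kHz bandwidth, 1 second
errors = {bits: max_error(bits) for bits in (8, 9, 10)}
```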

If coding techniques are applied to the full bandwidth, all quantizing errors, which manifest themselves as noise, are spread uniformly across the bandwidth. Split-band techniques which may be applied to selected portions of the spectrum can limit the spectral spread of quantizing noise. Two known split-band techniques, subband coding and transform coding, are discussed in Tribolet and Crochiere, "Frequency Domain Coding of Speech," IEEE Trans. on Acoust., Speech, Signal Proc., vol. ASSP-27, October, 1979, pp. 512-30. By using subband coding or transform coding, quantizing errors can be reduced in particular frequency bands where quantizing noise is especially objectionable by quantizing that band with a smaller step size.

Subband coding may be implemented by a bank of digital bandpass filters. Transform coding may be implemented by any of several time-domain to frequency-domain transforms which simulate a bank of digital bandpass filters. Although transforms are easier to implement and require less computational power and hardware than digital filters, they have less design flexibility in the sense that each bandpass filter "frequency bin" represented by a transform coefficient has a uniform bandwidth. By contrast, a bank of digital bandpass filters can be designed to have different subband bandwidths. Transform coefficients can, however, be grouped together to define "subbands" having bandwidths which are multiples of a single transform coefficient bandwidth. The term "subband" is used hereinafter to refer to selected portions of the total signal bandwidth, whether implemented by a subband coder or a transform coder. The term is used in this manner because, as discussed by Tribolet and Crochiere, the mathematical bases of subband coders and transform coders are interchangeable; therefore the two coding methods are potentially capable of duplicating each other. A subband as implemented by a transform coder is defined by a set of one or more adjacent transform coefficients or frequency bins. The bandwidth of a transform coder frequency bin depends upon the coder's sampling rate and the number of samples in each signal sample block (the transform length).
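
The dependence of bin bandwidth on sampling rate and transform length can be sketched as follows. This Python example assumes the common relation that the bin spacing equals the sampling rate divided by the block length; the 48 kHz rate, the 512-sample block, the grouping sizes, and the function names are illustrative assumptions rather than values from the patent.

```python
def bin_bandwidth(sample_rate_hz, block_length):
    """Nominal bandwidth of one transform frequency bin (assumed: rate / length)."""
    return sample_rate_hz / block_length


def subband_bandwidths(sample_rate_hz, block_length, bins_per_subband):
    """Bandwidth of each subband formed by grouping adjacent transform coefficients."""
    width = bin_bandwidth(sample_rate_hz, block_length)
    # Each subband is a whole number of bins, so its width is a multiple of one bin.
    return [count * width for count in bins_per_subband]


one_bin = bin_bandwidth(48000, 512)
grouped = subband_bandwidths(48000, 512, [1, 2, 4])
```

Grouping lets a transform coder approximate non-uniform subbands even though every individual bin has the same width.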

Tribolet and Crochiere observed that two characteristics of subband bandpass filters are particularly critical to the performance of subband coder systems because they affect the amount of signal leakage between subbands. The first is the bandwidth of the regions between the filter passband and stopbands (the transition bands). The second is the attenuation level in the stopbands. As used herein, the measure of filter "selectivity" is the steepness of the filter response curve within the transition bands (steepness of transition band rolloff), and the level of attenuation in the stopbands (depth of stopband rejection).

It is known from Tribolet and Crochiere that reducing leakage between subbands is important to subband coder performance because such leakage distorts the results of spectral analysis, and therefore adversely affects coding decisions made in response to the derived spectral shape. Such leakage can also cause frequency-domain aliasing. These effects are discussed in more detail below.

The two filter characteristics, steepness of transition band rolloff and depth of stopband rejection, are also critical because the human auditory system displays frequency-analysis properties resembling those of highly asymmetrical tuned filters having variable center frequencies. The ability of the human auditory system to detect distinct tones generally increases as the difference in frequency between the tones increases; however, the frequency resolution of the human auditory system remains substantially constant for frequency differences less than the bandwidth of the above mentioned filters. The effective bandwidth of these filters, which is referred to as a critical band, varies throughout the audio spectrum. A dominant signal within a critical band is more likely to mask or render inaudible other signals anywhere within that critical band than other signals at frequencies outside that critical band. A dominant signal may mask other signals which occur not only at the same time as the masking signal, but also which occur before and after the masking signal. The duration of pre- and post-masking effects within a critical band depends upon the magnitude of the masking signal, but pre-masking effects are usually of much shorter duration than post-masking effects. See generally, the Audio Engineering Handbook, K. Blair Benson ed., McGraw-Hill, San Francisco, 1988, pages 1.40-1.42 and 4.8-4.10.

Psychoacoustic masking is more easily accomplished by subband and transform coders if the subband bandwidth throughout the audible spectrum is less than the critical bandwidth of the human auditory system in the same portions of the spectrum. This is because the critical bands of the human auditory system have variable center frequencies that adapt to auditory stimuli, whereas subband and transform coders typically have fixed subband center frequencies. To optimize the opportunity to utilize psychoacoustic-masking effects, any distortion artifacts resulting from the presence of a dominant signal should be limited to the subband containing the dominant signal. If the subband bandwidth is about half or less than half of the critical band (and if the transition band rolloff is sufficiently steep and the stopband rejection is sufficiently deep), the most effective masking of the undesired distortion products is likely to occur even for signals whose frequency is near the edge of the subband passband bandwidth. If the subband bandwidth is more than half a critical band, there is the possibility that the dominant signal will cause the human auditory system's critical band to be offset from the coder's subband so that some of the undesired distortion products outside the critical bandwidth are not masked. These effects are most objectionable at low frequencies where the critical band is narrower.

Transform coding performance depends upon several factors, including the signal sample block length, transform coding errors, and aliasing cancellation.

Block Length

The signal sample block length affects the temporal and frequency resolution of a transform coder. As block lengths become longer, transform encoder and decoder temporal resolution is adversely affected. Encoder quantization errors may be manifested as audible artifacts of signal transients caused by the "smearing" or temporal spreading of the transient across the length of the sample block recovered by the decoder. Such artifacts are usually manifested as pre- and post-transient ringing.

Unless other remedial steps are taken, the block length in high-quality coding systems is usually chosen such that temporal smearing does not exceed the pre- and post-masking intervals of the human auditory system.

As block lengths become shorter, on the other hand, transform encoder and decoder frequency resolution is adversely affected not only by the consequential widening of the frequency bins, but also by degradation of the response characteristics of the bandpass filter frequency bins: (1) decreased rate of transition band rolloff, and (2) reduced level of stopband rejection. This degradation in filter performance results in the undesired creation of or contribution to transform coefficients in nearby frequency bins in response to a desired signal. These undesired contributions are called sidelobe leakage.

Depending on the sampling rate, a short block length may result in a nominal filter bandwidth exceeding the critical bandwidth at some or all frequencies, particularly low frequencies. Even if the nominal subband bandwidth is narrower than the critical bandwidth, degraded filter characteristics manifested as a broad transition band and/or poor stopband rejection may result in significant signal components outside the critical bandwidth. In such cases, greater constraints are ordinarily placed on other aspects of the system, particularly quantization accuracy.
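The trade-off between temporal and frequency resolution described above can be illustrated with simple arithmetic. The sketch below is not taken from the specification: it assumes a 48 kHz sampling rate, treats the nominal bin spacing of a plain N-point discrete transform as the sampling rate divided by N, and uses a commonly quoted approximate figure of 100 Hz for the critical bandwidth at low frequencies. The function and constant names are hypothetical.

```python
def block_tradeoff(sample_rate_hz, block_length):
    """Return (block duration in ms, nominal bin spacing in Hz) for one block."""
    duration_ms = 1000.0 * block_length / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / block_length
    return duration_ms, bin_spacing_hz


# Assumption: approximate critical bandwidth at low frequencies.
ASSUMED_LOW_FREQ_CRITICAL_BW_HZ = 100.0

# Assumption: 48 kHz sampling rate; the block lengths are illustrative only.
for n in (128, 256, 512, 1024):
    duration, spacing = block_tradeoff(48000.0, n)
    too_wide = spacing > ASSUMED_LOW_FREQ_CRITICAL_BW_HZ
    print(f"N={n:5d}  duration={duration:6.2f} ms  "
          f"bin spacing={spacing:7.2f} Hz  wider than assumed critical band: {too_wide}")
```

Under these assumptions, doubling the block length halves the nominal bin spacing but doubles the interval over which a transient may be smeared.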

Another disadvantage resulting from short sample block lengths is the exacerbation of transform coding errors, described in the next section.

Transform Coding Errors

Discrete transforms do not produce a perfectly accurate set of frequency coefficients because they work with only a finite segment of the signal. Strictly speaking, discrete transforms produce a time-frequency representation of the input time-domain signal rather than a true frequency-domain representation which would require infinite transform lengths. For convenience of discussion here, however, the output of discrete transforms will be referred to as a frequency-domain representation. In effect, the discrete transform assumes the sampled signal only has frequency components whose periods are a submultiple of the finite sample interval. This is equivalent to an assumption that the finite-length signal is periodic. The assumption in general is not true. The assumed periodicity creates discontinuities at the edges of the finite time interval which cause the transform to create phantom high-frequency components.
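The effect of the implied periodicity can be seen in a small numerical sketch. The example below is illustrative only: it uses a direct Discrete Fourier Transform, a 64-sample block, and two test tones chosen for the demonstration (one completing exactly 8 cycles in the block, one completing 8.5 cycles). None of these values come from the specification, and the helper names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 of a real block, computed directly."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def energy_outside(mags, centre_bin, half_width=1):
    """Fraction of spectral energy lying outside the bins around centre_bin."""
    energy = [m * m for m in mags]
    lo = max(0, centre_bin - half_width)
    near = sum(energy[lo:centre_bin + half_width + 1])
    return 1.0 - near / sum(energy)


N = 64  # illustrative block length
# Period is a submultiple of the block: no discontinuity at the block edges.
periodic = [math.cos(2 * math.pi * 8.0 * t / N) for t in range(N)]
# 8.5 cycles per block: the implied periodic extension has an edge discontinuity.
non_periodic = [math.cos(2 * math.pi * 8.5 * t / N) for t in range(N)]

leak_periodic = energy_outside(dft_magnitudes(periodic), 8)
leak_non_periodic = energy_outside(dft_magnitudes(non_periodic), 8)
print("energy outside bins 7-9, periodic tone:    ", leak_periodic)
print("energy outside bins 7-9, non-periodic tone:", leak_non_periodic)
```

For the tone whose period fits the block, essentially all energy falls in a single bin. For the other tone, a noticeable fraction of the energy spreads into distant bins.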

One technique which minimizes this effect is to reduce the discontinuity prior to the transformation by weighting the signal samples such that samples near the edges of the interval are close to zero. Samples at the center of the interval are generally passed unchanged, i.e., weighted by a factor of one. This weighting function is called an "analysis window" and may be of any shape, but certain windows contribute more favorably to subband filter performance.
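A brief sketch of the weighting described above follows. It assumes a Hann window purely for illustration; the specification does not prescribe that shape here, and the block length, test tone, and helper names are likewise hypothetical. The sketch compares how much spectral energy lands more than three bins away from the tone with and without the window.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..N/2 of a real block, computed directly."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def far_energy_fraction(mags, tone_bin, distance=3.0):
    """Fraction of energy in bins more than `distance` bins from tone_bin."""
    energy = [m * m for m in mags]
    far = sum(e for k, e in enumerate(energy) if abs(k - tone_bin) > distance)
    return far / sum(energy)


N = 64           # illustrative block length
TONE_BIN = 8.5   # test tone midway between two bins (worst case for leakage)
block = [math.cos(2 * math.pi * TONE_BIN * t / N) for t in range(N)]

# Assumption: Hann window, near zero at the block edges and one at the centre.
hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
windowed = [s * w for s, w in zip(block, hann)]

rect_far = far_energy_fraction(dft_magnitudes(block), TONE_BIN)
hann_far = far_energy_fraction(dft_magnitudes(windowed), TONE_BIN)
print("far-bin energy fraction, unweighted block:", rect_far)
print("far-bin energy fraction, windowed block:  ", hann_far)
```

The window widens the main lobe slightly, but it greatly reduces the energy that reaches distant bins.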

As used herein, the term "analysis window" refers merely to the windowing function performed prior to application of the forward transform. As will be discussed below, the design of an analysis window is constrained by synthesis window design considerations. Therefore, design and performance properties of an "analysis window" as that term is commonly used in the art may differ from such analysis windows as discussed herein. While there is no single criterion which may be used to assess a window's quality, general criteria include steepness of transition band rolloff and depth of stopband rejection. In some applications, the ability to trade steeper rolloff for deeper rejection level is a useful quality.

The analysis window is a time-domain function. If no other compensation is provided, the recovered or "synthesized" signal will be distorted according to the shape of the analysis window. There are several compensation methods. For example:

(a) The recovered signal interval or block may be multiplied by an inverse window, one whose weighting factors are the reciprocal of those for the analysis window. A disadvantage of this technique is that it clearly requires that the analysis window not go to zero at the edges.

(b) Consecutive input signal blocks may be overlapped. By carefully designing the analysis window such that two adjacent windows add to unity across the overlap, the effects of the window will be exactly compensated. (But see the following paragraph.) When used with certain types of transforms such as the Discrete Fourier Transform (DFT), this technique increases the number of bits required to represent the signal since the portion of the signal in the overlap interval must be transformed and transmitted twice. For these types of transforms, it is desirable to design the window with an overlap interval as small as possible.

(c) Signal synthesis or decoding performed in a decoder may also require synthesis filtering. As discussed in Crochiere, "A Weighted Overlap-Add Method of Short-Time Fourier Analysis/Synthesis," IEEE Trans. Acoust., Speech, and Signal Proc., vol. ASSP-28, February, 1980, pp. 99-102, synthesis interpolation filtering can be implemented more efficiently by a synthesis-window weighted overlap-add method. Thus, some subband coders implemented with transforms, including one used in an embodiment discussed in more detail below, use synthesis windowing with overlap-add. Further, quantizing errors may cause the inverse transform to produce a time-domain signal which does not go to zero at the edges of the finite time interval. Left alone, these errors may distort the recovered time-domain signal most strongly within the window overlap interval. A synthesis window can be used to shape each synthesized signal block at its edges. In this case, the signal will be subjected to an analysis and a synthesis window, i.e., the signal will be weighted by the product of the two windows. Therefore, both windows must be designed such that the product of the two will sum to unity across the overlap. See the discussion in the previous paragraph.
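The requirement in (c) that the product of the analysis and synthesis windows sum to unity across the overlap can be checked numerically. The sketch below makes several assumptions that are not taken from the specification: a sine-shaped window is used for both analysis and synthesis, adjacent blocks overlap by half a block, and the forward and inverse transforms are omitted so that only the windowing and overlap-add steps are exercised. All names are hypothetical.

```python
import math


def sine_window(n):
    """Assumed window shape: sin(pi * (i + 0.5) / n) for i = 0..n-1."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def windowed_overlap_add(signal, block_len):
    """Apply the analysis window, then the synthesis window, then overlap-add."""
    hop = block_len // 2  # assumed 50% overlap between adjacent blocks
    analysis = sine_window(block_len)
    synthesis = sine_window(block_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - block_len + 1, hop):
        block = [signal[start + i] * analysis[i] for i in range(block_len)]
        # A forward transform, quantization, and inverse transform would sit
        # here in a real coder; they are deliberately left out of this sketch.
        for i in range(block_len):
            out[start + i] += block[i] * synthesis[i]
    return out


N = 16
w = sine_window(N)
# Product of the two windows, summed over the two blocks covering each sample.
overlap_sums = [w[i] * w[i] + w[i + N // 2] * w[i + N // 2] for i in range(N // 2)]

signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(8 * N)]
recovered = windowed_overlap_add(signal, N)
# Only samples covered by two blocks are fully compensated.
interior_error = max(abs(recovered[t] - signal[t])
                     for t in range(N // 2, len(signal) - N // 2))
print("overlap sums:", overlap_sums)
print("largest interior reconstruction error:", interior_error)
```

With this window pair the sums equal one to within rounding, so the interior of the signal is recovered unchanged. The first and last half-blocks are covered by only one window and remain attenuated.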

Short transform sample blocks impose greater compensation requirements on the analysis and synthesis windows. As the transform sample blocks become shorter there is more sidelobe leakage through the filter's transition band and stopband. A well shaped analysis window reduces this leakage.

Sidelobe leakage is undesirable because it causes the transform to create spectral coefficients which misrepresent the frequency of signal components outside the filter's passband. This misrepresentation is a distortion called aliasing.

Aliasing Cancellation

The Nyquist theorem holds that a signal may be accurately recovered from discrete samples when the interval between samples is no larger than one-half the period of the signal's highest frequency component. When the sampling rate is below this Nyquist rate, higher-frequency components are misrepresented as lower-frequency components. The lower-frequency component is an "alias" for the true component.
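The aliasing effect can be shown with a short sketch. The 10 kHz sampling rate and the 7 kHz and 3 kHz tones are arbitrary values chosen for the demonstration and do not appear in the specification; the function names are hypothetical.

```python
import math


def sample_cosine(freq_hz, sample_rate_hz, count):
    """Sample a unit-amplitude cosine of the given frequency."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency in [0, sample_rate/2] that a sampled real tone appears at."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


SAMPLE_RATE_HZ = 10000.0  # illustrative; half of this is 5 kHz
high_tone = sample_cosine(7000.0, SAMPLE_RATE_HZ, 32)  # above half the sampling rate
low_tone = sample_cosine(3000.0, SAMPLE_RATE_HZ, 32)   # its alias

largest_difference = max(abs(a - b) for a, b in zip(high_tone, low_tone))
print("alias of 7000 Hz at 10 kHz sampling:", alias_frequency(7000.0, SAMPLE_RATE_HZ))
print("largest sample difference between the two tones:", largest_difference)
```

The two sample sequences agree to within rounding, so once sampled the 7 kHz component cannot be distinguished from a 3 kHz component.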

Subband filters and finite digital transforms are not perfect passband filters. The transition between the passband and stopband is not infinitely sharp, and the attenuation of signals in the stopband is not infinitely great. As a result, even if a passband-filtered input signal is sampled at the Nyquist rate suggested by the passband cut-off frequency, frequencies in the transition band above the cutoff frequency will not be faithfully represented.

It is possible to design the analysis and synthesis filters such that aliasing distortion is automatically cancelled by the inverse filter. Quadrature Mirror Filters in the time domain possess this characteristic. The transform coder technique discussed in Johnson and Bradley, "Adaptive Transform Coding Incorporating Time Domain Aliasing Cancellation," Speech Communications, vol. 6, North Holland: Elsevier Science Publishers, 1987, pp. 299-308, also cancels aliasing distortion.

Suppressing the audible consequences of aliasing distortion in transform coders becomes more difficult as the sample block length is made shorter. As explained above, shorter sample blocks degrade filter performance: the passband bandwidth increases, the passband-stopband transition becomes less sharp, and the stopband rejection deteriorates. As a result, aliasing becomes more pronounced. If the alias components are coded and decoded with insufficient accuracy, these coding errors prevent the inverse transform from completely cancelling aliasing distortion. The residual aliasing distortion will be audible unless the distortion is psychoacoustically masked. With short sample blocks, however, some transform frequency bins may have a wider passband than the auditory critical bands, particularly at low frequencies where the critical bands have the greatest resolution. Consequently, alias distortion may not be masked. One way to minimize the distortion is to increase quantization accuracy in the problem subbands, but that increases the required bit rate.

Bit-rate Reduction Techniques

The two factors listed above (Nyquist sample rate and quantizing errors) should dictate the bit-rate requirements for a specified quality of signal transmission or storage. Techniques may be employed, however, to reduce the bit rate required for a given signal quality. These techniques exploit a signal's redundancy and irrelevancy. A signal component is redundant if it can be predicted or otherwise provided by the receiver. A signal component is irrelevant if it is not needed to achieve a specified quality of representation. Several techniques used in the art include:

(1) Prediction: a periodic or predictable characteristic of a signal permits a receiver to anticipate some component based upon current or previous signal characteristics.

(2) Entropy coding: components with a high probability of occurrence may be represented by abbreviated codes. Both the transmitter and receiver must have the same code book. Entropy coding and prediction have the disadvantages that they increase computational complexity and processing delay. Also, they inherently provide a variable rate output, thus requiring buffering if used in a constant bit-rate system.

(3) Nonuniform coding: representations by logarithms or nonuniform quantizing steps allow coding of large signal values with fewer bits at the expense of greater quantizing errors.

(4) Floating point: floating-point representation may reduce bit requirements at the expense of lost precision. Block-floating-point representation uses one scale factor or exponent for a block of floating-point mantissas, and is commonly used in coding time-domain signals. Floating point is a special case of nonuniform coding.

(5) Bit allocation: the receiver's demand for accuracy may vary with time, signal content, strength, or frequency. For example, lower frequency components of speech are usually more important for comprehension and speaker recognition, and therefore should be transmitted with greater accuracy than higher frequency components. Different criteria apply with respect to music signals. Some general bit-allocation criteria are:

(a) Component variance: more bits are allocated to transform coefficients with the greatest level of AC power.

(b) Component value: more bits are allocated to transform coefficients which represent frequency bands with the greatest amplitude or energy.

(c) Psychoacoustic masking: fewer bits are allocated to signal components whose quantizing errors are masked (rendered inaudible) by other signal components. This method is unique to those applications where audible signals are intended for human perception. Masking is understood best with respect to single-tone signals rather than multiple-tone signals and complex waveforms such as music signals.
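As a rough illustration of criterion (b) under item (5), the sketch below distributes a fixed bit budget among frequency bands according to their levels. It is a generic greedy scheme and is not the allocation method of the present invention. It assumes that each additional bit lowers quantizing noise by about 6.02 dB; the band levels, the bit budget, the per-band limit, and all names are hypothetical.

```python
DB_PER_BIT = 6.02  # assumed noise reduction per additional bit


def allocate_bits(band_levels_db, total_bits, max_bits_per_band=16):
    """Greedily give each bit to the band with the highest remaining level."""
    bits = [0] * len(band_levels_db)
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits_per_band]
        if not candidates:
            break  # every band is already at its limit
        # Remaining level = band level minus what the allocated bits cover.
        neediest = max(candidates,
                       key=lambda i: band_levels_db[i] - DB_PER_BIT * bits[i])
        bits[neediest] += 1
    return bits


# Hypothetical band levels in dB, strongest band first.
levels_db = [60.0, 48.0, 30.0, 12.0]
allocation = allocate_bits(levels_db, total_bits=12)
print("bits per band:", allocation)
```

Bands with greater level receive more bits, and the weakest band may receive none. A masking-based criterion such as (c) would replace the band level in the comparison with a measure of how audible the quantizing error in that band would be.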

The foregoing discussion applies to subband coders implemented with either a true subband filter bank or with a time-domain to frequency-domain transform. Most of the following discussion pertains more particularly to transform-implemented coders in order to simplify the discussion of embodiments of the present invention.

SUMMARY OF THE INVENTION

In accordance with the teachings of the present invention, an encoder provides for the digital encoding of wideband audio information. The wideband audio signals are sampled and quantized into time-domain sample blocks. Each sample block is then modulated by an analysis window. Frequency-domain spectral components are then generated in response to the analysis-window weighted time-domain sample block. A transform coder having adaptive bit allocation nonuniformly quantizes each transform coefficient, and those coefficients are assembled into a digital output having a format suitable for storage or transmission. Error correction codes may be used in applications where the transmitted signal is subject to noise or other corrupting effects of the communication path.

Also in accordance with the teachings of the present invention, a decoder provides for the high-quality reproduction of digital