A subband audio coder employs perfect/non-perfect reconstruction filters, predictive/non-predictive subband encoding, transient analysis, and psychoacoustic/minimum mean-square-error (MMSE) bit allocation over time, frequency and the multiple audio channels to encode and decode a data stream and generate high-fidelity reconstructed audio. The audio coder windows the multi-channel audio signal so that the frame size, i.e. the number of bytes, is constrained to lie in a desired range. It also formats the encoded data so that the individual subframes can be played back as they are received, thereby reducing latency. Furthermore, the audio coder processes the baseband portion (0-24 kHz) of the audio bandwidth for sampling frequencies of 48 kHz and higher with the same encoding/decoding algorithm, so that the audio coder architecture is future compatible.
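The following sketch illustrates what "perfect reconstruction" means for a subband filter bank. It uses the two-band Haar filter bank, the simplest perfect-reconstruction pair. This is an assumption chosen for clarity, not the coder's actual filters. The function names `analysis` and `synthesis` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Haar filter-bank gain; an illustrative choice, not the coder's real filters.
INV_SQRT2 = 1.0 / math.sqrt(2.0)

def analysis(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split x into critically sampled low and high subbands (hypothetical helper)."""
    if len(x) % 2:
        raise ValueError("even-length input required")
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) * INV_SQRT2 for k in range(half)]
    high = [(x[2 * k] - x[2 * k + 1]) * INV_SQRT2 for k in range(half)]
    return low, high

def synthesis(low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Recombine the two subbands; with Haar filters this inverts analysis() exactly."""
    y: List[float] = []
    for l, h in zip(low, high):
        y.append((l + h) * INV_SQRT2)  # recovers x[2k]
        y.append((l - h) * INV_SQRT2)  # recovers x[2k+1]
    return y
```

Quantizing `low` and `high` before synthesis would make the reconstruction approximate. The filter bank itself introduces no error.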
RELATED APPLICATION
This application is a divisional of application Ser. No. 08/642,254 filed May 2, 1996 entitled MULTI-CHANNEL PREDICTIVE SUBBAND AUDIO CODER USING PSYCHOACOUSTIC ADAPTIVE BIT ALLOCATION IN FREQUENCY, TIME AND OVER THE MULTIPLE CHANNELS, which is hereby incorporated by reference and which is itself a continuation-in-part of provisional application Ser. No. 60/007,896 filed Dec. 1, 1995.
A multiple description (MD) joint source-channel (JSC) encoder in accordance with the invention encodes n components of an audio signal for transmission over m channels of a communication medium, where n and m may take on any desired values. In an illustrative embodiment, the encoder combines a multiple description transform coder with elements of a perceptual audio coder (PAC). The encoder is configured to select one or more transform parameters for a multiple description transform, based on a characteristic of the audio signal to be encoded. For example, the transform parameters may be selected such that the resulting transformed coefficients have a variance distribution of a type expected by a subsequent entropy coding operation. The components of the audio signal may be quantized coefficients separated into a number of factor bands, and the transform parameter for a given factor band may be set to a value determined based on a transform parameter from at least one other factor band, e.g., the previous factor band. As another example, the transform parameter for one or more of the factor bands may be selected based on a determination as to whether the audio signal to be encoded is of a particular predetermined type. A desired variance distribution may also be obtained for the transformed coefficients by, e.g., pairing or otherwise grouping coefficients such that the coefficients of each pair or group are required to be in the same factor band.
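The sketch below shows two of the ideas above. First, coefficients are paired so that both members of each pair lie in the same factor band. Second, a band with no explicitly chosen transform parameter reuses the previous band's value. Several elements are assumptions not stated in the abstract:

- The 2x2 correlating transform T = [[a, 1/(2a)], [-a, 1/(2a)]] (determinant 1), one form used in the multiple-description literature.
- The real-valued arithmetic.
- All function names.

```python
from typing import Dict, List, Sequence, Tuple

def correlating_transform(x1: float, x2: float, a: float) -> Tuple[float, float]:
    """Assumed pairwise transform T = [[a, 1/(2a)], [-a, 1/(2a)]]; det(T) = 1."""
    b = 1.0 / (2.0 * a)
    return a * x1 + b * x2, -a * x1 + b * x2

def pair_within_bands(band_edges: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (band, i, j) pairs; band k covers indices band_edges[k]..band_edges[k+1]-1.
    A leftover coefficient in an odd-length band is left unpaired in this sketch."""
    pairs = []
    for band in range(len(band_edges) - 1):
        start, stop = band_edges[band], band_edges[band + 1]
        for i in range(start, stop - 1, 2):
            pairs.append((band, i, i + 1))
    return pairs

def md_encode(coeffs: Sequence[float], band_edges: Sequence[int],
              band_params: Dict[int, float], default_a: float = 1.0):
    """Produce two descriptions; a band missing from band_params inherits the previous band's parameter."""
    a_for_band: Dict[int, float] = {}
    a_prev = default_a
    for band in range(len(band_edges) - 1):
        a_prev = band_params.get(band, a_prev)
        a_for_band[band] = a_prev
    desc1: List[float] = []
    desc2: List[float] = []
    for band, i, j in pair_within_bands(band_edges):
        y1, y2 = correlating_transform(coeffs[i], coeffs[j], a_for_band[band])
        desc1.append(y1)  # sent on channel 1
        desc2.append(y2)  # sent on channel 2
    return desc1, desc2, a_for_band
```

Because det(T) = 1, each pair is exactly invertible when both descriptions arrive. If only one description arrives, the correlation that T introduces lets the decoder estimate the missing coefficient.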
A multi-channel linear predictive analysis-by-synthesis signal encoding method detects (S26, S27) inter-channel correlation and selects one of several possible encoding modes (S24, S29, S30) based on the detected correlation.
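Correlation-driven mode selection can be sketched for a stereo pair as follows. Several details are hypothetical and are not the mapping of steps S24 to S30:

- The correlation measure, here the magnitude of the normalized cross-correlation at zero lag.
- The two thresholds.
- The mode names `"joint"`, `"partially_joint"` and `"independent"`.

```python
import math
from typing import Sequence

def normalized_cross_correlation(left: Sequence[float], right: Sequence[float]) -> float:
    """Zero-lag normalized cross-correlation; returns 0.0 if either channel is silent."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den > 0 else 0.0

def select_mode(left: Sequence[float], right: Sequence[float],
                high: float = 0.9, low: float = 0.3) -> str:
    """Pick a hypothetical encoding mode from the detected inter-channel correlation."""
    rho = abs(normalized_cross_correlation(left, right))
    if rho >= high:
        return "joint"            # strongly correlated: code one channel, predict the other
    if rho >= low:
        return "partially_joint"  # moderate correlation: share some parameters
    return "independent"          # weak correlation: code each channel separately
```

A real analysis-by-synthesis coder would likely also search over inter-channel lags and smooth the decision over frames to avoid audible mode switching.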
A nonlinear operation method suitable for audio encoding/decoding, and hardware applying it. In audio encoding, the nonlinear operation occurs in the quantization process and is given by $f(X) = X^{3/4}$, where $X$ represents the frequency-domain data. The method comprises the following steps. First, a query table is built that stores the frequency-domain data $X$ and the corresponding values $f(X)$; the table is represented as a function $T(X) = X^{3/4}$, $1 \le X \le S$, where $S$ is the data range covered by the table. Next, a modified error quantity function $f_a(z)$ is analyzed and provided, represented by a power-of-2 equation [EQU1] with $n = 1$, $2$ or $3$, such that $z$ falls within the data range $S$. When the frequency-domain data $X$ to be queried exceeds the data range $S$, the values $T(z)$ and $T(z+1)$ are obtained from the query table and defined as $Y_1$ and $Y_2$, respectively. The value of $f(X)$ for any frequency-domain data $X$ outside the data range $S$ is then calculated by the two-phase interpolation method.
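A table-plus-interpolation evaluation of $X^{3/4}$ can be sketched as follows. The abstract does not reproduce equation EQU1, so this sketch assumes its own range reduction: $z = X / 2^{4n}$, which gives $X^{3/4} = 2^{3n} z^{3/4}$ exactly. Under that assumption, with $n \le 3$, the result is linear interpolation between $Y_1 = T(\lfloor z \rfloor)$ and $Y_2 = T(\lfloor z \rfloor + 1)$. The table size `S = 1024` and the name `pow34` are hypothetical.

```python
S = 1024  # hypothetical query-table range

# TABLE[x] = x ** 0.75 for integer x in [0, S]; entry 0 keeps indexing simple.
TABLE = [x ** 0.75 for x in range(S + 1)]

def pow34(x: float) -> float:
    """Approximate x ** 0.75 using the table, assumed 16**n range reduction, and linear interpolation."""
    if x < 0:
        raise ValueError("negative input")
    if x <= S and float(x).is_integer():
        return TABLE[int(x)]          # direct table hit
    n, z = 0, float(x)
    while z > S:
        if n == 3:
            raise ValueError("input beyond assumed range 16**3 * S")
        z /= 16.0                     # (16**n) ** 0.75 == 8**n, an exact power of 2
        n += 1
    i = int(z)
    frac = z - i
    y1 = TABLE[i]                     # Y1 = T(z) with z rounded down
    y2 = TABLE[min(i + 1, S)]         # Y2 = T(z + 1), clamped to the table end
    return (8 ** n) * (y1 + frac * (y2 - y1))
```

Choosing 16 as the reduction base keeps the correction factor an exact power of 2 ($8^n$), which hardware can apply as a shift. That fits the abstract's "power of 2" description, though the patent's exact formula is not shown.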
A data-compressed audio waveform is temporally modified without requiring complete decompression of the audio signal. Packets of compressed audio data are first unpacked, to remove scaling that was applied in the formation of the packets. The unpacked data is then temporally modified, using one of a number of different approaches. This modification takes place while the audio information remains in a data-compressed format. New packets are then assembled from the modified data, to produce a data-compressed output stream that can be subsequently processed in a conventional manner to reproduce the desired sound. The assembly of the new packets employs a technique for inferring an auditory model from the original packets, to requantize the data in the output packets.
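The unpack, modify and repack flow can be sketched with a hypothetical packet format. Each packet holds one scale factor, a word length and signed integer codes. The temporal modification shown is crude frame-level time scaling by dropping or repeating packets. As a stand-in for the inferred auditory model, the sketch simply reuses each source packet's word length when requantizing. None of these names or formats come from the source.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Packet:
    scale: float      # peak magnitude for this packet (hypothetical format)
    bits: int         # word length; reused as the "inferred" allocation
    codes: List[int]  # signed quantized samples in [-(2**(bits-1) - 1), 2**(bits-1) - 1]

def unpack(p: Packet) -> List[float]:
    """Remove the packet scaling, yielding still-quantized sample values."""
    if p.bits < 2:
        raise ValueError("need at least 2 bits")
    q_max = 2 ** (p.bits - 1) - 1
    return [c * p.scale / q_max for c in p.codes]

def repack(samples: List[float], bits: int) -> Packet:
    """Requantize samples into a new packet with the given word length."""
    if bits < 2:
        raise ValueError("need at least 2 bits")
    q_max = 2 ** (bits - 1) - 1
    peak = max((abs(s) for s in samples), default=0.0)
    scale = peak if peak > 0 else 1.0
    step = scale / q_max
    codes = [max(-q_max, min(q_max, round(s / step))) for s in samples]
    return Packet(scale, bits, codes)

def time_scale(packets: List[Packet], rate: float) -> List[Packet]:
    """Speed up (rate > 1) or slow down (rate < 1) by dropping or repeating unpacked frames."""
    n_out = int(len(packets) / rate)
    out = []
    for k in range(n_out):
        src = packets[min(int(k * rate), len(packets) - 1)]
        out.append(repack(unpack(src), src.bits))
    return out
```

Dropping or repeating whole frames causes audible discontinuities. A practical system would use a smoother method, such as overlap-add in the subband domain.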
Disclosed is an audio encoding apparatus for splitting an audio signal into a plurality of bands, allocating a number of quantization bits to each band, and transmitting the audio signal of each band after quantizing it with the allocated number of bits. The apparatus includes a bit allocation unit for (1) calculating an MNR for each band, where MNR is the ratio of an audio masking level M to a quantization noise level N, (2) comparing a set MNR with the smallest MNR from among the MNRs of the respective bands, (3) incrementing the number of quantization bits of the band that corresponds to the smallest MNR if the smallest MNR is smaller than the set MNR, and (4) performing allocation control for allocating quantization bits to each band until the smallest MNR becomes equal to or greater than the set MNR. A quantizer quantizes the audio signal of each band with the allocated number of quantization bits, and a bit-rate calculation unit decides a bit rate for transmission of audio data based upon the number of quantization bits allocated to each band.
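The greedy allocation loop in steps (1) to (4) can be sketched in a few lines. The sketch makes several assumptions not stated in the abstract:

- The noise model is that each quantization bit adds about 6.02 dB of SNR, so $\mathrm{MNR_{dB}} = 6.02\,b - \mathrm{SMR_{dB}}$.
- A per-band cap `max_bits` guarantees termination.
- The bit rate counts only sample bits (no side information), using hypothetical `samples_per_band` and `frames_per_second` parameters.

```python
import math
from typing import List, Sequence

DB_PER_BIT = 20 * math.log10(2)  # ~6.02 dB SNR gain per bit (assumed noise model)

def mnr_db(smr_db: float, bits: int) -> float:
    """MNR in dB for a band with the given signal-to-mask ratio and bit count."""
    return bits * DB_PER_BIT - smr_db

def allocate_bits(smr_db: Sequence[float], target_mnr_db: float, max_bits: int = 15) -> List[int]:
    """Repeatedly give one bit to the band with the smallest MNR until all reach the target."""
    bits = [0] * len(smr_db)
    while True:
        # Bands already at max_bits are excluded so the loop always terminates.
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=lambda i: mnr_db(smr_db[i], bits[i]))
        if mnr_db(smr_db[worst], bits[worst]) >= target_mnr_db:
            break  # smallest MNR already meets the set MNR
        bits[worst] += 1
    return bits

def bit_rate(bits: Sequence[int], samples_per_band: int, frames_per_second: float) -> float:
    """Bits per second for the sample data alone (hypothetical framing parameters)."""
    return sum(bits) * samples_per_band * frames_per_second
```

For example, `allocate_bits([10.0, 20.0, 0.0], 0.0)` gives `[2, 4, 0]`: two bits raise band 0 to about 12.04 dB of SNR, and four bits raise band 1 to about 24.08 dB.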