A digital acoustic signal coding apparatus, a method of coding a digital acoustic signal, and a recording medium storing a program for coding a digital acoustic signal are provided. The method and apparatus take into account differences in the sampling frequency of the input acoustic signal. As a result, short blocks can be grouped appropriately without degrading sound quality, and the choice between long and short blocks can be judged reliably. The coding apparatus comprises four parts. A calculation means calculates the sensation (perceptual) entropy of the input acoustic signal for each short block. A summation means obtains the total sensation entropy within a frame. A comparison means compares the absolute value of the difference between the sensation-entropy totals of two successive frames with a predetermined threshold. A long/short block judgment means uses the comparison result to judge whether a long block or short blocks should be used to transform a block of the input acoustic signal.
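The judgment step above can be sketched as follows. This is a minimal sketch, not the patented implementation. It assumes the per-short-block sensation entropy values are already computed and passed in, so the entropy calculation itself is not shown. It also assumes that a difference above the threshold selects short blocks and that the first frame defaults to a long block. The function names are hypothetical.

```python
def frame_pe_totals(pe_per_short_block):
    # Sum the per-short-block sensation (perceptual) entropy values within each frame.
    return [sum(frame) for frame in pe_per_short_block]


def judge_blocks(pe_per_short_block, threshold):
    """Return "long" or "short" for each frame (hypothetical helper).

    pe_per_short_block: list of frames, each a list of entropy values, one per short block.
    """
    totals = frame_pe_totals(pe_per_short_block)
    # Assumption: the first frame has no predecessor to compare with, so default to a long block.
    decisions = ["long"]
    for prev, cur in zip(totals, totals[1:]):
        # Assumption: a large jump in total entropy between successive frames indicates
        # a transient, so short blocks are chosen for the current frame.
        decisions.append("short" if abs(cur - prev) > threshold else "long")
    return decisions


if __name__ == "__main__":
    frames = [[1.0, 1.0], [1.2, 0.9], [10.0, 12.0], [11.0, 11.5]]
    print(judge_blocks(frames, threshold=5.0))
```

In practice the threshold would need tuning, for example per sampling frequency. The abstract does not give its value.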
Analysis and synthesis filter banks such as those used in audio and video coding systems are each implemented by a hybrid transform that comprises a primary transform in cascade with one or more secondary transforms. The primary transforms for the filter banks implement an analysis/synthesis system in which time-domain aliasing artifacts are cancelled. The secondary transforms, which are in cascade with the primary transforms, are applied to blocks of transform coefficients. The length of the blocks is varied to adapt the time resolution of the analysis and synthesis filter banks.
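A hybrid transform of this kind can be illustrated with a small sketch. It makes several assumptions. The primary transform is taken to be a sine-windowed MDCT, a standard transform whose overlap-add cancels time-domain aliasing. The secondary transform is taken to be an orthonormal DCT-II applied to contiguous runs of MDCT coefficients within each block, with the run length chosen per block. The filter banks described above may group coefficients differently. All function names and the DCT-II choice are assumptions for illustration. The idea is that merging neighbouring coefficients trades frequency resolution for time resolution, and because both stages are invertible, synthesis recovers the input exactly.

```python
import numpy as np


def sine_window(length):
    # Satisfies the Princen-Bradley condition w[n]^2 + w[n + N]^2 = 1 for length = 2N.
    n = np.arange(length)
    return np.sin(np.pi * (n + 0.5) / length)


def mdct_matrix(N):
    # MDCT basis: N coefficients from 2N windowed samples, shape (N, 2N).
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def primary_analysis(x, N):
    # Windowed MDCT with hop N; pad N zeros at each end so every sample lies in two frames.
    xp = np.concatenate([np.zeros(N), x, np.zeros(N)])
    C, w = mdct_matrix(N), sine_window(2 * N)
    n_frames = len(xp) // N - 1
    return np.array([C @ (w * xp[i * N:i * N + 2 * N]) for i in range(n_frames)])


def primary_synthesis(X, N):
    # Inverse MDCT (scale 2/N), synthesis window, overlap-add: aliasing cancels between frames.
    C, w = mdct_matrix(N), sine_window(2 * N)
    out = np.zeros((len(X) + 1) * N)
    for i, coeffs in enumerate(X):
        out[i * N:i * N + 2 * N] += w * (2.0 / N) * (C.T @ coeffs)
    return out[N:-N]  # drop the zero padding


def dct2_matrix(L):
    # Orthonormal DCT-II, so its inverse is its transpose.
    n = np.arange(L)
    k = np.arange(L)[:, None]
    D = np.sqrt(2.0 / L) * np.cos(np.pi * (n + 0.5) * k / L)
    D[0] /= np.sqrt(2.0)
    return D


def secondary_forward(coeffs, L):
    assert len(coeffs) % L == 0, "block length must divide the number of coefficients"
    D = dct2_matrix(L)
    return np.concatenate([D @ coeffs[j:j + L] for j in range(0, len(coeffs), L)])


def secondary_inverse(coeffs, L):
    D = dct2_matrix(L)
    return np.concatenate([D.T @ coeffs[j:j + L] for j in range(0, len(coeffs), L)])


def hybrid_analysis(x, N, block_lengths):
    # Primary MDCT followed by a secondary DCT whose block length can change per frame.
    X = primary_analysis(x, N)
    return np.array([secondary_forward(f, L) for f, L in zip(X, block_lengths)])


def hybrid_synthesis(Y, N, block_lengths):
    X = np.array([secondary_inverse(f, L) for f, L in zip(Y, block_lengths)])
    return primary_synthesis(X, N)
```

A block length of 1 leaves a frame's MDCT coefficients untouched. Longer blocks combine more neighbouring coefficients. The input length is assumed to be a multiple of `N`, and `block_lengths` needs one entry per MDCT frame.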
In various embodiments of the present invention, a denoiser for noisy signals is tuned and optimized by selecting denoiser parameters that yield highly compressible denoiser output. When the original signal can be compared with the denoiser output, the denoiser can be accurately tuned so that the denoised signal resembles, as closely as possible, the clean signal originally transmitted through a noise-introducing channel. However, when the clean signal is not available, as in many communications applications, other methods are needed. By adjusting the parameters so that the denoised signal is globally or locally maximally compressible, the denoiser can be optimized even though the original, clean signal is inaccessible.
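Compressibility-based tuning can be sketched for binary data. The sketch assumes a simple majority-vote window filter as the denoiser and uses the zlib-compressed length of the output as the compressibility measure. Both are stand-ins chosen for illustration, not the denoiser or measure of the embodiments, and the function names are hypothetical. The clean signal is never consulted when choosing the parameter. An unbounded search for maximal compressibility would favor extreme smoothing, which in the limit produces a constant output, so the sketch only searches a small candidate range.

```python
import random
import zlib


def majority_denoise(bits, window):
    """Hypothetical denoiser: each bit becomes the majority value of a centered window."""
    half = window // 2
    out = []
    for i in range(len(bits)):
        seg = bits[max(0, i - half):i + half + 1]
        out.append(1 if 2 * sum(seg) > len(seg) else 0)
    return out


def compressed_size(bits):
    # zlib-compressed length as a stand-in compressibility measure (smaller = more compressible).
    return len(zlib.compress(bytes(bits), 9))


def tune_window(noisy, candidates=(1, 3, 5, 7)):
    # Choose the parameter whose output compresses best, using only the noisy signal.
    return min(candidates, key=lambda w: compressed_size(majority_denoise(noisy, w)))


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [(i // 50) % 2 for i in range(2000)]        # long runs of 0s and 1s
    noisy = [b ^ (rng.random() < 0.05) for b in clean]  # binary symmetric channel, p = 0.05
    print("chosen window:", tune_window(noisy))
```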
In connection with a classification system that merges perceptual and digital signal processing classification techniques for improved classification of media entities, a system and methods are provided for automatically classifying and characterizing the sonic properties of media entities. Such a system and methods may be useful for indexing a database or other storage collection of media entities, such as media entities that are audio files or that contain audio portions. The methods also help identify media entities with similar sonic properties by using classification chain techniques that test the distances between media entities in terms of their properties. For example, a neighborhood of songs may be determined within which every song has similar sonic properties.
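The neighborhood idea can be illustrated with a distance test over feature vectors. This is a minimal sketch that assumes each media entity is already described by a numeric vector of sonic properties, normalized to comparable scales, and that closeness is measured as Euclidean distance within a fixed radius. The classification chain techniques themselves are not reproduced. The feature values and names are hypothetical.

```python
import math


def neighborhood(features, seed, radius):
    """Return the names of entities whose feature vectors lie within `radius` of `seed`'s vector."""
    center = features[seed]
    return sorted(
        name
        for name, vec in features.items()
        if name != seed and math.dist(center, vec) <= radius
    )


if __name__ == "__main__":
    # Hypothetical normalized features: (brightness, noisiness, tempo).
    songs = {
        "song_a": (0.80, 0.20, 0.55),
        "song_b": (0.75, 0.25, 0.60),
        "song_c": (0.10, 0.90, 0.20),
    }
    print(neighborhood(songs, "song_a", radius=0.2))
```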
Short windows of a first type and short windows of a second type are identified within a frame using the energy associated with each short window in the frame. The short windows are then sorted into two preliminary groups according to their window type. If the number of short windows in either preliminary group exceeds a threshold number, the short windows in that group are further divided into at least two groups.
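The grouping steps can be sketched as below. The sketch assumes eight short windows per frame, as in an AAC eight-short-window sequence. It also uses a hypothetical type rule: a window is of the first type if its energy exceeds `ratio` times the frame's mean short-window energy, and of the second type otherwise. An oversized group is split into contiguous chunks of at most `max_group` windows. The split rule, parameter names, and defaults are assumptions, not taken from the source.

```python
def group_short_windows(energies, ratio=2.0, max_group=4):
    """Group short-window indices by a hypothetical energy-based type rule."""
    if not energies:
        return []
    mean = sum(energies) / len(energies)
    # Assumed type rule: high-energy (transient-like) windows are the first type.
    first = [i for i, e in enumerate(energies) if e > ratio * mean]
    second = [i for i, e in enumerate(energies) if e <= ratio * mean]
    groups = []
    for group in (first, second):
        if not group:
            continue
        if len(group) > max_group:
            # Too large: split into contiguous chunks, which gives at least two groups.
            groups.extend(group[j:j + max_group] for j in range(0, len(group), max_group))
        else:
            groups.append(group)
    return groups


if __name__ == "__main__":
    print(group_short_windows([1, 1, 1, 1, 1, 1, 1, 50]))
```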
Blocks of audio are encoded based upon corresponding first and second frequencies. The first and second frequencies are hopped from block to block. An audio quality measure (AQM) is computed for each block of audio such that, if x out of y blocks of audio have an AQM greater than a first predetermined threshold, encoding is suspended. For example, x may be nine and y may be 16. Also, if a ratio of the energy in a front part of a block of audio to the energy in a rear part of the block of audio is greater than a second predetermined threshold, that block of audio is not encoded even though x out of y blocks of audio have an AQM greater than the first predetermined threshold. Multiple distributors of the audio may encode the audio with their corresponding identities using the above processes.
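The per-block encode/suspend decision can be sketched as follows. This sketch makes several assumptions. The "x out of y" test is applied to the most recent y blocks, including the current one. The front/rear energy-ratio rule is modeled as an independent veto on the current block. The frequency hop uses a hypothetical table of (first, second) frequency pairs cycled block by block. The AQM computation is not shown, so AQM values and energy ratios are passed in. All names and thresholds are hypothetical.

```python
def hop_frequencies(block_index, table):
    """Hypothetical hop table: cycle through (first, second) frequency pairs, one pair per block."""
    return table[block_index % len(table)]


def encode_decisions(aqm, front_rear_ratio, t1, t2, x=9, y=16):
    """Return True for each block that should be encoded."""
    decisions = []
    for i in range(len(aqm)):
        window = aqm[max(0, i - y + 1):i + 1]  # assumed: the most recent y blocks
        suspended = sum(1 for q in window if q > t1) >= x
        vetoed = front_rear_ratio[i] > t2     # front-to-rear energy ratio test
        decisions.append(not (suspended or vetoed))
    return decisions


if __name__ == "__main__":
    table = [(1000.0, 1200.0), (1100.0, 1300.0), (1050.0, 1250.0)]
    print([hop_frequencies(i, table) for i in range(4)])
    print(encode_decisions([10] * 12, [0.5] * 12, t1=5, t2=3))
```

With every AQM above `t1`, the first eight blocks are still encoded. Suspension starts at the ninth block, the first point at which nine of the window's blocks exceed the threshold.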