An efficient and accurate method for classifying speech and music signals, or other diverse signal types, is provided. The method and the corresponding system are especially, although not exclusively, suited for use in real-time applications. Long-term and short-term features are extracted relative to each frame. The short-term features are used to detect a potential switching point at which to switch a coder operating mode, and the long-term features are used to classify each frame and to validate the potential switch at that switching point according to the classification and a predefined criterion.
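The two-stage decision described above can be illustrated with a minimal sketch. The abstract does not name the features or the criterion, so everything specific here is an assumption made for illustration: one scalar short-term and one scalar long-term feature per frame, a jump threshold `jump` for detecting a potential switching point, and a running mean compared against `threshold` as the predefined criterion. The function name `classify_frames` is hypothetical.

```python
from collections import deque


def classify_frames(short_feats, long_feats, jump=0.5, threshold=0.0, history=4):
    """Return the coder operating mode ('speech' or 'music') chosen per frame.

    short_feats, long_feats: one scalar feature value per frame (assumed inputs).
    jump: assumed size of short-term change that marks a potential switch point.
    threshold, history: assumed criterion, a running mean over `history` frames.
    """
    mode = "speech"                  # assumed initial operating mode
    modes = []
    recent = deque(maxlen=history)   # long-term feature history
    prev_short = None
    for s, l in zip(short_feats, long_feats):
        recent.append(l)
        # Short-term stage: an abrupt change marks a potential switching point.
        candidate = prev_short is not None and abs(s - prev_short) > jump
        prev_short = s
        # Long-term stage: classify the frame from the averaged long-term feature.
        label = "music" if sum(recent) / len(recent) > threshold else "speech"
        # The switch is carried out only if the classification validates it.
        if candidate and label != mode:
            mode = label
        modes.append(mode)
    return modes
```

In this sketch a mode change needs both stages to agree: a long-term classification of "music" without a short-term switching point leaves the mode unchanged, and so does a switching point that the long-term classification does not confirm.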
The present invention relates to a method and system for distinguishing speech from music in a digital audio signal in real time. The method operates on sound segments that a segmentation unit of a digital sound processing system has segmented from an input signal on the basis of the homogeneity of their properties, and comprises the steps of: (a) framing the input signal into a sequence of overlapping frames using a windowing function; (b) calculating a frame spectrum for every frame by FFT; (c) calculating a segment harmony measure on the basis of the frame spectrum sequence; (d) calculating a segment noise measure on the basis of the frame spectrum sequence; (e) calculating a segment tail measure on the basis of the frame spectrum sequence; (f) calculating a segment drag out measure on the basis of the frame spectrum sequence; (g) calculating a segment rhythm measure on the basis of the frame spectrum sequence; and (h) making the distinguishing decision based on the calculated characteristics.
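Steps (a) and (b) can be sketched as follows. This is an illustration only: the abstract does not specify the window type, frame length, or hop size, so the Hann window, `frame_len=8`, and `hop=4` are assumptions, and the function names are hypothetical. The FFT is a plain recursive radix-2 implementation that requires a power-of-two frame length. The segment measures of steps (c) to (g) are not defined in the abstract and are not attempted here.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def frame_spectra(signal, frame_len=8, hop=4):
    """Step (a): overlapping windowed frames. Step (b): magnitude spectrum per frame."""
    window = hann(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        # Keep only the non-negative frequency bins of the real-input spectrum.
        spectra.append([abs(c) for c in fft(frame)[: frame_len // 2 + 1]])
    return spectra
```

With `hop` smaller than `frame_len`, consecutive frames share samples, which is what makes the frames overlap. The resulting list of spectra is the frame spectrum sequence on which the later measures would be computed.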
The invention relates to a method for supporting an encoding of an audio signal, wherein a first coder mode and a second coder mode are available for encoding a respective section of the audio signal. The second coder mode enables coding of a respective section based on a first coding model, which requires only information from the section itself, and based on a second coding model, which additionally requires an overlap signal with information from a preceding section. After a switch from the first coder mode to the second coder mode, the first coding model is always used for encoding the first section of the audio signal. This section can then be employed to generate an artificial overlap signal for a subsequent section, which may then be encoded with the second coding model.
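The switching rule can be sketched as a small selection function. The labels `"first"`, `"second"`, `"self_contained"`, and `"overlap"`, and the function name `choose_models`, are hypothetical names introduced for this illustration; the abstract does not name the modes or models. The sketch also assumes that a signal starting directly in the second coder mode is treated like a switch, since no preceding section exists to supply an overlap signal.

```python
def choose_models(sections):
    """Pick the coding model actually used for each section.

    sections: list of (mode, preferred_model) pairs, where mode is 'first' or
    'second' and preferred_model is 'self_contained' or 'overlap'
    (ignored for first-mode sections).
    Returns one entry per section: the model used, or None in the first mode.
    """
    used = []
    prev_mode = None
    for mode, preferred in sections:
        if mode != "second":
            # First coder mode: the two coding models do not apply.
            used.append(None)
        elif prev_mode != "second":
            # First section after the switch: no overlap signal exists yet,
            # so the self-contained model is forced.
            used.append("self_contained")
        else:
            # Later sections can use either model, because an overlap signal
            # can be generated from the preceding second-mode section.
            used.append(preferred)
        prev_mode = mode
    return used
```

For example, `choose_models([("first", None), ("second", "overlap"), ("second", "overlap")])` overrides the preference only for the section right after the switch and honours it for the one that follows.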