Speech and music are distinguished by comparing the presence of pauses longer than 32, 60 and 500 ms in the lower part of the input spectrum, and longer than 4 ms in the higher part of the input spectrum.
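The pause features can be sketched as follows. This is a minimal illustration, not the patented method. It assumes NumPy, a hypothetical 1 kHz boundary between the lower and higher parts of the spectrum, 1 ms analysis frames, and a hypothetical silence threshold of -40 dB relative to the loudest frame. The abstract does not say how the pause flags are combined into a final speech/music decision, so the sketch stops at computing them.

```python
import numpy as np

def band_split(x, sr, cutoff_hz=1000.0):
    """Split x into lower and higher bands by zeroing FFT bins (hypothetical 1 kHz cutoff)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    low = np.fft.irfft(np.where(freqs < cutoff_hz, spectrum, 0), n=len(x))
    high = np.fft.irfft(np.where(freqs >= cutoff_hz, spectrum, 0), n=len(x))
    return low, high

def pause_lengths_ms(band, sr, frame_ms=1.0, silence_db=-40.0):
    """Durations (ms) of runs of frames quieter than silence_db relative to the loudest frame."""
    hop = max(1, int(sr * frame_ms / 1000))
    frame_dur_ms = 1000.0 * hop / sr
    n_frames = len(band) // hop
    frames = band[: n_frames * hop].reshape(n_frames, hop)
    energy = np.mean(frames ** 2, axis=1) + 1e-12
    silent = 10 * np.log10(energy / energy.max()) < silence_db
    lengths, run = [], 0
    for is_silent in silent:
        if is_silent:
            run += 1
        elif run:
            lengths.append(run * frame_dur_ms)
            run = 0
    if run:
        lengths.append(run * frame_dur_ms)
    return lengths

def pause_features(x, sr):
    """Presence flags for pauses > 32/60/500 ms (lower band) and > 4 ms (higher band)."""
    low, high = band_split(x, sr)
    low_pauses = pause_lengths_ms(low, sr)
    high_pauses = pause_lengths_ms(high, sr)
    features = {f"low>{t}ms": any(p > t for p in low_pauses) for t in (32, 60, 500)}
    features["high>4ms"] = any(p > 4 for p in high_pauses)
    return features
```

For example, two 200 ms tone bursts separated by 200 ms of silence set `low>32ms`, `low>60ms` and `high>4ms`, but not `low>500ms`.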
A signal processing unit separates the voice signals and non-voice audio signals contained in a mixed audio signal. The mixed audio signal is channel-divided, and the voice signal portions of the channel-divided mixed audio signal are detected and extracted at a first output. The non-voice audio signals contained in the voice signal portions are predicted from the non-voice audio signal portions of the mixed audio signal. These predicted non-voice audio signals are combined with the extracted non-voice audio signals to obtain continuous non-voice audio signals, which are output at a second output. Alternatively, instead of extracting the voice signals from the mixed audio signal, the predicted non-voice signals are removed from the mixed audio signal to obtain the voice signals, which are output at the first output.
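The prediction-and-fill idea can be sketched for a single channel. The abstract does not name a prediction method, so this sketch swaps in least-squares autoregressive (AR) extrapolation from the preceding non-voice samples. It assumes NumPy and a voice mask supplied by some upstream detector; the names `fit_ar`, `extrapolate` and `separate` and the `order`/`max_history` parameters are hypothetical. The voice output follows the "remove the predicted non-voice signal" route.

```python
import numpy as np

def fit_ar(history, order):
    """Least-squares AR coefficients a with x[n] ~= sum_k a[k] * x[n - 1 - k]."""
    rows = np.array([history[n - order:n][::-1] for n in range(order, len(history))])
    coeffs, *_ = np.linalg.lstsq(rows, history[order:], rcond=None)
    return coeffs

def extrapolate(history, order, n_out):
    """Run the fitted AR model forward to predict n_out samples past the history."""
    a = fit_ar(history, order)
    buf = list(history[-order:])           # most recent samples, oldest first
    out = np.empty(n_out)
    for i in range(n_out):
        nxt = float(np.dot(a, buf[::-1]))  # newest sample pairs with a[0]
        out[i] = nxt
        buf = buf[1:] + [nxt]
    return out

def separate(mixed, voice_mask, order=16, max_history=1024):
    """Return (voice, non_voice) for one channel, given a boolean voice mask (assumed)."""
    mixed = np.asarray(mixed, dtype=float)
    non_voice = mixed.copy()               # extracted non-voice outside the mask
    n, total = 0, len(mixed)
    while n < total:
        if not voice_mask[n]:
            n += 1
            continue
        start = n
        while n < total and voice_mask[n]:
            n += 1
        history = non_voice[max(0, start - max_history):start]
        if len(history) > 2 * order:
            # Predicted non-voice fills the voice portion, giving a continuous signal.
            non_voice[start:n] = extrapolate(history, order, n - start)
        else:
            non_voice[start:n] = 0.0       # not enough context to predict
    voice = mixed - non_voice              # remove the predicted non-voice signal
    return voice, non_voice
```

AR extrapolation works well only for stationary, tonal backgrounds over short gaps; a real system would need a more robust predictor for long voice segments.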
Audio signals are compressed by selecting either a first compression mode, which provides high-quality sound at the cost of a relatively long processing time, or a second compression mode with a shorter processing time, according to the content of the input audio signals, for example whether the signals are music or voice signals such as an announcement. Because the compression processing time is selectable, music can be enjoyed in high-quality sound, while an announced message can still be delivered satisfactorily without the loss of any information.
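The mode selection can be sketched as a lookup on a content label. This assumes an upstream classifier that labels the input as `"music"` or `"voice"`; the `CompressionMode` type and the frame sizes (2048 and 256 samples) are hypothetical stand-ins for "long" and "short" processing time, chosen only to show how frame size drives buffering delay.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CompressionMode:
    name: str
    frame_size: int  # hypothetical: longer frames -> better quality but more buffering delay

HIGH_QUALITY = CompressionMode("high_quality", 2048)
LOW_DELAY = CompressionMode("low_delay", 256)

def select_mode(content_type: str) -> CompressionMode:
    """Pick a mode from a content label produced by an upstream classifier (assumed)."""
    if content_type == "music":
        return HIGH_QUALITY
    if content_type == "voice":
        return LOW_DELAY
    raise ValueError(f"unknown content type: {content_type!r}")

def frame_delay_ms(mode: CompressionMode, sample_rate: int) -> float:
    """Buffering delay contributed by one analysis frame."""
    return 1000.0 * mode.frame_size / sample_rate
```

At 16 kHz, this gives 128 ms of frame delay for music and 16 ms for an announcement.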
A signal identifying device that can easily identify an input signal includes a pitch extracting unit (4Y) for extracting a pitch component of the input signal (S1), an energy calculating unit (4X) for calculating an energy component of the input signal, and an identifying unit (4Z) for executing a predetermined operation on the pitch component and the energy component and identifying whether the input signal is a voice signal or a music signal. Compared with a music signal, a voice signal generally shows distinctive energy characteristics and strong periodicity (i.e., a pronounced pitch component).
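The pitch-plus-energy decision can be sketched with two frame-level features. This is a minimal illustration, assuming NumPy, non-overlapping 512-sample frames and a 60 to 400 Hz pitch search range. Pitch strength is taken as the peak normalized autocorrelation of each active frame, a common stand-in for a pitch extractor and not necessarily what unit 4Y does. The "predetermined operation" is replaced by a hypothetical rule: both pitch strength (above 0.5) and frame-energy spread (above 6 dB) must be high for a "voice" verdict.

```python
import numpy as np

def pitch_strength(frame, sr, fmin=60.0, fmax=400.0):
    """Peak normalized autocorrelation over pitch-range lags (about 0 aperiodic, 1 periodic)."""
    best = 0.0
    for lag in range(int(sr / fmax), int(sr / fmin) + 1):
        a, b = frame[:-lag], frame[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0:
            best = max(best, float(np.dot(a, b) / denom))
    return best

def identify(x, sr, frame_len=512, pitch_thresh=0.5, energy_spread_db=6.0):
    """Return "voice" or "music" from pitch and energy features (hypothetical rule)."""
    n = len(x) // frame_len
    frames = x[: n * frame_len].reshape(n, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    if energy.max() == 0:
        raise ValueError("silent input")
    active = energy > 1e-3 * energy.max()   # ignore near-silent frames for pitch
    pitch = np.mean([pitch_strength(f, sr) for f in frames[active]])
    spread = float(np.std(10 * np.log10(energy + 1e-12)))  # voice energy rises and falls
    return "voice" if pitch > pitch_thresh and spread > energy_spread_db else "music"
```

In a toy check, a gated 150 Hz harmonic tone is labeled "voice" and steady white noise is labeled "music". Real music is often strongly periodic too, which is why the energy feature is combined with pitch rather than used alone.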
An apparatus for discriminating a received audio signal as vocal sound or musical sound includes a pre-processing circuit 100 for separating the audio signal into a vocal frequency band signal and a musical frequency band signal; an intermediate decision circuit having a plurality of decision units that produce a plurality of vocal and musical decision signals, each decision unit determining whether the vocal or musical frequency band signal has the properties of voice or of music; and a final decision circuit 600 for systematically analyzing the vocal and musical decision signals to produce a final decision signal that classifies the audio signal as vocal or musical sound.
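The band-split, multi-unit structure can be sketched as follows. This assumes NumPy, a hypothetical vocal band of 300 to 3400 Hz with everything else treated as the musical band, and three hypothetical decision units. The abstract does not describe how circuit 600 "systematically analyzes" the decision signals, so a simple majority vote stands in for it here.

```python
import numpy as np

def band_split(x, sr, lo_hz=300.0, hi_hz=3400.0):
    """Vocal band = [lo_hz, hi_hz); musical band = everything else (hypothetical edges)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    in_vocal = (freqs >= lo_hz) & (freqs < hi_hz)
    vocal = np.fft.irfft(np.where(in_vocal, spectrum, 0), n=len(x))
    musical = np.fft.irfft(np.where(in_vocal, 0, spectrum), n=len(x))
    return vocal, musical

def frame_db_std(band, frame_len=512):
    """Spread (dB) of per-frame energy: bursty signals score high, sustained ones low."""
    n = len(band) // frame_len
    frames = band[: n * frame_len].reshape(n, frame_len)
    return float(np.std(10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)))

def decision_units(vocal, musical):
    """Each hypothetical unit votes True (vocal) or False (musical)."""
    e_vocal, e_musical = np.sum(vocal ** 2), np.sum(musical ** 2)
    return [
        e_vocal / (e_vocal + e_musical + 1e-12) > 0.6,  # energy concentrated in vocal band
        frame_db_std(vocal) > 6.0,                       # vocal band is bursty
        frame_db_std(musical) > 6.0,                     # musical band is not sustained
    ]

def discriminate(x, sr):
    """Final decision: majority vote over the decision units (assumed rule)."""
    votes = decision_units(*band_split(x, sr))
    return "vocal" if sum(votes) > len(votes) / 2 else "musical"
```

Adding a unit is just another entry in the list, which mirrors the abstract's "plurality of decision units" feeding one final decision.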
A sound identification apparatus that reduces the chance of a drop in the identification rate includes: a frame sound feature extraction unit which extracts a sound feature from each frame of an input audio signal; a frame likelihood calculation unit which calculates, for each of a plurality of sound models, a frame likelihood of the sound feature in each frame; a confidence measure judgment unit which judges a confidence measure based on the frame likelihood; a cumulative likelihood output unit time determination unit which determines a cumulative likelihood output unit time based on the confidence measure; a cumulative likelihood calculation unit which calculates, for each sound model, a cumulative likelihood by accumulating the frame likelihoods of the frames within the cumulative likelihood output unit time; a sound type candidate judgment unit which determines, for each cumulative likelihood output unit time, the sound type corresponding to the sound model with the maximum cumulative likelihood; a sound type frequency calculation unit which calculates the frequency of each sound type candidate; and a sound type interval determination unit which determines the sound type of the input audio signal, and the interval of that sound type, based on the frequencies of the sound types.
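The chain from frame likelihoods to sound-type intervals can be sketched as below. This assumes NumPy and a precomputed `(n_frames, n_models)` array of frame log-likelihoods (at least two models). Several rules are hypothetical: the confidence measure is the margin between the best and second-best model on the first frame of each unit, confident units are 5 frames long and others 20, and the frequency-based decision takes the most common candidate in a 3-candidate sliding window before merging runs into intervals.

```python
from collections import Counter
import numpy as np

def unit_length(frame_ll, short=5, long=20, margin_thresh=2.0):
    """Hypothetical rule: a large best-vs-second margin (high confidence) -> short unit time."""
    second, best = np.sort(frame_ll)[-2:]
    return short if best - second > margin_thresh else long

def candidates(loglik):
    """Accumulate log-likelihoods per unit time; return (start, end, best_model) per unit."""
    out, start, n = [], 0, len(loglik)
    while start < n:
        end = min(start + unit_length(loglik[start]), n)
        cumulative = loglik[start:end].sum(axis=0)
        out.append((start, end, int(np.argmax(cumulative))))
        start = end
    return out

def intervals(cands, window=3):
    """Most frequent candidate in a sliding window, then merge equal runs into intervals."""
    labels = [label for _, _, label in cands]
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - window // 2), min(len(labels), i + window // 2 + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    merged = []
    for (start, end, _), label in zip(cands, smoothed):
        if merged and merged[-1][2] == label:
            merged[-1] = (merged[-1][0], end, label)
        else:
            merged.append((start, end, label))
    return merged
```

With 60 frames where model 0 wins everywhere except a 5-frame burst favoring model 1, the isolated model-1 candidate is outvoted and the result is a single interval `(0, 60, 0)`.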