References

Other References
S.C. Douglas, A. Cichocki, S. Amari, "Self-Whitening Algorithms for Adaptive Equalization and Deconvolution", IEEE Transactions on Signal Processing, EDICS Category No. SP 1.6.5.
S. Amari, S.C. Douglas, A. Cichocki, H.H. Yang, "Multichannel Blind Deconvolution and Equalization Using the Natural Gradient", Proc. 1st IEEE Workshop on Signal Processing Appl Wireless Comm., Paris, France 1997.
S.C. Douglas, A. Cichocki, S. Amari, "Quasi-Newton Filtered-Error and Filtered-Regressor Algorithms for Adaptive Equalization and Deconvolution", IEEE Workshop Signal Proc. Adv. Wireless Comm., Paris, France, Apr. 1997.
Hoang-Lan Nguyen Thi, C. Jutten, "Blind Source Separation for Convolutive Mixtures", Signal Processing 45 (1995) 209-229.
Te-Won Lee, A.J. Bell, R.H. Lambert, "Blind Separation of Delayed and Convolved Sources", Advances in Neural Information Processing Systems, 1996, 1997.
E. Weinstein, M. Feder, A.V. Oppenheim, "Multi-Channel Signal Separation by Decorrelation", IEEE Transactions on Speech and Audio Processing, vol. 1, No. 4, Oct. 1993.
H. Sahlin, H. Broman, "Separation of real-world signals", Signal Processing 64 (1998) 103-113.
C. Simon, G. d'Urso, C. Vignat, Ph. Loubaton, C. Jutten, "On the Convolutive Mixture Source Separation by the Decorrelation Approach", ICASSP, 1998.
U. Lindgren, T. Wigren, H. Broman, "On Local Convergence of a Class of Blind Separation Algorithms", IEEE Transactions on Signal Processing, vol. 43, No. 12, Dec. 1995, pp. 3054-3058.
S.V. Gerven, D.V. Compernolle, "Signal Separation by Symmetric Adaptive Decorrelation: Stability, Convergence, and Uniqueness", IEEE Transactions on Signal Processing, vol. 43, No. 7, Jul. 1995.
F. Ehlers, H.G. Schuster, "Blind Separation of Convolutive Mixtures and an Application in Automatic Speech Recognition in a Noisy Environment", IEEE Transactions on Signal Processing, vol. 45, No. 10, Oct. 10, 1997.
R.H. Lambert, A.J. Bell, "Blind Separation of Multiple Speakers in a Multipath Environment", Copyright 1997 IEEE, pp. 423-426.
S. Amari, S.C. Douglas, A. Cichocki, H.H. Yang, "Novel on-Line Adaptive Learning Algorithms for Blind Deconvolution Using the Natural Gradient Approach", Unknown publication and publication date.

Description

The invention relates to signal processing and, more particularly, to a method and apparatus for performing signal separation using a multiple decorrelation technique.
BACKGROUND OF THE INVENTION
A growing number of researchers have recently published techniques that perform blind source separation (BSS), i.e., separating a composite signal into its constituent component signals without a priori knowledge of those signals. These
techniques find use in various applications such as speech detection using multiple microphones, crosstalk removal in multichannel communications, multipath channel identification and equalization, direction of arrival (DOA) estimation in sensor arrays,
improvement of beam forming microphones for audio and passive sonar, and discovery of independent source signals in various biological signals, such as EEG, MEG and the like. Many of the BSS techniques require (or assume) a statistical independence
between the component signals to accurately separate the signals. Additional theoretical progress in signal modeling has generated new techniques that address the problem of identifying statistically independent signals--a problem that lies at the heart
of source separation.
The basic source separation problem is simply described by assuming $d_s$ statistically independent sources $s(t)=[s_1(t), \ldots, s_{d_s}(t)]^T$ that have been convolved and mixed in a linear medium leading to $d_x$ sensor
signals $x(t)=[x_1(t), \ldots, x_{d_x}(t)]^T$ that may include additional sensor noise $n(t)$. The convolved, noisy signal is represented in the time domain by the following equation (known as a forward model): ##EQU1## Source separation
techniques are used to identify the $d_x d_s P$ coefficients of the channels $A$ and to ultimately determine an estimate $\hat{s}(t)$ for the unknown source signals.
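The forward-model equation is not reproduced in this copy. A form consistent with the surrounding definitions, offered as an assumption rather than as the original equation 1, treats the channel as a matrix-valued FIR filter $A(\tau)$ of size $d_x \times d_s$ whose $P$ taps are indexed from 0 to $P-1$, which gives the $d_x d_s P$ coefficients mentioned in the text:

```latex
x(t) = \sum_{\tau=0}^{P-1} A(\tau)\, s(t-\tau) + n(t)
```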
Alternatively, the convolved signal may be filtered using a finite impulse response (FIR) inverse model represented by the following equation (known as the backward model): ##EQU2## In this representation, a BSS technique must estimate the FIR
inverse components $W$ such that the model source signals $u(t)=[u_1(t), \ldots, u_{d_u}(t)]^T$ are statistically independent.
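Both models amount to passing a multichannel signal through a matrix-valued FIR filter. The sketch below is illustrative only: it assumes the backward model has the form $u(t)=\sum_{\tau=0}^{Q} W(\tau)\,x(t-\tau)$, treats samples before $t=0$ as zero, and uses a hypothetical helper name (`mimo_fir`) and hand-picked channel taps that do not come from the source.

```python
import numpy as np

def mimo_fir(H, v):
    """Apply a multichannel FIR filter: y(t) = sum_tau H[tau] @ v(t - tau).

    H: (L, d_out, d_in) filter taps; v: (d_in, n_samples) input signals.
    Samples before t = 0 are taken as zero.
    """
    L, d_out, d_in = H.shape
    n = v.shape[1]
    y = np.zeros((d_out, n), dtype=np.result_type(H, v))
    for tau in range(min(L, n)):
        # Delay the input by tau samples and mix it through tap H[tau].
        y[:, tau:] += H[tau] @ v[:, : n - tau]
    return y

# Hypothetical two-source, two-sensor mixture with hand-picked channel taps.
rng = np.random.default_rng(0)
s = rng.standard_normal((2, 1000))
A = np.zeros((3, 2, 2))
A[0] = np.eye(2)                    # direct paths
A[2] = [[0.0, 0.5], [0.4, 0.0]]     # delayed cross talk
x = mimo_fir(A, s)                  # forward model without sensor noise
```

A separating filter `W` of shape `(Q + 1, d_u, d_x)` would be applied in the same way, as `u = mimo_fir(W, x)`.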
An approach to performing source separation under the statistical independence condition has been discussed in Weinstein et al., "Multi-Channel Signal Separation by Decorrelation", IEEE Transactions on Speech and Audio Processing, vol. 1, no. 4,
pp. 405-413, 1993, where, for non-stationary signals, a set of second order conditions is specified that uniquely determines the parameters A in the forward model. However, no specific algorithm for performing source separation based on
non-stationarity is given in the Weinstein et al. paper.
Early work in the signal processing community had suggested decorrelating the measured signals, i.e., diagonalizing measured correlations for multiple time delays. For an instantaneous mix, also referred to as the constant gain case, it has been
shown that, for non-white signals, decorrelation using multiple filter taps is sufficient to recover the source signals. However, for convolutive mixtures of wide-band signals, this technique does not produce a unique solution and, in fact, may generate
source estimates that are decorrelated but not statistically independent. As clearly identified by Weinstein et al. in the paper cited above, additional conditions are required to achieve a unique solution of statistically independent sources. In order
to find statistically independent source signals, it is necessary to capture more than second order statistics, since statistical independence requires that not only second but all higher cross moments vanish.
In the convolutive case, Yellin and Weinstein in "Multichannel Signal Separation: Methods and Analysis", IEEE Transactions on Signal Processing, vol. 44, no. 1, pp. 106-118, January 1996, established conditions on higher order multi-tap cross
moments that allow convolutive cross talk removal. Although the optimization criteria extend naturally to higher dimensions, previous research has concentrated on a two dimensional case because a multi-channel FIR model (see equation 2) can be inverted
with a properly chosen architecture using estimated forward filters. Heretofore, for higher dimensions, finding a stable approximation of the forward model has been elusive.
These prior art techniques generally operate satisfactorily in computer simulations but perform poorly for real signals, e.g., audio signals. One could speculate that the signal densities of the real signals may not have the hypothesized
structures, the higher order statistics may lead to estimation instabilities, or a violation of the signal stationarity condition may cause inaccurate solutions.
Therefore, there is a need in the art for a blind source separation technique that accurately performs convolutive signal decorrelation.
SUMMARY OF THE INVENTION
The disadvantages of the prior art are overcome by a method and apparatus that performs blind source separation using convolutive signal decorrelation by simultaneously diagonalizing second order statistics at multiple time periods. More
specifically, the invention accumulates a length (segment) of input signal that comprises a mixture of independent signal sources. The invention then divides the length of input signal into a plurality of T-length periods (windows) and performs a
discrete Fourier transform (DFT) on the mixed signal over each T-length period. Thereafter, the invention computes K cross-correlation power spectra that are each averaged over N of the T-length periods. Using the cross-correlation power values, a
gradient descent process computes the coefficients of a FIR filter that will effectively separate the source signals within the input signal by simultaneously decorrelating the K cross-correlation power spectra. To achieve an accurate solution, the
gradient descent process is constrained in that the time-domain values of the filter coefficients can attain only certain values, i.e., the time-domain filter coefficient values $W(\tau)$ are constrained within the T-length period to be zero for any time
$\tau > Q$, where $Q \ll T$. In this manner, the so-called "permutation problem" is solved and a unique solution for the FIR filter coefficients is computed such that a filter produced using these coefficients will effectively separate the source signals.
Generally, the invention is implemented as a software routine that is stored in a storage medium and executed on a general purpose computer system. However, a hardware implementation is readily apparent from the following detailed description.
The present invention finds application in a voice recognition system as a signal preprocessor system for decorrelating signals from different sources such that a voice recognition processor can utilize the various voice signals that are
separated by the invention. In response to the voice signals, the voice recognition processor can then produce computer commands or computer text.
BRIEF DESCRIPTION OF THE DRAWINGS
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
FIG. 1 depicts a system for executing a software implementation of the present invention;
FIG. 2 is a flow diagram of a method of the present invention;
FIG. 3 depicts a frequency domain graph of the filter coefficients generated by the present invention; and
FIG. 4 depicts a time domain graph of the filter coefficients generated by the present invention.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to
the figures.
DETAILED DESCRIPTION
The present invention estimates values of parameters W for the backward model of equation 2 by assuming non-stationary source signals and using a least squares (LS) optimization to estimate W as well as signal and noise powers. The invention
transforms the source separation problem into the frequency domain and solves simultaneously a source separation problem for every frequency.
FIG. 1 depicts a system 100 for implementing the source separation method of the present invention. The system 100 comprises a convolved signal source 126 that supplies the signal that is to be separated into its component signals and a computer
system 108 that executes the multiple decorrelation routine 124 of the present invention. The source 126 may contain any source of convolved signals, but is illustratively shown to contain a sensor array 102, a signal processor 104 and a recorded signal
source 106. The sensor array contains one or more transducers 102A, 102B, 102C such as microphones. The transducers are coupled to a signal processor 104 that performs signal digitization. A digital signal is coupled to the computer system 108 for
signal separation and further processing. A recorded signal source 106 may optionally form a source of the convolutive signals that require separation.
The computer system 108 comprises a central processing unit (CPU) 114, a memory 122, support circuits 116, and an input/output (I/O) interface 120. The computer system 108 is generally coupled through the I/O interface 120 to a display 112 and
various input devices 110 such as a mouse and keyboard. The support circuits generally contain well-known circuits such as cache, power supplies, clock circuits, a communications bus, and the like. The memory 122 may include random access memory (RAM),
read only memory (ROM), disk drive, tape drive, and the like, or some combination of memory devices. The invention is implemented as the multiple decorrelation routine 124 that is stored in memory 122 and executed by the CPU 114 to process the signal
from the signal source 126. As such, the computer system 108 is a general purpose computer system that becomes a specific purpose computer system when executing the routine 124 of the present invention. Although a general purpose computer system is
illustratively shown as a platform for implementing the invention, those skilled in the art will realize that the invention can also be implemented in hardware as an application specific integrated circuit (ASIC), a digital signal processing (DSP)
integrated circuit, or other hardware device or devices. As such, the invention may be implemented in software, hardware, or a combination of software and hardware.
The illustrative computer system 108 further contains a speech recognition processor 118, e.g., a speech recognition circuit card or speech recognition software, that is used to process the component signals that the invention extracts from the
convolutive signal. As such, a conference room having a plurality of people speaking and background noise can be monitored with multiple microphones 102. The microphones 102 produce a composite speech signal that requires separation into component
signals if a speech recognition system is to be used to convert each person's speech into computer text or into computer commands. The composite speech signal is filtered, amplified and digitized by the signal processor 104 and coupled to the computer
system 108. The CPU 114, executing the multiple decorrelation routine 124, separates the composite signal into its constituent signal components. From these constituent components, background noise can easily be removed. The constituent components
without noise are then coupled to the speech recognition processor 118 to process the component signals into computer text or computer commands. In this manner, the computer system 108 while executing the multiple decorrelation routine 124 is performing
signal preprocessing or conditioning for the speech recognition processor 118.
FIG. 2 depicts a flow diagram of the multiple decorrelation routine 124 of the present invention. At step 200, the convolutive (mixed) signal is input, the signal is parsed into a plurality of windows containing T samples of the input signal
$X(t)$, and the routine produces discrete Fourier transform (DFT) values $\chi(t)$ for each window, i.e., one set of DFT values for each window of length T samples.
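Step 200 can be sketched as follows. This is a minimal illustration, assuming consecutive non-overlapping windows with no tapering and a trailing partial window that is simply dropped; the function name `windowed_dft` and the array layout are choices made here, not details from the source.

```python
import numpy as np

def windowed_dft(x, T):
    """Split x (d_x, n_samples) into consecutive T-sample windows and DFT each one.

    Returns chi with shape (n_windows, T, d_x): chi[m, nu] is the d_x-vector
    of DFT coefficients at frequency bin nu for window m.
    """
    d_x, n = x.shape
    n_windows = n // T                          # trailing partial window is dropped
    frames = x[:, : n_windows * T].reshape(d_x, n_windows, T)
    chi = np.fft.fft(frames, axis=-1)           # DFT along the time axis of each window
    return np.transpose(chi, (1, 2, 0))
```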
At step 202, the routine 124 uses the DFT values to accumulate K cross-correlation power spectra, where each of the K spectra is averaged over N windows of length T samples.
For non-stationary signals, the cross correlation estimates will be dependent on the absolute time and will indeed vary from one estimation segment (an NT period) to the next. The cross correlation estimates computed in step 204 are represented
as: ##EQU3## where: $\chi(t+nT, \nu) = \mathrm{FFT}\, X(t+nT)$
$X(t) = [x(t), \ldots, x(t+T-1)]$
and $\chi(\nu)$ is the FFT of the input signal within a window containing T samples. As such, the routine, at step 204, computes a matrix for each time t and for each frequency $\nu$ and then sums all the matrix components with each other matrix
component. Steps 206, 208, 210 and 212 iterate the correlation estimation of step 204 over n=0 to N and k=0 to K to produce the K spectra.
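The accumulation of the K spectra can be sketched as below. Because the estimate itself is not reproduced in this copy, the sketch assumes the usual averaged outer product, $R_x(t_k,\nu) = \frac{1}{N}\sum_{n=0}^{N-1}\chi(t_k+nT,\nu)\,\chi^H(t_k+nT,\nu)$, with non-overlapping estimation segments; the function name and array layout are hypothetical.

```python
import numpy as np

def cross_power_spectra(chi, N, K):
    """Average outer products of DFT vectors over N windows for each of K segments.

    chi: (n_windows, T, d_x) complex DFT values, with n_windows >= K * N.
    Returns R with shape (K, T, d_x, d_x), where
    R[k, nu] = (1/N) * sum_n chi[k*N + n, nu] chi[k*N + n, nu]^H.
    """
    n_windows, T, d_x = chi.shape
    if n_windows < K * N:
        raise ValueError("need at least K * N windows")
    seg = chi[: K * N].reshape(K, N, T, d_x)
    # Outer product chi chi^H per window and frequency, averaged over the N windows.
    return np.einsum("kntp,kntq->ktpq", seg, seg.conj()) / N
```

Each `R[k, nu]` is a Hermitian $d_x \times d_x$ matrix, one per estimation segment and frequency bin.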
Equation 5 can then be simplified to a matrix representation:
If N is sufficiently large, $\Lambda_s(t,\nu)$ and $\Lambda_n(t,\nu)$ can be modeled as diagonal matrices due to the signal independence assumption. For Equation 6 to be linearly independent for different times, it will be
necessary that $\Lambda(t,\nu)$ changes over time, i.e., the signals are non-stationary.
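The matrix representation itself is missing from this copy. A form consistent with the forward model and with the diagonal matrices named above, stated here as an assumption rather than as the original equation 6, is:

```latex
R_x(t,\nu) \approx A(\nu)\,\Lambda_s(t,\nu)\,A^H(\nu) + \Lambda_n(t,\nu)
```

Here $A(\nu)$ would denote the DFT of the channel taps $A(\tau)$, and $\Lambda_s$ and $\Lambda_n$ would hold the source and noise powers on their diagonals. The approximation presumes that the window length T is much longer than the channel, so that the windowed DFT turns the convolution into a per-frequency product.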
Using the cross correlation estimates of equation 6, the invention computes the source signals using cross-power-spectra satisfying the following equation:
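The equation referred to is not reproduced in this copy. Applying the backward filter $W(\nu)$ to the model of the cross-power spectra suggests a condition of the following form, given as an assumption rather than as the original equation 7:

```latex
\Lambda_s(t,\nu) = W(\nu)\,\bigl[R_x(t,\nu) - \Lambda_n(t,\nu)\bigr]\,W^H(\nu)
```

In words, filtering the noise-compensated sensor cross-power spectra with $W(\nu)$ should leave a diagonal matrix of source powers at every estimation time.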
In order to obtain independent conditions for every time period, the time periods are generally chosen to have non-overlapping estimation times for $R_x(t_k,\nu)$, i.e., $t_k = kTN$. But if the signals vary sufficiently fast,
overlapping estimation times may be utilized. Furthermore, although the windows T are generally sequential, the windows may overlap one another such that each DFT value is derived from signal information that is also contained in the previous window.
In an audio signal processing system, the specific value of T is selected based upon room acoustics of the room in which the signals are recorded. For example, a large room with many reverberation paths requires a long window T such that the invention
can process a substantial amount of signal information to achieve source signal separation. The value of N is generally determined by the amount of available data for processing. Typical values are N=20, T=1024 samples and K=5.
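With non-overlapping windows and estimation segments, one filter estimate consumes K·N·T samples. The short calculation below works this out for the typical values quoted above; the 8 kHz sampling rate is an assumption made only for illustration and does not appear in the source.

```python
def estimation_block(N=20, T=1024, K=5, sample_rate_hz=8000):
    """Return (samples, seconds) of signal consumed by one filter estimate."""
    samples = K * N * T                 # K segments of N windows of T samples each
    return samples, samples / sample_rate_hz

samples, seconds = estimation_block()
print(samples, seconds)                 # 102400 samples, 12.8 s at the assumed 8 kHz
```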
The inventive method computes a multipath channel W (i.e., the tap values of a multidimensional FIR filter) that simultaneously satisfies equation 7 for K estimation periods, e.g., 2 to 5 estimation periods for processing audio signals. Such a
process is performed at steps 214, 216, 218 (collectively a filter parameter estimation process 224) and is represented using a least squares estimation procedure as follows: ##EQU4## where
For simplicity, a short form nomenclature has been used in equation 8, where $\Lambda_s(k,\nu)=\Lambda_s(t_k,\nu)$ and $\Lambda_s = \Lambda_s(t_1,\nu), \ldots, \Lambda_s(t_K,\nu)$, and the same simplified
notation also applies to $\Lambda_n(t,\nu)$ and $R_x(t,\nu)$.
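The least squares criterion is not reproduced in this copy. One form consistent with the description (a sum of squared decorrelation errors over all frequencies and K estimation periods, subject to the filter-length constraint discussed later) is sketched below as an assumption. The symbol $E(k,\nu)$ for the residual matrix is a label chosen here.

```latex
\hat{W}, \hat{\Lambda}_s, \hat{\Lambda}_n
  = \arg\min_{\substack{W,\,\Lambda_s,\,\Lambda_n \\ W(\tau)=0 \text{ for } \tau>Q,\ Q \ll T}}
    \sum_{\nu=1}^{T} \sum_{k=1}^{K} \lVert E(k,\nu) \rVert^2,
\qquad
E(k,\nu) = W(\nu)\bigl[R_x(k,\nu) - \Lambda_n(k,\nu)\bigr]W^H(\nu) - \Lambda_s(k,\nu)
```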
To produce the parameters W, a gradient descent process 224 (containing steps 214, 216, 218, and 220) is used that iterates the values of W as cost function (8) is minimized. In step 216, the W values are updated as $W^{new} = W^{old} - \mu \nabla_W E$, where $\nabla_W E$ is the gradient step value and $\mu$ is a weighting constant that controls the size of the update.
More specifically, the gradient descent process determines the gradient values as follows. ##EQU5##
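The gradient expressions are not reproduced in this copy. Assuming a cost $J=\sum_{\nu}\sum_{k}\lVert E(k,\nu)\rVert^2$ with residual $E(k,\nu) = W(\nu)[R_x(k,\nu)-\Lambda_n(k,\nu)]W^H(\nu)-\Lambda_s(k,\nu)$ (both the symbol $J$ and this residual are assumptions made here), differentiating with respect to the complex conjugates of the parameters gives, up to constant factors that depend on the derivative convention:

```latex
\begin{aligned}
\frac{\partial J}{\partial W^{*}(\nu)} &= 2 \sum_{k=1}^{K} E(k,\nu)\, W(\nu)\,\bigl[R_x(k,\nu) - \Lambda_n(k,\nu)\bigr] \\
\frac{\partial J}{\partial \Lambda_s^{*}(k,\nu)} &= -\operatorname{diag}\bigl(E(k,\nu)\bigr) \\
\frac{\partial J}{\partial \Lambda_n^{*}(k,\nu)} &= -\operatorname{diag}\bigl(W^{H}(\nu)\, E(k,\nu)\, W(\nu)\bigr)
\end{aligned}
```

Setting the second expression to zero yields $\Lambda_s(k,\nu)=\operatorname{diag}\bigl(W(\nu)[R_x(k,\nu)-\Lambda_n(k,\nu)]W^H(\nu)\bigr)$, which matches the statement in the text that the source powers can be solved for explicitly.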
With equation 11 equal to zero, the routine can solve explicitly for parameters $\Lambda_s(k,\nu)$, while parameters $\Lambda_n(k,\nu)$ and $W(\nu)$ are computed with a gradient descent rule, e.g., new values of
$\Lambda_s(k,\nu)$ and $W(\nu)$ are computed with each pass through the routine until the new values of $W(\nu)$ are not very different from the old values of $W(\nu)$, i.e., W has converged.
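One pass of this procedure at a single frequency can be sketched as follows. The sketch rests on several assumptions: the residual is taken to be $E(k) = W[R_x(k)-\Lambda_n(k)]W^H-\Lambda_s(k)$, the noise powers are held fixed rather than updated, and the function names are hypothetical. It also omits any scale normalization of W; since W = 0 trivially zeroes this cost, a practical routine would need one, and the text does not specify it here.

```python
import numpy as np

def ls_cost_and_gradient(W, R, lam_n):
    """Decorrelation cost, its gradient w.r.t. conj(W), and closed-form source powers.

    W: (d_s, d_x) complex filter at one frequency.
    R: (K, d_x, d_x) Hermitian cross-power matrices, one per estimation period.
    lam_n: (K, d_x) real noise powers (held fixed in this sketch).
    """
    W = np.asarray(W, dtype=complex)
    cost = 0.0
    grad = np.zeros_like(W)
    lam_s = np.empty((len(R), W.shape[0]))
    for k, (R_k, n_k) in enumerate(zip(R, lam_n)):
        M = R_k - np.diag(n_k)              # noise-compensated cross-power matrix
        C = W @ M @ W.conj().T              # cross-power matrix of the model sources
        lam_s[k] = np.diag(C).real          # explicit solution for the source powers
        E = C - np.diag(lam_s[k])           # residual: remaining cross-powers
        cost += float(np.sum(np.abs(E) ** 2))
        grad += 2.0 * E @ W @ M             # contribution to dJ / d conj(W)
    return cost, grad, lam_s

def gradient_step(W, R, lam_n, mu):
    """One update W_new = W_old - mu * gradient."""
    _, grad, _ = ls_cost_and_gradient(W, R, lam_n)
    return W - mu * grad
```

Repeating `gradient_step` until successive values of W differ by less than a chosen tolerance mirrors the convergence test described above.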
Note that equation 8 contains an additional constraint on the filter size in the time domain. Up to that constraint it would seem the various frequencies $\nu = 1, \ldots, T$ represent independent problems. The solutions $W(\nu)$, however,
are restricted to those filters that have no time response for $\tau > Q$, where $Q \ll T$. Effectively, the routine parameterizes the $T d_s d_x$ filter coefficients in $W(\nu)$ with $Q d_s d_x$ parameters $W(\tau)$. In practice, the values of W
are produced in the frequency domain, at step 214, e.g., $W(\nu)$, then, at step 218, an FFT is performed on these frequency domain values to convert the values of $W(\nu)$ to the time domain, e.g., $W(\tau)$. In the time domain, any W value that
appears at a time greater than a time Q is set to zero and all values in the range below Q are not adjusted. The adjusted time domain values of W are then converted using an inverse FFT back to the frequency domain. By zeroing the filter response in the
time domain for all time greater than Q, the frequency response of the filter is smoothed such that a unique solution at each frequency is readily determined.
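The constraint step can be sketched as a projection applied after each update. The sketch assumes the filter is stored as one matrix per frequency bin and uses the conventional transform pairing (inverse FFT to reach the time domain, forward FFT to return), which differs from the naming in the text only by convention; the function name is hypothetical.

```python
import numpy as np

def constrain_filter_length(W_freq, Q):
    """Zero every time-domain tap beyond Q and return the smoothed frequency response.

    W_freq: (T, d_s, d_x) complex filter values W(nu) for the T frequency bins.
    Taps W(tau) for tau = 0..Q are kept; taps for tau > Q are set to zero.
    """
    W_time = np.fft.ifft(W_freq, axis=0)    # frequency domain -> time domain
    W_time[Q + 1:] = 0.0                    # enforce W(tau) = 0 for tau > Q
    return np.fft.fft(W_time, axis=0)       # back to the frequency domain
```

Applying the function twice gives the same result as applying it once, so it can be called after every gradient update without accumulating distortion.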
FIG. 3 depicts an illustration of two frequency responses 302A and 302B and FIG. 4 depicts an illustration of their corresponding time-domain responses 304A and 304B. The least squares solutions for the coefficients are found using a gradient
descent process performed at step 224 such that an iterative approach is used to determine the correct values of W. Once the gradient in equation 10 becomes "flat" as identified in step 220, the routine, at step 222, applies the computed filter
coefficients to an FIR filter. The FIR filter is used to filter the samples of the input (mixed) signal x(t) in the time period KNT in length. The FIR filter generates, at step 226, the decorrelated component signals of the mixed signal. Then, the
routine at step 228 gets the next KNT number of samples for processing and proceeds to step 200 to filter the next KNT samples. The previous KNT samples are removed from memory.
As mentioned above, the gradient equations are constrained to remain in the subspace of permissible solutions with $W(\tau)=0$ for $\tau > Q$, where $Q \ll T$. This is important since it is a necessary condition for equation 8 to achieve a good
approximation.
In practical applications, such as voice recognition signal preprocessing, the inventive routine substantially improves voice recognition accuracy, i.e., word error rates improve by 5 to 50% and in some instances approach
the error rate that is achieved when no noise is present. Error rate improvement has been shown when a desired voice signal is combined with either music or another voice signal as background noise.
Although various embodiments which incorporate the teachings of the present invention have been shown and described in detail herein, those skilled in the art can readily devise many other varied embodiments that still incorporate these
teachings.
* * * * *