Claims
What is claimed is:
1. A signal processing system for processing waves from a plurality of sources, said system comprising:
a plurality of transducer means for receiving waves from said plurality of sources, including echoes and reverberation thereof and for generating a plurality of signals in response thereto, wherein each of said plurality of transducer means
receives waves from said plurality of sources including echoes and reverberations thereof, and for generating one of said plurality of signals;
first processing means for receiving said plurality of signals and for generating a plurality of first processed signals in response thereto, said first processing means comprises:
a plurality of delay means, each for receiving one of said plurality of signals and for generating a delayed signal in response thereto, and
a plurality of first combining means, each for receiving at least one of said plurality of signals and one of said delayed signals not associated with said one of said plurality of signals, and for combining said received delayed signal and said
signal, by an active cancellation process, to produce one of said first processed signals; and
second processing means for receiving said plurality of first processed signals and for generating a plurality of second processed signals in response thereto, wherein each of said second processed signals represents waves from one different
source, said second processing means including feedback means for supplying said plurality of second processed signals to said second processing means for combining each of said plurality of second processed signals with at least one of said plurality of
first processed signals not associated with said each second processed signal to generate said plurality of second processed signals.
2. A signal processing system for processing waves from a plurality of sources, said system comprising:
a plurality of transducer means for receiving waves from said plurality of sources, including echoes and reverberation thereof and for generating a plurality of signals in response thereto, wherein each of said plurality of transducer means
receives waves from said plurality of sources including echoes and reverberations thereof, and for generating one of said plurality of signals;
first processing means for receiving said plurality of signals and for generating a plurality of first processed signals in response thereto, said first processing means comprises:
a plurality of multiplying means, each for receiving different ones of said plurality of signals and for generating a scaled signal in response thereto, and
a plurality of first combining means, each for receiving at least one of said plurality of signals and one scaled signal not associated with said one of said plurality of signals, and for combining said received scaled signal and said signal to
produce one of said first processed signals; and
second processing means for receiving said plurality of first processed signals and for generating a plurality of second processed signals in response thereto, wherein each of said second processed signals represents waves from one different
source, said second processing means including feedback means for supplying said plurality of second processed signals to said second processing means for combining each of said plurality of second processed signals with at least one of said plurality of
first processed signals not associated with said each second processed signal to generate said plurality of second processed signals.
3. The system of claim 1, further comprising:
means for generating a direction of arrival signal for said waves; and
wherein said first processing means generates said plurality of first processed signals, in response to said direction of arrival signal.
4. The system of claim 1, wherein the number of transducer means is two, the number of first processed signals is two, and the number of second processed signals is two.
5. The system of claim 1, wherein said transducer means are spaced apart omnidirectional microphones.
6. The system of claim 1 wherein said microphones are co-located directional microphones.
7. The system of claim 1, 3, 4, 5, 6, or 2 wherein said second processing means comprises:
a plurality of second combining means, each of said second combining means having a first input, at least one other input, and an output; each of said second combining means for receiving one of said first processed signals at said first input,
an input signal at said other input, and for generating an output signal, at said output; said output signal being one of said plurality of second processed signals and is a difference between said first processed signal received at said first input and
the sum of said input signal received at said other input;
a plurality of adaptive filter means for generating a plurality of adaptive signals, each of said adaptive filter means for receiving said output signal from one of said plurality of second combining means and for generating an adaptive signal in
response thereto; and
means for supplying each of said plurality of adaptive signals to one of said other input of said plurality of second combining means other than the associated one of said second combining means.
8. The system of claim 7 further comprising means for filtering each of said second processed signals to generate a plurality of third processed signals.
9. The system of claim 8 wherein said second processed signals are characterized by having a low frequency component and a high frequency component, and wherein said filtering means boosts the low frequency component relative to the high
frequency component of said second processed signals.
10. A signal processing system for processing waves from a plurality of sources, said system comprising:
a plurality of transducer means for receiving waves from said plurality of sources, including echoes and reverberations thereof and for generating a plurality of signals in response thereto, wherein each of said plurality of transducer means
receives waves from said plurality of sources including echoes and reverberations thereof, and for generating one of said plurality of signals;
first processing means for receiving said plurality of signals and for generating a plurality of first processed signals in response thereto, wherein in the absence of echoes and reverberations of said waves from said plurality of sources, each
of said first processed signals represents waves from only one different source; said first processing means comprising:
a plurality of delay means, each for receiving one of said plurality of signals and for generating a delayed signal in response thereto, and
a plurality of first combining means, for receiving said plurality of signals and for feedforward combining said plurality of signals in an active cancellation process to produce said plurality of processed signals, wherein each of said plurality
of first combining means receives at least one of said plurality of signals and one of said delayed signals not associated with said one of said plurality of signals, and for combining said received delayed signal and said one signal to produce one of
said first processed signals; and
second processing means for receiving said plurality of first processed signals and for generating a plurality of second processed signals in response thereto, wherein in the presence of echoes and reverberations of said waves from said plurality
of sources, each of said second processed signals represents waves from one different source; said second processing means including feedback means for supplying said plurality of second processed signals to said second processing means for combining
each of said plurality of second processed signals with at least one of said plurality of first processed signals not associated with said each second processed signal to generate said plurality of second processed signals.
11. A signal processing system for processing waves from a plurality of sources, said system comprising:
a plurality of transducer means for receiving waves from said plurality of sources, including echoes and reverberations thereof and for generating a plurality of signals in response thereto, wherein each of said plurality of transducer means
receives waves from said plurality of sources including echoes and reverberations thereof, and for generating one of said plurality of signals;
first processing means for receiving said plurality of signals and for generating a plurality of first processed signals in response thereto, wherein in the absence of echoes and reverberations of said waves from said plurality of sources, each
of said first processed signals represents waves from only one different source; said first processing means comprising:
a plurality of first combining means, for receiving said plurality of signals and for feedforward combining said plurality of signals in an active cancellation process to produce said plurality of processed signals,
a plurality of multiplying means, each for receiving different ones of said plurality of signals and for generating a scaled signal in response thereto; and
wherein each of said plurality of first combining means receives at least one scaled signal and one of said plurality of signals not associated with said one scaled signal, and for combining said received scaled signal and said signal to produce
one of said first processed signals;
second processing means for receiving said plurality of first processed signals and for generating a plurality of second processed signals in response thereto, wherein in the presence of echoes and reverberations of said waves from said plurality
of sources, each of said second processed signals represents waves from one different source; said second processing means including feedback means for supplying said plurality of second processed signals to said second processing means for combining
each of said plurality of second processed signals with at least one of said plurality of first processed signals not associated with said each second processed signal to generate said plurality of second processed signals.
12. The system of claim 10 wherein said waves are acoustic waves, and said transducer means are microphones.
13. The system of claim 12 further comprising means for filtering each of said second processed signals to generate a plurality of third processed signals.
14. The system of claim 13 wherein said second processed signals are characterized by having a low frequency component and a high frequency component, and wherein said filtering means boosts the low frequency component relative to the high
frequency component of said second processed signals.
15. The system of claim 10, wherein the number of transducer means is two, the number of first processed signals is two, and the number of second processed signals is two.
16. The system of claim 10, wherein said transducer means are spaced apart omnidirectional microphones.
17. The system of claim 10 wherein said microphones are co-located directional microphones.
18. The system of claim 10, 12, 13, 14, 15, 16, 17 or 11 wherein said second processing means comprises:
a plurality of second combining means, each of said second combining means having a first input, at least one other input, and an output; each of said second combining means for receiving one of said first processed signals at said first input,
an input signal at said other input, and for generating an output signal, at said output; said output signal being one of said plurality of second processed signals and is a difference between said first processed signal received at said first input and
the sum of said input signal received at said other input;
a plurality of adaptive filter means for generating a plurality of adaptive signals, each of said adaptive filter means for receiving said output signal from one of said plurality of second combining means and for generating an adaptive signal in
response thereto; and
means for supplying each of said plurality of adaptive signals to one of said other input of said plurality of second combining means other than the associated one of said second combining means.
19. The system of claim 18 wherein each of said adaptive filter means comprises a tapped delay line.
20. A method of processing waves from a plurality of sources, comprising:
receiving said waves, including echoes and reverberations thereof, by a plurality of transducer means;
converting said waves, including echoes and reverberations thereof from said plurality of sources, by each of said plurality of transducer means into an electrical signal, thereby generating a plurality of electrical signals;
first processing said plurality of electrical signals, by an active cancellation process, to generate a plurality of first processed signals, wherein in the absence of echoes and reverberations of said waves from said plurality of sources, each
of said first processed signals represents waves from one source, and a reduced amount of waves from other sources, said first processing step including:
delaying each one of said plurality of electrical signals and generating a delayed signal in response thereto, and
combining each one of said plurality of electrical signals with one of said delayed signals not associated with said one of said plurality of signals to generate one of said first processed signals; and then
secondly processing said plurality of first processed signals to generate a plurality of second processed signals, including combining each of said plurality of second processed signals with at least one of said plurality of first processed
signals not associated with said each second processed signal to generate said plurality of second processed signals, wherein in the presence of echoes and reverberations of said waves from said plurality of sources, each of said second processed signals
represents waves from only one different source.
21. The method of claim 20 further comprising the step of:
filtering each of said second processed signals to generate a plurality of third processed signals.
22. The method of claim 20 further comprising the step of:
sampling and converting each one of said plurality of electrical signals and for supplying same to said plurality of delay means and to said plurality of combining means, as said electrical signal.
23. The method of claim 20 wherein said second processing step further comprises:
subtracting, by a plurality of subtracting means, a different one of said first processed signals by an adaptive signal and generating an output signal, thereby generating a plurality of output signals;
adaptively filtering said output signals to generate a plurality of adaptive signals; and
supplying each one of said plurality of adaptive signals to a different one of said subtracting means.
24. A signal processing system for processing waves from a plurality of sources, said system comprising:
at least a first and second transducer for receiving waves from said plurality of sources, including echoes and reverberation thereof and for generating at least a first and a second signal in response thereto, wherein each of said transducers
receives waves from said plurality of sources including echoes and reverberations thereof, and for generating one of said first and second signals;
first processing means for receiving said first and second signals and for generating a first and a second processed signals in response thereto, said first processing means comprises:
first delay means for receiving said first signal and for generating a first delayed signal in response thereto,
second delay means for receiving said second signal and for generating a second delayed signal in response thereto,
first combining means for receiving said first signal and said second delayed signal, and for combining said received first signal and said second delay signal, by an active cancellation process, to produce said first processed signal, and
second combining means for receiving said second signal and said first delayed signal, and for combining said received second signal and said first delayed signal, by an active cancellation process, to produce said second processed signal; and
second processing means for receiving said first and second processed signals and for generating a third and a fourth processed signals in response thereto, said second processing means comprises:
third combining means for receiving the first processed signal to produce the third processed signal in response thereto;
fourth combining means for receiving the second processed signal to produce the fourth processed signal in response thereto;
first adaptive filter means for receiving said third processed signal, for generating a first adaptive signal in response thereto, and for supplying said first adaptive signal to said fourth combining means;
second adaptive filter means for receiving said fourth processed signal, for generating a second adaptive signal in response thereto, and for supplying said second adaptive signal to said third combining means;
wherein the third combining means combines the first processed signal and the second adaptive signal to produce the third processed signal so that the third processed signal is a difference between the first processed signal and the second
adaptive signal; and
wherein the fourth combining means combines the second processed signal and the first adaptive signal to produce the fourth processed signal so that the fourth processed signal is a difference between the second processed signal and the first
adaptive signal.
Description
This application is submitted with a microfiche appendix, containing copyrighted material, Copyright 1994, Interval Research Corporation. The Appendix consists of one (1)
microfiche with forty-six (46) frames. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever in the appendices.
TECHNICAL FIELD
This invention relates to the field of microphone-array signal processing, and more particularly to a two-stage processor for extracting one substantially pure sound signal from a mixture of such signals even in the presence of echoes and
reverberations.
BACKGROUND OF THE INVENTION
It is well known that a human being can focus attention on a single source of sound even in an environment that contains many such sources. This phenomenon is often called the "cocktail-party effect."
Considerable effort has been devoted in the prior art to reproducing the cocktail-party effect, both in physical devices and in computational simulations of such devices. One prior technique is to separate sound based on auditory scene analysis. In
this analysis, vigorous use is made of assumptions regarding the nature of the sources present. It is assumed that a sound can be decomposed into small elements such as tones and bursts, which in turn can be grouped according to attributes such as
harmonicity and continuity in time. Auditory scene analysis can be performed using information from a single microphone or from several microphones. For an early example of auditory scene analysis, see Weintraub (1984, 1985, 1986). Other prior art
work related to sound separation by auditory scene analysis is due to Parsons (1976), von der Malsburg and Schneider (1986), Naylor and Porter (1991), and Mellinger (1991).
Techniques involving auditory scene analysis, although interesting from a scientific point of view as models of human auditory processing, are currently far too computationally demanding and specialized to be considered practical techniques for
sound separation until fundamental progress is made.
Other techniques for separating sounds operate by exploiting the spatial separation of their sources. Devices based on this principle vary in complexity. The simplest such devices are microphones that have highly selective, but fixed patterns
of sensitivity. A directional microphone, for example, is designed to have maximum sensitivity to sounds emanating from a particular direction, and can therefore be used to enhance one audio source relative to others (see Olson, 1967). Similarly, a
close-talking microphone mounted near a speaker's mouth rejects distant sources (see, for example, the Knowles CF 2949 data sheet).
Microphone-array processing techniques that separate sources by exploiting their spatial separation are also well known and have been of interest for several decades. In one early class of microphone-array techniques,
nonlinear processing is employed. In each output stream, some source direction of arrival, a "look direction," is assumed. The microphone signals are delayed to remove differences in time of flight from the look direction. Signals from any direction
other than the look direction are thus misaligned in time. The signal in the output stream is formed, in essence, by "gating" sound fragments from the microphones. At any given instant, the output is chosen to be equal to one of the microphone signals. These techniques, exemplified by Kaiser and David (1960), by Mitchell et al. (1971), and by Lyon (1983), perform best when the undesired sources consist predominantly of impulse trains, as is the case with human speech. While these nonlinear techniques
can be very computationally efficient and are of scientific interest as models of human cocktail-party processing, they do not have practical or commercial significance because of their inherent inability to bring about full suppression of unwanted
sources. This inability originates from the incorrect assumption that at every instant in time, at least one microphone contains only the desired signal.
One widely known class of techniques in the prior art for linear microphone-array processing is often referred to as "classical beamforming" (Flanagan et al., 1985). As with the nonlinear techniques mentioned above, processing begins with the
removal of time-of-flight differences among the microphone signals with respect to the look direction. In place of the "gating" scheme, the delayed microphone signals are simply averaged together. Thus, any signal from the look direction is represented
in the output with its original power, whereas signals from other directions are relatively attenuated.
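As a rough illustration of classical beamforming as described above, the following Python sketch delays each microphone signal to align the look direction and then averages the results. The two-microphone geometry, the whole-sample delays, the test tone, and the helper names (`delay`, `delay_and_sum`, `power`) are assumptions made for the example and are not taken from the cited references.

```python
import math

def delay(signal, d):
    """Delay a sampled signal by d whole samples, padding the start with zeros."""
    return [0.0] * d + signal[:len(signal) - d]

def delay_and_sum(mic_signals, steering_delays):
    """Align the microphones to the look direction, then average them."""
    aligned = [delay(s, d) for s, d in zip(mic_signals, steering_delays)]
    return [sum(samples) / len(aligned) for samples in zip(*aligned)]

def power(signal):
    """Mean squared value of a sampled signal."""
    return sum(x * x for x in signal) / len(signal)

N = 400
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(N)]  # period of 20 samples

# Assumed geometry: the look-direction source reaches mic 2 three samples after
# mic 1, so mic 1 is delayed by three samples to remove the difference.
steering = [3, 0]
target_out = delay_and_sum([tone, delay(tone, 3)], steering)

# An off-axis source reaches mic 1 seven samples after mic 2. After steering it
# is misaligned by ten samples (half a period here), so averaging attenuates it.
interferer_out = delay_and_sum([delay(tone, 7), tone], steering)

print(power(target_out), power(interferer_out))
```

The half-period misalignment chosen here is the most favorable case for the averaging step; at lower frequencies the same misalignment is a smaller fraction of a period and the attenuation is correspondingly weaker.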
Classical beamforming was employed in a patented directional hearing aid invented by Widrow and Brearley (1988). The degree to which a classical beamformer is able to attenuate undesired sources relative to the desired source is limited by (1)
the number of microphones in the array, and (2) the spatial extent of the array relative to the longest wavelength of interest present in the undesired sources. In particular, a classical beamformer cannot provide relative attenuation of frequency
components whose wavelengths are larger than the array. For example, an array one foot wide cannot greatly attenuate frequency components below approximately 1 kHz.
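The one-foot example can be checked with a short calculation. The sketch below assumes a speed of sound of 343 m/s (a typical room-temperature value that is not stated in the text) and computes the frequency whose wavelength equals the array width; the function name is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0   # assumed value for air at about 20 degrees C
METERS_PER_FOOT = 0.3048

def frequency_for_wavelength(wavelength_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Frequency in Hz of a wave with the given wavelength in meters."""
    return speed / wavelength_m

array_width_m = 1.0 * METERS_PER_FOOT
limit_hz = frequency_for_wavelength(array_width_m)
print(round(limit_hz))  # on the order of 1 kHz
```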
Also known from the prior art is a class of active-cancellation algorithms, which is related to sound separation. However, these algorithms need a "reference signal," i.e., a signal derived from only one of the sources. For example, active
noise-cancellation techniques (see data sheets for Bose® Aviation Headset, NCT proACTIVE® Series, and Sennheiser HDC451 Noiseguard® Mobile Headphone) reduce the contribution of noise to a mixture by filtering a known signal that contains
only the noise, and subtracting it from the mixture. Similarly, echo-cancellation techniques such as those employed in full-duplex modems (Kelly and Logan, 1970; Gritton and Lin, 1984) improve the signal-to-noise ratio of an outgoing signal by removing
noise due to echoes from the known incoming signal.
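Reference-based cancellation of the kind described above is commonly realized with an adaptive filter. The sketch below uses a least-mean-squares (LMS) update, which is one standard choice and is not claimed to be the method used in the cited products or papers. The signal names, the two-tap noise path, the step size, and the filter length are all assumptions for illustration.

```python
import math
import random

def lms_cancel(mixture, reference, num_taps=4, step=0.01):
    """Subtract an adaptively filtered copy of the reference from the mixture."""
    weights = [0.0] * num_taps
    recent = [0.0] * num_taps            # latest reference samples, newest first
    cleaned = []
    for x, r in zip(mixture, reference):
        recent = [r] + recent[:-1]
        noise_estimate = sum(w * v for w, v in zip(weights, recent))
        error = x - noise_estimate       # the error doubles as the cleaned output
        weights = [w + step * error * v for w, v in zip(weights, recent)]
        cleaned.append(error)
    return cleaned

def mean_square(values):
    return sum(v * v for v in values) / len(values)

rng = random.Random(0)
N = 8000
desired = [math.sin(2 * math.pi * 0.01 * n) for n in range(N)]
reference = [rng.uniform(-1.0, 1.0) for _ in range(N)]   # contains only the noise

# Assumed noise path from the reference to the mixture: a gain plus one echo.
noise = [0.8 * reference[n] - 0.3 * (reference[n - 1] if n > 0 else 0.0)
         for n in range(N)]
mixture = [d + v for d, v in zip(desired, noise)]

cleaned = lms_cancel(mixture, reference)

# Compare the residual noise before and after, over the last 2000 samples.
tail = slice(N - 2000, N)
noise_before = mean_square(noise[tail])
noise_after = mean_square([c - d for c, d in zip(cleaned[tail], desired[tail])])
print(noise_before, noise_after)
```

The approach depends on the reference containing only the noise; if the desired signal leaks into the reference, the same update would begin to cancel the desired signal as well.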
Techniques for active cancellation that do not require a reference signal are called "blind." They are classified here according to the degree of realism of the underlying assumptions regarding the acoustic processes by which the unwanted signals
reach the microphones. To understand the practical significance of this classification, recall a feature common to the principles by which active-cancellation techniques operate: the extent to which a given undesired source can be canceled by
subtracting processed microphone signals depends ultimately on the exactness with which copies of the undesired source in the different microphones can be made to match one another. This depends, in turn, on how accurately the signal processing models
the acoustic processes by which the unwanted signals reach the microphones.
One class of blind active-cancellation techniques may be called "gain-based": it is presumed that the waveform produced by each source is received by the microphones simultaneously, but with varying relative gains. (Directional microphones must
be employed to produce the required differences in gain.) Thus, a gain-based system attempts to cancel copies of an undesired source in different microphone signals by applying relative gains to the microphone signals and subtracting, but never applying
time delays or otherwise filtering. Numerous gain-based methods for blind active cancellation have been proposed; see Herault and Jutten (1986), Bhatti and Bibyk (1991), Cohen (1991), Tong et al. (1991), and Molgedey and Schuster (1994).
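To make the gain-based model concrete, the sketch below mixes two sources with relative gains only and then cancels each unwanted source by scaling and subtracting. The gains are treated as known here, whereas the blind methods cited above must estimate them; the gain values, test signals, and variable names are assumptions.

```python
import math

N = 200
s1 = [math.sin(2 * math.pi * 0.03 * n) for n in range(N)]
s2 = [1.0 if (n // 10) % 2 == 0 else -1.0 for n in range(N)]   # square wave

# Simultaneous (gain-only) mixing: no delays and no other filtering.
g12, g21 = 0.6, 0.4     # assumed relative gains of the cross paths
x1 = [a + g12 * b for a, b in zip(s1, s2)]
x2 = [g21 * a + b for a, b in zip(s1, s2)]

# Scale and subtract: y1 = x1 - g12 * x2 = (1 - g12 * g21) * s1,
# and likewise y2 = x2 - g21 * x1 = (1 - g12 * g21) * s2.
y1 = [a - g12 * b for a, b in zip(x1, x2)]
y2 = [b - g21 * a for a, b in zip(x1, x2)]

scale = 1.0 - g12 * g21
print(max(abs(y - scale * s) for y, s in zip(y1, s1)))   # near zero
```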
The assumption of simultaneity is violated when microphones are separated in space. A class of blind active-cancellation techniques that can cope with non-simultaneous mixtures may be called "delay-based": it is assumed that the waveform
produced by each source is received by the various microphones with relative time delays, but without any other filtering. (See Morita, 1991 and Bar-Ness, 1993.) Under anechoic conditions, this assumption holds true for a microphone array that consists
of omnidirectional microphones. However, this simple model of acoustic propagation from the sources to the microphones is violated when echoes and reverberation are present.
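The delay-based model can be sketched in the same way. In the example below each source reaches the two microphones with a relative delay and no other filtering, and each channel subtracts a delayed copy of the other channel. The delays are treated as known, the whole-sample delays and test signals are assumptions, and the helper names are hypothetical.

```python
import math

def delay(signal, d):
    """Delay a sampled signal by d whole samples, padding the start with zeros."""
    return [0.0] * d + signal[:len(signal) - d]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def subtract(a, b):
    return [p - q for p, q in zip(a, b)]

def separate(x1, x2, d1, d2):
    """Cancel source 2 from channel 1 and source 1 from channel 2 using delays only."""
    y1 = subtract(x1, delay(x2, d2))
    y2 = subtract(x2, delay(x1, d1))
    return y1, y2

N = 200
s1 = [math.sin(2 * math.pi * 0.03 * n) for n in range(N)]
s2 = [1.0 if (n // 10) % 2 == 0 else -1.0 for n in range(N)]

# Assumed anechoic geometry: source 1 reaches mic 2 d1 samples after mic 1,
# and source 2 reaches mic 1 d2 samples after mic 2.
d1, d2 = 4, 6
x1 = add(s1, delay(s2, d2))
x2 = add(delay(s1, d1), s2)

y1, y2 = separate(x1, x2, d1, d2)

# Each output depends on one source only: y1[n] = s1[n] - s1[n - d1 - d2].
expected_y1 = subtract(s1, delay(s1, d1 + d2))
expected_y2 = subtract(s2, delay(s2, d1 + d2))
```

Each output still carries a delayed, inverted copy of its own source, and the cancellation holds only while the delays-only propagation model holds.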
When the signals involved are narrowband, some gain-based techniques for blind active cancellation can be extended to employ complex gain coefficients (see Strube (1981), Cardoso (1989, 1991), Lacoume and Ruiz (1992), Comon et al. (1994)) and can
therefore accommodate, to a limited degree, time delays due to microphone separation as well as echoes and reverberation. These techniques can be adapted for use with audio signals, which are broadband, if the microphone signals are divided into
narrowband components by means of a filter bank. The frequency bands produced by the filter bank can be processed independently, and the results summed (for example, see Strube (1981) or the patent of Comon (1994)). However, they are computationally
intensive relative to the present invention because of the duplication of structures across frequency bands, are slow to adapt in changing situations, are prone to statistical error, and are extremely limited in their ability to accommodate echoes and
reverberation.
The most realistic active-cancellation techniques currently known are "convolutive": the effect of acoustic propagation from each source to each microphone is modeled as a convolutive filter. These techniques are more realistic than gain-based
and delay-based techniques because they explicitly accommodate the effects of inter-microphone separation, echoes and reverberation. They are also more general since, in principle, gains and delays are special cases of convolutive filtering.
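The statement that gains and delays are special cases of convolutive filtering can be illustrated with a small finite-impulse-response example. The impulse responses and sample values below are assumptions chosen for readability, and `convolve` is a hypothetical helper rather than a library call.

```python
def convolve(signal, impulse_response):
    """Causal FIR convolution, truncated to the length of the input signal."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out

source = [1.0, 2.0, -1.0, 0.5, 0.0, 0.0]

gain_only = convolve(source, [0.5])                 # pure gain of 0.5
delay_only = convolve(source, [0.0, 0.0, 1.0])      # pure delay of two samples
with_echo = convolve(source, [1.0, 0.0, 0.0, 0.3])  # direct path plus one assumed echo

print(gain_only)
print(delay_only)
print(with_echo)
```

A room impulse response that models echoes and reverberation would have many more nonzero taps than the four used here.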
Convolutive active-cancellation techniques have recently been described by Jutten et al. (1992), by Van Compernolle and Van Gerven (1992), by Platt and Faggin (1992), and by Dinc and Bar-Ness (1994). While these techniques have been used to
separate mixtures constructed by simulation using oversimplified models of room acoustics, to the best of our knowledge none of them has been applied successfully to signals mixed in a real acoustic environment. The simulated mixtures used by Jutten et
al., by Platt and Faggin, and by Dinc and Bar-Ness differ from those that would arise in a real room in two respects. First, the convolutive filters used in the simulations are much shorter than those appropriate for modeling room acoustics; they
allowed for significant indirect propagation of sound over only one or two feet, compared with tens of feet typical of echoes and reverberation in an office. Second, the mixtures used in the simulations were partially separated to begin with, i.e., the
crosstalk between the channels was weak. In practice, the microphone signals must be assumed to contain strong crosstalk unless the microphones are highly directional and the geometry of the sources is constrained.
To overcome some of the limitations of the convolutive active-cancellation techniques named above, the present invention employs a two-stage architecture. Its two-stage architecture is substantially different from other two-stage architectures
found in prior art.
A two-stage signal processing architecture is employed in a Griffiths-Jim beamformer (Griffiths and Jim, 1982). The first stage of a Griffiths-Jim beamformer is delay-based: two microphone signals are delayed to remove time-of-flight differences
with respect to a given look direction, and in contrast with classical beamforming, these delayed microphone signals are subtracted to create a reference noise signal. In a separate channel, the delayed microphone signals are added, as in classical
beamforming, to create a signal in which the desired source is enhanced relative to the noise. Thus, the first stage of a Griffiths-Jim beamformer produces a reference noise signal and a signal that is predominantly desired source. The noise reference
is then employed in the second stage, using standard active noise-cancellation techniques, to improve the signal-to-noise ratio in the output.
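The first stage described above can be sketched for two microphones as follows. This is a simplified illustration under anechoic, whole-sample-delay assumptions, not the implementation of Griffiths and Jim; the geometry, test signals, and function names are hypothetical.

```python
import math

def delay(signal, d):
    """Delay a sampled signal by d whole samples, padding the start with zeros."""
    return [0.0] * d + signal[:len(signal) - d]

def griffiths_jim_first_stage(mic1, mic2, steer1, steer2):
    """Return (enhanced, noise_reference) from two steered microphone signals."""
    a = delay(mic1, steer1)
    b = delay(mic2, steer2)
    enhanced = [(p + q) / 2 for p, q in zip(a, b)]    # sum channel
    noise_reference = [p - q for p, q in zip(a, b)]   # difference channel
    return enhanced, noise_reference

N = 300
target = [math.sin(2 * math.pi * 0.02 * n) for n in range(N)]
noise = [math.sin(2 * math.pi * 0.07 * n + 1.0) for n in range(N)]

# Assumed anechoic geometry: the target reaches mic 2 two samples late,
# and the noise reaches mic 1 five samples late.
mic1 = [t + v for t, v in zip(target, delay(noise, 5))]
mic2 = [t + v for t, v in zip(delay(target, 2), noise)]

enhanced, noise_reference = griffiths_jim_first_stage(mic1, mic2, 2, 0)

# With the target aligned, it cancels exactly in the difference channel.
_, target_only_reference = griffiths_jim_first_stage(target, delay(target, 2), 2, 0)
print(max(abs(v) for v in target_only_reference))
```

In a second stage, `noise_reference` would serve as the reference input to an adaptive noise canceller operating on `enhanced`.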
The Griffiths-Jim beamformer suffers from the flaw that under reverberant conditions, the delay-based first stage cannot construct a reference noise signal devoid of the desired signal, whereas the second stage relies on the purity of that noise
reference. If the noise reference is sufficiently contaminated with the desired source, the second stage suppresses the desired source, not the noise (Van Compernolle, 1990). Thus, the Griffiths-Jim beamformer incorrectly suppresses the desired signal
under conditions that are normally considered favorable: when the signal-to-noise ratio in the microphones is high.
Another two-stage architecture is described by Najar et al. (1994). Its second stage employs blind convolutive active cancellation. However, its first stage differs significantly from the first stage of the Griffiths-Jim beamformer. It
attempts to produce separated outputs by adaptively filtering each microphone signal in its own channel. When the sources are spectrally similar, filters that produce partially separated outputs after the first stage are unlikely to exist.
Thus, it is desirable to provide an architecture for separation of sources that avoids the difficulties exhibited by existing techniques.
SUMMARY OF THE INVENTION
An audio signal processing system for processing acoustic waves from a plurality of sources comprises a plurality of spaced apart transducer means for receiving acoustic waves from the plurality of sources, including echoes and reverberations
thereof. The transducer means generate a plurality of acoustic signals in response thereto. Each of the plurality of transducer means receives acoustic waves from the plurality of sources including echoes and reverberations thereof, and generates one
of the plurality of acoustic signals. A first processing means receives the plurality of acoustic signals and generates a plurality of first processed acoustic signals in response thereto. In the absence of echoes and reverberations of the acoustic
waves from the plurality of sources, each of the first processed acoustic signals represents acoustic waves from only one different source. A second processing means receives the plurality of first processed acoustic signals and generates a plurality of
second processed acoustic signals in response thereto. In the presence of echoes and reverberations of the acoustic waves from the plurality of sources, each of the second processed acoustic signals represents acoustic waves from only one different
source.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic block diagram of an embodiment of an acoustic signal processor of the present invention, using two microphones.
FIG. 2 is a schematic block diagram of an embodiment of the direct-signal separator portion, i.e., the first stage of the processor shown in FIG. 1.
FIG. 3 is a schematic block diagram of an embodiment of the crosstalk remover portion, i.e., the second stage of the processor shown in FIG. 1.
FIG. 4 is an overview of the delay in the acoustic waves arriving at the direct signal separator portion of the signal processor of FIG. 1, and showing the separation of the signals.
FIG. 5 is an overview of a portion of the crosstalk remover of the signal processor of FIG. 1 showing the removal of the crosstalk from one of the signal channels.
FIG. 6 is a detailed schematic block diagram of an embodiment of a direct-signal separator using three microphones.
FIG. 7 is a detailed schematic block diagram of an embodiment of a crosstalk remover suitable for use with three microphones.
DETAILED DESCRIPTION OF THE INVENTION
The present invention is a device that mimics the cocktail-party effect using a plurality of microphones with as many output audio channels, and a signal-processing module. When situated in a complicated acoustic environment that contains
multiple audio sources with arbitrary spectral characteristics, it supplies output audio signals, each of which contains sound from at most one of the original sources. These separated audio signals can be used in a variety of applications, such as
hearing aids or voice-activated devices.
FIG. 1 is a schematic diagram of a signal separator processor of one embodiment of the present invention. As previously discussed, the signal separator processor of the present invention can be used with any number of microphones. In the
embodiment shown in FIG. 1, the signal separator processor receives signals from a first microphone 10 and a second microphone 12, spaced apart by about two centimeters. As used herein, the microphones 10 and 12 include transducers (not shown), their
associated pre-amplifiers (not shown), and A/D converters 22 and 24 (shown in FIG. 2).
The microphones 10 and 12 in the preferred embodiment are omnidirectional microphones. Each receives acoustic wave signals from the environment, and they generate a first and a second electrical signal 14 and 16,
respectively. The microphones 10 and 12 are either selected or calibrated to have matching sensitivity. The use of matched omnidirectional microphones 10 and 12, instead of directional or other microphones, leads to simplicity in the direct-signal
separator 30, described below. In the preferred embodiment, two Knowles EM-3046 omnidirectional microphones were used, with a separation of 2 centimeters. The pair was mounted at least 25 centimeters from any large surface in order to preserve the
omnidirectional nature of the microphones. Matching was achieved by connecting the two microphone outputs to a stereo microphone preamplifier and adjusting the individual channel gains so that the preamplifier outputs were closely matched. The
preamplifier outputs were each digitally sampled at 22,050 samples per second, simultaneously. These sampled electrical signals 14 and 16 are supplied to the direct signal separator 30 and to a Direction of Arrival (DOA) estimator 20.
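The figures above already suggest why the delays discussed later cannot be whole numbers of samples. The following arithmetic sketch compares the sampling interval with the longest time a wavefront could take to travel between the two microphones. The speed of sound (343 m/s) is an assumption not stated in the text, and the variable names are illustrative only.

```python
SAMPLE_RATE_HZ = 22_050      # sampling rate given in the text
MIC_SPACING_M = 0.02         # microphone separation given in the text (2 cm)
SPEED_OF_SOUND_M_S = 343.0   # assumption: air at roughly 20 degrees C

# Sampling interval T in microseconds.
sample_interval_us = 1e6 / SAMPLE_RATE_HZ

# Longest inter-microphone travel time: sound arriving along the microphone axis.
max_delay_us = 1e6 * MIC_SPACING_M / SPEED_OF_SOUND_M_S

# The same delay expressed in sampling intervals.
max_delay_samples = max_delay_us / sample_interval_us

print(f"sampling interval T: {sample_interval_us:.2f} microseconds")
print(f"largest inter-microphone delay: {max_delay_us:.2f} microseconds")
print(f"largest inter-microphone delay: {max_delay_samples:.3f} sampling intervals")
```

Under that assumption the largest inter-microphone delay is only slightly more than one sampling interval, so delays rounded to whole samples would be very coarse for this geometry.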
The direct-signal separator 30 employs information from a DOA estimator 20, which derives its estimate from the microphone signals. In a different embodiment of the invention, DOA information could come from a source other than the microphone
signals, such as direct input from a user via an input device.
The direct signal separator 30 generates a plurality of output signals 40 and 42. It generates as many output signals as it receives input signals 14 and 16, that is, one for each of the microphones 10 and 12.
Assuming that there are two sources, A and B, generating acoustic wave signals in the environment in which the signal processor 8 is located, each of the microphones 10 and 12 would detect acoustic waves
from both sources. Hence, each of the electrical signals 14 and 16, generated by the microphones 10 and 12, respectively, contains components of sound from sources A and B.
The direct-signal separator 30 processes the signals 14 and 16 to generate the signals 40 and 42 respectively, such that in anechoic conditions (i.e., the absence of echoes and reverberations), each of the signals 40 and 42 would be an
electrical signal representation of sound from only one source. In the absence of echoes and reverberations, the electrical signal 40 would represent sound only from source A and the electrical signal 42 sound only from source B, or vice versa.
Thus, under anechoic conditions the direct-signal separator 30 can bring about full separation of the sounds represented in signals 14 and 16. However, when echoes and reverberation are present, the separation is only partial.
The output signals 40 and 42 of the direct signal separator 30 are supplied to the crosstalk remover 50. The crosstalk remover 50 removes the crosstalk between the signals 40 and 42 to bring about fully separated signals 60 and 62 respectively.
Thus, the direct-signal separator 30 and the crosstalk remover 50 play complementary roles in the system 8. The direct-signal separator 30 is able to bring about full separation of signals mixed in the absence of echoes and reverberation, but produces
only partial separation when echoes and reverberation are present. The crosstalk remover 50 when used alone is often able to bring about full separation of sources mixed in the presence of echoes and reverberation, but is most effective when given
inputs 40 and 42 that are partially separated.
After some adaptation time, each output 60 and 62 of the crosstalk remover 50 contains the signal from only one sound source: A or B. Optionally, these outputs 60 and 62 can be connected individually to post filters 70 and 72, respectively, to
remove known frequency coloration produced by the direct signal separator 30 or the crosstalk remover 50. Practitioners skilled in the art will recognize that there are many ways to remove this known frequency coloration; these vary in terms of their
cost and effectiveness. An inexpensive post filtering method, for example, consists of reducing the treble and boosting the bass.
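As one illustration of such an inexpensive post filter, the sketch below uses a leaky integrator, which raises low frequencies relative to high ones. This particular filter and the value of `alpha` are assumptions made for illustration; the text does not specify how the post filters 70 and 72 are built.

```python
def bass_boost(signal, alpha=0.9):
    """Leaky integrator y(n) = x(n) + alpha * y(n-1), with 0 <= alpha < 1.

    alpha is a hypothetical tuning value, not taken from the source.
    The gain is 1/(1 - alpha) at zero frequency and 1/(1 + alpha) at half
    the sampling rate, so bass is boosted relative to treble.
    """
    out = []
    prev = 0.0
    for x in signal:
        prev = x + alpha * prev
        out.append(prev)
    return out


# Steady-state response to the lowest and highest representable frequencies.
low = bass_boost([1.0] * 400)          # constant input (zero frequency)
high = bass_boost([1.0, -1.0] * 200)   # alternating input (half the sampling rate)
print(round(low[-1], 3), round(abs(high[-1]), 3))
```

In practice the overall gain would also be rescaled; the sketch shows only the relative tilt between bass and treble.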
The filters 70 and 72 generate output signals 80 and 82, respectively, which can be used in a variety of applications. For example, they may be connected to a switch box and then to a hearing aid.
Referring to FIG. 2, there is shown one embodiment of the direct signal separator 30 portion of the signal processor 8 of the present invention. The microphone transducers generate input signals 11 and 13, which are sampled and digitized by
clocked sample-and-hold circuits followed by analog-to-digital converters 22 and 24, respectively, to produce sampled digital signals 14 and 16, respectively.
The digital signal 14 is supplied to a first delay line 32. In the preferred embodiment, the delay line 32 delays the digitally sampled signal 14 by a non-integral multiple of the sampling interval T, which was 45.35 microseconds given the
sampling rate of 22,050 samples per second. The integral portion of the delay was implemented using a digital delay line, while the remaining subsample delay of less than one sample interval was implemented using a non-causal, truncated sinc filter with
41 coefficients. Specifically, to implement a subsample delay of t, given that t<T, the following filter is used: ##EQU1## where x(n) is the signal to be delayed, y(n) is the delayed signal, and w(k) {k=-20, -19, . . . 19,20} are the 41 filter
coefficients. The filter coefficients are determined from the subsample delay t as follows:
where ##EQU2## and S is a normalization factor given by ##EQU3##
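A minimal sketch of a 41-coefficient truncated sinc subsample delay is given here. The formulas in the source are not reproduced in this text, so the sketch rests on two assumptions: each coefficient w(k) is proportional to sinc(pi(k - t/T)), and the normalization factor S scales the coefficients so that they sum to one. The function names are hypothetical.

```python
import math

HALF_WIDTH = 20  # coefficients w(k) for k = -20..20, i.e. 41 taps as in the text


def sinc(a):
    """Unnormalized sinc: sin(a)/a, with the limiting value 1 at a = 0."""
    return 1.0 if a == 0.0 else math.sin(a) / a


def subsample_delay_coefficients(frac):
    """Return w(k), k = -20..20, for a delay of frac = t/T sampling intervals.

    Assumption: w(k) is proportional to sinc(pi * (k - frac)), and the
    normalization factor S makes the coefficients sum to one.
    """
    raw = [sinc(math.pi * (k - frac)) for k in range(-HALF_WIDTH, HALF_WIDTH + 1)]
    s = sum(raw)
    return [r / s for r in raw]


def apply_subsample_delay(x, frac):
    """Compute y(n) = sum over k of w(k) * x(n - k).

    The filter is non-causal: negative k reads samples after n.
    Samples outside the input are treated as zero.
    """
    w = subsample_delay_coefficients(frac)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, k in enumerate(range(-HALF_WIDTH, HALF_WIDTH + 1)):
            if 0 <= n - k < len(x):
                acc += w[i] * x[n - k]
        y.append(acc)
    return y


# Usage: delay a slow sinusoid by 0.3 of a sampling interval.
x = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
y = apply_subsample_delay(x, 0.3)
ideal = math.sin(2 * math.pi * 0.05 * (100 - 0.3))
print(abs(y[100] - ideal))  # small, but not zero, because the sinc is truncated
```

Truncating the sinc to 41 taps leaves a small approximation error, which depends on the frequency of the input and on the subsample delay.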
The output of the first delay line 32 is supplied to the negative input of a second combiner 38. The first digital signal 14 is also supplied to the positive input of a first combiner 36. Similarly, for the other channel, the second digital
signal 16 is supplied to a second delay line 34, whose output is supplied to the negative input of the first combiner 36.
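The cross-subtraction performed by the two combiners can be sketched as follows. For simplicity the sketch uses whole-sample delays and a toy anechoic mixture; the delay values, the mixing model, and the function names are hypothetical. It also assumes that the second digital signal 16 feeds the positive input of the second combiner 38, which the wording "similarly" suggests.

```python
import random


def delay(signal, samples):
    """Delay a sequence by a whole number of samples, padding the start with zeros."""
    return [0.0] * samples + signal[:len(signal) - samples]


def direct_signal_separator(sig14, sig16, delay32, delay34):
    """Cross-subtract the delayed channels.

    Combiner 36: signal 14 minus the output of delay line 34 (delayed signal 16).
    Combiner 38: signal 16 minus the output of delay line 32 (delayed signal 14).
    Assumption: signal 16 drives the positive input of combiner 38.
    """
    d32 = delay(sig14, delay32)
    d34 = delay(sig16, delay34)
    out40 = [a - b for a, b in zip(sig14, d34)]
    out42 = [a - b for a, b in zip(sig16, d32)]
    return out40, out42


# Toy anechoic mixture: source A reaches microphone 10 first, source B reaches
# microphone 12 first. TAU_A and TAU_B are hypothetical inter-microphone delays.
rng = random.Random(0)
N = 500
src_a = [rng.uniform(-1.0, 1.0) for _ in range(N)]
src_b = [rng.uniform(-1.0, 1.0) for _ in range(N)]
TAU_A, TAU_B = 1, 2

sig14 = [a + b for a, b in zip(src_a, delay(src_b, TAU_B))]
sig16 = [a + b for a, b in zip(delay(src_a, TAU_A), src_b)]

out40, out42 = direct_signal_separator(sig14, sig16, delay32=TAU_A, delay34=TAU_B)

# In this model each output depends on one source only: A(n) - A(n - TAU_A - TAU_B)
# for output 40, and the corresponding expression in B for output 42.
only_a = [a - d for a, d in zip(src_a, delay(src_a, TAU_A + TAU_B))]
only_b = [b - d for b, d in zip(src_b, delay(src_b, TAU_A + TAU_B))]
print(max(abs(p - q) for p, q in zip(out40, only_a)))
print(max(abs(p - q) for p, q in zip(out42, only_b)))
```

In this toy model each output is a differenced copy of a single source rather than the source itself, which is in keeping with the known frequency coloration that the optional post filters 70 and 72 are meant to remove.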
In the preferred embodiment, the sample-and-hold and A/D operations were implemented by the audio input circuits of a Silicon Graphics Indy workstation, and the delay lines and combiners were implemented in software running on the same machine.
However, other delay lines such as analog delay lines, surface acoustic wave delays, digital low-pass filters, or digital delay