Description
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to a microphone array apparatus which has an
array of microphones in order to detect the position of a sound source,
emphasize a target sound and suppress noise.
The microphone array apparatus has an array of a plurality of
omnidirectional microphones and equivalently defines a directivity by
emphasizing a target sound and suppressing noise. Further, the microphone
array apparatus is capable of detecting the position of a sound source on
the basis of a relationship among the phases of output signals of the
microphones. Hence, the microphone array apparatus can be applied to a
video conference system in which a video camera is automatically oriented
towards a speaker and a speech signal and a video signal can concurrently
be transmitted. In addition, the speech of the speaker can be clarified by
suppressing ambient noise, and can be emphasized by adding the speech
components in phase. The microphone array apparatus is now required to
operate stably.
If the microphone array apparatus is directed to suppressing noise, filters
are connected to respective microphones and filter coefficients are
adaptively or fixedly set so as to minimize noise components (see, for
example, Japanese Laid-Open Patent Application No. 5-111090). If the
microphone array apparatus is directed to detecting the position of a
sound source, the relationship among the phases of the output signals of
the microphones is detected, and the distance to the sound source is
detected (see, for example, Japanese Laid-Open Patent Application Nos.
63-177087 and 4-236385).
An echo canceller is known as a device which utilizes the noise suppressing
technique. For example, as shown in FIG. 1, a transmit/receive interface
202 of a telephone set is connected to a network 203. An echo canceller 201
is connected between a microphone 204 and a speaker 205. The speech of a
speaker is input to the microphone 204. The speech of the speaker on the
other (remote) side is reproduced through the speaker 205. Hence, two-way
communication can take place.
A speech transferred from the speaker 205 to the microphone 204, as
indicated by a dotted line shown in FIG. 1, forms an echo (noise) to the
other-side telephone set. Hence, the echo canceller 201 is provided that
includes a subtracter 206, an echo component generator 207 and a
coefficient calculator 208. Generally, the echo component generator 207 has
a filter structure which produces an echo component from the signal which
drives the speaker 205. The subtracter 206 subtracts the echo component from
the signal from the microphone 204. The coefficient calculator 208 controls
the echo component generator 207 to update the filter coefficients so that
the residual signal from the subtracter 206 is minimized.
The updating of the filter coefficients c1, c2, . . . , cr of the echo
component generator 207 having the filter structure can be performed by the
known steepest descent method. For example, the following evaluation function
J is defined based on an output signal e (the residual signal from which the
echo component has been subtracted) of the subtracter 206:
$$J = e^{2} \tag{1}$$
According to the above evaluation function, the filter coefficients c1, c2,
. . . , cr are updated as follows:
##EQU1##
where $0.0 < \alpha < 0.5$
$$f_{norm} = \left(f(1)^{2} + f(2)^{2} + \cdots + f(r)^{2}\right)^{1/2} \tag{3}$$
In the above expressions, the symbol "*" denotes multiplication, and "r"
denotes the filter order. Further, f(1), . . . , f(r) respectively denote
the values of the memory (delay units) of the filter (in other words, the
output signals of delay units each of which delays its input signal by one
sample). The symbol $f_{norm}$ is defined by equation (3), and the symbol
$\alpha$ is a constant which determines the speed and precision of
convergence of the filter coefficients towards the optimal values.
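The coefficient update described above can be illustrated with a short Python sketch. It is not taken from the cited prior art: the update line assumes the widely used normalized LMS form, in which the correction is divided by the square of the norm of equation (3), and the function name, the echo path and the white-noise drive signal are hypothetical values chosen only for illustration.

```python
import random

def nlms_echo_canceller(drive, mic, r=8, alpha=0.25, eps=1e-12):
    """Subtract an adaptively estimated echo from the microphone signal.

    drive: samples of the signal driving the loudspeaker
    mic:   samples picked up by the microphone (echo only, no near-end speech)
    Returns (residual samples e, final coefficients c1..cr).
    """
    c = [0.0] * r   # filter coefficients c1..cr
    f = [0.0] * r   # delay-line memory f(1)..f(r)
    residual = []
    for x, m in zip(drive, mic):
        f = [x] + f[:-1]                              # shift the delay line by one sample
        echo = sum(ci * fi for ci, fi in zip(c, f))   # echo component generator
        e = m - echo                                  # subtracter output
        power = sum(fi * fi for fi in f) + eps        # square of the norm in equation (3)
        # Assumed update rule (normalized LMS), with 0.0 < alpha < 0.5.
        c = [ci + alpha * e * fi / power for ci, fi in zip(c, f)]
        residual.append(e)
    return residual, c

# Hypothetical echo path and white-noise drive signal, for illustration only.
rng = random.Random(0)
echo_path = [0.5, -0.3, 0.2, 0.1]
drive = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
mic = [sum(h * drive[k - j] for j, h in enumerate(echo_path) if k - j >= 0)
       for k in range(len(drive))]
residual, coeffs = nlms_echo_canceller(drive, mic)
```

With only the echo present, the coefficients settle on the echo path and the residual decays towards zero. If near-end speech were added to `mic`, the same update would also try to cancel that speech.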
The echo canceller 201 has a filter order as high as 100. Hence, another
echo canceller using a microphone array as shown in FIG. 2 is known. There
are provided an echo canceller 211, a transmit/receive interface 212,
microphones 214-1-214-n forming a microphone array, a speaker 215, a
subtracter 216, filters 217-1-217-n, and a filter coefficient calculator
218.
In the structure shown in FIG. 2, acoustic components from the speaker 215
to the microphones 214-1-214-n are propagated along routes indicated by
broken lines and serve as echoes. Hence, the speaker 215 is a noise
source. The updating control of the filter coefficients c11, c12, . . . ,
c1r, . . . , cn1, cn2, . . . , cnr in the case where the speaker does not
make any speech is expressed by using the evaluation function (1) as
follows:
##EQU2##
where p=2, 3, . . . , n
The equation (4) relates to a case where one of the microphones
214-1-214-n, for example, the microphone 214-1 is defined as a reference
microphone, and indicates the filter coefficients c11, c12, . . . , c1r of
the filter 217-1 which receives the output signal of the above reference
microphone 214-1. The equation (5) relates to the microphones 214-2-214-n
other than the reference microphone, and indicates the filter
coefficients c21, c22, . . . , c2r, . . . , cn1, cn2, . . . , cnr. The
subtracter 216 subtracts the output signals of the filters 217-2-217-n,
which receive the output signals of the microphones 214-2-214-n, from the
output signal of the filter 217-1, which receives the output signal of the
reference microphone 214-1.
FIG. 3 is a block diagram for explaining a conventional process of
detecting the position of a sound source and emphasizing a target sound.
The structure shown in FIG. 3 includes a target sound emphasizing unit
221, a sound source position detecting unit 222, delay units 223 and 224, a
number-of-delayed-samples calculator 225, an adder 226, a crosscorrelation
coefficient calculator 227, a position detection processing unit 228 and
microphones 229-1 and 229-2.
The target sound emphasizing unit 221 includes the delay units 223 and 224
of $Z^{-da}$ and $Z^{-db}$, the number-of-delayed-samples calculator 225
and the adder 226. The sound source position detecting unit 222 includes
the crosscorrelation coefficient calculator 227 and the position detection
processing unit 228. The number-of-delayed-samples calculator 225 is
controlled as follows. The crosscorrelation coefficient
calculator 227 of the sound source position detecting unit 222 obtains a
crosscorrelation coefficient r(i) of output signals a(j) and b(j) of the
microphones 229-1 and 229-2. The position detection processing unit 228
obtains the sound source position by referring to a value of i, imax, at
which the maximum of the crosscorrelation coefficient r(i) can be
obtained.
The crosscorrelation coefficient r(i) is expressed as follows:
$$r(i) = \sum_{j=1}^{n} a(j) * b(j+i) \tag{6}$$
where $\sum_{j=1}^{n}$ denotes a summation from j=1 to j=n, n is the number
of samples for the convolutional operation, and i satisfies
$-m \le i \le m$. The symbol "m" is a value dependent on the distance
between the microphones 229-1 and 229-2 and the sampling frequency, and is
written as follows:
$$m = [(\text{sampling frequency}) * (\text{inter-microphone distance})] / (\text{speed of sound}) \tag{7}$$
The number of delayed samples da of the $Z^{-da}$ delay unit 223 and the
number of delayed samples db of the $Z^{-db}$ delay unit 224 can be
obtained as follows from the value i=imax at which the maximum value of the
crosscorrelation coefficient r(i) can be obtained:
where $i \ge 0$, da=i, db=0
where $i < 0$, da=0, db=-i.
Hence, the phases of the target sound from the sound source are made to
coincide with each other and are added by the adder 226. Hence, the target
sound can be emphasized.
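The conventional procedure above (crosscorrelation of equation (6), selection of imax, and the choice of da and db) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the summation starts at sample m so that every lag from -m to m stays inside the buffers, and the test signal is synthetic white noise.

```python
import random

def crosscorrelation(a, b, i, start, n):
    """r(i) of equation (6), summed over n samples beginning at index start."""
    return sum(a[j] * b[j + i] for j in range(start, start + n))

def estimate_delays(a, b, m, n):
    """Return (imax, da, db) for the two microphone signals a and b."""
    start = m  # keeps j + i inside the buffers for every lag -m..m
    imax = max(range(-m, m + 1),
               key=lambda i: crosscorrelation(a, b, i, start, n))
    if imax >= 0:
        da, db = imax, 0
    else:
        da, db = 0, -imax
    return imax, da, db

def delay(signal, d):
    """Delay unit Z^-d: shift by d samples, padding the start with zeros."""
    return [0.0] * d + signal[:len(signal) - d]

def emphasize(a, b, m, n):
    """Bring the two signals into phase with the delay units, then add them."""
    _, da, db = estimate_delays(a, b, m, n)
    return [x + y for x, y in zip(delay(a, da), delay(b, db))]

# Synthetic example: microphone b receives the source 3 samples after a.
rng = random.Random(1)
source = [rng.gauss(0.0, 1.0) for _ in range(200)]
a = source
b = delay(source, 3)
imax, da, db = estimate_delays(a, b, m=5, n=100)
summed = emphasize(a, b, m=5, n=100)
```

For a white-noise source the peak of r(i) is sharp. For a strongly autocorrelated signal such as voice the peak is much flatter, which is the difficulty discussed in the following paragraphs.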
However, the above-mentioned conventional microphone array apparatus has
the following disadvantages.
In the conventional structure directed to suppressing noise, when the
speaker of the target sound source does not speak, the echo components
from the speaker to the microphone array can be canceled by the echo
canceller. However, when a speech of the speaker and the reproduced sound
from the speaker are concurrently input to the microphone array, the
updating of the filter coefficients for canceling the echo components
(noise components) does not converge. That is, the residual signal e in
the equations (4) and (5) corresponds to the sum of the components which
cannot be suppressed by the subtracter 216 and the speech of the speaker.
Hence, if the filter coefficients are updated so that the residual signal
e is minimized, the speech of the speaker which is the target sound is
suppressed along with the echo components (noise). Hence, the noise cannot
be suppressed without degrading the target sound.
In the conventional structure directed to detecting the sound source
position and emphasizing the target sound, the output signals a(j) and
b(j) of the microphones 229-1 and 229-2 shown in FIG. 3 generally have an
autocorrelation in the vicinity of the sampled values. If the sound source
is white noise or pulse noise, the autocorrelation is small, while the
autocorrelation for voice is large. The crosscorrelation function r(i)
defined in the equation (6) varies less as a function of i for a signal
having comparatively large autocorrelation than for a signal having
comparatively small autocorrelation. Hence, it is very difficult to obtain
the correct maximum value and to detect the position of the sound source
precisely and rapidly.
In the conventional structure directed to emphasizing the target sound so
that the phases of the target sounds are synchronized, the degree of
emphasis depends on the number of microphones forming the microphone
array. If there is a small crosscorrelation between the target sound and
noise, the use of N microphones emphasizes the target sound so that the
power ratio is as large as N times. If there is a large crosscorrelation
between the target sound and noise, the power ratio is small. Hence, in
order to emphasize the target sound which has a large crosscorrelation to
the noise, it is required to use a large number of microphones. This leads
to an increase in the size of the microphone array. It is also very
difficult to identify, in a noisy environment, the position of the sound
source by utilizing the crosscorrelation coefficient value of the equation (6).
SUMMARY OF THE INVENTION
It is a general object of the present invention to provide a microphone
array apparatus in which the above disadvantages are eliminated.
A more specific object of the present invention is to provide a microphone
array apparatus capable of stably and precisely suppressing noise,
emphasizing a target sound and identifying the position of a sound source.
The above objects of the present invention are achieved by a microphone
array apparatus comprising: a microphone array including microphones
(which correspond to parts indicated by reference numbers 1-1-1-n in the
following description), one of the microphones being a reference
microphone (1-1); filters (2-1-2-n) receiving output signals of the
microphones; and a filter coefficient calculator (4) which receives the
output signals of the microphones, a noise and a residual signal obtained
by subtracting filtered output signals of the microphones other than the
reference microphone from a filtered output signal of the reference
microphone, and which obtains filter coefficients of the filters in
accordance with an evaluation function based on the residual signal. With
this structure, even when speech of a speaker corresponding to the sound
source and the noise are concurrently applied to the microphones, the
crosscorrelation function value is reduced so that the noise can be
effectively suppressed and the filter coefficients can continuously be
updated.
The above microphone array apparatus may be configured so that it further
comprises: delay units (8-1-8-n) provided in front of the filters; and a
delay calculator (9) which calculates amounts of delays of the delay units
on the basis of a maximum value of a crosscorrelation function of the
output signals of the microphones and the noise. Hence, the filter
coefficients can easily be updated.
The microphone array apparatus may be configured so that the noise is a
signal which drives a speaker. This structure is suitable for a system
that has a speaker in addition to the microphones. A reproduced sound from
the speaker may serve as noise. By handling the speaker as a noise source,
the signal driving the speaker can be handled as the noise, and thus the
filter coefficients can easily be updated.
The microphone array apparatus may further comprise a supplementary
microphone (21) which outputs the noise. This structure is suitable for a
system which has microphones but does not have a speaker. The output
signal of the supplementary microphone can be used as the noise.
The microphone array apparatus may be configured so that the filter
coefficient calculator includes a cyclic type low-pass filter (FIG. 10)
which applies a comparatively small weight to memory values of a filter
portion which executes a convolutional operation in an updating process of
the filter coefficients.
The above objects of the present invention are also achieved by a
microphone array apparatus comprising: a microphone array including
microphones (51-1, 51-2); linear predictive filters (52-1, 52-2) receiving
output signals of the microphones; linear predictive analysis units (53-1,
53-2) which receive the output signals of the microphones and update
filter coefficients of the linear predictive filters in accordance with a
linear predictive analysis; and a sound source position detector (54)
which obtains a crosscorrelation coefficient value based on linear
predictive residuals of the linear predictive filters and outputs
information concerning the position of a sound source based on a value
which maximizes the crosscorrelation coefficient. Hence, even when speech
of a speaker corresponding to the sound source and the noise are
concurrently applied to the microphones, autocorrelation function values
of samples of the speech signal are reduced by the linear predictive
analysis, so that the position of the target sound source can accurately be
detected. Thus, speech from the target sound source can be emphasized and
noise components other than the target sound can be suppressed.
The microphone array apparatus may be configured so that: a target sound
source is a speaker; and the linear predictive analysis unit updates the
filter coefficients of the linear predictive filters by using a signal
which drives the speaker. Hence, the linear predictive analysis unit can
be shared by the linear predictive filters corresponding to the
microphones.
The above-mentioned objects of the present invention are also achieved by a
microphone array apparatus comprising: a microphone array including
microphones (61-1, 61-2); a signal estimator (62) which estimates
positions of estimated microphones in accordance with intervals at which
the microphones are arranged by using the output signals of the
microphones and a velocity of sound and which outputs output signals of
the estimated microphones together with the output signals of the
microphones forming the microphone array; and a synchronous adder (63)
which brings the output signals of the microphones and the estimated
microphones into phase and then adds the output signals. Hence, even if a
small number of microphones is used to form an array, the target sound can
be emphasized and the position of the target sound source can precisely be
detected as if a large number of microphones were used.
The microphone array apparatus may further comprise a reference microphone
(71) located on an imaginary line connecting the microphones forming the
microphone array and arranged at intervals at which the microphones
forming the microphone array are arranged, wherein the signal estimator
corrects the estimated positions of the estimated microphones and
the output signals thereof on the basis of the output signals of the
microphones forming the microphone array.
The microphone array apparatus may further comprise an estimation
coefficient decision unit (74) which weights an error signal which corresponds
to a difference between the output signal of the reference microphone and
the output signals of the signal estimator in accordance with an acoustic
sense characteristic so that the signal estimator performs a signal
estimating operation on a band having a comparatively high acoustic sense
with a comparatively high precision.
The microphone array apparatus may be configured so that: given angles are
defined which indicate directions of a sound source with respect to the
microphones forming the microphone array; the signal estimator includes
parts which are respectively provided to the given angles; the synchronous
adder includes parts which are respectively provided to the given angles;
and the microphone array apparatus further comprises a sound source
position detector which outputs information concerning the position of a
sound source based on a maximum value among the output signals of the
parts of the synchronous adder.
The above objects of the present invention are also achieved by a
microphone array apparatus comprising: a microphone array including
microphones (91-1, 91-2); a sound source position detector (92) which
detects a position of a sound source on the basis of output signals of the
microphones; a camera (90) generating an image of the sound source; a
second detector (93) which detects the position of the sound source on the
basis of the image from the camera; and a joint decision processing unit
(94) which outputs information indicating the position of the sound source
on the basis of the information from the sound source position detector
and the information from the second detector. Hence, the position of the
target sound source can be rapidly and precisely detected.
BRIEF DESCRIPTION OF THE DRAWINGS
Other objects, features and advantages of the present invention will become
more apparent from the following detailed description when read in
conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram of a conventional echo canceller;
FIG. 2 is a diagram of a conventional echo canceller using a microphone
array;
FIG. 3 is a block diagram of a structure directed to detecting the position
of a sound source and emphasizing the target sound;
FIG. 4 is a block diagram of a first embodiment of the present invention;
FIG. 5 is a block diagram of a filter which can be used in the first
embodiment of the present invention;
FIG. 6 is a block diagram of a second embodiment of the present invention;
FIG. 7 is a flowchart of an operation of a delay calculator used in the
second embodiment of the present invention;
FIG. 8 is a block diagram of a third embodiment of the present invention;
FIG. 9 is a block diagram of a fourth embodiment of the present invention;
FIG. 10 is a block diagram of a low-pass filter used in a filter
coefficient updating process executed in the embodiments of the present
invention;
FIG. 11 is a block diagram of a structure using a digital signal processor
(DSP);
FIG. 12 is a block diagram of an internal structure of the DSP shown in
FIG. 11;
FIG. 13 is a block diagram of a delay unit;
FIG. 14 is a block diagram of a fifth embodiment of the present invention;
FIG. 15 is a block diagram of a detailed structure of the fifth embodiment
of the present invention;
FIG. 16 is a diagram showing a relationship between the sound source
position and imax;
FIG. 17 is a block diagram of a sixth embodiment of the present invention;
FIG. 18 is a block diagram of a seventh embodiment of the present
invention;
FIG. 19 is a block diagram of a detailed structure of the seventh
embodiment of the present invention;
FIG. 20 is a block diagram of an eighth embodiment of the present
invention;
FIG. 21 is a block diagram of a ninth embodiment of the present invention;
and
FIG. 22 is a block diagram of a tenth embodiment of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
A description will now be given, with reference to FIG. 4, of a microphone
array apparatus according to a first embodiment of the present invention.
The apparatus shown in FIG. 4 is made up of n microphones 1-1-1-n forming
a microphone array, filters 2-1-2-n, an adder 3, a filter coefficient
calculator 4, a speaker (target sound source) 5, and a speaker (noise
source) 6. The speech of the speaker 5 is input to the microphones 1-1-1-n,
which convert the received acoustic signals into electric signals, which
pass through the filters 2-1-2-n and are then applied to the adder 3. The
output signal of the adder 3 is then transmitted to a remote terminal via a
network or the like. A speech signal from the remote side is applied to the speaker
6, which is thus driven to reproduce the original speech. Hence, the
speaker 5 communicates with the other-side speaker. The reproduced speech
is input to the microphones 1-1-1-n, and thus functions as noise to the
speech of the speaker 5. Hence, the speaker 6 is a noise source with
respect to the target sound source.
The filter coefficient calculator 4 is supplied with the output signals of
the microphones 1-1-1-n, a noise (an input signal for driving the speaker 6
serving as the noise source), and the output signal (residual signal) of the
adder 3, and thus updates the coefficients of the filters 2-1-2-n. In this
case, the microphone 1-1 is handled as a reference microphone. The
adder 3 subtracts the output signals of the filters 2-2-2-n from the
output signal of the filter 2-1.
Each of the filters 2-1-2-n can be configured as shown in FIG. 5. Each
filter includes $Z^{-1}$ delay units 11-1-11-r-1, coefficient units
12-1-12-r for multiplication of filter coefficients cp1, cp2, . . . , cpr,
and adders 13 and 14. A symbol "r" denotes the order of the filter.
When the signal from the noise source (speaker 6) is denoted as xp(i) and
the signal from the target sound source (speaker 5) is denoted as yp(i)
(where i denotes the sample number and p is equal to 1, 2, . . . , n), the
values fp(i) of the memories of the filters 2-1-2-n (the input signals to
the filters and the output signals of the delay units 11-1-11-r-1) are
defined as follows:
fp(i)=xp(i)+yp(i) (8)
The output signal e of the adder in the echo canceller using the
conventional microphone array is as follows:
##EQU3##
where f1(1), f1(2), . . . , f1(r), . . . , fi(1), fi(2), . . . , fi(r)
denote the values of the memories of the filters. The adder subtracts the
output signals of the filters other than the reference filter from the
output signal of the reference filter.
In contrast, the present invention brings the signals xp(i) into phase and
then performs the convolutional operation. The output signal e' of the adder
thus obtained is as follows:
##EQU4##
where (p) in x(1)(p), . . . , x(q)(p) denotes signals from the noise source
obtained when the microphones 1-1-1-n are in phase, and the symbol "q"
denotes the number of samples on which the convolutional operation is
executed.
When the signals xp(i) from the noise source and the signals yp(i) of the
target sound source are concurrently input, that is, when the speaker 5
speaks at the same time as the speaker 6 outputs a reproduced speech,
there is a small crosscorrelation therebetween because the coexisting
speeches are uttered by different speakers. Hence, the equation (11) can
be rewritten as follows:
##EQU5##
It can be seen from the above equation (12) that the influence of the signals
yp(i) from the target sound source on [fp(1)', . . . , fp(r)'] is reduced.
The signal e' in the equation (10) is obtained by using the equation (12),
and then an evaluation function $J=(e')^{2}$ is calculated based on the
obtained signal e'. Then, based on the evaluation function $J=(e')^{2}$,
the filter coefficients of the filters 2-1-2-n are updated. That is, even
in the state in which sounds from the speaker (target sound source) 5
and the speaker (noise source) 6 are concurrently applied to the
microphones 1-1-1-n, the noise contained in the output signals of the
microphones 1-1-1-n has a large crosscorrelation to the input signal
applied to the filter coefficient calculator 4 and used to drive the
speaker 6, while having a small crosscorrelation to the target sound
source 5. Hence, the filter coefficients can be updated in accordance with
the evaluation function $J=(e')^{2}$, and the output signal of the
adder 3 is the speech signal of the speaker 5 in which the noise is
suppressed.
FIG. 6 is a block diagram of a microphone array apparatus according to a
second embodiment of the present invention in which parts that are the
same as those shown in the previously described figures are given the same
reference numbers. The structure shown in FIG. 6 includes delay units
8-1-8-n ($Z^{-d1}$ to $Z^{-dn}$), and a delay calculator 9.
The updating of the filter coefficients according to the second embodiment
of the present invention is based on the following. The delay calculator 9
calculates the number of delayed samples in each of the delay units
8-1-8-n so that the output signals of the microphones 1-1-1-n are brought
into phase. Further, the filter coefficient calculator 4 calculates the
filter coefficients of the filters 2-1-2-n. The delay calculator 9 is
supplied with the output signals of the microphones 1-1-1-n, and the input
signal (noise) for driving the speaker 6. The filter coefficient
calculator 4 is supplied with the output signals of the delay units
8-1-8-n, the output signal of the adder 3 and the input signal (noise) for
driving the speaker 6.
When the output signals of the microphones 1-1-1-n are denoted as gp(j)
(where p=1, 2, . . . , n, and j is the sample number), a crosscorrelation
function Rp(i) with the signals x(j) from the noise source is as follows:
$$Rp(i) = \sum_{j=1}^{s} gp(j+i) * x(j) \tag{13}$$
where $\sum_{j=1}^{s}$ denotes a summation from j=1 to j=s, and s
denotes the number of samples on which the convolutional operation is
executed. The number s of samples may be tens to hundreds of samples. When
a symbol "D" denotes the maximum number of delayed samples corresponding to
the distances between the noise source and the microphones, the term "i" in
the equation (13) is such that i=0, 1, 2, . . . , D.
For example, when the maximum distance between the noise source and the
furthest microphone is equal to 50 cm, and the sampling frequency is equal
to 8 kHz, the speed of sound is approximately equal to 340 m/s, and thus
the maximum number D of delayed samples is as follows:
D=(sampling frequency)*(maximum distance between the noise source and
microphone)/(speed of sound)=8000*(50/34000)=11.76≈12.
Hence, the symbol "i" is equal to 0, 1, 2, . . . , 12. When the maximum
distance between the noise source and the microphone is equal to 1 m, the
maximum number D of delayed samples is equal to 24.
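The arithmetic for D can be checked with a small Python helper. The function name is hypothetical, and rounding up to a whole sample is an assumption consistent with the two values given in the text.

```python
import math

def max_delay_samples(sampling_hz, distance_m, speed_of_sound=340.0):
    """Maximum number D of delayed samples, rounded up to a whole sample."""
    return math.ceil(sampling_hz * distance_m / speed_of_sound)

print(max_delay_samples(8000, 0.5))  # 8000 * 0.5 / 340 = 11.76, prints 12
print(max_delay_samples(8000, 1.0))  # 8000 * 1.0 / 340 = 23.53, prints 24
```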
The value ip (p=1, 2, . . . , n) is then obtained, which is the value of i
at which the absolute value of the crosscorrelation function value Rp(i)
obtained by equation (13) is maximized. Further, the maximum value imax of
the ip is obtained. The above process is comprised of steps (A1)-(A11) shown in
FIG. 7. The term imax is set to an initial value (equal to, for example,
0) and the variable p is set equal to 1, at step A1. At step A2, the term
Rpmax is set to an initial value (equal to, for example, 0.0), and the
term ip is set to an initial value (equal to, for example, 0). Further, at
step A2, the variable i is set equal to 0. At step A3, the
crosscorrelation function value Rp(i) defined by the equation (13) is
obtained.
At step A4, it is determined whether the crosscorrelation function value
Rp(i) is greater than the term Rpmax. If the answer is YES, the Rp(i)
obtained at that time is set to Rpmax, and the value of i at that time is
set to ip, at step A5. In either case, the variable i is then incremented
by 1 (i=i+1) at step A6. At step A7, it is determined whether i≤D. If the
value i is equal to or smaller than the maximum number D of delayed
samples, the process returns to step A3. If the value i exceeds the maximum
number D of delayed samples, the process proceeds to step A8. At step A8,
it is determined whether the value ip is greater than the value imax. If
the answer is YES, the value ip obtained at that time is set to imax at
step A9. In either case, the variable p is then incremented by 1 (p=p+1) at
step A10. At step A11, it is determined whether p≤n. If the answer of step
A11 is YES, the process returns to step A2. If the answer is NO, the
retrieval of the crosscorrelation function value Rp(i) ends, so that the
maximum value imax of the ip within the range of i≤D is obtained.
The number dp of delayed samples of the delay unit can be obtained as
follows by using the terms ip and imax obtained by the above maximum value
detection:
dp=imax-ip (14)
Hence, the numbers d1-dn of delayed samples of the delay units 8-1-8-n can
be set by the delay calculator 9.
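The search of steps A1 to A11 and the delay assignment of equation (14) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the lag of each maximum is assumed to be recorded as ip when Rpmax is updated, the comparison uses the raw value of Rp(i) as in step A4, and the microphone signals are synthetic delayed copies of a white-noise signal.

```python
import random

def delay_settings(mics, x, D, s):
    """Return ([i_1..i_n], imax, [d_1..d_n]) for microphone signals mics and noise x."""
    imax = 0                                              # A1
    lags = []
    for g in mics:                                        # loop over p (A10, A11)
        r_max, ip = 0.0, 0                                # A2
        for i in range(D + 1):                            # i = 0..D (A6, A7)
            r = sum(g[j + i] * x[j] for j in range(s))    # A3, equation (13)
            if r > r_max:                                 # A4
                r_max, ip = r, i                          # A5
        if ip > imax:                                     # A8
            imax = ip                                     # A9
        lags.append(ip)
    return lags, imax, [imax - ip for ip in lags]         # equation (14)

def delayed(signal, d):
    """Shift a signal by d samples, padding the start with zeros."""
    return [0.0] * d + signal[:len(signal) - d]

# Synthetic example: the noise reaches three microphones after 2, 5 and 0 samples.
rng = random.Random(2)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
mics = [delayed(x, 2), delayed(x, 5), delayed(x, 0)]
lags, imax, delays = delay_settings(mics, x, D=12, s=100)
```

The microphone that receives the noise last gets a delay of zero, and every other microphone is delayed by the difference, so that all filter inputs carry the noise in phase.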
The filters 2-1-2-n can be configured as shown in FIG. 5. The output
signals of the filters 2-1-2-n are denoted as outp (p=1, 2, . . . , n) and
are defined by the following:
$$outp = \sum_{i=1}^{r} cpi * fp(i) \tag{15}$$
where $\sum_{i=1}^{r}$ denotes a summation from i=1 to i=r (r being the
filter order), cpi denotes the filter coefficients, and fp(i) denotes the
values of the memories of the filters, which are also the input signals
applied to the filters.
The filter coefficient calculator 4 calculates the crosscorrelation between
the present and past input signals of the filters 2-1-2-n and the signals
from the noise source, and thus updates the filter coefficients. The
crosscorrelation function value fp(i)' is written as follows:
$$fp(i)' = \sum_{j=1}^{q} x(j) * fp(i+j-1) \tag{16}$$
where $\sum_{j=1}^{q}$ denotes a summation from j=1 to j=q, and the
symbol q denotes the number of samples on which the convolutional
operation is carried out in order to calculate the crosscorrelation
function value and is normally equal to tens to hundreds of samples.
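A minimal Python sketch of the crosscorrelation function value of equation (16) follows. It reads the summation index as j and the memory index as i+j-1, which is an assumption about the printed formula. The function name and the sample values are hypothetical, and list indices are zero-based, so f(1) is stored at index 0.

```python
def crosscorrelated_memory(x, f, r, q):
    """Return [f(1)', ..., f(r)'] with f(i)' = sum over j=1..q of x(j) * f(i+j-1).

    x: noise-source samples x(1)..x(q)
    f: filter input samples f(1)..f(r+q-1) for one microphone
    """
    return [sum(x[j] * f[i + j] for j in range(q)) for i in range(r)]

print(crosscorrelated_memory([1, 2, 3], [1, 0, 2, 1], r=2, q=3))  # prints [7, 7]
```

Because the target speech is only weakly correlated with the noise signal x, its contribution to each sum tends to average out as q grows, while the noise contribution accumulates.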
By using the above crosscorrelation function value fp(i)', the output
signal e' of the adder 3 is obtained as follows:
$$e' = \sum_{j=1}^{r} [f1(j)' * c1j] - \sum_{i=2}^{n} \sum_{j=1}^{r} [fi(j)' * cij] \tag{17}$$
The above operation is the convolutional operation and can be thus
implemented by a digital signal processor (DSP). In this case, the adder 3
subtracts the output signals of the microphones 1-2-1-n obtained via the
filters 2-2-2-n from the output signal of the reference microphone 1-1
obtained via the filter 2-1.
The evaluation function is defined as $J=(e')^{2}$, where the output
signal e' of the adder 3 is handled as an error signal. By using the
evaluation function $J=(e')^{2}$, the filter coefficients are obtained.
For example, the filter coefficients can be obtained by the steepest
descent method. The filter coefficients c11, c12, . . . , c1r, . . . ,
cn1, cn2, . . . , cnr can be obtained by the following expressions:
##EQU6##
where the norm $fp_{norm}$ corresponds to the aforementioned formula (3)
and can be written as follows:
$$fp_{norm} = \left[(fp(1)')^{2} + (fp(2)')^{2} + \cdots + (fp(r)')^{2}\right]^{1/2} \tag{20}$$
The term $\alpha$ in the equations (18) and (19) is a constant as has been
described previously, and determines the speed and precision of
convergence of the filter coefficients towards the optimal values.
Hence, the output signal e' of the adder 3 is obtained as follows:
$$e' = out1 - \sum_{i=2}^{n} outi \tag{21}$$
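Equations (15) and (21) can be illustrated together with a brief Python sketch. The function names and the numeric values are hypothetical, and the sum in equation (15) is taken over the coefficients of one filter.

```python
def filter_output(c, f):
    """Equation (15): weighted sum of the filter memory values f with coefficients c."""
    return sum(ci * fi for ci, fi in zip(c, f))

def adder_output(coeffs, memories):
    """Equation (21): reference filter output minus the outputs of all other filters."""
    outs = [filter_output(c, f) for c, f in zip(coeffs, memories)]
    return outs[0] - sum(outs[1:])

# Two microphones, filter order 2; index 0 is the reference microphone.
coeffs = [[1.0, 0.0], [0.5, 0.5]]
memories = [[2.0, 3.0], [1.0, 2.0]]
print(adder_output(coeffs, memories))  # 2.0 - 1.5, prints 0.5
```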
The delay units 8-1-8-n change the phases of the input signals applied to
the filters 2-1-2-n. Hence, the filter coefficients can easily be updated
by the filter coefficient calculator 4. Even under a situation such that
the speaker 5 speaks at the same time as a sound is emitted from the
speaker 6, the updating of the filter coefficients can be realized. Hence,
it is possible to reliably suppress the noise components that enter the
microphones 1-1-1-n from the speaker 6 which serves as a noise source.
FIG. 8 is a block diagram of a third embodiment of the present invention,
in which parts that are the same as those shown in FIG. 4 are given the
same reference numbers. In FIG. 8, there are a noise source 16 and a
supplementary microphone 21. The supplementary microphone 21 can have the
same structure as that of the microphones 1-1-1-n forming the microphone
array.
The structure shown in FIG. 8 differs from that shown in FIG. 4 in that the
output signal of the supplementary microphone 21 can be input to the
filter coefficient calculator 4 as a signal from the noise source. Hence,
even in a case where the noise source 16 is an arbitrary noise source
other than the speaker, such as an air conditioning system, the noise can
be suppressed by using the evaluation function $J=(e')^2$ used to update
the filter coefficients, as has been described with reference to FIG. 4.
FIG. 9 is a block diagram of a fourth embodiment of the present invention,
in which parts that are the same as those shown in FIGS. 6 and 7 are given
the same reference numbers. The structure shown in FIG. 9 is almost the
same as that shown in FIG. 6 except that the output signal of the
supplementary microphone 21 is applied, as the signal from a noise source,
to the delay calculator 9 and the filter coefficient calculator 4. Hence,
as in the case of the structure shown in FIG. 6, the numbers of delayed
samples of the delay units 8-1-8-n are controlled by the delay calculator
9, and the filter coefficients of the filters 2-1-2-n are updated by the
filter coefficient calculator 4. As a result, noise can be suppressed.
FIG. 10 is a block diagram of a low-pass filter used in the filter
coefficient updating process used in the embodiments of the present
invention. The low-pass filter shown in FIG. 10 includes coefficient units
22 and 23, an adder 24 and a delay unit 25. The structure shown in FIG. 10
is directed to calculating the aforementioned crosscorrelation function
value fp(i)'. In this structure, the coefficient unit 23 has a filter
coefficient $\beta$ and the coefficient unit 22 has a filter coefficient
$(1-\beta)$. The value fp(i)' is obtained as follows:
$$f_p(i)' = \beta \cdot f_p(i)'_{\mathrm{old}} + (1-\beta) \cdot \left[ x(1) \cdot f_p(i) \right] \qquad (22)$$
where the coefficient $\beta$ is set so as to satisfy $0.0 < \beta < 1.0$
and $f_p(i)'_{\mathrm{old}}$ denotes the value held in the memory (delay
unit 25) of the low-pass filter.
The low-pass filter shown in FIG. 10 is a recursive (cyclic type) low-pass
filter in which the weighting for the past signals is made comparatively
light. This prevents the convolution operation from producing an excessive
output value, so that the crosscorrelation function value fp(i)' is
obtained stably.
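The recursion of formula (22) can be sketched in Python as follows. The function name `smooth` and its argument names are assumptions introduced for this example; `fp_old` stands for the value held in the delay unit 25, and `x1 * fp_i` stands for the product x(1)*fp(i).

```python
def smooth(fp_old, x1, fp_i, beta):
    """One step of formula (22): recursive low-pass filtering (sketch)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must satisfy 0.0 < beta < 1.0")
    # Coefficient unit 23 weights the stored value (delay unit 25) by beta;
    # coefficient unit 22 weights the new product x(1)*fp(i) by (1 - beta).
    return beta * fp_old + (1.0 - beta) * (x1 * fp_i)
```

When the product x(1)*fp(i) stays constant, repeated application of `smooth` makes the stored value approach that product, and a single outlying input shifts the output by only the fraction (1 - beta) of its deviation.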
FIG. 11 is a block diagram of a structure directed to implementing the
embodiments of the present invention by using a digital signal processor
(DSP). Referring to FIG. 11, there are provided the microphones 1-1-1-n
forming a microphone array, a DSP 30, low-pass filters (LPF) 31-1-31-n,
analog-to-digital (A/D) converters 32-1-32-n, a digital-to-analog (D/A)
converter 33, a low-pass filter (LPF) 34, an amplifier 35 and a speaker
36.
The aforementioned filters 2-1-2-n and the filter coefficient calculator 4
used in the structure shown in FIG. 4 and the filters 2-1-2-n, the filter
coefficient calculator 4 and the delay units 8-1-8-n used in the structure
shown in FIG. 6 can be realized by combining repetitive (loop) processing,
sum-of-products operations and conditional branching. Hence, the above
processes can be implemented by using the operating functions of
the DSP 30.
The low-pass filters 31-1-31-n function to eliminate signal components
located outside the speech band. The A/D converters 32-1-32-n convert the
output signals of the microphones 1-1-1-n obtained via the low-pass
filters 31-1-31-n into digital signals and have a sampling frequency of,
for example, 8 kHz. The digital signals have a number of bits that
corresponds to the number of bits processed in the DSP 30. For example,
the digital signals consist of 8 bits or 16 bits.
An input signal obtained via a network or the like is converted into an
analog signal by the D/A converter 33. The analog signal thus obtained
passes through the low-pass filter 34, and is then applied to the
amplifier 35. The amplified signal drives the speaker 36. The reproduced
sound emitted from the speaker 36 serves as noise with respect to the
microphones 1-1-1-n. However, as has been described previously, the noise
can be suppressed by updating the filter coefficients by the DSP 30.
FIG. 12 is a block diagram showing functions of the DSP that can be used in
the embodiments of the present invention. In FIG. 12, parts that are the
same as those shown in the previously described figures are given the same
reference numbers. In FIG. 12, the low-pass filters 31-1-31-n and 34, the
A/D converters 32-1-32-n, the D/A converter 33 and the amplifier 35 shown
in FIG. 11 are omitted. The filter coefficient calculator 4 includes a
crosscorrelation calculator 41 and a filter coefficient updating unit 42.
The delay calculator 9 includes a crosscorrelation calculator 43, a
maximum value detector 44 and a number-of-delayed-samples calculator 45.
The crosscorrelation calculator 43 of the delay calculator 9 receives the
output signals gp(j) of the microphones 1-1-1-n and the drive signal for
the speaker 36 (which functions as a noise source), and calculates the
crosscorrelation function value Rp(i) defined in formula (13). The maximum
value detector 44 detects the maximum value of the crosscorrelation
function value Rp(i) in accordance with the flowchart of FIG. 7. The
number-of-delayed-samples calculator 45 obtains the numbers dp of delayed
samples of the delay units 8-1-8-n by using the values ip and imax obtained
during the maximum value detecting process. The numbers of delayed samples
thus obtained are then set in the delay units 8- | | |