| United States Patent | 6246760 |
| Link to this page | http://www.wikipatents.com/6246760.html |
| Inventor(s) | Makino; Shoji (Machida, JP);
Shimauchi; Suehiro (Tokyo, JP);
Haneda; Yoichi (Tokyo, JP);
Nakagawa; Akira (Kokubunji, JP);
Kojima; Junji (Tokyo, JP) |
| Abstract | In a subband echo cancellation for a multichannel teleconference, received
signals x.sub.1 (k), x.sub.2 (k), . . . , x.sub.I (k) of each channel are
divided into N subband signals, an echo y(k) picked up by a microphone
16.sub.j after propagation over an echo path is divided into N subband
signals y.sub.0 (k), . . . ,y.sub.N-1 (k), and vectors each composed of a
time sequence of subband received signals x.sub.1 (k), . . . , x.sub.I (k)
are combined for each corresponding subband. The combined vector and an
echo cancellation error signal in the corresponding subband are input into
an estimation part 19.sub.n, wherein a cross-correlation variation
component is extracted. The extracted component is used as an adjustment
vector to iteratively adjust the impulse response of an estimated echo
path. The combined vector is applied to an estimated echo path 18.sub.n
formed by the adjusted value to obtain an echo replica. An echo
cancellation error signal e.sub.n (k) is calculated from the echo replica
and a subband echo y.sub.n (k). |
Title Information
Subband echo cancellation method for multichannel audio teleconference and
echo canceller using the same |
| Publication Date |
June 12, 2001 |
| Filing Date |
September 11, 1997 |
| Priority Data |
Sep 13, 1996 [JP] 8-243524 |
Claims  |
What is claimed is:
1. A subband echo cancellation method for a multichannel teleconference in
which received signals of plural channels are reproduced as acoustic
signals by loudspeakers corresponding to said plural channels, said
acoustic signals being received by at least one microphone after
propagating over each echo path thereto, an echo replica being subtracted
from an echo provided from said at least one microphone, an echo
cancellation error signal resulting from said subtraction and said
received signal of each of said plural channels being used to calculate an
adjustment vector, said adjustment vector being used iteratively to adjust
an estimated value of an impulse response of each echo path, estimated
echo paths having said adjusted impulse response being generated for each
of said echo paths, and the corresponding one of said received signals
being applied to each estimated echo path to generate said echo replica,
the method further comprising the steps of:
(a) dividing said received signal and said echo into N subbands in each of
said plural channels, and decimating them with predetermined decimation
rates, respectively, to generate N subband received signals and N subband
echoes, N being an integer equal to or greater than 2;
(b) generating N echo replicas by providing said N subband received signals
to N estimated echo paths, each formed by a digital filter having a filter
coefficient of a predetermined number of taps which simulates the impulse
response of said echo path in each of said N subbands;
(c) subtracting said N echo replicas from corresponding N subband echoes to
generate echo cancellation error signals in said N subbands;
(d) iteratively adjusting said filter coefficients of said digital filters
in a manner to minimize said N echo cancellation error signals on the
basis of said N echo cancellation error signals and corresponding N
subband received signals; and
(e) combining said echo cancellation error signals in said N subbands into
a full band send signal having said echoes suppressed; and
(f) extracting a variation component of the cross-correlation between said
received signals of said channels as said adjustment vector.
2. The method of claim 1, wherein a combined received signal vector by
combining received signal vectors of a sequence of received signals of
each of said channels is calculated and a variation in the correlation
between current and previous ones of said combined received signal vector
is detected and used as said cross-correlation variation component.
3. The method of claim 2, wherein a method for detecting said variation in
the cross-correlation between said current and previous combined received
signal vectors in each of said channels is set optimum in said N subbands.
4. The method of claim 3, wherein said method for detecting said variation
in the cross-correlation between said current and previous combined
received signal vectors in each of said each channel is a projection
algorithm or ESP algorithm and the projection order is set at an optimum
value in each of said N subbands.
5. The method of claim 4, wherein the order of said projection algorithm or
ESP algorithm is set at a minimum value at which the convergence speed of
an echo return loss enhancement substantially saturates with respect to
said received signal in said each subband, the number of taps of said
digital filter corresponding to a lower one of said N subbands is larger
than the number of taps corresponding to a higher subband.
6. The method of claim 4, wherein the order of said projection algorithm or
ESP algorithm in each of said N subbands is set at a minimum value at
which whitening of an estimation error vector at the time of having
whitened said received signal by a linear predictive coding filter
substantially saturates, the order of said projection or ES projection
algorithm in said lower subband being set larger than the order of said
projection or ES projection algorithm in said higher subband.
7. The method of claim 4, wherein the number of taps of said digital filter
forming said estimated echo path in each of said N subbands is
predetermined on the basis of at least one of the energy distribution in
the frequency region of a desired one of said received signals, the room
reverberation characteristic and the human psychoacoustic characteristic.
8. The method of claim 4, wherein the number of taps of said digital filter
corresponding to a lower one of said N subbands is larger than the number
of taps of said digital filter corresponding to a higher subband.
9. The method of claim 4, wherein the order of said projection algorithm or
ESP algorithm in a lower one of said subbands is set larger than the order
of said projection algorithm or ESP algorithm in a higher subband.
10. A subband echo cancellation method for a multichannel teleconference in
which received signals of plural channels are reproduced as acoustic
signals by loudspeakers corresponding to said plural channels, said
acoustic signals being received by at least one microphone after
propagating over each echo path thereto, an echo replica being subtracted
from an echo provided from said at least one microphone, an echo
cancellation error signal resulting from said subtraction and said
received signal of each of said plural channels are used to calculate an
adjustment vector, said adjustment vector is used to iteratively adjust an
estimated value of an impulse response of said each echo path, estimated
echo paths having said adjusted impulse responses corresponding to said
each echo paths, and the corresponding one of said received signals is
applied to said each estimated echo path to generate said echo replica,
said method comprising the steps of:
(a) dividing said received signal and said echo into N subbands in each of
said plural channels and decimating them with predetermined decimation
rates, respectively, to generate N subband received signals and N subband
echoes, N being an integer equal to or greater than 2;
(b) generating N echo replicas by providing said N subband received signals
to N estimated echo paths each being formed by a digital filter having a
filter coefficient of a predetermined number of taps which simulates the
impulse response of said echo path in each of said N subbands;
(c) subtracting said N echo replicas from corresponding N subband echoes to
generate echo cancellation error signals in said N subbands;
(d) iteratively adjusting said filter coefficients of said digital filters
in a manner to minimize said N echo cancellation error signals on the
basis of said N echo cancellation error signals and corresponding N
subband received signals; and
(e) combining said echo cancellation error signals in said N subbands into
a full band send signal having said echoes suppressed;
(f) adding a variation component to the cross-correlation between said
received signals of said plural channels, each of said received signals
being reproduced by said loudspeaker of one of said plural channels; and
(g) deriving said adjustment vector from said received signal added with
said cross-correlation variation component.
11. The method of claim 10, wherein, letting the number of reproduction
channels be represented by I and said received signals of said plural
channels by x.sub.1 (k), x.sub.2 (k), . . . , x.sub.I (k) as functions of
a discrete time k, said received signals x.sub.1 (k), x.sub.2 (k), . . . ,
x.sub.I (k) are input into time-variant filters with different
time-variant characteristics for said plural channels, wherein they are
convoluted, indicated by *, with impulse responses f.sub.1 (k), f.sub.2
(k), . . . , f.sub.I (k) of said filters for conversion into signals
$\bar{x}_1(k), \bar{x}_2(k), \ldots, \bar{x}_I(k)$ which satisfy
$$
\begin{aligned}
\bar{x}_1(k) &= f_1(k) * x_1(k) \\
\bar{x}_2(k) &= f_2(k) * x_2(k) \\
&\;\;\vdots \\
\bar{x}_I(k) &= f_I(k) * x_I(k)
\end{aligned}
$$
whereby said variation component of said cross-correlation between said
received signal of said plural channels is added thereto.
12. The method of claim 10, wherein letting the number of reproduction
channels be represented by I and said received signals of said plural
channels by x.sub.1 (k), x.sub.2 (k), . . . , x.sub.I (k) as functions of
a discrete time k, said received signals x.sub.1 (k), x.sub.2 (k), . . . ,
x.sub.I (k) are multiplied by different functions g.sub.1 (k), g.sub.2
(k), . . . , g.sub.I (k) for conversion into signals
$\bar{x}_1(k), \bar{x}_2(k), \ldots, \bar{x}_I(k)$ which satisfy
$$
\begin{aligned}
\bar{x}_1(k) &= g_1(k) \cdot x_1(k) \\
\bar{x}_2(k) &= g_2(k) \cdot x_2(k) \\
&\;\;\vdots \\
\bar{x}_I(k) &= g_I(k) \cdot x_I(k)
\end{aligned}
$$
whereby said variation component of said cross-correlation between said
received signal of said plural channels is added thereto.
13. The method of claim 10, wherein letting the number of reproduction
channels be represented by I and said received signals of said plural
channels by x.sub.1 (k), x.sub.2 (k), . . . , x.sub.I (k) as functions of
a discrete time k, said received signals x.sub.1 (k), x.sub.2 (k), . . . ,
x.sub.I (k) are added to different functions n.sub.1 (k), n.sub.2 (k),
. . . , n.sub.I (k), respectively, for conversion into signals
$\bar{x}_1(k), \bar{x}_2(k), \ldots, \bar{x}_I(k)$ which satisfy
$$
\begin{aligned}
\bar{x}_1(k) &= x_1(k) + n_1(k) \\
\bar{x}_2(k) &= x_2(k) + n_2(k) \\
&\;\;\vdots \\
\bar{x}_I(k) &= x_I(k) + n_I(k)
\end{aligned}
$$
whereby said variation component of said cross-correlation between said
received signal of said plural channels is added thereto.
14. The method of claim 10, wherein letting the number of reproduction
channels be represented by I and said received signals of said plural
channels by x.sub.1 (k), x.sub.2 (k), . . . , x.sub.I (k) as functions of
a discrete time k, said received signals x.sub.1 (k), x.sub.2 (k), . . . ,
x.sub.I (k) are converted into signals $\bar{x}_1(k), \bar{x}_2(k), \ldots, \bar{x}_I(k)$
by subjecting the frequency characteristic of each of said
received signals to different time-variant frequency axis
expansion/compression processing, whereby said variation component of said
cross-correlation between said received signal of said plural channels is
added thereto.
15. The method of claim 10, wherein the method of adding said variation
component of said cross-correlation between said received signals of said
plural channels is set optimum in each of said N subbands in a manner to
reduce degradation of the psychoacoustic quality of said acoustic signal.
16. The method of any one of claims 1 to 15, wherein said subband received
signals and said subband echoes are real-number signals.
17. The method of any one of claims 1 to 15, wherein said subband received
signals and said subband echoes are complex signals.
18. A subband echo canceller for a multichannel teleconference in which
received signals of plural channels are reproduced as acoustic signals by
loudspeakers corresponding to said plural channels, said acoustic signals
being received by at least one microphone after propagating over each echo
path thereto, an echo replica being subtracted from an echo provided from
said at least one microphone, an echo cancellation error signal resulting
from said subtraction and said received signal of each of said plural
channels being used to calculate an adjustment vector, said adjustment
vector being used iteratively to adjust an estimated value of an impulse
response of each echo path, estimated echo paths each having said adjusted
impulse response being generated for each of said each echo paths, and the
corresponding one of said received signals being applied to said each
estimated echo path to generate said echo replica, said echo canceller
comprising:
subband echo generating means for dividing said received signal and said
echo into N subbands in each of said plural channels, and decimating them
with predetermined decimation rates, respectively, to generate N subband
received signals and N subband echoes, N being an integer equal to or
greater than 2;
N estimated echo path means, each formed by a digital filter which is given
a filter coefficient of a predetermined number of taps and simulates the
impulse response of said echo path in each of said N subbands, said N
estimated echo path means being supplied with said N subband received
signals and generating N echo replicas, respectively;
error signal generating means for subtracting said N echo replicas from
corresponding N subband echoes to generate echo cancellation error signals
in said N subbands;
echo path estimating means for iteratively adjusting said filter
coefficients of said digital filters in a manner to minimize said N echo
cancellation error signals on the basis of said N echo cancellation error
signals and said corresponding N subband received signals, said echo path
estimation means comprising: cross-correlation variation extracting means
for extracting a variation component of the cross-correlation between said
received signals of said plural channels, and including adjustment means
for using said variation component as said adjustment vector;
subband synthesis means for combining said echo cancellation error signals
in said N subbands into a full band send signal having said echoes
suppressed.
19. A subband echo canceller for a multichannel teleconference in which
received signals of plural channels are reproduced as acoustic signals by
loudspeakers corresponding to said plural channels, said acoustic signals
being received by at least one microphone after propagating over each echo
path thereto, an echo replica being subtracted from an echo provided from
said at least one microphone, an echo cancellation error signal resulting
from said subtraction and said received signal of each of said plural
channels being used to calculate an adjustment vector, said adjustment
vector being used iteratively to adjust an estimated value of an impulse
response of each echo path, estimated echo paths each having said adjusted
impulse response, and the corresponding one of said received signals is
applied to said each estimated echo path to generate said echo replica,
said echo canceller comprising:
subband echo generating means for dividing said received signal and said
echo into N subbands in each of said plural channels and decimating them
with predetermined decimation rates, respectively, to generate N subband
received signals and N subband echoes, N being an integer equal to or
greater than 2;
N estimated echo path means, each formed by a digital filter which is given
a filter coefficient of a predetermined number of taps and simulates the
impulse response of said echo path in each of said N subbands, said N
estimated echo path means being supplied with said N subband received
signals and generating N echo replicas, respectively;
error signal generating means for subtracting said N echo replicas from
corresponding N subband echoes to generate echo cancellation error signals
in said N subbands;
echo path estimating means for iteratively adjusting said filter
coefficients of said digital filters in a manner to minimize said N echo
cancellation error signals on the basis of said N echo cancellation error
signals and said corresponding N subband received signals;
subband synthesis means for combining said echo cancellation error signals
in said N subbands into a full band send signal having said echoes
suppressed; and
cross-correlation variation adding means for adding a variation component
of the cross-correlation between said received signals of said plural
channels, received signals added with said cross-correlation variation
component being used to derive said adjustment vector.
20. The echo canceller of claim 19, wherein said cross-correlation
variation adding means is means by which, letting the number of
reproduction channels be represented by I and said received signals of
said plural channels by x.sub.1 (k), x.sub.2 (k), . . . , x.sub.I (k) as
functions of a discrete time k, said received signals x.sub.1 (k), x.sub.2
(k), . . . , x.sub.I (k) are input into time-variant filters with
different time-variant characteristics for said plural channels, wherein
they are convoluted, indicated by *, with impulse responses f.sub.1 (k),
f.sub.2 (k), . . . , f.sub.I (k) of said filters for conversion into
signals $\bar{x}_1(k), \bar{x}_2(k), \ldots, \bar{x}_I(k)$ which satisfy
$$
\begin{aligned}
\bar{x}_1(k) &= f_1(k) * x_1(k) \\
\bar{x}_2(k) &= f_2(k) * x_2(k) \\
&\;\;\vdots \\
\bar{x}_I(k) &= f_I(k) * x_I(k)
\end{aligned}
$$
whereby said variation component of said cross-correlation between said
received signal of said plural channels is added thereto.
21. The echo canceller of claim 19, wherein said cross-correlation
variation adding means is means by which, letting the number of
reproduction channels be represented by I and said received signals of
said plural channels by x.sub.1 (k), x.sub.2 (k), . . . , x.sub.I (k) as
functions of a discrete time k, said received signals x.sub.1 (k), x.sub.2
(k), . . . , x.sub.I (k) are multiplied by different functions g.sub.1
(k), g.sub.2 (k), . . . , g.sub.I (k) for conversion into signals
$\bar{x}_1(k), \bar{x}_2(k), \ldots, \bar{x}_I(k)$ which satisfy
$$
\begin{aligned}
\bar{x}_1(k) &= g_1(k) \cdot x_1(k) \\
\bar{x}_2(k) &= g_2(k) \cdot x_2(k) \\
&\;\;\vdots \\
\bar{x}_I(k) &= g_I(k) \cdot x_I(k)
\end{aligned}
$$
whereby said variation component of said cross-correlation between said
received signal of said plural channels is added thereto.
22. The echo canceller of claim 19, wherein said cross-correlation
variation adding means is means by which, letting the number of
reproduction channels be represented by I and said received signals of
said plural channels by x.sub.1 (k), x.sub.2 (k), . . . , x.sub.I (k) as
functions of a discrete time k, said received signals x.sub.1 (k), x.sub.2
(k), . . . , x.sub.I (k) are added to different functions n.sub.1 (k),
n.sub.2 (k), . . . , n.sub.I (k), respectively, for conversion into signals
$\bar{x}_1(k), \bar{x}_2(k), \ldots, \bar{x}_I(k)$ which satisfy
$$
\begin{aligned}
\bar{x}_1(k) &= x_1(k) + n_1(k) \\
\bar{x}_2(k) &= x_2(k) + n_2(k) \\
&\;\;\vdots \\
\bar{x}_I(k) &= x_I(k) + n_I(k)
\end{aligned}
$$
whereby said variation component of said cross-correlation between said
received signal of said plural channels is added thereto.
23. The echo canceller of claim 19, wherein said cross-correlation
variation adding means is means by which, letting the number of
reproduction channels be represented by I and said received signals of
said plural channels by x.sub.1 (k), x.sub.2 (k), . . . , x.sub.I (k) as
functions of a discrete time k, said received signals x.sub.1 (k), x.sub.2
(k), . . . , x.sub.I (k) are converted into signals
$\bar{x}_1(k), \bar{x}_2(k), \ldots, \bar{x}_I(k)$ by subjecting the frequency characteristic of
each of said received signals to different time-variant frequency axis
expansion/compression processing, whereby said variation component of said
cross-correlation between said received signal of said plural channels is
added thereto.
24. The echo canceller of any one of claims 18 to 23, wherein said subband
received signals and said subband echoes are real-number signals.
25. The echo canceller of any one of claims 18-23, wherein said subband
received signals and said subband echoes are complex signals.
Description  |
BACKGROUND OF THE INVENTION
The present invention relates to an echo cancellation method for cancelling
room echoes which would otherwise cause howling and give rise to
psychoacoustic problems in a teleconferencing system using a
multi-receive-system and, more particularly, to a subband echo
cancellation method and apparatus for a multichannel audio teleconference
which updates or corrects an estimated impulse response of an echo path
for each subband through utilization of a projection algorithm or the
like.
ONE-CHANNEL ECHO CANCELLATION
An echo canceller is used to offer a hands-free telecommunication system
that has an excellent double-talk function and is virtually free from
echoes.
A description will be given first, with reference to FIG. 1, of a
one-channel echo canceller. In hands-free communication, speech uttered by
a person at a remote place is provided as a received signal to a received
signal terminal 11 and is radiated from a loudspeaker 12 directly or after
being subjected to some processing by a received signal processing part 13
that automatically adjusts the gain of the received signal according to
its amplitude, power or similar magnitude. For this reason, the received
signal x.sub.1 (k) herein mentioned is not limited specifically to the
received signal itself but shall refer to a processed received signal as
well when the received signal processing part 13 is employed. In FIG. 1, k
indicates discrete time. An echo canceller 14 cancels an echo $y_1(k)$ which
is produced when the received signal $x_1(k)$ radiated from the
loudspeaker 12 is picked up by a microphone 16 after propagating over an
echo path 15. The echo $y_1(k)$ can be modeled by the following convolution:
$$y_1(k) = \sum_{l=0}^{L-1} h_{11}(k,l)\, x_1(k-l) \qquad (1)$$
where $\sum$ indicates a summation from $l=0$ to $L-1$, $h_{11}(k,l)$ is the
impulse response indicating the transfer function of the echo path 15 at
time k and L is the number of taps, which is a constant preset
according to the reverberation time of the echo path 15. First, the L most
recent received signals, from $x_1(k)$ back to $x_1(k-L+1)$, are
stored in a received signal storage and vector generating part 17. The L
received signals thus stored are output as a received signal vector
$\mathbf{x}_1(k)$, that is, as
$$\mathbf{x}_1(k) = [x_1(k), x_1(k-1), \ldots, x_1(k-L+1)]^T \qquad (2)$$
where $(\cdot)^T$ indicates a transposition. In an estimated echo generating
part 18, the inner product of the received signal vector $\mathbf{x}_1(k)$ of
Eq. (2) and an estimated echo path vector $\hat{\mathbf{h}}_{11}(k)$, which is provided
from an echo path estimating part 19, is calculated as follows:
$$\hat{y}_1(k) = \hat{\mathbf{h}}_{11}^T(k)\,\mathbf{x}_1(k) \qquad (3)$$
As a result, an estimated echo or echo replica $\hat{y}_1(k)$ is generated.
This inner product calculation is equivalent to the convolution of Eq.
(1). The echo path estimating part 19 generates the estimated echo path vector
$\hat{\mathbf{h}}_{11}(k)$ that is used in the estimated echo generating
part 18.
Since the impulse response $h_{11}(k,l)$ of the echo path 15 from the
loudspeaker 12 to the microphone 16 varies with changes in the sound field,
caused for instance by the movement of a person or object, the estimated echo path
vector $\hat{\mathbf{h}}_{11}(k)$ needs to follow the time-varying impulse
response of the echo path 15. In this example, the echo canceller 14 is
formed by an adaptive FIR (Finite Impulse Response) filter. The most
common algorithm for the echo path estimation is the NLMS (Normalized Least
Mean Square) algorithm. With the NLMS algorithm, the received signal
vector $\mathbf{x}_1(k)$ at time k and a residual echo $e_1(k)$, i.e. the
following error, obtained by a subtractor 21 subtracting the estimated echo signal
$\hat{y}_1(k)$ from the output $y_1(k)$ of the microphone 16,
$$e_1(k) = y_1(k) - \hat{y}_1(k) \qquad (4)$$
are used to calculate an estimated echo path vector $\hat{\mathbf{h}}_{11}(k+1)$ which is
used at time k+1, by the following equation:
$$\hat{\mathbf{h}}_{11}(k+1) = \hat{\mathbf{h}}_{11}(k) + \mu\,\frac{e_1(k)\,\mathbf{x}_1(k)}{\mathbf{x}_1^T(k)\,\mathbf{x}_1(k)} \qquad (5)$$
where $\mu$ is called a step size parameter, which is used to adjust
adaptation within the range $0<\mu<2$. By repeating the above
processing, the estimated echo path vector $\hat{\mathbf{h}}_{11}(k)$ in the echo path
estimating part 19 can be gradually brought into agreement with the true
echo path vector $\mathbf{h}_{11}(k)$ whose elements are the impulse response
sequence $h_{11}(k,l)$ of the true echo path 15, that is, the following
echo path vector:
$$\mathbf{h}_{11}(k) = [h_{11}(k,0), h_{11}(k,1), \ldots, h_{11}(k,L-1)]^T \qquad (6)$$
As a result, the residual echo $e_1(k)$ given by Eq. (4) can
be reduced.
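The NLMS recursion of Eqs. (1) to (5) can be illustrated with a short simulation. The following is only a minimal sketch, not the patented apparatus: the function name `nlms_identify`, the white Gaussian test signal, the noise-free microphone and the time-invariant echo path `h_true` are all assumptions made for illustration.

```python
import random


def nlms_identify(h_true, n_samples=2000, mu=0.5, seed=0):
    """Estimate a fixed FIR echo path with the NLMS update of Eq. (5).

    h_true is a hypothetical, time-invariant echo path used only to
    synthesize the echo; the received signal is white Gaussian noise.
    """
    rng = random.Random(seed)
    L = len(h_true)
    h_hat = [0.0] * L   # estimated echo path vector
    x_vec = [0.0] * L   # received signal vector [x(k), ..., x(k-L+1)], Eq. (2)
    for _ in range(n_samples):
        # Shift the newest received sample into the vector.
        x_vec = [rng.gauss(0.0, 1.0)] + x_vec[:-1]
        y = sum(h * x for h, x in zip(h_true, x_vec))      # echo, Eq. (1)
        y_hat = sum(h * x for h, x in zip(h_hat, x_vec))   # echo replica, Eq. (3)
        e = y - y_hat                                      # residual echo, Eq. (4)
        power = sum(x * x for x in x_vec)
        if power > 0.0:
            gain = mu * e / power
            h_hat = [h + gain * x for h, x in zip(h_hat, x_vec)]  # Eq. (5)
    return h_hat


if __name__ == "__main__":
    print(nlms_identify([0.5, -0.3, 0.2, 0.1]))
```

With a white input the estimate converges quickly; with a strongly correlated input such as speech, convergence is slower, which is the motivation for the projection algorithm discussed next.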
The most effective algorithm now in use for the echo path estimation is the
projection algorithm or the ES projection algorithm (hereinafter referred to
as the ESP algorithm). The projection algorithm is based on the idea of
improving the convergence speed for correlated signals such as speech by
removing the auto-correlation between input signals in the algorithm. The
removal of auto-correlated components means whitening of signals in the
time domain. The projection algorithm is described in detail in K. Ozeki
and T. Umeda, "An Adaptive Filtering Algorithm Using an Orthogonal
Projection to an Affine Subspace and Its Properties," Trans. (A), IEICE
Japan, vol. J67-A, No. 2, pp. 126-132, February, 1984.
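In general, the p-order projection algorithm updates the estimated echo
path vector $\hat{\mathbf{h}}(k)$ in such a manner as to obtain correct outputs y(k),
y(k-1), . . . , y(k-p+1) for the last p input signal vectors $\mathbf{x}(k)$, $\mathbf{x}(k-1)$,
. . . , $\mathbf{x}(k-p+1)$. That is, $\hat{\mathbf{h}}(k+1)$ is computed which satisfies the
following equations:
$$
\begin{aligned}
\mathbf{x}^T(k)\,\hat{\mathbf{h}}(k+1) &= y(k) \\
\mathbf{x}^T(k-1)\,\hat{\mathbf{h}}(k+1) &= y(k-1) \\
&\;\;\vdots \\
\mathbf{x}^T(k-p+1)\,\hat{\mathbf{h}}(k+1) &= y(k-p+1)
\end{aligned} \qquad (7)
$$
where
$$\mathbf{x}(k) = [x(k), x(k-1), \ldots, x(k-L+1)]^T \qquad (8)$$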
When the number p of equations is smaller than the number of unknowns
(the number of taps) L, the solution $\hat{\mathbf{h}}(k+1)$ of the simultaneous
equations (7) is indeterminate. Hence, the estimated echo path vector is
updated so as to minimize the magnitude of the update,
$\|\hat{\mathbf{h}}(k+1)-\hat{\mathbf{h}}(k)\|$. The p-order projection algorithm in such
an instance is expressed by the following equation:
$$
\begin{aligned}
\hat{\mathbf{h}}(k+1) &= \hat{\mathbf{h}}(k) + \mu\,[\mathbf{X}^T(k)]^{+}\,\mathbf{e}(k) \\
&= \hat{\mathbf{h}}(k) + \mu\,\mathbf{X}(k)\,[\mathbf{X}^T(k)\mathbf{X}(k)]^{-1}\,\mathbf{e}(k) \\
&= \hat{\mathbf{h}}(k) + \mu\,\mathbf{X}(k)\,\boldsymbol{\beta}(k) \\
&= \hat{\mathbf{h}}(k) + \mu\,[\beta_1 \mathbf{x}(k) + \beta_2 \mathbf{x}(k-1) + \cdots + \beta_p \mathbf{x}(k-p+1)]
\end{aligned} \qquad (9)
$$
where
$$\mathbf{X}(k) = [\mathbf{x}(k), \mathbf{x}(k-1), \ldots, \mathbf{x}(k-p+1)] \qquad (10)$$
$$\mathbf{e}(k) = [e(k), (1-\mu)e(k-1), \ldots, (1-\mu)^{p-1} e(k-p+1)]^T \qquad (11)$$
$$e(k) = y(k) - \hat{y}(k) \qquad (12)$$
$$\hat{y}(k) = \hat{\mathbf{h}}^T(k)\,\mathbf{x}(k) \qquad (13)$$
$$\boldsymbol{\beta}(k) = [\beta_1, \beta_2, \ldots, \beta_p]^T \qquad (14)$$
$(\cdot)^{+}$: generalized inverse matrix
$(\cdot)^{-1}$: inverse matrix.
In the above, $\boldsymbol{\beta}(k)$ is the solution of the following simultaneous
linear equations with p unknowns:
$$[\mathbf{X}^T(k)\mathbf{X}(k)]\,\boldsymbol{\beta}(k) = \mathbf{e}(k) \qquad (15)$$
To avoid instability in the inverse matrix operation, a small positive
constant $\delta$ may be used as follows:
$$[\mathbf{X}^T(k)\mathbf{X}(k) + \delta\mathbf{I}]\,\boldsymbol{\beta}(k) = \mathbf{e}(k) \qquad (15)'$$
where $\mathbf{I}$ is a unit matrix. The second term on the right-hand side of Eq. (9)
is the update vector, with which the estimated echo path vector is
iteratively updated. $\mathbf{X}(k)\boldsymbol{\beta}(k)$ in Eq. (9) represents processing for
removing the auto-correlation of the input signal. The removal of
auto-correlation means suppression of input signal variations in the time
domain, and hence it means whitening of the signals in the time domain.
That is, the projection algorithm can be said to increase the impulse
response updating speed by the whitening of the input signal in the time
domain. Several fast projection algorithms have been proposed to reduce
the computational complexity, and they are described in detail in
Japanese Patent Application Laid-Open Gazettes
Nos. 312535/95 and 92980/95. Further, setting the input and output at
negative times to zero and letting p go to infinity corresponds to the RLS algorithm.
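One update of the p-order projection algorithm, Eq. (9) with the regularization of Eq. (15)', can be sketched as follows. This is an illustrative sketch under stated assumptions rather than any of the fast algorithms cited above: the helper names `solve_linear` and `projection_update` are hypothetical, and the error vector is computed directly as the difference between the last p outputs and the current estimate's response to the last p input vectors, which this sketch assumes is the vector that Eq. (11) expresses recursively.

```python
def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= f * A[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = A[r][n] - sum(A[r][j] * z[j] for j in range(r + 1, n))
        z[r] = s / A[r][r]
    return z


def projection_update(h_hat, X_cols, y_vec, mu=1.0, delta=1e-8):
    """One p-order projection update.

    X_cols: the p vectors x(k), x(k-1), ..., x(k-p+1), each of length L.
    y_vec:  the p outputs y(k), y(k-1), ..., y(k-p+1).
    """
    p = len(X_cols)
    dot = lambda a, b: sum(u * w for u, w in zip(a, b))
    # Error of the current estimate for each of the last p input vectors.
    e_vec = [y - dot(h_hat, x) for y, x in zip(y_vec, X_cols)]
    # Regularized p x p matrix X^T X + delta * I, Eq. (15)'.
    G = [[dot(X_cols[i], X_cols[j]) + (delta if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    beta = solve_linear(G, e_vec)
    # Update direction X beta = beta_1 x(k) + ... + beta_p x(k-p+1).
    v = [sum(beta[i] * X_cols[i][n] for i in range(p))
         for n in range(len(h_hat))]
    return [h + mu * vn for h, vn in zip(h_hat, v)]
```

With `mu=1.0` and a very small `delta`, the updated vector reproduces the last p outputs almost exactly, which is the condition stated in Eq. (7).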
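The ESP algorithm is a combination of the projection algorithm with the ES
algorithm, which reflects only the variation characteristic of the echo path,
and permits implementation of an echo canceler of higher convergence speed
than does the projection algorithm. The p-order ESP algorithm can be
expressed by the following equation:
$$
\begin{aligned}
\hat{\mathbf{h}}(k+1) &= \hat{\mathbf{h}}(k) + \mu\,[\{\mathbf{A}\mathbf{X}(k)\}]^{+}\,\mathbf{e}(k) \\
&= \hat{\mathbf{h}}(k) + \mu\,\mathbf{A}\mathbf{X}(k)\,[\mathbf{X}^T(k)\mathbf{A}\mathbf{X}(k)]^{-1}\,\mathbf{e}(k) \\
&= \hat{\mathbf{h}}(k) + \mu\,\mathbf{A}\mathbf{X}(k)\,\boldsymbol{\beta}(k) \\
&= \hat{\mathbf{h}}(k) + \mu\,\mathbf{A}\,[\beta_1 \mathbf{x}(k) + \beta_2 \mathbf{x}(k-1) + \cdots + \beta_p \mathbf{x}(k-p+1)]
\end{aligned} \qquad (16)
$$
where:
$\mathbf{A} = \mathrm{diag}[\alpha_1, \alpha_2, \ldots, \alpha_L]$: step size matrix
$\alpha_i = \alpha_0 \lambda^{i-1}$ $(i = 1, 2, \ldots, L)$
$\lambda$: attenuation rate of impulse response variation $(0 < \lambda < 1)$
$\mu$: second step size (scalar quantity)
In the above, $\boldsymbol{\beta}(k)$ is the solution of the following simultaneous linear
equations with p unknowns:
$$[\mathbf{X}^T(k)\mathbf{A}\mathbf{X}(k)]\,\boldsymbol{\beta}(k) = \mathbf{e}(k) \qquad (17)$$
To avoid instability in the inverse matrix operation, a small positive
constant $\delta$ may be used as follows:
$$[\mathbf{X}^T(k)\mathbf{A}\mathbf{X}(k) + \delta\mathbf{I}]\,\boldsymbol{\beta}(k) = \mathbf{e}(k) \qquad (17)'$$
where $\mathbf{I}$ is a unit matrix.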
When the estimated echo path 18 is formed by a digital FIR filter, its
filter coefficient vector h.sub.11 (k) is a direct simulation of the
impulse response h.sub.11 (k) of the room echo path 15. Accordingly, the
[X.sup.T (k)AX(k)+.delta.I].beta.(k)=e(k)value of adjustment of the filter
coefficient that is required according to variations of the room echo path
15 is equal to the variation in its impulse response h.sub.11 (k). Then,
the step size matrix A, which represents the step size in the filter
coefficient adjustment, is weighted using the time-varying characteristic
of the impulse response. The impulse response variation in a room sound
field is usually expressed as an exponential function using the
attenuation rate .lambda.. As depicted in FIG. 2A, the diagonal elements
.alpha..sub.1 (where 1=1,2, . . . ,L) of the step size matrix A
exponentially attenuates, as 1 increases, from .alpha..sub.0 and gradually
approaches zero with the same gradient as that of the exponential
attenuation characteristic of the impulse response. This algorithm
utilizes an acoustics finding or knowledge that when the impulse response
of a room echo path varies as a person or object moves, its variation (a
difference in the impulse response) exponentially attenuates with the same
attenuation rate as that of the impulse response. By adjusting initial
coefficients of the impulse response with large variations in large steps
and the subsequent coefficients with small variations in small steps, it
is possible to offer an echo canceler of fast convergence.
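One adjustment step of Eq. (16), with .beta.(k) obtained from the regularized system of Eq. (17)', can be sketched as follows. This is a minimal illustration and not the implementation of the invention. It assumes .alpha..sub.i = .alpha..sub.0 .lambda. raised to the power i-1, so that the first tap receives .alpha..sub.0, and it assumes that X(k) is supplied as the p most recent input vectors. The function names solve, step_sizes and esp_update are hypothetical, and plain Python lists are used only to keep the sketch free of external libraries.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def step_sizes(alpha0, lam, L):
    """Diagonal of A (assumed form): alpha_i = alpha0 * lam**(i-1), i = 1..L."""
    return [alpha0 * lam ** i for i in range(L)]


def esp_update(h, X, e, alpha, mu, delta=1e-8):
    """One step of Eq. (16): h + mu * A X beta.

    beta solves (X^T A X + delta I) beta = e, as in Eq. (17)'.
    h: L filter taps; X: p input vectors x(k), x(k-1), ... of length L;
    e: p error values; alpha: the L diagonal entries of A.
    """
    p, L = len(X), len(h)
    G = [[sum(alpha[i] * X[a][i] * X[b][i] for i in range(L))
          + (delta if a == b else 0.0) for b in range(p)] for a in range(p)]
    beta = solve(G, e)
    return [h[i] + mu * alpha[i] * sum(beta[a] * X[a][i] for a in range(p))
            for i in range(L)]
```

With .mu.=1 and .delta.=0, the adjusted coefficients reproduce the error vector exactly on the p stored input vectors, which serves as a simple check of the sketch.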
In the case of constructing the echo canceler with plural DSP (Digital
Signal Processor) chips, the exponential decay curve of the step size
.alpha..sub.i is approximated stepwise and the step size .alpha..sub.i is
set in discrete steps with a fixed value for each chip as shown in FIG.
2B. This permits implementation of the ESP algorithm with the
computational load and storage capacity held about the same as in the case
of the conventional projection algorithm. The ESP algorithm is described
in detail in S. Makino and Y. Kaneda, "Exponentially weighted step-size
projection algorithm for acoustic echo cancellers", Trans. IEICE Japan,
vol. E75-A, No. 11, pp. 1500-1508, November, 1992.
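The stepwise approximation of the step size for a multi-chip construction can be sketched as follows. The sketch assumes that the L taps are divided into equal contiguous blocks, one block per chip, and that each block takes the value of the exponential curve at its first tap. Both choices are illustrative assumptions, since the text states only that a fixed value is used for each chip. The function name stepwise_step_sizes is hypothetical.

```python
def stepwise_step_sizes(alpha0, lam, L, n_chips):
    """Piecewise-constant approximation of alpha_i = alpha0 * lam**(i-1).

    The L taps are split into n_chips contiguous blocks. Every tap in a
    block takes the value of the exponential curve at the first tap of
    that block (an illustrative choice, not taken from the source).
    """
    block = -(-L // n_chips)  # ceiling division: taps handled per chip
    return [alpha0 * lam ** ((i // block) * block) for i in range(L)]
```

For example, with L=8 taps, four chips, .alpha..sub.0 =1 and .lambda.=0.5, each pair of adjacent taps shares one step size, so only four distinct values need to be stored.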
In the case of adjusting the estimated echo path vector h(k) by the
conventional NLMS algorithm based on Eq. (5), it is adjusted in the
direction of the input signal vector x(k). On the other hand, according to
the ESP algorithm based on Eqs. (9) and (16), the second term on the right
side of the fourth equation of Eqs. (9) and (16) is set as follows:
v(k)=.beta..sub.1 x(k)+.beta..sub.2 x(k-1)+ . . . +.beta..sub.P x(k-p+1)
(18)
and the estimated echo path vector h(k) is adjusted in the direction of the
vector v(k), that is, in the direction in which the correlation
(auto-correlation) to all of previous combined input signal vectors
x(k-1), . . . , x(k-p+1) has been removed from the current combined vector
x(k) of input signals. In other words, the coefficients .beta..sub.1 to
.beta..sub.P are determined so that vectors similar to the previous input
signal vectors are removed as much as possible from the current adjusted
input signal vector v(k). In consequence, the input signal is whitened in
the time domain.
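The removal of correlation with the previous input vector can be checked numerically for the simplest case. The sketch below assumes p=2, A equal to the unit matrix, and an error vector whose second component is zero. These simplifying assumptions are made here for illustration only and are not stated in this passage. The function names dot and adjustment_direction are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(u * v for u, v in zip(a, b))


def adjustment_direction(x0, x1, e0=1.0):
    """Direction v = beta1*x0 + beta2*x1 of Eq. (18) for p = 2 and A = I.

    beta solves [X^T X] beta = [e0, 0]^T, written in closed form for the
    2x2 case. x0 is the current vector x(k), x1 the previous one x(k-1).
    """
    g00, g01, g11 = dot(x0, x0), dot(x0, x1), dot(x1, x1)
    det = g00 * g11 - g01 * g01
    beta1 = e0 * g11 / det
    beta2 = -e0 * g01 / det
    return [beta1 * a + beta2 * b for a, b in zip(x0, x1)]
```

Under these assumptions the resulting v(k) has zero inner product with x(k-1), that is, the component of x(k) that resembles the previous input vector has been removed.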
As described above, the conventional projection algorithm whitens the
monaural input signal in the time domain by removing the auto-correlation
component of the input signal so as to provide increased convergence speed
of the echo path estimation. The aforementioned Makino et al. literature
shows the results of computer simulations of convergence of ERLE
(Echo-Return-Loss-Enhancement) by the ESP algorithm and by the NLMS
algorithm in the case where the received signal was a male voice.
According to the results of computer simulations, the time for the ERLE to
reach 20 dB is about 1 sec in the case of the NLMS algorithm and 0.2 sec
or less in the case of the ESP algorithm, and the time for substantial
convergence of the ERLE is approximately in the range of 1 to 3 sec at the
longest in either algorithm. This is considered to indicate the whitening
effect of the input signal.
On the other hand, there is known a subband scheme that increases the
convergence speed of the echo path estimation by whitening the monaural
input signal in the frequency domain. This scheme divides the input signal
into plural subbands, then sequentially adjusts in each subband the filter
coefficient of the estimated echo path 18 based on variations of the echo
path 15 by the NLMS algorithm or the like, and combines and outputs
residuals in the respective subbands. This is disclosed in, for instance,
U.S. Pat. No. 5,272,695; S. Gay and R. Mammone, "Fast converging subband
acoustic echo cancellation using RAP on the WE.sup.R DSP16A", Proc.
ICASSP90, pp. 1141-1144, April 1990; and Makino et al., "Subband Echo
Canceller with an Exponentially Weighted Stepsize NLMS Adaptive Filter",
Trans. IEICE Japan, A, Vol. J79-A, No. 6, pp. 1138-1146, June 1996. This
subband scheme involves flattening or what is called whitening of signals
in the frequency domain, increasing the convergence speed in the
estimation of the filter coefficient of the estimated echo path at the
time of variations of the echo path. This subband scheme is used in the
echo path estimation for a one-channel input signal and increases the
convergence speed of the echo cancellation by flattening (whitening) of
the signal in each subband. This is attributable to the whitening of the
signal and hence has nothing to do with the number of channels of the
input signal. That is, in a teleconferencing system using plural
loudspeakers and plural microphones the application of the subband scheme
to each of the multichannel input signals would produce the same whitening
effect as described above. However, it has not been considered that the
subband scheme could be expected to produce any further effects.
Echo Cancellation for Teleconferencing System
In general, a teleconferencing system of the type having an I (.gtoreq.2)
channel loudspeaker system and a J (.gtoreq.1) channel microphone system
employs, for echo cancellation, such a configuration as shown in FIG. 3.
That is to say, an echo cancellation system 23 is composed of I-channel
echo cancellers 22.sub.1, 22.sub.2, . . . , 22.sub.J for processing
I-input-one-output time sequence signals, which are each interposed between
all of I channels of the receiving (loudspeaker) side and one channel of the
sending (microphone) side. In this instance, the echo cancellation system has a
total of I.times.J echo paths 15.sub.ij (1.ltoreq.i.ltoreq.I,
1.ltoreq.j.ltoreq.J). The I-channel echo cancellers 22.sub.1, 22.sub.2, . . . , 22.sub.J,
which are each connected between all of the I channels of the receiving
side and one channel of the sending side, have such a configuration as
shown in FIG. 4, which is an extended version of the configuration of the
echo canceller 14 depicted in FIG. 1. This is described in detail, for
example, in T. Fujii and S. Shimada, "Multichannel Adaptive Digital Filter,"
Trans. IEICE Japan, '86/10, Vol. J69-A, No. 10.
Now, consider the I-channel echo canceller 22.sub.j connected to the j-th channel
(1.ltoreq.j.ltoreq.J) of the sending side. The echo signal that is picked
up by the j-th channel microphone 16.sub.j is obtained by adding together,
at the sending side, the received signals of all channels after
propagation over the respective echo paths 15.sub.1j to 15.sub.Ij. Hence, it is
necessary to devise how to make the echo path estimation by evaluating
only one residual echo e.sub.j (k) in common to all the receiving side channels.
In the first place, for the received signal of each channel, the following
received signal vectors are generated in the received signal storage and
vector generating parts (17.sub.1, 17.sub.2, . . . , 17.sub.I):
x.sub.1 (k)=[x.sub.1 (k), x.sub.1 (k-1), . . . , x.sub.1 (k-L.sub.1
+1)].sup.T (19)
x.sub.2 (k)=[x.sub.2 (k), x.sub.2 (k-1), . . . , x.sub.2 (k-L.sub.2
+1)].sup.T (20)
x.sub.I (k)=[x.sub.I (k), x.sub.I (k-1), . . . , x.sub.I (k-L.sub.I
+1)].sup.T (21)
where L.sub.1, L.sub.2, . . . , L.sub.I are the numbers of taps, which are
constants preset corresponding to reverberation times of the respective
echo paths 15.sub.1j, 15.sub.2j, . . . , 15.sub.Ij. The vectors thus generated are
combined in a vector combining part 24 as follows:
x(k)=[x.sub.1.sup.T (k), x.sub.2.sup.T (k), . . . , x.sub.I.sup.T
(k)].sup.T (22)
Also in the echo path estimating part 19.sub.j, estimated echo path vectors
h.sub.1j (k), h.sub.2j (k), . . . , h.sub.Ij (k), which are used to
simulate I echo paths between the respective receiving side channels and
the j-th sending side channel, are combined as follows:
h.sub.j (k)=[h.sub.1j.sup.T (k), h.sub.2j.sup.T (k), . . . , h.sub.Ij.sup.T
(k)].sup.T (23)
In the case of using the NLMS algorithm, the updating of the combined
estimated echo path vector h.sub.j (k) is done as follows:
h.sub.j (k+1)=h.sub.j (k)+.mu.e.sub.j (k)x(k)/{x.sup.T (k)x(k)} (24)
In the estimated echo generating part 18.sub.j, an estimated echo ŷ.sub.j (k)
for the echo y.sub.j (k) picked up in the j-th sending channel is
generated by the following inner product calculation:
ŷ.sub.j (k)=h.sub.j.sup.T (k)x(k) (25)
By combining vectors in the respective channels into one vector, the flow
of basic processing becomes the same as in the one-channel echo canceller
of FIG. 1.
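The processing of Eqs. (22) to (25) for one sending channel j can be sketched as follows. This is a minimal illustration under stated assumptions and not the implementation of the echo canceller 22.sub.j. The function names combine and nlms_step are hypothetical, and the small constant eps, which guards against division by zero, is an addition that does not appear in Eq. (24).

```python
def combine(vectors):
    """Eqs. (22)/(23): stack per-channel vectors into one long vector."""
    return [v for vec in vectors for v in vec]


def nlms_step(h, x, y, mu=1.0, eps=1e-12):
    """One NLMS step for sending channel j, following Eqs. (24) and (25).

    h: combined estimated echo path vector h_j(k)
    x: combined received signal vector x(k)
    y: echo sample y_j(k) picked up by the j-th microphone
    eps: guard against division by zero (not part of Eq. (24))
    Returns the updated vector h_j(k+1) and the error e_j(k).
    """
    y_hat = sum(a * b for a, b in zip(h, x))      # Eq. (25): echo replica
    e = y - y_hat                                 # residual echo e_j(k)
    norm = sum(b * b for b in x) + eps            # x^T(k) x(k)
    h_next = [a + mu * e * b / norm for a, b in zip(h, x)]  # Eq. (24)
    return h_next, e
```

Because the per-channel vectors are simply concatenated, the tap counts L.sub.1, . . . , L.sub.I may differ between channels without any change to the update itself.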
Of the defects of the conventional echo cancellation system for application
to the teleconferencing system composed of an I-channel speaker system and
a J-channel microphone system, the defect that the present invention is to
solve will be described in connection with a concrete example.
In the case of applying the conventional echo cancellation system to the
stereo teleconferencing system which sends and receives signals between
the points A and B over two channels as shown in FIG. 5, there is
presented a problem that each time a speaker at the point A moves or
changes to another, an echo from the point B by the speech at the point A
increases even if the echo paths 15.sub.11 and 15.sub.21 remain unchanged. The reason
for this is that the echo path | | |