Claims
What is claimed is:
1. A method for the acquisition and processing of acoustic signals inherent
in an acoustic event manifested in a given spatial region, the method
comprising the operations of:
acquiring the said acoustic signals at a plurality of different points in
the said spatial region,
generating from the said acoustic signals first signals indicative of
cross-spectra for a plurality of pairs of the said acoustic signals,
extracting phase information present in the said cross-spectra for the
purposes of acquisition and/or processing, and
locating at any moment the said acoustic event on the basis of delays
calculated on the basis of the estimation of signals obtained by
antitransformation of the said first signals.
2. A method according to claim 1, including the operation of reconstructing
the said acoustic event using the said acoustic signals in conjunction
with delays calculated on the basis of the estimation of the said first
signals.
3. A method according to claim 2, including basing the said reconstruction
of the acoustic event on a modeling of the acoustic signal to be
reconstructed substantially according to the formula:
s(t)=a.sub.0 s.sub.0 (t)+a.sub.1 s.sub.1 (t+.delta..sub.1 (t))+a.sub.2
s.sub.2 (t+.delta..sub.2 (t))+a.sub.3 s.sub.3 (t+.delta..sub.3 (t))
in which s(t) is the said acoustic signal to be reconstructed,
s.sub.0 (t), s.sub.1 (t), s.sub.2 (t), s.sub.3 (t) are the said acoustic
signals, .delta..sub.1 (t), .delta..sub.2 (t), .delta..sub.3 (t) are the
said delays, and a.sub.0, a.sub.1, a.sub.2, a.sub.3 are numerical
coefficients.
4. A method according to claim 1, wherein the said acoustic signals are
converted into digital format after measurement.
5. A method according to claim 4, wherein the said conversion into digital
format occurs at a given sampling frequency which is higher than a
frequency band of the said acoustic event.
6. A method according to claim 1, wherein the operation for generating the
said first signals on the basis of the said acoustic signals comprises the
phases of:
extracting sampling frames from the said acoustic signals,
calculating an integral transform from the said frames,
calculating cross power spectra for a plurality of pairs of the integral
transform of the said frames,
calculating an antitransform of the said cross power spectra.
7. A method according to claim 6, wherein the phase for extracting the
frames comprises the phases of:
extracting frames having predetermined lengths t.sub.f, corresponding to a
predetermined number N of samples, with a pitch t.sub.a,
weighting the said frames by means of a window.
8. A method according to claim 7, wherein the said window is a Blackman
window.
9. A method according to claim 6, wherein when a sampling frequency F.sub.c
=48 kHz, N is selected such that it is of the order of 1024 and t.sub.f is
of the order of 21.33 ms and t.sub.a is of the order of t.sub.f /2=10.66
ms.
10. A method according to claim 6, wherein the integral transform of the
frames is a Fourier transform.
11. A method according to claim 10, characterized in that the Fourier
transform is a fast Fourier transform or FFT.
12. A method according to claim 6, wherein the said cross power spectra are
normalised cross power spectra.
13. A method according to claim 5, wherein the phase of calculating the
cross power spectra comprises:
the phase of calculating for each of said pairs of the transform a vector
.rho..sub.i having n components substantially in accordance with the
formula
.rho..sub.i =FFT.sup.-1 [Y.sub.j ]
when j=1, 2, 3, the pairs being X.sub.0, X.sub.1 ; X.sub.0, X.sub.2 ;
X.sub.0, X.sub.3 ; and the l-th complex generic component of the vector
Y.sub.j being defined as:
$$Y_j(l) = \frac{X_0(l)\,X_j^{*}(l)}{|X_0(l)|\,|X_j(l)|}$$
in which X.sub.j * is the conjugate complex vector of the vector X.sub.j.
14. A method according to claim 13, characterized in that the components of
the vector .rho..sub.j are normalised.
15. A method according to claim 1, wherein the operation for generating the
said first signals on the basis of the said acoustic signals comprises the
phases of:
extracting sampling frames from the said acoustic signals,
calculating an integral transform from the said frames,
calculating cross power spectra for a plurality of pairs of the integral
transform of the said frames,
calculating an antitransform of the said cross power spectra,
wherein the phase of calculating the cross power spectra comprises:
the phase of calculating for each of the pairs a vector .rho..sub.i having
n components substantially in accordance with the formula
.rho..sub.i =FFT.sup.-1 [Y.sub.j ]
when j=1, 2, 3, the pairs being X.sub.0, X.sub.1 ; X.sub.0, X.sub.2 ;
X.sub.0, X.sub.3 ; and the l-th complex generic component of the vector
Y.sub.j being defined as:
$$Y_j(l) = \frac{X_0(l)\,X_j^{*}(l)}{|X_0(l)|\,|X_j(l)|}$$
in which X.sub.j * is the conjugate complex vector of the vector X.sub.j,
and
wherein the method includes the phase of estimating the relative delay
between pairs of frames of signals which comprises the phase of using the
vector .rho..sub.j to calculate an index of coherence between the frame
x.sub.0 and a frame obtained by disphasing the frame x.sub.j by a number
of samples i corresponding to a delay .tau..sub.i =i/F.sub.c, equivalent
to an index of coherence between the acoustic signal S.sub.0 and the
acoustic signal S.sub.j disphased by a delay .tau..sub.i.
16. A method according to claim 12, wherein the first signals comprise
coherence functions C.sub.j (t, .tau.) consisting of the vectors
.rho..sub.j respectively.
17. A method according to claim 6, wherein the sample frames are extracted
in pairs each comprising a first frame present in each pair, and a second
frame selected from the frames which are different from the first frame
common to all the pairs such that there is one pair for each of the frames
different from the said first frame.
18. A method according to claim 6, wherein the said antitransform is an
inverse Fourier transform.
19. A method according to claim 18, wherein the said inverse Fourier
transform is an inverse fast Fourier transform or FFT.
20. A method according to claim 1, wherein the said first signals are
estimated by means of filtering and interpolation.
21. A method according to claim 20, wherein the filtering of said first
signals is activated by the use of at least one finite impulse response
filter or FIR.
22. A method according to claim 1, wherein the said first signals are
submitted to an operation for the search of the maximum of the first
signals to generate second signals.
23. A method according to claim 12, wherein the first signals comprise
coherence functions C.sub.j (t, .tau.) consisting of the vectors
.rho..sub.j respectively, wherein the said first signals are submitted to
an operation for the search of the maximum of the first signals to
generate second signals, and wherein the phase of searching for the
maximum comprises the phases of:
searching for the maximum of the said coherence functions, filtrated and
interpolated, C.sub.j ' (t, .tau.) when a delay .tau.' is varied,
generating functions M.sub.j (t) defined substantially according to the
formula
$$M_j(t) = \max_{\tau'} C_j'(t, \tau')$$
when t is varied, and
calculating the delays (.delta..sub.1, .delta..sub.2, .delta..sub.3) as
delays .delta..sub.j (t)=.tau.'.sub.max corresponding to the functions
M.sub.j (t).
24. A method according to claim 1, wherein the phase of detecting the said
acoustic event comprises the phases of:
generating a detection signal on the basis of the said second signals,
detecting that the detection signal has passed a predetermined threshold.
25. A method according to claim 12, wherein the first signals comprise
coherence functions C.sub.j (t, .tau.) consisting of the vectors
.rho..sub.j respectively, the said first signals are submitted to an
operation for the search of the maximum of the first signals to generate
second signals, wherein the phase of searching for the maximum comprises
the phases of:
searching for the maximum of the coherence functions, filtrated and
interpolated, C.sub.j '(t, .tau.') when a delay .tau.' is varied,
generating functions M.sub.j (t) defined substantially according to the
formula
$$M_j(t) = \max_{\tau'} C_j'(t, \tau')$$
when t is varied, and calculating the delays (.delta..sub.1,
.delta..sub.2, .delta..sub.3) as delays .delta..sub.j (t)=.tau.'.sub.max
corresponding to the functions M.sub.j (t),
wherein the phase of detecting the said acoustic event comprises the phases
of:
generating a detection signal on the basis of the said second signals,
detecting that the detection signal has passed a predetermined threshold,
and
wherein the said detection signal is substantially generated according to
the formula
d(t)=max[M.sub.1 (t),M.sub.2 (t),M.sub.3 (t)]
in which d(t) is the said detection signal.
26. A method according to claim 1, wherein the method comprises the
operation of locating at any moment the said acoustic event on the basis
of delays calculated on the basis of the estimation of signals obtained by
antitransformation of the said first signals,
wherein the operation for generating the said first signals on the basis of
the said acoustic signals comprises the phases of:
extracting sampling frames from the said acoustic signals,
calculating an integral transform from the said frames,
calculating an antitransform of the said cross power spectra, and
wherein the operation for locating the said acoustic event comprises the
phases of:
calculating a branch of a hyperbola having its focus at one of the two
detection points, for any pair of detection points corresponding to the
pairs of frames,
calculating an area, defined by the said branches of a hyperbola, inside
which the acoustic event is located.
27. A method according to claim 1, wherein the operation of generating said
first signals on the basis of said acoustic signals comprises the phases
of:
calculating an integral transform from the said acoustic signals, and
calculating cross power spectra for a plurality of pairs of the integral
transform, wherein the said cross power spectra are normalized cross power
spectra.
28. A system for the acquisition and processing of acoustic signals
inherent in an acoustic event manifested in a given spatial region,
comprising:
means for acquiring the said acoustic signals at a plurality of different
points in the said spatial region,
means for generating from the said acoustic signals first signals
indicative of cross-spectra for a plurality of pairs of the said acoustic
signals,
means for extracting phase information present in the said cross-spectra,
including:
means for locating at any moment the said acoustic event on the basis of
delays calculated on the basis of the estimation of signals obtained by
antitransformation of the said first signals.
Description
FIELD OF THE INVENTION
The present invention relates in general to methods and systems for the
acquisition and processing of acoustic signals, such as for example the
methods and systems for detecting, locating and reconstructing acoustic
signals. Typical examples of applications of systems of this type are
voice acquisition and speaker location.
DESCRIPTION OF THE PRIOR ART
The acquisition of a voice message for the purposes of recognising, coding
and verifying speakers, etc. is conventionally performed by the use of a
fixed ("head-mounted") microphone in front of the speaker or held in the
speaker's hand ("hand-held"). These devices have disadvantages associated
with the low signal/noise ratio and with the dependence of the performance
of the system on the manner in which it is used (distance between the
mouth and the microphone, knocks and vibrations, etc.). The use of an
array of microphones can overcome some of these problems and also permits
easier interaction between the user and the system.
The technical literature over the past ten years illustrates various
examples of the use of arrays of microphones for the acquisition of voice
messages.
Reference can be made, for example, to the articles "Some Analyses of
Microphone Arrays for Speech Data Acquisition" by H. F. Silverman, IEEE
Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-35, no. 12,
Dec. 1987 and "Computer-steered Microphone Arrays for Sound Transduction
in Large Rooms" by J. L. Flanagan, J. D. Johnston, R. Zahn, G. W. Elko, J.
Acoust. Soc. Am., 78(5), November, 1985, pp 1508-1518.
The acquisition of voice messages by means of an array of microphones has
conventionally been achieved using techniques typical of the processing of
underwater acoustic signals and radar signals, since the object is to
detect the position of the acoustic source by means of several sensors
distributed in space and to utilise this knowledge to improve the
ratio between useful signals and ambient noise.
At times, these techniques enable the information coming from the source to
be extracted, without resorting to an express detection of its position
(for example, beamforming techniques, LMS adaptive filtering: see, for
example, the articles "Time Delay Estimation Using the LMS Adaptive
Filter-Static Behaviour" by F. A. Reed, P. L. Feintuch, N. J. Bershad,
IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-29, no.
3, June, 1981 and "On Time Delay Estimation Involving Received Signals" by
C. Y. Wuu, A. E. Pearson, IEEE Trans. on Acoustics, Speech and Signal
Processing, Vol. ASSP-32, no. 4, August, 1984).
The problem of locating an acoustic source by the use of an array of
microphones reduces essentially to the problem of measuring time delays
between the signals acquired from different sensors. When the relative
delays with which the sound wave has reached the different microphones are
known, the curve of the incident wave front emitted by the acoustic source
can be reconstructed and traced back to its centre, at which the source
which produced it is assumed to be located.
The most widely used technique for estimating the relative delay between
two signals is based on finding the maximum of the cross-correlation: see,
for example, the articles "An Algorithm for Determining Talker Location
using a Linear Microphone Array and Optimal Hyperbolic Fit" by H. F.
Silverman, Proc. Speech and Natural Language Workshop DARPA, June, 1990,
pp. 151-156, and "A Two-stage Algorithm for Determining Talker Location
from Linear Microphone Array Data" by H. F. Silverman, S. E. Kirtman,
Computer Speech and Language (1992) 6, pp. 129-152.
However, the efficacy of this method is strongly influenced by the spectral
content of the signals in question. For example, in the case of
narrow-band signals (such as a whistle) or signals of high periodicity
(such as a spoken sound), the estimation of the delay becomes critical or
even impossible in the presence of echoes and reverberations: in these
cases it is more effective to extract directly the information most
useful for assessing the delay, namely the phase information.
The phase of detecting an acoustic event consists in preprocessing the
signals acquired from the microphones, for determining the acoustically
significant time segments on which a subsequent source-locating operation
will be performed.
In the general case of sources of unknown and arbitrary acoustic events it
is impossible to make assumptions a priori about the spectral
characteristics of the signals emitted and the detection method cannot be
based on particular signal models.
The characterisation of the acoustic signal in terms of power is the most
direct and simplest which can be taken into consideration for performing
the detection method: exceeding fixed or adjustable thresholds (dependent
on the estimated noise level) can be sufficient in cases in which the
signal/noise ratio is not too low.
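By way of illustration only, the conventional power-based detection described above can be sketched as follows. The function name, the frame length, the hop and the 6 dB margin are assumptions made for this example and are not taken from the prior art cited; the noise power is assumed to have been estimated beforehand.

```python
import numpy as np

def power_detect(signal, frame_len, hop, noise_power, margin_db=6.0):
    """Flag frames whose mean power exceeds the noise estimate by margin_db.

    noise_power and margin_db are illustrative assumptions; the threshold is
    'adjustable' in the sense that it scales with the estimated noise level.
    """
    signal = np.asarray(signal, dtype=float)
    threshold = noise_power * 10.0 ** (margin_db / 10.0)
    starts = range(0, len(signal) - frame_len + 1, hop)
    power = np.array([np.mean(signal[s:s + frame_len] ** 2) for s in starts])
    return power > threshold, power

# Hypothetical test signal: weak noise with a sinusoidal burst in the middle.
rng = np.random.default_rng(0)
noisy = 0.01 * rng.standard_normal(4096)
noisy[2048:3072] += np.sin(2 * np.pi * 0.05 * np.arange(1024))
flags, power = power_detect(noisy, frame_len=1024, hop=512, noise_power=1e-4)
```

A detector of this kind relies only on level, which is why it degrades when the signal/noise ratio is low.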
As said above, some conventional methods of processing signals acquired by
means of arrays of microphones enable an optimum signal to be
reconstructed without the position of the acoustic source being estimated
beforehand; this signal can be considered equivalent to the initial
acoustic message, all the undesired acoustic components, attributable to
secondary sources, being attenuated.
OBJECTS AND SUMMARY OF THE INVENTION
The object of the present invention is to provide a method and a system for
the acquisition and processing of acoustic signals inherent in an acoustic
event which enable the above disadvantages with respect to the prior art
to be overcome or at least reduced.
In accordance with the present invention, this object is achieved by means
of a method and a system having the characteristics indicated in the
claims following the present description.
More specifically, the solution according to the invention has
characteristics of robustness, speed of calculation, accuracy and
insensitivity to interference which are superior to the prior art systems.
Solutions of this type can be used for the acquisition of a voice message
or other types of acoustic event and for their location.
The present invention provides for the use of at least one array of
microphones in a system enabling the acquisition of a general acoustic
message in a noisy environment to be improved.
The present invention also provides for the possibility of processing
information extracted from the signals acquired by means of the array of
microphones, also enabling the speaker or the acoustic source which
produced the message to be located.
Both the detection and the location of the message are performed, in an
original manner, using the phase information present in the normalised
cross-spectrum (estimated by means of a fast Fourier transform or FFT)
relative to the signals acquired from a pair of microphones in the array.
The successive derivation of a new version of the message, improved from
the point of view of the useful signal/ambient noise ratio relative to the
single acquisitions attributed to each microphone in the array, is
performed on the basis of the information obtained during the phase in
which the message itself is detected and located: thus, still using simply
a linear combination of the signals from the microphones in the array,
suitably delayed, this method of reconstructing the signals is also
distinguished by the originality with which the information relating to
the disphasing between the signals acquired via the different microphones
in the array is used.
What is to be understood by the term "array of microphones" in the present
description and the following claims is a device composed of a plurality
of microphones, preferably omnidirectional, which are aligned
with respect to one another and at regular spacings from one another.
Although it is not specifically mentioned in the following description, it
is in all cases also possible to perform the invention with other types of
microphones spatially distributed in a different manner: for example, in
the manner described in the article "An approach of Dereverberation Using
Multi-Microphone Sub-Band Envelope Estimation" by H. Wang and F. Itakura,
Proc. IEEE Int. Conf. on Acoust. Speech Signal Processing, May, 1991, pp.
953-956.
It is self-evident that the expression "microphone" as used in the present
context generally embraces all mechanical-electrical transducers which can
convert an acoustic vibratory phenomenon (including ultrasound) into a
processable electrical signal.
It will thus be appreciated that the microphones are connected to an
analogue-to-digital conversion system operating at a sufficiently high
sampling frequency (for example 24-48 kHz) synchronously between the
various channels.
Specifically, in the present description reference is made to an embodiment
using four microphones, although, theoretically, three would be sufficient
for locating the source; however, a larger number of microphones can
ensure that the system performs better.
The method described below refers in particular to the processing of
acoustic messages consisting of a preliminary detection of the event
itself, the accurate location of the position in which this event was
generated, and, finally, of an optional reconstruction of a version of the
original message cleared of the noise and reverberation components, etc.
In this way it is possible to consider using the module for locating
and/or detecting the acoustic event independently of the fact that the
message then has to be converted into a version with optimum quality for
the purposes of coding and voice recognition.
It can thus be assumed that the method and system according to the
invention operate efficiently on sounds having their origin in a zone
which is spatially restricted and the corresponding acoustic pressure wave
of which has particular directionality features, unlike background noise
which is assumed to be diffused almost uniformly in the environment.
Thus, the present description does not take into consideration cases in
which speakers (or generic acoustic sources) emit simultaneous messages
having comparable dynamics and for which the method described would be
integrated (in a known manner) with methods for separating the sources.
In a particularly advantageous embodiment, the present invention provides
for the use of a technique of estimating phase delays, such as the one
described in the article "The Generalized Correlation Method for
Estimation of Time Delay" by C. H. Knapp, G. C. Carter, IEEE Trans. on
Acoustics, Speech and Signal Processing, Vol. ASSP-24, no. 4, August,
1976, never used previously in this area of acoustic analysis.
A technique of this type uses the Fourier antitransform of a version of the
cross-spectrum of the two signals in which only the phase information is
maintained. Thus, amplitude information, which is irrelevant for measuring
delays when the signal/noise ratio is sufficiently high, is eliminated
from the cross-spectrum of the signals.
The application to real signals acquired in a reverberating environment has
demonstrated that the efficiency of this method is to a large extent
independent of the type of source to be located (voice, whistling,
explosions, various types of noises). It is furthermore possible to
discriminate signals of a directional nature from other acoustic phenomena
of a different type (background noise, reverberations, resonance), even if
they are of the same intensity. The cost in terms of computation is
comparable to that of the most efficient cross-correlation calculation and
less than that of other delay estimators based on adaptive filtering.
The present invention thus proposes a novel detection method based on a
function of coherence between pairs of signals exceeding a threshold, the
same function also being used in the subsequent location phase. A function
of this type represents an indication of reliability of the presence of an
acoustic event, even one of very short duration, which has evident
directionality features.
The invention further proposes a method which enables an optimum signal to
be reconstructed as a linear combination of the signals acquired by means
of the microphones, disphased according to the estimate of the position of
the source (or of the delays between the various pairs) supplied by the
locating module.
The method and system according to the invention can be used mainly for the
acquisition of a voice message in a noisy environment, without the need
for the speaker to speak the message in front of the microphone. If the
acquisition environment is noisy and reverberating, the message is cleared
of some of the undesired components. The message acquired in this manner
can then be supplied to a coding system (for teleconference or voice
message applications) or to a voice recognition system.
DETAILED DESCRIPTION OF THE INVENTION
Further advantages and characteristics of the present invention will appear
from the following description, given purely by way of non-limiting
example and with reference to the appended drawings, in which:
FIG. 1 shows schematically the operating conditions of the system according
to the present invention,
FIG. 2 is a schematic block diagram of the system according to the present
invention,
FIG. 3 is a schematic block diagram of part of the system according to the
present invention, and
FIG. 4 is a schematic block diagram of a block of the part of the system
illustrated in FIG. 3.
FIG. 1 illustrates schematically an environment in which the system
operates. The acoustic source (a speaker, a generic sound source, etc.,
that is, the acoustic event which is to be detected) is indicated AS, whilst
the array of microphones consists of four microphones P.sub.0, P.sub.1,
P.sub.2, P.sub.3 shown aligned along an axis X.
The relative positions of the microphones and of the acoustic source are
expressed in the form of co-ordinates in a cartesian plane x, y. The
acoustic source AS emits wave fronts which are detected at different times
and in different ways at the different points in the spatial region at
which the microphones P.sub.0, P.sub.1, P.sub.2, P.sub.3 of the array are
distributed, the microphones thus allowing the system to acquire the
signals at different points.
FIG. 2 shows the general diagram of the system. The signals are acquired by
the use of four omnidirectional microphones P.sub.0, P.sub.1, P.sub.2,
P.sub.3, which are assumed to be equally spaced relative to one
another (for example, a 15 cm spacing between two adjacent microphones)
and are connected to four analogue-to-digital converters A/D.sub.0,
A/D.sub.1, A/D.sub.2, A/D.sub.3 operating at a given sampling frequency
F.sub.c, of, for example, 48 kHz. The four outputs of these acquisition
modules, indicated S.sub.0, S.sub.1, S.sub.2, S.sub.3 (S.sub.i in which
i=0, . . . , 3), are connected to a processing module generally indicated
RLR (detection of the events, location of the source and reconstruction of
the signal).
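As a rough orientation, the example spacing and sampling frequency given above bound the delays that can be observed between microphones. The short calculation below is a sketch only; the speed of sound of 343 m/s is an assumption (air at about 20 degrees C) that does not appear in the description, and the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value; not stated in the description
F_C = 48_000            # Hz, example sampling frequency from the embodiment
SPACING = 0.15          # m, example spacing between adjacent microphones

def max_delay_samples(distance_m, fs_hz=F_C, c=SPEED_OF_SOUND):
    """Largest possible delay, in sampling intervals, between two microphones
    separated by distance_m (reached when the wave travels along the array)."""
    return distance_m * fs_hz / c

# Pairs (P0, P1), (P0, P2), (P0, P3) are 1, 2 and 3 spacings apart.
limits = [max_delay_samples(j * SPACING) for j in (1, 2, 3)]
```

Under these assumptions the delay between P.sub.0 and P.sub.1 cannot exceed about 21 sampling intervals, and that between P.sub.0 and P.sub.3 about 63.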
FIG. 3 shows the operating block diagram of the module RLR. At its input,
the module RLR receives all the signals S.sub.i (in which i=0, . . . , 3);
the outputs of this module consist of a pair of co-ordinates X and Y (if
necessary with an angular co-ordinate .THETA. which identifies the
direction of the source AS), of a detection index d and of a reconstructed
signal RS.
In the following, the modules constituting the module RLR and the
operations they perform to obtain the said outputs will be described.
In practice, the module RLR can be constituted by an electronic processing
device such as a minicomputer or by a specialised processor specifically
programmed for this purpose. The criteria for producing, programming and
using computers and/or processors of this type are well known in the art
and need not therefore be described herein.
The module RLR comprises a first series of modules EST.sub.0, EST.sub.1,
EST.sub.2, EST.sub.3 (EST.sub.i, where i=0, . . . , 3) which convert the
signals S.sub.i (from the microphones P.sub.0, P.sub.1, P.sub.2, P.sub.3),
received respectively at the input, into numerical sampling frames and
furthermore apply a window to the frames obtained. The output of the
modules EST.sub.i thus consists of the frames indicated x.sub.0, x.sub.1,
x.sub.2, x.sub.3 respectively (x.sub.i where i=0, . . . , 3).
A second series of modules, indicated CFFT.sub.0, CFFT.sub.1, CFFT.sub.2,
CFFT.sub.3 (CFFT.sub.i, where i=0, . . . , 3), the inputs of which are
connected to the respective outputs of the modules EST.sub.i, perform the
fast Fourier transform (FFT) calculation--or optionally another integral
transform--for all the frames. The outputs of the modules CFFT.sub.i in
which i=0, . . . , 3 are designated X.sub.0, X.sub.1, X.sub.2, X.sub.3
(X.sub.i, where i=0, . . . , 3) respectively.
A third series of modules, indicated CS.sub.1, CS.sub.2, CS.sub.3,
(CS.sub.i, in which i=1, . . . , 3), calculates the cross-spectra, or
normalised cross (power) spectra estimated by the use of an FFT (Fast
Fourier Transform), between pairs of frames. Each of the modules CS.sub.i
in fact receives as input the outputs of two modules of the preceding
series, that is, of the modules CFFT.sub.i. In particular, each module
CS.sub.i receives as input the output X.sub.i of the corresponding module
CFFT.sub.i together with the output X.sub.0 of the module CFFT.sub.0.
In this way, the modules CS.sub.i calculate the normalised cross-spectrum
of the pairs of frames (X.sub.0, X.sub.1), (X.sub.0, X.sub.2), (X.sub.0,
X.sub.3) extracted from the signals S.sub.0, S.sub.1, S.sub.2, S.sub.3.
The modules CS.sub.i furthermore calculate the inverse FFTs of the
normalised cross-spectra. The outputs of the modules CS.sub.i consist of
the signals C.sub.1, C.sub.2, C.sub.3 (C.sub.i, where i=1, . . . , 3)
respectively.
A fourth series of modules, indicated ICM.sub.1, ICM.sub.2, ICM.sub.3
(ICM.sub.i where i=1, . . . , 3), interpolates the signals C.sub.1,
C.sub.2, C.sub.3, obtained in this manner, and searches for their time
maxima. The outputs of the modules ICM.sub.i are provided by the pairs of
signals M.sub.1 and .delta..sub.1, M.sub.2 and .delta..sub.2, M.sub.3 and
.delta..sub.3.
A module RIL performs the detection function on the basis of the signals
M.sub.1, M.sub.2, M.sub.3. The output of the module RIL is the signal d.
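A minimal sketch of the detection function of the module RIL is given below, using the relation d(t)=max[M.sub.1 (t), M.sub.2 (t), M.sub.3 (t)] stated in the claims. The function names and the threshold value of 0.5 are assumptions chosen for illustration; the description only requires that the threshold be predetermined.

```python
import numpy as np

def detection_signal(m1, m2, m3):
    """d(t) = max[M1(t), M2(t), M3(t)], evaluated at each analysis moment."""
    return np.maximum.reduce([
        np.asarray(m1, dtype=float),
        np.asarray(m2, dtype=float),
        np.asarray(m3, dtype=float),
    ])

def detect(m1, m2, m3, threshold=0.5):
    """Return d(t) and a boolean flag per analysis moment.

    threshold=0.5 is a hypothetical value, not one given in the description.
    """
    d = detection_signal(m1, m2, m3)
    return d, d > threshold

# Hypothetical coherence maxima for three consecutive analysis moments.
d, event = detect([0.1, 0.7, 0.2], [0.2, 0.3, 0.1], [0.05, 0.9, 0.4])
```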
A module LOC performs the location function, that is, determining the
direction .THETA. from which the wave front arrives and calculating the
co-ordinates (X, Y) of the source. The module LOC operates on the basis of
the signals .delta..sub.1, .delta..sub.2, .delta..sub.3 and emits the
signal .THETA. and the pair of co-ordinates X, Y at the output.
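One simple way in which a direction could be derived from the delays is sketched below. It is not the location procedure of the invention: it assumes a far-field (plane-wave) source, a speed of sound of 343 m/s, microphones on the x axis, and the convention that each delay is the arrival time at microphone j minus the arrival time at microphone 0. All names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def far_field_angle(delays_s, mic_offsets_m, c=SPEED_OF_SOUND):
    """Least-squares plane-wave direction (radians, measured from the x axis).

    delays_s[j]      : arrival time at microphone j minus that at microphone 0
    mic_offsets_m[j] : x position of microphone j minus that of microphone 0
    Plane-wave model : delay = -offset * cos(theta) / c
    """
    delays = np.asarray(delays_s, dtype=float)
    offsets = np.asarray(mic_offsets_m, dtype=float)
    cos_theta = -c * np.dot(offsets, delays) / np.dot(offsets, offsets)
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Hypothetical check: source 50 m away at 60 degrees, microphones 15 cm apart.
mic_x = np.array([0.0, 0.15, 0.30, 0.45])
source = 50.0 * np.array([np.cos(np.radians(60.0)), np.sin(np.radians(60.0))])
dist = np.hypot(source[0] - mic_x, source[1])
delays = (dist[1:] - dist[0]) / SPEED_OF_SOUND
theta_deg = np.degrees(far_field_angle(delays, mic_x[1:] - mic_x[0]))
```

For a source close to the array the plane-wave assumption no longer holds, and the curvature of the wave front has to be taken into account in order to obtain the co-ordinates X, Y.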
A module RIC performs the reconstruction function, that is, constructing a
new version of the acoustic message represented by the signal emitted at
the output RS. The module RIC operates on the basis of the input signals
.delta..sub.1, .delta..sub.2, .delta..sub.3 and S.sub.0, S.sub.1, S.sub.2,
S.sub.3.
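The reconstruction performed by the module RIC can be sketched as a delayed linear combination, following the model s(t)=a.sub.0 s.sub.0 (t)+a.sub.1 s.sub.1 (t+.delta..sub.1)+... given in the claims. The sketch below makes simplifying assumptions that are not in the description: the delays are constant over the block, are whole numbers of samples smaller than the block length, and the coefficients default to equal weights. The function name is hypothetical.

```python
import numpy as np

def reconstruct(signals, delays_samples, coeffs=None):
    """Return s[n] = sum_j a_j * s_j[n + delta_j], with delta_0 = 0.

    signals        : equal-length sequences s_0 ... s_K
    delays_samples : integer delays delta_1 ... delta_K (assumed |delta| < length)
    coeffs         : a_0 ... a_K; equal weights if omitted (an assumption)
    Samples that would fall outside the block are treated as zero.
    """
    signals = [np.asarray(s, dtype=float) for s in signals]
    n = len(signals[0])
    deltas = [0] + [int(d) for d in delays_samples]
    if coeffs is None:
        coeffs = [1.0 / len(signals)] * len(signals)
    out = np.zeros(n)
    for a, s, d in zip(coeffs, signals, deltas):
        shifted = np.zeros(n)
        if d >= 0:
            shifted[:n - d] = s[d:]      # advance s_j by d samples
        else:
            shifted[-d:] = s[:n + d]     # delay s_j by |d| samples
        out += a * shifted
    return out

# Hypothetical check: s_1 is s_0 delayed by 3 samples, so delta_1 = 3 realigns it.
rng = np.random.default_rng(1)
s0 = rng.standard_normal(100)
s1 = np.zeros(100)
s1[3:] = s0[:-3]
rs = reconstruct([s0, s1], [3])
```

Once the channels are realigned the directional component adds coherently, whereas diffuse noise does not, which is the reason for combining the signals after compensation of the delays.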
The various modules constituting the system according to the present
invention and the operations they perform will now be described in more
detail module by module.
Modules EST.sub.i
For each signal S.sub.i, each module EST.sub.i extracts respective frames
x.sub.i of a length t.sub.f ms, corresponding to N samples, with an
analysing pitch of t.sub.a ms. Each frame is then weighted with a Blackman
window defined as described in "Digital Signal Processing" by A. V.
Oppenheim, R. W. Schafer, Prentice Hall 1975. The use of the Blackman
window has proved more effective for the purposes of the present invention
than the use of a conventional Hamming window.
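The frame extraction and weighting performed by the modules EST.sub.i can be sketched as follows. The function name is hypothetical, and the default values N=1024 and a pitch of N/2 samples are those of the embodiment; NumPy's `blackman` is used here as one common definition of the Blackman window.

```python
import numpy as np

def extract_frames(signal, n_samples=1024, hop=512):
    """Cut a signal into overlapping frames and weight each with a Blackman window.

    n_samples : frame length N in samples
    hop       : analysis pitch in samples (t_a * F_c)
    Returns an array of shape (number_of_frames, n_samples).
    """
    signal = np.asarray(signal, dtype=float)
    window = np.blackman(n_samples)
    starts = range(0, len(signal) - n_samples + 1, hop)
    return np.array([signal[s:s + n_samples] * window for s in starts])

# Hypothetical input: a constant signal, so each frame equals the window itself.
frames = extract_frames(np.ones(4096))
```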
Modules CFFT.sub.i
The modules CFFT.sub.i receive as input the frames x.sub.i of N samples,
extracted from the signals S.sub.i and weighted as described above. The
frames then undergo an FFT to produce a complex sequence X.sub.i of N
components. One possible calculation of the FFT is described for example
in the above-mentioned book by Oppenheim and Schafer. The embodiment
described is set up such that F.sub.c =48 kHz, N=1024 (and consequently
t.sub.f =21.33 ms) and t.sub.a =t.sub.f /2=10.66 ms. It will be
appreciated that the above values need not
be interpreted in a strictly limitative sense. They are nevertheless
indicative of the respective orders of magnitude in which parameters of
this type are selected.
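The relation between the parameters of the embodiment, and the transform applied to each frame, can be checked with the short sketch below. The test tone and the function name are assumptions introduced for the example.

```python
import numpy as np

F_C = 48_000   # sampling frequency in Hz (value from the embodiment)
N = 1024       # samples per frame (value from the embodiment)

t_f_ms = 1000.0 * N / F_C   # frame length: about 21.33 ms
t_a_ms = t_f_ms / 2.0       # analysis pitch: about 10.67 ms
bin_hz = F_C / N            # spacing between FFT components: 46.875 Hz

def frame_spectrum(frame):
    """Return the N complex FFT components X_i of one windowed frame x_i."""
    return np.fft.fft(np.asarray(frame, dtype=float))

# Hypothetical test frame: a 3 kHz tone (64 bins exactly) under a Blackman window.
n = np.arange(N)
frame = np.blackman(N) * np.cos(2 * np.pi * 3000.0 * n / F_C)
X = frame_spectrum(frame)
peak_bin = int(np.argmax(np.abs(X[: N // 2])))
```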
Modules CS.sub.i
In practice all modules CS.sub.i comprise three submodules, shown in FIG. 4
for better understanding.
A first submodule X-SP calculates the cross-spectrum of a pair of complex
sequences X.sub.0, X.sub.i. A second submodule NORM normalises the
abovementioned cross-spectrum calculated by the submodule X-SP generating
a complex vector Y.sub.i at the output. Finally, a third submodule
CFFT.sup.-1 performs an inverse FFT of the said vector Y.sub.i.
These operations, described briefly above, will now be described in further
detail, particularly as regards the mathematical aspect.
For each analysis moment t, for each pair of sequences (X.sub.0, X.sub.1),
(X.sub.0, X.sub.2), (X.sub.0, X.sub.3) the vector .rho..sub.j of N components
is calculated and defined as:
.rho..sub.j =FFT.sup.-1 [Y.sub.j ]
when j=1, 2, 3, where the l-th generic complex component of the vector
Y.sub.j is defined as:
$$Y_j(l) = \frac{X_0(l)\,X_j^{*}(l)}{|X_0(l)|\,|X_j(l)|}$$
in which X.sub.j * indicates the conjugate complex vector of the vector
X.sub.j.
The components .rho..sub.j (i) of the vector .rho..sub.j express a measure
of coherence between the original signal frames when the relative delay
.tau..sub.i is equal to i sampling intervals. The generic k-th component
of the first half of the vector (components from index 0 to index N/2-1)
corresponds to a positive delay k/F.sub.c; the generic k-th component of
the second half of the vector (components from index N/2 to index N-1)
corresponds to a negative delay (i.e. a lead) of magnitude (N-k)/F.sub.c.
In ideal conditions, in which the two signals are equal except for a scale
factor and a delay .tau..sub.0 equal to a whole number of sampling
intervals, a sequence .rho..sub.j consisting of a pulse centred on the
component corresponding to the delay .tau..sub.0 would be obtained. In
practice, .rho..sub.j (i) can be interpreted as an index of coherence
between the frame X.sub.0 and the frame obtained by shifting X.sub.j by
a number of samples corresponding to the delay .tau..sub.i =i/F.sub.c, or,
in the case of a fixed acoustic source, as an index of coherence between
the signal S.sub.0 and the signal S.sub.j shifted by .tau..sub.i. The
components of the vector .rho..sub.j are normalised between 0 and 1. As
defined above, the analysis performed on the frames every t.sub.a ms leads
to the determination of three coherence functions C.sub.1 (t, .tau.),
C.sub.2 (t, .tau.), C.sub.3 (t, .tau.) consisting at any moment
t=n.multidot.t.sub.a of the vectors .rho..sub.1, .rho..sub.2, .rho..sub.3,
respectively.
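The chain of submodules X-SP, NORM and CFFT.sup.-1 may be illustrated by the
following minimal sketch. It assumes that the normalisation divides each
component of the cross-spectrum by its own magnitude, so that only the phase
is retained; this is an assumption of the example and is not taken from the
equation referred to above. The names dft, coherence_vector and
index_to_delay are hypothetical, a direct transform is used in place of an
FFT for brevity, and with this convention a peak at lag i indicates that the
frame x.sub.0 equals the frame x.sub.j delayed by i samples.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(N^2) transform, adequate for a short demonstration frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def coherence_vector(x0, xj, floor=1e-12):
    """Inverse transform of the magnitude-normalised cross-spectrum.

    The magnitude normalisation is an assumption of this sketch.
    """
    big_x0, big_xj = dft(x0), dft(xj)
    y = []
    for a, b in zip(big_x0, big_xj):
        cross = a * b.conjugate()                 # submodule X-SP
        y.append(cross / max(abs(cross), floor))  # submodule NORM
    return [value.real for value in dft(y, inverse=True)]  # CFFT^-1

def index_to_delay(k, n, fs_hz):
    """First half of the vector: positive delays; second half: negative."""
    return k / fs_hz if k < n // 2 else (k - n) / fs_hz
```

In the ideal case of two frames differing only by a circular shift of a
whole number of samples, the vector returned by coherence_vector reduces to
a single pulse of unit height at the corresponding index.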
Modules ICM.sub.i
In order to refine the abovementioned coherence information, each vector
.rho..sub.j is reprocessed in the modules ICM by means of an
interpolation and filtering operation. In this way the estimation of the
delay between two signals can be made more accurate.
In practice, by applying to the function C.sub.j (t, .tau.), that is to
the vector .rho..sub.j, at any moment t=n.multidot.t.sub.a an
interpolation and filtering operation (described, for example, in the
article "Optimum FIR Digital Filter Implementation for Decimation,
Interpolation and Narrow Band Filtering" by R. E. Crochiere, L. R.
Rabiner, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol.
ASSP-23, no. 5, pp. 444-456, October, 1975), a new coherence function
C'.sub.j (t, .tau.') is obtained in which the discrete variable .tau.' has
a finer resolution than the discrete variable .tau..
For each coherence function C'.sub.j (t, .tau.') a search is then performed
at any moment t=n.multidot.t.sub.a for the maximum of the function
itself as the delay .tau.' is varied (in practice, the position of this
maximum expresses the phase information present in the cross-spectra
calculated above). For j=1, 2, 3, the maximum of this function as .tau.'
is varied is defined as M.sub.j (t):
##EQU2##
and the delay .tau.'.sub.max corresponding thereto is defined as
.delta..sub.j (t).
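The refinement and peak search of the modules ICM may be sketched as follows.
The sketch substitutes a simple Hann-tapered sinc interpolator for the FIR
interpolation filter of the cited article; this substitution, the function
names upsample_circular and peak_and_delay, and the parameters factor and
half_width are all assumptions made for illustration. The delay is returned
in units of the original sampling interval.

```python
import math

def upsample_circular(rho, factor, half_width=8):
    """Interpolate a circular coherence vector onto a grid `factor` times finer."""
    n = len(rho)
    fine = []
    for m in range(n * factor):
        pos = m / factor
        centre = math.floor(pos)
        acc = 0.0
        for k in range(centre - half_width + 1, centre + half_width + 1):
            x = pos - k
            sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
            taper = 0.5 + 0.5 * math.cos(math.pi * x / half_width)
            acc += rho[k % n] * sinc * taper
        fine.append(acc)
    return fine

def peak_and_delay(fine, factor):
    """Return (maximum value, signed delay in original sampling intervals)."""
    n_fine = len(fine)
    best = max(range(n_fine), key=lambda m: fine[m])
    lag = best if best < n_fine // 2 else best - n_fine
    return fine[best], lag / factor
```

The two returned values play the roles of M.sub.j (t) and .delta..sub.j (t)
respectively; dividing the delay by F.sub.c converts it to seconds.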
Module RIL: Detection
The detection of the acoustic event is based at any moment t on the values
M.sub.1 (t), M.sub.2 (t), M.sub.3 (t). A detection index d(t) is derived
from these values, such as:
d(t)=max[M.sub.1 (t),M.sub.2 (t),M.sub.3 (t)]
Whenever this index exceeds an empirically predefined threshold S.sub.d
(in the present embodiment, for example, S.sub.d =0.7), an acoustic event
is considered to be initiated. The event is considered to be terminated
when the said index returns below this threshold.
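The detection rule described above may be illustrated by the following
sketch, in which the three sequences hold one value of M.sub.j per analysis
moment. The function name detect_events and the representation of an event
as a pair of frame indices (start, end) are assumptions of the example.

```python
def detect_events(m1, m2, m3, threshold=0.7):
    """Return (start, end) frame-index pairs during which d(t) > threshold."""
    events, start = [], None
    count = 0
    for n, values in enumerate(zip(m1, m2, m3)):
        count = n + 1
        d = max(values)  # detection index d(t)
        if d > threshold and start is None:
            start = n                    # event initiated
        elif d <= threshold and start is not None:
            events.append((start, n))    # event terminated
            start = None
    if start is not None:                # event still active at the end
        events.append((start, count))
    return events
```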
Module LOC: Location
The location of the acoustic source is performed in any time interval in
which detection has provided a positive result (see FIG. 1).
At any moment t, the value .delta..sub.j (t) can be related to the
direction from which the wave front arrived, with respect to the centre of
the pair of microphones (0, j): this direction can be expressed, in
angular terms, as:
.THETA..sub.j (t)=arccos(v.delta..sub.j (t)/d.sub.j)
in which v is the speed of sound and d.sub.j is the distance between
the microphone P.sub.0 and the microphone P.sub.j. For any moment t, a
direction .THETA..sub.j (t), corresponding to the delay .delta..sub.j (t),
is associated with each pair of microphones (0, j).
This modeling is based on the assumption that the acoustic pressure wave
has reached the array in the form of a plane wave. The assumption is no
longer valid in the case in which the source is a short distance away from
the array.
In this case, which is the one in which the embodiment described is used,
the possible points which may give rise to the acoustic event in question
lie on a branch of a hyperbola which has its focus in the position of one
of the two microphones. The use of four microphones, and thus of three
pairs, enables three branches of a hyperbola to be determined, the
intersections of which delimit the area inside which the source should be
located.
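Under the plane wave assumption, the angular relation given above can be
evaluated as in the following sketch. The function name arrival_angle, the
default speed of sound of 343 m/s and the clamping of the arccos argument
to the interval [-1, 1] (to tolerate small estimation errors in the delay)
are assumptions of the example.

```python
import math

def arrival_angle(delta_s, spacing_m, speed=343.0):
    """Angle in radians from theta = arccos(v * delta / d)."""
    ratio = speed * delta_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against |ratio| slightly > 1
    return math.acos(ratio)
```

A zero delay thus corresponds to a wave front arriving broadside to the
pair of microphones (an angle of 90 degrees), while a delay equal to
d.sub.j /v corresponds to arrival along the axis of the pair.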
The following procedure is used to calculate the intersection between two
branches of a hyperbola, for example, corresponding to the pairs (0, 1)
and (0, 2).
With the co-ordinates of the microphones 0, 1, 2 being set as p.sub.0,
p.sub.1, p.sub.2, along the axis of the array and the delays estimated by
each pair being indicated as .delta..sub.01 and .delta..sub.02, the
co-ordinates of the point of intersection are given as:
##EQU3##
The co-ordinates x.sub.p13, y.sub.p13, x.sub.p23, y.sub.p23 of the points of
intersection between the other two pairs of branches of a hyperbola are
determined in a similar manner.
The co-ordinates (x, y) of the acoustic source are derived from these three
points, as the barycentre of the triangle of which they form the vertices.
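One way of computing such an intersection and the final barycentre is
sketched below. The closed form used is a derivation made for this example
and is not taken from the equation referred to above: it assumes that the
microphones lie on the x axis at p.sub.0, p.sub.1, p.sub.2, that the source
lies in the half-plane y>=0, and that each delay satisfies
v.multidot..delta.=r.sub.j -r.sub.0, where r.sub.j is the distance from the
source to microphone j. Squaring r.sub.j =r.sub.0 +v.multidot..delta. gives
two equations that are linear in x and r.sub.0. The function names
intersect_branches and barycentre are hypothetical.

```python
import math

def intersect_branches(p0, p1, p2, delta1, delta2, speed=343.0):
    """Source (x, y >= 0) from range differences r_j - r_0 = speed * delta_j."""
    d1, d2 = speed * delta1, speed * delta2
    # From r_j^2 = (r_0 + d_j)^2:
    #   2 (p_j - p_0) x + 2 d_j r_0 = p_j^2 - p_0^2 - d_j^2
    a1, b1, c1 = 2 * (p1 - p0), 2 * d1, p1 ** 2 - p0 ** 2 - d1 ** 2
    a2, b2, c2 = 2 * (p2 - p0), 2 * d2, p2 ** 2 - p0 ** 2 - d2 ** 2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("degenerate geometry: branches do not intersect cleanly")
    x = (c1 * b2 - c2 * b1) / det    # Cramer's rule for x
    r0 = (a1 * c2 - a2 * c1) / det   # Cramer's rule for r_0
    y = math.sqrt(max(r0 ** 2 - (x - p0) ** 2, 0.0))
    return x, y

def barycentre(points):
    """Barycentre of the triangle whose vertices are the three intersections."""
    xs, ys = zip(*points)
    return sum(xs) / len(points), sum(ys) / len(points)
```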
Module RIC: Reconstruction
The reconstruction of the acoustic signal on the basis of the signals
s.sub.0 (t), s.sub.1 (t), s.sub.2 (t), s.sub.3 (t) and of the delays
.delta..sub.1 (t), .delta..sub.2 (t), .delta..sub.3 (t), respectively
between the pairs of signals (0, 1), (0, 2), (0, 3), is based on a
modeling of the desired signal of the following type:
s(t)=a.sub.0 s.sub.0 (t)+a.sub.1 s.sub.1 (t+.delta..sub.1 (t))+a.sub.2
s.sub.2 (t+.delta..sub.2 (t))+a.sub.3 s.sub.3 (t+.delta..sub.3 (t))
in which a.sub.0, a.sub.1, a.sub.2, a.sub.3 are numerical coefficients.
Using this modeling, the array can be "directed" at any moment towards the
position determined from the given delays.
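A weighted sum of the microphone signals, each realigned by its estimated
delay, may be sketched as follows. The function name reconstruct is
hypothetical, and the sketch assumes delays that are constant over the
segment and equal to a whole number of samples (the delay of the reference
signal being zero); samples falling outside a signal are treated as zero.

```python
def reconstruct(signals, delays, coeffs):
    """s[n] = sum_j a_j * s_j[n + delta_j], with delta_0 = 0 for the reference.

    `delays` are whole numbers of samples (an assumption of this sketch).
    """
    n_samples = len(signals[0])
    out = []
    for n in range(n_samples):
        acc = 0.0
        for s, d, a in zip(signals, delays, coeffs):
            idx = n + d
            if 0 <= idx < len(s):   # samples outside the signal count as zero
                acc += a * s[idx]
        out.append(acc)
    return out
```

With equal coefficients of 0.25 and correctly estimated delays, the four
realigned signals add coherently, whereas contributions arriving from other
directions do not.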
It will be appreciated that, as the principle of the invention remains the
same, the details of construction and forms of embodiment may vary widely
with respect to those described and illustrated, without thereby departing
from the scope of the present invention.
* * * * *
|
Description  |
|