Blind signal processing system employing information maximization to recover unknown signals through unsupervised minimization of output redundancy    

United States Patent 5706402
Link to this page: http://www.wikipatents.com/5706402.html
Inventor(s): Bell; Anthony J. (San Diego, CA)
Abstract: A neural network system and unsupervised learning process for separating unknown source signals from their received mixtures by solving the Independent Components Analysis (ICA) problem. The unsupervised learning procedure solves the general blind signal processing problem by maximizing joint output entropy through gradient ascent to minimize mutual information in the outputs. The neural network system can separate a multiplicity of unknown source signals from measured mixture signals where the mixture characteristics and the original source signals are both unknown. The system can be easily adapted to solve the related blind deconvolution problem that extracts an unknown source signal from the output of an unknown reverberating channel.
   
Title Information
Inventor: Bell; Anthony J. (San Diego, CA)
Owner/Assignee: The Salk Institute for Biological Studies (La Jolla, CA)
Publication Date: January 6, 1998
Application Number: 08/346,535
Filing Date: November 29, 1994
Examiner: Davis; George B.
Attorney/Law Firm: Baker, Maxham, Jester & Meador
   
References
U.S. References

Patent No.   Inventor    U.S. Class     Date
5539832      Weinstein   (not listed)   Jul 1996
5383164      Sejnowski   367/134        Jan 1995
5272656      Genereux    708/322        Dec 1993
4965732      Roy, III    342/147        Oct 1990
Claims
 


I claim:

1. A method performed in a neural network having input means for receiving a plurality J of input signals $(X_j)$ and output means for producing a plurality I of output signals $(U_i)$ each said output signal $U_i$ representing a combination of said input signals $(X_j)$ weighted by a plurality I of bias weights $(W_{i0})$ and a plurality $I^2$ of scaling weights $(W_{ij})$ such that $(U_i) = (W_{ij})(X_j) + (W_{i0})$, said method minimizing the information redundancy among said output signals $(U_j)$, wherein $0 < i \le I > 1$ and $0 < j \le J > 1$ are integers, said method comprising:

(a) selecting initial values for said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$;

(b) producing a plurality I of training signals $(Y_i)$ responsive to a transformation of said input signals $(X_j)$ such that $Y_i = g(U_i)$, wherein $g(x)$ is a nonlinear function and the Jacobian of said transformation is $J = \det(\partial Y_i / \partial X_j)$ when $J = I$; and

(c) adjusting said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$ responsive to one or more samples of said training signals $(Y_i)$ such that each said bias weight $W_{i0}$ is changed proportionately to a corresponding bias measure $\Delta W_{i0}$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon \cdot \partial(\ln|J|) / \partial W_{ij}$ accumulated over said one or more samples, wherein $\epsilon > 0$ is a learning rate.

2. The method of claim 1 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of the solutions to the equation ##EQU22## and said $\Delta W_{i0} = \epsilon \cdot (-r |Y_i|^{r-1} \operatorname{sgn}(Y_i))$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon \cdot ((\operatorname{cof}(W_{ij}) / \det(W_{ij})) \cdot r X_j |Y_i|^{r-1} \operatorname{sgn}(Y_i))$ accumulated over said one or more samples.

3. The method of claim 1 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of $g_1(x) = \tanh(x)$ and $g_2(x) = (1 - e^{-x})^{-1}$ and said $\Delta W_{i0}$ selected from the group consisting essentially of $\Delta_1 W_{i0} = \epsilon \cdot (-2Y_i)$ and $\Delta_2 W_{i0} = \epsilon \cdot (1 - 2Y_i)$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij}$ selected from the group consisting essentially of $\Delta_1 W_{ij} = \epsilon \cdot ((\operatorname{cof}(W_{ij}) / \det(W_{ij})) - 2 X_j Y_i)$ and $\Delta_2 W_{ij} = \epsilon \cdot ((\operatorname{cof}(W_{ij}) / \det(W_{ij})) + X_j (1 - 2Y_i))$ accumulated over said one or more samples.

4. A neural-network implemented method for recovering one or more of a plurality I of independent source signals $(S_i)$ from a plurality $J > I$ of sensor signals $(X_j)$ each including a combination of at least some of said source signals $(S_i)$ wherein $0 < i < I > 1$ and $0 < j \le J > I$ are integers, said method comprising:

(a) selecting a plurality I of bias weights $(W_{i0})$ and a plurality $I^2$ of scaling weights $(W_{ij})$;

(b) adjusting said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$ by repeatedly performing the steps of:

(b.1) producing a plurality I of estimation signals $(U_i)$ responsive to said sensor signals $(X_j)$ such that $(U_i) = (W_{ij})(X_j) + (W_{i0})$,

(b.2) producing a plurality I of training signals $(Y_i)$ responsive to a transformation of said sensor signals $(X_j)$ such that $Y_i = g(U_i)$, wherein $g(x)$ is a nonlinear function and the Jacobian of said transformation is $J = \det(\partial Y_i / \partial X_j)$ when $J = I$, and

(b.3) adjusting each said bias weight $W_{i0}$ and each said scaling weight $W_{ij}$ responsive to one or more samples of said training signals $(Y_i)$ such that said each bias weight $W_{i0}$ is changed proportionately to a bias measure $\Delta W_{i0}$ accumulated over said one or more samples and said each scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon \cdot \partial(\ln|J|) / \partial W_{ij}$ accumulated over said one or more samples, wherein $\epsilon > 0$ is a learning rate; and

(c) producing said estimation signals $(U_i)$ to represent said one or more recovered source signals $(S_i)$.

5. The method of claim 4 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of the solutions to the equation ##EQU23## and said $\Delta W_{i0} = \epsilon \cdot (-r X_j |Y_i|^{r-1} \operatorname{sgn}(Y_i))$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon \cdot ((\operatorname{cof}(W_{ij}) / \det(W_{ij})) - r X_j |Y_i|^{r-1} \operatorname{sgn}(Y_i))$ accumulated over said one or more samples.

6. The method of claim 4 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of $g_1(x) = \tanh(x)$ and $g_2(x) = (1 - e^{-x})^{-1}$ and said adjusting comprises:

(c) adjusting said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$ responsive to one or more samples of said training signals $(Y_i)$ such that each said bias weight $W_{i0}$ is changed proportionately to a corresponding bias measure $\Delta W_{i0}$ selected from the group consisting essentially of $\Delta_1 W_{i0} = \epsilon \cdot (-2Y_i)$ and $\Delta_2 W_{i0} = \epsilon \cdot (1 - 2Y_i)$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij}$ selected from the group consisting essentially of $\Delta_1 W_{ij} = \epsilon \cdot ((\operatorname{cof}(W_{ij}) / \det(W_{ij})) - 2 X_j Y_i)$ and $\Delta_2 W_{ij} = \epsilon \cdot ((\operatorname{cof}(W_{ij}) / \det(W_{ij})) + X_j (1 - 2Y_i))$ accumulated over said one or more samples.

7. A method implemented in a transversal filter having an input for receiving a sensor signal X that includes a combination of multipath reverberations of a source signal S and having a plurality I of delay line tap output signals $(T_i)$ distributed at intervals of one or more time delays $\tau$, said source signal S and said sensor signal X varying with time over a plurality $J \ge I$ of said time delay intervals $\tau$ such that said sensor signal X has a value $X_j$ at time $\tau(j-1)$ and each said delay line tap output signal $T_i$ has a value $X_{j+1-i}$ representing said sensor signal value $X_j$ delayed by a time interval $\tau(i-1)$, wherein $\tau > 0$ is a predetermined constant and $0 < i \le I > 1$ and $0 < j \le J \ge I$ are integers, said method recovering said source signal S from said sensor signal X and comprising:

(a) selecting a plurality I of filter weights $(W_i)$;

(b) adjusting said filter weights $(W_i)$ by repeatedly performing the steps of

(b.1) producing a plurality $K = I$ of weighted tap output signals $(V_k)$ by combining said delay line tap output signals $(T_i)$ such that $(V_k) = (F_{ki})(T_i)$, wherein $0 < k \le K = I > 1$ are integers, and wherein $F_{ki} = W_{k+1-i}$ when $1 \le k+1-i \le I$ and $F_{ki} = 0$ otherwise,

(b.2) summing a plurality $K = I$ of said weighted tap signals $(V_k)$ to produce an estimation signal ##EQU24## wherein said estimation signal U has a value $U_j$ at time $\tau(j-1)$,

(b.3) producing a plurality J of training signals $(Y_j)$ responsive to a transformation of said sensor signal values $(X_j)$ such that $Y_j = g(U_j)$ wherein $g(x)$ is a nonlinear function and the Jacobian of said transformation is $J = \det(\partial Y_i / \partial X_j)$ when $J = I$, and

(b.4) adjusting each said filter weight $W_i$ responsive to one or more samples of said training signals $(Y_j)$ such that said each filter weight $W_i$ is changed proportionately to a corresponding leading measure $\Delta W_1$ accumulated over said one or more samples when $i = 1$ and a corresponding scaling measure $\Delta W_i = \epsilon \cdot \partial(\ln|J|) / \partial W_i$ accumulated over said one or more samples otherwise; and

(c) producing said estimation signal U to represent said recovered source signal S.

8. The method of claim 7 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of $g_1(x) = \tanh(x)$ and $g_2(x) = (1 - e^{-x})^{-1}$ and said $\Delta W_1$ selected from the group consisting essentially of ##EQU25## accumulated over said one or more samples when $i = 1$ and a corresponding scaling measure $\Delta W_i$ selected from the group consisting essentially of ##EQU26## accumulated over said one or more samples otherwise.

9. The method of claim 7 wherein said nonlinear function g(x) is a nonlinear function selected from a group consisting essentially of the solutions to the equation ##EQU27## accumulated over said one or more samples when i=1 and a corresponding scaling measure ##EQU28## accumulated over said one or more samples otherwise.

10. A neural network for recovering a plurality of source signals from a plurality of mixtures of said source signals, said neural network comprising:

input means for receiving a plurality J of input signals $(X_j)$ each including a combination of at least some of a plurality I of independent source signals $(S_i)$, wherein $0 < i \le I > 1$ and $0 < j \le J \ge I$ are integers;

weight means coupled to said input means for storing a plurality I of bias weights $(W_{i0})$ and a plurality $I^2$ of scaling weights $(W_{ij})$;

output means coupled to said weight means for producing a plurality I of output signals $(U_i)$ responsive to said input signals $(X_j)$ such that $(U_i) = (W_{ij})(X_j) + (W_{i0})$;

training means coupled to said output means for producing a plurality I of training signals $(Y_i)$ responsive to a transformation of said input signals $(X_j)$ such that $Y_i = g(U_i)$,

wherein $g(x)$ is a nonlinear function and the Jacobian of said transformation is $J = \det(\partial Y_i / \partial X_j)$ when $J = I$;

adjusting means coupled to said training means and said weight means for adjusting said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$ responsive to one or more samples of said training signals $(Y_i)$ such that each said bias weight $W_{i0}$ is changed proportionately to a corresponding bias measure $\Delta W_{i0}$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon \cdot \partial(\ln|J|) / \partial W_{ij}$ accumulated over said one or more samples, wherein $\epsilon > 0$ is a learning rate.

11. The neural network of claim 10 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of the solutions to the equation ##EQU29## and said bias measure $\Delta W_{i0} = \epsilon \cdot (-r |Y_i|^{r-1} \operatorname{sgn}(Y_i))$ and said scaling measure $\Delta W_{ij} = \epsilon \cdot ((\operatorname{cof}(W_{ij}) / \det(W_{ij})) - r X_j |Y_i|^{r-1} \operatorname{sgn}(Y_i))$.

12. The neural network of claim 10 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of $g_1(x) = \tanh(x)$ and $g_2(x) = (1 - e^{-x})^{-1}$ and said bias measure $\Delta W_{i0}$ is selected from a group consisting essentially of $\Delta_1 W_{i0} = -2Y_i$ and $\Delta_2 W_{i0} = 1 - 2Y_i$ and said scaling measure $\Delta W_{ij}$ is selected from a group consisting essentially of $\Delta_1 W_{ij} = (\operatorname{cof}(W_{ij}) / \det(W_{ij})) - X_j\,2Y_i$ and $\Delta_2 W_{ij} = (\operatorname{cof}(W_{ij}) / \det(W_{ij})) + X_j (1 - 2Y_i)$.

13. A system for adaptively cancelling one or more interferer signals $(S_n)$ comprising:

input means for receiving a plurality J of input signals $(X_j)$ each including a combination of at least some of a plurality I of independent source signals $(S_i)$ that includes said one or more interferer signals $(S_n)$, wherein $0 < i \le I > 1$, $0 < j \le J \ge I$ and $0 < n \le N \ge 1$ are integers;

weight means coupled to said input means for storing a plurality I of bias weights $(W_{i0})$ and a plurality $I^2$ of scaling weights $(W_{ij})$;

output means coupled to said weight means for producing a plurality I of output signals $(U_i)$ responsive to said input signals $(X_j)$ such that $(U_i) = (W_{ij})(X_j) + (W_{i0})$;

training means coupled to said output means for producing a plurality I of training signals $(Y_i)$ responsive to a transformation of said input signals $(X_j)$ such that $Y_i = g(U_i)$, wherein $g(x)$ is a nonlinear function and the Jacobian of said transformation is $J = \det(\partial Y_i / \partial X_j)$;

adjusting means coupled to said training means and said weight means for adjusting said bias weights $(W_{i0})$ and said scaling weights $(W_{ij})$ responsive to one or more samples of said training signals $(Y_i)$ such that each said bias weight $W_{i0}$ is changed proportionately to a corresponding bias measure $\Delta W_{i0}$ accumulated over said one or more samples and each said scaling weight $W_{ij}$ is changed proportionately to a corresponding scaling measure $\Delta W_{ij} = \epsilon \cdot \partial(\ln|J|) / \partial W_{ij}$ accumulated over said one or more samples, wherein $\epsilon > 0$ is a learning rate; and

feedback means coupled to said output means and said input means for selecting one or more said output signals $(U_n)$ representing said one or more interferer signals $(S_n)$ for combination with said input signals $(X_j)$, thereby cancelling said interferer signals $(S_n)$.

14. The system of claim 13 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of the solutions to the equation ##EQU30## and said bias measure $\Delta W_{i0} = \epsilon \cdot (-r |Y_i|^{r-1} \operatorname{sgn}(Y_i))$ and said scaling measure $\Delta W_{ij} = \epsilon \cdot ((\operatorname{cof}(W_{ij}) / \det(W_{ij})) - r X_j |Y_i|^{r-1} \operatorname{sgn}(Y_i))$.

15. The system of claim 13 wherein said nonlinear function $g(x)$ is a nonlinear function selected from a group consisting essentially of $g_1(x) = \tanh(x)$ and $g_2(x) = (1 - e^{-x})^{-1}$ and said bias measure $\Delta W_{i0}$ is selected from a group consisting essentially of $\Delta_1 W_{i0} = -2Y_i$ and $\Delta_2 W_{i0} = 1 - 2Y_i$ and said scaling measure $\Delta W_{ij}$ is selected from a group consisting essentially of $\Delta_1 W_{ij} = (\operatorname{cof}(W_{ij}) / \det(W_{ij})) - X_j\,2Y_i$ and $\Delta_2 W_{ij} = (\operatorname{cof}(W_{ij}) / \det(W_{ij})) + X_j (1 - 2Y_i)$.
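
Editorial illustration (not part of the claims): the weight adjustment recited in claims 1 and 3 can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the patented implementation. The function name `infomax_step`, the batch layout (samples as columns), the averaging of the per-sample measures over the batch, and the default learning rate are all our own choices. The sketch uses the standard logistic function 1/(1 + exp(-u)), which is the function whose derivative produces the (1 - 2Y) terms of the second update rule; the claim text as printed above writes a minus sign inside g_2. The term cof(W_ij)/det(W) is computed as element (i, j) of the inverse transpose of W, which is the same quantity.

```python
import numpy as np

def infomax_step(W, w0, X, lr=0.01):
    """One hypothetical batch update following the second rule of claim 3.

    W  : (I, I) scaling weights, w0 : (I,) bias weights
    X  : (I, n) batch of sensor samples, one sample per column (assumed layout)
    lr : learning rate (epsilon in the claims); the default is arbitrary
    """
    U = W @ X + w0[:, None]              # U_i = W_ij X_j + W_i0
    Y = 1.0 / (1.0 + np.exp(-U))         # assumed standard logistic g(u)
    n = X.shape[1]
    # cof(W_ij)/det(W) equals element (i, j) of inv(W) transposed.
    # Averaging over the batch is an assumption; the claims say "accumulated".
    dW = np.linalg.inv(W).T + (1.0 - 2.0 * Y) @ X.T / n
    dw0 = (1.0 - 2.0 * Y).mean(axis=1)
    # For g(u) = tanh(u), the first rule of claim 3 would instead use
    # dW = inv(W).T - 2 * Y @ X.T / n and dw0 = (-2 * Y).mean(axis=1).
    return W + lr * dW, w0 + lr * dw0
```

Calling `infomax_step` repeatedly on batches of mixed signals, starting from an invertible `W`, performs gradient ascent on ln|J|; whether and how fast it converges depends on the sources and the learning rate.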
Description
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to systems for recovering the original unknown signals subjected to transfer through an unknown multichannel system by processing the known output signals therefrom and relates specifically to an information-maximizing neural network that uses unsupervised learning to recover each of a multiplicity of unknown source signals in a multichannel having reverberation.

2. Description of the Related Art

Blind Signal Processing: In many signal processing applications, the sample signals provided by the sensors are mixtures of many unknown sources. The "separation of sources" problem is to extract the original unknown signals from these known mixtures. Generally, the signal sources as well as their mixture characteristics are unknown. Without knowledge of the signal sources other than the general statistical assumption of source independence, this signal processing problem is known in the art as the "blind source separation problem". The separation is "blind" because nothing is known about the statistics of the independent source signals and nothing is known about the mixing process.

The blind separation problem is encountered in many familiar forms. For instance, the well-known "cocktail party" problem refers to a situation where the unknown (source) signals are sounds generated in a room and the known (sensor) signals are the outputs of several microphones. Each of the source signals is delayed and attenuated in some (time varying) manner during transmission from source to microphone, where it is then mixed with other independently delayed and attenuated source signals, including multipath versions of itself (reverberation), which are delayed versions arriving from different directions.
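
Editorial illustration: the sensor model described above, in which each microphone receives a sum of delayed and attenuated sources, can be simulated as follows. This is a simplified sketch under our own assumptions: delays are whole numbers of samples, gains are fixed rather than time varying, and the name `mix_with_delays` and its arguments are hypothetical.

```python
import numpy as np

def mix_with_delays(sources, gains, delays):
    """Simulate J sensor signals from I sources (illustrative model only).

    sources : (I, n) array of source samples
    gains   : (J, I) attenuation of source i at sensor j (assumed constant)
    delays  : (J, I) non-negative integer delay, in samples, of source i at sensor j
    """
    I, n = sources.shape
    J = gains.shape[0]
    X = np.zeros((J, n))
    for j in range(J):
        for i in range(I):
            d = int(delays[j, i])
            if d < 0:
                raise ValueError("delays must be non-negative")
            if d >= n:
                continue  # source arrives after the observation window
            # Sensor j hears source i shifted by d samples and scaled by its gain.
            X[j, d:] += gains[j, i] * sources[i, :n - d]
    return X
```

Reverberation can be imitated in this model by listing the same source more than once with different gains and delays.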

This signal processing problem arises in many contexts other than the simple situation where each of two unknown mixtures of two speaking voices reaches one of two microphones. Other examples involving many sources and many receivers include the separation of radio or radar signals sensed by an array of antennas, the separation of odors in a mixture by a sensor array, the parsing of the environment into separate objects by our biological visual system, and the separation of biomagnetic sources by a superconducting quantum interference device (SQUID) array in magnetoencephalography. Other important examples of the blind source separation problem include sonar array signal processing and signal decoding in cellular telecommunication systems.

The blind source separation problem is closely related to the more familiar "blind deconvolution" problem, where a single unknown source signal is extracted from a known mixed signal that includes many time-delayed versions of the source originating from unknown multipath distortion or reverberation (self-convolution). The need for blind deconvolution or "blind equalization" arises in a number of important areas such as data transmission, acoustic reverberation cancellation, seismic deconvolution and image restoration. For instance, high-speed data transmission over a telephone communication channel relies on the use of adaptive equalization, which can operate either in a traditional training mode that transmits a known training sequence to establish deconvolution parameters or in a blind mode.

The class of communication systems that may need blind equalization capability includes high-capacity line-of-sight digital radio (cellular telecommunications). Such a channel suffers from anomalous propagation conditions arising from natural conditions, which can degrade digital radio performance by causing the transmitted signal to propagate along several paths of different electrical length (multipath fading). Severe multipath fading requires a blind equalization scheme to recover channel operation.

In reflection seismology, a reflection coefficient sequence can be blindly extracted from the received signal, which includes echoes produced at the different reflection points of the unknown geophysical model. The traditional linear-predictive seismic deconvolution method used to remove the source waveform from a seismogram ignores valuable phase information contained in the reflection seismogram. This limitation is overcome by using blind deconvolution to process the received signal by assuming only a general statistical geological reflection coefficient model.

Blind deconvolution can also be used to recover unknown images that are blurred by transmission through unknown systems.

Blind Separation Methods: Because of the fundamental importance of both the blind separation and blind deconvolution signal processing problems, practitioners have proposed several classes of methods for solving the problems. The blind separation problem was first addressed in 1986 by Jutten and Herault ("Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture", Signal Processing 24 (1991) 1-10), who disclose the HJ neural network with backward connections that can usually solve the simple two-element blind source separation problem. Disadvantageously, the HJ network iterations may not converge to a proper solution in some cases, depending on the initial state and on the source statistics. When convergence is possible, the HJ network appears to converge in two stages, the first of which quickly decorrelates the two output signals and the second of which more slowly provides the statistical independence necessary to recover the two unknown sources. Comon et al. ("Blind separation of sources, Part II: Problems statement", Signal Processing 24 (1991) 11-20) show that the HJ network can be viewed as an adaptive process for cancelling higher-order cumulants in the output signals, thereby achieving some degree of statistical independence by minimizing higher-order statistics among the known sensor signals.

Other practitioners have attempted to improve the HJ network to remove some of the disadvantageous features. For instance, Sorouchyari ("Blind separation of sources, Part III: Stability analysis", Signal Processing 24 (1991) 21-29) examines higher-order non-linear transforming functions other than the simple first and third order functions proposed by Jutten et al. but concludes that the higher-order functions cannot improve implementation of the HJ network. In U.S. Pat. No. 5,383,164, filed on Jun. 10, 1993 as application Ser. No. 08/074,940 and fully incorporated herein by this reference, Li et al. describe a blind source separation system based on the HJ neural network model that employs linear beamforming to improve HJ network separation performance. Also, John C. Platt et al. ("Networks For The Separation of Sources That Are Superimposed and Delayed", Advances in Neural Information Processing Systems, vol. 4, Morgan-Kaufmann, San Mateo, 1992) propose extending the original magnitude-optimizing HJ network to estimate a matrix of time delays in addition to the HJ magnitude mixing matrix. Platt et al. observe that their modified network is disadvantaged by multiple stable states and unpredictable convergence.

Pierre Comon ("Independent component analysis, a new concept?" Signal Processing 36 (1994) 287-314) provides a detailed discussion of Independent Component Analysis (ICA), which defines a class of closed form techniques useful for solving the blind identification and deconvolution problems. As is known in the art, ICA searches for a transformation matrix to minimize the statistical dependence among components of a random vector. This is distinguished from Principal Components Analysis (PCA), which searches for a transformation matrix to minimize statistical correlation among components of a random vector, a solution that is inadequate for the blind separation problem. Thus, PCA can be applied to minimize second order cross-moments among a vector of sensor signals while ICA can be applied to minimize sensor signal joint probabilities, which offers a solution to the blind separation problem. Comon suggests that although mutual information is an excellent measure of the contrast between joint probabilities, it is not practical because of computational complexity. Instead, Comon teaches the use of the fourth-order cumulant tensor (thereby ignoring fifth-order and higher statistics) as a preferred measure of contrast because the associated computational complexity increases only as the fifth power of the number of unknown signals.
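
Editorial illustration: the distinction drawn above between removing correlation (PCA) and removing statistical dependence (ICA) can be checked numerically. In this sketch, which is our own construction and not taken from the patent, two independent uniform sources are mixed by a 45-degree rotation. The mixtures are uncorrelated at second order, yet their squares are clearly correlated, so the mixtures are still dependent. The seed, sample count, and rotation angle are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)                      # fixed seed, arbitrary
S = rng.uniform(-1.0, 1.0, size=(2, 100_000))       # independent sources

theta = np.pi / 4                                   # assumed mixing: pure rotation
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = A @ S                                           # observed mixtures

def corr(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

second_order = corr(X[0], X[1])          # near zero: mixtures are decorrelated
fourth_order = corr(X[0]**2, X[1]**2)    # clearly non-zero: mixtures are dependent
source_check = corr(S[0]**2, S[1]**2)    # near zero: sources are independent
```

For this particular setup the correlation between the squared mixtures works out analytically to -3/7, so a second-order criterion sees nothing left to fix while a higher-order one does.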

Similarly, Gilles Burel ("Blind separation of sources: A nonlinear neural algorithm", Neural Networks 5 (1992) 937-947) asserts that the blind source separation problem is nothing more than the Independent Components Analysis (ICA) problem. However, Burel proposes an iterative scheme for ICA employing a back propagation neural network for blind source separation that handles non-linear mixtures through iterative minimization of a cost function. Burel's network differs from the HJ network, which does not minimize any cost function. Like the HJ network, Burel's system can separate the source signals in the presence of noise without attempting noise reduction (no noise hypotheses are assumed). Also, like the HJ system, practical convergence is not guaranteed because of the presence of local minima and computational complexity. Burel's system differs sharply from traditional supervised back-propagation applications because his cost function is not defined in terms of difference between measured and desired outputs (the desired outputs are unknown). His cost function is instead based on output signal statistics alone, which permits "unsupervised" learning in his network.

Blind Deconvolution Methods: The blind deconvolution art can be appreciated with reference to the text edited by Simon Haykin (Blind Deconvolution, Prentice-Hall, New Jersey, 1994), which discusses four general classes of blind deconvolution techniques, including Bussgang processes, higher-order cumulant equalization, polyspectra and maximum likelihood sequence estimation. Haykin neither considers nor suggests specific neural network techniques suitable for application to the blind deconvolution problem.

Blind deconvolution is an example of "unsupervised" learning in the sense that it learns to identify the inverse of an unknown linear time-invariant system without any physical access to the system input signal. This unknown system may be a nonminimum phase system having one or more zeroes outside the unit circle in the frequency domain. The blind deconvolution process must identify both the magnitude and the phase of the system transfer function. Although identification of the magnitude component requires only the second-order statistics of the system output signal, identification of the phase component is more difficult because it requires the higher-order statistics of the output signal. Accordingly, some form of non-linearity is needed to extract the higher-order statistical information contained in the magnitude and phase components of the output signal. Such non-linearity is useful only for unknown source signals having non-Gaussian statistics. There is no solution to the problem when the input source signal is Gaussian-distributed and the channel is nonminimum-phase because all polyspectra of Gaussian processes of order greater than two are identical to zero.
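
Editorial illustration: the remark that Gaussian signals carry no information above second order can be seen in the simplest higher-order statistic, the normalized fourth cumulant (excess kurtosis). The sketch below is our own, with a hypothetical helper name `excess_kurtosis` and arbitrary seed and sample size. It estimates that statistic for Gaussian, uniform, and Laplacian samples.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample estimate of the fourth cumulant divided by the variance squared."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    m2 = np.mean(x**2)
    m4 = np.mean(x**4)
    return m4 / m2**2 - 3.0

rng = np.random.default_rng(0)   # fixed seed, arbitrary
n = 200_000
k_gauss = excess_kurtosis(rng.standard_normal(n))     # theory: 0
k_uniform = excess_kurtosis(rng.uniform(-1, 1, n))    # theory: -1.2
k_laplace = excess_kurtosis(rng.laplace(0, 1, n))     # theory: 3
```

Only the non-Gaussian signals give a non-linearity something to work with, which is consistent with the restriction to non-Gaussian sources stated above.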

Classical adaptive deconvolution methods are based almost entirely on second order statistics, and thus fail to operate correctly for nonminimum-phase channels unless the input source signal is accessible. This failure stems from the inability of second-order statistics to distinguish minimum-phase information from maximum-phase information of the channel. A minimum phase system (having all zeroes within the unit circle in the frequency domain) exhibits a unique relationship between its amplitude response and phase response so that second order statistics in the output signal are sufficient to recover both amplitude and phase information for the input signal. In a nonminimum-phase system, second-order statistics of the output signal alone are insufficient to recover phase information and, because the system does not exhibit a unique relationship between its amplitude response and phase response, blind recovery of source signal phase information is not possible without exploiting higher-order output signal statistics. These require some form of non-linear processing because linear processing is restricted to the extraction of second-order statistics.
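
Editorial illustration: a two-tap example shows why second-order statistics cannot distinguish a minimum-phase channel from a nonminimum-phase one. The filters below are our own toy choice. One is the time reversal of the other, so they share the same autocorrelation (and therefore the same magnitude response), but their zeros lie on opposite sides of the unit circle.

```python
import numpy as np

h_min = np.array([1.0, 0.5])   # H(z) = 1 + 0.5 z^-1, zero at z = -0.5
h_max = np.array([0.5, 1.0])   # H(z) = 0.5 + z^-1,   zero at z = -2

# Autocorrelation of each impulse response; for a white input this is,
# up to a scale factor, the autocorrelation of the channel output.
r_min = np.correlate(h_min, h_min, mode="full")
r_max = np.correlate(h_max, h_max, mode="full")

# Zeros of H(z), found as roots of the polynomial h[0] z + h[1].
zero_min = np.roots(h_min)
zero_max = np.roots(h_max)
```

Both `r_min` and `r_max` evaluate to the same three-lag sequence, so an equalizer that only matches output autocorrelation has no basis for choosing between the two channels.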

Bussgang techniques for blind deconvolution can be viewed as iterative polyspectral techniques, where rationales are developed for choosing the polyspectral orders with which to work and their relative weights by subtracting a source signal estimate from the sensor signal output. The Bussgang techniques can be understood with reference to Sandro Bellini ("Ch. 2: Bussgang Techniques For Blind Deconvolution and Equalization", Blind Deconvolution, S. Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1994), who characterizes the Bussgang process as a class of processes having an auto-correlation function equal to the cross-correlation of the process with itself as it exits from a zero-memory non-linearity.
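The Bussgang property described above can be written compactly as follows. The symbols are assumptions introduced here for illustration: y(n) denotes the process, g(·) the zero-memory non-linearity, and k the lag.

```latex
\mathrm{E}\left[\, y(n)\, y(n+k) \,\right]
  \;=\;
\mathrm{E}\left[\, y(n)\, g\big(y(n+k)\big) \,\right]
\qquad \text{for all lags } k .
```

The left side is the auto-correlation of the process and the right side is its cross-correlation with its own non-linearly transformed version.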

Polyspectral techniques for blind deconvolution lead to unbiased estimates of the channel phase without any information about the probability distribution of the input source signals. The general class of polyspectral solutions to the blind decorrelation problem can be understood with reference to a second Simon Haykin textbook ("Ch. 20: Blind Deconvolution", Adaptive Filter Theory, Second Ed., Simon Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1991) and to Hatzinakos et al. ("Ch. 5: Blind Equalization Based on Higher Order Statistics (HOS)", Blind Deconvolution, Simon Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1994).

Thus, the approaches in the art to the blind separation and deconvolution problems can be classified as those using non-linear transforming functions to spin off higher-order statistics (Jutten et al. and Bellini) and those using explicit calculation of higher-order cumulants and polyspectra (Haykin and Hatzinakos et al.). The HJ network does not reliably converge even for the simplest two-source problem and the fourth-order cumulant tensor approach does not reliably converge because of truncation of the cumulant expansion. There is accordingly a clearly-felt need for blind signal processing methods that can reliably solve the blind processing problem for significant numbers of source signals.

Unsupervised Learning Methods: In the biological sensory system arts, practitioners have formulated neural training optimality criteria based on studies of biological sensory neurons, which are known to solve blind separation and deconvolution problems of many kinds. The class of supervised learning techniques normally used with artificial neural networks are not useful for these problems because supervised learning requires access to the source signals for training purposes. Unsupervised learning instead requires some rationale for internally creating the necessary teaching signals without access to the source signals.

Practitioners have proposed several rationales for unsupervised learning in biological sensory systems. For instance, Linsker ("An Application of the Principle of Maximum Information Preservation to Linear Systems", Advances in Neural Information Processing Systems 1, D. S. Touretzky (ed.), Morgan-Kaufmann (1989)) shows that his well-known "infomax" principle (first proposed in 1987) explains why biological sensor systems operate to minimize information loss between neural layers in the presence of noise. In a later work ("Local Synaptic Learning Rules Suffice to Maximize Mutual Information in a Linear Network", Neural Computation 4 (1992) 691-702) Linsker describes a two-phase learning algorithm for maximizing the mutual information between two layers of a neural network. However, Linsker assumes a linear input-output transforming function and multivariate Gaussian statistics for both source signals and noise components. With these assumptions, Linsker shows that a "local synaptic" (biological) learning rule is sufficient to maximize mutual information, but he neither considers nor suggests solutions to the more general blind processing problem of recovering non-Gaussian source signals in a non-linear transforming environment.

Simon Haykin ("Ch. 11: Self-Organizing Systems III: Information-Theoretic Models", Neural Networks: A Comprehensive Foundation, S. Haykin (ed.) MacMillan, New York 1994) discusses Linsker's "infomax" principle, which is independent of the neural network learning rule used in its implementation. Haykin also discusses other well-known principles such as the "minimization of information loss" principle suggested in 1988 by Plumbley et al. and Barlow's "principle of minimum redundancy", first proposed in 1961, either of which can be used to derive a class of unsupervised learning rules.

Joseph Atick ("Could information theory provide an ecological theory of sensory processing?", Network 3 (1992) 213-251) applies Shannon's information theory to the neural processes seen in biological optical sensors. Atick observes that information redundancy is useful only in noise and includes two components: (a) unused channel capacity arising from suboptimal symbol frequency distribution and (b) intersymbol redundancy or mutual information. Atick suggests that optical neurons apparently evolved to minimize the troublesome intersymbol redundancy (mutual information) component of redundancy rather than to minimize overall redundancy. H. B. Barlow ("Unsupervised Learning", Neural Computation 1 (1989) 295-311) also examines this issue and shows that "minimum entropy coding" in a biological sensory system operates to reduce the troublesome mutual information component even at the expense of suboptimal symbol frequency distribution. Barlow shows that the mutual information component of redundancy can be minimized in a neural network by feeding each neuron output back to other neuron inputs through anti-Hebbian synapses to discourage correlated output activity. This "redundancy reduction" principle is offered to explain how unsupervised perceptual learning occurs in animals.

S. Laughlin ("A Simple Coding Procedure Enhances a Neuron's Information Capacity", Z. Naturforsch 36 (1981) 910-912) proves that the optical neuron of a blowfly optimizes information capacity through equalization of the probability distribution for each neural code value (minimizing the unused channel capacity component of redundancy), thereby confirming Barlow's "minimum redundancy" principle. J. J. Hopfield ("Olfactory computation and object perception", Proc. Natl. Acad. Sci. USA 88 (August 1991) 6462-6466) examines the separation of odor source solution in neurons using the HJ neuron model for minimizing output redundancy.

Becker et al. ("Self-organizing neural network that discovers surfaces in random-dot stereograms", Nature, vol. 355, pp. 161-163, Jan. 9, 1992) propose a standard back-propagation neural network learning model modified to replace the external teacher (supervised learning) by internally-derived teaching signals (unsupervised learning). Becker et al. use non-linear networks to maximize mutual information between different sets of outputs, contrary to the blind signal recovery requirement. By increasing redundancy, their network discovers invariance in separate groups of inputs, which can be selected out of information passed forward to improve processing efficiency.

Thus, it is known in the neural network arts that anti-Hebbian mutual interaction can be used to explain the decorrelation or minimization of redundancy observed in biological vision systems. This can be appreciated with reference to H. B. Barlow et al. ("Adaptation and Decorrelation in the Cortex", The Computing Neuron, R. Durbin et al. (eds.), Addison-Wesley (1989)) and to Schraudolph et al. ("Competitive Anti-Hebbian Learning of Invariance", Advances in Neural Information Processing Systems 4, J. E. Moody et al. (eds.), Morgan-Kaufmann (1992)). In fact, practitioners have suggested that Linsker's "infomax" principle and Barlow's "minimum redundancy" principle may both yield the same neural network learning procedures. Until now, however, non-linear versions of these procedures applicable to the blind signal processing problem have been unknown in the art.

The Blind Processing Problem: As mentioned above, blind source separation and blind deconvolution are related problems in signal processing. The blind source separation problem can be succinctly stated as follows: a set of unknown source signals $S_1(t), \ldots, S_I(t)$ are mixed together linearly by an unknown matrix $[A_{ji}]$. Nothing is known about the sources or the mixing process, both of which may be time-varying, although the mixing process is assumed to vary slowly with respect to the source. The blind separation task is to recover the original source signals from the $J \geq I$ measured superpositions of them, $X_1(t), \ldots, X_J(t)$, by finding a square matrix $[W_{ij}]$ that is a permutation of the inverse of the unknown matrix $[A_{ji}]$. The blind deconvolution problem can be similarly stated: a single unknown signal $S(t)$ is convolved with an unknown tapped delay-line filter $A_1, \ldots, A_I$, producing the corrupted measured signal $X(t) = A(t) * S(t)$, where $A(t)$ is the impulse response of the unknown (perhaps slowly time-varying) filter. The blind deconvolution task is to recover $S(t)$ by finding and convolving $X(t)$ with a tapped delay-line filter $W_1, \ldots, W_J$ having the impulse response $W(t)$ that reverses the effect of the unknown filter $A(t)$.
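The two problem statements can be made concrete with a small sketch. Everything below is an illustrative assumption (the source signals, the mixing matrix, the filter taps, and the variable names), and the "solutions" are computed from the known mixing system purely to show what a successful blind method would have to find. The sketch does not show how to find them blindly.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Blind separation setting: X = A S ---------------------------------
S = rng.uniform(-1.0, 1.0, size=(2, 1000))   # I = 2 assumed non-Gaussian sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                   # mixing matrix (unknown in practice)
X = A @ S                                    # J = 2 measured mixtures

# A valid unmixing matrix is any permutation of the inverse of A.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])                   # swaps the two outputs
W = P @ np.linalg.inv(A)
Y = W @ X
recovered_up_to_order = np.allclose(Y, S[::-1])   # sources recovered, order swapped

# --- Blind deconvolution setting: x = a * s ----------------------------
a = np.array([1.0, 0.5])                     # corrupting filter (unknown in practice)
w = (-0.5) ** np.arange(20)                  # truncated inverse of 1 + 0.5 z^-1
combined = np.convolve(w, a)                 # should approximate a unit impulse
delta = np.zeros_like(combined)
delta[0] = 1.0
filter_reversed = np.allclose(combined, delta, atol=1e-5)

print(recovered_up_to_order, filter_reversed)
```

In both cases the target is defined only up to an ambiguity (output ordering for separation, and in general scale and delay for deconvolution), which is why a permutation of the inverse counts as a solution.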

There are many similarities between the two problems. In one, source signals are corrupted by the superposition of other source signals and, in the other, a single source signal is corrupted by superposition of time-delayed versions of itself. In both cases, unsupervised learning is required because no error signals are available and no training signals are provided. In both cases, second-order statistics alone are inadequate to solve the more general problem. For instance, a second-order decorrelation technique such as that proposed by Barlow et al. would find uncorrelated (linearly independent) projections $[Y_j]$ of the input sensor signals $[X_j]$ when attempting to separate unknown source signals $\{S_i\}$ but is limited to discovering a symmetric decorrelation matrix that cannot reverse the effects of mixing matrix $[A_{ji}]$ if the mixing matrix is asymmetric. Similarly, second-order decorrelation techniques based on the autocorrelation function, such as prediction-error filters, are phase-blind and do not offer sufficient information to estimate the phase characteristics of the corrupting filter $A(t)$ when applied to the more general blind deconvolution problem.
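The limitation of symmetric second-order decorrelation can be checked numerically. The sketch below is an illustration under stated assumptions, not the method of Barlow et al.: it uses the symmetric inverse square root of the sample covariance as a stand-in for a symmetric decorrelating matrix, and the sources and asymmetric mixing matrix are made-up values.

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.uniform(-1.0, 1.0, size=(2, 20000))   # assumed independent sources
A = np.array([[1.0, 0.8],
              [0.1, 1.0]])                    # deliberately asymmetric mixing
X = A @ S

# Symmetric decorrelating matrix W = C^{-1/2}, built from second-order statistics only.
C = np.cov(X)
evals, evecs = np.linalg.eigh(C)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T
Y = W @ X

cov_Y = np.cov(Y)            # outputs are uncorrelated with unit variance
G = W @ A                    # overall source-to-output system

# Separation would require one dominant entry per row of G (a scaled
# permutation). Crosstalk is the ratio of the smaller to the larger entry.
crosstalk = np.abs(G).min(axis=1) / np.abs(G).max(axis=1)
print(np.round(cov_Y, 3), np.round(G, 3), np.round(crosstalk, 3))
```

The outputs are decorrelated, yet each output still contains a substantial share of both sources. For a symmetric positive-definite mixing matrix the same construction would undo the mixing up to scale, which is consistent with the asymmetry condition stated above.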

Thus, both blind signal processing problems require the use of higher-order statistics as well as certain assumptions regarding source signal statistics. For the blind separation problem, the sources are assumed to be statistically independent and non-Gaussian. With this assumption, the problem of learning $[W_{ij}]$ becomes the ICA problem described by Comon. For blind deconvolution, the original signal $S(t)$ is assumed to be a "white" process consisting of independent symbols. The blind deconvolution problem then becomes the problem of removing from the measured signal $X(t)$ any statistical dependencies across time that are introduced by the corrupting filter $A(t)$. This process is sometimes denominated the "whitening" of $X(t)$.

As used herein, both the ICA procedure and the "whitening" of a time series are denominated "redundancy reduction". Of the two classes of redundancy reduction techniques known in the art, the first uses some type of explicit estimation of cumulants and polyspectra, which can be appreciated with reference to Haykin and Hatzinakos et al. Disadvantageously, such "brute force" techniques are computationally intensive for high numbers of sources or taps and may be inaccurate when cumulants higher than fourth order are ignored, as they usually must be. The second class of techniques uses static non-linear functions, the Taylor series expansions of which yield higher-order terms. Iterative learning rules containing such terms are expected to be somehow sensitive to the particular higher-order statistics necessary for accurate redundancy reduction. This reasoning is used by Comon et al. to explain the HJ network and by Bellini to explain the Bussgang deconvolver. Disadvantageously, there is no assurance that the particular higher-order statistics yielded by the (heuristically) selected non-linear function are weighted in the manner necessary for achieving statistical independence. Recall that the known approach to improving the HJ network is to test various heuristically selected non-linear functions, and that no improvement on the original functions has yet been achieved in the art.
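As an illustration of how a static non-linearity introduces higher-order terms, consider the hyperbolic tangent, chosen here only as a familiar example and not as a function prescribed by the cited works. Its Taylor series about the origin is:

```latex
\tanh(u) \;=\; u \;-\; \frac{u^{3}}{3} \;+\; \frac{2u^{5}}{15} \;-\; \frac{17u^{7}}{315} \;+\; \cdots
```

A learning rule that averages a product such as $y_i \tanh(y_j)$ therefore mixes moments of the form $\mathrm{E}[y_i y_j^{3}]$, $\mathrm{E}[y_i y_j^{5}]$, and so on, with relative weights fixed by the series coefficients rather than by any independence criterion.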

Accordingly, there is a need in the art for an improved blind processing method, such as some method of rigorously linking a static non-linearity to a learning rule that performs gradient ascent in some parameter guaranteed to be usefully related to statistical dependency. Until now, this was believed to be practically impossible because of the infinite number of higher-order statistics associated with statistical dependency. The related unresolved problems and deficiencies are clearly felt in the art and are solved by this invention in the manner described below.

SUMMARY OF THE INVENTION

This invention solves the above problem by introducing a new class of unsupervised learning procedures for a neural network that solve the general blind signal processing problem by maximizing joint input/output entropy through gradient ascent to minimize mutual information in the outputs. The network of this invention arises from the unexpectedly advantageous observation that a particular type of non-linear signal transform creates learning signals with the higher-order statistics needed to separate unknown source signals by minimizing mutual information among neural network output signals. This invention also arises from the second unexpectedly advantageous discovery that mutual information among neural network outputs can be minimized by maximizing joint output entropy when the learning transform is selected to match the signal probability distributions of interest.
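The link between joint output entropy and mutual information stated above rests on a standard identity of information theory, written here for two outputs with symbols $y_1$ and $y_2$ assumed for illustration:

```latex
H(y_1, y_2) \;=\; H(y_1) \;+\; H(y_2) \;-\; I(y_1; y_2)
```

Raising the joint entropy $H(y_1, y_2)$ therefore tends to raise the individual output entropies and to lower the mutual information $I(y_1; y_2)$. The identity alone does not guarantee that the mutual information reaches its minimum, which is why the choice of transform matters.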

The process of this invention can be appreciated as a generalization of the infomax principle to non-linear units with arbitrarily distributed inputs uncorrupted by any known noise sources. It is a feature of the system of this invention that each measured input signal is passed through a predetermined sigmoid function to adaptively maximize information transfer by optimal alignment of the monotonic sigmoid slope with the input signal peak probability density. It is an advantage of this invention that redundancy is minimized among a multiplicity of outputs merely by maximizing total information throughput, thereby producing the independent components needed to solve the blind separation problem.
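The effect of aligning the sigmoid slope with the peak of the input density can be sketched numerically. The example below is a minimal illustration under assumptions made here: the input is drawn from a logistic distribution so that the standard logistic sigmoid happens to equal its cumulative distribution function, and output entropy is estimated crudely from a 20-bin histogram. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.logistic(loc=0.0, scale=1.0, size=50000)   # assumed input distribution

def sigmoid(u):
    """Standard logistic sigmoid, mapping the real line into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-u))

def hist_entropy(y, bins=20):
    """Crude entropy estimate (in nats) of y in [0, 1] from a histogram."""
    counts, _ = np.histogram(y, bins=bins, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

h_aligned = hist_entropy(sigmoid(x))         # steepest slope sits on the density peak
h_shifted = hist_entropy(sigmoid(x - 3.0))   # slope moved away from the density peak
h_max = float(np.log(20))                    # entropy of a perfectly flat histogram

print(round(h_aligned, 3), round(h_shifted, 3), round(h_max, 3))
```

With the aligned sigmoid the output histogram is nearly flat and its entropy approaches the maximum, whereas the shifted sigmoid crowds most outputs toward one end of the range and conveys less information.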

The foregoing, together with other objects, features and advantages of this invention, can be better appreciated with reference to the following specification, claims and the accompanying drawing.

BRIEF DESCRIPTION OF THE DRAWING

For a more complete understanding of this invention, reference is now made to the following detailed description of the embodiments as illustrated in the accompanying drawing, wherein:

FIGS. 1A, 1B, 1C and 1D illustrate the feature of sigmoidal transfer function alignment for optimal information flow in a sigmoidal neuron from the prior art;

FIGS. 2A, 2B and 2C illustrate the blind source separation and blind deconvolution problems from the prior art;

FIGS. 3A, 3B and 3C provide graphical diagrams illustrating a joint entropy maximization example where maximizing joint entropy fails to produce statistically independent output signals because of improper selection of the non-linear transforming function;

FIG. 4 shows the theoretical relationship between the several entropies and mutual information from the prior art;

FIG. 5 shows a functional block diagram of an illustrative embodiment of the source separation network of this invention;

FIG. 6 is a functional block diagram of an illustrative embodiment of the blind decorrelating network of this invention;

FIG. 7 is a functional block diagram of an illustrative embodiment of the combined blind source separation and blind decorrelation network of this invention;

FIGS. 8A, 8B and 8C show typical probability density functions for speech, rock music and Gaussian white noise;

FIGS. 9A and 9B show typical spectra of a speech signal before and after decorrelation is performed according to the procedure of this invention;

FIG. 10 shows the results of a blind source separation experiment performed using the procedure of this invention; and

FIGS. 11A, 11B, 11C, 11D, 11E, 11F, 11G, 11H, 11I, 11J, 11K and 11L show time domain filter charts illustrating the results of the blind deconvolution of several different corrupted human speech signals according to the procedure of this invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

This invention arises from the unexpectedly advantageous observation that a class of unsupervised learning rules for maximizing information transfer in a neural network solves the blind signal processing problem by minimizing redundancy in the network outputs. This class of new learning rules is now described in information theoretic terms, first for a single input and then for a multiplicity of unknown input signals.

Information Maximization For a Single Source

In a single-i