WikiPatents - Community Patent Review
Create Free Account  |  License or Sell Your Patent  |  WikiPatents Marketplace  |  WikiPatents Blog
Username:  Password:  
    
Advanced Search
Unsupervised adaptation and classification of multiple classes and sources in blind signal separation    
United States Patent6424960   
Link to this pagehttp://www.wikipatents.com/6424960.html
Inventor(s)Lee; Te-Won (San Diego, CA), Lewicki; Michael S. (Pittsburgh, PA), Sejnowski; Terrence J. (Solana Beach, CA)
AbstractA computer-implemented method and apparatus that adapts class parameters, classifies data and separates sources configured in one of multiple classes whose parameters (i.e. characteristics) are initially unknown. The data set may be generated in a dynamic environment where the sources provide signals that are mixed, and the mixing parameters change without notice and in an unknown manner. A mixture model is used in which the observed data is categorized into two or more mutually exclusive classes. The class parameters for each of the classes are adapted to a data set in an adaptation algorithm in which class parameters including mixing matrices and bias vectors are adapted. Each data vector is assigned to one of the learned mutually exclusive classes. In some embodiments the class parameters may have been previously learned, and the system is used to classify the data and if desired to separate the sources. The adaptation and classification algorithms can be utilized in a wide variety of applications such as speech processing, image processing, medical data processing, satellite data processing, antenna array reception, and information retrieval systems. The adaptation algorithm described is implemented with an extended infomax ICA algorithm, which provides a way to separate sources that have a non-Gaussian (e.g., platykurtic or leptokurtic) structure.



 Title Information Submit all comments and votes
 
Patent Text Patent PDF Print Page Summary File History
Plain text PDF images Print Summary File History
Drawing from US Patent 6424960
Unsupervised adaptation and classification of multiple classes and sources
     in blind signal separation - US Patent 6424960 Drawing
Unsupervised adaptation and classification of multiple classes and sources in blind signal separation
Inventor     Lee; Te-Won (San Diego, CA) , Lewicki; Michael S. (Pittsburgh, PA) , Sejnowski; Terrence J. (Solana Beach, CA)
Owner/Assignee     The Salk Institute for Biological Studies (La Jolla, CA)
Patent assignment
All assignments
Publication Date     July 23, 2002
Application Number     09/418,099
PAIR File History     Application Data   Transaction History
Image File Wrapper   Patent Term   Fees
Litigation
Filing Date     October 14, 1999
US Classification     706/20 600/310 600/515
Int'l Classification    
Examiner     Starks Jr.; Wilbert L.
Assistant Examiner    
Attorney/Law Firm     Knobbe Martens Olson & Bear LLP
Address
Parent Case    
Priority Data    
USPTO Field of Search     706/20 342/13 375/262 379/386 600/310 600/515
Patent Tags     unsupervised adaptation classification multiple classes sources blind signal separation
   
Enter a comma (,) or semicolon (;) between multiple tag words/phrases.
Describe this patent:
 Amusing   
 Clever   
 Complex   
 Efficient   
 Historic   
 Important   
 Innovative   
 Interesting   
 Practical   
 Simple   
[no votes]
Patent WIKI

Share information and news about this patent, including information and news about the technology, inventors, company, ligation and licensing.

 References Submit all comments and votes
 
*references marked with an asterisk below are user-added references
 U.S. References
 
Add a new US reference:  
ReferenceRelevancyCommentsReferenceRelevancyComments
5999902
Scahill et al.

Dec,1999

[0 after 0 votes]
6002952
Diab et al.

Dec,1999

[0 after 0 votes]
5933806
Beyerlein et al.

Aug,1999

[0 after 0 votes]
5790758
Streit

Aug,1998

[0 after 0 votes]
5778342
Erell et al.

Jul,1998

[0 after 0 votes]
5724487
Streit

Mar,1998

[0 after 0 votes]
5706402
Bell

Jan,1998

[0 after 0 votes]
5706391
Yamada et al.

Jan,1998

[0 after 0 votes]
5625749
Goldenthal et al.

Apr,1997

[0 after 0 votes]
5566209
Forssen et al.

Oct,1996

[0 after 0 votes]
5353346
Cox et al.

Oct,1994

[0 after 0 votes]
5092343
Spitzer et al.

Mar,1992

[0 after 0 votes]
4721958
Jenkin

Jan,1988

[0 after 0 votes]
 Foreign References
 Other References
 Market Review Submit all comments and votes
   
Market Size
Estimate the gross annual revenues of the relevant market sector:
> $10B
$5B - $10B
$2B - $5B
$500M - $2B
$100M - $500M
$10M - $100M
$1M - $10M
$500K - $1M
$100K - $500K
< $100K
[No votes]
$0
 
$0   $2.5B   $5B   $7.5B   $10B
Market Share
Estimate the percentage of the relevant market sector this invention will capture:
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Reasonable Royalty
What percentage of gross sales should the inventor or assignee be paid?
75% - 100%
50% - 74.99%
25% - 49.99%
10 - 24.99%
5 - 9.99%
2 - 4.99%
1 - 1.99%
< 1%
[No votes]
0.0%
 
0%   25%   50%   75%   100%
Public's "Guesstimation" of Royalty Value
Market SizeN/A[No votes]
xMarket ShareN/A[No votes]
xReasonable RoyaltyN/A[No votes]

N/A

License Availablity
If you are NOT the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
License Availablity
If you ARE the owner or assignee, answer here:
Yes, license is available for purchase

No, license is not currently available



[No votes]
Competitive Advantage
Does this invention have a significant competitive advantage over similar technologies?
Yes

No



[No votes]
Most helpful competitive advantage comment
[No comments]

Commercial Alternatives
Are there viable commercial alternatives for this invention?
Yes

No



[No votes]
Most helpful commercial alternative comment
[No comments]

 Technical Review Submit all comments and votes
 Claims Submit all comments and votes
 


What is claimed is:

1. A computerized method of separating an independent source signal from a mixture of source signals, comprising: receiving said mixture of source signals into a plurality of data inputs of a computer, reading a sample of said mixture of source signals from said data inputs; comparing said sample to previously received samples of said source signals; classifying said sample based on its similarity to other samples from said mixture of source signals; and performing independent component analysis on said classified sample to separate an independent source signal in said sample from other signals in the class of signals.

2. The method of claim 1, wherein said mixture of source signals comprises a mixture of sounds.

3. The method of claim 2, wherein said mixture of sounds comprises a mixture of voices.

4. The method of claim 2, wherein said independent source signal comprises a single voice.

5. The method of claim 2, wherein said data inputs comprise microphones.

6. The method of claim 1, wherein said mixture of source signals comprises an EEG or an EKG.

7. The method of claim 1, wherein reading a sample of said mixture comprises reading a sample over a predetermined period of time.

8. The method of claim 1, wherein classifying said sample comprises: determining data parameters that characterize the source in said other samples; obtaining a first set of data parameters that characterize the source signals in said sample; and assigning said sample to a first class based on the similarity of said first set of data parameters to the data pars for other samples in said first class.

9. A system for separating an independent source signal from a mixture of source signals, comprising: a plurality of data inputs configured to receive a mixture of source signals into said system; a memory in communication with the data inputs and configured to store samples of said mixture of source signals; a first function that compares said samples to previously received samples of said source signals; a second function that classifies each of said samples based on its similarity to other samples; and an independent component analysis module that performs an independent component analysis of the source signals in each of said classes to separate an independent source signal in said sample from other source signals in said mixture of source signals.

10. The system of claim 9, wherein said mixture of source signals comprises a mixture of sounds.

11. The system of claim 10, wherein said mixture of sounds comprises a mixture of voices.

12. The system of claim 11, wherein said independent source signal comprises a single voice.

13. The system of claim 10, wherein said data inputs comprise microphones.

14. The system of claim 9, wherein said mixture of source signals comprises an EEG or an EKG.

15. The system of claim 9, wherein the samples of said mixture of source signals comprises samples taken over a predetermined period of time.

16. A system for separating a single voice signal from a mixture of voice signals, comprising: a plurality of microphones configured to receive a mixture of voice signals into the system; a memory in communication with the microphones and configured to store a sample of said voices; a first function that compares said sample to previously received samples of said voice signals; a second function that assigns said sample to a class of voice signals, wherein each sample in the class has similar data attributes; an independent component analysis module that performs an independent component analysis of the voice signals in said class and separates a single voice signal from other voice signals in said class.

17. The system of claim 16, comprising a speaker that outputs said single voice signal that it separated from other voice signals in said class.

18. The system of claim 16, wherein the sample of said mixture of voice signals comprises a sample taken over a predetermined period of time.

19. The system of claim 16, wherein said system is a personal computer system.
 Description Submit all comments and votes
 


BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to computer-implemented systems for processing data that includes mixed signals from multiple sources, and particularly to systems for adapting parameters to the data, classifying the data, and separating sources from the data.

2. Description of Related Art

Recently, blind source separation by ICA (Independent Component Analysis) has received attention because of its potential signal processing applications, such as speech enhancement, image processing, telecommunications, and medical signal processing, among others. ICA is a technique for finding a linear non-orthogonal coordinate system in multivariate data. The directions of the axes of the coordinate system are determined by the data's second- and higher-order statistics. The separation is "blind" because the source signals are observed only as unknown linear mixtures of signals from multiple sensors, and the characteristic parameters of the source signals are unknown except that the sources are assumed to be independent. In other words, both the source signals and the way the signals are mixed is unknown. The goal of ICA is to learn the parameters and recover the independent sources (i.e., separate the independent sources) given only the unknown linear mixtures of the independent source signals as observed by the sensors. In contrast to correlation-based transformations such as principal component analysis (PCA), the ICA technique adapts a matrix to linearly transform the data and reduce the statistical dependencies of the source signals, attempting to make the source signals as independent as possible. ICA has proven a useful tool for finding structure in data, and has been successfully applied to processing real world data, including separating mixed speech signals and removing artifacts from EEG recordings.

U.S. Pat. No. 5,706,402, entitled "Blind Signal Processing System Employing Information Maximization to Recover Unknown Signals Through Unsupervised Minimization of Output Redundancy", issued to Bell on Jan. 6, 1998, discloses an unsupervised learning algorithm based on entropy maximization in a single-layer feedforward neural network. In the ICA algorithm disclosed by Bell, an unsupervised learning procedure is used to solve the blind signal processing problem by maximizing joint output entropy through gradient ascent to minimize mutual information in the outputs. In that learned process, a plurality of scaling weights and bias weights are repeatedly adjusted to generate scaling and bias terms that are used to separate the sources. The algorithm disclosed by Bell separates sources that have supergaussian distributions, which can be described as sharply peaked probability density functions (pdfs) with heavy tails. Bell does not disclose how to separate sources that have negative kurtosis (e.g., uniform distribution).

In many real world situations the ICA algorithm cannot be effectively used because the sources are required to be independent (e.g. stationary), which means that the mixture parameters must be identical throughout the entire data set. If the sources become non-stationary at some point then the mixture parameters change, and the ICA algorithm will not operate properly. For example, in the classic cocktail party example where there are several voice sources, ICA will not operate if one of the sources has moved at some time during data collection because the source's movement changes the mixing parameters. In summary, the ICA requirement that the sources be stationary greatly limits the usefulness of the ICA algorithm to find structure in data.

SUMMARY OF THE INVENTION

A mixture model is implemented in which the observed data is categorized into two or more mutually exclusive classes, each class being modeled with a mixture of independent components. The multiple class model allows the sources to become non-stationary. A computer-implemented method and apparatus is disclosed that adapts multiple class parameters in an adaptation algorithm for a plurality of classes whose parameters (i.e. characteristics) are initially unknown. In the adaptation algorithm, an iterative process is used to define multiple classes for a data set, each class having a set of mixing parameters including a mixing matrix A.sub.k and a bias vector b.sub.k. After the adaptation algorithm has completed operations, the class parameters and the class probabilities for each data vector are known, and data is then assigned to one of the learned mutually exclusive classes. The sources can now be separated using the source vectors calculated during the adaptation algorithm. Advantageously, the sources are not required to be stationary throughout the data set, and therefore the system can classify data in a dynamic environment where the mixing parameters change without notice and in an unknown manner. The system can be used in a wide variety of applications such as speech processing, image processing, medical data processing, satellite data processing, antenna array reception, and information retrieval systems. Furthermore, the adaptation algorithm described herein is implemented in one embodiment using an extended infomax ICA algorithm, which provides a way to separate sources that have a non-Gaussian (e.g., platykurtic or leptokurtic) structure.

A computer-implemented method is described that adapts class parameters for a plurality of classes and classifies a plurality of data vectors having N elements that represent a linear mixture of source signals into said classes. The method includes receiving a plurality of data vectors from data index t=1 to t=T, initializing parameters for each class, including the number of classes, the probability that a random data vector will be in class k, the mixing matrix for each class, and the bias vector for each class. In a main adaptation loop, for each data vector from data index t=1 to t=T, steps are performed to adapt the class parameters, which include the mixing matrices and bias vectors for each class. The main adaptation loop is repeated a plurality of iterations while observing a learning rate at each subsequent iteration, and after observing convergence of said learning rate, then each data vector is assigned to one of said classes. The source vectors, which are calculated for each data vector and each class, can then be used to separate source signals in each of said classes. In one embodiment, the mixing matrices are adapted using an extended infomax ICA algorithm, so that both sub-Gaussian and super-Gaussian sources can be separated.

A method is also described in which a plurality of data vectors are classified using previously adapted class parameters. The class probability for each class is calculated and each data vector is assigned to one of the previously adapted class. This classification algorithm can be used, for example to compress images or to search an image for a particular structure or particular types of structure.

The method can be used in a variety of signal processing applications to find structure in data, such as image processing, speech recognition, and medical data processing. Other uses used include image compression, speech compression, and classification of images, speech, and sound.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of this invention, reference is now made to the following detailed description of the embodiments as illustrated in the accompanying drawing, wherein:

FIG. 1 is a diagram that shows a plurality of M sources that generate signals, a plurality of N sensors that receive mixed signal, a data vector whose elements are defined by the mixed signals from the sensors, and a data set defined by a collection of data vectors;

FIG. 2 is a flow chart of an unsupervised adaptation and classification algorithm that adapts class parameters, classifies the data, and separates the sources;

FIG. 3 is a flow chart of the main adaptation loop shown in FIG. 2;

FIG. 4 is flow chart of the initial calculation loop shown in FIG. 3

FIG. 5 is flow chart of the mixing matrix adaptation loop shown in FIG. 3

FIG. 6 is flow chart of the bias vector adaptation loop shown in FIG. 3;

FIG. 7 is flow chart of operations in the step to adapt number of classes shown in FIG. 2;

FIG. 8 is a graph that shows the results of an experiment to adapt and classify two-dimensional data;

FIG. 9A is a graph of data collected over time from a first channel;

FIG. 9B is a graph of data collected over time from a second channel;

FIG. 9C is a graph of a first source (voices) after adapting the parameters, classifying the source vectors, and separating the sources;

FIG. 9D is a graph of a second source (background music) after adapting the parameters, classifying the source vectors, and separating the sources;

FIG. 9E is a graph of the class probability for single samples;

FIG. 9F is a graph of the class probability for samples in blocks of 100 adjacent samples;

FIG. 9G is a graph of the class probability for samples in blocks of 2000 adjacent samples;

FIG. 10 is a diagram illustrating a variety of source data, a computer to process the data, and output devices;

FIG. 11 is a flow chart of an adaptation (training) algorithm that learns the class parameters based upon a selected data set;

FIG. 12 is a flow chart of a classification algorithm that utilizes previously-adapted class parameters to classify a data set;

FIG. 13 is a diagram of an image, illustrating selection of patches and pixels within the patches that are used to construct a vector;

FIG. 14 is a diagram of four image regions, each region having different features that are used to adapt the class parameters for four classes;

FIG. 15 is a graph of the number of source vectors as a function of their value, illustrating that values of the source vectors are clustered around zero; and

FIG. 16 is a diagram of data collection from a single person and a single microphone.

DETAILED DESCRIPTION

This invention is described in the following description with reference to the Figures, in which like numbers represent the same or similar elements.

The following symbols are used herein to represent the certain quantities and variables, and in accordance with conventional usage, a matrix is represented by an uppercase letter with boldface type, and a vector is represented by a lowercase letter with boldface type.

Table of Symbols A.sub.k mixing matrix with elements a.sub.ij for class k A.sup.-t filter matrix, inverse of A b.sub.k bias vector for class k .theta..sub.k parameters for class k .THETA. parameters for all classes j Jacobian matrix k class index K number of classes q.sub.k switching moment vectors for sub- and super-Gaussian densities Q.sub.k diagonal matrix with elements of the vector q.sub.k M number of sources n mixture index N number of sensors (mixtures) p(s) probability density function s.sub.t Independent source signal vectors t data index, (e.g. time or position) T total number of data vectors in the data set W weight matrix x.sub.t observed data vector (data point) at data index t X observed data vectors X = [x.sub.1,...,x.sub.t,...,x.sub.T ].sup.T (whole data set)

In some instances, reference may be made to "basis functions" or "basis vectors", which are defined by the columns of the mixing matrix. In other words, the basis functions or vectors for a class are defined by the column vectors of the mixing matrix for that class.

Overview of a Data Set

Reference is now made to FIG. 1 which shows a plurality of M sources 100, including a first source 101, a second (Mth) source 102, and a number of sources in-between. The sources 100 provide signals shown generally at 110 to a plurality of N sensors 120, including a first sensor 121, a second sensor 122, a third (Nth) sensor 123, and a number of sensors in-between 10 that depend upon the embodiment. From FIG. 1 it can be seen that the sensors receive a linear combination (mixture) of the signals from the sources. The number of sensors (N) is assumed to be greater than or equal to the number of sources (M), i.e. N.gtoreq.M. Subject to this restriction, there is no upper limit on the number of sources M and sensors N, and accordingly M and N are constrained only by practical concerns.

The actual number of sources may be unknown, and in such circumstances it may be useful to estimate the number of sources. If the number of sensors is greater than or equal to the number of sources, then the ICA algorithm will work in the adaptation process described herein. However if the number of sensors is less than the number of sources, then an alternative to ICA must be used. One way of estimating the number of sources is to compute the correlation matrix of the data set X. The rank of the correlation matrix gives an estimate of the number of actual sources in the data.

The parameters (e.g. characteristics) of the mixture and the sources are initially unknown. The sources 100 are assumed to be mutually independent, and each of their probability distributions is assumed to be non-Gaussian. The sources and sensors may comprise many different combinations and types. For example, each of the sources may be a person speaking in a room, in which case the signals comprise voices provided to N microphone sensors situated in different locations around the room. All the voices are received by each microphone in the room, and accordingly each microphone outputs a linear combination (a mixture) of all the voices. The data from each of the microphones is collected in a data vector x.sub.t shown at 130 that has N elements, each element representing data from its corresponding sensor. In other words the first element x.sub.1 includes data from the first sensor, the second element x.sub.2 includes data from the second sensor, and so forth. In the microphone example, the data vectors may be collected as a series of digital samples at a rate (e.g. 8 kHz) sufficient to recover the sources.

A series of observations of the sources are observed by the sensors from t=1 to t=T. Typically the variable t represents time, and accordingly the series of measurements typically represent a time sequence of observations. The observed data vectors are collected in a data set 140, which includes a group of all observed data vectors from x.sub.1 to x.sub.T. The data log may reside in the memory of a computer, or any other suitable memory location from which it can be supplied to a computer for processing. Before processing, the data vectors must be in digital form, and therefore if the information from the sensors is not already in digital form, the data must be digitized by any suitable system. For example if the microphones receive analog signals, these signals must processed by a audio digitizer to put the data in digital form that can be stored in a computer memory and processed.

Separation of Sources

Based upon the mixed signals received by the sensors 120, one goal in some embodiments is to separate the sources so that each source can be observed. In the above example, this means that the goal is to separate the voices so that each voice can listened to separately. In other embodiments to be described, the data set may include patches from digitized images in which the N elements include data from N pixels, or even data from a single sensor such as a microphone in which the N elements include a series of N samples over time.

If the sources are independent for all observations from t=1 to T, then an ICA (Independent Components Analysis) algorithm such as disclosed by Bell in U.S. Pat. No. 5,706,402, which is incorporated by reference herein, can be utilized to separate the sources. In the ICA algorithm disclosed by Bell, an unsupervised learning procedure is used to solve the blind signal processing problem by maximizing joint output entropy through gradient ascent to minimize mutual information in the outputs. In that learned process, a plurality of scaling weights and bias weights are repeatedly adjusted to generate scaling and bias terms that are used to separate the sources. However, the ICA algorithm disclosed by Bell is limited because the sources must be independent throughout the data set; i.e. Bell's ICA algorithm requires that the sources must be independent for all data vectors in the data log. Therefore, if one of the sources becomes dependent upon the other, or in the example above if one of the sources shifts location, such as the first sensor 101 moves to the location shown in dotted lines at 160, the mixture parameters for the signals 110 will change and Bell's ICA algorithm will not operate properly.

The algorithm described herein provides a way to classify the data vectors into one of multiple classes, thereby eliminating the assumption of source independence throughout the data set, and allowing for movements of sources and other dependencies across data vectors. However, the sources in each data vector are still assumed to be independent.

Class Characteristics (Parameters) Each class has a plurality of different parameters in the form of a mixing matrix A.sub.k, a bias vector b.sub.k, and a class probability p(C.sub.k). However, because the parameters for each class are initially unknown, one goal is to determine the class characteristics (i.e. determine the parameters). The algorithm described herein learns the parameters for each class in a process that includes adapting (i.e. learning) the mixing matrix and bias vectors in an iterative process. Optionally, the class probability can also be adapted. Once adapted, each data vector is assigned to a mutually exclusive class, and the corresponding source vector calculated for the data vector and assigned class provides the desired source vector.

The characteristic parameters for each class are referenced by the variable .theta..sub.k, from k=1 to K. Each class has a probability designated by p(C.sub.k), which is the probability that a random data vector will fall within the class k. The characteristics for all K classes are collectively referenced by .THETA.. The description of the parameters for each class may vary between embodiments, but generally include mixing matrices referenced by A.sub.k and bias vectors referenced by b.sub.k.

The A.sub.k 's are N by M scalar matrices (called basis or mixing matrices) for the class k. N is the number of mixtures (e.g. sensors) and M is the number of sources, and it is assumed that N.gtoreq.M, as discussed above. The b.sub.k 's are N-element bias vectors. There are a total of K mixing matrices (A.sub.1, . . . A.sub.K) and K bias vectors (b.sub.1, . . . , b.sub.K) that are learned as described herein.

Overview of the Unsupervised Adaptation and Classification Algorithm

Reference is now made to FIG. 2, which is a top level flow chart that illustrates the unsupervised classification algorithm described herein. Due to the amount of information to be disclosed herein, many of the steps in the algorithm are referenced in FIG. 2 and then shown in detail in other Figures and discussed in detail with reference thereto. The unsupervised classification algorithm begins at a box 200 that indicates the beginning of the unsupervised classification algorithm.

In an initialization step shown at 210, parameters .THETA. are initialized to appropriate values. Particularly, the mixing matrices A.sub.k and bias vectors b.sub.k are initialized for each class from 1 to K. K is the total number of classes, and K is typically greater than one. The class probability for each class is typically initialized to 1/ K, unless another probability is suggested.

In one example, the mixing matrices A.sub.k are set to the identity matrix, which is a matrix whose diagonal elements are one and all other elements are zero. Small random values (e.g. noise) may be added to any of the elements, which advantageously makes the mixing matrices different for each class. In this example, the bias vectors b.sub.k are set to the mean of all data vectors x.sub.t in the data set. Some small random values (e. g. noise) may be added to each of the elements of the bias vectors, which makes the bias vectors different for each class.

In some embodiments, it may be useful to also initialize switching parameter vectors q.sub.t for each data vector from t=1 to T to designate a sub- or super-Gaussian distribution. The switching vectors q.sub.t, . . . q.sub.T are N-element switching parameter vectors used to create a diagonal matrix in operations performed in a classification algorithm described herein. The switching parameters q.sub.n.di-elect cons.{1, -1} designate either a sub- or super-Gaussian probability distribution function (pdf).

At 220 the data vectors x.sub.t for the data set (from t=1 to t=T) are provided to the algorithm. The data index is t, and the number T is the total number of data vectors in the data set. Referring briefly to FIG. 1, it can be seen that in one embodiment each data vector x.sub.t has N elements that correspond to the number of mixtures (linear combinations), which also correspond to the number of sensors.

At 230 the main adaptation loop is performed to adapt the class parameters .THETA. of all the classes. This is an iterative operation performed for each data vector in the data set, and then repeated until convergence, as described in more detail below with reference to FIGS. 3, 4, 5, and 6. Generally, for each data vector the adaptation process in the main adaptation loop includes performing probabilistic calculations for each class, then adapting the class parameters based upon those calculations, and repeating these operations for each data vector. Until the algorithm converges, the main adaptation loop is repeated until the algorithm converges. Operations within the main adaptation loop will be described in detail with reference to FIGS. 3, 4, 5, and 6.

At 240, after the main adaptation loop 230 has completed one loop, the probability of each class can be adapted using a suitable learning rule. In some embodiments, this operation will be performed only after several iterations of the main loop when the learning rate slows, or at other suitable points in the process as determined by the application. One suitable learning rule, performed for each class from k=1 to k=K, is ##EQU1##

This calculation gives the adapted class probability for each class for the next operation. The adapted class probability is then used in the next iteration of the main adaptation loop. In other embodiments, other suitable leaning rules could be used to adapt the class probabilities for each class.

At 250, the number of classes K may be adapted using a split and merge algorithm. One such algorithm, described with reference to FIG. 7 begins by assuming a certain number of classes (K), and performing a number of iterations of the main adaptation loop to calculate a first set of parameters .THETA..sub.1. If all of the learned classes are sufficiently different, the assumed number of classes may adequately represent the data. However if two of the classes are very similar they may be merged. If all are different, and is possible that there may be more classes, then the number of classes (K) can be incremented, the main adaptation loop reiterated to calculate a second set of parameters .THETA..sub.2, and the first and second sets of parameters compared to determine which more accurately represents the data. The adapted K value for the number of classes is then used in the next iteration of the main adaptation loop.

Another way of adapting the number of classes is to use a split and merge EM algorithm such as disclosed by Ueda, et al. in "SMEM Algorithm for Mixture Models", published in the Proceedings of the Advances in Neural Information Processing Systems 11, (Kearns et al., editors) MIT Press, Cambridge Mass. (1999), which overcomes the local maximum problem in parameter estimation of finite mixture models. In the split and merge EM algorithm described by Ueda et al., simultaneous split and merge operations are performed using a criterion that efficiently is disclosed to select the split and merge candidates that are used in the next iteration.

At 260, the results of the previous iteration are evaluated and compared with previous iterations to determine if the algorithm has converged. For example, the learning rate could be observed as the rate of change in the average likelihood of all classes: ##EQU2##

The main adaptation loop 230 and (if implemented) the class number and probability adaptation steps 240 and 250 will be repeated until convergence. Generally, to determine convergence the algorithm tests the amount of adaptation (learning) done in the most recent iteration of the main loop. If substantial learning has occurred, the loop is repeated. Convergence can be determined when the learning rate is small and stable over a number of iterations sufficient to provide a desired level of confidence that it has converged. if, for example, the change in the average likelihood is very small over several iterations, it may be determined that the loop has converged.

Determining when an algorithm has converged is very application-specific. The initial values for the parameters can be important, and therefore they should be selected carefully on a case-by-case basis. Furthermore, as i