A neural network is trained using a training neural network that has the same topology as the original network but produces a differential network output and also accepts differential network inputs. This training method enables deeper neural networks to be trained successfully by avoiding a problem that occurs in conventional training methods, in which errors vanish as they are propagated in the reverse direction through deep networks. Convergence is accelerated by adjusting the error used in training to compensate for the linkage between multiple training data points.
A method is provided for calculating, as a separate variable, the change in the level output by a multivariable power series, where this change output signal is the change in output level that results when the input is supplied from two different data points. The method requires defining a change-variable structure and modifying the arithmetic operations so that this structure can be processed. A similar procedure is followed to calculate the derivatives of this change output signal with respect to the parameters used in constructing the multivariable power series. Given an error in the change output signal and the existence of the appropriate derivatives, the power series can be trained using the change-variable. As with any training algorithm, a matrix technique can be used to increase the training rate.
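The change-variable idea can be illustrated with a minimal sketch. It assumes the structure holds a level (the value at the first data point) and a change (the value at the second data point minus the value at the first), with addition and multiplication modified so that both parts propagate exactly. The names ChangeVar, _lift and power_series, and the example series itself, are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeVar:
    """Hypothetical change-variable: a level plus the change between two data points."""
    level: float   # value at the first data point
    change: float  # value at the second data point minus the value at the first

    def __add__(self, other):
        other = _lift(other)
        # Levels add and changes add.
        return ChangeVar(self.level + other.level, self.change + other.change)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # (a + da) * (b + db) - a * b = a*db + b*da + da*db
        return ChangeVar(
            self.level * other.level,
            self.level * other.change
            + other.level * self.change
            + self.change * other.change,
        )

    __rmul__ = __mul__


def _lift(value):
    """Treat a plain number as a constant, i.e. a ChangeVar with zero change."""
    return value if isinstance(value, ChangeVar) else ChangeVar(float(value), 0.0)


def power_series(x, y):
    """Assumed example series: 1 + 2x + 3xy + x^2 y (coefficients are illustrative)."""
    return 1 + 2 * x + 3 * x * y + x * x * y


# First data point (x, y) = (1, 2); second data point (x, y) = (3, 5).
x = ChangeVar(level=1.0, change=2.0)
y = ChangeVar(level=2.0, change=3.0)
out = power_series(x, y)

# The change part should equal the difference of two ordinary evaluations.
direct_change = power_series(3.0, 5.0) - power_series(1.0, 2.0)
```

Because the cross term da*db is kept, the change is an exact finite difference rather than a first-order approximation. The parameter derivatives mentioned in the abstract are not covered by this sketch.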
The algorithm used for training a state machine has been disclosed in another application. The objective of this application is to reduce that theoretical framework to a more concrete structure. The structure disclosed uses nodes to perform the function of state equations and to calculate the values of particular state variables. The nodes are classified as either Lead-Type Nodes or Non-Lead-Type Nodes, and this classification is used to define a structure of nodes that builds a State Machine Block. The State Machine Block model also restricts the location of system inputs and system outputs. The internal structure of a node is discussed, its components being a Function Block and a Complex Impedance Network. The Function Block is typically a multivariable power series, and the Complex Impedance Network is a linear circuit of resistors and capacitors. Such a Complex Impedance Network is referred to as an Electrical Component Model. A Complex Impedance Network can also be modeled as a z-transform circuit, referred to as the Z-Transform Model. To assist in processing the signal level and the derivative variable through this structure, some C++ code structures are developed and discussed.
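A node of this kind can be sketched roughly as follows. The sketch assumes that the Function Block drives the Complex Impedance Network, that the network is a single series-resistor, shunt-capacitor stage, and that its Z-Transform Model is obtained by backward-Euler discretization. None of these choices is stated in the abstract, and the class names RCLowPassZ and Node are hypothetical. Python is used here for brevity, although the abstract refers to C++ code structures.

```python
class RCLowPassZ:
    """Assumed z-transform model of one series-R, shunt-C stage (backward Euler)."""

    def __init__(self, r_ohms, c_farads, dt):
        tau = r_ohms * c_farads          # RC time constant
        self.alpha = dt / (tau + dt)     # backward-Euler update coefficient
        self.state = 0.0                 # capacitor voltage, i.e. the state variable

    def step(self, u):
        # y[n] = y[n-1] + alpha * (u[n] - y[n-1])
        self.state += self.alpha * (u - self.state)
        return self.state


class Node:
    """Hypothetical node: a two-variable power series feeding an impedance network."""

    def __init__(self, coeffs, network):
        self.coeffs = coeffs     # {(i, j): c} encodes the terms c * x**i * y**j
        self.network = network

    def step(self, x, y):
        # Function Block: evaluate the power series at the current inputs.
        drive = sum(c * x**i * y**j for (i, j), c in self.coeffs.items())
        # Complex Impedance Network: advance the state variable by one sample.
        return self.network.step(drive)


# Illustrative values only: R = 1 ohm, C = 1 F, sample period 1 s, series x + y.
node = Node({(1, 0): 1.0, (0, 1): 1.0}, RCLowPassZ(1.0, 1.0, 1.0))
first = node.step(1.0, 1.0)
second = node.step(1.0, 1.0)
```

With these values the update coefficient is 0.5, so the state variable moves halfway toward the Function Block output on each step.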
The methods, systems and devices of the present invention comprise use of Support Vector Machines and RFE (Recursive Feature Elimination) for the identification of patterns that are useful for medical diagnosis, prognosis and treatment. SVM-RFE can be used with varied data sets.
Identification of a determinative subset of features from within a large set of features is performed by training a support vector machine to rank the features according to classifier weights, where features are removed to determine how their removal affects the value of the classifier weights. The features having the smallest weight values are removed, and a new support vector machine is trained with the remaining features. The process is repeated until a relatively small subset of features remains that is capable of accurately separating the data into different patterns or classes. The method is applied to select the smallest number of genes that are capable of accurately distinguishing between medical conditions such as cancer and non-cancer.
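The elimination loop described above can be sketched as follows. The sketch assumes a linear soft-margin support vector machine trained by plain full-batch subgradient descent on the hinge loss, with one feature removed per iteration. The function names, the hyperparameters and the toy data are all hypothetical, and a production implementation would more likely rely on an established SVM solver.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss (assumed trainer)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:          # sample is inside the margin: hinge is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep=1):
    """Repeatedly retrain and drop the feature with the smallest absolute weight."""
    remaining = list(range(len(X[0])))    # original indices of surviving features
    eliminated = []                       # original indices, in order of removal
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: abs(w[k]))
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


# Toy data (illustrative): feature 0 separates the classes, features 1 and 2 barely do.
X = [[2.0, 0.1, 0.5], [1.5, -0.2, 0.4], [-2.0, 0.2, 0.5], [-1.5, -0.1, 0.6]]
y = [1, 1, -1, -1]
kept, dropped = svm_rfe(X, y, n_keep=1)
```

On this toy data the only surviving feature is feature 0. Removing one feature per pass keeps the sketch simple; with thousands of genes, several features would typically be removed in each pass.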
A computer-implemented method is provided for ranking the features of a dataset containing a large number of features according to each feature's ability to separate the data into classes. For each feature, a support vector machine separates the dataset into two classes and determines the margin between extremal points in the two classes. The margins for all of the features are compared, and the features are ranked by the size of the margin, with the highest-ranked features corresponding to the largest margins. A subset of features for classifying the dataset is selected from a group of the highest-ranked features. In one embodiment, the method is used to identify the best genes for disease prediction and diagnosis using gene expression data from microarrays.
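The per-feature ranking can be sketched as follows. For a single feature whose two classes do not overlap, the margin of a hard-margin support vector machine equals the gap between the closest extremal points of the two classes, so this sketch computes that gap directly instead of training a classifier. Assigning a margin of zero to overlapping features is an assumption of the sketch, and the function names and toy data are hypothetical.

```python
def feature_margin(values, labels):
    """Gap between the extremal points of the two classes along one feature."""
    pos = [v for v, label in zip(values, labels) if label == 1]
    neg = [v for v, label in zip(values, labels) if label == -1]
    # One of the two differences is positive only if the classes do not overlap.
    gap = max(min(pos) - max(neg), min(neg) - max(pos))
    return max(gap, 0.0)   # assumption: overlapping classes get a margin of zero


def rank_features_by_margin(X, y):
    """Return feature indices sorted by decreasing margin, plus the margins."""
    d = len(X[0])
    margins = [feature_margin([row[j] for row in X], y) for j in range(d)]
    order = sorted(range(d), key=lambda j: margins[j], reverse=True)
    return order, margins


# Toy data (illustrative): rows are samples, columns are features.
X = [[5.0, 1.0, 0.2], [6.0, 2.0, 0.9], [1.0, 2.5, 0.1], [0.0, 4.0, 0.8]]
y = [1, 1, -1, -1]
order, margins = rank_features_by_margin(X, y)
```

Here feature 0 has the widest gap and ranks first, feature 1 is separable with a narrow gap, and feature 2 overlaps and ranks last. A subset for classification would then be drawn from the head of order.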