Description
BACKGROUND OF THE INVENTION
This invention relates to instrumentation and related methodology for
determining, primarily in a biological material, one or more unknown
values of a known characteristic (e.g., the concentration of an analyte
such as glucose in blood, or the concentration of at least one blood gas
parameter) with a model based on a set of samples with known analyte
values and a multivariate algorithm. The instrument maximizes performance
while simultaneously minimizing cost. Minimization of cost and
maximization of performance are obvious goals, yet consideration of both
in a systematic fashion is difficult. Achieving both objectives is
especially difficult when designing complex optical
instrumentation using multivariate quantitative spectroscopy. Such a
capability is even more critical when designing noninvasive medical
instrumentation due to the significant ramifications associated with
spurious results and the cost containment measures being introduced
throughout the medical profession.
Multivariate calibration techniques, such as principal component regression
(PCR) and partial least squares regression (PLS), have proven to be useful
in conjunction with spectral measurements, allowing for quantitative
analysis of materials and material properties in various forms (gases,
liquids, and solids) in an ever growing number of applications.
Applications include nondestructively determining the composition of
passivation glass deposited on wafers during the fabrication of
semiconductor devices and noninvasively determining the concentration of
glucose in human blood (as set forth in U.S. Pat. No. 4,975,581).
In spectroscopic applications, measurements such as absorbance and
reflectance are taken at one or more spectral wavelengths. These
measurements are obtained from a number of specimens in which the amount
of the analyte of interest (e.g., glucose, alcohol, and arterial blood
gases) has been determined by some independent assay (e.g., wet
chemistry). Together, the spectral measurements and results from the
independent assays are used to construct empirical calibration models that
relate the amount of the analyte of interest to the spectral measurements.
These models are then used to predict analyte concentrations of future
samples solely on the basis of the spectral measurements. Quantitative
analysis based on spectral data has advantages over some of the more
traditional methods of analysis because it is often much quicker, less
labor intensive (hence cheaper), and can be nondestructive and/or
noninvasive.
The basis for many calibration models using absorbance spectroscopy is
Beer's Law. In the limiting case of dilute component concentrations in a
nonabsorbing medium, the absorbance of a sample at wavelength .lambda.,
y(.lambda.), depends upon the concentration of the multiple (p) chemical
species in the sample through Beer's Law, which is
y(\lambda) = a_{\lambda} + x_1 k_1(\lambda) + x_2 k_2(\lambda) + \cdots + x_p k_p(\lambda) + e_{\lambda},   [Equation 1]
where a.sub..lambda. is a spectral intercept, x.sub.i is the concentration
of the i.sup.th chemical species, k.sub.i (.lambda.) is the product of the
optical pathlength with the absorptivity of the i.sup.th chemical
species, and e.sub..lambda. is the measurement error of the absorbance at
wavelength .lambda.. The degree to which Beer's Law is adhered to depends
on the nature of the sample components, the concentrations of the chemical
components as well as the wavelength considered. The performance of
calibration methods is best when Beer's Law provides a good approximation
over the range of wavelengths used.
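The linear model of Equation 1 can be illustrated with a minimal Python sketch. The function name `beer_absorbance`, the matrix `K` (rows holding k.sub.i (.lambda.) for each species), and the numerical values are hypothetical and chosen only for illustration; they do not come from the original text.

```python
import numpy as np

def beer_absorbance(x, K, a=0.0, noise_sd=0.0, rng=None):
    """Absorbance spectrum under Equation 1.

    x: (p,) concentrations; K: (p, q) array whose row i is k_i(lambda);
    a: spectral intercept (scalar or (q,)); noise_sd: std. dev. of e_lambda.
    """
    x = np.asarray(x, dtype=float)
    K = np.asarray(K, dtype=float)
    y = a + x @ K                      # sum over species of x_i * k_i(lambda)
    if noise_sd > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + rng.normal(0.0, noise_sd, size=y.shape)
    return y

# Two hypothetical species observed at three wavelengths (illustrative values)
K = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])
y = beer_absorbance([2.0, 4.0], K)     # noise-free spectrum
```

At the middle wavelength both species contribute, which is the overlap situation that motivates multivariate calibration.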
There has been a rapid evolution of multivariate calibration methods for
analysis of spectral data due to the availability of advanced computerized
optical instrumentation. Early on, spectroscopists used methods based on a
single wavelength to develop a calibration model. However, the usefulness
of these methods, known as univariate methods, is limited to cases where
the spectral response of the analyte of interest is isolated from that of
other components in the material to be analyzed, or the spectral response
of the analyte dominates that of other spectrally absorbing components.
(Note: as typically used herein, analyte refers to the component in the
system to be analyzed, even though there are other components present in
the system). Multivariate calibration models relating an analyte's
concentration to a linear combination of measurements from several
wavelengths were introduced in order to develop models that are useful in
a broader range of conditions; specifically to include cases where the
spectral features of the analyte are overlapped with features of other
components in the material to be analyzed. However, the success of these
techniques generally requires wavelength selection based on specific
knowledge of the spectra of interfering components in the sample material
as well as that of the analyte.
Recently, calibration methods (e.g., PLS) capable of simultaneously using
measurements from a very large number of wavelengths, sometimes identified
as full-spectrum methods, have been introduced. These methods, which are
capable of analyzing rather complex materials, have a number of inherent
advantages (e.g., signal averaging and improved outlier detection) over
methods that use relatively few wavelengths. While still allowing for
overlapped spectra of various components, the capability of using many
wavelengths seemingly eliminates the need for wavelength selection and the
implicit requirement of knowledge of the spectra of interfering
components. The number of potential spectral wavelengths available to use
with a full-spectrum method is often very large, perhaps thousands. Thus,
with full-spectrum methods, spectroscopists often use all available
wavelengths within some broad range. However, in many applications,
measurements from a large number of the spectral wavelengths are
irrelevant or difficult to incorporate in a model because of
non-linearities. While full-spectrum methods like PLS are able to
accommodate non-linearities to some degree, inclusion of irrelevant (or
difficult) spectral measurements in a model can seriously degrade
performance.
Difficult or irrelevant spectral sections are sometimes removed through a
spectral "pre-treatment" step. In such a pre-treatment step subject matter
knowledge can be used to remove regions which exhibit the following
characteristics: (1) regions which lack spectral information; (2) regions
of significant non-linearity; and (3) regions with a poor signal-to-noise
ratio. This type of simple processing is well known and has been
previously used. See Haaland, et al., "Reagentless Near-infrared
determination of glucose in whole blood using multivariate calibration",
Applied Spectroscopy, Vol. 46, No. 10, 1992. In the experiment performed
in the above reference, "The spectral region between 4850 and 6600
cm.sup.-1 is a water band that is too strongly absorbing and too variable
in intensity to aid in the analysis".
In many and probably most situations where full-spectrum methods are
utilized, practitioners often use all wavelengths. Wavelength selection is
not recommended or deemed necessary as evidenced by:
1. Howard Mark, "A Computerized Study of the Effect of Noise on Wavelength
Selection during Computerized Wavelength Searches," Applied Spectroscopy,
42, 1427-1440 (1988): "Currently there is a trend toward use of
calibration methods, such as Principal Component Analysis (PCA) and Partial
Least Squares (PLS), that do not require wavelength selection because data
at all available wavelengths are used."
2. John H. Kalivas, Nancy Roberts, and Jon M. Sutter, "Global Optimization
by Simulated Annealing with Wavelength Selection for Ultraviolet-Visible
Spectrophotometry," Analytical Chemistry, 61, 2024-2030 (1989): "Two of
the most common techniques used in spectral chemical analysis are
principal components regression (PCR) and partial least squares (PLS).
Even though wavelength searches are not necessary, the proper number of
factors to include in an analysis must be established, which represents a
computational search as well."
With advice such as the foregoing, practitioners often use full spectrum
methods in conjunction with all wavelengths within some broad range. For
example, in a very recent peer reviewed paper on the noninvasive analysis
of blood glucose, Marbach, et al., "Noninvasive Blood Glucose Assay by
Near-Infrared Diffuse Reflection Spectroscopy of the Human Inner Lip,"
Applied Spectroscopy, 47, pp. 875-881 (1993), the authors use all
wavelengths in a very wide spectral range. Thus, it is clear that even
leading researchers in the field of noninvasive glucose measurement do not
recognize the potential benefits of using wavelength selection in
conjunction with full-spectrum methods.
In relatively simple cases involving materials with only a few components,
spectroscopists can sometimes select wavelength regions for analysis based
on knowledge of where the components are spectrally active and likely to
follow Beer's Law. However, when analyzing materials of a more complex
nature (e.g., human tissue or other biological material), wavelength
selection is much more difficult. There are a number of reasons for this.
First, some of the material components may be unknown. This is especially
true in noninvasive applications where the entire molecular structure of
the biological material (e.g., finger) is not known. Furthermore, even if
a component is known, its characteristic spectral signature may be
modified by variable experimental conditions (e.g., temperature) and the
host medium. Even if all components along with their associated spectral
signatures are known, considerable spectral overlap among different
components (such as found in the near-infrared region among, for instance,
glucose, urea, blood urea nitrogen (BUN), alcohol, and cholesterol) can
make wavelength selection very difficult. Physical and chemical
interactions among components, along with other sources of deviations from
Beer's Law, also impede the ability to select wavelengths.
Commonly used approaches given for wavelength selection in complex
situations are based on criteria that do not utilize the
interrelationships among measurements at multiple spectral wavelengths.
See Hruschka, W. R., "Data Analysis: Wavelength Selection Methods," in
Near-Infrared Technology in the Agricultural and Food Industries,
(Williams, P., and Norris K. editors), American Association of Cereal
Chemists, Inc., St. Paul, Minn. (1987), and Brown, P. J., Spiegelman, C.
H., and Denham, M. C., "Chemometrics and Spectral Frequency Selection,"
Philosophical Transactions of the Royal Society of London, Series A, 337,
311-322 (1991). By not considering the interrelationships among
measurements, these methods will miss synergistic effects that could
ultimately have significant positive effects on model performance.
Thus, for complex problems such as the measurement of blood analytes, there
is a fundamental need for development of systematic wavelength selection
that uses the interrelationships among measurements. Such a systematic and
reliable procedure for wavelength selection dramatically improves the
performance of these full spectrum methods and greatly broadens their use
to more complex problems.
To demonstrate the need for wavelength or frequency selection when using
full-spectrum methods such as PLS, consider a simple hypothetical chemical
system with a single spectrally active component. For this example, it is
assumed that at unit concentration the spectrally active component (which
is the analyte of interest) exhibits the Gaussian absorbance spectrum
illustrated in FIG. 1 with q=101 frequencies. Following Beer's Law for a
single component system, when the concentration of the analyte of interest
is x, the t.sup.th element (t=1, 2, . . . , q) of the spectrum is
y_t = x \cdot b_t + \epsilon_t,   [Equation 2]
where
##EQU1##
and the .epsilon..sub.t are independent and identically distributed normal
(.mu.=0, .sigma..sup.2 =0.01) measurement errors. Note that frequency 51
(t=51) has the largest signal and therefore the best signal-to-noise
characteristics. In contrast, frequencies far away from the center of the
frequency range contain virtually no signal and therefore have very poor
signal-to-noise characteristics.
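The hypothetical single-component system can be sketched in Python as follows. The peak location (frequency 51), the number of frequencies (q=101), and the error variance (0.01) follow the text; the Gaussian width `WIDTH` is an assumption, since the defining expression for b.sub.t is not reproduced here.

```python
import numpy as np

Q = 101          # number of frequencies, as stated in the text
CENTER = 51      # frequency with the largest signal, as stated in the text
WIDTH = 10.0     # ASSUMPTION: Gaussian width, not given in the source text
NOISE_SD = 0.1   # square root of the stated error variance of 0.01

t = np.arange(1, Q + 1)
b = np.exp(-0.5 * ((t - CENTER) / WIDTH) ** 2)   # assumed unit-concentration spectrum

def simulate_spectrum(x, rng):
    """One noisy spectrum at analyte concentration x, following Equation 2."""
    return x * b + rng.normal(0.0, NOISE_SD, size=Q)

rng = np.random.default_rng(0)
y = simulate_spectrum(0.5, rng)
snr = b / NOISE_SD   # signal-to-noise per frequency at unit concentration
```

Under this assumed width, `snr` peaks at frequency 51 and is close to zero at the ends of the range, which matches the qualitative description above.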
A small simulation study was conducted to evaluate the effect on the
predictive ability of three full-spectrum calibration methods in
conjunction with various subsets of the spectrum. The three full-spectrum
methods are PLS, and two variations of a method based on the explicit
Beer's Law model (Eqn. 2). PLS modeling was performed by using one latent
variable and centering both concentration and spectral data. The two
variations of the method based on the explicit model (denoted LS/LS and
LS/ML) are differentiated by the methods used in the prediction phase. The
model parameters, {b.sub.t }, are estimated similarly in the calibration
phase for both variants of the method using least-squares regression and
the explicit model given by Eqn. 2. Estimation of the analyte
concentration of a new sample .chi. by using {b.sub.t } and its associated
new spectrum (Y) is accomplished by using least-squares regression (LS/LS)
and maximum likelihood estimation (LS/ML). Note that of the three
calibration methods that were considered, only PLS is suitable for use in
complex situations with unknown spectrally active components with
overlapping spectral features. Six different subsets of the 101
frequencies were used for analysis. They are A.sub.1 ={49, 50, 51, 52,
53}, A.sub.2 ={46, 47, . . . , 56}, A.sub.3 ={41, 42, . . . , 61}, A.sub.4
={31, 32, . . . , 71}, A.sub.5 ={21, 22, . . . , 81}, and A.sub.6 ={1, 2,
. . . , 101}. Note that the frequency subsets are centered around
frequency 51 and add increasing numbers of additional frequencies. A.sub.1
contains the five frequencies with the best signal-to-noise, while A.sub.6
contains all 101 frequencies, many of which contain very little useful
signal.
For the simulation study fifty calibration sets were generated, each with
five observations. The set of analyte concentrations corresponding to the
five observations are {0.1, 0.3, 0.4, 0.5, 0.7}. For each observation, a
spectrum was constructed based on Eqn. 2. Calibration models using each of
the six sets of frequencies (A.sub.1, A.sub.2, . . . , A.sub.6) were
constructed using PLS and least-squares regression. For each calibration
set, fifty new spectra were generated based on Eqn. 2 with a fixed analyte
concentration, .chi.. These spectra along with the constructed calibration
models and prediction methods were then used to predict .chi.. This
complete procedure was repeated three times with
.chi. taking each value in {0.1, 0.5, 0.9}. The root mean squared prediction error (RMSPE),
##EQU2##
where .chi..sub.i is the predicted value of .chi., for each of the various
simulation conditions (defined by .chi., the calibration method, and
frequency set) is set forth in Table 1.
TABLE 1
______________________________________
RMSPE versus Frequency Subset
                 Frequency Subset
.chi.  Method    A.sub.1  A.sub.2  A.sub.3  A.sub.4  A.sub.5  A.sub.6
______________________________________
.1     PLS       .145     .118     .140     .176     .196     .235
       LS/LS     .114     .084     .073     .070     .069     .070
       LS/ML     .121     .090     .081     .083     .087     .099
.5     PLS       .110     .087     .079     .081     .081     .088
       LS/LS     .131     .100     .097     .119     .144     .189
       LS/ML     .140     .109     .091     .092     .094     .107
.9     PLS       .179     .193     .203     .268     .321     .375
       LS/LS     .150     .134     .126     .190     .234     .338
       LS/ML     .163     .128     .109     .107     .107     .117
______________________________________
From Table 1, it is apparent that the effect of including frequencies that
have poor S/N depends on both the calibration method and the value of
.chi.. The ability to predict .chi. provided by LS/ML appears to be
rather insensitive to the inclusion of irrelevant frequencies for all
three values of .chi.. On the other hand, the prediction abilities of PLS
and LS/LS are quite sensitive to the inclusion of irrelevant frequencies
for two of the three values of .chi.. Except in the instance where
.chi.=0.5, the performance of PLS degenerates significantly as more and
more frequencies with poor S/N are included. PLS performance is
insensitive to the inclusion of irrelevant frequencies when .chi.=0.5.
This is due to the fact that, with a poor model, PLS-predictions tend to
be biased toward the average value in the calibration set, which in this
case is close to 0.5. Except in the case where .chi.=0.1, the performance
of LS/LS degenerates significantly as more and more frequencies with poor
S/N are included. The performance of LS/LS is insensitive to using
irrelevant frequencies when .chi.=0.1 because of the fact that use of LS in
the prediction phase tends to produce predictions biased towards zero,
which is relatively close to 0.1 (see Thomas 1991). From this simple study
it is clear that use of irrelevant frequencies can seriously degrade the
performance of full-spectrum calibration methods. Note that other
full-spectrum methods, such as PCR, also exhibit behavior wherein if
irrelevant wavelengths are included, predictions tend to be biased towards
the average value of the analyte of interest in the calibration set.
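The LS/LS variant of the simulation can be sketched as follows. This is a rough illustration under stated assumptions, not a reproduction of the study: the Gaussian width `WIDTH` is assumed, the RMSPE is taken to be the square root of the mean squared prediction error over all calibration sets and new spectra, and the function name `rmspe_ls_ls` is hypothetical. The numbers it produces should not be expected to match Table 1.

```python
import numpy as np

Q, CENTER, WIDTH, NOISE_SD = 101, 51, 10.0, 0.1   # WIDTH is an assumption
t = np.arange(1, Q + 1)
b = np.exp(-0.5 * ((t - CENTER) / WIDTH) ** 2)     # assumed Gaussian spectrum
x_cal = np.array([0.1, 0.3, 0.4, 0.5, 0.7])        # calibration concentrations

def rmspe_ls_ls(subset, chi, rng, n_cal=50, n_new=50):
    """RMSPE of LS calibration followed by LS prediction on a frequency subset."""
    idx = np.asarray(subset) - 1                   # 1-based frequencies -> columns
    sq_err = []
    for _ in range(n_cal):
        # Calibration spectra generated from Equation 2
        Y = np.outer(x_cal, b[idx]) + rng.normal(0, NOISE_SD, (x_cal.size, idx.size))
        # Least-squares estimate of b_t (no intercept, as in Equation 2)
        b_hat = x_cal @ Y / (x_cal @ x_cal)
        # New spectra at the fixed concentration chi
        Y_new = chi * b[idx] + rng.normal(0, NOISE_SD, (n_new, idx.size))
        # Least-squares prediction of chi from each new spectrum
        chi_hat = Y_new @ b_hat / (b_hat @ b_hat)
        sq_err.append((chi_hat - chi) ** 2)
    return float(np.sqrt(np.mean(sq_err)))

rng = np.random.default_rng(0)
A1 = range(49, 54)        # five frequencies around the peak
A6 = range(1, 102)        # all 101 frequencies
rmspe_A1 = rmspe_ls_ls(A1, 0.9, rng)
rmspe_A6 = rmspe_ls_ls(A6, 0.9, rng)
```

Because noisy estimates of b.sub.t inflate the denominator of the prediction step, adding low-signal frequencies tends to pull LS predictions toward zero, which is the bias described above.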
As described above, PLS becomes increasingly sensitive to the inclusion of
additional wavelengths when the algorithm is used to predict analyte
concentrations far from the average. This property of PLS analysis is
extremely relevant in designing a noninvasive glucose monitor. In patients
with and without diabetes the average glucose concentrations are
approximately 150 and 100 mg/dL, respectively. Due to an inability to
control their glucose levels, diabetic patients' glucose level can fall
below 80 mg/dL. At this point the patient starts to become hypoglycemic.
Hypoglycemia, especially below 50 mg/dL, is a very dangerous condition as
the patient can experience "insulin shock" and become comatose. Diabetic
patients fear hypoglycemia because they no longer function normally and
are often unaware of the compromised state.
If all wavelengths are included, PLS predictions will tend to be biased
toward the average value of glucose in the calibration set. In operation a
patient may have a glucose concentration of 80 mg/dL but a monitor using
all wavelengths might predict 100 mg/dL. Given this information, a
diabetic patient may take no action to correct his/her glucose level, even
though approaching a dangerously low level. If no action is undertaken,
the patient may experience severe hypoglycemia and its possibly severe
consequences. Thus, accurate readings at below average glucose
concentrations are extremely important. The ability to select those
wavelength subsets for use in the multivariate algorithm that maximize
performance and minimize PLS' tendency to bias extreme results to the
average is of importance in the design of a noninvasive home glucose
monitor for use by the diabetic patient.
Despite the importance of wavelength selection as demonstrated above,
effective methods for wavelength selection remain inadequate for complex
situations requiring multivariate spectral analysis. This is especially
true when Beer's law is not followed or when not all components are known.
In conditions where Beer's Law is followed and all spectrally active
components in the sample material (and associated spectra) are known, then
except for measurement error, the q-vector of absorbances for a single
sample (y.sub.i) is given by y.sub.i =B.multidot.x.sub.i, where x.sub.i is
the p-vector of concentrations, and B=(b.sub.1, b.sub.2, . . . , b.sub.p)
contains the spectra of each of the p spectrally active components when
each is at unit concentration. If B is known, there are various
approaches for selecting wavelengths (see e.g., Juhl, L. L., and
Kalivas, J. H., "Evaluation of Experimental Designs for Multicomponent
Determinations by Spectrophotometry," Analytica Chimica Acta, 207, 125-135
(1988)). These approaches are most often used in conjunction with inverse
least-squares regression (see Haaland, D. M., and Thomas, E. V., "Partial
Least-Squares Methods for Spectral Analyses. 1. Relation to Other
Quantitative Calibration Methods and the Extraction of Qualitative
Information," Analytical Chemistry, 60, 1193-1202 (1988)) and related
procedures where it is possible to use only a limited number of spectral
wavelengths. For example, suppose that A is a subset of the q potential
wavelengths containing q* elements. Let B.sub.A represent the
corresponding q*.times.p submatrix of B. Procedures have been proposed for
searching for subsets of wavelengths that optimize some metric relating to
sensitivity and/or selectivity of the frequency set to the analyte of
interest, such as the condition number of B.sub.A (see e.g., Juhl and
Kalivas 1988). For example, Kalivas, J. H., Roberts, N., and Sutter, J.
M., "Global Optimization by Simulated Annealing with Wavelength Selection
for Ultraviolet-Visible Spectrophotometry," Analytical Chemistry, 61,
2024-2030 (1989), advocate using simulated annealing and Lucasius, C. B.,
and Kateman, G., "Genetic Algorithms for Large-Scale Optimization in
Chemometrics: An Application," Trends in Analytical Chemistry, 10, 254-261
(1991) propose using genetic algorithms to search for wavelengths that
minimize the condition number of B.sub.A. All of these procedures assume
that Beer's Law is followed and the spectra of all spectrally active
components in the sample material are known. Unless Beer's Law is
followed, the optimization metrics associated with these procedures do not
necessarily relate directly to prediction performance. Thus, the
usefulness of these methods in complex situations is limited.
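The condition-number metric used by these prior procedures can be illustrated with a short Python sketch. The matrix `B` below is a hypothetical set of pure-component spectra (q=4 wavelengths, p=2 components) invented for illustration, and the helper name `subset_condition_number` is likewise an assumption.

```python
import numpy as np

def subset_condition_number(B, subset):
    """2-norm condition number of the rows of B (q x p) picked by `subset` (0-based)."""
    return float(np.linalg.cond(B[np.asarray(subset), :]))

# Hypothetical pure-component spectra at unit concentration
B = np.array([[1.0, 0.9],    # wavelengths 0 and 1: the two components overlap heavily
              [0.9, 1.0],
              [1.0, 0.0],    # wavelengths 2 and 3: each component absorbs alone
              [0.0, 1.0]])

overlapped = subset_condition_number(B, [0, 1])
separated = subset_condition_number(B, [2, 3])
```

A smaller condition number indicates a wavelength subset that separates the components better. As noted above, this metric is only meaningful when B is known and Beer's Law holds.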
Li, Tong-Hua, Lucasius, C. B., and Kateman, G., "Optimization of
calibration data with the dynamic genetic algorithm," Analytica Chimica
Acta, 268, 123-234 (1992) describe the use of genetic algorithms (GAs) to
"optimize calibration data sets and enhance the predictive ability of a
calibration model successfully". Additionally, Li, et al. state that "GAs
should be tested on higher level problems." In summary, the article by Li,
et al., teaches the use of genetic algorithms for wavelength selection
utilizing the predicted error sum of squares (PRESS) as the fitness
function. However, the article does not teach how to interpret the
resulting data nor how to perform the wavelength selection process for
development of optical instruments. No mention is made of minimizing
instrument cost or how to specifically optimize instrument performance.
Further, no mechanism or method is described for selection of those
wavelengths or wavelength subsets that yield optimal results. Thus, the
article is a general overview of genetic algorithms but does not teach a
method or methodology for implementation in a practical, systematic
manner. In addition to the foregoing, as Li, et al. was published in
October 1992, applicants do not concede that it is prior art to them.
In complex situations, where Beer's Law does not provide a good
approximation throughout the spectrum or not all spectrally active
components are known, there are very few existing procedures for
wavelength selection. Most procedures are associated with calibration
methods (e.g., inverse least squares) that are capable of using relatively
few wavelengths because of problems with collinearity of the spectral
measurements. Stepwise (forward) regression is often used in conjunction
with these calibration methods (e.g., see Hruschka (1987)). Although this
procedure can utilize the synergy among wavelengths, it is often fraught
with difficulties such as overfitting.
In the process of performing quantitative spectroscopy, there are two
important terms to understand clearly. Those wavelengths that are
"predictive" are those wavelengths that are useful in modeling the
relationship between spectral information and analyte concentration.
"Synergistic" wavelengths are those wavelengths that when used singly
have a given ability to model the relationship between spectral
information and analyte concentration, but when used together have an
enhanced capability of modeling the relationship.
Other procedures search for wavelengths which empirically exhibit good
selectivity, sensitivity, and linearity for the analyte of interest over
the training set. Consider the model y.sub.it =a.sub.t +x.sub.i
.multidot.b.sub.t +f.sub.it +.epsilon..sub.it where x.sub.i is the analyte
concentration of the i.sup.th sample in the calibration set, y.sub.it is
the response of i.sup.th sample at the t.sup.th wavelength, a.sub.t and
b.sub.t are parameters, f.sub.it represents contributions from other
spectrally active components and/or deviations from Beer's Law due to the
presence of non-linearities, and .epsilon..sub.it is a random measurement
error with a mean of zero and variance, .sigma..sub.t.sup.2. The object of
these search procedures is to find wavelengths where b.sub.t is relatively
large for a single analyte and f.sub.it and .epsilon..sub.it are
relatively small for all samples in the training set. In conjunction with
full-spectrum and limited-wavelength calibration methods, near-infrared
spectroscopists often rely on correlation plots (e.g., see Hruschka 1987)
to search for appropriate wavelengths for use. Based on the n samples in
the calibration set, a correlation plot is a spectrum given by the set of
univariate correlations, {R.sub.t }, between the x.sub.i 's and y.sub.it
's. Wavelengths whose measurements exhibit a high degree of correlation
with the amount/concentration of the analyte of interest, measured by
R.sub.t.sup.2, are selected. This technique does not account for synergy
among the different wavelengths.
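A correlation plot of this kind can be computed with a few lines of Python. This is a minimal sketch; the function name `correlation_spectrum` and the small data set are hypothetical.

```python
import numpy as np

def correlation_spectrum(x, Y):
    """Squared univariate correlation R_t^2 between x (n,) and each column of Y (n, q)."""
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=0)
    denom = np.sqrt((xc @ xc) * (Yc ** 2).sum(axis=0))
    # Wavelengths with zero variance are assigned a correlation of zero
    r = np.divide(xc @ Yc, denom, out=np.zeros(Y.shape[1]), where=denom > 0)
    return r ** 2

# Hypothetical calibration data: 4 samples, 3 wavelengths
x = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.column_stack([2.0 * x,                   # tracks the analyte exactly
                     [1.0, -1.0, -1.0, 1.0],    # uncorrelated with the analyte
                     np.ones(4)])               # constant response
r2 = correlation_spectrum(x, Y)
```

Each wavelength is scored on its own, so a wavelength that is useful only in combination with others (for example, one that mainly tracks an interferent) receives a low score and is discarded.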
A related method was recently proposed by Brown et al. (1991) and Brown
(1992). Rather than use R.sub.t.sup.2 as a measure for selecting
wavelengths, Brown and his colleagues recommend selecting wavelengths
associated with large values of
##EQU3##
where \hat{b}_t is the simple least-squares estimate of b.sub.t and
\hat{\sigma}_t^2 is an estimate of .sigma..sub.t.sup.2. Again, this
method does not account for synergy among measurements at different
wavelengths. As set forth in the Description of the Preferred Embodiment,
this synergy can be very desirable.
In order for the correlation plot or the method proposed by Brown and his
colleagues to be useful, wavelengths specific to the analyte of interest
with good signal-to-noise are needed. However, the required wavelength
specificity is not always available. For example, in the near-infrared
spectrum there is considerable overlap between spectral responses of many
different chemical species which often appear together in biological
specimens. Therefore, it is very doubtful whether these procedures have
much utility when analyzing complex biological materials, such as human
tissue in this or any other spectral region.
Because subject-matter knowledge is insufficient to select wavelengths and
the search space of possible wavelength subsets is too large to be
searched exhaustively (2.sup.q possible combinations of wavelength
subsets, where q can be in the hundreds or thousands), some method is
needed to determine which points in the search space should be sampled. We
have determined that genetic selection methods, specifically
genetic algorithms, form a class of techniques for carrying out this
search. Genetic algorithms rely on the analogy between a bit string and a
chromosome. Under this analogy, an initial population of bit strings
(subsets of wavelengths) is generated randomly. The fitness of each member
of the population is evaluated. The fitness values are used to eliminate
weak individuals (subsets with low fitness) and replicate those with high
fitness. Through iteration of this procedure, the genetic algorithm will
eventually converge to wavelength subsets that have high fitness (meaning
low cost/high performance).
The seminal work on genetic algorithms was provided by Holland, J. H.,
"Genetic Algorithms and the Optimal Allocations of Trials," SIAM Journal
of Computing, 2, 88-105 (1973), and Adaptation in Natural and Artificial
Systems, The University of Michigan Press: Ann Arbor (1975). Since then,
there has been a great deal of activity in the area. Unlike traditional
methods of optimization, genetic algorithms have been shown to work well
over a broad range of difficult problems (e.g., see Davis, L. (editor),
Handbook of Genetic Algorithms, Van Nostrand Reinhold (1991)). Goldberg, D.
E., Genetic Algorithms in Search, Optimization, and Machine Learning,
Addison-Wesley (1989) provides a very readable introduction to genetic
algorithms as well as applications which include problems in science,
business, and engineering. In the area of chemometrics several authors
have very recently proposed using genetic algorithms in a number of
applications (see e.g. Li, et al. (1992), and Lucasius, et al., (1991)).
With regard to the use of genetic algorithms in the context of wavelength
selection, first, suppose that there are q potential wavelengths to choose
from when building a calibration model. The notion of a binary string, S,
with dimension q, will be used to indicate the set of wavelengths that are
used to build the model. This binary representation is key to using
genetic algorithms for this problem. The biological analog of S is a
chromosome. Each binary element of S, analogous to a gene, indicates
whether its associated wavelength is or is not used for modeling. For
example, if S={1, 1, 0, 1, 0, 0}, then wavelengths 1, 2, and 4 (of six)
are used for modeling. With the additional specification of the method
used to build the calibration model (e.g. PLS), S provides a
straightforward index for model identification.
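The binary-string encoding can be shown directly in Python. The string is the example from the text; the helper `select_columns` and the placeholder matrix `Y` are hypothetical.

```python
import numpy as np

# The example string from the text: wavelengths 1, 2, and 4 (of six) are used
S = np.array([1, 1, 0, 1, 0, 0], dtype=bool)
selected = np.flatnonzero(S) + 1        # 1-based indices of the wavelengths in use

def select_columns(Y, S):
    """Keep only the spectral columns whose bit in S is set."""
    return Y[:, np.asarray(S, dtype=bool)]

# Placeholder spectra: 2 samples by 6 wavelengths
Y = np.arange(12, dtype=float).reshape(2, 6)
Y_sub = select_columns(Y, S)            # 2 samples by 3 selected wavelengths
```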
In order to search for sets of wavelengths (represented by binary strings)
that yield good performance, it is necessary to specify a reasonable
performance metric, denoted here by the term fitness (F). The fitness of a
certain wavelength subset (represented by a binary string) is a single
numerical measure of how well that subset meets these criteria. The
likelihood that an individual binary string contributes to the next
generation of binary strings is related directly to the fitness of that
string. In the analogous context of Darwinian evolution, the likelihood
that an individual will live (hence contribute genetic material to the
next generation) is related directly to the fitness of that individual.
For wavelength selection purposes, we will allow the fitness of each
string to be various decreasing functions of the standard error of
prediction (SEP) based on cross validation (see Stone, M.,
"Cross-Validatory Choice and Assessment of Statistical Predictions,"
Journal of the Royal Statistical Society, Series B, 36, 111-133 (1974)).
Also note that, unlike the metric proposed by Lucasius and Kateman, 1991
(condition number), the SEP is a direct measure of performance. The SEP
is, however, a very complicated non-linear function of S and can be
obtained only through intensive computational means.
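A cross-validated fitness function of this general form might be sketched as follows. Several choices here are assumptions made for brevity and are not taken from the text: ordinary least squares with an intercept stands in for PLS, leave-one-out is used as the cross-validation scheme, and 1/(1+SEP) is just one possible decreasing function of the SEP. The names `cv_sep` and `fitness` are hypothetical.

```python
import numpy as np

def cv_sep(Y, x, S):
    """Leave-one-out standard error of prediction for wavelength subset S.

    Y: (n, q) spectra; x: (n,) reference concentrations; S: q-element bit string.
    Ordinary least squares is used here as a simple stand-in for PLS.
    """
    cols = np.asarray(S, dtype=bool)
    if not cols.any():
        return np.inf                       # an empty subset cannot predict
    X = np.column_stack([np.ones(len(x)), Y[:, cols]])
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i       # hold out sample i
        coef, *_ = np.linalg.lstsq(X[keep], x[keep], rcond=None)
        errors.append(x[i] - X[i] @ coef)
    return float(np.sqrt(np.mean(np.square(errors))))

def fitness(Y, x, S):
    """One assumed decreasing function of the cross-validated SEP."""
    return 1.0 / (1.0 + cv_sep(Y, x, S))

# Hypothetical data: one informative wavelength and one pure-noise wavelength
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
Y = np.column_stack([x, rng.normal(size=8)])
f_good = fitness(Y, x, [1, 0])
f_bad = fitness(Y, x, [0, 1])
```

Every fitness evaluation refits the model once per held-out sample, which is why the SEP is described above as obtainable only through intensive computation.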
The inadequacies of the prior methods can be overcome by the use of a
systematic search or optimization process based on genetic algorithms.
Genetic algorithms differ from traditional methods of optimization in some
important ways. While most traditional methods of optimization move from a
single point in the search space to another, genetic algorithms move from
a set of points to another set. Each successive set of points will be
referred to as a generation, with each generation containing r binary
strings. Unlike traditional methods of optimization that rely on
deterministic transition rules to move throughout the search space,
genetic algorithms use probabilistic transition rules embodied within a
number of operators. The three operators that are common among the many
variations of genetic algorithms are reproduction, crossover, and
mutation.
The first step is to form the first generation of the S's, consisting of r
q-dimensional binary strings and denoted by G.sup.1 ={S.sub.1.sup.1,
S.sub.2.sup.1, . . . , S.sub.r.sup.1 }. Note that the effectiveness of the
genetic algorithms depends on the diversity within G.sup.1. Therefore,
pseudo-random number generators are often used to create this first
generation of strings. Next, the fitness of each of the models specified
by the r strings in G.sup.1 is obtained.
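Forming and scoring the first generation can be sketched as follows. The inclusion probability `p_on`, the population size, and the placeholder `toy_fitness` are assumptions for illustration only; in practice the fitness would be derived from the cross-validated SEP of the model each string specifies.

```python
import numpy as np

def first_generation(r, q, rng, p_on=0.5):
    """Return r random q-bit strings; p_on (assumed) is the chance a bit is set."""
    return rng.random((r, q)) < p_on

rng = np.random.default_rng(0)
G1 = first_generation(r=20, q=101, rng=rng)   # one row per string S

def toy_fitness(S):
    """Placeholder fitness used only to show the evaluation step."""
    return 1.0 / (1.0 + S.sum())

F1 = np.array([toy_fitness(S) for S in G1])   # fitness of each string in G1
```

Drawing each bit independently at random gives the diversity within G.sup.1 that the text identifies as important.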
The next generation of bit strings, G.sup.2 ={S.sub.1.sup.2, S.sub.2.sup.2,
. . . , S.sub.r.sup.2 }, is formed in three stages. First, r individual
strings are selected from G.sup.1 with replacement, where the probability
of selecting an individual string is proportional to its fitness. This
process is referred to as reproduction. The reproduced strings are used as
the basis for constructing the next generation. In this way, strings with
higher fitness values will have a higher probability of contributing to
the next generation. Following reproduction, crossover proceeds in two
steps. First, members of the newly reproduced strings are paired (mated)