Systematic wavelength selection for improved multivariate spectral analysis    
United States Patent 5435309
Link to this page: http://www.wikipatents.com/5435309.html
Inventor(s): Thomas; Edward V. (2828 Georgia NE., Albuquerque, NM 87110); Robinson; Mark R. (1603 Solano NE., Albuquerque, NM 87110); Haaland; David M. (809 Richmond Dr. SE., Albuquerque, NM 87106)
Abstract: Methods and apparatus for determining in a biological material one or more unknown values of at least one known characteristic (e.g. the concentration of an analyte such as glucose in blood or the concentration of one or more blood gas parameters) with a model based on a set of samples with known values of the known characteristics and a multivariate algorithm using several wavelength subsets. The method includes selecting multiple wavelength subsets, from the electromagnetic spectral region appropriate for determining the known characteristic, for use by an algorithm wherein the selection of wavelength subsets improves the model's fitness of the determination for the unknown values of the known characteristic. The selection process utilizes multivariate search methods that select both predictive and synergistic wavelengths within the range of wavelengths utilized. The fitness of the wavelength subsets is determined by the fitness function F = f(cost, performance). The method includes the steps of: (1) using one or more applications of a genetic algorithm to produce one or more count spectra, with multiple count spectra then combined to produce a combined count spectrum; (2) smoothing the count spectrum; (3) selecting a threshold count from a count spectrum to select these wavelength subsets which optimize the fitness function; and (4) eliminating a portion of the selected wavelength subsets. The determination of the unknown values can be made: (1) noninvasively and in vivo; (2) invasively and in vivo; or (3) in vitro.
Publication Date     July 25, 1995
Application Number     08/104,857
Filing Date     August 10, 1993
US Classification     600/310 250/339.12 356/39 356/300 356/320 356/436 600/326 600/364
Int'l Classification     A61B 005/00
Examiner     Sykes; Angela D.
Assistant Examiner    
Attorney/Law Firm     Morgan; DeWitt M.
USPTO Field of Search     128/633 128/634 128/635 128/632 128/633 128/634 128/635 356/39 356/40 356/41 356/51 356/300 356/39 356/40 356/41 356/306 356/320 356/432 356/39 356/40 356/41
References
U.S. References
Patent No.   Inventor    U.S. Class    Date
5204532      Rosenthal   250/341.5     Apr. 1993
5120961      Levin       250/339.07    Jun. 1992
5088493      Giannini                  Feb. 1992
5003977      Suzuki      600/317       Apr. 1991
4883963      Kemeny      250/339.11    Nov. 1989
Claims
 


What we claim is:

1. In a method for use with optical instrumentation for determining one or more unknown values of at least one known characteristic by an optical measurement, said method including the steps of:

(a) irradiating said material having said unknown values of said known characteristic with electromagnetic energy including at least several wavelengths so that there is differential absorption of at least some of said wavelengths by said material as a function of said wavelengths and said characteristic, said differential absorption causing intensity variations of said wavelengths incident from said material as a function of said wavelengths and said unknown values of said known characteristic;

(b) measuring said intensity variations from said material; and

(c) calculating said unknown values of said known characteristic in said material from said measured intensity variations utilizing an algorithm and a model, said algorithm being capable of using all independent sources of intensity variations v. wavelengths information obtained from irradiating a set of samples with a range of wavelengths in which said values of said known characteristic are known, said algorithm also being capable of using more wavelengths than samples in said set of samples, said model constructed from said set of samples and being a function of said known values of said characteristic and said intensity variations v. wavelengths information obtained from irradiating said set of samples, the improvement comprising selecting multiple variable subsets for generation and use in an improved model, each of said subsets containing one or more variables, said model being improved by selecting said multiple variable subsets from the set of instrument variables and wherein said algorithm with said improved model improves the fitness for said determination of said unknown values of said known characteristic, said selection process utilizing multivariate search methods that select both predictive and synergistic variables.

2. The method as set forth in claim 1, wherein said variables include said wavelengths used by said optical instrumentation to irradiate said material.

3. The method as set forth in claim 2, wherein said wavelength subset selection step is made independent of the knowledge of the spectral features of the characteristics of said material.

4. The method as set forth in claim 1, wherein said fitness (F) is defined as:

F = f(cost, performance).

5. The method as set forth in claim 4, wherein factors utilized in determining said cost contribution to said fitness are selected from the group including: measurement time, instrument resolution, wavelength range, and number of wavelength subsets measured.

6. The method as set forth in claim 4, wherein factors utilized in determining said performance contribution to said fitness are selected from the group including: SEP, outlier detection, range of said values of said known characteristic covered by said model, the robustness of said model, and the ease of transferability of said model.

7. The method as set forth in claim 4, wherein said variable subset selection process is made utilizing a genetic algorithm.

8. The method as set forth in claim 7, further including the step of generating a count spectrum.

9. The method as set forth in claim 8, wherein said variable subset selection process further includes the step of selecting a threshold count from said count spectrum to select said variable subsets.

10. The method as set forth in claim 9, wherein said variable subset selection process includes the step of eliminating a portion of said selected variable subsets.

11. The method as set forth in claim 8, wherein said variable subset selection process further includes the step of smoothing said count spectrum.

12. The method as set forth in claim 11, wherein said variable subset selection process further includes the step of selecting a threshold count from said count spectrum to select said variable subsets.

13. The method as set forth in claim 12, wherein said variable subset selection process includes the step of eliminating a portion of selected variable subsets.

14. The method as set forth in claim 7, wherein said variable subset selection process is performed by using multiple applications of said genetic algorithm.

15. The method as set forth in claim 14, further including the step of generating multiple count spectra, and wherein said count spectra are combined to produce a combined count spectrum.

16. The method as set forth in claim 15, wherein said variable subset selection process further includes the step of selecting a threshold count from said combined count spectrum to select said variable subsets.

17. The method as set forth in claim 16, wherein said variable subset selection process includes the step of eliminating a portion of said selected variable subsets.

18. The method as set forth in claim 15, wherein said variable subset selection process further includes the step of smoothing said combined count spectrum.

19. The method as set forth in claim 18, wherein said variable subset selection process further includes the step of selecting a threshold count from said combined count spectrum to select said variable subsets.

20. The method as set forth in claim 19, wherein said variable subset selection process includes the step of eliminating a portion of said selected variable subsets.

21. The method as set forth in claim 1, wherein said algorithm is selected from the group including PLS, PLS2, PCR, CLS, Q-matrix, cross-correlation, Kalman filtering, neural networks, and continuum regression.

22. The method as set forth in claim 1, wherein said determination is made for a known characteristic of a solid material.

23. The method as set forth in claim 1, wherein said determination is made for a known characteristic of a liquid.

24. The method as set forth in claim 1, wherein said determination is made for a known characteristic of a gas.

25. The method as set forth in claim 1, wherein said variables utilized in said variable selection are selected from the group including: physical properties, chemical properties, temperature, the number of factors used in the model, and wavelengths.
Description
 


BACKGROUND OF THE INVENTION

This invention relates to instrumentation and related methodology for determining, primarily in a biological material, one or more unknown values of a known characteristic (e.g., the concentration of an analyte such as glucose in blood, or the concentration of at least one blood gas parameter) with a model based on a set of samples with known analyte values and a multivariate algorithm. The instrument maximizes performance while simultaneously minimizing cost. Minimization of cost and maximization of performance are obvious goals, yet consideration of both in a systematic fashion is difficult. The ability to perform both objectives is especially difficult when designing complex optical instrumentation using multivariate quantitative spectroscopy. Such a capability is even more critical when designing noninvasive medical instrumentation due to the significant ramifications associated with spurious results and the cost containment measures being introduced throughout the medical profession.

Multivariate calibration techniques, such as principal component regression (PCR) and partial least squares regression (PLS), have proven to be useful in conjunction with spectral measurements, allowing for quantitative analysis of materials and material properties in various forms (gases, liquids, and solids) in an ever growing number of applications. Applications include nondestructively determining the composition of passivation glass deposited on wafers during the fabrication of semiconductor devices and noninvasively determining the concentration of glucose in human blood (as set forth in U.S. Pat. No. 4,975,581).

In spectroscopic applications, measurements such as absorbance and reflectance are taken at one or more spectral wavelengths. These measurements are obtained from a number of specimens in which the amount of the analyte of interest (e.g., glucose, alcohol, and arterial blood gases) has been determined by some independent assay (e.g., wet chemistry). Together, the spectral measurements and results from the independent assays are used to construct empirical calibration models that relate the amount of the analyte of interest to the spectral measurements. These models are then used to predict analyte concentrations of future samples solely on the basis of the spectral measurements. Quantitative analysis based on spectral data has advantages over some of the more traditional methods of analysis because it is often much quicker, less labor intensive (hence cheaper), and can be nondestructive and/or noninvasive.

The basis for the many calibration models using absorbance spectroscopy is Beer's Law. In the limiting case of dilute component concentrations in a nonabsorbing medium, the absorbance of a sample at wavelength $\lambda$, $y(\lambda)$, depends upon the concentration of the multiple ($p$) chemical species in the sample through Beer's Law, which is

$$y(\lambda) = a_\lambda + x_1 k_1(\lambda) + x_2 k_2(\lambda) + \cdots + x_p k_p(\lambda) + e_\lambda, \qquad \text{[Equation 1]}$$

where $a_\lambda$ is a spectral intercept, $x_i$ is the concentration of the $i$-th chemical species, $k_i(\lambda)$ is the product of the optical pathlength with the absorptivity of the $i$-th chemical species, and $e_\lambda$ is the measurement error of the absorbance at wavelength $\lambda$. The degree to which Beer's Law is adhered to depends on the nature of the sample components, the concentrations of the chemical components as well as the wavelength considered. The performance of calibration methods is best when Beer's Law provides a good approximation over the range of wavelengths used.
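As a small illustration of Equation 1, the following Python sketch evaluates the absorbance at a single wavelength for a two-species sample. The function name `absorbance` and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the patent.

```python
def absorbance(concentrations, k_at_lambda, intercept=0.0, error=0.0):
    """Equation 1 at one wavelength: y = a + sum_i(x_i * k_i) + e."""
    if len(concentrations) != len(k_at_lambda):
        raise ValueError("need one k value per chemical species")
    signal = sum(x * k for x, k in zip(concentrations, k_at_lambda))
    return intercept + signal + error

# Hypothetical two-species sample (illustrative numbers only).
x = [0.5, 2.0]   # concentrations x_1, x_2
k = [0.8, 0.1]   # pathlength times absorptivity of each species at this wavelength
y = absorbance(x, k, intercept=0.02)
print(round(y, 3))
```

Because the model is linear in the concentrations, doubling every concentration doubles the signal term while leaving the intercept unchanged, which is the property that the calibration methods discussed here rely on.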

There has been a rapid evolution of multivariate calibration methods for analysis of spectral data due to the availability of advanced computerized optical instrumentation. Early on, spectroscopists used methods based on a single wavelength to develop a calibration model. However, the usefulness of these methods, known as univariate methods, is limited to cases where the spectral response of the analyte of interest is isolated from that of other components in the material to be analyzed, or the spectral response of the analyte dominates that of other spectrally absorbing components. (Note: as typically used herein, analyte refers to the component in the system to be analyzed, even though there are other components present in the system). Multivariate calibration models relating an analyte's concentration to a linear combination of measurements from several wavelengths were introduced in order to develop models that are useful in a broader range of conditions; specifically to include cases where the spectral features of the analyte are overlapped with features of other components in the material to be analyzed. However, the success of these techniques generally requires wavelength selection based on specific knowledge of the spectra of interfering components in the sample material as well as that of the analyte.

Recently, calibration methods (e.g., PLS) capable of simultaneously using measurements from a very large number of wavelengths, sometimes identified as full-spectrum methods, have been introduced. These methods, which are capable of analyzing rather complex materials, have a number of inherent advantages (e.g., signal averaging and improved outlier detection) over methods that use relatively few wavelengths. While still allowing for overlapped spectra of various components, the capability of using many wavelengths seemingly eliminates the need for wavelength selection and the implicit requirement of knowledge of the spectra of interfering components. The number of potential spectral wavelengths available to use with a full-spectrum method is often very large, perhaps thousands. Thus, with full-spectrum methods, spectroscopists often use all available wavelengths within some broad range. However, in many applications, measurements from a large number of the spectral wavelengths are irrelevant or difficult to incorporate in a model because of non-linearities. While full-spectrum methods like PLS are able to accommodate non-linearities to some degree, inclusion of irrelevant (or difficult) spectral measurements in a model can seriously degrade performance.

Difficult or irrelevant spectral sections are sometimes removed through a spectral "pre-treatment" step. In such a pre-treatment step, subject matter knowledge can be used to remove regions which exhibit the following characteristics: (1) regions which lack spectral information; (2) regions of significant non-linearity; and (3) regions with a poor signal-to-noise ratio. This type of simple processing is well known and has been previously used. See Haaland, et al., "Reagentless Near-infrared determination of glucose in whole blood using multivariate calibration", Applied Spectroscopy, Vol. 46, No. 10, 1992. In the experiment performed in the above reference, "The spectral region between 4850 and 6600 cm$^{-1}$ is a water band that is too strongly absorbing and too variable in intensity to aid in the analysis."

In many and probably most situations where full-spectrum methods are utilized, practitioners often use all wavelengths. Wavelength selection is not recommended or deemed necessary as evidenced by:

1. Howard Mark, "A Computerized Study of the Effect of Noise on Wavelength Selection during Computerized Wavelength Searches," Applied Spectroscopy, 42, 1427-1440 (1988): "Currently there is a trend toward use of calibration methods, such a Principal Component Analysis (PCA) and Partial Least Squares (PLS), that do not require wavelength selection because data at all available wavelengths are used."

2. John H. Kalivas, Nancy Roberts, and Jon M. Sutter, "Global Optimization by Simulated Annealing with Wavelength Selection for Ultraviolet-Visible Spectrophotometry," Analytical Chemistry, 61, 2024-2030 (1989): "Two of the most common techniques used in spectral chemical analysis are principal components regression (PCR) and partial least squares (PLS). Even though wavelength searches are not necessary, the proper number of factors to include in an analysis must be established, which represents a computational search as well."

With advice such as the foregoing, practitioners often use full spectrum methods in conjunction with all wavelengths within some broad range. For example, in a very recent peer reviewed paper on the noninvasive analysis of blood glucose, Marbach, et al., "Noninvasive Blood Glucose Assay by Near-Infrared Diffuse Reflection Spectroscopy of the Human Inner Lip," Applied Spectroscopy, 47, pp. 875-881 (1993), the authors use all wavelengths in a very wide spectral range. Thus, it is clear that even leading researchers in the field of noninvasive glucose measurement do not recognize the potential benefits of using wavelength selection in conjunction with full-spectrum methods.

In relatively simple cases involving materials with only a few components, spectroscopists can sometimes select wavelength regions for analysis based on knowledge of where the components are spectrally active and likely to follow Beer's Law. However, when analyzing materials of a more complex nature (e.g., human tissue or other biological material), wavelength selection is much more difficult. There are a number of reasons for this. First, some of the material components may be unknown. This is especially true in noninvasive applications where the entire molecular structure of the biological material (e.g., finger) is not known. Furthermore, even if a component is known, its characteristic spectral signature may be modified by variable experimental conditions (e.g., temperature) and the host medium. Even if all components along with their associated spectral signatures are known, considerable spectral overlap among different components (such as found in the near-infrared region among, for instance, glucose, urea, blood urea nitrogen (BUN), alcohol, and cholesterol) can make wavelength selection very difficult. Physical and chemical interactions among components, along with other sources of deviations from Beer's Law, also impede the ability to select wavelengths.

Commonly used approaches given for wavelength selection in complex situations are based on criteria that do not utilize the interrelationships among measurements at multiple spectral wavelengths. See Hruschka, W. R., "Data Analysis: Wavelength Selection Methods," in Near-Infrared Technology in the Agricultural and Food Industries, (Williams, P., and Norris K. editors), American Association of Cereal Chemists, Inc., St. Paul, Minn. (1987), and Brown, P. J., Spiegelman, C. H., and Denham, M. C., "Chemometrics and Spectral Frequency Selection," Philosophical Transactions of the Royal Society of London, Series A, 337, 311-322 (1991). By not considering the interrelationships among measurements, these methods will miss synergistic effects that could ultimately have significant positive effects on model performance.

Thus, for complex problems such as the measurement of blood analytes, there is a fundamental need for development of systematic wavelength selection that uses the interrelationships among measurements. Such a systematic and reliable procedure for wavelength selection dramatically improves the performance of these full spectrum methods and greatly broadens their use to more complex problems.

To demonstrate the need for wavelength or frequency selection when using full-spectrum methods such as PLS, consider a simple hypothetical chemical system with a single spectrally active component. For this example, it is assumed that at unit concentration the spectrally active component (which is the analyte of interest) exhibits the Gaussian absorbance spectrum illustrated in FIG. 1 with $q = 101$ frequencies. Following Beer's Law for a single component system, when the concentration of the analyte of interest is $x$, the $t$-th element ($t = 1, 2, \ldots, q$) of the spectrum is

$$y_t = x \cdot b_t + \epsilon_t, \qquad \text{[Equation 2]}$$

where $b_t$ is given by [equation EQU1, not reproduced in this text] and the $\epsilon_t$ are independent and identically distributed normal ($\mu = 0$, $\sigma^2 = 0.01$) measurement errors. Note that frequency 51 ($t = 51$) has the largest signal and therefore the best signal-to-noise characteristics. In contrast, frequencies far away from the center of the frequency range contain virtually no signal and therefore have very poor signal-to-noise characteristics.
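The signal-to-noise argument above can be made concrete with a short Python sketch. The defining equation for the unit spectrum is not available in this text, so the Gaussian used here (peak height `PEAK`, width `WIDTH`, centered on frequency 51) is purely an assumption for illustration; only the 101 frequencies and the error variance of 0.01 come from the description.

```python
import math

Q = 101        # number of frequencies (from the text)
SIGMA = 0.1    # square root of the stated error variance 0.01
PEAK = 0.4     # ASSUMPTION: peak height of the unit spectrum
WIDTH = 5.0    # ASSUMPTION: Gaussian width, in frequency steps

def b(t):
    """Assumed unit-concentration absorbance at frequency t (1-based)."""
    return PEAK * math.exp(-((t - 51) ** 2) / (2.0 * WIDTH ** 2))

def snr(t, x):
    """Signal-to-noise ratio of frequency t for analyte concentration x."""
    return x * b(t) / SIGMA

best = max(range(1, Q + 1), key=lambda t: snr(t, 0.5))
print(best, snr(best, 0.5), snr(1, 0.5))
```

Under any symmetric Gaussian centered at frequency 51, the ratio peaks at the center and decays toward the edges of the range, so the qualitative conclusion does not depend on the particular `PEAK` and `WIDTH` chosen.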

A small simulation study was conducted to evaluate the effect on the predictive ability of three full-spectrum calibration methods in conjunction with various subsets of the spectrum. The three full-spectrum methods are PLS, and two variations of a method based on the explicit Beer's Law model (Eqn. 2). PLS modeling was performed by using one latent variable and centering both concentration and spectral data. The two variations of the method based on the explicit model (denoted LS/LS and LS/ML) are differentiated by the methods used in the prediction phase. The model parameters, $\{b_t\}$, are estimated similarly in the calibration phase for both variants of the method using least-squares regression and the explicit model given by Eqn. 2. Estimation of the analyte concentration of a new sample, $\chi$, by using $\{b_t\}$ and its associated new spectrum ($Y$) is accomplished by using least-squares regression (LS/LS) and maximum likelihood estimation (LS/ML). Note that of the three calibration methods that were considered, only PLS is suitable for use in complex situations with unknown spectrally active components with overlapping spectral features. Six different subsets of the 101 frequencies were used for analysis. They are $A_1 = \{49, 50, 51, 52, 53\}$, $A_2 = \{46, 47, \ldots, 56\}$, $A_3 = \{41, 42, \ldots, 61\}$, $A_4 = \{31, 32, \ldots, 71\}$, $A_5 = \{21, 22, \ldots, 81\}$, and $A_6 = \{1, 2, \ldots, 101\}$. Note that the frequency subsets are centered around frequency 51 and add increasing numbers of additional frequencies. $A_1$ contains the five frequencies with the best signal-to-noise, while $A_6$ contains all 101 frequencies, many of which contain very little useful signal.

For the simulation study, fifty calibration sets were generated, each with five observations. The set of analyte concentrations corresponding to the five observations is {0.1, 0.3, 0.4, 0.5, 0.7}. For each observation, a spectrum was constructed based on Eqn. 2. Calibration models using each of the six sets of frequencies ($A_1, A_2, \ldots, A_6$) were constructed using PLS and least-squares regression. For each calibration set, fifty new spectra were generated based on Eqn. 2 with a fixed analyte concentration, $\chi$. These spectra along with the constructed calibration models and prediction methods were then used to predict $\chi$. This complete procedure was repeated three times with $\chi \in \{0.1, 0.5, 0.9\}$. The root mean squared prediction error (RMSPE) [equation EQU2, not reproduced in this text], where $\chi_i$ is the predicted value of $\chi$, for each of the various simulation conditions (defined by $\chi$, the calibration method, and frequency set) is set forth in Table 1.

TABLE 1
RMSPE versus Frequency Subset
______________________________________________________________
                            Frequency Subset
chi    Method    A_1     A_2     A_3     A_4     A_5     A_6
______________________________________________________________
0.1    PLS       .145    .118    .140    .176    .196    .235
       LS/LS     .114    .084    .073    .070    .069    .070
       LS/ML     .121    .090    .081    .083    .087    .099
0.5    PLS       .110    .087    .079    .081    .081    .088
       LS/LS     .131    .100    .097    .119    .144    .189
       LS/ML     .140    .109    .091    .092    .094    .107
0.9    PLS       .179    .193    .203    .268    .321    .375
       LS/LS     .150    .134    .126    .190    .234    .338
       LS/ML     .163    .128    .109    .107    .107    .117
______________________________________________________________

From Table 1, it is apparent that the effect of including frequencies that have poor S/N depends on both the calibration method and the value of $\chi$. The ability to predict $\chi$ provided by LS/ML appears to be rather insensitive to the inclusion of irrelevant frequencies for all three values of $\chi$. On the other hand, the prediction abilities of PLS and LS/LS are quite sensitive to the inclusion of irrelevant frequencies for two of the three values of $\chi$. Except in the instance where $\chi = 0.5$, the performance of PLS degenerates significantly as more and more frequencies with poor S/N are included. PLS performance is insensitive to the inclusion of irrelevant frequencies when $\chi = 0.5$. This is due to the fact that, with a poor model, PLS predictions tend to be biased toward the average value in the calibration set, which in this case is close to 0.5. Except in the case where $\chi = 0.1$, the performance of LS/LS degenerates significantly as more and more frequencies with poor S/N are included. The performance of LS/LS is insensitive to using irrelevant frequencies when $\chi = 0.1$ because use of LS in the prediction phase tends to produce predictions biased towards zero, which is relatively close to 0.1 (see Thomas 1991). From this simple study it is clear that use of irrelevant frequencies can seriously degrade the performance of full-spectrum calibration methods. Note that other full-spectrum methods, such as PCR, also exhibit behavior wherein if irrelevant wavelengths are included, predictions tend to be biased towards the average value of the analyte of interest in the calibration set.
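The bias of LS/LS toward zero can be illustrated with a minimal Python re-creation of the simulation design, restricted to the LS/LS method and to the smallest and largest frequency subsets. This is a sketch under stated assumptions, not the inventors' code: the unit spectrum shape (`PEAK`, `WIDTH`) is assumed because its defining equation is missing from this text, the RMSPE is taken to be the root of the mean squared error over all predictions, and the random seed is arbitrary. The numbers it prints are therefore not expected to match Table 1.

```python
import math
import random

SIGMA = 0.1                          # square root of the stated error variance 0.01
CAL_X = [0.1, 0.3, 0.4, 0.5, 0.7]    # calibration concentrations (from the text)
PEAK, WIDTH = 0.4, 5.0               # ASSUMPTION: shape of the unit spectrum b_t

def b_true(t):
    return PEAK * math.exp(-((t - 51) ** 2) / (2.0 * WIDTH ** 2))

def spectrum(x, freqs, rng):
    """One noisy spectrum following Eqn. 2, restricted to the given frequencies."""
    return [x * b_true(t) + rng.gauss(0.0, SIGMA) for t in freqs]

def calibrate(freqs, rng):
    """Least-squares estimate of each b_t from five calibration spectra."""
    spectra = [spectrum(x, freqs, rng) for x in CAL_X]
    sxx = sum(x * x for x in CAL_X)
    return [sum(x * s[j] for x, s in zip(CAL_X, spectra)) / sxx
            for j in range(len(freqs))]

def predict_ls(b_hat, y):
    """LS prediction step: regress the new spectrum on the estimated b_t."""
    return sum(bh * v for bh, v in zip(b_hat, y)) / sum(bh * bh for bh in b_hat)

def run(freqs, chi, n_cal=50, n_new=50, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_cal):
        b_hat = calibrate(freqs, rng)
        preds += [predict_ls(b_hat, spectrum(chi, freqs, rng)) for _ in range(n_new)]
    rmspe = math.sqrt(sum((p - chi) ** 2 for p in preds) / len(preds))
    return rmspe, sum(preds) / len(preds)

A1 = list(range(49, 54))    # five central frequencies
A6 = list(range(1, 102))    # all 101 frequencies
rmspe_a1, mean_a1 = run(A1, chi=0.9)
rmspe_a6, mean_a6 = run(A6, chi=0.9)
print(f"A1: RMSPE={rmspe_a1:.3f}  mean prediction={mean_a1:.3f}")
print(f"A6: RMSPE={rmspe_a6:.3f}  mean prediction={mean_a6:.3f}")
```

The mechanism is visible in `predict_ls`: every frequency contributes the square of its estimated coefficient to the denominator, so frequencies that carry only noise inflate the denominator without adding signal to the numerator, pulling the prediction toward zero.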

As described above, PLS becomes increasingly sensitive to the inclusion of additional wavelengths when the algorithm is used to predict analyte concentrations far from the calibration average. This property of PLS analysis is extremely relevant in designing a noninvasive glucose monitor. In patients with and without diabetes, the average glucose concentrations are approximately 150 and 100 mg/dL, respectively. Because diabetic patients cannot fully control their glucose levels, their glucose level can fall below 80 mg/dL. At this point the patient starts to become hypoglycemic. Hypoglycemia, especially below 50 mg/dL, is a very dangerous condition as the patient can experience "insulin shock" and become comatose. Diabetic patients fear hypoglycemia because they no longer function normally and are often unaware of the compromised state.

If all wavelengths are included, PLS predictions will tend to be biased toward the average value of glucose in the calibration set. In operation, a patient may have a glucose concentration of 80 mg/dL but a monitor using all wavelengths might predict 100 mg/dL. Given this information, a diabetic patient may take no action to correct his/her glucose level, even though it is approaching a dangerously low level. If no action is undertaken, the patient may experience severe hypoglycemia and its possibly severe consequences. Thus, accurate readings at below average glucose concentrations are extremely important. The ability to select those wavelength subsets for use in the multivariate algorithm that maximize performance and minimize PLS's tendency to bias extreme results toward the average is of importance in the design of a noninvasive home glucose monitor for use by the diabetic patient.

Despite the importance of wavelength selection as demonstrated above, effective methods for wavelength selection remain inadequate for complex situations requiring multivariate spectral analysis. This is especially true when Beer's Law is not followed or when not all components are known. In conditions where Beer's Law is followed and all spectrally active components in the sample material (and associated spectra) are known, then, except for measurement error, the q-vector of absorbances for a single sample, $y_i$, is given by $y_i = B \cdot x_i$, where $x_i$ is the p-vector of concentrations and $B = (b_1, b_2, \ldots, b_p)$ contains the spectra of each of the p spectrally active components when each is at unit concentration. If $B$ is known, there are various approaches for selecting wavelengths (see e.g., Juhl, L. L., and Kalivas, J. H., "Evaluation of Experimental Designs for Multicomponent Determinations by Spectrophotometry," Analytica Chimica Acta, 207, 125-135 (1988)). These approaches are most often used in conjunction with inverse least-squares regression (see Haaland, D. M., and Thomas, E. V., "Partial Least-Squares Methods for Spectral Analyses. 1. Relation to Other Quantitative Calibration Methods and the Extraction of Qualitative Information," Analytical Chemistry, 60, 1193-1202 (1988)) and related procedures where it is possible to use only a limited number of spectral wavelengths. For example, consider a subset of the q potential wavelengths containing $q^*$ elements, together with the corresponding $q^* \times p$ submatrix of $B$. Procedures have been proposed for searching for subsets of wavelengths that optimize some metric relating to sensitivity and/or selectivity of the frequency set to the analyte of interest, such as the condition number of this submatrix (see e.g., Juhl and Kalivas 1988). For example, Kalivas, J. H., Roberts, N., and Sutter, J. M., "Global Optimization by Simulated Annealing with Wavelength Selection for Ultraviolet-Visible Spectrophotometry," Analytical Chemistry, 61, 2024-2030 (1989), advocate using simulated annealing, and Lucasius, C. B., and Kateman, G., "Genetic Algorithms for Large-Scale Optimization in Chemometrics: An Application," Trends in Analytical Chemistry, 10, 254-261 (1991), propose using genetic algorithms to search for wavelengths that minimize the condition number of this submatrix. All of these procedures assume that Beer's Law is followed and that the spectra of all spectrally active components in the sample material are known. Unless Beer's Law is followed, the optimization metrics associated with these procedures do not necessarily relate directly to prediction performance. Thus, the usefulness of these methods in complex situations is limited.
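
As a rough illustration of the condition-number criterion discussed above, the following Python sketch scores every two-wavelength subset of a small, made-up matrix B and keeps the subset whose submatrix has the smallest condition number. The matrix values, the helper names `subset_condition_number` and `best_subset`, and the use of exhaustive search are assumptions made for illustration; the cited papers use simulated annealing or genetic algorithms precisely because exhaustive search does not scale.

```python
from itertools import combinations

import numpy as np

# Hypothetical pure-component spectra at unit concentration:
# q = 5 wavelengths (rows), p = 2 spectrally active components (columns).
B = np.array([
    [1.0, 0.9],
    [0.8, 0.8],
    [0.1, 1.0],
    [1.0, 0.1],
    [0.5, 0.5],
])


def subset_condition_number(B, subset):
    """2-norm condition number of the rows of B selected by `subset`."""
    return float(np.linalg.cond(B[list(subset), :]))


def best_subset(B, size):
    """Exhaustively find the wavelength subset of the given size whose
    submatrix of B has the smallest condition number."""
    q = B.shape[0]
    return min(combinations(range(q), size),
               key=lambda s: subset_condition_number(B, s))


best = best_subset(B, 2)
print(best, subset_condition_number(B, best))
```

In this toy matrix the selected wavelengths are the two at which the components respond most differently. As the text notes, such a metric is only meaningful when Beer's Law holds and B is known.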

Li, Tong-Hua, Lucasius, C. B., and Kateman, G., "Optimization of calibration data with the dynamic genetic algorithm," Analytica Chimica Acta, 268, 123-134 (1992) describe the use of genetic algorithms (GAs) to "optimize calibration data sets and enhance the predictive ability of a calibration model successfully". Additionally, Li, et al. state that "GAs should be tested on higher level problems." In summary, the article by Li, et al. teaches the use of genetic algorithms for wavelength selection utilizing the predicted error sum of squares (PRESS) as the fitness function. However, the article does not teach how to interpret the resulting data or how to perform the wavelength selection process for development of optical instruments. No mention is made of minimizing instrument cost or of how specifically to optimize instrument performance. Further, no mechanism or method is described for selection of those wavelengths or wavelength subsets that yield optimal results. Thus, the article is a general overview of genetic algorithms but does not teach a method or methodology for implementation in a practical, systematic manner. In addition to the foregoing, as Li, et al. was published in October 1992, applicants do not concede that it is prior art to them.

In complex situations, where Beer's Law does not provide a good approximation throughout the spectrum or not all spectrally active components are known, there are very few existing procedures for wavelength selection. Most procedures are associated with calibration methods (e.g., inverse least squares) that are capable of using relatively few wavelengths because of problems with collinearity of the spectral measurements. Stepwise (forward) regression is often used in conjunction with these calibration methods (e.g., see Hruschka (1987)). Although this procedure can utilize the synergy among wavelengths, it is often fraught with difficulties such as overfitting.

In quantitative spectroscopy, two terms need to be understood clearly. "Predictive" wavelengths are those that are useful in modeling the relationship between spectral information and analyte concentration. "Synergistic" wavelengths are those that, when used singly, have a given ability to model the relationship between spectral information and analyte concentration but, when used together, have an enhanced capability of modeling that relationship.

Other procedures search for wavelengths which empirically exhibit good selectivity, sensitivity, and linearity for the analyte of interest over the training set. Consider the model $y_{it} = a_t + x_i \cdot b_t + f_{it} + \epsilon_{it}$, where $x_i$ is the analyte concentration of the $i$th sample in the calibration set, $y_{it}$ is the response of the $i$th sample at the $t$th wavelength, $a_t$ and $b_t$ are parameters, $f_{it}$ represents contributions from other spectrally active components and/or deviations from Beer's Law due to the presence of non-linearities, and $\epsilon_{it}$ is a random measurement error with a mean of zero and variance $\sigma_t^2$. The object of these search procedures is to find wavelengths where $b_t$ is relatively large for a single analyte and $f_{it}$ and $\epsilon_{it}$ are relatively small for all samples in the training set. In conjunction with full-spectrum and limited-wavelength calibration methods, near-infrared spectroscopists often rely on correlation plots (e.g., see Hruschka 1987) to search for appropriate wavelengths for use. Based on the n samples in the calibration set, a correlation plot is a spectrum given by the set of univariate correlations, $\{R_t\}$, between the $x_i$'s and $y_{it}$'s. Wavelengths whose measurements exhibit a high degree of correlation with the amount or concentration of the analyte of interest, as measured by $R_t^2$, are selected. This technique does not account for synergy among the different wavelengths.
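
The correlation-plot procedure described above can be sketched in a few lines of Python. The data values, the function names `pearson_r` and `correlation_plot`, and the selection threshold of 0.9 are all assumptions chosen for illustration; the source does not specify a threshold.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_plot(conc, spectra):
    """Squared univariate correlation between concentration and response at
    each wavelength; spectra[i][t] is the response of sample i at wavelength t."""
    q = len(spectra[0])
    return [pearson_r(conc, [row[t] for row in spectra]) ** 2 for t in range(q)]


# Hypothetical calibration set: n = 4 samples, q = 3 wavelengths.
conc = [1.0, 2.0, 3.0, 4.0]
spectra = [
    [2.5, 1.0, 4.1],
    [4.5, 0.2, 2.9],
    [6.5, 0.9, 2.1],
    [8.5, 0.3, 0.9],
]

r_squared = correlation_plot(conc, spectra)
selected = [t for t, r2 in enumerate(r_squared) if r2 >= 0.9]  # assumed threshold
print(r_squared, selected)
```

Because each wavelength is scored on its own, two wavelengths that are individually weak but informative in combination would both be rejected, which is the lack of synergy noted in the text.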

A related method was recently proposed by Brown et al. (1991) and Brown (1992). Rather than use $R_t^2$ as a measure for selecting wavelengths, Brown and his colleagues recommend selecting wavelengths associated with large values of ##EQU3## where $\hat{b}_t$ is the simple least-squares estimate of $b_t$ and $\hat{\sigma}_t^2$ is an estimate of $\sigma_t^2$. Again, this method does not account for synergy among measurements at different wavelengths. As set forth in the Description of the Preferred Embodiment, this synergy can be very desirable.

In order for the correlation plot or the method proposed by Brown and his colleagues to be useful, wavelengths that are specific to the analyte of interest and have a good signal-to-noise ratio are needed. However, the required wavelength specificity is not always available. For example, in the near-infrared spectrum there is considerable overlap between the spectral responses of many different chemical species which often appear together in biological specimens. Therefore, it is very doubtful whether these procedures have much utility when analyzing complex biological materials, such as human tissue, in this or any other spectral region.

Because subject-matter knowledge is insufficient to select wavelengths and the search space of possible wavelength subsets is too large to be searched exhaustively ($2^q$ possible wavelength subsets, where q can be in the hundreds or thousands), some method is needed to determine which points in the search space should be sampled. We have determined that genetic algorithms form a class of techniques for carrying out this search. Genetic algorithms rely on the analogy between a bit string and a chromosome. Under this analogy, an initial population of bit strings (subsets of wavelengths) is generated randomly. The fitness of each member of the population is evaluated. The fitness values are used to eliminate weak individuals (subsets with low fitness) and replicate those with high fitness. Through iteration of this procedure, the genetic algorithm will eventually converge to wavelength subsets that have high fitness (meaning low cost/high performance).
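
The evaluate-then-replicate idea in the preceding paragraph can be sketched as follows. This is a minimal sketch only: `toy_fitness` is a hypothetical stand-in for a real cost/performance measure, the population size and string length are arbitrary, and fitness-proportional sampling with replacement is assumed here as one common way to favor strong individuals.

```python
import random


def replicate(population, fitness_values, rng):
    """Draw len(population) strings with replacement, each with probability
    proportional to its fitness (fitness values must be non-negative)."""
    return rng.choices(population, weights=fitness_values, k=len(population))


def toy_fitness(s):
    # Hypothetical stand-in for a real fitness function: it rewards strings
    # that use wavelengths 0 and 1 and is always positive.
    return 1.0 + 4.0 * (s[0] + s[1])


rng = random.Random(0)
q, r = 8, 6  # string length and population size (arbitrary)

# Random initial population of bit strings (wavelength subsets).
population = [tuple(rng.randint(0, 1) for _ in range(q)) for _ in range(r)]
scores = [toy_fitness(s) for s in population]

# Weak strings tend to drop out; strong strings tend to be copied.
next_generation = replicate(population, scores, rng)
print(next_generation)
```

Replication alone can only reshuffle strings that already exist, so a complete genetic algorithm also needs operators that create new strings.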

The seminal work on genetic algorithms was provided by Holland, J. H., "Genetic Algorithms and the Optimal Allocation of Trials," SIAM Journal on Computing, 2, 88-105 (1973), and Adaptation in Natural and Artificial Systems, The University of Michigan Press: Ann Arbor (1975). Since then, there has been a great deal of activity in the area. Unlike traditional methods of optimization, genetic algorithms have been shown to work well over a broad range of difficult problems (e.g., see Davis, L. (editor), Handbook of Genetic Algorithms, Van Nostrand Reinhold (1991)). Goldberg, D. E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley (1989) provides a very readable introduction to genetic algorithms as well as applications which include problems in science, business, and engineering. In the area of chemometrics, several authors have very recently proposed using genetic algorithms in a number of applications (see e.g., Li, et al. (1992), and Lucasius, et al. (1991)).

With regard to the use of genetic algorithms in the context of wavelength selection, first, suppose that there are q potential wavelengths to choose from when building a calibration model. The notion of a binary string, S, with dimension q, will be used to indicate the set of wavelengths that are used to build the model. This binary representation is key to using genetic algorithms for this problem. The biological analog of S is a chromosome. Each binary element of S, analogous to a gene, indicates whether its associated wavelength is or is not used for modeling. For example, if S={1, 1, 0, 1, 0, 0}, then wavelengths 1, 2, and 4 (of six) are used for modeling. With the additional specification of the method used to build the calibration model (e.g. PLS), S provides a straightforward index for model identification.
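
The binary-string representation can be made concrete with a short Python sketch that reproduces the six-wavelength example above. The function names are hypothetical, and the 1-based wavelength numbering follows the example in the text.

```python
def wavelengths_from_string(S):
    """Return the 1-based indices of the wavelengths that string S marks as used."""
    return [t for t, bit in enumerate(S, start=1) if bit == 1]


def string_from_wavelengths(selected, q):
    """Build the q-dimensional binary string for a set of 1-based wavelength indices."""
    chosen = set(selected)
    return [1 if t in chosen else 0 for t in range(1, q + 1)]


S = [1, 1, 0, 1, 0, 0]
print(wavelengths_from_string(S))              # [1, 2, 4]
print(string_from_wavelengths([1, 2, 4], 6))   # [1, 1, 0, 1, 0, 0]
```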

In order to search for sets of wavelengths (represented by binary strings) that yield good performance, it is necessary to specify a reasonable performance metric, denoted here by the term fitness (F). The fitness of a certain wavelength subset (represented by a binary string) is a single numerical measure of how well that subset meets the performance criteria. The likelihood that an individual binary string contributes to the next generation of binary strings is related directly to the fitness of that string. In the analogous context of Darwinian evolution, the likelihood that an individual will live (and hence contribute genetic material to the next generation) is related directly to the fitness of that individual. For wavelength selection purposes, we will allow the fitness of each string to be any of various decreasing functions of the standard error of prediction (SEP) based on cross validation (see Stone, M., "Cross-Validatory Choice and Assessment of Statistical Predictions," Journal of the Royal Statistical Society, Series B, 36, 111-133 (1974)). Also note that, unlike the metric proposed by Lucasius and Kateman (1991), namely the condition number, the SEP is a direct measure of performance. The SEP is, however, a very complicated non-linear function of S and can be obtained only through intensive computational means.
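
To make the idea of a cross-validated fitness concrete, the sketch below computes a leave-one-out SEP for the wavelengths selected by a binary string S and converts it to a fitness. Several assumptions are made for illustration: ordinary least squares with an intercept stands in for PLS, the data are synthetic, and 1 / (1 + SEP) is just one possible decreasing function of the SEP; the names `cv_sep` and `fitness` are hypothetical.

```python
import numpy as np


def cv_sep(spectra, conc, S):
    """Leave-one-out cross-validated standard error of prediction for a
    least-squares model built on the wavelengths selected by S."""
    cols = np.flatnonzero(S)
    n = len(conc)
    X = np.column_stack([np.ones(n), spectra[:, cols]])  # intercept + selected wavelengths
    errors = []
    for i in range(n):
        keep = np.arange(n) != i  # hold out sample i
        coef, *_ = np.linalg.lstsq(X[keep], conc[keep], rcond=None)
        errors.append(conc[i] - X[i] @ coef)
    return float(np.sqrt(np.mean(np.square(errors))))


def fitness(spectra, conc, S):
    """One possible decreasing function of the SEP (an assumed choice)."""
    return 1.0 / (1.0 + cv_sep(spectra, conc, S))


# Synthetic calibration set: only wavelength 0 carries analyte information.
rng = np.random.default_rng(0)
n, q = 20, 4
conc = rng.uniform(50.0, 150.0, n)
spectra = rng.normal(0.0, 1.0, (n, q))
spectra[:, 0] = 0.01 * conc + rng.normal(0.0, 0.01, n)

print(fitness(spectra, conc, [1, 0, 0, 0]))  # informative wavelength
print(fitness(spectra, conc, [0, 0, 1, 0]))  # noise-only wavelength
```

Each fitness evaluation refits the model n times, which illustrates why the SEP can be obtained only through intensive computation.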

The inadequacies of the prior methods can be overcome by the use of a systematic search or optimization process based on genetic algorithms. Genetic algorithms differ from traditional methods of optimization in some important ways. While most traditional methods of optimization move from a single point in the search space to another, genetic algorithms move from a set of points to another set. Each successive set of points will be referred to as a generation, with each generation containing r binary strings. Unlike traditional methods of optimization that rely on deterministic transition rules to move throughout the search space, genetic algorithms use probabilistic transition rules embodied within a number of operators. The three operators that are common among the many variations of genetic algorithms are reproduction, crossover, and mutation.

The first step is to form the first generation of the S's, consisting of r q-dimensional binary strings and denoted by $G^1 = \{S_1^1, S_2^1, \ldots, S_r^1\}$. Note that the effectiveness of genetic algorithms depends on the diversity within $G^1$. Therefore, pseudo-random number generators are often used to create this first generation of strings. Next, the fitness of each of the models specified by the r strings in $G^1$ is obtained.
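
A first generation of this kind might be produced as in the sketch below. The inclusion probability of one half per wavelength, the population size, the seed, and the function names are assumptions made for illustration; `inclusion_frequency` is merely one crude way to inspect diversity.

```python
import random


def first_generation(r, q, rng):
    """Create r random q-dimensional binary strings, setting each bit
    independently with probability 1/2 (an assumed choice)."""
    return [[rng.randint(0, 1) for _ in range(q)] for _ in range(r)]


def inclusion_frequency(generation):
    """Fraction of strings in the generation that use each wavelength."""
    r = len(generation)
    return [sum(column) / r for column in zip(*generation)]


rng = random.Random(42)
G1 = first_generation(r=50, q=10, rng=rng)
print(inclusion_frequency(G1))
```

A frequency of exactly 0 or 1 at some wavelength would mean that the first generation contains no variation at that position.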

The next generation of bit strings, $G^2 = \{S_1^2, S_2^2, \ldots, S_r^2\}$, is formed in three stages. First, r individual strings are selected from $G^1$ with replacement, where the probability of selecting an individual string is proportional to its fitness. This process is referred to as reproduction. The reproduced strings are used as the basis for constructing the next generation. In this way, strings with higher fitness values will have a higher probability of contributing to the next generation. Following reproduction, crossover proceeds in two steps. First, members of the newly reproduced strings are paired (mated)