Method suitable for identifying a code sequence of a biomolecule    

United States Patent 5538898
Inventor(s): Wickramasinghe; Hemantha K. (Chappaqua, NY); Zenhausern; Frederic (Mohegan Lake, NY)
Abstract: A method suitable for identifying a code sequence of a biomolecule. The method comprises the steps of using a near-field probe technique for generating a super-resolution chemical analysis of at least a portion of the biomolecule; and, correlating the chemical analysis with a broad spectral content of a referent biomolecule for generating code sequencing.
Owner/Assignee: International Business Machines Corporation (Armonk, NY)
Publication Date: July 23, 1996
Application Number: 08/405,070
Filing Date: March 16, 1995
US Classification: 436/94; 422/82.01; 422/82.05; 422/82.08; 422/82.12; 436/164; 436/177
Int'l Classification: G01N 033/48
Examiner: Alexander; Lyle A.
Attorney/Law Firm: Kaufman; Stephen C.
USPTO Field of Search: 436/94; 436/164; 436/177; 436/156; 422/50; 422/63; 422/65; 422/82.01; 422/82.05; 422/82.08; 422/82.12
U.S. References

Patent No.   Inventor          US Class       Date
5485536      Islam             (not listed)   Jan 1996
5479024      Hillner           250/458.1      Dec 1995
5362653      Carr              436/165        Nov 1994
5319977      Quate             73/606         Jun 1994
5272330      Betzig            250/216        Dec 1993
5003815      Martin            73/105         Apr 1991
4947034      Wickramasinghe    250/216        Aug 1990
4941753      Wickramasinghe    374/120        Jul 1990
4917462      Lewis             359/368        Apr 1990
4747698      Wickramasinghe    374/6          May 1988
4343993      Binnig            250/306        Aug 1982
Claims
 


We claim:

1. A method suitable for identifying a code sequence of at least a portion of a biomolecule, the method comprising the steps of:

1) using a near-field probe technique for generating a super-resolution chemical analysis of the portion of a biomolecule below the diffraction limit of one half the wavelength of the probing radiation;

and

2) correlating the chemical analysis with a broad spectral content of a referent biomolecule for generating a code sequencing of the portion of the biomolecule.

2. A method according to claim 1, comprising a step of generating a code sequence for a portion of a biomolecule comprising DNA.

3. A method according to claim 1, comprising a step of generating a code sequence for a portion of a biomolecule comprising RNA.

4. A method according to claim 1, comprising a step of generating a code sequence for a portion of a biomolecule comprising a protein.

5. A method according to claim 1, comprising a step of generating a code sequence for a carbohydrate.

6. A method according to claim 1, wherein a portion of a biomolecule comprises purine and pyrimidine bases.

7. A method according to claim 1, comprising generating a code sequence for a portion of a biomolecule comprising a nucleic acid having at least 1000 bases/portion.

8. A method according to claim 7 comprising generating a code sequence that is endomorphic with the bases.

9. A method according to claim 7, comprising generating within less than 1 hour a code sequence comprising at least 1000 bases/portion.

10. A method according to claim 9, comprising generating within less than 1 hour a code sequence comprising at least 100,000 bases/portion.

11. A method according to claim 1, comprising generating a code sequence for a portion of a biomolecule comprising a carbohydrate having at least 1000 residues per portion.

12. A method according to claim 1, comprising a step of interrogating a portion of a biomolecule at a resolution below the optical diffraction limit.

13. A method according to claim 1, comprising a step of interrogating a portion of a biomolecule from a sub-nanometer resolution up to the diffraction limit.

14. A method according to claim 1, comprising a step of interrogating a portion of a biomolecule by near-field acoustic microscopy.

15. A method according to claim 1, comprising a step of interrogating a portion of a biomolecule by magnetic force microscopy.

16. A method according to claim 1, comprising a step of interrogating a portion of a biomolecule by near-field optical microscopy.

17. A method according to claim 1, comprising a step of interrogating a portion of a biomolecule by near-field thermal probe microscopy.

18. A method according to claim 1, wherein the super-resolution chemical analysis comprises absorption spectroscopic information.

19. A method according to claim 1, wherein the super-resolution chemical analysis comprises identifying magnetic properties of the portion of the biomolecule.

20. A method according to claim 1, wherein the super-resolution chemical analysis comprises identifying thermal properties of the portion of the biomolecule.

21. A method according to claim 1, wherein the super-resolution chemical analysis comprises emission spectroscopic information.

22. A method according to claim 1, comprising separating a portion of a biomolecule by a sequencing reaction into independent sub-units uniquely identifiable by predetermined magnetic properties.

23. A method according to claim 1, comprising separating a portion of a biomolecule by using gel-electrophoresis.

24. A method according to claim 1, comprising generating a fast code sequencing comprising identifying at least 200 building blocks per hour.

25. A method according to claim 1, comprising a step of deriving the broad spectral content of the referent biomolecule from a portion of the biomolecule itself.

26. A method according to claim 1, comprising a step of deriving the broad spectral content of the referent biomolecule from a second independent biomolecule.

27. A method according to claim 1, comprising generating a code sequence for a portion of a biomolecule comprising a protein having at least 1000 amino acids per portion.

28. A method according to claim 2, comprising generating within less than 1 hour a code sequence for a portion of a biomolecule comprising a protein having at least 1000 amino acids per portion.

29. A method according to claim 1, comprising a step of separating a portion of a biomolecule by a sequencing reaction into independent sub-units uniquely identifiable by predetermined absorbant labels.

30. A method according to claim 29, wherein a label is fluorescent.

31. A method according to claim 1, comprising separating a portion of a biomolecule by using free-solution electrophoresis.

32. A method according to claim 31, comprising initial stretch-and-positioning of a portion of a biomolecule at a surface.

33. A method according to claim 32, comprising initial magnetic stretch-and-positioning of a portion of a biomolecule at an electrode surface.

34. A method according to claim 32, comprising initial electrostatic stretch-and-positioning of a portion of a biomolecule at an electrode surface.

35. A method according to claim 32, comprising initial electrostatic and magnetic stretch and positioning of a portion of a biomolecule at an electrode surface.

36. A method according to claim 32, comprising initial electromagnetic stretch-and-positioning of a portion of a biomolecule by optical forces at an electrode surface.

37. A method according to claim 36, comprising initial stretch-and-positioning of a portion of a biomolecule by viscous drag.

38. A method according to claim 32, comprising anchoring a portion of a biomolecule at a solid matrix.

39. A method according to claim 38, comprising end-labeling a portion of a biomolecule with large monodisperse labeling proteins or chemicals.

40. A method suitable for identifying a code sequence of at least a portion of a biomolecule, the method comprising the steps of:

1) using a super resolution apertureless near-field scanning probe for interrogating absorption properties characteristic of a portion of the biomolecule;

and

2) correlating the characteristic with a spectral content of a referent absorbant, for generating a code sequencing of the portion of the biomolecule.

41. A method suitable for identifying a code sequence of at least a portion of an arbitrary biomolecule, the method comprising the steps of:

1) generating a broad spectral content information base for a referent biomolecule;

2) using a near-field scanning probe technique for generating a super-resolution chemical analysis of a portion of the arbitrary biomolecule;

and

3) correlating the super-resolution chemical analysis for the arbitrary biomolecule with the broad spectral content information base of the referent biomolecule, for generating a code sequencing of the portion of the arbitrary biomolecule.

42. A method according to claim 41, wherein step (1) comprises generating a broad spectral content information base for a referent biomolecule by using a far-field detector probe.

43. A method according to claim 41, comprising:

1) using a near-field scanning probe for generating an absorption information base for a referent biomolecule;

2) using a near-field scanning probe for generating a super-resolution chemical analysis comprising a thermal information base for the portion of the arbitrary biomolecule;

and

3) correlating the absorption information base and the thermal information base as a measure of a code sequencing of the portion of the arbitrary biomolecule.
Description
 


This application is related to application Ser. No. 08/405,476 filed Mar. 16, 1995 by H. K. Wickramasinghe and F. Zenhausern (YO995-058), to application Ser. No. 08/405,481 filed Mar. 16, 1995 by F. Zenhausern and H. K. Wickramasinghe (YO995-061), and to application Ser. No. 08/405,068 filed Mar. 16, 1995 by F. Zenhausern and H. K. Wickramasinghe (YO995-065), which applications are being filed contemporaneously with this application. The entire disclosures of those applications, all of which are copending and commonly assigned, are incorporated by reference herein.

FIELD OF THE INVENTION

This invention relates to a method suitable for identifying a code sequence of at least a portion of a biomolecule.

BACKGROUND OF THE INVENTION

Four classes of biological molecules are known, namely, those comprising proteins, lipids, carbohydrates and nucleic acids. Nucleic acids, in turn, comprise two subsumed classes: DNA which is a genetic component of all cells, and RNA which usually functions in a synthesis of proteins.

The purview of the present invention extends to biomolecules, generally, but a working point for the sake of pedagogy is now established by referencing biomolecules comprising DNA. DNA is emphasized because it is the prime genetic molecule, carrying all hereditary information within chromosomes.

DNA stands for deoxyribonucleic acid. The DNA of most cells resides in a cell's nucleus. Its structure comprises long chains of relatively simple molecules called nucleotides. Each nucleotide comprises three parts: (1) a phosphate group; (2) a sugar called "deoxyribose", that is, ribose stripped of one special oxygen atom; and (3) a base. It is the base alone which distinguishes one nucleotide from another--thus it suffices to specify a base to identify a nucleotide. The four types of bases which occur in DNA nucleotides are adenine (A), guanine (G), cytosine (C) and thymine (T).

A single strand of DNA comprises many nucleotides strung together like a chain of beads. DNA usually comes in double strands, that is, two single strands which are paired up, nucleotide by nucleotide, in the form of the well known DNA double helix.

DNA carries a vast array of information through its nucleotide sequence. Accordingly, the order of nucleotides (considered as a linear progression, e.g., "A T T C G G A C C . . . ") is highly varied. A nucleotide sequence may comprise inter alia a single nucleotide, a duplet (adjacent pairs of bases), a codon (three consecutive bases), a gene (a portion of a strand which codes for a single enzyme), a strand of arbitrary nucleotides, or a genome comprising a total set of DNA molecules for an organism (e.g., $3 \times 10^{9}$ nucleotides for a human cell).
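The units named above can be illustrated with a short Python sketch. The function names are hypothetical, and two reading conventions are assumed purely for illustration: duplets are taken with a sliding window of two bases, and codons as consecutive non-overlapping triplets from a chosen starting frame.

```python
def duplets(sequence):
    """Every pair of adjacent bases, read with a sliding window of two (assumed convention)."""
    return [sequence[i:i + 2] for i in range(len(sequence) - 1)]


def codons(sequence, frame=0):
    """Consecutive, non-overlapping triplets starting at the given frame (assumed convention)."""
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]


strand = "ATTCGGACC"  # the example progression quoted in the text
print(duplets(strand))  # ['AT', 'TT', 'TC', 'CG', 'GG', 'GA', 'AC', 'CC']
print(codons(strand))   # ['ATT', 'CGG', 'ACC']
```

Changing `frame` shifts where the triplets begin, which is why a location with respect to the strand matters as much as the identity of each base.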

SUMMARY OF THE INVENTION

Our work relates to a novel approach and method for biomolecular code sequencing. We proceed from the following considerations.

First, we set forth why it is significant and of great utility to have a biomolecular code sequencing capability. This effort, secondly, can help elicit problems, difficulties and constraints in an attempt to realize and effect such a capability. Thirdly, we situate what is of pertinence with respect to the prior art as it relates to this situation. Finally, we define the novel method of the present invention, and argue that it addresses and solves the problems to be overcome in realizing a qualitatively new method comprising biomolecular code sequencing. Furthermore, we position the novel method relative to the prior art, thereby highlighting its novel and unobvious aspects as well as attesting to its advantages.

Accordingly, we assume firstly that one somehow has nucleotide sequencing information, and that this information may be accessed by conventional computer techniques. Then, once in the computer, nucleotide sequences can be scanned (at least theoretically, in some cases) inter alia for RNA synthesis, a presence of inverted palindromes, preferred segments of potential Z-DNA (alternating purine and pyrimidine stretches), homologies to other known DNA sequences, mutation detection, genotyping, genetic database comparing, or large-scale supersequencing specifying a human genome by way of its component nucleotides and their location with respect to the entire genome.

It is believed that this recital makes self-evident the significance and utility of a biomolecular code sequencing capability. At the same time, it provokes outstanding difficulties, problems and constraints implicit in an hypothesized method for effecting such a sequencing capability. For example, a genome comprises approximately $10^{9}$ nucleotides and has an average length of approximately 0.6 m, and a single nucleotide has an average length of approximately one to two angstroms. A candidate methodology must at least, therefore, somehow be able to resolve one nucleotide from an adjacent nucleotide, presumably without damage to the nucleotide, and resolve significant numbers of such nucleotides with precision and accuracy and within a meaningful time span.

Two important and representative prior art methodologies that are pertinent to this situation comprise separation techniques including gel electrophoresis and free-solution electrophoresis.

Gel electrophoresis requires a physical separation of DNA fragments produced during a sequencing reaction. Instruction on conventional gel electrophoresis may be found in (1) J. Sambrook, E. F. Fritsch, T. Maniatis, "Molecular Cloning: A Laboratory Manual" (Cold Spring Harbor Laboratory, N.Y. 1989), (2) A. T. Bankier and B. G. Barrel, "Nucleic Acids Sequencing: A Practical Approach", Eds. E. M. Howe, C. I. Rowlings, IRL Press, Oxford 1989, pp. 37-73, which instruction is incorporated by reference herein.

In overview, gel electrophoresis methodology typically comprises the steps of: (1) fragmenting a DNA strand to be sequenced into a series starting from the same point on the strand, each fragment differing in length from the next by one nucleotide; (2) labelling each fragment with e.g., fluorescent tags which can fluoresce at different colours depending on the end base (A, T, C or G); (3) doing gel electrophoresis for sequentially separating the fragments into bands of decreasing molecular size; and (4) using a suitable detection means for determining the end label of each band.
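The logic of these four steps can be sketched in a few lines of Python. This is an idealized illustration only: the helper names are hypothetical, each fragment is reduced to a (length, end base) pair, and the separation step is modelled as a perfect sort by length with no band broadening.

```python
import random


def make_ladder(strand):
    """Steps 1-2: nested fragments from a common start, each tagged with its end base."""
    return [(n, strand[n - 1]) for n in range(1, len(strand) + 1)]


def read_ladder(fragments):
    """Steps 3-4: order the fragments by length, then read the end label of each band."""
    return "".join(base for _, base in sorted(fragments))


strand = "ATTCGGACC"
ladder = make_ladder(strand)
random.Random(0).shuffle(ladder)  # the fragments start out as an unordered mixture
recovered = read_ladder(ladder)
print(recovered)  # ATTCGGACC
```

The sketch makes plain why the method depends on resolving fragments that differ by a single nucleotide: if two adjacent lengths cannot be separated, the order of their end labels is lost.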

To this end, present gel electrophoresis methodology relies on a dispersion in the mobility of the DNA molecules with length to separate and effect bands in an electric field. Gel electrophoresis methodology, as it is presently understood, is therefore disadvantageously limited to approximately 700 bases (nucleotides), because there is a saturation in the dispersion for molecular lengths longer than 700 nucleotides. Further, due to the low dispersion and mobility, it takes several hours to achieve the separation of 700 nucleotides. It is true that this speed can be marginally increased by having several lanes (up to, say, 36) sequencing different portions of a strand.

An important advantage of the present invention is that, notwithstanding the present difficulties or deficiencies of gel electrophoresis, as just noted, it is able to offset or remedy these limitations, so that as modified or re-evaluated from the standpoint of the present invention, gel electrophoresis can provide a significantly enhanced utility. This advantage comes about in the following way.

The present invention includes a method which can resolve at least a portion of a biomolecule specifically distinguishable against chemically complex backgrounds. In one embodiment, the present invention can be used for determining a code sequence of large duplex DNA molecules in polyacrylamide gels using conventional electrophoretic equipment.

In explanation of this advantage, we note that a critical parameter that may limit the performance of present gel-based techniques is a band-broadening of DNA sequencing reactions, as they are separated through a fixed distance of gel at continuous field strengths, often ranging from 50-400 V/cm. The size-dependence of band widths may be a result of various mechanisms of reorientation and migration of the nucleic acid fragments in the gel, such as diffusion and thermal gradient broadenings.

Now, when a sample biomolecule migrates through a polymer solution chemically cross-linked, such as polyacrylamide or agarose gels, an overall friction coefficient can become a complicated function of the pore size in the gel, the size of the sample and the electric field strength, thereby limiting resolution.

Several approaches based upon the use of capillaries or pulsed fields can partially overcome this limit of resolution (C. R. Cantor et al., Pulsed-Field gel electrophoresis of very large-DNA molecules, Annual Review of Biophysics and Biophysical Chemistry, vol. 17, 287, 1988).

A spatial resolution of the detection system may also be a source of band broadening, relying on the fact that a detector does not interrogate an infinitely thin section of the sample as it reaches a finite detection volume, thereby precluding single nucleotide resolution. Present confocal-fluorescence microscopes typically provide a far field detection system to interrogate either capillaries or slab gels with a limiting sensitivity, defined as a signal-to-noise ratio of 1, of about $10^{-17}$ mole of fluorescently labeled DNA per band and a spatial resolution ranging from 10 µm (Smith L. M., et al., Nature, vol. 321, 12 June, 1986). Based upon several theoretical approaches of band broadening in sequencing analysis by gel electrophoresis (Y. F. Chen et al., Anal Chem., 62, 496-503, 1990), a theoretical peak width of a band may be determined to be a complex function of starting conditions (i.e., injection time and volume), detection (spot size of the focused laser beam), diffusion and thermal gradient variances.

Now, starting conditions begin with an injection process.

During an injection process, which comprises loading biomolecules in the gel, the biomolecules are not stacked by moving boundaries of buffer conditions, and the biomolecules therefore enter the gel at different rates corresponding to their electrophoretic velocity in the gel, thereby contributing to the net effect on the band width variance. Subsequent detection of the biomolecule may comprise using a focused laser with a Gaussian beam profile. For this situation, a standard deviation of the beam profile can be estimated to be equal to one-half the beam spot. This yields a detection variance of the form $\sigma^{2} = w^{2}/4$, where $w$ is the spot size. In most conventional equipment, lenses or fiber optics may be used to focus the laser on the slab gel or filled gel capillary vessel, but due to an orthogonal direction of the excitation radiation with the emitted radiation, the numerical aperture of the lens of the optical detection system may therefore be limited to about 0.20-0.75. For example, several collinear arrangements for on-column detection in capillary electrophoresis have been reported using narrower capillaries and higher numerical aperture, permitting more fluorescence to be collected, thereby contributing to sensitivity improvement.
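The detection variance stated above follows directly from taking the standard deviation as one-half the spot size. A minimal Python sketch, with hypothetical function names and spot sizes chosen only for illustration (10 µm is the far field figure quoted earlier; 0.1 µm is an assumed sub-wavelength spot):

```python
def detection_sigma(spot_size_um):
    """Standard deviation of the beam profile, estimated as one-half the spot size."""
    return spot_size_um / 2.0


def detection_variance(spot_size_um):
    """Detection variance: sigma squared, which equals w squared over four."""
    return detection_sigma(spot_size_um) ** 2


for w in (10.0, 0.1):  # spot sizes in micrometres (illustrative values)
    print(f"w = {w} um -> variance = {detection_variance(w):.4g} um^2")
```

Because the variance scales with the square of the spot size, shrinking the spot by a factor of 100 reduces this contribution to band broadening by a factor of 10,000.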

In preparation for gel electrophoresis, a sample is loaded in each lane of a slab gel in a well of typically 0.4 mm $\times$ 6 mm, or 2.4 mm$^{2}$, whilst for example in a 50 µm capillary, the surface area of the top of the gel is one thousandth of that in the slab gel, corresponding to about $10^{-7}$ mole of sample in a given band. Accordingly, loading conditions not taking advantage of sample stacking and optical diffraction threshold of detection system may be significant sources of band broadening, affecting resolution.
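The "one thousandth" comparison can be checked with a short calculation. The sketch below assumes, for illustration only, that the 50 µm figure is the inner diameter of a capillary with a circular cross-section; the variable names are hypothetical.

```python
import math

# Slab gel well: 0.4 mm x 6 mm, as stated in the text.
well_area_mm2 = 0.4 * 6.0

# Capillary: assumed circular, with a 50 um (0.050 mm) inner diameter.
capillary_diameter_mm = 0.050
capillary_area_mm2 = math.pi * (capillary_diameter_mm / 2) ** 2

ratio = capillary_area_mm2 / well_area_mm2
print(f"well area      = {well_area_mm2:.2f} mm^2")
print(f"capillary area = {capillary_area_mm2:.5f} mm^2")
print(f"ratio          = 1/{1 / ratio:.0f}")
```

Under these assumptions the capillary cross-section comes out at roughly 0.002 mm$^{2}$, on the order of one thousandth of the well area, in line with the figure given.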

In sharp contrast, the procedures and embodiments of the present invention define innovative approaches to overcoming the above limitations by employing, in a specific embodiment, a mechanism that can focus sample bands to the sample dimensions, at least 0.1 micron, and a near-field detection system that permits spatial resolution beyond the diffraction limit, thereby extending the limit of concentration detection to at least the mass of a single molecule.

One way to increase conventional gel electrophoresis low mobility is to use free-solution electrophoresis. Here, there is no dispersion in mobility with molecular length (M bases). This is due to the fact that mobility (velocity divided by electric field) is equivalent to electric charge divided by friction coefficient, and both electric charge and friction coefficient scale linearly with molecular length, M. In Mayer et al (Anal. Chem. 1994, 66, 1777-1780), there is a proposal for attaching a large molecule at the end of each fragment in order to add a constant friction contribution to each. In this way, mobility is no longer independent of the number of bases. Theoretical calculations based on this reference suggest that dispersion can allow one to separate 3000 nucleotides in five minutes, in a best case comprising a far field detection limit.

Finally, we reference in passing proposed advanced technologies comprising large-scale automated DNA sequencing methodologies, namely, applying mass spectrometry to fast sequencing DNA, or sequencing by hybridization. See references 1) R. J. Lewis et al, J. Am. Chem. Soc., 113, 9665, 1991 and 2) R. Drmanac et al, "Sequencing of Megabase Plus DNA by Hybridization: Theory of the Method" in Genomics, vol. 4, pp. 114-118 (1989), respectively.
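The scaling argument can be made concrete with a toy model in Python. Everything here is an assumption for illustration: arbitrary units, one unit of charge and one unit of friction per base, and an uncharged end label that adds only a constant friction term. The function names are hypothetical and are not taken from the cited reference.

```python
def free_solution_mobility(n_bases, charge_per_base=1.0, friction_per_base=1.0):
    """Mobility = charge / friction; both scale linearly with length, so M cancels."""
    return (charge_per_base * n_bases) / (friction_per_base * n_bases)


def end_labeled_mobility(n_bases, label_friction,
                         charge_per_base=1.0, friction_per_base=1.0):
    """Adding a constant friction term (assumed uncharged label) breaks the cancellation."""
    return (charge_per_base * n_bases) / (friction_per_base * n_bases + label_friction)


for m in (100, 1000, 3000):
    print(m, free_solution_mobility(m), round(end_labeled_mobility(m, 50.0), 4))
```

In this model the unlabeled mobility is the same for every length, while the labeled mobility rises with length toward the unlabeled value, so shorter fragments are slowed most and a length-dependent separation becomes possible.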

We have now discovered an approach to biomolecular code sequencing which is qualitatively distinct from the prior art. This different approach is manifest in a novel method suitable for identifying a code sequence of at least a portion of a biomolecule, the method comprising the steps of:

1) using a near-field probe technique for generating a super-resolution chemical analysis of the portion of a biomolecule; and

2) correlating the chemical analysis with a broad spectral content of a referent biomolecule for generating a code sequencing.
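As a rough illustration of the correlating step, the Python sketch below matches a measured spectrum against a small set of reference spectra and picks the best match. It is a sketch under stated assumptions, not the method of the invention: the use of a Pearson correlation coefficient is an assumed choice, the function names are hypothetical, and all spectral values are invented placeholders rather than measured data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length spectra."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def identify(measured, references):
    """Return the reference label whose spectrum correlates best with the measurement."""
    return max(references, key=lambda name: pearson(measured, references[name]))


# Placeholder reference spectra (invented values, four spectral channels each).
REFERENCE = {
    "A": [0.1, 0.9, 0.3, 0.1],
    "G": [0.8, 0.2, 0.1, 0.4],
    "C": [0.2, 0.1, 0.9, 0.3],
    "T": [0.3, 0.2, 0.2, 0.9],
}

# Placeholder near-field measurements at three successive positions along a strand.
measured_sites = [
    [0.15, 0.85, 0.35, 0.10],
    [0.25, 0.25, 0.15, 0.95],
    [0.75, 0.25, 0.15, 0.35],
]

code = "".join(identify(site, REFERENCE) for site in measured_sites)
print(code)  # ATG
```

Reading the best match at each successive probe position yields the code sequence one building block at a time.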

The present invention as defined can realize several significant advantages.

First of all, the novel method has an immanent capability for generating nucleotide sequencing information of such a quality, quantity and time-responsiveness, that heretofore even merely theorized applications requiring such information can now become a straightforward reality. For example, the method can be employed for developing a map that accurately reflects both individual nucleotide identification (i.e., A, G, C and T) and the location of an individual nucleotide with respect to a strand of arbitrary length, including an entire genome.

In this sense, moreover, the novel method can evince a remarkable versatility, since it may be selectively and variously employed e.g., in dependent steps, for:

1) identifying a first nucleotide from a second (adjacent) nucleotide;

or

2) locating with respect to an arbitrary strand or to a genome, a location of an identified nucleotide;

or

3) identifying a first duplet, codon, gene from a second (adjacent) duplet, codon, gene;

or

4) locating with respect to an arbitrary strand or to a genome, a location of an identified duplet, codon, gene.

To this end, the novel method has a capability for generating a fast and/or high throughput code sequence e.g., comprising at least 1000 bases/portion of biomolecule, preferably at least 100 kilobases/portion of biomolecule within less than 1 hour, particularly an entire human genome within less than one day, for example, 3 kilobases in less than 5 minutes.
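To put these throughput figures on a common footing, the Python sketch below converts each one to bases per second. The human genome size is taken as $3 \times 10^{9}$ nucleotides, the figure quoted in the background. The dictionary keys and function name are hypothetical, and each "less than" time limit is treated as an exact duration, so the results are lower bounds on the required rate.

```python
HUMAN_GENOME_BASES = 3_000_000_000  # figure quoted in the background section

# (number of bases, time allowed in seconds) for each stated target
targets = {
    "1000 bases in 1 hour": (1_000, 3_600),
    "100 kilobases in 1 hour": (100_000, 3_600),
    "3 kilobases in 5 minutes": (3_000, 300),
    "human genome in 1 day": (HUMAN_GENOME_BASES, 86_400),
}


def bases_per_second(bases, seconds):
    """Minimum average read rate needed to meet a target."""
    return bases / seconds


rates = {name: bases_per_second(*spec) for name, spec in targets.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:,.2f} bases/s")
```

The targets span roughly five orders of magnitude, from well under one base per second to tens of thousands of bases per second for a whole genome in a day.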

Other advantages of the novel method proceed from the following considerations. An application of the method can generate, for the first time, nucleotide information of a quality and quantity sui generis. This information, in turn, can become a centerpiece for new and efficient approaches to gene testing or drug design, DNA sequence homology or biomolecular computing.

Other advantages of the novel method are enumerated below.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is illustrated in the accompanying drawing, in which:

FIG. 1 shows an assembly suitable for identifying a code sequence of at least a portion of a biomolecule, and of utility in realizing the novel method;

FIG. 2 shows a near-field scanning probe comprising an apertureless near-field optical microscope;

FIG. 3 provides a schematic for explaining basic concepts about the FIG. 2 apertureless microscope;

FIG. 4 shows a chemical modification of a biomolecule into a sample preliminary to its interrogation by a near-field scanning probe;

FIG. 5 shows spectroscopic curves for DNA nucleotides;

FIG. 6 shows a spectroscopic curve for an arbitrary biomolecule;

FIG. 7 shows a correlogram based on FIGS. 5,6;

FIG. 8 shows an assembly employed to realize the invention in a free-solution embodiment;

FIG. 9 shows a mathematical relationship of biomolecular diffusion actions in a FIG. 8 context;

FIG. 10 shows an assembly employed to realize the invention in a gel embodiment;

and

FIGS. 11-13 show further embodiments and details of systems and assemblies constructed in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

In the interests of clarity, the following detailed description of the invention includes sections which are chiefly or exclusively concerned with a particular part of the invention. It is to be understood, however, that the relationship between different parts of the invention is of significant importance, and the following detailed description should be read in the light of that understanding. It should also be understood that, where features of the invention are described in the context of particular Figures of the drawing, the same description can also be applied to the invention in general and to the other Figures, insofar as the context permits.

Section one sets forth sundry definitions and examples of words, phrases or concepts that may be abstracted from the summarized invention, or may be used to reference preferred embodiments of the invention. Section two provides a conceptual overview of the present invention with special emphasis on that aspect of the present invention which comprises coupling a near-field scanning probe technique with interrogation of a biomolecule. Section three discloses in overview an assembly that may be preferably used to realize the present invention. In a fourth section, we disclose particulars of a preferred near-field probe included in the section three assembly, while in a fifth section entitled "Chemistry", we disclose preferred techniques for preparing a biomolecule for sequencing. Section six, entitled "Correlation", builds on the previous sections, and discloses how the invention can correlate a chemical analysis of an arbitrary biomolecule with spectroscopic data of a known such biomolecule. Sections seven and eight are dedicated respectively to preferred realizations of the present method in free-solution and gel. Section nine, finally, builds on the previous sections and discloses further assembly and system details.

I. Definitions

1. "a code sequence": In reference to a biomolecule, a code sequence means the order of the basic building blocks of a macromolecule or equivalent chemical compound, for example, amino acids for peptides, nucleotides for nucleic acids or a sugar residue for carbohydrates. A code sequence may comprise a map that is 1 to 1 congruent with a portion of a biomolecule i.e., endomorphic, or alternatively, may be isomorphic with respect to the portion. To illustrate this point: assume that an arbitrary nucleotide string comprises AAGCATATCG. Then, an endomorphic code sequence consists of AAGCATATCG, while an isomorphic code sequence may comprise alternative nucleotides i.e., ACTTG.

2. "a portion of a biomolecule": A biomolecule comprises polymeric macromolecules. The present method may be used to interrogate the code sequence of an entire macromolecule, or at least a preselected portion of a macromolecule. For example, the method may be used to interrogate the code sequence of a fragment of DNA.

3. "electrophoresis" comprises a separation of molecules on the basis of their net electrical charge. For purposes of the present invention, electrophoresis may be carried out, e.g., in a gel or preferably in a free-solution.

4. "near-field probe techniques": near-field probe techniques can provide a measurement modality capable of resolution of a sample beyond the diffraction limit and capable of atomic resolution imaging. In brief, the technique may comprise placing a subwavelength-sized probe within tens of nanometers of the sample. Travelling over such short distances, radiation has no opportunity to diffract and take on its asymptotic far-field characteristics--hence the name "near-field". Note that a suitable probe may comprise a sharp metallic tip or an uncoated silicon and/or silicon nitride tip, or a tip coated with a conductive layer or a molecular system. A near-field probe capability may be realized by e.g., a scanning tunneling microscope (STM), an atomic force microscope (AFM), an aperture or apertureless near-field optical microscope, a near-field acoustic microscope, a thermal microscope or a magnetic force microscope (MFM). The notion of "scanning" references the fact that probe and biomolecule may be in relative motion. Reference may be made for example to U.S. Pat. Nos. 5,319,977; 4,343,993; 5,003,815; 4,941,753; 4,947,034; 4,747,698 and Appl. Phys. Lett. 65(13), Sep. 26, 1994. The disclosures of each of these patents and publications are incorporated herein by reference.

5. "super-resolution chemical analysis" comprises a recognition of a chemical species e.g., at least a portion of a biomolecule, by analyzing a molecular specificity of its spectra or parts thereof, preferably by using spatially resolved spectroscopy with physical methods, for example, near-field microscopic techniques.

6. "broad spectral content of a biomolecule" means a characterization of a spectrum e.g., absorption or emission or thermal or magnetic properties of a pre-defined analyte when it is preferably interrogated by a tuned excitation radiation source with a frequency specific to the analyte being monitored, ranging from the x-ray through the UV, visible and IR to the microwave region of the spectrum.

II. Conceptual Overview of Present Invention

As alluded to above, the present invention comprises coupling a near-field probe technique with interrogation of at least a portion of a biomolecule to an end of generating a super-resolution chemical analysis of a portion of the biomolecule under interrogation, and correlating the chemical analysis with a broad spectral content of a referent biomolecule for generating a precise code sequencing.
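By way of illustration only, the correlating step may be sketched as follows. The sketch assumes that a measured spectrum and each referent spectrum are sampled on the same axis, and it uses a Pearson correlation coefficient as the similarity measure; that choice, the function names `pearson` and `best_match`, and the numerical values in `REFERENCES` are hypothetical and are not taken from the present disclosure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length spectra."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("spectra must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0.0:
        raise ValueError("a constant spectrum has no defined correlation")
    return numerator / denominator


def best_match(measured, references):
    """Return (label, score) of the referent spectrum that correlates best."""
    scores = {label: pearson(measured, ref) for label, ref in references.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]


# HYPOTHETICAL referent spectra: made-up intensities on a shared 5-point axis.
REFERENCES = {
    "A": [0.1, 0.9, 0.3, 0.1, 0.0],
    "C": [0.0, 0.2, 0.8, 0.4, 0.1],
    "G": [0.7, 0.3, 0.1, 0.5, 0.2],
    "T": [0.1, 0.1, 0.2, 0.6, 0.9],
}

# Made-up measurement: the "C" shape, doubled and offset by 0.05.
measured = [0.05, 0.45, 1.65, 0.85, 0.25]
label, score = best_match(measured, REFERENCES)
print(label, round(score, 6))
```

Because the coefficient is insensitive to a constant offset and to overall scaling, a measurement of this kind may be matched to its referent even when the absolute signal level differs. Repeating the call for successive probe positions would yield one building block per position.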

If even theoretically contemplated, it is not technically known or obvious outside of the present instruction, how one may effect the required coupling. Restated, the desired result i.e., the precise code sequencing, cannot in fact be effected by some sort of nominal juxtaposition of a near-field probe and a biomolecule (cf. imaging). We note that the reason for this is that a putative such attempt simply results in a blurred and information-less output signal.

The present invention addresses and solves this problem by way of preferred novel assemblies suitable for identifying a code sequence of at least a portion of a biomolecule. Various preferred embodiments of these assemblies are disclosed below.

III. Overview of Physical Components of Invention As An Assembly

Attention is now directed to FIG. 1, which shows a schematic overview 10 of physical components that preferably may be assembled in realization of the present invention, in particular, for distinguishing a biomolecule 12 against a chemically complex background solution 14.

The biomolecule 12 can migrate beneath an interrogating and preferably movable i.e., scanning (see arrows) near-field probe 16. Note that FIG. 1 shows one such near-field probe. However, expediencies of interrogation may be realized by suitably ganging a plurality of near-field probes. Note that the near-field probe 16 can function as an excitation source, or alternatively, an external excitation source (see FIGS. 8, 10, 12, 13 infra) can be used.

A resultant interrogation signal 18 from the near-field probe 16 may be detected by a detector 20, comprising, for example, a conventional spectrometer e.g., an interferometric system. The detector 20 can generate a detection signal 22 for storage and processing on a computer 24. For example, an IBM RS 6000 may be programmed for interpreting a sequence of building blocks of a biomolecule comprising amino acids in a case of proteins, or nucleotides in a case of nucleic acids.

Note in FIG. 1 that the biomolecule 12 is initially loaded in a container 26 comprising the solution 14, preferably using a stretching procedure comprising external radiation such as a magnetic field 28, and a specific positioning of the biomolecule 12 to a support 30. This arrangement can facilitate an efficient immobilization and stretching of the biomolecule 12, for example, before and during its migration, by way of an applied electric field generated by a power source 32. These points are amplified below, in section V entitled "Chemistry".

IV. Preferred Near-Field Probe and Detection

FIG. 1 indicates the employment of a near-field probe 16 and an excitation source and a detector 20. Further information on preferred such devices is now set forth by way of FIGS. 2, 3.

A preferred apertureless near-field scanning probe microscope and detector 34 are shown respectively in FIGS. 2, 3. The apertureless near-field scanning microscope is preferred because, among other reasons, its capability of measuring absorption properties of a sample can be extended to a spatial resolution in the sub-nanometer regime, thereby realizing single nucleotide resolution (cf. nucleotide length .alpha.1 to 2 angstroms). We note that aperture based systems can also be used, at lower resolution e.g., approximately λ/40. In particular, the FIGS. 2, 3 microscope comprises an apertureless near-field optical microscope wherein a light source preferably emits spherical light scattering from a sharp tip, rather than light transmitted through a fine aperture.
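To make the resolution comparison concrete, the following sketch works the arithmetic for an aperture based system at approximately one fortieth of a wavelength, alongside the conventional far-field (Abbe) limit of wavelength divided by twice the numerical aperture. The 633 nm wavelength and the numerical aperture of 1.3 are assumptions chosen for illustration only, and the function names are hypothetical; only the one-fortieth figure and the nucleotide length of 1 to 2 angstroms come from the text above.

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Far-field diffraction limit: wavelength / (2 * NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)


def aperture_resolution_nm(wavelength_nm, divisor=40.0):
    """Aperture based near-field resolution, approximately wavelength / 40."""
    return wavelength_nm / divisor


def nucleotides_spanned(resolution_nm, nucleotide_length_angstrom):
    """Number of nucleotides covered by one resolution element (1 nm = 10 angstroms)."""
    return resolution_nm * 10.0 / nucleotide_length_angstrom


WAVELENGTH_NM = 633.0      # ASSUMPTION: a helium-neon laser line, for illustration
NUMERICAL_APERTURE = 1.3   # ASSUMPTION: a typical immersion objective

far_field = abbe_limit_nm(WAVELENGTH_NM, NUMERICAL_APERTURE)
aperture = aperture_resolution_nm(WAVELENGTH_NM)

print(f"far-field limit:      {far_field:.1f} nm")
print(f"aperture near-field:  {aperture:.3f} nm")
for length in (1.0, 2.0):  # nucleotide length range stated in the text
    count = nucleotides_spanned(aperture, length)
    print(f"nucleotides per resolution element at {length} angstrom: {count:.1f}")
```

Under these assumptions a single aperture based resolution element covers many nucleotides, which is consistent with the stated preference for the apertureless arrangement where single nucleotide resolution is sought.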

An understanding of the operation of the FIGS. 2, 3 apertureless microscope 34 is now provided by first summarizing its mechanical-physical components, and then disclosing its theory of operation.

The microscope preferably includes a high numerical aperture Nomarski objective 36 (e.g., liquid immersion objective) that may be used to form two diffraction limited spots at the far surface of a transparent substrate e.g., a glass cover slip 38. A sharp silicon tip 40 of an A