Description
This application is related to application Ser. No. 08/405,476 filed Mar.
16, 1995 by H. K. Wickramasinghe and F. Zenhausern (YO995-058) and to
application Ser. No. 08/405,481 filed Mar. 16, 1995 by F. Zenhausern and
H. K. Wickramasinghe (YO995-061) and to application Ser. No. 08/405,068
filed Mar. 16, 1995 by F. Zenhausern and H. K. Wickramasinghe (YO995-065),
which applications are being filed contemporaneously with this
application. The entire disclosures of those applications, all of which
are copending and commonly assigned, are incorporated by reference herein.
FIELD OF THE INVENTION
This invention relates to a method suitable for identifying a code sequence
of at least a portion of a biomolecule.
BACKGROUND OF THE INVENTION
Four classes of biological molecules are known, namely, those comprising
proteins, lipids, carbohydrates and nucleic acids. Nucleic acids, in turn,
comprise two subsumed classes: DNA which is a genetic component of all
cells, and RNA which usually functions in a synthesis of proteins.
The purview of the present invention extends to biomolecules, generally,
but a working point for the sake of pedagogy is now established by
referencing biomolecules comprising DNA. DNA is emphasized because it is
the prime genetic molecule, carrying all hereditary information within
chromosomes.
DNA stands for deoxyribonucleic acid. The DNA of most cells resides in a
cell's nucleus. Its structure comprises long chains of relatively simple
molecules called nucleotides. Each nucleotide comprises three parts: (1) a
phosphate group stripped of one special oxygen atom; (2) a sugar called
"ribose"; and (3) a base. It is the base alone which distinguishes one
nucleotide from another--thus it suffices to specify a base to identify a
nucleotide. The four types of bases which occur in DNA nucleotides are
adenine (A), guanine (G), cytosine (C) and thymine (T).
A single strand of DNA comprises many nucleotides strung together like a
chain of beads. DNA usually comes in double strands, that is, two single
strands which are paired up, nucleotide by nucleotide, in the form of the
well known DNA double helix.
DNA carries a vast array of information through its nucleotide sequence.
Accordingly, the order of nucleotides (considered as a linear progression
e.g., "A T T C G G A C C . . . ") is highly varied. A nucleotide sequence
may comprise inter alia a single nucleotide, a duplet (adjacent pairs of
bases), a codon (three consecutive bases), a gene (a portion of a strand
which codes for a single enzyme), a strand of arbitrary nucleotides, or a
genome comprising a total set of DNA molecules for an organism (e.g.,
3 × 10⁹ nucleotides for a human cell).
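The sequence units named above can be illustrated with a minimal sketch. The function names are hypothetical, and reading "duplet" as overlapping adjacent pairs and "codon" as non-overlapping triplets read from the first base is an assumption made only for illustration.

```python
def duplets(seq):
    """Adjacent pairs of bases, taken as overlapping windows of width 2 (assumption)."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def codons(seq):
    """Consecutive, non-overlapping triplets; an incomplete tail is dropped."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


# The linear progression quoted in the text, without spaces.
example = "ATTCGGACC"
example_duplets = duplets(example)
example_codons = codons(example)
```

For the nine-base example, this yields eight duplets and the three codons ATT, CGG and ACC.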
SUMMARY OF THE INVENTION
Our work relates to a novel approach and method for biomolecular code
sequencing. We proceed from the following considerations.
First, we set forth why it is significant and of great utility to have a
biomolecular code sequencing capability. This effort, secondly, can help
elicit problems, difficulties and constraints in an attempt to realize and
effect such a capability. Thirdly, we situate what is of pertinence with
respect to the prior art as it relates to this situation. Finally, we
define the novel method of the present invention, and argue that it
addresses and solves the problems to be overcome in realizing a
qualitatively new method comprising biomolecular code sequencing.
Furthermore, we set the novel method in a position to the prior art,
thereby highlighting its novel and unobvious aspects as well as attesting
to its advantages.
Accordingly, we assume firstly that one somehow has nucleotide sequencing
information, and that this information may be accessed by conventional
computer techniques. Then, once in the computer, nucleotide sequences can
be scanned (at least theoretically, in some cases) inter alia for RNA
synthesis, a presence of inverted palindromes, preferred segments of
potential Z-DNA (alternating purine and pyrimidine stretches), homologies
to other known DNA sequences, mutation detection, genotyping, genetic
database comparing, or large-scale supersequencing specifying a human
genome by way of its component nucleotides and their location with respect
to the entire genome.
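Two of the scans listed above can be sketched as follows, assuming the sequence is already available as a plain string of A, C, G and T. The function names and the fixed-window search are illustrative assumptions, not part of the disclosed method.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def reverse_complement(seq):
    """Complement each base and reverse the strand direction."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def inverted_palindromes(seq, length):
    """Start positions of windows that equal their own reverse complement."""
    return [i for i in range(len(seq) - length + 1)
            if seq[i:i + length] == reverse_complement(seq[i:i + length])]


def alternating_stretches(seq, min_len):
    """(start, end) spans in which purines and pyrimidines strictly alternate."""
    spans, start = [], 0
    for i in range(1, len(seq) + 1):
        # A run ends at the sequence end or where two neighbours share a class.
        if i == len(seq) or (seq[i] in PURINES) == (seq[i - 1] in PURINES):
            if i - start >= min_len:
                spans.append((start, i))
            start = i
    return spans
```

For instance, inverted_palindromes("TTGAATTCAA", 6) locates the GAATTC window at position 2, and alternating_stretches("AACGCGCGTT", 6) returns the span covering ACGCGCGT.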
It is believed that this recital makes self-evident the significance and
utility of a biomolecular code sequencing capability. At the same time, it
provokes outstanding difficulties, problems and constraints implicit in an
hypothesized method for effecting such a sequencing capability. For
example, a genome comprises approximately 10⁹ nucleotides and has an
average length of approximately 0.6 m, and a single nucleotide has an
average length of approximately 1 to 2 angstroms. A candidate
methodology must at least, therefore, somehow be able to resolve one
nucleotide from an adjacent nucleotide, presumably without damage to the
nucleotide, and resolve significant numbers of such nucleotides with
precision and accuracy and within a meaningful time span.
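The scale of the resolution problem can be checked with a short calculation. The sketch below assumes the figure of 3 × 10⁹ nucleotides quoted in the Background for a human cell and the upper end of the 1 to 2 angstrom range; with 10⁹ nucleotides the same arithmetic gives 0.1 to 0.2 m instead.

```python
ANGSTROM_M = 1e-10  # one angstrom expressed in metres


def contour_length_m(n_nucleotides, spacing_angstrom):
    """End-to-end length of a fully stretched strand, in metres."""
    return n_nucleotides * spacing_angstrom * ANGSTROM_M


# Assumed inputs: 3e9 nucleotides, 2 angstroms per nucleotide.
genome_length_m = contour_length_m(3e9, 2.0)

# Ratio between the whole strand and one nucleotide: the number of
# resolvable positions a candidate methodology must distinguish.
positions = genome_length_m / (2.0 * ANGSTROM_M)
```

Under these assumptions the stretched length is about 0.6 m, which must be resolved in steps of roughly 2 angstroms.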
Two important and representative prior art methodologies that are pertinent
to this situation comprise separation techniques including gel
electrophoresis and free-solution electrophoresis.
Gel electrophoresis requires a physical separation of DNA fragments
produced during a sequencing reaction. Instruction on conventional gel
electrophoresis may be found in (1) J. Sambrook, E. F. Fritsch, T.
Maniatis, "Molecular Cloning: A Laboratory Manual" (Cold Spring Harbor
Laboratory, N.Y. 1989), (2) A. T. Bankier and B. G. Barrell, "Nucleic Acids
Sequencing: A Practical Approach", Eds. E. M. Howe, C. I. Rowlings, IRL
Press, Oxford 1989, pp. 37-73, which instruction is incorporated by
reference herein.
In overview, gel electrophoresis methodology typically comprises the steps
of: (1) fragmenting a DNA strand to be sequenced into a series starting
from the same point on the strand, each fragment differing in length from
the next by one nucleotide; (2) labelling each fragment with e.g.,
fluorescent tags which can fluoresce at different colours depending on the
end base (A, T, C or G); (3) doing gel electrophoresis for sequentially
separating the fragments into bands of decreasing molecular size; and (4)
using a suitable detection means for determining the end label of each
band.
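The four steps above can be sketched as a toy simulation. It is an idealized illustration only: every fragment is assumed to be produced and labelled perfectly, and band order is assumed to follow fragment length exactly. The function names are hypothetical.

```python
import random


def make_ladder(strand):
    """Step 1: nested fragments sharing a start point, each one base longer."""
    return [strand[:n] for n in range(1, len(strand) + 1)]


def label(fragment):
    """Step 2: tag a fragment by its end base; keep its length as the band size."""
    return (len(fragment), fragment[-1])


def read_sequence(labelled):
    """Steps 3 and 4: order the bands by size, then read the end labels in turn."""
    return "".join(base for _, base in sorted(labelled))


strand = "ATTCGGACC"
bands = [label(f) for f in make_ladder(strand)]
random.Random(0).shuffle(bands)  # fragments start out as an unordered mixture
recovered = read_sequence(bands)
```

Reading the end labels from the shortest band to the longest recovers the original strand, which is why the separation in step (3) has to resolve fragments differing by a single nucleotide.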
To this end, present gel electrophoresis methodology relies on a dispersion
in the mobility of the DNA molecules with length to separate and effect
bands in an electric field. Gel electrophoresis methodology, as it is
presently understood, is therefore disadvantageously limited to
approximately 700 bases (nucleotides) because there is a saturation in
the dispersion for molecular lengths longer than 700 nucleotides. Further,
due to the low dispersion and mobility, it takes several hours to achieve
the separation of 700 nucleotides. It is true that this speed can be
marginally increased by having several lanes (up to, say, 36) sequencing
different portions of a strand.
An important advantage of the present invention is that, notwithstanding
the present difficulties or deficiencies of gel electrophoresis, as just
noted, it is able to offset or remedy these limitations, so that as
modified or re-evaluated from the standpoint of the present invention, gel
electrophoresis can provide a significantly enhanced utility. This
advantage comes about in the following way.
The present invention includes a method which can resolve at least a
portion of a biomolecule specifically distinguishable against chemically
complex backgrounds. In one embodiment, the present invention can be used
for determining a code sequence of large duplex DNA molecules in
polyacrylamide gels using conventional electrophoretic equipment.
In explanation of this advantage, we note that a critical parameter that
may limit the performance of present gel-based techniques is a
band-broadening of DNA sequencing reactions, as they are separated through
a fixed distance of gel at continuous field strengths, often ranging from
50-400 V/cm. The size-dependence of band widths may be a result of various
mechanisms of reorientation and migration of the nucleic acid fragments in
the gel, such as diffusion and thermal gradient broadenings.
Now, when a sample biomolecule migrates through a polymer solution
chemically cross-linked, such as polyacrylamide or agarose gels, an
overall friction coefficient can become a complicated function of the pore
size in the gel, the size of the sample and the electric field strength,
thereby limiting resolution.
Several approaches based upon the use of capillaries or pulsed fields can
partially overcome this limit of resolution (C. R. Cantor et al.,
"Pulsed-field gel electrophoresis of very large DNA molecules", Annual Review
of Biophysics and Biophysical Chemistry, vol. 17, 287, 1988).
A spatial resolution of the detection system may also be a source of band
broadening, relying on the fact that a detector does not interrogate an
infinitely thin section of the sample as it reaches a finite detection
volume, thereby precluding single nucleotide resolution. Present
confocal-fluorescence microscopes typically provide a far field detection
system to interrogate either capillaries or slab gels with a limiting
sensitivity, defined as a signal-to-noise ratio of 1, or about 10⁻¹⁷
mole of fluorescently labeled DNA per band and a spatial resolution
ranging from 10 μm (Smith L. M., et al., Nature, vol. 321, 12 June, 1986).
Based upon several theoretical approaches of band broadening in sequencing
analysis by gel electrophoresis (Y. F. Chen et al., Anal Chem., 62,
496-503, 1990), a theoretical peak width of a band may be determined to be
a complex function of starting conditions (i.e., injection time and
volume), detection (spot size of the focused laser beam), diffusion and
thermal gradient variances.
Now, starting conditions begin with an injection process.
During an injection process, which comprises loading biomolecules in the
gel, the biomolecules are not stacked by moving boundaries of buffer
conditions, and the biomolecules therefore enter the gel at different
rates corresponding to their electrophoretic velocity in the gel, thereby
contributing to the net effect on the band width variance. Subsequent
detection of the biomolecule may comprise using a focused laser with a
Gaussian beam profile. For this situation, a standard deviation of the
beam profile can be estimated to be equal to one-half the beam spot. This
yields a detection variance of the form σ² = w²/4, where w
is the spot size. In most conventional equipment, lenses or fiber optics
may be used to focus the laser on the slab gel or filled gel capillary
vessel, but due to an orthogonal direction of the excitation radiation
with the emitted radiation, the numerical aperture of the lens of the
optical detection system may therefore be limited to about 0.20-0.75. For
example, several collinear arrangements for on-column detection in
capillary electrophoresis have been reported using narrower capillaries
and higher numerical aperture, permitting more fluorescence to be
collected, thereby contributing to sensitivity improvement.
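The detection variance stated above follows directly from taking the standard deviation as one-half the spot size. The sketch below computes it and, as an assumption not stated in the text, combines it with the other contributions (injection, diffusion, thermal gradient) by adding variances, which holds only if the contributions are independent. The function names are hypothetical.

```python
import math


def detection_variance(spot_size):
    """Variance of a Gaussian beam whose standard deviation is half the spot size."""
    sigma = spot_size / 2.0
    return sigma ** 2  # equals spot_size**2 / 4


def total_band_sigma(*variances):
    """Band standard deviation, ASSUMING independent contributions whose variances add."""
    return math.sqrt(sum(variances))


# Example: a 10 um spot gives a detection variance of 25 um^2.
var_det = detection_variance(10.0)
```

Because the detection term scales with the square of the spot size, halving the spot reduces this contribution to the band variance by a factor of four.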
In preparation for gel electrophoresis, a sample is loaded in each lane of
a slab gel in a well of typically 0.4 mm × 6 mm, or 2.4 mm²,
whilst for example in a 50 μm capillary, the surface area of the top of
the gel is one thousandth of that in the slab gel, corresponding to about
10⁻⁷ mole of sample in a given band. Accordingly, loading conditions
not taking advantage of sample stacking and optical diffraction threshold
of detection system may be significant sources of band broadening,
affecting resolution.
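The area comparison in the preceding paragraph can be verified numerically. The sketch assumes that the 50 μm figure is the inner diameter of a capillary with a circular cross-section; neither point is stated explicitly in the text.

```python
import math

# Slab gel well, dimensions from the text.
slab_area_mm2 = 0.4 * 6.0

# Capillary: ASSUMED circular bore with a 50 um inner diameter.
capillary_diameter_mm = 50e-3
capillary_area_mm2 = math.pi * (capillary_diameter_mm / 2.0) ** 2

# Fraction of the slab well area presented by the capillary.
area_ratio = capillary_area_mm2 / slab_area_mm2
```

Under these assumptions the capillary presents roughly 0.002 mm² of gel surface, on the order of one thousandth of the slab well area, consistent with the figure quoted.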
In sharp contrast, the procedures and embodiments of the present invention
define innovative approaches to overcoming the above limitations by
employing, in a specific embodiment, a mechanism that can focus sample
bands to the sample dimensions, at least 0.1 micron, and a near-field
detection system that permits spatial resolution beyond the diffraction
limit, thereby extending the limit of concentration detection to at least
the mass of a single molecule.
One way to increase conventional gel electrophoresis low mobility is to use
free-solution electrophoresis. Here, there is no dispersion in mobility
with molecular length (M bases). This is due to the fact that mobility
(velocity divided by electric field) is equivalent to electric charge
divided by friction coefficient, and both electric charge and friction
coefficient scale linearly with molecular length, M. In Mayer et al (Anal.
Chem. 1994, 66, 1777-1780), there is a proposal for attaching a large
molecule at the end of each fragment in order to add a constant friction
contribution to each. In this way, mobility is no longer independent of
the number of bases. Theoretical calculations based on this reference
suggest that dispersion can allow one to separate 3000 nucleotides in five
minutes, in a best case comprising a far field detection limit.
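The scaling argument above can be made concrete with a small sketch in arbitrary units. It assumes, for illustration only, that the attached end molecule is uncharged and contributes a fixed friction term; the parameter names and values are hypothetical and are not taken from the cited reference.

```python
def free_solution_mobility(m_bases, charge_per_base=1.0, friction_per_base=1.0):
    """Charge over friction; both scale linearly with M, so M cancels."""
    return (charge_per_base * m_bases) / (friction_per_base * m_bases)


def end_labelled_mobility(m_bases, label_friction,
                          charge_per_base=1.0, friction_per_base=1.0):
    """Same ratio with a constant friction term added by the end label (assumed uncharged)."""
    return (charge_per_base * m_bases) / (friction_per_base * m_bases + label_friction)


# Without a label, fragments of 100 and 1000 bases move identically.
unlabelled = [free_solution_mobility(m) for m in (100, 1000)]

# With a label of friction 50 (arbitrary units), mobility depends on length.
labelled = [end_labelled_mobility(m, 50.0) for m in (100, 1000)]
```

In this model the labelled mobility rises towards the unlabelled value as M grows, so the length dependence is strongest for short fragments and weakens for long ones.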
Finally, we reference in passing proposed advanced technologies comprising
large-scale automated DNA sequencing methodologies, namely, applying mass
spectrometry to fast sequencing DNA, or sequencing by hybridization. See
references 1) R. J. Lewis et al, J. Am. Chem. Soc., 113, 9665, 1991 and 2)
R. Drmanac et al, "Sequencing of Megabase Plus DNA by Hybridization:
Theory of the Method" in Genomics, vol. 4, pp. 114-118 (1989),
respectively.
We have now discovered an approach to biomolecular code sequencing which is
qualitatively distinct from the prior art. This different approach is
manifest in a novel method suitable for identifying a code sequence of at
least a portion of a biomolecule, the method comprising the steps of:
1) using a near-field probe technique for generating a super-resolution
chemical analysis of the portion of a biomolecule; and
2) correlating the chemical analysis with a broad spectral content of a
referent biomolecule for generating a code sequencing.
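Step 2) can be pictured with a minimal sketch in which a measured spectrum is compared against reference spectra and assigned to the best match. The use of a Pearson correlation coefficient is an assumption chosen for illustration, and the reference values below are synthetic placeholders, not measured absorption data.

```python
import math

# SYNTHETIC placeholder spectra sampled at five arbitrary wavelengths.
REFERENCE = {
    "A": [0.1, 0.9, 0.4, 0.2, 0.1],
    "C": [0.8, 0.3, 0.1, 0.5, 0.2],
    "G": [0.2, 0.2, 0.9, 0.6, 0.1],
    "T": [0.1, 0.4, 0.2, 0.3, 0.9],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def identify(measured, references=REFERENCE):
    """Name of the reference spectrum that correlates best with the measurement."""
    return max(references, key=lambda name: pearson(measured, references[name]))


# A noisy copy of the synthetic "G" spectrum.
measured = [0.25, 0.15, 0.85, 0.65, 0.12]
best = identify(measured)
```

Repeating the assignment at each probe position along the strand would produce one symbol per position, that is, a code sequence.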
The present invention as defined can realize several significant
advantages.
First of all, the novel method has an immanent capability for generating
nucleotide sequencing information of such a quality, quantity and
time-responsiveness, that heretofore even merely theorized applications
requiring such information can now become a straightforward reality. For
example, the method can be employed for developing a map that accurately
reflects both individual nucleotide identification (i.e., A, G, C and T)
and the location of an individual nucleotide with respect to a strand of
arbitrary length, including an entire genome.
In this sense, moreover, the novel method can evince a remarkable
versatility, since it may be selectively and variously employed e.g., in
dependent steps, for:
1) identifying a first nucleotide from a second (adjacent) nucleotide;
or
2) locating with respect to an arbitrary strand or to a genome, a location
of an identified nucleotide;
or
3) identifying a first duplet, codon, gene from a second (adjacent) duplet,
codon, gene;
or
4) locating with respect to an arbitrary strand or to a genome, a location
of an identified duplet, codon, gene.
To this end, the novel method has a capability for generating a fast and/or
high throughput code sequence e.g., comprising at least 1000 bases/portion
of biomolecule, preferably at least 100 kilobases/portion of
biomolecule within less than 1 hour, particularly an entire human genome
within less than one day, for example, 3 kilobases in less than 5 minutes.
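The throughput figures above can be converted into minimum read rates. The sketch assumes the human genome size of 3 × 10⁹ nucleotides quoted in the Background; because each time is stated as an upper bound, each computed rate is a lower bound.

```python
def bases_per_second(n_bases, seconds):
    """Minimum sustained read rate needed to finish n_bases within the given time."""
    return n_bases / seconds


rates = {
    "3 kilobases in 5 minutes": bases_per_second(3_000, 5 * 60),
    "100 kilobases in 1 hour": bases_per_second(100_000, 60 * 60),
    # ASSUMED genome size of 3e9 nucleotides, taken from the Background.
    "human genome in 1 day": bases_per_second(3e9, 24 * 60 * 60),
}
```

The three cases correspond to roughly 10, 28 and 35,000 bases per second respectively, which shows how much more demanding the whole-genome figure is than the others.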
Other advantages of the novel method proceed from the following
considerations. An application of the method can generate, for the first
time, nucleotide information of a quality and quantity sui generis. This
information, in turn, can become a centerpiece for new and efficient
approaches to gene testing or drug design, DNA sequence homology or
biomolecular computing.
Other advantages of the novel method are enumerated below.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is illustrated in the accompanying drawing, in which:
FIG. 1 shows an assembly suitable for identifying a code sequence of at
least a portion of a biomolecule, and of utility in realizing the novel
method;
FIG. 2 shows a near-field scanning probe comprising an apertureless
near-field optical microscope;
FIG. 3 provides a schematic for explaining basic concepts about the FIG. 2
apertureless microscope;
FIG. 4 shows a chemical modification of a biomolecule into a sample
preliminary to its interrogation by a near-field scanning probe;
FIG. 5 shows spectroscopic curves for DNA nucleotides;
FIG. 6 shows a spectroscopic curve for an arbitrary biomolecule;
FIG. 7 shows a correlogram based on FIGS. 5, 6;
FIG. 8 shows an assembly employed to realize the invention in a
free-solution embodiment;
FIG. 9 shows a mathematical relationship of biomolecular diffusion actions
in a FIG. 8 context;
FIG. 10 shows an assembly employed to realize the invention in a gel
embodiment;
and
FIGS. 11-13 show further embodiments and details of systems and assemblies
constructed in accordance with the present invention.
DETAILED DESCRIPTION OF THE INVENTION
In the interests of clarity, the following detailed description of the
invention includes sections which are chiefly or exclusively concerned
with a particular part of the invention. It is to be understood, however,
that the relationship between different parts of the invention is of
significant importance, and the following detailed description should be
read in the light of that understanding. It should also be understood
that, where features of the invention are described in the context of
particular Figures of the drawing, the same description can also be
applied to the invention in general and to the other Figures, insofar as
the context permits.
Section one sets forth sundry definitions and examples of words, phrases or
concepts that may be abstracted from the summarized invention, or may be
used to reference preferred embodiments of the invention. Section two
provides a conceptual overview of the present invention with special
emphasis on that aspect of the present invention which comprises coupling
a near-field scanning probe technique with interrogation of a biomolecule.
Section three discloses in overview an assembly that may be preferably
used to realize the present invention. In a fourth section, we disclose
particulars of a preferred near-field probe included in the section three
assembly, while in a fifth section entitled "Chemistry", we disclose
preferred techniques for preparing a biomolecule for sequencing. Section
six, entitled "Correlation", builds on the previous sections, and
discloses how the invention can correlate a chemical analysis of an
arbitrary biomolecule with spectroscopic data of a known such biomolecule.
Sections seven and eight are dedicated respectively to preferred
realizations of the present method in free-solution and gel. Section 9,
finally, builds on the previous sections and discloses further assembly
and system details.
I. Definitions
1. "a code sequence": In reference to a biomolecule, a code sequence means
the order of the basic building blocks of a macromolecule or equivalent
chemical compound, for example, amino acids for peptides, nucleotides for
nucleic acids or a sugar residue for carbohydrates. A code sequence may
comprise a map that is 1 to 1 congruent with a portion of a biomolecule
i.e., endomorphic, or alternatively, may be isomorphic with respect to the
portion. To illustrate this point: assume that an arbitrary nucleotide
string comprises AAGCATATCG. Then, an endomorphic code sequence consists
of AAGCATATCG, while an isomorphic code sequence may comprise alternative
nucleotides i.e., ACTTG.
2. "a portion of a biomolecule": A biomolecule comprises polymeric
macromolecules. The present method may be used to interrogate the code
sequence of an entire macromolecule, or at least a preselected portion of
a macromolecule. For example, the method may be used to interrogate the
code sequence of a fragment of DNA.
3. "electrophoresis" comprises a separation of molecules on the basis of
their net electrical charge. For purposes of the present invention,
electrophoresis may be carried out, e.g., in a gel or preferably in a
free-solution.
4. "near-field probe techniques": near-field probe techniques can provide a
measurement modality capable of resolution of a sample beyond the
diffraction limit and capable of atomic resolution imaging. In brief, the
technique may comprise placing a subwavelength-sized probe within tens of
nanometers of the sample. Travelling over such short distances, radiation
has no opportunity to diffract and take on its asymptotic far-field
characteristics--hence the name "near-field". Note that a suitable probe
may comprise a sharp metallic tip or an uncoated silicon and/or silicon
nitride tip, or a tip coated with a conductive layer or a molecular
system. A near-field probe capability may be realized by e.g., a scanning
tunneling microscope (STM), an atomic force microscope (AFM), an aperture
or apertureless near-field optical microscope, a near-field acoustic
microscope, a thermal microscope or a magnetic force microscope (MFM). The
notion of "scanning" references the fact that probe and biomolecule may be
in relative motion. Reference may be made for example to U.S. Pat. Nos.
5,319,977; 4,343,993; 5,003,815; 4,941,753; 4,947,034; 4,747,698 and Appl.
Phys. Lett. 65(13), Sep. 26, 1994. The disclosures of each of these
patents and publications are incorporated herein by reference.
5. "super-resolution chemical analysis" comprises a recognition of a
chemical species e.g., at least a portion of a biomolecule, by analyzing a
molecular specificity of its spectra or parts thereof, preferably by using
spatially resolved spectroscopy with physical methods, for example,
near-field microscopic techniques.
6. "broad spectral content of a biomolecule" means a characterization of
spectra e.g., absorption or emission or thermal or magnetic properties of
a pre-defined analyte when it is preferably interrogated by a tuned
excitation radiation source with a frequency specific to an analyte being
monitored, ranging over the x-ray, UV, visible, IR or microwave regions of
the spectrum.
II. Conceptual Overview of Present Invention
As alluded to above, the present invention comprises coupling a near-field
probe technique with interrogation of at least a portion of a biomolecule
to an end of generating a super-resolution chemical analysis of a portion
of the biomolecule under interrogation, and correlating the chemical
analysis with a broad spectral content of a referent biomolecule for
generating a precise code sequencing.
If even theoretically contemplated, it is not technically known or obvious
outside of the present instruction, how one may effect the required
coupling. Restated, the desired result i.e., the precise code sequencing,
cannot in fact be effected by some sort of nominal juxtaposition of a
near-field probe and a biomolecule (cf. imaging). We note that the reason
for this is that a putative such attempt simply results in a blurred and
information-less output signal.
The present invention addresses and solves this problem by way of preferred
novel assemblies suitable for identifying a code sequence of at least a
portion of a biomolecule. Various preferred embodiments of these
assemblies are disclosed below.
III. Overview of Physical Components of Invention As An Assembly
Attention is now directed to FIG. 1, which shows a schematic overview 10 of
physical components that preferably may be assembled in realization of the
present invention, in particular, for distinguishing a biomolecule 12
against a chemically complex background solution 14.
The biomolecule 12 can migrate beneath an interrogating and preferably
movable i.e., scanning (see arrows) near-field probe 16. Note that FIG. 1
shows one such near-field probe. However, expediencies of interrogation
may be realized by suitably ganging a plurality of near-field probes. Note
that the near-field probe 16 can function as an excitation source, or
alternatively, an external excitation source (see FIGS. 8, 10, 12, 13
infra) can be used.
A resultant interrogation signal 18 from the near-field probe 16 may be
detected by a detector 20, comprising, for example, a conventional
spectrometer e.g., an interferometric system. The detector 20 can generate
a detection signal 22 for storage and processing on a computer 24. For
example, an IBM RS 6000 may be programmed for interpreting a sequence of
building blocks of a biomolecule comprising amino acids in a case of
proteins, or nucleotides in a case of nucleic acids.
Note in FIG. 1 that the biomolecule 12 is initially loaded in a container
26 comprising the solution 14, preferably using a stretching procedure
comprising external radiation such as a magnetic field 28, and a specific
positioning of the biomolecule 12 to a support 30. This arrangement can
facilitate an efficient immobilization and stretching of the biomolecule
12, for example, before and during the migration rate, by way of an
applied electric field generated by a power source 32. These points are
amplified below, in section V entitled "chemistry".
IV. Preferred Near-Field Probe and Detection
FIG. 1 indicates the employment of a near-field probe 16 and an excitation
source and a detector 20. Further information on preferred such devices is
now set forth by way of FIGS. 2, 3.
A preferred apertureless near-field scanning probe microscope and detector
34 are shown respectively in FIGS. 2, 3. The apertureless near-field
scanning microscope is preferred because, among other reasons, its
capability of measuring absorption properties of a sample can be extended
to a spatial resolution in the sub-nanometer regime, thereby realizing
single nucleotide resolution (cf. nucleotide length approximately 1 to 2
angstroms). We note that aperture based systems can also be used, at lower
resolution e.g., approximately λ/40. In particular, the FIGS. 2, 3
microscope comprises an apertureless near-field optical microscope wherein
a light source preferably emits spherical light scattering from a sharp
tip, rather than light transmitted through a fine aperture.
An understanding of the operation of the FIGS. 2, 3 apertureless
microscope 34 is now provided by first summarizing its
mechanical-physical components, and then disclosing its theory of
operation.
The microscope preferably includes a high numerical aperture Nomarski
objective 36 (e.g., liquid immersion objective) that may be used to form
two diffraction limited spots at the far surface of a transparent
substrate e.g., a glass cover slip 38. A sharp silicon tip 40 of an A