BACKGROUND OF THE INVENTION
The present invention relates to the determination of the sequences of
polymers immobilized to a substrate. In particular, one embodiment of the
invention provides a method and apparatus for sequencing many nucleic acid
sequences immobilized at distinct locations on a matrix surface. The
principles and apparatus of the present invention may be used, for
example, also in the determination of sequences of peptides, polypeptides,
oligonucleotides, nucleic acids, oligosaccharides, phospholipids and other
biological polymers. It is especially useful for determining the sequences
of nucleic acids and proteins.
The structure and function of biological molecules are closely
interrelated. The structure of a biological polymer, typically a
macromolecule, is generally determined by its monomer sequence. For this
reason, biochemists historically have been interested in the sequence
characterization of biological macromolecule polymers. With the advent of
molecular biology, the relationship between a protein sequence and its
corresponding encoding gene sequence is well understood. Thus,
characterization of the sequence of a nucleic acid encoding a protein has
become very important.
Partly for this reason, the development of technologies providing the
capability for sequencing enormous amounts of DNA has received great
interest. Technologies for this capability are necessary, for example, for
the successful completion of the human genome sequencing project.
Structural characterization of biopolymers is very important for further
progress in many areas of molecular and cell biology.
While sequencing of macromolecules has become extremely important, many
aspects of these technologies have not advanced significantly over the
past decade. For example, in the protein sequencing technologies being
applied today, the Edman degradation methods are still being used. See,
e.g., Knight (1989) "Microsequencers for Proteins and Oligosaccharides,"
Bio/Technol. 7:1075-1076. Although advanced instrumentation for protein
sequencing has been developed, see, e.g., Frank et al. (1989) "Automation
of DNA Sequencing Reactions and Related Techniques: A Work Station for
Micromanipulation of Liquids," Bio/Technol. 6:1211-1213, this technology
utilizes a homogeneous and isolated protein sample for determination of
removed residues from that homogeneous sample.
Likewise, in nucleic acid sequencing technology, three major methods for
sequencing have been developed, of which two are commonly used today. See,
e.g., Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d
Ed.) Vols. 1-3, Cold Spring Harbor Press, New York, which is hereby
incorporated herein by reference. The first method was developed by Maxam
and Gilbert. See, e.g., Maxam and Gilbert (1980) "Sequencing End-Labeled
DNA with Base-Specific Chemical Cleavages," Methods in Enzymol.
65:499-560, which is hereby incorporated herein by reference. The polymer
is chemically cleaved with a series of base-specific cleavage reagents
thereby generating a series of fragments of various lengths. The various
fragments, each resulting from a cleavage at a specific base, are run in
parallel on a slab gel which resolves nucleic acids which differ in length
by single nucleotides. A specific label allows detection of cleavages at
all nucleotides relative to the position of the label.
This separation requires high resolution electrophoresis or some other
system for separating nucleic acids of very similar size. Thus, the target
nucleic acid to be sequenced must usually be initially purified to near
homogeneity.
Sanger and Coulson devised two alternative methods for nucleic acid
sequencing. The first method, known as the plus and minus method, is
described in Sanger and Coulson (1975) J. Mol. Biol. 94:441-448, and has
been replaced by the second method. Subsequently, Sanger and Coulson
developed another improved sequencing method known as the dideoxy chain
termination method. See, e.g., Sanger et al. (1977) "DNA Sequencing with
Chain-Termination Inhibitors," Proc. Natl. Acad. Sci. USA 74:5463-5467, which
is hereby incorporated herein by reference. This method is based on the
inability of 2', 3' dideoxy nucleotides to be elongated by a polymerase
because of the absence of a 3' hydroxyl group on the sugar ring, thus
resulting in chain termination. Each of the separate chain terminating
nucleotides is incorporated by a DNA polymerase, and the resulting
terminated fragment is known to end with the corresponding dideoxy
nucleotide. However, both of the Sanger and Coulson sequencing techniques
usually require isolation and purification of the nucleic acid to be
sequenced and separation of nucleic acid molecules differing in length by
single nucleotides.
Both the polypeptide sequencing technology and the oligonucleotide
sequencing technologies described above suffer from the requirement to
isolate and work with distinct homogeneous molecules in each
determination.
In the polypeptide technology, the terminal amino acid is sequentially
removed and analyzed. However, the analysis is dependent upon only one
single amino acid being removed, thus requiring the polypeptide to be
homogeneous.
In the case of nucleic acid sequencing, the present techniques typically
utilize very high resolution polyacrylamide gel electrophoresis. This high
resolution separation uses both highly toxic acrylamide for the separation
of the resulting molecules and usually very high voltages in running the
electrophoresis. Both the purification and isolation techniques are highly
tedious, time consuming and expensive processes.
Thus, a need exists for the capability of simultaneously sequencing many
biological polymers without individual isolation and purification.
Moreover, dispensing with the need to individually perform the high
resolution separation of related molecules leads to greater safety, speed,
and reliability. The present invention solves these and many other
problems.
SUMMARY OF THE INVENTION
The present invention provides the means to sequence hundreds, thousands or
even millions of biological macromolecules simultaneously and without
individually isolating each macromolecule to be sequenced. It also
dispenses with the requirement, in the case of nucleic acids, of
separating the products of the sequencing reactions on dangerous
polyacrylamide gels. Because the methods are adaptable to automation, the
cost and effort required in sequence analysis will be dramatically reduced.
This invention is most applicable, but not limited, to linear
macromolecules. It also provides specific reagents for sequencing both
oligonucleotides and polypeptides. It provides an apparatus for automating
the processes described herein.
The present invention provides methods for determining the positions of
polymers which terminate with a given monomer, where said polymers are
attached to a surface having a plurality of positionally distinct polymers
attached thereto, said method comprising the steps of:
labeling a terminal monomer in a monomer type specific manner; and
scanning said surface, thereby determining the positions of said label. In
one embodiment, the polymers are polynucleotides, and usually the labeling
of the terminal monomer comprises incorporation of a labeled terminal
monomer selected from the group of nucleotides consisting of adenine,
cytidine, guanosine and thymidine.
An alternative embodiment provides methods for concurrently determining
which subset of a plurality of positionally distinct polymers attached to
a solid substrate at separable locations terminates with a given terminal
subunit, said method comprising the steps of:
mixing said solid substrate with a solution comprising a reagent, which
selectively marks positionally distinct polymers which terminate with said
given terminal subunit; and
determining with a detector which separable locations are marked, thereby
determining which subset of said positionally distinct polymers terminated
with said given terminal subunit. In one version, the solution comprises a
reagent which marks the positionally distinct polymer with a fluorescent
label moiety. In another version the terminal subunit is selected from the
group consisting of adenosine, cytosine, guanosine, and thymine.
Methods are also provided for determining which subset of a plurality of
primer polynucleotides have a predetermined oligonucleotide, wherein the
polynucleotides are complementary to distinctly positioned template
strands which are attached to a solid substrate, said method comprising
the steps of:
selectively marking said subset of primer polynucleotides having the
predetermined oligonucleotide; and
detecting which polynucleotides are marked. In one embodiment, the
oligonucleotide subunit is a single nucleotide; in another the marking
comprises elongating said primer with a labeled nucleotide which is
complementary to a template; and in a further embodiment the marking step
uses a polymerase and a blocked and labeled adenine.
The invention embraces methods for concurrently obtaining sequence
information on a plurality of polynucleotides by use of a single label
detector, said method comprising the steps of:
attaching a plurality of positionally distinct polynucleotides to a solid
substrate at separable locations;
labeling said plurality of polynucleotides with a terminal nucleotide
specific reagent, said label being detectable using said label detector; and
determining whether said specific labeling reagent has labeled each
separable location. Often, the labeling is performed with reagents which
can distinguishably label alternative possible nucleotide monomers. One
embodiment uses four replica substrates each of which is labeled with a
specific labeling reagent for adenine, cytosine, guanine, or thymine.
Usually, the labeling and determining steps are performed in succession
using reagents specific for each of adenine, cytosine, guanine, and
thymine monomers.
An alternative embodiment provides methods for concurrently obtaining
sequence information on a plurality of polynucleotides, said method
comprising the steps of:
attaching distinct polynucleotides to a plurality of distinct solid
substrates;
labeling said plurality of solid substrates with a terminal nucleotide
specific labeling reagent; and
determining whether said specific labeling reagent has labeled each
distinct substrate. The method can be performed using a continuous flow of
distinct solid substrates through a reaction solution.
A method is provided for simultaneously sequencing a plurality of polymers
made up of monomer units, said plurality of polymers attached to a
substrate at definable positions, said method comprising the steps of:
mixing said substrate with a reagent which specifically recognizes a
terminal monomer, thereby providing identification among various terminal
monomer units; and
scanning said substrate to distinguish signals at definable positions on
said substrate; and
correlating said signals at defined positions on said substrate to provide
sequential series of sequence determinations. Often, the plurality of
polymers are synthesized by a plurality of separate cell colonies, and the
polymers may be attached to said substrate by a carbonyl linkage. In one
embodiment, the polymers are polynucleotides, and often the substrate
comprises silicon. The scanning will often identify a fluorescent label.
In one embodiment, the reagent exhibits specificity of removal of terminal
monomers, in another, the reagent exhibits specificity of labeling of
terminal monomers.
The invention also embraces methods for sequencing a plurality of
distinctly positioned polynucleotides attached to a solid substrate
comprising the steps of:
hybridizing complementary primers to said plurality of polynucleotides;
elongating a complementary primer hybridized to a polynucleotide by adding
a single nucleotide; and
identifying which of said complementary primers have incorporated said
nucleotide. In some versions, the elongating step is performed
simultaneously on said plurality of polynucleotides linked to said
substrate. Typically, the substrate is a two dimensional surface and the
identifying results from a positional determination of the complementary
primers incorporating the single defined nucleotide. A silicon substrate
is useful in this method.
Methods are provided where the linking is by photocrosslinking a
polynucleotide to said complementary primer, where said primer is attached
to said substrate. The elongating will often be catalyzed by a DNA
dependent polymerase. In various embodiments, a nucleotide will have a
removable blocking moiety to prevent further elongation, e.g., NVOC.
A nucleotide with both a blocking moiety and labeling moiety will often be
used.
A further understanding of the nature and advantages of the invention
herein may be realized by reference to the remaining portions of the
specification and the attached drawings.
BRIEF DESCRIPTION OF THE FIGURES
FIGS. 1A-D illustrate a simplified and schematized embodiment of a
degradative scheme for polymer sequencing.
FIGS. 2A-D illustrate a simplified and schematized embodiment of a
synthetic scheme for polymer sequencing.
FIG. 3 illustrates a coordinate mapping system of a petri plate containing
colonies. Each position of a colony can be assigned a distinct coordinate
position.
FIGS. 4A-C illustrate various modified embodiments of the substrates.
FIGS. 5A-B illustrate an idealized scanning result corresponding to a
particular colony position.
FIG. 6 illustrates particular linkers useful for attaching a nucleic acid
to a silicon substrate. Note that thymine may be substituted by adenine,
cytosine, guanine, or uracil.
FIG. 7 illustrates an embodiment of the scanning system and reaction
chamber.
FIG. 8 illustrates the application of the synthetic scheme for sequencing
as applied to a nucleic acid cluster localized to a discrete identified
position. FIG. 8A illustrates schematically, at a molecular level, the
sequence of events which occur during a particular sequencing cycle. FIG.
8B illustrates, in a logic flow chart, how the scheme is performed.
FIG. 9 illustrates the synthesis of a representative nucleotide analog
useful in the synthetic scheme. Note that the FMOC may be attached to
adenine, cytosine, or guanine.
FIG. 10 illustrates the application of the degradative scheme for
sequencing as applied to a nucleic acid cluster localized to a discrete
identified position. FIG. 10A illustrates schematically, at a molecular
level, the sequence of events which occur during a particular sequencing
cycle. FIG. 10B illustrates in a logic flow chart how the scheme is
performed.
FIG. 11 illustrates a functionalized apparatus for performing the scanning
steps and sequencing reaction steps.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
I. Sequencing Procedure for a Generic Polymer
A. Overview
1. Substrate and matrix
2. Scanning system
3. Synthetic/degradative cycles
4. Label
5. Utility
B. Substrate/Matrix
1. Non-distortable
2. Attachment of polymer
C. Scanning system
1. Mapping to distinct position
2. Detection system
3. Digital or analog signal
D. Synthetic or degradative cycle
1. Synthetic cycles
a. synthetic scheme
b. blocking groups
2. Degradative cycles
3. Conceptual principles
E. Label
1. Attachment
2. Mode of detection
F. Utility
II. Specific Embodiments
A. Synthetic method
B. Chain degradation method
III. Apparatus
I. Sequencing Procedure for a Generic Polymer
The present invention provides methods and apparatus for the preparation
and use of a substrate having a plurality of polymers with various
sequences where each small defined contiguous area defines a small cluster
of homogeneous polymer sequences. The invention is described herein
primarily with regard to the sequencing of nucleic acids but may be
readily adapted to the sequencing of other polymers, typically linear
biological macromolecules. Such polymers include, for example, both linear
and cyclical polymers of nucleic acids, polysaccharides, phospholipids,
and peptides having various different amino acids, heteropolymers in which
the polymers are mixed, polyurethanes, polyesters, polycarbonates,
polyureas, polyamides, polyethyleneimines, polyarylene sulfides,
polysiloxanes, polyimides, polyacetates or mixed polymers of various
sorts. In a preferred embodiment, the present invention is described as
used for sequencing nucleic acids.
Various aspects of the patents and applications in the cross reference
above are applicable to the substrates and matrix materials described
herein, to the apparatus used for scanning the matrix arrays, to means for
automating the scanning process, and to the linkage of polymers to a
substrate.
A. Overview
The present invention is based, in part, on the ability to perform a step
wise series of reactions which either extend or degrade a polymer by
defined units.
FIG. 1 schematizes a simplified linear two monomer polymer made up of A
type and B type subunits. A degradative scheme is illustrated. Panel A
depicts a matrix with two different polymers located at positions 10 and
14, but with no polymer linked at position 12. A reaction is employed to
label all of these polymers at the terminus opposite the point of
attachment. Panel B illustrates a label (designated by an asterisk)
incorporated at position 16 on the terminal monomers. A scan step is
performed to locate positions 10 and 14 where polymers have been linked,
but no polymer is located at position 12. The entire matrix is exposed to
a reagent which is specific for removing single terminal A monomers, which
are also labeled. The reagent is selected to remove only a single monomer;
it will not remove further A monomers. Removal of the labeled A monomer
leaves a substrate as illustrated in panel C. A scan step is performed and
compared with the previous scan, indicating that the polymer located at
position 10 has lost its label, i.e., that polymer at 10 terminated with an
A monomer. The entire matrix is then exposed to a second reagent which is
specific for removing terminal B monomers which are also labeled. Note
that only a single B on each polymer is removed and that successive B
monomers are not affected. Removal of the labeled B monomer leaves a
substrate as illustrated in panel D. Another scan step is performed,
indicating that the polymer located at position 14 has lost its label,
i.e., it terminated with a B monomer. The sequence of treatments and scans
is repeated to determine the successive monomers. It will be recognized
that if the labeled A and B are distinguishable, i.e., the label on
polymers at sites 10 and 14 may be distinguished, a single removal step
can be performed to convert the substrate as illustrated in panel B
directly to that illustrated in panel D.
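The degradative cycle described for FIG. 1 may be sketched in code as follows. This is a minimal illustration only, assuming a two-monomer alphabet and idealized reagents that each remove exactly one labeled terminal monomer of one type. The function names `scan` and `degradative_sequence`, and the representation of the matrix as a mapping from position to polymer string, are hypothetical and form no part of the specification.

```python
def scan(labeled):
    """Return the set of matrix positions that currently carry a label."""
    return {pos for pos, has_label in labeled.items() if has_label}


def degradative_sequence(matrix, monomer_types=("A", "B")):
    """Read each attached polymer from its free terminus inward.

    matrix maps a position to the polymer attached there, written so that
    the last character is the free (labeled) terminus. An empty string
    means that no polymer is attached at that position.
    """
    polymers = dict(matrix)
    reads = {pos: "" for pos, p in polymers.items() if p}
    while any(polymers.values()):
        # Label every remaining terminal monomer and take a baseline scan.
        labeled = {pos: bool(p) for pos, p in polymers.items()}
        before = scan(labeled)
        removed_any = False
        for m in monomer_types:
            # Idealized reagent: removes one terminal monomer of type m,
            # and only if that monomer still carries its label.
            for pos, p in polymers.items():
                if labeled[pos] and p.endswith(m):
                    polymers[pos] = p[:-1]
                    labeled[pos] = False
                    removed_any = True
            # Positions that lost their label terminated with monomer m.
            after = scan(labeled)
            for pos in before - after:
                reads[pos] += m
            before = after
        if not removed_any:
            raise ValueError("a terminal monomer matched no reagent")
    return reads


# Hypothetical matrix: polymers at positions 10 and 14, none at position 12.
print(degradative_sequence({10: "BA", 12: "", 14: "AB"}))
```

In this sketch a newly exposed terminus is unlabeled and so is not attacked by the next reagent in the same cycle, which mirrors the requirement that only a single monomer be removed per treatment.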
An alternative embodiment employs synthetic reactions where a synthetic
product is made at the direction of the attached polymer. The method is
useful in the synthesis of a complementary nucleic acid strand by
elongation of a primer as directed by the attached polymer.
FIG. 2 illustrates a similar simplified polymer scheme, where the A and B
monomer provide a complementary correspondence to A' and B' respectively.
Thus, an A monomer directs synthetic addition of an A' monomer and a B
monomer directs synthetic addition of a B' monomer. Panel A depicts
polymers attached at locations 18 and 22, but not at location 20. Each
polymer already has one corresponding complementary monomer A'. The
matrix, with polymers, is subjected to an elongation reaction which
incorporates, e.g., single labeled A' monomers 24 but not B' monomers, as
depicted in panel B. The label is indicated by the asterisk. Note that
only one A' monomer is added. A scan step is performed to determine whether
polymers located at positions 18 or 22 have incorporated the labeled A'
monomers. The polymer at position 18 has, while the polymer at position 22
has not. Another elongation reaction which incorporates labeled B'
monomers 26 is performed resulting in a matrix as depicted in panel C.
Again note that only one, and not successive B' monomers, is added.
Another scan is performed to determine whether a polymer located at sites
18 or 22 has incorporated a labeled B' monomer, and the result indicates
that the polymer located at site 22 has incorporated the labeled B'
monomer. A next step removes all of the labels to provide a substrate as
depicted in panel D. As before, if the polymer which incorporated a
labeled A' monomer is distinguishable from a polymer which incorporated a
labeled B' monomer, the separate elongation reactions may be combined
producing a panel C type matrix directly from a panel A type matrix and
the scan procedure can distinguish which terminal monomer was
incorporated.
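The synthetic cycle described for FIG. 2 may likewise be sketched in code. The sketch below is illustrative only. It assumes that each added monomer is both labeled and blocked so that a strand is extended by at most one monomer per cycle, and that the labels and blocking groups are removed at the end of each cycle. The names `COMPLEMENT` and `synthetic_sequence`, and the parameter `already_copied`, are hypothetical.

```python
# Template monomer -> complementary monomer added opposite it.
COMPLEMENT = {"A": "A'", "B": "B'"}


def synthetic_sequence(templates, already_copied=1):
    """Return, per position, the complementary monomers added in order.

    templates maps a position to the attached template polymer, written in
    the order in which it is copied. already_copied is the number of
    template monomers that already have a complementary monomer.
    """
    next_index = {pos: already_copied for pos, t in templates.items() if t}
    reads = {pos: [] for pos in next_index}
    while any(next_index[pos] < len(templates[pos]) for pos in next_index):
        blocked = set()  # strands extended (and blocked) in this cycle
        for template_monomer, added in COMPLEMENT.items():
            seen_before = set(blocked)
            # Elongation with one type of labeled, blocked monomer.
            for pos, i in next_index.items():
                t = templates[pos]
                if pos not in blocked and i < len(t) and t[i] == template_monomer:
                    blocked.add(pos)
            # Scan step: newly labeled positions incorporated this monomer.
            for pos in blocked - seen_before:
                reads[pos].append(added)
        if not blocked:
            raise ValueError("a template monomer has no complement defined")
        # Remove labels and blocking groups; extended strands advance by one.
        for pos in blocked:
            next_index[pos] += 1
    return reads


# Hypothetical matrix: polymers at positions 18 and 22, none at position 20.
print(synthetic_sequence({18: "AAB", 20: "", 22: "ABA"}))
```

With distinguishable labels, the inner loop over monomer types could be collapsed into a single elongation and a single scan per cycle, as noted above.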
It will be appreciated that the process may be applied to more complicated
polymers having more different types of monomers. Also, the number of scan
steps can be minimized if the various possible labeled monomers can be
differentiated by the detector system.
Typically, the units will be single monomers, though under certain
circumstances the units may comprise dimers, trimers, or longer segments
of defined length. In fact, under certain circumstances, the method may be
operable in removing or adding different sized units so long as the units
are distinguishable. However, it is very important that the reagents used
do not remove or add successive monomers. This is achieved in the
degradative method by use of highly specific reagents. In the synthetic
mode, this is often achieved with removable blocking groups which prevent
further elongation.
One important aspect of the invention is the concept of using a substrate
having homogeneous clusters of polymers attached at distinct matrix
positions. The term "cluster" refers to a localized group of substantially
homogeneous polymers which are positionally defined as corresponding to a
single sequence. For example, a coordinate system will allow the
reproducible identification and correlation of data corresponding to
distinct homogeneous clusters of polymers locally attached to a matrix
surface. FIG. 3 illustrates a mapping system providing such a
correspondence, where transfer of polymers produced by a colony of
organisms to a matrix preserves spatial information thereby allowing
positional identification. The positional identification allows
correlation of data from successive scan steps.
In one embodiment, bacterial colonies producing polymers are spatially
separated on the media surface of a petri plate as depicted in panel A.
Alternatively, phage plaques on a bacterial lawn can exhibit a similar
distribution. A portion of panel A is enlarged and shown in panel B.
Individual colonies are labeled C1-C7. The position of each colony can be
mapped to positions on a coordinate system, as depicted in panel C. The
positions of each colony can then be defined, as in a table shown in panel
D, which allows reproducible correlation of scan cycle results.
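By way of illustration, the mapping of colonies to a coordinate system and the resulting table may be sketched as follows. The sketch assumes that each colony is represented by the measured (x, y) location of its center and that a square grid of fixed cell size is an adequate coordinate system. The names `build_position_table` and `signal_by_colony`, the parameter `cell_size`, and the example coordinates are hypothetical.

```python
def build_position_table(colonies, cell_size=1.0):
    """Map each colony label to the (column, row) grid cell containing it."""
    table = {}
    for label, (x, y) in colonies.items():
        cell = (int(x // cell_size), int(y // cell_size))
        if cell in table.values():
            # Two colonies in one cell could not be resolved by the scanner.
            raise ValueError(f"colony {label} shares cell {cell} with another")
        table[label] = cell
    return table


def signal_by_colony(table, scan_result):
    """Look up the scanned signal for each colony from a cell -> signal map."""
    return {label: scan_result.get(cell) for label, cell in table.items()}


# Hypothetical colony centers, in arbitrary plate units.
colonies = {"C1": (0.4, 2.7), "C2": (3.1, 0.2), "C3": (5.9, 4.5)}
table = build_position_table(colonies)
print(table)
```

Because the table is built once and reused, the results of every later scan cycle can be attributed to the same colony.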
Although the preferred embodiments are described with respect to a flat
matrix, the invention may also be applied using the means for correlating
detection results from multiple samples after passage through batch or
continuous flow reactions. For example, spatially separated polymers may
be held in separate wells on a microtiter plate. The polymers will be
attached to a substrate to retain the polymers as the sequencing reagents
are applied and removed.
The entire substrate surface, with homogeneous clusters of polymers
attached at defined positions, may be subjected to batch reactions so the
entire surface is exposed to a uniform and defined sequence of reactions.
As a result, each cluster of target polymers for sequencing will be
subjected to similar reactive chemistry. By monitoring the results of
these reactions on each cluster localized to a defined coordinate
position, the sequence of the polymer which is attached at that site will
be determined.
FIG. 4, panel A illustrates solid phase attached polymers linked to
particles 32 which are individually sequestered in separate wells 34 on a
microtiter plate. The scanning system will separately scan each well. FIG.
4 panel B illustrates marbles 36 to which polymers are attached. The
marbles are automatically fed in a continuous stream through the reaction
reagents 38 and past a detector 40. The marbles may be carefully held in
tubes or troughs which prevent the order of the beads from being
disturbed. In a combination of the two embodiments, each polymer is
attached to a plurality of small marbles, and marbles having each polymer
are separated, but retained in a known order. Each marble is, in batch
with a number of analogous marbles having other polymers linked
individually to them, passed through a series of reagents in the
sequencing system. For example, A2, B2, and C2 are subjected to sequencing
reactions in batch, with label incorporated only for the second monomer.
A3, B3, and C3 are likewise treated to determine the third monomer.
Likewise for An, Bn, and Cn. However, within each batch,
the detection will usually occur in the order A, B, and C, thereby
providing for correlation of successive detection steps for the A polymer
beads, for the B polymer beads, and for the C polymer beads.
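The correlation of successive detection steps in the batch embodiment may be sketched as follows. The sketch assumes that every batch is detected in the same known order of polymers and that each detection yields one monomer identity. The function name `reads_by_polymer` and the example data are hypothetical.

```python
def reads_by_polymer(polymer_order, batch_results):
    """Collect, for each polymer, its result from every batch in sequence.

    polymer_order is the fixed order in which beads pass the detector.
    batch_results holds one list per batch, in that same order.
    """
    for batch in batch_results:
        if len(batch) != len(polymer_order):
            raise ValueError("a batch does not match the known bead order")
    return {
        name: [batch[i] for batch in batch_results]
        for i, name in enumerate(polymer_order)
    }


# Hypothetical results for two batches, each detected in the order A, B, C.
batches = [["x", "y", "z"], ["y", "y", "x"]]
print(reads_by_polymer(["A", "B", "C"], batches))
```

The sketch depends only on the bead order being preserved, which is the purpose of the tubes or troughs described above.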
FIG. 5 illustrates a signal which might result from a particular defined
position. Panel A illustrates the position of a given colony relative to
the positions corresponding to the positional map. The scan system will
typically determine the amount of signal, or type of signal, at each
position of the matrix. The scan system will adjust the relationship of
the detector and the substrate to scan the matrix in a controllable
fashion. An optical system with mirrors or other elements may allow the
relative positions of the substrate and detector to be fixed. The scanner
can be programmed to scan the entire substrate surface in a reproducible
manner, or to scan only those positions where polymer clusters have been
localized. A digital data map, panel B, can be generated from the scan
step.
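The generation of a digital data map from a scan may be sketched as follows. The sketch assumes that the scanner reports an analog intensity for each matrix position and that a single fixed threshold separates labeled from unlabeled positions. The function name `digital_map`, the parameter `threshold`, and the example intensities are hypothetical.

```python
def digital_map(intensities, threshold, positions=None):
    """Convert analog scan intensities to a 0/1 map of labeled positions.

    intensities maps a matrix position to its measured signal. If positions
    is given, only those positions (e.g., known cluster locations) are
    reported; positions with no measurement are treated as unlabeled.
    """
    if positions is None:
        positions = intensities.keys()
    return {
        pos: int(intensities.get(pos, 0.0) >= threshold) for pos in positions
    }


# Hypothetical scan of three positions, thresholded at 0.5.
scan_result = {(0, 0): 0.9, (0, 1): 0.1, (1, 1): 0.6}
print(digital_map(scan_result, threshold=0.5))
```

Restricting the map to known cluster positions corresponds to programming the scanner to visit only those positions where polymer clusters have been localized.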
Thus, instead of subjecting each individual and separated polymer to the
series of reactions as a homogeneous sample, a whole matrix array of
different polymers targeted for sequencing may be exposed to a series of
chemical manipulations in a batch format. A large array of hundreds,
thousands, or even millions of spatially separated homogeneous regions may
be simultaneously treated by defined sequencing chemistry.
The use of a coordinate system which can reproducibly assay a defined
position after each reaction cycle can be advantageously applied according
to this invention. For example, a colony or plaque lift of polymers can be
transferred onto a nitrocellulose filter or other substrate. A scanning
detector system will be able to reproducibly monitor the results of
chemical reactions performed on the target polymers located at the defined
locations of particular clones. An accurate positioning can be further
ensured by incorporating various alignment marks on the substrate.
The use of a high resolution system for monitoring the results of
successive sequencing steps provides the possibility for correlating the
scan results of each successive sequencing reaction at each defined
position.
The invention is dependent, in part, upon the stepwise synthesis or
degradation of the localized polymers as schematized in FIGS. 1 and 2. The
synthetic scheme is particularly useful on nucleic acids which can be
synthesized from a complementary strand. Otherwise, a stepwise degradation
scheme may be the preferred method. Although single monomer cycles of
synthesis or degradation will usually be applicable, in certain cas