FIELD OF THE INVENTION
The invention relates generally to methods for determining the nucleotide
sequence of a polynucleotide, and more particularly, to a method of
step-wise removal and identification of terminal nucleotides of a
polynucleotide.
BACKGROUND
Analysis of polynucleotides with currently available techniques provides a
spectrum of information ranging from the confirmation that a test
polynucleotide is the same as or different from a standard sequence or an
isolated fragment to the express identification and ordering of each
nucleoside of the test polynucleotide. Not only are such techniques
crucial for understanding the function and control of genes and for
applying many of the basic techniques of molecular biology, but they have
also become increasingly important as tools in genomic analysis and a
great many non-research applications, such as genetic identification,
forensic analysis, genetic counseling, medical diagnostics, and the like.
In these latter applications both techniques providing partial sequence
information, such as fingerprinting and sequence comparisons, and
techniques providing full sequence determination have been employed, e.g.
Gibbs et al, Proc. Natl. Acad. Sci., 86: 1919-1923 (1989); Gyllensten et
al, Proc. Natl. Acad. Sci, 85: 7652-7656 (1988); Carrano et al, Genomics,
4: 129-136 (1989); Caetano-Anolles et al, Mol. Gen. Genet., 235: 157-165
(1992); Brenner and Livak, Proc. Natl. Acad. Sci., 86: 8902-8906 (1989);
Green et al, PCR Methods and Applications, 1: 77-90 (1991); and Versalovic
et al, Nucleic Acids Research, 19: 6823-6831 (1991).
Native DNA consists of two linear polymers, or strands of nucleotides. Each
strand is a chain of nucleosides linked by phosphodiester bonds. The two
strands are held together in an antiparallel orientation by hydrogen bonds
between complementary bases of the nucleotides of the two strands:
deoxyadenosine (A) pairs with thymidine (T) and deoxyguanosine (G) pairs
with deoxycytidine (C).
Presently there are two basic approaches to DNA sequence determination: the
dideoxy chain termination method, e.g. Sanger et al, Proc. Natl. Acad.
Sci., 74: 5463-5467 (1977); and the chemical degradation method, e.g.
Maxam et al, Proc. Natl. Acad. Sci., 74: 560-564 (1977). The chain
termination method has been improved in several ways, and serves as the
basis for all currently available automated DNA sequencing machines, e.g.
Sanger et al, J. Mol. Biol., 143: 161-178 (1980); Schreier et al, J. Mol.
Biol., 129: 169-172 (1979); Smith et al, Nucleic Acids Research, 13:
2399-2412 (1985); Smith et al, Nature, 321: 674-679 (1987); Prober et al,
Science, 238: 336-341 (1987); Section II, Meth. Enzymol., 155: 51-334
(1987); Church et al, Science, 240: 185-188 (1988); Hunkapiller et al,
Science, 254: 59-67 (1991); Bevan et al, PCR Methods and Applications, 1:
222-228 (1992).
Both the chain termination and chemical degradation methods require the
generation of one or more sets of labeled DNA fragments, each having a
common origin and each terminating with a known base. The set or sets of
fragments must then be separated by size to obtain sequence information.
In both methods, the DNA fragments are separated by high resolution gel
electrophoresis, which must have the capacity of distinguishing very large
fragments differing in size by no more than a single nucleotide.
Unfortunately, this step severely limits the size of the DNA chain that
can be sequenced at one time. Sequencing using these techniques can
reliably accommodate a DNA chain of up to about 400-450 nucleotides, e.g.
Bankier et al, Meth. Enzymol., 155: 51-93 (1987); and Hawkins et al,
Electrophoresis, 13: 552-559 (1992).
Several significant technical problems have seriously impeded the
application of such techniques to the sequencing of long target
polynucleotides, e.g. in excess of 500-600 nucleotides, or to the
sequencing of high volumes of many target polynucleotides. Such problems
include i) the gel electrophoretic separation step which is labor
intensive, is difficult to automate, and introduces an extra degree of
variability in the analysis of data, e.g. band broadening due to
temperature effects, compressions due to secondary structure in the DNA
sequencing fragments, inhomogeneities in the separation gel, and the like;
ii) nucleic acid polymerases whose properties, such as processivity,
fidelity, rate of polymerization, rate of incorporation of chain
terminators, and the like, are often sequence dependent; iii) detection
and analysis of DNA sequencing fragments which are typically present in
fmol quantities in spatially overlapping bands in a gel; iv) lower signals
because the labeling moiety is distributed over the many hundred spatially
separated bands rather than being concentrated in a single homogeneous
phase, and v) in the case of single-lane fluorescence detection, the
availability of dyes with suitable emission and absorption properties,
quantum yield, and spectral resolvability, e.g. Trainor, Anal. Biochem.,
62: 418-426 (1990); Connell et al, Biotechniques, 5: 342-348 (1987);
Karger et al, Nucleic Acids Research, 19: 4955-4962 (1991); Fung et al,
U.S. Pat. No. 4,855,225; and Nishikawa et al, Electrophoresis, 12: 623-631
(1991).
Another problem exists with current technology in the area of diagnostic
sequencing. An ever widening array of disorders, susceptibilities to
disorders, prognoses of disease conditions, and the like, have been
correlated with the presence of particular DNA sequences, or the degree of
variation (or mutation) in DNA sequences, at one or more genetic loci.
Examples of such phenomena include human leukocyte antigen (HLA) typing,
cystic fibrosis, tumor progression and heterogeneity, p53 proto-oncogene
mutations, ras proto-oncogene mutations, and the like, e.g. Gyllensten et
al, PCR Methods and Applications, 1: 91-98 (1991); Santamaria et al,
International application PCT/US92/01675; Tsui et al, International
application PCT/CA90/00267; and the like. A difficulty in determining DNA
sequences associated with such conditions to obtain diagnostic or
prognostic information is the frequent presence of multiple subpopulations
of DNA, e.g. allelic variants, multiple mutant forms, and the like.
Distinguishing the presence and identity of multiple sequences with
current sequencing technology is virtually impossible without additional
work to isolate and perhaps clone the separate species of DNA.
A major advance in sequencing technology could be made if an alternative
approach were available for sequencing DNA that did not require high
resolution separations, provided signals more amenable to analysis, and
provided a means for readily analyzing DNA from heterozygous genetic loci.
SUMMARY OF THE INVENTION
The invention provides a method of nucleic acid sequence analysis based on
ligation and cleavage of probes at the terminus of a target
polynucleotide. Preferably, repeated cycles of such ligation and cleavage
are implemented in the method, and in each such cycle a nucleotide is
identified at the end of the target polynucleotide and the target
polynucleotide is shortened, such that further cycles of ligation,
cleavage, and identification can take place. That is, preferably, in each
cycle the target sequence is shortened by a single nucleotide and the
cycles are repeated until the nucleotide sequence of the target
polynucleotide is determined.
An important feature of the invention is the probe employed in the ligation
and cleavage events. A probe of the invention is a double stranded
polynucleotide which (i) contains a recognition site for a nuclease, and
(ii) preferably has a protruding strand capable of forming a duplex with a
complementary protruding strand of the target polynucleotide. At each
cycle in the latter embodiment, only those probes whose protruding strands
form perfectly matched duplexes with the protruding strand of the target
polynucleotide are ligated to the end of the target polynucleotide to form
a ligated complex. After removal of the unligated probe, a nuclease
recognizing the probe cuts the ligated complex at a site one or more
nucleotides from the ligation site along the target polynucleotide leaving
an end, usually a protruding strand, capable of participating in the next
cycle of ligation and cleavage. An important feature of the nuclease is
that its recognition site be separate from its cleavage site. As is
described more fully below, in the course of such cycles of ligation and
cleavage, the terminal nucleotides of the target polynucleotide are
identified.
In one aspect of the invention, more than one nucleotide at the terminus of
a target polynucleotide can be identified and/or cleaved during each cycle
of the method.
Generally, the method of the invention comprises the following steps: (a)
ligating a probe to an end of the polynucleotide, the probe having a
nuclease recognition site; (b) identifying one or more nucleotides at the
end of the polynucleotide; (c) cleaving the polynucleotide with a nuclease
recognizing the nuclease recognition site of the probe such that the
polynucleotide is shortened by one or more nucleotides; and (d) repeating
steps (a) through (c) until the nucleotide sequence of the polynucleotide
is determined. As is described more fully below, the order of steps (a)
through (c) may vary with different embodiments of the invention. For
example, identifying the one or more nucleotides can be carried out either
before or after cleavage of the ligated complex from the target
polynucleotide. Likewise, ligating a probe to the end of the
polynucleotide may follow the step of identifying in some preferred
embodiments of the invention. Preferably, the method further includes a
step of removing the unligated probe after the step of ligating.
Preferably, whenever natural protein endonucleases are employed as the
nuclease, the method further includes a step of methylating the target
polynucleotide at the start of a sequencing operation to prevent spurious
cleavages at internal recognition sites fortuitously located in the target
polynucleotide.
The present invention overcomes many of the deficiencies inherent to
current methods of DNA sequencing: there is no requirement for the
electrophoretic separation of closely-sized DNA fragments; no
difficult-to-automate gel-based separations are required; no polymerases
are required for generating nested sets of DNA sequencing fragments;
detection and analysis are greatly simplified because signal-to-noise
ratios are much more favorable on a nucleotide-by-nucleotide basis,
permitting smaller sample sizes to be employed; and for fluorescent-based
detection schemes, analysis is further simplified because fluorophores
labeling different nucleotides may be separately detected in homogeneous
solutions rather than in spatially overlapping bands.
The present invention is readily automated, both for small-scale serial
operation and for large-scale parallel operation, wherein many target
polynucleotides or many segments of a single target polynucleotide are
sequenced simultaneously. Unlike present sequencing approaches, the
progressive nature of the method (that is, determination of a sequence
nucleotide by nucleotide) permits one to monitor the progress of the
sequencing operation in real time which, in turn, permits the operation to
be curtailed, or re-started, if difficulties arise, thereby leading to
significant savings in time and reagent usage. Also unlike current
approaches, the method permits the simultaneous determination of allelic
forms of a target polynucleotide: As described more fully below, if a
population of target polynucleotides consists of several subpopulations of
distinct sequences, e.g. polynucleotides from a heterozygous genetic
locus, then the method can identify the proportion of each nucleotide at
each position in the sequence.
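By way of illustration only, the proportion of each nucleotide at each position of a mixed population can be tallied as in the following Python sketch. The function name `position_proportions` and the example alleles are hypothetical, and the sketch assumes that the signal for a nucleotide is simply proportional to the number of target molecules carrying it, which is an idealization.

```python
from collections import Counter

def position_proportions(population):
    """Return, per position, the fraction of targets carrying each nucleotide.

    `population` maps each sequence (all of equal length) to its copy number.
    """
    lengths = {len(seq) for seq in population}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    total = sum(population.values())
    proportions = []
    for i in range(lengths.pop()):
        counts = Counter()
        for seq, copies in population.items():
            counts[seq[i]] += copies
        proportions.append({base: n / total for base, n in counts.items()})
    return proportions

# Hypothetical heterozygous locus: two alleles in equal amounts,
# differing only at the third position.
alleles = {"ACGT": 1, "ACTT": 1}
result = position_proportions(alleles)
# result[2] == {"G": 0.5, "T": 0.5}; every other position is a single base at 1.0
```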
Generally, the method of the invention is applicable to all tasks where DNA
sequencing is employed, including medical diagnostics, genetic mapping,
genetic identification, forensic analysis, molecular biology research, and
the like.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1a illustrates a preferred structure of a labeled probe of the
invention.
FIG. 1b illustrates a probe and terminus of a target polynucleotide wherein
a separate labeling step is employed to identify one or more nucleotides
in the protruding strand of a target polynucleotide.
FIG. 1c illustrates steps of an embodiment wherein a nucleotide of the
target polynucleotide is identified by extension with a polymerase in the
presence of labeled dideoxynucleoside triphosphates followed by their
excision, strand extension, and strand displacement.
FIG. 1d diagrammatically illustrates an embodiment in which nucleotide
identification is carried out by polymerase extension of a probe strand in
the presence of labeled chain-terminating nucleoside triphosphates.
FIG. 1e diagrammatically illustrates an embodiment in which nucleotide
identification is carried out by polymerase extension in the presence of
unlabeled chain-terminating 3'-amino nucleoside triphosphates followed by
ligation of a labeled probe.
FIG. 1f illustrates probe assembly at the end of a target polynucleotide
having a 5' protruding strand.
FIG. 1g illustrates probe assembly at the end of a target polynucleotide
having a 3' protruding strand.
FIG. 1h illustrates an embodiment employing a probe for identifying two
nucleotides of a target polynucleotide in each cycle of ligation and
cleavage.
FIG. 2 illustrates the relative positions of the nuclease recognition site,
ligation site, and cleavage site in a ligated complex.
FIGS. 3a through 3h diagrammatically illustrate the embodiment referred to
herein as "double stepping," or the simultaneous use of two different
nucleases in accordance with the invention.
FIGS. 4a through 4d illustrate data showing the fidelity of nucleotide
identification through ligation with a ligase.
FIGS. 5a through 5c illustrate data showing nucleotide identification
through polymerase extension.
DEFINITIONS
As used herein "sequence determination" or "determining a nucleotide
sequence" in reference to polynucleotides includes determination of
partial as well as full sequence information of the polynucleotide. That
is, the term includes sequence comparisons, fingerprinting, and like
levels of information about a target polynucleotide, as well as the
express identification and ordering of nucleosides, usually each
nucleoside, in a target polynucleotide. The term also includes the
determination of the identification, ordering, and locations of one, two,
or three of the four types of nucleotides within a target polynucleotide.
For example, in some embodiments sequence identification may be effected
by identifying the ordering and locations of a single type of nucleotide,
e.g. cytosines, within a target polynucleotide so that its sequence is
represented as a binary code, e.g. "100101 . . . " for "C-(not C)-(not
C)-C-(not C)-C . . . " and the like.
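The binary representation described above can be sketched in a few lines of Python. The function name `binary_code` and the example sequence are illustrative assumptions; any single nucleotide type could be substituted for cytosine.

```python
def binary_code(sequence, base="C"):
    """Encode a sequence as '1' where `base` occurs and '0' elsewhere."""
    return "".join("1" if nt == base else "0" for nt in sequence.upper())

# Hypothetical sequence with the pattern C-(not C)-(not C)-C-(not C)-C.
code = binary_code("CATCGC")
# code == "100101"
```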
"Perfectly matched duplex" in reference to the protruding strands of probes
and target polynucleotides means that the protruding strand from one forms
a double stranded structure with the other such that each nucleotide in
the double stranded structure undergoes Watson-Crick base pairing with a
nucleotide on the opposite strand. The term also comprehends the pairing
of nucleoside analogs, such as deoxyinosine, nucleosides with
2-aminopurine bases, and the like, that may be employed to reduce the
degeneracy of the probes.
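The following Python sketch shows one way to test whether two protruding strands would form a perfectly matched duplex. It is a simplification: only the four natural nucleotides are modeled, analogs such as deoxyinosine are not handled, and the name `is_perfectly_matched` is hypothetical.

```python
# Watson-Crick partners of the four natural nucleotides.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def is_perfectly_matched(strand_a, strand_b):
    """True if two protruding strands, both written 5'->3', pair at every position.

    The strands are antiparallel, so position i of strand_a faces
    position len-1-i of strand_b.
    """
    if len(strand_a) != len(strand_b):
        return False
    return all(COMPLEMENT.get(a) == b
               for a, b in zip(strand_a, reversed(strand_b)))

matched = is_perfectly_matched("AACG", "CGTT")      # True
mismatched = is_perfectly_matched("AACG", "CGTA")   # False: one mismatch
```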
The term "oligonucleotide" as used herein includes linear oligomers of
nucleosides or analogs thereof, including deoxyribonucleosides,
ribonucleosides, and the like. Usually oligonucleotides range in size from
a few monomeric units, e.g. 3-4, to several hundreds of monomeric units.
Whenever an oligonucleotide is represented by a sequence of letters, such
as "ATGCCTG," it will be understood that the nucleotides are in
5'→3' order from left to right and that "A" denotes deoxyadenosine,
"C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes
thymidine, unless otherwise noted.
As used herein, "nucleoside" includes the natural nucleosides, including
2'-deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker,
DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in
reference to nucleosides includes synthetic nucleosides having modified
base moieties and/or modified sugar moieties, e.g. described generally by
Scheit, Nucleotide Analogs (John Wiley, New York, 1980). Such analogs
include synthetic nucleosides designed to enhance binding properties,
reduce degeneracy, increase specificity, and the like.
DETAILED DESCRIPTION OF THE INVENTION
The invention provides a method of sequencing nucleic acids which obviates
electrophoretic separation of similarly sized DNA fragments and which
eliminates the difficulties associated with the detection and analysis of
spatially overlapping bands of DNA fragments in a gel or like medium.
Moreover, the invention obviates the need to generate DNA fragments from
long single stranded templates with a DNA polymerase.
Structure of Probes
As mentioned above, an important feature of the invention is the probe
ligated to the target polynucleotide. Generally, the probes of the
invention provide a "platform" from which a nuclease cleaves the target
polynucleotide to which the probe is ligated. Probes of the invention can also
provide a means for identifying or labeling a nucleotide at the end of the
target polynucleotide. Probes do not necessarily provide both functions in
every embodiment.
In one aspect of the invention, probes have the form illustrated in FIG.
1a. In this embodiment, probes are double stranded segments of DNA having
a protruding strand at one end 10, at least one nuclease recognition site
12, and a spacer region 14 between the recognition site and the protruding
end 10. Preferably, probes also include a label 16, which in this
particular embodiment is illustrated at the end opposite the protruding
strand. The probes may be labeled by a variety of means and at a variety
of locations, the only restriction being that the labeling means selected
does not interfere with the ligation step or with the recognition of the
probe by the nuclease.
In the above embodiment, whenever a nuclease leaves a 5' phosphate on the
terminus of the target polynucleotide, it is sometimes desirable to remove
it, e.g. by treatment with a standard phosphatase, prior to ligation. This
prevents undesired ligation of one of the strands, when the protruding
strands of the probe and target sequence fail to form a perfectly matched
duplex. This is particularly problematic when a mismatch occurs precisely
at the nucleotide position where identification is sought. Where such
phosphatase treatment is employed, the "nick" remaining in the ligated
complex after the initial ligation can be repaired by kinase treatment
followed by a second ligation step.
Preferably, embodiments of the invention employing the above type of probe
comprise the following steps: (a) ligating a probe to an end of the
polynucleotide having a protruding strand to form a ligated complex, the
probe having a complementary protruding strand to that of the
polynucleotide and the probe having a nuclease recognition site; (b)
identifying one or more nucleotides in the protruding strand of the
polynucleotide, e.g. by the identity of the ligated probe; (c) cleaving
the ligated complex with a nuclease; and (d) repeating steps (a) through
(c) until the nucleotide sequence of the polynucleotide is determined. The
step of identifying can take place either before or after the step of
cleaving. Preferably, the one or more nucleotides in the protruding strand
of the polynucleotide are identified prior to cleavage. In further
preference, the method also includes a step of removing unligated probe
from the ligated complex.
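Steps (a) through (d) above can be sketched as a toy simulation in Python. The model is an assumption made for illustration, not a description of any actual enzyme geometry: the target is treated as a single strand written 5'->3', its first four nucleotides stand in for the protruding strand, each probe's label is assumed to report one nucleotide of that strand, and each cleavage is assumed to shorten the target by exactly one nucleotide. The names `reverse_complement` and `sequence_by_ligation` are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the strand that pairs with `seq`, written 5'->3'."""
    return "".join(COMPLEMENT[nt] for nt in reversed(seq))

def sequence_by_ligation(target, overhang_len=4):
    """Call one nucleotide per cycle of ligation, identification, and cleavage."""
    # Fully degenerate probe mixture: one probe per possible protruding strand.
    probes = ["".join(p) for p in product("ACGT", repeat=overhang_len)]
    calls = []
    while len(target) >= overhang_len:
        exposed = target[:overhang_len]
        # (a) Ligation: only the probe forming a perfectly matched duplex is joined.
        ligated = [p for p in probes if p == reverse_complement(exposed)]
        probe = ligated[0]
        # (b) Identification: the label is assumed to report the target nucleotide
        # paired with the 3'-most base of the probe's protruding strand.
        calls.append(COMPLEMENT[probe[-1]])
        # (c) Cleavage: the target is shortened by one nucleotide.
        target = target[1:]
        # (d) The loop repeats with the newly exposed protruding strand.
    return "".join(calls)

called = sequence_by_ligation("GATTACAGT")
# called == "GATTAC"
```

In this toy model the final three nucleotides are never called, because a full four-nucleotide protruding strand can no longer be exposed.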
It is not critical whether protruding strand 10 of the probe is a 5' or 3'
end. However, in this embodiment, it is important that the protruding
strands of the target polynucleotide and probes be capable of forming
perfectly matched duplexes to allow for specific ligation. If the
protruding strands of the target polynucleotide and probe are of different
lengths, the resulting gap can be filled in by a polymerase prior to
ligation, e.g. as in "gap LCR" disclosed in Backman et al, European patent
application 91100959.5. Such gap filling can be used as a means for
identifying one or more nucleotides in the protruding strand of the target
polynucleotide. Preferably, the number of nucleotides in the respective
protruding strands is the same so that both strands of the probe and
target polynucleotide are capable of being ligated without a filling step.
Preferably, the protruding strand of the probe is from 2 to 6 nucleotides
long. As indicated below, the greater the length of the protruding strand,
the greater the complexity of the probe mixture that is applied to the
target polynucleotide during each ligation and cleavage cycle.
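To make the dependence on length concrete: if each position of the protruding strand may be any of the four natural nucleotides, a fully degenerate mixture contains 4 raised to the power n distinct protruding-strand sequences for a strand of n nucleotides. The short Python sketch below tabulates this for the preferred range of 2 to 6; the function name `probe_mixture_size` is hypothetical, and the count ignores any reduction obtained with degeneracy-reducing analogs.

```python
def probe_mixture_size(overhang_len, alphabet_size=4):
    """Number of distinct protruding-strand sequences in a fully degenerate mixture."""
    return alphabet_size ** overhang_len

sizes = {n: probe_mixture_size(n) for n in range(2, 7)}
# sizes == {2: 16, 3: 64, 4: 256, 5: 1024, 6: 4096}
```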
In another aspect of the invention, the primary function of the probe is to
provide a site for a nuclease to bind to the ligated complex so that the
complex can be cleaved and the target polynucleotide shortened. In this
aspect of the invention, identification of the nucleotides can take place
separately from probe ligation and cleavage. This embodiment provides
several advantages: First, sequence determination does not require that
the protruding strand of the ligated probe be perfectly complementary to
the protruding strand of the target polynucleotide, thereby permitting
greater flexibility in the control of hybridization stringency. Second,
one need not provide a fully degenerate set of probes based on the four
natural nucleotides. So-called "wild card" nucleotides, or "degeneracy
reducing analogs" can be provided to significantly reduce, or even
eliminate, the complexity of the probe mixture employed in the ligation
step, since specific binding is not critical to nucleotide identification
in this embodiment. Third, if identification is not carried out via a
labeling means on the probe, then probes designed for blunt end ligation
may be employed with no need for using degenerate mixtures.
Preferably, this embodiment of the invention comprises the following steps:
(a) providing a polynucleotide having a protruding strand; (b) identifying
one or more nucleotides in the protruding strand by extending a 3' end of
a strand with a nucleic acid polymerase; (c) ligating a probe to an end of
the polynucleotide to form a ligated complex; (d) cleaving the ligated
complex with a nuclease; and (e) repeating steps (a) through (d) until the
nucleotide sequence of the polynucleotide is determined. Preferably, the
target polynucleotide has a 3' recessed strand which is extended by the
nucleic acid polymerase in the presence of chain-terminating nucleoside
triphosphates, and the nuclease used produces a 3'-recessed strand and 5'
protruding strand at the terminus of the target polynucleotide.
An example of this embodiment is illustrated in FIG. 1b: The 3' recessed
strand of polynucleotide (15) is extended with a nucleic acid polymerase
in the presence of the four dideoxynucleoside triphosphates, each carrying
a distinguishable fluorescent label, so that the 3' recessed strand is
extended by one nucleotide (11), which permits its complementary
nucleotide in the 5' protruding strand of polynucleotide (15) to be
identified. Probe (9) having recognition site (12), spacer region (14),
and complementary protruding strand (10), is then ligated to
polynucleotide (15) to form ligated complex (17). Ligated complex (17) is
then cleaved at cleavage site (19) to release a labeled fragment (21) and
augmented probe (23). A shortened polynucleotide (15) with a regenerated
3' recessed strand is then ready for the next cycle of identification,
ligation, and cleavage.
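The identification step of this example can be sketched in Python as follows. The sketch assumes an idealized polymerase that always incorporates the correct labeled dideoxynucleotide, and it represents only the 5' protruding strand, written 5'->3', so that its last nucleotide is the one adjacent to the double stranded portion. The name `identify_by_extension` is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def identify_by_extension(overhang):
    """Model a single-nucleotide extension of the 3' recessed strand.

    `overhang` is the 5' protruding strand written 5'->3'. Returns the
    labeled dideoxynucleotide that is incorporated, the template nucleotide
    inferred from its label, and the part of the overhang left unpaired.
    """
    if not overhang:
        raise ValueError("no protruding strand to extend against")
    # The polymerase pairs the incoming nucleotide with the template base
    # next to the duplex, i.e. the 3'-most base of the protruding strand.
    incorporated = COMPLEMENT[overhang[-1]]
    # The observed label identifies the incorporated nucleotide, and its
    # complement is the nucleotide of the target.
    identified = COMPLEMENT[incorporated]
    return incorporated, identified, overhang[:-1]

outcome = identify_by_extension("ACGT")
# outcome == ("A", "T", "ACG")
```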
In such embodiments, the first nucleotide of the 5' protruding strand
adjacent to the double stranded portion of the target polynucleotide is
readily identified by extending the 3' strand with a nucleic acid
polymerase in the presence of chain-terminating nucleoside triphosphates.
Preferably, the 3' strand is extended by a nucleic acid polymerase in the
presence of the four chain-terminating nucleoside triphosphates, each
being labeled with a distinguishable fluorescent dye so that the added
nucleotide is readily identified by the color of the attached dye. Such
chain-terminating nucleoside triphosphates are available commercially,
e.g. labeled dideoxynucleoside triphosphates, such as described by Hobbs,
Jr. et al, U.S. Pat. No. 5,047,519; Cruickshank, U.S. Pat. No. 5,091,519;
and the like. Procedures for such extension reactions are described in
various publications, including Syvanen et al, Genomics, 8: 684-692
(1990); Goelet et al, International Application No. PCT/US92/01905; Livak
and Brenner, U.S. Pat. No. 5,102,785; and the like.
A probe may be ligated to the target polynucleotide using conventional
procedures, as described more fully below. Preferably, the probe is
ligated after a single nucleotide extension of the 3' strand of the target
polynucleotide. More preferably, the number of nucleotides in the
protruding strand of the probe is the same as the number of nucleotides in
the protruding strand of the target polynucleotide after the extension
step. That is, if the nuclease provides a protruding strand having four
nucleotides, then after the extension step the protruding strand will have
three nucleotides and the protruding strand of the preferred probe will
have three nucleotides.
The cleavage step in this embodiment may be accomplished by a variety of
techniques, depending on the effect that the added chain-terminating
nucleotide has on the efficiencies of the nuclease and/or ligase employed.
Preferably, a ligated complex is formed in the presence of the labeled
chain-terminating nucleotide, which is subsequently cleaved with the
appropriate nuclease, e.g. a class IIs restriction endonuclease, such as
Fok I, or the like.
In a preferred embodiment, after extension and ligation, the
chain-terminating nucleotide may be excised. Preferably, this is carried
out by the 3'→5' exonuclease activity (i.e. proof-reading activity)
of a DNA polymerase, e.g. T4 DNA polymerase, acting in the pr