FIELD OF THE INVENTION
The invention relates generally to methods for determining the nucleotide
sequence of a polynucleotide, and more particularly, to a method of
step-wise removal and identification of terminal nucleotides of a
polynucleotide.
BACKGROUND
Analysis of polynucleotides with currently available techniques provides a
spectrum of information ranging from the confirmation that a test
polynucleotide is the same as, or different from, a standard sequence or an
isolated fragment to the express identification and ordering of each
nucleoside of the test polynucleotide. Not only are such techniques
crucial for understanding the function and control of genes and for
applying many of the basic techniques of molecular biology, but they have
also become increasingly important as tools in genomic analysis and a
great many non-research applications, such as genetic identification,
forensic analysis, genetic counselling, medical diagnostics, and the like.
In these latter applications both techniques providing partial sequence
information, such as fingerprinting and sequence comparisons, and
techniques providing full sequence determination have been employed, e.g.
Gibbs et al, Proc. Natl. Acad. Sci., 86:1919-1923 (1989); Gyllensten et
al, Proc. Natl. Acad. Sci, 85:7652-7656 (1988); Carrano et al, Genomics,
4:129-136 (1989); Caetano-Anolles et al, Mol. Gen. Genet., 235:157-165
(1992); Brenner and Livak, Proc. Natl. Acad. Sci., 86:8902-8906 (1989);
Green et al, PCR Methods and Applications, 1:77-90 (1991); and Versalovic
et al, Nucleic Acids Research, 19:6823-6831 (1991).
Native DNA consists of two linear polymers, or strands of nucleotides. Each
strand is a chain of nucleosides linked by phosphodiester bonds. The two
strands are held together in an antiparallel orientation by hydrogen bonds
between complementary bases of the nucleotides of the two strands:
deoxyadenosine (A) pairs with thymidine (T) and deoxyguanosine (G) pairs
with deoxycytidine (C).
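The pairing rules above can be sketched in a few lines of Python. This is only an illustration; the names PAIRS and reverse_complement are hypothetical and are not part of the disclosure. Because the two strands are antiparallel, the complement of a strand written 5' to 3' is obtained by reversing it and pairing each base.

```python
# Watson-Crick pairing for the four natural deoxynucleosides.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    The input is read 5' to 3'; the strands are antiparallel, so the
    input is reversed before each base is replaced by its partner.
    """
    return "".join(PAIRS[base] for base in reversed(strand))


if __name__ == "__main__":
    print(reverse_complement("ATGCCTG"))
```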
Presently there are two basic approaches to DNA sequence determination: the
dideoxy chain termination method, e.g. Sanger et al, Proc. Natl. Acad.
Sci., 74:5463-5467 (1977); and the chemical degradation method, e.g. Maxam
et al, Proc. Natl. Acad. Sci., 74:560-564 (1977). The chain termination
method has been improved in several ways, and serves as the basis for all
currently available automated DNA sequencing machines, e.g. Sanger et al,
J. Mol. Biol., 143:161-178 (1980); Schreier et al, J. Mol. Biol.,
129:169-172 (1979); Smith et al, Nucleic Acids Research, 13: 2399-2412
(1985); Smith et al, Nature, 321:674-679 (1987); Prober et al, Science,
238:336-341 (1987); Section II, Meth. Enzymol., 155:51-334 (1987); Church
et al, Science, 240:185-188 (1988); Hunkapiller et al, Science, 254:59-67
(1991); Bevan et al, PCR Methods and Applications, 1: 222-228 (1992).
Both the chain termination and chemical degradation methods require the
generation of one or more sets of labeled DNA fragments, each having a
common origin and each terminating with a known base. The set or sets of
fragments must then be separated by size to obtain sequence information.
In both methods, the DNA fragments are separated by high resolution gel
electrophoresis, which must have the capacity of distinguishing very large
fragments differing in size by no more than a single nucleotide.
Unfortunately, this step severely limits the size of the DNA chain that
can be sequenced at one time. Sequencing using these techniques can
reliably accommodate a DNA chain of up to about 400-450 nucleotides, e.g.
Bankier et al, Meth. Enzymol., 155:51-93 (1987); and Hawkins et al,
Electrophoresis, 13:552-559 (1992).
Several significant technical problems have seriously impeded the
application of such techniques to the sequencing of long target
polynucleotides, e.g. in excess of 500-600 nucleotides, or to the
sequencing of high volumes of many target polynucleotides. Such problems
include i) the gel electrophoretic separation step which is labor
intensive, is difficult to automate, and introduces an extra degree of
variability in the analysis of data, e.g. band broadening due to
temperature effects, compressions due to secondary structure in the DNA
sequencing fragments, inhomogeneities in the separation gel, and the like;
ii) nucleic acid polymerases whose properties, such as processivity,
fidelity, rate of polymerization, rate of incorporation of chain
terminators, and the like, are often sequence dependent; iii) detection
and analysis of DNA sequencing fragments which are typically present in
fmol quantities in spatially overlapping bands in a gel; iv) lower signals
because the labelling moiety is distributed over the many hundred
spatially separated bands rather than being concentrated in a single
homogeneous phase, and v) in the case of single-lane fluorescence
detection, the availability of dyes with suitable emission and absorption
properties, quantum yield, and spectral resolvability, e.g. Trainor, Anal.
Biochem., 62:418-426 (1990); Connell et al, Biotechniques, 5:342-348
(1987); Karger et al, Nucleic Acids Research, 19:4955-4962 (1991); Fung et
al, U.S. Pat. No. 4,855,225; and Nishikawa et al, Electrophoresis, 12:
623-631 (1991).
Another problem exists with current technology in the area of diagnostic
sequencing. An ever widening array of disorders, susceptibilities to
disorders, prognoses of disease conditions, and the like, have been
correlated with the presence of particular DNA sequences, or the degree of
variation (or mutation) in DNA sequences, at one or more genetic loci.
Examples of such phenomena include human leukocyte antigen (HLA) typing,
cystic fibrosis, tumor progression and heterogeneity, p53 proto-oncogene
mutations, ras proto-oncogene mutations, and the like, e.g. Gyllensten et
al, PCR Methods and Applications, 1:91-98 (1991); Santamaria et al,
International application PCT/US92/01675; Tsui et al, International
application PCT/CA90/00267; and the like. A difficulty in determining DNA
sequences associated with such conditions to obtain diagnostic or
prognostic information is the frequent presence of multiple subpopulations
of DNA, e.g. allelic variants, multiple mutant forms, and the like.
Distinguishing the presence and identity of multiple sequences with
current sequencing technology is virtually impossible, without additional
work to isolate and perhaps clone the separate species of DNA.
A major advance in sequencing technology could be made if an alternative
approach were available for sequencing DNA that did not require high
resolution separations, generated signals more amenable to analysis, and
provided a means for readily analyzing DNA from heterozygous genetic loci.
SUMMARY OF THE INVENTION
The invention provides a method of nucleic acid sequence analysis based on
repeated cycles of ligation and cleavage of probes at the terminus of a
target polynucleotide. Preferably, at each such cycle a terminal
nucleotide is identified and removed from the end of the target
polynucleotide, such that further cycles of ligation, cleavage, and
identification can take place. That is, in each cycle the target sequence
is shortened by a single nucleotide and the cycles are repeated until the
nucleotide sequence of the target polynucleotide is determined. An
important feature of the invention is providing a target polynucleotide,
that is, the nucleic acid whose sequence is to be determined, with a
protruding strand.
Another important feature of the invention is the probe employed in the
ligation and cleavage events. A probe of the invention is a double
stranded polynucleotide (i) containing a recognition site for a nuclease
and (ii) having a protruding strand capable of forming a duplex with the
protruding strand of the target polynucleotide. Preferably, at each cycle,
only those probes whose protruding strands form perfectly matched duplexes
with the protruding strand of the target polynucleotide are ligated to the
end of the target polynucleotide to form a ligated complex. After removal
of the unligated probe, a nuclease recognizing the probe cuts the ligated
complex at a site one or more nucleotides from the ligation site along the
target polynucleotide leaving a protruding strand capable of participating
in the next cycle of ligation and cleavage. An important feature of the
nuclease is that its recognition site be separate from its cleavage site.
As is described more fully below, in the course of such cycles of ligation
and cleavage, the terminal nucleotides of the target polynucleotide are
identified.
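The separation of recognition site and cleavage site can be illustrated with a short Python sketch. It is a simplified model on a single top-strand string, and the function name and parameters are hypothetical. The example values (site GGATG, cuts 9 and 13 nucleotides downstream on the top and bottom strands) follow the commonly reported geometry of the enzyme FokI, which is used here only as an illustration and is not named in this section.

```python
def cut_sites(top_strand: str, site: str, top_offset: int, bottom_offset: int):
    """Locate the cuts made by a nuclease that cleaves away from its site.

    Returns (top_cut, bottom_cut) as 0-based indices into `top_strand`.
    Each cut falls immediately before the indexed position. Offsets are
    counted from the last nucleotide of the recognition site.
    """
    start = top_strand.find(site)
    if start < 0:
        raise ValueError("recognition site not found")
    site_end = start + len(site)
    top_cut = site_end + top_offset
    bottom_cut = site_end + bottom_offset
    if max(top_cut, bottom_cut) > len(top_strand):
        raise ValueError("cleavage site lies beyond the end of the strand")
    return top_cut, bottom_cut


if __name__ == "__main__":
    seq = "TTGGATGACGTACGTAGCTAGCTAGG"
    top, bottom = cut_sites(seq, "GGATG", 9, 13)
    # The stretch between the two cuts is left as the protruding strand.
    print(top, bottom, seq[top:bottom])
```

In this model the difference between the two offsets gives the length of the protruding strand left for the next cycle.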
In one aspect of the invention, more than one nucleotide at the terminus of
a target polynucleotide can be identified and/or cleaved during each cycle
of the method.
Generally, the method of the invention comprises the following steps: (a)
ligating a probe to an end of the polynucleotide having a protruding
strand to form a ligated complex, the probe having a complementary
protruding strand to that of the polynucleotide and the probe having a
nuclease recognition site; (b) identifying one or more nucleotides in the
protruding strand of the polynucleotide; (c) cleaving the ligated complex
with a nuclease; and (d) repeating steps (a) through (c) until the
nucleotide sequence of the polynucleotide is determined. As is described
more fully below, the order of steps (a) through (c) may vary with
different embodiments of the invention. For example, identifying the one
or more nucleotides can be carried out either before or after cleavage of
the ligated complex from the target polynucleotide. Likewise, ligating a
probe to the end of the polynucleotide may follow the step of identifying
in some preferred embodiments of the invention. Preferably, the method
further includes a step of removing the unligated probe after the step of
ligating.
Preferably, whenever natural protein endonucleases are employed as the
nuclease, the method further includes a step of methylating the target
polynucleotide at the start of a sequencing operation.
The present invention overcomes many of the deficiencies inherent to
current methods of DNA sequencing: there is no requirement for the
electrophoretic separation of closely-sized DNA fragments; no
difficult-to-automate gel-based separations are required; no polymerases
are required; detection and analysis are greatly simplified because
signal-to-noise ratios are much more favorable on a
nucleotide-by-nucleotide basis, permitting smaller sample sizes to be
employed; and for fluorescent-based detection schemes, analysis is further
simplified because fluorophores labeling different nucleotides may be
separately detected in homogeneous solutions rather than in spatially
overlapping bands.
The present invention is readily automated, both for small-scale serial
operation and for large-scale parallel operation, wherein many target
polynucleotides or many segments of a single target polynucleotide are
sequenced simultaneously. Unlike present sequencing approaches, the
progressive nature of the method--that is, determination of a sequence
nucleotide-by-nucleotide--permits one to monitor the progress of the
sequencing operation in real time which, in turn, permits the operation to
be curtailed, or re-started, if difficulties arise, thereby leading to
significant savings in time and reagent usage. Also unlike current
approaches, the method permits the simultaneous determination of allelic
forms of a target polynucleotide: As described more fully below, if a
population of target polynucleotides consists of several subpopulations of
distinct sequences, e.g. polynucleotides from a heterozygous genetic
locus, then the method can identify the proportion of each nucleotide at
each position in the sequence.
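As a hedged illustration of what identifying "the proportion of each nucleotide at each position" might look like computationally, the following Python sketch tallies nucleotides position by position across a mixed population. The function name is hypothetical, and the sketch assumes that all sequences have the same length and that each sequence contributes equally to the signal.

```python
from collections import Counter


def nucleotide_proportions(population):
    """Return, for each position, the fraction of each nucleotide observed.

    `population` is a non-empty list of equal-length sequences, one per
    target polynucleotide (an idealized stand-in for the per-cycle signal).
    """
    length = len(population[0])
    if any(len(seq) != length for seq in population):
        raise ValueError("all sequences must have the same length")
    proportions = []
    for i in range(length):
        counts = Counter(seq[i] for seq in population)
        total = sum(counts.values())
        proportions.append({base: n / total for base, n in counts.items()})
    return proportions


if __name__ == "__main__":
    # Two alleles differing at the third position, present in equal amounts.
    for position, fractions in enumerate(nucleotide_proportions(["ACGT", "ACTT"]), 1):
        print(position, fractions)
```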
Generally, the method of the invention is applicable to all tasks where DNA
sequencing is employed, including medical diagnostics, genetic mapping,
genetic identification, forensic analysis, molecular biology research, and
the like.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1a illustrates a preferred structure of a labeled probe of the
invention.
FIG. 1b illustrates a probe and terminus of a target polynucleotide wherein
a separate labeling step is employed to identify one or more nucleotides
in the protruding strand of a target polynucleotide.
FIG. 1c illustrates steps of an embodiment wherein a nucleotide of the
target polynucleotide is identified by extension with a polymerase in the
presence of labeled dideoxynucleoside triphosphates followed by their
excision, strand extension, and strand displacement.
FIG. 2 illustrates the relative positions of the nuclease recognition site,
ligation site, and cleavage site in a ligated complex.
FIGS. 3a through 3h diagrammatically illustrate the embodiment referred to
herein as "double stepping," or the simultaneous use of two different
nucleases in accordance with the invention.
DEFINITIONS
As used herein "sequence determination" or "determining a nucleotide
sequence" in reference to polynucleotides includes determination of
partial as well as full sequence information of the polynucleotide. That
is, the term includes sequence comparisons, fingerprinting, and like
levels of information about a target polynucleotide, as well as the
express identification and ordering of each nucleoside of the test
polynucleotide.
"Perfectly matched duplex" in reference to the protruding strands of probes
and target polynucleotides means that the protruding strand from one forms
a double stranded structure with the other such that each nucleotide in
the double stranded structure undergoes Watson-Crick basepairing with a
nucleotide on the opposite strand. The term also comprehends the pairing
of nucleoside analogs, such as deoxyinosine, nucleosides with
2-aminopurine bases, and the like, that may be employed to reduce the
degeneracy of the probes.
The term "oligonucleotide" as used herein includes linear oligomers of
nucleosides or analogs thereof, including deoxyribonucleosides,
ribonucleosides, and the like. Usually oligonucleotides range in size from
a few monomeric units, e.g. 3-4, to several hundreds of monomeric units.
Whenever an oligonucleotide is represented by a sequence of letters, such
as "ATGCCTG," it will be understood that the nucleotides are in
5'→3' order from left to right and that "A" denotes deoxyadenosine,
"C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes
thymidine, unless otherwise noted.
As used herein, "nucleoside" includes the natural nucleosides, including
2'-deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker,
DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in
reference to nucleosides includes synthetic nucleosides having modified
base moieties and/or modified sugar moieties, e.g. described generally by
Scheit, Nucleotide Analogs (John Wiley, New York, 1980). Such analogs
include synthetic nucleosides designed to enhance binding properties,
reduce degeneracy, increase specificity, and the like.
DETAILED DESCRIPTION OF THE INVENTION
The invention provides a method of sequencing nucleic acids which obviates
electrophoretic separation of similarly sized DNA fragments and which
eliminates the difficulties associated with the detection and analysis of
spatially overlapping bands of DNA fragments in a gel or like medium.
Moreover, the invention obviates the need to generate DNA fragments from
long single stranded templates with a DNA polymerase.
As mentioned above, the probes ligated to the target polynucleotide are an
important feature of the invention. In one aspect of the invention,
probes have the form illustrated in FIG. 1a. Probes are double stranded
segments of DNA having a protruding strand at one end 10, at least one
nuclease recognition site 12, and a spacer region 14 between the
recognition site and the protruding end 10. Preferably, probes also
include a label 16, which in this particular embodiment is illustrated at
the end opposite the protruding strand. The probes may be labeled by a
variety of means and at a variety of locations, the only restriction being
that the labeling means selected does not interfere with the ligation step
or with the recognition of the probe by the nuclease.
Preferably, this embodiment of the invention comprises the following steps:
(a) ligating a probe to an end of the polynucleotide having a protruding
strand to form a ligated complex, the probe having a complementary
protruding strand to that of the polynucleotide and the probe having a
nuclease recognition site; (b) removing unligated probe from the ligated
complex; (c) identifying one or more nucleotides in the protruding strand
of the polynucleotide by the identity of the ligated probe; (d) cleaving
the ligated complex with a nuclease; and (e) repeating steps (a) through
(d) until the nucleotide sequence of the polynucleotide is determined. The
step of identifying can take place either before or after the step of
cleaving. Preferably, the one or more nucleotides in the protruding strand
of the polynucleotide are identified prior to cleavage.
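The cycle of steps (a) through (e) can be sketched as a toy simulation in Python. This is a deliberately simplified model and not the disclosed chemistry: the target is represented by a single string, the protruding strand is taken to be its first few nucleotides, ligation is assumed to succeed only for the perfectly matched probe, and each cleavage is assumed to shorten the target by exactly one nucleotide. All names are hypothetical.

```python
from itertools import product

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def matching_strand(overhang: str) -> str:
    """Return the strand that forms a perfectly matched, antiparallel duplex."""
    return "".join(PAIRS[base] for base in reversed(overhang))


def sequence_by_ligation_and_cleavage(target: str, overhang_len: int = 4) -> str:
    """Read `target` one nucleotide per cycle in the simplified model."""
    # A fully degenerate probe set: one probe per possible protruding strand.
    probes = {"".join(p) for p in product("ACGT", repeat=overhang_len)}
    read = []
    remaining = target
    while len(remaining) >= overhang_len:
        overhang = remaining[:overhang_len]  # protruding strand of the target
        # (a), (b): only the perfectly matched probe ligates; the other
        # probes are treated as removed with the unligated material.
        ligated = next(p for p in probes if matching_strand(p) == overhang)
        # (c): the identity of the ligated probe reveals the terminal nucleotide.
        read.append(matching_strand(ligated)[0])
        # (d): cleavage shortens the target by one nucleotide.
        remaining = remaining[1:]
    return "".join(read)


if __name__ == "__main__":
    print(sequence_by_ligation_and_cleavage("ATGCCTG"))
```

In this model the loop stops once the remaining target is shorter than the protruding strand, so the last overhang_len - 1 nucleotides are not read.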
It is not critical whether protruding strand 10 of the probe is a 5' or 3'
end. However, in this embodiment, it is important that the protruding
strands of the target polynucleotide and probes be capable of forming
perfectly matched duplexes to allow for specific ligation. If the
protruding strands of the target polynucleotide and probe are of different
lengths, the resulting gap can be filled in by a polymerase prior to
ligation, e.g. as in "gap LCR" disclosed in Backman et al, European patent
application 91100959.5. Preferably, the number of nucleotides in the
respective protruding strands is the same so that both strands of the
probe and target polynucleotide are capable of being ligated without a
filling step. Preferably, the protruding strand of the probe is from 2 to
6 nucleotides long. As indicated below, the greater the length of the
protruding strand, the greater the complexity of the probe mixture that is
applied to the target polynucleotide during each ligation and cleavage
cycle.
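The growth in probe mixture complexity can be made concrete with a small calculation. The sketch below assumes a fully degenerate probe set built from the four natural nucleotides, with no degeneracy-reducing analogs, so that a protruding strand of n nucleotides requires 4^n distinct probes. The function name is hypothetical.

```python
def probe_mixture_complexity(overhang_len: int, alphabet_size: int = 4) -> int:
    """Number of distinct protruding strands in a fully degenerate probe set."""
    if overhang_len < 0:
        raise ValueError("overhang_len must be non-negative")
    return alphabet_size ** overhang_len


if __name__ == "__main__":
    # The preferred range of protruding strand lengths stated above.
    for n in range(2, 7):
        print(n, probe_mixture_complexity(n))
```

Under these assumptions, protruding strands of 2 to 6 nucleotides correspond to 16, 64, 256, 1024, and 4096 probes, respectively.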
In another aspect of the invention, the primary function of the probe is to
provide a site for a nuclease to bind to the ligated complex so that the
complex can be cleaved and the target polynucleotide shortened. In this
aspect of the invention, identification of the nucleotides can take place
separately from probe ligation and cleavage. This embodiment provides
several advantages: First, correct sequence determination does not require
that the protruding strand of the ligated probe be perfectly complementary
to the protruding strand of the target polynucleotide, thereby permitting
greater flexibility in the control of hybridization stringency. Second,
one need not provide a fully degenerate set of probes based on the four
natural nucleotides. So-called "wild card" nucleotides, or "degeneracy
reducing analogs" can be provided to significantly reduce, or even
eliminate, the complexity of the probe mixture employed in the ligation
step, since specific binding is not critical to nucleotide identification
in this embodiment.
Preferably, this embodiment of the invention comprises the following steps:
(a) providing a polynucleotide having a 3' recessed strand and a 5'
protruding strand; (b) identifying one or more nucleotides in the
protruding strand; (c) ligating a probe having a 5' protruding strand to
an end of the polynucleotide to form a ligated complex, the probe having a
complementary protruding strand to that of the polynucleotide and the
probe having a nuclease recognition site; (d) cleaving the ligated complex
with a nuclease; and (e) repeating steps (a) through (d) until the
nucleotide sequence of the polynucleotide is determined. A nuclease is
employed that produces a 3'-recessed strand and 5' protruding strand at
the terminus of the target polynucleotide.
An example of this embodiment is illustrated in FIG. 1b: The 3' recessed
strand of polynucleotide (15) is extended with a nucleic acid polymerase
in the presence of the four dideoxynucleoside triphosphates, each carrying
a distinguishable fluorescent label, so that the 3' recessed strand is
extended by one nucleotide (11), which permits its complementary
nucleotide in the 5' protruding strand of polynucleotide (15) to be
identified. Probe (9) having recognition site (12), spacer region (14),
and complementary protruding strand (10), is then ligated to
polynucleotide (15) to form ligated complex (17). Ligated complex (17) is
then cleaved at cleavage site (19) to release a labeled fragment (21) and
augmented probe (23). A shortened polynucleotide (15) with a regenerated
3' recessed strand is then ready for the next cycle of identification,
ligation, and cleavage.
In such embodiments, the first nucleotide of the 5' protruding strand
adjacent to the double stranded portion of the target polynucleoti