Claims
I claim:
1. A sequencing method for identifying a first nucleotide n and a second
nucleotide n+x in a double stranded nucleic acid segment, comprising:
a) digesting said double stranded nucleic acid segment with a restriction
enzyme whose cleavage site is separate from its recognition site to
produce a double stranded molecule having a single stranded overhang
sequence corresponding to an enzyme cut site;
b) providing an adaptor having a cycle identification tag, a restriction
enzyme recognition domain, a sequence identification region, and a
detectable label;
c) hybridizing said adaptor to said double stranded nucleic acid having
said single-stranded overhang sequence to form a ligated molecule;
d) identifying said nucleotide n by identifying said ligated molecule;
e) amplifying said ligated molecule from step (d) with a primer specific
for said cycle identification tag of said adaptor; and
f) repeating steps (a) through (d) on said amplified molecule from step (e)
to yield the identity of said nucleotide n+x,
wherein x is less than or equal to the number of nucleotides between a
recognition domain for a restriction enzyme and an enzyme cut site.
2. The method of claim 1, wherein said enzyme cut site is the cut site
located the farthest away from said recognition domain.
3. The method of claim 1, wherein said restriction enzyme of step (a) is a
class-IIS restriction endonuclease.
4. The method of claim 3, wherein said class-IIS restriction endonuclease
is selected from the group consisting of AccBSI, AceIII, AciI, AclWI,
AlwI, Alw26I, AlwXI, Asp26HI, Asp27HI, Asp35HI, Asp36HI, Asp40HI, Asp50HI,
AsuHPI, BaeI, BbsI, BbvI, BbvII, Bbv16II, Bce83I, BcefI, BcgI, Bco5I,
Bco116I, BcoKI, BinI, Bli736I, BpiI, BpmI, Bpu10I, BpuAI, BsaI, BsaMI,
Bsc9II, BscAI, BscCI, BseII, Bse3DI, BseNI, BseRI, BseZI, BsgI, BsiI,
BsmI, BsmAI, BsmBI, BsmFI, Bsp24I, Bsp423I, BspBS3II, BspIS4I, BspKT5I,
BspLU11III, BspMI, BspPI, BspST5I, BspTS514I, BsrI, BsrBI, BsrDI, BsrSI,
BssSI, Bst11I, Bst71I, Bst2BI, BstBS32I, BstD102I, BstF5I, BstTS5I, Bsu6I,
CjeI, CjePI, Eam1104I, EarI, Eco31I, Eco57I, EcoA4I, EcoO44I, Esp3I, FauI,
FokI, GdiII, GsuI, HgaI, HphI, Ksp632I, MboII, MlyI, MmeI, MnlI, Mva1269I,
PhaI, PieI, RleAI, SapI, SfaNI, SimI, StsI, TaqII, TspII, TspRI, Tth111II,
and VpaK32I.
5. The method of claim 1, wherein a nucleic acid ligase is used to attach
at least one strand of said restriction enzyme recognition domain of step
(b) to said nucleic acid segment.
6. The method of claim 1, wherein said method further comprises blocking an
enzyme recognition domain lying outside said enzyme recognition domain of
step (b).
7. The method of claim 6, wherein said blocking occurs through an in vitro
primer extension.
8. The method of claim 7, wherein said in vitro primer extension is DNA
amplification in vitro.
9. The method of claim 8, wherein said DNA amplification in vitro occurs
during said amplification in step (e).
10. The method of claim 7, wherein said in vitro primer extension occurs
following said amplification in step (e).
11. The method of claim 7, wherein said method further comprises
hemi-methylating an enzyme recognition domain lying outside said enzyme
recognition domain of step (b).
12. The method of claim 11, wherein said hemi-methylation occurs through an
in vitro primer extension using a primer having a portion of said enzyme
recognition domain that blocks enzyme recognition if it is
hemi-methylated.
13. The method of claim 12, wherein said primer extension occurs with a
methylated nucleotide.
14. The method of claim 7, wherein said restriction endonuclease recognizes
a hemi-methylated recognition domain, and the primer contains at least one
methylated nucleotide in a methylated portion of said recognition domain.
15. The method of claim 1, wherein said nucleic acid segment is a genomic
DNA.
16. The method of claim 1, wherein said nucleic acid segment is a cDNA.
17. The method of claim 1, wherein said nucleic acid segment is a product
of an in vitro DNA amplification.
18. The method of claim 1, wherein said nucleic acid segment is a PCR
product.
19. The method of claim 1, wherein said nucleic acid segment is a product
of a strand displacement amplification.
20. The method of claim 1, wherein said nucleic acid segment is a vector
insert.
21. The method of claim 1, wherein said detectable label is selected from
one or more of the group consisting of fluorescent, near infra-red,
radionucleotide and chemiluminescent labels.
22. The method of claim 1, wherein said nucleic acid segment is attached to
a solid matrix.
23. The method of claim 22, wherein said solid matrix is a magnetic
streptavidin.
24. The method of claim 22, wherein said solid matrix is a magnetic glass
particle.
25. The method of claim 1, wherein said adaptor of step (b) is attached to
a solid matrix.
26. The method of claim 25, wherein said solid matrix is a magnetic
streptavidin.
27. The method of claim 25, wherein said solid matrix is a magnetic glass
particle.
28. A method for sequencing an interval within a double stranded nucleic
acid segment by identifying a first nucleotide n and a second nucleotide
n+x in a plurality of staggered double stranded molecules produced from
said double stranded nucleic acid segment, comprising:
a) attaching an enzyme recognition domain to different positions along said
double stranded nucleic acid segment within an interval no greater than
the distance between a recognition domain for a restriction enzyme and an
enzyme cut site, such attachment occurring at one end of said double
stranded nucleic acid segment;
b) digesting said double stranded nucleic acid segment with a restriction
enzyme whose cleavage site is separate from its recognition site to
produce a plurality of staggered double stranded molecules each having a
single stranded overhang sequence corresponding to said cut site;
c) providing an adaptor having a restriction enzyme recognition domain, a
sequence identification region, and a detectable label;
d) hybridizing said adaptor to said double stranded nucleic acid having
said single-stranded overhang sequence to form a ligated molecule;
e) identifying a nucleotide n within a staggered double stranded molecule
by identifying said ligated molecule;
f) repeating steps (b) through (e) to yield the identity of said nucleotide
n+x in each of said staggered double stranded molecules having said single
strand overhang sequence thereby sequencing an interval within said double
stranded nucleic acid segment,
wherein x is greater than one and no greater than the number of nucleotides
between a recognition domain for a restriction enzyme and an enzyme cut
site.
29. The method of claim 28, wherein said enzyme cut site is the cut site
located the farthest away from said recognition domain.
30. The method of claim 28, wherein said restriction enzyme of step (b) is
a class-IIS restriction endonuclease.
31. The method of claim 30, wherein said class-IIS restriction endonuclease
is selected from the group consisting of AccBSI, AceIII, AciI, AclWI,
AlwI, Alw26I, AlwXI, Asp26HI, Asp27HI, Asp35HI, Asp36HI, Asp40HI, Asp50HI,
AsuHPI, BaeI, BbsI, BbvI, BbvII, Bbv16II, Bce83I, BcefI, BcgI, Bco5I,
Bco116I, BcoKI, BinI, Bli736I, BpiI, BpmI, Bpu10I, BpuAI, BsaI, BsaMI,
Bsc9II, BscAI, BscCI, BseII, Bse3DI, BseNI, BseRI, BseZI, BsgI, BsiI,
BsmI, BsmAI, BsmBI, BsmFI, Bsp24I, Bsp423I, BspBS3II, BspIS4I, BspKT5I,
BspLU11III, BspMI, BspPI, BspST5I, BspTS514I, BsrI, BsrBI, BsrDI, BsrSI,
BssSI, Bst11I, Bst71I, Bst2BI, BstBS32I, BstD102I, BstF5I, BstTS5I, Bsu6I,
CjeI, CjePI, Eam1104I, EarI, Eco31I, Eco57I, EcoA4I, EcoO44I, Esp3I, FauI,
FokI, GdiII, GsuI, HgaI, HphI, Ksp632I, MboII, MlyI, MmeI, MnlI, Mva1269I,
PhaI, PieI, RleAI, SapI, SfaNI, SimI, StsI, TaqII, TspII, TspRI, Tth111II,
and VpaK32I.
32. The method of claim 28, wherein a nucleic acid ligase is used to attach
at least one strand of said restriction enzyme recognition domain of step
(c) to said nucleic acid segment.
33. The method of claim 28, wherein said method further comprises blocking
an enzyme recognition domain lying outside said enzyme recognition domain
of step (c).
34. The method of claim 33, wherein said method further comprises
methylating an enzyme recognition domain lying outside said enzyme
recognition domain of step (c).
35. The method of claim 34, wherein said methylation occurs through in
vitro reaction with a methylase that recognizes the enzyme recognition
domain of step (c).
36. The method of claim 35, wherein said methylase is a FokI methylase.
37. The method of claim 33, wherein said blocking occurs through an in
vitro primer extension.
38. The method of claim 37, wherein said in vitro primer extension is DNA
amplification in vitro.
39. The method of claim 37, wherein said method further comprises
hemi-methylating an enzyme recognition domain lying outside said enzyme
recognition domain of step (c).
40. The method of claim 39, wherein said hemi-methylation occurs through an
in vitro primer extension using a primer having a portion of said enzyme
recognition domain that blocks enzyme recognition if it is
hemi-methylated.
41. The method of claim 40, wherein said primer extension occurs with a
methylated nucleotide.
42. The method of claim 37, wherein said restriction endonuclease
recognizes a hemi-methylated recognition domain, and the primer contains
at least one methylated nucleotide in a methylated portion of said
recognition domain.
43. The method of claim 28, wherein said nucleic acid segment is a genomic
DNA.
44. The method of claim 28, wherein said nucleic acid segment is a cDNA.
45. The method of claim 28, wherein said nucleic acid segment is a product
of an in vitro DNA amplification.
46. The method of claim 28, wherein said nucleic acid segment is a PCR
product.
47. The method of claim 28, wherein said nucleic acid segment is a product
of a strand displacement amplification.
48. The method of claim 28, wherein said nucleic acid segment is a vector
insert.
49. The method of claim 28, wherein said detectable label is selected from
one or more of the group consisting of fluorescent, near infra-red,
radionucleotide and chemiluminescent labels.
50. The method of claim 28, wherein said nucleic acid segment is attached
to a solid matrix.
51. The method of claim 50, wherein said solid matrix is a magnetic
streptavidin.
52. The method of claim 50, wherein said solid matrix is a magnetic glass
particle.
53. The method of claim 28, wherein said adaptor of step (c) is attached to
a solid matrix.
54. The method of claim 53, wherein said solid matrix is a magnetic
streptavidin.
55. The method of claim 53, wherein said solid matrix is a magnetic glass
particle.
56. A sequencing method for identifying a first nucleotide n and a second
nucleotide n+x in a double stranded nucleic acid segment, comprising:
a) digesting said double stranded nucleic acid segment with a restriction
enzyme whose cleavage site is separate from its recognition site to
produce a double stranded molecule having a 5' single stranded overhang
sequence corresponding to an enzyme cut site;
b) identifying said nucleotide n by template-directed polymerization with a
labeled nucleotide or nucleotide terminator;
c) providing an adaptor having a cycle identification tag and a restriction
enzyme recognition domain;
d) ligating said adaptor to said double stranded nucleic acid to form a
ligated molecule;
e) amplifying said ligated molecule from step (d) with a primer specific
for said cycle identification tag of said adaptor; and
f) repeating steps (a) through (b) on said amplified molecule from step (e)
to yield the identity of said nucleotide n+x,
wherein x is less than or equal to the number of nucleotides between a
recognition domain for a restriction enzyme and an enzyme cut site.
57. The method of claim 56, wherein said enzyme cut site is the cut site
located the farthest away from said recognition domain.
58. The method of claim 56, wherein said restriction enzyme of step (a) is
a class-IIS restriction endonuclease.
59. The method of claim 58, wherein said class-IIS restriction endonuclease
is selected from the group consisting of AccBSI, AceIII, AciI, AclWI,
AlwI, Alw26I, AlwXI, Asp26HI, Asp27HI, Asp35HI, Asp36HI, Asp40HI, Asp50HI,
AsuHPI, BaeI, BbsI, BbvI, BbvII, Bbv16II, Bce83I, BcefI, BcgI, Bco5I,
Bco116I, BcoKI, BinI, Bli736I, BpiI, BpmI, Bpu10I, BpuAI, BsaI, BsaMI, Bsc9II,
BscAI, BscCI, BseII, Bse3DI, BseNI, BseRI, BseZI, BsgI, BsiI, BsmI, BsmAI,
BsmBI, BsmFI, Bsp24I, Bsp423I, BspBS3II, BspIS4I, BspKT5I, BspLU11III,
BspMI, BspPI, BspST5I, BspTS514I, BsrI, BsrBI, BsrDI, BsrSI, BssSI,
Bst11I, Bst71I, Bst2BI, BstBS32I, BstD102I, BstF5I, BstTS5I, Bsu6I, CjeI,
CjePI, Eam1104I, EarI, Eco31I, Eco57I, EcoA4I, EcoO44I, Esp3I, FauI, FokI,
GdiII, GsuI, HgaI, HphI, Ksp632I, MboII, MlyI, MmeI, MnlI, Mva1269I, PhaI,
PieI, RleAI, SapI, SfaNI, SimI, StsI, TaqII, TspII, TspRI, Tth111II, and
VpaK32I.
60. The method of claim 56, wherein a nucleic acid ligase is used to attach
at least one strand of said restriction enzyme recognition domain of step
(c) to said nucleic acid segment.
61. The method of claim 56, wherein said method further comprises blocking
an enzyme recognition domain lying outside said enzyme recognition domain
of step (c).
62. The method of claim 61, wherein said blocking occurs through an in
vitro primer extension.
63. The method of claim 62, wherein said in vitro primer extension is DNA
amplification in vitro.
64. The method of claim 63, wherein said DNA amplification in vitro occurs
during said amplification in step (e).
65. The method of claim 62, wherein said in vitro primer extension occurs
following said amplification in step (e).
66. The method of claim 62, wherein said method further comprises
hemi-methylating an enzyme recognition domain lying outside said enzyme
recognition domain of step (c).
67. The method of claim 66, wherein said hemi-methylation occurs through an
in vitro primer extension using a primer having a portion of said enzyme
recognition domain that blocks enzyme recognition if it is
hemi-methylated.
68. The method of claim 67, wherein said primer extension occurs with a
methylated nucleotide.
69. The method of claim 62, wherein said restriction endonuclease recognizes
a hemi-methylated recognition domain, and the primer contains at least one
methylated nucleotide in a methylated portion of said recognition domain.
70. The method of claim 56, wherein said nucleic acid segment is a genomic
DNA.
71. The method of claim 56, wherein said nucleic acid segment is a cDNA.
72. The method of claim 56, wherein said nucleic acid segment is a product
of an in vitro DNA amplification.
73. The method of claim 56, wherein said nucleic acid segment is a PCR
product.
74. The method of claim 56, wherein said nucleic acid segment is a product
of a strand displacement amplification.
75. The method of claim 56, wherein said nucleic acid segment is a vector
insert.
76. The method of claim 56, wherein said label is selected from one or more
of the group consisting of fluorescent, near infra-red, radionucleotide
and chemiluminescent labels.
77. The method of claim 56, wherein said nucleic acid segment is attached
to a solid matrix.
78. The method of claim 77, wherein said solid matrix is a magnetic
streptavidin.
79. The method of claim 77, wherein said solid matrix is a magnetic glass
particle.
80. The method of claim 56, wherein said adaptor of step (c) is attached to
a solid matrix.
81. The method of claim 80, wherein said solid matrix is a magnetic
streptavidin.
82. The method of claim 80, wherein said solid matrix is a magnetic glass
particle.
83. The method of claim 56, wherein said step (a) is modified to generate a
blunt end in said nucleic acid segment.
84. The method of claim 83, wherein said step (b) is modified to identify a
nucleotide in said blunt end of said nucleic acid segment by using a 3'
exonuclease activity of a DNA polymerase to generate a single nucleotide
long single-stranded nucleic acid template.
85. The method of claim 84, said method further comprising sequencing said
nucleotide by a template-directed polymerization with a labeled nucleotide
or nucleotide terminator.
86. The method of claim 85, wherein said template-directed polymerization
is followed by identification of an incorporated label.
87. A method for sequencing an interval within a double stranded nucleic
acid segment by identifying a first nucleotide n and a second nucleotide
n+x in a plurality of staggered double stranded molecules produced from
said double stranded nucleic acid segment, comprising:
a) attaching an enzyme recognition domain to different positions along said
double stranded nucleic acid segment within an interval no greater than
the distance between a recognition domain for a restriction enzyme and an
enzyme cut site, such attachment occurring at one end of said double
stranded nucleic acid segment;
b) digesting said double stranded nucleic acid segment with a restriction
enzyme whose cleavage site is different from its recognition site to
produce a plurality of staggered double stranded molecules each having a
5' single stranded overhang sequence corresponding to said cut site;
c) identifying a nucleotide n within a staggered double stranded molecule
by template-directed polymerization with a labeled nucleotide or
nucleotide terminator;
d) providing an adaptor having a restriction enzyme recognition domain;
e) ligating said adaptor to said double stranded nucleic acid to form a
ligated molecule;
f) repeating steps (b) through (c) to yield the identity of said nucleotide
n+x in each of said staggered double stranded molecules having said single
strand overhang sequence thereby sequencing an interval within said double
stranded nucleic acid segment,
wherein x is greater than one and no greater than the number of nucleotides
between a recognition domain for a restriction enzyme and an enzyme cut
site.
88. The method of claim 87, wherein said enzyme cut site is the cut site
located the farthest away from said recognition domain.
89. The method of claim 87, wherein said restriction enzyme of step (b) is
a class-IIS restriction endonuclease.
90. The method of claim 89, wherein said class-IIS restriction endonuclease
is selected from the group consisting of AccBSI, AceIII, AciI, AclWI,
AlwI, Alw26I, AlwXI, Asp26HI, Asp27HI, Asp35HI, Asp36HI, Asp40HI, Asp50HI,
AsuHPI, BaeI, BbsI, BbvI, BbvII, Bbv16II, Bce83I, BcefI, BcgI, Bco5I,
Bco116I, BcoKI, BinI, Bli736I, BpiI, BpmI, Bpu10I, BpuAI, BsaI, BsaMI,
Bsc9II, BscAI, BscCI, BseII, Bse3DI, BseNI, BseRI, BseZI, BsgI, BsiI,
BsmI, BsmAI, BsmBI, BsmFI, Bsp24I, Bsp423I, BspBS3II, BspIS4I, BspKT5I,
BspLU11III, BspMI, BspPI, BspST5I, BspTS514I, BsrI, BsrBI, BsrDI, BsrSI,
BssSI, Bst11I, Bst71I, Bst2BI, BstBS32I, BstD102I, BstF5I, BstTS5I, Bsu6I,
CjeI, CjePI, Eam1104I, EarI, Eco31I, Eco57I, EcoA4I, EcoO44I, Esp3I, FauI,
FokI, GdiII, GsuI, HgaI, HphI, Ksp632I, MboII, MlyI, MmeI, MnlI, Mva1269I,
PhaI, PieI, RleAI, SapI, SfaNI, SimI, StsI, TaqII, TspII, TspRI, Tth111II,
and VpaK32I.
91. The method of claim 87, wherein a nucleic acid ligase is used to attach
at least one strand of said restriction enzyme recognition domain of step
(d) to said nucleic acid segment.
92. The method of claim 87, wherein said method further comprises blocking
an enzyme recognition domain lying outside said enzyme recognition domain
of step (d).
93. The method of claim 92, wherein said method further comprises
methylating an enzyme recognition domain lying outside said enzyme
recognition domain of step (d).
94. The method of claim 93, wherein said methylation occurs through in
vitro reaction with a methylase that recognizes the enzyme recognition
domain of step (d).
95. The method of claim 94, wherein said methylase is a FokI methylase.
96. The method of claim 92, wherein said blocking occurs through an in
vitro primer extension.
97. The method of claim 96, wherein said in vitro primer extension is DNA
amplification in vitro.
98. The method of claim 96, wherein said method further comprises
hemi-methylating an enzyme recognition domain lying outside said enzyme
recognition domain of step (d).
99. The method of claim 98, wherein said hemi-methylation occurs through an
in vitro primer extension using a primer having a portion of said enzyme
recognition domain that blocks enzyme recognition if it is
hemi-methylated.
100. The method of claim 99, wherein said primer extension occurs with a
methylated nucleotide.
101. The method of claim 96, wherein said restriction endonuclease
recognizes a hemi-methylated recognition domain, and the primer contains
at least one methylated nucleotide in a methylated portion of said
recognition domain.
102. The method of claim 87, wherein said nucleic acid segment is a genomic
DNA.
103. The method of claim 87, wherein said nucleic acid segment is a cDNA.
104. The method of claim 87, wherein said nucleic acid segment is a product
of an in vitro DNA amplification.
105. The method of claim 87, wherein said nucleic acid segment is a PCR
product.
106. The method of claim 87, wherein said nucleic acid segment is a product
of a strand displacement amplification.
107. The method of claim 87, wherein said nucleic acid segment is a vector
insert.
108. The method of claim 87, wherein said detectable label is selected from
one or more of the group consisting of fluorescent, near infra-red,
radionucleotide and chemiluminescent labels.
109. The method of claim 87, wherein said nucleic acid segment is attached
to a solid matrix.
110. The method of claim 109, wherein said solid matrix is a magnetic
streptavidin.
111. The method of claim 109, wherein said solid matrix is a magnetic glass
particle.
112. The method of claim 87, wherein said adaptor of step (d) is attached
to a solid matrix.
113. The method of claim 112, wherein said solid matrix is a magnetic
streptavidin.
114. The method of claim 112, wherein said solid matrix is a magnetic glass
particle.
115. The method of claim 87, wherein said step (b) is modified to generate
a blunt end in said nucleic acid segment.
116. The method of claim 115, wherein said step (c) is modified to identify
a nucleotide in said blunt end of said nucleic acid segment by using a 3'
exonuclease activity of a DNA polymerase to generate a single nucleotide
long single-stranded nucleic acid template.
117. The method of claim 116, said method further comprising sequencing
said nucleotide by a template-directed polymerization with a labeled
nucleotide or nucleotide terminator.
118. The method of claim 117, wherein said template-directed polymerization
is followed by identification of an incorporated label.
Description
BACKGROUND OF THE INVENTION
Analysis of DNA with currently available techniques provides a spectrum of
information ranging from the confirmation that a test DNA is the same as or
different from a standard sequence or an isolated fragment, to the express
identification and ordering of each nucleotide of the test DNA. Not only
are such techniques crucial for understanding the function and control of
genes and for applying many of the basic techniques of molecular biology,
but they have also become increasingly important as tools in genomic
analysis and a great many non-research applications, such as genetic
identification, forensic analysis, genetic counseling, medical diagnostics
and many others. In these latter applications, both techniques providing
partial sequence information, such as fingerprinting and sequence
comparisons, and techniques providing full sequence determination have
been employed (Gibbs et al., Proc. Natl. Acad. Sci USA 1989; 86:1919-1923;
Gyllensten et al., Proc. Natl. Acad. Sci USA 1988; 85:7652-7656; Carrano
et al., Genomics 1989; 4:129-136; Caetano-Anollés et al., Mol. Gen. Genet.
1992; 235:157-165; Brenner and Livak, Proc. Natl. Acad. Sci USA 1989;
86:8902-8906; Green et al., PCR Methods and Applications 1991; 1:77-90;
and Versalovic et al., Nucleic Acid Res. 1991; 19:6823-6831).
DNA sequencing methods currently available require the generation of a set
of DNA fragments that are ordered by length according to nucleotide
composition. The generation of this set of ordered fragments occurs in one
of two ways: chemical degradation at specific nucleotides using the Maxam
Gilbert method (Maxam A M and W Gilbert, Proc Natl Acad Sci USA 1977;
74:560-564) or dideoxy nucleotide incorporation using the Sanger method
(Sanger F, S Nicklen, and A R Coulson, Proc Natl Acad Sci USA 1977;
74:5463-5467). In either case, the type and number of required steps
inherently limit both the number of DNA segments that can be sequenced in
parallel and the number of operations that may be carried out in sequence.
Furthermore, both methods are prone to error due to the anomalous
migration of DNA fragments in denaturing gels. Time and space limitations
inherent in these gel-based methods have fueled the search for alternative
methods.
Several methods are under development that are designed to sequence DNA in
a solid state format without a gel resolution step. The method that has
generated the most interest is sequencing by hybridization. In sequencing
by hybridization, the DNA sequence is read by determining the overlaps
between the sequences of hybridized oligonucleotides. This strategy is
possible because a long sequence can be deduced by matching up distinctive
overlaps between its constituent oligomers (Strezoska Z, T Paunesku, D
Radosavljevic, I Labat, R Drmanac, R Crkvenjakov, Proc Natl Acad Sci USA
1991; 88:10089-10093; Drmanac R, S Drmanac, Z Strezoska, T Paunesku, I
Labat, M Zeremski, J Snoddy, W K Funkhouser, B Koop, L Hood, R
Crkvenjakov, Science 1993; 260:1649-1652). This method uses hybridization
conditions for oligonucleotide probes that distinguish between complete
complementarity with the target sequence and a single nucleotide mismatch,
and does not require resolution of fragments on polyacrylamide gels
(Jacobs K A, R Rudersdorf, S D Neill, J P Dougherty, E L Brown, and E F
Fritsch, Nucleic Acids Res. 1988; 16:4637-4650). Recent versions of
sequencing by hybridization add a DNA ligation step in order to increase
the ability of this method to discriminate between mismatches, and to
decrease the length of the oligonucleotides necessary to sequence a given
length of DNA (Broude N E, T Sano, C L Smith, C R Cantor, Proc. Natl.
Acad. Sci. USA 1994;91:3072-3076, Drmanac R T, International Business
Communications, Southborough, Mass.). Significant obstacles with this
method are its inability to accurately position repetitive sequences in
DNA fragments, inhibition of probe annealing by the formation of internal
duplexes in the DNA fragments, and the influence of nearest neighbor
nucleotides within and adjacent to an annealing domain on the melting
temperature for hybridization (Riccelli P V, A S Benight, Nucleic Acids
Res 1993;21:3785-3788, Williams J C, S C Case-Green, K U Mir, E M
Southern. Nucleic Acids Res 1994;22:1365-1367). Furthermore, sequencing by
hybridization cannot determine the length of tandem short repeats, which
are associated with several human genetic diseases (Warren S T, Science
1996; 271:1374-1375). These limitations have prevented its use as a
primary sequencing method.
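The overlap-matching idea behind sequencing by hybridization, and its difficulty with tandem repeats, can be illustrated with a short sketch. This is not the procedure of the cited references: it assumes an idealized, error-free set of hybridizing oligomers, and the function names `spectrum` and `reconstruct` are hypothetical.

```python
def spectrum(seq, k):
    """Return the set of all length-k oligomers present in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers):
    """Rebuild a sequence by chaining oligomers through (k-1)-base overlaps.

    Raises ValueError when there is no unique starting oligomer or when an
    overlap is ambiguous or missing, as happens when the target contains
    a repeat at least k-1 bases long.
    """
    kmers = set(kmers)
    suffixes = {m[1:] for m in kmers}
    # The first oligomer is the one whose prefix is not the suffix of another.
    starts = [m for m in kmers if m[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting oligomer")
    current = starts[0]
    result = current
    remaining = kmers - {current}
    while remaining:
        candidates = [m for m in remaining if m[:-1] == current[1:]]
        if len(candidates) != 1:
            raise ValueError("ambiguous or missing overlap")
        current = candidates[0]
        result += current[-1]  # each step contributes one new base
        remaining.remove(current)
    return result


print(reconstruct(spectrum("ATGCGTAC", 4)))
# Two different tandem-repeat lengths yield the same set of 4-mers:
print(spectrum("ATGATGC", 4) == spectrum("ATGATGATGC", 4))
```

In this toy case the two repeat-containing sequences produce identical oligomer sets, so no amount of overlap matching can recover the repeat length, which is the limitation described above.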
The base addition DNA sequencing scheme uses fluorescently labeled
reversible terminators of polymerase extension, with a distinct and
removable fluorescent label for each of the four nucleotide analogs
(Metzker M L, Raghavachari R, Richards S, Jacutin S E, Civitello A,
Burgess K and R A Gibbs, Nucleic Acids Res. 1994; 22:4259-4267; Canard B
and R S Sarfati, Gene 1994; 148:1-6). Incorporation of one of these base
analogs into the growing primer strand allows identification of the
incorporated nucleotide by its fluorescent label. This is followed by
removal of the protecting fluorescent group, creating a new substrate for
template-directed polymerase extension. Iteration of these steps is
designed to permit sequencing of a multitude of templates in a solid state
format. Technical obstacles include a relatively low efficiency of
extension and deprotection, and interference with primer extension caused
by single-strand DNA secondary structure. A fundamental limitation to this
approach is inherent in iterative methods that sequence consecutive
nucleotides. That is, in order to sequence more than a handful of nucleotides,
nucleotides, each cycle of analog incorporation and deprotection must
approach 100% efficiency. Even if the base addition sequencing scheme is
refined so that each cycle occurs at 95% efficiency, one will have <75% of
the product of interest after only 6 cycles (0.95^6 = 0.735). This will
severely limit the ability of this method to sequence anything but very
short DNA sequences. Only one cycle of template-directed analog
incorporation and deprotection appears to have been demonstrated so far
(Metzker M L, Raghavachari R, Richards S, Jacutin S E, Civitello A,
Burgess K and R A Gibbs, Nucleic Acids Res. 1994; 22:4259-4267; Canard B
and R S Sarfati, Gene 1994; 148:1-6). A related earlier method, which is
designed to sequence only one nucleotide per template, uses radiolabeled
nucleotides or conventional non-reversible terminators attached to a
variety of labels (Sokolov B P, Nucleic Acids Research 1989;18:3671;
Kuppuswamy M N, J W Hoffman, C K Kasper, S G Spitzer, S L Groce, and S P
Bajaj, Proc. Natl. Acad Sci. USA 1991; 88:1143-1147). Recently, this
method has been called solid-phase minisequencing (Syvanen A C, E Ikonen,
T Manninen, M Bengstrom, H Soderlund, P Aula, and L Peltonen, Genomics
1992; 12:590-595; Kobayashi M, Rappaport E, Blasband A, Semeraro A,
Sartore M, Surrey S, Fortina P., Molecular and Cellular Probes 1995;
9:175-182) or genetic bit analysis (Nikiforov T T, R B Rendle, P Goelet, Y
H Rogers, M L Kotewicz, S Anderson, G L Trainor, and M R Knapp, Nucleic
Acids Research 1994; 22:4167-4175), and it has been used to verify the
parentage of thoroughbred horses (Nikiforov T T, R B Rendle, P Goelet, Y H
Rogers, M L Kotewicz, S Anderson, G L Trainor, and M R Knapp, Nucleic
Acids Research 1994; 22:4167-4175).
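The yield arithmetic for iterative sequencing of consecutive nucleotides can be checked with a brief sketch. It assumes, as the text does, that every cycle has the same independent efficiency; the function names and the 50% threshold in the usage lines are illustrative assumptions only.

```python
import math


def cumulative_yield(efficiency, cycles):
    """Fraction of correct product left after `cycles` cycles,
    each succeeding with probability `efficiency`."""
    return efficiency ** cycles


def max_cycles(efficiency, min_yield):
    """Largest number of cycles for which the yield stays at or above min_yield."""
    return math.floor(math.log(min_yield) / math.log(efficiency))


print(round(cumulative_yield(0.95, 6), 3))  # the 6-cycle case given in the text
print(max_cycles(0.95, 0.5))                # cycles before yield drops below 50%
```

Because the yield falls geometrically, even small per-cycle losses compound quickly, which is why per-cycle efficiency must approach 100% for reads of any useful length.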
An alternative method for DNA sequencing that remains in the development
phase entails the use of flow cytometry to detect single molecules. In
this method, one strand of a DNA molecule is synthesized using
fluorescently labeled nucleotides, and the labeled DNA molecule is then
digested by a processive exonuclease, with identification of the released
nucleotides over real time using flow cytometry. Technical obstacles to
the implementation of this method include the fidelity of incorporation of
the fluorescently labeled nucleotides and turbulence created around the
microbead to which the single molecule of DNA is attached (Davis L M, F R
Fairfield, C A Harger, J H Jett, R A Keller, J H Hahn, L A Krakowski, B L
Marrone, J C Martin, H L Nutter, R L Ratliff, E B Shera, D J Simpson, S A
Soper, Genetic Analysis, Techniques, and Applications 1991; 8:1-7).
Furthermore, this method is not amenable to sequencing numerous DNA
segments in parallel.
Another DNA sequencing method has recently been developed that uses
class-IIS restriction endonuclease digestion and adaptor ligation to
sequence at least some nucleotides offset from a terminal nucleotide.
Using this method, four adjacent nucleotides have reportedly been
sequenced and read following the gel resolution of DNA fragments. However,
a limitation of this sequencing method is that it has built-in product
losses, and requires many iterative cycles (International Application
PCT/US95/03678).
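How a class-IIS enzyme cuts at a fixed distance from its recognition site, exposing nucleotides offset from the site, can be sketched as follows. The sketch is a simplification under stated assumptions: it models only the top strand, a single site in forward orientation, and defaults taken from the commonly cited FokI specificity GGATG(9/13); the function name `class_iis_digest` is hypothetical.

```python
def class_iis_digest(top, site="GGATG", top_cut=9, bottom_cut=13):
    """Simulate one class-IIS cut, reading the top strand 5' to 3'.

    top_cut and bottom_cut are the distances (in nucleotides) from the end
    of the recognition site to the cut on the top and bottom strand.
    Returns (site_fragment, overhang, downstream), where `overhang` holds
    the top-strand bases left single stranded on the downstream fragment.
    """
    start = top.find(site)
    if start < 0:
        raise ValueError("recognition site not found")
    site_end = start + len(site)
    top_pos = site_end + top_cut
    bottom_pos = site_end + bottom_cut
    if bottom_pos > len(top):
        raise ValueError("cut site falls beyond the end of the segment")
    overhang = top[top_pos:bottom_pos]
    return top[:top_pos], overhang, top[bottom_pos:]


# Hypothetical segment: 2 flanking bases, the site, 9 spacer bases,
# 4 bases that become the overhang, then the rest of the duplex.
segment = "AA" + "GGATG" + "ACGTACGTA" + "CCTT" + "GGGG"
print(class_iis_digest(segment))
```

With these defaults the overhang is four nucleotides long and begins nine nucleotides past the recognition site, so its sequence depends on the target DNA rather than on the enzyme's recognition sequence.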
Another problem exists with currently available technologies in the area of
diagnostic sequencing. An ever widening array of disorders,
susceptibilities to disorders, prognoses of disease conditions, and the
like, have been correlated with the presence of particular DNA sequences,
or the degree of variation (or mutation) in DNA sequences, at one or more
genetic loci. Examples of such phenomena include human leukocyte antigen
(HLA) typing, cystic fibrosis, tumor progression and heterogeneity, p53
proto-oncogene mutations, and ras proto-oncogene mutations (Gyllensten et
al., PCR Methods and Applications, 1:91-98 (1991); International
application PCT/US92/01675; and International application PCT/CA90/00267).
A difficulty in determining DNA sequences associated with such conditions
to obtain diagnostic or prognostic information is the frequent presence of
multiple subpopulations of DNA, e.g., allelic variants, multiple mutant
forms, and the like. Distinguishing the presence and identity of multiple
sequences with current sequencing technology is impractical due to the
amount of DNA sequencing required.
SUMMARY OF THE INVENTION
The present invention provides an alternative approach for sequencing DNA
that does not require high resolution separations and that generates
signals more amenable to analysis. The methods of the present invention
can also be easily automated. This provides a means for readily analyzing
DNA from many genetic loci. Furthermore, the DNA sequencing method of the
present invention does not require the gel resolution of DNA fragments,
which allows for the simultaneous sequencing of cDNA or genomic DNA
library inserts. Therefore, the full length transcribed sequences or
genomes can be obtained very rapidly with the methods of the present
invention. The method of the present invention further provides a means
for the rapid sequencing of previously uncharacterized viral, bacterial or
protozoan human pathogens, as well as the sequencing of plants and animals
of interest to agriculture, conservation, and/or science.
The present invention pertains to methods which can sequence multiple DNA
segments in parallel, without running a gel. Each DNA sequence is
determined without ambiguity, as this novel method sequences DNA in
discrete intervals that start at one end of each DNA segment. The method
of the present invention is carried out on DNA that is almost entirely
double-stranded, thus preventing the formation of secondary structures
that complicate the known sequencing methods that rely on hybridization to
single-stranded templates (e.g., sequencing by hybridization), and
overcoming obstacles posed by microsatellite repeats, other direct
repeats, and inverted repeats, in a given DNA segment. The iterative and
regenerative DNA sequencing method described herein also overcomes the
obstacles to sequencing several thousand distinct DNA segments attached to
addressable sites on a matrix or a chip, because it is carried out in
iterative steps and in various embodiments effectively preserves the
sample through a multitude of sequencing steps, or creates a nested set of
DNA segments to which a few steps are applied in common. It is, therefore,
highly suitable for automation. Furthermore, the present invention
particularly addresses the problem of increasing throughput in DNA
sequencing, both in number of steps and parallelism of analyses, and it
will facilitate the identification of disease-associated gene
polymorphisms, with particular value for sequencing entire genomes and for
characterizing the multiple gene mutations underlying polygenic traits.
Thus, the invention pertains to novel methods for generating staggered
templates and for iterative and regenerative DNA sequencing as well as to
methods for automated DNA sequencing.
Accordingly, the invention features a method for identifying a first
nucleotide n and a second nucleotide n+x in a double stranded nucleic acid
segment. The method includes (a) digesting the double stranded nucleic
acid segment with a restriction enzyme to produce a double stranded
molecule having a single stranded overhang sequence corresponding to an
enzyme cut site; (b) providing an adaptor having a cycle identification
tag, a restriction enzyme recognition domain, a sequence identification
region, and a detectable label; (c) hybridizing the adaptor to the double
stranded nucleic acid having the single-stranded overhang sequence to form
a ligated molecule; (d) identifying the nucleotide n by identifying the
ligated molecule; (e) amplifying the ligated molecule from step (d) with a
primer specific for the cycle identification tag of the adaptor; and (f)
repeating steps (a) through (d) on the amplified molecule from step (e) to
yield the identity of the nucleotide n+x, wherein x is less than or equal
to the number of nucleotides between a recognition domain for a
restriction enzyme and an enzyme cut site.
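The cycle of steps (a) through (f) can be illustrated with a toy in-silico
model. The Python sketch below makes several simplifying assumptions: only
the top strand of the template is tracked, each digestion is assumed to
expose an overhang of fixed length, and amplification, labels, and cycle
identification tags are not modeled. All names (make_adaptor_pool,
iterative_reads, cut_distance) are hypothetical and chosen for this
example. The one constraint carried over from the text is that the step x
between successive reads may not exceed the distance between the
recognition domain and the cut site.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_adaptor_pool(overhang_len):
    """Map each adaptor's sequence identification region to the overhang
    it pairs with. One hypothetical adaptor exists per possible overhang."""
    pool = {}
    for bases in product("ACGT", repeat=overhang_len):
        overhang = "".join(bases)
        pool[reverse_complement(overhang)] = overhang
    return pool


def iterative_reads(template, overhang_len=4, step=4, cut_distance=9, cycles=3):
    """Simulate repeated digest / ligate / identify cycles.

    Returns a list of (position, bases) pairs, one per cycle, where
    position is the offset n, n+x, n+2x, ... into the template.
    """
    if not 1 <= step <= cut_distance:
        raise ValueError("step x must be between 1 and cut_distance")
    pool = make_adaptor_pool(overhang_len)
    reads = []
    pos = 0
    for _ in range(cycles):
        # (a) digestion exposes an overhang starting at the current position
        overhang = template[pos:pos + overhang_len]
        if len(overhang) < overhang_len:
            break  # template exhausted
        # (b)-(c) only the adaptor whose identification region pairs with
        # the overhang is ligated
        ident_region = reverse_complement(overhang)
        # (d) identifying the ligated adaptor reveals the overhang bases
        reads.append((pos, pool[ident_region]))
        # (e)-(f) the next digestion cuts x bases further into the template
        pos += step
    return reads


reads = iterative_reads("ACGTTGCAGGCTAAGT", overhang_len=4, step=4)
```

Choosing a step smaller than the overhang length makes successive reads
overlap, while a step equal to the overhang length tiles the template
without gaps.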
In another aspect, the invention features a method for sequencing an
interval within a double stranded nucleic acid segment by identifying a
first nucleotide n and a second nucleotide n+x in a plurality of staggered
double stranded molecules produced from the double stranded nucleic acid
segment. The method includes (a) attaching an enzyme recognition domain to
different positions along the double stranded nucleic acid segment within
an interval no greater than the distance between a recognition domain for
a restriction enzyme and an enzyme cut site, such attachment occurring at
one end of the double stranded nucleic acid segment; (b) digesting the
double stranded nucleic acid segment with a restriction enzyme to produce
a plurality of staggered double stranded molecules each having a single
stranded overhang sequence corresponding to the cut site; (c) providing an
adaptor having a restriction enzyme recognition domain, a sequence
identification region, and a detectable label; (d) hybridizing the adaptor
to the double stranded nucleic acid having the single-stranded overhang
sequence to form a ligated molecule; (e) identifying a n