WikiPatents - Community Patent Review
DNA sequencing by mass spectrometry via exonuclease degradation    

United States Patent 6140053
Link to this page: http://www.wikipatents.com/6140053.html
Inventor(s): Koster; Hubert (Concord, MA)
Abstract: The invention provides fast and highly accurate mass spectrometer based processes for directly sequencing a target nucleic acid (or fragments generated from the target nucleic acid), which by means of protection, specificity of enzymatic activity, or immobilization, are unilaterally degraded in a stepwise manner via exonuclease digestion and the nucleotides, derivatives or truncated sequences detected by mass spectrometry.

Inventor     Koster; Hubert (Concord, MA)
Owner/Assignee     Sequenom, Inc. (San Diego, CA)
Publication Date     October 31, 2000
Application Number     09/160,671
Filing Date     September 25, 1998
Examiner     Marschel; Ardin H.
Attorney/Law Firm     Seidman; Stephanie L. Heller Ehrman White & McAuliffe
Parent Case     RELATED APPLICATIONS This application is a continuation of U.S. application Ser. No. 08/744,590, filed Nov. 6, 1996, now allowed, which is a continuation-in-part of U.S. application Ser. No. 08/388,171, filed Feb. 10, 1995, now U.S. Pat. No. 5,622,824, which is a file wrapper continuation (FWC) of U.S. application Ser. No. 08/034,738, filed Mar. 19, 1993, now abandoned. This application is a continuation-in-part of U.S. application Ser. No. 08/388,171, filed Feb. 10, 1995, now U.S. Pat. No. 5,622,824, and also U.S. application Ser. No. 08/034,738. The subject matter of each of U.S. application Ser. No. 08/388,171, U.S. application Ser. No. 08/744,590, and U.S. application Ser. No. 08/034,738 is incorporated by reference.
Claims


What is claimed is:

1. A method of determining a sequence of a nucleic acid, comprising the steps of:

(i) obtaining the nucleic acid to be sequenced;

(ii) cleaving the nucleic acid to be sequenced from a first end to a second end with an exonuclease to sequentially release individual nucleotides;

(iii) identifying each of the sequentially released nucleotides by mass spectrometry; and

(iv) determining the sequence of the nucleic acid from the identified nucleotides.

2. The method of claim 1, wherein the nucleic acid is a 2'-deoxyribonucleic acid (DNA).

3. The method of claim 1, wherein the nucleic acid is a ribonucleic acid (RNA).

4. The method of claim 1, wherein the exonuclease is selected from the group consisting of snake venom phosphodiesterase, spleen phosphodiesterase, Bal-31 nuclease, E. coli exonuclease I, E. coli exonuclease VII, Mung Bean Nuclease, S1 Nuclease, an exonuclease activity of E. coli DNA polymerase I, an exonuclease activity of the Klenow fragment of DNA polymerase I, an exonuclease activity of T4 DNA polymerase, an exonuclease activity of T7 DNA polymerase, an exonuclease activity of Taq DNA polymerase, E. coli exonuclease III, λ exonuclease, an exonuclease activity of Pyrococcus species GB-D DNA polymerase and an exonuclease activity of Thermococcus litoralis DNA polymerase.

5. The method according to claim 1, wherein the exonuclease is immobilized by covalent attachment to a solid support, entrapment within a gel matrix, or contained in a reactor with a semipermeable membrane.

6. The method according to claim 5, wherein the solid support is a capillary and the exonuclease is covalently attached to an inner wall of the capillary.

7. The method of claim 5, wherein the solid support is selected from the group consisting of glass beads, cellulose beads, polystyrene beads, epichlorohydrin-cross-linked dextran beads, polyacrylamide beads and agarose beads.

8. The method according to claim 5, wherein the solid support is a flat membrane.

9. The method according to claim 1, wherein the nucleic acid is immobilized by covalent attachment to a solid support and the exonuclease is in a solution and is contacted with the immobilized nucleic acid.

10. The method according to claim 9, wherein the solid support is a capillary and the nucleic acid is covalently attached to an inner wall of the capillary.

11. The method according to claim 9, wherein the solid support is selected from the group consisting of glass beads, cellulose beads, polystyrene beads, epichlorohydrin-cross-linked dextran beads, polyacrylamide beads and agarose beads.

12. The method according to claim 9, wherein the solid support is a flat membrane.

13. The method according to claim 1, wherein the nucleic acid comprises mass-modified nucleotides.

14. The method according to claim 13, wherein the mass-modified nucleotides modulate the rate of the exonuclease activity.

15. The method according to claim 1, wherein the sequentially released nucleotides are mass-modified subsequent to exonuclease release and prior to mass spectrometric identification.

16. The method according to claim 15, wherein the sequentially released nucleotides are mass-modified by contact with an alkaline phosphatase.

17. The method of claim 9, wherein the nucleic acid further comprises a linking group (L) for covalently attaching the nucleic acid to the solid support.

18. The method of claim 17, wherein the solid support further comprises a splint oligonucleotide and the linking group (L) comprises a nucleotide sequence able to anneal to the splint oligonucleotide and be covalently attached to the solid support by action of a ligase.

19. The method of claim 1, wherein the mass spectrometry format used in step (iii) is selected from the group consisting of fast atomic bombardment (FAB), plasma desorption (PD), thermospray (TS), electrospray (ES) and matrix assisted laser desorption (MALDI).

20. The method of claim 1, wherein the mass analyzer used in step (iii) is a time-of-flight (TOF) configuration or a quadrupole.

21. A method of determining a sequence of a nucleic acid, comprising the steps of:

(i) obtaining the nucleic acid to be sequenced;

(ii) cleaving the nucleic acid to be sequenced from a first end to a second end with an exonuclease to produce multiple sets of nested nucleic acid fragments;

(iii) determining the molecular weight value of each one of the sets of nucleic acid fragments by mass spectrometry; and

(iv) determining the sequence of the nucleic acid from the molecular weight values of the sets of nucleic acid fragments.

22. The method of claim 21, wherein the nucleic acid is a 2'-deoxyribonucleic acid (DNA).

23. The method of claim 21, wherein the nucleic acid is a ribonucleic acid (RNA).

24. The method of claim 21, wherein the exonuclease is selected from the group consisting of snake venom phosphodiesterase, spleen phosphodiesterase, Bal-31 nuclease, E. coli exonuclease I, E. coli exonuclease VII, Mung Bean Nuclease, S1 Nuclease, an exonuclease activity of E. coli DNA polymerase I, an exonuclease activity of the Klenow fragment of DNA polymerase I, an exonuclease activity of T4 DNA polymerase, an exonuclease activity of T7 DNA polymerase, an exonuclease activity of Taq DNA polymerase, E. coli exonuclease III, λ exonuclease, an exonuclease activity of Pyrococcus species GB-D DNA polymerase and an exonuclease activity of Thermococcus litoralis DNA polymerase.

25. The method of claim 21, wherein the exonuclease is immobilized by covalent attachment to a solid support, entrapment within a gel matrix, or contained in a reactor with a semipermeable membrane.

26. The method of claim 25, wherein the solid support is a capillary and the exonuclease activity is covalently attached to an inner wall of the capillary.

27. The method of claim 25, wherein the solid support is selected from the group consisting of glass beads, cellulose beads, polystyrene beads, epichlorohydrin-cross-linked dextran beads, polyacrylamide beads and agarose beads.

28. The method of claim 25, wherein the solid support is a flat membrane.

29. The method of claim 21, wherein the nucleic acid is immobilized by covalent attachment to a solid support and the exonuclease is in a solution and is contacted with the immobilized nucleic acid.

30. The method of claim 29, wherein the solid support is a capillary and the nucleic acid is covalently attached to an inner wall of the capillary.

31. The method of claim 29, wherein the solid support is selected from the group consisting of glass beads, cellulose beads, polystyrene beads, epichlorohydrin-cross-linked dextran beads, polyacrylamide beads and agarose beads.

32. The method of claim 29, wherein the solid support is a flat membrane.

33. The method of claim 21, wherein the nucleic acid comprises mass-modified nucleotides.

34. The method of claim 33, wherein the mass-modified nucleotides modulate the rate of the exonuclease activity.

35. The method of claim 21, wherein the multiple sets of nested nucleic acid fragments are mass-modified subsequent to exonuclease release and prior to mass spectrometric identification.

36. The method of claim 35, wherein the multiple sets of nested nucleic acid fragments are mass-modified by contact with an alkaline phosphatase.

37. The method of claim 29, wherein the nucleic acid further comprises a linking group (L) for covalently attaching the nucleic acid to the solid support.

38. The method of claim 37, wherein the solid support further comprises a splint oligonucleotide and the linking group (L) comprises a nucleotide sequence able to anneal to the splint oligonucleotide and be covalently attached to the solid support by action of a ligase.

39. The method of claim 21, wherein the mass spectrometry format used in step (iii) is selected from the group consisting of fast atomic bombardment (FAB), plasma desorption (PD), thermospray (TS), electrospray (ES) and matrix assisted laser desorption (MALDI).

40. The method of claim 21, wherein the mass analyzer used in step (iii) is a time-of-flight (TOF) configuration or a quadrupole.

41. A method of nucleic acid sequencing, comprising the steps of:

(a) obtaining a target nucleic acid to be sequenced wherein the target nucleic acid sequence is flanked by cleavable sites at both ends and wherein the plus and the minus strand of the target nucleic acid are distinguishable from each other by differential mass-modification;

(b) cleaving the target nucleic acid at the flanking cleavable sites;

(c) denaturing the cleaved target nucleic acid to generate single-stranded plus and minus strands;

(d) simultaneously cleaving the denatured plus and minus strands from a first end to a second end with an exonuclease to sequentially release individual nucleotides, wherein the individual nucleotides derived from the plus strand are mass-differentiated from the individual nucleotides derived from the minus strand;

(e) identifying each of the sequentially released nucleotides produced in step (d) by mass spectrometry; and

(f) determining the sequence of the target nucleic acid from the identified nucleotides.

42. The method of claim 41, wherein prior to the exonuclease cleavage of step (d), the plus and the minus strands are immobilized to a solid support.

43. A method of nucleic acid sequencing, comprising the steps of:

(a) obtaining a target nucleic acid to be sequenced wherein the target nucleic acid sequence is flanked by cleavable sites at both ends and wherein the plus and the minus strand of the target nucleic acid are distinguishable from each other by differential mass-modification;

(b) cleaving the target nucleic acid at the flanking cleavable sites;

(c) denaturing the cleaved target nucleic acid to generate single-stranded plus and minus strands;

(d) simultaneously cleaving the denatured plus and minus strands from a first end to a second end with an exonuclease to produce multiple sets of nested nucleic acid fragments, wherein the nested nucleic acid fragments derived from the plus strand are mass-differentiated from the nested nucleic acid fragments derived from the minus strand;

(e) identifying each of the nested nucleic acid fragments produced in step (d) by mass spectrometry; and

(f) determining the sequence of the target nucleic acid from the identified nested nucleic acid fragments.

44. The method of claim 43, wherein prior to the exonuclease cleavage of step (d), the plus and the minus strands are immobilized to a solid support.
Description


BACKGROUND OF THE INVENTION

The fundamental role that determining DNA sequences has for the life sciences is evident. Its importance in the human genome project has been discussed and published widely [e.g. J. E. Bishop and M. Waldholz, 1991, Genome. The Story of the Most Astonishing Scientific Adventure of Our Time--The Attempt to Map All Genes in the Human Body, Simon & Schuster, New York].

The current state-of-the-art in DNA sequencing is summarized in recent review articles [e.g. B. Barrell, The FASEB Journal, 5, 40 (1991); G. L. Trainor, Anal. Chem. 62, 418 (1990), and references cited therein]. The most widely used DNA sequencing chemistry is the enzymatic chain termination method [F. Sanger et al., Proc. Natl. Acad. Sci. USA, 74, 5463 (1977)] which has been adopted for several different sequencing strategies. The sequencing reactions are either performed in solution with the use of different DNA polymerases, such as the thermophilic Taq DNA polymerase [M. A. Innes, Proc. Natl. Acad. Sci. USA, 85: 9436 (1988)] or specially modified T7 DNA polymerase ("SEQUENASE") [S. Tabor and C.C. Richardson, Proc. Natl. Acad. Sci. USA, 84, 4767 (1987)], or in conjunction with the use of polymer supports. See for example S. Stahl et al., Nucleic Acids Res., 16, 3025 (1988); M. Uhlen, PCT Application WO 89/09282; Cocuzza et al., PCT Application WO 91/11533; and Jones et al., PCT Application WO 92/03575, incorporated by reference herein.

A central, but at the same time limiting, part of almost all sequencing strategies used today is the separation of the base-specifically terminated nested fragment families by polyacrylamide gel electrophoresis (PAGE). This method is time-consuming and error-prone and can result in ambiguous sequence determinations. As a consequence of the use of PAGE, highly experienced personnel are often required for the interpretation of the sequence ladders obtained by PAGE in order to get reliable results. Automatic sequence readers very often are unable to handle artefacts such as "smiling", compressions, faint ghost bands, etc. This is true for the standard detection methods employing radioactive labeling such as ³²P, ³³P or ³⁵S, as well as for the so-called Automatic DNA Sequencers (e.g. Applied Biosystems, Millipore, DuPont, Pharmacia) using fluorescent dyes for the detection of the sequencing bands.

Apart from the time factor, the biggest limitations of all methods involving PAGE as an integral part, however, are the generation of reliable sequence information and the transformation of this information into a computer format to facilitate sophisticated analysis of the sequence data utilizing existing software and DNA sequence and protein sequence data banks.

With standard Sanger sequencing, 200 to 500 bases of unconfirmed sequence information can be obtained in about 24 hours; with automatic DNA sequencers this number can be multiplied by approximately a factor of 10 to 20 due to processing several samples simultaneously. A further increase in throughput can be achieved by employing multiplex DNA sequencing [G. Church et al., Science, 240, 185-188 (1988); Koster et al., Nucleic Acids Res. Symposium Ser. No. 24, 318-21 (1991)] in which, by using a unique tag sequence, several sequencing ladders can be detected, one after the other, from the same PAGE after blotting, UV-crosslinking to a membrane, and hybridizations with specific complementary tag probes. However, this approach is still very laborious, often requires highly skilled personnel and can be hampered by the use of PAGE as a key element of the whole process.

A large scale sequencing project often starts with either a cDNA or genomic library of large DNA fragments inserted in suitable cloning vectors such as cosmid, plasmid (e.g. pUC), phagemid (e.g. pEMBL, pGEM) or single-stranded phage (e.g. M13) vectors [T. Maniatis, E. F. Fritsch and J. Sambrook (1982) Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.; Methods in Enzymology, Vol. 101 (1983), Recombinant DNA, Part C; Vol. 153 (1987), Recombinant DNA, Part D; Vol. 154 (1987), Recombinant DNA, Part E; Vol. 155 (1987), Recombinant DNA, Part F and Vol. 152 (1987), Guide to Molecular Cloning Techniques, Academic Press, New York]. Since large DNA fragments currently cannot be sequenced directly in one run because the Sanger sequencing chemistry allows only about 200 to 500 bases to be read at a time, the long DNA fragments have to be cut into shorter pieces which are separately sequenced. In one approach this is done in a fully random manner by using, for example, nonspecific DNAse I digestion, frequently cutting restriction enzymes, or sonication, and sorting by electrophoresis on agarose gels [Methods in Enzymology, supra]. However, this method is time-consuming and often not economical as several sequences are sequenced many times until a contiguous DNA sequence is obtained. Very often the expenditure of work to close the gaps of the total sequence is enormous. Consequently, it is desirable to have a method which allows sequencing of a long DNA fragment in a non-random, i.e. direct, way from one end through to the other. Several strategies have been proposed to achieve this [Methods in Enzymology, supra; S. Henikoff, Gene, 28, 351-59 (1984); S. Henikoff, et al. U.S. Pat. No. 4,843,003; and PCT Application WO 91/12341]. However, none of the currently available sequencing methods provide an acceptable method of sequencing megabase DNA sequences in either a timely or economical manner. The main reason for this stems from the use of PAGE as a central and key element of the overall process.

In PAGE, under denaturing conditions, the nested families of terminated DNA fragments are separated by the different mobilities of DNA chains of different length. A closer inspection, however, reveals that it is not the chain length alone which governs the mobility of DNA chains by PAGE, but there is a significant influence of base composition on the mobility [R. Frank and H. Koster, Nucleic Acids Res., 6, 2069 (1979)]. PAGE, therefore, is not only a very slow, but also an unreliable method for the determination of molecular weights, as DNA fragments of the same length but different sequence/base composition could have different mobilities. Likewise, DNA sequences which have the same mobility could have different sequence/base compositions.

The most reliable way for the determination of the sequence/base composition of a given DNA fragment would, therefore, be to correlate the sequence with its molecular weight. Mass spectrometry is capable of doing this. The enormous advantage of mass spectrometry compared to the above mentioned methods is the speed, which is in the range of seconds per analysis, and the accuracy of mass determination, as well as the possibility to directly read the collected mass data into a computer. The application of mass spectrometry for DNA sequencing has been investigated by several groups [e.g. Methods in Enzymology, Vol. 193: Mass Spectrometry, (J. A. McCloskey, editor), 1990, Academic Press, New York; K. H. Schramm Biomedical Applications of Mass Spectrometry, 34, 203-287 (1990); P. F. Crain Mass Spectrometry Reviews, 9, 505 (1990)].
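The relationship between base composition and molecular weight can be illustrated with a short calculation. The following is a minimal sketch, not part of the patent disclosure. It assumes approximate average residue masses for the four deoxynucleotides in a chain and an end correction for an unphosphorylated linear oligonucleotide, and it ignores ionization, adducts and isotope patterns. The names `RESIDUE_MASS`, `END_CORRECTION` and `oligo_mass` are illustrative only.

```python
# Approximate average masses (Da) of deoxynucleotide residues within a chain.
# These values and the end correction are assumptions for illustration.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
END_CORRECTION = -61.96  # linear oligonucleotide without terminal phosphate


def oligo_mass(seq):
    """Return the approximate average mass of a single-stranded DNA fragment."""
    return sum(RESIDUE_MASS[base] for base in seq) + END_CORRECTION


if __name__ == "__main__":
    # Same chain length, different base composition: clearly different masses.
    for fragment in ("AAAAAAAA", "GGGGGGGG", "ACGTACGT"):
        print(fragment, round(oligo_mass(fragment), 2))
```

Two fragments of equal length that could co-migrate or migrate anomalously in PAGE are separated here by their mass alone. Note that mass reflects composition; fragments with the same composition but a different base order have the same mass, which is why the sequence is read from a series of measurements rather than from a single one.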

Most of the attempts to use mass spectrometry to sequence DNA have used stable isotopes for base-specific labeling, as for instance the four sulfur isotopes ³²S, ³³S, ³⁴S and ³⁶S. See, for example, Brennan et al., PCT Application WO 89/12694, R. L. Mills U.S. Pat. No. 5,064,754, U.S. Pat. No. 5,002,868, Jacobson et al.; Haan European Patent Application No. A1 0360676. Most of these methods employed the Sanger sequencing chemistry and polyacrylamide gel electrophoresis with some variations, such as capillary zone electrophoresis (CZE), to separate the nested, terminated DNA fragments prior to mass spectrometric analysis, which jeopardizes, to some extent, the advantages of mass spectrometry.

One advantage of PAGE is that it is a parallel method, i.e. several samples can be analyzed simultaneously (though this is not true for CZE which is a serial method), whereas mass spectrometry allows, in general, only a serial handling of the samples. In U.S. Pat. No. 5,547,835, mass spectrometric DNA sequencing is proposed without the use of PAGE, employing desorption/ionization techniques applicable to larger biopolymers, such as electrospray (ES) [J. B. Fenn et al., J. Phys. Chem., 88, 4451-59 (1984); Fenn et al., PCT Application No. WO 90/14148; and B. Ardrey, Spectroscopy Europe, 4, 10-18 (1992)] and matrix-assisted laser desorption/ionization (MALDI) mass spectrometry [F. Hillenkamp et al., Laser Desorption Mass Spectrometry, Part I: Mechanisms and Techniques and Part II: Performance and Application of MALDI of Large Biomolecules, in Mass Spectrometry in the Biological Sciences: A Tutorial (M. L. Gross, editor), 165-197 (1992), Kluwer Academic Publishers, The Netherlands] which can facilitate determination of DNA sequences by direct measurement of the molecular masses in the mixture of base-specifically terminated nested DNA fragments. By integrating the concept of multiplexing through the use of mass-modified nucleoside triphosphate derivatives, the serial mode of analysis typical for current mass spectrometric methods can be changed to a parallel mode [H. Koster, U.S. Pat. No. 5,547,835, supra].

MALDI and ES mass spectrometry are in some aspects complementary techniques. While ES, using an atmospheric pressure ionization interface (API), can accommodate continuous flow streams from high-performance liquid chromatographs (HPLC) [K. B. Tomer, et al. Biological Mass Spectrometry, 20, 783-88 (1991)] and capillary zone electrophoresis (CZE) [R. D. Smith et al., Anal. Chem., 60, 436-41 (1988)], this is currently not available for MALDI mass spectrometry. On the other hand, MALDI mass spectrometry is less sensitive to buffer salts and other low molecular weight components in the analysis of larger molecules with a TOF mass analyzer [Hillenkamp et al. (1992), supra]; in contrast, ES is very sensitive to by-products of low volatility. While the high mass range in ES mass spectrometry is accessible through the formation of multiply charged molecular ions, this is achieved in MALDI mass spectrometry by applying a time-of-flight (TOF) mass analyzer and the assistance of an appropriate matrix to volatilize the biomolecules. Similar to ES, a thermospray interface has been used to couple HPLC on-line with a mass analyzer. Nucleosides originating from enzymatic hydrolysates have been analyzed using such a configuration [C. G. Edmonds et al. Nucleic Acids Res., 13, 8197-8206 (1985)]. However, Edmonds et al. does not disclose a method for nucleic acid sequencing.

A complementary and completely different approach to determine the DNA sequence of a long DNA fragment would be to progressively degrade the DNA strand using exonucleases from one side, nucleotide by nucleotide. This method has been proposed by Jett et al. See J. H. Jett et al. J. Biomolecular Structure & Dynamics, 7, 301-309 (1989); and J. H. Jett et al. PCT Application No. WO 89/03432. A single molecule of a DNA or RNA fragment is suspended in a moving flow stream and contacted with an exonuclease which cleaves off one nucleotide after the other. Detection of the released nucleotides is accomplished by specifically labeling the four nucleotides with four different fluorescent dyes and involving laser-induced flow cytometric techniques.

However, strategies which use a stepwise enzymatic degradation process can suffer from problems relating to synchronization, i.e. the enzymatic reaction soon comes out of phase. Jett et al., supra, have attempted to address this problem by degrading just one single DNA or RNA molecule by an exonuclease. However, this approach is very hard, as handling a single molecule, keeping it in a moving flow stream, and achieving a sensitivity of detection which clearly identifies one single nucleotide are only some of the very difficult technical problems to be solved. In addition, in using fluorescent tags, the physical detection process for a fluorescent signal involves a time factor difficult to control and the necessity to employ excitation by lasers can cause photo-bleaching of the fluorescent signal. Another problem, which still needs to be resolved, is that DNA/RNA polymerases, which are able to use the four fluorescently labeled NTPs instead of the unmodified counterparts, have not been identified.
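The synchronization problem can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the patent. It assumes a toy model in which every molecule in a population independently loses one nucleotide per time step with a fixed probability; the function name `simulate_dephasing` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_dephasing(n_molecules, n_steps, p_cleave, seed=0):
    """Toy model: each molecule cleaves one nucleotide per step with probability p_cleave.

    Returns the final number of nucleotides removed from each molecule and,
    for every step, the standard deviation of that number across the population.
    """
    rng = random.Random(seed)
    removed = [0] * n_molecules
    spread = []
    for _ in range(n_steps):
        removed = [r + (1 if rng.random() < p_cleave else 0) for r in removed]
        spread.append(statistics.pstdev(removed))
    return removed, spread


if __name__ == "__main__":
    removed, spread = simulate_dephasing(n_molecules=500, n_steps=200, p_cleave=0.5)
    # The population starts in phase and drifts apart as digestion proceeds.
    print("spread after 10 steps:", round(spread[9], 2))
    print("spread after 200 steps:", round(spread[-1], 2))
```

In this model the spread of positions grows with the number of steps, so that at later times the molecules release different nucleotides at the same moment. This is the out-of-phase behavior that single-molecule handling, or a modulation of the enzyme's rate, is meant to avoid.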

The invention described herein addresses most of the problems described above, which are inherent to currently existing DNA sequencing processes, and provides chemistries and systems suitable for high-speed DNA sequencing, a prerequisite for tackling the human genome and other genome sequencing projects.

SUMMARY OF THE INVENTION

In contrast to most sequencing strategies, the process of this invention does not use the Sanger sequencing chemistry, polyacrylamide gel electrophoresis or radioactive, fluorescent or chemiluminescent detection. Instead, the process of the invention adopts a direct sequencing approach, beginning with large DNA fragments cloned into conventional cloning vectors, and based on mass spectrometric detection. To achieve this, the target nucleic acid (or fragments of the target nucleic acid) is, by means of protection, specificity of enzymatic activity, or immobilization, unilaterally degraded in a stepwise manner via exonuclease digestion and the nucleotides, derivatives or truncated sequences are detected by mass spectrometry.
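Where truncated sequences rather than released nucleotides are detected, the sequence can in principle be read from the mass differences between successive fragments. The following is a minimal sketch under stated assumptions: digestion proceeds from the 3' end, a complete set of nested fragments down to a single nucleoside is observed, approximate average residue masses are used, and ionization effects are ignored. All names and numeric values are illustrative and not part of the patent text.

```python
# Approximate average residue masses (Da); illustrative assumptions.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
END_CORRECTION = -61.96  # linear oligonucleotide without terminal phosphate


def oligo_mass(seq):
    """Approximate average mass of a single-stranded DNA fragment."""
    return sum(RESIDUE_MASS[base] for base in seq) + END_CORRECTION


def nearest_base(mass, tolerance=0.5):
    """Map a residue mass to the closest base, rejecting poor matches."""
    base = min(RESIDUE_MASS, key=lambda b: abs(RESIDUE_MASS[b] - mass))
    if abs(RESIDUE_MASS[base] - mass) > tolerance:
        raise ValueError(f"no base matches a mass difference of {mass:.2f}")
    return base


def read_ladder(fragment_masses):
    """Recover the 5'->3' sequence from the masses of nested 3'-truncated fragments."""
    ladder = sorted(fragment_masses, reverse=True)
    # Each step down the ladder corresponds to one residue removed from the 3' end.
    released = [nearest_base(big - small) for big, small in zip(ladder, ladder[1:])]
    # The smallest fragment is the remaining 5'-terminal nucleoside.
    first = nearest_base(ladder[-1] - END_CORRECTION)
    return first + "".join(reversed(released))


if __name__ == "__main__":
    target = "GATTACA"
    ladder = [oligo_mass(target[:n]) for n in range(1, len(target) + 1)]
    print(read_ladder(ladder))
```

Because the smallest difference between any two residue masses in this table is about 9 Da, a tolerance of 0.5 Da leaves no ambiguity in the idealized case shown.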

In one embodiment, prior to enzymatic degradation, sets of ordered deletions are made, which span the whole sequence of the cloned DNA fragment. In this manner, mass-modified nucleotides are incorporated using a combination of exonuclease and DNA/RNA polymerase. This enables either multiplex mass spectrometric detection, or modulation of the activity of the exonuclease so as to synchronize the degradative process.

In another embodiment of the invention, the phasing problem is resolved by continuously applying small quantities of the enzymatic reaction mixture onto a moving belt with adjustable speed for mass spectrometric detection.

In yet another embodiment of the invention, the throughput is further increased by applying reaction mixtures from different reactors simultaneously onto the moving belt. In this case, the different sets of sequencing reactions are identified by specific mass-modifying labels attached to the four nucleotides. Two-dimensional multiplexing can further increase the throughput of exonuclease-mediated mass spectrometric sequencing as described herein.
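Multiplexing by mass-modifying labels can be sketched as a lookup problem: each reactor's nucleotides carry a distinct mass increment, so a single observed mass identifies both the reactor and the base. The example below is a hedged illustration only. The nucleotide masses are approximate average masses of the free deoxynucleoside monophosphates, and the reactor names and label increments (0, 50 and 100 Da) are hypothetical values chosen so that no two labeled species coincide.

```python
# Approximate average masses (Da) of the free 2'-deoxynucleoside 5'-monophosphates.
NUCLEOTIDE_MASS = {"A": 331.22, "C": 307.20, "G": 347.22, "T": 322.21}
# Hypothetical mass increments contributed by the label used in each reactor.
REACTOR_LABEL = {"reactor_1": 0.0, "reactor_2": 50.0, "reactor_3": 100.0}
TOLERANCE = 0.1  # mass units


def build_table():
    """List every (mass, reactor, base) combination and check that all are resolvable."""
    table = sorted(
        (mass + shift, reactor, base)
        for reactor, shift in REACTOR_LABEL.items()
        for base, mass in NUCLEOTIDE_MASS.items()
    )
    for (m1, _, _), (m2, _, _) in zip(table, table[1:]):
        if m2 - m1 <= 2 * TOLERANCE:
            raise ValueError("label increments give overlapping masses")
    return table


def demultiplex(observed_masses):
    """Assign each observed mass, in order of detection, to a reactor and a base."""
    table = build_table()
    reads = {reactor: [] for reactor in REACTOR_LABEL}
    for observed in observed_masses:
        matches = [(r, b) for mass, r, b in table if abs(mass - observed) <= TOLERANCE]
        if len(matches) != 1:
            raise ValueError(f"cannot assign mass {observed}")
        reactor, base = matches[0]
        reads[reactor].append(base)
    return {reactor: "".join(bases) for reactor, bases in reads.items()}


if __name__ == "__main__":
    print(demultiplex([331.22, 357.20, 447.22, 322.21]))
```

The collision check in `build_table` reflects the design constraint on any real label set: the increments must be chosen so that every labeled nucleotide remains distinguishable at the instrument's mass accuracy.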

In further aspects, the invention features kits and devices for sequencing nucleic acids, based on the novel process described herein.

The enormous advantage of exonuclease-mediated mass spectrometric DNA sequencing is that small molecules are analyzed and identified by mass spectrometry. In this mass range, the accuracy of mass spectrometers is routinely very high; i.e. 0.1 mass units are easily detected. This increases the potential for multiplexing as small differences in mass can be detected and resolved. An additional advantage of mass spectrometric sequencing is that the identified masses can be registered automatically by a computer and, by adding the time coordinate, automatically aligned to sequences. Since the sequences so determined are memorized (i.e. saved to disk or resident in the computer memory), appropriate existing computer programs operating in a multitasking environment can be searching in the "background" (i.e. during continuous generation of new sequence data by the exonuclease mass spectrometric sequencer) for overlaps and generate contiguous sequence information which, via a link to a sequence data bank, can be used in homology searches, etc.
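The registration of masses together with a time coordinate can be illustrated as follows. This is a minimal sketch, assuming that each detection event is recorded as a (time, mass) pair, that the released species are unmodified deoxynucleoside monophosphates with the approximate average masses listed, and that a match within 0.1 mass units is acceptable. The function name `call_bases` and the use of "N" for an unassigned signal are illustrative choices, not part of the patent.

```python
# Approximate average masses (Da) of the free 2'-deoxynucleoside 5'-monophosphates.
NUCLEOTIDE_MASS = {"A": 331.22, "C": 307.20, "G": 347.22, "T": 322.21}


def call_bases(events, tolerance=0.1):
    """Turn (time, mass) detection events into a sequence ordered by time.

    Events whose mass matches no nucleotide within the tolerance are reported as "N".
    """
    sequence = []
    for _time, mass in sorted(events):
        hits = [b for b, m in NUCLEOTIDE_MASS.items() if abs(m - mass) <= tolerance]
        sequence.append(hits[0] if len(hits) == 1 else "N")
    return "".join(sequence)


if __name__ == "__main__":
    # Events may arrive out of order in a log; the time coordinate restores the order.
    events = [(2.0, 322.25), (0.5, 347.20), (1.0, 331.19), (3.5, 400.00)]
    print(call_bases(events))
```

The resulting string is in the order of release from the digested end, and it can be passed directly to existing overlap-search or homology-search programs of the kind mentioned above.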

Other features and advantages of the invention will be further described with reference to the following Detailed Description and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a process of exonuclease sequencing beginning with a single-stranded nucleic acid.

FIG. 2 illustrates a process similar to that of FIG. 1, but starting with a target nucleic acid inserted into a double-stranded vector.

FIG. 3 illustrates a method for introducing mass-modified nucleotides into a target nucleic acid sequence (e.g. for multiplexing mass spectrometry).

FIGS. 4A and 4B illustrate methods for introducing mass-modified nucleotides into a target nucleic acid sequence (e.g. for multiplexing mass spectrometry).

FIG. 5 shows positions within a nucleic acid molecule which can be modified for the introduction of discriminating mass increments or modulation of exonuclease activity.

FIG. 6 illustrates various structures of modified nucleoside triphosphates useful for the enzymatic incorporation of mass-modified nucleotides into the DNA or RNA to be sequenced.

FIG. 7 shows some possible functional groups (R) useful for either mass-modification of nucleotides in discrete increments for differentiation by mass spectrometry and/or to modulate the enzymatic activity of an exonuclease.

FIG. 8 illustrates some linking groups (X) for the attachment of the mass-modifying functionality (R) to nucleosides.

FIG. 9 is a schematic drawing of a sequencing reactor system.

FIG. 10 is a graphical representation of idealized output signals following the time-course of the stepwise mass spectrometric detection of the exonucleolytically released nucleotides.

FIG. 11 illustrates specific labels introduced by mass modification to facilitate multiplex exonuclease mass spectrometric sequencing.

FIG. 12 is a schematic drawing of a moving belt apparatus for delivering single or multiple tracks of exonuclease samples for laser-induced mass spectrometric sequence determination in conjunction with the sequencing reactor of FIG. 9.

FIG. 13 is a schematic representation of individually labeled signal tracks employed in multiplex exonuclease-mediated mass spectrometric sequencing.

FIG. 14 illustrates a method for double-stranded exonuclease sequencing for mass spectrometric sequence determination.

FIG. 15 is a schematic representation of one format of the invention in which a nucleic acid sequence is revealed through mass spectrometer determination of the molecular weights of sequentially and unilaterally (e.g. from the 5' or 3' end) released nucleotides (or nucleosides, e.g. generated by phosphatase treatment of nucleotides).

FIG. 16 is a schematic representation of another format of the invention in which the molecular weights of the fragments remaining after exonuclease digestion are determined. Based on the incremental difference in molecular weight between neighboring fragments, the nucleic acid sequence of the complete molecule is deduced.

FIG. 17A shows the mass spectrum obtained from a 60-mer by MALDI-TOF mass spectrometry. The theoretical molecular weights for the four nucleotides are provided.

FIG. 17B provides an expanded version of FIG. 17A.

FIG. 17C provides a further expanded version of a sequence range between a 31-mer and a 43-mer.

FIGS. 17D-1 and 17D-2 compare the theoretical mass values with the values found for each of the fragments in the mixture.

FIG. 18 is a schematic of another format of the invention, wherein four sets of base-specifically terminated fragments obtained via exonuclease treatment are superimposed to reveal the sequence of the nucleic acid molecule (i.e. reverse Sanger sequencing).

FIG. 19 provides a MALDI-TOF spectrum of the 14 fragments simulating the dT-terminated nested set of FIG. 18.

FIG. 20 provides Fast Atom Bombardment (FAB) spectra of three mixtures: (A) a 1:1:1:1 mixture of A:C:G:T; (B) a 1:1:1:0.5 mixture of A:C:G:T; (C) a 1:1:1:0.2 mixture of A:C:G:T; and (D) matrix alone.

FIG. 21A provides Electrospray (ES) spectra of three mixtures: (A) a 1:1:1:1 mixture of A:C:G:T; (B) a 1:1:1:0.5 mixture of A:C:G:T; and (C) a 1:1:1:0.2 mixture of A:C:G:T. All four nucleosides can be easily determined and discriminated by their molecular weights, even in a 1:1:1:1 mixture. Qualitatively, the three spectra also clearly reveal the different dT concentrations in the three mixtures.

FIG. 21B is an expanded version around the dT signal of FIG. 21A (panel 2).

FIG. 22 shows the mass spectrum obtained based on exonuclease degradation of a 25-mer with a 3' exonuclease (snake venom phosphodiesterase).

FIG. 23 is a schematic representation of nucleic acid immobilization via covalent bifunctional trityl linkers.

FIG. 24 is a schematic representation of nucleic acid immobilization via hydrophobic trityl linkers.

DETAILED DESCRIPTION OF THE INVENTION

The starting point for the process of the invention can be, for example, DNA cloned from either a genomic or cDNA library, or a piece of DNA isolated and amplified by polymerase chain reaction (PCR) which contains a DNA fragment of unknown sequence. Libraries can be obtained, for instance, by following standard procedures and cloning vectors [Maniatis, Fritsch and Sambrook (1982), supra; Methods in Enzymology, Vol. 101 (1983) and Vol. 152-155 (1987), supra]. Appropriate cloning vectors are also commercially available. As will be apparent, the invention is not limited to the use of any specific vector constructs or cloning procedures, but can be applied to any given DNA fragment whether obtained, for instance, by cloning in any vector or by the Polymerase Chain Reaction (PCR) or any DNA/RNA amplification method. The unknown DNA sequence can be obtained in either double-stranded form (e.g. using standard PCR) or in a single-stranded form (e.g. employing asymmetric PCR, PCR Technology: Principles and Applications for DNA Amplification (Erlich, editor), M. Stockton Press, New York (1989)).

For those skilled in the art it is clear that both DNA and RNA can be exonucleolytically degraded from either the 5' or 3' end depending upon the choice of exonuclease. Similarly the sequence of an unknown DNA molecule can be determined directly by exonuclease digestion, or alternatively, the DNA fragment of unknown sequence can be transcribed first into a complementary RNA copy which is subsequently exonucleolytically degraded to provide the RNA sequence. Appropriate vectors, such as the pGEM (Promega) vectors, are useful in the present invention as they have specific promoters for either the SP6 or T7 DNA-dependent RNA polymerases flanking the multiple cloning site. This feature allows transcription of both unknown DNA strands into complementary RNA for subsequent exonuclease sequencing. Furthermore, these vectors, belonging to the class of phagemid vectors, provide means to generate single-stranded DNA from the unknown, double stranded DNA. Thus, by using two vectors which differ only in the orientation of the f1 origin of replication, both strands of the unknown DNA sequence can be obtained in a single-stranded form and utilized for subsequent exonuclease sequencing. The scope of the invention is also not limited by the choice of restriction sites. There is, however, a preference for rare cutting restriction sites to keep the unknown DNA fragment unfragmented during the manipulations in preparation for exonuclease sequencing.

In one aspect of the invention, the target nucleic acid (or fragments of the target nucleic acid) is, by means of protection, specificity of enzymatic activity, or immobilization, unilaterally degraded in a stepwise manner via exonuclease digestion, and the nucleotides, derivatives (FIG. 15) or truncated sequences (FIG. 16) detected by mass spectrometry. FIG. 17 demonstrates how molecular weights of the truncated fragments of the sequences of a 60-mer can be derived. FIGS. 20 and 21 demonstrate the power of mass spectrometric detection of nucleosides by either FAB (fast atom bombardment) or ESI (electrospray ionization) mass spectrometry, respectively. Whereas UV or fluorescent measurements cannot discriminate the mixtures of nucleosides/nucleotides which are generated when the enzyme gets out of phase, this is no problem with MS, since the resolving power in differentiating between the molecular masses of dA, dT, dG and dC is more than sufficient. FIGS. 20 and 21 also show that peak intensities are correlated with the amount of the nucleoside in the mixture. Thus, even if the enzyme is getting out of phase, samples measured at consecutive time intervals will reveal the sequence of the nucleosides/nucleotides sequentially cleaved off by the exonuclease.
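As a non-limiting sketch of how the released nucleosides could be assigned by a computer, the following example matches each measured mass against the average molecular masses of the four 2'-deoxynucleosides and, for each time interval, reports the identified species of highest intensity. The names `identify` and `read_time_course`, the 0.1 mass unit tolerance, and the assumption that the peak list has already been converted to neutral masses are illustrative choices and are not taken from the specification.

```python
from typing import List, Optional, Tuple

# Average molecular masses (Da) of the four 2'-deoxynucleosides.
NUCLEOSIDE_MASS = {"dC": 227.22, "dT": 242.23, "dA": 251.24, "dG": 267.24}


def identify(mass: float, tol: float = 0.1) -> Optional[str]:
    """Return the nucleoside whose mass lies within `tol` of `mass`, if any."""
    for name, reference in NUCLEOSIDE_MASS.items():
        if abs(mass - reference) <= tol:
            return name
    return None


def read_time_course(
    samples: List[List[Tuple[float, float]]], tol: float = 0.1
) -> List[str]:
    """Call one nucleoside per time interval.

    Each sample is a list of (neutral mass, intensity) peaks. When the
    enzyme is out of phase a sample holds several nucleosides; the most
    intense identified peak is reported. "?" marks a sample with no match.
    """
    calls = []
    for peaks in samples:
        best_name, best_intensity = "?", 0.0
        for mass, intensity in peaks:
            name = identify(mass, tol)
            if name is not None and intensity > best_intensity:
                best_name, best_intensity = name, intensity
        calls.append(best_name)
    return calls


if __name__ == "__main__":
    # Made-up peak lists for three consecutive time intervals.
    samples = [
        [(251.20, 90.0), (242.25, 10.0)],
        [(242.22, 80.0), (267.30, 15.0)],
        [(267.25, 70.0)],
    ]
    print(read_time_course(samples))
```

Because the four masses differ from one another by at least about 9 mass units, a tolerance of 0.1 mass units cannot match one peak to two nucleosides.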

Another aspect of this invention concerns a "reverse-Sanger" type sequencing method using exonuclease digestion of nucleic acids to produce base-specific terminated ladders of nested digestion fragments which are detectable by mass spectrometry. For instance, as above, the target nucleic acid can be immobilized onto a solid support to provide unilateral degradation of the chain by exonuclease action; measuring the molecular weights of the fragments rather than nucleotides, reveals the sequence (FIG. 18). The nested exonuclease fragments are generated by incorporating into the target nucleic acid a limited number of mass-modified nucleotides which inhibit the exonuclease activity (i.e. protect an individual nucleic acid chain from further degradation). See Labeit et al. (1986) DNA 5:173; Eckstein et al. (1988) Nucleic Acid Res. 16:9947; and PCT Application No. GB86/00349. The nested exonuclease fragments can then be released from the solid support (i.e. via a cleavable linkage) and the molecular weight values for each species of the nested fragments determined by mass spectrometry, as described in U.S. Pat. No. 5,547,835 to Koster. FIG. 19 displays the spectrum of a mixture of 14 dT-terminated fragments. From the molecular weight values determined, the sequence of nucleic acid can be generated by superposition of the four sets of fragments. It is clear that many variations of this reaction are possible and that it is amenable to multiplexing. For example, the target nucleic acid need not be bound to a solid support, rather, any protecting group or the specificity of the enzyme can be used to ensure unilateral exonuclease degradation. Where mass-modified nucleotides are used which have large enough molecular weight differences to be differentiated by mass spectrometry (i.e. the termination of a chain with a particular mass-modified nucleotide is discernible from all other terminations), the exonuclease sequencing can be carried out by unilaterally sequencing more than one set of nested fragments. Alternatively, individual types of exonuclease-inhibiting nucleotides can be incorporated in separate reactions to create sets of nested fragments. For instance, four sets of nested fragments can be separately generated wherein one set terminates with mass-modified A's, one set terminates in mass-modified G's, etc. and the total sequence is determined by aligning the collection of nested exonuclease fragments.
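The superposition of the four base-specifically terminated sets can be illustrated by the following minimal sketch. It assumes, for the purpose of the example only, that all fragments share the same fixed (e.g. immobilized 5') end, so that a heavier fragment is always a longer fragment, and that each measured mass has already been assigned to the set it came from. The function name `superimpose` and the mass values are hypothetical.

```python
from typing import Dict, List


def superimpose(ladders: Dict[str, List[float]]) -> str:
    """Read a sequence from four base-specific nested fragment sets.

    `ladders` maps a terminating base to the measured masses of the
    fragments ending in that base. Sorting all fragments by mass orders
    them by length; the terminating bases, read in that order, give the
    sequence starting from the fixed end.
    """
    labelled = [
        (mass, base) for base, masses in ladders.items() for mass in masses
    ]
    labelled.sort()
    return "".join(base for _, base in labelled)


if __name__ == "__main__":
    # Made-up masses for fragments of length 1 to 4 of a short chain.
    ladders = {
        "A": [642.4],
        "C": [1235.8],
        "G": [329.2],
        "T": [946.6],
    }
    print(superimpose(ladders))
```

The ordering step is the mass spectrometric counterpart of reading the four lanes of a conventional sequencing gel from bottom to top.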

Amenable mass spectrometric formats for use in the invention include the ionization (I) techniques such as matrix-assisted laser desorption/ionization (MALDI), continuous or pulsed electrospray (ESI) and related methods (e.g. Ionspray, Thermospray), or massive cluster impact (MCI); these ion sources can be matched with detection formats including linear or reflector time-of-flight (TOF), single or multiple quadrupole, single or multiple magnetic sector, Fourier transform ion cyclotron resonance (FTICR), ion trap, or combinations of these to give a hybrid detector (e.g. ion trap-time-of-flight). For ionization, numerous matrix/wavelength combinations (MALDI) or solvent combinations (ESI) can be employed.

(i) Preparation of Unknown Nucleic Acid Sequence for Exonuclease Sequencing:

FIG. 1 schematically depicts the instant claimed process for a single-stranded DNA insert ("Target DNA") of a single-stranded circular cloning vector. The boundaries of the target DNA are designated A and B. The target DNA, as illustrated in FIG. 1, has been cloned into the Not I site of a vector. A synthetic oligodeoxynucleotide [N. D. Sinha, J. Biernat, J. McManus and H. Koster, Nucleic Acids Res., 12, 4539 (1984)] which will restore the Not I site to double-strandedness and which is complementary to the vector sequence flanking the A boundary of the insert DNA is hybridized to that site and cleaved by Not I restriction endonuclease. The two pieces of the synthetic oligodeoxynucleotide can then be removed by molecular sieving, membrane filtration, precipitation, or other standard procedures.

FIG. 1 also illustrates a set of ordered deletions (t.sup.0, t.sup.1, t.sup.2, t.sup.3) which can be obtained by the time-limited action of an exonuclease, e.g. T4 DNA polymerase, in the absence of dNTPs. The set of deletions can be immobilized on a solid support Tr (Tr.sup.0, Tr.sup.1, Tr.sup.2, Tr.sup.3), or alternatively, the set of ordered deletions can be obtained in a heterogeneous reaction by treating the solid support, Tr.sup.0, containing the complete target DNA sequence, with an exonuclease in a time-limited manner. In the instance where the 3' termini of each time point are too heterogeneous (i.e. "fuzzy") to be analyzed directly by exonuclease-mediated mass spectrometric sequencing, circularization of the template and a cloning step can be performed prior to this sequencing process with single, transformed colonies selected.

A single-stranded linear DNA fragment carrying the unknown sequence with its A boundary at the 3' end can be directly sequenced by a 3' exonuclease in an apparatus described below and schematically depicted in FIG. 9, provided that the exonuclease is immobilized within the reactor, for example, on beads, on a frit, on a membrane located on top of the frit or on the glass walls of a capillary or entrapped in a gel matrix or simply by a semipermeable membrane which keeps the exonuclease in the reactor while the linear DNA fragment is circulating through a loop.

At time intervals, or alternatively as a continuous stream, the reaction mixture containing the buffer and the released nucleotides is fed to the mass spectrometer for mass determination and nucleotide identification. In another embodiment, the stream containing the nucleotides released by exonuclease action can be passed through a second reactor or series of reactors which cause the released nucleotide to be modified. For example, the second reactor can contain an immobilized alkaline phosphatase and the nucleotides passing therethrough are transformed to nucleosides prior to feeding into the mass spectrometer. Other mass-modifications are described below.
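The effect of the phosphatase reactor on the masses presented to the mass spectrometer can be sketched as follows. Removal of the phosphate group from a 2'-deoxynucleoside 5'-monophosphate lowers the average molecular mass by that of HPO3 (about 79.98 mass units). The dictionary and function names are hypothetical, and the values are average masses of the free acids quoted to two decimal places.

```python
# Average molecular masses (Da) of the 2'-deoxynucleoside 5'-monophosphates.
DNMP_MASS = {"dAMP": 331.22, "dCMP": 307.20, "dGMP": 347.22, "dTMP": 322.21}

# Mass removed when a monophosphate is hydrolysed to the nucleoside (HPO3).
HPO3_MASS = 79.98


def dephosphorylated_mass(nucleotide_mass: float) -> float:
    """Return the expected nucleoside mass after phosphatase treatment."""
    return round(nucleotide_mass - HPO3_MASS, 2)


if __name__ == "__main__":
    for name, mass in DNMP_MASS.items():
        print(name, "->", dephosphorylated_mass(mass))
```

The shift is the same for all four species, so the mass differences between them, and hence their discrimination, are unchanged by the phosphatase step.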

In general, when it is the released nucleotide (or ribonucleotide) which is mass-modified, the modification should take as few steps as possible and be relatively efficient. For example, reactions used in adding base protecting groups for oligonucleotide synthesis can also be used to modify the released nucleotide just prior to mass spectrometric analysis. For instance, the amino function of adenine, guanine or cytosine can be modified by acylation. The amino acyl function can be, by way of illustration, an acetyl, benzoyl, isobutyryl or anisoyl group. Benzoylchloride, in the presence of pyridine, can acylate the adenine amino group, as well as the deoxyribose (or ribose) hydroxyl groups. As the glycosidic linkage is more susceptible to hydrolysis, the sugar moiety can be selectively deacylated if the acyl reaction was not efficient at those sites (i.e. heterogeneity in molecular weight arising from incomplete acylation of the sugar). The sugar moiety itself can be the target of the mass-modifying chemistry. For example, the sugar moieties can be acylated, tritylated, monomethoxytritylated, etc. Other chemistries for mass-modifying the released nucleotides (or ribonucleotides) will be apparent to those skilled in the art.

In another embodiment, the linear, single-stranded DNA fragment can be anchored to a solid support. This can be achieved, for example, by covalent attachment to a functional group on the solid support, such as through a specific oligonucleotide sequence which involves a spacer of sufficient length for the ligase to react and which is covalently attached via its 5' end to the support (FIG. 1). A splint oligonucleotide with a sequence complementary in part to the solid support-bound oligonucleotide and to the 5' end of the linearized single stranded vector DNA allows covalent attachment of the DNA to be sequenced to the solid support. After annealing, ligation (i.e. with T4 DNA ligase) covalently links the solid-support-bound oligonucleotide and the DNA to be sequenced. The splint oligonucleotide can be subsequently removed by a temperature jump and/or NaOH treatment, or washed off the support using other standard procedures. The solid support with the linear DNA is transferred to the reactor (FIG. 9) and contacted with an exonuclease in solution. As illustrated, where the 3' end of the unknown DNA fragment is exposed (i.e. unprotected), a 3' exonuclease is employed. The released nucleotides, or modified nucleotides, if intermediately contacted with a modifying agent such as alkaline phosphatase, are identified by mass spectrometry as described above. Other linking groups are described herein, and still others will be apparent to those skilled in the art based on the embodiments described herein. For example, the immobilization can occur through a covalent bond, such as a disulfide linkage, levulinyl linkage, a peptide/oligopeptide bond, a pyrophosphate, a tritylether or tritylamino linkage, which can be cleaved in accordance with standard procedures (see e.g. Example 14 and FIG. 23). Immobilization can also be obtained by non-covalent bonds such as between biotin and streptavidin or hydrophobic interactions (see e.g. Example 15 and FIG. 24).

A solid (i.e. insoluble) support as used herein refers to a support which is solid or can be separated from a reaction mixture by filtration, precipitation, magnetic separation, or the like. Exemplary solid supports include beads (such as agarose (e.g., SEPHAROSE.sup.R), dextran cross-linked with epichlorohydrin (e.g., SEPHADEX.sup.R), polystyrene, polyacrylamide, cellulose, Teflon, glass (including controlled pore glass), gold, or platinum); flat supports such as membranes (e.g., of cellulose, nitrocellulose, polystyrene, polyester, polycarbonate, polyamide, nylon, glass fiber, polyvinylidene difluoride, and Teflon); glass plates, metal plates (including gold, platinum, silver, copper, and stainless steel); silicon wafers, microtiter plates, and the like. Flat solid supports can be provided with pits, combs, pins, channels, filter bottoms, and the like, as is known in the art. The solid supports can also be capillaries, as well as frits from glass or polymers.

Various 3' and/or 5' exonucleases can be used in the process of the invention, including: phosphodiesterase from snake venom, spleen phosphodiesterase, Exonuclease I or VII from E. coli, Bal 31 exonuclease, Mung Bean Nuclease, S1 Nuclease, E. coli DNA polymerase I, Klenow fragment of DNA polymerase I, T4 DNA polymerase, T7 DNA polymerase, Taq DNA polymerase, Pyrococcus species GB-D DNA polymerase, such as DEEP VENT.sup.R DNA Polymerase, E. coli exonuclease III, .lambda. exonuclease and Thermococcus litoralis DNA polymerase, such as VENT.sub.R.sup.R DNA Polymerase.

In another embodiment using a phagemid vector with an inverted f1 origin of replication, the B boundary is located at the 3' end of the immobilized linear single-stranded DNA and exposed to exonuclease sequencing using the same restriction endonuclease, hybridizing oligodeoxynucleotide and splint oligonucleotide. As another embodiment of this invention, the hybridizing oligonucleotide can also be designed to bind a promoter site upstream of the A boundary and by doing so restore the double-stranded promoter DNA. Directly, or with a short initiator oligonucleotide carrying an attachment functionality at the 5' end, transcription can be initiated with the appropriate specific DNA-dependent RNA polymerase [Methods in Enzymology, Vol. 185, Gene Expression Technology (1990); J. F. Milligan, D. R. Groebe, G. W. Witherell and O. C. Uhlenbeck, Nucleic Acids Res., 15, 8783-98 (1987); C. Pitulle, R. G. Kleineidam, B. Sproat and G. Krupp, Gene, 112, 101-105 (1992) and H. Koster U.S. Pat. No. 5,547,835, supra]. The RNA transcript can be transferred to the reactor (FIG. 9) and contacted with an immobilized or otherwise contained exonuclease, or immobilized via the 5' functionality of the initiator oligonucleotide incorporated in the RNA transcript to a solid support and then contacted with an exonuclease in solution.

Depending on the length of the DNA insert (i.e. number of nucleotides between boundary A and B in FIG. 1) the mass spectrometric exonuclease sequencing process can allow the complete sequence from A to B to be determined in one run. Alternatively, prior to exonuclease sequencing, a set of ordered deletions can be prepared according to standard procedures [e.g. Methods in Enzymology, Vol. 101 (1983) and Vol. 152-155 (1987); R. M. K. Dale et al., Plasmid, 13, 31-40 (1985)], such that, in FIG. 1 the steps Tr.sup.0 to Tr.sup.3 can represent either different time values of the mass spectrometric exonuclease sequencing reaction from immobilized DNA fragments or different starting points for the exonuclease DNA/RNA mass spectrometric sequencing process. In either case, the principle of the invention described provides a process by which the total sequence of the insert can be determined.

In another embodiment of the invention, the unknown DNA sequence (target DNA) is inserted into a double-stranded cloning vector (FIG. 2) or obtained in double-stranded form, as for example by a PCR (polymerase chain reaction) process [PCR Technology, (1989) supra]. The DNA to be sequenced is inserted into a cloning vector, such as ligated into the Not I site as illustrated in FIG. 2. Adjacent to the A boundary there can be located another cutting restriction endonuclease site, e.g. an Asc I endonuclease cleavage site. The double-stranded circular molecule can be linearized by treatment with Asc I endonuclease and ligated to a solid support using a splint oligodeoxynucleotide (and ligase) as described above, which restores the Asc I restriction site (Tr.sup.0 ds and Tr.sup.0' ds). The strand which is not immobilized can be removed by subjecting the double-stranded DNA to standard denaturing conditions and washing, thereby generating single-stranded DNAs immobilized to the solid support (Tr.sup.0 ss and Tr.sup.0' ss). Since the unknown double-stranded DNA sequence can be ligated in either orientation to the support, there can exist two non-identical 3' termini (+ and - strand) immobilized, which can result in ambiguous sequencing data. The immobilized fragment which carries the vector DNA sequence at the 3' end (Tr.sup.0' ss) can be protected from 3' exonuclease degradation during the sequencing process by, for example, annealing with an oligodeoxynucle