Claims
What is claimed is:
1. A method of sequencing nucleic acid molecules comprising: a) inserting multiple copies of a template nucleic acid molecule into a reaction chamber; b) synthesizing a
complementary nucleic acid molecule from nucleotide precursors with a polymerase; c) monitoring the order of incorporation of nucleotide precursors into the complementary nucleic acid molecule by surface enhanced Raman scattering, surface enhanced
resonance Raman scattering, stimulated Raman scattering, inverse Raman, stimulated gain Raman spectroscopy, hyper-Raman scattering or coherent anti-Stokes Raman scattering, wherein a tag molecule is attached to each nucleotide precursor; and d)
determining the sequence of the template nucleic acid molecule from the order of incorporation of nucleotide precursors into complementary nucleic acid molecules.
2. The method of claim 1, wherein the polymerase is a DNA polymerase.
3. The method of claim 2, further comprising adding a primer, wherein the primer is complementary in sequence to a portion of the template nucleic acid molecule.
4. The method of claim 3, wherein the primer is complementary to the 3' end of the template nucleic acid molecule.
5. The method of claim 1, wherein the template nucleic acid molecules are attached to an immobilization surface.
6. The method of claim 5, wherein the template nucleic acid molecules are attached to the immobilization surface through a linker arm.
7. The method of claim 3, wherein the primer is attached to an immobilization surface.
8. The method of claim 1, wherein each type of nucleotide precursor is attached to a distinguishable tag molecule.
9. The method of claim 8, wherein tag molecules are selected from the group consisting of TRIT (tetramethyl rhodamine isothiol), NBD (7-nitrobenz-2-oxa-1,3-diazole), Texas Red dye, phthalic acid, terephthalic acid, isophthalic acid, cresyl fast
violet, cresyl blue violet, brilliant cresyl blue, para-aminobenzoic acid, erythrosine, aminoacridine, cyanide, thiol, chlorine, bromine, methyl, phosphorus, sulfur and carbon nanotubes.
10. The method of claim 1, wherein tag molecules are selected from the group consisting of TRIT (tetramethyl rhodamine isothiol), NBD (7-nitrobenz-2-oxa-1,3-diazole), Texas Red dye, phthalic acid, terephthalic acid, isophthalic acid, cresyl fast
violet, cresyl blue violet, brilliant cresyl blue, para-aminobenzoic acid, erythrosine, aminoacridine, cyanide, thiol, chlorine, bromine, methyl, phosphorus, sulfur and carbon nanotubes.
11. The method of claim 1, wherein the polymerase is a DNA polymerase, an RNA polymerase or a reverse transcriptase.
12. The method of claim 1, wherein a surface of the reaction chamber is coated with silver, gold, platinum, copper, or aluminum.
13. The method of claim 12, wherein the coated surface of the reaction chamber is opposite a detection unit.
Description
FIELD OF THE INVENTION
The present methods, compositions and apparatus relate to the fields of molecular biology and genomics. More particularly, the disclosed methods, compositions and apparatus concern nucleic acid sequencing.
BACKGROUND
The advent of the human genome project required that improved methods for sequencing nucleic acids, such as DNA (deoxyribonucleic acid) and RNA (ribonucleic acid), be developed. Genetic information is stored in the form of very long molecules of
DNA organized into chromosomes. The twenty-three pairs of chromosomes in the human genome contain approximately three billion bases of DNA sequence. This DNA sequence information determines multiple characteristics of each individual, such as height,
eye color and ethnicity. Many common diseases, such as cancer, cystic fibrosis, sickle cell anemia and muscular dystrophy, are based at least in part on variations in DNA sequence.
Determination of the entire sequence of the human genome has provided a foundation for identifying the genetic basis of such diseases. However, a great deal of work remains to be done to identify the genetic variations associated with each
disease. This requires sequencing portions of chromosomes from individuals or families exhibiting each such disease, in order to identify the specific changes in DNA sequence that promote the disease. RNA, an intermediary molecule required for
processing of genetic information, can also be sequenced in some cases to identify the genetic bases of various diseases.
Existing methods for nucleic acid sequencing, based on detection of fluorescently labeled nucleic acids that have been separated by size, are limited by the length of the nucleic acid that can be sequenced. Typically, only 500 to 1,000 bases of
nucleic acid sequence can be determined at one time. This is much shorter than the length of the functional unit of DNA, referred to as a gene, which can be tens or even hundreds of thousands of bases in length. Using current methods, determination of
a complete gene sequence requires that many copies of the gene be produced, cut into overlapping fragments and sequenced, after which the overlapping DNA sequences may be assembled into the complete gene. This process is laborious, expensive,
inefficient and time-consuming.
BRIEF DESCRIPTION OF THE DRAWINGS
The following drawings form part of the present specification and are included to further demonstrate certain embodiments. Those embodiments may be better understood by reference to one or more of these drawings in combination with the detailed
description of specific embodiments presented herein.
FIG. 1 illustrates an exemplary apparatus 10 (not to scale) and method for DNA sequencing in which a nucleic acid 13 is sequenced by monitoring the uptake of nucleotide precursors 17 from solution during nucleic acid synthesis.
DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
The disclosed methods, compositions and apparatus are of use for the rapid, automated sequencing of nucleic acids 13. In particular embodiments, the methods, compositions and apparatus are suitable for obtaining the sequences of very long
nucleic acid 13 molecules of greater than 1,000, greater than 2,000, greater than 5,000, greater than 10,000, greater than 20,000, greater than 50,000, greater than 100,000 or even more bases in length. In various embodiments, such sequence information
may be obtained during the course of a single sequencing run, using one molecule of template nucleic acid 13. In other embodiments, multiple copies of the template nucleic acid molecule 13 may be sequenced in parallel or sequentially to confirm the
nucleic acid 13 sequence or to obtain complete sequence data. In alternative embodiments, both the template strand 13 and its complementary strand may be sequenced to confirm the accuracy of the sequence information. Advantages over prior methods of
nucleic acid 13 sequencing include the ability to read long nucleic acid 13 sequences in a single sequencing run, greater speed of obtaining sequence data, decreased cost of sequencing and greater efficiency in terms of the amount of operator time
required per unit of sequence data generated.
In certain embodiments, the nucleic acid 13 to be sequenced is DNA, although it is contemplated that other nucleic acids 13 comprising RNA or synthetic nucleotide analogs could be sequenced as well. The following detailed description contains
numerous specific details in order to provide a more thorough understanding of the disclosed embodiments. However, it will be apparent to those skilled in the art that the embodiments may be practiced without these specific details. In other instances,
those devices, methods, procedures, and individual components that are well known in the art have not been described in detail herein.
Certain embodiments are illustrated in FIG. 1. FIG. 1 shows an apparatus 10 for nucleic acid 13 sequencing comprising a reaction chamber 11 and a detection unit 12. The reaction chamber 11 contains a nucleic acid (template) molecule 13 attached
to an immobilization surface 14 along with a synthetic reagent 15, such as a DNA polymerase. A primer molecule 16 that is complementary in sequence to a portion of the template molecule 13 is allowed to hybridize to the template molecule 13. Nucleotide precursors
17 are present in solution in the reaction chamber 11. For synthesis of a nascent DNA strand 16, the nucleotide precursors 17 must include at least one molecule each of deoxyadenosine-5'-triphosphate (dATP), deoxyguanosine-5'-triphosphate (dGTP),
deoxycytidine-5'-triphosphate (dCTP) and deoxythymidine-5'-triphosphate (dTTP). For synthesis of a nascent RNA strand 16, the nucleotide precursors 17 must comprise adenosine-5'-triphosphate (ATP), cytidine-5'-triphosphate (CTP), guanosine-5'-triphosphate (GTP) and uridine-5'-triphosphate (UTP).
To initiate a sequencing reaction, the polymerase 15 adds one nucleotide precursor molecule 17 at a time to the 3' end of the primer 16, elongating the primer molecule 16. As the primer molecule 16 is extended, it is referred to as a nascent
strand 16. For each round of elongation, a single nucleotide precursor 17 is incorporated into the nascent strand 16. Because incorporation of nucleotide precursors 17 is determined by Watson-Crick base pair interactions with the template strand 13,
the sequence of the growing nascent strand 16 will be complementary to the sequence of the template strand 13. In Watson-Crick base pairing, an adenosine (A) residue on one strand is always paired with a thymidine (T) residue on the other strand, or a
uridine (U) residue if the strand is RNA. Similarly, a guanosine (G) residue on one strand is always paired with a cytosine (C) residue on the other strand. Thus, the sequence of the template strand 13 may be determined from the sequence of the nascent
strand 16.
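The polymerase 15 extends the nascent strand 16 in the 5' to 3' direction while it reads the template strand 13 in the 3' to 5' direction. The template sequence, written 5' to 3', is therefore the reverse complement of the order in which precursors 17 are incorporated. The short Python sketch below illustrates that step. The function name `template_from_nascent` is hypothetical, and the input is assumed to be the incorporation order already reduced to a one-letter string.

```python
def template_from_nascent(nascent: str, template_is_rna: bool = False) -> str:
    """Return the template strand (5'->3') implied by a nascent strand (5'->3').

    Each nascent base is replaced by its Watson-Crick partner on the template
    (A pairs with T in DNA or with U in RNA; G pairs with C), and the result is
    reversed because the two strands are antiparallel.
    """
    partner = {
        "A": "U" if template_is_rna else "T",
        "T": "A",
        "U": "A",  # nascent RNA strand: U is the partner of template A
        "G": "C",
        "C": "G",
    }
    return "".join(partner[base] for base in reversed(nascent.upper()))


# Nascent DNA 5'-ATGC-3' pairs with template 5'-GCAT-3'.
print(template_from_nascent("ATGC"))                        # GCAT
# Reverse transcription: nascent DNA copied from an RNA template.
print(template_from_nascent("ATGC", template_is_rna=True))  # GCAU
```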
FIG. 1 illustrates embodiments in which a single nucleic acid molecule 13 is contained in a single reaction chamber 11. In alternative embodiments, multiple nucleic acid molecules 13, each in a separate reaction chamber 11, may be sequenced
simultaneously. In such cases, the nucleic acid template 13 in each reaction chamber 11 may be identical or may be different. In other alternative embodiments, two or more template nucleic acid molecules 13 may be present in a single reaction chamber
11. In such embodiments, the nucleic acid molecules 13 will be identical in sequence. Where more than one template nucleic acid 13 is present in the reaction chamber 11, the Raman emission signals will represent an average of the nucleotide
precursors 17 incorporated into all nascent strands 16 in the reaction chamber 11. The skilled artisan will be able to correct the signal obtained at any given time for synthetic reactions that either lag behind or precede the majority of reactions
occurring in the reaction chamber 11, using known data analysis techniques.
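The text does not specify which correction technique is intended. One illustration models the averaged signal as a mixture: in each round of elongation, some nascent strands 16 stall while the rest add one base. Under that model, the contribution of lagging strands can be removed one position at a time by forward substitution. The Python sketch below makes several assumptions:

- synthesis proceeds in discrete rounds;
- the per-round lag probability `p_lag` is known and constant;
- no strands run ahead of the majority;
- the signals are noise-free.

All names are hypothetical.

```python
from math import comb

BASES = "ACGT"


def incorporation_matrix(n, p_lag):
    """M[k][j]: probability that a strand adds template position j in round k.

    Assumed lag-only model: in every round each strand either extends by one
    base (probability 1 - p_lag) or stalls (probability p_lag). A strand adds
    position j in round k only if it extended in exactly j of the k earlier
    rounds and extends again now, so M is lower-triangular (j <= k).
    """
    q = 1.0 - p_lag
    return [[q * comb(k, j) * q**j * p_lag**(k - j) if j <= k else 0.0
             for j in range(n)]
            for k in range(n)]


def simulate_signal(seq, p_lag):
    """Noise-free ensemble signal per round for each base type (A, C, G, T)."""
    M = incorporation_matrix(len(seq), p_lag)
    return [[sum(M[k][j] for j, s in enumerate(seq) if s == b) for b in BASES]
            for k in range(len(seq))]


def correct_phasing(signal, p_lag):
    """Call one base per round by forward substitution through M."""
    M = incorporation_matrix(len(signal), p_lag)
    calls = []  # index into BASES of each base called so far
    for k, observed in enumerate(signal):
        # Subtract the contribution of lagging strands still adding earlier bases.
        residual = [observed[b] - sum(M[k][j] for j in range(k) if calls[j] == b)
                    for b in range(4)]
        calls.append(max(range(4), key=lambda b: residual[b]))
    return "".join(BASES[i] for i in calls)


nascent = "GATTACAGGCTA"
mixed = simulate_signal(nascent, p_lag=0.1)
print(correct_phasing(mixed, p_lag=0.1))  # GATTACAGGCTA
```

With real, noisy data, an error in one call would propagate to later positions, so this sketch is best read as an explanation of the principle rather than a working base caller.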
The skilled artisan will realize that depending on the polymerase molecule 15 used, the nascent strand 16 may contain some percentage of mismatched bases, where the newly incorporated base is not correctly hydrogen bonded with the corresponding
base in the template strand 13. In various embodiments, an accuracy of at least 90%, at least 95%, at least 98%, at least 99%, at least 99.5%, at least 99.8%, at least 99.9% or higher may be observed. The skilled artisan will be aware that certain
polymerases 15 have an error correction activity (also referred to as a 3' exonuclease or proof-reading activity) that acts to remove a newly incorporated nucleotide precursor 17 that is incorrectly base-paired to the template strand 13. In various
embodiments, polymerases 15 with or without a proof-reading activity may be employed. The skilled artisan will also be aware that certain polymerases 15, such as reverse transcriptase, have an inherently high error rate, allowing frequent incorporation
of mismatched bases. Depending on the embodiment, a polymerase 15 with either a higher or a lower inherent error rate may be selected. In certain embodiments, a polymerase 15 with the lowest possible error rate may be used. Polymerase 15 error rates
are known in the art.
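The accuracy figures above can be put in perspective for the long reads contemplated here. A per-base accuracy a, applied to a read of L bases, gives an expected L(1 - a) mismatched bases. For example, a 10,000-base read at 99.9% accuracy is expected to contain about 10 mismatches. The Python sketch below assumes that errors occur independently at each position, which real polymerases 15 need not satisfy. The function names are hypothetical.

```python
def expected_errors(read_length, accuracy):
    """Expected number of mismatched bases in a read, given per-base accuracy."""
    return read_length * (1.0 - accuracy)


def prob_error_free(read_length, accuracy):
    """Probability that a read has no mismatches (independent errors assumed)."""
    return accuracy ** read_length


for acc in (0.90, 0.95, 0.98, 0.99, 0.995, 0.998, 0.999):
    print(f"{acc:.3f}: {expected_errors(10_000, acc):7.1f} expected errors, "
          f"P(error-free) = {prob_error_free(10_000, acc):.2e}")
```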
The detection unit 12 comprises an excitation source 18, such as a laser, and a Raman spectroscopy detector 19. The excitation source 18 illuminates the reaction chamber 11 with an excitation beam 20. The excitation beam 20 interacts with the
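nucleotide precursors 17 and is inelastically scattered: the molecules are briefly raised to a short-lived virtual energy state and relax to a vibrational state different from their initial one, so the scattered photons are shifted in energy relative to the excitation beam 20. This shifted light is the Raman emission signal detected by the Raman detector 19. Because the Raman emission signal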
from each of the four types of nucleotide precursor 17 can be distinguished, the detection unit 12 is capable of measuring the amount of each type of nucleotide precursor 17 in the reaction chamber 11.
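One way to turn distinguishable signals into amounts is linear least squares. This assumes two things: each tagged precursor 17 has a known reference spectrum, and the measured spectrum is approximately a linear sum of those references weighted by the amount present. The pure-Python sketch below illustrates that approach. The reference values are invented for illustration, and both the linear-mixing assumption and the function names are hypothetical. A practical system would likely also handle background and constrain amounts to be non-negative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def unmix(spectrum, references):
    """Least-squares amount of each reference spectrum in a measured spectrum.

    Solves the normal equations (R^T R) c = R^T s, where the columns of R are
    the reference spectra and s is the measured spectrum (same Raman-shift bins).
    """
    names = list(references)
    cols = [references[n] for n in names]
    rtr = [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]
    rts = [sum(x * y for x, y in zip(ci, spectrum)) for ci in cols]
    return dict(zip(names, solve(rtr, rts)))


# Hypothetical reference spectra (arbitrary units, six Raman-shift bins).
refs = {
    "dATP": [1.0, 0.2, 0.0, 0.0, 0.1, 0.0],
    "dCTP": [0.0, 1.0, 0.3, 0.0, 0.0, 0.1],
    "dGTP": [0.1, 0.0, 1.0, 0.2, 0.0, 0.0],
    "dTTP": [0.0, 0.0, 0.1, 1.0, 0.4, 0.2],
}
true_amounts = {"dATP": 2.0, "dCTP": 0.5, "dGTP": 1.0, "dTTP": 3.0}
measured = [sum(true_amounts[n] * refs[n][i] for n in refs) for i in range(6)]
# With noise-free synthetic data the estimate recovers true_amounts.
print({n: round(v, 3) for n, v in unmix(measured, refs).items()})
```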
The incorporation of nucleotide precursors 17 into the growing nascent strand 16 results in a depletion of nucleotide precursors 17 from the reaction chamber 11. In order for the synthetic reaction to continue, a source of fresh nucleotide
precursors 17 may be required. This source is shown in FIG. 1 as a molecule dispenser 21. In alternative embodiments, a molecule dispenser 21 may or may not be part of the sequencing apparatus 10.
In certain embodiments, the molecule dispenser 21 is designed to release each of the four nucleotide precursors 17 in equal amounts, calibrated to the rate of synthesis of the nascent strand 16. However, nucleic acids 13 do not necessarily
exhibit a uniform distribution of A, T, G and C residues. In particular, certain regions of DNA molecules may be either AT rich or GC rich, depending on the species from which the DNA is obtained and the specific region of the DNA molecule being
sequenced. In alternative embodiments, the release of nucleotide precursors 17 from the molecule dispenser 21 is controlled, so that relatively constant concentrations of each type of nucleotide precursor 17 are maintained in the reaction chamber 11.
Such embodiments may utilize an information processing and control system that interfaces between the detection unit 12 and the molecule dispenser 21.
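A minimal sketch of such a control loop is a proportional rule that replaces whatever has been drawn down below a set point. It assumes that the detection unit 12 reports a concentration for each precursor type and that the dispenser 21 accepts a release amount for each type. The function, units, gain and cap are hypothetical; the example shows an AT-rich region depleting dATP and dTTP faster than the other precursors.

```python
def release_amounts(measured, target, gain=1.0, max_release=None):
    """Amount of each precursor the dispenser should release this step.

    Proportional rule: release gain * (target - measured), clipped at zero
    because a dispenser can only add precursor, and optionally capped.
    """
    amounts = {}
    for name, goal in target.items():
        amount = max(0.0, gain * (goal - measured.get(name, 0.0)))
        if max_release is not None:
            amount = min(amount, max_release)
        amounts[name] = amount
    return amounts


# An AT-rich stretch has drawn down dATP and dTTP faster than dCTP and dGTP.
target = {"dATP": 1.0, "dCTP": 1.0, "dGTP": 1.0, "dTTP": 1.0}
measured = {"dATP": 0.6, "dCTP": 0.95, "dGTP": 0.97, "dTTP": 0.55}
print(release_amounts(measured, target))
```

A gain below 1 would refill more gradually, which may help if the concentration readings are noisy.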
In embodiments involving an information processing and control system, such as a computer or microprocessor attached to or incorporating a data storage unit, data may be collected from a detector 19, such as a spectrometer or a monochromator
array. The information processing and control system may maintain a database associating specific Raman signatures with specific nucleotide precursors 17. The information processing and control system may record the signatures detected by the detector
19 and may correlate those signatures with the signatures of known nucleotide precursors 17. The information processing and control system may also maintain a record of nucleotide precursor 17 uptake that indicates the sequence of the template molecule
13. The information processing and control system may also perform standard procedures known in the art, such as subtraction of background signals.
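A minimal sketch of the matching step follows, assuming every spectrum is sampled on the same set of Raman-shift bins. It subtracts a measured background and scores the result against each stored signature by cosine similarity. It then reports the best match only if that match clears a threshold. Cosine similarity, the 0.9 threshold and the signature values are illustrative choices and are not taken from the source.

```python
import math


def subtract_background(spectrum, background):
    """Remove a background spectrum bin by bin, clipping negatives to zero."""
    return [max(0.0, s - b) for s, b in zip(spectrum, background)]


def cosine(u, v):
    """Cosine similarity of two spectra sampled on the same bins."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify(spectrum, background, library, min_score=0.9):
    """Return the library precursor whose signature best matches, or None."""
    cleaned = subtract_background(spectrum, background)
    best_name, best_score = None, -1.0
    for name, signature in library.items():
        score = cosine(cleaned, signature)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_score else None


# Hypothetical signature database and a flat background (arbitrary units).
library = {
    "dATP": [1.0, 0.0, 0.0, 0.2],
    "dCTP": [0.0, 1.0, 0.1, 0.0],
    "dGTP": [0.0, 0.1, 1.0, 0.0],
    "dTTP": [0.2, 0.0, 0.0, 1.0],
}
background = [0.5, 0.5, 0.5, 0.5]
print(identify([0.5, 0.6, 1.5, 0.5], background, library))  # dGTP
print(identify([1.5, 1.5, 1.5, 1.5], background, library))  # None
```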
In embodiments involving a molecule dispenser 21, the addition of nucleotide precursors 17 to the reaction chamber 11 simultaneously with the incorporation of nucleotide precursors 17 into the nascent strand 16 may result in a complex Raman
signal. In particular embodiments, the synthetic reaction may be allowed to run to completion or close to completion before additional nucleotide precursors 17 are added to the reaction chamber 11. In alternative embodiments, the addition of nucleotide
precursors 17 to the reaction chamber 11 may occur simultaneously with incorporation of nucleotide precursors 17 into the nascent strand 16. In such embodiments, the information processing and control system may be used to correct the data on nucleotide
precursor 17 concentration obtained from the Raman emission spectrum for the amount of nucleotide precursors 17 added by the molecule dispenser 21.
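The correction described above can be expressed as a mass balance for each interval between readouts:

consumed = (amount at the previous readout) + (amount released by the dispenser 21) - (amount present now)

The Python sketch below assumes three things:

- readings of the amount of each precursor type at successive times;
- a log of dispenser releases for each interval;
- at most one incorporation per interval.

Function names and data layout are hypothetical. The example shows a case in which an uncorrected difference would miss an incorporation entirely.

```python
def uptake(before, after, added):
    """Precursors consumed in one interval: before + added - after (mass balance)."""
    return {b: before[b] + added.get(b, 0) - after[b] for b in before}


def call_sequence(readings, additions, threshold=0.5):
    """Reconstruct incorporation order from successive precursor readings.

    readings[t]: amount of each precursor type at readout t.
    additions[t]: amount the dispenser released between readouts t and t + 1.
    Assumes at most one incorporation per interval.
    """
    calls = []
    for t in range(len(readings) - 1):
        used = uptake(readings[t], readings[t + 1], additions[t])
        base, amount = max(used.items(), key=lambda item: item[1])
        if amount >= threshold:
            calls.append(base)
    return "".join(calls)


readings = [
    {"A": 5, "C": 5, "G": 5, "T": 5},
    {"A": 5, "C": 5, "G": 4, "T": 5},  # G incorporated
    {"A": 5, "C": 5, "G": 4, "T": 5},  # A incorporated, one A added: no net change
    {"A": 5, "C": 5, "G": 4, "T": 4},  # T incorporated
]
additions = [{}, {"A": 1}, {}]
print(call_sequence(readings, additions))     # GAT
print(call_sequence(readings, [{}, {}, {}]))  # GT (uncorrected: the A is missed)
```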
In certain embodiments, the reaction chamber 11 may contain a single molecule of each type of nucleotide precursor 17. In such embodiments, the release of nucleotide precursors 17 from the molecule dispenser 21 may be tightly linked to the
incorporation of nucleotide precursors 17 into the nascent strand 16, in order to avoid delays in the synthetic reaction due to the absence of a required nucleotide precursor 17.
Certain embodiments concern synthesis of a nascent strand 16 of DNA. The template strand 13 can be either RNA or DNA. With an RNA template strand 13, the synthetic reagent 15 may be a reverse transcriptase, examples of which are known in the
art. In embodiments where the template strand 13 is a molecule of DNA, the synthetic reagent 15 may be a DNA polymerase, examples of which are known in the art.
In other embodiments, the nascent strand 16 can be a molecule of RNA. This requires that the synthetic reagent 15 be an RNA polymerase. In these embodiments, no primer 16 is required. However, the template strand 13 must contain a promoter
sequence that is effective to bind RNA polymerase 15 and initiate transcription of an RNA nascent strand 16. The exact composition of the promoter sequence depends on the type of RNA polymerase 15 used. Optimization of promoter sequences to allow for
efficient initiation of transcription is within the skill in the art. The embodiments are not limited as to the type of template molecule 13 used, the type of nascent strand 16 synthesized, or the type of polymerase 15 utilized. Virtually any template
13 and any polymerase 15 that can support synthesis of a nucleic acid molecule 16 complementary in sequence to the template strand 13 may be used.
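The pairings of template strand 13, nascent strand 16 and synthetic reagent 15 described above can be summarized as a small lookup, sketched below in Python. The data structure and names are hypothetical. The primer requirement for reverse transcriptase reflects ordinary practice. Treating the template for RNA synthesis as DNA is an assumption, since the text does not restrict it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisSetup:
    reagent: str             # synthetic reagent 15
    precursors: tuple        # nucleotide precursors 17 to supply
    primer_required: bool
    promoter_required: bool


DNTPS = ("dATP", "dCTP", "dGTP", "dTTP")
NTPS = ("ATP", "CTP", "GTP", "UTP")


def choose_setup(template: str, nascent: str) -> SynthesisSetup:
    """Pick reagent and precursors for a template/nascent-strand combination."""
    template, nascent = template.upper(), nascent.upper()
    if nascent == "DNA" and template == "DNA":
        return SynthesisSetup("DNA polymerase", DNTPS, True, False)
    if nascent == "DNA" and template == "RNA":
        return SynthesisSetup("reverse transcriptase", DNTPS, True, False)
    if nascent == "RNA" and template == "DNA":
        # Transcription starts at a promoter in the template; no primer needed.
        return SynthesisSetup("RNA polymerase", NTPS, False, True)
    raise ValueError(f"combination not covered by this sketch: {template} -> {nascent}")


print(choose_setup("RNA", "DNA"))
```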
In some alternative embodiments, the nucleotide precursors 17 may be chemically modified with a tag. The tag has a unique and highly visible optical signature that can be distinguished for each of the common nucleotide precursors 17. In certain
embodiments, the tag may serve to increase the strength of the Raman emission signal or to otherwise enhance the sensitivity or specificity of the Raman detector 19 for nucleotide precursors 17. Non-limiting examples of tag molecules that could be used
for embodiments involving Raman spectroscopy include TRIT (tetramethyl rhodamine isothiol), NBD (7-nitrobenz-2-oxa-1,3-diazole), Texas Red dye, phthalic acid, terephthalic acid, isophthalic acid, cresyl fast violet, cresyl blue violet, brilliant cresyl
blue, para-aminobenzoic acid, erythrosine and aminoacridine. Other tag moieties that may be of use for particular embodiments include cyanide, thiol, chlorine, bromine, methyl, phosphorus and sulfur. In certain embodiments, carbon nanotubes may be of
use as Raman tags. The use of tags in Raman spectroscopy is known in the art (e.g., U.S. Pat. Nos. 5,306,403 and 6,174,677). The skilled artisan will realize that Raman tags should generate distinguishable Raman spectra when bound to different
nucleotide precursors 17, or different labels should be designed to bind only one type of nucleotide precursor 17.
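The first condition can be checked when choosing tags by flagging any pair of tagged precursors whose spectra are too similar. The sketch below assumes each tagged precursor's spectrum has been measured on common bins. The similarity measure (cosine), the 0.8 cutoff and the example spectra are hypothetical choices.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two spectra sampled on the same Raman-shift bins."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def confusable_pairs(tag_spectra, max_similarity=0.8):
    """List precursor pairs whose tagged spectra are too similar to distinguish."""
    flagged = []
    for a, b in combinations(sorted(tag_spectra), 2):
        similarity = cosine(tag_spectra[a], tag_spectra[b])
        if similarity > max_similarity:
            flagged.append((a, b, similarity))
    return flagged


# Hypothetical spectra of four tagged precursors; dGTP and dTTP look alike.
spectra = {
    "dATP": [1.0, 0.1, 0.0, 0.0],
    "dCTP": [0.0, 1.0, 0.2, 0.0],
    "dGTP": [0.0, 0.1, 1.0, 0.3],
    "dTTP": [0.0, 0.1, 1.0, 0.35],
}
for a, b, s in confusable_pairs(spectra):
    print(f"{a} and {b} are hard to tell apart (cosine similarity {s:.3f})")
```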
In some embodiments, the tag exhibits an enhanced Raman signal. In alternative embodiments, tags that exhibit other types of signals, such as fluorescent or luminescent signals, may be employed. It is contemplated that alternative methods of
detection may be used in such embodiments, for example fluorescence spectroscopy or luminescence spectroscopy. Many alternative methods of detection of nucleotide precursors 17 in solution are known in the art and may be used. For such methods, the
Raman spectroscopic detector 19 may be replaced with a detector 19 designed to detect fluorescence, luminescence or other types of signals known in the art.
In certain embodiments, the template molecule 13 may be attached to a surface 14 such as functionalized glass, silicon, PDMS (polydimethylsiloxane), silver or other metal-coated surfaces, quartz, plastic, PTFE (polytetrafluoroethylene), PVP
(polyvinyl pyrrolidone), polystyrene, polypropylene, polyacrylamide, latex, nylon, nitrocellulose, a glass bead, a magnetic bead, or any other material known in the art that is capable of having functional groups such as amino, carboxyl, thiol, hydroxyl
or Diels-Alder reactants incorporated on its surface.
In some embodiments, functional groups may be covalently attached to cross-linking agents so that binding interactions between template strand 13 and polymerase 15 may occur without steric hindrance. Typical cross-linking groups include ethylene
glycol oligomers and diamines. Attachment may be by either covalent or non-covalent binding. Various methods of attaching nucleic acid molecules 13 to surfaces 14 are known in the art and may be employed.
Definitions
As used herein, "a" or "an" may mean one or more than one of an item.
"Nucleic acid" 13 means DNA or RNA, whether single-stranded, double-stranded or triple-stranded, and any chemical modifications thereof, although single-stranded nucleic acids 13 are preferred. Virtually any modification of the nucleic acid 13 is
contemplated. As used herein, a single-stranded nucleic acid 13 may be denoted by the prefix "ss", a double-stranded nucleic acid by the prefix "ds", and a triple-stranded nucleic acid by the prefix "ts."
A "nucleic acid" 13 may be of almost any length, from 10, 20, 30, 40, 50, 60, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, 7000, 8000, 9000, 10,000,
15,000, 20,000, 30,000, 40,000, 50,000, 75,000, 100,000, 150,000, 200,000, 500,000, 1,000,000, 1,500,000, 2,000,000, 5,000,000 or even more bases in length, up to a full-length chromosomal DNA molecule 13.
A "nucleoside" is a molecule comprising a base (A, T, G, C or U) covalently attached to a pentose sugar such as deoxyribose, ribose or derivatives or analogs of pentose sugars.
A "nucleotide" refers to a nucleoside further comprising at least one phosphate group covalently attached to the pentose sugar. In some embodiments, the nucleotide precursors 17 are ribonucleoside triphosphates or deoxyribonucleoside
triphosphates. It is contemplated that various substitutions or modifications may be made in the structure of the nucleotide precursors 17, so long as they are still capable of being incorporated into the nascent strand 16 by the polymerase 15. For
example, in certain embodiments the ribose or deoxyribose moiety may be substituted with another pentose sugar or a pentose sugar analog. In other embodiments, the phosphate groups may be substituted by various groups, such as phosphonates, sulfates or
sulfonates. In still other embodiments, the purine or pyrimidine bases may be substituted by other purines or pyrimidines or analogs thereof, so long as the sequence of nucleotide precursors 17 incorporated into the nascent strand 16 reflects the
sequence of the template strand 13.
Nucleic Acids
Template molecules 13 may be prepared by any technique known to one of ordinary skill in the art. In certain embodiments, the template molecules 13 are naturally occurring DNA or RNA molecules, for example, chromosomal DNA or messenger RNA
(mRNA). Virtually any naturally occurring nucleic acid 13 may be prepared and sequenced by the disclosed methods including, without limit, chromosomal, mitochondrial or chloroplast DNA or ribosomal, transfer, heterogeneous nuclear or messenger RNA.
Nucleic acids 13 to be sequenced may be obtained from either prokaryotic or eukaryotic sources by standard methods known in the art.
Methods for preparing and isolating various forms of cellular nucleic acids 13 are known. (See, e.g., Guide to Molecular Cloning Techniques, eds. Berger and Kimmel, Academic Press, New York, N.Y., 1987; Molecular Cloning: A Laboratory Manual,
2nd Ed., eds. Sambrook, Fritsch and Maniatis, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 1989). Generally, cells, tissues or other source material containing nucleic acids 13 to be sequenced are first homogenized, for example by freezing in
liquid nitrogen followed by grinding in a mortar and pestle. Certain tissues may be homogenized using a War