WikiPatents - Community Patent Review
DNA sequencing by stepwise ligation and cleavage    
United States Patent 5714330
Inventor(s)     Brenner; Sydney (Cambridge, GB2); DuBridge; Robert B. (Belmont, CA)
Abstract
The invention provides a method of nucleic acid sequence analysis based on repeated cycles of ligation to and cleavage of probes at the terminus of a target polynucleotide. At each such cycle one or more terminal nucleotides are identified and one or more nucleotides are removed from the end of the target polynucleotide, such that further cycles of ligation and cleavage can take place. At each cycle the target sequence is shortened by one or more nucleotides until the nucleotide sequence of the target polynucleotide is determined. The method obviates electrophoretic separation of similarly sized DNA fragments and eliminates the difficulties associated with the detection and analysis of spatially overlapping bands of DNA fragments in a gel, or like medium. The invention further obviates the need to generate DNA fragments from long single stranded templates with a DNA polymerase.
Owner/Assignee     Lynx Therapeutics, Inc. (Hayward, CA)
Publication Date     February 3, 1998
Application Number     08/667,689
Filing Date     June 21, 1996
US Classification     435/6 435/91.1 435/91.2 435/91.52 435/91.53 536/24.3
Int'l Classification     C12Q 001/68 C12P 019/34 C07H 021/04 C07H 021/02
Examiner     Jones; W. Gary
Assistant Examiner     Tran; Paul B.
Attorney/Law Firm     Macevicz; Stephen C.
Parent Case     This is a continuation-in-part of U.S. patent application Ser. No. 08/410,116 filed 24 Mar. 1995, now U.S. Pat. No. 5,599,675 which is a continuation-in-part of U.S. patent application Ser. No. 08/280,441 filed 25 Jul. 1994, now U.S. Pat. No. 5,552,278, which is a continuation-in-part of U.S. patent application Ser. No. 08/222,300 filed 4 Apr. 1994, now abandoned.
USPTO Field of Search     435/6 435/91.1 435/91.2 435/91.52 435/91.53 536/24.3 935/77 935/78
References
U.S. References
Patent     Inventor     US Class     Date
5599675    Brenner      435/6        Feb. 1997
5552278    Brenner      435/6        Sep. 1996
5508169    Deugau       435/6        Apr. 1996
5403708    Brennan      435/6        Apr. 1995
5126239    Livak        435/6        Jun. 1992
5118605    Urdea        435/6        Jun. 1992
5102785    Livak        435/6        Apr. 1992
5093245    Keith        435/91.2     Mar. 1992
4775619    Urdea        435/6        Oct. 1988
4321365    Wu           536/24.2     Mar. 1982
4293652    Cohen        435/91.1     Oct. 1981
4237224    Cohen        435/69.1     Dec. 1980
5242794    Whiteley     435/6        date unavailable
4683202    Mullis       435/91.2     date unavailable
Claims

We claim:

1. A method for determining a nucleotide sequence of a polynucleotide, the method comprising the steps of:

(a) ligating a probe to an end of a polynucleotide, the probe having a nuclease recognition site of a nuclease whose cleavage site is separate from its recognition site, and the polynucleotide having been replicated in the presence of 5-methyldeoxycytidine triphosphate;

(b) identifying one or more nucleotides at the end of the polynucleotide by the identity of the probe ligated thereto or by extending a strand of the polynucleotide or probe;

(c) cleaving the polynucleotide with a nuclease recognizing the nuclease recognition site of the probe such that the polynucleotide is shortened by one or more nucleotides; and

(d) repeating said steps (a) through (c) until said nucleotide sequence of the polynucleotide is determined.

2. The method of claim 1 wherein said nuclease is a type IIs restriction endonuclease.

3. The method of claim 2 wherein said step of ligating includes treating said polynucleotide and said probe with a ligase.

4. The method of claim 3 wherein said polynucleotide has a protruding strand at at least one end and said probe has a protruding strand at at least one end, and wherein said step of ligating includes providing said probe as a mixture such that the protruding strand of said probe includes every possible sequence of nucleotides the length of the protruding strand.

5. The method of claim 4 wherein said protruding strand of said polynucleotide has a 5'-phosphoryl group and wherein said complementary protruding strand of said probe lacks a 5'-phosphoryl group.

6. The method of claim 5 wherein said step of ligating includes treating said polynucleotide and said probe in succession with (i) a ligase to ligate said protruding strand having said 5'-phosphoryl group to said probe, (ii) a kinase to phosphorylate said complementary protruding strand of said probe, and (iii) a ligase to ligate said complementary protruding strand of said probe to said polynucleotide.

7. The method of claim 4 further including the step of removing unligated probe from said polynucleotide after said step of ligating.

8. The method of claim 4 wherein said step of identifying includes identifying a first nucleotide in said protruding strand of said polynucleotide by the identity of said probe ligated thereto and identifying a second nucleotide in said protruding strand of said polynucleotide by extending a strand of said probe.

9. The method of claim 4 further including a step of capping said polynucleotide which fails to ligate to said probe.

10. The method of claim 9 wherein said step of capping includes extending said polynucleotide with a DNA polymerase in the presence of chain-terminating nucleoside triphosphates.

11. The method of claim 4 further including a step of capping said probe which fails to be cleaved from said polynucleotide.

12. The method of claim 11 wherein said step of capping includes treating said probe with a methylase.

13. The method of claim 11 wherein said step of capping includes providing said probe with a second nuclease recognition site and cleaving said probe with a second nuclease which recognizes the second recognition site such that said polynucleotide is rendered incapable of being ligated to said probe in a subsequent ligation step.

14. A method for determining a nucleotide sequence of a polynucleotide, the method comprising the steps of:

(a) ligating a probe to an end of a polynucleotide, the probe having a nuclease recognition site of a nuclease whose cleavage site is separate from its recognition site;

(b) identifying a plurality of nucleotides at the end of the polynucleotide by the identity of the probe ligated thereto or by extending a strand of the polynucleotide or probe;

(c) cleaving the polynucleotide with a nuclease recognizing the nuclease recognition site of the probe such that the polynucleotide is shortened by one or more nucleotides; and

(d) repeating said steps (a) through (c) until said nucleotide sequence of the polynucleotide is determined.

15. The method of claim 14 wherein said nuclease is a type IIs restriction endonuclease.

16. The method of claim 15 wherein said step of ligating includes treating said polynucleotide and said probe with a ligase.

17. The method of claim 16 wherein said polynucleotide has a protruding strand at at least one end and said probe has a protruding strand at at least one end, and wherein said step of ligating includes providing said probe as a mixture such that the protruding strand of said probe includes every possible sequence of nucleotides the length of the protruding strand.

18. The method of claim 17 wherein said protruding strand of said polynucleotide has a 5'-phosphoryl group and wherein said complementary protruding strand of said probe lacks a 5'-phosphoryl group.

19. The method of claim 18 wherein said step of ligating includes treating said polynucleotide and said probe in succession with (i) a ligase to ligate said protruding strand having said 5'-phosphoryl group to said probe, (ii) a kinase to phosphorylate said complementary protruding strand of said probe, and (iii) a ligase to ligate said complementary protruding strand of said probe to said polynucleotide.

20. The method of claim 17 further including the step of removing unligated probe from said polynucleotide after said step of ligating.

21. The method of claim 17 wherein said step of identifying includes identifying a first nucleotide in said protruding strand of said polynucleotide by the identity of said probe ligated thereto and identifying a second nucleotide in said protruding strand of said polynucleotide by extending a strand of said probe.

22. The method of claim 17 further including a step of capping said polynucleotide which fails to ligate to said probe.

23. The method of claim 22 wherein said step of capping includes extending said polynucleotide with a DNA polymerase in the presence of chain-terminating nucleoside triphosphates.

24. The method of claim 17 further including a step of capping said probe which fails to be cleaved from said polynucleotide.

25. The method of claim 24 wherein said step of capping includes treating said probe with a methylase.

26. The method of claim 24 wherein said step of capping includes providing said probe with a second nuclease recognition site and cleaving said probe with a second nuclease which recognizes the second recognition site such that said polynucleotide is rendered incapable of being ligated to said probe in a subsequent ligation step.

27. The method of claim 17 wherein said plurality is two.

28. The method of claim 27 wherein said protruding strands of said polynucleotide and said probe each have a length of two nucleotides.
Description

FIELD OF THE INVENTION

The invention relates generally to methods for determining the nucleotide sequence of a polynucleotide, and more particularly, to a method of step-wise removal and identification of terminal nucleotides of a polynucleotide.

BACKGROUND

Analysis of polynucleotides with currently available techniques provides a spectrum of information ranging from the confirmation that a test polynucleotide is the same as or different from a standard sequence or an isolated fragment to the express identification and ordering of each nucleoside of the test polynucleotide. Not only are such techniques crucial for understanding the function and control of genes and for applying many of the basic techniques of molecular biology, but they have also become increasingly important as tools in genomic analysis and a great many non-research applications, such as genetic identification, forensic analysis, genetic counseling, medical diagnostics, and the like. In these latter applications, both techniques providing partial sequence information, such as fingerprinting and sequence comparisons, and techniques providing full sequence determination have been employed, e.g. Gibbs et al, Proc. Natl. Acad. Sci., 86: 1919-1923 (1989); Gyllensten et al, Proc. Natl. Acad. Sci, 85: 7652-7656 (1988); Carrano et al, Genomics, 4: 129-136 (1989); Caetano-Anolles et al, Mol. Gen. Genet., 235: 157-165 (1992); Brenner and Livak, Proc. Natl. Acad. Sci., 86: 8902-8906 (1989); Green et al, PCR Methods and Applications, 1: 77-90 (1991); and Versalovic et al, Nucleic Acids Research, 19: 6823-6831 (1991).

Native DNA consists of two linear polymers, or strands of nucleotides. Each strand is a chain of nucleosides linked by phosphodiester bonds. The two strands are held together in an antiparallel orientation by hydrogen bonds between complementary bases of the nucleotides of the two strands: deoxyadenosine (A) pairs with thymidine (T) and deoxyguanosine (G) pairs with deoxycytidine (C).
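Because the strands pair by these rules and run antiparallel, the sequence of one strand fixes the sequence of the other. The short Python sketch below assumes sequences are plain strings of A, C, G and T written 5'→3', as in the convention given under DEFINITIONS. The helper name `reverse_complement` is illustrative.

```python
# Watson-Crick pairing as described above: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, also written 5'->3'.

    Because the two strands are antiparallel, the complement is read in reverse.
    """
    return "".join(PAIR[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGCCTG"))  # CAGGCAT
```

Applying the function twice returns the original strand, as expected for a double stranded molecule.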

Presently there are two basic approaches to DNA sequence determination: the dideoxy chain termination method, e.g. Sanger et al, Proc. Natl. Acad. Sci., 74: 5463-5467 (1977); and the chemical degradation method, e.g. Maxam et al, Proc. Natl. Acad. Sci., 74: 560-564 (1977). The chain termination method has been improved in several ways, and serves as the basis for all currently available automated DNA sequencing machines, e.g. Sanger et al, J. Mol. Biol., 143: 161-178 (1980); Schreier et al, J. Mol. Biol., 129: 169-172 (1979); Smith et al, Nucleic Acids Research, 13: 2399-2412 (1985); Smith et al, Nature, 321: 674-679 (1987); Prober et al, Science, 238: 336-341 (1987); Section II, Meth. Enzymol., 155: 51-334 (1987); Church et al, Science, 240: 185-188 (1988); Hunkapiller et al, Science, 254: 59-67 (1991); Bevan et al, PCR Methods and Applications, 1: 222-228 (1992).

Both the chain termination and chemical degradation methods require the generation of one or more sets of labeled DNA fragments, each having a common origin and each terminating with a known base. The set or sets of fragments must then be separated by size to obtain sequence information. In both methods, the DNA fragments are separated by high resolution gel electrophoresis, which must be capable of distinguishing very large fragments that differ in size by as little as a single nucleotide. Unfortunately, this step severely limits the size of the DNA chain that can be sequenced at one time. Sequencing using these techniques can reliably accommodate a DNA chain of up to about 400-450 nucleotides, e.g. Bankier et al, Meth. Enzymol., 155: 51-93 (1987); and Hawkins et al, Electrophoresis, 13: 552-559 (1992).

Several significant technical problems have seriously impeded the application of such techniques to the sequencing of long target polynucleotides, e.g. in excess of 500-600 nucleotides, or to the sequencing of high volumes of many target polynucleotides. Such problems include i) the gel electrophoretic separation step, which is labor intensive, is difficult to automate, and introduces an extra degree of variability in the analysis of data, e.g. band broadening due to temperature effects, compressions due to secondary structure in the DNA sequencing fragments, inhomogeneities in the separation gel, and the like; ii) nucleic acid polymerases whose properties, such as processivity, fidelity, rate of polymerization, rate of incorporation of chain terminators, and the like, are often sequence dependent; iii) detection and analysis of DNA sequencing fragments, which are typically present in fmol quantities in spatially overlapping bands in a gel; iv) lower signals because the labeling moiety is distributed over many hundreds of spatially separated bands rather than being concentrated in a single homogeneous phase; and v) in the case of single-lane fluorescence detection, the limited availability of dyes with suitable emission and absorption properties, quantum yield, and spectral resolvability, e.g. Trainor, Anal. Biochem., 62: 418-426 (1990); Connell et al, Biotechniques, 5: 342-348 (1987); Karger et al, Nucleic Acids Research, 19: 4955-4962 (1991); Fung et al, U.S. Pat. No. 4,855,225; and Nishikawa et al, Electrophoresis, 12: 623-631 (1991).

Another problem exists with current technology in the area of diagnostic sequencing. An ever widening array of disorders, susceptibilities to disorders, prognoses of disease conditions, and the like, have been correlated with the presence of particular DNA sequences, or the degree of variation (or mutation) in DNA sequences, at one or more genetic loci. Examples of such phenomena include human leukocyte antigen (HLA) typing, cystic fibrosis, tumor progression and heterogeneity, p53 proto-oncogene mutations, ras proto-oncogene mutations, and the like, e.g. Gyllensten et al, PCR Methods and Applications, 1: 91-98 (1991); Santamaria et al, International application PCT/US92/01675; Tsui et al, International application PCT/CA90/00267; and the like. A difficulty in determining DNA sequences associated with such conditions to obtain diagnostic or prognostic information is the frequent presence of multiple subpopulations of DNA, e.g. allelic variants, multiple mutant forms, and the like. Distinguishing the presence and identity of multiple sequences with current sequencing technology is virtually impossible without additional work to isolate and perhaps clone the separate species of DNA.

A major advance in sequencing technology could be made if an alternative approach were available for sequencing DNA that did not require high resolution separations, provided signals more amenable to analysis, and provided a means for readily analyzing DNA from heterozygous genetic loci.

SUMMARY OF THE INVENTION

The invention provides a method of nucleic acid sequence analysis based on ligation and cleavage of probes at the terminus of a target polynucleotide. Preferably, repeated cycles of such ligation and cleavage are implemented in the method, and in each such cycle a nucleotide is identified at the end of the target polynucleotide and the target polynucleotide is shortened, such that further cycles of ligation, cleavage, and identification can take place. That is, preferably, in each cycle the target sequence is shortened by a single nucleotide and the cycles are repeated until the nucleotide sequence of the target polynucleotide is determined.

An important feature of the invention is the probe employed in the ligation and cleavage events. A probe of the invention is a double stranded polynucleotide which (i) contains a recognition site for a nuclease, and (ii) preferably has a protruding strand capable of forming a duplex with a complementary protruding strand of the target polynucleotide. At each cycle in the latter embodiment, only those probes whose protruding strands form perfectly matched duplexes with the protruding strand of the target polynucleotide are ligated to the end of the target polynucleotide to form a ligated complex. After removal of the unligated probe, a nuclease recognizing the probe cuts the ligated complex at a site one or more nucleotides from the ligation site along the target polynucleotide leaving an end, usually a protruding strand, capable of participating in the next cycle of ligation and cleavage. An important feature of the nuclease is that its recognition site be separate from its cleavage site. As is described more fully below, in the course of such cycles of ligation and cleavage, the terminal nucleotides of the target polynucleotide are identified.
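The selection step can be modelled in a few lines. The Python sketch below makes three assumptions: both protruding strands have the same polarity, both are written 5'→3', and the ligase is ideal, so any single mismatch blocks ligation. Real ligase fidelity is not perfect, and FIGS. 4a through 4d address it. The helper names are illustrative. Of the 256 probes in a fully degenerate mixture of 4-nucleotide protruding strands, only one is selected.

```python
from itertools import product

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def forms_perfect_duplex(target_overhang: str, probe_overhang: str) -> bool:
    """True if every nucleotide of the two protruding strands is Watson-Crick paired."""
    if len(target_overhang) != len(probe_overhang):
        return False
    n = len(target_overhang)
    # The strands are antiparallel, so position i faces position n - 1 - i.
    return all(
        PAIR[target_overhang[i]] == probe_overhang[n - 1 - i] for i in range(n)
    )


def ligatable_probes(target_overhang: str) -> list[str]:
    """Probes from a fully degenerate mixture that an ideal ligase would join."""
    mixture = ("".join(p) for p in product("ACGT", repeat=len(target_overhang)))
    return [p for p in mixture if forms_perfect_duplex(target_overhang, p)]


print(ligatable_probes("AGGT"))  # ['ACCT']
```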

In one aspect of the invention, more than one nucleotide at the terminus of a target polynucleotide can be identified and/or cleaved during each cycle of the method.

Generally, the method of the invention comprises the following steps: (a) ligating a probe to an end of the polynucleotide, the probe having a nuclease recognition site; (b) identifying one or more nucleotides at the end of the polynucleotide; (c) cleaving the polynucleotide with a nuclease recognizing the nuclease recognition site of the probe such that the polynucleotide is shortened by one or more nucleotides; and (d) repeating steps (a) through (c) until the nucleotide sequence of the polynucleotide is determined. As is described more fully below, the order of steps (a) through (c) may vary with different embodiments of the invention. For example, identifying the one or more nucleotides can be carried out either before or after cleavage of the ligated complex from the target polynucleotide. Likewise, ligating a probe to the end of the polynucleotide may follow the step of identifying in some preferred embodiments of the invention. Preferably, the method further includes a step of removing the unligated probe after the step of ligating.
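Steps (a) through (d) can be pictured as a loop. The following is a toy Python model, not the patented procedure itself. It assumes the following:

- the target is given as a string read from the end being sequenced;
- each cycle exposes a 4-nucleotide protruding strand;
- the identity of the ligated probe stands in for its label.

The function name `sequence_by_cycles` and its parameters are hypothetical.

```python
from itertools import product

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    return "".join(PAIR[b] for b in reversed(strand))


def sequence_by_cycles(target: str, overhang: int = 4, step: int = 1) -> str:
    """Toy model of steps (a)-(d); `target` is read from the end being sequenced."""
    mixture = ["".join(p) for p in product("ACGT", repeat=overhang)]
    called = []
    remaining = target
    while len(remaining) >= overhang:
        exposed = remaining[:overhang]  # protruding strand at the target's end
        # (a) ligation: only the perfectly complementary probe is joined.
        probe = next(p for p in mixture if p == reverse_complement(exposed))
        # (b) identification: decode the terminal nucleotides from the probe's identity.
        called.append(reverse_complement(probe)[:step])
        # (c) cleavage: the nuclease removes `step` nucleotides, exposing a new end.
        remaining = remaining[step:]
    # (d) the cycle repeats until too few nucleotides remain to form a protruding strand.
    return "".join(called)


print(sequence_by_cycles("GATTACAGG"))          # GATTAC
print(sequence_by_cycles("GATTACAGG", step=2))  # GATTAC
```

In this simplified model, the last few nucleotides, fewer than one protruding strand's length, are never exposed and so are not called. Setting `step=2` mimics the embodiments that identify more than one nucleotide per cycle.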

Preferably, whenever natural protein endonucleases are employed as the nuclease, the method further includes a step of methylating the target polynucleotide at the start of a sequencing operation to prevent spurious cleavages at internal recognition sites fortuitously located in the target polynucleotide.
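Methylation matters only where the target happens to contain the nuclease's own recognition site. The Python sketch below assumes FokI as the nuclease; FokI is not named in this section. It lists the positions at which the site occurs on either strand of a target, which are the positions that would otherwise be cut spuriously.

```python
import re

# Assumed example enzyme: FokI, a type IIs endonuclease recognizing GGATG.
SITE = "GGATG"
SITE_OTHER_STRAND = "CATCC"  # the same site read 5'->3' on the complementary strand


def internal_sites(target: str) -> list[int]:
    """0-based start positions where the recognition site occurs on either strand."""
    hits = []
    for motif in (SITE, SITE_OTHER_STRAND):
        # A zero-width lookahead also finds overlapping occurrences.
        hits.extend(m.start() for m in re.finditer(f"(?={motif})", target.upper()))
    return sorted(hits)


print(internal_sites("AAGGATGTTTCATCCAA"))  # [2, 10]
```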

The present invention overcomes many of the deficiencies inherent to current methods of DNA sequencing: there is no requirement for the electrophoretic separation of closely-sized DNA fragments; no difficult-to-automate gel-based separations are required; no polymerases are required for generating nested sets of DNA sequencing fragments; detection and analysis are greatly simplified because signal-to-noise ratios are much more favorable on a nucleotide-by-nucleotide basis, permitting smaller sample sizes to be employed; and for fluorescent-based detection schemes, analysis is further simplified because fluorophores labeling different nucleotides may be separately detected in homogeneous solutions rather than in spatially overlapping bands.

The present invention is readily automated, both for small-scale serial operation and for large-scale parallel operation, wherein many target polynucleotides or many segments of a single target polynucleotide are sequenced simultaneously. Unlike present sequencing approaches, the progressive nature of the method, that is, the determination of a sequence nucleotide by nucleotide, permits one to monitor the progress of the sequencing operation in real time, which in turn permits the operation to be curtailed or restarted if difficulties arise, thereby leading to significant savings in time and reagent usage. Also unlike current approaches, the method permits the simultaneous determination of allelic forms of a target polynucleotide: as described more fully below, if a population of target polynucleotides consists of several subpopulations of distinct sequences, e.g. polynucleotides from a heterozygous genetic locus, then the method can identify the proportion of each nucleotide at each position in the sequence.
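What a mixed population would look like at each position can be sketched in Python. The sketch makes an idealized assumption: the signal from each cycle is proportional to the abundance of each subpopulation, with equal ligation and cleavage efficiency for every sequence. The function name `base_proportions` is hypothetical.

```python
from collections import defaultdict


def base_proportions(subpopulations: list[tuple[str, float]]) -> list[dict[str, float]]:
    """Expected fraction of each nucleotide at each position of a mixed sample.

    `subpopulations` pairs each distinct sequence with its fraction of the sample;
    all sequences are assumed to be aligned and of equal length.
    """
    length = len(subpopulations[0][0])
    profile = []
    for pos in range(length):
        signal: dict[str, float] = defaultdict(float)
        for seq, fraction in subpopulations:
            signal[seq[pos]] += fraction
        profile.append(dict(signal))
    return profile


# A heterozygous locus: two alleles, equally represented, differing at position 2.
for pos, mix in enumerate(base_proportions([("ACGT", 0.5), ("ACTT", 0.5)])):
    print(pos, mix)
```

Positions where both alleles agree show a single nucleotide at fraction 1.0. The variable position shows G and T at 0.5 each.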

Generally, the method of the invention is applicable to all tasks where DNA sequencing is employed, including medical diagnostics, genetic mapping, genetic identification, forensic analysis, molecular biology research, and the like.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1a illustrates a preferred structure of a labeled probe of the invention.

FIG. 1b illustrates a probe and terminus of a target polynucleotide wherein a separate labeling step is employed to identify one or more nucleotides in the protruding strand of a target polynucleotide.

FIG. 1c illustrates steps of an embodiment wherein a nucleotide of the target polynucleotide is identified by extension with a polymerase in the presence of labeled dideoxynucleoside triphosphates followed by their excision, strand extension, and strand displacement.

FIG. 1d diagrammatically illustrates an embodiment in which nucleotide identification is carried out by polymerase extension of a probe strand in the presence of labeled chain-terminating nucleoside triphosphates.

FIG. 1e diagrammatically illustrates an embodiment in which nucleotide identification is carried out by polymerase extension in the presence of unlabeled chain-terminating 3'-amino nucleoside triphosphates followed by ligation of a labeled probe.

FIG. 1f illustrates probe assembly at the end of a target polynucleotide having a 5' protruding strand.

FIG. 1g illustrates probe assembly at the end of a target polynucleotide having a 3' protruding strand.

FIG. 1h illustrates an embodiment employing a probe for identifying two nucleotides of a target polynucleotide in each cycle of ligation and cleavage.

FIG. 2 illustrates the relative positions of the nuclease recognition site, ligation site, and cleavage site in a ligated complex.

FIGS. 3a through 3h diagrammatically illustrate the embodiment referred to herein as "double stepping," or the simultaneous use of two different nucleases in accordance with the invention.

FIGS. 4a through 4d illustrate data showing the fidelity of nucleotide identification through ligation with a ligase.

FIGS. 5a through 5c illustrate data showing nucleotide identification through polymerase extension.

DEFINITIONS

As used herein "sequence determination" or "determining a nucleotide sequence" in reference to polynucleotides includes determination of partial as well as full sequence information of the polynucleotide. That is, the term includes sequence comparisons, fingerprinting, and like levels of information about a target polynucleotide, as well as the express identification and ordering of nucleosides, usually each nucleoside, in a target polynucleotide. The term also includes the determination of the identification, ordering, and locations of one, two, or three of the four types of nucleotides within a target polynucleotide. For example, in some embodiments sequence identification may be effected by identifying the ordering and locations of a single type of nucleotide, e.g. cytosines, within a target polynucleotide so that its sequence is represented as a binary code, e.g. "100101 . . . " for "C-(not C)-(not C)-C-(not C)-C . . . " and the like.
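The single-nucleotide binary representation is easy to state in Python. In the sketch below, the helper name `binary_code` is illustrative. The input "CAGCTC" is just one sequence consistent with the example "100101" above.

```python
def binary_code(sequence: str, nucleotide: str = "C") -> str:
    """Encode each position as 1 if it is `nucleotide`, otherwise 0."""
    return "".join("1" if base == nucleotide else "0" for base in sequence.upper())


print(binary_code("CAGCTC"))  # 100101
```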

"Perfectly matched duplex" in reference to the protruding strands of probes and target polynucleotides means that the protruding strand from one forms a double stranded structure with the other such that each nucleotide in the double stranded structure undergoes Watson-Crick base pairing with a nucleotide on the opposite strand. The term also comprehends the pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine bases, and the like, that may be employed to reduce the degeneracy of the probes.

The term "oligonucleotide" as used herein includes linear oligomers of nucleosides or analogs thereof, including deoxyribonucleosides, ribonucleosides, and the like. Usually oligonucleotides range in size from a few monomeric units, e.g. 3-4, to several hundreds of monomeric units. Whenever an oligonucleotide is represented by a sequence of letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'→3' order from left to right and that "A" denotes deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, unless otherwise noted.

As used herein, "nucleoside" includes the natural nucleosides, including 2'-deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to nucleosides includes synthetic nucleosides having modified base moieties and/or modified sugar moieties, e.g. described generally by Scheit, Nucleotide Analogs (John Wiley, New York, 1980). Such analogs include synthetic nucleosides designed to enhance binding properties, reduce degeneracy, increase specificity, and the like.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides a method of sequencing nucleic acids which obviates electrophoretic separation of similarly sized DNA fragments and which eliminates the difficulties associated with the detection and analysis of spatially overlapping bands of DNA fragments in a gel or like medium. Moreover, the invention obviates the need to generate DNA fragments from long single stranded templates with a DNA polymerase.

Structure of Probes

As mentioned above, the probes ligated to the target polynucleotide are an important feature of the invention. Generally, the probes of the invention provide a "platform" from which a nuclease cleaves the target polynucleotide to which the probe is ligated. Probes of the invention can also provide a means for identifying or labeling a nucleotide at the end of the target polynucleotide. Probes do not necessarily provide both functions in every embodiment.

In one aspect of the invention, probes have the form illustrated in FIG. 1a. In this embodiment, probes are double stranded segments of DNA having a protruding strand at one end 10, at least one nuclease recognition site 12, and a spacer region 14 between the recognition site and the protruding end 10. Preferably, probes also include a label 16, which in this particular embodiment is illustrated at the end opposite the protruding strand. The probes may be labeled by a variety of means and at a variety of locations, the only restriction being that the labeling means selected does not interfere with the ligation step or with the recognition of the probe by the nuclease.
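The length of spacer region 14 sets where the nuclease cuts relative to the ligation site, as FIG. 2 depicts. The Python sketch below rests on three assumptions, none of which comes from this section:

- FokI is the nuclease. It recognizes GGATG and cuts 9 nucleotides beyond the site on one strand and 13 on the other.
- The site is oriented toward the target.
- The site ends `spacer` base pairs before the ligation site.

The function and parameter names are hypothetical.

```python
def cleavage_offsets(spacer: int, near_cut: int = 9, far_cut: int = 13) -> tuple[int, int]:
    """Nucleotides removed from each target strand by one cleavage.

    Assumes the recognition site points toward the target and ends `spacer`
    base pairs before the ligation site. Defaults model FokI, GGATG(N)9/13,
    which is an assumed example enzyme.
    """
    near = near_cut - spacer  # cut on the strand carrying GGATG
    far = far_cut - spacer    # cut on the complementary strand
    if near < 1:
        raise ValueError("cut falls within the probe; the target is not shortened")
    return near, far


near, far = cleavage_offsets(spacer=8)
print(near, far - near)  # 1 4: one nucleotide removed, new 4-nucleotide protruding strand
```

Under these assumptions, a spacer of 8 base pairs shortens the target by a single nucleotide per cycle. A shorter spacer removes more nucleotides each time.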

In the above embodiment, whenever a nuclease leaves a 5' phosphate on the terminus of the target polynucleotide, it is sometimes desirable to remove it, e.g. by treatment with a standard phosphatase, prior to ligation. This prevents undesired ligation of one of the strands when the protruding strands of the probe and target sequence fail to form a perfectly matched duplex. This is particularly problematic when a mismatch occurs precisely at the nucleotide position where identification is sought. Where such phosphatase treatment is employed, the "nick" remaining in the ligated complex after the initial ligation can be repaired by kinase treatment followed by a second ligation step.

Preferably, embodiments of the invention employing the above type of probe comprise the following steps: (a) ligating a probe to an end of the polynucleotide having a protruding strand to form a ligated complex, the probe having a complementary protruding strand to that of the polynucleotide and the probe having a nuclease recognition site; (b) identifying one or more nucleotides in the protruding strand of the polynucleotide, e.g. by the identity of the ligated probe; (c) cleaving the ligated complex with a nuclease; and (d) repeating steps (a) through (c) until the nucleotide sequence of the polynucleotide is determined. The step of identifying can take place either before or after the step of cleaving. Preferably, the one or more nucleotides in the protruding strand of the polynucleotide are identified prior to cleavage. In further preference, the method also includes a step of removing unligated probe from the ligated complex.

It is not critical whether protruding strand 10 of the probe is a 5' or 3' end. However, in this embodiment, it is important that the protruding strands of the target polynucleotide and probes be capable of forming perfectly matched duplexes to allow for specific ligation. If the protruding strands of the target polynucleotide and probe are of different lengths, the resulting gap can be filled in by a polymerase prior to ligation, e.g. as in "gap LCR" disclosed in Backman et al, European patent application 91100959.5. Such gap filling can be used as a means for identifying one or more nucleotides in the protruding strand of the target polynucleotide. Preferably, the number of nucleotides in the respective protruding strands is the same, so that both strands of the probe and target polynucleotide can be ligated without a filling step. Preferably, the protruding strand of the probe is from 2 to 6 nucleotides long. As indicated below, the greater the length of the protruding strand, the greater the complexity of the probe mixture that is applied to the target polynucleotide during each ligation and cleavage cycle.
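The trade-off in the last sentence can be quantified. A fully degenerate mixture built from the four natural nucleotides must contain one probe for every possible protruding strand, i.e. 4^n probes for an n-nucleotide overhang. The helper below is a hypothetical illustration of that count over the preferred 2 to 6 nucleotide range.

```python
def probe_mixture_complexity(overhang_length: int) -> int:
    """Distinct probes needed to cover every protruding strand of this length."""
    return 4 ** overhang_length


for n in range(2, 7):
    print(f"{n}-nt protruding strand: {probe_mixture_complexity(n)} probes")
```

The count rises from 16 probes for a 2-nucleotide overhang to 4,096 for a 6-nucleotide overhang.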

In another aspect of the invention, the primary function of the probe is to provide a site for a nuclease to bind to the ligated complex so that the complex can be cleaved and the target polynucleotide shortened. In this aspect of the invention, identification of the nucleotides can take place separately from probe ligation and cleavage. This embodiment provides several advantages. First, sequence determination does not require that the protruding strand of the ligated probe be perfectly complementary to the protruding strand of the target polynucleotide, thereby permitting greater flexibility in the control of hybridization stringency. Second, one need not provide a fully degenerate set of probes based on the four natural nucleotides. So-called "wild card" nucleotides, or "degeneracy-reducing analogs", can be provided to significantly reduce, or even eliminate, the complexity of the probe mixture employed in the ligation step, since specific binding is not critical to nucleotide identification in this embodiment. Third, if identification is not carried out via a label on the probe, then probes designed for blunt-end ligation may be employed with no need for degenerate mixtures.
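The effect of wild-card nucleotides on mixture complexity can be sketched with the same 4^n count. As a simplifying assumption (not a property the source states for any specific analog), each wild-card position is treated as pairing with any base, so it contributes a factor of 1 instead of 4. The function name is hypothetical.

```python
def mixture_complexity(overhang_length: int, wildcard_positions: int = 0) -> int:
    """Probes needed when `wildcard_positions` of the overhang use a wild-card nucleotide.

    Assumption: a wild-card position pairs with any base (factor 1); every other
    position must be covered by all four natural nucleotides (factor 4).
    """
    if not 0 <= wildcard_positions <= overhang_length:
        raise ValueError("wildcard_positions must lie between 0 and overhang_length")
    return 4 ** (overhang_length - wildcard_positions)


for wild in range(0, 5):
    print(f"4-nt overhang, {wild} wild-card positions: {mixture_complexity(4, wild)} probes")
```

With all four positions wild-carded the mixture collapses to a single probe, the "eliminate" case in the text; a blunt-ended probe (`overhang_length=0`) likewise needs only one.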

Preferably, this embodiment of the invention comprises the following steps: (a) providing a polynucleotide having a protruding strand; (b) identifying one or more nucleotides in the protruding strand by extending a 3' end of a strand with a nucleic acid polymerase; (c) ligating a probe to an end of the polynucleotide to form a ligated complex; (d) cleaving the ligated complex with a nuclease; and (e) repeating steps (a) through (d) until the nucleotide sequence of the polynucleotide is determined. Preferably, the target polynucleotide has a 3' recessed strand that is extended by the nucleic acid polymerase in the presence of chain-terminating nucleoside triphosphates, and the nuclease used produces a 3' recessed strand and a 5' protruding strand at the terminus of the target polynucleotide.
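Steps (a) through (e) can be modeled as a loop around a toy target object. This is a hypothetical sketch. The class and method names are illustrative, and the assumption that each ligation and cleavage round removes exactly one nucleotide (the one just identified) is a simplification, not a detail given in the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


class ToyTarget:
    """Hypothetical target: `template` lists the bases of the 5' protruding strand,
    starting with the one adjacent to the double-stranded portion, followed by the
    bases that later cycles will expose."""

    def __init__(self, template: str, overhang: int = 4):
        self.template = template
        self.overhang = overhang  # assumed protruding-strand length left by the nuclease
        self.pos = 0              # index of the first unpaired template base
        self.extended = False     # True once a chain terminator has been added

    def has_full_overhang(self) -> bool:
        return self.pos + self.overhang <= len(self.template)

    def extend_with_terminator(self) -> str:
        """(b) The polymerase adds one chain-terminating nucleotide to the 3' recessed strand."""
        self.extended = True
        return COMPLEMENT[self.template[self.pos]]

    def ligate_and_cleave(self) -> None:
        """(c) + (d), assumed net effect: the terminator leaves with the labeled
        fragment and the target is shortened by one nucleotide."""
        if not self.extended:
            raise RuntimeError("extend the 3' recessed strand before ligating")
        self.extended = False
        self.pos += 1


def sequence_by_extension(template: str, overhang: int = 4) -> str:
    target = ToyTarget(template, overhang)  # (a) provide the polynucleotide
    called = []
    while target.has_full_overhang():       # (e) repeat while a full overhang remains
        added = target.extend_with_terminator()
        called.append(COMPLEMENT[added])    # template base pairs with the added nucleotide
        target.ligate_and_cleave()
    return "".join(called)


print(sequence_by_extension("GATTACAGG"))
```

The `extended` flag enforces the ordering in the text: identification by extension precedes ligation and cleavage in every cycle.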

An example of this embodiment is illustrated in FIG. 1b. The 3' recessed strand of polynucleotide (15) is extended with a nucleic acid polymerase in the presence of the four dideoxynucleoside triphosphates, each carrying a distinguishable fluorescent label, so that the 3' recessed strand is extended by one nucleotide (11); this permits the complementary nucleotide in the 5' protruding strand of polynucleotide (15) to be identified. Probe (9), having recognition site (12), spacer region (14), and complementary protruding strand (10), is then ligated to polynucleotide (15) to form ligated complex (17). Ligated complex (17) is then cleaved at cleavage site (19) to release a labeled fragment (21) and augmented probe (23). The shortened polynucleotide (15), with a regenerated 3' recessed strand, is then ready for the next cycle of identification, ligation, and cleavage.

In such embodiments, the first nucleotide of the 5' protruding strand adjacent to the double-stranded portion of the target polynucleotide is readily identified by extending the 3' strand with a nucleic acid polymerase in the presence of chain-terminating nucleoside triphosphates. Preferably, the 3' strand is extended by a nucleic acid polymerase in the presence of the four chain-terminating nucleoside triphosphates, each labeled with a distinguishable fluorescent dye, so that the added nucleotide is readily identified by the color of the attached dye. Such chain-terminating nucleoside triphosphates are available commercially, e.g. labeled dideoxynucleoside triphosphates such as those described by Hobbs, Jr. et al, U.S. Pat. No. 5,047,519; Cruickshank, U.S. Pat. No. 5,091,519; and the like. Procedures for such extension reactions are described in various publications, including Syvanen et al, Genomics, 8: 684-692 (1990); Goelet et al, International Application No. PCT/US92/01905; Livak and Brenner, U.S. Pat. No. 5,102,785; and the like.
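Reading out the extension product amounts to mapping the observed dye color to the added terminator, then to its complement in the protruding strand. In the sketch below the dye names and their assignment to bases are hypothetical placeholders; the text only requires four distinguishable fluorescent dyes.

```python
# Hypothetical dye-to-terminator assignment; any four distinguishable dyes would do.
DYE_TO_TERMINATOR = {"dye-1": "ddATP", "dye-2": "ddCTP", "dye-3": "ddGTP", "dye-4": "ddTTP"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def template_base_from_dye(dye: str) -> str:
    """Return the protruding-strand base opposite the dye-labeled terminator."""
    if dye not in DYE_TO_TERMINATOR:
        raise ValueError(f"unrecognized dye: {dye}")
    added_base = DYE_TO_TERMINATOR[dye][2]  # "ddATP" -> "A"
    return COMPLEMENT[added_base]


def call_bases(observed_dyes: list[str]) -> str:
    """One observed dye per cycle gives one base of the protruding strand per cycle."""
    return "".join(template_base_from_dye(d) for d in observed_dyes)


print(call_bases(["dye-4", "dye-1", "dye-1", "dye-3"]))
```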

A probe may be ligated to the target polynucleotide using conventional procedures, as described more fully below. Preferably, the probe is ligated after a single-nucleotide extension of the 3' strand of the target polynucleotide. More preferably, the number of nucleotides in the protruding strand of the probe is the same as the number of nucleotides in the protruding strand of the target polynucleotide after the extension step. That is, if the nuclease provides a protruding strand having four nucleotides, then after the extension step the target's protruding strand will have three nucleotides, and so will the protruding strand of the preferred probe.
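The length bookkeeping in the last sentence is simple arithmetic, shown here as a hypothetical helper. The final lines also apply the 4^n count for a fully degenerate probe set to the shortened overhang; that comparison is an illustration, not a figure from the source.

```python
def probe_overhang_length(nuclease_overhang: int, extension: int = 1) -> int:
    """Protruding-strand length of the preferred probe: what remains of the target's
    protruding strand after the 3' recessed strand is extended by `extension` nt."""
    remaining = nuclease_overhang - extension
    if remaining < 0:
        raise ValueError("extension cannot exceed the nuclease-generated overhang")
    return remaining


n = probe_overhang_length(4)  # the four-nucleotide example from the text
print(n, 4 ** n)              # 3-nt probe overhang; 64 probes if fully degenerate
```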

The cleavage step in this embodiment may be accomplished by a variety of techniques, depending on the effect that the added chain-terminating nucleotide has on the efficiencies of the nuclease and/or ligase employed. Preferably, a ligated complex is formed in the presence of the labeled chain-terminating nucleotide and is subsequently cleaved with the appropriate nuclease, e.g. a class IIs restriction endonuclease, such as Fok I, or the like.
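Fok I is a class IIs enzyme that recognizes 5'-GGATG-3' and cuts outside that site, 9 nucleotides downstream on the same strand and 13 on the complementary strand, leaving a four-nucleotide 5' protruding strand. The sketch below applies that geometry to one strand of a ligated complex, written 5'→3' with the probe's site upstream of the target. It is a hypothetical helper: it looks for the site only on the strand given, and the example sequence is made up.

```python
FOK_I_SITE = "GGATG"
TOP_CUT, BOTTOM_CUT = 9, 13  # Fok I cleavage offsets downstream of the site: GGATG(9/13)


def fok_i_cut(top: str) -> tuple[str, str, str]:
    """Cut at the first GGATG on `top` (written 5'->3').

    Returns (top strand kept on the probe side,
             5' protruding strand left on the downstream fragment,
             remaining downstream top strand).
    """
    site = top.find(FOK_I_SITE)
    if site < 0:
        raise ValueError("no GGATG site on this strand")
    site_end = site + len(FOK_I_SITE)
    top_cut, bottom_cut = site_end + TOP_CUT, site_end + BOTTOM_CUT
    if bottom_cut > len(top):
        raise ValueError("site is too close to the end for both strands to be cut")
    return top[:top_cut], top[top_cut:bottom_cut], top[bottom_cut:]


left, overhang, rest = fok_i_cut("GGATG" + "CATGCAAAA" + "ACGT" + "TTTT")
print(left, overhang, rest)
```

Because the cut positions are fixed relative to the site, the length of the probe's spacer region determines how far into the target polynucleotide the cut falls; that dependence is an inference from Fok I's cleavage geometry rather than a value given here.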

In a preferred embodiment, after extension and ligation, the chain-terminating nucleotide may be excised. Preferably, this is carried out by the 3'→5' exonuclease activity (i.e. proof-reading activity) of a DNA polymerase, e.g. T4 DNA polymerase, acting in the pr