WikiPatents - Community Patent Review
Methods of preparing probe array by hybridation    

United States Patent 5631134
Link to this page: http://www.wikipatents.com/5631134.html
Inventor(s): Cantor; Charles R. (Boston, MA)
Abstract: This invention is directed to methods for determining a nucleotide sequence of a nucleic acid using positional sequencing by hybridization, and to the creation of nucleic acid probes which may be used with these methods. This invention is also directed to diagnostic aids for analyzing the nucleic acid composition and content of biological samples, including samples derived from medical and agricultural sources.
Methods of preparing probe array by hybridation
Inventor     Cantor; Charles R. (Boston, MA)
Owner/Assignee     The Trustees of Boston University (Boston, MA)
Publication Date     May 20, 1997
Application Number     08/462,704
Filing Date     June 5, 1995
US Classification     435/6 435/91.1 435/91.2 435/91.52 536/22.1 536/24.3
Int'l Classification     C07H 021/04 C12P 019/34 C12Q 001/68
Examiner     Zitomer; Stephanie W.
Assistant Examiner     Tran; Paul B.
Attorney/Law Firm     Remenick; James Baker & Botts, L.L.P.
Parent Case     REFERENCE TO RELATED APPLICATIONS This U.S. patent application is a divisional of U.S. patent application, Ser. No. 08/322,526, filed Oct. 17, 1994, which issued as U.S. Pat. No. 5,503,980, on Apr. 2, 1996; which is a continuation of abandoned U.S. patent application Ser. No. 07/972,012, filed Nov. 6, 1992.
USPTO Field of Search     435/6 435/91.1 435/91.2 435/91.52 935/77 935/78 536/22.1 536/20.3

 References
 U.S. References
 
Patent No.  Inventor    U.S. Class  Date
5219726     Evans       435/6       Jun, 1993
5202231     Drmanac     435/6       Apr, 1993
5137806     LeMaistre   435/6       Aug, 1992
5114839     Blocker     435/6       May, 1992
5112736     Caldwell    435/6       May, 1992
5112734     Kramer      435/6       May, 1992
5106727     Hartley     435/6       Apr, 1992
5073483     Lebacq      -           Dec, 1991
5068176     Vijg        -           Nov, 1991
5002867     Macevicz    435/6       Mar, 1991
4808520     Dattagupta  435/6       Feb, 1989
5149625     Church      435/6       Dec, 1969
 Claims
 


I claim:

1. A method for creating a nucleic acid probe array comprising the steps of:

a) hybridizing a plurality of single-stranded first nucleic acids to a plurality of longer, single-stranded second nucleic acids complementary to the first nucleic acids wherein each second nucleic acid contains a variable terminal nucleotide sequence to form probes each having a double-stranded portion and a single-stranded portion with the variable nucleotide sequence in the single-stranded portion;

b) hybridizing target nucleic acids to the probes;

c) ligating hybridized targets to the first nucleic acids of the probes;

d) isolating the second nucleic acids; and

e) hybridizing additional first nucleic acids to isolated second nucleic acids to form the nucleic acid probe array.

2. The method of claim 1 wherein each first nucleic acid is about 15 to about 25 nucleotides in length and each second nucleic acid is about 20 to about 30 nucleotides in length.

3. The method of claim 1 wherein the double-stranded portion contains an enzyme recognition site.

4. The method of claim 1 further comprising the steps of hybridizing ligated probes with an array of oligonucleotides containing variable nucleotide sequences and ligating hybridized oligonucleotides to the second nucleic acids of the probes.

5. The method of claim 1 further comprising the step of enzymatically extending the second nucleic acids after hybridization with targets using the targets as a template.

6. The method of claim 1 further comprising a step of fixing any of said nucleic acids to a solid support.

7. The method of claim 6 wherein the solid support is a plastic, a ceramic, a metal, a resin, a gel, or a membrane.

8. A method for creating a nucleic acid probe array comprising the steps of:

a) hybridizing a plurality of single-stranded first nucleic acids to a plurality of longer, single-stranded second nucleic acids complementary to the first nucleic acids wherein each second nucleic acid contains a variable terminal nucleotide sequence to form probes each having a double-stranded portion and a single-stranded portion with the variable nucleotide sequence in the single-stranded portion;

b) hybridizing target nucleic acids to the probes;

c) ligating hybridized targets to the first nucleic acids of the probes;

d) hybridizing ligated probes to an array of oligonucleotides that contain variable nucleotide sequences;

e) ligating hybridized oligonucleotides to the second nucleic acids of the probes;

f) isolating ligated second nucleic acids; and

g) hybridizing additional first nucleic acids to isolated second nucleic acids to form the nucleic acid probe array.

9. The method of claim 8 wherein each first nucleic acid is about 15 to about 25 nucleotides in length and each second nucleic acid is about 20 to about 30 nucleotides in length.

10. The method of claim 8 wherein the double-stranded portion contains an enzyme recognition site.

11. The method of claim 8 wherein the oligonucleotides are each about 4 to about 20 nucleotides in length.

12. The method of claim 8 further comprising a step of fixing any of said nucleic acids to a solid support.

13. The method of claim 12 wherein the solid support is a plastic, ceramic, a metal, a resin, a gel, or a membrane.

14. A method for creating a nucleic acid probe array comprising the steps of:

a) hybridizing a plurality of single-stranded first nucleic acids to a plurality of longer, single-stranded second nucleic acids complementary to the first nucleic acids wherein each second nucleic acid contains a variable terminal nucleotide sequence to form probes having a double-stranded portion and a single-stranded portion with the variable nucleotide sequence in the single-stranded portion;

b) hybridizing target nucleic acids to the probes;

c) ligating hybridized targets to the first nucleic acids of the probes;

d) enzymatically extending each second nucleic acid using the target as a template;

e) isolating extended second nucleic acids; and

f) hybridizing additional first nucleic acids to isolated second nucleic acids to form the nucleic acid probe array.

15. The method of claim 14 wherein each first nucleic acid is about 15 to about 25 nucleotides in length and each second nucleic acid is about 20 to about 30 nucleotides in length.

16. The method of claim 14 wherein the double-stranded portion contains an enzyme recognition site.

17. The method of claim 10 further comprising a step of fixing any of said nucleic acids to a solid support.

18. The method of claim 17 wherein the solid support is a plastic, ceramic, a metal, a resin, a gel, or a membrane.
 Description
 


BACKGROUND OF THE INVENTION

1. Technical Field

This invention is directed to methods for sequencing nucleic acids by positional hybridization, to procedures combining these methods with more conventional sequencing techniques, to the creation of probes useful for nucleic acid sequencing by positional hybridization, to diagnostic aids useful for screening biological samples for nucleic acid variations, and to methods for using these diagnostic aids.

2. Description of the Prior Art

Since the recognition of nucleic acid as the carrier of the genetic code, a great deal of interest has centered around determining the sequence of that code in the many forms in which it is found. Two landmark studies made the process of nucleic acid sequencing, at least with DNA, a common and relatively rapid procedure practiced in most laboratories. The first describes a process whereby terminally labeled DNA molecules are chemically cleaved at single base repetitions (A. M. Maxam and W. Gilbert, Proc. Natl. Acad. Sci. USA 74:560-564, 1977). Each base position in the nucleic acid sequence is then determined from the molecular weights of fragments produced by partial cleavages. Individual reactions were devised to cleave preferentially at guanine, at adenine, at cytosine and thymine, and at cytosine alone. When the products of these four reactions are resolved by molecular weight, using, for example, polyacrylamide gel electrophoresis, DNA sequences can be read from the pattern of fragments on the resolved gel.

The second study describes a procedure whereby DNA is sequenced using a variation of the plus-minus method (F. Sanger et al., Proc. Natl. Acad. Sci. USA 74:5463-67, 1977). This procedure takes advantage of the chain terminating ability of dideoxynucleoside triphosphates (ddNTPs) and the ability of DNA polymerase to incorporate ddNTPs with nearly the same fidelity as its natural substrates, deoxynucleoside triphosphates (dNTPs). Briefly, a primer, usually an oligonucleotide, and a template DNA are incubated together in the presence of a useful concentration of all four dNTPs plus a limited amount of a single ddNTP. The DNA polymerase occasionally incorporates a dideoxynucleotide which terminates chain extension. Because the dideoxynucleotide has no 3'-hydroxyl, the initiation point for the polymerase enzyme is lost. Polymerization produces a mixture of fragments of varied sizes, all having identical 3' termini. Fractionation of the mixture by, for example, polyacrylamide gel electrophoresis, produces a pattern which indicates the presence and position of each base in the nucleic acid. Reactions with each of the four ddNTPs allow one of ordinary skill to read an entire nucleic acid sequence from a resolved gel.
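
The chain termination readout described above can be sketched in a few lines of Python. This is a simplified illustration only, not the procedure of Sanger et al.: it assumes ideal, error-free termination at every position, measures fragment length from the first synthesized nucleotide, and uses the hypothetical helper names `termination_lengths` and `read_gel`.

```python
def termination_lengths(synthesized: str) -> dict:
    """For each ddNTP reaction (one per base), list the fragment lengths produced.

    A fragment of length i appears in the lane for base X whenever the
    i-th nucleotide of the synthesized strand is X (idealized assumption).
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the sequence by ordering all fragments from shortest to longest."""
    base_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_by_length[length] for length in sorted(base_by_length))


lanes = termination_lengths("GATTACA")  # hypothetical example strand
print(lanes)
print(read_gel(lanes))
```

Each lane holds only the fragments that end in one base, so the sequence is recovered by merging the four lanes in order of fragment size.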

Despite their advantages, these procedures are cumbersome and impractical when one wishes to obtain megabases of sequence information. Further, these procedures are, for all practical purposes, limited to sequencing DNA. Although variations have developed, it is still not possible using either process to obtain sequence information directly from any other form of nucleic acid.

A new method of sequencing has been developed which overcomes some of the problems associated with current methodologies wherein sequence information is obtained in multiple discrete packages by hybridization. Instead of having a particular nucleic acid sequenced one base at a time, groups of contiguous bases are determined simultaneously. Advantages in speed, expense and accuracy are clear.

Two general approaches of sequencing by hybridization have been suggested. Their practicality has been demonstrated in pilot studies. In one format, a complete set of 4^n oligonucleotides of length n is immobilized as an ordered array on a solid support and an unknown DNA sequence is hybridized to this array (K. R. Khrapko et al., J. DNA Sequencing and Mapping 1:375-88, 1991). The resulting hybridization pattern provides all n-tuple words in the sequence. This is sufficient to determine short sequences except for simple tandem repeats.
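
As a rough illustration of what the hybridization pattern in the first format conveys, the sketch below lists the n-tuple words that an ideal, error-free array would report for a known sequence. The function name `spectrum` and the example sequences are hypothetical, and the sketch ignores strand complementarity for simplicity.

```python
def spectrum(sequence: str, n: int) -> set:
    """Return the set of all length-n words (n-tuples) present in sequence."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


# Every 4-tuple word contained in a short example sequence.
print(sorted(spectrum("ATGCATGG", 4)))

# Simple tandem repeats of different lengths give the same set of words,
# which is why they cannot be resolved from the pattern alone.
print(spectrum("ACACAC", 2) == spectrum("ACACACAC", 2))
```

Because the pattern records only which words are present, not how often, two tandem repeats of different lengths are indistinguishable in this idealized model.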

In the second format, an array of immobilized samples is hybridized with one short oligonucleotide at a time (Z. Strezoska et al., Proc. Natl. Acad. Sci. USA 88:10089-93, 1991). When repeated 4^n times, once for each oligonucleotide of length n, much of the sequence of all the immobilized samples would be determined. In both approaches, the intrinsic power of the method is that many sequenced regions are determined in parallel. In actual practice the array size is about 10^4 to 10^5.

Another powerful aspect of the method is that information obtained is quite redundant, especially as the size of the nucleic acid probe grows. Mathematical simulations have shown that the method is quite resistant to experimental errors and that far fewer than all probes are necessary to determine reliable sequence data (P. A. Pevzner et al., J. Biomol. Struc. & Dyn. 9:399-410, 1991; W. Bains, Genomics 11:295-301, 1991).

In spite of an overall optimistic outlook, there are still a number of potentially severe drawbacks to actual implementation of sequencing by hybridization. First and foremost among these is that 4^n rapidly becomes quite a large number if chemical synthesis of all of the oligonucleotide probes is actually contemplated. Various schemes of automating this synthesis and compressing the products into a small scale array, a sequencing chip, have been proposed.

A second drawback is the poor level of discrimination between a correctly hybridized, perfectly matched duplex and an end mismatch. In part, these drawbacks have been addressed at least to a small degree by the method of continuous stacking hybridization as reported by Khrapko et al. (FEBS Lett. 256:118-22, 1989). Continuous stacking hybridization is based upon the observation that when a single stranded oligonucleotide is hybridized adjacent to a double stranded oligonucleotide, the two duplexes are mutually stabilized as if they are positioned side to side due to a stacking contact between them. The stability of the interaction decreases significantly as stacking is disrupted by nucleotide displacement, gap, or terminal mismatch. Internal mismatches are presumably ignorable because their thermodynamic stability is so much less than perfect matches. Although promising, a related problem arises, which is distinguishing between weak but correct duplex formation and simple background such as non-specific adsorption of probes to the underlying support matrix.

A third drawback is that detection is monochromatic. Separate sequential positive and negative controls must be run to discriminate between a correct hybridization match, a mis-match, and background.

A fourth drawback is that ambiguities develop in reading sequences longer than a few hundred base pairs on account of sequence recurrences. For example, if a sequence the same length as the probe recurs three times in the target, the sequence position cannot be uniquely determined. The locations of these sequence ambiguities are called branch points.
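
A minimal sketch of how such recurrences might be located in a known test sequence follows. The function name `recurring_words`, its parameters, and the example target are all hypothetical; the sketch simply reports every word of a given length that occurs at least `min_count` times, together with its start positions.

```python
from collections import defaultdict


def recurring_words(target: str, length: int, min_count: int = 2) -> dict:
    """Map each word of the given length that recurs to its 0-based start positions."""
    positions = defaultdict(list)
    for start in range(len(target) - length + 1):
        positions[target[start:start + length]].append(start)
    return {word: where for word, where in positions.items() if len(where) >= min_count}


# Hypothetical target in which the probe-length word ACGT recurs three times.
target = "ACGTTTACGTGGACGTCC"
print(recurring_words(target, length=4, min_count=3))
```

Each reported word marks a candidate branch point: a probe of that length hybridizes at several locations, so the order of the segments between the recurrences cannot be fixed from hybridization data alone.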

A fifth drawback is the effect of secondary structures in the target nucleic acid. This could lead to blocks of sequences that are unreadable if the secondary structure is more stable than occurs on the complementary strand.

A final drawback is the possibility that certain probes will have anomalous behavior and, for one reason or another, be recalcitrant to hybridization under whatever standard sets of conditions are ultimately used. A simple example of this is the difficulty in finding matching conditions for probes rich in G/C content. A more complex example could be sequences with a high propensity to form triple helices. The only way to rigorously explore these possibilities is to carry out extensive hybridization studies with all possible oligonucleotides of length n, under the particular format and conditions chosen. This is clearly impractical if many sets of conditions are involved.

SUMMARY OF THE INVENTION

The present invention overcomes the problems and disadvantages associated with current strategies and design and provides a new method for rapidly and accurately determining the nucleotide sequence of a nucleic acid by the herein described methods of positional sequencing by hybridization.

As broadly described herein, this invention is directed to a rapid, accurate, and reproducible method of sequencing a nucleic acid by hybridizing that nucleic acid with a set of nucleic acid probes containing random, but determinable sequences within the single stranded portion adjacent to a double stranded portion wherein the single stranded portion of the set preferably comprises every possible combination of sequences over a predetermined range. Hybridization occurs by complementary recognition of the single stranded portion of a target with the single stranded portion of the probe and is thermodynamically favored by the presence of adjacent double strandedness of the probe.

As broadly described herein, another object of this invention is the integration of molecular biology techniques to the method of positional sequencing by hybridization. This includes such techniques as the use of exonucleases to partially cleave the target nucleic acid prior to hybridization, and the use of polymerase to extend one strand of a target hybridized probe using the target as a template. Polymerization can be of a single nucleotide or of a sequence of nucleotides, as determined by known methods which are easily applied by one of ordinary skill in the art.

As broadly described herein, another object of the present invention is the creation of nucleic acid probes for determining the sequence of an unknown nucleic acid. These probes comprise a double stranded portion, which is preferably constant, a single stranded portion, and a determinable random nucleotide sequence within the single stranded portion which hybridizes to the target. Probes may comprise a complete set of all possible sequences of the random single stranded portion or a set comprising only a portion of all possible combinations.

As broadly described herein, another object of the present invention is the use of nucleic acid probes as diagnostic aids in the analysis of nucleic acids of a biological sample. The invention includes diagnostic aids and methods for using diagnostic aids for the analysis of the relatedness or unrelatedness of one nucleic acid to another. Probes may be created in which an unknown or undetermined nucleotide sequence has been identified as the source of a mutation or genetic variation. Probes created herein may be used to quickly, easily, and accurately identify that mutation or variation without having to perform a single conventional sequencing reaction.

As broadly described herein, another object of this invention is a method for determining the position of a partial sequence within the whole nucleic acid by labeling the nucleic acid of interest at one terminal site with a first detectable label, labeling the nucleic acid of interest at an internal site with a second detectable label, and comparing the relative amounts of the first label with the relative amounts of the second label to determine the position of the partial sequence.

Other objects and advantages of the invention are set forth in part in the description which follows, and in part, will be obvious from this description, or may be learned from the practice of this invention. The accompanying drawings which are incorporated in and constitute a part of this specification, illustrate and, together with this description, serve to explain the principle of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 (A) Shown is the first step of the basic scheme for positional sequencing by hybridization depicting the hybridization of target nucleic acid with probe forming a 5' overhang of the target. (B) Shown is the first step of the alternate scheme for positional sequencing by hybridization depicting the hybridization of target nucleic acid with probe forming a 3' overhang of the probe.

FIG. 2 Preparation of a random probe array.

FIG. 3 Graphic representation of the ligation step of positional sequencing by hybridization wherein hybridization of the target nucleic acid produces (A) a 5' overhang or (B) a 3' overhang.

FIG. 4 Single nucleotide extension of a probe hybridized with a target nucleic acid using DNA polymerase and a single dideoxynucleotide.

FIG. 5 Preparation of a nested set of targets using labeled target nucleic acids partially digested with Exonuclease III.

FIG. 6 Determination of positional information using the ratio of internal label to terminal label.

FIG. 7 (A) Extension of one strand of the probe using hybridized target as template with a single deoxynucleotide. (B) Hybridization of target with a fixed probe followed by ligation of probe to target.

FIG. 8 Four color analysis of sequence extensions of the 3' end of a probe using three labeled nucleoside triphosphates and one unlabeled chain terminator.

FIG. 9 Extension of a nucleic acid probe by ligation of a pentanucleotide 3' blocked to prevent polymerization.

FIG. 10 Preparation of a customized probe containing a 10 base pair sequence that was present in the original target nucleic acid.

FIG. 11 Graphic representation of the general procedure of positional sequencing by hybridization.

FIG. 12 (A) Graphical representation of the ligation efficiency of positional sequencing. Depicted is the relationship between the amount of label remaining over the total amounts of label in the reaction, versus NaCl concentration. (B) Test sequences of biotinylated duplex probes tethered to streptavidin coated magnetic microbeads utilized to determine ligation efficiency.

DESCRIPTION OF THE INVENTION

To achieve the objects and in accordance with the purpose of the invention, as embodied and broadly described herein, the present invention comprises methods, probes, diagnostic aids, and methods for using the diagnostic aids to determine sequence information from nucleic acids. Nucleic acids of the present invention include sequences of deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) which may be isolated from natural sources, recombinantly produced, or artificially synthesized. Preferred embodiments of the present invention are probes synthesized using traditional chemical synthesis, using the more rapid polymerase chain reaction (PCR) technology, or using a combination of these two methods.

Nucleic acids of the present invention further include polyamide nucleic acid (PNA) or any sequence of what are commonly referred to as bases joined by a chemical backbone that have the ability to base pair, or hybridize, with a complementary chemical structure. The bases of DNA, RNA, and PNA are purines and pyrimidines linearly linked to a chemical backbone. Common chemical backbone structures are deoxyribose phosphate and ribose phosphate. Recent studies demonstrated that a number of additional structures may also be effective, such as the polyamide backbone of PNA (P. E. Nielsen et al., Sci. 254:1497-1500, 1991).

The purines found in both DNA and RNA are adenine and guanine, but others known to exist are xanthine, hypoxanthine, 2,6-diaminopurine, and other more modified bases. The pyrimidines are cytosine, which is common to both DNA and RNA, uracil found predominantly in RNA, and thymidine which occurs exclusively in DNA. Some of the more atypical pyrimidines include methylcytosine, hydroxymethylcytosine, methyluracil, hydroxymethyluracil, dihydroxypentyluracil, and other base modifications. These bases interact in a complementary fashion to form base pairs, such as, for example, guanine with cytosine and adenine with thymidine. However, this invention also encompasses situations in which there is nontraditional base pairing such as Hoogsteen base pairing which has been identified in certain tRNA molecules and postulated to exist in a triple helix.

One embodiment of the present invention is a method for determining a nucleotide sequence by positional hybridization comprising the steps of (a) creating a set of nucleic acid probes wherein each probe has a double stranded portion, a single stranded portion, and a random sequence within the single stranded portion which is determinable, (b) hybridizing a nucleic acid target which is at least partly single stranded to the set of nucleic acid probes, and (c) determining the nucleotide sequence of the target which hybridized to the single strand portion of any probe. The set of nucleic acid probes and the target nucleic acid may comprise DNA, RNA, PNA, or any combination thereof, and may be derived from natural sources, recombinant sources, or be synthetically produced. Each probe of the set of nucleic acid probes has a double stranded portion which is preferably about 10 to 30 nucleotides in length, a single stranded portion which is preferably about 4 to 20 nucleotides in length, and a random sequence within the single stranded portion which is preferably about 4 to 20 nucleotides in length and more preferably about 5 nucleotides in length. A principal advantage of this probe is in its structure. Hybridization of the target nucleic acid is encouraged due to the favorable thermodynamic conditions established by the presence of the adjacent double strandedness of the probe. An entire set of probes contains at least one example of every possible random nucleotide sequence.

By way of example only, if the random portion consisted of a four nucleotide sequence of adenine, guanine, thymine, and cytosine, the total number of possible combinations would be 4^4 or 256 different nucleic acid probes. If the number of nucleotides in the random sequence was five, the number of different probes within the set would be 4^5 or 1,024. This becomes a very large number indeed when considering sequences of 20 nucleotides or more.
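
The counts above can be checked by enumerating the random portion directly. The sketch below is illustrative only; the function name `all_random_sequences` is hypothetical and the enumeration covers only the random single stranded portion, not the constant double stranded portion of a probe.

```python
from itertools import product


def all_random_sequences(n: int, alphabet: str = "AGTC") -> list:
    """Enumerate every possible sequence of length n over the four bases."""
    return ["".join(bases) for bases in product(alphabet, repeat=n)]


print(len(all_random_sequences(4)))  # size of the complete set for n = 4
print(len(all_random_sequences(5)))  # size of the complete set for n = 5
print(4 ** 20)                       # set size for n = 20, computed without enumerating
```

For n = 20 the set size is computed rather than enumerated, since listing roughly 10^12 sequences is impractical, which mirrors the synthesis problem the text describes.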

However, to determine the complete sequence of a nucleic acid target, the set of probes need not contain every possible combination of nucleotides of the random sequence to be encompassed by the method of this invention. This variation of the invention is based on the theory of degenerated probes proposed by S. C. Macevicz (U.S. Pat. No. 5,002,867, and herein specifically incorporated by reference). The probes are divided into four subsets. In each, one of the four bases is used at a defined number of positions and all other bases except that one at the remaining positions. Probes from the first subset contain two elements, A and non-A (A=adenosine). For a nucleic acid sequence of length k, there are 4(2^k - 1), instead of 4^k probes. Where k=8, a set of probes would consist of only 1020 different members instead of the entire set of 65,536. The savings in time and expense would be considerable. In addition, it is also a method of the present invention to utilize probes wherein the random nucleotide sequence contains gapped segments, or positions along the random sequence which will base pair with any nucleotide or at least not interfere with adjacent base pairing.
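
The probe counts for the degenerate scheme can be reproduced with a short sketch. The function names are hypothetical, and the enumeration rests on an assumption not stated in the text: that the single pattern excluded from each subset is the one containing no occurrence of the subset's base. A "." marks a position occupied by any of the other three bases.

```python
from itertools import product


def full_set_size(k: int) -> int:
    """Number of probes when every sequence of length k is synthesized."""
    return 4 ** k


def degenerate_set_size(k: int) -> int:
    """Number of probes in the four two-element subsets: 4(2^k - 1)."""
    return 4 * (2 ** k - 1)


def degenerate_patterns(k: int, bases: str = "ACGT") -> list:
    """Enumerate (base, pattern) pairs; '.' stands for 'any base except this one'."""
    patterns = []
    for base in bases:
        for mask in product((True, False), repeat=k):
            if not any(mask):
                continue  # assumption: the all-"non-base" pattern is the excluded one
            patterns.append((base, "".join(base if hit else "." for hit in mask)))
    return patterns


print(degenerate_set_size(8), full_set_size(8))
print(degenerate_patterns(2))
```

For k = 8 the two functions return the 1020 and 65,536 figures quoted above, a reduction of roughly 64-fold.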

Hybridization between complementary bases of DNA, RNA, PNA, or combinations of DNA, RNA and PNA, occurs under a wide variety of conditions such as variations in temperature, salt concentration, electrostatic strength, and buffer composition. Examples of these conditions and methods for applying them are described in Nucleic Acid Hybridization: A Practical Approach (B. D. Hames and S. J. Higgins, editors, IRL Press, 1985), which is herein specifically incorporated by reference. It is preferred that hybridization takes place between about 0° C. and about 70° C., for periods of from about 5 minutes to hours, depending on the nature of the sequence to be hybridized and its length. It is also preferred that hybridization between nucleic acids be facilitated using certain reagents and chemicals. Preferred examples of these reagents include single stranded binding proteins such as Rec A protein, T4 gene 32 protein, E. coli single stranded binding protein, and major or minor nucleic acid groove binding proteins. Preferred examples of other reagents and chemicals include divalent ions, polyvalent ions, and intercalating substances such as ethidium bromide, actinomycin D, psoralen, and angelicin.

The nucleotide sequence of the random portion of each probe is determinable by methods which are well-known in the art. Two methods for determining the sequence of the nucleic acid probe are by chemical cleavage, as disclosed by Maxam and Gilbert (1977), and by chain extension using ddNTPs, as disclosed by Sanger et al. (1977), both of which are herein specifically incorporated by reference. Alternatively, another method for determining the nucleotide sequence of a probe is to individually synthesize each member of a probe set. The entire set would comprise every possible sequence within the random portion or some smaller portion of the set. The method of the present invention could then be conducted with each member of the set. Another procedure would be to synthesize one or more sets of nucleic acid probes simultaneously on a solid support. Preferred examples of a solid support include a plastic, a ceramic, a metal, a resin, a gel, and a membrane. A more preferred embodiment comprises a two-dimensional or three-dimensional matrix, such as a gel, with multiple probe binding sites, such as a hybridization chip as described by Pevzner et al. (J. Biomol. Struc. & Dyn. 9:399-410, 1991), and by Maskos and Southern (Nuc. Acids Res. 20:1679-84, 1992), both of which are herein specifically incorporated by reference.

Hybridization chips can be used to construct very large probe arrays which are subsequently hybridized with a target nucleic acid. Analysis of the hybridization pattern of the chip provides an immediate fingerprint identification of the target nucleotide sequence. Patterns can be manually or computer analyzed, but it is clear that positional sequencing by hybridization lends itself to computer analysis and automation. Algorithms and software have been developed for sequence reconstruction which are applicable to the methods described herein (R. Drmanac et al., J. Biomol. Struc. & Dyn. (in press); P. A. Pevzner, J. Biomol. Struc. & Dyn. 7:63-73, 1989, both of which are herein specifically incorporated by reference).

Another embodiment of the invention comprises target nucleic acid labeled with a detectable label. Label may be incorporated at a 5' terminal site, a 3' terminal site, or at an internal site within the length of the nucleic acid. Preferred detectable labels include a radioisotope, a stable isotope, an enzyme, a fluorescent chemical, a luminescent chemical, a chromatic chemical, a metal, an electric charge, or a spatial structure. There are many procedures whereby one of ordinary skill can incorporate detectable label into a nucleic acid. For example, enzymes used in molecular biology will incorporate radioisotope-labeled substrate into nucleic acid. These include polymerases, kinases, and transferases. The labeling isotope is preferably ³²P, ³⁵S, ¹⁴C, or ¹²⁵I.

Label may be directly or indirectly detected using scintillation fluid or a PhosphorImager, chromatic or fluorescent labeling, or mass spectrometry. Other, more advanced methods of detection include evanescent wave detection of surface plasmon resonance of thin metal film labels such as gold, by, for example, the BIAcore sensor sold by Pharmacia, or other suitable biosensors.

Another embodiment of the present invention comprises a method for determining a nucleotide sequence of a nucleic acid comprising the steps of labeling the nucleic acid with a first detectable label at a terminal site, labeling the nucleic acid with a second detectable label at an internal site, identifying the nucleotide sequences of portions of the nucleic acid, determining the relationship of the nucleotide sequence portions to the nucleic acid by comparing the first detectable label and the second detectable label, and determining the nucleotide sequence of the nucleic acid. Fragments of target nucleic acids labeled both terminally and internally can be distinguished based on the relative amounts of each label within respective fragments. Fragments of a target nucleic acid terminally labeled with a first detectable label will have the same amount of label as fragments which include the labeled terminus. However, these fragments will have variable amounts of the internal label directly proportional to their size and distance from the terminus. By comparing the relative amount of the first label to the relative amount of the second label in each fragment, one of ordinary skill is able to determine the position of the fragment or the position of the nucleotide sequence of that fragment within the whole nucleic acid.
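
The comparison of the two labels can be sketched numerically. The following Python example assumes, purely for illustration, that every fragment carries the labeled terminus, that the internal label is incorporated uniformly along the strand, and that the signal per labeled terminus and per internally labeled base are known from calibration. The function name, parameter names, and numeric values are hypothetical.

```python
def fragment_length_from_labels(terminal_signal, internal_signal,
                                signal_per_terminus, signal_per_base):
    """Estimate fragment length (in bases) from the ratio of two label signals.

    Assumes one terminal label per fragment and uniform internal labeling,
    so the internal signal per fragment is proportional to fragment length.
    """
    fragments = terminal_signal / signal_per_terminus   # number of fragments present
    labeled_bases = internal_signal / signal_per_base    # total bases across fragments
    return labeled_bases / fragments


# Hypothetical calibration: 2.0 signal units per terminus, 0.5 units per base.
length = fragment_length_from_labels(200.0, 6000.0, 2.0, 0.5)
print(length)  # 120.0
```

Because each fragment in this model begins at the labeled terminus, the estimated length also gives the distance of the fragment's far end from that terminus, which is the positional information the comparison is meant to provide.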

A further embodiment of the present invention is a method for determining a nucleotide sequence by hybridization comprising the steps of (a) creating a set of nucleic acid probes wherein each probe has a double stranded portion, a single stranded portion, and a random sequence within the single stranded portion which is determinable, (b) hybridizing a nucleic acid target which is at least partly single stranded to the set, (c) ligating the hybridized target to the probe, and (d) determining the nucleotide sequence of the target which is hybridized to the single stranded portion of any probe. This embodiment adds a step wherein the hybridized target is ligated to the probe. Ligation of the target nucleic acid to the complementary probe increases fidelity of hybridization and allows for incorrectly hybridized target to be easily washed from correctly hybridized target (see FIG. 11). Ligation can be accomplished using a eukaryotic derived or a prokaryotic derived ligase. Preferred is T4 DNA or RNA ligase. Methods for use of these and other nucleic acid modifying enzymes are described in Current Protocols in Molecular Biology (F. M. Ausubel et al., editors, John Wiley & Sons, 1989), which is herein specifically incorporated by reference.

Another embodiment of the present invention is a method for determining a nucleotide sequence by hybridization which comprises the steps of (a) creating a set of nucleic acid probes wherein each probe has a double stranded portion, a single stranded portion, and a random sequence within the single stranded portion which is determinable, (b) hybridizing a target nucleic acid which is at least partly single stranded to the set of nucleic acid probes, (c) enzymatically extending a strand of the probe using the hybridized target as a template, and (d) determining the nucleotide sequence of the single stranded portion of the target nucleic acid. This embodiment of the invention is similar to the previous embodiment, as broadly described herein, and includes all of the aspects and advantages described therein. An alternative embodiment also includes a step wherein hybridized target is ligated to the probe. Ligation increases the fidelity of the hybridization and allows for a more stringent wash step wherein incorrectly hybridized, unligated target can be removed.

Hybridization produces either a 5' overhang or a 3' overhang of target nucleic acid. Where there is a 5' overhang, a 3' hydroxyl is available on one strand of the probe from which nucleotide addition can be initiated. Preferred enzymes for this process include eukaryotic or prokaryotic polymerases such as T3 or T7 polymerase, Klenow fragment, or Taq polymerase. Each of these enzymes is readily available to those of ordinary skill in the art, as are procedures for their use (see Current Protocols in Molecular Biology).

Hybridized probes may also be enzymatically extended a predetermined length. For example, reaction conditions can be established wherein a single dNTP or ddNTP is utilized as substrate. Only hybridized probes wherein the first nucleotide to be incorporated is complementary to the target sequence will be extended, thus providing additional hybridization fidelity and additional information regarding the nucleotide sequence of the target. Sanger or Maxam and Gilbert sequencing can also be performed to provide further target sequence data.
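
The logic of extension with a single nucleotide substrate can be sketched as follows. This Python example assumes standard Watson-Crick pairing for DNA and models only whether extension occurs, not enzyme kinetics or misincorporation; the function names `extends` and `identify_next_base` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extends(template_base, supplied_nucleotide):
    """True if the supplied nucleotide pairs with the first unpaired target base."""
    return COMPLEMENT[template_base] == supplied_nucleotide


def identify_next_base(extension_results):
    """Infer the target base from which single-nucleotide reaction gave extension.

    extension_results maps each supplied nucleotide to True (probe extended)
    or False. Returns None unless exactly one reaction gave extension.
    """
    extended = [n for n, ok in extension_results.items() if ok]
    if len(extended) != 1:
        return None
    return COMPLEMENT[extended[0]]


# Four separate reactions against a target whose next unpaired base is G.
results = {n: extends("G", n) for n in "ACGT"}
print(results)                      # {'A': False, 'C': True, 'G': False, 'T': False}
print(identify_next_base(results))  # G
```

In this idealized model, a probe that is extended reports one additional base of target sequence beyond the hybridized region, which corresponds to the additional information described above.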

Alternatively, hybridization of target to probe can produce 3' extensions of target nucleic acids. Hybridized probes can be extended using nucleoside biphosphate substrates or short sequences which are ligated to the 5' terminus.

Another embodiment of the invention is a method for determining a nucleotide sequence of a target by hybridization comprising the steps of (a) creating a set of nucleic acid probes wherein each probe has a double stranded portion, a single stranded portion, and a random nucleotide sequence within the single stranded portion which is determinable, (b) cleaving a plurality of nucleic acid targets to form fragments of various lengths which are at least partly single stranded, (c) hybridizing the single stranded region of the fragments with the single stranded region of the probes, (d) identifying the nucleotide sequences of the hybridized portions of the fragments, and (e) comparing the identified nucleotide seque