WikiPatents - Community Patent Review
Method of sequencing by hybridization of oligonucleotide probes    
United States Patent 5695940
Link to this page: http://www.wikipatents.com/5695940.html
Inventor(s): Drmanac; Radoje T. (Beograd, YU); Crkvenjakov; Radomir B. (Beograd, YU)
Abstract: The conditions under which oligonucleotide probes hybridize preferentially with entirely complementary and homologous nucleic acid targets are described. Using these hybridization conditions, overlapping oligonucleotide probes associate with a target nucleic acid. Following washes, positive hybridization signals are used to assemble the sequence of a given nucleic acid fragment. Representative target nucleic acids are applied as dots. Up to 100,000 probes of the type (A,T,C,G)(A,T,C,G)N8(A,T,C,G) are used to determine sequence information by simultaneous hybridization with nucleic acid molecules bound to a filter. Additional hybridization conditions are provided that allow stringent hybridization of 6-10 nucleotide long oligomers, which extends the utility of the invention. A computer process determines the information sequence of the target nucleic acid, which can include targets with the complexity of mammalian genomes. Sequence generation can be obtained for a large complex mammalian genome in a single process.
   














Inventor     Drmanac; Radoje T. (Beograd, YU); Crkvenjakov; Radomir B. (Beograd, YU)
Owner/Assignee     Hyseq, Inc. (Sunnyvale, CA)
Publication Date     December 9, 1997
Application Number     08/460,853
Filing Date     June 5, 1995
US Classification     435/6 536/23.1 536/24.33
Int'l Classification     C12Q 001/68
Examiner     Ketter; James
Assistant Examiner    
Attorney/Law Firm     Marshall, O'Toole, Gerstein, Murray & Borun
Address
Parent Case     This is a Continuation of U.S. application Ser. No. 08/203,502, filed Feb. 28, 1994, now U.S. Pat. No. 5,525,464; which in turn is a File-Wrapper Continuation of U.S. application Ser. No. 08/048,152, filed Apr. 15, 1993, now abandoned, which is a continuation of Ser. No. 07/576,559, filed Aug. 31, 1990, now abandoned, which is a continuation-in-part of application Ser. No. 07/175,088 filed Mar. 30, 1988, now abandoned, which is incorporated by reference herein in its entirety. Applicants claim priority under 35 U.S.C. §119 of Yugoslavian Application No. P-570/87 filed Apr. 1, 1987 and Yugoslavian Application No. 18617-P 570/87 filed Sep. 18, 1987, certified copies of which were submitted in the parent application Ser. No. 07/175,088.
Priority Data     Apr 01, 1987 [YU] 570/87
USPTO Field of Search     435/6 435/91.1 435/91.2 536/24.33
Patent Tags     sequencing hybridization oligonucleotide probes
   

 References

 U.S. References

Patent No.   Inventor        U.S. Class   Date
5202231      Drmanac         435/6        Apr 1993
5002867      Macevicz        435/6        Mar 1991
4942124      Church          435/6        Jul 1990
4865967      Shiraishi       435/6        Sep 1989
4865968      Orgel           204/462      Sep 1989
4849334      Lorincz         435/5        Jul 1989
4794073      Dattagupta      435/6        Dec 1988
4770992      Van den Engh    435/6        Sep 1988
4720786      Hara            204/461      Jan 1988
4675283      Roninson        435/6        Jun 1987
4613566      Potter          435/6        Sep 1986
4591567      Britten         435/285.1    May 1986
4562159      Shafritz        435/5        Dec 1985
4766062      Diamond         435/6        (date not recorded)
4683202      Mullis          435/91.2     (date not recorded)
4683195      Mullis          435/6        (date not recorded)
5149625      Church          435/6        (date not recorded)
4672040      Josephson       436/526      (date not recorded)
5492806      Drmanac         435/5        (date not recorded)
5525464      Drmanac         435/6        (date not recorded)
 Claims
 


We claim:

1. A method of sequencing a target nucleic acid of unknown sequence comprising the steps of:

(a) using conditions which differentiate an exactly complementary oligonucleotide probe and an oligonucleotide probe having a single mismatched nucleotide;

(b) contacting a plurality of oligonucleotides, each from six to ten nucleotides in length, with said target nucleic acid;

(c) forming a duplex between the target nucleic acid and the plurality of oligonucleotides;

(d) washing the duplex;

(e) detecting oligonucleotides positively hybridizing as part of said duplex; and

(f) compiling a sequence of the target nucleic acid from overlapping positively-hybridizing oligonucleotides.

2. A method for partial sequencing of a target nucleic acid, comprising the steps of:

(a) using conditions which differentiate an exactly complementary oligonucleotide probe and an oligonucleotide probe having a single mismatched nucleotide;

(b) contacting a target nucleic acid in a reaction mixture with a plurality, but less than a totality, of oligonucleotide probes of given length, each at least six nucleotides in length;

(c) forming a duplex between the target nucleic acid and the plurality of oligonucleotides;

(d) washing the duplex;

(e) detecting oligonucleotides positively hybridizing as part of said duplex; and

(f) compiling a partial sequence of said target nucleic acid from a subset of said oligonucleotide probes which form a duplex with said target nucleic acid and which overlap with at least one other member of said oligonucleotide probes.

3. The method according to claim 2, wherein said oligonucleotide probes are each from six to ten nucleotides in length.
 Description
 


1. INTRODUCTION

The present invention belongs to the field of molecular biology. It involves a novel method of sequencing of a target nucleic acid sequence by hybridization of short oligonucleotide probes to a nucleic acid target. The oligonucleotide probes can comprise all known combinations of the four nucleotides of a given length, i.e. oligonucleotides of base composition adenine (A), thymine (T), guanine (G), and cytosine (C) for DNA and A,G,C, and uridine (U) for RNA. Conditions are described which allow hybridization discrimination between oligonucleotides which are as short as six nucleotides long and have a single base end-mismatch with the target sequence.
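
As an illustration of what "all known combinations of the four nucleotides of a given length" means in practice, the following minimal sketch enumerates a complete probe set and shows how quickly it grows with probe length. The function name `all_probes` and the constant `DNA_BASES` are hypothetical names chosen for this example; they do not come from the patent.

```python
from itertools import product

# Alphabet for DNA probes; for RNA, "AGCU" would be passed instead.
DNA_BASES = "ATGC"


def all_probes(length, alphabet=DNA_BASES):
    """Return every oligonucleotide of the given length over the alphabet."""
    return ["".join(bases) for bases in product(alphabet, repeat=length)]


if __name__ == "__main__":
    # The size of a complete probe set is 4**length, so it grows
    # exponentially with probe length.
    for length in range(6, 11):
        print(f"{length}-mers: {4 ** length} possible probes")

    hexamers = all_probes(6)
    print(len(hexamers), hexamers[:4])
```

A complete set of 6-mers holds 4,096 probes and a complete set of 8-mers holds 65,536, which is why the choice of probe length drives the scale of the experiment.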

The invention is demonstrated by way of examples in which sequence information is generated using the method of the invention.

2. BACKGROUND OF THE INVENTION

2.1. HYBRIDIZATION

Hybridization depends on the pairing of complementary bases in nucleic acids and is a specific tool useful for the general recognition of informational polymers. Diverse research problems using hybridization of synthetic oligonucleotide probes of known sequence include, amongst others, the different techniques of identification of specific clones from cDNA and genomic libraries; detecting single base pair polymorphisms in DNA; generation of mutations by oligonucleotide mutagenesis; and the amplification of nucleic acids in vitro from a single sperm, an extinct organism, or a single virus infecting a single cell.

It is possible to discriminate perfect hybrids from those hybrids containing a single internal mismatch using oligonucleotides 11 to 20 nucleotides in length [Wallace et al., Nucl. Acids Res. 6: 3543 (1979)]. Mismatched hybrids are distinguished on the basis of the difference in the amount of hybrid formed in the hybridization step and/or the amount remaining after the washing steps [Ikuta et al., Nucl. Acids Res. 15: 797 (1987); Thein and Wallace, in Human Genetic Diseases: A Practical Approach, ed. by J. Davies, IRL Press Ltd., Oxford, pp. 33-50 (1986)].

The reproducible hybridization of different and diverse short oligonucleotides less than 11 nucleotides long has not been well characterized previously. Detailed hybridization data that allow a constant set of conditions for all predictable oligonucleotides are not available [Besmer et al., J. Mol. Biol. 72: 503 (1972); Smith, in Methods of DNA and RNA Sequencing, ed. S. Weissman, Praeger Publishers, New York, N.Y., pp. 23-68 (1983); Estivill et al., Nucl. Acids Res. 15: 1415 (1987)].

Information is also not available on the effects of a single noncomplementary base pair located at the 5' or 3' end of a hybridizing oligonucleotide that produces a mismatched hybrid when associated with a target nucleic acid. Hybridization conditions that discriminate between (1) a perfectly complementary hybridizing pair of nucleic acid sequences where one partner of the pair is a short oligonucleotide, and (2) a pair wherein a mismatch of one nucleotide occurs on the 5' or 3' end of the oligonucleotide, provide a more stringent environment than is required for internal mismatches because hybrid stability is affected less by a mismatch at the end of a hybridizing pair of complementary nucleic acids than for an internal mismatch.

The length of nucleotides that can distinguish a unique sequence in a nucleic acid of defined size has been predicted [Smith, in Methods of DNA and RNA Sequencing, ed. S. Weissman, Praeger Publishers, New York, N.Y., pp. 23-68 (1983)]. Thus random oligonucleotide sequences 16-17 nucleotides long are expected to occur only once in random DNA of 3 × 10^9 bp, the size of the human genome. However, with decreasing probe length, e.g. for oligonucleotides 5 to 10 nucleotides in length, there is an exponential increase in the frequency of occurrence within a random DNA of a given size and complexity. Thus, the purposes for which oligonucleotide probes are employed can impact on the length of the oligonucleotides that are used experimentally.
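
The dependence of occurrence frequency on probe length can be made concrete with a short calculation. The sketch below assumes a simplified model that is not spelled out in the text: a single strand of random DNA with equal base frequencies, so that one specific probe of length k matches any given position with probability 1/4^k. The function name `expected_occurrences` is hypothetical.

```python
def expected_occurrences(probe_length, target_length):
    """Expected number of sites matching one specific probe on one strand
    of random DNA with equal base frequencies (a simplifying assumption)."""
    positions = max(target_length - probe_length + 1, 0)
    return positions / 4 ** probe_length


if __name__ == "__main__":
    human_genome_bp = 3_000_000_000  # 3 x 10^9 bp, as stated in the text
    for k in (6, 8, 10, 16, 17):
        count = expected_occurrences(k, human_genome_bp)
        print(f"{k}-mer: about {count:.3g} expected occurrences")
```

Under this model a 16-mer or 17-mer is expected less than once in 3 × 10^9 bp, whereas an 8-mer is expected tens of thousands of times, which matches the exponential increase described above.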

2.2. CONDITIONS FOR HYBRIDIZATION STRINGENCY

Wallace et al. [Nucl. Acids Res. 6: 3543 (1979)] describe conditions that differentiate the hybridization of 11 to 17 base long oligonucleotide probes that match perfectly and are completely homologous to the target nucleic acid as compared to similar oligonucleotide probes that contain a single internal base pair mismatch. Wood et al. [Proc. Natl. Acad. Sci. 82: 1585 (1985)] describe conditions for hybridization of 11 to 20 base long oligonucleotides using 3M tetramethyl ammonium chloride wherein the melting point of the hybrid depends only on the length of the oligonucleotide probe, regardless of its GC content. However, as disclosed in these references, 11-mer oligonucleotides are the shortest ones that generally can be hybridized successfully, reliably and reproducibly using known hybridization conditions.

2.3. SEQUENCING

Nucleic acid sequencing methods, where the position of each base in a nucleic acid molecule in relation to its neighbors is determined to define its primary structure, were developed in the early 1960's for RNA molecules and in the late 1970's for DNA. The two major methods for DNA sequencing, i.e. chemical degradation and dideoxy-chain termination, involve identification and characterization of 1-500 nucleotide long DNA fragments, specific for each one of at least four nucleotide bases, on polyacrylamide gels. The polyacrylamide gels must be able to distinguish single base pair differences in length between fragments. The fragments are generated either by chemical degradation [Maxam-Gilbert, Proc. Natl. Acad. Sci. 74: 560 (1977)] or by dideoxy-chain termination of DNA fragments synthesized by DNA polymerase [Sanger et al., Proc. Natl. Acad. Sci. 74: 5463 (1977)]. A sufficient quantity of isolated fragments is ensured by recombinant DNA technology methods which include cloning, restriction enzyme digestion, gel electrophoresis, and polymerase chain reaction amongst others. These methods allow the identification and amplification of the target DNA to provide material for sequencing.

An intensive amount of manual labor is required in the preparation of appropriate polyacrylamide gels to resolve small differences in fragment size. The speed of sequencing in experienced laboratories throughout the world is approximately 100 bp per person daily. Although the use of electronic robots and computers allows acceleration of the number of base pairs actually determined, preparation of polyacrylamide gels, application of sample, electrophoresis and the subsequent manipulations necessary to obtain high quality autoradiograms that can be read by machines still involve significant intensive, skilled, manual labor for which no substitutes have been found.

2.4. HUMAN GENOME CHARACTERIZATION

The genome of higher eucaryotes has up to a million times greater physical complexity than the individual genes it encodes, giving it a correspondingly huge informational complexity. From the present knowledge of genome organization and biochemical, biophysical and biological functions, the following approximate scale of the informational complexity for higher eucaryotes can be proposed: 10,000 gene families, 100,000 genes, 1,000,000 biological functions. The number of basic biochemical functions represented by a single gene family is probably not significantly increased compared to procaryotic and lower eucaryotic genomes.

Recently, there has been a surge of interest in mapping and sequencing the entire human genome [Lewin, Science 232: 1598 (1986); Wada, Nature 325: 771 (1987); Smith and Hood, Bio/Technology 5: 933-939 (1987)]. This stems from the fact that only 1 in about 75 human genes is either cloned or mapped (Human Gene Mapping 9, 1987). Unknown genes will have much to tell us about human biology. In the future, the progress of studies on molecular evolution may depend on the sequencing of genomes of species besides humans.

Because sequence information has already provided accelerated knowledge and potential resolution of diverse biological, medical and therapeutic research problems, it is not surprising that ideas of sequencing the whole human genome were discussed at various scientific meetings during the early and mid-1980's [Research News in Science 232: 1598 (1986)]. Such massive sequencing projects envision the final determination of approximately 3 billion base pairs of information encoded in the DNA of humans and are expected to take at least 10 years at a cost of at least $3 billion using current technology. However, in practice, actual sequencing of at least three times that number of base pairs is required to obtain a reliable sequence for the human genome, thus requiring even more money and time.

Such endeavors present a challenge to the technology of the twentieth century. Further challenges arise if sequencing projects are extended to include the determination of the genomic sequences of characteristic individuals or species of organisms, especially those that have economic, social or medical importance. Such sequencing projects would advance not only our understanding of the evolution of organisms and the evolution of biochemical processes, but would also further the detection, treatment and understanding of disease, and would aid agriculture, the food industry and biotechnology in general. However beneficial the results of such projects would be, their successful completion requires the development of a new, rapid, reproducible and reliable sequencing method such as those described in this invention.

Although the ultimate goal of human genome characterization is the determination of sequence information, progress in characterizing portions of the human genome or the genomes of other organisms has been achieved in several areas. A linkage map of the human genome based on cloned DNA probes detecting RFLPs has been obtained [Donis-Keller et al., Cell 51: 319-337 (1987)]. Once mapped, a gene can be approached from a neighboring DNA marker not only by walking [Cross et al., Trends Genet. 2: 174 (1986)] but also by the use of jumping [Collins and Weissman, Proc. Natl. Acad. Sci. USA 81: 6812 (1984); Poustka et al., Nature 325: 353 (1987)] and linking [Poustka et al., Trends Genet. 2: 174 (1986)] libraries. The task of going from a marker to a mapped gene is facilitated immensely if an ordered collection of overlapping cosmid or phage clones representing individual chromosomes is available. Attempts have been made to provide a library of overlapping clones using similarities in their patterns of restriction digests [Coulson et al., Proc. Natl. Acad. Sci. USA 83: 7821 (1986); Olson et al., Proc. Natl. Acad. Sci. USA 83: 7826 (1986); Kohara et al., Cell 50: 495 (1987)]. Alternatively, the hybridization of a collection of 100 specific oligonucleotides to an array of 3-10 × 10^6 cosmid-containing colonies on filters has been proposed. The resulting patterns of hybridization identify specific regions along the genome to which a small collection of cosmids from chromosome libraries can be fitted in the second step [Poustka et al., Cold Spring Harbor Symp. Quant. Biol. 51: 351 (1986); Craig et al., in Human Genetics, Proceedings of the 7th International Congress, Berlin, (1986); Michiels et al., CABIOS 3: 203 (1987)]. Such identification, however, does not provide desired and useful sequence information of the DNA in a particular identified fragment.

In the area of human genetics, the emphasis is on an individual's DNA and the methods to detect patterns of its variation and inheritance which may influence the determination of a patient's chances for health or disease. The number of genetic regions to be scored in the DNA of an individual requires a large number of polymorphic probes and makes the use of traditional Southern blotting impractical. However, a method that is capable of amplifying 1000-bp stretches of DNA starting from two flanking oligonucleotide primers and that requires DNA from only 150 cells of an individual has been described recently, as well as oligonucleotide probes that can detect mutants in amplified DNA in dot blot hybridization [Saiki et al., Science 239: 487 (1986)]. Both the method of ordering cosmid libraries and the method of amplifying DNA use the work of Wallace for conditions of hybridization that only allowed oligonucleotides of almost perfect homology to their target DNA to hybridize at all [Wallace et al., Nucl. Acids Res. 6: 3543 (1979)]. In these conditions, almost perfect homology means that the perfect homology has to exist at least in the central part of the hybridizing oligonucleotide/target duplex.

3. SUMMARY OF THE INVENTION

The present invention provides a new method of sequencing that is ideally suited to the sequencing of large complex genomes because it avoids the intensive manual labor involved in resolving gel fragments by size on polyacrylamide gels. The present invention provides methods for sequencing a target nucleic acid by hybridization of overlapping short oligonucleotide probes of known or predicted sequence to the nucleic acid target serially or simultaneously. The oligomer probes of a given size can contain all or most existing combinations of nucleotides for complete sequencing and a part of all possible variants for partial sequencing. Probes can also be composed of oligomers of different sizes as well as comprising all known combinations of nucleotides that are possible for that size oligonucleotide. As the size of the probes that are used decreases, hybridization conditions that are still able to distinguish between mismatched and perfectly matched short oligonucleotides must be used.

In one embodiment of the invention, multiple oligonucleotides that are 11 nucleotides long or longer are hybridized to the target sequence. Hybridization occurs using conditions which are controlled and varied to ensure discrimination between perfectly matched oligonucleotides and oligonucleotides having a one base pair mismatch with the target sequence where the mismatch is located at either one of two ends of the oligonucleotide.

In another embodiment of the invention, as an alternative to the numerous previous conditions, each specific for different sizes and sequences of probes, a single set, or a few sets, of conditions is provided for all lengths and sequences of probes. These hybridization conditions allow discrimination between perfectly matched and mismatched oligonucleotides that are as short as six nucleotides long. The conditions allow discrimination between a perfectly matching oligonucleotide and one that has a single base mismatch as compared to the target sequence, the mismatch being located at one of the ends of the oligonucleotide.

Following the detection of hybridization of perfectly matched oligonucleotides of known sequence, the sequence of the target nucleic acid is generated by an algorithm using the principle of maximal nonidentical overlap of probes.

In determining sequence by hybridization, oligonucleotides are prepared, target fragments are prepared appropriate for the length of oligonucleotide used for hybridization, and hybridization of the target with all the oligonucleotides occurs under defined conditions that allow discrimination in binding of perfectly matched complementary oligonucleotides and mismatched oligonucleotides. The relationship of probe size and target length is defined and allows complete sequencing of genomes. The novel theoretical basis of the relationship between oligonucleotide probe size and target length is described infra.

To determine the amount of hybridization data that is needed for sequence determination, the number of target fragments that compose the entire sequence is multiplied by the number of different oligonucleotides required to define the sequence of the target fragment. The shorter the size of the oligonucleotides that are hybridized, the more target fragments that must be analyzed. Similarly, as the oligonucleotide size increases, fewer target fragments must be examined.
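
The multiplication described above can be sketched as follows. This is only an illustration under stated assumptions: the target is cut into non-overlapping fragments of equal length, every fragment is tested against a complete set of 4^k probes, and the fragment lengths in the usage example are made-up values, not figures from the patent. The function name `hybridization_data_points` is hypothetical.

```python
import math


def hybridization_data_points(total_length, fragment_length, probe_length):
    """Number of probe/fragment hybridization results needed, assuming
    non-overlapping fragments and a complete set of 4**probe_length probes."""
    fragments = math.ceil(total_length / fragment_length)
    probes = 4 ** probe_length
    return fragments * probes


if __name__ == "__main__":
    genome_bp = 3_000_000_000
    # Illustrative pairs of (probe length, fragment length in bp); shorter
    # probes are paired with shorter fragments, so more fragments are needed.
    for probe_length, fragment_length in ((8, 200), (11, 5_000)):
        total = hybridization_data_points(genome_bp, fragment_length, probe_length)
        print(f"{probe_length}-mers on {fragment_length} bp fragments: {total:.3g} data points")
```

The example only shows how the two factors trade off; the actual relationship between probe size and usable fragment length is the one developed in the patent's own theoretical section.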

Hybridization reactions can be performed in separate reaction vessels or by binding one of the two components (oligomers and DNA fragments) to a solid surface, like nylon filters etc. Since the described method does not require macromolecular separation like gel-based sequencing methods, the surface, bound with either an oligomer or nucleic acid fragment can have microdimensions.

Some of the advantages of the method of the present invention include the following: (1) rapidity, resulting in time effectiveness; (2) elimination of polyacrylamide gel electrophoresis and the intensive manual labor it requires; (3) reliability of the predicted base within the determined sequence due to the hybridization of multiple oligonucleotides to the same base within a target sequence; (4) the possibility of substantial miniaturization of the process; (5) ease of automation; (6) resulting cost effectiveness.

3.1. DEFINITIONS

The following terms and abbreviations will have the meanings indicated:

______________________________________
A       adenine
bp      base pair
C       cytosine
G       guanine
IF      an M13 clone containing a 921 bp EcoRI-BglII human β1 interferon fragment
kD      kilodalton
ng      nanogram
nM      nanomolar
pmol    picomole
sc      subclone
SF      subfragment
SOH     short oligonucleotide hybridization
T       thymine
CCD     Charge Coupled Device
DNA     Deoxyribonucleic acid
DP      Discrete particle
HA      Hybridization area
LAR     Ligation-amplification reaction
ON      Oligonucleotide
ONP     Oligonucleotide probe
ONS     Oligonucleotide sequence
PCR     Polymerase chain reaction
RE      Restriction Enzyme
RFLP    Restriction fragment length polymorphism
RNA     Ribonucleic acid
SBH     Sequencing by hybridization
______________________________________

4. DESCRIPTION OF THE FIGURES

FIGS. 1A-1D show the generation of subfragments in sequencing by hybridization and their ordering.

FIG. 1A depicts the sequence of a hypothetical clone (SEQ ID NO: 6). NNNNNNN denotes the ends of the vector sequence. AGTCCCT and TTGGCTG are the only oligonucleotides 7 bp or longer repeated within the depicted sequence.

FIG. 1B depicts the formation of subfragments. Assuming that the content of 8-mers for the depicted sequence is known, these 8-mers are ordered by maximal overlap, in this case 7 bp. Starting from the 5' 8-mer (NNNNNNNc), ordering is unambiguous up to gAGTCCCT, which on its 3' end contains a repeated 7-mer. The large capitals denote overlapping sequences shared by different oligonucleotides, while the small letters denote unshared bases. Both AGTCCCTc and AGTCCCTg can be overlapped with gAGTCCCT, preventing further ordering. Each of the two sequences serves as a starting point for new ordering (not shown). Therefore, each repeated sequence 7 bp or longer represents a branching point. Unambiguous sequences are obtained between two consecutive branching points only.

FIG. 1C depicts the listing of subfragments formed from 8-mers of the depicted sequence (SEQ ID NOS: 7-11). Subfragments are horizontally displaced to indicate overlap; the orientation is 5' to 3' and end subfragments are identifiable.

FIG. 1D depicts that the subfragments cannot be unambiguously ordered into a starting sequence (SEQ ID NOS: 12 and 13) without additional information. Both arrangements shown are possible since subfragments AGTCCCTcggTTGGCTG (SEQ ID NO: 1) and AGTCCCTgatTTGGCTG (SEQ ID NO: 2) have the same 7-mers at their 5' and 3' ends, respectively.

FIG. 1E depicts the generation of the sequence (SEQ ID NO: 15) from oligonucleotide blocks. The left box represents all 8-mer oligosequences which occur in a 15 base long DNA molecule of unknown sequence (NNN . . . NNN) (SEQ ID NO: 14). The 8-mers can be ordered by 7 base overlap (right box). Each of the 7 remaining oligomers extends the sequence of the starting 8-mer ACCGTAAA by one base. Thus, the sequence is generated by uniquely overlapped oligomer blocks.
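
The ordering by maximal overlap described for FIG. 1 can be sketched in a few lines. This is a minimal illustration, not the patent's computer process: it assumes error-free hybridization data, uses only the presence or absence of each 8-mer, and works on two made-up sequences, since the actual sequences of the figures are not reproduced in this text. The names `kmer_content` and `extend_from` are hypothetical.

```python
from collections import defaultdict


def kmer_content(sequence, k):
    """Set of all k-mers present in a sequence (what an ideal, error-free
    hybridization experiment with a complete probe set would report)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def extend_from(start, kmers):
    """Extend `start` one base at a time by (k-1)-base overlap.

    Returns the assembled subfragment and the list of competing next k-mers:
    an empty list means no further overlap exists, and two or more entries
    mean a branching point caused by a repeated (k-1)-mer."""
    k = len(start)
    by_prefix = defaultdict(list)
    for kmer in kmers:
        by_prefix[kmer[:k - 1]].append(kmer)

    sequence, current, used = start, start, {start}
    while True:
        candidates = sorted(c for c in by_prefix[current[1:]] if c not in used)
        if len(candidates) != 1:
            return sequence, candidates
        current = candidates[0]
        used.add(current)
        sequence += current[-1]  # each unique overlap adds exactly one base


if __name__ == "__main__":
    # Hypothetical 15-base target starting with ACCGTAAA, no repeated 7-mer.
    simple = "ACCGTAAAGCTTGAC"
    print(extend_from("ACCGTAAA", kmer_content(simple, 8)))

    # Hypothetical target in which the 7-mer AGTCCCT occurs twice.
    repeated = "TTGAGTCCCTCGGAACAGTCCCTGAT"
    print(extend_from("TTGAGTCC", kmer_content(repeated, 8)))
```

In the first case the seven remaining 8-mers each add one base and the full 15-base sequence is recovered. In the second, extension stops after gAGTCCCT because both AGTCCCTc and AGTCCCTg overlap it, which is the branching behavior the figure describes.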

FIG. 2 presents the average number of SFs (N_sf) as a function of the length of DNA fragment (L_f) for various values of the length of the overlapping sequence (N-1, in bp), or average distance of two consecutive identical N-1 sequences in DNA subjected to sequencing by hybridization (A_o), in kb. The curves are obtained using equation one as described below in section 5.2.

FIG. 3 describes the kinetic stability of a fully matched hybrid obtained with a probe 8 nucleotides in length. Stability is expressed as the fraction of the hybrid dissociated in unit time (minutes) as a function of temperature. 1.4 pmol of NCATGAGCANN (SEQ ID NO: 3) was applied to each dot and hybridized with TGCTCATG as probe at a concentration of 4 nM. Equal amounts of hybrid were incubated at the indicated temperatures for a short time in a large volume of buffer and the remaining hybrid was measured. Each point represents the average value for four dots. The curve is computer fitted with E_60 = 47.3 kcal/mol obtained from the experimental points by the least squares method.

FIG. 4 indicates the properties of short oligonucleotide hybridization. In FIG. 4a, non-optimized discrimination with probes 6, 7, and 8 nucleotides in length is illustrated. The probe GCTCAT was hybridized to the target sequence NCATGAGCANN (SEQ ID NO: 3) which contains the perfectly matching sequence (underlined). The NNCATGAGTTN (SEQ ID NO: 5) target sequence contains an end mismatch (double underlined). 1.4 pM of each target was applied to the filter. The probe GCTCATG, and the probe TGCTCATG were used against 50 ng of IF and M13 DNA. The probe concentration was 4 nM.

In FIG. 4b, limits of signal detection are examined. The indicated volumes of IF culture supernatants of average titer of 6 × 10^11 pfu/ml were mixed with an equal volume of 1M NaOH, 3M NaCl and spotted on a filter as described in (a) above. Hybridization was at 2° C. with TGCTCATG as the probe.

In FIG. 4c, the time course of hybridization at 13° C. is shown. The IF-M13 system was used with 50 ng of phage DNA per dot, and the probe was TGCTCATG. The 3 hr IF dot contained 18020 cpm measured with 20% efficiency.

In FIG. 5 the effect of the washing step on discrimination is indicated. In FIG. 5a, inversion of the signal in the IF-M13 pair upon washing is shown. 10 ng of IF and 500 ng of M13 DNA were applied, and the probe was TGCTCATG. The top row was not washed; the other rows were washed at 7°, 13° and 25° C., respectively, for the indicated times. A DNA control is also included in the top row. Hybridization with the M13 specific probe AGCTGCTC measures amounts of DNA in the two dots. In FIG. 5b, the change of discrimination with time of washing at 0° C. (full circles) and 13° C. (open circles) is depicted. 100 ng each of IF and M13 were applied to form dots. The dots were hybridized to probe TGCTCATG, and probe AGCTGCTC was used in the control DNA hybridization (see top row, on the right, panel a). The dots were then washed at the indicated temperatures. At each time point the pairs of dots were removed and the ratio of radioactivity remaining in each dot was measured. The discrimination, D, was calculated as the mean value of the ratios for the duplicate pairs of dots.
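
The discrimination value D described for FIG. 5b is a mean of per-pair signal ratios. A minimal sketch of that arithmetic follows; the function name `discrimination` and the counts in the usage example are hypothetical, and the ratio is taken as test dot over control dot, which is an assumption about the direction of the ratio.

```python
from statistics import mean


def discrimination(test_cpm, control_cpm):
    """Mean ratio of test-dot to control-dot radioactivity over replicate
    pairs of dots (counts per minute)."""
    if len(test_cpm) != len(control_cpm) or not test_cpm:
        raise ValueError("need the same non-zero number of test and control dots")
    return mean(t / c for t, c in zip(test_cpm, control_cpm))


if __name__ == "__main__":
    # Made-up counts for duplicate IF (test) and M13 (control) dots.
    if_dots = [1200.0, 1000.0]
    m13_dots = [100.0, 100.0]
    print(discrimination(if_dots, m13_dots))
```

Averaging the ratios of paired dots, rather than dividing the summed counts, keeps each duplicate pair as its own measurement, which is how the figure legend describes the calculation.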

FIG. 6 demonstrates the effects of complexity of target sequences on discrimination. 50 ng each of IF and M13 were hybridized with the indicated probes at a concentration of 4 nM. No wash was performed. The number of matched and end base mismatched targets in IF and M13 is indicated for each probe.

FIG. 7 examines an array of clones for the presence of an oligonucleotide sequence. 51 recombinant plasmid DNAs (10 ± 5 ng) were spotted in rows B to H, columns 1 to 8 (except row H). Line A and column 9 contained control DNAs of known sequence. Unknown clones were taken from a human brain cDNA library in Bluescript vector (BS) (Stratagene Cat. No. 935205). Controls of known sequence in lines A1 to A8 and A9 to G9 are: IF(M13), M13, Alu(M13), IF(BS), BS, 1M(pUC 9), pUC 9, 2M(pUC 9), respectively, except that in the vertical row Alu(M13) was omitted. 1M and 2M are rat β-globin gene subclones. The probe concentration was 8 nM. In FIG. 7a, BS specific probe CTCCCTTT was also contained in IF and 2M inserts but not in M13 and pUC vectors. In FIG. 7b, the sequence of probe CCAGTTTT was contained in the IF insert but not in either vector. In FIG. 7c, the sequence of probe GCCTTCTC was contained in the 1M insert only.

FIG. 8, Parts 1-3, depicts sequencing 100 bp of the 921 bp β1-human interferon gene fragment (IF) by hybridization.

FIG. 8 Part 1 depicts hybridization results. FIG. 8 Part 1A depicts hybridization with 93 probes (72 octamers and 21 nonamers) with the full match in IF. IF and control rat globin clones pHEA and pHI were PCR amplified, while M13 mp18 and pUC18 were in linearized double stranded form. Base denatured DNA (20 ng of IF and equimolar amounts of control DNA) was spotted on Gene Screen membranes (N.E.N.). Hybridization was according to Drmanac et al., described in §6 below. Briefly, γ-32P end-labeled probes (3.3 pm, 10 mCi, Amersham 3000 C/mM), in a concentration of 10 ng/ml, were hybridized at 12° C. in 0.5 M Na2HPO4 pH 7.2, 7% Na-lauryl sarcosine for 3 hours. All probes were made by Genesys, Inc., Houston. Hybrids were washed in 6× SSC at 0° C. for 40 minutes and autoradiographed for 4-48 hours. Test dot signal intensity, Hp, and discrimination as the ratio of signals of test over control dot, D, were visually estimated. For probes 34 and 74, dot radioactivity was measured in a scintillation counter. Hp was 6,000 and 300 cpm, D was 20 and 4, and a film was exposed for 4 and 48 hr, respectively.

FIG. 8 Part 1B depicts hybridization with 12 probes (11 octamers and 1 nonamer) which have an end mismatch in the IF fragment. Control DNAs having single full match targets were pHEA for probes 97, 98, and 102; pUC18 for probes 95, 100, 104, and 105; and M13 for probes 94, 96, 99, 101, and 103. Probes 104 and 105 have 3 end-mismatched targets in IF. Hybridization procedures were as described in A.

FIG. 8 Part 1C depicts DNA calibration. Panels 1 and 2: IF and pHEA, probe CTGATAT. Panel 3: IF and pUC18, probe CAGATGGT. Panel 4: IF and M13mp18, probe GACTGTCT. The ratios of DNA amounts in the IF and control dots were 1:1 in panels 1, 3, and 4, and 1:3 in panel 2. Filters with IF and pH had a 1:2 ratio with probe CTGATGAT. Filters shown in panel 2 were used with probes 1, 3, 4, 6 to 8, and 10 to 13; those containing pUC18 with probes 31 and 85; those containing M13 with probes 53 and 74; and those containing pH with probes 22, 54, 55, 69, 70, 83, and 84. The remainder of the probes were used on filters of the type shown in panel 1.

FIG. 8 Part 2. 10 bp sequence, position 625-726 in Eco RI β1-interferon fragment (SE