WikiPatents - Community Patent Review
Method of sequencing of genoms by hybridization of oligonucleotide probes    
United States Patent 5,667,972
Inventor(s): Drmanac; Radoje T. (Beograd, YU); Crkvenjakov; Radomir B. (Beograd, YU)
Abstract: The conditions under which oligonucleotides hybridize only with entirely homologous sequences are recognized. The sequence of a given DNA fragment is read by hybridization and assembly of positively hybridizing probes through their overlapping portions. By simultaneous hybridization of DNA molecules applied as dots and bound onto a filter, representing single-stranded phage vector with the cloned insert, with about 50,000 to 100,000 groups of probes, the main type of which is (A,T,C,G)(A,T,C,G)N8(A,T,C,G), the information needed for computer determination of the sequence of DNA having the complexity of a mammalian genome is obtained in one step. To obtain a maximally complete sequence, three libraries are cloned in the M13 bacteriophage vector: one with 0.5 kb inserts, one with 7 kbp inserts, and one with inserts consisting of two sequences whose average distance in genomic DNA is 100 kbp. For a million bp of genomic DNA, 25,000 subclones of 0.5 kbp are required, as well as 700 subclones 7 kb long and 170 jumping subclones. Subclones of 0.5 kb are applied on a filter in groups of 20 each, so that the total number of samples is 2,120 per million bp. The process can be easily and entirely robotized for factory reading of complex genomic fragments or DNA molecules.
Title Information
Owner/Assignee     Hyseg, Inc. (Sunnyvale, CA)
Publication Date     September 16, 1997
Application Number     08/461,106
Filing Date     June 5, 1995
US Classification     435/6 536/23.1
Int'l Classification     C12Q 001/68 C07H 021/04
Examiner     Zitomer; Stephanie W.
Attorney/Law Firm     McCutchen, Doyle, Brown & Enersen
Parent Case     This is a continuation of U.S. application Ser. No. 045,912, filed Apr. 12, 1993, now U.S. Pat. No. 5,492,806; which is a continuation of U.S. application Ser. No. 07/723,712 filed Jun. 18, 1991, now U.S. Pat. No. 5,202,231; which is a continuation of U.S. application Ser. No. 07/175,088 filed Mar. 30, 1988 now abandoned; which is based on Yugoslavian No. P-570/87 filed on Apr. 1, 1987.
Priority Data     Apr 01, 1987[YU]570/87
USPTO Field of Search     435/6 536/23.1 935/77 935/78
References
U.S. References
Patent No.   Inventor        US Class     Date
5202231      Drmanac         435/6        Apr 1993
5002867      Macevicz        435/6        Mar 1991
4942124      Church          435/6        Jul 1990
4865967      Shiraishi       435/6        Sep 1989
4865968      Orgel           204/462      Sep 1989
4849334      Lorincz         435/5        Jul 1989
4794073      Dattagupta      435/6        Dec 1988
4770992      Van den Engh    435/6        Sep 1988
4720786      Hara            204/461      Jan 1988
4675283      Roninson        435/6        Jun 1987
4613566      Potter          435/6        Sep 1986
4591567      Britten         435/285.1    May 1986
4562159      Shafritz        435/5        Dec 1985
4766062      Diamond         435/6        n/a
4683202      Mullis          435/91.2     n/a
4683195      Mullis          435/6        n/a
5149625      Church          435/6        n/a
4672040      Josephson       436/526      n/a
5492806      Drmanac         435/5        n/a
5525464      Drmanac         435/6        n/a
Claims
We claim:

1. A method of determining the sequence of an ambiguous locus in a nucleic acid fragment in a sequencing by hybridization process, said method comprising:

(a) prehybridizing said nucleic acid fragment with an excess of unlabeled first oligonucleotide probe which is exactly complementary to one possible sequence at said ambiguous locus;

(b) competitively hybridizing said nucleic acid fragment with a labeled second oligonucleotide probe which is exactly complementary to a second possible sequence at said ambiguous locus;

(c) detecting whether the labeled second oligonucleotide probe hybridizes, thereby determining the sequence of said ambiguous locus in said nucleic acid fragment.

2. The method of claim 1, wherein step (b) is repeated with other labeled second oligonucleotide probes which are exactly complementary to other possible sequences at said ambiguous locus.

3. The method of claim 1, further comprising the step of forming a covalent link between the complementary sequence and the unlabeled first probe that is hybridized to the complementary sequence in the nucleic acid fragment.

4. The method of claim 3, wherein the covalent link is formed by exposing the unlabeled first probe hybridized to the complementary sequence in the nucleic acid fragment to psoralen in the presence of ultraviolet radiation.

5. The method of claim 1, wherein said oligonucleotide probes are 8-mers to 20-mers.

6. The method of claim 1, wherein said oligonucleotide probes are 11-mers.

7. A method for ordering a plurality of fragments of a subclone nucleic acid sequence having ambiguous loci at their ends, comprising the steps of:

(a) prehybridizing subclone nucleic acids with an excess of an unlabeled oligonucleotide probe at least part of which is exactly complementary to a portion of an ambiguous locus at a first end of one of the fragments;

(b) competitively hybridizing the subclone nucleic acid with a labeled oligonucleotide probe which is exactly complementary to a portion of an ambiguous locus at a first end of another of the fragments, wherein a portion of the labeled oligonucleotide probe sequence at an end of the labeled oligonucleotide probe is the same as the sequence at an end of the unlabeled oligonucleotide probe;

(c) detecting whether the labeled second oligonucleotide probe hybridizes;

(d) repeating steps (a)-(c) with combinations of other unlabeled and labeled oligonucleotide probes until the order of the fragments of the nucleic acid sequence can be determined.
Description
TECHNICAL FIELD

The subject of this invention belongs to the field of molecular biology.

TECHNICAL PROBLEM

Genomes range in size from about 4×10^6 base pairs (bp) in E. coli to 3×10^9 bp in mammals. Determination of the primary structure, i.e., the sequence, of the entire human genome is a challenge of the 20th century. A further challenge for biology is the determination of the entire genomic sequence of characteristic species of the living world. This would allow qualitative progress in explaining the function and evolution of organisms. It would also be a great step forward in the explanation and treatment of many diseases, in the food industry, and in the entire field of biotechnology.

STATE OF THE ART

Prior Art

Recombinant DNA technology has allowed the multiplication and isolation of short fragments of genomic DNA (from 200 to 500 bp), whereby a sufficient quantity of material for determination of the nucleotide sequence may be obtained from a cloned fragment. The sequence is determined on polyacrylamide gels, which separate DNA fragments in the range of 1 to 500 bp that differ in length by one nucleotide. Distinguishing among the four nucleotides is achieved in two ways: (1) by chemical degradation of the DNA fragment at specific nucleotides, in accordance with the Maxam and Gilbert method (Maxam, A. M. and Gilbert, W., 1977, Proc. Natl. Acad. Sci., 74, 560); or (2) by the dideoxy sequencing method described by Sanger (Sanger, F., et al., 1977, Proc. Natl. Acad. Sci., 74, 5463). Both methods are laborious; competent laboratories can sequence approximately 100 bp per person per day. With the use of electronics (computers and robots), sequencing can be accelerated by several orders of magnitude. The idea of sequencing the whole human genome has been discussed at many scientific meetings in the U.S.A. (Research News, 1986, Science, 232, 1598-1599). The general conclusion was that sequencing is possible only in big, organized centers (a sequencing factory) and that it would take about 3 billion dollars and at least ten years. For the time being, the Japanese are the most advanced in organizing the components of one such center. Their sequencing center has a capacity of about 1 million bp a day at a price of about 17¢ per bp (Commentary, 1987, Nature, 325, 771-772). Because cloned fragments of about 500 bp are formed randomly, three genome lengths must be sequenced; 10 billion bp could be sequenced in approximately 30 years in such a center, i.e., it would take 10 such centers to sequence the human genome in several years.
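The capacity estimate above is simple arithmetic. The sketch below reproduces it using the figures quoted in the text (10 billion bp to read, 1 million bp per center per day); reading "several years" as three years is our assumption, not the authors'.

```python
import math

# Figures quoted in the text; the 3-year target below is an assumption for "several years".
BP_TO_SEQUENCE = 10e9    # about three genome lengths of a ~3e9 bp genome
BP_PER_DAY = 1e6         # capacity of one sequencing center
DAYS_PER_YEAR = 365


def years_for_one_center(total_bp: float, bp_per_day: float) -> float:
    """Years one center needs to read total_bp at a fixed daily capacity."""
    return total_bp / bp_per_day / DAYS_PER_YEAR


def centers_needed(total_bp: float, bp_per_day: float, target_years: float) -> int:
    """Smallest number of identical centers that finish within target_years."""
    return math.ceil(years_for_one_center(total_bp, bp_per_day) / target_years)


years = years_for_one_center(BP_TO_SEQUENCE, BP_PER_DAY)
print(f"one center: {years:.1f} years")   # one center: 27.4 years
print(centers_needed(BP_TO_SEQUENCE, BP_PER_DAY, 3))   # 10
```

The result (about 27 years for one center, 10 centers for a three-year effort) matches the order of magnitude given in the text.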

DESCRIPTION OF THE INVENTION

Our process of sequencing, compared with the existing ones, involves an entirely different logic and is applicable specifically to determining the sequence of complex DNA fragments and/or molecules (more than 1 million base pairs). It is based upon specific hybridization of oligonucleotide probes (ONPs) 11 to 20 nucleotides long.

Conditions of hybridization of ONPs have been found under which complete homology with the target sequence is differentiated from a single base pair mismatch (Wallace, R. B., et al., 1979, Nucleic Acid Res., 6, 3543-3557). If hybridization in 3M tetramethylammonium chloride is used, the melting point of the hybrid depends only on the ONP length, not on its GC content (Wood, W. I., et al., 1985, Proc. Natl. Acad. Sci. U.S.A., 82, 1585-1588). Thus, hybridization under such conditions unequivocally determines the sequence. Hybridization of genomic DNA, multiplied as subclones of convenient length, with a sufficient number of ONPs will allow the entire genome to be sequenced with the aid of computerized assembly of the detected sequences. We believe this method is an order of magnitude quicker and cheaper than the techniques presently being developed. Therefore, it is more suitable for genomic sequencing of all characteristic species.

In order to apply this method, it is necessary to optimize the length, sequence and number of ONPs, the length and number of subclones, and the length of the pooled DNA which may represent a hybridizing sample. Eleven-mer ONPs are the shortest oligonucleotides that can be successfully hybridized. This means a priori that 4^11 (4,194,304) ONPs are needed to detect each sequence. The same number of independent hybridizations would be necessary for each subclone or pool of subclones. Positively hybridizing ONPs would be ordered through overlapping 10-mers. This results in the DNA sequence of the given subclone.

The process of assembling a subclone sequence is interrupted when the overlapping 10-mer is repeated in the given subclone. Thus, uninterrupted sequences are found only between repeated 10-mers or between longer repeated oligonucleotide sequences (ONSs). These fragments of a subclone sequence (SFs) cannot always be put into an unambiguous linear order without additional information. Therefore, it is important to determine the probable number of SFs (Nsf) distributed along a given length of DNA; this can be achieved through probability calculations.
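A minimal sketch of the idealized model behind this paragraph, assuming perfect-match hybridization and writing probes in the same sense as the target (real probes are complementary to it). `spectrum` returns the set of positively hybridizing 11-mers, and `repeated_overlaps` lists the 10-mers that occur more than once, i.e., the points where assembly is interrupted. Both helper names and the target sequence are hypothetical.

```python
from collections import Counter


def spectrum(seq: str, k: int = 11) -> set:
    """All k-mers present in seq: the probes expected to hybridize under the
    idealized perfect-match model (probes written in target sense)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def repeated_overlaps(seq: str, k: int = 11) -> set:
    """(k-1)-mers occurring more than once in seq; each is a point where
    assembly through overlapping (k-1)-mers branches and must stop."""
    counts = Counter(seq[i:i + k - 1] for i in range(len(seq) - k + 2))
    return {kmer for kmer, n in counts.items() if n > 1}


# Hypothetical 29 bp target containing one repeated 10-mer (ACGTTGCAAC).
target = "GGG" + "ACGTTGCAAC" + "TTT" + "ACGTTGCAAC" + "CCC"
print(len(spectrum(target)))       # 19 distinct 11-mers
print(repeated_overlaps(target))   # {'ACGTTGCAAC'}
print(4 ** 11)                     # 4194304 possible 11-mer probes
```

Assembly from the 11-mers of this target would yield separate SFs on either side of the repeated 10-mer, which is why their order cannot be read from the spectrum alone.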

The distribution of ONSs along a randomly formed DNA sequence is binomial. The average distance between identical neighboring ONSs (A) depends only on the ONS length (L) and is given as A = 4^L. The probability of an ONS being repeated N times in a fragment Lf bp long is given as:

$$P(N, L_f) = C(N, L_f) \times \left(\frac{1}{A}\right)^{N} \times \left(1 - \frac{1}{A}\right)^{L_f} \qquad (1)$$

where C(N, Lf) is the number of combinations of N elements chosen from Lf elements. The expected number of different ONSs of length L (i.e., of average distance A) that are repeated N times in an Lf bp fragment is given by the product P(N, Lf) × A. If a sequence is assembled through an overlap of length L, i.e., of average distance Ao, then Nsf in a fragment of length Lf is:

$$N_{sf} = 1 + A_o \sum_{N \ge 2} N \, P(N, L_f) \qquad (2)$$

If all 11-mers (4×10^6) are used, about 3 SFs are expected in a fragment 1.5 kb long. We shall return to the problem of ordering SFs later.
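Equations (1) and (2) can be evaluated directly. This sketch codes them as printed (including the exponent Lf in equation (1)) and truncates the sum at N = 20, where further terms are negligible; taking Ao = 4^10 for assembly through overlapping 10-mers follows from A = 4^L but is our reading of the text.

```python
from math import comb


def p_repeat(n: int, lf: int, a: float) -> float:
    """Equation (1) as printed: C(N, Lf) * (1/A)**N * (1 - 1/A)**Lf."""
    return comb(lf, n) * (1 / a) ** n * (1 - 1 / a) ** lf


def expected_sfs(lf: int, a_o: float, n_max: int = 20) -> float:
    """Equation (2): Nsf = 1 + Ao * sum over N >= 2 of N * P(N, Lf),
    truncated at n_max."""
    return 1 + a_o * sum(n * p_repeat(n, lf, a_o) for n in range(2, n_max + 1))


# All 11-mers assembled through overlapping 10-mers: Ao = 4**10 (assumed).
print(round(expected_sfs(1500, 4 ** 10), 1))   # 3.1, i.e. about 3 SFs per 1.5 kb
```

The same function can be called with the other fragment lengths and Ao values discussed below to see how quickly Nsf grows with fragment length.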

The required synthesis of 4×10^6 11-mers is impracticable for sequencing by hybridization (SBH). However, it is inadvisable to omit a significant number of ONPs (more than 25%), because this leads to gaps in the sequence. A far better way to decrease the number of independent ONP syntheses and of independent hybridizations is to use ordered ONP groups. This method requires sequencing of shorter fragments, but no gaps appear in the resulting sequence. A forty-fold decrease in the number of syntheses and hybridizations requires a seven-fold increase in the number of subclones.

In terms of information, the use of ordered ONP groups is the same as using shorter ONPs. For instance, there are 65,536 different 8-mer ONSs. Since, according to current knowledge, 8-mer ONSs cannot form a stable hybrid, a group of 11-mers can be used as an equivalent. In other words, all 11-mers in a group have one 8-mer in common, so that the information obtained concerns only the presence or absence of that 8-mer in the target DNA. Each of the anticipated groups of 11-mers contains 64 ONPs of the (N2)N8(N1) type (the 5'→3' orientation is as written; (Nx) denotes x unspecified bases and Ny denotes y specified bases). A sequence can be detected with about 65,000 such groups. If equation (2) is applied, DNA fragments 200 bp long are expected to have 3 SFs on average. Due to dispersion, some fragments of this length will have 10 or more SFs.
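To make the group notation concrete, the sketch below expands one (N2)N8(N1) group around a chosen 8-mer core and shows that, in an idealized exact-match model, a group reports only whether its core is present (with room for the flanking bases). The helper names and the example core are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def probe_group(core: str) -> list:
    """The 64 11-mers of the (N2)N8(N1) type that share one specified 8-mer core."""
    return ["".join(left) + core + right
            for left in product(BASES, repeat=2)
            for right in BASES]


def group_signal(target: str, core: str) -> bool:
    """A group gives a signal if any member matches the target exactly, i.e. if
    the core occurs with at least two bases before it and one after it."""
    return any(probe in target for probe in probe_group(core))


group = probe_group("GATCCTAG")
print(len(group), group[0])                     # 64 AAGATCCTAGA
print(group_signal("TTGATCCTAGC", "GATCCTAG"))  # True
print(4 ** 8)                                   # 65536 possible groups
```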

Because mammalian DNA has a nonrandom GC and dinucleotide composition, ONPs of the (N2)N8(N1) type are not very convenient for sequencing it. The more A+T bases the common sequence of an ONP group contains, the longer that common sequence should be. Taking this into account, three types of suitable probes remain: (N1)N10, where N10 stands for all 10-mers containing no G or C; (N1)N9(N1), where N9 stands for all 9-mers containing 1 or 2 G+C; and (N2)N8(N1), where N8 represents all 8-mers containing 3 or more G+C. About 81,000 such ONP groups are necessary. The average value of their Ao (Aao) is about 30,000. An equal Ao value in random DNA would require about 130,000 ONPs of the (N2)N8(A or T) and (N2)N8(C or G) types. These ONP groups allow sequencing of fragments 300 bp long, with 3 SFs on average. This 25% increase in the number of syntheses allows a multifold decrease in the number of necessary subclones (see below). Apart from these probes, it is necessary to synthesize an additional 20,000 ONPs in order to (1) solve the problem of repetitive sequences, (2) confirm the ends of inserts, and (3) supply the information lost because ONPs that hybridize with vector DNA cannot be used.
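As a rough check on the group count, the sketch counts the common cores of each probe type by their G+C content. It assumes that "containing no G or C" means a core made only of A and T and that each group is identified by its core; under that reading the total is about 80,000, close to the figure of about 81,000 given above.

```python
from math import comb


def cores_by_gc(length: int, gc_ok) -> int:
    """Count cores of the given length whose number k of G/C bases satisfies
    gc_ok(k): choose the k G/C positions, then 2 base options per position."""
    return sum(comb(length, k) * 2 ** length for k in range(length + 1) if gc_ok(k))


n10 = cores_by_gc(10, lambda k: k == 0)       # (N1)N10 cores: A/T only
n9 = cores_by_gc(9, lambda k: k in (1, 2))    # (N1)N9(N1) cores: 1 or 2 G/C
n8 = cores_by_gc(8, lambda k: k >= 3)         # (N2)N8(N1) cores: 3 or more G/C
print(n10, n9, n8, n10 + n9 + n8)             # 1024 23040 56064 80128
```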

Repetitive sequences, or, generally speaking, ONSs repeated in tandem with a repeat unit one or more bp long (AAAAAAA..., TCTCTCTC..., TGATGATG...), represent a problem in sequencing by hybridization. The probes described above cannot determine the length of repetitive sequences that are longer than the common part of an ONP group. Therefore, precise determination of repetitive ONSs up to 18 bp long, which represent the largest part of such repeats, requires the following ONPs: 160 ONPs An and Tn, where n ranges from 11 to 18 bp; ONPs Cn and Gn, where n ranges from 9 to 18 bp; ONPs (AT)n, where n takes the values 12, 14, 16 and 18; 25 ONPs (AC)n, (AG)n, (TC)n, (TG)n and (CG)n, where n takes the values 10, 12, 14, 16 and 18; 60 ONPs of the (N1N2N3)n type, which encompass all trinucleotides, with n equal to 12, 15 or 18; 408 ONPs comprising all 5-mers in tandem, 15 and 18 bp long; 672 ONPs consisting of 6-mer tandems up to 18 bp long; and 2340 ONPs consisting of 7-mer tandems 18 bp long. The total number of these ONPs is 3725.

In order to confirm the ends of DNA inserts in a subclone, it is necessary to synthesize an additional 2048 ONPs of the N6(N5) and (N5)N6 types, where N6 represents sequences of the vector ends and (N5) represents all possible 5-mers in both cases.
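A sketch of how the 2,048 end-confirmation probes could be enumerated: 4^5 = 1,024 probes of each of the two types. The two 6 bp vector-end sequences are hypothetical placeholders, not the actual sequences flanking the M13 cloning site.

```python
from itertools import product

# Hypothetical 6 bp vector sequences flanking the cloning site (placeholders only).
LEFT_VECTOR_END = "GGATCC"
RIGHT_VECTOR_END = "GAGCTC"


def end_probes(left_end: str, right_end: str) -> list:
    """N6(N5) probes reading from the vector into the insert, plus (N5)N6 probes
    reading from the insert into the vector: 4**5 probes of each type."""
    fivemers = ["".join(p) for p in product("ACGT", repeat=5)]
    return [left_end + f for f in fivemers] + [f + right_end for f in fivemers]


probes = end_probes(LEFT_VECTOR_END, RIGHT_VECTOR_END)
print(len(probes))   # 2048
```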

The problem of the vector DNA can be solved in two ways. One is prehybridization with cold vector DNA shortened by 7 bp on both sides of the cloning site. The other is to omit ONPs complementary to the vector DNA. Since phage M13 has been chosen as the most suitable phage vector (see below), this would eliminate the use of approximately 7,000 ONPs. This is a significant percentage (11%) if 65,000 ONPs [(N2)N8(N1)] are utilized. It can be decreased to about 3% if, instead of the given 7,000 ONPs, an additional 21,000 ONPs of the (N1)(N⁰1)N8(N1) type are used, where N8 is one of the 7,000 M13 8-mers and (N⁰1) represents each of the three nucleotides not found next to the given 8-mer in M13.

The calculations above relate to the sequencing of single-stranded DNA. To sequence double-stranded DNA it is not necessary to synthesize both complementary ONPs; therefore, the number of necessary ONPs can be halved. Yet, because of the advantages of the M13 system, we will discuss further the method of sequencing single-stranded DNA. In this case, gaps of unread sequence will appear in the subclones, because only half the ONPs are used. However, a gap in one subclone will be read in the subclone containing the complementary strand. In a representative subclone library each sequence is repeated about 10 times on average. This means that each sequence is likely to be cloned in both orientations, i.e., that it will be read on both DNA strands. This allows the use of non-complementary ONPs at the cost of only additional algorithmic computation. Thus, the total number of necessary ONPs would be approximately 50,000. If an M13 vector able to package both strands, either simultaneously or successively, could be devised, the use of non-complementary ONPs would not impose any additional requirement.
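Halving works because a probe and its reverse complement report the same double-stranded site. A minimal sketch, using an arbitrary rule (keep the lexicographically smaller member of each pair) to decide which probe to synthesize; the function names are ours.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_probes(k: int) -> set:
    """Keep one member of each {probe, reverse complement} pair."""
    return {min(p, revcomp(p)) for p in map("".join, product("ACGT", repeat=k))}


print(len(canonical_probes(8)))   # 32896 = (4**8 + 4**4) / 2
```

For even k the count is slightly more than half, because self-complementary probes (4^(k/2) of them) pair with themselves; for odd k it is exactly half.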

All subclones and all pools of subclones are hybridized with all anticipated ONPs. In this way a set of positively hybridizing ONPs is obtained for each subclone or subclone pool. These ONPs are ordered into sequences by overlapping their common sequences, which are shorter than the ONPs by only one nucleotide. In order to detect overlapping ONPs faster, it is necessary to determine in advance which ONPs overlap maximally with each synthesized ONP. Thus, each ONP obtains its own subset of neighbors: (ONPa, ONPb, ONPc, ONPd) 5'-ONP-3' (ONPe, ONPf, ONPg, ONPh). Ordering proceeds by detecting which one of the four ONPs on the 5' side, and which one of the four on the 3' side, positively hybridizes with the given subclone or pool. Assembly continues until two positively hybridizing overlapping ONPs are found for the last assembled ONP. Thereby, one SF is determined. When all SFs are extended to the maximum, the process is terminated.
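The ordering step can be sketched as a greedy walk over the positive probe set. This is an illustration under the idealized perfect-match model, not the authors' program: 4-mers stand in for 11-mers only to keep the example readable, and the cycle guard (`used`) is our addition.

```python
BASES = "ACGT"


def kmers(seq: str, k: int) -> set:
    """Idealized positive-probe set: every k-mer present in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def extend_fragment(seed: str, positives: set) -> str:
    """Grow one SF from a seed probe through (k-1)-base overlaps. Each step tests
    the four possible neighbours on that side and stops when none or more than
    one is positive, or when a probe would be reused (guards against cycles)."""
    k = len(seed)
    used = {seed}
    sf = seed
    while True:  # extend toward the 3' end
        suffix = sf[-(k - 1):]
        nxt = [suffix + b for b in BASES if suffix + b in positives]
        if len(nxt) != 1 or nxt[0] in used:
            break
        used.add(nxt[0])
        sf += nxt[0][-1]
    while True:  # extend toward the 5' end
        prefix = sf[:k - 1]
        prv = [b + prefix for b in BASES if b + prefix in positives]
        if len(prv) != 1 or prv[0] in used:
            break
        used.add(prv[0])
        sf = prv[0][0] + sf
    return sf


print(extend_fragment("GTTG", kmers("ACGTTGCA", 4)))     # ACGTTGCA
print(extend_fragment("TTAC", kmers("TTACGAACGCC", 4)))  # TTACG: stops at repeated ACG
```

In the second call two 3' neighbours (ACGA and ACGC) are positive, so the SF ends there, as the text describes.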

With the described ONP groups, the number of SFs increases with the length of DNA. Generally, unequivocal ordering can be achieved with 3 SFs per subclone when the SFs are counted in the same way as Nsf in equation (2): these are the two SFs located at the ends of the insert, with the third placed between them. However, the ordering of SFs cannot be solved simply by choosing a suitable subclone length, because such a subclone would be too short. Our solution is the mutual ordering of SFs and large numbers of subclones, with the possibility of using subclone pools as one hybridization sample and/or competitive hybridization of labeled and unlabeled ONPs.

To obtain the maximally extended sequence by SBH on subclones that may be used repeatedly, it is necessary to use three subclone libraries in the M13 vector: one with 0.5 kb inserts, one with 7 kb inserts, and one with inserts consisting of two sequences whose distance in genomic DNA is about 100 kb. The first library serves primarily to order SFs. These subclones cannot be preserved for later experimentation; they enter hybridization as pools obtained by simultaneous infection during or after growth of a phage. The second library is the basic one; the subclones it contains are convenient for further use. The length of 7 kbp represents the current limit of insert size suitable for successful cloning in M13. The function of the third library is to join into an undivided sequence the parts of sequence separated by highly homologous sequences longer than 7 kbp, as well as by uncloned DNA fragments.

Hybridization of subclones of all libraries with ONPs and computation of SFs are followed by mutual ordering of the SFs and subclones. The basic library is ordered first. Overlapping subclones are detected by the content of the whole or a part of the SFs of the starting subclone. Generally, all mammalian SFs 20 bp or longer are suitable. The average SF length in these subclones, calculated on the basis of equation (2), is 12 bp, which means that there are enough SFs of suitable length. Moreover, it is known which SFs (most often there are two) are candidates to continue the starting SF. In this case both sequences are examined; the correct one is among them, and overlapping subclones are detected by it. On the basis of the content of the other SFs, the exact displacement of overlapping subclones relative to the starting subclone is determined. The linear arrangement of subsets of SFs (SSFs) is achieved by detecting all subclones overlapping with the starting subclone. SSFs are delimited by neighboring ends of overlapping subclones (beginning-beginning, beginning-end or end-end). The process of overlapping subclones is resumed with an SF taken from the protruding SSF of the most protruding subclone. Assembly is interrupted when it encounters an uncloned portion of DNA or, similar to the forming of SFs, a repeated sequence longer than 7 kbp. By this method, the largest possible groups of ordered overlapping 7 kb subclones are obtained, as well as the linear arrangement of the SSFs of their SFs.

The length of DNA contained in an SSF is fundamental to this procedure. This length depends on the number of SSFs, which equals the number of subclone ends, i.e., twice the number of subclones. A representative library of a 1,000,000 bp DNA fragment requires 700 subclones of 7 kbp. Therefore, the average size of an SSF is 715 bp. The actual average number of SFs within an SSF is not 1/10 of the number of SFs in a subclone 7 kbp long, nor does it depend on the length of the subclone; instead, it depends on the length of the SSF. According to equation (2), for a length of 715 bp and an Aao of 30,000 bp (the value for the described ONPs), 16 SFs with an average length of 45 bp are expected. The ordering of SFs within the obtained SSFs is performed through the 0.5 kb subclone library. This does not require all subclones to be individual; the use of subclone pools is sufficient. The subclones in a given pool are informative if they do not overlap with each other. A 10 kb pool of cloned DNA is informatively and technologically suitable, although it is not the limit. The number of necessary subclones or pools is such that the maximum size of an SSF formed by them is no longer than 300 bp. The proposed ONPs are expected to give 3 SFs for this length (equation 2), which, as explained, can be unequivocally ordered. Using the binomial distribution, the following equation has been derived:

$$N_{sa} = N_{sc}\left(1 - \frac{2N_{sc}}{N_{bp}}\right)^{L_{ms}}$$

where Nsa is the number of SSFs longer than Lms, Nsc is the number of subclones, Nbp is the number of bp in the fragment or molecule of DNA being sequenced, and Lms is the SSF size that on average contains 3 SFs, which can be ordered; in this case it amounts to 300 bp. On the basis of this equation it is determined that 25,000 subclones of 0.5 kb are necessary for a DNA fragment 1 million bp long. The number of 10 kb pools is 1250. The average size of an SSF obtained with these subclones is 20 bp.
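The equation for Nsa, together with the average SSF length Nbp / (2 Nsc), can be checked numerically with the figures given in the text. A minimal sketch (the function names are ours):

```python
def ssfs_longer_than(n_sc: int, n_bp: float, l_ms: int) -> float:
    """Nsa = Nsc * (1 - 2*Nsc/Nbp) ** Lms: expected number of SSFs longer than Lms."""
    return n_sc * (1 - 2 * n_sc / n_bp) ** l_ms


def mean_ssf_length(n_sc: int, n_bp: float) -> float:
    """2*Nsc subclone ends cut Nbp bp into about 2*Nsc SSFs."""
    return n_bp / (2 * n_sc)


print(round(mean_ssf_length(700, 1e6)))     # 714 bp for the 7 kb library
print(mean_ssf_length(25_000, 1e6))         # 20.0 bp for the 0.5 kb library
print(ssfs_longer_than(25_000, 1e6, 300))   # about 0.005 SSFs longer than 300 bp
```

With 10,000 subclones instead of 25,000, the same formula predicts roughly 23 SSFs longer than 300 bp per million bp, which illustrates why the larger library is chosen.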

The ordering of SFs is performed by computer detection of pools containing subclones that overlap with the starting SSF. This detection is based on the content of part or all of an SF randomly chosen from the starting SSF. The contents of the other SFs from the starting SSF determine the overlapping portion of the 0.5 kb subclones. Because of the high density of these subclones, the arrangement of SFs in the starting SSF is also determined. At the end of this process, the sequence of each group of ordered 7 kb subclones is obtained, as well as information about which pool contains the 0.5 kb subclone bearing the determined sequence. At a small number of loci the sequence will be either incomplete or ambiguous. Our calculations show that on average there is not less than one such locus per million bp, including 30% of randomly distributed undetected ONSs. These loci are sequenced by convenient treatment of the subclones that bear them, followed by repeated SBH or competitive hybridization with suitably chosen pairs of labeled and unlabeled ONPs, or by the classical or advanced classical methods.

The procedure of competitive hybridization will be explained by the example of an 8 bp sequence that is repeated twice. In this case two SFs end and two others begin with the repeated sequence TTAAAAGG.

5' NNNNNNNNNNNNCATTAAAAGG3'

5' NNNNNNNNNNNNCGTTAAAAGG3'

5' TTAAAAGGTACNNNNNNN3'

5' TTAAAAGGCCGNNNNNNN3'

Prehybridization with a surplus of an unlabeled ONP, e.g., 5'(N2)CATTAAAAG(N1)3', which cannot hybridize with 5'NNCGTTAAAAGG3' because of one non-complementary base, prevents one of the labeled ONPs, 5'(N2)AAAAGGTAC(N1)3' or 5'(N2)AAAAGGCCG(N1)3', from subsequent hybridization. A pair of mutually competing probes defines a pair of SFs that follow one another. This can be confirmed by an alternative choice of a suitable ONP pair. This procedure may be applied to all repeated ONSs up to 18 bp long. In order to use it for the ordering of a multitude of SFs, prehybridization must be separated from hybridization in both time and space. Therefore, the stability of the hybrid with the unlabeled ONP is important. If such stability cannot be achieved by appropriate ONP concentrations and choice of hybridization temperatures, then a covalent link should be formed between the cold ONP and the complementary DNA by UV irradiation in the presence of psoralen. Alternatively, one might use ONPs that carry reactive groups for covalent linking.
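The logic of the competition can be simulated with a toy model: a probe binds wherever its specified core matches exactly, the excess cold probe occupies all of its sites first, and a labeled probe gives a signal only at a site that does not overlap an occupied one. The cores follow the example above (flanking N bases omitted, probes written in target sense); the spacer between the two copies of the repeat is hypothetical.

```python
def sites(target: str, core: str) -> list:
    """Intervals where a probe core matches the target exactly (idealized model)."""
    n = len(core)
    return [range(i, i + n) for i in range(len(target) - n + 1) if target[i:i + n] == core]


def overlaps(a: range, b: range) -> bool:
    """True if two half-open intervals share at least one position."""
    return a.start < b.stop and b.start < a.stop


def labeled_signal(target: str, cold_core: str, labeled_core: str) -> bool:
    """True if the labeled probe finds a site not blocked by the prehybridized cold probe."""
    blocked = sites(target, cold_core)
    return any(all(not overlaps(s, b) for b in blocked)
               for s in sites(target, labeled_core))


COLD, LABELED = "CATTAAAAG", "AAAAGGTAC"
spacer = "CCCCCCCC"  # hypothetical DNA between the two copies of the repeat
order_1 = "CATTAAAAGGTAC" + spacer + "CGTTAAAAGGCCG"   # ...CATT joined to TAC...
order_2 = "CATTAAAAGGCCG" + spacer + "CGTTAAAAGGTAC"   # ...CATT joined to CCG...
print(labeled_signal(order_1, COLD, LABELED))  # False: the labeled site is blocked
print(labeled_signal(order_2, COLD, LABELED))  # True: the labeled probe binds elsewhere
```

Under this model, the absence of a signal from AAAAGGTAC after prehybridization with CATTAAAAG indicates that the SF ending in CATTAAAAGG is followed by the SF beginning with TTAAAAGGTAC.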

The subclones of the third library are used to link the sequenced portions into a continuous sequence of the entire DNA fragment being sequenced. Approximately 170 subclones are required for 1 million bp. These and the other numbers calculated for 1 million bp increase linearly with length for longer DNA fragments. Since these subclones contain sequences that are on average 100 kbp apart, they allow jumping over repeated or uncloned sequences up to 100 kb in size. This is done by detecting which two sequenced portions contain the two sequences located in one subclone of this library.

The method thus requires a total of about 50,000 ONPs and hybridizations, and 2120 separate subclone hybridization samples, per DNA fragment approximately 1,000,000 bp long.

The libraries described above are made in the phage vector M13. This vector allows easy cloning of DNA inserts from 100 to 7,000 bp long and gives a high titer of excreted recombinant phage without bacterial cell lysis. If a bacterial culture is centrifuged, a pure phage preparation is recovered. Additionally, the bacterial sediment can be used for repeated production of the phage, as long as the bacterial cell pellet is resuspended in an appropriate nutrient medium. With the addition of alkali, DNA separates from the protein envelope and is simultaneously denatured for efficient "dot blot" formation and covalent linking to the nylon filters on which hybridization is performed. The quantity of DNA obtained from several milliliters of a bacterial culture is sufficient to hybridize one subclone with all the ONPs. A suitable format for cultivation and robotic application onto filters is plates similar to microtiter plates, of convenient dimensions and number of wells. DNA subclones are applied onto filters by a robot arm. Even the largest genomes can be satisfactorily sequenced with a robot arm supplied with 10,000 uptake micropipets. After the DNA solution has been taken up from the plates, the micropipets are moved closer together by a mechanism that reduces the distance between them to 1 mm. The quantity of DNA suitable for 10,000 subclones is then applied to the filter simultaneously. This procedure is repeated with the same 10,000 subclones as many times as necessary. The same procedure is then repeated with all other subclones, in groups of 10,000. The number of "imprints" of one such group needed for 50,000 ONPs is approximately 1,000, since each filter can be washed and reused 50 times.
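The sample and filter counts quoted above follow from a few multiplications. The sketch below collects them; extrapolating linearly to a 3×10^9 bp genome relies on the statement that these numbers grow linearly with DNA length, and the helper name is ours.

```python
import math

# Per-million-bp figures from the text.
SUBCLONES_05KB = 25_000
POOL_BP = 10_000             # 20 subclones of 0.5 kb per pool
SUBCLONES_7KB = 700
JUMPING_SUBCLONES = 170
ONPS = 50_000
REUSES_PER_FILTER = 50
GROUP_SIZE = 10_000          # samples applied at once by the robot arm

pools = SUBCLONES_05KB * 500 // POOL_BP                       # 1250
samples_per_mb = pools + SUBCLONES_7KB + JUMPING_SUBCLONES    # 2120
imprints_per_group = ONPS // REUSES_PER_FILTER                # 1000


def groups_for(genome_bp: float) -> int:
    """Robot groups needed, assuming samples scale linearly with DNA length."""
    return math.ceil(samples_per_mb * genome_bp / 1e6 / GROUP_SIZE)


print(pools, samples_per_mb, imprints_per_group)   # 1250 2120 1000
print(groups_for(3e9))                             # 636
```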

Hybridization is performed in cycles. One cycle requires at most one day. All the subclones are hybridized with a certain number of ONPs in one cycle. In order to complete the hybridization within a reasonable period, an experiment in each cycle should use approximately 1,000 containers, each with one ONP. To save ONPs, a smaller volume of hybridizing liquid is used, and filters are therefore added in several turns. Filters from all hybridization containers are collected in one container and processed further simultaneously: they are washed and, since the ONPs are labeled with biotin instead of radioactive isotopes, color reactions are developed. All subclones required for sequencing (up to 10 kb in length) can be hybridized in containers with dimensions of 20×20×20 cm without having to repeat individual cycles.

Hybridization sequencing of fragments cloned in plasmid vectors can be performed one of two ways: by (1) colony hybridization or (2) "dot blot" hybridization of isolated plasmid DNA. In both cases, 2,000 or 3,000 different ONPs in the vector sequence will not be utilized, i.e., will not be synthesized.

Colony hybridization is presumably faster and cheaper than "dot blot" hybridization. However, colony hybridization requires specific conditions to cancel the effect of hybridization with bacterial DNA. Probe labeling that gives high hybridization sensitivity should be used, in order to reduce the general background and to allow the use of a minimal number of bacterial colonies. ONPs should preferably be labeled by biotinylation, which allows easy and lasting labeling in the last step of synthesis. The sensitivity attained (Al-Hakin, A. H. and Hull, R., 1986, Nucleic Acid Research, 14:9965-9976) permits the use of at least 10 times fewer bacterial colonies than standard protocols.

In order to avoid "false positive" hybridization caused by homology of an ONP with a bacterial sequence, and in order to use short probes such as 11-mers, which occur twice on average in the bacterial chromosome, plasmid vectors that give the maximum number of copies per cell should be used. High-copy plasmid vectors, such as pBR322, are amplified to 300-400 copies per bacterial cell when grown in the presence of chloramphenicol (Lin Chao, S. and Bremer, L., 1986, Mol. Gen. Genet., 203, 150-153). Plasmids pAT and pUC have at least twice the efficiency of multiplication (Twigg, A. J. et al., 1987, Nature, 283, 216-218). Therefore, we can assume that under optimal conditions even 500 copies of plasmid per bacterial cell can be attained. The additional sequences within the chimeric plasmids will surely decrease the number of plasmid copies per cell, especially in the presence of toxic sequences. Therefore, one should count on about 200 copies of chimeric plasmid per bacterial cell. This means that, on average, the signal would be 100 times stronger for every 11-mer that also has a complementary sequence on the plasmid. That difference is sufficient for hybridization with bacterial DNA not even to be registered when a small quantity of DNA (i.e., small colonies) is used.
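The 100-fold figure can be reproduced approximately. The sketch assumes a random 4×10^6 bp chromosome read on both strands (the genome size quoted earlier in the text) and 200 plasmid copies per cell, each carrying one target site.

```python
GENOME_BP = 4e6          # E. coli genome size quoted in the text
PROBE_LEN = 11
PLASMID_COPIES = 200     # working copy number assumed in the text

# Expected occurrences of one 11-mer on both strands of a random genome of this size.
chromosomal_sites = 2 * GENOME_BP / 4 ** PROBE_LEN
# One plasmid site per copy versus the chromosomal background, per cell.
signal_ratio = PLASMID_COPIES / chromosomal_sites

print(round(chromosomal_sites, 2))   # 1.91, i.e. about twice per chromosome
print(round(signal_ratio))           # 105, i.e. roughly 100-fold
```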

Using a binomial distribution we determined how many ONPs will be repeated more than 10 times in bacterial chromosomes due to random order. Such ONPs will give unreliabl