| United States Patent | 5525464 |
| Link to this page | http://www.wikipatents.com/5525464.html |
| Inventor(s) | Drmanac; Radoje T. (Belgrade, YU);
Crkvenjakov; Radomir B. (Belgrade, YU) |
| Abstract | The conditions under which oligonucleotide probes hybridize preferentially
with entirely complementary and homologous nucleic acid targets are
described. Using these hybridization conditions, overlapping
oligonucleotide probes associate with a target nucleic acid. Following
washes, positive hybridization signals are used to assemble the sequence
of a given nucleic acid fragment. Representative target nucleic acids are
applied as dots. Up to 100,000 probes of the type
(A,T,C,G)(A,T,C,G)N8(A,T,C,G) are used to determine sequence information
by simultaneous hybridization with nucleic acid molecules bound to a
filter. Additional hybridization conditions are provided that allow
stringent hybridization of 6-10 nucleotide long oligomers, which extends
the utility of the invention. A computer process determines the
information sequence of the target nucleic acid which can include targets
with the complexity of mammalian genomes. Sequence generation can be
obtained for a large complex mammalian genome in a single process. |
Title Information  |
| Publication Date | June 11, 1996 |
| Filing Date | February 28, 1994 |
| Parent Case |
This is a continuation of U.S. application Ser. No. 08/048,152, filed Apr.
15, 1993, now abandoned, which is a continuation of U.S. application Ser.
No. 07/576,559, filed Aug. 31, 1990, now abandoned, in turn a
continuation-in-part of U.S. application Ser. No. 07/175,088, filed Mar.
30, 1988, now abandoned. Applicants claim priority under 35 U.S.C. §
119 of Yugoslavian Application No. P-570/87 filed Apr. 1, 1987 and
Yugoslavian Application No. 18617-P 570/87 filed Sep. 18, 1987, certified
copies of which were submitted in the parent application Ser. No.
07/175,088. |
|
| Priority Data |
Apr 01, 1987[YU]570/87 |
Claims  |
We claim:
1. A method of sequencing a target nucleic acid of unknown sequence
comprising the steps of:
(a) using conditions which differentiate an exactly complementary
oligonucleotide probe and a probe having a 5' or a 3' mismatched terminal
nucleotide;
(b) contacting a plurality of oligonucleotides, each at least six
nucleotides in length and having a 5' or a 3' terminal nucleotide, with
said target nucleic acid;
(c) forming a duplex between the target nucleic acid and the plurality of
oligonucleotides;
(d) washing the duplex;
(e) detecting oligonucleotides positively hybridizing as part of said
duplex; and
(f) compiling a sequence of the target nucleic acid from overlapping
positively-hybridizing oligonucleotides.
2. The method of claim 1, wherein said target nucleic acid comprises
multiplied fragments of genomic DNA obtained by cloning of said genomic
DNA in vectors based on single-stranded bacteriophages or plasmids in the
form of three subclone libraries having inserts consisting of two parts
separated, on average, by 50 kb to 200 kb of genomic DNA and ranging in
size from 0.1 kb to 1.0 kb or 3.0 kb to 10 kb, and wherein said fragments
of genomic DNA are applied to a filter in the form of a hybridizing
sample, the vector-insert DNA of individual subclones and groups of
subclones being either uninterrupted or sheared to 20 bp.
3. The method of claim 1, wherein said target nucleic acid comprises
multiplied fragments of genomic DNA obtained by in vitro amplification
with DNA polymerase using combinations of from about 5 to about 200
oligonucleotide primers.
4. The method of claim 1, wherein said compiling step comprises linear
ordering of subfragments of genomic DNA obtained by cyclic detection of
overlapped subclones containing said subfragments as determined by overlap
of positively hybridizing oligonucleotide probes for said subclones, said
linear ordering being determined by the presence of a portion of the
subclone in one of said subfragments and a linear displacement between the
subfragments in said subclones of less than 100 bp.
5. The method of claim 1, wherein said compiling step comprises the linear
ordering of subfragments of said target nucleic acid by competitive
hybridization of said subfragments with detectably labeled
oligonucleotides and unlabeled oligonucleotides, wherein a saturating
quantity of unlabeled oligonucleotide comprising a portion complementary
to at least a portion of a subfragment obtained from the subclone to be
detected is applied to a filter followed by separate hybridizations to
said subfragments with labeled oligonucleotide probes comprising a portion
complementary to all or part of the repeated portion of said unlabelled
probe and further comprising a portion complementary to the remaining
unrepeated portion of said subfragment, the sequence of said subfragment
being determined by the portion to which said labelled probe does not
hybridize.
6. The method of claim 1, wherein said target nucleic acid comprises at
least one million base pairs of mammalian DNA in the form of 1250 groups
of hybridizing samples of target nucleic acid comprising, on average, 20
0.5 kb M13 subclones, 700 7 kb M13 subclones, and 170 reconnecting M13
subclones jumping over 100 kb of genomic DNA, and wherein each sample is
exposed to a first set of 1024 groups of nucleic acid probes, each group
consisting of 16 probes having the structure, (A,T,C,G)N10(A,T,C,G),
wherein N10 represents all 10-mers without G and C; a second set of 23040
groups of probes, each consisting of 16 probes having the structure,
(A,T,C,G)N9(A,T,C,G), wherein N9 represents all 9-mers with one or two G+C
nucleotides; a third set of 55834 groups of nucleic acid probes, each
group consisting of 64 probes having the structure,
(A,T,C,G)(A,T,C,G)N8(A,T,C,G) or 256 probes having the structure,
(A,T,C,G)(A,T,C,G)N8(A,T,C,G)(A,T,C,G), wherein N8 represents all 8-mers
with three or more C+G nucleotides; and a fourth set of 3725 groups of
nucleic acid probes, each group consisting of 16 probes having the
structure (A,T,C,G)Nm(A,T,C,G), wherein Nm represents all monotonous
sequences shorter than 18 bp and consisting of repetitive units of 1 to 7
nucleotides.
7. A method of sequencing by hybridization of a complete genomic DNA of an
organism, or large portions thereof, comprising the step of:
hybridizing multiple fragments of said genomic DNA or large portions
thereof, with all or a portion of oligonucleotide probes comprising 8 to
20 nucleotides and representing all or a portion of the possible
oligonucleotide probes consisting of A, T or U, C, G, and their
derivatives and analogs under conditions in which said oligonucleotide
probes hybridize with an entirely homologous portion of said genomic DNA
or with a portion of said genomic DNA which has fewer mismatches than
would result in ambiguous or erroneous sequence determination upon
assembly of positively-hybridizing oligonucleotide probes by determination
of the maximum mutual overlap of said oligonucleotide probes.
8. A method for selecting non-identical oligonucleotide probes, each of
predetermined length and each of which hybridizes, under conditions which
distinguish probes which are exactly complementary from probes which are
not exactly complementary, to a different portion of a target DNA such
that the entirety of said oligonucleotide probes represents a continuous
linear sequence of said target DNA, comprising the steps of
(a) hybridizing a set of non-identical oligonucleotide probes with said
target DNA;
(b) identifying a first oligonucleotide probe of said set which hybridizes
with said target DNA;
(c) further identifying a plurality of subsequent oligonucleotide probes of
said set, beginning with a second oligonucleotide probe of said set, each
of which hybridizes with a portion of target DNA immediately 5' or 3' to a
portion of said target DNA to which a previously-identified
oligonucleotide probe hybridizes; and
(d) selecting a set of non-identical oligonucleotide probes identified in
said identifying and further identifying steps. |
Description  |
INTRODUCTION
The present invention belongs to the field of molecular biology. It
involves a novel method of sequencing of a target nucleic acid sequence by
hybridization of short oligonucleotide probes to a nucleic acid target.
The oligonucleotide probes can comprise all known combinations of the four
nucleotides of a given length, i.e. oligonucleotides of base composition
adenine (A), thymine (T), guanine (G), and cytosine (C) for DNA and A,G,C,
and uridine (U) for RNA. Conditions are described which allow
hybridization discrimination between oligonucleotides which are as short
as six nucleotides long and have a single base end-mismatch with the
target sequence.
The invention is demonstrated by way of examples in which sequence
information is generated using the method of the invention.
BACKGROUND OF THE INVENTION
2.1. HYBRIDIZATION
Hybridization depends on the pairing of complementary bases in nucleic
acids and is a specific tool useful for the general recognition of
informational polymers. Diverse research problems using hybridization of
synthetic oligonucleotide probes of known sequence include, amongst
others, the different techniques of identification of specific clones from
cDNA and genomic libraries; detecting single base pair polymorphisms in
DNA; generation of mutations by oligonucleotide mutagenesis; and the
amplification of nucleic acids in vitro from a single sperm, an extinct
organism, or a single virus infecting a single cell.
It is possible to discriminate perfect hybrids from those hybrids
containing a single internal mismatch using oligonucleotides 11 to 20
nucleotides in length [Wallace et al., Nucl. Acids Res. 6: 3543 (1979)].
Mismatched hybrids are distinguished on the basis of the difference in
the amount of hybrid formed in the hybridization step and/or the amount
remaining after the washing steps [Ikuta et al., Nucl. Acids Res. 15: 797
(1987); Thein and Wallace, in Human Genetic Diseases: A Practical
Approach, ed. by J. Davies, IRL Press Ltd., Oxford, pp. 33-50 (1986)].
The reproducible hybridization of different and diverse short
oligonucleotides less than 11 nucleotides long has not been well
characterized previously. Detailed hybridization data that allows a
constant set of conditions for all predictable oligonucleotides is not
available [Besmer et al., J. Mol. Biol. 72: 503 (1972); Smith, in Methods
of DNA and RNA Sequencing, ed. S. Weissman, Praeger Publishers, New York,
N.Y., pp. 23-68 (1983); Estivill et al., Nucl. Acids Res. 15: 1415 (1987)].
Information is also not available on the effects of a single
noncomplementary base pair located at the 5' or 3' end of a hybridizing
oligonucleotide that produces a mismatched hybrid when associated with a
target nucleic acid. Hybridization conditions that discriminate between
(1) a perfectly complementary hybridizing pair of nucleic acid sequences
where one partner of the pair is a short oligonucleotide, and (2) a pair
wherein a mismatch of one nucleotide occurs on the 5' or 3' end of the
oligonucleotide, provide a more stringent environment than is required for
internal mismatches because hybrid stability is affected less by a
mismatch at the end of a hybridizing pair of complementary nucleic acids
than for an internal mismatch.
The length of nucleotides that can distinguish a unique sequence in a
nucleic acid of defined size has been predicted [Smith in Methods of DNA
and RNA Sequencing, ed. S. Weissman, Praeger Publishers, New York, N.Y.,
pp. 23-68 (1983)]. Thus random oligonucleotide sequences 16-17 long are
expected to occur only once in random DNA of 3×10^9 bp, the size
of the human genome. However, with decreasing probe length, e.g. for
oligonucleotides 5 to 10 nucleotides in length, there is an exponential
increase in the frequency of occurrence within a random DNA of a given
size and complexity. Thus, the purposes for which oligonucleotide probes
are employed can impact on the length of the oligonucleotides that are
used experimentally.
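The relationship between probe length and frequency of occurrence can be illustrated numerically. The following is a minimal sketch, assuming random DNA with equal base frequencies and counting one strand only; the function name `expected_occurrences` and the constant `HUMAN_GENOME_BP` are illustrative and do not appear in the cited references.

```python
def expected_occurrences(probe_length: int, target_length_bp: float) -> float:
    """Expected count of one specific n-mer in random DNA.

    Assumption: the four bases are equally likely and independent,
    and only one strand is considered.
    """
    positions = target_length_bp - probe_length + 1
    return positions / 4 ** probe_length


HUMAN_GENOME_BP = 3e9  # genome size quoted in the text above

# The expectation falls by a factor of four for every added nucleotide.
for n in (6, 8, 10, 12, 16, 17):
    print(f"{n}-mer: {expected_occurrences(n, HUMAN_GENOME_BP):.3g} expected occurrences")
```

Under these assumptions the expectation drops below one occurrence at 16 nucleotides, in line with the 16-17 nucleotide estimate quoted above, while an 8-mer is expected tens of thousands of times.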
2.2. CONDITIONS FOR HYBRIDIZATION STRINGENCY
Wallace et al. [Nucl. Acids Res. 6: 3543 (1979)] describe conditions that
differentiate the hybridization of 11 to 17 base long oligonucleotide
probes that match perfectly and are completely homologous to the target
nucleic acid as compared to similar oligonucleotide probes that contain a
single internal base pair mismatch. Wood et al. [Proc. Natl. Acad. Sci.
82: 1585 (1985)] describe conditions for hybridization of 11 to 20 base
long oligonucleotides using 3M tetramethyl ammonium chloride wherein the
melting point of the hybrid depends only on the length of the
oligonucleotide probe, regardless of its GC content. However, as disclosed
in these references, eleven-mer oligonucleotides are the shortest ones that
generally can be hybridized successfully, reliably and reproducibly using
known hybridization conditions.
2.3. SEQUENCING
Nucleic acid sequencing methods, where the position of each base in a
nucleic acid molecule in relation to its neighbors is determined to define
its primary structure, were developed in the early 1960's for RNA
molecules and in the late 1970's for DNA. The two major methods for DNA
sequencing, i.e. chemical degradation and dideoxy-chain termination,
involve identification and characterization of 1-500 nucleotide long DNA
fragments, specific for each one of at least four nucleotide bases, on
polyacrylamide gels. The polyacrylamide gels must be able to distinguish
single base pair differences in length between fragments. The fragments
are generated either by chemical degradation [Maxam-Gilbert, Proc. Natl.
Acad. Sci. 74: 560 (1977)] or by dideoxy-chain termination of DNA
fragments synthesized by DNA polymerase [Sanger et al., Proc. Natl. Acad.
Sci. 74: 5463 (1977)]. A sufficient quantity of isolated fragments is
ensured by recombinant DNA technology methods which include cloning,
restriction enzyme digestion, gel electrophoresis, and polymerase chain
reaction amongst others. These methods allow the identification and
amplification of the target DNA to provide material for sequencing.
An intensive amount of manual labor is required in the preparation of
appropriate polyacrylamide gels to resolve small differences in fragment
size. The speed of sequencing in experienced laboratories throughout the
world is approximately 100 bp per person daily. Although the use of
electronic robots and computers allows acceleration of the number of base
pairs actually determined, preparation of polyacrylamide gels, application
of sample, electrophoresis and the subsequent manipulations necessary to
obtain high quality autoradiograms that can be read by machines still
involve significant intensive, skilled, manual labor for which no
substitutes have been found.
2.4. HUMAN GENOME CHARACTERIZATION
The genome of higher eucaryotes has up to a million times greater physical
complexity than the individual genes it encodes, giving
it a correspondingly huge informational complexity. From the present
knowledge of genome organization and biochemical, biophysical and
biological functions, the following approximate scale of the informational
complexity for higher eucaryotes can be proposed: 10,000 gene
families--100,000 genes--1,000,000 biological functions. The number of
basic biochemical functions represented by a single gene family is
probably not significantly increased compared to procaryotic and lower
eucaryotic genomes.
Recently, there has been a surge of interest in mapping and sequencing the
entire human genome [Lewin, Science 232: 1598 (1986); Wada, Nature 325:
771 (1987); Smith and Hood, Bio/Technology 5: 933-939 (1987)]. This stems
from the fact that only 1 in about 75 human genes is either cloned or
mapped (Human Gene Mapping 9, 1987). Unknown genes will have much to tell
us about human biology. In the future, the progress of studies on
molecular evolution may depend on the sequencing of genomes of species
besides humans.
Because sequence information has already provided accelerated knowledge and
potential resolution of diverse biological, medical and therapeutic
research problems, it is not surprising that ideas of sequencing the whole
human genome were discussed at various scientific meetings during the
early and mid-1980's [Research News in Science 232: 1598 (1986)]. Such
massive sequencing projects envision the final determination of
approximately 3 billion base pairs of information encoded in the DNA of
humans and are expected to take at least 10 years at a cost of at least $3
billion using current technology. However, in practice, actual
sequencing of at least three times that number of base pairs is required
to obtain a reliable sequence for the human genome, thus requiring even
more money and time.
Such endeavors present a challenge to the technology of the twentieth
century. Further challenges arise if sequencing projects are extended to
include the determination of the genomic sequences of characteristic
individuals or species of organisms, especially those that have economic,
social or medical importance. Such sequencing projects would advance not
only our understanding of the evolution of organisms and the evolution of
biochemical processes, but would also further the detection, treatment and
understanding of disease, and would aid agriculture, the food industry and
biotechnology in general. However beneficial the results of such projects
would be, their successful completion requires the development of a new,
rapid, reproducible and reliable sequencing method such as those described
in this invention.
Although the ultimate goal of human genome characterization is the
determination of sequence information, progress in characterizing portions
of the human genome or the genome of other organisms has been achieved in
several areas. A linkage map of the human genome based on cloned DNA
probes detecting RFLPs has been obtained [Donis-Keller et al., Cell 51:
319-337 (1987)]. Once mapped, a gene can be approached from a neighboring
DNA marker not only by walking [Cross et al., Trends Genet. 2: 174 (1986)]
but also by the use of jumping [Collins and Weissman, Proc. Natl. Acad. Sci.
USA 81: 6812 (1984); Poustka et al., Nature 325: 353 (1987)] and linking
[Poustka et al., Trends Genet. 2: 174 (1986)] libraries. The task of going
from a marker to a mapped gene is facilitated immensely if an ordered
collection of overlapping cosmid or phage clones representing individual
chromosomes is available. Attempts to provide a library of overlapping
clones using similarities in their patterns of restriction digests have
been tried [Coulson et al., Proc. Natl. Acad. Sci. USA 83: 7821 (1986);
Olson et al., Proc. Natl. Acad. Sci. USA 83: 7826 (1986); Kohara et al.,
Cell 50: 495 (1987)]. Alternatively, the hybridization of a collection of
100 specific oligonucleotides to an array of 3-10×10^6
cosmid-containing colonies on filters has been proposed. The resulting
patterns of hybridization identify specific regions along the genome to
which a small collection of cosmids from chromosome libraries can be
fitted in the second step [Poustka et al., Cold Spring Harbor Symp. Quant.
Biol. 51: 351 (1986); Craig et al., in Human Genetics, Proceedings of the
7th International Congress, Berlin, (1986); Michiels et al., CABIOS 3: 203
(1987)]. Such identification however does not provide desired and useful
sequence information of the DNA in a particular identified fragment.
In the area of human genetics, the emphasis is on an individual's DNA and
the methods to detect patterns of its variation and inheritance which may
influence the determination of a patient's chances for health or disease.
The number of genetic regions to be scored in the DNA of an individual
requires a large number of polymorphic probes and makes the use of
traditional Southern blotting impractical. However, a method that is
capable of amplifying 1000-bp stretches of DNA starting from two flanking
oligonucleotide primers and that requires DNA from only 150 cells of an
individual has been described recently as well as oligonucleotide probes
that can detect mutants in amplified DNA in dot blot hybridization [Saiki
et al., Science 239: 487 (1986)]. Both the method of ordering cosmid
libraries and the method of amplifying DNA use the work of Wallace for
conditions of hybridization that only allowed oligonucleotides of almost
perfect homology to their target DNA to hybridize at all [Wallace et al.,
Nucl. Acids Res. 6: 3543 (1979)]. Under these conditions, almost perfect
homology means that the perfect homology has to exist at least in the
central part of the hybridizing oligonucleotide/target duplex.
SUMMARY OF THE INVENTION
The present invention provides a new method of sequencing that is ideally
suited to the sequencing of large complex genomes because it avoids the
intensive manual labor involved in resolving gel fragments by size on
polyacrylamide gels. The present invention provides methods for sequencing
a target nucleic acid by hybridization of overlapping short
oligonucleotide probes of known or predicted sequence to the nucleic acid
target serially or simultaneously. The oligomer probes of a given size can
contain all or most existing combinations of nucleotides for complete
sequencing and a part of all possible variants for partial sequencing.
Probes can also be composed of oligomers of different sizes as well as
comprising all known combinations of nucleotides that are possible for
that size oligonucleotide. As the size of the probes that are used
decreases, hybridization conditions that are still able to distinguish
between mismatched and perfectly matched short oligonucleotides must be
used.
In one embodiment of the invention, multiple oligonucleotides that are 11
nucleotides long or longer are hybridized to the target sequence.
Hybridization occurs using conditions which are controlled and varied to
ensure discrimination between perfectly matched oligonucleotides and
oligonucleotides having a one base pair mismatch with the target sequence
where the mismatch is located at either one of two ends of the
oligonucleotide.
In another embodiment of the invention, as an alternative to the numerous
previous conditions, each specific for different sizes and sequences of
probes, a single set, or a few sets, of conditions is provided for all lengths
and sequences of probes. These hybridization conditions allow
discrimination between perfectly matched and mismatched oligonucleotides
that are as short as six nucleotides long. The conditions allow
discrimination between a perfectly matching oligonucleotide and one that
has a single base mismatch as compared to the target sequence, the
mismatch being located at one of the ends of the oligonucleotide.
Following the detection of hybridization of perfectly matched
oligonucleotides of known sequence, the sequence of the target nucleic
acid is generated by an algorithm using the principle of maximal
nonidentical overlap of probe.
In determining sequence by hybridization, oligonucleotides are prepared,
target fragments are prepared appropriate for the length of
oligonucleotide used for hybridization, and hybridization of the target
with all the oligonucleotides occurs under defined conditions that allow
discrimination in binding of perfectly matched complementary
oligonucleotides and mismatched oligonucleotides. The relationship of
probe size and target length is defined and allows complete sequencing of
genomes. The novel theoretical basis of the relationship between
oligonucleotide probe size and target length is described infra.
To determine the amount of hybridization data that is needed for sequence
determination, the number of target fragments that compose the entire
sequence is multiplied by the number of different oligonucleotides
required to define the sequence of the target fragment. The shorter the
size of the oligonucleotides that are hybridized, the more target
fragments that must be analyzed. Similarly, as the oligonucleotide size
increases, fewer target fragments must be examined.
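The multiplication described above can be written out directly. This is a rough sketch only: the fragment count used in the example is a hypothetical figure chosen for illustration, and the helper names are not taken from the specification. The size of a complete probe set, 4^n, follows from the four nucleotides available at each of n positions.

```python
def complete_probe_set_size(probe_length: int) -> int:
    """Number of distinct oligonucleotides of a given length (four bases per position)."""
    return 4 ** probe_length


def hybridization_data_points(num_target_fragments: int, num_probes: int) -> int:
    """Hybridization results needed when every fragment is scored against every probe."""
    return num_target_fragments * num_probes


# Hypothetical example: 1,000 target fragments scored against all 8-mers.
probes = complete_probe_set_size(8)               # 65,536 octamers
total = hybridization_data_points(1_000, probes)  # 65,536,000 data points
print(probes, total)
```

Lengthening the probe quadruples the probe set for each added nucleotide, which is offset by the smaller number of target fragments that must then be examined.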
Hybridization reactions can be performed in separate reaction vessels or by
binding one of the two components (oligomers and DNA fragments) to a solid
surface, like nylon filters etc. Since the described method does not
require macromolecular separation like gel-based sequencing methods, the
surface, bound with either an oligomer or nucleic acid fragment can have
microdimensions.
Some of the advantages of the method of the present invention include the
following:
(1) rapidity, resulting in time effectiveness; (2) elimination of
polyacrylamide gel electrophoresis and the intensive manual labor it
requires; (3) reliability of the predicted base within the determined
sequence due to the hybridization of multiple oligonucleotides to the same
base within a target sequence; (4) the possibility of substantial
miniaturization of the process; (5) ease of automation; (6) resulting cost
effectiveness.
3.1. DEFINITIONS
The following terms and abbreviations will have the meanings indicated:
______________________________________
A       adenine
bp      base pair
C       cytosine
G       guanine
IF      an M13 clone containing a 921 bp EcoRI-BglII human β1 interferon fragment
kD      kilodalton
ng      nanogram
nM      nanomolar
pmol    picomole
sc      subclone
SF      subfragment
SOH     short oligonucleotide hybridization
T       thymine
CCD     Charge Coupled Device
DNA     Deoxyribonucleic acid
DP      Discrete particle
HA      Hybridization area
LAR     Ligation-amplification reaction
ON      Oligonucleotide
ONP     Oligonucleotide probe
ONS     Oligonucleotide sequence
PCR     Polymerase chain reaction
RE      Restriction Enzyme
RFLP    Restriction fragment length polymorphism
RNA     Ribonucleic acid
SBH     Sequencing by hybridization
______________________________________
DESCRIPTION OF THE FIGURES
FIG. 1 shows methods of generating and ordering subfragments in sequencing
by hybridization.
FIG. 1A shows the sequence of a hypothetical clone for use in the
generation and ordering of subfragments in sequencing by hybridization,
wherein NNNNNNN represents the ends of the vector sequence. The sequences
AGTCCCT and TTGGCTG are the only oligonucleotides 7 bp or longer which are
repeated within the depicted sequence.
FIG. 1B shows the formation of subfragments. Assuming that the content of
8-mers for the depicted sequence is known, the 8-mers are ordered by
maximal overlap which, in the case of the illustrated example, is 7 bp.
Beginning at the 5' 8-mer (NNNNNNNC), ordering is unambiguous up to
gAGTCCCT which, at its 3' end, contains a repeated 7-mer. Large capital
letters denote overlapping sequences shared by different oligonucleotides,
while lower-case letters denote unshared bases. Both AGTCCCTc and AGTCCCTg
may be overlapped with gAGTCCCT, preventing further ordering. Each of the
two sequences serves as a starting point for new ordering (not shown).
Therefore, each repeated sequence of 7 bp or longer represents a branch
point. Unambiguous sequences are obtained between two consecutive branch
points only.
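The branch-point idea described for FIG. 1B can be sketched in a few lines. The sequence below is a hypothetical toy sequence, not the clone of FIG. 1A; it was built only so that it contains the two repeated 7-mers named above (AGTCCCT and TTGGCTG). The function names are illustrative assumptions.

```python
from collections import defaultdict


def kmers_of(sequence: str, k: int) -> set:
    """All distinct k-mers contained in a sequence (stands in for the positive probes)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def branch_points(kmers) -> dict:
    """Map each (k-1)-mer that starts more than one k-mer to its possible continuations."""
    successors = defaultdict(set)
    for kmer in kmers:
        successors[kmer[:-1]].add(kmer)
    return {prefix: sorted(nexts) for prefix, nexts in successors.items() if len(nexts) > 1}


# Hypothetical toy sequence embedding the repeated 7-mers AGTCCCT and TTGGCTG.
TOY_SEQUENCE = "CATGGAGTCCCTCGGGTTGGCTGAAACAGTCCCTGATTTGGCTGCAAT"

for prefix, continuations in sorted(branch_points(kmers_of(TOY_SEQUENCE, 8)).items()):
    print(prefix, "->", continuations)
```

Each reported 7-mer is followed by two different 8-mers, so ordering by 7 bp overlap cannot proceed past it without additional information.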
FIG. 1(c) is a listing of subfragments formed from 8-mers of the depicted
sequence. The subfragments are horizontally displayed to indicate overlap.
The orientation is 5' to 3' and end subfragments are identified.
FIG. 1(d) shows that the subfragments cannot be unambiguously ordered into
a starting sequence without additional information. Both arrangements
shown are possible since AGTCCCTcgggTTGGCTG and AGTCCCTgatTTGGCTG have the
same 7-mers at their 5' and 3' ends, respectively.
FIG. 1(e) demonstrates a means of building the sequence from oligonucleotide
blocks. The box on the left represents all 8-mer oligonucleotide sequences
which occur in a 15-base DNA molecule of unknown sequence (NNN . . . NNN).
The 8-mers may be ordered by 7-base overlap (right box). Each additional
hybridizing 8-mer extends the sequence of the starting 8-mer (ACCGTAAA) by
one base. Thus, the sequence is generated by uniquely overlapped oligomer
blocks.
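The block-building procedure of FIG. 1(e) can be sketched as follows. This is an illustrative sketch, not the computer process of the invention: the 15-base sequence is hypothetical (only its starting 8-mer, ACCGTAAA, comes from the figure description), and the function name `assemble_by_overlap` is an assumption. Extension stops when no unused k-mer, or more than one, overlaps the current end.

```python
def assemble_by_overlap(kmers, start=None) -> str:
    """Extend a starting k-mer one base at a time using unique (k-1)-base overlaps."""
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    by_prefix = {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:k - 1], []).append(kmer)

    if start is None:
        # The starting k-mer is the one whose first k-1 bases end no other k-mer.
        suffixes = {kmer[1:] for kmer in kmers}
        starts = [kmer for kmer in kmers if kmer[:k - 1] not in suffixes]
        if len(starts) != 1:
            raise ValueError("no unique starting k-mer; supply one explicitly")
        start = starts[0]

    sequence, current, used = start, start, {start}
    while True:
        candidates = [c for c in by_prefix.get(current[1:], []) if c not in used]
        if len(candidates) != 1:
            break  # end of the data, or a branch point
        current = candidates[0]
        used.add(current)
        sequence += current[-1]
    return sequence


# Hypothetical 15-base target; its eight 8-mers are supplied in arbitrary order.
POSITIVE_8MERS = ["AAGCTTGC", "CCGTAAAG", "ACCGTAAA", "TAAAGCTT",
                  "AGCTTGCA", "CGTAAAGC", "AAAGCTTG", "GTAAAGCT"]
print(assemble_by_overlap(POSITIVE_8MERS))
```

Each of the seven remaining 8-mers adds one base to the starting 8-mer, giving the 15-base sequence.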
FIG. 2 presents the average number of SFs (N.sub.sf) as a function of the
length of DNA fragment (L.sub.f) for various values of the length of the
overlapping sequence (N-1, in bp), or average distance of two consecutive
identical N-1 sequences in DNA subjected to sequencing by hybridization
(A.sub.o), in kb. The curves are obtained using equation one as described
below in section 5.2.
FIG. 3 describes the kinetic stability of a fully matched hybrid obtained
with a probe 8 nucleotides in length. Stability is expressed as a fraction
of the hybrid dissociated in unit time (minutes) as a function of
temperature. 1.4 pmol of NCATGAGCANN was applied to each dot and hybridized
with TGCTCATG as probe in a concentration of 4 nM. The equal amounts of
hybrid were incubated at the indicated temperatures for a short time in a
large volume of buffer and the remaining hybrid measured. Each point
represents the average value for four dots. The curve is computer fitted
with E_α = 47.3 kcal/mol obtained from the experimental points by
the least squares method.
FIG. 4 indicates the properties of short oligonucleotide hybridization. In
FIG. 4a, non-optimized discrimination with probes 6, 7, and 8 nucleotides
in length is illustrated. The probe GCTCAT was hybridized to the target
sequence NCATGAGCANN which contains the perfectly matching sequence
(underlined). The NNCATGAGTTN target sequence contains an end mismatch
(double underlined). 1.4 pM of each target was applied to the filter. The
probe GCTCATG, and the probe TGCTCATG were used against 50 ng of IF and
M13 DNA. The probe concentration was 4 nM.
In FIG. 4b, limits of signal detection are examined. The indicated volumes
of IF culture supernatants of average titer of 6×10^11 pfu/ml
were mixed with an equal volume of 1M NaOH, 3M NaCl and spotted on a
filter as described in (a) above. Hybridization was at 2°C with
TGCTCATG as the probe.
In FIG. 4c, the time course of hybridization at 13°C is shown. The
IF-M13 system was used with 50 ng of phage DNA per dot, and the probe was
TGCTCATG. The 3 hr IF dot contained 18020 cpm measured with 20%
efficiency.
In FIG. 5 the effect of the washing step on discrimination is indicated. In
FIG. 5a, inversion of the signal in IF-M13 pair upon washing is shown. 10
ng of IF and 500 ng M13 DNA were applied, and the probe was TGCTCATG. The
top row was not washed, the other rows were washed at 7, 13 and 25°C,
respectively, for the indicated times. A DNA control is included in the
top row also. Hybridization with the M13 specific probe AGCTGCTC measures
amounts of DNA in the two dots. In FIG. 5b, the change of discrimination
with time of washing at 0°C (full circles) and 13°C
(open circles) is depicted. 100 ng each of IF and M13 were applied to form
dots. The dots were hybridized to probe TGCTCATG and probe AGCTGCTC was
used in the control DNA hybridization (see top row, on the right, panel
a). The dots were then washed at the indicated temperatures. At each time
point the pairs of dots were removed and the ratio of radioactivity
remaining in each dot was measured. The discrimination, D, was
calculated as the mean value of the ratios for the duplicate pairs of
dots.
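The calculation of D described for FIG. 5b can be sketched as below. The counts are hypothetical values for illustration, not measurements from the figure, and the ratio is assumed to be taken as test dot (IF) over control dot (M13).

```python
from statistics import mean


def discrimination(dot_pairs) -> float:
    """Mean ratio of test-dot to control-dot radioactivity over duplicate pairs.

    dot_pairs: iterable of (test_cpm, control_cpm) tuples, one tuple per pair of dots.
    """
    return mean(test_cpm / control_cpm for test_cpm, control_cpm in dot_pairs)


# Hypothetical duplicate pairs at one time point, in cpm.
duplicate_pairs = [(1200, 100), (900, 100)]
print(discrimination(duplicate_pairs))
```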
FIG. 6 demonstrates the effects of complexity of target sequences on
discrimination. 50 ng each of IF and M13 were hybridized with the
indicated probes at a concentration of 4 nM. No wash was performed. The
number of matched and end base mismatched targets in IF and M13 is
indicated for each probe.
FIG. 7 examines an array of clones for the presence of an oligonucleotide
sequence. 51 recombinant plasmid DNAs (10±5 ng) were spotted in rows B
to H, columns 1 to 8 (except row H). Line A and column 9 contained control
DNAs of known sequence. Unknown clones were taken from human brain cDNA
library in Bluescript vector (BS) (Stratagene Cat. No. 935205). Controls of
known sequence in lines A1 to A8 and A9 to G9 are: IF(M13), M13, Alu(M13),
IF(BS), BS, 1M(pUC 9), pUC 9, 2M(pUC 9), respectively, except that in
the vertical row Alu(M13) was omitted. 1M and 2M are rat β-globin
gene subclones. The probe concentration was 8 nM. In FIG. 7a, BS specific
probe CTCCCTTT was also contained in IF and 2M inserts but not in M13 and
pUC vectors. In FIG. 7b, the sequence of probe CCAGTTTT was contained in
the IF insert but not in either vector. In FIG. 7c, the sequence of probe
GCCTTCTC was contained in the 1M insert only.
FIG. 8 shows the sequencing of 100 bp of the 921 bp human β1-interferon gene
fragment (IF) by hybridization.
Part 1. Hybridization results. A. Hybridization with 93 probes (72
octamers and 21 nonamers) with the full match in IF. IF and the control rat
globin clones pHEA and pHI were PCR amplified, while M13 mp18 and pUC18
were in linearized double-stranded form. Base-denatured DNA (20 ng of IF
and equimolar amounts of control DNA) was spotted on Gene Screen
membranes (N.E.N.). Hybridization was according to Drmanac et al.,
described in § 6 below. Briefly, end-labeled probes (3.3 pm, 10 mCi,
Amersham 3000 Ci/mmol) at a concentration of 10 ng/ml were hybridized at
12°C in 0.5 M Na2HPO4, pH 7.2, 7% Na-lauryl sarcosine
for 3 hours. All probes were made by Genesys, Inc., Houston. Hybrids were
washed in 6×SSC at 0°C for 40 minutes and autoradiographed for
4-48 hours. Test dot signal intensity, Hp, and discrimination, D, taken as the
ratio of signals of test over control dot, were visually estimated. For probes
34 and 74, dot radioactivity was measured in a scintillation counter: Hp
was 6,000 and 300 cpm, D was 20 and 4, and the film was exposed for 4 and 48
hr, respectively. B. Hybridization with 12 probes (11 octamers and 1
nonamer) which have an end mismatch in the IF fragment. Control DNAs having
single full match targets were pHEA for probes 97, 98 and 102; pUC18 for
probes 95, 100, 104 and 105; and M13 for probes 94, 96, 99, 101 and 103. Probes
104 and 105 have 3 end-mismatched targets in IF. Hybridization procedures
were as described in A. C. DNA Calibration. 1. and 2. IF | | |