Claims
I claim:
1. A method for simultaneous identification of overlapping cosmid clones
among multiple cosmid clones, comprising:
(a) arranging the multiple cosmid clones, whereby each clone may be
identified and replicas of said arrangement may be generated;
(b) pooling a first portion of the multiple cosmid clones and synthesizing
mixed end-specific RNA probes from the DNA inserts that have been prepared
from said pooled clones, wherein said portion includes less than all of
said multiple cosmid clones;
(c) hybridizing the probes to a replica of said arranged cosmid clones and
identifying the cosmid clones in the replica that hybridize to the probes,
wherein said identified clones include the pooled cosmid clones and cosmid
clones that contain DNA inserts that overlap with the DNA inserts in the
pooled clones;
(d) repeating said hybridization step with a second portion of mixed
end-specific probes that are prepared from a second pooled portion of
multiple cosmid clones; and
(e) identifying the cosmid clones in each replica to which both the probes
of steps b) and d) hybridize thereby identifying overlapping clones.
2. The method of claim 1, wherein said second portion includes at least one
clone that was present in the first portion.
3. The method of claim 1, wherein one clone from said first portion is
added to said second portion of pooled clones prior to preparing probes
therefrom, whereby the clones that hybridize to both portions of probes,
other than those that include the DNA inserts from which the probes were
synthesized, contain DNA inserts that overlap with said added clone from
the first portion.
4. The method of claim 1, further comprising:
(f) repeating said hybridization step with an additional portion of mixed
end-specific probes that are prepared from an additional portion of the
pooled multiple clones, wherein said additional portion includes at least
one clone that was present in the first and second portions, but does not
include any other clones that were previously pooled; and
(g) identifying the cosmid clones in the library to which the probes of
steps b), d), and f) hybridize, wherein the identified clones are other
than those which correspond to the pooled clones.
5. The method of claim 4, further comprising:
(h) repeating steps (f) and (g) a plurality of times until all of the
cosmid clones in the library have been pooled and hybridized to the
library.
6. The method of claim 1, wherein the arrangement is a two-dimensional
matrix and the clones are pooled pairwise according to the rows and
columns of a two-dimensional matrix.
7. The method of claim 1, wherein the arrangement is a three-dimensional
matrix and the clones are pooled according to intersecting planes of the
three-dimensional matrix, wherein following hybridization the replicas are
compared according to intersecting planes of the matrix.
8. The method of claim 7, wherein groups of three replicas produced by
hybridizing probes prepared from pooled clones according to three
intersecting planes are compared, whereby the clones on all three replicas
that hybridize to probes from each of the pooled clones include DNA that
overlaps with the clone that occurs at the intersection of the three
planes.
9. The method of claim 1, wherein said cosmid clones include sequence of
nucleotides flanking at least one end of the inserted DNA that serve as
promoters for the synthesis of the end-specific probes.
10. The method of claim 9, wherein said at least one of the flanking
sequences includes a sequence of nucleotides that is recognized as a
promoter by a bacteriophage polymerase, and that is positioned operatively
for transcription of the inserted DNA fragment.
11. The method of claim 10, wherein both flanking sequences include
sequences of nucleotides that are recognized as promoters by a
bacteriophage RNA polymerase, wherein said promoters are oppositely
oriented and positioned operatively for transcription of the inserted DNA
fragment.
12. The method of claim 11, wherein each of the bacteriophage RNA
polymerase-specific promoters is selected from the group consisting of
promoters specific for bacteriophage T7 RNA polymerase, and promoters
specific for bacteriophage T3 RNA polymerase.
13. The method of claim 9, wherein said cosmid clones are prepared by
inserting DNA fragments into the cloning sites of a cosmid vector selected
from the group consisting of pWE8, pWE10, pWE15, and pWE16.
14. The method of any one of claims 10 through 12, wherein said cosmid
clones include at least two cos sites.
15. The method of claim 9, wherein said cosmid clones are prepared by
inserting DNA fragments into the cloning sites of a cosmid vector selected
from the group consisting of sCOS-1, sCOS-2, and sCOS-4.
16. A method for physical mapping of complex genomes comprising:
(a) preparing a genomic library of cosmid clones by inserting DNA fragments
from said genome into cosmid vectors, wherein the cosmid vectors include
sequences of nucleotides that flank at least one end of the inserted DNA
and that serve as transcription initiation sites for the synthesis of
end-specific probes;
(b) arranging the cosmid clones, whereby each clone may be identified and
replicas of said arrangement may be generated;
(c) pooling portions of cosmid clones and synthesizing pools of mixed
end-specific probes from the DNA inserts that have been prepared from said
pooled clones, wherein each pool contains fewer than all of the cosmid
clones in the library but all of the cosmid clones in the library are
included in at least one pool;
(d) hybridizing each pool of probes to a replica of said arranged cosmid
clones and identifying the cosmid clones in each replica that hybridize to
the probes, wherein said identified clones include the pooled cosmid
clones and cosmid clones that contain DNA inserts that overlap with the
DNA inserts in the pooled clones;
(e) identifying, from among the cosmid clones identified in step (d), the
clones that hybridize to two or more pools of probes, thereby
identifying groups of cosmid clones that include overlapping DNA; and
(f) assembling contigs from said groups into a physical map of the genome
from which the library was derived.
17. The method of claim 16, wherein each portion includes at least one
common clone that was present in one of the other portions, whereby the
clones identified in step (e) contain DNA inserts that overlap with the
common clone.
18. The method of claim 16, wherein in step (e) the cosmid clones in each
replica that include clones that hybridize to two or more pools are
identified by comparing pairs of replicas produced by hybridizing pools
that include one clone in common.
19. The method of claim 16, wherein the location of each individual clone
in the replica is identified by unique coordinates that describe the
location of the clone in the replica.
20. The method of claim 16, wherein the arrangement is a matrix, and the
location of each clone in the matrix is uniquely identified by at least
two coordinates.
21. The method of claim 20, wherein the clones whose locations include one
or more common coordinates and at least one different coordinate are
pooled in step (c).
22. The method of claim 16, wherein said cosmid vectors contain at least
one promoter specific for a bacteriophage RNA polymerase and a cloning
site for the insertion of DNA fragments, wherein said promoter is
positioned operatively for transcription of a DNA fragment inserted into
said cloning site.
23. The method of claim 22, wherein said cosmid vectors contain two
oppositely oriented promoters, each of which is specific for a
bacteriophage RNA polymerase and is positioned operatively for
transcription of a DNA fragment inserted into said cloning site.
24. The method of claim 23, wherein each of said bacteriophage RNA
polymerase-specific promoters is selected from the group consisting of
promoters specific for bacteriophage T7 RNA polymerase, and promoters
specific for bacteriophage T3 RNA polymerase.
25. The method of claim 24, wherein said cosmid vector is selected from the
group consisting of pWE8, pWE10, pWE15, and pWE16.
26. The method of any one of claims 17 through 20, wherein said cosmid
vectors contain at least two cos sites.
27. The method of claim 26, wherein said cos sites are separated by unique
restriction sites.
28. The method of claim 27, wherein said cosmid vector is selected from the
group consisting of sCOS-1, sCOS-2, and sCOS-4.
Description
RELATED APPLICATIONS
This application is related to the patent application U.S. Ser. No.
039,509, filed Apr. 17, 1987 and its continuation-in-part application U.S.
Ser. No. 181,836, filed Apr. 15, 1988, both of which are expressly
incorporated herein by reference.
FIELD OF THE INVENTION
The present invention relates to a method of recombinant DNA technology.
More particularly, the invention concerns a process for physical mapping
of large complex genomes, including human chromosomes. The process
("multiplex analysis") is an alternative strategy for "bottom-up" mapping,
and depends on the use of cosmid vectors containing endogenous
bacteriophage promoters to allow for the identification of overlapping
clones by hybridization with RNA probes synthesized directly from the DNA
fragments inserted therein. Since the recognition of overlaps is not based
on pattern recognition, analysis may be carried out simultaneously on
cosmid clones grouped together.
BACKGROUND OF THE INVENTION
The complete analysis of large complex genomes, such as genomes of higher
eukaryotes, including human, requires the extensive isolation,
purification and analysis of large fragments of DNA by cloning, generally
in E. coli. In the past, the lambda bacteriophage cloning system has been
used most frequently to generate genomic libraries. The lambda
bacteriophage vectors usually accommodate inserts up to about 20 kb.
Presently the primary system used to clone and manipulate large DNA
fragments is that of cosmid vectors. Cosmid vectors allow the packaging of
DNA fragments of up to about 45 kb in plasmids containing bacteriophage
cos sites for in vitro packaging.
The analysis of complex genomes involves the application of both "top-down"
and "bottom-up" mapping strategies. The "top-down" strategy depends on the
separation on pulsed field gels of large DNA fragments generated using
rare restriction endonucleases for physical linkage of DNA markers and the
construction of long-range maps [Schwartz et al., Cell 37, 67 (1984);
Southern et al., Nucleic Acids Res. 15, 5925 (1987); Burke et al., Science
236, 806 (1987)]. The "bottom-up" strategy depends on identifying
overlapping sequences in a large number of randomly selected bacteriophage
or cosmid clones by unique restriction enzyme "fingerprinting" and their
assembly into overlapping sets of clones. "Top-down" mapping is inherently
more rapid and less labor intensive but does not generate sets of DNA
clones for further structural or biological analysis. "Bottom-up" mapping
generates the required sets of overlapping clones but application of
current strategies and pattern matching algorithms to mammalian genomes
will require the analysis of thousands to tens of thousands of individual
clones for the generation of complete maps.
In the past few years, "bottom-up" mapping strategies have been
successfully applied to generate complete or partial genomic maps of E.
coli, C. elegans and S. cerevisiae.
Olson et al., Proc. Natl. Acad. Sci. USA 83, 7826 (1986), fingerprinted
5000 randomly selected lambda clones containing inserts of about 15 kb of
genomic DNA from S. cerevisiae, by measuring the restriction fragment
lengths obtained upon double digestion with EcoRI and HindIII. They used a
pattern matching algorithm to construct overlapping sets of clones
(contigs) extending over about 60% of the S. cerevisiae genome.
Coulson et al., Proc. Natl. Acad. Sci. USA 83, 7821 (1986) adopted a
somewhat different methodology to construct a physical map of the genome
of Caenorhabditis elegans, a nematode having a genome of approximately
8 × 10^7 base pairs. They digested cosmid DNAs with the
restriction enzyme HindIII having a 6-bp specificity, filled the
5'-overhang with radioactive nucleotides, digested with the 4-bp specific
enzyme Sau3A, and determined the size of the labeled fragments by
electrophoresis in a sequencing gel followed by autoradiography. The mean
size of the DNA inserts in the cosmid vectors was about 34 kb. Eight
hundred sixty clusters of clones, totaling about 60% of the Caenorhabditis
elegans genome, have been characterized.
Kohara et al., Cell 50, 495 (1987) analyzed 1025 lambda phage clones
containing about 15.5-kb inserts of genomic E. coli DNA. For each clone
they constructed a complete restriction map by means of eight restriction
enzymes. The data for the 1025 clones were processed and sorted into 70
groups, including seven stand-alone clones, which together represent about
94% of the entire genome of E. coli.
While effective, the application of these "fingerprinting" and pattern
matching strategies to mammalian genomes would require the individual
analysis of tens or hundreds of thousands of clones for map construction
as well as highly efficient computer algorithms for pattern recognition.
Moreover, these and similar "fingerprinting" protocols require a substantial
overlap, of 5 to 25%, for the overlapping region to be detected.
A theoretical analysis of "fingerprinting techniques" has suggested that
the efficiency of the analysis is strongly dependent on the criteria used
to declare overlaps between clones. According to Lander et al., Genomics
2, 231 (1988), the minimum detectable overlap has a major effect on the
progress of the mapping project. Reducing the degree of overlap required
for detection would substantially decrease the number of the clones which
must be analyzed to obtain map closure.
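The effect described by Lander et al. can be made concrete with the expected-islands formula from that analysis. The sketch below is a minimal illustration, assuming the idealized model in which N clones of length L land at random on a genome of length G (coverage c = NL/G) and an overlap is recognized only when it exceeds a fraction θ of the clone length; the expected number of islands is then N·e^(-c(1-θ)). The target size, clone length and coverage used are illustrative assumptions, not data from this work.

```python
import math


def expected_islands(n_clones: int, clone_len: float, genome_len: float, theta: float) -> float:
    """Expected number of islands (contigs, counting singletons) in the
    Lander-Waterman model: N * exp(-c * (1 - theta)), with coverage c = N*L/G
    and theta the minimum detectable overlap as a fraction of clone length."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must be in [0, 1)")
    coverage = n_clones * clone_len / genome_len
    return n_clones * math.exp(-coverage * (1.0 - theta))


# Illustrative, assumed values: a 100 Mb target, 40 kb inserts, 5-fold coverage.
G = 100e6
L = 40e3
N = int(5 * G / L)  # 12,500 clones

for theta in (0.25, 0.05, 0.01):
    print(f"theta = {theta:.2f}: about {expected_islands(N, L, G, theta):.0f} islands")
```

With these assumed values, lowering θ from 0.25 to 0.05 reduces the expected number of islands by a factor of e^(5 × 0.20) = e, or about 2.7.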
Another way of detecting overlaps is the identification of overlapping
clones by hybridization with RNA probes instead of pattern recognition.
The identification of several bacteriophage-encoded RNA polymerases and
the sequencing of their promoters has spawned a new technology for
producing RNA probes. Cloning vectors are now available in which the
promoters for a single polymerase, or for two different polymerases, lie
adjacent to a cloning site. Transcription with any of the available
polymerases enables one to produce large quantities of high-specific-activity
activity RNA probes which correspond to either the coding or the
non-coding strands [Wahl et al., Methods in Enzymology 152, 572 (1987)].
Wahl et al., Proc. Natl. Acad. Sci. USA 84, 2160 (1987) (see also U.S. Ser.
No. 181,836 filed Apr. 15, 1988) have designed special cosmid vectors for
rapid genomic "walking" and restriction mapping. These vectors (designated
as pWE for "walking easily") contain the transcription promoters from
either bacteriophage SP6, T7, or T3 flanking a unique cloning site for the
insertion of genomic DNA fragments. These vectors allow the synthesis of
end-specific RNA probes directly from the DNA inserts, and are suitable
for the detection of overlapping regions of several hundred bp in
contiguous cosmids.
One practical limitation of cloning in cosmid vectors, including the above
pWE vectors, is that most vectors require the initial preparation of very
high-quality genomic DNA, digestion to an appropriate size range for cloning,
and the careful purification of appropriately sized DNA fragments on
gradients or gels [DiLella et al., Methods in Enzymology 152, 199 (1987)].
In the traditional cosmid cloning procedure, linearized cosmid vectors are
dephosphorylated to avoid concatamerization, prior to ligation to the DNA
fragments. Since the DNA inserts cannot be dephosphorylated, size
fractionation of the inserts is necessary to prevent recombinational rearrangements
caused by multiple inserts ligated into a single cosmid. For these
manipulations, a substantial quantity of genomic DNA is required to
construct a representative genomic library, and cosmid cloning has not
been practical in situations where only submicrogram amounts of DNA can be
isolated. Bates et al., Gene 26, 137 (1983) described cosmid vectors with
two cos sites separated by a blunt-end restriction enzyme site. They found
that the double cos-site vectors eliminate the need to prepare two
separate cosmid arms, and the internal blunt-end restriction site prevents
cosmid concatamerization. Thus, a double restriction enzyme digestion was
found to be sufficient to prepare a vector for subsequent ligation with
DNA fragments which were dephosphorylated to prevent their self-ligation.
This technique eliminated the need to purify insert DNA of the proper size
(30-45 kb).
Ehrich et al., Gene 57, 229 (1987), have shown that the use of cosmid
vectors with two or more cos sites simplifies the cloning procedure by
eliminating the complex preparation of cloning "arms".
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows the vector sCOS-1 designed for cosmid multiplex analysis. The
vector contains bacteriophage T3 and T7 promoters flanking a unique BamHI
cloning site, NotI sites for expedited restriction mapping and excision of
the insert DNA, duplicated cos sites for high efficiency microcloning, a
dominant selectable marker for transfection into mammalian cells, Amp and Kn
resistance genes, and ColE1 origin of replication.
FIG. 2 illustrates the construction of cosmid vector sCOS-1. Relevant
restriction sites in the precursor molecules are shown. ClaI-SalI and
ClaI-XhoI fragments were excised from pWE15 and pDVcos143 and purified on
agarose gels. The indicated fragments were joined using T4 DNA ligase and
coligation of the XhoI and SalI sites resulted in the loss of both sites
in the resulting plasmids.
FIG. 3 depicts the DNA sequences of the cloning site, bacteriophage
promoters and flanking restriction sites in sCOS vectors. Restriction
sites and T3 and T7 promoter sequences added using synthetic
oligonucleotides are shown. SfiI, NotI, EcoRI and SacII restriction sites
are indicated by thin lines. The direction of transcription using T3 or T7
polymerase is indicated by the arrows and the thick lines delineate the
critical nucleotides for promoter activity. The BamHI site is the cloning
site into which MboI digested genomic DNA is inserted. All linkers were
inserted by "linker-tailing" into the sites formed by digestion of sCOS-1
with EcoRI.
FIG. 4 illustrates the strategy for analysis of physical linkage using
groups of cosmids.
A. Cosmids prepared in vector sCOS-1 or one of its derivatives can be used
to synthesize end-specific probes for the detection of overlaps.
B. Cosmid clones are inoculated on the surface of a nitrocellulose or nylon
filter from 96-well archive plates stored at -70 °C. Each clone on
the "grid" is assigned a unique pair of Y- and X-axis coordinates.
Individual clones in the collection have the innate capacity to
generate probes specific for the extreme ends of the genomic DNA insert
and thus to detect overlapping clones on the filter. The arrows show the
locations of potential overlapping clones detected by hybridization of
probes generated from the clone at coordinates Y=2, X=7.
C. and D. To enable analysis of multiple clones simultaneously, cosmids are
pooled according to the rows of the matrix, DNA is prepared from each pool,
and a mixed RNA probe is synthesized. When hybridized to the matrix filter, the
probe detects a pattern of spots consisting of all of the template clones
and the collection of clones overlapping with one end of each of the
template clones. A similar procedure is carried out using cosmids pooled
according to the columns of the matrix. When the two data sets are compared,
hybridizing clones identified by both of the mixed probes may overlap
the template clone common to both sets, namely the clone
located at the intersection of the row and column. This procedure may then
be repeated using other combinations of pooled probes and either T7 or T3
polymerase. The arrows denote the location of a clone which overlaps with
the "T7 end" of the clone at coordinates Y=2, X=4.
FIG. 5 shows the cosmid multiplex analysis of a collection of cosmids
mapping to the long arm of human chromosome 11.
A. Multiplex analysis of human cosmid clones arrayed in a 36 × 36
matrix and hybridized with a mixed probe consisting of RNA transcripts
from the clones of one row of the matrix. A portion of the filter is shown.
B. A portion of the filter shown in A hybridized with a mixed probe
representing a pool of all cosmids aligned along a column of the matrix.
The arrow identifies a cosmid clone which hybridizes with both mixed
probes and is linked to the clone located at the intersection of the row
and column from which probe mixtures were prepared.
FIG. 6 shows predicted contigs from human chromosome 11q and restriction
enzyme digestion analysis.
A. Predicted linkage and orientation of a representative cosmid contig
generated by multiplex analysis of the chromosome 11q cosmid set and data
analysis using the computer program "Contig-maker". The computer output
indicates the coordinates of linked clones (X,Y) and the arrows denote the
orientation of the linkage.
B. Restriction map and location of probes used to establish unequivocal
overlap of the cosmids. A restriction map of the overlapping clones
detected in A was determined by the analysis of partial EcoRI digestion
products hybridized with ³²P-labeled T3 or T7 promoter-specific
oligonucleotides. Overlapping areas not confirmed by restriction map
analysis were confirmed by hybridization analysis using end-specific RNA
probes generated from individual cosmid clones. Cosmid clones c14,23 and
c19,27 are identical. □ indicates bacteriophage T3 promoter,
bacteriophage T7 promoter.
SUMMARY OF THE INVENTION
The present invention relates to a rapid and powerful method for
"bottom-up" mapping that is applicable to mammalian chromosomes and allows
for the simultaneous analysis of multiple cosmid clones for the detection
of overlaps. The method, called "cosmid multiplex analysis", depends on
the use of cosmid vectors allowing for the synthesis of corresponding RNA
sequences (probes) specific to the extreme ends of the DNA fragments
inserted therein, directly from the DNA inserts. In this way, rather than
depending on "fingerprinting" procedures for detection of overlapping
clones, cosmid libraries are constructed using vectors containing at least
one bacteriophage promoter adjacent to the genomic DNA insert, positioned
operatively for the transcription thereof. Preferably, the cosmid vectors
contain two bacteriophage promoters flanking the DNA fragment ligated into
the insertion site. Synthesis of an end-specific RNA probe from any clone
in the collection allows the overlapping clones to be easily detected by
hybridization. Because this strategy does not depend on pattern
recognition for detecting overlaps, analysis may be carried out
simultaneously on cosmid clones grouped together. The method is suitable
for the unambiguous detection of overlapping regions as small as several
hundred nucleotides in contiguous cosmids. Accordingly, the number of
clones needed for map closure can be reduced by up to three-fold. Finally,
this strategy represents essentially simultaneous cosmid "walking" and
thus is basically non-random, allowing the investigator the freedom to
pause and investigate some interesting biology rather than requiring
completion of the map before it becomes useful.
In one aspect, the present invention relates to a process for simultaneous
analysis of multiple cosmid clones, comprising:
(1) synthesizing mixed end-specific RNA sequences directly from DNA
templates prepared from groups of cosmid clones pooled together,
(2) hybridizing the mixed end-specific RNA sequences derived from
individual groups of cosmid clones to a replica of all cosmid clones to be
analyzed, whereby a data set of hybridization spots corresponding to all
of said DNA templates and the collection of DNAs overlapping with one end
of each of the DNA templates is identified, and
(3) identifying cross-hybridizing clones which are common to two or more
data sets.
In a preferred embodiment, the cross-hybridizing clones are identified by
pairwise comparison of data sets obtained from two groups of cosmid clones
containing at least one common clone. The cosmid clones are preferably
pooled according to the rows and columns of a two-dimensional matrix.
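The pairwise comparison of row and column data sets can be sketched as set operations on hybridization spots. In this minimal sketch, each clone is identified by its (Y, X) grid coordinates, the spots lit by a pooled probe are modeled as the template clones plus the clones their end probes detect, and the overlap table `detects` is entirely hypothetical; in practice the two spot sets would be read from two hybridized replica filters. The function names are illustrative only.

```python
from typing import Dict, Iterable, Set, Tuple

Coord = Tuple[int, int]  # (Y, X) coordinates of a clone on the gridded filter


def pool_signal(templates: Iterable[Coord], detects: Dict[Coord, Set[Coord]]) -> Set[Coord]:
    """Simulated spots for a mixed probe made from pooled template clones:
    every template lights up, plus every clone its end probe detects."""
    spots: Set[Coord] = set()
    for clone in templates:
        spots.add(clone)
        spots |= detects.get(clone, set())
    return spots


def decode_intersection(row_spots: Set[Coord], col_spots: Set[Coord],
                        y: int, x: int, n_rows: int, n_cols: int) -> Set[Coord]:
    """Candidate clones overlapping the clone at (y, x): spots seen with both the
    row-y probe and the column-x probe, minus the template clones themselves."""
    row = {(y, c) for c in range(n_cols)}
    col = {(r, x) for r in range(n_rows)}
    return (row_spots & col_spots) - row - col


# Hypothetical 4 x 4 grid and hypothetical overlaps (which clones each end probe detects).
N_ROWS, N_COLS = 4, 4
detects = {
    (2, 3): {(0, 1)},  # clone at Y=2, X=3 overlaps clone at Y=0, X=1
    (2, 1): {(1, 0)},  # lit only by the row-2 pool
    (1, 3): {(3, 2)},  # lit only by the column-3 pool
}
row_spots = pool_signal([(2, c) for c in range(N_COLS)], detects)
col_spots = pool_signal([(r, 3) for r in range(N_ROWS)], detects)
print(decode_intersection(row_spots, col_spots, 2, 3, N_ROWS, N_COLS))
```

For a 36 × 36 array, row and column pooling needs 36 + 36 = 72 mixed probes (for each polymerase used) to cover all 1,296 clones, instead of 1,296 single-clone probes. A clone found in both data sets only may overlap the intersection clone, so such candidates still call for confirmation, for example by restriction mapping.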
In a further aspect, the invention relates to a process for physical
mapping of complex genomes, comprising:
(1) generating a genomic library of clones in cosmid vectors allowing for
the synthesis of end-specific RNA sequences directly from at least one end
of a DNA fragment inserted therein,
(2) providing groups of cosmid clones pooled together,
(3) synthesizing mixed end-specific RNA sequences directly from DNA
templates prepared from said groups of cosmid clones,
(4) hybridizing the mixed end-specific RNA sequences derived from
individual groups of cosmid clones to a replica of all cosmid clones to be
analyzed, whereby a data set of hybridization spots corresponding to all
of said DNA templates and the collection of DNAs overlapping with one end
of each of the DNA templates is identified,
(5) identifying cross-hybridizing clones which are common to two or more
data sets, and
(6) assembling contigs of said cross-hybridizing clones.
In a preferred embodiment, the cosmid vectors used in the above processes
comprise two oppositely oriented promoters, each of which is specific for
a bacteriophage RNA polymerase, positioned on two sides of the cloning
site. Most preferably, the vectors contain T3 and T7 endogenous
bacteriophage promoters flanking the cloned genomic DNA. Vectors
containing at least two cos sites are particularly preferred, since they
allow the use of DNA fragments without prior size separation.
From the list of linked clones produced by this technique, contigs can be
assembled either manually or through computer analysis of the data.
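Once linked pairs have been identified, grouping them into contigs amounts to finding connected components of the linkage graph. The following is a minimal sketch using union-find, assuming links are given as pairs of (Y, X) grid coordinates; it only groups clones and does not order or orient them within a contig, and it is not the "Contig-maker" program referred to in connection with FIG. 6.

```python
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[int, int]  # (Y, X) grid coordinates of a clone


def assemble_contigs(links: Iterable[Tuple[Coord, Coord]]) -> List[List[Coord]]:
    """Group clones into contigs: clones joined directly or through a chain of
    detected overlaps end up in the same group (union-find)."""
    parent: Dict[Coord, Coord] = {}

    def find(clone: Coord) -> Coord:
        parent.setdefault(clone, clone)
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path halving
            clone = parent[clone]
        return clone

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups: Dict[Coord, List[Coord]] = {}
    for clone in list(parent):
        groups.setdefault(find(clone), []).append(clone)
    return sorted((sorted(members) for members in groups.values()), key=lambda g: g[0])


# Hypothetical linked pairs, as might be read off pairs of replica filters.
links = [((2, 7), (5, 1)), ((5, 1), (9, 4)), ((3, 3), (8, 8))]
print(assemble_contigs(links))
```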
DETAILED DESCRIPTION OF THE INVENTION
Definitions
"Genomic library" is a mixture of clones constructed by inserting fragments
of genomic DNA into a suitable vector. The term "library" implies the
existence of large numbers of different recombinants out of which only a
few are of immediate interest to the investigator.
The terms "cosmid" and "cosmid vector" and grammatical variations thereof,
are used interchangeably and refer to plasmid vectors that contain a
lambda bacteriophage cos (cohesive end) site. The lambda bacteriophage
packaging system selects DNA molecules of about the size of the lambda
genome (37-52 kb). Accordingly, plasmid recombinant DNA having a minimum
size of about 38 kb and a maximum size of about 52 kb (about 78% and about
105% of phage lambda, respectively), can be packaged in vitro in the
lambda phage coat. In addition to the cos site(s), cosmid vectors usually
contain a marker gene allowing for selection in bacteria (antibiotic
resistance gene), and one or more unique restriction sites for cloning.
Plasmids with a large variety of cloning sites and prokaryotic and
eukaryotic selection markers can be converted to cosmids by insertion of
the lambda cos region.
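The packaging limits above translate directly into an acceptable insert-size window for a given vector. This is a minimal sketch, assuming the approximate 38 to 52 kb limits stated above and treating the vector length as a hypothetical input, since vector sizes are not given in this section.

```python
from typing import Tuple

# Approximate packaging limits quoted in the text for recombinant DNA packaged
# in vitro into lambda phage heads (about 78% to 105% of the lambda genome).
MIN_PACKAGED_KB = 38.0
MAX_PACKAGED_KB = 52.0


def insert_size_window(vector_kb: float) -> Tuple[float, float]:
    """Range of insert sizes (kb) for which vector + insert stays within the
    packaging limits. vector_kb is the length of vector sequence packaged
    together with the insert (a hypothetical input here)."""
    if vector_kb <= 0:
        raise ValueError("vector length must be positive")
    max_insert = MAX_PACKAGED_KB - vector_kb
    if max_insert <= 0:
        raise ValueError("vector alone exceeds the packaging limit")
    min_insert = max(0.0, MIN_PACKAGED_KB - vector_kb)
    return min_insert, max_insert


def is_packageable(vector_kb: float, insert_kb: float) -> bool:
    """True if a recombinant of this total size falls inside the packaging limits."""
    return MIN_PACKAGED_KB <= vector_kb + insert_kb <= MAX_PACKAGED_KB


# Hypothetical 8 kb vector.
print(insert_size_window(8.0))
print(is_packageable(8.0, 40.0), is_packageable(8.0, 20.0))
```

For a hypothetical 8 kb vector the window is about 30 to 44 kb, in line with the 30-45 kb insert range mentioned in the background.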
The term "plasmid" refers to circular, double-stranded DNA loops which in
their vector form, are not bound to the chromosome.
As used herein, the term "a promoter specific for a bacteriophage RNA
polymerase" means a wild-type or non-wild-type promoter that can be used
by the bacteriophage RNA polymerase for in vitro transcription of a DNA
fragment. When a non-wild-type promoter is used for such in vitro
transcription of a DNA fragment, transcription will occur at a rate which
is at least 10% of the rate at which transcription would have occurred if
a wild-type or native promoter had been used by the bacteriophage RNA
polymerase to transcribe the DNA fragment in vitro.
The term "cloning site" as used herein means a restriction endonuclease site
on the DNA sequence of the cosmid vectors of the present invention where a
DNA fragment can be inserted without deleting any of the original DNA.
The term positioning a promoter "operatively for transcription of a DNA
fragment" as used herein, means that the promoter will be positioned in
such a way that any DNA sequences between the promoter's transcriptional
start site and the DNA fragment will not prevent transcription of at least
a portion of the DNA fragment by the promoter. The term "at least a
portion" means that preferably at least 8 bp and more preferably at least
about 30 bp of the DNA fragment will be transcribed.
The terms "end-specific RNA sequences", "RNA probes", and grammatical
variations thereof, are used to refer to hybridization probes obtained by
transcription of corresponding DNA fragments.
Clones are overlapping if they contain contiguous DNA in the same
relationship as that in the genome. One method for detecting overlaps is
to synthesize an RNA probe from one end of a first clone. If this probe
detectably hybridizes with an end of the second clone under standard
hybridization conditions, the two clones are overlapping [Wahl et al.,
PNAS USA 84, 2160 (1987)].
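The end-probe test for overlap can be modeled with genomic intervals. In this minimal sketch, each clone is a hypothetical [start, end) interval, an end-specific probe covers the first or last `probe_len` bases of the insert, and a probe is assumed to detect any other clone with which it shares at least `min_hybrid_bp` bases. Both thresholds are illustrative assumptions, since real detection depends on hybridization conditions and on repetitive sequence, and the "left"/"right" labels stand in for the T3 and T7 ends without implying which is which.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # genomic coordinates [start, end) of a cloned insert


def end_probe_regions(clone: Interval, probe_len: int) -> Dict[str, Interval]:
    """Regions transcribed from each end of an insert; 'left'/'right' is an
    arbitrary convention standing in for the two promoter ends."""
    start, end = clone
    return {"left": (start, min(end, start + probe_len)),
            "right": (max(start, end - probe_len), end)}


def shared_bp(a: Interval, b: Interval) -> int:
    """Number of bases two intervals have in common."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def detected_by(probe: Interval, library: Dict[str, Interval], source: str,
                min_hybrid_bp: int) -> List[str]:
    """Clones (other than the source clone) sharing at least min_hybrid_bp with the probe."""
    return sorted(name for name, iv in library.items()
                  if name != source and shared_bp(probe, iv) >= min_hybrid_bp)


# Hypothetical clones: A and B overlap by 500 bp; C lies elsewhere.
library = {"A": (0, 40_000), "B": (39_500, 78_000), "C": (100_000, 140_000)}
probes = end_probe_regions(library["A"], probe_len=1_000)
print(detected_by(probes["right"], library, "A", min_hybrid_bp=200))
print(detected_by(probes["left"], library, "A", min_hybrid_bp=200))
```

Here the 500 bp shared between the right end of clone A and clone B is enough to register an overlap, illustrating how overlaps of a few hundred base pairs can be detected without comparing restriction patterns.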
The term "contig" was introduced by Rodger Staden, Nucleic Acids Res. 8,
3673 (1980) in connection with DNA sequence analysis, and refers to groups
of clones with contiguous nucleotide sequences.
Materials and General Methods
Unless otherwise stated, the present invention was performed using standard
procedures, as described, for example in Maniatis et al., Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, N.Y., USA (1982); Davis et al., Basic Methods in Molecular
Biology, Elsevier Science Publishing, Inc., New York, USA (1986); or
Methods in Enzymology: Guide to Molecular Cloning Techniques Vol. 152, S.
L. Berger and A. R. Kimmel, Eds., Academic Press Inc., San Diego, USA
(1987).
Cell lines
TG 5D1-1 is a Friend cell line that was derived from somatic cell hybrid
5D1, which carried an intact human X chromosome and chromosome 11 [Pyati et
al., Proc. Natl. Acad. Sci. USA 77, 3435 (1980)], by selection for the loss
of the entire X chromosome and most of chromosome 11. TG 5D1-1 contains the
distal portion of chromosome 11 as the only human material in a mouse
genomic background [Maslen et al., Genomics 2, 66 (1988)]. Cytogenetic and
molecular analysis indicates that the human DNA represents about 1% of the
mouse genomic background [Maslen et al., supra].
Bacterial Strains
Cosmid vectors were replicated in E. coli strain DH5, a derivative of the
strongly recA⁻ strain DH1 (commercially available, e.g. from Bethesda
Laboratories, Gaithersburg, Md., USA), in AG1 (Stratagene Cloning Systems,
San Diego, Calif.) a derivative of DH5 selected for high packaging
efficiency, or in HB101 (commercially available, e.g. from Bethesda
Laboratories, Gaithersburg, Md., USA).
Cosmid Vectors
Genomic libraries were constructed in cosmid vector sCOS-1 illustrated in
FIG. 1. sCOS-1 was prepared from cosmid vectors pWE15 [Evans et al.,
Methods in Enzymology 152, 604 (1987) and U.S. Ser. No. 181,836; ATCC
Accession No. 37503, American Type Culture Collection (ATCC), 12301 Parklawn
Drive, Rockville, Md. 20852, U.S.A.] and pDVcos134 [a gift from J.
Reese, in wide circulation among scientists] as shown in FIG. 2. pWE15 DNA
was digested with ClaI and SalI, and the 6-kb ClaI-SalI restriction
fragment, lacking the cos sequence, was purified. Cosmid pDVcos134 was
digested with ClaI and XhoI and a restriction fragment containing the
duplicated cos region was purified on a low melting point agarose gel. The
purified fragments were ligated using T4 DNA ligase and transformed into
E. coli host strain DH5.
Other pWE plasmids suitable for genomic mapping according to the invention
are disclosed in Evans et al., Methods in Enzymology, supra, and U.S. Ser.
No. 181,836. Cosmid vector pWE16 has been deposited with the American Type
Culture Collection and has been accorded ATCC No. 37524.
Cosmids sCOS-2 and sCOS-4 are derivatives of sCOS-1 in which the cloning
site has been altered to substitute other rare restriction sites for the
NotI sites. Cosmid vector sCOS-2 was constructed by digesting sCOS-1 with
EcoRI and purifying the plasmid DNA away from the NotI-T3
promoter-BamHI-T7 promoter-NotI linker sequence by ethanol precipitation.
A 30-nucleotide double-stranded synthetic oligomer with EcoRI cohesive
ends, containing NotI-T3 promoter-BamHI-T7 promoter-SacII sequences, was
added by linker-tailing [Lathe et al., DNA 3, 173 (1984)]. sCOS-4 was
constructed by a similar procedure, adding a double-stranded synthetic
oligonucleotide containing EcoRI cohesive ends and a SfiI-T3
promoter-BamHI-T7 promoter-SfiI sequence. The linker and promoter
sequences at the cloning sites of the sCOS vectors are shown in FIG. 3.
Construction of Cosmid Libraries In sCOS Vectors
High-molecular-weight genomic DNA for cosmid cloning was prepared by
proteinase K digestion and gentle phenol extraction followed by dialysis
[DiLella et al., Methods in Enzymology 152, 199 (1987)]. The average
molecular size of the isolated DNA, determined by field inversion gel
electrophoresis [Carle et al., Science 232, 65 (1986)], ranged from about
500 kb to greater than 3 Mb. DNA was digested with MboI under conditions
recommended by the manufacturer, and the digestion was terminated by
phenol/chloroform extraction. Following digestion, the DNA was analyzed on
field inversion gels or 0.3% agarose gels to determine the average size of
the digestion products. For the construction of genomic libraries in
cosmid vector sCOS-1, genomic DNA was partially digested to an average
size of 100-120 kb and dephosphorylated with calf intestinal phosphatase.
The genomic DNA was not size-separated before cloning.
Vector cloning arms were prepared by first digesting purified sCOS vector
DNA with XbaI, followed by dephosphorylation with calf intestinal alkaline
phosphatase. The reaction was terminated by phenol/chloroform extraction,
and the DNA was collected by ethanol precipitation. The linearized,
dephosphorylated vector DNA was then digested with BamHI, extracted with
phenol/chloroform and stored at a concentration of 1 mg/ml in 20 mM
Tris-HCl, pH.6, 1 mM EDTA. Ligations were performed using 1 µg of vector
arms and 50 ng to 3 µg of genomic DNA. Reactions were incubated with 2
Weiss units of T4 DNA ligase, and the ligation products were packaged
using commercial in vitro packaging lysates. Bacteriophage lambda
packaging extracts may contain significant EcoK restriction activity. To
avoid underrepresentation in the library of mammalian sequences that
contain an EcoK site, genomic libraries are prepared using in vitro
packaging extracts that lack EcoK restriction activity (e.g., Gigapack
Gold; Stratagene Cloning Systems, San Diego, Calif.).
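To illustrate why EcoK activity matters, the Python sketch below counts EcoK (EcoKI) sites, whose recognition sequence is the asymmetric 5'-AAC(N6)GTGC-3'. Seven bases are specified and the site can lie on either strand, so random sequence contains about one site per 4^7/2 = 8,192 bp. A typical cosmid insert (40 kb is assumed here for illustration) would therefore carry roughly five sites. The function names are hypothetical.

```python
import re

# EcoK (EcoKI) recognizes the asymmetric site 5'-AAC(N6)GTGC-3'. Its reverse
# complement, GCAC(N6)GTT, marks a site on the bottom strand. Lookaheads
# allow overlapping matches to be reported.
ECOK_TOP = re.compile(r"(?=AAC[ACGT]{6}GTGC)")
ECOK_BOTTOM = re.compile(r"(?=GCAC[ACGT]{6}GTT)")


def ecok_sites(seq: str) -> list[int]:
    """0-based start positions of EcoK sites on either strand of `seq`."""
    s = seq.upper()
    hits = {m.start() for m in ECOK_TOP.finditer(s)}
    hits.update(m.start() for m in ECOK_BOTTOM.finditer(s))
    return sorted(hits)


def expected_sites(length_bp: int) -> float:
    """Expected EcoK sites in random sequence: 7 fixed bases, 2 orientations."""
    return length_bp * 2 / 4 ** 7


if __name__ == "__main__":
    print(ecok_sites("TTAACGGGCCCGTGCTT"))  # [2]
    print(f"{expected_sites(40_000):.1f} sites expected in a 40-kb insert")
```

Because unmodified mammalian DNA is not protected at these sites, nearly every insert would be at risk in an EcoK-containing extract, which is consistent with the choice of restriction-free packaging extracts described above.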
Cosmid libraries were plated directly on LB agar containing 25 µg/ml
kanamycin sulfate, and libraries were screened without further
amplification [Evans et al., Methods in Enzymology 152, 604 (1987)].
Libraries were stored as original non-amplified plate stocks in LB medium
with 15% glycerol at a concentration of 2.2 × 10¹¹ bacteria/ml at -70
degrees. The cosmid library used in the study described in the examples
consisted of 1.5 × 10⁷ independent clones.
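The representation of a library of this size can be estimated with the Clarke-Carbon relation P = 1 - (1 - f)^N, where N is the number of independent clones and f is the insert size as a fraction of the genome. The Python sketch below assumes, for illustration only, a 40-kb average insert and a 3 × 10⁹ bp genome; neither value is stated above, and the function names are hypothetical.

```python
import math

# Assumed values for illustration only (not stated in the source):
INSERT_BP = 40_000          # typical cosmid insert size
GENOME_BP = 3_000_000_000   # approximate size of a mammalian genome


def genome_equivalents(n_clones: int, insert_bp: float = INSERT_BP,
                       genome_bp: float = GENOME_BP) -> float:
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * insert_bp / genome_bp


def prob_represented(n_clones: int, insert_bp: float = INSERT_BP,
                     genome_bp: float = GENOME_BP) -> float:
    """Clarke-Carbon probability that a given sequence is in the library."""
    f = insert_bp / genome_bp
    # log1p keeps (1 - f) ** N accurate when f is tiny
    return 1.0 - math.exp(n_clones * math.log1p(-f))


def clones_needed(p: float, insert_bp: float = INSERT_BP,
                  genome_bp: float = GENOME_BP) -> int:
    """Smallest library giving probability p of containing a given sequence."""
    f = insert_bp / genome_bp
    return math.ceil(math.log1p(-p) / math.log1p(-f))


if __name__ == "__main__":
    n = 15_000_000  # 1.5 x 10^7 independent clones, as in the text
    print(f"{genome_equivalents(n):.0f} genome equivalents")
    print(f"{clones_needed(0.99):,} clones give 99% probability")
```

Under these assumptions a library of 1.5 × 10⁷ clones corresponds to about 200 genome equivalents, far more than the roughly 3.5 × 10⁵ clones needed for 99% representation.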
Selection of Human Clones from a Somatic Cell Hybrid Genomic Library
Cosmid libraries were plated on 570 cm² LB agar trays at a density of 10
clones/cm², replica filters were prepared, and the filters were hybridized
with human placental DNA labeled with ³²P-dCTP to a specific activity of
10⁸ cpm/µg. Under these hybridization conditions, no background
hybridization was detected against cosmids carrying mouse genomic DNA.
Cosmids containing human genomic DNA inserts were picked with toothpicks,
rescreened by hybridization to ³²P-labeled human DNA, and archived at -70
degrees in 96-well microtitre plates containing LB medium, 15% glycerol
and 25 µg/ml kanamycin sulfate. Individual clones isolated from cosmid
libraries were routinely grown and replicated, and their DNA prepared, in
standard round-bottom 96-well microtitre plates. Replica transfer of
clones between 96-well microtitre plates, and from archived plates to
screening filters, was carried out using an aluminum "hedgehog" made from
3-mm-diameter brass rods set in a plastic block, as described by Coulson
et al., supra (p. 7822), or using a laboratory robot (Beckman Biomek 1000).
Plating and Screening Libraries
For multiplex analysis, archived cosmids were inoculated onto the surface
of a nitrocellulose or nylon filter in a matrix or "grid" pattern. The
size and density of the grid were determined by the pattern of wells in a
standard 96-well microtitre plate; in the experiments described in the
examples, a 36 × 36 matrix was used. Before the bacterial cultures were
applied, a matrix pattern prepared on paper was transferred directly to
the filter membrane by passing the filter through a copying machine, and
the filter was then autoclaved. The clones were grown on the surface of
the filter at 37 degrees for 12 to 15 hours, and bacterial DNA was fixed
to the filter using a standard colony lysis procedure [Vogeli et al.,
Methods in Enzymology 152, 407 (1987)].
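Because every clone must be identifiable from its position on the filter, a fixed mapping between archive wells and grid coordinates is needed. The text does not state how plates were laid out within the 36 × 36 matrix (1,296 positions, or 13.5 plates), so the Python sketch below assumes a hypothetical row-by-row layout in archive order; the function names are likewise hypothetical.

```python
ROWS = "ABCDEFGH"   # microtitre plate rows
COLS = 12           # microtitre plate columns
WELLS = len(ROWS) * COLS
GRID = 36           # 36 x 36 filter matrix, as in the examples


def archive_index(plate: int, well: str) -> int:
    """0-based serial number of a clone from its (1-based plate, well) label."""
    row = ROWS.index(well[0].upper())  # raises ValueError for a bad row
    col = int(well[1:]) - 1
    if plate < 1 or not 0 <= col < COLS:
        raise ValueError(f"bad plate/well: {plate} {well}")
    return (plate - 1) * WELLS + row * COLS + col


def grid_position(index: int) -> tuple[int, int]:
    """(row, column) on the filter, filling the matrix row by row."""
    if not 0 <= index < GRID * GRID:
        raise ValueError("index outside the grid")
    return divmod(index, GRID)


def well_at(row: int, col: int) -> tuple[int, str]:
    """Inverse lookup: which archived clone sits at a grid coordinate."""
    plate, rem = divmod(row * GRID + col, WELLS)
    r, c = divmod(rem, COLS)
    return plate + 1, f"{ROWS[r]}{c + 1}"


if __name__ == "__main__":
    print(grid_position(archive_index(2, "A1")))   # (2, 24)
    print(well_at(35, 35))                          # (14, 'D12')
```

Any layout works for the method so long as the inverse lookup, from a spot on the autoradiograph back to an archived well, is unambiguous.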
RNA Probe Synthesis and Hybridization Reactions
Cosmids were transferred from archives to fresh 96-well plates containing
liquid LB medium with 25 µg/ml kanamycin sulfate and incubated at 37
degrees in a humidified atmosphere for 6 to 10 hours. Supernatants from
individual wells were pooled, and DNA was prepared using a previously
described cosmid miniprep procedure [Evans et al., Methods in Enzymology,
supra]. Cosmids constructed with vector sCOS-1, or one of its derivatives,
yield up to 2 µg of DNA from a 300 µl culture, and all probe synthesis and
mapping reactions were carried out with DNA prepared from minilysates. In
some cases, the pooled DNA was digested with a restriction endonuclease
such as BamHI or HindIII prior to probe synthesis. RNA probes were
synthesized as described in U.S. patent applications Ser. Nos. 039,509
and 181,836, using bacteriophage T3 or T7 RNA polymerase (Stratagene
Cloning Systems, San Diego, Calif., USA). Briefly, 1-2 µg of cosmid DNA
was transcribed with T7 or T3 RNA polymerase in a 20 µl reaction, as
described by Melton et al., Nucleic Acids Res. 12, 7035-7054 (1984),
using 50 µCi of [α-³²P]UTP and 12 µM unlabeled UTP. The polymerase
reactions were terminated by extraction with phenol and chloroform. One
hundred microliters of blocking mixture (sonicated human placental DNA and
cloned human repetitive sequences at a concentration of 1 mg/ml) was
added, and the probe mixture was precipitated with ethanol. The nucleic
acid was then resuspended in 20 µl of 5X SSPE, 0.1% SDS and prehybridized
for 5 minutes at 42 degrees to saturate repetitive sequences that might be
present in the probe. The probe was then added to a plastic bag containing
a replica of the matrix filter and hybridization buffer [5X SSPE, 50%
formamide, 0.2% SDS, 1× Denhardt's solution (D. Denhardt, Biochem.
Biophys. Res. Commun. 23, 641 (1966)), and 20 µg/ml salmon sperm DNA],
and hybridization was carried out for 12 to 18 hours. Filters were washed
in 0.1× SSPE, 0.1% SDS at 65 degrees and exposed to X-ray film for 2 to 8
hours.
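The expected outcome of one pooled hybridization can be modeled in a few lines. In the Python sketch below, each clone is represented by hypothetical insert coordinates on a chromosome map (unknown in practice), and each pooled clone contributes two end-specific transcripts covering an assumed 500 bp at either end of its insert. A clone on the replica filter scores positive if its insert shares sequence with any of these end regions, so the pooled clones always score along with the clones that overlap them. Repetitive sequences are ignored, on the assumption that the blocking mixture suppresses them. Names and parameters are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # hypothetical insert coordinates [start, end) in bp


def end_regions(clone: Interval, probe_len: int) -> List[Interval]:
    """Regions covered by the two end-specific (T3 and T7) transcripts."""
    start, end = clone
    return [(start, min(start + probe_len, end)), (max(end - probe_len, start), end)]


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def hybridizing_clones(clones: Dict[str, Interval], pool: Iterable[str],
                       probe_len: int = 500) -> Set[str]:
    """Clones on the replica filter expected to light up with a pooled probe.

    A clone scores positive if its insert shares sequence with either end
    region of any pooled clone; the pooled clones therefore always score.
    """
    regions = [r for name in pool for r in end_regions(clones[name], probe_len)]
    return {name for name, iv in clones.items()
            if any(overlaps(iv, r) for r in regions)}


if __name__ == "__main__":
    contig = {"A": (0, 40_000), "B": (30_000, 70_000),
              "C": (65_000, 105_000), "D": (200_000, 240_000)}
    print(sorted(hybridizing_clones(contig, ["A"])))  # ['A', 'B']
    print(sorted(hybridizing_clones(contig, ["C"])))  # ['B', 'C']
```

In this toy example clone B is detected by both the probe made from A and the probe made from C, which is the kind of shared signal the comparison of probe mixtures exploits.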
Restriction Enzyme Analysis
Restriction enzyme analysis of isolated cosmids was carried out using DNA
prepared from minilysates as follows:
DNA was isolated from 1.5 ml cultures. A culture was inoculated with a
single bacterial colony and incubated with vigorous shaking at 37 degrees
for 6 hours. DNA was prepared using a modified boiling procedure [Evans et
al., Methods in Enzymology 152, 604 (1987)]. Cells were collected by brief
(1 min) centrifugation in a microcentrifuge and resuspended in 300 µl of
STET buffer. Twenty microliters of freshly prepared lysozyme (10 mg/ml in
STET buffer) was added, and the mixture was vortexed and incubated in a
boiling water bath for 1 minute. The solution was immediately centrifuged
for 10 minutes in a microcentrifuge, and the gelatinous pellet was removed
with a toothpick and discarded. Isopropanol (325 µl) was added, and the
mixture was incubated at room temperature for 5 minutes. The precipitated
DNA was collected by centrifugation at room temperature in a
microcentrifuge, and the pellet was dried and resuspended in water.
DNA was digested to completion with NotI and partially with one or more
other enzymes (typically BamHI, EcoRI, HindIII, SacII, PvuII and KpnI),
separated on an agarose gel, transferred to a nitrocellulose filter, and
hybridized with ³²P-labeled oligonucleotides recognizing the T3 or T7
bacteriophage promoter. The T3 and T7 oligonucleotides (commercially
available as sequencing primers from Stratagene Cloning Systems, San
Diego, Calif., USA) were labeled using polynucleotide kinase and
[γ-³²P]ATP to a specific activity of 2 × 10⁸ cpm/µg. The labeled
oligonucleotides were hybridized to the filters in 6× SSC, 10% Denhardt's
solution for 12 hours at 42 degrees, and the filters were washed in 2×
SSC for 10 minutes at 50 degrees and exposed to X-ray film for 20 minutes
to 12 hours. Because each labeled band extends from one end of the insert
to a restriction site, the pattern of bands on the autoradiograph
indicates the distance from the cloning site to each restriction site,
much as in the "cos"-mapping procedure of Rackwitz et al., Gene 30, 195
(1984).
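The conversion of band sizes into a map can be sketched as follows. Each band detected with the T3 oligonucleotide gives the distance of a site from the T3 end; a band of d kb detected with the T7 oligonucleotide places a site at (insert length - d) kb from the T3 end; the full-length NotI fragment is discarded. This Python sketch assumes, as a simplification, that the NotI fragment ends at the promoters (ignoring the short vector sequence between each NotI site and promoter), and uses a hypothetical 0.5-kb tolerance to merge sites seen from both ends.

```python
from typing import Iterable, List


def sites_from_end(bands: Iterable[float], insert_kb: float, tol: float) -> List[float]:
    """Distances (kb) from one labeled end to each partial-digest site.

    Each labeled band shorter than the full-length NotI fragment ends at a
    restriction site; the full-length band marks the far end and is dropped.
    """
    return sorted(b for b in bands if b < insert_kb - tol)


def merge_end_maps(t3_bands: Iterable[float], t7_bands: Iterable[float],
                   insert_kb: float, tol: float = 0.5) -> List[float]:
    """Site positions measured from the T3 end, combining both end probes."""
    from_t3 = sites_from_end(t3_bands, insert_kb, tol)
    # A band of d kb seen with the T7 probe lies insert_kb - d from the T3 end.
    from_t7 = [insert_kb - d for d in sites_from_end(t7_bands, insert_kb, tol)]
    merged: List[float] = []
    for pos in sorted(from_t3 + from_t7):
        if merged and pos - merged[-1] <= tol:
            continue  # same site detected from both ends
        merged.append(pos)
    return merged


if __name__ == "__main__":
    # Hypothetical 40-kb insert with sites 8, 19 and 31 kb from the T3 end
    print(merge_end_maps([8, 19, 31, 40], [9, 21, 32, 40], insert_kb=40))
```

Reading the map from both ends provides an internal check: sites detected from both ends should agree within the resolution of the gel.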
Alternatively, programmed automatic restriction enzyme digestions were
carried out to completion in 96-well microtitre plates using a laboratory
robot (Beckman Biomek 1000).
Data Analysis
The resulting hybridization data were entered manually into a computer
file and analyzed using two computer programs written by G. A. Evans in
Turbo Pascal (Borland International) running on Apple Macintosh II or
Macintosh SE computers. One program, "Multiplex-mapper," compared data
sets from hybridization reactions using mixed probes, determined which
clones were identified by more than one probe mixture, and produced a
list of linked clones. A second program, "Contig-maker," assembled the
list of overlapping clones into potential contigs that could be analyzed
in greater detail. In some cases, the orientation and overlap of
individual cosmid clones in a contig were confirmed by detailed
restriction mapping and hybridization analysis of the individual clones.
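The comparison step can be illustrated with a short Python sketch; it is not the Turbo Pascal program described above. Each hybridization is recorded as the set of clones pooled to make the probe and the set of clones that scored positive. The first function lists clones identified by more than one probe mixture. The second applies the pooling design in which a single clone is shared by two pools: clones positive with both mixtures, other than the pooled clones themselves, are taken to overlap the shared clone. This inference assumes that no clone coincidentally overlaps different members of the two pools. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# One hybridization: (clones pooled to make the probe, clones that lit up)
Mixture = Tuple[Set[str], Set[str]]


def clones_hit_by_multiple(data: Dict[str, Mixture]) -> Dict[str, List[str]]:
    """Clones identified by more than one probe mixture, with those mixtures."""
    hits: Dict[str, List[str]] = defaultdict(list)
    for name, (_, positives) in data.items():
        for clone in positives:
            hits[clone].append(name)
    return {c: sorted(m) for c, m in hits.items() if len(m) > 1}


def links_via_shared_clone(first: Mixture, second: Mixture) -> List[Tuple[str, str]]:
    """Pairs (shared clone, overlapping clone) from two mixtures sharing one clone.

    Clones positive with both mixtures, other than the pooled clones
    themselves, are taken to overlap the single clone common to both pools.
    """
    pool1, pos1 = first
    pool2, pos2 = second
    shared = pool1 & pool2
    if len(shared) != 1:
        raise ValueError("the two pools must share exactly one clone")
    (anchor,) = shared
    return sorted((anchor, c) for c in (pos1 & pos2) - pool1 - pool2)


if __name__ == "__main__":
    data = {
        "mix1": ({"A", "E", "F"}, {"A", "E", "F", "B", "G"}),
        "mix2": ({"A", "H", "I"}, {"A", "H", "I", "B", "J"}),
    }
    print(clones_hit_by_multiple(data))
    print(links_via_shared_clone(data["mix1"], data["mix2"]))  # [('A', 'B')]
```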
Although data analysis was performed using the above-mentioned programs, a
person of ordinary skill in the art could readily compare the data and
assemble the overlapping clones into contigs using other software. Manual
data comparison and contig assembly are also possible, though more
laborious.
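As one example of such other software, contig assembly can be treated as finding connected components in a graph whose nodes are clones and whose edges are detected overlaps. The Python sketch below uses a union-find structure for this; it is an illustrative substitute, not the "Contig-maker" program, and its names are hypothetical. It groups clones into contigs but does not determine their order or orientation, which, as noted above, may be confirmed by restriction mapping and hybridization analysis.

```python
from typing import Dict, Iterable, List, Tuple


def assemble_contigs(links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group clones into contigs: connected components of the overlap graph."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups: Dict[str, List[str]] = {}
    for clone in list(parent):
        groups.setdefault(find(clone), []).append(clone)
    # Largest contigs first; ties broken alphabetically
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


if __name__ == "__main__":
    print(assemble_contigs([("A", "B"), ("B", "C"), ("D", "E")]))
    # [['A', 'B', 'C'], ['D', 'E']]
```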
LB medium: 10 g Bacto-tryptone, 5 g yeast extract, 5 g NaCl per liter of
water. Autoclave.
LB agar: LB medium containing 1.2% Bacto-agar. Autoclave.
STET buffer: 50 mM Tris-HCl, pH 8.0, 8% sucrose, 5% Triton X-100, 50 mM
EDTA
Denhardt's solution: 0.2% Ficoll, 0.2% polyvinylpyrrolidone, 0.2% bovine
serum albumin
Abbreviations
SDS: sodium dodecyl sulfate
SSPE: saline sodium phosphate EDTA
SSC: saline sodium citrate
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention relates to a new approach to "bottom-up" genomic
mapping using cosmid clones. It has been found that the speed and
efficiency of "bottom-up" genomic mapping can be significantly improved
by 1) isolating restricted regions of large mammalian genomes in a
"sublibrary" preorganized on a solid matrix, 2) detecting overlapping
clones in the collection by hybridization of end-specific probes, rather
than by "fingerprinting" followed by pattern recognition, and 3)
analyzing multiple clones simultaneously to detect all overlaps in the
collection.
According to the invention, genomic mapping with cosmid vectors
essentially follows the strategy illustrated in FIG. 4.
In a first step, a genomic library which represents a limited portion of a