FIELD OF THE INVENTION
The invention relates generally to methods for sequencing polynucleotides,
and more particularly, to a method of sorting and sequencing many
polynucleotides simultaneously.
BACKGROUND
The desire to decode the human genome and to understand the genetic basis
of disease and a host of other physiological states associated with
differential gene expression has been a key driving force in the
development of improved methods for analyzing and sequencing DNA, Adams et
al, Editors, Automated DNA Sequencing and Analysis (Academic Press, New
York, 1994). Current genome sequencing projects use Sanger-based
sequencing technologies, which enable the sequencing and assembly of a
genome of 1.8 million bases with about 24 man-months of effort, e.g.
Fleischmann et al, Science, 269: 496-512 (1995). Such a genome is about
0.005 the size of the human genome, which is estimated to contain about
10.sup.5 genes, 15% of which--or about 3 megabases--are active in any
given tissue. The large numbers of expressed genes make it difficult to
track changes in expression patterns by sequence analysis. More commonly,
expression patterns are initially analyzed by lower resolution techniques,
such as differential display, indexing, subtraction hybridization, or one
of the numerous DNA fingerprinting techniques, e.g. Liang et al, Science,
257: 967-971 (1992); Erlander et al, International patent application
PCT/US94/13041; McClelland et al, U.S. Pat. No. 5,437,975; Unrau et al,
Gene, 145: 163-169 (1994); and the like. Sequence analysis is then
frequently carried out on subsets of cDNA clones identified by application
of such techniques, e.g. Linskens et al, Nucleic Acids Research, 23:
3244-3251 (1995). Such subsequent analysis is invariably carried out using
conventional Sanger sequencing of randomly selected clones from a subset;
thus, the scale of the analysis is limited by the Sanger sequencing
technique.
Recently, two techniques have been reported that attempt to provide direct
sequence information for analyzing patterns of gene expression, Schena et
al, Science, 270: 467-469 (1995) (hybridizing mRNA to 45 expressed
sequence tags attached to a glass slide) and Velculescu et al, Science,
270: 484-486 (1995) (excision and concatenation of short tags adjacent to
type IIs restriction sites in sequences from a cDNA library, followed by
Sanger sequencing of the concatenated tags). However, implementation of
these techniques has only involved relatively few sequences (45 and 30,
respectively), so it is not clear whether they have the capability to track
a more meaningful sample of expressed genes, e.g. Kollner et al, Genomics,
23: 185-191 (1994). Without substantially larger sample sizes, the
techniques will not be able to track changes in the transcript levels of
low-expression genes.
It is clear from the above that there is a crucial need for higher
throughput sequencing techniques that can both reduce the time and effort
required to analyze genome-sized DNAs and be applied to the
analysis of large samples of sequences from complex mixtures of
polynucleotides, such as cDNA libraries. The availability of such
techniques would find immediate application in medical and scientific
research, drug discovery, diagnosis, forensic analysis, food science,
genetic identification, veterinary science, and a host of other fields.
SUMMARY OF THE INVENTION
An object of my invention is to provide a new method and approach for
determining the sequence of polynucleotides.
Another object of my invention is to provide a method for rapidly analyzing
patterns of gene expression in normal and diseased tissues and cells.
A further object of my invention is to provide a method, kits, and
apparatus for simultaneously analyzing and/or sequencing a population of
many thousands of different polynucleotides, such as a sample of
polynucleotides from a cDNA library or a sample of fragments from a
segment of genomic DNA.
Still another object of my invention is to provide a method, kits, and
apparatus for identifying populations of polynucleotides.
Another object of my invention is to provide a method for sequencing
segments of DNA in a size range corresponding to typical cosmid or YAC
inserts.
My invention achieves these and other objectives by providing each
polynucleotide of a population with an oligonucleotide tag for transferring
sequence information to a tag complement on a spatially addressable array of
such complements. That is, each polynucleotide of a population is given a
unique tag, which can be copied and used to shuttle sequence
information to its complement at a fixed position on an array of such
complements. After a tag hybridizes with its complement, a signal is
generated that is indicative of the transferred sequence information.
Sequences of the tagged polynucleotides are determined by repeated cycles
of information transfer and signal detection at the positions of the
corresponding tag complements.
At least two major advantages are gained by using tags to shuttle
information to discrete spatial locations rather than sorting an entire
population of target polynucleotides to such locations: First, tags are
much smaller molecular entities so that the kinetics of diffusion and
hybridization are much more favorable. Second, tag loading at the
spatially discrete locations only need be sufficient for detection, while
target polynucleotide loading would need to be sufficient for both
biochemical processing and detection; thus, far less tag needs to be
loaded on the spatially discrete sites.
An important aspect of my invention is the attachment of an oligonucleotide
tag to each polynucleotide of a population such that substantially all
different polynucleotides have different tags. As explained more fully
below, this is achieved by taking a sample of a full ensemble of
tag-polynucleotide conjugates wherein each tag has an equal probability of
being attached to any polynucleotide. The sampling step ensures that the
tag-polynucleotide conjugate population will fulfill the above-stated
condition that the tag of any polynucleotide of such population be
substantially unique.
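The effect of the sampling step can be illustrated with a short calculation. The following Python sketch is not part of the specification; it assumes that each conjugate's tag is drawn uniformly and independently from the repertoire, and the function name and the repertoire and sample sizes are hypothetical values chosen only for illustration.

```python
def fraction_uniquely_tagged(repertoire_size: int, sample_size: int) -> float:
    """Probability that a given conjugate in the sample carries a tag that no
    other conjugate in the sample carries, assuming (as an illustration only)
    that tags are assigned uniformly and independently."""
    return (1.0 - 1.0 / repertoire_size) ** (sample_size - 1)


# Hypothetical numbers: a repertoire of 8**9 tags, sampled at 1% of its size.
tags_available = 8 ** 9
conjugates_sampled = tags_available // 100
print(fraction_uniquely_tagged(tags_available, conjugates_sampled))
```

Under these assumptions, the smaller the sample is relative to the repertoire, the closer the fraction of uniquely tagged polynucleotides comes to one.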
Oligonucleotide tags employed in the invention are capable of hybridizing
to complementary oligomeric compounds consisting of subunits having
enhanced binding strength and specificity as compared to natural
oligonucleotides. Such complementary oligomeric compounds are referred to
herein as "tag complements." Subunits of tag complements may consist of
monomers of non-natural nucleotide analogs or they may comprise oligomers
having lengths in the range of 3 to 6 nucleotides or analogs thereof, the
oligomers being selected from a minimally cross-hybridizing set. In such a
set, a duplex made up of an oligomer of the set and the complement of any
other oligomer of the set contains at least two mismatches. In other
words, an oligomer of a minimally cross-hybridizing set at best forms a
duplex having at least two mismatches with the complement of any other
oligomer of the same set. The number of oligonucleotide tags available in
a particular embodiment depends on the number of subunits per tag and on
the length of the subunit, when the subunit is an oligomer from a
minimally cross-hybridizing set. In the latter case, the number is
generally much less than the number of all possible sequences of the same
length as the tag, which for a tag n nucleotides long would be 4.sup.n. Preferred
monomers for complements include peptide nucleic acid monomers and
nucleoside phosphoramidates having a 3'-NHP(O)(O-)O-5' linkage with its
adjacent nucleoside. The latter compounds are referred to herein as
N3'.fwdarw.P5' phosphoramidates. Preferably, both the oligonucleotide tags
and their tag complements comprise a plurality of subunits selected from a
minimally cross-hybridizing set consisting of natural oligonucleotides of
3 to 6 nucleotides in length.
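The dependence of the number of available tags on the number of subunits per tag can be made concrete with a small calculation. The Python sketch below is illustrative only; the set size of eight words, the subunit length of four, and the eight subunits per tag are hypothetical values, and the function names are not taken from the specification.

```python
def repertoire_size(words_in_set: int, subunits_per_tag: int) -> int:
    """Number of distinct tags when each subunit position is filled
    independently with any word of the minimally cross-hybridizing set."""
    return words_in_set ** subunits_per_tag


def all_sequences(tag_length: int) -> int:
    """Number of all possible sequences of natural nucleotides of this length."""
    return 4 ** tag_length


# Hypothetical example: 8 words of 4 nucleotides each, 8 subunits per tag.
words_in_set, subunit_length, subunits_per_tag = 8, 4, 8
tag_length = subunit_length * subunits_per_tag

print(repertoire_size(words_in_set, subunits_per_tag))  # tags built from the set
print(all_sequences(tag_length))                         # all sequences of that length
```

The first number is many orders of magnitude smaller than the second, which is the price paid for guaranteeing mismatches between every tag and every non-matching complement.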
Generally, the method of my invention is carried out by the following
steps: (a) attaching an oligonucleotide tag from a repertoire of tags to
each polynucleotide of a population to form tag-polynucleotide conjugates
such that substantially all different polynucleotides have different
oligonucleotide tags attached; (b) labeling each tag according to the
identity of one or more terminal nucleotides of its associated
polynucleotide; (c) cleaving the tags from the tag-polynucleotide
conjugates; and (d) sorting the labeled tags onto a spatially addressable
array of tag complements for detection. Preferably, the identity of the
one or more terminal nucleotides is determined by selectively amplifying
correct sequence primers in a polymerase chain reaction (PCR) employing
primers whose 3' terminal sequences are complementary to every possible
sequence of the one or more terminal nucleotides whose identity is sought.
Thus, when the identity of a single terminal nucleotide is sought, four
separate polymerase chain reactions may be carried out with one primer
identical in each of the four reactions, but with each of the other four
primers having a 3' terminal nucleotide that is either A, C, G, or T. As
used herein, this terminal nucleotide is referred to as a defined 3'
terminal nucleotide. The 3' terminal nucleotide is positioned so that it
must be complementary to the terminal nucleotide of the target
polynucleotide for amplification to occur. Thus, the identity of the
primer in a successful amplification gives the identity of the terminal
nucleotide of the target sequence. This information is then extracted in
parallel from the population of target polynucleotides by detaching the
amplified tags and sorting them onto their tag complements on a spatially
addressable array. By repeating this process for successive nucleotides
the sequences of a population of target polynucleotides are determined in
parallel.
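The logic of steps (a) through (d), repeated over successive nucleotides, can be sketched as a toy simulation in Python. This is a deliberately simplified model and not the claimed biochemistry: it assumes perfectly specific amplification, treats the "terminal nucleotide" as the first character of a string, and shortens each target by one nucleotide per cycle. The names read_by_cycles and conjugates, and the example tags and sequences, are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_by_cycles(conjugates, cycles):
    """conjugates maps a tag name to its target sequence (toy model)."""
    reads = {tag: "" for tag in conjugates}
    remaining = dict(conjugates)
    for _ in range(cycles):
        # Identification: one reaction per defined 3' terminal nucleotide.
        for primer_base in "ACGT":
            for tag, target in remaining.items():
                # Amplification is modeled as succeeding only when the primer's
                # defined nucleotide is complementary to the terminal nucleotide
                # of the target; the tag then reports the inferred base.
                if target and COMPLEMENT[primer_base] == target[0]:
                    reads[tag] += COMPLEMENT[primer_base]
        # Shortening: remove the nucleotide that was just identified.
        remaining = {tag: target[1:] for tag, target in remaining.items()}
    return reads


print(read_by_cycles({"tag1": "GATC", "tag2": "TTAG"}, cycles=3))
```

Because every tag reports independently, the work per cycle is the same four reactions regardless of how many tagged polynucleotides are present.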
My invention provides a readily automated system for obtaining sequence
information from large numbers of target polynucleotides at the same time.
My invention is particularly useful in operations requiring the generation
of massive amounts of sequence information, such as large-scale sequencing
of genomic DNA fragments, mRNA and/or cDNA fingerprinting, and highly
resolved measurements of gene expression patterns.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flow chart illustrating a general algorithm for generating
minimally cross-hybridizing sets.
FIG. 2 illustrates the use of S and T primers in one embodiment of the
invention.
FIG. 3 diagrammatically illustrates an apparatus for detecting labeled tags
on a spatially addressable array of tag complements.
DEFINITIONS
"Complement" or "tag complement" as used herein in reference to
oligonucleotide tags refers to an oligonucleotide to which an
oligonucleotide tag specifically hybridizes to form a perfectly matched
duplex or triplex. In embodiments where specific hybridization results in
a triplex, the oligonucleotide tag may be selected to be either double
stranded or single stranded. Thus, where triplexes are formed, the term
"complement" is meant to encompass either a double stranded complement of
a single stranded oligonucleotide tag or a single stranded complement of a
double stranded oligonucleotide tag.
The term "oligonucleotide" as used herein includes linear oligomers of
natural or modified monomers or linkages, including deoxyribonucleosides,
ribonucleosides, α-anomeric forms thereof, peptide nucleic acids (PNAs),
and the like, capable of specifically binding to a target polynucleotide
by way of a regular pattern of monomer-to-monomer interactions, such as
Watson-Crick type of base pairing, base stacking, Hoogsteen or reverse
Hoogsteen types of base pairing, or the like. Usually monomers are linked
by phosphodiester bonds or analogs thereof to form oligonucleotides
ranging in size from a few monomeric units, e.g. 3-4, to several tens of
monomeric units. Whenever an oligonucleotide is represented by a sequence
of letters, such as "ATGCCTG," it will be understood that the nucleotides
are in 5'.fwdarw.3' order from left to right and that "A" denotes
deoxyadenosine, "C" denotes deoxycytidine, "G" denotes deoxyguanosine, and
"T" denotes thymidine, unless otherwise noted. Analogs of phosphodiester
linkages include phosphorothioate, phosphorodithioate, phosphoranilidate,
phosphoramidate, and the like. It is clear to those skilled in the art
when oligonucleotides having natural or non-natural nucleotides may be
employed; e.g. where processing by enzymes is called for,
oligonucleotides consisting of natural nucleotides are usually required.
"Perfectly matched" in reference to a duplex means that the poly- or
oligonucleotide strands making up the duplex form a double stranded
structure with one another such that every nucleotide in each strand
undergoes Watson-Crick basepairing with a nucleotide in the other strand.
The term also comprehends the pairing of nucleoside analogs, such as
deoxyinosine, nucleosides with 2-aminopurine bases, and the like, that may
be employed. In reference to a triplex, the term means that the triplex
consists of a perfectly matched duplex and a third strand in which every
nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a
basepair of the perfectly matched duplex. Conversely, a "mismatch" in a
duplex between a tag and an oligonucleotide means that a pair or triplet
of nucleotides in the duplex or triplex fails to undergo Watson-Crick
and/or Hoogsteen and/or reverse Hoogsteen bonding.
As used herein, "nucleoside" includes the natural nucleosides, including
2'-deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker,
DNA Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in
reference to nucleosides includes synthetic nucleosides having modified
base moieties and/or modified sugar moieties, e.g. described by Scheit,
Nucleotide Analogs (John Wiley, New York, 1980); Uhlman and Peyman,
Chemical Reviews, 90: 543-584 (1990), or the like, with the only proviso
that they are capable of specific hybridization. Such analogs include
synthetic nucleosides designed to enhance binding properties, reduce
degeneracy, increase specificity, and the like.
DETAILED DESCRIPTION OF THE INVENTION
The invention provides a method of sequencing large numbers of
polynucleotides in parallel by using oligonucleotide tags to shuttle
sequence information obtained in "bulk" or solution phase biochemical
processes to discrete spatially addressable sites on a solid phase.
Signals generated at the spatially addressable sites convey the sequence
information carried by the oligonucleotide tag. As explained more fully
below, sequencing is preferably carried out by alternating cycles of
identifying nucleotides and shortening the target polynucleotides. In the
shortening cycles, a predetermined number of previously identified
nucleotides are cleaved from the target polynucleotides and the shortened
polynucleotides are employed in the next cycle of nucleotide
identification.
In one aspect, the oligonucleotide tags of the invention comprise a
plurality of "words" or subunits selected from minimally cross-hybridizing
sets of subunits. Subunits of such sets cannot form a duplex or triplex
with the complement of another subunit of the same set with fewer than two
mismatched nucleotides. Thus, the sequences of any two oligonucleotide
tags of a repertoire that form duplexes will never be "closer" than
differing by two nucleotides. In particular embodiments, sequences of any
two oligonucleotide tags of a repertoire can be even "further" apart, e.g.
by designing a minimally cross-hybridizing set such that subunits cannot
form a duplex with the complement of another subunit of the same set with
fewer than three mismatched nucleotides, and so on. Usually,
oligonucleotide tags of the invention and their complements are oligomers
of the natural nucleotides so that they may be conveniently processed by
enzymes, such as ligases, polymerases, nucleases, terminal transferases,
and the like.
In another aspect of the invention, tag complements consist of non-natural
nucleotide monomers which encompass a range of compounds typically
developed for antisense therapeutics that have enhanced binding strength
and enhanced specificity for polynucleotide targets. As mentioned above
under the definition of "oligonucleotide," the compounds may include a
variety of different modifications of the natural nucleotides, e.g.
modification of base moieties, sugar moieties, and/or monomer-to-monomer
linkages. Such compounds also include oligonucleotide loops,
oligonucleotide "clamps," and like structures that promote enhanced
binding and specificity.
Constructing Oligonucleotide Tags from Minimally Cross-Hybridizing Sets of
Subunits
The nucleotide sequences of the subunits for any minimally
cross-hybridizing set are conveniently enumerated by simple computer
programs following the general algorithm illustrated in FIG. 1, and as
exemplified by program minhx whose source code is listed in Appendix I.
Minhx computes all minimally cross-hybridizing sets having subunits
composed of three kinds of nucleotides and having a length of four.
The algorithm of FIG. 1 is implemented by first defining the characteristics
of the subunits of the minimally cross-hybridizing set, i.e. length,
number of base differences between members, and composition, e.g. whether they
consist of two, three, or four kinds of bases. A table M.sub.n, n=1, is
generated (100) that consists of all possible sequences of a given length
and composition. An initial subunit S.sub.1 is selected and compared (120)
with successive subunits S.sub.i for i=n+1 to the end of the table.
Whenever a successive subunit has the required number of mismatches to be
a member of the minimally cross-hybridizing set, it is saved in a new
table M.sub.n+1 (125), that also contains subunits previously selected in
prior passes through step 120. For example, in the first set of
comparisons, M.sub.2 will contain S.sub.1 ; in the second set of
comparisons, M.sub.3 will contain S.sub.1 and S.sub.2 ; in the third set
of comparisons, M.sub.4 will contain S.sub.1, S.sub.2, and S.sub.3 ; and
so on. Similarly, comparisons in table M.sub.j will be between S.sub.j and
all successive subunits in M.sub.j. Note that each successive table
M.sub.n+1 is smaller than its predecessors as subunits are eliminated in
successive passes through step 130. After every subunit of table M.sub.n
has been compared (140) the old table is replaced by the new table
M.sub.n+1, and the next round of comparisons is begun. The process stops
(160) when a table M.sub.n is reached that contains no successive subunits
to compare to the selected subunit S.sub.i, i.e. M.sub.n =M.sub.n+1.
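The table-filtering procedure described above can be sketched in a few lines of Python. This is not the minhx program of Appendix I; it is a minimal greedy pass written for illustration, and it returns a single set that depends on the order in which candidate sequences are listed rather than enumerating all sets. The function names and default parameters (alphabet A, G, T; length four; three mismatches) are assumptions.

```python
from itertools import product


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_set(alphabet="AGT", length=4, min_mismatches=3):
    # Initial table: all possible sequences of the given length and composition.
    table = ["".join(p) for p in product(alphabet, repeat=length)]
    n = 0
    while n < len(table):
        selected = table[n]
        # Keep only those successive subunits that have the required number
        # of mismatches with the selected subunit; earlier selections stay.
        kept = [s for s in table[n + 1:]
                if mismatches(selected, s) >= min_mismatches]
        table = table[: n + 1] + kept
        n += 1
    # The loop ends when no successive subunits remain to be compared.
    return table


print(greedy_set())
```

Each pass can only shrink the table, so the loop terminates, and every pair of surviving subunits has at least the required number of mismatches.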
As mentioned above, preferred minimally cross-hybridizing sets comprise
subunits that make approximately equivalent contributions to duplex
stability as every other subunit in the set. Guidance for selecting such
sets is provided by published techniques for selecting optimal PCR primers
and calculating duplex stabilities, e.g. Rychlik et al, Nucleic Acids
Research, 17: 8543-8551 (1989) and 18: 6409-6412 (1990); Breslauer et al,
Proc. Natl. Acad. Sci., 83: 3746-3750 (1986); Wetmur, Crit. Rev. Biochem.
Mol. Biol., 26: 227-259 (1991); and the like. For shorter tags, e.g. about
30 nucleotides or less, the algorithm described by Rychlik and Wetmur is
preferred, and for longer tags, e.g. about 30-35 nucleotides or greater,
an algorithm disclosed by Suggs et al, pages 683-693 in Brown, editor,
ICN-UCLA Symp. Dev. Biol., Vol. 23 (Academic Press, New York, 1981) may be
conveniently employed.
Preferred minimally cross-hybridizing sets are those whose
subunits are made up of three of the four natural nucleotides. As will be
discussed more fully below, the absence of one type of nucleotide in the
oligonucleotide tags permits target polynucleotides to be loaded onto
solid phase supports by use of the 5'.fwdarw.3' exonuclease activity of a
DNA polymerase. The following is an exemplary minimally cross-hybridizing
set of subunits each comprising four nucleotides selected from the group
consisting of A, G, and T:
TABLE I
______________________________________
Word:      w.sub.1   w.sub.2   w.sub.3   w.sub.4
Sequence:  GATT      TGAT      TAGA      TTTG
Word:      w.sub.5   w.sub.6   w.sub.7   w.sub.8
Sequence:  GTAA      AGTA      ATGT      AAAG
______________________________________
In this set, each member would form a duplex having three mismatched bases
with the complement of every other member.
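The three-mismatch property of the eight words of Table I can be checked directly. The Python sketch below assumes that the number of mismatches in a duplex between one word and the complement of another equals the number of positions at which the two words differ when aligned end to end; the names WORDS and mismatches are illustrative.

```python
from itertools import combinations

# The eight words w1 through w8 of Table I.
WORDS = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]


def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


# Collect the mismatch count for every pair of distinct words.
counts = {mismatches(a, b) for a, b in combinations(WORDS, 2)}
print(counts)
```

For this set, every one of the 28 pairs differs at exactly three of the four positions.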
Further exemplary minimally cross-hybridizing sets are listed below in
Table II. Clearly, additional sets can be generated by substituting
different groups of nucleotides, or by using subsets of known minimally
cross-hybridizing sets.
TABLE II
______________________________________
Exemplary Minimally Cross-Hybridizing Sets of 4-mer Subunits
______________________________________
Set 1   Set 2   Set 3   Set 4   Set 5   Set 6
______________________________________
CATT    ACCC    AAAC    AAAG    AACA    AACG
CTAA    AGGG    ACCA    ACCA    ACAC    ACAA
TCAT    CACG    AGGG    AGGC    AGGG    AGGC
ACTA    CCGA    CACG    CACC    CAAG    CAAC
TACA    CGAC    CCGC    CCGG    CCGC    CCGG
TTTC    GAGC    CGAA    CGAA    CGCA    CGCA
ATCT    GCAG    GAGA    GAGA    GAGA    GAGA
AAAC    GGCA    GCAG    GCAC    GCCG    GCCC
        AAAA    GGCC    GGCG    GGAC    GGAG
______________________________________
Set 7   Set 8   Set 9   Set 10  Set 11  Set 12
______________________________________
AAGA    AAGC    AAGG    ACAG    ACCG    ACGA
ACAC    ACAA    ACAA    AACA    AAAA    AAAC
AGCG    AGCG    AGCC    AGGC    AGGC    AGCG
CAAG    CAAG    CAAC    CAAC    CACC    CACA
CCCA    CCCC    CCCG    CCGA    CCGA    CCAG
CGGC    CGGA    CGGA    CGCG    CGAG    CGGC
GACC    GACA    GACA    GAGG    GAGG    GAGG
GCGG    GCGG    GCGC    GCCC    GCAC    GCCC
GGAA    GGAC    GGAG    GGAA    GGCA    GGAA
______________________________________
The oligonucleotide tags of the invention and their complements are
conveniently synthesized on an automated DNA synthesizer, e.g. an Applied
Biosystems, Inc. (Foster City, Calif.) model 392 or 394 DNA/RNA
Synthesizer, using standard chemistries, such as phosphoramidite
chemistry, e.g. disclosed in the following references: Beaucage and Iyer,
Tetrahedron, 48: 2223-2311 (1992); Molko et al, U.S. Pat. No. 4,980,460;
Koster et al, U.S. Pat. No. 4,725,677; Caruthers et al, U.S. Pat. Nos.
4,415,732; 4,458,066; and 4,973,679; and the like. Alternative
chemistries, e.g. resulting in non-natural backbone groups, such as
phosphorothioate, phosphoramidate, and the like, may also be employed
provided that the resulting oligonucleotides are capable of specific
hybridization. As mentioned above, N3'.fwdarw.P5' oligonucleotide
phosphoramidates are preferred materials for tag complements in some
embodiments. Synthesis of such compounds is described in Chen et al,
Nucleic Acids Research, 23: 2661-2668 (1995). In some embodiments, tags
may comprise naturally occurring nucleotides that permit processing or
manipulation by enzymes, while the corresponding tag complements may
comprise non-natural nucleotide analogs, such as peptide nucleic acids, or
like compounds, that promote the formation of more stable duplexes during
sorting.
When microparticles are used as supports, repertoires of oligonucleotide
tags and tag complements are preferably generated by subunit-wise
synthesis via "split and mix" techniques, e.g. as disclosed in Shortle et
al, International patent application PCT/US93/03418. Briefly, the basic
unit of the synthesis is a subunit of the oligonucleotide tag. Preferably,
phosphoramidite chemistry is used and 3' phosphoramidite oligonucleotides
are prepared for each subunit in a minimally cross-hybridizing set, e.g.
for the set first listed above, there would be eight 4-mer
3'-phosphoramidites. Synthesis proceeds as disclosed by Shortle et al or
in direct analogy with the techniques employed to generate diverse
oligonucleotide libraries using nucleosidic monomers, e.g. as disclosed in
Telenius et al, Genomics, 13: 718-725 (1992); Welsh et al, Nucleic Acids
Research, 19: 5275-5279 (1991); Grothues et al, Nucleic Acids Research,
21: 1321-1322 (1993); Hartley, European patent application 90304496.4; Lam
et al, Nature, 354: 82-84 (1991); Zuckerman et al, Int. J. Pept. Protein
Research, 40: 498-507 (1992); and the like. Generally, these techniques
simply call for the application of mixtures of the activated monomers to
the growing oligonucleotide during the coupling steps.
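The combinatorial outcome of subunit-wise "split and mix" synthesis can be sketched in Python. The simulation below is illustrative only: it models each growing chain as a string, uses the eight words of Table I as the subunits, and ignores the chemistry and the direction of chain growth, so the order in which words are concatenated is arbitrary. The function name split_and_mix is hypothetical.

```python
# The eight words of Table I, used here as the subunit building blocks.
WORDS = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]


def split_and_mix(words, rounds):
    """Return every tag sequence produced after the given number of rounds."""
    pool = [""]  # start with empty chains on the support
    for _ in range(rounds):
        # Split the pool into one portion per word, couple that word to every
        # chain in its portion, then mix the portions back together.
        pool = [chain + word for word in words for chain in pool]
    return pool


tags = split_and_mix(WORDS, rounds=3)
print(len(tags), tags[:4])
```

With eight words, each round multiplies the number of distinct sequences by eight, so three rounds give 512 distinct 12-mers.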
Double stranded forms of tags may be made by separately synthesizing the
complementary strands followed by mixing under conditions that permit
duplex formation. Alternatively, double stranded tags may be formed by
first synthesizing a single stranded repertoire linked to a known
oligonucleotide sequence that serves as a primer binding site. The second
strand is then synthesized by combining the single stranded repertoire
with a primer and extending with a polymerase. This latter approach is
described in Oliphant et al, Gene, 44: 177-183 (1986). Such duplex tags
may then be inserted into cloning vectors along with target
polynucleotides for sorting and manipulation of the target polynucleotide
in accordance with the invention.
In embodiments where specific hybridization occurs via triplex formation,
coding of tag sequences follows the same principles as for duplex-forming
tags; however, there are further constraints on the selection of subunit
sequences. Generally, third strand association via Hoogsteen type of
binding is most stable along homopyrimidine-homopurine tracks in a double
stranded target. Usually, base triplets form in T-A*T or C-G*C motifs
(where "-" indicates Watson-Crick pairing and "*" indicates Hoogsteen type
of binding); however, other motifs are also possible. For example,
Hoogsteen base pairing permits parallel and antiparallel orientations
between the third strand (the Hoogsteen strand) and the purine-rich strand
of the duplex to which the third strand binds, depending on conditions and
the composition of the strands. There is extensive guidance in the
literature for selecting appropriate sequences, orientation, conditions,
nucleoside type (e.g. whether ribose or deoxyribose nucleosides are
employed), base modifications (e.g. methylated cytosine, and the like) in
order to maximize, or otherwise regulate, triplex stability as desired in
particular embodiments, e.g. Roberts et al, Proc. Natl. Acad. Sci., 88:
9397-9401 (1991); Roberts et al, Science, 258: 1463-1466 (1992); Distefano
et al, Proc. Natl. Acad. Sci., 90: 1179-1183 (1993); Mergny et al,
Biochemistry, 30: 9791-9798 (1991); Cheng et al, J. Am. Chem. Soc., 114:
4465-4474 (1992); Beal and Dervan, Nucleic Acids Research, 20: 2773-2776
(1992); Beal and Dervan, J. Am. Chem. Soc., 114: 4976-4982 (1992);
Giovannangeli et al, Proc. Natl. Acad. Sci., 89: 8631-8635 (1992); Moser
and Dervan, Science, 238: 645-650 (1987); McShan et al, J. Biol. Chem.,
267: 5712-5721 (1992); Yoon et al, Proc. Natl. Acad. Sci., 89: 3840-3844
(1992); Blume et al, Nucleic Acids Research, 20: 1777-1784 (1992); Thuong
and Helene, Angew. Chem. Int. Ed. Engl. 32: 666-690 (1993); and the like.
Conditions for annealing single-stranded or duplex tags to their
single-stranded or duplex complements are well known, e.g. Ji et al, Anal.
Chem. 65: 1323-1328 (1993).
When oligomeric subunits are employed, oligonucleotide tags of the
invention and their complements may range in length from 12 to 60
nucleotides or basepairs; more preferably, they range in length from 18 to
40 nucleotides or basepairs; and most preferably, they range in length
from 25 to 40 nucleotides or basepairs. When constructed from antisense
monomers, oligonucleotide tags and their complements preferably range in
length from 10 to 40 monomers; and more preferably, they range in length
from 12 to 30 monomers.
TABLE III
______________________________________
Numbers of Subunits in Tags in Preferred Embodiments
Monomers      Nucleotides in Oligonucleotide Tag
in Subunit    (12-60)          (18-40)          (25-40)
______________________________________
3             4-20 subunits    6-13 subunits    8-13 subunits
4             3-15 subunits    4-10 subunits    6-10 subunits
5             2-12 subunits    3-8 subunits     5-8 subunits
6             2-10 subunits    3-6 subunits     4-6 subunits
______________________________________
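The subunit counts in Table III appear to be consistent with dividing each bound of the tag-length range by the number of monomers per subunit and discarding any remainder. The Python sketch below reproduces the table under that assumption; the rounding rule is inferred from the tabulated values rather than stated in the text, and the function name is hypothetical.

```python
def subunit_range(min_nucleotides, max_nucleotides, monomers_per_subunit):
    """Lower and upper subunit counts for a tag-length range, rounding down."""
    return (min_nucleotides // monomers_per_subunit,
            max_nucleotides // monomers_per_subunit)


# Tag-length ranges from the three columns of Table III.
LENGTH_RANGES = [(12, 60), (18, 40), (25, 40)]

for monomers in (3, 4, 5, 6):
    row = [subunit_range(lo, hi, monomers) for lo, hi in LENGTH_RANGES]
    print(monomers, row)
```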
Most preferably, oligonucleotide tags are single stranded and specific
hybridization occurs via Watson-Crick pairing with a tag complement.
After chemical synthesis, libraries of tags are conveniently maintained as
PCR amplicons that include primer binding regions for amplification and
restriction endonuclease recognition sites to facilitate excision and
attachment to polynucleotides. Preferably, the composition of the primers
is selected so that the right and left primers have approximately the same
melting and annealing temperatures. In some embodiments, either one or
both of the primers and other flanking sequences of the tags consist of
three or fewer of the four natural nucleotides in order to allow the use
of a "stripping" and exchange reaction to render a construct containing a
tag single stranded in a selected region. Such reactions usually employ
the 3'.fwdarw.5' exonuclease activity of a DNA polymerase, such as T4 DNA
polymerase, or like enzyme, and are described in Sambrook et al, Molecular
Cloning, Second Edition (Cold Spring Harbor Laboratory, New York, 1989).
Solid Phase Supports for Tag Complements
Preferably, detection of sequence information takes place at spatially
discrete locations where tags hybridize to their complements. It is
important that the detection of signals from successive cycles of tag
transfer be associated with the same tag complement location throughout
the sequencing operation. Otherwise, the sequence of signals will not be a
faithful representation of the sequence of the polynucleotide
corresponding to the tag and tag complement. This requirement is met by
providing a spatially addressable array of tag complements. As used herein,
"spatially addressable" means that the location of a particular tag
complement can be recorded and tracked throughout a sequencing operation.
Knowledge of the identity of a tag complement is not crucial; it is only
important that its location be identifiable from cycle to cycle of tag
transfers. Preferably, the regions containing tag complements are
discrete, i.e. non-overlapping with regions containing different tag
complements, so that signal detection is more convenient. Generally,
spatially addressable arrays are constructed by attaching or synthesizing
tag complements on solid phase supports.
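The role of spatial addressability can be illustrated with a short Python sketch in which signals are accumulated by array location alone, without any knowledge of which tag complement occupies each location. The data layout (one dictionary of location-to-base signals per cycle) and the name assemble_reads are assumptions made for illustration.

```python
from collections import defaultdict


def assemble_reads(cycles):
    """cycles is a list with one dict per cycle, mapping an array location
    such as (row, column) to the base signaled there in that cycle."""
    reads = defaultdict(str)
    for signals in cycles:
        for location, base in signals.items():
            # The same location must refer to the same tag complement in
            # every cycle for the concatenated signals to be a valid sequence.
            reads[location] += base
    return dict(reads)


example = [
    {(0, 0): "G", (0, 1): "T"},  # cycle 1
    {(0, 0): "A", (0, 1): "T"},  # cycle 2
]
print(assemble_reads(example))
```

If locations could not be tracked from cycle to cycle, the per-location strings would mix signals from different polynucleotides.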
Solid phase supports for use with the invention may have a wide variety of
forms, including microparticles, beads, membranes, slides, plates,
micromachined chips, and the like. Likewise, solid phase supports of the
invention may comprise a wide variety of compositions, including glass,
plastic, silicon, alkanethiolate-derivatized gold, cellulose, low
cross-linked and high cross-linked polystyrene, silica gel, polyamide, and
the like. Preferably, either a population of discrete particles is
employed such that each has a uniform coating, or population, of
complementary sequences of the same tag (and no other), or a single or a
few supports are employed with spatially discrete regions each containing
a uniform coating, or population, of complementary sequences to the same
tag (and no other). In the latter embodiment, the area of the regions may
vary according to particular applications; usually, the regions range in
area from several .mu.m.sup.2, e.g. 3-5, to several hundred .mu.m.sup.2, e.g. 100-500.
Tag complements may be used with the solid phase support that they are
synthesized on, or they may be separately synthesized and attached to a
solid phase support for use, e.g. as disclosed by Lund et al, Nucleic