Description
FIELD OF THE INVENTION
The present invention relates generally to methods for selecting peptide
ligands to receptor molecules of interest and, more particularly, to
methods for generating and screening large peptide libraries for peptides
with desired binding characteristics.
BACKGROUND OF THE INVENTION
As molecular biology has helped to define regions of proteins that
contribute to a particular biological activity, it has become desirable to
synthesize short peptides to mimic (or inhibit) those activities. Many of
the disadvantages encountered in therapeutic, diagnostic and industrial
settings with purified proteins, or those produced by recombinant means,
could easily be avoided by short synthetic peptides. For instance,
synthetic peptides offer advantages of specificity, convenience of sample
or bulk preparation, lower relative cost, high degree of purity, and long
shelf-life.
Despite the great promise of synthetic peptides, the technology remains, to
a large extent, a laboratory tool. Precise sequence and binding data are
not available for most proteins of significant medical, agricultural or
industrial interest. Even when the sequence of a protein is known, the
process of identifying short sequences which are responsible for or
contribute to a biological activity may be extremely tedious, if not
nearly impossible in many instances.
Thus, the ability to generate and efficiently screen very large collections
of peptides for desired binding activities would be of enormous interest.
It would enable the identification of novel agonists and antagonists for
receptors and the isolation of specific inhibitors of enzymes, and would
provide probes for structural and functional analyses of binding sites of
many proteins, as well as ligands for many other compounds employed in a
wide variety of applications.
The generation of large numbers of peptide sequences by the cloning and
expression of randomly-generated mixtures of oligonucleotides is possible
in the appropriate recombinant vectors. See, e.g., Oliphant et al., Gene
44:177-183 (1986). Such a large number of compounds can be produced,
however, that methods for efficient physical and genetic selection are
required. Without such methods the usefulness of these large peptide
libraries in providing ligands of potential interest may be lost. The
present invention provides methods for efficient screening and selection
from a large peptide library, fulfilling these and other related needs.
SUMMARY OF THE INVENTION
The present invention provides novel methods and compositions for
identifying peptides which bind to preselected receptor molecules. The
peptides find a variety of therapeutic, diagnostic and related uses, e.g.,
to bind the receptor or an analogue thereof and inhibit or promote its
activity.
In one embodiment the invention relates to methods for identifying the
peptides which bind to a preselected receptor. In certain aspects the
methods generally comprise constructing a bacteriophage expression vector
which comprises an oligonucleotide library of at least about 10.sup.6
members which encode the peptides. The library member is joined in reading
frame to the 5' region of a nucleotide sequence encoding an outer
structural protein of the bacteriophage. Appropriate host cells are
transformed with the expression vectors, generally by electroporation, and
the transformed cells cultivated under conditions suitable for expression
and assembly of bacteriophage. Using an affinity screening process,
bacteriophage library members are contacted with the preselected receptor
under conditions conducive to specific peptide-receptor binding, and
bacteriophage whose coat proteins have peptides which bind the receptor
molecule are selected. The nucleotide sequence which encodes the peptide
on the selected phage may then be determined. By repeating the affinity
selection process one or more times, the peptides of interest may be
enriched. By increasing the stringency of the selection, e.g., by reducing
the valency of the peptide-phage interaction towards substantial
monovalency, peptides of increasingly higher affinity can be identified.
In another aspect the methods are concerned with expression vectors having
the oligonucleotide library members joined in reading frame with a
nucleotide sequence to encode a fusion protein, wherein the library member
represents the 5' member of the fusion and the 3' member comprises at
least a portion of an outer structural protein of the bacteriophage. The
first residue of the peptide encoded by the library member may be at the
5'-terminus of the sequence encoding the phage coat protein. In preferred
embodiments, where phage proteins are initially expressed as preproteins
and then processed by the host cell to a mature protein, the library
members are inserted so as to leave the peptide encoded thereby at the
N-terminus of the mature phage protein after processing or a protein
substantially homologous thereto.
The invention also concerns host cells transformed with a bacteriophage
expression vector having an oligonucleotide library member, joined in
reading frame to the 5' region of a nucleotide sequence encoding an outer
structural protein of the bacteriophage, wherein the library member
encodes a peptide of at least about five to twenty-five amino acids.
Generally, the oligonucleotide library of the invention comprises a
variable codon region which encodes for the peptides of interest, and may
optionally comprise sequences coding for one or more spacer amino acid
residues, such as Gly. The variable region may be encoded by (NNK).sub.x
or (NNS).sub.x, where N is A, C, G or T, K is G or T, S is G or C, and x
is from 5 to at least about 8. In certain preferred embodiments the
variable region of the oligonucleotide library member encodes a
hexapeptide. The variable codon region may also be prepared from a
condensation of activated trinucleotides.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts the construction of an oligonucleotide library. (A) The
vector fAFF1 contains two non-complementary BstXI sites separated by a 30
bp stuffer fragment. Removal of the BstXI fragment allows oriented
ligation of oligonucleotides with the appropriate cohesive ends. (B) The
oligonucleotide ON-49 was annealed to two "half-site" fragments to form
cohesive termini complementary to BstXI sites 1 and 2 in the vector. The
gapped structure, where the single-stranded region comprises the variable
hexacodon sequence and a 2 (gly) codon spacer, was ligated to the vector
and electro-transformed into E. coli.
FIG. 2 depicts the amino acid sequences (deduced from DNA sequence) of the
N-terminal hexapeptides on pIII of infectious phage randomly selected from
the library. Sequences begin at the signal peptidase site. Single letter
code for amino acids is A (Ala), C (Cys), D (Asp), E (Glu), F (Phe), G
(Gly), H (His), I (Ile), K (Lys), L (Leu), M (Met), N (Asn), P (Pro), Q
(Gln), R (Arg), S (Ser), T (Thr), V (Val), W (Trp), Y (Tyr).
FIG. 3 illustrates the composite DNA sequence of the variable region of
pools of (A) infectious phage from the library, and (B) phage recovered
from 1, 2, or 3 rounds of panning on mAB 3E7. Phage were amplified as
tetracycline resistant colonies and DNA from a pool of phage derived from
several thousand of these colonies was isolated and sequenced. The area of
the sequencing gel corresponding to the cloning site in geneIII is
displayed. A sequencing primer was annealed to the phage DNA about 40
bases to the 3' side of the cloning site. The actual readout of the gel is
the sequence complementary to the coding strand. For clarity of codon
identification, the lanes may be read as C, T, A, G, left to right and 5'
to 3', top to bottom, to identify the sequence of the coding (+) strand.
FIG. 4 shows the amino acid sequences (deduced from DNA sequence) of the
N-terminal peptides of pIII of 52 phage isolated by three rounds of
panning on mAB 3E7.
FIG. 5 illustrates the results of phage sandwich ELISAs for YGGFL- and
YAGFAQ-phage with biotinylated monoclonal antibody 3E7 IgG (FIG. 5A) or
3E7 Fab fragments (FIG. 5B) immobilized at maximal density on streptavidin
coated wells and labeled polyclonal anti-phage antibodies to detect bound
phage.
FIG. 6 illustrates the results of phage sandwich ELISAs which compare the
effect of 3E7 Fab concentration at 5 nM (FIG. 6A) and 50 pM (FIG. 6B) and
wash times (minutes) on recoveries of YGGFL- and YAGFAQ-phage.
FIG. 7 shows 3E7 Fab dissociation from phage bearing peptides of known
affinity, YGGFL and YGFWGM.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS
Methods and compositions are provided for identifying peptides which bind
to receptor molecules of interest. The peptides are produced from
oligonucleotide libraries which encode peptides attached to a
bacteriophage structural protein. A method of affinity enrichment allows a
very large library of peptides to be screened and the phage carrying the
desired peptide(s) selected. The nucleic acid may then be isolated from
the phage and the variable region of the oligonucleotide library member
sequenced, such that the amino acid sequence of the desired peptide is
deduced therefrom. Using these methods a peptide identified as having a
binding affinity for the desired molecule may then be synthesized in bulk
by conventional means.
By identifying the peptide de novo one need not know the sequence or
structure of the receptor molecule or the sequence of its natural binding
partner. Indeed, for many "receptor" molecules a binding partner has not
yet been identified. A significant advantage of the present invention is
that no prior information regarding an expected ligand structure is
required to isolate peptide ligands of interest. The peptide identified
will thus have biological activity, which is meant to include at least
specific binding affinity for a selected receptor molecule, and in some
instances will further include the ability to block the binding of other
compounds, to stimulate or inhibit metabolic pathways, to act as a signal
or messenger, to stimulate or inhibit cellular activity, and the like.
The number of possible receptor molecules for which peptide ligands may be
identified by means of the present invention is virtually unlimited. For
example, the receptor molecule may be an antibody (or a binding portion
thereof). The antigen to which the antibody binds may be known and perhaps
even sequenced, in which case the invention may be used to map epitopes of
the antigen. If the antigen is unknown, such as with certain autoimmune
diseases, for example, sera or other fluids from patients with the disease
can be used in the present methods to identify peptides, and consequently
the antigen which elicits the autoimmune response. It is also possible
using these methods to tailor a peptide to fit a particular individual's
disease. Once a peptide has been identified it may itself serve as, or
provide the basis for, the development of a vaccine, a therapeutic agent,
a diagnostic reagent, etc.
The present invention can identify peptide ligands for a wide variety of
substances in addition to antibodies. These include, by way of example and
not limitation, growth factors, hormones, enzymes, interferons,
interleukins, intracellular and intercellular messengers, lectins,
cellular adhesion molecules and the like, as well as the ligands for the
corresponding receptors of the aforementioned molecules. It will be
recognized that peptide ligands may also be identified by the present
invention for molecules which are not peptides or proteins, e.g.,
carbohydrates, non-protein organic compounds, metals, etc. Thus, although
antibodies are widely available and conveniently manipulated, they are
merely representative of receptor molecules for which peptide ligands can
be identified by means of the present invention.
An oligonucleotide library, prepared according to the criteria as described
herein, is inserted in an appropriate vector encoding a bacteriophage
structural protein, preferably an accessible phage protein, such as a
bacteriophage coat protein. Although one skilled in the art will
appreciate that a variety of bacteriophage may be employed in the present
invention, in preferred embodiments the vector is, or is derived from, a
filamentous bacteriophage, such as, for example, f1, fd, Pf1, M13, etc. In
a more preferred embodiment the filamentous phage is fd, and contains a
selectable marker such as tetracycline (e.g., "fd-tet"). The fd-tet vector
has been extensively described in the literature. See, for example, Zacher
et al., Gene 9:127-140 (1980), Smith et al., Science 228:1315-1317 (1985)
and Parmley and Smith, Gene 73:305-318 (1988), each incorporated by
reference herein.
The phage vector is chosen to contain or is constructed to contain a
cloning site located in the 5' region of the gene encoding the
bacteriophage structural protein, so that the peptide is accessible to
receptors in an affinity selection and enrichment procedure as described
below. As the structural phage protein is preferably a coat protein, in
phage fd the preferred coat protein is pIII. Each filamentous fd phage is
known to have up to four or five copies of the pIII protein.
An appropriate vector allows oriented cloning of the oligonucleotide
sequences which encode the peptide so that the peptide is expressed at or
within a distance of about 100 amino acid residues of the N-terminus of
the mature coat protein. The coat protein is typically expressed as a
preprotein, having a leader sequence. Thus, desirably the oligonucleotide
library is inserted so that the N-terminus of the processed bacteriophage
outer protein is the first residue of the peptide, i.e., between the
3'-terminus of the sequence encoding the leader protein and the 5'-terminus
of the sequence encoding the mature protein or a portion of the 5'
terminus.
The library is constructed by cloning an oligonucleotide which contains the
variable region of library members (and any spacers, framework
determinants, etc. as discussed below) into the selected cloning site.
Using known recombinant DNA techniques (see generally, Sambrook et al.,
Molecular Cloning, A Laboratory Manual, 2d ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y., 1989, incorporated by
reference herein), an oligonucleotide may be constructed which, inter
alia, removes unwanted restriction sites and adds desired ones,
reconstructs the correct portions of any sequences which have been removed
(such as a correct signal peptidase site, for example), inserts the spacer,
conserved or framework residues, if any, and corrects the translation
frame (if necessary) to produce active, infective phage. The central
portion of the oligonucleotide will generally contain one or more of the
variable region domain(s) and the spacer or framework residues. The
sequences are ultimately expressed as peptides (with or without spacer or
framework residues) fused to or in the N-terminus of the mature coat
protein on the outer, accessible surface of the assembled bacteriophage
particles.
The variable region domain of the oligonucleotide comprises the source of
the library. The size of the library will vary according to the number of
variable codons, and hence the size of the peptides, which are desired.
Generally the library will be at least about 10.sup.6 members, usually at
least 10.sup.7 and typically 10.sup.8 or more members. To generate the
collection of oligonucleotides which forms a series of codons encoding a
random collection of amino acids and which is ultimately cloned into the
vector, a codon motif is used, such as (NNK).sub.x, where N may be A, C,
G, or T (nominally equimolar), K is G or T (nominally equimolar), and x is
typically up to about 5, 6, 7, or 8 or more, thereby producing libraries
of penta-, hexa-, hepta-, and octa-peptides or more. The third position
may also be G or C, designated "S". Thus, NNK or NNS (i) code for all the
amino acids, (ii) code for only one stop codon, and (iii) reduce the range
of codon bias from 6:1 to 3:1. It should be understood that with longer
peptides the size of the library which is generated may become a
constraint in the cloning process and thus the larger libraries can be
sampled, as described hereinbelow. The expression of peptides from
randomly generated mixtures of oligonucleotides in appropriate recombinant
vectors is discussed in Oliphant et al., Gene 44:177-183 (1986),
incorporated herein by reference.
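By way of illustration only, properties (i) to (iii) of the NNK and NNS motifs can be checked by enumerating the codons of each motif against the standard genetic code. The following Python sketch is not part of the original disclosure. It assumes the standard genetic code, and the names CODON_TABLE, DEGENERATE and motif_stats are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest). '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}

# Degenerate symbols as defined in the text: N = A/C/G/T, K = G/T, S = G/C.
DEGENERATE = {"N": "ACGT", "K": "GT", "S": "GC"}


def motif_stats(motif):
    """Summarize the codons generated by a three-letter degenerate motif."""
    codons = ["".join(c) for c in product(*(DEGENERATE[s] for s in motif))]
    counts = Counter(CODON_TABLE[c] for c in codons)
    stops = counts.pop("*", 0)
    return {
        "codons": len(codons),
        "amino_acids": len(counts),
        "stop_codons": stops,
        # (most codons for any amino acid, fewest codons for any amino acid)
        "bias": (max(counts.values()), min(counts.values())),
    }


for motif in ("NNN", "NNK", "NNS"):
    print(motif, motif_stats(motif))
```

Under these assumptions the fully random NNN motif gives a 6:1 bias range with three stop codons, while NNK and NNS each give a 3:1 range with a single stop codon.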
An exemplified codon motif (NNK).sub.6 produces 32 codons, one for each of
12 amino acids, two for each of five amino acids, three for each of three
amino acids and one (amber) stop codon. Although this motif produces a
codon distribution as equitable as available with standard methods of
oligonucleotide synthesis, it results in a bias against peptides
containing one-codon residues. For example, a complete collection of
hexacodons contains one sequence encoding each peptide made up of only
one-codon amino acids, but contains 729 (3.sup.6) sequences encoding each
peptide with only three-codon amino acids.
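The codon counts and the 729-fold bias described above may be reproduced with a short calculation. The Python sketch below is illustrative only and assumes the standard genetic code. The helper names degeneracy_classes and encodings, and the example peptides, are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}

# All codons of the NNK motif (N = A/C/G/T, K = G/T).
NNK_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]
CODONS_PER_RESIDUE = Counter(CODON_TABLE[c] for c in NNK_CODONS)


def degeneracy_classes():
    """Map 'number of NNK codons' to 'number of amino acids with that many'."""
    return dict(
        Counter(n for aa, n in CODONS_PER_RESIDUE.items() if aa != "*")
    )


def encodings(peptide):
    """Number of distinct NNK sequences that encode the given peptide."""
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide)


print(len(NNK_CODONS), degeneracy_classes(), CODONS_PER_RESIDUE["*"])
print(encodings("MWMWMW"))  # hexapeptide of one-codon residues only
print(encodings("LSRLSR"))  # hexapeptide of three-codon residues only
```

In this sketch a hexapeptide built only from one-codon residues (such as Met and Trp) has a single encoding, whereas one built only from Leu, Ser and Arg has 3.sup.6 encodings.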
An alternative approach to minimize the bias against one-codon residues
involves the synthesis of 20 activated tri-nucleotides, each representing
the codon for one of the 20 genetically encoded amino acids. These are
synthesized by conventional means, removed from the support but
maintaining the base and 5'-HO-protecting groups, and activated by the
addition of 3'-O-phosphoramidite (and phosphate protection with
beta-cyanoethyl groups) by the method used for the activation of
mononucleosides, as generally described in McBride and Caruthers,
Tetrahedron Letters 22:245 (1983), which is incorporated by reference
herein. Degenerate "oligocodons" are prepared using these trimers as
building blocks. The trimers are mixed at the desired molar ratios and
installed in the synthesizer. The ratios will usually be approximately
equimolar, but may be a controlled unequal ratio to obtain the over- to
under-representation of certain amino acids coded for by the degenerate
oligonucleotide collection. The condensation of the trimers to form the
oligocodons is done essentially as described for conventional synthesis
employing activated mononucleosides as building blocks. See generally,
Atkinson and Smith, Oligonucleotide Synthesis, M. J. Gait, ed. p35-82
(1984). Thus, this procedure generates a population of oligonucleotides
for cloning that is capable of encoding an equal distribution (or a
controlled unequal distribution) of the possible peptide sequences. This
approach may be especially useful in generating longer peptide sequences,
since the range of bias produced by the (NNK).sub.6 motif increases by
three-fold with each additional amino acid residue.
When the codon motif is (NNK).sub.x, as defined above, and when x equals 8,
there are 2.6.times.10.sup.10 possible octa-peptides. A library containing
most of the octa-peptides may be difficult to produce. Thus, a sampling of
the octa-peptides may be accomplished by constructing a subset library
using about 0.1%, and up to as much as 1%, 5% or 10% of the possible
sequences, which subset of recombinant bacteriophage particles is then
screened. As the library size increases, smaller percentages are
acceptable. If desired, to extend the diversity of a subset library the
recovered phage subset may be subjected to mutagenesis and then subjected
to subsequent rounds of screening. This mutagenesis step may be
accomplished in two general ways: the variable region of the recovered
phage may be mutagenized, or additional variable amino acids may be added
to the regions adjoining the initial variable sequences.
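The library sizes discussed above follow from simple arithmetic, sketched here in Python for illustration. The function names are hypothetical. The calculation counts peptides of the 20 genetically encoded amino acids and ignores stop codons and cloning losses, which is a simplifying assumption.

```python
def peptide_count(x):
    """Number of distinct peptides of length x over 20 amino acids."""
    return 20 ** x


def nnk_sequence_count(x):
    """Number of distinct DNA sequences generated by an (NNK)x motif."""
    return 32 ** x


def subset_size(x, fraction):
    """Number of clones needed to hold a given fraction of all x-mers."""
    return round(peptide_count(x) * fraction)


for x in (5, 6, 7, 8):
    print(f"x={x}: {peptide_count(x):.1e} peptides, "
          f"{nnk_sequence_count(x):.1e} NNK sequences")

# Subset libraries of octa-peptides at the sampling levels named in the text.
for fraction in (0.001, 0.01, 0.05, 0.10):
    print(f"{fraction:.1%} of octa-peptides: {subset_size(8, fraction):.1e}")
```

The count is a lower bound on the number of clones needed, since independent clones may carry the same sequence.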
A variety of techniques can be used in the present invention to diversify a
peptide library or to diversify around peptides found in early rounds of
panning to have sufficient binding activity. In one approach, the positive
phage (those identified in an early round of panning) are sequenced to
determine the identity of the active peptides. Oligonucleotides are then
synthesized based on these peptide sequences, employing a low level of all
bases incorporated at each step to produce slight variations of the
primary oligonucleotide sequences. This mixture of (slightly) degenerate
oligonucleotides is then cloned into the affinity phage as described
herein. This method produces systematic, controlled variations of the
starting peptide sequences. It requires, however, that individual positive
phage be sequenced before mutagenesis, and thus is useful for expanding
the diversity of small numbers of recovered phage.
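The doped synthesis described above can be modeled, in a simplified way, as a per-position chance of drawing a base from the full mixture rather than the parent base. The following Python sketch is illustrative only. The function doped_variant, the doping rate of 0.1 and the particular coding sequence chosen for the peptide YGGFL are assumptions and are not taken from the disclosure.

```python
import random


def doped_variant(parent, dope_rate, rng):
    """Return one variant of `parent` from a simulated doped synthesis.

    At each position, with probability `dope_rate`, the base is drawn
    from an equimolar mixture of all four bases (which may reproduce the
    parent base); otherwise the parent base is kept.
    """
    out = []
    for base in parent:
        if rng.random() < dope_rate:
            out.append(rng.choice("ACGT"))
        else:
            out.append(base)
    return "".join(out)


# One possible coding sequence for YGGFL (hypothetical codon choice).
parent = "TATGGTGGTTTTCTT"
rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(5):
    print(doped_variant(parent, 0.1, rng))
```

A lower doping rate keeps the variants closer to the parent sequence, consistent with the controlled, slight variation the method is meant to produce.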
Another technique for diversifying around the recognition kernel of the
selected phage-peptide involves the subtle misincorporation of nucleotide
changes in the peptide through the use of the polymerase chain reaction
(PCR) under low fidelity conditions. A protocol of Leung et al., Technique
1:11-15 (1989) alters the ratios of nucleotides and adds manganese ions
to produce a 2% mutation frequency. Yet another approach
for diversifying the selected phage involves the mutagenesis of a pool, or
subset, of recovered phage. Phage recovered from panning are pooled and
single stranded DNA is isolated. The DNA is mutagenized by treatment with,
e.g., nitrous acid, formic acid, or hydrazine. These treatments produce a
variety of damage in the DNA. The damaged DNA is then copied with reverse
transcriptase which misincorporates bases when it encounters a site of
damage. The segment containing the sequence encoding the variable peptide
is then isolated by cutting with restriction nuclease(s) specific for
sites flanking the variable region. This mutagenized segment is then
recloned into undamaged vector DNA in a manner similar to that described
herein. The DNA is transformed into cells and a secondary library is
constructed as described. The general mutagenesis method is described in
detail in Myers, et al., Nucl. Acids Res. 13:3131-3145 (1985), Myers et
al., Science 229:242-246 (1985), and Myers, Current Protocols in Molecular
Biology Vol I, 8.3.1-8.3.6, F. Ausubel, et al., eds, J. Wiley and Sons,
New York (1989), each of which is incorporated herein by reference.
In the second general approach, that of adding additional amino acids to a
peptide or peptides found to be active, a variety of methods are
available. In one, the sequences of peptides selected in early panning are
determined individually and new oligonucleotides, incorporating the
determined sequence and an adjoining degenerate sequence, are synthesized.
These are then cloned to produce a secondary library.
In another approach which adds a second variable region to a pool of
peptide-bearing phage, a restriction site is installed next to the primary
variable region. Preferably, the enzyme should cut outside of its
recognition sequence, such as BspMI which cuts leaving a four base 5'
overhang, four bases to the 3' side of the recognition site. Thus, the
recognition site may be placed four bases from the primary degenerate
region. To insert a second variable region, the pool of phage DNA is
digested and blunt-ended by filling in the overhang with Klenow fragment.
Double-stranded, blunt-ended, degenerately synthesized oligonucleotides
are then ligated into this site to produce a second variable region
juxtaposed to the primary variable region. This secondary library is then
amplified and screened as before.
While in some instances it may be appropriate to synthesize peptides having
contiguous variable regions to bind certain receptors, in other cases it
may be desirable to provide peptides having two or more regions of
diversity separated by spacer residues. For example, the variable regions
may be separated by spacers which allow the diversity domains of the
peptides to be presented to the receptor in different ways. The distance
between variable regions may be as little as one residue, sometimes five
to ten and up to about 100 residues. For probing a large binding site the
variable regions may be separated by a spacer of 20 to 30 amino acid
residues. The number of spacer residues when present will preferably be
at least two, typically at least three or more, and often will be less
than ten, more often less than eight residues.
Thus, an oligonucleotide library having variable domains separated by
spacers can be represented by the formula:
(NNK).sub.y -(abc).sub.n -(NNK).sub.z
where N and K are as defined previously (note that S as defined previously
may be substituted for K), y+z is equal to about 5, 6, 7, 8, or more,
a, b and c represent the same or different nucleotides comprising a codon
encoding a spacer amino acid, and n is up to about 20 to 30 or more.
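The formula above can be written out as a degenerate template string, from which individual library members may be drawn. The Python sketch below is illustrative only. The names library_template and random_member are hypothetical, and the codons GGT and CCT used for a Gly-Pro-Gly spacer are one assumed choice among the codons available for those residues.

```python
import random

# Degenerate symbols as defined in the text: N = A/C/G/T, K = G/T, S = G/C.
DEGENERATE = {"N": "ACGT", "K": "GT", "S": "GC"}


def library_template(y, spacer_codons, z, motif="NNK"):
    """Build the template (motif)y-(abc)n-(motif)z as a single string."""
    return motif * y + "".join(spacer_codons) + motif * z


def random_member(template, rng):
    """Draw one concrete sequence from a degenerate template.

    Fixed bases (A, C, G, T) are copied through unchanged; degenerate
    symbols are resolved by an equimolar random choice.
    """
    return "".join(rng.choice(DEGENERATE.get(s, s)) for s in template)


# Two three-codon variable regions separated by a Gly-Pro-Gly spacer.
template = library_template(3, ["GGT", "CCT", "GGT"], 3)
print(template)

rng = random.Random(0)  # fixed seed so the run is repeatable
print(random_member(template, rng))
```

Passing motif="NNS" substitutes S for K, as the text permits, and an empty spacer list gives a contiguous variable region of y+z codons.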
The spacer residues may be somewhat flexible, comprising oligo-glycine, for
example, to provide the diversity domains of the library with the ability
to interact with sites in a large binding site relatively unconstrained by
attachment to the phage protein. Rigid spacers, such as, e.g.,
oligo-proline, may also be inserted separately or in combination with
other spacers, including Gly. It may be desired to have the variable
domains close to one another and use a spacer to orient the variable
domain with respect to each other, such as by employing a turn between the
two sequences as might be provided by a spacer of the sequence
Gly-Pro-Gly, for example. To add stability to such a turn, it may be
desirable or necessary to add Cys residues at either or both ends of each
variable region. The Cys residues would then form disulfide bridges to
hold the variable regions together in a loop, and in this fashion may also
serve to mimic a cyclic peptide. Of course, those skilled in the art will
appreciate that various other types of covalent linkages for cyclization
may also be accomplished.
The spacer residues described above can also be situated on either or both
ends of the variable nucleotide region. For instance, a cyclic peptide may
be accomplished without an intervening spacer, by having a Cys residue on
both ends of the peptide. As above, flexible spacers, e.g., oligo-glycine,
may facilitate interaction of the peptide with the selected receptors.
Alternatively, rigid spacers may allow the peptide to be presented as if
on the end of a rigid arm, where the number of residues, e.g., Pro,
determines not only the length of the arm but also the direction for the
arm in which the peptide is oriented. Hydrophilic spacers, made up of
charged and/or uncharged hydrophilic amino acids, (e.g., Thr, His, Asn,
Gln, Arg, Glu, Asp, Met, Lys, etc.), or hydrophobic spacers of hydrophobic
amino acids (e.g., Phe, Leu, Ile, Gly, Val, Ala, etc.) may be used to
present the peptides to binding sites with a variety of local
environments.
Unless modified during or after synthesis by the translation machinery,
recombinant peptide libraries consist of sequences of the 20 normal
L-amino acids. While the available structural diversity for such a library
is large, additional diversity can be introduced by a variety of means,
such as chemical modifications of the amino acids.
For example, as one source of added diversity a peptide library of the
invention can have its carboxy terminal amidated. Carboxy terminal
amidation is necessary to the activity of many naturally occurring
bioactive peptides. This modification occurs in vivo through cleavage of
the N--C bond of a carboxy terminal Gly residue in a two-step reaction
catalyzed by the enzymes peptidylglycine alpha-amidation monooxygenase
(PAM) and hydroxyglycine aminotransferase (HGAT). See, Eipper et al., J.
Biol. Chem. 266:7827-7833 (1991); Mizuno et al., Biochem. Biophys. Res.
Comm. 137(3): 984-991 (1986); Murthy et al., J. Biol. Chem. 261(4):
1815-1822 (1986); Katopodis et al., Biochemistry 29:6115-6120 (1990); and
Young and Tamburini, J. Am. Chem. Soc. 111:1933-1934 (1989), each of which
is incorporated herein by reference.
Carboxy terminal amidation can be applied to a peptide library of the
invention which has the variable region exposed at the carboxy terminus.
Amidation can be performed by treatment with enzymes, such as PAM and
HGAT, in vivo or in vitro, and under conditions conducive to maintaining
the structural integrity of the bioactive peptide. In a random peptide
library of the present invention, amidation will occur on a library
subset, i.e., those peptides having a carboxy terminal Gly. A library of
peptides designed for amidation can be constructed by introducing a Gly
codon at the end of the variable region domain of the library. After
amidation, an enriched library serves as a particularly efficient source
of ligands for receptors that preferentially bind amidated peptides.
Many of the C-terminus amidated bioactive peptides are processed from
larger pro-hormones, where the amidated peptide is flanked at its
C-terminus by the sequence -Gly-Lys-Arg-X . . . (where X is any amino
acid). In the present invention, oligonucleotides encoding the sequence
-Gly-Lys-Arg-X-Stop are placed at the 3' end of the variable
oligonucleotide region. When expressed, the Gly-Lys-Arg-X is removed by in
vivo or in vitro enzymatic treatment and the peptide library is carboxy
terminal amidated as described above.
Another means to add to the library diversity through carboxy terminal
amidation involves the use of proteins that typically have an exposed C
terminus, i.e., a protein that crosses a membrane with its carboxy
terminus exposed on the extracellular side of the membrane. In this
embodiment the variable oligonucleotide region, having a stop codon in the
last position, is inserted in the 3' end of a sequence which encodes a
C-terminus exposed protein, or at least a portion of the protein that is
responsible for the C-terminus out orientation. The transferrin receptor
protein is an example of one such protein. This receptor has been cloned
and sequenced, as reported in McClelland et al., Cell 39:267-274 (1984),
incorporated herein by reference. An internal transmembrane segment of the
transferrin receptor serves to orient the protein with its carboxy
terminus out. When the cDNA is expressed, typically in eucaryotic cells,
the random peptides are located extracellularly, having their amino
terminus fused to the transferrin receptor and with a free carboxy
terminus.
For carboxy terminal peptide libraries, a COS cell expression cloning
system can also be used and may be preferred in some circumstances. COS
cells are transfected with a variable nucleotide library contained in an
expression plasmid that replicates and produces mRNA extrachromosomally
when transfected into COS cells. Transfected cells bearing the random
peptides are selected on immobilized ligand or cells which bear a binding
protein, and the plasmid is isolated (rescued) from the selected cells.
The plasmid is then amplified and used to transfect COS cells for a second
round of screening. Because the random oligonucleotides are inserted
directly into the expression plasmid, much larger libraries (i.e., total
number of novel peptides) are constructed. Of course, for each round of
panning the plasmid needs to be rescued from the COS cells, transfected
into bacteria for amplification, re-isolated and transfected back into COS
cells.
Other expression systems for carboxy terminal amidation of peptides of the
invention can also be used. For example, the variable oligonucleotide
sequences are inserted into the 3' end of, e.g., the transferrin receptor
cDNA contained in a baculovirus transfer vector. Viral DNA and transfer
vector are co-transfected into insect cells (e.g., Sf9 cells) which are
used to propagate the virus in culture. When transferrin receptor is
expressed, cells harboring recombinant virus, i.e., those producing the
transferrin receptor/variable peptide fusion protein, are selected using
an anti-transferrin receptor monoclonal antibody linked to a particle such
as magnetic microspheres or other substance to facilitate separation. The
selected cells are further propagated, allowed to lyse and release the
library of recombinant extracellular budded virus into the media.
The library of recombinant virus is amplified (e.g., in Sf9 cells), and
aliquots of the library stored. Sf9 cells are then infected with the
library of recombinant virus and panned on immobilized target receptor,
where the panning is timed to occur with transferrin receptor expression.
The selected cells are allowed to grow and lyse, and the supernatant used
to infect new Sf9 cells, resulting in amplification of virus that encodes
peptides binding to the target receptor. After several rounds of panning
and amplification, single viruses are cloned by an Sf9 cell plaque assay as
described in Summers and Smith, A Manual of Methods for Baculovirus
Vectors and Insect Cell Culture Procedures, Texas Agricultural Experiment
Station Bulletin No. 1555 (1988), incorporated herein by reference. DNA in
the variable oligonucleotide insert region is then sequenced to determine
the peptides which bind to the target receptor.
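The effect of repeated rounds of panning and amplification can be illustrated with a toy calculation. The sketch below is not part of the described method; the capture probabilities, the starting binder fraction, and the function name enrich are all hypothetical values chosen for illustration, and amplification between rounds is assumed to leave the binder fraction unchanged.

```python
def enrich(binder_fraction, rounds, p_binder=0.5, p_background=0.001):
    """Track the fraction of binding clones across rounds of panning.

    p_binder and p_background are ASSUMED per-round probabilities that a
    binding or a non-binding clone is retained on the immobilized target.
    Amplification is assumed to preserve the ratio of the two populations.
    """
    history = [binder_fraction]
    f = binder_fraction
    for _ in range(rounds):
        kept_binders = f * p_binder
        kept_background = (1.0 - f) * p_background
        f = kept_binders / (kept_binders + kept_background)
        history.append(f)
    return history


# Hypothetical library in which one clone per million binds the target.
history = enrich(1e-6, rounds=3)
for round_number, fraction in enumerate(history):
    print(round_number, fraction)
```

Under these assumed numbers each round multiplies the binder-to-background ratio by p_binder / p_background, which is why a few rounds can suffice before individual viruses are cloned and sequenced.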
An advantage of the baculovirus system for peptide library screening is
that expression of the transferrin receptor/random peptide fusion protein
is very high (>1 million receptors per cell). A high expression level
increases the likelihood of successful panning based on stoichiometry
and/or contributes to polyvalent interactions with an immobilized target
receptor. Another advantage of the baculovirus system is that, similar to
the peptide on phage method, infectivity is exploited to amplify virus
which is selected by the panning procedure. During the series of pannings,
the DNA does not need to be isolated and used for subsequent transfections
of cells.
Other expression systems can be employed in the present invention. As
eucaryotic signal sequences are operable in yeast and bacteria, proteins
with a carboxy terminus out orientation, such as the transferrin receptor,
can be appropriately expressed and oriented in yeast or bacteria. The use
of yeast or bacteria allows the construction of large libraries and avoids
potential problems associated with amplification.
Other modifications found in naturally occurring peptides and proteins can
be introduced into the libraries to provide additional diversity and to
contribute to a desired biological activity. For example, the variable
region library can be provided with codons which code for amino acid
residues involved in phosphorylation, glycosylation, sulfation,
isoprenylation (or the addition of other lipids), etc. Modifications not
catalyzed by naturally occurring enzymes can be introduced by chemical
means (under relatively mild conditions) or through the action of, e.g.,
catalytic antibodies and the like. In most cases, an efficient strategy
for library construction involves specifying the enzyme (or chemical)
substrate recognition site within or adjacent to the variable nucleotide
region of the library so that most members of the library are modified.
The substrate recognition site added could be simply a single residue
(e.g., serine for phosphorylation) or a complex consensus sequence, as
desired.
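The placement of a recognition site within the variable region can be sketched as a degenerate oligonucleotide template. In the following illustration the function name site_directed_template, the choice of TCT and AAT as the fixed serine and asparagine codons, and the number of flanking random codons are assumptions made for the example. The Asn-X-Ser sequon for N-linked glycosylation is used only as a familiar instance of a consensus sequence.

```python
def site_directed_template(n_before, site_codons, n_after, scheme="NNK"):
    """Build a degenerate oligo with a fixed recognition site.

    n_before / n_after: number of random codons flanking the site.
    site_codons: codons (fixed or degenerate) that spell the site.
    scheme: degenerate codon used at the random positions.
    """
    return scheme * n_before + "".join(site_codons) + scheme * n_after


SER = "TCT"  # one serine codon, chosen arbitrarily for the example
ASN = "AAT"  # one asparagine codon, chosen arbitrarily for the example

# Single-residue site: a fixed serine flanked by three random codons.
phospho_template = site_directed_template(3, [SER], 3)

# Consensus site: Asn-X-Ser, with X left degenerate. Note that NNK at X
# can also encode proline, which is not tolerated at that position.
glyco_template = site_directed_template(2, [ASN, "NNK", SER], 2)

print(phospho_template)
print(glyco_template)
```

Because the site codons are fixed in every oligonucleotide, each library member carries the recognition site, while the flanking positions remain free to vary.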
Conformational constraints, or scaffolding, can also be introduced into the
structure of the peptide libraries. A number of motifs from known protein
and peptide structures can be adapted for this purpose. The method
involves introducing nucleotide sequences that code for conserved
structural residues into or adjacent to the variable nucleotide region so
as to contribute to the desired peptide structure. Positions nonessential
to the structure are allowed to vary.
A degenerate peptide library as described herein can incorporate the
conserved frameworks to produce and/or identify members of families of
bioactive peptides or their binding receptor elements. Several families of
bioactive peptides are related by a secondary structure that results in a
conserved "framework," which in some cases is a pair of cysteines that
flank a string of variable residues. This results in the display of the
variable residues in a loop closed by a disulfide bond, as discussed
above.
In some cases a more complex framework is shared among members of a peptide
family which contributes to the bioactivity of the peptides. An example of
this class is the conotoxins, peptide toxins of 10 to 30 amino acids
produced by venomous molluscs known as predatory cone snails. The
conotoxin peptides generally possess a high density of disulfide
cross-linking. Of those that are highly cross-linked, most belong to two
groups, mu and omega, that have conserved primary frameworks as follows:
______________________________________
mu CC.....C.....C.....CC; and
omega C.....C.....CC.....C.....C
______________________________________
The number of residues flanked by each pair of C's varies from 2 to 6 in
the peptides reported to date. The side chains of the residues which flank
the Cys residues are apparently not conserved in peptides with different
specificity, as in peptides from different species with similar or
identical specificities. Thus, the conotoxins have exploited a conserved,
densely cross-linked motif as a framework for hypervariable regions to
produce a huge array of peptides with many different pharmacological
effects.
The mu and omega classes (with 6 C's) have 15 possible combinations of
disulfide bonds. Usually only one of these conformations is the active
("correct") form. The correct folding of the peptides may be directed by a
conserved 40 residue peptide that is cleaved from the N-terminus of the
conopeptide to produce the small, mature bioactive peptides that appear in
the venom.
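The figure of 15 disulfide combinations for six cysteines can be checked by enumerating every way of pairing the residues. The short sketch below does this recursively; the function name disulfide_pairings and the numbering of the cysteines 1 through 6 are illustrative choices.

```python
def disulfide_pairings(cys_positions):
    """Return every way to pair the given cysteines into disulfide bonds."""
    if not cys_positions:
        return [[]]
    first, rest = cys_positions[0], cys_positions[1:]
    pairings = []
    for i, partner in enumerate(rest):
        # Bond the first cysteine to one partner, then pair the remainder.
        remaining = rest[:i] + rest[i + 1:]
        for sub in disulfide_pairings(remaining):
            pairings.append([(first, partner)] + sub)
    return pairings


six_cys = [1, 2, 3, 4, 5, 6]
all_pairings = disulfide_pairings(six_cys)
print(len(all_pairings))  # 15
```

The first cysteine has five possible partners, the next unpaired cysteine has three, and the last pair is then fixed, giving 5 x 3 x 1 = 15.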
With 2 to 6 variable residues between each pair of C's, there are 125
(5.sup.3) possible framework arrangements for the mu class (2,2,2 to
6,6,6), and 625 (5.sup.4) possible for the omega (2,2,2,2 to 6,6,6,6).
Randomizing the identity of the residues within each framework produces
10.sup.10 to >10.sup.30 peptides. "Cono-like" peptide libraries are
constructed having a conserved disulfide framework, varied numbers of
residues in each hypervariable region, and varied identity of those
residues. Thus, a sequence for the structural framework for use in the
present invention comprises Cys-Cys-Y-Cys-Y-Cys-Y-Cys-Cys, or
Cys-Y-Cys-Y-Cys-Cys-Y-Cys-Y-Cys, wherein Y is (NNK).sub.x or (NNS).sub.x,
and where N is A, C, G or T, K is G or T, S is G or C, and x is from 2 to
6.
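The framework counts and the degenerate codon schemes can be worked through in a few lines. The sketch below enumerates the loop-length combinations, expands NNK and NNS into concrete codons using the standard genetic code, and assembles a degenerate oligonucleotide for a chosen framework. The strings MU and OMEGA are a shorthand introduced here (C for cysteine, Y for a variable loop) that follows the mu and omega frameworks tabulated above; the function names and the use of TGT as the cysteine codon are likewise assumptions for illustration.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

DEGENERATE = {"N": "ACGT", "K": "GT", "S": "CG"}

MU = "CCYCYCYCC"       # shorthand for CC.....C.....C.....CC
OMEGA = "CYCYCCYCYC"   # shorthand for C.....C.....CC.....C.....C


def expand(scheme):
    """List the concrete codons covered by a degenerate codon such as NNK."""
    return ["".join(p) for p in product(*(DEGENERATE[s] for s in scheme))]


def encoded_residues(scheme):
    """Return (set of amino acids, list of stop codons) for a scheme."""
    codons = expand(scheme)
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    return amino_acids, stops


def loop_lengths(n_loops, shortest=2, longest=6):
    """Every combination of loop lengths, e.g. (2,2,2) through (6,6,6)."""
    return list(product(range(shortest, longest + 1), repeat=n_loops))


def framework_oligo(framework, lengths, scheme="NNK", cys_codon="TGT"):
    """Degenerate oligo for a framework with the given loop lengths."""
    lengths = iter(lengths)
    return "".join(cys_codon if ch == "C" else scheme * next(lengths)
                   for ch in framework)


print(len(loop_lengths(MU.count("Y"))))     # 125
print(len(loop_lengths(OMEGA.count("Y"))))  # 625
print(framework_oligo(MU, (2, 2, 2)))
```

Both NNK and NNS expand to 32 codons that cover all 20 amino acids and include a single stop codon (TAG), which can be confirmed by calling encoded_residues on each scheme.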
Other changes can be introduced to provide residues that contribute to the
peptide structure, around which the variable amino acids are encoded by
the library members. For example, these residues can provide for
.alpha.-helices, a helix-turn-helix structure, four helix bundles, etc.,
as described.
Another exemplary scaffold structure takes advantage of metal ion binding
to conformationally constrain peptide structures. Properly spaced
invariant metal ligands (cysteines and histidines) for certain divalent
cations (e.g., zinc, cobalt, nickel, cadmium, etc.) can be specifi