WikiPatents - Community Patent Review
Recombinant DNA transfer vectors    
United States Patent     4363877
Link to this page     http://www.wikipatents.com/4363877.html
Inventor(s)     Goodman; Howard M. (San Francisco, CA); Shine; John (San Francisco, CA); Seeburg; Peter H. (San Francisco, CA)
Abstract     Recombinant DNA transfer vectors containing codons for human somatomammotropin and for human growth hormone.
   














Owner/Assignee     The Regents of the University of California (Berkeley, CA)
Publication Date     December 14, 1982
Application Number     05/897,710
Filing Date     April 19, 1978
US Classification     435/320.1 435/69.4 435/91.41 435/849 536/23.51 930/10 930/120
Int'l Classification     C12N 001/00
Examiner     Tanenholtz; Alvin E.
Attorney/Law Firm     Keil & Witherspoon
Parent Case     This application is a continuation-in-part of copending application, Ser. No. 836,218, filed Sept. 23, 1977 and now abandoned.
USPTO Field of Search     435/172 435/317 435/820 435/68
   
Claims
 


What is claimed is:

1. A recombinant DNA transfer vector comprising codons for human chorionic somatomammotropin comprising the nucleotide sequence:

5'-G GCL.sub.24 ATM.sub.25 GAK.sub.26 ACL.sub.27 TAK.sub.28 CAJ.sub.29 GAJ.sub.30 TTK.sub.31 GAJ.sub.32 GAJ.sub.33 ACL.sub.34 TAK.sub.35 ATM.sub.36 CCL.sub.37 AAJ.sub.38 GAK.sub.39 CAJ.sub.40 AAJ.sub.41 TAK.sub.42 QR.sub.43 S.sub.43 TTK.sub.44 X.sub.45 TY.sub.45 CAK.sub.46 GAK.sub.47 QR.sub.48 S.sub.48 CAJ.sub.49 ACL.sub.50 QR.sub.51 S.sub.51 TTK.sub.52 TGK.sub.53 TTK.sub.54 QR.sub.55 S.sub.55 GAK.sub.56 QR.sub.57 S.sub.57 ATM.sub.58 CCL.sub.59 ACL.sub.60 CCL.sub.61 QR.sub.62 S.sub.62 AAK.sub.63 ATGGAJ.sub.65 GAJ.sub.66 ACL.sub.67 CAJ.sub.68 CAJ.sub.69 AAJ.sub.70 QR.sub.71 S.sub.71 AAK.sub.72 X.sub.73 TY.sub.73 GAJ.sub.74 X.sub.75 TY.sub.75 X.sub.76 TY.sub.76 W.sub.77 GZ.sub.77 ATM.sub.78 QR.sub.79 S.sub.79 X.sub.80 TY.sub.80 X.sub.81 TY.sub.81 X.sub.82 TY.sub.82 ATM.sub.83 GAJ.sub.84 QR.sub.85 S.sub.85 TGGX.sub.87 TY.sub.87 GAJ.sub.88 CCL.sub.89 GTL.sub.90 W.sub.91 GZ.sub.91 TTK.sub.92 X.sub.93 TY.sub.93 W.sub.94 GZ.sub.94 QR.sub.95 S.sub.95 ATGTTK.sub.97 GCL.sub.98 AAK.sub.99 AAK.sub.100 X.sub.101 TY.sub.101 GTL.sub.102 TAK.sub.103 GAK.sub.104 ACL.sub.105 QR.sub.106 S.sub.106 GAK.sub.107 QR.sub.108 S.sub.108 GAK.sub.109 GAK.sub.110 TAK.sub.111 CAK.sub.112 X.sub.113 TY.sub.113 X.sub.114 TY.sub.114 AAJ.sub.115 GAK.sub.116 X.sub.117 TY.sub.117 GAJ.sub.118 GAJ.sub.119 GGL.sub.120 ATM.sub.121 CAJ.sub.122 ACL.sub.123 X.sub.124 TY.sub.124 ATGGGL.sub.126 W.sub.127 GZ.sub.127 X.sub.128 TY.sub.128 GAJ.sub.129 GAK.sub.130 GGL.sub.131 QR.sub.132 S.sub.132 W.sub.133 GZ.sub.133 W.sub.134 GZ.sub.134 ACL.sub.135 GGL.sub.136 CAJ.sub.137 ATM.sub.138 X.sub.139 TY.sub.139 AAJ.sub.140 CAJ.sub.141 ACL.sub.142 TAK.sub.143 QR.sub.144 S.sub.144 AAJ.sub.145 TTK.sub.146 GAK.sub.147 ACL.sub.148 AAK.sub.149 QR.sub.150 S.sub.150 CAK.sub.151 AAK.sub.152 CAK.sub.153 GAK.sub.154 GCL.sub.155 X.sub.156 TY.sub.156 X.sub.157 TY.sub.157 AAJ.sub.158 AAK.sub.159 TAK.sub.160 GGL.sub.161 X.sub.162 TY.sub.162 X.sub.163 TY.sub.163 TAK.sub.164 TGK.sub.165 TTK.sub.166 W.sub.167 GZ.sub.167 AAJ.sub.168 GAK.sub.169 ATGGAK.sub.171 AAJ.sub.172 GTL.sub.173 GAJ.sub.174 ACL.sub.175 TTK.sub.176 X.sub.177 TY.sub.177 W.sub.178 GZ.sub.178 ATGGTL.sub.180 CAJ.sub.181 TGK.sub.182 W.sub.183 GZ.sub.183 QR.sub.184 S.sub.184 GTL.sub.185 GAJ.sub.186 GGL.sub.187 QR.sub.188 S.sub.188 TGK.sub.189 GGL.sub.190 TTK.sub.191 TAGGTGCCCGAGTAGCATCCTGTGACCCCTCCCCAGTGCCTCTCCTGGCC-3' wherein

A is deoxyadenyl,

G is deoxyguanyl,

C is deoxycytosyl,

T is thymidyl,

J is A or G;

K is T or C;

L is A, T, C or G;

M is A, C or T;

X is T or C, if the succeeding Y is A or G, and C if the succeeding Y is C or T;

Y is A, G, C or T, if the preceding X is C, and A or G if the preceding X is T;

W is C or A, if the succeeding Z is G or A, and C if the succeeding Z is C or T;

Z is A, G, C or T, if the preceding W is C, and A or G if the preceding W is A;

QR is TC, if the succeeding S is A, G, C or T, and AG if the succeeding S is T or C;

S is A, G, C or T, if the preceding QR is TC, and T or C if the preceding QR is AG and subscript numerals refer to the amino acid position in human growth hormone, for which the nucleotide sequence corresponds, according to the genetic code, the amino acid positions being numbered from the amino end.

2. The recombinant DNA transfer vector of claim 1 wherein

J is A in amino acid positions: 32, 33, 66, 68, 70, 119, 122 and 129,

J is G in amino acid positions: 29, 30, 38, 40, 41, 49, 65, 69, 74, 84, 88, 115, 118, 137, 140, 141, 145, 158, 168, 172, 174, 181 and 186;

K is T in amino acid positions: 31, 35, 42, 46, 72, 103, 109, 111, 146, 153 and 189;

K is C in amino acid positions: 26, 28, 39, 44, 47, 52, 53, 54, 56, 63, 92, 97, 99, 100, 104, 107, 110, 112, 116, 130, 143, 147, 149, 151, 152, 154, 159, 160, 164, 165, 166, 169, 171, 176, 182 and 191;

L is A in amino acid positions: 37, 60, 148, 155 and 175;

L is T in amino acid position: 135;

L is G in amino acid positions: 59, 67, 90, 102, 123, 126, 136, 161, 180 and 185;

L is C in amino acid positions: 24, 27, 34, 50, 61, 89, 98, 105, 120, 131, 142, 173, 187 and 190;

M is T in amino acid positions: 25 and 58;

M is C in amino acid positions: 36, 78, 83, 121 and 138;

X is C;

Y is A in amino acid positions: 73, 114 and 117;

Y is G in amino acid positions: 45, 75, 80, 81, 87, 101, 124, 128, 156, 162 and 177;

Y is C in amino acid positions: 76, 82, 93, 113, 139, 157 and 163;

W is A in amino acid positions: 94, 127 and 167;

W is C in amino acid positions: 77, 91, 133, 134, 178 and 183;

Z is G in amino acid positions: 91, 94, 127, 134 and 167;

Z is C in amino acid positions: 77, 133, 178 and 183;

QR is AG in amino acid positions: 95, 108, 132, 144 and 188;

QR is TC in amino acid positions: 43, 48, 51, 55, 57, 62, 71, 79, 85, 106, 150 and 184;

S is A in amino acid position: 55;

S is T in amino acid positions: 57, 95 and 184;

S is G in amino acid positions: 43, 85, 106 and 150; and

S is C in amino acid positions: 48, 51, 62, 71, 79, 108, 132, 144 and 188.

3. A recombinant DNA transfer vector according to claim 1 comprising in the nucleotide sequence, 5'-GTL.sub.1 CAJ.sub.2 ACL.sub.3 GTL.sub.4 CCL.sub.5 X.sub.6 TY.sub.6 QR.sub.7 S.sub.7 W.sub.8 GZ.sub.8 X.sub.9 TY.sub.9 TTK.sub.10 GAK.sub.11 CAK.sub.12 GCL.sub.13 ATGX.sub.15 TY.sub.15 CAJ.sub.16 GCL.sub.17 CAK.sub.18 W.sub.19 GZ.sub.19 GCL.sub.20 CAK.sub.21 CAJ.sub.22 X.sub.23 TY.sub.23 wherein Y.sub.23 is followed by GCL.sub.24 in the sequence of claim 15.

4. A recombinant plasmid vector comprising the nucleotide sequence coding for the growth hormone of an animal species and capable of transforming a microorganism, synthesized by a process comprising:

isolating polyadenylated RNA from pituitary cells of the animal species,

preparing double-stranded cDNA transcripts of the isolated RNA,

fractionating the cDNA according to its molecular length, in order to produce a fraction enriched for cDNA coding for the growth hormone of the animal species,

joining the cDNA coding for growth hormone covalently with a plasmid vector to produce a recombinant plasmid capable of transforming a microorganism.

5. A recombinant DNA transfer vector according to claim 2 wherein the transfer vector comprises the plasmid pMB-9.

6. A recombinant DNA transfer vector comprising codons for human growth hormone, comprising the nucleotide sequence:

5'-G GCL.sub.24 TTK.sub.25 GAK.sub.26 ACL.sub.27 TAK.sub.28 CAJ.sub.29 GAJ.sub.30 TTK.sub.31 GAJ.sub.32 GAJ.sub.33 ACL.sub.34 TAK.sub.35 ATM.sub.36 CCL.sub.37 AAJ.sub.38 GAJ.sub.39 CAJ.sub.40 AAJ.sub.41 TAK.sub.42 QR.sub.43 S.sub.43 TTK.sub.44 X.sub.45 TY.sub.45 CAJ.sub.46 AAK.sub.47 CCL.sub.48 CAJ.sub.49 ACL.sub.50 QR.sub.51 S.sub.51 X.sub.52 TY.sub.52 TGK.sub.53 TTK.sub.54 QR.sub.55 S.sub.55 GAJ.sub.56 QR.sub.57 S.sub.57 ATM.sub.58 CCL.sub.59 ACL.sub.60 CCL.sub.61 QR.sub.62 S.sub.62 AAK.sub.63 W.sub.64 GZ.sub.64 GAJ.sub.65 GAJ.sub.66 ACL.sub.67 CAJ.sub.68 CAJ.sub.69 AAJ.sub.70 QR.sub.71 S.sub.71 AAK.sub.72 X.sub.73 TY.sub.73 GAJ.sub.74 X.sub.75 TY.sub.75 X.sub.76 TY.sub.76 W.sub.77 GZ.sub.77 ATM.sub.78 QR.sub.79 S.sub.79 X.sub.80 TY.sub.80 X.sub.81 TY.sub.81 X.sub.82 TY.sub.82 ATM.sub.83 CAJ.sub.84 QR.sub.85 S.sub.85 TGGX.sub.87 TY.sub.87 GAJ.sub.88 CCL.sub.89 GTL.sub.90 CAJ.sub.91 TTK.sub.92 X.sub.93 TY.sub.93 W.sub.94 GZ.sub.94 QR.sub.95 S.sub.95 GTL.sub.96 TTK.sub.97 GCL.sub.98 AAK.sub.99 AAK.sub.100 X.sub.101 TY.sub.101 GTL.sub.102 TAK.sub.103 GGL.sub.104 GCL.sub.105 QR.sub.106 S.sub.106 GAK.sub.107 QR.sub.108 S.sub.108 AAK.sub.109 GTL.sub.110 TAK.sub.111 GAK.sub.112 X.sub.113 TY.sub.113 X.sub.114 TY.sub.114 AAJ.sub.115 GAK.sub.116 X.sub.117 TY.sub.117 GAJ.sub.118 GAJ.sub.119 GGL.sub.120 ATM.sub.121 CAJ.sub.122 ACL.sub.123 X.sub.124 TY.sub.124 ATGGGL.sub.126 W.sub.127 GZ.sub.127 X.sub.128 TY.sub.128 GAJ.sub.129 GAK.sub.130 GGL.sub.131 QR.sub.132 S.sub.132 CCL.sub.133 W.sub.134 GZ.sub.134 ACL.sub.135 GGL.sub.136 CAJ.sub.137 ATM.sub.138 TTK.sub.139 AAJ.sub.140 CAJ.sub.141 ACL.sub.142 TAK.sub.143 QR.sub.144 S.sub.144 AAJ.sub.145 TTK.sub.146 GAK.sub.147 ACL.sub.148 AAK.sub.149 QR.sub.150 S.sub.150 CAK.sub.151 AAK.sub.152 CAK.sub.153 GAK.sub.154 GCL.sub.155 X.sub.156 TY.sub.156 X.sub.157 TY.sub.157 AAJ.sub.158 AAK.sub.159 TAK.sub.160 GGL.sub.161 X.sub.162 TY.sub.162 X.sub.163 TY.sub.163 TAK.sub.164 TGK.sub.165 TTK.sub.166 W.sub.167 GZ.sub.167 AAJ.sub.168 GAK.sub.169 ATGGAK.sub.171 AAJ.sub.172 GTL.sub.173 GAJ.sub.174 ACL.sub.175 TTK.sub.176 X.sub.177 TY.sub.177 W.sub.178 GZ.sub.178 ATM.sub.179 GTL.sub.180 CAJ.sub.181 TGK.sub.182 W.sub.183 GZ.sub.183 QR.sub.184 S.sub.184 GTL.sub.185 GAJ.sub.186 GGL.sub.187 QR.sub.188 S.sub.188 TGK.sub.189 GGL.sub.190 TTK.sub.191 TAGCTGCCCGGGTGGCATCCCTGTGACCCCTCCCCAGTGCCTCTCCTGGCC-3' wherein

A is deoxyadenyl,

G is deoxyguanyl,

C is deoxycytosyl,

T is thymidyl,

J is A or G;

K is T or C;

L is A, T, C or G;

M is A, C or T;

X is T or C, if the succeeding Y is A or G, and C if the succeeding Y is C or T;

Y is A, G, C or T, if the preceding X is C, and A or G if the preceding X is T;

W is C or A, if the succeeding Z is G or A, and C if the succeeding Z is C or T;

Z is A, G, C or T, if the preceding W is C, and A or G if the preceding W is A;

QR is TC, if the succeeding S is A, G, C or T, and AG if the succeeding S is T or C;

S is A, G, C or T, if the preceding QR is TC, and T or C if the preceding QR is AG and subscript numerals refer to the amino acid position in human growth hormone, for which the nucleotide sequence corresponds, according to the genetic code, the amino acid positions being numbered from the amino end.

7. The recombinant DNA transfer vector of claim 6 wherein

J is A in amino acid positions: 32, 33, 39, 66, 68, 70, 119, 122 and 129,

J is G in amino acid positions: 29, 30, 38, 40, 41, 46, 49, 56, 65, 69, 74, 84, 88, 91, 115, 118, 137, 140, 141, 145, 158, 168, 172, 174, 181 and 186;

K is T in amino acid positions: 25, 31, 35, 42, 53, 111, 153 and 189;

K is C in amino acid positions: 26, 28, 44, 47, 54, 63, 72, 92, 97, 99, 100, 103, 107, 109, 112, 116, 130, 139, 143, 146, 147, 149, 151, 152, 154, 159, 160, 164, 165, 166, 169, 171, 176, 182 and 191;

L is A in amino acid positions: 37, 60, 67, 148, 155 and 175;

L is T in amino acid position: 135;

L is G in amino acid positions: 59, 90, 102, 123, 126, 136, 161, 180 and 185;

L is C in amino acid positions: 24, 27, 34, 48, 50, 61, 89, 96, 98, 104, 105, 110, 120, 131, 133, 142, 173, 187 and 190;

M is T in amino acid position: 58;

M is C in amino acid positions: 36, 78, 83, 121, 138, and 179;

X is C;

Y is A in amino acid positions: 73, 114, 117 and 156;

Y is G in amino acid positions: 45, 75, 80, 81, 87, 101, 124, 128, 162 and 177;

Y is C in amino acid positions: 52, 76, 82, 93, 113, 157 and 163;

W is A in amino acid positions: 64, 94, 127 and 167;

W is C in amino acid positions: 77, 134, 178 and 183;

Z is G in amino acid positions: 64, 94, 127, 134 and 167;

Z is C in amino acid positions: 77, 178 and 183;

QR is AG in amino acid positions: 95, 108, 132, 144 and 188;

QR is TC in amino acid positions: 43, 51, 55, 57, 62, 71, 79, 85, 106, 150 and 184;

S is A in amino acid positions: 43, 55 and 150;

S is T in amino acid positions: 57, 95, 106 and 184;

S is G in amino acid position: 85, and

S is C in amino acid positions: 51, 62, 71, 79, 108, 132, 144 and 188.

8. A transfer vector according to claim 7 comprising in addition the nucleotide sequence, 5'-TTK.sub.1 CCL.sub.2 ACL.sub.3 ATM.sub.4 CCL.sub.5 X.sub.6 TY.sub.6 QR.sub.7 S.sub.7 W.sub.8 GZ.sub.8 X.sub.9 TY.sub.9 TTK.sub.10 GAK.sub.11 AAK.sub.12 GCL.sub.13 ATGX.sub.15 TY.sub.15 W.sub.16 GZ.sub.16 GCL.sub.17 CAK.sub.18 W.sub.19 GZ.sub.19 X.sub.20 TY.sub.20 CAK.sub.21 CAJ.sub.22 X.sub.23 TY.sub.23 -3' and wherein Y.sub.23 is followed in sequence by GCL.sub.24 in the sequence of claim 6.
Description
 


BACKGROUND OF THE INVENTION

Proteins and peptides are synthesized in almost endless variety by living organisms. Many have proven to have medical, agricultural or industrial utility. Some proteins are enzymes, useful as specific catalysts for complex chemical reactions. Others function as hormones, which act to affect the growth or development of an organism or to affect the function of specific tissues in medically significant ways. Specific binding proteins may have commercial significance for the isolation and purification of trace substances and for the removal of contaminating substances. Both proteins and peptides are composed of linear chains of amino acids, the latter term being applied to short, single-chain sequences, the former referring to long-chain and multichain substances. The principles of the present invention apply equally to both proteins and peptides.

Proteins and peptides are generally high molecular weight substances, each having a specific sequence of amino acids. Except for the smaller peptides, chemical synthesis of peptides and proteins is frequently impractical, costly and time consuming, if not impossible. In the majority of instances, in order to make practical use of a desired protein, it must first be isolated from the organism which makes it. Frequently, the desired protein is present only in minuscule amounts. Often, the source organism cannot be obtained in quantities sufficient to provide an adequate amount of the desired protein. Consequently, many potential agricultural, industrial and medical applications for specific proteins are known, but remain undeveloped simply because an adequate supply of the desired protein or peptide does not exist.

Recently developed techniques have made it possible to employ microorganisms, capable of rapid and abundant growth, for the synthesis of commercially useful proteins and peptides, regardless of their source in nature. These techniques make it possible to genetically endow a suitable microorganism with the ability to synthesize a protein or peptide normally made by another organism. The technique makes use of a fundamental relationship which exists in all living organisms between the genetic material, usually DNA, and the proteins synthesized by the organism. This relationship is such that the amino acid sequence of the protein is reflected in the nucleotide sequence of the DNA. There are one or more trinucleotide sequence groups specifically related to each of the twenty amino acids most commonly occurring in proteins. The specific relationship between each given trinucleotide sequence and its corresponding amino acid constitutes the genetic code. The genetic code is believed to be the same or similar for all living organisms. As a consequence, the amino acid sequence of every protein or peptide is reflected by a corresponding nucleotide sequence, according to a well understood relationship. Furthermore, this sequence of nucleotides can, in principle, be translated by any living organism.

TABLE 1
______________________________________
Genetic Code
______________________________________
Phenylalanine (Phe)    TTK     Histidine (His)        CAK
Leucine (Leu)          XTY     Glutamine (Gln)        CAJ
Isoleucine (Ile)       ATM     Asparagine (Asn)       AAK
Methionine (Met)       ATG     Lysine (Lys)           AAJ
Valine (Val)           GTL     Aspartic acid (Asp)    GAK
Serine (Ser)           QRS     Glutamic acid (Glu)    GAJ
Proline (Pro)          CCL     Cysteine (Cys)         TGK
Threonine (Thr)        ACL     Tryptophan (Trp)       TGG
Alanine (Ala)          GCL     Arginine (Arg)         WGZ
Tyrosine (Tyr)         TAK     Glycine (Gly)          GGL
Termination signal     TAJ
Termination signal     TGA
______________________________________
Key: Each 3-letter triplet represents a trinucleotide of DNA having a 5' end on the left and a 3' end on the right. The letters stand for the purine or pyrimidine bases forming the nucleotide sequence.
A = adenine
G = guanine
C = cytosine
T = thymine
J = A or G
K = T or C
L = A, T, C or G
M = A, C or T
X = T or C if Y is A or G
X = C if Y is C or T
Y = A, G, C or T if X is C
Y = A or G if X is T
W = C or A if Z is A or G
W = C if Z is C or T
Z = A, G, C or T if W is C
Z = A or G if W is A
QR = TC if S is A, G, C or T
QR = AG if S is T or C
S = A, G, C or T if QR is TC
S = T or C if QR is AG

The trinucleotides of Table 1, termed codons, are presented as DNA trinucleotides, as they exist in the genetic material of a living organism. Expression of these codons in protein synthesis requires the intermediate formation of messenger RNA (mRNA), as described more fully, infra. The mRNA codons have the same sequences as the DNA codons of Table 1, except that uracil is found in place of thymine. Complementary trinucleotide DNA sequences having opposite strand polarity are functionally equivalent to the codons of Table 1, as is understood in the art. An important and well known feature of the genetic code is its redundancy, whereby, for most of the amino acids used to make proteins, more than one coding nucleotide triplet may be employed. Therefore, a number of different nucleotide sequences may code for a given amino acid sequence. Such nucleotide sequences are considered functionally equivalent since they can result in the production of the same amino acid sequence in all organisms, although certain strains may translate some sequences more efficiently than they do others. Occasionally, a methylated variant of a purine or pyrimidine may be found in a given nucleotide sequence. Such methylations do not affect the coding relationship in any way.
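
The degenerate notation of Table 1 can be checked mechanically. The following is a minimal illustrative sketch, not part of the patent: it expands each Table 1 pattern into the concrete DNA codons it denotes, using the symbol definitions given in the key and in the claims (for example, W is C or A if the succeeding Z is A or G). The names `SIMPLE`, `PAIRED`, `TABLE1`, `expand` and `codon_table` are hypothetical and chosen only for this example.

```python
from itertools import product

# Single-letter symbols from the key to Table 1.
SIMPLE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "J": "AG",    # J = A or G
    "K": "TC",    # K = T or C
    "L": "ATCG",  # L = A, T, C or G
    "M": "ACT",   # M = A, C or T
}

# Patterns whose letters constrain one another, written out explicitly.
PAIRED = {
    # X = T only if Y is A or G; X = C allows any Y.
    "XTY": {"TTA", "TTG"} | {"CT" + y for y in "ACGT"},
    # W = A only if Z is A or G; W = C allows any Z.
    "WGZ": {"AGA", "AGG"} | {"CG" + z for z in "ACGT"},
    # QR = TC allows any S; QR = AG requires S to be T or C.
    "QRS": {"TC" + s for s in "ACGT"} | {"AGT", "AGC"},
}

TABLE1 = {
    "Phe": "TTK", "Leu": "XTY", "Ile": "ATM", "Met": "ATG", "Val": "GTL",
    "Ser": "QRS", "Pro": "CCL", "Thr": "ACL", "Ala": "GCL", "Tyr": "TAK",
    "His": "CAK", "Gln": "CAJ", "Asn": "AAK", "Lys": "AAJ", "Asp": "GAK",
    "Glu": "GAJ", "Cys": "TGK", "Trp": "TGG", "Arg": "WGZ", "Gly": "GGL",
}
STOP_PATTERNS = ["TAJ", "TGA"]


def expand(pattern):
    """Return the set of concrete DNA codons denoted by a Table 1 pattern."""
    if pattern in PAIRED:
        return set(PAIRED[pattern])
    choices = (SIMPLE[symbol] for symbol in pattern)
    return {"".join(bases) for bases in product(*choices)}


def codon_table():
    """Map every concrete codon to its amino acid, or to 'Stop'."""
    table = {}
    for amino_acid, pattern in TABLE1.items():
        for codon in expand(pattern):
            table[codon] = amino_acid
    for pattern in STOP_PATTERNS:
        for codon in expand(pattern):
            table[codon] = "Stop"
    return table


if __name__ == "__main__":
    for amino_acid, pattern in TABLE1.items():
        print(amino_acid, pattern, sorted(expand(pattern)))
    print("distinct codons covered:", len(codon_table()))
```

If the patterns are read this way, the twenty amino acid entries and the two termination entries together account for all 64 possible trinucleotides without overlap, which illustrates the redundancy described above.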

In its basic outline, a method of endowing a microorganism with the ability to synthesize a new protein involves three general steps: (1) isolation and purification of the specific gene or nucleotide sequence containing the genetically coded information for the amino acid sequence of the desired protein, (2) recombination of the isolated nucleotide sequence with an appropriate transfer vector, typically the DNA of a bacteriophage or plasmid, and (3) transfer of the vector to the appropriate microorganism and selection of a strain of the recipient microorganism containing the desired genetic information.

A fundamental difficulty encountered in attempts to exploit commercially the above-described general process lies in the first step, the isolation and purification of the desired specific genetic information. DNA exists in all living cells in the form of extremely high molecular weight chains of nucleotides. A cell may contain more than 10,000 structural genes, coding for the amino acid sequences of over 10,000 specific proteins, each gene having a sequence many hundreds of nucleotides in length. For the most part, four different nucleotide bases make up all the existing sequences. These are adenine (A), guanine (G), cytosine (C), and thymine (T). The long sequences comprising the structural genes of specific proteins are consequently very similar in overall chemical composition and physical properties. The separation of one such sequence from the plethora of other sequences present in isolated DNA cannot ordinarily be accomplished by conventional physical and chemical preparative methods.

Two general methods have been used in the prior art to accomplish step (1) in the above-described general procedure. The first method is sometimes referred to as the shotgun technique. The DNA of an organism is fragmented into segments generally longer than the desired nucleotide sequence. Step (1) of the above-described process is essentially by-passed. The DNA fragments are immediately recombined with the desired vector, without prior purification of specific sequences. Optionally, a crude fractionation step may be interposed. The selection techniques of microbial genetics are relied upon to select, from among all the possibilities, a strain of microorganism containing the desired genetic information. The shotgun procedure suffers from two major disadvantages. Most importantly, the procedure can result in the transfer of hundreds of unknown genes into recipient microorganisms, so that during the experiment, new strains are created, having unknown genetic capabilities. Therefore, the use of such a procedure could create a hazard for laboratory workers and for the environment. A second disadvantage of the shotgun method is that it is extremely inefficient for the production of the desired strain, and is dependent upon the use of a selection technique having sufficient resolution to compensate for the lack of fractionation in the first step.

The second general method takes advantage of the fact that the total genetic information in a cell is seldom, if ever, expressed at any given time. In particular, the differentiated tissues of higher organisms may be synthesizing only a minor proportion of the proteins which the organism is capable of making. In extreme cases, such cells may be synthesizing predominantly one protein. In such extreme cases, it has been possible to isolate the nucleotide sequence coding for the protein in question by isolating the corresponding messenger RNA from the appropriate cells.

Messenger RNA functions in the process of converting the nucleotide sequence information of DNA into the amino acid sequence structure of a protein. In the first step of this process, termed transcription, a local segment of DNA having a nucleotide sequence which specifies a protein to be made, is first copied into RNA. RNA is a polynucleotide similar to DNA except that ribose is substituted for deoxyribose and uracil is used in place of thymine. The nucleotide bases in RNA are capable of entering into the same kind of base pairing relationships that are known to exist between the complementary strands of DNA. A and U (T) are complementary, and G and C are complementary. The RNA transcript of a DNA nucleotide sequence will be complementary to the copied sequence. Such RNA is termed messenger RNA (mRNA) because of its status as intermediary between the genetic apparatus of the cell and its protein synthesizing apparatus. Generally, the only mRNA sequences present in the cell at any given time are those which correspond to proteins being actively synthesized at that time. Therefore, a differentiated cell whose function is devoted primarily to the synthesis of a single protein will contain primarily the RNA species corresponding to that protein. In those instances where it is feasible, the isolation and purification of the appropriate nucleotide sequence coding for a given protein can be accomplished by taking advantage of the specialized synthesis of such protein in differentiated cells.

A major disadvantage of the foregoing procedure is that it is applicable only in the relatively rare instances where cells can be found engaged in synthesizing primarily a single protein. The majority of proteins of commercial interest are not synthesized in such a specialized way. The desired protein may be one of a hundred or so different proteins being produced by the cells of a tissue or organism at a given time. Nevertheless, the mRNA isolation technique is potentially useful since the set of RNA species present in the cell usually represents only a fraction of the total sequences existing in the DNA, and thus provides an initial purification. In order to take advantage of such purification, however, a method is needed whereby sequences present in low frequencies, such as a few percent, can be isolated in high purity.

The present invention provides a process whereby nucleotide sequences can be isolated and purified even when present at a frequency as low as 2% of a heterogeneous population of mRNA sequences. Furthermore, the method may be combined with known methods of fractionating mRNA to isolate and purify sequences present in even lower frequency in the total RNA population as initially isolated. The method is generally applicable to mRNA species extracted from virtually any organism and is therefore expected to provide a powerful basic tool for the ultimate production of proteins of commercial and research interest, in useful quantities.

Human growth hormone has medical utility in the treatment of defective pituitary function. Animal growth hormones have commercial utility in veterinary medicine and in agriculture, particularly in the case of animals used as food sources, where large size and rapid maturation are desirable attributes. Human chorionic somatomammotropin is of medical significance because of its role in the fetal maturation process.

The process of the present invention takes advantage of certain structural features of mRNA and DNA, and makes use of certain enzyme catalyzed reactions. The nature of these reactions and structural details as they are understood in the prior art are described herewith. The symbols and abbreviations used herein are set forth in the following table:

TABLE 2
______________________________________
DNA -- deoxyribonucleic acid
RNA -- ribonucleic acid
cDNA -- complementary DNA (enzymatically synthesized from an mRNA sequence)
mRNA -- messenger RNA
dATP -- deoxyadenosine triphosphate
dGTP -- deoxyguanosine triphosphate
dCTP -- deoxycytidine triphosphate
dTTP -- thymidine triphosphate
ATP -- adenosine triphosphate
A -- Adenine
T -- Thymine
G -- Guanine
C -- Cytosine
U -- Uracil
Tris -- 2-Amino-2-hydroxymethyl-1,3-propanediol
EDTA -- ethylenediamine tetraacetic acid
TCA -- Trichloroacetic acid
HCS -- Human Chorionic Somatomammotropin
HGH -- Human Growth Hormone
RGH -- Rat growth hormone
______________________________________

In its native configuration, DNA exists in the form of paired linear polynucleotide strands. The complementary base pairing relationships described above exist between the paired strands such that each nucleotide base of one strand exists opposite its complement on the other strand. The entire sequence of one strand is mirrored by a complementary sequence on the other strand. If the strands are separated, it is possible to synthesize a new partner strand, starting from the appropriate precursor monomers. The sequence of addition of the monomers starting from one end is determined by, and complementary to, the sequence of the original intact polynucleotide strand, which thus serves as a template for the synthesis of its complementary partner. The synthesis of mRNA corresponding to a specific nucleotide sequence of DNA is understood to follow the same basic principle. Therefore a specific mRNA molecule will have a sequence complementary to one strand of DNA and identical to the sequence of the opposite DNA strand, in the region transcribed. Enzymic mechanisms exist within living cells which permit the selective transcription of a particular DNA segment containing the nucleotide sequence for a particular protein. Consequently, isolating the mRNA which contains the nucleotide sequence coding for the amino acid sequence of a particular protein is equivalent to the isolation of the same sequence, or gene, from the DNA itself. If the mRNA is retranscribed to form DNA complementary thereto (cDNA), the exact DNA sequence is thereby reconstituted and can, by appropriate techniques, be inserted into the genetic material of another organism. The two complementary versions of a given sequence are therefore inter-convertible, and functionally equivalent to each other.
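
The inter-convertibility of the DNA, mRNA and cDNA versions of a sequence can be illustrated with a short sketch. This is an illustration only, under the assumption that every strand is written 5' to 3' as a plain string; the function names and the twelve-base example sequence are hypothetical and are not taken from the patent.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Partner DNA strand, written 5' to 3'."""
    return "".join(DNA_PAIR[base] for base in reversed(dna))


def transcribe(template_strand):
    """mRNA (5' to 3') complementary to a DNA template strand (5' to 3')."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template_strand))


def reverse_transcribe(mrna):
    """cDNA (5' to 3') complementary to an mRNA (5' to 3')."""
    return "".join(RNA_TO_DNA_PAIR[base] for base in reversed(mrna))


if __name__ == "__main__":
    coding_strand = "ATGGCTTTCGAC"          # hypothetical example sequence
    template_strand = reverse_complement(coding_strand)
    mrna = transcribe(template_strand)
    cdna = reverse_transcribe(mrna)

    print("coding  ", coding_strand)
    print("template", template_strand)
    print("mRNA    ", mrna)
    print("cDNA    ", cdna)
    # The mRNA matches the coding strand except that U replaces T, and the
    # partner of the cDNA reconstitutes the original coding strand.
    print(reverse_complement(cdna) == coding_strand)
```

In this model the mRNA is complementary to one DNA strand and identical, apart from uracil, to the other, and copying the mRNA back into cDNA recovers the original pair of strands.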

The nucleotide subunits of DNA and RNA are linked together by phosphodiester bonds between the 5' position of one nucleotide sugar and the 3' position of its next neighbor. Reiteration of such linkages produces a linear polynucleotide which has polarity in the sense that one end can be distinguished from the other. The 3' end may have a free 3'-hydroxyl, or the hydroxyl may be substituted with a phosphate or a more complex structure. The same is true of the 5' end. In eucaryotic organisms, i.e., those having a defined nucleus and mitotic apparatus, the synthesis of functional mRNA usually includes the addition of polyadenylic acid to the 3' end of the mRNA. Messenger RNA can therefore be separated from other classes of RNA isolated from a eucaryotic organism by column chromatography on cellulose to which is attached polythymidylic acid. See Aviv, H., and Leder, P., Proc.Natl.Acad.Sci. USA 69, 1408 (1972). Other chromatographic methods, exploiting the base-pairing affinity of poly A for chromatographic packing materials containing oligo dT, poly U, or combinations of poly T and poly U, for example, poly U-Sepharose, are likewise suitable.
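
As a rough computational analogy to this separation, RNA sequences can be sorted by whether they end in a run of adenine residues at the 3' end. The sketch below is a simplification and not a model of the chromatography itself; the function names and the `min_tail` threshold of 10 residues are assumptions made for the example.

```python
def poly_a_tail_length(rna):
    """Number of consecutive A residues at the 3' end of an RNA (5' to 3')."""
    return len(rna) - len(rna.rstrip("A"))


def select_polyadenylated(rnas, min_tail=10):
    """Keep only RNAs whose 3' poly(A) run has at least min_tail residues.

    min_tail is an arbitrary illustrative cutoff, not a value from the patent.
    """
    return [rna for rna in rnas if poly_a_tail_length(rna) >= min_tail]


if __name__ == "__main__":
    pool = [
        "GGCAUCC" + "A" * 15,   # polyadenylated, retained
        "GGCAUCCAUG",           # no tail, discarded
        "AAAAAAAAAAAAGGC",      # A run at the 5' end only, discarded
    ]
    for rna in select_polyadenylated(pool):
        print(rna)
```

Only the position of the adenine run matters in this sketch: a run at the 5' end does not count, mirroring the statement that polyadenylic acid is added to the 3' end of the mRNA.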

Reverse transcriptase catalyzes the synthesis of DNA complementary to an RNA template strand in the presence of the RNA template, a primer which may be any complementary oligo or polynucleotide having a 3'-hydroxyl, and the four deoxynucleoside triphosphates, dATP, dGTP, dCTP, and dTTP. The reaction is initiated by the non-covalent association of the oligodeoxynucleotide primer near the 3' end of mRNA followed by stepwise addition of the appropriate deoxynucleotides, as determined by base-pairing relationships with the mRNA nucleotide sequence, to the 3' end of the growing chain. The product molecule may be described as a hairpin structure in which the original RNA is paired by hydrogen bonding with a complementary strand of DNA partly folded back upon itself at one end. The DNA and RNA strands are not covalently joined to each other. Reverse transcriptase is also capable of catalyzing a similar reaction using a single-stranded DNA template, in which case the resulting product is a double-stranded DNA hairpin having a loop of single-stranded DNA joining one set of ends. See Aviv, H. and Leder, P., Proc.Natl.Acad.Sci. USA 69, 1408 (1972) and Efstratiadis, A., Kafatos, F. C., Maxam, A. M., and Maniatis, T., Cell 7, 279 (1976).

Restriction endonucleases are enzymes capable of hydrolyzing phosphodiester bonds in DNA, thereby creating a break in the continuity of the DNA strand. If the DNA is in the form of a closed loop, the loop is converted to a linear structure. The principal feature of a restriction enzyme is that its hydrolytic action is exerted only at a point where a specific nucleotide sequence occurs. Such a sequence is termed the restriction site for the restriction endonuclease. Restriction endonucleases from a variety of sources have been isolated and characterized in terms of the nucleotide sequence of their restriction sites. When acting on double-stranded DNA, some restriction endonucleases hydrolyze the phosphodiester bonds on both strands at the same point, producing blunt ends. Others catalyze hydrolysis of bonds separated by a few nucleotides from each other, producing free single-stranded regions at each end of the cleaved molecule. Such single-stranded ends are self-complementary, hence cohesive, and may be used to rejoin the hydrolyzed DNA. Since any DNA susceptible to cleavage by such an enzyme must contain the same recognition site, the same cohesive ends will be produced, so that it is possible to join heterogeneous sequences of DNA which have been treated with restriction endonuclease to other sequences similarly treated. See Roberts, R. J., Crit.Rev.Biochem. 4, 123 (1976).
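The difference between blunt and cohesive ends can be illustrated with a minimal sketch. It assumes palindromic recognition sites and represents the duplex by its top strand only. HaeIII (GG^CC) is one of the enzymes named later in this specification; EcoRI (G^AATTC) is a commonly cited enzyme that is not discussed in this passage and is used here only for contrast. The test sequence and the names `Enzyme`, `digest` and `overhang_length` are hypothetical.

```python
from typing import List, NamedTuple


class Enzyme(NamedTuple):
    name: str
    site: str   # palindromic recognition sequence, 5'->3'
    cut: int    # offset of the top-strand cut within the site


HAE_III = Enzyme("HaeIII", "GGCC", 2)    # GG^CC, cuts both strands at one point
ECO_RI = Enzyme("EcoRI", "GAATTC", 1)    # G^AATTC, staggered cut


def overhang_length(enzyme: Enzyme) -> int:
    """Single-stranded nucleotides left on each end (0 means blunt).

    For a palindromic site the bottom-strand cut lies at the mirror
    position, len(site) - cut, measured along the top strand.
    """
    return abs(len(enzyme.site) - 2 * enzyme.cut)


def digest(top_strand: str, enzyme: Enzyme) -> List[str]:
    """Cut the top strand at every occurrence of the recognition site."""
    fragments, start = [], 0
    pos = top_strand.find(enzyme.site)
    while pos != -1:
        cut_at = pos + enzyme.cut
        fragments.append(top_strand[start:cut_at])
        start = cut_at
        pos = top_strand.find(enzyme.site, pos + 1)
    fragments.append(top_strand[start:])
    return fragments


dna = "TTGGCCAAGAATTCTT"   # made-up sequence with one site for each enzyme
for enzyme in (HAE_III, ECO_RI):
    print(enzyme.name, digest(dna, enzyme), overhang_length(enzyme))
```

Because every molecule cut by the same staggered-cutting enzyme carries the same overhang, fragments from unrelated sources can anneal to one another, which is the basis of the joining described above.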

It has been observed that restriction sites for a given enzyme are relatively rare and are nonuniformly distributed. Whether a specific restriction site exists within a given segment is a matter which must be empirically determined. However, there is a large and growing number of restriction endonucleases, isolated from a variety of sources with varied site specificity, so that there is a reasonable probability that a given segment of a thousand nucleotides will contain one or more restriction sites.
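A rough feel for these probabilities can be obtained from a simple model. The sketch below assumes a random sequence with all four bases equally frequent and treats site occurrences as independent (a Poisson approximation). Both are simplifying assumptions that the text itself cautions against, since real sites are nonuniformly distributed, so the numbers are order-of-magnitude estimates only. The function names are hypothetical.

```python
import math


def expected_sites(length: int, site_len: int) -> float:
    """Expected occurrences of one specific site in a random sequence."""
    return (length - site_len + 1) / 4 ** site_len


def prob_at_least_one(length: int, site_len: int, n_enzymes: int = 1) -> float:
    """Poisson estimate that at least one of n_enzymes distinct sites occurs."""
    return 1.0 - math.exp(-n_enzymes * expected_sites(length, site_len))


segment = 1000  # nucleotides, as in the text
print(round(expected_sites(segment, 4), 2))          # four-base recognition site
print(round(expected_sites(segment, 6), 2))          # six-base recognition site
print(round(prob_at_least_one(segment, 6, 10), 2))   # ten different six-base sites
```

Under these assumptions a single six-base site is expected less than once per thousand nucleotides, while a panel of enzymes with different specificities makes at least one cut likely.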

For general background see Watson, J. D., The Molecular Biology of the Gene, 3d Ed., Benjamin, Menlo Park, Calif., (1976); Davidson, J. N., The Biochemistry of the Nucleic Acids, 8th Ed., Revised by Adams, R. L. P., Burdon, R. H., Campbell, A. M. and Smellie, R. M. S., Academic Press, New York, (1976); and Hayes, W., "The Genetics of Bacteria and Their Viruses", Studies in Basic Genetics and Molecular Biology, 2d Ed., Blackwell Scientific Pub., Oxford (1968).

SUMMARY OF INVENTION

A novel purification procedure of cDNA of desired nucleotide sequence complementary to an individual mRNA species is disclosed. The method employs restriction endonuclease cleavage of cDNA transcribed from a complex mixture of mRNA. The method does not require any extensive purification of RNA but instead makes use of transcription of RNA into cDNA, the sequence specific fragmentation of this cDNA with one or two restriction endonucleases, and the fractionation of the cDNA restriction fragments on the basis of their length. The use of restriction endonucleases eliminates size heterogeneity and produces homogeneous length DNA fragments from any cDNA species which contains at least two restriction sites. From the initially heterogeneous population of cDNA transcripts, uniform size fragments of desired sequence are produced. The fragments may be several hundred nucleotides in length and may in some instances include the entire structural gene for the desired protein. The length of the fragments depends on the number of nucleotides separating the restriction sites and will usually be different for different regions of DNA. Fractionation by length enables purification of a homogeneous population of fragments having the desired sequence. The fragments will be homogeneous in size and highly pure in terms of nucleotide sequence. Current separation and analysis methods enable the isolation of such fragments from a corresponding mRNA species representing at least 2% of the mass of the RNA transcribed. The use of prior art RNA fractionation methods to prepurify the mRNA before transcription will result in lowering the actual lower limit of detection to less than 2% of the total mRNA isolated from the organism.
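The way restriction cleavage removes size heterogeneity can be shown with a small simulation. This is a sketch under stated assumptions, not the disclosed procedure: the 36-nucleotide "full-length" sequence is invented, the transcripts are given ragged ends by simple truncation, and HaeIII (GG^CC) is assumed to cut each single-stranded transcript at both of its sites.

```python
from collections import Counter
from typing import List

SITE, CUT = "GGCC", 2   # HaeIII, GG^CC


def digest(seq: str) -> List[str]:
    """Cut a strand at every occurrence of the recognition site."""
    fragments, start = [], 0
    pos = seq.find(SITE)
    while pos != -1:
        fragments.append(seq[start:pos + CUT])
        start = pos + CUT
        pos = seq.find(SITE, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical transcript containing two restriction sites.
full_length = "AAAAAT" + "GGCC" + "ATGCATGCATGCAT" + "GGCC" + "TTTACGTA"

# Heterogeneous population: up to 4 nucleotides missing from either end.
transcripts = [
    full_length[s:len(full_length) - e] for s in range(5) for e in range(5)
]

length_counts = Counter(len(f) for t in transcripts for f in digest(t))
internal = {f for t in transcripts for f in digest(t) if len(f) == 18}

print(sorted(length_counts.items()))
print(internal)
```

Every transcript contributes the same 18-nucleotide internal fragment, whereas the end fragments vary in length, so fractionation by length concentrates the internal fragment in a single band.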

Specific sequences purified by the procedure outlined above may be further purified by a second specific cleavage with a restriction endonuclease capable of cleaving the desired sequence at an internal site. This cleavage results in formation of two sub-fragments of the desired sequence, separable on the basis of their lengths. The sub-fragments are separated from uncleaved and specifically cleaved contaminating sequences having substantially the same original size. The method is founded upon the rarity and randomness of placement of restriction endonuclease recognition sites, which results in an extremely low probability that a contaminant having the same original length will be cleaved by the same enzyme to yield fragments having the same length as those yielded by the desired sequence. After separation from the contaminants, the sub-fragments of the desired sequence may be rejoined using techniques known in the art to reconstitute the original sequence. The two sub-fragments must be prevented from joining together in the reverse order of their original sequence. A method is disclosed whereby the sub-fragments can only join to each other in the proper order.
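The discriminating power of the second cleavage can be sketched as follows. The three 20-nucleotide sequences are invented for illustration, and HhaI (GCG^C) is assumed as the second enzyme. All three molecules co-migrate before the second digestion because they have the same length; only the desired sequence yields the expected pair of sub-fragment lengths.

```python
from typing import Tuple

HHA_I_SITE, HHA_I_CUT = "GCGC", 3   # HhaI, GCG^C


def sub_fragment_lengths(seq: str) -> Tuple[int, ...]:
    """Lengths of the pieces produced by cutting seq at every HhaI site."""
    lengths, start = [], 0
    pos = seq.find(HHA_I_SITE)
    while pos != -1:
        cut_at = pos + HHA_I_CUT
        lengths.append(cut_at - start)
        start = cut_at
        pos = seq.find(HHA_I_SITE, pos + 1)
    lengths.append(len(seq) - start)
    return tuple(lengths)


# Hypothetical fragments of identical length from the first fractionation.
candidates = {
    "desired": "CCATGCGCAATTAGCATTGG",            # internal site near the 5' end
    "contaminant_uncut": "CCTTAGGATCATTAGGATGG",  # no HhaI site
    "contaminant_cut": "CCTTAGGATCAGCGCTATGG",    # site at a different position
}

expected = sub_fragment_lengths(candidates["desired"])
recovered = [
    name for name, seq in candidates.items()
    if sub_fragment_lengths(seq) == expected
]

for name, seq in candidates.items():
    print(name, sub_fragment_lengths(seq))
print(recovered)
```

A contaminant survives this step only if it has the same original length and an internal site at the same position, which is the low-probability coincidence on which the method relies.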

Variations of the above-recited methods may be used in combination with appropriate labelling techniques to obtain accurate, quantitative measurements of the purity of the isolated sequences. The combined techniques have been applied to produce a known nucleotide sequence with greater than 99% purity.

The cDNA isolated and purified by the described methods may be recombined with a suitable transfer vector and transferred to a suitable host microorganism. Novel plasmids have been produced, containing the nucleotide sequences coding for rat growth hormone and the major portions of human chorionic somatomammotropin and human growth hormone, respectively. Novel microorganisms have been produced having as part of their genetic makeup the genes coding for RGH, the major portion of HCS and the major portion of HGH, respectively. The disclosed techniques may be used for the isolation and purification of growth hormones from other animal species and for the construction of novel transfer vectors and microorganisms containing these genes.

DETAILED DESCRIPTION OF INVENTION

The present invention employs as starting material polyadenylated, crude or partially purified messenger RNA, which may be heterogeneous in sequence and in molecular size. The selectivity of the RNA isolation procedure is enhanced by any method which results in an enrichment of the desired mRNA in the heterodisperse population of mRNA isolated. Any such prepurification method may be employed in conjunction with the method of the present invention, provided the method does not introduce endonucleolytic cleavage of the mRNA. An important initial consideration is the selection of an appropriate source tissue for the desired mRNA. Often, this choice will be dictated by the fact that the protein ultimately to be produced is only made by a certain specialized tissue of a differentiated organism. Such is the case, for example, with the peptide hormones, such as growth hormone or HCS. In other cases, it will be found that a variety of cell types or microbial species can serve as a source of the desired mRNA. In those cases, some preliminary experimentation will be necessary in order to determine the optimal source. Frequently, it will be found that the proportion of desired mRNA can be increased by taking advantage of cellular responses to environmental stimuli. For example, treatment with a hormone may cause increased production of the desired mRNA. Other techniques include growth at a particular temperature and exposure to a specific nutrient or other chemical substance.

Prepurification to enrich for desired mRNA sequences may also be carried out using conventional methods for fractionating RNA, after its isolation from the cell. Any technique which does not result in degradation of the RNA may be employed. The techniques of preparative sedimentation in a sucrose gradient and gel electrophoresis are especially suitable.

The mRNA must be isolated from the source cells under conditions which preclude degradation of the mRNA. The action of RNase enzymes is particularly to be avoided because these enzymes are capable of hydrolytic cleavage of the RNA nucleotide sequence. The hydrolysis of one bond in the sequence results in disruption of that sequence and loss of the RNA fragment containing the original 5' end of the sequence. A suitable method for inhibiting RNase during extraction from cells is disclosed in U.S. application Ser. No. 805,023, now abandoned, incorporated herein by reference and assigned to the same assignee as the instant application. The method involves the use of 4 M guanidinium thiocyanate and 1 M mercaptoethanol during the cell disruption step. In addition, a low temperature and a pH near 5.0 are helpful in further reducing RNase degradation of the isolated RNA.

Prior to application of the method of the present invention, mRNA must be prepared essentially free of contaminating protein, DNA, polysaccharides and lipids. Standard methods are well known in the art for accomplishing such purification. RNA thus isolated contains non-messenger as well as messenger RNA. A convenient method for separating the mRNA of eucaryotes is chromatography on columns of oligo-dT cellulose, or other oligonucleotide-substituted column material such as poly U-Sepharose, taking advantage of the hydrogen bonding specificity conferred by the presence of polyadenylic acid on the 3' end of eucaryotic mRNA.

The initial step in the process of the present invention is the formation of DNA complementary to the isolated heterogeneous sequences of mRNA. The enzyme of choice for this reaction is reverse transcriptase, although in principle any enzyme capable of forming a faithful complementary DNA copy of the mRNA template could be used. The reaction may be carried out under conditions described in the prior art, using mRNA as a template and a mixture of the four deoxynucleoside triphosphates dATP, dGTP, dCTP and dTTP, as precursors for the DNA strand. It is convenient to provide that one of the deoxynucleoside triphosphates be labeled with a radioisotope, for example ³²P in the alpha position, in order to monitor the course of the reaction, to provide a tag for recovering the product after separation procedures such as chromatography and electrophoresis, and for the purpose of making quantitative estimates of recovery. See Efstratiadis, A., et al., supra.

The cDNA transcripts produced by the reverse transcriptase reaction are somewhat heterogeneous with respect to sequences at the 5' end and the 3' end due to variations in the initiation and termination points of individual transcripts, relative to the mRNA template. The variability at the 5' end is thought to be due to the fact that the oligo-dT primer used to initiate synthesis is capable of binding at a variety of loci along the polyadenylated region of the mRNA. Synthesis of the cDNA transcript begins at an indeterminate point in the poly-A region, and a variable length of poly-A region is transcribed depending on the initial binding site of the oligo-dT primer. It is possible to avoid this indeterminacy by the use of a primer containing, in addition to an oligo-dT tract, one or two nucleotides of the RNA sequence itself, thereby producing a primer which will have a preferred and defined binding site for initiating the transcription reaction.

The indeterminacy at the 3'-end of the cDNA transcript is due to a variety of factors affecting the reverse transcriptase reaction, and to the possibility of partial degradation of the RNA template. The isolation of specific cDNA transcripts of maximal length is greatly facilitated if conditions for the reverse transcriptase reaction are chosen which not only favor full length synthesis but also repress the synthesis of small DNA chains. Preferred reaction conditions for avian myeloblastosis virus reverse transcriptase are given in the examples section. The specific parameters which may be varied to provide maximal production of long-chain DNA transcripts of high fidelity are reaction temperature, salt concentration, amount of enzyme, concentration of primer relative to template, and reaction time.

The conditions of temperature and salt concentration are chosen so as to optimize specific base-pairing between the oligo-dT primer and the polyadenylated portion of the RNA template. Under properly chosen conditions, the primer will be able to bind at the polyadenylated region of the RNA template, but non-specific initiation due to primer binding at other locations on the template, such as short, A-rich sequences, will be substantially prevented. The effects of temperature and salt are interdependent. Higher temperatures and lower salt concentrations decrease the stability of specific base-pairing interactions. The reaction time is kept as short as possible, in order to prevent non-specific initiations and to minimize the opportunity for degradation. Reaction times are interrelated with temperature, lower temperatures requiring longer reaction times. At 42° C., reaction times ranging from 1 minute to 10 minutes are suitable. The primer should be present in 50 to 500-fold molar excess over the RNA template and the enzyme should be present in similar molar excess over the RNA template. The use of excess enzyme and primer enhances initiation and cDNA chain growth so that long-chain cDNA transcripts are produced efficiently within the confines of the short incubation times.

In many cases, it will be possible to carry out the remainder of the purification process of the present invention using single-stranded cDNA sequences transcribed from mRNA. However, as discussed below, there may be instances in which the desired restriction enzyme is one which acts only on double-stranded DNA. In these cases, the cDNA prepared as described above may be used as a template for the synthesis of double-stranded DNA, using a DNA polymerase such as reverse transcriptase and a nuclease capable of hydrolyzing single-stranded DNA. Methods for preparing double-stranded DNA in this manner have been described in the prior art. See, for example, Ullrich, A., Shine, J., Chirgwin, J., Pictet, R., Tischer, E., Rutter, W. J. and Goodman, H. M., Science 196, 1313 (1977).

Heterogeneous cDNA, prepared by transcription of heterogeneous mRNA sequences, is then treated with one or two restriction endonucleases. The choice of endonuclease to be used depends in the first instance upon a prior determination that recognition sites for the enzyme exist in the sequence of the cDNA to be isolated. The method depends upon the existence of two such sites. If the sites are identical, a single enzyme will be sufficient. The desired sequence will be cleaved at both sites, eliminating size heterogeneity as far as the desired cDNA sequence is concerned, and creating a population of molecules, termed fragments, containing the desired sequence and homogeneous in length. If the restriction sites are different, two enzymes will be required in order to produce the desired homogeneous length fragments.

The choice of restriction enzyme(s) capable of producing an optimal length nucleotide sequence fragment coding for all or part of the desired protein must be made empirically. If the amino acid sequence of the desired protein is known, it is possible to compare the nucleotide sequence of uniform length nucleotide fragments produced by restriction endonuclease cleavage with the amino acid sequence for which it codes, using the known relationship of the genetic code common to all forms of life. A complete amino acid sequence for the desired protein is not necessary, however, since a reasonably accurate identification may be made on the basis of a partial sequence. Where the amino acid sequence of the desired protein is not known, the uniform length polynucleotides produced by restriction endonuclease cleavage may be used as probes capable of identifying the synthesis of the desired protein in an appropriate in vitro protein synthesizing system. Alternatively, the mRNA may be purified by affinity chromatography. Other techniques which may be suggested to those skilled in the art will be appropriate for this purpose.
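Comparing a fragment's nucleotide sequence with a known partial amino acid sequence amounts to translating the fragment in each reading frame and searching for the peptide. The following is a minimal sketch using the standard genetic code. The fragment and the four-residue peptide are invented for illustration, the fragment is assumed to be given in the mRNA-sense orientation (a first-strand cDNA would first be converted to its complement), and the function names are hypothetical.

```python
from itertools import product
from typing import List

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str, frame: int) -> str:
    """Translate a sense-strand DNA sequence starting at offset frame (0-2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def matching_frames(dna: str, peptide: str) -> List[int]:
    """Reading frames in which the fragment encodes the given peptide."""
    return [frame for frame in range(3) if peptide in translate(dna, frame)]


fragment = "GGTTCCCAACCATCCC"   # made-up restriction fragment
partial_peptide = "FPTI"        # made-up partial amino acid sequence

for frame in range(3):
    print(frame, translate(fragment, frame))
print(matching_frames(fragment, partial_peptide))
```

Only one of the three frames reproduces the peptide in this example, which illustrates why even a short partial sequence can give a reasonably accurate identification.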

The number of restriction enzymes suitable for use depends upon whether single-stranded or double-stranded cDNA is used. The preferred enzymes are those capable of acting on single-stranded DNA, which is the immediate reaction product of mRNA reverse transcription. The number of restriction enzymes now known to be capable of acting on single-stranded DNA is limited. The enzymes HaeIII, HhaI and Hin(f)I are presently known to be suitable. In addition, the enzyme MboII may act on single-stranded DNA. Where further study reveals that other restriction enzymes can act on single-stranded DNA, such other enzymes may appropriately be included in the list of preferred enzymes. Additional suitable enzymes include those specified for double-stranded cDNA. Such enzymes are not preferred since additional reactions are required in order to produce double-stranded cDNA, providing increased opportunities for the loss of longer sequences and for other losses due to incomplete recovery. The use of double-stranded c