Studying membrane proteins through the eyes of the genetic code


Communicated by H. Ronald Kaback, University of California, Los Angeles, CA, February 27, 2009 (received for review December 30, 2008)



Posttranscriptional processes often involve specific signals in mRNAs. Because mRNAs of integral membrane proteins across evolution are usually translated at distinct locations, we searched for universally conserved specific features in this group of mRNAs. Our analysis revealed that codons of very hydrophobic amino acids, highly represented in integral membrane proteins, are composed of 50% uracils (U). As expected from such a strong U bias, the calculated U profiles of mRNAs closely resemble the hydrophobicity profiles of their encoded proteins and may designate genes encoding integral membrane proteins, even in the absence of information on ORFs. We also show that, unexpectedly, the U-richness phenomenon is not merely a consequence of the codon composition of very hydrophobic amino acids because, counterintuitively, the relatively hydrophilic serine and tyrosine, also encoded by U-rich codons, are overrepresented in integral membrane proteins. Interestingly, although the U-richness phenomenon is conserved, there is an evolutionary trend that minimizes usage of U-rich codons. Taken together, the results suggest that U-richness is an evolutionarily ancient feature of mRNAs encoding integral membrane proteins, which might serve as a physiologically relevant distinctive signature of this group of mRNAs.

evolution, hydrophobicity scale, mRNA targeting, U-rich mRNA

In addition to protein-coding information, mRNAs sometimes harbor signals required for posttranscriptional regulatory pathways, such as processing, translation, degradation, and localization (1, 2). For selective targeting, mRNAs use various protein-interaction determinants (structural, sequence-specific, or nonspecific) (3), mostly in untranslated regions, although unique exceptions have been described (e.g., ref. 4). mRNAs of integral membrane proteins across evolution are usually translated at distinct locations, and our studies in Escherichia coli have suggested a step through which these mRNAs might be selectively targeted to membrane-bound ribosomes (5–7). Therefore, we investigated the possibility that mRNAs encoding integral membrane proteins have species-independent characteristic features, which might provide an evolutionarily conserved means for their selective recognition and targeting to the membrane. As a basis for our analysis, we reasoned that because prokaryotes express polycistronic transcripts, sometimes encoding a mixture of membrane and cytosolic proteins, targeting signals might be located inside ORFs in addition to untranslated regions.

Results and Discussion

Analysis of the Genetic Code in mRNAs Encoding Integral Membrane Proteins.

A unique property of integral membrane proteins is that they have stretches of ≈20 very hydrophobic amino acid residues. Therefore, we analyzed the nucleotide composition of the codons of very hydrophobic amino acids [according to the Goldman, Engelman, Steitz (GES) scale] (8) in the context of the entire genetic code. Since the first description of the nearly universal genetic code (for a review see ref. 9), various explanations of its organization and of the assignment of the 64 triplets have been offered (10, 11). It was soon realized that chemically similar amino acids are often encoded by relatively similar codons (12, 13) and that very hydrophobic amino acids are encoded by codons having uracil (U) in the second position (14). Our analysis revealed that, beyond the second position, codons of very hydrophobic amino acids have a remarkably high U content overall (Fig. 1). Specifically, 50% of all nucleotides in these codons are Us. In contrast, the U content in the codons of every other group of amino acids is ≤22%, and the overall U content of the 61 amino acid-coding triplets is 24.6%, suggesting a strong U bias in mRNAs encoding integral membrane proteins. Next, we investigated whether the proposed U bias is an inherent requirement for mRNAs encoding integral membrane proteins or merely a trivial consequence of the high U content in codons of very hydrophobic amino acids. As shown in Fig. 2A, there are 2 relatively hydrophilic amino acids, serine and tyrosine, both encoded by U-rich codons (33% and 50% U, respectively). On the basis of the chemical nature of serine and tyrosine, we predicted that both of them should be more abundant in soluble proteins.
In contrast, analysis of their usage in multipass membrane and cytoplasmic proteins from various organisms (supporting information Tables S1 and S2) revealed a higher content of serine and tyrosine in the membrane protein group (≈20% more) (Fig. 2B). These results strongly suggest that integral membrane protein transcripts might have been programmed, or have evolved, to contain a high U content.
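The 50% and 24.6% values quoted above can be checked by hand from the standard genetic code. As an assumption (the text does not list the members of the "very hydrophobic" group of Fig. 1), we take the group to be Phe, Met, Ile, Leu, and Val, which reproduces the stated 50%; the number in parentheses is the count of Us in each codon:

$$
\begin{aligned}
\text{Phe: } & \mathrm{UUU}\,(3) + \mathrm{UUC}\,(2) = 5 \\
\text{Met: } & \mathrm{AUG}\,(1) = 1 \\
\text{Ile: } & \mathrm{AUU}\,(2) + \mathrm{AUC}\,(1) + \mathrm{AUA}\,(1) = 4 \\
\text{Leu: } & \mathrm{UUA}\,(2) + \mathrm{UUG}\,(2) + \mathrm{CUU}\,(2) + \mathrm{CUC}\,(1) + \mathrm{CUA}\,(1) + \mathrm{CUG}\,(1) = 9 \\
\text{Val: } & \mathrm{GUU}\,(2) + \mathrm{GUC}\,(1) + \mathrm{GUA}\,(1) + \mathrm{GUG}\,(1) = 5 \\
f_U(\text{very hydrophobic}) &= \frac{5 + 1 + 4 + 9 + 5}{16 \times 3} = \frac{24}{48} = 50\% \\
f_U(\text{61 sense codons}) &= \frac{3 \times 16 - 3}{61 \times 3} = \frac{45}{183} \approx 24.6\%
\end{aligned}
$$

The last line uses the fact that each codon position is U in 16 of the 64 triplets (48 Us in total) and subtracts the 3 Us carried by the stop codons UAA, UAG, and UGA. The same counting gives 6/18 ≈ 33% U for serine (UCU, UCC, UCA, UCG, AGU, AGC) and 3/6 = 50% for tyrosine (UAU, UAC), matching the values cited for Fig. 2A.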

Fig. 1.

Distribution of nucleotides in groups of codons encoding chemically similar amino acids. Amino acids were classified according to their hydrophobicity values by using the GES scale (8). The frequency of each nucleotide was calculated by dividing its occurrence by the sum of nucleotides in each group of codons.

Fig. 2.

U usage, hydrophobicity, and amino acid usage in membrane proteins. (A) U usage in the codons of each amino acid and the hydrophobicity of each amino acid according to the GES scale. (B) Usage of each amino acid in multipass membrane proteins (mem) divided by its usage in cytoplasmic proteins (cyto) selected from 11 organisms (see Tables S1 and S2). Error bars indicate SD among the various species (Table S2).

Analysis of the Distribution of U in Membrane Protein mRNAs.

Traditionally, integral membrane proteins are analyzed for their hydrophobicity profiles (15), using algorithms that help identify their transmembrane helices. We wondered whether our finding of the high U content of very hydrophobic codons might help identify integral membrane proteins through the analysis of the U profiles of genes. Initially, we compared the hydrophobicity profiles of several integral membrane proteins with the U profiles of their mRNAs. Fig. 3 shows several examples in which both curves are strikingly similar. MalF is a complex integral membrane protein with 8 transmembrane helices and a large external hydrophilic domain (Fig. 3A, Top) (16). The calculated Kyte-Doolittle hydrophobicity profile of MalF (Fig. 3A, Middle) supports the proposed secondary structure model. Remarkably, the calculated U profile of the malF mRNA also supports this model, and the 2 profiles are very similar. Notably, there are subtle differences between them, which might indicate the importance of features other than the identified relationship between protein hydrophobicity and the U content and distribution in the gene, as shown above for serine and tyrosine. In contrast to the U profile, the profiles of the other nucleotides, adenine (A), cytosine (C), and guanine (G), are completely different from the hydrophobicity profile (Fig. 3A, Bottom). Next, we analyzed other proteins from different species and found that in all cases, the hydrophobicity profiles and the U profiles closely resemble each other (Fig. 3B), suggesting that U profiles might be used to identify cDNAs encoding integral membrane proteins in various organisms, even in the absence of information about ORFs. Our analysis is qualitative, but it would be interesting to test whether U profiles can improve current membrane-protein topology prediction methods (17).
For example, serine and tyrosine, which are more abundant in membrane proteins, might reduce the prediction power of hydrophobicity-based algorithms. In contrast, the contribution of serine and tyrosine should be readily observed in the U profiles of genes because of the high U content of their codons (Fig. 2A).
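The two profiles compared in Fig. 3 can be sketched as plain sliding-window averages. This is a minimal illustration, not the DNA Strider 1.4f6 implementation used for the figure: we assume an unweighted mean over each full window, and take the window sizes (19 residues, 55 nucleotides) from the Fig. 3 legend. A 55-nt window spans roughly 18 codons, so it covers about the same stretch of sequence as the 19-residue protein window. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids (ref. 15).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_mean(values, window):
    """Unweighted mean of every full window (len(values) - window + 1 points)."""
    if window <= 0 or window > len(values):
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide by one position: add the entering value, drop the leaving one.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


def hydrophobicity_profile(protein, window=19):
    """Kyte-Doolittle profile of a protein given in one-letter code."""
    return sliding_mean([KYTE_DOOLITTLE[aa] for aa in protein.upper()], window)


def u_profile(gene, window=55):
    """Fraction of U (T in a DNA sequence) in each window of a gene."""
    return sliding_mean([1.0 if nt in "UT" else 0.0 for nt in gene.upper()], window)


if __name__ == "__main__":
    print(u_profile("TTTAAA", window=3))  # [1.0, 0.6666666666666666, 0.3333333333333333, 0.0]
```

To overlay the curves as in Fig. 3, the x-axis of `u_profile(gene)` can be divided by 3 to convert nucleotide positions into residue positions.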

Fig. 3.

Comparison between hydrophobicity profiles and U profiles. (A) Top: Topology model of MalF (16). Middle: Hydrophobicity profile of MalF (gray) and U profile of the malF mRNA (black). Bottom: Distribution of A (black), C (gray), and G (dashed) in the malF mRNA, shown on the same scale as in Middle. (B) Superposition of the hydrophobicity plots (gray) and U profiles (black) of the indicated integral membrane proteins and their respective mRNAs. The Kyte-Doolittle hydrophobicity plots (window of 19 residues) and the U plots (window of 55 nucleotides) were calculated using DNA Strider 1.4f6.

Large-Scale Comparison of Hydrophobicity and U Richness.

To obtain a preliminary large-scale view of the U-richness phenomenon, we selected annotated Swiss-Prot entries for hundreds of multipass membrane proteins and cytoplasmic proteins from E. coli, Methanococcus jannaschii, Arabidopsis thaliana, and Mus musculus. Each entry was analyzed for its hydrophobicity and U-richness, using rather arbitrary parameters for hydrophobicity (window of 20 residues, threshold value of 1.5 on the Kyte-Doolittle scale) and U-richness (window of 60 nucleotides, threshold of 40% U) (see Methods). The summed scores clearly show that multipass membrane proteins have significantly higher values than cytoplasmic proteins in all of the test organisms (Fig. 4). These values are only qualitative indications; a quantitative distinction between cytoplasmic and multipass membrane proteins would require optimization of both parameters (windows and thresholds) and calibration according to the GC content of each organism. Nevertheless, even this unrefined analysis clearly shows that the U-richness values follow the hydrophobicity values, with E. coli having the highest ratios between multipass membrane proteins and cytoplasmic proteins. Interestingly, the differences between multipass membrane proteins and cytoplasmic proteins seem to decrease through evolution, for both hydrophobicity and U-richness, in a manner that fits the phylogenetic tree of life according to Carl Woese (18). One possible contribution to this tendency is that membrane proteins have gained more soluble domains through evolution (19, 20), thus reducing the overall hydrophobicity of the proteins and the U-richness of their encoding mRNAs. This development complicates examining the possible (predicted) effects of genome GC content on the U bias. Nevertheless, we examined the situation in the high-GC genomes of the ancient Gram-positive bacteria Mycobacterium leprae (57.8% GC) and Streptomyces coelicolor (72.1% GC).
In these cases, the distinction between mRNAs encoding multipass membrane proteins and cytoplasmic proteins is achieved by using lower U content thresholds (see Methods).

Fig. 4.

Overall hydrophobicity and U-richness in multipass membrane proteins vs. cytoplasmic proteins. (A) Average hydrophobicity of multipass membrane vs. cytoplasmic proteins in the indicated organisms. (B) Average U-richness in multipass membrane vs. cytoplasmic proteins of the indicated organisms. mem, membrane; cyto, cytoplasmic.

Evolution of U-Richness in Membrane Protein mRNAs.

Another possible contribution to the decreased U-richness in membrane proteins of higher eukaryotes could be the reduced use of U-rich codons, even for hydrophobic amino acids. To address this issue, we examined whether the codon-usage preferences of membrane protein mRNAs differ from those of mRNAs encoding cytoplasmic proteins (Fig. S1). Here we focused on the usage of the U-rich codons of the 3 most hydrophobic amino acids: phenylalanine (Phe), isoleucine (Ile), and leucine (Leu). Fig. 5 shows that, compared with cytoplasmic protein genes, E. coli membrane protein genes use relatively more U-rich codons for these 3 hydrophobic residues. However, this preference decreases through evolution: in M. musculus, a clear bias toward relatively U-poor codons is observed in multipass membrane protein genes compared with cytoplasmic ones. Note that there is a lower limit to the usage of U-poor codons for these hydrophobic amino acids, which are inherently U-rich (the minimal U content is 67% for Phe and 33% for Ile or Leu). It would be interesting to perform a similar analysis only with gene segments encoding transmembrane helices, but this requires a database that is currently unavailable. Nonetheless, our observations suggest that the U-richness phenomenon in membrane protein genes was established early in evolution. Because U-rich mRNAs are probably less structured and consequently less stable (21), a possible evolutionary driving force for minimizing U-richness late in evolution would be a tendency to increase the stability of mRNAs encoding membrane proteins.

Fig. 5.

T-rich Leu, Phe, and Ile codon usage in multipass membrane proteins vs. cytoplasmic proteins. (A) Usage of each codon for Leu, Phe, and Ile in multipass membrane proteins and cytosolic proteins. (B) The cumulative codon usage of TTA + TTG + CTT [for Leu, marked as (TT)], TTT (for Phe), and ATT (for Ile) (shown separately in A) in genes encoding multipass membrane vs. cytoplasmic proteins in the indicated organisms (Table S3). The usage of all codons in multipass and cytoplasmic proteins is shown in Fig. S1. mem, membrane; cyto, cytoplasmic.

Concluding Remarks.

Taken together, our results suggest that the U-richness phenomenon represents an ancient, predisposed requirement for mRNAs encoding integral membrane proteins. Why U? This question raises ancient evolutionary considerations. It is believed that DNA uses thymine instead of U to allow discrimination against Us arising from cytosine deamination, damage that is efficiently repaired by base excision (22). However, such damage could be repaired by alternative mismatch recognition pathways (23). In any case, our findings offer an additional explanation for the difference between DNA and mRNA if Us indeed serve as specific recognition determinants that are reserved for mRNA. Notably in this regard, there are examples of nucleic acid-binding proteins that do (24) or do not discriminate between single-stranded RNA and DNA (25).

In addition to the proposed early and late evolutionary implications, which we find challenging to examine experimentally, we speculate that the unique primary structures of mRNAs encoding integral membrane proteins might be distinctly recognized by presently unknown cellular factors involved in their stabilization and targeting to membranes. Whether this phenomenon is restricted to ORFs is currently unknown. Our studies do not rule out the possibility that a similar U-richness signature might also exist in untranslated regions of mRNAs, and experiments in that direction are currently in progress. In addition, searches for cellular components that specifically bind model U-rich mRNAs and attempts to identify differences between the cellular locations of newly transcribed U-rich vs. U-poor mRNAs are currently underway.


Methods

Nucleotide Distribution.

The number of U, A, G, or C in a given group of codons (whether for a single amino acid or a group of amino acids) was divided by the total number of nucleotides in the same group of codons.
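The calculation just described can be sketched in a few lines of Python over the standard genetic code. This is an illustrative reconstruction rather than the authors' script; the function name `nucleotide_frequencies` is hypothetical, and the "very hydrophobic" grouping used in the usage lines (Phe, Met, Ile, Leu, Val) is our assumption, chosen because it reproduces the 50% value reported for Fig. 1.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: one-letter amino acids ('*' = stop) for the 64 codons,
# enumerated with the first position varying slowest, each position in U, C, A, G order.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA_STRING)
}


def nucleotide_frequencies(amino_acids):
    """Fraction of U, C, A and G over all codons of the given amino acids.

    Each base is counted over every codon of the group, and the count is
    divided by the total number of nucleotides (3 per codon) in that group.
    """
    wanted = set(amino_acids)
    codons = [codon for codon, aa in CODON_TABLE.items() if aa in wanted]
    total = 3 * len(codons)
    return {base: sum(codon.count(base) for codon in codons) / total for base in BASES}


if __name__ == "__main__":
    # Assumed "very hydrophobic" group: Phe, Met, Ile, Leu, Val.
    print(nucleotide_frequencies("FMILV")["U"])  # 0.5
    # All 20 amino acids, i.e., the 61 sense codons.
    print(round(nucleotide_frequencies("ACDEFGHIKLMNPQRSTVWY")["U"], 3))  # 0.246
```

Passing a single amino acid (for example `"S"` or `"Y"`) gives the per-residue U usage plotted in Fig. 2A.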


Annotated Swiss-Prot entries from several organisms, classified as “multipass” membrane proteins or cytoplasmic proteins, were selected for further analysis. All of the datasets are available as Excel files upon request. The datasets include (species/number of cytoplasmic proteins/number of multipass membrane proteins) E. coli/503/660, M. jannaschii/116/114, A. thaliana/445/604, M. musculus/980/1438, Mycobacterium leprae/163/71, Streptomyces coelicolor/207/55, Archaeoglobus fulgidus/99/29, Bacillus subtilis/325/382, Caenorhabditis elegans/251/382, Saccharomyces cerevisiae/535/456, and Homo sapiens/1028/1914.

Analysis of Amino Acid Usage.

For calculating average amino acid usage, annotated entries from the 11 organisms listed above were selected. The amino acid usage of each annotated Swiss-Prot entry was calculated. An average amino acid usage was computed for each group of proteins from each organism, and the ratio of usage in multipass membrane/cytoplasmic proteins was calculated (Table S1). The obtained values were averaged, yielding a ratio of amino acid usage in multipass membrane proteins/cytoplasmic proteins for all of the entries in the 11 test organisms (Table S2).
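The three averaging steps above can be sketched as follows. This is a minimal sketch under stated assumptions: "usage" is taken to mean the fraction of residues in a protein that are a given amino acid, residues absent from the cytoplasmic group are skipped to avoid division by zero, and the SD across organisms corresponds to the error bars of Fig. 2B. All function names and the input layout are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def usage(protein):
    """Fraction of each amino acid in one protein sequence."""
    counts = Counter(protein.upper())
    return {aa: counts[aa] / len(protein) for aa in AMINO_ACIDS}


def group_average(proteins):
    """Average per-protein usage over one group (membrane or cytoplasmic)."""
    per_protein = [usage(p) for p in proteins]
    return {aa: mean(u[aa] for u in per_protein) for aa in AMINO_ACIDS}


def usage_ratio(membrane, cytoplasmic):
    """Membrane/cytoplasmic usage ratio for one organism (cf. Table S1)."""
    mem, cyto = group_average(membrane), group_average(cytoplasmic)
    # Skip residues that never occur in the cytoplasmic group.
    return {aa: mem[aa] / cyto[aa] for aa in AMINO_ACIDS if cyto[aa] > 0}


def cross_species_ratios(datasets):
    """Mean and SD of the per-organism ratios (cf. Table S2 and Fig. 2B).

    datasets: {organism: (membrane_sequences, cytoplasmic_sequences)}
    """
    per_organism = [usage_ratio(mem, cyto) for mem, cyto in datasets.values()]
    summary = {}
    for aa in AMINO_ACIDS:
        values = [r[aa] for r in per_organism if aa in r]
        if values:
            sd = stdev(values) if len(values) > 1 else 0.0
            summary[aa] = (mean(values), sd)
    return summary
```

With real data, `datasets` would hold the Swiss-Prot sequences of the 11 organisms; a ratio above 1 for serine or tyrosine corresponds to the overrepresentation reported in Fig. 2B.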

Large-Scale Hydrophobicity Analysis.

The Kyte-Doolittle hydrophobicity scale (15) was used, and each protein sequence in the Swiss-Prot datasets was scanned using a sliding 20-aa window. Every entry received a number indicating how many windows reached an average hydrophobicity value of 1.5. The hydrophobicity values in each group of proteins were averaged and then divided by the summed protein lengths in the group. Finally, the ratio between the averaged hydrophobicity of the multipass membrane protein group and that of the cytoplasmic protein group of each organism was calculated.
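The window-counting score described above might look like this in Python. It is a sketch, not the authors' code: "reached 1.5" is interpreted as a window mean ≥ 1.5, only the 20 standard residues are handled, and the group normalization follows the Methods wording literally (mean window count divided by the summed protein length of the group). Function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids (ref. 15).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_window_count(protein, window=20, threshold=1.5):
    """Number of sliding windows whose mean Kyte-Doolittle value is >= threshold."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return sum(
        1
        for start in range(len(values) - window + 1)
        if sum(values[start:start + window]) / window >= threshold
    )


def group_hydrophobicity(proteins, window=20, threshold=1.5):
    """Mean window count of a group divided by the group's summed protein length."""
    counts = [hydrophobic_window_count(p, window, threshold) for p in proteins]
    return (sum(counts) / len(counts)) / sum(len(p) for p in proteins)


def hydrophobicity_ratio(membrane, cytoplasmic, window=20, threshold=1.5):
    """Multipass membrane / cytoplasmic ratio for one organism (cf. Fig. 4A)."""
    return (group_hydrophobicity(membrane, window, threshold)
            / group_hydrophobicity(cytoplasmic, window, threshold))
```

Sequences shorter than the window simply score 0, because no full window fits.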

Large-Scale U-Richness Analysis.

The genes corresponding to the Swiss-Prot entries were scanned using a sliding 60-bp window. Every entry received a number (a U-richness value) indicating how many windows contained at least 24 Us (40%). The U-richness values in each group were averaged and then divided by the summed gene lengths in the group. Finally, the ratio between the averaged U-richness of the multipass membrane protein group and that of the cytoplasmic protein group of each organism was calculated. For the high-GC genomes, the limit of 24 Us per 60-bp window was reduced to 16 for S. coelicolor and 21 for M. leprae.
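A minimal sketch of the U-richness score, assuming gene sequences in the DNA alphabet (so T is counted as U) and using the thresholds stated above, including the lowered limits for the two high-GC genomes. As in the hydrophobicity score, the group normalization follows the Methods wording literally. Function and constant names are hypothetical.

```python
# Minimum number of U (T) residues per 60-nt window, as given in the Methods.
DEFAULT_MIN_U = 24
HIGH_GC_MIN_U = {"Streptomyces coelicolor": 16, "Mycobacterium leprae": 21}


def u_rich_window_count(gene, window=60, min_u=DEFAULT_MIN_U):
    """Number of sliding windows containing at least `min_u` U/T residues."""
    is_u = [1 if nt in "UT" else 0 for nt in gene.upper()]
    if len(is_u) < window:
        return 0
    current = sum(is_u[:window])
    count = 1 if current >= min_u else 0
    for i in range(window, len(is_u)):
        # Slide by one nucleotide: add the entering base, drop the leaving one.
        current += is_u[i] - is_u[i - window]
        if current >= min_u:
            count += 1
    return count


def group_u_richness(genes, min_u=DEFAULT_MIN_U):
    """Mean U-richness value of a group divided by the group's summed gene length."""
    values = [u_rich_window_count(g, min_u=min_u) for g in genes]
    return (sum(values) / len(values)) / sum(len(g) for g in genes)


def u_richness_ratio(organism, membrane_genes, cytoplasmic_genes):
    """Multipass membrane / cytoplasmic U-richness ratio for one organism (cf. Fig. 4B)."""
    min_u = HIGH_GC_MIN_U.get(organism, DEFAULT_MIN_U)
    return group_u_richness(membrane_genes, min_u) / group_u_richness(cytoplasmic_genes, min_u)
```

Keeping the per-organism thresholds in a lookup table makes the GC calibration explicit; any further calibration mentioned in the Results would only require new entries.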

Codon Usage Analysis.

Codon usage for Ile, Leu, and Phe was calculated for the combined Swiss-Prot entry genes in each group of proteins, and the ratio between usage in multipass membrane proteins and cytoplasmic proteins was calculated for each organism (Table S3).
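The pooled codon-usage comparison behind Fig. 5B can be sketched as below. As an assumption, "usage" of the U-rich codons (TTA + TTG + CTT for Leu, TTT for Phe, ATT for Ile) is expressed as their fraction among all synonymous codons of that amino acid, with codons pooled over all genes of a group and read in frame from the first base. Function names are hypothetical.

```python
from collections import Counter

# Synonymous codons (DNA alphabet) and the U(T)-rich subsets highlighted in Fig. 5B.
SYNONYMOUS = {
    "Leu": ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG"),
    "Phe": ("TTT", "TTC"),
    "Ile": ("ATT", "ATC", "ATA"),
}
U_RICH = {"Leu": {"TTA", "TTG", "CTT"}, "Phe": {"TTT"}, "Ile": {"ATT"}}


def codon_counts(genes):
    """Pool in-frame codon counts over all genes of a group (partial last codon ignored)."""
    counts = Counter()
    for gene in genes:
        gene = gene.upper().replace("U", "T")
        for i in range(0, len(gene) - 2, 3):
            counts[gene[i:i + 3]] += 1
    return counts


def u_rich_fraction(genes, amino_acid):
    """Fraction of an amino acid's codons that belong to its U-rich subset."""
    counts = codon_counts(genes)
    total = sum(counts[c] for c in SYNONYMOUS[amino_acid])
    rich = sum(counts[c] for c in U_RICH[amino_acid])
    return rich / total if total else float("nan")


def u_rich_usage_ratio(membrane_genes, cytoplasmic_genes, amino_acid):
    """Membrane / cytoplasmic ratio of U-rich codon usage for one organism."""
    cyto = u_rich_fraction(cytoplasmic_genes, amino_acid)
    return u_rich_fraction(membrane_genes, amino_acid) / cyto if cyto else float("nan")
```

A ratio above 1 corresponds to the E. coli pattern in Fig. 5B (membrane protein genes favoring U-rich codons), and a ratio below 1 to the M. musculus pattern.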


We thank N. Citri, D. Tawfik, O. Amster-Choder, and J. Beckwith for encouragement, advice, and helpful discussions.


1To whom correspondence should be addressed. E-mail: e.bibi{at}

Author contributions: E.B. designed research; J.P. and E.B. performed research; J.P. and E.B. analyzed data; and E.B. wrote the paper.

The authors declare no conflict of interest.

This article contains supporting information online at


1. Moore MJ (2005) From birth to death: The complex lives of eukaryotic mRNAs. Science 309:1514–1518.
2. St. Johnston D (2005) Moving messages: The intracellular localization of mRNAs. Nat Rev Mol Cell Biol 6:363–375.
3. Serganov A, Patel DJ (2008) Towards deciphering the principles underlying an mRNA recognition code. Curr Opin Struct Biol 18:120–129.
4. Palazzo AF, et al. (2007) The signal sequence coding region promotes nuclear export of mRNA. PLoS Biol 5:2862–2874.
5. Herskovits AA, Bibi E (2000) Association of Escherichia coli ribosomes with the inner membrane requires the signal recognition particle receptor but is independent of the signal recognition particle. Proc Natl Acad Sci USA 97:4621–4626.
6. Herskovits AA, Shimoni E, Minsky A, Bibi E (2002) Accumulation of endoplasmic membranes and novel membrane-bound ribosome-signal recognition particle receptor complexes in Escherichia coli. J Cell Biol 159:403–410.
7. Herskovits AA, Bochkareva ES, Bibi E (2000) New prospects in studying the bacterial signal recognition particle pathway. Mol Microbiol 38:927–939.
8. Engelman DM, Steitz TA, Goldman A (1986) Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu Rev Biophys Biophys Chem 15:321–353.
9. Nirenberg M (2004) Historical review: Deciphering the genetic code—a personal account. Trends Biochem Sci 29:46–54.
10. Crick FH (1968) The origin of the genetic code. J Mol Biol 38:367–379.
11. Sella G, Ardell DH (2006) The coevolution of genes and genetic codes: Crick's frozen accident revisited. J Mol Evol 63:297–313.
12. Pelc SR (1965) Correlation between coding-triplets and amino-acids. Nature 207:597–599.
13. Goldberg AL, Wittes RE (1966) Genetic code: Aspects of organization. Science 153:420–424.
14. Wolfenden RV, Cullis PM, Southgate CC (1979) Water, protein folding, and the genetic code. Science 206:575–577.
15. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157:105–132.
16. Froshauer S, Green GN, Boyd D, McGovern K, Beckwith J (1988) Genetic analysis of the membrane insertion and topology of MalF, a cytoplasmic membrane protein of Escherichia coli. J Mol Biol 200:501–511.
17. Bernsel A, et al. (2008) Prediction of membrane-protein topology from first principles. Proc Natl Acad Sci USA 105:7177–7181.
18. Woese CR (2002) On the evolution of cells. Proc Natl Acad Sci USA 99:8742–8747.
19. Barabote RD, et al. (2006) Extra domains in secondary transport carriers and channel proteins. Biochim Biophys Acta 1758:1557–1579.
20. Chung YJ, Krueger C, Metzgar D, Saier MH, Jr (2001) Size comparisons among integral membrane transport protein homologues in bacteria, Archaea, and Eucarya. J Bacteriol 183:1012–1021.
21. Varani G (1995) Exceptionally stable nucleic acid hairpins. Annu Rev Biophys Biomol Struct 24:379–404.
22. Coulondre C, Miller JH, Farabaugh PJ, Gilbert W (1978) Molecular basis of base substitution hotspots in Escherichia coli. Nature 274:775–780.
23. Gallinari P, Jiricny J (1996) A new class of uracil-DNA glycosylases related to human thymine-DNA glycosylase. Nature 383:735–738.
24. Hall TM (2005) Multiple modes of RNA recognition by zinc finger proteins. Curr Opin Struct Biol 15:367–373.
25. Horn G, Hofweber R, Kremer W, Kalbitzer HR (2007) Structure and function of bacterial cold shock proteins. Cell Mol Life Sci 64:1457–1470.