Evolution and evolvability of proteins in the laboratory



The introduction of DNA shuffling in 1994 substantially increased the power of laboratory evolution protocols to optimize protein function and led to a significant increase in the number of scientists pursuing directed protein evolution (1). The introduction of family shuffling in 1998 allowed for the homologous recombination of natural diversity among families of proteins and further increased the range of protein sequence and function space that could be accessed in the laboratory (2). The realization that evolution of truly novel protein function likely requires nonhomologous exchange of genetic material led to the introduction of the THIO-ITCHY technique in 2001 (3). In this issue of PNAS, Saraf et al. (4) develop the theory behind nonhomologous evolution of protein function by using the information contained within the natural diversity found in a protein fold family (Fig. 1).

Fig. 1. In the canal of functional proteins (25, 26) lie members of a protein fold. The theory of Saraf et al. (4) predicts the probable activities of the nonhomologously engineered protein hybrids of members of this fold.

The theory developed by Saraf et al. (4) quantifies by how much a test protein sequence differs in its structural and chemical properties from the consensus sequence of a protein fold. The test sequences are those that arise by the mutation and recombination dynamics of the THIO-ITCHY technique (3). The first step of the procedure is to determine which pairs of residues in the protein are evolutionarily conserved, or correlated (5), in the family of proteins with the chosen fold. It is implicitly assumed that many of the dominant interactions are pairwise additive. A measure of the strictness of this conservation is calculated. The second step of the procedure is to determine whether the test protein sequence also conserves the structural and chemical properties at the pairs of residues identified to be conserved. The amino acid properties for which conservation is examined are charge, volume, and hydrophobicity. By quantifying how far the test sequence fluctuates away from the average properties of the protein fold, relative to the natural fluctuations within the fold ensemble, the authors are able to rank the probable activity of the test sequences. Although the ranking was of enzymatic activity in this case, the theory is generalizable to any measurable figure of merit that is correlated with protein structure.
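The two-step procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy family, the charge scale, the use of charge alone (volume and hydrophobicity are omitted), the zero-spread conservation criterion, and the names `conserved_pairs`, `clash_score`, and `floor` are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev

# Toy charge scale (assumption: only charge is scored here; the text also
# names volume and hydrophobicity). Unlisted residues count as neutral.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def charge(residue):
    return CHARGE.get(residue, 0)


def conserved_pairs(family, max_spread=0.0):
    """Step 1: find position pairs whose summed charge is conserved across
    the family even though the charge at each position varies."""
    length = len(family[0])
    pairs = {}
    for i, j in combinations(range(length), 2):
        sums = [charge(s[i]) + charge(s[j]) for s in family]
        varies_i = len({charge(s[i]) for s in family}) > 1
        varies_j = len({charge(s[j]) for s in family}) > 1
        spread = pstdev(sums)
        if varies_i and varies_j and spread <= max_spread:
            pairs[(i, j)] = (mean(sums), spread)
    return pairs


def clash_score(test, pairs, floor=0.5):
    """Step 2: sum, over conserved pairs, the deviation of the test sequence
    from the family mean, scaled by the family spread. The floor (an
    assumption) avoids division by zero for perfectly conserved pairs."""
    total = 0.0
    for (i, j), (target, spread) in pairs.items():
        deviation = abs(charge(test[i]) + charge(test[j]) - target)
        total += deviation / max(spread, floor)
    return total


# Hypothetical family: positions 0 and 2 always carry opposite charges.
family = ["DAKL", "KGDV", "ELRA"]
pairs = conserved_pairs(family)

# Hybrids of the first two family members, ranked from least to most clashing.
hybrids = ["DAKV", "DADV", "KGKL"]
ranking = sorted(hybrids, key=lambda h: clash_score(h, pairs))
```

In this toy setting a lower score means the hybrid preserves the covarying pair, so it would be ranked as more likely to retain activity. A hybrid that places two like charges at the paired positions scores higher and ranks lower.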

Evolution of better protein-based therapeutics is one example of how this technology can be used. An interesting application comes from the Stemmer group (6) where a tetravalent vaccine for dengue fever has been developed. Dengue fever comes in four strains, and vaccines vainly struggle for dominance over all strains. Most vaccines provide protection against only one or two strains, and there is currently no Food and Drug Administration-approved vaccine against all four strains. The Maxygen group (6) used a combination of nonhomologous exon shuffling and bioinformatics theory that identified and enhanced crossover sites for optimal shuffling to evolve a single protein antigen that provokes an antibody response against all four dengue strains. This vaccine has enormous potential importance to the 2.5 billion people who live in dengue-infected regions, of whom 100 million are stricken with dengue each year.

Predictive models for analyzing the outcome of experimental rounds of selection and mutation also may be useful in the analysis of disease evolution. At the level of point mutation, for example, diversity within disease protein folds is now a factor in protein inhibitor design (7, 8). Larger-scale genetic changes such as recombination (9, 10) and transposition (11), however, also play an important role in creating pathogen diversity. Quantitative theories can answer the following questions. How useful is recombination to evolution? Can we predict such usefulness? Can we predict likely recombinations (12)? Finally, can we suggest optimal treatment strategies in light of the recombination predictions? Perhaps the most immediate application of such a program would be to HIV dynamics and treatment (13).

An interesting question that the theory of Saraf et al. (4) might be able to address is why recombination and C+G content are correlated. This correlation, which has been observed for over a decade (14), still is unexplained. Various mechanisms that might lead to this correlation have been posited. On the one hand are the theories that suggest the C+G bias is a result of selective pressures that only become apparent with the increased sequence-searching ability that recombination provides. On the other hand are theories that suggest C+G content might bias recombination rates, through chromatin associations that cause physical exposure of the DNA, CG-biased mismatch repair, or an underlying bias of the associated biochemical machinery. Most of these theories rely on statistical analysis of DNA sequence data for their support. The sequence-level theory of Saraf et al. might help to settle this question directly, at least in the in vitro setting.

The theoretical models of Saraf et al. (4) also may be used to examine such basic questions as how diversity and evolvability arise in natural systems. Within the evolutionary biology community, there is rigid resistance to the concept that evolvability is a selectable trait: because evolvability is a characteristic of the future, causality would seem to prevent its selection. Selection, however, operates at the group level. Indeed, the mechanism for the evolution and maintenance of adaptability traits is population-based and requires a dynamic environment. The general framework relating evolvability and environmental dynamics recently has been presented (15). Simulations (16, 17) and theories (18) suggest in a general way how evolvability arises. Detailed studies at the sequence level, which the work of Saraf et al. enables, could further explain the mechanisms by which evolvability arises in a population.

Recent bioinformatics studies show that the frequency of alternative splicing is much lower within annotated domains than random chance would dictate (19). That is, the process of alternative splicing tends to insert or delete entire protein domains more frequently, and disrupt protein domains less frequently, than expected by chance. One might ask whether this positive selection for evolvability also has left a mark on the within-domain splice sites. That is, do the splice sites that occur within domains tend to occur in positions that tend not to be “clashing,” in the language of Saraf et al. (4)? On a related note, do species with high crossover rates tend to have domain families that lead to fewer clashes upon recombination than would be expected by chance?

Sequence-level theory may elucidate how diversity and evolvability arise in protein families.

One interesting feature of crossover-type protein evolution experiments, which the theory of Saraf et al. (4) reproduces, is a characteristic V shape in the activity as a function of crossover position. This characteristic shape is a reflection of the typical reduction in activity as the evolved sequences become more distinct from the parent sequences: a crossover in the middle creates maximal distinction on average. This result points toward the need to develop methods to search the protein sequence space in a nonrandom way. By biasing the search of protein sequence space with what we know about protein structure, it is possible to make large jumps in sequence space between functional regions (20). Pathway evolution (21) and module recombination (22) are macromolecular examples of this finding. The combination of predictive ability to create new structural folds (23) and predictive ability to rationally (24) or combinatorially (4) optimize fold function should yield intriguing and possibly profound results in the coming years.
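The geometric origin of the V shape can be seen in a small Python sketch. It makes simplifying assumptions that are not from the paper: two hypothetical parents that differ at every position, a single crossover, and a toy activity that falls linearly with the distance of the hybrid from its nearest parent.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def crossover(parent_a, parent_b, k):
    """Single-crossover hybrid: first k residues from A, the rest from B."""
    return parent_a[:k] + parent_b[k:]


def distance_to_nearest_parent(parent_a, parent_b, k):
    hybrid = crossover(parent_a, parent_b, k)
    return min(hamming(hybrid, parent_a), hamming(hybrid, parent_b))


# Hypothetical parents that differ at all ten positions.
parent_a = "AAAAAAAAAA"
parent_b = "CCCCCCCCCC"
length = len(parent_a)

# Distance from the nearest parent for every crossover position 0..length.
profile = [
    distance_to_nearest_parent(parent_a, parent_b, k)
    for k in range(length + 1)
]

# Toy activity (assumption): decreases linearly with that distance.
activity = [1.0 - d / length for d in profile]
```

The distance profile rises to its maximum at the midpoint crossover and falls symmetrically on either side, so the toy activity traces a V with its minimum in the middle. Real parents differ at only some positions and real activity is not linear in distance, so this shows the trend only on average.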

Footnotes

↵* E-mail: mwdeem{at}rice.edu.

See companion article on page 4142.

Copyright © 2004, The National Academy of Sciences

References

1. Stemmer, W. P. C. (1994) Nature 370, 389–391.
2. Crameri, A., Raillard, S. A., Bermudez, E. & Stemmer, W. P. C. (1998) Nature 391, 288–291.
3. Lutz, S., Ostermeier, M. & Benkovic, S. J. (2001) Nucleic Acids Res. 29, E16.
4. Saraf, M. C., Horswill, A. R., Benkovic, S. J. & Maranas, C. D. (2004) Proc. Natl. Acad. Sci. USA 101, 4142–4147.
5. Rod, T. H., Radkiewicz, J. L. & Brooks, C. L. (2003) Proc. Natl. Acad. Sci. USA 100, 6980–6985.
6. Stemmer, W. & Holland, B. (2003) Am. Sci. 91, 526–533.
7. Freire, E. (2002) Nat. Biotech. 20, 15–16.
8. Leslie, M. (2002) Science 297, 1615.
9. Colegrave, N. (2002) Nature 420, 664–666.
10. Worobey, M., Rambaut, A. & Holmes, E. C. (1999) Proc. Natl. Acad. Sci. USA 96, 7352–7357.
11. Shapiro, J. A. (1997) Trends Genet. 13, 98–104.
12. Moore, G. L., Maranas, C. D., Lutz, S. & Benkovic, S. J. (2001) Proc. Natl. Acad. Sci. USA 98, 3226–3231.
13. Lathrop, R. H. & Pazzani, M. J. (1999) J. Comb. Optim. 3, 301–320.
14. Eyre-Walker, A. (1993) Proc. R. Soc. London B 252, 237–243.
15. Sato, K., Ito, Y., Yomo, T. & Kaneko, K. (2003) Proc. Natl. Acad. Sci. USA 100, 14086–14090.
16. Travis, J. M. J. & Travis, E. R. (2002) Proc. R. Soc. London B 269, 591–597.
17. Kepler, T. B. & Perelson, A. S. (1998) Proc. Natl. Acad. Sci. USA 95, 11514–11519.
18. Blasio, F. V. D. (1999) Phys. Rev. E 60, 5912–5917.
19. Kriventseva, E. V., Koch, I., Apweiler, R., Vingron, M., Bork, P., Gelfand, M. S. & Sunyaev, S. (2003) Trends Genet. 19, 124–128.
20. Bogarad, L. D. & Deem, M. W. (1999) Proc. Natl. Acad. Sci. USA 96, 2591–2595.
21. Stemmer, W. P. C. (2002) J. Mol. Catal. B 19, 3–12.
22. Dueber, J. E., Yeh, B. J., Chak, K. & Lim, W. A. (2003) Science 301, 1904–1908.
23. Kuhlman, B., Dantas, G., Ireton, G. C., Varani, G., Stoddard, B. L. & Baker, D. (2003) Science 302, 1364–1368.
24. Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W. (2003) Nature 423, 185–190.
25. Waddington, C. H. (1942) Nature 150, 563–565.
26. Kauffman, S. A. (1993) The Origins of Order (Oxford Univ. Press, New York).