Natural selection and phylogenetic analysis



If Darwin were to survey the entirety of the biological sciences today, he would be pleased to observe how central phylogenies and “tree thinking” are to integrative research (1). Biologists of all stripes now realize that phylogenies are not exotic, but fundamental and routine tools for understanding not only history but also the mechanism, organization, and function of biological networks at all levels, from molecular and cellular to ecological. The last two decades have seen an explosion of sophisticated statistical methods for inferring phylogenetic trees (2), and these methods are remarkably robust to a variety of forces that can conceivably derail phylogenetic analysis and lead researchers to incorrect conclusions about phylogenetic relationships: vagaries of the molecular clock, changing base compositions of DNA sequences, and even evolutionary convergence, whether driven by natural selection or simple biases of mutation. Yet some genes in some groups of species exhibit evolutionary convergence on such a vast scale that even the best phylogenetic methods fail and erroneous relationships result. The report by Castoe et al. in this issue of PNAS (3) documents an example of rampant convergence in the mitochondrial DNA of snakes, and it raises intriguing questions as to how widespread such convergence is in molecular data.

Fig. 1. Ways in which natural selection can influence phylogenetic reconstruction. Colors of branches correspond to different species from which sequences are sampled, except in E, wherein colors indicate different rates of evolution at a site. (A) A gene tree of five species whose evolution is largely neutral or dominated by stabilizing selection. (B) Violations of the molecular clock caused by directional selection along lineages. (C and D) Contrast between shared polymorphisms commonly observed between closely related species at neutral loci (C) versus reciprocal monophyly of alleles between closely related species driven by selective sweeps (D). (E) Heterotachy, the change in rate of sites over time, may or may not be driven by natural selection. (F) Balancing selection can create patterns of “transspecies evolution,” such as observed at genes of the major histocompatibility complex. (G) Selection-driven convergence of amino acid substitutions (starbursts) in unrelated lineages causes misleading phylogenies, drawing lineages together that are in fact unrelated (true relationships indicated by dotted lines).

Convergence is the acquisition of similar phenotypic or genetic states in unrelated lineages, and is usually assumed to be driven by natural selection. Molecular data are by no means immune to convergence, but the type of convergence most often observed, called homoplasy, can be thought of as the product not of natural selection but of one of many kinds of biases: developmental, as observed in morphological traits (4), or mutational, as frequently observed in DNA sequence data (such as the bias for C–T transitions in animal mitochondrial DNA). Although ubiquitous, homoplasy usually occurs at a low enough rate, and at few enough sites in the DNA sequence data collected by researchers, that it generally does not pose a problem for phylogenetic analysis, and systematists have developed a number of ways to detect, quantify, and deal with it (2). By contrast, there have been relatively few cases in which adaptive convergence has shaped the evolution of particular genes to such an extent that it dominates their phylogenetic signal (5).

In analyzing DNA sequences from two new mitochondrial genomes from snakes, as well as additional mitochondrial genomes from squamates (lizards and snakes), Castoe et al. (3) have documented convergence in mitochondrial protein-coding genes on a scale hitherto unappreciated. They reach this conclusion by comparing their mitochondrial tree, in which agamid lizards and snakes form a clade to the exclusion of iguanas and chameleons, with the tree yielded by nuclear DNA sequences, in which the Iguania (consisting of iguanas, chameleons, and agamid lizards) is monophyletic. The tree implied by whole mitochondrial genomes thus contradicted the signal in much previous phylogenetic data, resulting in a lack of congruence, the ultimate arbiter of accuracy in phylogenetic analysis.
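The incongruence described above can be made concrete with a small sketch. The Python below is an illustration only, not the analysis performed by Castoe et al.: it encodes two toy rooted trees as nested tuples and asks whether a named group forms a clade in each. The taxon labels, the `outgroup` leaf, and any branching beyond the two groupings stated in the text (agamids plus snakes in the mitochondrial tree, a monophyletic Iguania in the nuclear tree) are assumptions made for the example, as are the function names.

```python
# Toy rooted trees as nested tuples; a leaf is a string.
# Only two groupings come from the text: agamid + snake (mitochondrial)
# and a monophyletic Iguania (nuclear). Everything else is assumed.
MITO_TREE = ((("agamid", "snake"), ("iguana", "chameleon")), "outgroup")
NUCLEAR_TREE = (((("agamid", "chameleon"), "iguana"), "snake"), "outgroup")

IGUANIA = {"iguana", "chameleon", "agamid"}


def leaves(tree):
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf set of every subtree, including the whole tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def is_monophyletic(tree, group):
    """A group is monophyletic if some subtree contains exactly its members."""
    return frozenset(group) in clades(tree)


if __name__ == "__main__":
    for name, tree in [("mitochondrial", MITO_TREE), ("nuclear", NUCLEAR_TREE)]:
        print(name, "Iguania monophyletic:", is_monophyletic(tree, IGUANIA))
```

On these toy topologies the two trees disagree about Iguania, which is the kind of conflict between data sets that the congruence criterion is meant to expose.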

The data analyzed by Castoe et al. (3) are noteworthy in a number of ways. The signal in a relatively small number of sites in the mtDNA genomes appears to overwhelm the signal in the remainder of the mitochondrial genome. The authors make a good case that the patterns found in the mtDNA sequences are the result not of standard homoplasy at the nucleotide level but rather of selection-driven convergence at the amino acid level. They point out that the second positions of codons, which usually show low levels of homoplasy in vertebrate data sets, nonetheless yielded a tree that linked agamids and snakes, as do amino acid sequences. The snake and agamid mtDNA sequences did not show conspicuous base compositional patterns that would result in a misleading tree. A well-known signal that can mislead phylogenetic analysis is long-branch attraction, in which homoplasy can accumulate between unrelated lineages to such an extent that phylogenetic analysis groups them together (6). However, long-branch attraction is predicted to yield a pattern in which the sites with the highest evolutionary rate show the greatest signal favoring the wrong tree (7); this was not the case in their data. A careful process of elimination drove the authors to the conclusion that pervasive adaptation at the level of amino acids is providing the misleading signal in these reptile mitochondrial genomes.

The Castoe et al. paper (3) raises an important question: Is natural selection a universal hindrance to phylogenetic analysis (Fig. 1)? The question has not often been tackled head on; usually challenges to phylogenetic analysis are framed not by the evolutionary forces themselves but by the consequences of those forces for changing the rates and patterns of substitution within and between lineages over time. A review of various kinds of forces suggests that natural selection need not be a problem for phylogenetic analysis (Fig. 1). For example, stabilizing selection, probably the most common type of selection on proteins, simply lowers the overall rate of evolution (8) (Fig. 1A). Directional selection resulting in new substitutions along a lineage might violate the molecular clock only moderately (Fig. 1B), a situation that is dealt with well by many phylogenetic methods (2, 9). When several alleles per species are sampled, directional selection can “clean up” phylogenies such that species appear in discrete clusters in gene trees even when those same species do not form discrete clusters at genomic loci evolving neutrally (10) (Fig. 1 C and D). By contrast, some kinds of natural selection, such as balancing selection (frequency-dependent selection or heterozygous advantage), can produce bizarre phylogenetic trees. By continually rescuing rare alleles from extinction by genetic drift, balancing selection prolongs the lifespan of alleles such that allelic lineages can persist through many speciation events, sometimes spanning tens of millions of years, resulting in trees that appear scrambled with respect to species boundaries even if the gene tree itself is reconstructed accurately (Fig. 1F). Phylogenetic trees of major histocompatibility complex (MHC) genes fall into this category (11).

Other aberrant patterns of molecular evolution, such as heterotachy (when the rate of evolution of sites changes over time), have recently emerged as a potentially serious problem for phylogenetic analysis (12–14) (Fig. 1E). However, neither heterotachy nor deviations from a clock need be explained by natural selection; one might first look to changes in generation time to explain heterotachy, and aberrant clocks are routinely accounted for in this way or by fluctuations in the neutral space of alleles or fixation of slightly deleterious mutations (15). The type of selection-driven convergence identified by Castoe et al. (3), especially when spread throughout the gene(s) being used for phylogenetic analysis, is perhaps the most insidious, and there are no sure-fire ways for phylogenetic analysis to deal with it (Fig. 1G).

Castoe et al. (3) pose a crucial unanswered question that begs for experimental analysis: What has caused this widespread molecular convergence? The amino acid substitutions found to be shared between agamid lizards and snakes may facilitate the extreme shifts in metabolic rate and high metabolic efficiency exhibited by these groups and may have fundamentally altered the reducing and coupling functions of the mitochondrial proton pump. Perhaps mitochondrial proteins act as such a tightly coupled integrated unit that physiological adaptations require concomitant changes throughout the 13 proteins of the genome. Castoe et al. raise the possibility that nuclear genes may also be subject to such rampant convergence. Although this remains a possibility, it is less likely that such convergence could occur on such a wide scale, across so many genes, that it would mislead phylogenetic analysis. Where such phenomena might be found is when the base composition of an entire gene or genome has shifted from that of its close relatives and has come to resemble an unrelated lineage, as was recently documented for the mammalian RAG1 gene (16). As whole-genome sequencing accelerates, cases of widespread aberrant signal in the nuclear genome will no doubt crop up.

Because of its ease of amplification and sequencing, the mitochondrial genome became a workhorse of phylogenetics near the species level (phylogeography) during the 1990s (17), and in recent years whole-mitochondrial genome sequencing has been used to understand the phylogenetic relationships of many groups, especially vertebrates, for which there are now hundreds of complete genomes. Its rapid evolution clearly makes it a boon for analysis among close relatives, but some have questioned its utility as a phylogenetic marker among higher taxa: its evolutionary rate is rapid enough that high-frequency changes such as transitions often need to be masked so that phylogenetic noise does not swamp out signal (18). Indeed, given the increasing appreciation that phylogenies represent trees of species and lineages, each of which comprises many independently segregating genes whose gene trees inevitably vary at least slightly from one another, systematists today would question the sole use of a mitochondrial gene tree as a simple proxy for the relationships of the species in which that gene tree is embedded (19). Methods for estimating species trees (the trees of species and lineages in which gene trees percolate through history) are increasingly available and derive their power not from the accumulation of many sites within single genetic loci such as mtDNA, but via the signal in many loci, each of which exhibits phylogenetic signals that are correlated across loci because of their shared history, namely the species tree. For this reason, the motivation for mitogenomic studies (3, 18) is not phylogenetics per se, but a deeper understanding of mitochondrial genome evolution, a goal that would make Darwin and his intellectual descendants justly proud.


1E-mail: sedwards{at}

Author contributions: S.V.E. wrote the paper.

The author declares no conflict of interest.

See companion article on page 8986.


1. O'Hara RJ (1988) Homage to Clio, or, toward an historical philosophy for evolutionary biology. Syst Zool 37:142–155.
2. Felsenstein J (2003) Inferring Phylogenies (Sinauer, Sunderland, MA).
3. Castoe TA, et al. (2009) Evidence for an ancient adaptive episode of convergent molecular evolution. Proc Natl Acad Sci USA 106:8986–8991.
4. Wake DB (1991) Homoplasy - the result of natural selection, or evidence of design limitations? Am Nat 138:543–567.
5. Stewart CB, Schilling JW, Wilson AC (1987) Adaptive evolution in the stomach lysozymes of foregut fermenters. Nature 330:401–404.
6. Felsenstein J (1978) Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 27:401–410.
7. Brinkmann H, van der Giezen M, Zhou Y, Poncelin de Raucourt G, Philippe H (2005) An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics. Syst Biol 54:743–757.
8. Kimura M (1983) The Neutral Theory of Molecular Evolution (Cambridge Univ Press, Cambridge, UK).
9. Swofford DL, Olsen GJ, Waddell PJ, Hillis DM (1996) Phylogenetic inference. Molecular Systematics, eds Hillis DM, Moritz C, Mable BK (Sinauer, Sunderland, MA), pp 407–514.
10. Ting CT, Tsaur SC, Wu CI (2000) Proc Natl Acad Sci USA 97:5313–5316.
11. Klein J, Satta Y, O'hUigin C, Takahata N (1993) The molecular descent of the major histocompatibility complex. Annu Rev Immunol 11:269–295.
12. Kolaczkowski B, Thornton JW (2004) Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431:980–984.
13. Philippe H, Zhou Y, Brinkmann H, Rodrigue N, Delsuc F (2005) Heterotachy and long-branch attraction in phylogenetics. BMC Evolutionary Biology 5:50.
14. Baele G, Raes J, Van de Peer Y, Vansteelandt S (2006) An improved statistical method for detecting heterotachy in nucleotide sequences. Mol Biol Evol 23:1397–1405.
15. Takahata N (2007) Molecular clock: An anti-neo-Darwinian legacy. Genetics 176:1–6.
16. Gruber KF, Voss RS, Jansa SA (2007) Base-compositional heterogeneity in the RAG1 locus among didelphid marsupials: Implications for phylogenetic inference and the evolution of GC content. Syst Biol 56:83–96.
17. Avise JC (2000) Phylogeography: The History and Formation of Species (Harvard Univ Press, Cambridge, MA).
18. Pratt RC, et al. (2009) Toward resolving deep Neoaves phylogeny: Data, signal enhancement, and priors. Mol Biol Evol 26:313–326.
19. Edwards SV (2009) Is a new and general theory of molecular systematics emerging? Evolution 63:1–19.