Computational inference of scenarios for α-proteobacterial


Edited by Stanley Falkow, Stanford University, Stanford, CA, and approved May 18, 2004 (received for review February 11, 2004)



The α-proteobacteria, from which mitochondria are thought to have originated, display a 10-fold genome size variation and provide an excellent model system for studies of genome size evolution in bacteria. Here, we use computational approaches to infer ancestral gene sets and to quantify the flux of genes along the branches of the α-proteobacterial species tree. Our study reveals massive gene expansions at branches diversifying plant-associated bacteria and extreme losses at branches separating intracellular bacteria of animals and humans. Alterations in gene numbers have mostly affected functional categories associated with regulation, transport, and small-molecule metabolism, many of which are encoded by paralogous gene families located on auxiliary chromosomes. The results suggest that the α-proteobacterial ancestor contained 3,000-5,000 genes and was a free-living, aerobic, and motile bacterium with pili and surface proteins for host cell and environmental interactions. Approximately one third of the ancestral gene set has no homologs among the eukaryotes. More than 40% of the genes without eukaryotic counterparts encode proteins that are conserved among the α-proteobacteria but for which no function has yet been identified. These genes that never made it into the eukaryotes but are widely distributed in bacteria may represent bacterial drug targets and should be prime candidates for future functional characterization.

Fundamental questions subjected to much debate concern the extent to which microbial genomes are related by vertical descent versus horizontal gene transfer (1-5). A direct approach to address these questions is to estimate frequencies of deletions/duplications and horizontal gene transfers for closely related species and compare these estimates with estimates of nucleotide substitution rates. The α-proteobacteria provide an excellent model system for such studies because genome size variation in this subdivision spans the entire size range for bacteria, from 1 Mb in Rickettsia spp. to >9 Mb in Bradyrhizobium japonicum (6-12). Furthermore, there is a remarkable variation in lifestyle characteristics in this subdivision, including both obligate (Rickettsia and Wolbachia) and facultative (Bartonella and Brucella) intracellular bacteria as well as soil-borne plant symbionts and pathogens (Sinorhizobium, Agrobacterium, and Bradyrhizobium), which enables correlations between gene contents and lifestyle features to be examined.

The α-proteobacterial group has also attracted much interest because one of its descending lineages is thought to be the ancestor of mitochondria (13, 14). The acquisition of mitochondria represents one of the earliest and most extreme cases of horizontal gene transfer events known in the history of life. Phylogenetic studies suggest that ≥630 eukaryotic genes were transferred from the α-proteobacteria to the eukaryotes, including many genes coding for modern mitochondrial protein functions (15). For the majority of mitochondrial proteins, however, no bacterial homologs were identified, indicating that they were derived from nuclear, eukaryotic genomes via intragenomic duplication and sequence divergence (14-16).

Based on results from pairwise genome comparisons, it has been suggested that there is a correlation between genome size alterations, microbial population sizes, and growth habitats (17). For example, it has been shown that free-living bacterial species of large population sizes accumulate insertion/deletion and rearrangement mutations relative to nucleotide substitutions at much higher frequencies than host-dependent bacteria of small population sizes, in which the influence of horizontal gene transfers has been negligible (17). Algorithms for mapping the presence and absence of genes onto inferred species trees in multiple genome comparisons (18, 19) have been used to reconstruct ancestral gene sets and to obtain estimates of the flow of genes along each of the individual branches. By using such approaches, >500 genes have been assigned to the last universal common ancestor (LUCA) (19), and 2,000 genes have been assigned to the ancestor of the Archaea (18).

In this study, we used the α-proteobacteria as a model system to examine the contents of ancestral genomes along with the evolutionary basis for genome size differences. Our results suggest that the α-proteobacterial ancestor contained several thousand genes and was metabolically highly versatile. The flux of genes along the individual branches of the tree highlights the role of the auxiliary chromosomes as mediators of genome size expansions and contractions in response to alterations in environmental conditions.

Materials and Methods

Genome Analysis. The sizes and GenBank accession numbers of α-proteobacterial genomes included in this analysis are given in Table 1. The assignment of functional categories for proteins in Rickettsia prowazekii, Rickettsia conorii, Brucella melitensis, Brucella suis, Caulobacter crescentus, Agrobacterium tumefaciens, Sinorhizobium meliloti, and Mesorhizobium loti was taken from The Institute for Genomic Research. Uncategorized proteins and proteins from Bartonella henselae, Bartonella quintana, and B. japonicum were assigned a functional category according to the best hit in similarity searches using blastp (E < 1 × 10⁻¹⁰) against all classified proteins from The Institute for Genomic Research. Additional proteobacterial genomes included as outgroups in the analyses were Campylobacter jejuni (NC_002163), Escherichia coli (NC_000913), Helicobacter pylori (NC_000913), Pseudomonas aeruginosa (NC_002516), Ralstonia solanacearum (NC_003296), Salmonella typhimurium (NC_003197 and NC_003277), and Xylella fastidiosa (NC_002490).

Table 1. α-Proteobacterial species included in the reconstruction analysis

Phylogenetic Inference. The species phylogeny was estimated by using a data set of concatenated proteins that were selected on the basis that they are encoded by genes located in segments with largely conserved gene order structures in B. henselae, B. quintana, B. melitensis, A. tumefaciens, S. meliloti, and M. loti (see Fig. 6, which is published as supporting information on the PNAS web site). Homologs of the selected B. quintana proteins were inferred by blastp (20) searches (E < 1 × 10⁻²⁰) against the protein data set of each α-proteobacterial genome. To exclude paralogs, we included in the analysis only genes without a second blast hit with an E value of <1 × 10⁻²⁰. A further selection criterion for inclusion was that orthologs should be present in at least 12 of the 20 taxa, resulting in a final set of 38 proteins (Table 3, which is published as supporting information on the PNAS web site).
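The selection criteria above can be sketched as a small filter. This is an illustration only, not the pipeline used in the study: the function names and the input layout (a dictionary mapping each genome to the E values of the blastp hits for one query protein) are hypothetical, and the sketch assumes that a second significant hit in any genome disqualifies the protein altogether.

```python
from typing import Dict, List

E_CUTOFF = 1e-20  # E-value threshold quoted in the text
MIN_TAXA = 12     # orthologs required in at least 12 of the 20 taxa


def significant_hits(evalues: List[float], cutoff: float = E_CUTOFF) -> int:
    """Count the hits in one genome whose E value is below the cutoff."""
    return sum(1 for e in evalues if e < cutoff)


def keep_protein(hits_per_genome: Dict[str, List[float]],
                 min_taxa: int = MIN_TAXA,
                 cutoff: float = E_CUTOFF) -> bool:
    """Keep a query protein if no genome has a second significant hit
    (assumed paralog) and at least min_taxa genomes have exactly one."""
    counts = [significant_hits(ev, cutoff) for ev in hits_per_genome.values()]
    if any(c > 1 for c in counts):
        return False  # a second hit below the cutoff suggests a paralog
    return sum(1 for c in counts if c == 1) >= min_taxa
```

A protein with one significant hit in 12 genomes passes this filter; the same protein is rejected as soon as one genome contributes two hits below the cutoff.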

The alignment was performed by using clustalw (21) on individual protein sequences that were later concatenated. Maximum-likelihood phylogenies were constructed by using phyml (version 2.1 beta) (22) assuming the Jones-Taylor-Thornton model of protein evolution and four γ-distributed rate categories with the α parameter and proportion of invariable sites estimated from the data. To assess the variation in the data, 100 bootstrap replicates were generated from the data set with seqboot from the phylip 3.5c package (J. Felsenstein, Department of Genetics, University of Washington, Seattle). Maximum-likelihood trees were estimated from the bootstrap matrices as described above, and a majority-rule consensus tree was generated from them by using consense, also from the phylip 3.5c package.
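The bootstrap step can be illustrated with a minimal sketch of column resampling, the general idea behind generating replicate alignments. It is not the seqboot program itself; the function names, the seed handling, and the dictionary-based alignment layout are assumptions made for this example.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping each
    column intact across all taxa."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


def make_replicates(alignment: Dict[str, str],
                    n: int = 100, seed: int = 0) -> List[Dict[str, str]]:
    """Generate n bootstrap replicates (100 in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```

Each replicate has the same length as the input alignment, and a tree would then be estimated from every replicate before a consensus is built.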

Inference of Ancestral Gene Sets. The homologous groups were created by using the Clusters of Orthologous Groups (COGs) database (23) in its 66-genomes version. Proteomes classified in COGs were retrieved from the COGs database. Six unclassified proteomes (B. henselae, B. quintana, B. suis, B. japonicum, Rhodopseudomonas palustris, and Wolbachia pipientis) were assigned to COGs according to the following procedure: the proteins in each unclassified proteome were used first as queries and then as databases in separate blast searches with all proteomes in the COGs database. Each unclassified protein was added to the COG to which it had the highest number of symmetric best hits (BeTs), provided that the number of BeTs was >1. Because this procedure expanded the COGs, the same was done for all the unclassified proteins from the other species so as to also include proteins with BeTs to the newly assigned proteins. New clusters were then created from uncategorized proteins forming triangles of BeTs as described in ref. 23. Finally, clusters containing only two proteins were made from linear BeT relations, after which the remaining proteins were included as single genes.
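The notion of a symmetric best hit can be made concrete with a short sketch for a pair of proteomes. The data layout (a dictionary of blast scores keyed by query and subject identifiers) and the function names are hypothetical, and ties are resolved by keeping the first best hit encountered, which is an assumption of this example rather than a rule taken from ref. 23.

```python
from typing import Dict, FrozenSet, Set, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> blast score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query protein to its highest-scoring subject."""
    best: Dict[str, str] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > scores[(query, best[query])]:
            best[query] = subject
    return best


def symmetric_best_hits(scores: Scores) -> Set[FrozenSet[str]]:
    """Return the pairs in which each protein is the other's best hit."""
    best = best_hits(scores)
    return {frozenset((q, s)) for q, s in best.items() if best.get(s) == q}
```

Three proteins from three genomes that are pairwise linked in this way form a triangle of BeTs, and two proteins linked by a single BeT correspond to the two-member clusters mentioned above.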

The most parsimonious scenarios of α-proteobacterial genome evolution and the α-proteobacterial ancestor were reconstructed by character mapping by using generalized parsimony as implemented in paup* (version 4.0b10 for Unix) (24) on a rooted species tree, with acctran (accelerated transformation) (see Fig. 3) and deltran (delayed transformation) (Fig. 7, which is published as supporting information on the PNAS web site) options for parsimony analysis. Fig. 3 shows the results for penalties for duplications, deletions, and gene genesis of 1, 1, and 5, respectively. The selection of penalty values and results obtained for different penalty values are described in Fig. 7.
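To make the role of the penalties concrete, the following is a minimal sketch of generalized (Sankoff-style) parsimony for the copy number of one gene family on a rooted tree, using the penalties 1, 1, and 5 quoted above. It is not the paup* implementation: the step-cost function, the cap on copy number (MAX_COPIES), and the nested-tuple tree layout are assumptions made for illustration, and the acctran/deltran resolution of equally parsimonious reconstructions is not modeled.

```python
from typing import Dict, List, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees

DUP, DEL, GEN = 1, 1, 5   # penalties for duplication, deletion, genesis
MAX_COPIES = 3            # assumed cap on copy number, for illustration


def step_cost(parent: int, child: int) -> int:
    """Cost of changing the copy number along one branch."""
    if child == parent:
        return 0
    if child < parent:
        return (parent - child) * DEL
    if parent == 0:
        # one genesis event, then duplications for any extra copies
        return GEN + (child - 1) * DUP
    return (child - parent) * DUP


def sankoff(tree: Tree, counts: Dict[str, int]) -> List[float]:
    """Minimal cost of the subtree for each possible state at its root.
    Leaf counts must not exceed MAX_COPIES."""
    states = range(MAX_COPIES + 1)
    if isinstance(tree, str):
        return [0 if s == counts[tree] else float("inf") for s in states]
    child_costs = [sankoff(child, counts) for child in tree]
    return [sum(min(step_cost(s, t) + cc[t] for t in states)
                for cc in child_costs)
            for s in states]
```

For the tree (("A", "B"), "C") with one copy in A and B and none in C, the root costs are [5, 1, 3, 5]: a single loss on the branch to C (cost 1) is cheaper than a genesis event in the ancestor of A and B (cost 5), so the gene is inferred to be present at the root.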

Fig. 3.

Inference of deletions/duplications and gene-genesis events based on the α-proteobacterial tree was made by using different clustering levels and penalty values. The inference was based on proteins already classified in COGs (23) to which we added COGs containing proteins in three or more species internally related by best hits (58,171 proteins in total) (a) and the complete set of proteins (73,658 proteins in total) (b). Inference of gene contents was made by using the acctran option for parsimony analysis in paup* with penalties for duplication, deletion, and gene genesis set to 1, 1, and 5, respectively. Numbers along branches refer to the number of duplications/losses/genesis, respectively. Numbers at nodes refer to the putative number of genes in the inferred genome at the node. Outgroup sequences are as described for Fig. 2, but they were pruned from the tree shown here. Abbreviations for species names are as described in the legends to Figs. 1 and 2.

The ancestral proteomes were inferred separately for protein families assigned to auxiliary (mega-COG) and main (main-COG) chromosomes. The criterion for inclusion in the mega-COG family was that ≥30% of the protein members were encoded on auxiliary replichores or symbiosis islands in the Rhizobiales. By using this criterion, 43% of the proteins encoded by the auxiliary replichores and 6% of chromosomally encoded proteins were members of the mega-COG families on average. Because many of the species-specific genes are located on the auxiliary replichores, we used the complete α-proteobacterial proteome for this analysis. The gene content of the inferred α-proteobacterial ancestral genome was compared with the estimated gene content of protomitochondria (15) and the LUCA (19) by using the presence or absence of a COG rather than the absolute numbers of genes.

Results and Discussion

Gene Function of α-Proteobacterial Genomes. To explore expansions in gene function with genome size for the α-proteobacteria (Table 1), we examined gene content statistics for 14 functional categories (Fig. 1). The relationships between gene content and genome size can be approximated with linear functions, with slopes ranging from four genes per megabase for basic information processes such as transcription and translation to >80 genes per megabase for energy metabolism, transport, and regulatory functions. Functional categories associated with environmental interactions (e.g., transport and regulation) were found to be the most variable among bacteria with different lifestyles. For example, the small genomes of obligate and facultative intracellular parasites have only a few regulatory and transport genes, whereas the larger genomes of free-living soil bacteria that alternate between environments of different nutritional quality contain hundreds of such genes. A rapid increase in the number of regulatory genes in relation to gene content has been observed (25, 26) and may be a general feature of all bacterial genomes.

Fig. 1.

Plot of genome size against gene content for each of the functional categories. RP, R. prowazekii; RC, R. conorii; BQ, B. quintana; BH, B. henselae; BM, B. melitensis; BS, B. suis; CC, C. crescentus; AT, A. tumefaciens; SM, S. meliloti; ML, M. loti; and BJ, B. japonicum. See Table 1 for genome sizes. The data were separated into two sections (a and b) to prevent overcrowding.

Extrapolation to the intercept of the y axis provides a measure of the minimal set of genes shared among the α-proteobacteria, which here is estimated at 250 genes (Table 4, which is published as supporting information on the PNAS web site). This set includes ≈200 genes for DNA, RNA, and protein biosynthesis and another 40 genes for nucleotide and cofactor biosynthesis. This is comparable with the minimal set of core genes in endosymbiotic bacteria (27) as well as with minimal gene numbers inferred by computational approaches (28) and experimental knockout mutants of Bacillus subtilis (29).
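The linear approximation and the extrapolation to the y axis can be sketched with an ordinary least-squares fit of gene counts against genome size for one functional category. The function name is hypothetical and no values from Fig. 1 or Table 4 are reproduced here; the slope corresponds to genes per megabase and the intercept to the extrapolated count at zero genome size.

```python
from typing import Sequence, Tuple


def fit_line(sizes_mb: Sequence[float],
             gene_counts: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit: returns (slope, intercept), i.e.
    genes per megabase and the extrapolated count at 0 Mb."""
    n = len(sizes_mb)
    mean_x = sum(sizes_mb) / n
    mean_y = sum(gene_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_mb)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sizes_mb, gene_counts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Summing the intercepts over all functional categories would give an estimate of the shared minimal gene set in the sense described above, under the assumption that the linear trend holds down to very small genomes.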

The Species Tree for α-Proteobacteria. To place the dramatic shifts in genome size in an evolutionary context, we needed an underlying reliable species tree onto which the gene sets could be mapped. Because a few of the divergence nodes were not conclusively resolved in our rRNA tree (data not shown), we inferred the tree topology by using concatenated protein sequences (Fig. 2). To minimize topology inconsistencies caused by horizontal gene transfer and gene paralogy, we selected for this analysis a set of 38 genes sampled from regions with conserved gene order structures in the Rhizobiales (Fig. 6 and Table 3).

Fig. 2.

Phylogenetic relationship of 13 α-proteobacterial species (highlighted by the purple background) with 7 species from other proteobacterial subdivisions as outgroups. The topology, branch lengths, and bootstrap support are according to maximum-likelihood reconstructions with the Jones-Taylor-Thornton + 4ΓI model. Similar results were obtained with the neighbor-joining method and after removal of positions with gaps. A list of genes used for the phylogenetic reconstructions is given in Table 5. Abbreviations for species names are as described in the legend to Fig. 1 with the addition of the following taxa: WP, W. pipientis; RhP, R. palustris; CJ, C. jejuni; EC, E. coli; HP, H. pylori; PA, P. aeruginosa; RS, R. solanacearum; ST, S. typhi; and XF, X. fastidiosa.

The phylogenetic tree (Fig. 2), constructed by using the maximum-likelihood method, provided strong support for a clustering of the Rhizobiales to the exclusion of the earlier diverging lineages B. japonicum, C. crescentus, and the Rickettsiales. The two Bartonella species formed a clade with Brucella with high bootstrap support, as did A. tumefaciens and S. meliloti, which formed a separate clade. The position of M. loti was placed with high support (>90%) close to the root of the Bartonella/Brucella clade. However, the branches separating M. loti from its neighboring clades are very short, and the placement of M. loti in the tree was found to be sensitive both to the methods used and to the genes and species sampled (data not shown). For all other divergences, the tree topology was robust. The branching order depicted in Fig. 2 represents our best estimate of the underlying species tree.

Computational Inference of Ancestral Gene Sets. We inferred ancestral α-proteobacterial proteomes and estimated the number of gene losses, duplications, and genesis events along each branch of the topology shown in Fig. 2 with character mapping using generalized parsimony (Figs. 3 and 7). Following the routines of previous work (18, 19), we included in the analysis proteins already classified in the COGs database (23) along with proteins encoded by genomes not yet incorporated in the COGs database but related to existing COGs by BeTs. This process resulted in a first data set of 56,337 proteins, to which we added 384 COGs containing proteins not related to any existing COGs but present in three or more species and internally related by BeTs. With the inclusion of these proteins, the data set amounted to 58,171 proteins, and the α-proteobacterial ancestral proteome was estimated at 3,300 proteins (Fig. 3a). The remaining proteins were assigned to single or linear protein COGs, which resulted in a data set that included all 73,658 proteins and yielded an ancestral proteome of >5,000 proteins (Fig. 3b). Because some of the species-specific genes may be rapidly evolving or incorrectly annotated as genes, their inclusion probably results in an overestimate of the ancestral proteome size (Fig. 3b), just as their exclusion may yield an underestimate (Fig. 3a). Thus, we set the lower and upper boundaries of the ancestral α-proteobacterial proteome at 3,000 and 5,000 proteins, respectively.

Metabolic Expansions and Contractions. The analyses of gene content alterations at the branches of the tree revealed two major trends that are observed irrespective of the different data sets and methods used (Fig. 4). First, massive genome size expansions accompanied the divergence of the plant-associated Rhizobiales, particularly the evolution of M. loti and B. japonicum. There seems to have been a gradual increase of genes encoding transcriptional regulators and proteins involved in the transport and metabolism of amino acids, nucleotides, carbohydrates, coenzymes, lipids, inorganic ions, and secondary metabolites. These expansions argue in favor of ancestral cells being visited by highly dynamic plasmids that introduced novel genes by duplication and/or genesis, some of which were maintained selectively in response to the increased use of soil compounds and the refined interactions with the progenitors of modern plant cells.

Fig. 4.

Net gene loss or gain throughout the evolution of the α-proteobacterial species. Arrows pointing upward indicate net gains of genes (G), and arrows pointing downward indicate net losses of genes (L). Colors and sizes of arrows refer to the net number of genes gained or lost at each branch. Colors of circles refer to the relative fraction of genes assigned to the different functional groups in the modern and inferred genome at the node. Yellow, information storage and processing; green, metabolism; red, cellular processes; blue, poorly characterized. Clustering groups and estimated frequencies are as described for Fig. 3a. Abbreviations for species names are as described in the legends to Figs. 1 and 2.

Extreme reductions of size occurred twice independently: in the ancestor of the obligate intracellular lineages Rickettsia and Wolbachia and in the ancestor of the facultative intracellular lineages Bartonella and Brucella. These losses have largely affected protein families for transcription regulation, transport, and metabolism of amino acids, nucleotides, carbohydrates, lipids, and other small molecules. Particularly notable is the independent loss of genes involved in secretory pathways, pilus assembly, and flagellar biosynthesis. The loss of genes associated with the transition from interactions with plants to animals in the ancestor of Bartonella and Brucella was not balanced by a corresponding gain of genes; no genes have homologs solely in Bartonella and Brucella (E < 0.001).

The number of genes eliminated before the split of Rickettsia and Wolbachia was estimated at 2,300-3,800 genes, as compared with ≈200-700 lost genes per lineage after the split (Fig. 3). The inverse correlation between gene loss and branch lengths for this part of the tree (compare Figs. 2 and 3) makes the lower frequency of gene-elimination events in recent times all the more striking. On average, the ratio of deletions to nucleotide substitutions was 25-fold higher before the split of Rickettsia and Wolbachia. A high frequency of gene loss relative to nucleotide substitutions was also observed immediately before the emergence of the intracellular lineages Bartonella and Brucella, which is reminiscent of the more rapid loss of genes at an early stage of genome reduction in aphid endosymbiont lineages, followed by genomic stasis (17). Overall, we observed no correlation between frequencies of amino acid substitutions and gene loss (r² = 0.14), gene duplication (r² = 0.02), or gene genesis (r² = 0.05), indicating dramatically different fixation rates for these mutations in the different lineages over time.
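The r² values quoted above are coefficients of determination. A minimal sketch of how such a value could be computed from per-branch data is given below, assuming paired lists of branch lengths (amino acid substitutions per site) and event counts; the function name and input layout are hypothetical, and no per-branch values from the study are reproduced.

```python
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation between two paired series,
    e.g. branch lengths and the number of gene losses per branch."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

A value close to 0, as reported here for all three event types, means that branch length explains almost none of the variation in the number of events per branch.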

Gene Flux on Chromosomes and Auxiliary Replicons. Many species in the Rhizobiales contain auxiliary chromosomes (Table 1) that are characterized by less gene synteny than the main chromosomes (Fig. 6). To quantify the differences in mutational rates and patterns for genes located on different replicons, we inferred ancestral proteomes separately for COGs assigned to the auxiliary replicons (mega-COG) versus those assigned to the main chromosomes (main-COG). We classified a COG as a mega-COG if >30% of its protein members were encoded on an auxiliary replicon in A. tumefaciens, Brucella spp., S. meliloti, or on the symbiosis islands in M. loti and B. japonicum. In total, we classified 13% of the COGs as mega-COGs, which corresponds to 2,349 COGs (8,662 proteins) out of the complete set of 17,669 COGs (73,658 proteins) included in the analysis.
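The classification rule can be written as a short sketch. The function name and the labels "auxiliary" and "main" are hypothetical; the threshold follows the >30% wording of this paragraph (the Methods section states ≥30%), and the final line simply re-derives the quoted 13% from the COG counts given above.

```python
from typing import Sequence

THRESHOLD = 0.30  # fraction of members on auxiliary replicons


def is_mega_cog(member_locations: Sequence[str],
                threshold: float = THRESHOLD) -> bool:
    """Classify a COG as a mega-COG when more than `threshold` of its
    members are encoded on auxiliary replicons or symbiosis islands."""
    n_aux = sum(1 for loc in member_locations if loc == "auxiliary")
    return n_aux / len(member_locations) > threshold


# Share of COGs classified as mega-COGs, from the counts in the text.
mega_cog_share = 2349 / 17669  # about 0.13
```

A COG with one of three members on an auxiliary replicon is classified as a mega-COG under this rule, whereas a COG with one of ten members is not.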

The results showed that 20-24% of the losses that occurred immediately before the Bartonella/Brucella divergence were associated with mega-COGs (Fig. 8, which is published as supporting information on the PNAS web site). Likewise, a substantial fraction of the identified duplications involved proteins in mega-COG families, as observed for example on the branch leading to the Rhizobiales (23%) and also on the branch separating these from R. palustris and B. japonicum (55%). In the terminal branches for S. meliloti and A. tumefaciens, all three types of mutational events were frequent for proteins classified in the mega-COG family, including 30% of duplications, 25% of losses, and 60% of gene-genesis events. Overall, mega-COGs accounted for 21% of changes below the α-proteobacterial ancestor. Considering that the mega-COGs account for only 13% of all COGs, the relative frequencies of deletions, duplications, and gene genesis were considerably higher for proteins classified in these families. We speculate that the auxiliary replicons were derived from plasmids that expanded by reiterative processes of duplication/deletion and horizontal gene-transfer events in the Rhizobiales.
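The comparison between the share of events and the share of COGs can be expressed as a simple fold over-representation. The helper below is a hypothetical illustration that uses only the percentages quoted in this paragraph; it is not a statistical test.

```python
def fold_over_representation(share_of_events: float,
                             share_of_cogs: float) -> float:
    """Ratio of the observed share of events in mega-COGs to the share
    expected if events were spread evenly over all COGs."""
    return share_of_events / share_of_cogs


# Overall changes below the ancestor: 21% of events in 13% of the COGs.
overall = fold_over_representation(0.21, 0.13)  # roughly 1.6-fold
```

The same calculation applied to the gene-genesis events in the terminal branches of S. meliloti and A. tumefaciens (60% of events) gives a ratio above 4.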

Inferred Metabolism of the α-Proteobacterial Ancestor. Our pathway analysis of the core ancestral gene set identified in all the analyses (Table 5, which is published as supporting information on the PNAS web site) suggests that it contained genes for glycolysis and a complete system for aerobic respiration, as expected for a unicellular organism that was well adapted to the aerobic environment. Notable was its broad biosynthetic capability and the presence of multiple genes for regulatory and transport functions. The analysis further identified genes for flagellar biosynthesis and type III and type IV secretion systems. Thus, the ancestor was probably a free-living, aerobic, and motile bacterium that had evolved elaborate communication mechanisms with other cells. Also present in the ancestor were genes for phage-related functions; however, these genes may have been incorrectly assigned to the ancestor because of multiple independent acquisitions of phage genes by horizontal gene transfer in some of the derived lineages.

A comparison of the α-proteobacterial ancestral genome with the gene content of the LUCA identified a small set of genes inferred to be present in the LUCA (13) but absent from our ancestral set. The number and identity of such genes depend on penalty values, but even for the highest penalty values it was observed that a set of genes, including those for homoserine kinase, uridine kinase, endonuclease IV, and glutamyl-tRNA reductase, were predicted to be present in the LUCA but were absent from the α-proteobacterial ancestor. These might have been lost before the divergence of the α-proteobacterial ancestor or, alternatively, been incorrectly assigned to the LUCA.

Comparing the α-Proteobacterial Ancestor with the Mitochondrial Ancestor. The endosymbiotic theory postulates that mitochondria evolved by massive gene loss and transfer of genes from the common ancestor to the nuclear genome of the host cell. A total of 630 orthologous groups display a close phylogenetic relationship between eukaryotes and α-proteobacteria (15). These represent a minimal estimate of the protomitochondrial proteome, because some gene transfers may have been missed because of weak phylogenetic signals and others may have been lost from the eukaryotic genomes included in the analysis. We compared the 630 α-proteobacterial gene groups with the set of COGs inferred to be putatively present in the α-proteobacterial ancestor. The protomitochondrial set includes 487 genes in 412 COG-associated groups (15), all of which belong to the 3,300 genes in the >3,100 COGs of our ancestor (Fig. 3a). Of the 143 protomitochondrial groups not associated with a COG, 92 are represented in the ancestral gene pool. Most of the 51 groups missing from our data set consist of hypothetical proteins or proteins with unknown functions.

Phylogenetic analyses of rRNA sequences, protein subunits of the respiratory chain complexes, and concatenated protein alignments suggest that mitochondria evolved from the α-proteobacteria, with no evidence for multiple independent acquisitions (12, 13, 30-32). Although several studies have placed mitochondria as a deeply diverging sister clade near the Rickettsiales (30-32), the exact position is still debated. Here, we consider the gene set of the reconstructed α-proteobacterial ancestor as an upper limit of the protomitochondrial proteome. To estimate how many of these ancestral genes may, at the most, have been transferred to the host nuclear genome, we selected the complete set of COGs present in the α-proteobacterial ancestor and used them as queries in sequence-similarity searches against eukaryotic genomes. As expected, the number of COGs showing significant sequence similarity to eukaryotic genes decreased with increasing blast scores from ≈1,700 (score ≥50) to 850 (score ≥150) (Fig. 5). The remaining 1,144 ancestral COGs without eukaryotic homologs (score ≤40) represent putative gene losses. The genes in these COGs display a broad taxonomic distribution in bacteria (data not shown), and surprisingly many (>45%) encode proteins of unknown or poorly characterized function (Table 2). Future functional analyses of these genes may provide the answers as to why these genes were not transferred to the eukaryotes.
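The threshold scan behind these numbers can be sketched as a count of COGs whose best eukaryotic hit reaches a given blast score. The input layout (a dictionary from COG identifier to the best score against any eukaryotic genome) and the function name are assumptions made for this example; the actual scores are not reproduced here.

```python
from typing import Dict, Iterable


def cogs_with_homologs(best_scores: Dict[str, float],
                       thresholds: Iterable[float]) -> Dict[float, int]:
    """For each score threshold, count the ancestral COGs whose best
    eukaryotic hit scores at or above that threshold."""
    return {t: sum(1 for s in best_scores.values() if s >= t)
            for t in thresholds}
```

Because every COG that passes a high threshold also passes any lower one, the counts can only fall or stay level as the threshold rises, which is why the decrease from score ≥50 to score ≥150 is expected.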

Fig. 5.

Number of COGs in the α-proteobacterial ancestor (Fig. 3a) with sequence similarity to eukaryotic genes for different blast score values. Estimated number of COGs that show similarity to eukaryotic genes in the inferred proteomes of the α-proteobacterial ancestor (upper curve) and the minimal protomitochondrial ancestor (lower curve) (15).

Table 2. Relative fraction of COGs in the α-proteobacterial ancestor (Fig. 3b) sorted according to broad functional categories

Concluding Remarks

This study represents an attempt to quantify the different mutational changes that underlie genome size alterations in the α-proteobacteria. We observed no correlation between nucleotide substitution rates and fixation rates for mutations that affect genome contents. On the contrary, our results strongly suggest that the inferred frequencies of deletions, duplications, and horizontal gene transfers depend on population sizes and bacterial lifestyle features. In particular, the data support the suggested correlation between transitions to intracellular growth habitats and genome size reductions, with the highest frequencies of gene loss at early stages of the transition (17).

The stability of the main chromosomes of the Rhizobiales, displayed as segments with conserved gene synteny, contrasts with otherwise high substitution rates and extensive gene-content differences. Expansions and contractions in the genomic repertoire have mostly affected genes involved in environmental interactions; these typically are located on the auxiliary replichores and evolve with very high turnover rates. It is possible that we have underestimated these rates at the internal branches of the tree because of multiple insertion/deletion events. High intrinsic rates for duplications/deletions and horizontal gene transfers may serve as an efficient mutational engine that enables rapid responses to alterations in the environmental conditions when subjected to strong selective pressures.

Although the estimated frequencies of duplication and gene-genesis events depend on the penalties assigned to these events, our study clearly demonstrates the importance of gene duplications for expanding and diversifying the metabolic and regulatory capacities of the bacterial cell. A consequence of high duplication and deletion rates is that the number of paralogous proteins may be much larger than previously anticipated. In effect, the many different protein variants do not necessarily trace back to one ancestral giant gene pool but may have arisen throughout evolution via reiterative processes of duplication and loss. The continuous generation of novel paralogs may provide one explanation for the difficulty of obtaining congruent single-gene trees in phylogenomic studies (1-5).

Computational inference of ancestral genomes with refined models that account for the relative frequencies of the different types of mutational events in the different lineages will provide more detailed scenarios of genome size evolution in the α-proteobacteria and other bacterial subdivisions.


This research was supported by grants from the Swedish Research Council, the Swedish Foundation for Strategic Research, and the Wallenberg Foundation.


† To whom correspondence should be addressed at: Department of Molecular Evolution, Norbyvägen 18C, S-752 36 Uppsala, Sweden. E-mail: siv.andersson{at}

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: LUCA, last universal common ancestor; COGs, Clusters of Orthologous Groups; BeT, symmetric best hit.

Copyright © 2004, The National Academy of Sciences


1. Doolittle, W. F. (1999) Science 284, 2124-2128. pmid:10381871
2. Snel, B., Bork, P. & Huynen, M. (1999) Nat. Genet. 21, 108-110. pmid:9916801
3. Sicheritz-Ponten, T. & Andersson, S. G. E. (2001) Nucleic Acids Res. 29, 545-552. pmid:11139625
4. Kurland, C. G., Canback, B. & Berg, O. G. (2003) Proc. Natl. Acad. Sci. USA 100, 9658-9662. pmid:12902542
5. Daubin, V., Moran, N. A. & Ochman, H. (2003) Science 301, 829-832. pmid:12907801
6. Andersson, S. G. E., Zomorodipour, A., Andersson, J. O., Sicheritz-Ponten, T., Alsmark, U. C. M., Podowski, R. M., Näslund, K., Eriksson, A.-S., Winkler, H. H. & Kurland, C. G. (1998) Nature 396, 133-140. pmid:9823893
7. Ogata, H., Audic, S., Renesto-Audiffren, P., Fournier, P. E., Barbe, V., Samson, D., Roux, V., Cossart, P., Weissenbach, J., Claverie, J. M. & Raoult, D. (2001) Science 293, 2093-2098. pmid:11557893
8. Goodner, B., Hinkle, G., Gattung, S., Miller, N., Blanchard, M., Qurollo, B., Goldman, B. S., Cao, Y., Askenazi, M., Halling, C., et al. (2001) Science 294, 2323-2328. pmid:11743194
9. Wood, D. W., Setubal, J. C., Kaul, R., Monks, D. E., Kitajima, J. P., Okura, V. K., Zhou, Y., Chen, L., Wood, G. E., Almeida, N. F., Jr., et al. (2001) Science 294, 2317-2322. pmid:11743193
10. Galibert, F., Finan, T. M., Long, S. R., Puhler, A., Abola, P., Ampe, F., Barloy-Hubler, F., Barnett, M. J., Becker, A., Boistard, P., et al. (2001) Science 293, 668-672. pmid:11474104
11. Kaneko, T., Nakamura, Y., Sato, S., Minamisawa, K., Uchiumi, T., Sasamoto, S., Watanabe, A., Idesawa, K., Iriguchi, M., Kawashima, K., et al. (2002) DNA Res. 9, 189-197. pmid:12597275
12. Wu, M., Sun, L. V., Vamathevan, J., Riegler, M., Deboy, R., Brownlie, J. C., McGraw, E. A., Martin, W., Esser, C., Ahmadinejad, N., et al. (2004) PLoS Biol. 2, 327-341.
13. Gray, M., Burger, G. & Lang, B. F. (1999) Science 283, 1476-1481. pmid:10066161
14. Karlberg, O. & Andersson, S. G. E. (2003) Nat. Rev. Genet. 4, 391-397. pmid:12728281
15. Gabaldon, T. & Huynen, M. A. (2003) Science 301, 609. pmid:12893934
16. Karlberg, E. O., Canbäck, B., Kurland, C. G. & Andersson, S. G. E. (2000) Yeast 17, 170-187. pmid:11025528
17. Tamas, I., Klasson, L., Canbäck, B., Näslund, A. K., Eriksson, A.-S., Wernegreen, J. J., Sandström, J. P., Moran, N. A. & Andersson, S. G. E. (2002) Science 296, 2376-2379. pmid:12089438
18. Snel, B., Bork, P. & Huynen, M. (2002) Genome Res. 12, 17-25. pmid:11779827
19. Mirkin, B. G., Fenner, T. I., Galperin, M. Y. & Koonin, E. V. (2003) BMC Evol. Biol. 3, 2. pmid:12515582
20. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) Nucleic Acids Res. 25, 3389-3402. pmid:9254694
21. Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680. pmid:7984417
22. Guindon, S. & Gascuel, O. (2003) Syst. Biol. 52, 696-704. pmid:14530136
23. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. (1997) Science 278, 631-637. pmid:9381173
24. Swofford, D. L. (1998) Phylogenetic Analysis Using Parsimony (paup) (Sinauer, Sunderland, MA), Version 4.0b10.
25. Nimwegen, E. (2003) Trends Genet. 19, 479-484. pmid:12957540
26. Konstantinidis, K. T. & Tiedje, J. M. (2004) Proc. Natl. Acad. Sci. USA 101, 3160-3165. pmid:14973198
27. Klasson, L. & Andersson, S. G. E. (2004) Trends Microbiol. 12, 37-43. pmid:14700550
28. Koonin, E. V. (2000) Annu. Rev. Genomics Hum. Genet. 1, 99-116. pmid:11701626
29. Kobayashi, K. (2003) Proc. Natl. Acad. Sci. USA 100, 4678-4683. pmid:12682299
30. Olsen, G. J., Woese, C. R. & Overbeek, R. (1994) J. Bacteriol. 176, 1-6. pmid:8282683
31. Viale, A. & Arakaki, A. K. (1994) FEBS Lett. 341, 146-151. pmid:7907991
32. Emelyanov, V. (2003) Arch. Biochem. Biophys. 420, 130-141. pmid:14622983