The Great Oxidation Event expanded the genetic reper


Edited by Donald E. Canfield, Institute of Biology and Nordic Center for Earth Evolution (NordCEE), University of Southern Denmark, Odense M., Denmark, and approved March 20, 2020 (received for review January 25, 2020)



The oxygenation of the atmosphere about 2.4 billion years ago remodeled global cycles of toxic, redox-sensitive metal(loids), including that of arsenic, which must have represented a cataclysm in the history of life. The biological adaptations surrounding this key transition remain largely unexplored. By estimating the timing of genetic systems for arsenic detoxification, we reveal an expansion of enzymes and pathways that accompanied adaptations to the biotoxicity of oxidized arsenic species produced by the Great Oxidation Event. These include enzymes that originated via convergent evolution and pathways that use oxygen for enzymatic catalysis. Our results illustrate how life thrived under the stress of metal(loid) toxicity and provide insights into environmental biogeochemical cycling and microbial evolution.


The rise of oxygen on the early Earth about 2.4 billion years ago reorganized the redox cycle of harmful metal(loids), including that of arsenic, which doubtlessly imposed substantial barriers to the physiology and diversification of life. Evaluating the adaptive biological responses to these environmental challenges is inherently difficult because of the paucity of fossil records. Here we applied molecular clock analyses to 13 gene families participating in principal pathways of arsenic resistance and cycling, to explore the nature of early arsenic biogeocycles and decipher feedbacks associated with planetary oxygenation. Our results reveal the advent of nascent arsenic resistance systems under the anoxic environment predating the Great Oxidation Event (GOE), with the primary function of detoxifying reduced arsenic compounds that were abundant in Archean environments. To cope with the increased toxicity of oxidized arsenic species that occurred as oxygen built up in Earth’s atmosphere, we found that parts of preexisting detoxification systems for trivalent arsenicals were merged with newly emerged pathways that originated via convergent evolution. Further expansion of arsenic resistance systems was made feasible by incorporation of oxygen-dependent enzymatic pathways into the detoxification network. These genetic innovations, together with adaptive responses to other redox-sensitive metals, provided organisms with novel mechanisms for adaptation to changes in global biogeocycles that emerged as a consequence of the GOE.


One of life’s earliest challenges was coping with the toxicity of harmful metal(loids) (1). Understanding the nature and timing of the onset of protective mechanisms is essential for the study of early evolution of Earth and life, yet limited information is available. Arsenic is the most ubiquitous toxic metalloid in nature, with two biologically relevant oxidation states: trivalent arsenite and pentavalent arsenate. Arsenite is generally more toxic than arsenate, and perturbs the physiology of prokaryotes at micromolar levels (2, 3). Relatively high amounts (>20 μM) of dissolved arsenic are nowadays frequently found in oceanic hydrothermal vents or hot springs, environments that may have conditions analogous to similar niches of primordial Earth. For this reason, resistance pathways for transport and biotransformation of arsenic are believed to have emerged early in the evolution of life on Earth (4–6). Environmentally, the rise of atmospheric oxygen during the Great Oxidation Event (GOE) ∼2.4 billion years ago (Bya) is thought to have fundamentally changed arsenic chemistry on the Earth’s surface and in the oceans (2, 7). Prior to the GOE, reduced arsenic species (i.e., arsenite) would have predominated over oxidized arsenic species (i.e., arsenate) because the atmosphere and oceans were anoxic and reducing (4, 6, 8). Continental weathering of arsenic at this time would have been negligible under an atmosphere with very low oxygen levels (<<0.001% compared with present atmospheric level) (9). The rise of atmospheric oxygen (∼1% of present atmospheric levels) during the GOE between 2.4 and 2.3 Bya most likely led to intense oxidative weathering of arsenic-bearing minerals that liberated continental arsenic, predominantly as arsenate, for delivery to oceans from rivers (3, 10). These processes would have resulted in the widespread appearance of oxidized arsenic species in the environment.
We hypothesized that these dramatic shifts in the redox state of arsenicals and their bioavailability imposed a strong selective pressure on ancient microorganisms toward acquisition of novel enzymatic systems conferring arsenic resistance. Current microbial fossil records lack the power to resolve the timing and causes of the origin of these tolerance and detoxification mechanisms.

Molecular and genetic studies have identified many arsenic resistance (ars) genes in extant organisms (SI Appendix, Table S1). These include efflux permeases, redox enzymes, methyltransferases, and transcriptional repressors. Arsenite efflux is catalyzed by two evolutionarily unrelated groups of arsenite efflux permeases: ArsB and Acr3 (11). Arsenate detoxification is catalyzed by reductases with homology to the glutaredoxin family (ArsC1) or to low-molecular-weight phosphatases (ArsC2), or by members of the CDC25 family of dual-specific phosphatases (Acr2) (12). These enzymes reduce intracellular arsenate to arsenite, the substrate of the two arsenite efflux permeases. Additionally, arsenite can be methylated by ArsM, an arsenite S-adenosylmethionine (SAM) methyltransferase, to the more toxic species methylarsenite and dimethylarsenite. In air, these are oxidized nonenzymatically to the largely nontoxic pentavalent species. However, methylarsenite can also be detoxified by active extrusion from cells catalyzed by the methylarsenite-specific efflux permease ArsP (13), oxidation to methylarsenate by the methylarsenite-specific oxidase ArsH (14, 15), or demethylation to less toxic arsenite by the ArsI C-As lyase that cleaves the carbon–arsenic bond in methylarsenite (16). Arsenic resistance genes are usually organized in ars operons, which are nearly always under control of an ArsR transcriptional repressor. Four different ArsRs, each with an arsenical binding site located at a different place in the protein structure, have been described, with three (ArsR1, ArsR2, and ArsR3) regulated selectively by arsenite (17) and one (ArsR4) by methylarsenite (18).

Here, we estimate the geological birth date of 13 arsenic resistance genes in relation to the GOE, using molecular clock analyses. The detailed evolutionary histories for each gene family were reconstructed by comparing their gene phylogenies with the phylogeny of organisms (the tree of life) under an explicit model of macroevolution events including gene birth, transfer, duplication, and loss. The occurrence of each arsenic detoxification gene was examined with respect to the taxonomy and physiology of the host microorganisms to provide independent evidence for our molecular dating analysis.


Phylogenetic Distribution of Arsenic Detoxification Genes.

Protein sequences of the 13 arsenic resistance genes were obtained from genomes of 645 bacteria, 88 archaea, and 53 eukaryotes, representative of phylogenetic diversity across the three domains of life (19). The presence/absence of arsenic resistance genes in each of the sampled taxa was collapsed at phylum level and plotted against a reference tree reconstructed from a concatenated alignment of 16 ribosomal proteins (Fig. 1). The distinct phyletic patterns divide the 13 genes into three sets (A-C). Genes in set A, including arsM, acr3, arsC2, arsP, and arsR1, are widely distributed among major lineages of bacteria, archaea, and/or eukaryotes, whereas set B comprises seven genes (arsI, arsB, arsR3, arsH, arsC1, arsR2, and arsR4) found mostly in aerobes that are more sparsely distributed compared with those in set A. Set C comprises a single gene, acr2, with homologs detected only in eukaryotes. The descent patterns suggest that the genes in set A may have emerged as the earliest arsenic detoxification systems, followed by those in sets B and C. However, promiscuous horizontal gene transfer (HGT) of arsenic resistance genes across species (20, 21), as exemplified by apparent incongruence between individual gene phylogenies and the organism backbone (SI Appendix, Figs. S1–S14 and Table S2), obscured our capability to place these genes along the geological timeline with phyletic patterns alone (22).
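The phylum-level collapse of presence/absence data described above can be illustrated with a minimal sketch. The function name `collapse_by_phylum`, the input layout, and the toy records are assumptions made for illustration; they are not the authors' code or data.

```python
from collections import defaultdict

def collapse_by_phylum(taxa):
    """Collapse per-taxon gene presence to per-phylum relative abundance.

    taxa: iterable of (phylum, set_of_genes_present) pairs (hypothetical layout).
    Returns {phylum: {gene: fraction of sampled taxa carrying the gene}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for phylum, genes in taxa:
        totals[phylum] += 1
        for gene in genes:
            counts[phylum][gene] += 1
    # Iterate over totals so phyla without any ars gene still appear (as {}).
    return {
        phylum: {gene: n / totals[phylum] for gene, n in counts[phylum].items()}
        for phylum in totals
    }

# Toy input for illustration only, not data from the study.
toy = [
    ("Firmicutes", {"arsM", "acr3"}),
    ("Firmicutes", {"acr3"}),
    ("Euryarchaeota", {"arsM"}),
]
abundance = collapse_by_phylum(toy)
```

Each inner dictionary corresponds to one row of a phylum-by-gene abundance matrix of the kind plotted next to a reference tree.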

Fig. 1.

Phylogenetic distribution of 13 arsenic detoxification genes. (Left) Reference phylogenetic trees of major lineages of Bacteria, Archaea, and Eukaryotes. (Right) Relative abundance of 13 arsenic detoxification genes present within each major lineage. The 13 genes were divided into three sets (A-C) according to their phyletic distribution patterns. The reference phylogeny was reconstructed from a concatenated alignment of 16 ribosomal proteins, as previously reported (19). Divergence times and corresponding confidence intervals (95%) were estimated using PhyloBayes (analysis 7; Table 2). Timescale: Hd, Hadean; Ph, Phanerozoic; Ga, billions of years.

Gene Birth Date of Arsenic Detoxification Genes.

To estimate the timing of the origin of the 13 arsenic resistance genes, we conducted a series of Bayesian molecular clock analyses, using a tree reconciliation algorithm, which explicitly models HGT and generates gene birth dates by mapping gene phylogeny onto a chronogram of species. We tested gene ages against chronograms modeled with an autocorrelated rate clock (analyses 1 to 6) and an independent rate clock (analyses 7 to 12). For each clock model, a set of six independent analyses was performed to evaluate the robustness of the results to prior assumptions of root age (analyses 1 and 7), subsampling of fossil calibrations (analyses 3, 4, 9, and 10), and alternative topologies (analyses 5, 6, 11, and 12). Median gene ages under the 12 analytical scenarios are shown in Tables 1 and 2, and the uncertainties associated with the results from all these analyses were integrated to provide a composite credibility interval for each gene family (Fig. 2). Although the timing of arsM and acr3 varied under different prior assumptions, all analyses consistently recovered 95% credibility intervals entirely within the Archean eon, suggesting that they originated before the GOE. For arsC2, arsR1, and arsP, we estimate that the median gene ages are before or at the beginning of the Paleoproterozoic period, with composite 95% confidence intervals overlapping with the GOE. In contrast, arsB, arsI, arsH, arsR2, arsR3, arsC1, acr2, and arsR4 are estimated to have evolved near the end of or significantly after the GOE. To assess the sensitivity of our results to alternative species topologies, we also reconciled gene families against 100 reference trees reconstructed from ribosomal proteins or small subunit ribosomal RNA (SSU rRNA). The results show only slight differences in estimates of gene ages (SI Appendix, Fig. S20), which further supports our initial interpretation of the data.
Overall, our analyses are consistent with an expansion of microbial arsenic resistance systems in response to the rise of atmospheric oxygen.
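One way to picture how a composite interval can be set against the GOE is sketched below. Pooling age samples across analyses and reading off simple percentiles is an assumption about how the uncertainties might be integrated, and the function names and three-way labels are hypothetical; only the 2.4 to 2.3 Bya window comes from the text.

```python
import statistics

# GOE window in Bya, as given in the text (2.4 to 2.3 Bya).
GOE_START, GOE_END = 2.4, 2.3

def composite_interval(samples_by_analysis, level=0.95):
    """Pool gene-age samples (Bya) from all analyses.

    Returns (younger bound, median, older bound) using simple order
    statistics on the pooled samples (an assumed pooling scheme).
    """
    pooled = sorted(age for ages in samples_by_analysis for age in ages)
    last = len(pooled) - 1
    lo_idx = int((1 - level) / 2 * last)
    hi_idx = int((1 + level) / 2 * last)
    return pooled[lo_idx], statistics.median(pooled), pooled[hi_idx]

def classify(younger, older):
    """Label an interval relative to the GOE; ages are in Bya (larger = older)."""
    if younger > GOE_START:
        return "before GOE"
    if older < GOE_END:
        return "after GOE"
    return "overlaps GOE"
```

Under this sketch, an interval lying entirely in the Archean is labeled "before GOE", while one spanning 2.8 to 2.2 Bya is labeled "overlaps GOE".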

Table 1.

Birth Age of 13 arsenic resistance genes estimated under analytical scenarios 1 to 6

Table 2.

Birth Age of 13 arsenic resistance genes estimated under analytical scenarios 7 to 12

Fig. 2.

Gene birth date for each of 13 arsenic detoxification genes. Gene ages were derived from reconciliation results (circles), using fully dated species trees (n = 1200) sampled from 12 PhyloBayes analyses. The median age estimates under each analytical scenario (Tables 1 and 2) are shown as diamonds. The uncertainties associated with the results from all PhyloBayes analyses were integrated as 95% composite confidence intervals (whiskers of the boxplot). Age estimates of genes that evolved before, around, and after the GOE are shown in blue, yellow, and green, respectively. Atmospheric oxygen content throughout Earth’s history is overlaid on the gene ages (red line) (9). Right y axis, pO2, relative to the present atmospheric level (PAL); left y axis, gene names. Genes found in both anaerobes and aerobes, or only in aerobes, are denoted in blue and green, respectively (Fig. 3). Oxygen-dependent genes (arsI and arsH) are indicated by a star. AsIII, AsV, and MAsIII are used to delineate genes acting on inorganic arsenite, arsenate, or methylarsenite, respectively. Ga, billions of years.

Physiology Bears Out the Age of Arsenic Detoxification Genes.

We attempted to further validate these conclusions by analyzing the physiology of the host microorganisms. Organisms were classified either as aerobes (including facultative anaerobes) or anaerobes, based on their capability to utilize oxygen as a terminal electron acceptor. We found that all the genes predicted to originate in an oxic environment after the GOE are overrepresented in aerobes, but are nearly absent in strict anaerobes (Fig. 3). Furthermore, the genes predicted to have a more ancient origin were found among both anaerobes and aerobes, including the ancient lineages of methanogens and acetogens (Fig. 3). This implies an early origin of these genes in an anoxic or microaerobic environment before or at the beginning of the GOE. They dispersed into the oxic environment after the rise of oxygen, as predicted by our evolutionary model. To further probe the robustness of our predictions, we tested the correlation of arsenic resistance systems with the physiology of the host microorganisms on a more densely sampled set of taxa encompassing more than 2,000 species. We found similar patterns of gene distribution across anaerobes/aerobes, suggesting that our results are broadly conserved independent of taxonomic sampling (SI Appendix, Fig. S21).
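The text does not state which statistic, if any, was used to quantify overrepresentation in aerobes. As a hedged illustration only, a one-sided hypergeometric (Fisher-type) tail probability is one common way to express such an enrichment; the function name and the counts in the usage line are hypothetical.

```python
from math import comb

def enrichment_p_value(aerobes_with, aerobes_total, anaerobes_with, anaerobes_total):
    """One-sided hypergeometric tail: probability of seeing at least
    `aerobes_with` gene carriers among aerobes if carriers were spread
    at random over all sampled taxa. An assumed test, not the authors'."""
    carriers = aerobes_with + anaerobes_with
    population = aerobes_total + anaerobes_total
    upper = min(carriers, aerobes_total)
    tail = sum(
        comb(carriers, k) * comb(population - carriers, aerobes_total - k)
        for k in range(aerobes_with, upper + 1)
    )
    return tail / comb(population, aerobes_total)

# Hypothetical counts: gene present in 3 of 3 aerobes and 0 of 3 anaerobes.
p = enrichment_p_value(3, 3, 0, 3)
```

A small value indicates that the gene is found in aerobes more often than random placement would suggest; with realistic sample sizes the same function applies unchanged.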

Fig. 3.

Distribution of 13 arsenic detoxification genes among strict anaerobes and aerobes. Species were classified either as aerobes (including facultative anaerobes) or anaerobes based on their capability to use oxygen as a terminal electron acceptor. Each black tick indicates the presence of the corresponding gene in a taxon. Genes that evolved before or at the beginning of the GOE are denoted in blue, and those that evolved after it in green. Oxygen-dependent genes (arsI and arsH) are indicated with the star symbol.


Arsenic Detoxification Systems before the GOE.

Our molecular clock analyses indicate that enzymatic pathways acting on trivalent arsenite, including arsenite efflux and arsenite methylation, constituted the core of microbial arsenic resistance systems before the rise of atmospheric oxygen (Fig. 4). Our results are consistent with geochemical models that predict the predominance of reduced arsenic compounds in the anoxic Archean biosphere (2, 3, 6, 10). Formation of traces of arsenate in the Archean, creating a selective pressure before the GOE (6), could have occurred via microbially mediated arsenite oxidation processes such as anoxygenic photosynthesis (5) or nitrate-dependent respiration (23). Alternatively, arsenate could have been formed during transient atmospheric oxygenation events documented back to ∼3.0 Bya (9, 24–28). However, our molecular clock analyses placed the earliest origin of the arsenate resistance system coincident with the onset of the GOE (Fig. 2). This is consistent with recent analysis of marine shales, suggesting that arsenate began to accumulate in the ocean only after the Archean eon (10), and compatible with the causal role of the GOE in altering the arsenic chemistry on Earth’s surface and driving the genetic expansion of arsenic resistance systems.

Fig. 4.

Arsenic resistance systems before (A) and after (B) the GOE. As(III), arsenite; As(V), arsenate; MAs(III), trivalent methylarsenite; MAs(V), pentavalent methylarsenate; SAM, S-adenosylmethionine; GSH, reduced glutathione; GSSG, oxidized glutathione; Grxred, reduced glutaredoxin; Grxox, oxidized glutaredoxin; Trxred, reduced thioredoxin; Trxox, oxidized thioredoxin.

The early origin of the arsenite efflux permease encoded by acr3, together with its wide distribution among living organisms (Fig. 1), underpins the fundamental role of efflux mechanisms in heavy metal resistance (29, 30). In contrast, the physiological function of arsenite methylation in anoxic Archean environments remains unclear. The higher toxicity of the trivalent methylated product methylarsenite calls into question the commonly held assumption that methylation is a detoxification process. An attractive hypothesis is that the transient oxygenation of the Archean atmosphere (25, 26) and the existence of oxygen oases in local, shallow marine settings (24, 31) could have provided niches where microbial arsenite methylation could have operated as a detoxification pathway. Alternatively, methylation has been proposed as an antibiotic-producing process in Archean environments, with methylarsenite being a primitive antibiotic (32, 33). Further studies will be needed to clarify the function of ArsM in anoxic environments and its contribution to arsenic cycling and overall toxicity in ancient ecosystems.

Expansion of the Arsenic Resistance Network as a Consequence of the GOE.

The rise of oxygen in Earth’s atmosphere since the GOE both triggered global-scale oxidation of reduced arsenic species and led to widespread bioavailability of arsenate (3, 10). Our analyses indicate that the ancient arsenic resistance networks, optimized for detoxification of reduced arsenic in the anoxic Archean Earth, expanded to accommodate these environmental shifts (Fig. 4). In the face of these challenges, components of arsenate reduction systems (including a new efflux permease, ArsB, and arsenate reductases) evolved independently through convergent evolution after the GOE. The recurrent innovation of counterparts of ancient arsenate resistance devices is in agreement with enhanced arsenate stress because of gradually increasing oxygen levels after the Archean (3, 8). With the appearance of molecular oxygen, the ancient arsenic detoxification pathways were remodeled for detoxification of inorganic arsenic. For example, the arsenite methylation process catalyzed by ArsM could be recruited as a detoxification pathway under oxic settings. Its products, the toxic trivalent methylarsenite and dimethylarsenite, would be oxidized nonenzymatically by dioxygen into relatively innocuous methylarsenate and dimethylarsenate. However, the influence of dioxygen did not stop here. Our results further suggest that two new obligate oxygen-dependent methylarsenite resistance enzymes, ArsH and ArsI, arose during or after the GOE. Concurrent with the evolution of these new oxygen-dependent methylarsenite detoxification enzymes, recurrent expansion of ArsR families after the GOE resulted in formation of the diverse ars operons present in extant prokaryotes and enabled regulatory fine-tuning of ars genes throughout different ages of Earth’s evolution (17).

Conclusion and Implications.

The timing we propose for the birth of arsenic resistance gene families supports a shifted marine arsenic cycle across the Archean–Proterozoic boundary. We observed an early origin of metabolic functions including methylation and excretion of arsenic during the Archean eon, which is in accord with the fossil evidence indicating the occurrence of microbial arsenic metabolism and cycling 2.72 Bya (34). Our prediction of continuous innovation of gene families toward detoxification of oxidized arsenic species is in agreement with recent analysis of marine shales that inferred a sharp increase of dissolved arsenate from ∼2.48 Bya onward (10). The persistence of ars genes among distinct microbial lineages over billions of years implies a temporal continuity of arsenic stress (2).

The genetic expansion of arsenic resistance systems across the GOE would have entailed fitness advantages leading to success and diversification of life in the new redox landscape, which in turn remodeled the transition of metal chemistry on the Earth’s surface. Our molecular analysis, together with the innovations of protective mechanisms against other elements (35, 36) (e.g., Cu and Zn), provides a crucial constraint on the response of the global biosphere to the major transitions in cycles of toxic, redox-sensitive metals.


Genomic Sampling and Reconstruction of Species Tree.

A previously reported tree of life was used as a template for reconstruction of the species tree (19). A total of 786 representative species with a completely sequenced genome were sampled from the original dataset (see Dataset S1 for accession numbers). The ribosomal protein tree was inferred with RAxML v8.4.1 (37), using the PROTGAMMALG evolution model. To reconstruct the SSU rRNA tree, an alignment was generated from SSU rRNA genes of the sampled organisms, using the SINA alignment algorithm (38). One representative SSU rRNA gene was selected for species with multiple copies. Phylogenetic trees were calculated under the GTRCAT model, using RAxML. A total of 204 and 300 bootstrap replicates were conducted for ribosomal protein and SSU rRNA gene phylogenies, respectively, according to extended majority-rule consensus (MRE)-based bootstopping criteria. The oxygen requirement for each selected species was retrieved from the Genomes OnLine Database (GOLD) (39) and literature reviews.

Molecular Dating of the Tree of Life.

Divergence times of the species tree were estimated with PhyloBayes, using a fixed RAxML phylogeny of ribosomal proteins, a CAT20 substitution model, a birth–death process, and four gamma categories (40). The CAT20 model was chosen because preliminary tests showed that analyses using a full CAT model failed to converge within a reasonable time (>2 mo). Both the autocorrelated lognormal (-ln) and uncorrelated gamma multiplier (-ugam) relaxed clocks were applied to model the rate variation across lineages (41). Bayesian cross-validation implemented in PhyloBayes was used to test whether one of the two clock models fits the data better.

The clocks were calibrated with eight sets of temporal constraints (SI Appendix, Fig. S15 and Table S4) that are directly linked to fossil and geochemical evidence, as described previously (22, 42). The age of the last universal common ancestor (root) was constrained between 4.38 Bya (approximating earliest habitability evidence) (43, 44) and 3.35 Bya (fossil records from the Strelley Pool Formation) (42, 45, 46), using a uniform distribution. A gamma-distributed root prior (3.95 ± 0.23 Bya), assuming the maximum probability of the root age falling midway between the calibrations, was applied to test the effects of the root prior distribution (analyses 2 and 8). Geochemical evidence from the Manzimnyama Banded Iron Formation, Fig Tree Group, South Africa, indicates the presence of free oxygen being produced by Cyanobacteria before 3.2 Bya (42, 47), and this was used as a minimum age for the total group of Cyanobacteria. However, as the Banded Iron Formation at 3.2 Bya may have also been formed via anaerobic processes [i.e., UV oxidation (48) and anoxygenic photosynthesis (49, 50)], PhyloBayes analyses without the constraint on Cyanobacteria (analyses 3 and 9) were performed to test how inclusion of this constraint impacts the results. The time constraint on Rhodophyta was derived from the oldest fossil records of Bangiale red algae, which occurred in the 1.20 Bya Hunting Formation (51). To evaluate whether this assumption is so stringent as to overdetermine the estimated divergence times, analyses were performed with reduced sets of calibrations by excluding constraints on Rhodophyta (analyses 4 and 10). Comparisons of estimated confidence intervals suggested that varying root priors or subsampling of calibrations resulted in minimal changes of estimated divergence times (SI Appendix, Fig. S19).
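To make the gamma-distributed root prior concrete, the short sketch below converts a mean and spread into gamma shape and scale parameters by moment matching. Treating the reported ± 0.23 Bya as a standard deviation is an assumption, and the function name is hypothetical; how PhyloBayes parameterizes the prior internally is not described in the text.

```python
def gamma_from_mean_sd(mean, sd):
    """Moment matching for a gamma distribution:
    mean = shape * scale and variance = shape * scale**2."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale

# Root prior from the text: 3.95 Bya, with 0.23 Bya assumed to be the SD.
shape, scale = gamma_from_mean_sd(3.95, 0.23)
```

With these inputs the shape parameter is large (close to 295), so the prior is nearly symmetric around 3.95 Bya and lies well inside the 4.38 to 3.35 Bya bounds used for the uniform prior.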

For all molecular clock analyses, two independent PhyloBayes Markov chain Monte Carlo (MCMC) chains were run in parallel for up to 1 mo (∼60,000 model cycles). The convergence of MCMC chains was checked by comparing the posterior distributions of independent runs, using the tracecomp program implemented in PhyloBayes (effective sizes >100, and maximum discrepancy between chains <0.3). The chain was sampled every 20 cycles after the initial 20% of generations were discarded as burn-in. All PhyloBayes analyses were also run under the prior by removing the sequence data, to verify that the estimated divergence times are not solely driven by fossil records (SI Appendix, Fig. S18).
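The sampling scheme and convergence thresholds above amount to simple bookkeeping, sketched here. The function names are hypothetical, the chain length is taken as exactly 60,000 cycles although the text gives it only approximately, and the convergence check merely restates the two thresholds rather than reimplementing tracecomp.

```python
def retained_cycles(total_cycles, burn_in_percent=20, every=20):
    """Indices of MCMC cycles kept after burn-in removal and thinning."""
    start = total_cycles * burn_in_percent // 100
    return list(range(start, total_cycles, every))

def chains_converged(effective_sizes, max_discrepancy, min_ess=100, max_diff=0.3):
    """Apply the thresholds quoted in the text to summary statistics
    (such as those reported by a tool like tracecomp)."""
    return min(effective_sizes) > min_ess and max_discrepancy < max_diff

# Assuming a chain of exactly 60,000 cycles.
kept = retained_cycles(60_000)
```

Under these assumptions, 48,000 post-burn-in cycles thinned by 20 leave 2,400 retained states per chain.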

In addition, the ribosomal protein phylogeny and SSU rRNA gene phylogeny were converted to ultrametric trees, using TreePL under a penalized likelihood model (52). The rate smoothing parameters were set to powers of 10 between 1 and 10,000 with the cross-validation procedure and the χ2 test enabled in TreePL. The full set of temporal constraints (SI Appendix, Fig. S16 and Table S4) was used.

To evaluate the effect of phylogenetic uncertainty on the results, alternative tree topologies reflecting alternative arrangements/bipartitions for taxa of uncertain relationships were generated. Conflicting bipartitions (n = 32) of the RAxML ribosomal protein tree that are substantially represented (>40%) in bootstrap replicates were retrieved using RAxML (37) (option -f t, internode certainty analysis). The alternative minority-bipartition topology was obtained by editing the RAxML tree to reflect all conflicting bipartitions via subtree prune and regraft (analyses 5 and 11). A three-domain tree placing Archaea as a sister group of Eukaryotes was built similarly (analyses 6 and 12). Both alternative topologies were dated with the full alignment of ribosomal proteins, using PhyloBayes. Furthermore, we built 100 alternative chronograms using TreePL (SI Appendix, Fig. S20), based on alternative topologies containing 50% of randomly selected minority bipartitions (Bipartition-Jackknife analysis). Branch lengths of these alternative topologies were re-estimated by RAxML (option -f e), using the full alignment of ribosomal proteins.
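The selection step of the Bipartition-Jackknife analysis can be sketched as repeated random subsampling. The function name, the fixed seed, and the use of integer labels in place of real bipartitions are assumptions for illustration; editing the tree to reflect each subset and dating it are not shown.

```python
import random

def bipartition_jackknife(bipartitions, replicates=100, fraction=0.5, seed=0):
    """Draw `replicates` random subsets, each holding `fraction` of the
    conflicting bipartitions, sampled without replacement."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    k = round(len(bipartitions) * fraction)
    return [sorted(rng.sample(bipartitions, k)) for _ in range(replicates)]

# 32 conflicting bipartitions, represented here by placeholder integer labels.
subsets = bipartition_jackknife(list(range(32)))
```

Each of the 100 subsets holds 16 of the 32 bipartitions and would define one alternative topology.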

Identification of Arsenic Resistance Genes.

A hidden Markov model (HMM)-based search was performed to identify arsenic resistance genes in selected genomes. To develop HMM profiles, reference protein sequences were downloaded from UniProt or the National Center for Biotechnology Information (NCBI) (SI Appendix, Table S3) and aligned using MAFFT v7.310 (53) with the linsi option. Sequence alignments were visualized with ClustalX (54), and the ambiguously aligned regions were removed using TrimAl v1.2 (55). HMM profiles were built on curated alignments using hmmbuild in the HMMER v3.1b2 package (56).

To collect homologs of arsenic resistance genes, each HMM profile was searched against 786 genomes, using hmmsearch with an E-value cutoff of 0.1. Hit scores were retrieved, and the corresponding sequences were examined for conserved domains, using the protein family (PFAM) database (57). With profile searches for Acr3, ArsB, ArsH, ArsI, ArsM, and ArsP (SI Appendix, Figs. S22–S27), the retrieved hits were partitioned into two distinct groups: one showed significantly higher scores and consisted of reference proteins, and another showed much lower scores and included distant homologs. The separation of scoring values permitted us to distinguish these arsenic resistance genes from their remote relatives, and we annotated the sequences showing better scoring values as the target proteins. To determine whether these sequences are truly arsenic resistance proteins, hits from hmmsearch were aligned with MAFFT (multiple sequence alignment based on fast Fourier transform), and phylogenetic trees were constructed using RAxML under the PROTGAMMAAUTO model with 100 nonparametric bootstraps. The results from these tree-building trials indicated that sequences with significantly higher scores formed a moderately to strongly supported monophyletic clade among the functionally characterized proteins (SI Appendix, Figs. S22–S27), which provided evidence that the arsenic resistance proteins were correctly annotated.
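The separation of hits into a high-scoring and a low-scoring group can be mimicked with a simple rule that cuts at the largest gap in the sorted scores. This rule, the function name, and the toy scores are assumptions; the text reports that the two groups were distinct but does not specify how the boundary was drawn.

```python
def split_by_largest_gap(scores):
    """Split hit scores into (threshold, high group, low group) by cutting
    at the largest gap between consecutive sorted scores."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two scores to find a gap")
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = (ranked[cut] + ranked[cut + 1]) / 2
    return threshold, ranked[: cut + 1], ranked[cut + 1 :]

# Toy bit scores for illustration only, not values from the study.
threshold, high, low = split_by_largest_gap([410.0, 72.1, 395.5, 60.4, 388.2, 55.0])
```

In practice such a cut would still need the phylogenetic check described above, because a score gap alone does not prove shared function.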

In contrast, HMM profiles showed lower ability to distinguish ArsCs from their distant relatives (SI Appendix, Figs. S28–S30), probably because of their short protein lengths and absence of highly conserved domains. Therefore, we identified prokaryotic arsenate reductase genes (ArsC1 and ArsC2) by taking genomic contexts into account. The hmmsearch scoring threshold for each arsenate reductase (ArsC1 and ArsC2) was optimized to include sequences from the phylogenetic clade containing both reference proteins and homologs located within an ars operon (SI Appendix, Figs. S28 and S29). Eukaryotic arsenate reductases (Acr2) were determined via a phylogenetic method. Branches within a well-supported clade containing known Acr2 were selected as putative Acr2 (SI Appendix, Fig. S30).

ArsR homologs were classified into four families on the basis of a reported phylogenetic tree (18). The reference alignment and phylogenetic tree of ArsRs were built as described previously (18). For each ArsR family, homologs extracted by HMM profiles were added to the reference alignment using MAFFT (--add and --keeplength) and assigned to a reference tree with the evolutionary placement algorithm in RAxML. Sequences that were placed within the corresponding clade of the reference tree were identified as ArsR (SI Appendix, Fig. S31).

Sequences retrieved here were further screened for the presence of key catalytic residues (SI Appendix, Table S1). Homologs that passed these criteria were regarded as functional orthologs involved in arsenic resistance and were used for subsequent analysis. The same identification pipeline was further applied to fetch protein sequences of arsenic resistance genes in 2,031 organisms included in the EggNOG database (v4.5.1).

Phylogenetic Analysis of Arsenic Resistance Genes.

The protein sequences of each arsenic detoxification gene family were aligned with five different methods [MUSCLE (58), ClustalW (54), T-Coffee (59), MAFFT (53) and ProbCons (60)]. A consensus alignment of genes was calculated on the basis of the consistency of output from individual alignment programs using M-Coffee, provided in the T-Coffee package (61). The poorly aligned regions were excised using TrimAl v1.2 (55) with the -automated1 option. The best-fit evolutionary model for each gene family (Acr3: LG+I+G; ArsB: LG+I+G; ArsC1: WAG+I+G; ArsC2: LG+I+G; Acr2: LG+I+G; ArsH: LG+I+G; ArsI: WAG+I+G; ArsM: LG+I+G; ArsP: LG+I+G; ArsR1: LG+I+G; ArsR2: Dayhoff+G+F; ArsR3: LG+I+G; ArsR4: LG+G) was determined by ProtTest3 (62), according to the Akaike information criterion and Bayesian information criterion. Inference of the maximum likelihood tree was performed under the best-fit evolutionary model, using RAxML. Nonparametric bootstrap analysis for each gene tree was conducted under the corresponding evolutionary model with 100 replicates. The pairwise phylogenetic distances were calculated by summing all of the branches linking two taxa in the maximum-likelihood phylogeny. The congruence between gene tree and species tree (ribosomal protein phylogeny) was assessed by scatterplots of pairwise phylogenetic distances calculated from the corresponding trees.
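The pairwise phylogenetic distance described above, the sum of branch lengths along the path joining two taxa, can be computed with a short sketch. The child-to-parent dictionary layout, the function names, and the toy tree are assumptions for illustration and do not reflect the software used in the study.

```python
def path_to_root(tree, node):
    """Map each ancestor of `node` (and node itself) to its distance from node.

    tree: {child: (parent, branch_length)}; the root has no entry.
    """
    distances = {node: 0.0}
    total = 0.0
    while node in tree:
        node, length = tree[node]
        total += length
        distances[node] = total
    return distances

def patristic_distance(tree, a, b):
    """Sum of branch lengths on the path between taxa a and b."""
    up_a = path_to_root(tree, a)
    node, total = b, 0.0
    while node not in up_a:  # climb from b until reaching a shared ancestor
        node, length = tree[node]
        total += length
    return total + up_a[node]

# Toy tree: ((A:1.0, B:1.5)X:0.5, C:2.0)root
toy_tree = {"A": ("X", 1.0), "B": ("X", 1.5), "X": ("root", 0.5), "C": ("root", 2.0)}
```

Computing this distance for every taxon pair in both the gene tree and the species tree gives the paired values that a congruence scatterplot would display.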

Gene Birth Date Inference.

Gene birth dates were inferred using a reconciliation algorithm implemented in ecceTERA (63, 64). An ensemble (n = 10) of nonparametric bootstrapped trees was used as the gene tree set to resolve the uncertainty in deep-branching phylogenies, using the amalgamation algorithm (option amalgamate = 1). A fully dated species tree (option dated = 2) reconstructed by either PhyloBayes or TreePL was provided to restrict HGT events to chronologically overlapping lineages. Gene birth was parsed as the earliest split event that led to the gene clade. Posterior estimates of gene age (i.e., median and 95% highest posterior density interval) were calculated over the course of 1,200 reconciliation analyses, using fully dated species trees (n = 100) sampled from each PhyloBayes MCMC analysis (Tables 1 and 2). To assess the sensitivity of our results to the reconciliation algorithm, the gene ages were also estimated using the Analyzer of Gene and Species Trees (AnGST) program (22). AnGST was run with default parameters (event cost: HGT = 3.0, DUP = 2.0, and LOS = 1.0; ultrametric = True) with 10 bootstrapped gene trees. Owing to computational limitations, AnGST was run only on the consensus species trees of the 12 Bayesian molecular clock analyses (SI Appendix, Fig. S20 and Tables 1 and 2).
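The summary statistics reported for each gene age can be sketched as follows. The text does not say how the 95% highest posterior density (HPD) interval was computed, so this sketch assumes one common convention: the shortest interval that contains 95% of the sampled ages. The function name and the toy ages are hypothetical.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def hpd_interval(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Shortest interval that contains at least `mass` of the samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must cover; the small offset guards
    # against floating-point noise in mass * n before rounding up.
    window = max(1, math.ceil(mass * n - 1e-9))
    best_low, best_high = ordered[0], ordered[window - 1]
    for start in range(1, n - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


# Toy gene ages in Ga, one per reconciliation run (invented values).
toy_ages = [2.9, 3.1, 3.0, 2.7, 3.3, 3.2, 2.8, 3.0, 3.1, 4.0]

age_median = median(toy_ages)
age_low, age_high = hpd_interval(toy_ages, mass=0.95)
print(age_median, age_low, age_high)
```

In the analysis described above, the input would be the gene birth dates pooled across the reconciliation runs, one age per sampled species tree.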

Data Availability.

Accession numbers of all genomes used in this study are listed in Dataset S1. Protein sequence alignments and maximum-likelihood trees of 13 arsenic resistance genes are available in Dataset S2. Species trees based on alignment of concatenated ribosomal proteins or SSU rRNA are included in Dataset S3.


We thank Lawrence A. David for insightful discussion of the data analysis and interpretation of results. We acknowledge the Bundesministerium für Bildung und Forschung (BMBF)-funded German Network for Bioinformatics Infrastructure de.NBI (031A537B, 031A533A, 031A538A, 031A533B, 031A535A, 031A537C, 031A534A, 031A532B) for providing computational resources. Funding for this project was provided by the National Natural Science Foundation of China (41430858), the Strategic Priority Research Program of Chinese Academy of Sciences (XDB15020302 and XDB15020402), and NIH grants GM55425 and ES023779 to B.P.R.


↵1 S.-C.C. and G.-X.S. contributed equally to this work.

↵2 To whom correspondence may be addressed. Email: brosen{at} or ygzhu{at}

Author contributions: B.P.R. and Y.-G.Z. designed research; S.-C.C., G.-X.S., X.-M.L., and H.-L.C. performed research; S.-C.C., Y.Y., K.T.K., S.-Y.Z., Y.D., and D.P. analyzed data; and S.-C.C., G.-X.S., Y.Y., F.M., B.P.R., and Y.-G.Z. wrote the paper.

The authors declare no competing interest.

This article is a PNAS Direct Submission.

Data deposition: The sequence data used in this study were provided as supplementary Datasets S1-S3.

This article contains supporting information online at

Copyright © 2020 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).


1. T. Clarkson, Health effects of metals: A role for evolution? Environ. Health Perspect. 103 (suppl. 1), 9–12 (1995).
2. Y. G. Zhu, M. Yoshinaga, F. J. Zhao, B. P. Rosen, Earth abides arsenic biotransformations. Annu. Rev. Earth Planet. Sci. 42, 443–467 (2014).
3. E. C. Fru et al., Arsenic stress after the Proterozoic glaciations. Sci. Rep. 5, 17789 (2015).
4. E. Lebrun et al., Arsenite oxidase, an ancient bioenergetic enzyme. Mol. Biol. Evol. 20, 686–693 (2003).
5. T. R. Kulp et al., Arsenic(III) fuels anoxygenic photosynthesis in hot spring biofilms from Mono Lake, California. Science 321, 967–970 (2008).
6. R. S. Oremland, C. W. Saltikov, F. Wolfe-Simon, J. F. Stolz, Arsenic in the evolution of earth and extraterrestrial ecosystems. Geomicrobiol. J. 26, 522–536 (2009).
7. R. S. Oremland, J. F. Stolz, The ecology of arsenic. Science 300, 939–944 (2003).
8. S. Duval, A. L. Ducluzeau, W. Nitschke, B. Schoepp-Cothenet, Enzyme phylogenies as markers for the oxidation state of the environment: The case of respiratory arsenate reductase and related enzymes. BMC Evol. Biol. 8, 206 (2008).
9. T. W. Lyons, C. T. Reinhard, N. J. Planavsky, The rise of oxygen in Earth’s early ocean and atmosphere. Nature 506, 307–315 (2014).
10. E. C. Fru et al., The rise of oxygen-driven arsenic cycling at ca. 2.48 Ga. Geology 47, 243–246 (2019).
11. B. P. Rosen, Biochemistry of arsenic detoxification. FEBS Lett. 529, 86–92 (2002).
12. R. Mukhopadhyay, B. P. Rosen, Arsenate reductases in prokaryotes and eukaryotes. Environ. Health Perspect. 110 (suppl. 5), 745–748 (2002).
13. J. Chen, M. Madegowda, H. Bhattacharjee, B. P. Rosen, ArsP: A methylarsenite efflux permease. Mol. Microbiol. 98, 625–635 (2015).
14. J. Qin et al., Arsenic detoxification and evolution of trimethylarsine gas by a microbial arsenite S-adenosylmethionine methyltransferase. Proc. Natl. Acad. Sci. U.S.A. 103, 2075–2080 (2006).
15. J. Chen, H. Bhattacharjee, B. P. Rosen, ArsH is an organoarsenical oxidase that confers resistance to trivalent forms of the herbicide monosodium methylarsenate and the poultry growth promoter roxarsone. Mol. Microbiol. 96, 1042–1052 (2015).
16. M. Yoshinaga, B. P. Rosen, A C⋅As lyase for degradation of environmental organoarsenical herbicides and animal husbandry growth promoters. Proc. Natl. Acad. Sci. U.S.A. 111, 7701–7706 (2014).
17. J. Qin et al., Convergent evolution of a new arsenic binding site in the ArsR/SmtB family of metalloregulators. J. Biol. Chem. 282, 34346–34355 (2007).
18. J. Chen, V. S. Nadar, B. P. Rosen, A novel MAs(III)-selective ArsR transcriptional repressor. Mol. Microbiol. 106, 469–478 (2017).
19. L. A. Hug et al., A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
20. S.-C. Chen et al., Recurrent horizontal transfer of arsenite methyltransferase genes facilitated adaptation of life to arsenic. Sci. Rep. 7, 7741 (2017).
21. E. Pennisi, Algae suggest eukaryotes get many gifts of bacteria DNA. Science 363, 439–440 (2019).
22. L. A. David, E. J. Alm, Rapid evolutionary innovation during an Archaean genetic expansion. Nature 469, 93–96 (2011).
23. S. E. Hoeft et al., Alkalilimnicola ehrlichii sp. nov., a novel, arsenite-oxidizing haloalkaliphilic gammaproteobacterium capable of chemoautotrophic or heterotrophic growth with nitrate or oxygen as the electron acceptor. Int. J. Syst. Evol. Microbiol. 57, 504–512 (2007).
24. M. Fakhraee, S. A. Crowe, S. Katsev, Sedimentary sulfur isotopes and Neoarchean ocean oxygenation. Sci. Adv. 4, e1701835 (2018).
25. A. D. Anbar et al., A whiff of oxygen before the great oxidation event? Science 317, 1903–1906 (2007).
26. S. A. Crowe et al., Atmospheric oxygenation three billion years ago. Nature 501, 535–538 (2013).
27. B. Eickmann et al., Isotopic evidence for oxygenated Mesoarchaean shallow oceans. Nat. Geosci. 11, 133–138 (2018).
28. N. J. Planavsky et al., Evidence for oxygenic photosynthesis half a billion years before the Great Oxidation Event. Nat. Geosci. 7, 283–286 (2014).
29. S. Silver, L. T. Phung, Bacterial heavy metal resistance: New surprises. Annu. Rev. Microbiol. 50, 753–789 (1996).
30. D. H. Nies, Microbial heavy-metal resistance. Appl. Microbiol. Biotechnol. 51, 730–750 (1999).
31. R. Riding, P. Fralick, L. Y. Liang, Identification of an Archean marine oxygen oasis. Precambrian Res. 251, 232–237 (2014).
32. J. Chen, M. Yoshinaga, B. P. Rosen, The antibiotic action of methylarsenite is an emergent property of microbial communities. Mol. Microbiol. 111, 487–494 (2019).
33. J. Li, S. S. Pawitwar, B. P. Rosen, The organoarsenical biocycle and the primordial antibiotic methylarsenite. Metallomics 8, 1047–1055 (2016).
34. M. C. Sforna et al., Evidence for arsenic metabolism and cycling by microorganisms 2.7 billion years ago. Nat. Geosci. 7, 811–815 (2014).
35. C. L. Dupont, S. Yang, B. Palenik, P. E. Bourne, Modern proteomes contain putative imprints of ancient shifts in trace metal geochemistry. Proc. Natl. Acad. Sci. U.S.A. 103, 17822–17827 (2006).
36. C. L. Dupont, A. Butcher, R. E. Valas, P. E. Bourne, G. Caetano-Anollés, History of biological metal utilization inferred through phylogenomic analysis of protein structures. Proc. Natl. Acad. Sci. U.S.A. 107, 10567–10572 (2010).
37. A. Stamatakis, RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
38. E. Pruesse, J. Peplies, F. O. Glöckner, SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28, 1823–1829 (2012).
39. S. Mukherjee et al., Genomes OnLine database (GOLD) v.6: Data updates and feature enhancements. Nucleic Acids Res. 45, D446–D456 (2017).
40. N. Lartillot, T. Lepage, S. Blanquart, PhyloBayes 3: A Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286–2288 (2009).
41. T. Lepage, D. Bryant, H. Philippe, N. Lartillot, A general comparison of relaxed molecular clock models. Mol. Biol. Evol. 24, 2669–2680 (2007).
42. H. C. Betts et al., Integrated genomic and fossil evidence illuminates life’s early evolution and eukaryote origin. Nat. Ecol. Evol. 2, 1556–1562 (2018).
43. J. W. Valley et al., Hadean age for a post-magma-ocean zircon confirmed by atom-probe tomography. Nat. Geosci. 7, 219–223 (2014).
44. S. A. Wilde, J. W. Valley, W. H. Peck, C. M. Graham, Evidence from detrital zircons for the existence of continental crust and oceans on the Earth 4.4 Gyr ago. Nature 409, 175–178 (2001).
45. A. Hickman, Regional Review of the 3426–3350 Ma Strelley Pool Formation (Pilbara Craton, Western Australia, 2008).
46. D. Wacey, Stromatolites in the approximately 3400 Ma Strelley Pool Formation, Western Australia: Examining biogenicity from the macro- to the nano-scale. Astrobiology 10, 381–395 (2010).
47. A. M. Satkoski, N. J. Beukes, W. Q. Li, B. L. Beard, C. M. Johnson, A redox-stratified ocean 3.2 billion years ago. Earth Planet. Sci. Lett. 430, 43–53 (2015).
48. A. G. Cairns-Smith, Precambrian solution photochemistry, inverse segregation, and banded iron formations. Nature 276, 807–808 (1978).
49. K. O. Konhauser et al., Could bacteria have formed the Precambrian banded iron formations? Geology 30, 1079–1082 (2002).
50. S. A. Crowe et al., Photoferrotrophs thrive in an Archean ocean analogue. Proc. Natl. Acad. Sci. U.S.A. 105, 15938–15943 (2008).
51. N. J. Butterfield, Bangiomorpha pubescens n. gen., n. sp.: Implications for the evolution of sex, multicellularity, and the Mesoproterozoic/Neoproterozoic radiation of eukaryotes. Paleobiology 26, 386–404 (2000).
52. S. A. Smith, B. C. O’Meara, treePL: Divergence time estimation using penalized likelihood for large phylogenies. Bioinformatics 28, 2689–2690 (2012).
53. K. D. Yamada, K. Tomii, K. Katoh, Application of the MAFFT sequence alignment program to large data-reexamination of the usefulness of chained guide trees. Bioinformatics 32, 3246–3251 (2016).
54. J. D. Thompson, T. J. Gibson, D. G. Higgins, Multiple sequence alignment using ClustalW and ClustalX. Curr. Protoc. Bioinf. 2.3.1–2.3.22 (2002).
55. S. Capella-Gutiérrez, J. M. Silla-Martínez, T. Gabaldón, trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
56. L. S. Johnson, S. R. Eddy, E. Portugaly, Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinf. 11, 431 (2010).
57. A. Bateman et al., The Pfam protein families database. Nucleic Acids Res. 30, 276–280 (2002).
58. R. C. Edgar, MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
59. C. Notredame, D. G. Higgins, J. Heringa, T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000).
60. C. B. Do, M. S. P. Mahabhashyam, M. Brudno, S. Batzoglou, ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 15, 330–340 (2005).
61. P. Di Tommaso et al., T-Coffee: A web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension. Nucleic Acids Res. 39, W13–W17 (2011).
62. D. Darriba, G. L. Taboada, R. Doallo, D. Posada, ProtTest 3: Fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164–1165 (2011).
63. C. Scornavacca, E. Jacox, G. J. Szöllősi, Joint amalgamation of most parsimonious reconciled gene trees. Bioinformatics 31, 841–848 (2015).
64. E. Jacox, C. Chauve, G. J. Szöllősi, Y. Ponty, C. Scornavacca, ecceTERA: Comprehensive gene tree-species tree reconciliation using parsimony. Bioinformatics 32, 2056–2058 (2016).