Trends between gene content and genome size in prokaryotic s


Contributed by James M. Tiedje, December 24, 2003



Although the evolutionary process and ecological benefits of symbiotic species with small genomes are well understood, these issues remain poorly elucidated for free-living species with large genomes. We compared 115 completed prokaryotic genomes by using the Clusters of Orthologous Groups database to determine whether the proportion of the genome attributable to particular cellular processes changes with genome size, because such changes may reflect both cellular and ecological strategies associated with genome expansion. We found that large genomes are disproportionately enriched in regulation and secondary metabolism genes and depleted in protein translation, DNA replication, cell division, and nucleotide metabolism genes compared to medium- and small-sized genomes. Furthermore, large genomes do not accumulate noncoding DNA or hypothetical ORFs, because the portion of the genome devoted to these functions remained constant with genome size. Traits other than genome size or strain-specific processes are reflected by the dispersion around the mean for cell functions that showed no correlation with genome size. For example, Archaea had significantly more genes in energy production, coenzyme metabolism, and the poorly characterized category, and fewer in cell membrane biogenesis and carbohydrate metabolism than Bacteria. The trends we noted with genome size by using Clusters of Orthologous Groups were confirmed by our independent analysis with The Institute for Genomic Research's Comprehensive Microbial Resource and Kyoto Encyclopedia of Genes and Genomes' Orthology annotation databases. These trends suggest that larger genome-sized species may dominate in environments where resources are scarce but diverse and where there is little penalty for slow growth, such as soil.

The genome sequences of the smallest genome-sized prokaryotic species, the obligate endocellular parasites, have provided insight into the interrelationship between the ecology and genome evolution of these species (1-3). For instance, when compared with their free-living relatives, these reduced genomes have preferentially lost genes underlying the biosynthesis of compounds that can be easily taken up from the host, such as amino acids, nucleotides, and vitamins. Furthermore, regulatory elements, including σ factors, have commonly been eliminated from such symbiotic bacteria, presumably due to the rather stable environment inside host cells, which renders extensive gene regulation useless (4-6). It is not yet clear whether there may also be trends in gene allocation for the larger genome-sized free-living bacteria. If such trends do exist, they could reveal strategies of genome expansion, provide insight into the upper limit of genome size, reveal whether there is more centrally coordinated regulation, and, most important, suggest what ecological benefits accrue for such species.

There is currently an increasing amount of evidence that favors the existence of universal trends between functional gene content and genome size. For instance, Jordan et al.'s (7) analysis of 21 genomes showed that lineage-specific gene expansion is positively correlated with genome size and may account for up to 33% of the coding capacity of the genome. Furthermore, comparative genomic studies of Pseudomonas aeruginosa PAO1 and Streptomyces coelicolor A3(2), two larger genome species, noted a disproportionate increase relative to smaller genome-sized species in regulatory and transport genes and in genes involved in secondary metabolism, respectively (8, 9). However, only a limited number of species were analyzed in both of these studies, and the analysis was restricted to specific functional processes. Furthermore, in the former study, no other species in the panel of strains evaluated had a genome size comparable to strain PAO1, a moderately large (6.3-Mb) genome-sized strain; thus, the significance of these findings for other large prokaryotic genomes is unknown.

We sought to more comprehensively evaluate how the relative usage of the genome changes with genome size, using all sequenced genomes and evaluating all functional classes of genes.

Materials and Methods

We undertook the functional characterization of 115 completed genomes deposited in the GenBank database as of May 2003 (the list of genomes is presented as Table 3, which is published as supporting information on the PNAS web site) using the Clusters of Orthologous Groups (COG) database (10, 11). At the time of this study, the COG database was comprised of 144,320 protein sequences from 66 completed genomes forming 4,873 groups of orthologous proteins (COG). Individual COG are clustered in 20 individual functional categories, which are further grouped in four major classes (see Table 1).

Table 1. COG functional categories and category correlation with total number of ORFs

All possible ORFs from the 115 genomes were assigned to a functional category according to the category where their best COG homolog is classified. Homologs were identified by using the blast local alignment algorithm (12) and a cut-off of at least 30% identity at the amino acid level over 70% of the length of the query protein in pair-wise sequence comparisons. This cut-off is above the twilight zone of similarity searches where inference of homology is error-prone due to low similarity between aligned sequences; thus, query proteins were presumably homologous to their COG match (13, 14). Homologous proteins can be either orthologs (homology through speciation) or paralogs (homology through lineage-specific gene duplication), and both paralogs and orthologs are assumed to retain the same biochemical function, whereas paralogs have usually diverged in specificity (15, 16). Therefore, ORFs are expected to share at least the same general function with their COG matches. Perl scripts were used to edit ORF assignments where necessary, to format databases for blast searches, and to parse blast outputs automatically.
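The assignment rule described above can be sketched as follows. This is a minimal illustration in Python, not the Perl code used in the study. The `Hit` record, its field names, the COG identifiers, and the category letters in the usage example are hypothetical, and ranking the "best" homolog by bit score is an assumption, because the text does not state how the best match was chosen.

```python
from dataclasses import dataclass

MIN_IDENTITY = 30.0  # percent amino acid identity (from the text)
MIN_COVERAGE = 0.70  # fraction of the query length aligned (from the text)


@dataclass
class Hit:
    """One pairwise alignment between a query ORF and a COG protein (hypothetical record)."""
    query_id: str
    cog_id: str
    percent_identity: float  # 0-100, over the aligned region
    aligned_query_len: int   # query residues covered by the alignment
    query_len: int           # full length of the query protein
    bit_score: float


def passes_cutoff(hit):
    """True if the hit meets both the identity and the coverage threshold."""
    coverage = hit.aligned_query_len / hit.query_len
    return hit.percent_identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE


def assign_categories(hits, cog_to_category):
    """Map each query ORF to the functional category of its best qualifying COG hit."""
    best = {}
    for hit in hits:
        if not passes_cutoff(hit):
            continue
        current = best.get(hit.query_id)
        # Assumption: "best" means the highest bit score among qualifying hits.
        if current is None or hit.bit_score > current.bit_score:
            best[hit.query_id] = hit
    return {
        query: cog_to_category[hit.cog_id]
        for query, hit in best.items()
        if hit.cog_id in cog_to_category
    }
```

In this sketch, an ORF whose only strong hit covers less than 70% of its length stays unassigned, which is one way the fraction of unassigned ORFs discussed later could arise.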

We further tested our findings from the COG database by using the publicly available data from the ortholog group table database at the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Comprehensive Microbial Resource database (CMR) supported by The Institute for Genomic Research (TIGR). The KEGG database classifies orthologous genes from all sequenced species into 24 functional categories (17). The same strategy as described above for COG was used to assign each ORF from 75 fully sequenced genomes (the same genomes used for TIGR data below) to a KEGG functional category. TIGR performs an automated whole-genome annotation on any published microbial genome, which classifies genes in 19 redundant Role Categories (or functional categories), i.e., a single protein can be assigned to more than one category (18). The number of proteins devoted to a Role Category for each of the 75 genomes incorporated in CMR as of July 2002 was obtained from the Multi Genome Query Tool at the CMR web site (

The amount of noncoding DNA in any genome was calculated by subtracting the sum of the lengths of the coding sequences annotated in the GenBank files from the estimated size of the genome.
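The subtraction described above amounts to the following minimal sketch. The function name and the example numbers are hypothetical. As in the description, overlapping coding sequences are not corrected for, so they would be counted twice.

```python
def noncoding_dna(genome_size_bp, cds_lengths_bp):
    """Return (noncoding base pairs, noncoding fraction of the genome).

    genome_size_bp: estimated genome size in base pairs.
    cds_lengths_bp: lengths of the annotated coding sequences in base pairs.
    """
    coding_bp = sum(cds_lengths_bp)
    if coding_bp > genome_size_bp:
        raise ValueError("annotated coding length exceeds genome size")
    noncoding_bp = genome_size_bp - coding_bp
    return noncoding_bp, noncoding_bp / genome_size_bp


# Hypothetical toy genome: 1,000 bp with three annotated coding sequences.
bp, fraction = noncoding_dna(1000, [300, 400, 180])
print(bp, round(fraction, 2))
```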

Results and Discussion

With the previously described strategy, we were able to assign, on average, 70.3% of the ORFs in any genome to a COG functional category. If one considers that a significant fraction of predicted genes (≈15-20%) is species-specific in every genome sequenced so far (19), we have characterized the large majority of the repertoire of each cell.

Data Normalization. Our main objective was to study the relationship between the total ORFs in the genome and the genomic fraction devoted to a functional category. To normalize the effect of the different degrees of representation in the database, genomes with too many or too few genes homologous to the database were not included in inferring patterns with genome size. Genomes in which the percentage of genes homologous to the database fell within one standard deviation of the mean (x̄ 70.3%, SD 11.2) form the normalized set and are represented by solid squares (87 of the 115 genomes), whereas the rest are represented by open squares (Fig. 1). Functional categories showed similar trends with total ORFs in the genome both when the normalized set and when all genomes were considered (Table 1). However, trends with the normalized set should be more accurate because this set minimizes the bias in database representation. The use of genome size instead of total ORFs in the genome gave identical results due to the high correlation (R² = 0.98) between these two parameters of the genome (Fig. 2A). Therefore, total ORFs in the genome and genome size are used interchangeably in the following text.
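The selection of the normalized set can be illustrated with a short sketch. The function name and the input values are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption, because the text does not specify which was used.

```python
from statistics import mean, stdev


def split_by_database_coverage(percent_assigned):
    """Split genomes into a normalized set and outliers.

    percent_assigned maps a genome name to the percentage of its ORFs
    that found a homolog in the database.
    """
    values = list(percent_assigned.values())
    center = mean(values)
    spread = stdev(values)  # assumption: sample SD; the text does not say which
    normalized, outliers = {}, {}
    for genome, pct in percent_assigned.items():
        # Keep genomes within one standard deviation of the mean.
        target = normalized if abs(pct - center) <= spread else outliers
        target[genome] = pct
    return normalized, outliers
```

Applied to the study's data, the normalized set would correspond to the solid squares and the outliers to the remaining genomes.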

Fig. 1. COG functional categories that showed universal correlation with total ORFs in the genome. The y axes are the percent of ORFs in the genome attributable to a specific COG category (graph title), and the x axes are the total ORFs in the genome for each of the 99 fully sequenced bacterial genomes. Solid squares represent genomes that had a reasonable number of genes with homologs in the COG database, whereas open squares represent genomes that had either too many or too few genes with homologs in the database (outliers). Trendlines and R² values shown are for the solid squares. Archaeal genomes were not included because Archaea had significantly different genomic fractions from Bacteria in many functional categories.

Fig. 2. Correlation among total number of ORFs in the genome, noncoding DNA, and genome size for prokaryotic genomes. (A) The total number of ORFs in the genome vs. the genome size for 115 completed prokaryotic genomes. (B) The total amount of noncoding DNA in the genome vs. genome size.
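For readers who wish to reproduce this kind of correlation, the sketch below computes R² as the squared Pearson correlation between two variables, assuming a simple linear relationship (the text does not state how the trendlines were fitted). The function name and the example values are hypothetical and are not data from the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical genome sizes (Mb) and ORF counts, for illustration only.
sizes_mb = [1.0, 2.0, 4.0, 6.0]
orf_counts = [900, 1900, 3700, 5600]
print(round(r_squared(sizes_mb, orf_counts), 3))
```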

Major Trends with Genome Size. To identify major universal trends, as opposed to ones that are attributable to the preferential gene loss in the reduced genomes, the analysis was repeated including only normalized genomes that had at least 2,000 ORFs annotated in their genomic sequences. COG functional categories that showed correlation with genome size for both sets tested (i.e., all solid squares and solid squares with >2,000 ORFs) were considered cases of major trends, and these categories are shown in Fig. 1. Categories that showed correlation with genome size (at a P value threshold of 0.01) for only one of the two sets of genomes tested were considered cases of minor trends and are not shown for simplicity (but are presented as Fig. 6, which is published as supporting information on the PNAS web site). All findings are summarized in Table 1.
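The decision rule for major and minor trends can be written compactly. This is a sketch under the assumption that a P value for the correlation has already been computed for each of the two genome sets; the function and argument names are hypothetical.

```python
P_THRESHOLD = 0.01  # P value threshold stated in the text


def classify_trend(p_all_normalized, p_large_normalized, alpha=P_THRESHOLD):
    """Classify a functional category from its two correlation P values.

    p_all_normalized: P value using all normalized genomes.
    p_large_normalized: P value using normalized genomes with >2,000 ORFs.
    """
    significant_all = p_all_normalized < alpha
    significant_large = p_large_normalized < alpha
    if significant_all and significant_large:
        return "major"
    if significant_all or significant_large:
        return "minor"
    return "none"


print(classify_trend(0.0001, 0.002))  # significant in both sets
print(classify_trend(0.0001, 0.2))    # significant in one set only
```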

The COG functional categories that showed universal correlation with genome size were as follows. The informational categories of translation, ribosomal structure and biogenesis, and DNA replication, recombination, and repair showed a strong negative correlation with genome size, whereas transcription (transcription apparatus and transcription control genes) showed a strong positive correlation (Fig. 1 Left). Of the cellular processing categories, the percent of genes related to cell division and chromosome partitioning showed a small decrease with genome size (≈1-2%), whereas the percent of genes related to signal transduction mechanisms and cell motility increased strongly and moderately with genome size, respectively (Fig. 1 Center). Among the individual metabolism categories, nucleotide transport and metabolism showed a strong negative correlation with genome size, whereas energy production and conversion and secondary metabolite biosynthesis, transport, and catabolism showed a moderate and a strong positive correlation with genome size, respectively (Fig. 1 Right). Notably, genomes with <2,000 ORFs have almost no secondary metabolism-related genes (Fig. 1 Right).

Minor Trends with Genome Size. The categories of posttranslational modification and protein turnover, inorganic ion transport and metabolism, intracellular trafficking and secretion, amino acid transport and metabolism, and function unknown showed correlation only when all solid squares were considered, i.e., there was no correlation for solid squares with >2,000 ORFs (Table 1). Therefore, these trends are attributable to the preferential gene loss in the reduced genomes. Furthermore, several categories that were universally correlated with total ORFs in the genome showed stronger correlation with all solid squares than with solid squares with >2,000 ORFs. Thus, categories such as transcription, signal transduction, and secondary metabolite biosynthesis are also affected by preferential gene loss in the reduced genomes. These results are in good agreement with the current knowledge of which functional categories are more likely to have been reduced in the symbiotic genomes.

On the other hand, the categories of defense mechanisms and lipid metabolism showed correlation only when solid squares with >2,000 ORFs were considered. These trends, however, are more likely a database artifact due to the underrepresentation of large genomes than a real preferential accumulation of such genes by the large genomes. The fact that there were several small genomes with high percentages of ORFs devoted to these categories (which accounted for the lack of correlation when all solid squares were considered) supports the former interpretation. Last, it should be mentioned that most minor trends involved weak correlations and small changes (≈1-2%) in the fraction of the genome devoted to the corresponding functional categories.

Noncoding DNA and Hypothetical ORFs. Interestingly, the genomic fraction assigned to hypothetical ORFs (i.e., poorly characterized categories) remained constant for genomes with >2,000 ORFs. Moreover, the fraction of noncoding DNA was also invariable (at ≈12-14% of the genome) for all 115 genomes evaluated (Fig. 2B), which confirmed previous results from an analysis of a smaller set of species (20). Therefore, the large prokaryotic genomes overall are not explained by disproportionate accumulation of junk DNA, i.e., hypothetical genes or noncoding sequence.

In contrast, genomes with <2,000 ORFs have a smaller percent of function unknown (or conserved hypothetical) ORFs compared to larger genome-sized species. This suggests that some of these genes, if they indeed code for proteins, have dispensable functions in the larger genome-sized bacteria. If these genes follow the trends of the other functional categories, then these unknown genes may be involved in regulation or secondary metabolism rather than in informational processes. Nonetheless, a significant fraction (≈3%) of the genes in the reduced genomes remains attributable to the function unknown category. Their retention suggests that at least some of the conserved hypothetical genes encode functional proteins.

Factors Other than Genome Size. The correlation R² values indicate that genome size can only partially explain some of the shifts in gene content. Strain-specific traits are assumed to be responsible for data point dispersion around the mean, which is pronounced for several functional categories. For example, by examining individual COG, we conclude that the number of the prevalent ABC transporter genes (and transport genes in general) increased proportionately with genome size (i.e., the genomic fraction devoted to them remained constant), and there was little dispersion around the mean, suggesting a universal relationship with genome size (Fig. 3). However, specific bacterial groups like the ecologically versatile α-Proteobacteria Agrobacterium and Mesorhizobium sp. had a disproportionately increased number of ABC transporters, whereas the more habitat-specific bacteria like the γ-Proteobacteria Xanthomonas sp. had fewer ABC transporters than average.

Fig. 3. ABC transporter genes increase proportionately with genome size. The y axis is the number of genes attributable to ABC transporter functions, and the x axis is the total ORFs in the genome for each of the 99 fully sequenced bacterial genomes. Genomes that have disproportionately increased or decreased their number of ABC transporter genes are denoted on the graph.

As far as traits other than total ORFs in the genome are concerned, we evaluated whether the ribosomal RNA (rrn) copy number could explain some of the shifts in functional gene content. The rrn copy number typically had a small effect on functional gene content compared to the total ORFs in the genome. However, in the case of carbohydrate transport and metabolism, the correlation was stronger for rrn copy number (R² = 0.4, P < 0.001) than for total ORFs in the genome (correlation not significant at P = 0.01). The rrn copy number is positively associated with the rate at which phylogenetically diverse bacteria respond to resource availability (21); thus, the strong correlation between carbohydrate metabolism and transport and rrn copy number is not surprising.

Last, the higher variability observed for data points representing small genomes is partially attributable to the fact that a small genome will show a dramatic change in functional patterns with a small change in the number of genes for a cellular process. Thus, while analyzing the percent of genes in a functional category can reveal major changes, it is less sensitive for detecting changes among large genome-sized prokaryotes.

Results from KEGG and TIGR Annotation Databases. Results using COG, KEGG, and TIGR databases are not always directly comparable because of database-specific characteristics. Although the KEGG Orthology database performs high-quality annotation, it has incorporated a limited number of pathways and processes (only the well-characterized ones) (17). Thus, more orthologous groups can be found in COG than in the KEGG database. With respect to TIGR annotation, although assignment of correct function is usually satisfactory (≈90%), ≈50% of the genes in a genome remain unassigned or are assigned to poorly characterized categories (vs. ≈40% for COG) (18). Moreover, as noted on the CMR web site, all Role Category data were generated at the time each genome was entered into the CMR; thus, newer genomes may have more genes assigned to Role Categories than older ones. Despite these limitations, there are several categories that are comparable among the three databases and hence can be used to test the validity of the trends revealed with COG. Our results for these categories were congruent (a selected set of KEGG and TIGR's functional categories is presented as Fig. 7, which is published as supporting information on the PNAS web site). For example, KEGG and TIGR informational categories of protein translation and DNA replication were negatively correlated with genome size (R² > 0.4 for all categories), whereas the regulation category was positively correlated with genome size (R² > 0.52), similar to the COG data.

Bacteria vs. Archaea. Our analysis also revealed that there were some notable but small differences between Bacteria and Archaea in the relative usage of the genome for the different cell functions (Fig. 4). Archaea appeared to have a higher genomic portion devoted to energy production and conversion, coenzyme metabolism, and poorly characterized categories than their bacterial counterparts of the same genome size. On the other hand, Archaea had relatively fewer genes involved in carbohydrate transport and metabolism, cell envelope and membrane biogenesis, and inorganic ion transport and metabolism. Some of the differences, like those concerning energy production, cell envelope, and general prediction-only categories, were more strongly supported by the data (compare error bars in Fig. 4).

Fig. 4. Differences between Archaea and Bacteria in the relative usage of the genome. Bars represent the average from 34 bacterial and 12 archaeal genomes that have between 1,500 and 3,500 ORFs (to avoid any genome size effect on the data). Only normalized genomes have been included (see text). Averages are statistically different by two-tailed t test, assuming unequal variances, at the 0.05 level. Functional categories that had <2% of the genes in the genome are not shown.
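A t test that assumes unequal variances is commonly known as Welch's t test. The sketch below computes its test statistic and approximate degrees of freedom using only the standard library; the function name and sample values are hypothetical. Obtaining the two-tailed P value additionally requires the t distribution, for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`, which is not used here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical percentages of ORFs in one category for two groups of genomes.
bacteria = [5.1, 4.8, 5.5, 5.0, 4.9]
archaea = [6.9, 7.4, 6.1, 7.8]
t_stat, dof = welch_t(bacteria, archaea)
print(t_stat < 0, dof > 0)
```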

A set of archaeal-specific proteins in addition to the standard proteins encountered in a typical prokaryotic cell would explain the higher genomic fraction in the above categories for Archaea. In agreement with this hypothesis, Graham et al. (22), in an attempt to define an archaeal genomic signature, concluded that genes with no detectable bacterial or eukaryotic homologs mostly involve energetic systems and cofactor biosynthesis, e.g., genes involved in methanogenesis. On the other hand, the fewer genes for cell-wall biogenesis are probably attributable to the fact that Archaea possess a different cell wall from Bacteria. Archaea lack peptidoglycan in their cell wall, and peptidoglycan biosynthesis requires a battery of enzymes in Bacteria (23). Furthermore, the archaeal cell wall components and metabolism have not been studied to the same extent as those of Bacteria and hence are missing from the database.

Joint Genome Institute (JGI)'s Species Sequenced to High Draft. We also analyzed the 39 partially sequenced genomes in the JGI database in the same way. This is a collection of exclusively environmental strains, which includes seven strains with genome sizes >6 Mb (average genome size, 3.83 vs. 3.23 Mb in the closed set). Although trends between gene functional categories and total ORFs in the genome for JGI genomes were very similar to those for the fully sequenced genomes (data not shown), only 59.8% (vs. 70.3% for the closed set) of the ORFs in the JGI set were assignable to a COG category. This may indicate that this genome set samples more of the uncharacterized genes in nature, although some of the difference is likely due to the lack of manual curation of the annotation.

What Is Gained in a Large Genome? Our analysis showed that larger genomes preferentially accumulate regulation, secondary metabolism, and, to a smaller degree, energy conversion-related genes as opposed to informational ones, judging from the inverse pattern for these classes with genome size (Fig. 5). We performed the same analysis in May of 2002, using the 75 genomes available at that time and a database of 3,852 COG groups (vs. 4,873 COG currently). The results for this set and for the expanded set of 115 genomes presented herein were very consistent, and correlations were often more significant in the latter set. This consistency gives higher confidence in the trends reported.

Fig. 5. Summary of the shifts in gene content with genome size in prokaryotic genomes. The bars represent the sum of the COG functional categories that showed strong correlation with genome size and are involved in the same major cellular processes. Only normalized genomes (represented by solid squares in Fig. 1) have been included. Error bars represent the standard deviation from the mean except for the last genome size class, where error bars represent the data range due to the small number of normalized genomes in this class (three genomes).

These data suggest that secondary metabolism and energy conversion rather than general metabolism are disproportionately expanded in larger genomes and thus should explain a large part of the broad metabolic diversity that characterizes large genome-sized species. The expansion involved both expansions of specific COG and de novo acquisitions of new COG (or pathways), with the latter case being roughly twice as frequent as the former (data not shown). On the other hand, the genes assignable to the remaining metabolism categories, except nucleotide metabolism, and to several cellular processes categories increased only proportionally with genome size (similar to the example of ABC transporter genes mentioned previously).

Regardless of a proportional or disproportional increase in metabolic or cellular pathways, large genome-sized species would need increased regulation to successfully control the extensive metabolic repertoire they apparently possess under different growth conditions. Thus, it is not surprising that regulatory genes, i.e., transcription control and signal transduction genes, dominated the genes that are disproportionately increased in larger genomes. In addition, many regulation systems are expected to cross talk, because their genes share high sequence similarity (paralogous genes of expanded gene families), which suggests increased complexity in regulation as well. In agreement with these interpretations, all species with genome sizes >6 Mb in our set are free-living bacteria that can grow in very diverse environments, several using alternative electron acceptors and a great range of substrates for energy production (Table 2).

Table 2. Genomic information and ecological niche(s) of species with a genome size > 6 Mb

The negative correlation with genome size of informational and DNA metabolism categories is equally interesting (Figs. 1 and 5). This trend suggests that a similar number of informational and DNA metabolism-related proteins is able to cope with an increased number of genes. For instance, there is a relatively small increase in the absolute number of genes (of ≈20%) in the translation category between 2- and 8-Mb-sized genomes. This may be attributable to there being sufficient informational processes present and active at any time in the cell. Thus, when there is an unusual demand for informational proteins because of a larger genome, their transcription or posttranslational modification can be regulated accordingly to yield sufficiently more active proteins.
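As a rough worked illustration of why a small absolute increase implies a falling share, assume (hypothetically) that total ORFs scale about fourfold between a 2-Mb and an 8-Mb genome. This is suggested by, but not stated in, the near-linear relationship between ORF number and genome size (Fig. 2A). The symbols T (number of translation genes), N (total ORFs), and f = T/N are introduced here only for illustration, with subscripts giving the genome size in Mb.

```latex
\[
f_{8} \;=\; \frac{T_{8}}{N_{8}}
\;\approx\; \frac{1.2\,T_{2}}{4\,N_{2}}
\;=\; 0.3\,\frac{T_{2}}{N_{2}}
\;=\; 0.3\,f_{2}
\]
```

Under these assumptions, the fraction of the genome devoted to translation at 8 Mb would be roughly 30% of its value at 2 Mb, even though the absolute number of translation genes has grown.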

A Hypothesis for Large Genomes. Presumably the interactions between the organism and particular habitat(s) have selected for genome expansion. Large genomes do not appear to be uncommon in nature (Table 2 and JGI genomes), and hence they must have value. As noted above, all overamplified gene families are associated directly or indirectly (regulation) with metabolism. However, the lack of knowledge of the population sizes and activities of such species in natural environments does not allow specific inferences about which environmental factors may have fostered genome expansion. In contrast, the genome evolution in endosymbiotic bacteria is much better understood. The relief from selection for specific pathways and regulation systems along with population bottlenecks that allow more rapid fixation of mutations are proposed to determine their genome evolution (1, 20, 24). Also, the higher number of bacterial generations in these nonnutrient-limiting environments probably facilitates loss of DNA through spontaneous recombination events at repeated or mobile sequences (1, 24).

One hypothesis for large genomes consistent with the above data is that Bacteria with such genomes are more ecologically successful in environments where resources are scarce but diverse and where there is little penalty for slow growth. These are characteristics of soil. In support of this, Mitsui et al. (25) and Klappenbach et al. (21) found slow-growing oligotrophic α-Proteobacteria to be more dominant in soil. In the former study, many of these isolates were nonsymbiotic members of the Rhizobiaceae and Bradyrhizobiaceae (25, 26), families that have genomes >6-8 Mb. Growth rates in soil are thought to be low, with a mean of three generations per year measured (27).

Although this study shows some clear trends between gene content and genome size, the dispersion around the mean for many categories suggests that features other than genome size likely explain what is gained in larger genomes. These traits need to be explored for a fuller understanding of the interactions between ecology and genome evolution. This study also draws attention to the limited number of large genomes sequenced to date. The possibility that large genomes represent a significant fraction of the extant microbial world and that they may possess unique traits missed in the current annotation knowledge is a major challenge for microbiologists.


We thank Tom Schmidt, Rebecca Grumet, Joel Klappenbach, Frank Larimer, and an anonymous reviewer for helpful discussions regarding the manuscript. This work was supported by the Bouyoukos Fellowship Program (K.T.K.), the U.S. Department of Energy's Microbial Genome Program, and the Center for Microbial Ecology.


§ To whom correspondence should be addressed. E-mail: tiedjej{at}

Abbreviations: COG, Clusters of Orthologous Groups; KEGG, Kyoto Encyclopedia of Genes and Genomes; CMR, Comprehensive Microbial Resource; TIGR, The Institute for Genomic Research; rrn, ribosomal RNA; JGI, Joint Genome Institute.

Copyright © 2004, The National Academy of Sciences


References

1. Andersson, S. & Kurland, C. (1998) Trends Microbiol. 6, 263-268.
2. Galperin, M. & Koonin, E. (1999) Genetica 106, 159-170.
3. Moran, N. (2002) Cell 108, 583-586.
4. Andersson, S., Zomorodipour, A., Andersson, J., Sicheritz-Ponten, T., Alsmark, U., Podowski, R., Naslund, A., Eriksson, A., Winkler, H. & Kurland, C. (1998) Nature 396, 109-110.
5. Fraser, C., Gocayne, J., White, O., Adams, M., Clayton, R., Fleischmann, R., Bult, D., Kerlavage, A., Sutton, G., Kelly, J., et al. (1995) Science 270, 397-403.
6. Shigenobu, S., Watanabe, H., Hattori, M., Sakaki, K. & Ishikawa, H. (2000) Nature 407, 81-86.
7. Jordan, I., Makarova, K., Spouge, J., Wolf, Y. & Koonin, E. (2001) Genome Res. 11, 555-565.
8. Bentley, S., Chater, K., Cerdeno-Tarraga, A.-M., Challis, G., Thompson, N., James, K., Harris, D., Quail, M., Kieser, H., Harper, D., et al. (2002) Nature 417, 141-147.
9. Stover, C., Pham, X., Erwin, A., Mizoguchi, S., Warrener, P., Hickey, M., Brinkman, F., Hufnagle, W., Kowalik, D., Lagrou, M., et al. (2000) Nature 406, 959-964.
10. Tatusov, R., Koonin, E. & Lipman, D. (1997) Science 278, 631-637.
11. Tatusov, R., Fedorova, N., Jackson, J., Jacobs, A., Kiryutin, B., Koonin, E., Krylov, D., Mazumder, R., Mekhedov, S., Nikolskaya, A., et al. (2003) BMC Bioinformatics 4, 41-55.
12. Altschul, S., Madden, T., Schäffer, A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. (1997) Nucleic Acids Res. 25, 3389-3402.
13. Sander, C. & Schneider, R. (1991) Proteins 9, 56-58.
14. Rost, B. (1999) Protein Eng. 12, 85-94.
15. Eisen, J. A. (1998) Genome Res. 8, 163-167.
16. Gerlt, J. & Babbitt, P. (2001) Annu. Rev. Biochem. 70, 209-246.
17. Kanehisa, M. & Goto, S. (2000) Nucleic Acids Res. 28, 27-30.
18. Peterson, J., Umayam, L., Dickinson, T., Hickey, E. & White, O. (2001) Nucleic Acids Res. 29, 123-125.
19. Nelson, K., Paulsen, I., Heidelberg, J. & Fraser, C. (2000) Nat. Biotechnol. 18, 1049-1054.
20. Mira, A., Ochman, H. & Moran, N. (2001) Trends Genet. 17, 589-596.
21. Klappenbach, J., Dunbar, J. & Schmidt, T. (2000) Appl. Env. Microbiol. 66, 1328-1333.
22. Graham, D., Overbeek, R., Olsen, G. & Woese, C. (1999) Proc. Natl. Acad. Sci. USA 97, 3304-3308.
23. Konig, H. (1988) Can. J. Microbiol. 34, 395-406.
24. Frank, C. A., Amiri, H. & Andersson, S. (2002) Genetica 115, 1-12.
25. Mitsui, H., Gorlach, K., Lee, H., Hattori, R. & Hattori, T. (1997) J. Microbiol. Methods 30, 103-110.
26. Saito, A., Mitsui, H., Hattori, R., Minamisawa, K. & Hattori, T. (1998) FEMS Microbiol. Ecol. 25, 277-286.
27. Gray, T. & Williams, S. (1971) Symp. Soc. Gen. Microbiol. 21, 255-286.