
Edited by John F. Doebley, University of Wisconsin, Madison, WI, and approved June 15, 2004 (received for review March 11, 2004)

## Abstract

The process of strong artificial selection during a domestication event is modeled, and its effect on the pattern of DNA polymorphism is investigated. The model also considers a population bottleneck during domestication. Artificial selection during domestication is different from a regular selective sweep because artificial selection acts on alleles that may have been neutral variants before domestication. Therefore, the fixation of such a beneficial allele does not always wipe out DNA variation in the surrounding region. The amount by which variation is reduced largely depends on the initial frequency of the beneficial allele, p. As a consequence, p has a strong effect on the likelihood of detecting the signature of selection during domestication from patterns of polymorphism. These theoretical results are discussed in light of data collected from maize. Although the main focus of this article is on domestication, this model can also be generalized to describe selective sweeps from standing genetic variation.

Keywords: population genetics, theory, coalescent, domestication selection

Artificial selection is believed to be the main evolutionary force acting on domesticated species since their origin 5,000–10,000 years ago. During domestication, humans exercised extremely strong selective pressure on ancestral gene pools to achieve desired phenotypic characteristics. These beneficial phenotypes were therefore fixed in the founder population of domesticated species in a short (probably very short) time. These fixation events differ from the fixation of an advantageous mutant in a natural population, in that artificial selection in a domestication event acts on an allele that was likely a neutral or nearly neutral variant before domestication. In other words, domestication causes some neutral polymorphisms in the ancestral population of the wild progenitor species to suddenly become very advantageous in the small founder population, the progenitor of the domesticated species. Therefore, the initial frequency of a beneficial allele (p) before domestication is not necessarily low. In contrast, the initial frequency of an advantageous mutant in a regular selective sweep model is 1/(2N) (1), where N is the diploid population size. Hence, models developed to describe selective sweeps in natural populations may not be appropriate for cases in which alleles are fixed from standing genetic variation, such as has been described for an amino acid variant at the CAULIFLOWER gene in Brassica (2).

In this article, a model for this process of strong artificial selection during a domestication event is developed. In addition to artificial selection, the model incorporates a population size bottleneck during domestication so that the level of polymorphism in the cultivated species is expected to be lower than that in its wild progenitor species (3, 4). In cultivated crops, polymorphism is typically reduced by 60–80% (5). Under this model, the patterns of DNA polymorphism both with and without selection are studied to understand the genetic consequences of domestication at the DNA level.

Recently, DNA polymorphism [i.e., single nucleotide polymorphism (SNP)] surveys at the genome level have become common, with rapid advances in sequencing and SNP-typing technologies. Domesticated species (e.g., rice and maize) are the main targets for genome-wide SNP surveys because of their agricultural importance. One of the purposes of such projects is to find “domestication genes,” that is, genes that were subject to artificial selection during domestication. Identifying such genes will be informative for future crop improvement. This article addresses the following questions: How can domestication genes be found from patterns of polymorphism? Under what conditions are they found?

## Model and Simulation

We consider the demographic model illustrated in Fig. 1, in which the population experiences drastic population size changes twice. This model approximates the demography of cultivated species (3). More specifically, the system starts (forward in time) as a random mating diploid population with size $N_2$, the ancestral population of the wild progenitor species from which the domesticated species originated. Domestication begins at time $t_d$ in the small founder population of the cultivated species, which is a subset of the members in the wild progenitor. Domestication is assumed to have occurred in this constant-size population with size $N_1$. Usually, $N_1$ will be much smaller than $N_2$. When domestication is complete at time $t_e$, it is assumed that the population size changes to $N_0$. $N_0$ will usually be much larger than $N_1$, representing the rapid spread (population expansion) of the domesticated species. Let $T_0$ and $T_1$ be the lengths of time when the population sizes are $N_0$ and $N_1$, respectively (Fig. 1).
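The three-epoch size history can be written down as a small lookup, which may help when reading the later simulation steps. This is a minimal sketch, not the authors' code: the class name `Demography`, its field names, and the example values are hypothetical, and it assumes (from the description of Fig. 1) that, measured backward from the present, domestication ended $T_0$ generations ago and began $T_0 + T_1$ generations ago.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Demography:
    """Three-epoch population size history; time runs backward from the present."""
    n0: int  # current size, after the expansion (N_0)
    n1: int  # size of the founder population during domestication (N_1)
    n2: int  # size of the ancestral wild progenitor population (N_2)
    t0: int  # generations spent at size n0 (T_0)
    t1: int  # generations spent at size n1 (T_1)

    @property
    def t_e(self) -> int:
        """Generations ago at which domestication was complete."""
        return self.t0

    @property
    def t_d(self) -> int:
        """Generations ago at which domestication began."""
        return self.t0 + self.t1

    def size_at(self, t: int) -> int:
        """Population size t generations before the present."""
        if t < 0:
            raise ValueError("t must be non-negative")
        if t < self.t_e:
            return self.n0
        if t < self.t_d:
            return self.n1
        return self.n2


# Illustrative values only; they are not the parameters of Table 1.
demo = Demography(n0=20_000, n1=2_000, n2=200_000, t0=7_000, t1=500)
print(demo.t_d, demo.size_at(0), demo.size_at(7_200), demo.size_at(10_000))
```

Keeping the epochs in one immutable object makes it easy to look up the size N that applies at the time of any event in a backward simulation.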

Fig. 1. Illustration of the population model. At $t_d$, strong artificial selection becomes active on an allele at frequency p, which was neutral in the ancestral population. The allele is fixed quickly under strong artificial selection. A possible realization of the trajectory of the frequency of the beneficial allele is also illustrated. The vertical axis represents the frequency of the beneficial allele.

Artificial selection during domestication is modeled as follows. Consider a biallelic polymorphic site at time $t_d$ in the ancestral population (Fig. 1). It is assumed that artificial selection now favors one of the two alleles (B represents the beneficial allele and b represents the other). Let p be the frequency of B at $t_d$. Artificial selection ends in τ (< $T_1$) generations when the beneficial allele becomes fixed in the population. Under selective pressure, the relative fitnesses of a b/b homozygote, a B/b heterozygote, and a B/B homozygote are 1, 1 + 2sh, and 1 + 2s, respectively.
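For reference, the expected one-generation change in the frequency x of B under this fitness scheme can be written out explicitly. This is the textbook single-locus recursion under random mating, not an equation taken from the article; the symbols $\bar{w}$ (mean fitness) and $x'$ (expected frequency of B in the next generation) are introduced here for illustration.

```latex
\bar{w} = 1 + 2s\,x^{2} + 4sh\,x(1-x), \qquad
x' = \frac{x\,\bigl[\,1 + 2s\,x + 2sh\,(1-x)\,\bigr]}{\bar{w}}
```

With h = 0.5, as used in the simulations, this simplifies to $x' = x(1 + s + sx)/(1 + 2sx)$.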

To model this process, we used the coalescent process of sampled sequences (6–8). The sequence contains the site under selection during the domestication event so that the genealogical history of the site involves the coalescent with two allelic classes. Hudson and Kaplan (9) first described the idea of genealogy conditional on two allelic classes, where genealogical history is considered separately for each allelic class, given the frequencies of the two alleles. Therefore, simulating the genealogy with two allelic classes requires the history of their frequencies. In our domestication model, the history of the beneficial allele has two phases: a neutral phase in the ancestral population and a selective phase during domestication. For the neutral phase, the genealogy at the focal site is highly variable due to random genetic drift (10–13). To describe the selective phase, most models have used a deterministic approximation of the trajectory of allele frequency, which assumes strong selection (1, 14–16). In this study, however, we incorporate the effect of random genetic drift (17, 18). Our model is flexible and may be more realistic because domestication might have occurred in small populations in which deterministic approximations may not be accurate.

A coalescent simulation under our Executemestication model can be conducted with the following procedure:

(i) Determine $X_S$, the trajectory of the frequency of the beneficial allele (x) in the selective phase moving forward in time. The forward simulation starts with x = p at $t_d$. For each generation, x is simulated according to a binomial distribution where the mean of x in the next generation follows the standard deterministic solution. The trajectory of x is recorded until it hits 1. Let τ be the time from $t_d$ to this fixation event. If the allele is lost (x = 0), the process starts again from the beginning (i.e., x = p at $t_d$).

(ii) Determine $X_N$, the trajectory of the frequency of allele B in the neutral phase backward in time. This backward simulation is valid due to the reversibility of the diffusion process in a constant-size population (19–22). The system starts with x = p at $t_d$. For each generation, x is simulated according to a binomial distribution. Because B is neutral, the mean of the binomial distribution for the previous generation is the same as x in the current generation. The trajectory of x is recorded until it hits 1 or 0. This time is denoted by $t_a$. If x = 0 at $t_a$, B is the derived allele. If x = 1 at $t_a$, B is the ancestral allele.

(iii) Simulate a neutral ancestral recombination graph backward in time from t = 0 to $t_f$, the time at which B became fixed (23). The coalescent time scale is measured in units of $2N_0$ generations. The rates of coalescence and recombination are given by $k(k-1)/(2N/N_0)$ and $kR_0/2$, respectively, where k is the number of edges on the graph (number of ancestral sequences), N (= $N_0$ or $N_1$) is the population size at the time of the event, and $R_0 = 4N_0\rho$ is the scaled recombination rate for the sequence (ρ is the recombination rate per generation).

(iv) Simulate an ancestral recombination graph backward in time from $t_f$ to $t_a$ conditional on $X_S$ and $X_N$, by using the coalescent-with-recombination algorithm of Kim and Stephan (15). The four events in this phase (coalescence between B edges, coalescence between b edges, recombination in a B edge, and recombination in a b edge) occur with probabilities given by equations 2a–2d in Kim and Stephan (15), except that the probabilities of coalescence are adjusted by N, the population size at the time of the event. See ref. 15 for details of this procedure.

(v) Simulate a neutral ancestral recombination graph backward in time from $t_a$. The construction of the ancestral recombination graph stops at $t = t_{\mathrm{limit}}$. A sufficiently large $t_{\mathrm{limit}}$ is chosen such that marginal trees at most nucleotide sites find the most recent common ancestor before $t_{\mathrm{limit}}$.

(vi) Place neutral mutations on the ancestral recombination graph. The mutation rate per sequence per generation is assumed to be μ; $\theta_0 = 4N_0\mu$ is the mutation parameter scaled by the coalescent time scale.
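The first two steps of the procedure (the forward selective trajectory and the backward neutral trajectory) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the "standard deterministic solution" is taken to be the textbook one-generation recursion for fitnesses 1, 1 + 2sh, and 1 + 2s, and the binomial sampler is a deliberately simple one that is only practical for small populations.

```python
import random


def binomial(n: int, prob: float, rng: random.Random) -> int:
    """Draw Binomial(n, prob) by summing n Bernoulli trials (simple, O(n))."""
    return sum(rng.random() < prob for _ in range(n))


def expected_next_freq(x: float, s: float, h: float) -> float:
    """Deterministic frequency of B after selection (fitnesses 1, 1+2sh, 1+2s)."""
    mean_fitness = 1 + 2 * s * x * x + 4 * s * h * x * (1 - x)
    return x * (1 + 2 * s * x + 2 * s * h * (1 - x)) / mean_fitness


def selective_trajectory(p, s, h, pop_size, rng):
    """Forward-in-time trajectory from x = p until B is fixed.

    Runs in which B is lost are discarded and restarted from x = p.
    """
    two_n = 2 * pop_size
    while True:
        x, path = p, [p]
        while 0.0 < x < 1.0:
            x = binomial(two_n, expected_next_freq(x, s, h), rng) / two_n
            path.append(x)
        if x == 1.0:
            return path  # len(path) - 1 plays the role of tau


def neutral_trajectory(p, pop_size, rng):
    """Backward-in-time neutral trajectory from x = p until x hits 0 or 1."""
    two_n = 2 * pop_size
    x, path = p, [p]
    while 0.0 < x < 1.0:
        x = binomial(two_n, x, rng) / two_n
        path.append(x)
    return path  # ends at 0.0 (B is derived) or 1.0 (B is ancestral)


# Small hypothetical parameter values so the example runs quickly.
rng = random.Random(1)
x_s = selective_trajectory(p=0.1, s=0.1, h=0.5, pop_size=100, rng=rng)
x_n = neutral_trajectory(p=0.1, pop_size=100, rng=rng)
print("tau =", len(x_s) - 1, "generations; B derived:", x_n[-1] == 0.0)
```

The two recorded paths correspond to the frequency histories on which the later, conditional part of the genealogy is simulated.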

By using this procedure, patterns of DNA polymorphism after domestication are investigated. The effects of the population bottleneck and selection are evaluated by measuring the reduction in the observed amount of polymorphism in simulated polymorphism data, by using three measures of the amount of polymorphism, $\theta_W$, $\theta_\pi$, and $\theta_H$. Specifically, $\theta_W = S / \sum_{i=1}^{n-1} (1/i)$, where S is the observed number of segregating sites in a sample of n sequences (24), $\theta_\pi$ is the average number of pairwise nucleotide differences per site (8), and $\theta_H$ is the homozygosity of the derived allele per site (25, 26). Note that the calculation of $\theta_H$ requires the ancestral state of each segregating site, which we assume to be known. The expectations of these three measures of polymorphism are θ = 4Nμ in a constant-size diploid population with population size N, but this does not hold in our bottleneck models (see below).
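The three measures can be computed directly from the derived-allele count at each segregating site. The sketch below uses the standard definitions (Watterson's estimator, the mean number of pairwise differences, and Fay and Wu's homozygosity-based estimator) as they are commonly written; the function name and input layout are assumptions made for illustration, and values are returned per sequence rather than per site.

```python
def theta_estimates(derived_counts, n):
    """Return (theta_W, theta_pi, theta_H) from derived-allele counts.

    derived_counts[i] is the number of sampled sequences (out of n) that carry
    the derived allele at segregating site i, so 1 <= count <= n - 1.
    Divide the results by the sequence length to obtain per-site values.
    """
    if any(not 1 <= k <= n - 1 for k in derived_counts):
        raise ValueError("every site must be segregating in the sample")
    a_n = sum(1.0 / i for i in range(1, n))     # Watterson's scaling constant
    pairs = n * (n - 1) / 2                     # number of sequence pairs
    theta_w = len(derived_counts) / a_n
    theta_pi = sum(k * (n - k) for k in derived_counts) / pairs
    theta_h = sum(k * k for k in derived_counts) / pairs
    return theta_w, theta_pi, theta_h


# Toy sample of n = 4 sequences with three segregating sites.
print(theta_estimates([1, 2, 3], n=4))
```

Only $\theta_H$ depends on which allele is derived; swapping k for n - k leaves $\theta_W$ and $\theta_\pi$ unchanged, which is why the ancestral state is needed for $\theta_H$ alone.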

## Results

We investigated patterns of DNA polymorphism after domestication with and without selection. First, to examine the effect of selection alone, a constant-size population (i.e., $N = N_0 = N_1 = N_2$) is modeled. Simulations are performed with a sample size of n = 20, N = 20,000, $t_d$ = 2,000, and θ = R = 200, where R = 4Nρ. The simulated region is scaled such that the sequence ranges over the interval (0,1), with the selected target site at position 0. The region is divided into bins, and the average amount of polymorphism over 5,000 replications is calculated for each bin. The results are presented as $\theta_\pi$ relative to θ. Let $\bar{\theta}_\pi$ be the expected value of $\theta_\pi$, which is θ in a constant-size population. Fig. 2A shows the effect of p when 2Ns = 5,000 and h = 0.5. Several values of p (0.5, 0.2, 0.1, 0.05, and 0.01) are investigated, and the results are compared with the standard selective sweep model [p = 1/(2N)]. It is clear that p has a large effect on levels of variation. Even when p is relatively small (i.e., p = 0.01), the curve is quite different from that of a “normal” selective sweep starting with a newly arisen single advantageous mutation: the reduction in variation is much less for p = 0.01. For p ≥ 0.05, $\theta_\pi$ near position 0 is much larger than zero. This is because $\theta_\pi$ around the selection target site largely reflects the ancestral polymorphism that the beneficial allele had when domestication started.

Theoretically, the result is understood as follows. Consider the coalescent process at the selection target site under the framework of the coalescent (i.e., time is considered backward). Because domestication occurred quite recently, it is very likely that the most recent common ancestor (MRCA) of the sampled sequences is older than $t_d$ unless a very strong bottleneck or strong hitchhiking [i.e., p = 1/(2N)] forces coalescence to the MRCA in the short period between $t_d$ and the present. Without such coalescence, at least two lineages exist at $t_d$, and all of these lineages must belong to the beneficial allele with frequency p. It is known that the expected coalescent time for a pair of lineages within such an allele is $2N_2 p$ (10), indicating that allele B should have quite a large amount of intra-allelic variation at $t_d$ unless p is very small.
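To give a feel for the magnitudes involved, the expected pairwise coalescence time within the beneficial allelic class can be tabulated for the values of p used in Fig. 2A. This is a small arithmetic sketch that simply evaluates the 2Np result quoted above for the constant-size case (N = 20,000); the function name is hypothetical.

```python
def expected_pairwise_coalescence_time(pop_size: int, p: float) -> float:
    """Expected coalescence time (generations) for two lineages within an
    allelic class at frequency p, using the 2Np result quoted in the text."""
    return 2 * pop_size * p


N = 20_000  # constant-size case used for Fig. 2
for p in (0.5, 0.2, 0.1, 0.05, 0.01, 1 / (2 * N)):
    t = expected_pairwise_coalescence_time(N, p)
    print(f"p = {p:<8.5g} expected time = {t:>8.0f} generations")
```

Because two randomly chosen lineages in a neutral diploid population of size N are expected to coalesce in 2N generations, the intra-allelic expectation is a fraction p of that value, which is why variation within B is small only when p is small.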

Fig. 2. The expected level of polymorphism ($\theta_\pi$) in a constant-size population, scaled by θ. (A) The effect of p. (B) The effect of selection intensity (2Ns).

The effect of selection intensity (2Ns) is presented in Fig. 2B, in which 2Ns ranges from $5 \times 10^2$ to $1 \times 10^4$ and p = 0.01 and 0.1. As expected, as the selection intensity increases, the level of polymorphism decreases. The effect of 2Ns is larger when p = 0.01 than when p = 0.1. This is because the reduction is primarily determined by the total sojourn time of the selected allele ($\tau + t_a - t_d$). When p = 0.1, τ is very short relative to $t_a - t_d$ so that the relative effect of 2Ns on the sojourn time is small.
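The dependence of τ on 2Ns and p can be illustrated with a deterministic approximation. Note that this is a rough sketch only: it ignores drift (which the simulations in this article include), it iterates the textbook one-generation recursion for fitnesses 1, 1 + 2sh, and 1 + 2s, and it treats the allele as fixed once x reaches 1 - 1/(2N). The function name and the stopping rule are assumptions made for illustration, and the neutral part of the sojourn time ($t_a - t_d$) is not computed.

```python
def deterministic_sweep_time(p: float, s: float, h: float, pop_size: int) -> int:
    """Generations for x to rise from p to 1 - 1/(2N) when drift is ignored."""
    x, generations = p, 0
    threshold = 1 - 1 / (2 * pop_size)
    while x < threshold:
        mean_fitness = 1 + 2 * s * x * x + 4 * s * h * x * (1 - x)
        x = x * (1 + 2 * s * x + 2 * s * h * (1 - x)) / mean_fitness
        generations += 1
    return generations


N = 20_000
for two_ns in (500, 5_000, 10_000):
    s = two_ns / (2 * N)
    times = {p: deterministic_sweep_time(p, s, 0.5, N) for p in (0.01, 0.1)}
    print(f"2Ns = {two_ns:>6}: tau ~ {times[0.01]} (p = 0.01), {times[0.1]} (p = 0.1)")
```

Under this approximation τ shrinks as 2Ns grows and is shorter for the larger starting frequency, in line with the qualitative argument above.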

Next, the joint effects of selection and a population bottleneck are investigated. Two bottleneck models are used (Table 1). For both models, the ancestral population size is assumed to be 200,000, which is 10 times larger than the current population size ($N_0$ = 20,000). The time when domestication started is assumed to be $t_d$ = 7,500 generations ago because most domestication occurred ≈5,000–10,000 years ago. The two models differ in the severity of the bottleneck, but both models are set to give the same neutral expectation $\bar{\theta}_\pi$ relative to $\theta_2$, where $\theta_2 = 4N_2\mu$. In model I, the reduction in population size is large, but the length of time of the bottleneck is short, whereas in model II the level of bottleneck is mild. Selection intensity $2N_1 s$ = 500 is assumed with h = 0.5.

Table 1. Parameters for two population models

Under both models, simulations are performed with sample size n = 20 and $\theta_2 = R_2 = 200$, where $R_2 = 4N_2\rho$. The levels of polymorphism measured by $\theta_\pi$ scaled by $\theta_2$ are shown in Fig. 3. The level of polymorphism is reduced by the bottleneck regardless of the effect of selection. In addition, selection further reduces the level of variation. Notice that the qualitative effect of p is almost identical in models I and II and a constant-size population (Fig. 2) unless p is very small, further emphasizing the important role of p in determining the level of polymorphism. The effects of the other population parameters may be relatively small, as shown below.

Fig. 3. The expected level of polymorphism ($\theta_\pi$) after a domestication event, scaled by $\theta_2$. In both models I (A) and II (B), the neutral expectation $\bar{\theta}_\pi$ is presented by a horizontal line. The simulated region corresponds to an 8-kb region if we assume $\theta_2 = R_2 = 0.025$ per site, which may be within typical ranges for maize (3).
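The conversion from the scaled mutation parameter to a physical length is a single division, shown here as a worked equation. The symbol L (region length in sites) is introduced only for this illustration.

```latex
L = \frac{\theta_2\ \text{(whole region)}}{\theta_2\ \text{(per site)}}
  = \frac{200}{0.025} = 8000\ \text{sites} = 8\ \text{kb}
```

The same division gives 10 kb for $\theta_2$ = 250, 5 kb for $\theta_2$ = 125, and 1.25 kb for $\theta_2$ = 31.25, which are the region sizes quoted for the other simulations.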

Fig. 4A shows the effect of $t_d$ in model I (other parameters are as in Fig. 3A). Although the reduction in the level of polymorphism due to the bottleneck depends on $t_d$, the effect of selection measured by $\theta_\pi$ relative to its neutral expectation $\bar{\theta}_\pi$ may be similar, in agreement with Fig. 3. Similar results are shown in Fig. 4 B and C, in which the effects of $N_0$ and $N_2$ are investigated (the mutation and recombination parameters $\theta_2$ and $R_2$ are fixed so that $\theta_0$ and $R_0$ vary).

Fig. 4. The effects of $t_d$, $N_0$, and $N_2$ on the expected level of polymorphism ($\theta_\pi$) after a domestication event. The simulated region corresponds to an 8-kb region if we assume $\theta_2 = R_2 = 0.025$ per site. (A) The effect of $t_d$. The five horizontal lines represent $\bar{\theta}_\pi$ for $t_d$ = 2,500, 5,000, 7,500, 10,000, and 20,000 from top to bottom. (B) The effect of $N_0$. The five horizontal lines represent $\bar{\theta}_\pi$ for $N_0$ = $10^4$, $2 \times 10^4$, $5 \times 10^4$, $10^5$, and $2 \times 10^5$ from bottom to top. (C) The effect of $N_2$. The five horizontal lines represent $\bar{\theta}_\pi$ for $N_2$ = $5 \times 10^4$, $10^5$, $2 \times 10^5$, $5 \times 10^5$, and $10^6$ from top to bottom.

Although the expected distribution of the level of polymorphism is given by a simple increasing function of the distance from the site of selection, the stochastic process in the history of the sampled sequences is extremely variable. Fig. 5 shows simulated patterns of the spatial distribution of $\theta_W$, $\theta_\pi$, and $\theta_H$. A region with $\theta_2 = R_2 = 250$ is simulated in model I, with the selection target site at the center of the region. For p = 0.1 and 0.01, four patterns of polymorphism are simulated, and a sliding window analysis is carried out for $\theta_W$, $\theta_\pi$, and $\theta_H$, in which the window size is 0.1 and the step size is 0.025. Under this bottleneck model, the neutral expectations of the three measures, $\bar{\theta}_W$, $\bar{\theta}_\pi$, and $\bar{\theta}_H$ (bars represent the expected values under neutrality), are drawn as horizontal lines in Fig. 5. When p = 0.01, the amount of variation around the selection target site is usually significantly reduced from the expectation under neutrality (see Fig. 5 E, F, and G). However, there are exceptions. An example is Fig. 5H, in which a mild reduction in two of the three measures is seen over the whole region, whereas the third is close to its neutral expectation. Although rare, this result may occur when the most recent common ancestor at the selected site is much older than $t_d$. For example, the probability that the age of an allele with p = 0.01 exceeds $N_2$ is 0.04 (27). This result indicates that, in some cases, we cannot expect a strong reduction in the level of polymorphism even when p is small. As p increases, the signature of selection becomes progressively weaker. When p = 0.1, the level of polymorphism around the selection target site is lower than in the surrounding regions, but this reduction is weak. When p = 0.5, the distribution of variation is similar to that under neutrality, and it becomes very hard to distinguish them visually (data not shown).
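A sliding window analysis of this kind can be sketched for $\theta_\pi$ as follows, with the window size (0.1) and step size (0.025) taken from the text. The function name, the input layout, and the choice to normalize each window by its width (so that values are comparable to the region-wide parameter) are assumptions made for illustration; the article does not state how window values were scaled.

```python
def sliding_window_pi(positions, derived_counts, n, window=0.1, step=0.025):
    """theta_pi in overlapping windows along a region scaled to [0, 1).

    positions[i] is the location of segregating site i and derived_counts[i]
    its derived-allele count among n sequences. Returns (window_start, theta_pi)
    pairs, with theta_pi expressed per unit length of the scaled region.
    """
    pairs = n * (n - 1) / 2
    n_windows = int(round((1 - window) / step)) + 1
    results = []
    for w in range(n_windows):
        start = w * step
        end = start + window
        diffs = sum(k * (n - k) for pos, k in zip(positions, derived_counts)
                    if start <= pos < end)
        results.append((start, diffs / pairs / window))
    return results


# Toy data: three segregating sites in a sample of 20 sequences.
windows = sliding_window_pi([0.05, 0.50, 0.52], [1, 10, 5], n=20)
print(len(windows), windows[0])
```

With these settings the region is covered by 37 overlapping windows, and each site contributes to up to four consecutive windows.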

Fig. 5. Patterns of polymorphism after a domestication event. (A–H) Each panel shows the result from an independent simulation run. Three horizontal lines represent the neutral expectations $\bar{\theta}_W$, $\bar{\theta}_\pi$, and $\bar{\theta}_H$. The simulated region corresponds to a 10-kb region if we assume $\theta_2 = R_2 = 0.025$ per site.

Next, we investigate how likely it is to find a signature of selection by using additional simulations. To measure the success in detecting the signature of selection, we first consider a Hudson-Kreitman-Aguadé (HKA)-type test (28). Here, r, the ratio of the amount of polymorphism to divergence, is used as a summary statistic to evaluate the reduction in the level of variation around the selection site. Suppose that we have an outgroup sequence. The average divergence between the outgroup and the cultivated species is assumed to be ≈10 times larger than the average level of polymorphism ($\theta_\pi$) in the cultivated species. First, we simulated polymorphism and divergence with $\theta_2 = R_2 = 125$ to obtain the null distribution of r. $\theta_W$, $\theta_\pi$, and $\theta_H$ were again used to measure the amount of polymorphism. When divergence is simulated, the stochastic forces are allowed to act in the ancestral population of the two species (population size = $N_2$ is assumed), but recombination is ignored. This makes the test slightly conservative, but, because divergence is high, the effect should be very small. From 10,000 replicates simulated under neutrality, the 5% critical values of r are determined.

Then, coalescent simulations with selection were carried out in models I and II, and the probability of detecting selection is obtained as the proportion of replicates with r less than the 5% critical values. The results are summarized in Fig. 6A. The number of replicates for each parameter set is 10,000. The two models show almost identical results again, as expected from Fig. 3, so we show only results for model I here. The three measures of θ have similar power to detect selection, although $\theta_H$ has a little less, probably because $\theta_H$ has the largest variance. Of the other two measures, one seems to have slightly more power than the other.
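The two-stage logic of this test (an empirical critical value from neutral replicates, then the proportion of selection replicates that fall below it) can be sketched as follows. The function names and the toy values of r are hypothetical; in the article, the values of r come from the coalescent simulations described above, which are not reproduced here.

```python
def lower_critical_value(null_values, alpha=0.05):
    """Empirical lower-tail cutoff: at most a fraction alpha of the null
    replicates fall strictly below the returned value."""
    ordered = sorted(null_values)
    index = int(alpha * len(ordered))
    return ordered[index]


def power(test_values, critical_value):
    """Proportion of replicates whose statistic falls below the cutoff."""
    return sum(v < critical_value for v in test_values) / len(test_values)


# Toy stand-ins for the ratio r under neutrality and under selection.
null_r = [i / 100 for i in range(1, 101)]
selected_r = [0.01, 0.02, 0.50, 0.90]
crit = lower_critical_value(null_r, alpha=0.05)
print(crit, power(selected_r, crit), power(null_r, crit))
```

Applying `power` to the null replicates themselves recovers the nominal type I error, which is a useful sanity check on the cutoff.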

Fig. 6. Power of tests to detect domestication selection. The probabilities of obtaining a significant result at the 5% level are shown. (A) The power of the HKA test for $\theta_2 = R_2 = 125$ (5 kb if $\theta_2 = R_2 = 0.025$ per site). (B) The power of the HKA test for $\theta_2 = R_2 = 31.25$ (1.25 kb if $\theta_2 = R_2 = 0.025$ per site). (C) The powers of Tajima's D and Fay and Wu's H tests for $\theta_2 = R_2 = 125$. (D) The powers of Tajima's D and Fay and Wu's H tests for $\theta_2 = R_2 = 31.25$.

Fig. 6A clearly demonstrates that there is a strong negative correlation between the probability of detecting selection and p. When p = 0.01, selection can be detected with very high probability (>90%), but this probability decreases dramatically as p increases. When p = 0.5, there is almost no chance of detecting selection (at least by this method); in this case, the probability that neutrality is rejected is only a little higher than the type I error (5%). This finding means that we cannot always expect a clear signature of artificial selection unless p is very small. Similar results are obtained when smaller chromosomal regions are investigated (Fig. 6B; $\theta_2 = R_2 = 31.25$), except that the probabilities of detecting selection are slightly larger for smaller regions, especially when p is large.

The power of Tajima's D (29) and Fay and Wu's H (26) tests to detect selection during domestication was also considered. For Fay and Wu's H, we used the difference $\theta_\pi - \theta_H$ divided by a scaling factor as a summary statistic. Neutral simulations (see above) determine the 2.5% critical values for D and H. The critical values for the negative tails are denoted by $D_{2.5\%}$ and $H_{2.5\%}$ and those for the positive tails by $D_{97.5\%}$ and $H_{97.5\%}$. Fig. 6 C and D show the power of D and H for $\theta_2 = R_2 = 125$ and 31.25, respectively. Although overall these tests are not as powerful as the HKA test, D and H can also be used as summary statistics to detect domestication genes. Interestingly, these two tests work toward both tails as selection makes patterns of polymorphism variable, creating wide distributions of D and H. For example, in some cases, most of the polymorphism in the entire region may have arisen after selection swept out almost all variation, so that negative D and positive H are observed. This result is likely when p is very small. However, if the investigated region is large (see Fig. 6C), recombination might create polymorphic sites with high derived allele frequencies (i.e., negative H). In cases where selection does not sweep out most polymorphism, the proportion of polymorphism with intermediate frequencies may be large so that positive D and negative H might be observed. In this case, Fay and Wu's H might have more power than Tajima's D. These two tests may therefore be informative when the level of polymorphism is not significantly reduced.
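As an illustration of how a frequency-spectrum statistic responds in both tails, the sketch below computes Tajima's D from per-site allele counts using the standard constants of Tajima (29), together with a two-tailed rejection rule. The function names are hypothetical, and the critical values in the example are placeholders rather than the values obtained from the neutral simulations in this article.

```python
import math


def tajimas_d(derived_counts, n):
    """Tajima's D from per-site derived-allele counts in a sample of n sequences."""
    s = len(derived_counts)
    if s == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    pi = sum(k * (n - k) for k in derived_counts) / (n * (n - 1) / 2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def rejects_two_tailed(value, lower_crit, upper_crit):
    """True if the statistic falls in either tail beyond the critical values."""
    return value < lower_crit or value > upper_crit


# Ten singletons (excess of rare variants) versus ten intermediate-frequency sites.
d_rare = tajimas_d([1] * 10, n=20)
d_mid = tajimas_d([10] * 10, n=20)
print(d_rare < 0, d_mid > 0, rejects_two_tailed(d_rare, -1.8, 1.9))
```

An excess of low-frequency variants pushes D negative, whereas an excess of intermediate-frequency variants pushes it positive, so both tails carry information about selection.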

Although all simulations assume θ = R, the ratio of R to θ is a very important factor in determining the power of the tests. It is known that the local recombination rate is quite variable in comparison with the mutation rate. It is obvious that, as the recombination rate decreases, the power to detect selection increases, as selection leaves its signature in a wide chromosomal region.

## Discussion

We have modeled a recent domestication event to investigate its effect on the pattern of DNA polymorphism. It is well known that domesticated species have less genetic variation than their wild progenitor species because of the joint effects of population bottleneck and artificial selection during domestication. Our model incorporates both evolutionary forces. The artificial selection considered here is different from “regular” adaptive selection. That is, artificial selection in a domestication event works on an allele that may have been a neutral variant before domestication. Therefore, such selection does not necessarily reduce neutral variation in the region surrounding the selected site. As a consequence, artificial selection is not as easy to detect as a recent selective sweep due to natural selection, which creates a clear reduction in the level of polymorphism (15, 30, 31). For selection during domestication, the initial frequencies of alleles that ancient breeders favored have greatly affected the likelihood that evidence for selection can be detected from patterns of polymorphism. We may capture signatures of artificial selection acting on alleles starting with p < 0.2 with a reasonably high probability, but the chance of detecting selection is very low when p > 0.5. When p is small, selection is likely to be detected by the HKA test, but, when polymorphism is not significantly reduced, it may be more informative to look at the allele frequency spectrum (e.g., Tajima's D and Fay and Wu's H tests). Tajima's D and Fay and Wu's H tests are also useful when there is little polymorphism data for neutral genes, which are required for the HKA test as controls. However, knowledge of the population demographic history is required to determine the critical values for Tajima's D and Fay and Wu's H. The HKA test, on the other hand, may be quite robust to demography.

It should be noted that the tests considered here may not be the best method to detect selection during domestication. One alternative may be to compare polymorphism in a domesticated species with that of its wild progenitor (32, 33). This strategy should be very powerful, especially when a domesticated species and its wild progenitor share polymorphism. In such a case, however, there are statistical and theoretical challenges involved in testing the difference in the amount or pattern of polymorphism between the two species (34, 35) because of the complicated population history of domesticated species.

One important implication of our results is that we may not be able to detect many genes involved in domestication. The number of genes we can detect depends on the distribution of p when domestication began. The neutral allele frequency distribution is given by the famous formula of Wright (36), but this formula may not be appropriate for the distribution of p. Any model of the distribution of p should take the following factors into account. (i) We should consider the likelihood that our ancient breeders saw beneficial variants in natural populations. Beneficial mutants with very low p might be likely to be overlooked. (ii) It is likely that mutants favored by breeders were slightly deleterious in natural populations (37). The frequency of such a mutant might not have been very low, so that it contained a relatively large amount of intra-allelic variation at $t_d$ (17). Alleles that are strongly selected against before domestication can likely be ignored because they are maintained at very low frequencies. If an ancient breeder did happen to find such a mutant, however, it would leave a significant signature of selection.

It should be noted that, although our model considers a single selection event, multiple selection events must have been going on in many regions of a genome during domestication. Unless the recombination rates between selection targets are extremely small, causing interference among selected alleles (38), our model can still be applied.

Our theoretical results are compared with the observations in maize, for which the most polymorphism data are available at present. Wang et al. (32) first reported that the level of polymorphism is significantly reduced in the 5′ upstream region of the teosinte branched1 gene (tb1), and, recently, Clark et al. (39) demonstrated that the region of reduced polymorphism extends ≈60 kb. However, this seems to be an extreme case. Evidence for selection in other candidate domestication genes is not as clear. For example, Whitt et al. (40) show that levels of variation at six genes in the maize starch pathway are about half of the average of the 11 random genes that are chosen because they are likely neutral. Other examples include C1 (41) and a few genes on chromosome 1 (4). These observations are compatible with our model of domestication with intermediate p. However, there are not enough data to evaluate the general likelihood of finding signatures of selection in the maize genome. If it turns out that strong signals of selection, such as that found at tb1, are found at many genes, it may suggest that ancient breeders had a great skill in detecting very rare beneficial variants.

Although this article focuses on domestication events, the model developed here can be generalized to selective sweeps from standing genetic variation. This type of selection may also occur in natural populations. After a drastic environmental change, some neutral polymorphisms may become advantageous. It is easy to imagine that the human population has experienced such changes quite recently so that there might be genes in the human genome that show polymorphism patterns similar to those of domestication genes.

## Acknowledgments

We thank Y. Matsuoka and A. Betancourt for discussion, S. Barton for proofreading, and two anonymous reviewers for comments. H.I. is supported by a grant from the University of Texas, and Y.K. is supported by funds from the National Institutes of Health (2R01 G51932-06A1) and by the David and Lucile Packard Foundation (to Allen Orr).

## Footnotes

† To whom correspondence should be addressed at: Human Genetics Center, School of Public Health, University of Texas Health Science Center, 1200 Hermann Pressler, Houston, TX 77030. E-mail: hideki.innan{at}uth.tmc.edu.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviation: HKA, Hudson-Kreitman-Aguadé.

Note. We would like to note that a similar selection model is being independently studied by M. Przeworski and J. Wall (personal communication).

Copyright © 2004, The National Academy of Sciences

## References

1. Kaplan, N. L., Hudson, R. R. & Langley, C. H. (1989) Genetics 123, 887–899. pmid:2612899
2. Purugganan, M. D., Boyles, A. L. & Suddith, J. I. (2000) Genetics 155, 855–862. pmid:10835404
3. Eyre-Walker, A., Gaut, R. L., Hilton, H., Feldman, D. L. & Gaut, B. S. (1998) Proc. Natl. Acad. Sci. USA 95, 4441–4446. pmid:9539756
4. Tenaillon, M. I., U'Ren, J., Tenaillon, O. & Gaut, B. S. (2004) Mol. Biol. Evol. 21, 1214–1225. pmid:15014173
5. Buckler, E. S., IV, Thornsberry, J. M. & Kresovich, S. (2001) Genet. Res. Camb. 77, 213–218.
6. Kingman, J. F. C. (1982) Stochastic Processes Appl. 13, 235–248.
7. Hudson, R. R. (1983) Theor. Pop. Biol. 23, 183–201. pmid:6612631
8. Tajima, F. (1983) Genetics 105, 437–460. pmid:6628982
9. Hudson, R. R. & Kaplan, N. L. (1986) Genetics 113, 1057–1076. pmid:3744026
10. Innan, H. & Tajima, F. (1997) Genetics 147, 1431–1444. pmid:9383083
11. Griffiths, R. C. & Tavaré, S. (1998) Stochastic Models 14, 273–295.
12. Griffiths, R. C. & Tavaré, S. (1999) Ann. Appl. Prob. 9, 567–590.
13. Wiuf, C. & Donnelly, P. (1999) Theor. Pop. Biol. 56, 183–201. pmid:10544068
14. Braverman, J. M., Hudson, R. R., Kaplan, N. L., Langley, C. H. & Stephan, W. (1995) Genetics 140, 783–796. pmid:7498754
15. Kim, Y. & Stephan, W. (2002) Genetics 160, 765–777. pmid:11861577
16. Przeworski, M. (2002) Genetics 160, 1179–1189. pmid:11901132
17. Innan, H. & Tajima, F. (1999) Genet. Res. Camb. 73, 15–28.
18. Slatkin, M. (2001) Genet. Res. 78, 49–57. pmid:11556137
19. Maruyama, T. (1974) Genet. Res. Camb. 23, 137–143.
20. Li, W.-H. (1975) Am. J. Hum. Genet. 27, 274–286. pmid:803010
21. Watterson, G. A. (1976) Theor. Pop. Biol. 10, 239–253. pmid:1013904
22. Watterson, G. A. (1977) Theor. Pop. Biol. 12, 179–196. pmid:929456
23. Hudson, R. R. (1990) in Oxford Surveys in Evolutionary Biology, eds. Futuyma, D. & Antonovics, J. (Oxford Univ. Press, Oxford), Vol. 7, pp. 1–43.
24. Watterson, G. A. (1975) Theor. Pop. Biol. 7, 256–276. pmid:1145509
25. Fu, Y.-X. (1995) Theor. Pop. Biol. 48, 172–197. pmid:7482370
26. Fay, J. C. & Wu, C.-I. (2000) Genetics 155, 1405–1413. pmid:10880498
27. Kimura, M. (1955) Proc. Natl. Acad. Sci. USA 41, 144–150.
28. Hudson, R. R., Kreitman, M. & Aguadé, M. (1987) Genetics 116, 153–159. pmid:3110004
29. Tajima, F. (1989) Genetics 123, 585–595. pmid:2513255
30. Nurminsky, D., De Aguiar, D., Bustamante, C. D. & Hartl, D. L. (2001) Science 291, 128–130. pmid:11141564
31. Innan, H., Padhukasahasram, B. & Nordborg, M. (2003) Genome Res. 13, 1158–1168. pmid:12799351
32. Wang, R.-L., Stec, A., Hey, J., Lukens, L. & Doebley, J. (1999) Nature 398, 236–239. pmid:10094045
33. Vigouroux, Y., McMullen, M., Hittinger, C. T., Houchins, K., Schulz, L., Kresovich, S., Matsuoka, Y. & Doebley, J. (2002) Proc. Natl. Acad. Sci. USA 99, 9650–9655. pmid:12105270
34. Wakeley, J. & Hey, J. (1997) Genetics 145, 847–855. pmid:9055093
35. Innan, H. & Tajima, F. (2002) Genet. Res. 80, 15–25. pmid:12448854
36. Wright, S. (1931) Genetics 16, 97–159.
37. Ohta, T. (1972) Nature 246, 96–98.
38. Kim, Y. & Stephan, W. (2003) Genetics 164, 389–398. pmid:12750349
39. Clark, R. M., Linton, E., Messing, J. & Doebley, J. F. (2004) Proc. Natl. Acad. Sci. USA 101, 700–707. pmid:14701910
40. Whitt, S. R., Wilson, L. M., Tenaillon, M. I., Gaut, B. S. & Buckler, E. S., IV (2002) Proc. Natl. Acad. Sci. USA 99, 12959–12962. pmid:12244216
41. Hanson, M. A., Gaut, B. S., Stec, A. O., Fuerstenberg, S. I., Goodman, M. M., Coe, E. H. & Doebley, J. F. (1996) Genetics 143, 1395–1407. pmid:8807310