Unexpected complexity in the haplotypes of commonly used inb


Edited by David E. Housman, Massachusetts Institute of Technology, Cambridge, MA, and approved May 14, 2004 (received for review February 20, 2004)


Abstract

Investigation of sequence variation in common inbred mouse strains has revealed a segmented pattern in which regions of high and low variant density are intermixed. Furthermore, it has been suggested that allelic strain distribution patterns also occur in well defined blocks and consequently could be used to map quantitative trait loci (QTL) in comparisons between inbred strains. We report a detailed analysis of polymorphism distribution in multiple inbred mouse strains over a 4.8-megabase region containing a QTL influencing anxiety. Our analysis indicates that it is only partly true that the genomes of inbred strains exist as a patchwork of segments of sequence identity and difference. We show that the definition of haplotype blocks is not robust and that methods for QTL mapping may fail if they assume a simple block-like structure.

Studies of sequence variation between inbred strains of laboratory mice suggest that the distribution of polymorphisms has a mosaic structure of alternating segments of high and low frequency (1-3), consistent with descent of the most commonly used strains from a few subspecies, such as Mus musculus musculus (4). Understanding the structure of sequence variation is important because correlations between genetic and phenotypic variation could help identify the molecular variants underlying quantitative trait loci (QTL) (1, 5), which have proved so refractory to positional cloning (6).

The apparent mosaic structure of the mouse genome can be exploited for QTL mapping in two ways. First, it focuses the search for functional variants into regions of sequence variation. Most QTL are mapped in F2 or back-crosses between inbred strains, a method with great power to detect small effects but with poor resolution: The 95% confidence interval often encompasses half a chromosome (7). The advantage of the mosaic model is that long regions of sequence identity can be excluded as locations for the QTL. Second, QTL mapping can be carried out by associating phenotypic variation in inbred strains with their strain distribution pattern (SDP) [in silico mapping (8)], where an SDP is the pattern of allelic similarities and differences among strains at a locus. If single-nucleotide polymorphisms (SNPs) are randomly distributed across the genomes of inbred strains, mapping QTL by SDP association with phenotype will require a very dense set of markers, but if the distribution is segmented, then a few markers will be sufficient to identify common haplotypes. The approach is identical to the exploitation of haplotype blocks (regions of complete or almost complete linkage disequilibrium) in the human genome for association mapping (9) but requires a different analysis to take advantage of the small number of founder animals from which laboratory strains are descended.

Our understanding of the distribution of polymorphisms is largely based on studies that compare whole-genome shotgun sequence reads, a method that gives a relatively crude picture. Despite the high overall density of coverage, not all variants are assayed in all strains. If the genome is sequenced so that each nucleotide position is covered with good-quality sequence x times on average, then the probability that a polymorphism is not covered in one strain will be $e^{-x}$, assuming a Poisson distribution. Hence, the probability that a site is covered in each of N strains will be $(1 - e^{-x})^{N}$. For example, analyzing shotgun reads covering chromosome 16 in four strains (3, 10), enough sequence was generated to cover the chromosome 1.3 times for each strain, assuming every strain made an equal contribution. Although 71,000 SNPs were analyzed, just 7.8% of the genome was covered in all strains (x = 1.3 and n = 8 gives $0.727^{8}$). An alternative strategy is to use primer-directed sequencing, a more efficient strategy when many strains are compared. However, sampling has so far been carried out at a low density: Wiltshire and colleagues (3) sequenced 2,600 evenly distributed loci at intervals of ≈1.1 megabases (Mb) in eight inbred strains.
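
The coverage argument above can be checked numerically. The following is a minimal sketch, assuming only the Poisson model stated in the text; the function name is ours and is purely illustrative.

```python
import math


def prob_site_covered_in_all(x: float, n_strains: int) -> float:
    """Probability that a site is covered in every strain.

    Under Poisson coverage with mean depth x, a site is missed in one
    strain with probability exp(-x), so it is covered in all N strains
    with probability (1 - exp(-x)) ** N.
    """
    p_missed_in_one = math.exp(-x)
    return (1.0 - p_missed_in_one) ** n_strains


# Values quoted in the text: x = 1.3 and n = 8.
p = prob_site_covered_in_all(1.3, 8)
print(f"{100 * p:.1f}% of sites covered in all strains")
```

With x = 1.3 and n = 8 the result is close to the 7.8% quoted in the text.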

We do not yet know whether claims for the utility of the mosaic structure of inbred strain sequences for QTL mapping will be supported by higher resolution data on polymorphism distribution. Here, we report an analysis of primer-directed sequencing that sampled, at intervals of <10 kb, a 4.8-Mb region on mouse chromosome 1 in eight inbred strains (C57BL/6J, C3H/HeJ, DBA/2J, A/J, BALB/cJ, AKR/J, RIII/DmMobJ, and I/LnJ; hereafter referred to as C57BL/6, C3H, DBA/2, A/J, BALB/c, AKR, RIII, and I). These strains are the progenitors of a genetically heterogeneous stock used to map a QTL influencing anxiety to this locus. We were thus able to investigate in more detail the haplotype structure in a well characterized region of the genome containing at least one QTL.

Methods

Contig Construction. We identified mouse bacterial artificial chromosomes (BACs) from RPCI-23 and RPCI-24 libraries (derived from strain C57BL/6) for sequencing (11, 12) by using already published markers (13). We purified BAC clones by using a Qiagen (Valencia, CA) large construction kit and used them for end sequencing with T7 and SP6 primers. Fiber-FISH was used to confirm the BAC order and establish the extent of overlap of clones (14).

Genomic DNA Sequencing. BACs were shotgun sequenced and assembled as described in ref. 15. DNA from eight strains (C57BL/6, C3H, DBA/2, A/J, BALB/c, AKR, RIII, and I) was resequenced by amplification of genomic DNA (note that we included the reference C57BL/6 for resequencing). DNA from inbred lines was obtained from The Jackson Laboratory. Oligonucleotide primers were designed to amplify genomic DNA in a 50-μl PCR with 10 pmol of oligonucleotides (synthesized at MWG Biotech, Ebersberg, Germany), 100 ng of DNA, 0.2 units of Taq Gold, 8 mM dNTP, 8 mM 1× PCR buffer, and 25 mM MgCl2. PCR conditions were 1 cycle at 95°C for 15 min, 95°C for 30 s, and 62°C for 30 s at 0.5°C per cycle; 13 cycles at 72°C for 60 s, 95°C for 30 s, and 58°C for 30 s; 29 cycles at 72°C for 55 s; and 1 cycle at 72°C for 7 min. PCR products were purified in a 96-well Millipore purification plate and resuspended in 30 μl of H2O. Two sequencing reactions were prepared for each DNA sample, one with the forward primer and one with the reverse primer. The PCR reagents were removed from solution by an ethanol precipitation in the presence of sodium acetate. All sequencing reactions were run out on an ABI3700 sequencer and assembled by using phred/phrap (16).

Sequence Analysis and Gene Identification. The sequenced region was split into 40-kb consecutive regions and compared against the National Center for Biotechnology Information nonredundant protein database by using blastx (17). The most significant nonoverlapping hits ($E < 10^{-4}$) were superimposed on the mouse sequence by using artemis (www.sanger.ac.uk/Software/Artemis). Predicted mouse protein sequences were then searched against the nonredundant protein database, and top matches were used to predict gene structures with genewise (www.ebi.ac.uk/Wise2). Pseudogenes were identified as those that contained no introns and those with no evidence of expression or those that included frameshifts and stop codons. Protein sequences that were only found in rodents and no other species were presumed to be spurious gene predictions (e.g., translated transposable elements or endogenous retroviruses). The complete sequence was also searched against a nonredundant set of mouse cDNAs (18).

Analysis of Strain Distribution Patterns. The spatial structure of SDPs across the sequence was determined by using a dynamic programming algorithm that identifies blocks of contiguous diallelic variants, each block labeled by its most frequent SDP. The optimal block partitioning has a score that maximizes the total number of variants whose SDP matches the corresponding block SDP minus a factor C times the number of block transitions. The positive parameter C is the cost of a block transition. Let N be the total number of diallelic variants. Define Y(i, s) to be the score of the optimal block partitioning for variants 1... i, subject to variant i being in a block with SDP s. Let X(i, s) = 1 if variant i has SDP s, and be 0 otherwise. Y is computed by the recurrence relation

$$Y(i, s) = X(i, s) + \max_{t}\,\bigl[\,Y(i-1, t) - C\,(1 - \delta_{st})\,\bigr], \qquad Y(0, s) = 0,$$

where $\delta_{st} = 1$ if s = t and 0 otherwise, and the maximization is performed over all SDPs t. The optimal choice of t is denoted by T(i, s). The blocks are found by backtracking; the sequence $\sigma_1, \sigma_2, \ldots, \sigma_N$ of optimal SDPs at each variant position is computed backwards from N, with

$$\sigma_N = \arg\max_{s}\, Y(N, s), \qquad \sigma_{i-1} = T(i, \sigma_i).$$

A block boundary occurs whenever $\sigma_i$ differs from $\sigma_{i+1}$.
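
The block-partitioning procedure described above can be sketched in a few lines. This is an illustrative implementation written from the verbal definitions of Y, X, T, and C in this section, not the authors' code; the function names and the toy SDP labels are assumptions, and we take the score before the first variant to be zero.

```python
from typing import Dict, List, Sequence, Tuple


def partition_blocks(sdps: Sequence[str], cost: float) -> Tuple[List[str], float]:
    """Label each variant with its block SDP, maximizing
    (number of variants matching their block SDP) - cost * (block transitions).

    Returns the per-variant block labels and the optimal score.
    """
    if not sdps:
        return [], 0.0
    states = sorted(set(sdps))
    prev: Dict[str, float] = {s: 0.0 for s in states}  # Y(i-1, s)
    back: List[Dict[str, str]] = []                     # back[i][s] = T(i, s)
    for i, observed in enumerate(sdps):
        best_t = max(states, key=lambda t: prev[t])
        best_val = prev[best_t]
        cur: Dict[str, float] = {}
        choice: Dict[str, str] = {}
        for s in states:
            stay = prev[s]              # previous variant in the same block
            switch = best_val - cost    # transition from the best other SDP
            if i == 0 or stay >= switch:
                cur[s], choice[s] = stay, s
            else:
                cur[s], choice[s] = switch, best_t
            cur[s] += 1.0 if observed == s else 0.0  # X(i, s)
        back.append(choice)
        prev = cur
    # Backtrack from the best final state.
    last = max(states, key=lambda s: prev[s])
    labels = [last]
    for i in range(len(sdps) - 1, 0, -1):
        labels.append(back[i][labels[-1]])
    labels.reverse()
    return labels, prev[last]


def count_boundaries(labels: Sequence[str]) -> int:
    """Number of positions where the block label changes."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


# Toy example with hypothetical SDP labels "a", "b", "c".
toy = list("aabaaccc")
labels, score = partition_blocks(toy, cost=2.0)
print("".join(labels), score, count_boundaries(labels))
```

In this toy input, a transition cost of 2 absorbs the isolated "b" variant into the surrounding "a" block, whereas a cost of 0 reproduces the input labels exactly, which mirrors the trade-off between block count and fidelity discussed in Results.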

Results

Sequence Analysis and Gene Identification. We constructed a complete BAC contig of 4,785,409 bp located on chromosome 1 between megabases 142.8 and 147.6 of assembly 30 of the mouse genome. We constructed our own contig to be certain of its accuracy, because earlier drafts of the mouse genome were too unreliable and unstable; however, our contig and assembly 30 are very similar. The contig contains four gaps with an estimated total length of <20 kb. This region corresponds to the 95% confidence interval of a behavioral QTL (13, 19). By using a combination of gene prediction programs and EST databases we identified nine genes and 17 pseudogenes. There are two genes of unknown function: BC027756, a cdc73 homologue, and B830045N13Rik, a homologue of BMP/retinoic acid-inducible neural-specific protein 3 (brinp3). There are three housekeeping genes: glutaredoxin 2 (glrx2), ubiquitin carboxyl-terminal hydrolase isozyme L5 (uchl5), and Sjögren's syndrome antigen A2 (ssa2). There are also four regulators of G protein signaling (rgs1, rgs2, rgs13, and rgs18). The annotated genes were in broad agreement with those in the University of California, Santa Cruz (http://genome.ucsc.edu), and ensembl (http://mouse.ensembl.org) genome browsers.

To determine additional regions of potential functional significance, we compared the mouse contig with other eukaryotic sequences. We searched the Fugu rubripes genome with tblastx and retrieved 59 regions of significant homology, all of which were components of the nine genes previously identified. To identify conserved noncoding sequences (CNS), we made a comparison with the syntenic region on human chromosome 1 and identified 567 regions with a sequence similarity of >70% that extended ≥100 bp and that did not match any expressed sequences.

Frequency and Distribution of Sequence Variants. We obtained sequence data for all genes in each of the eight strains that constitute the heterogeneous stock. First, we resequenced all exons, including at least 1 kb of flanking sequence. Next, we resequenced all CNS and finally a random selection of 1- to 2-kb segments of nonconserved sequence at intervals of ≈10 kb over the 4.8-Mb region. In total, we obtained 582,503 bp of finished sequence in each of the eight strains (12.17% of the region of interest). On average, the distance between sample sequences was 8.2 kb.

We identified 1,720 sequence variants consisting of 258 microsatellite variants, 137 insertion deletion polymorphisms, and 1,325 SNPs (see www.well.ox.ac.uk/rmott/MOUSE for full details). Table 1 describes the overall sequence coverage within functional (exons and introns, 5′ and 3′ UTRs, and CNS) and nonfunctional regions (all remaining sequences obtained). The estimates are commensurate with those reported for sparser analyses of the whole genome, and they suggest that the region is not unusual in the type and distribution of sequence variants. Extrapolating from observed rates, the unsequenced 4.2 Mb of DNA would be expected to yield a further 1,811 microsatellites, 1,025 indels, and 12,230 SNPs. There were no significant differences in the densities of variants for the different types of sequence except in the coding sequences.

Table 1. Classification of variants by type and context

We investigated how many additional variants are present in two other inbred strains (LP/J and CBA/J) by resequencing 19 contigs (17,087 bp, containing 87 variants) uniformly spaced across the region. No new variants were found, and LP/J was identical to DBA/2 and CBA/J to C3H and A/J at all sampled sites.

We also resequenced a different set of 28 contigs (22,863 bp) in three wild-derived inbred strains [CAST/EiJ, PERC/EiJ, and SPRET/EiJ, which are known to be more genetically divergent from the commonly used inbreds (4)] and one other unrelated inbred strain (SENCARC/PtJ). As expected, a further 310 variants were identified. Table 2 gives the pairwise percentage dissimilarities between all strains and shows that CAST and SPRET are distinct from the others.

Table 2. Dissimilarities among 12 strains based on sequence data from 28 contigs (349 variant sites)

Spatial Distribution of Variants. We examined how the density of sequence variants changes across the region and compared the density to a random (Poisson) distribution. Our data are from 1,149 sequenced contigs. Each contig was classified as Coding (66 contigs), Intron (60), Promoter (9), 3′ UTR (8), 5′ UTR (12), CNS (567), or Nonconserved (639). Any case in which a contig contained a mixture of types was divided and treated as two abutting contigs. For each class of contig c, we calculated the average density of SNPs, $\mu_c$ (see Table 1). Then for a contig i of class c(i) and length l(i), the number of SNPs in the contig should follow a Poisson distribution with expected number of variants $r(i) = l(i)\,\mu_{c(i)}$. By summing over all contigs, the expected number of contigs with exactly n SNPs is

$$E(n) = \sum_{i} \frac{e^{-r(i)}\, r(i)^{n}}{n!}$$

with standard deviation $E(n)^{0.5}$. Table 3 compares the observed and expected numbers of contigs containing varying numbers of SNPs. There is a significant excess of contigs with no SNPs and a corresponding deficiency of contigs with them, indicating that polymorphism density is clustered. Moreover, contigs with no SNPs are approximately uniformly distributed throughout the region, and SNP density varies in an unstructured manner across the region with alternating SNP-dense contigs and microdeserts (Fig. 2, which is published as supporting information on the PNAS web site). Although it is generally true that the rates of SNP density in each pairwise comparison fluctuate between two extremes of high and low, there are also intermediate rates. For example, in an 800-kb interval (between 3.8 and 4.6 Mb), three strain comparisons (C3H versus C57BL/6, I versus C57BL/6, and C3H versus I) show an average of 0.8 variants per 10 kb, compared to a rate of <0.5 variants per Mb for the A/J versus C3H comparison. Fluctuating frequency makes it difficult to determine whether smaller deserts exist within regions of high SNP density. For example, there are regions of 25 kb that contain just one or two SNPs within the high-density regions in comparisons between I and RIII or between C3H and C57BL/6.
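
The Poisson expectation used for the comparison in Table 3 can be sketched as follows. This is a minimal illustration, assuming each contig is described only by its length and the mean SNP density of its class; the contig lengths and densities below are made-up toy values, not data from this study.

```python
import math
from typing import Sequence


def expected_contigs_with_n_snps(
    lengths: Sequence[float], densities: Sequence[float], n: int
) -> float:
    """Expected number of contigs carrying exactly n SNPs.

    Contig i has Poisson mean r(i) = length(i) * density(i), and the
    expectation is the sum over contigs of exp(-r) * r**n / n!.
    """
    total = 0.0
    for length, density in zip(lengths, densities):
        r = length * density
        total += math.exp(-r) * r ** n / math.factorial(n)
    return total


# Hypothetical toy input: two contigs, each with an expected count of 1 SNP.
lengths = [1000, 2000]          # bp
densities = [0.001, 0.0005]     # SNPs per bp for each contig's class
for n in range(4):
    e_n = expected_contigs_with_n_snps(lengths, densities, n)
    print(n, round(e_n, 3), "sd", round(math.sqrt(e_n), 3))
```

Summed over all n, the expected counts add up to the number of contigs, which is a convenient sanity check when applying the same calculation to real contig data.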

Table 3. Expected and observed number of variants per sequenced contig

Strain Distribution Patterns. We next analyzed the SDP at each sequence variant. Because there are eight strains, there are 127 possible SDPs; yet we identified just 19 SDPs among the SNPs and indels (microsatellites were omitted from this analysis).

In Table 4, the SDPs are represented as a series of 0s and 1s (where the first element is always 0) in the order A/J, AKR, BALB, C3H, C57BL/6, DBA, I, and RIII. Two variants can have the same SDP but have different alleles. The top three SDPs account for 58% of all variants, and the top 13 for almost 99%.
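
The 0/1 encoding described above can be illustrated with a short sketch. The helper names and the example alleles are hypothetical; the encoding rule (first strain always 0, diallelic variants only) follows the text.

```python
from typing import Sequence


def sdp(alleles: Sequence[str]) -> str:
    """Encode the alleles of a diallelic variant as a 0/1 string.

    Strains are assumed to be in a fixed order (A/J first). The first
    strain always receives 0, so two variants with different alleles but
    the same grouping of strains map to the same SDP.
    """
    if len(set(alleles)) != 2:
        raise ValueError("expected a diallelic variant")
    first = alleles[0]
    return "".join("0" if a == first else "1" for a in alleles)


def n_possible_sdps(n_strains: int) -> int:
    """Number of distinct polymorphic SDPs when the first element is fixed at 0."""
    return 2 ** (n_strains - 1) - 1


# Hypothetical alleles in the order A/J, AKR, BALB, C3H, C57BL/6, DBA, I, RIII.
print(sdp("GAGGGAGA"))      # one variant
print(sdp("TCTTTCTC"))      # different alleles, same strain grouping
print(n_possible_sdps(8))   # eight strains
```

Both example variants map to the same pattern, and eight strains give the 127 possible SDPs stated in the text.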

Table 4. Frequencies of SDP

We estimated how many additional SDPs would have been detected had we sequenced the entire region by sampling from our data the same percentage of information that we extracted from the whole region. We performed 1,000 simulations in which 12% of the sequenced contigs were subsampled (i.e., 1.44% of the region). The mean number of SDPs found in the sampled data was 13.21 ± 0.045, or 69.5% of all observed SDPs. However, the missing SDPs were rare, accounting for <2% of all variants. If we had sequenced the entire region, we expect unobserved SDPs to have accounted also for <2% of the total. Consequently, we expect to have encountered all but the rarest SDPs.
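
A subsampling experiment of this kind could be set up roughly as below. This is a sketch under our own assumptions: contigs are sampled without replacement, each contig is represented simply as the list of SDPs it contains, and the toy data are invented for illustration.

```python
import random
from typing import Sequence


def mean_sdps_in_subsamples(
    contigs: Sequence[Sequence[str]], fraction: float, n_sim: int, seed: int = 0
) -> float:
    """Mean number of distinct SDPs seen when a fraction of contigs is subsampled."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(contigs)))
    total = 0
    for _ in range(n_sim):
        sample = rng.sample(list(contigs), k)
        total += len({s for contig in sample for s in contig})
    return total / n_sim


# Hypothetical data: each inner list holds the SDPs found in one contig
# (an empty list is a contig with no variants).
toy_contigs = [["01000101"], ["01000101", "00101110"], [], ["00010000"], []]
print(mean_sdps_in_subsamples(toy_contigs, fraction=0.4, n_sim=1000))
```

Comparing the returned mean with the number of distinct SDPs in the full data gives the kind of recovery percentage reported in the text.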

We next examined the spatial distribution of SDPs to investigate whether we can infer the presence of regions of sequence similarity (or haplotype blocks) from adjacent markers with the same SDP. In Fig. 1a we show the distribution of the 13 most common SDPs (from Table 4) occurring in 1,450 diallelic variants. The figure shows that variants with the same SDP tend to occur nearby but, significantly, are often intermixed with other SDPs.

Fig. 1. Haplotype structure of 1,450 diallelic variants with SDP frequency >1% between eight inbred strains across a 4.8-Mb region of mouse chromosome 1. The region is represented along the horizontal axis and scaled such that the nth coordinate from the left edge corresponds to the nth variant, broken into two parts for clarity with the top section showing the first 725 variants and the bottom showing the remainder. (a) The alternating gray and white tracks show the spatial arrangement of the 13 most common SDPs arranged from top to bottom in the same order as in Table 4 (so that the top SDP is 01000101 and the bottom SDP is 00101110). Each track represents one SDP, with a bar on the track at points where the corresponding variant has that SDP. (b) The gray and orange bands show the block partitioning produced by a dynamic programming algorithm that identifies an optimal partition that minimizes the SDP heterogeneity within each block, subject to a block transition cost C = 8 (see Methods). The strains are (top to bottom) A/J, AKR, BALB/c, C3H, C57BL/6, DBA/2, I, and RIII. Block boundaries are white vertical lines. Within each block, the major SDP is indicated by the black and orange horizontal bands. Strains with the same color have the same allele. (c) The optimal block partitioning for C = 0, i.e., perfect SDP fidelity within each block. Boundaries are not shown because many blocks have a length of 1 bp and would therefore be invisible.

To investigate the importance of SDP mixing, we devised a dynamic programming algorithm to construct an optimal block partition of the region. The algorithm maximizes the number of variants that constitute the most common SDP within a single block. The likelihood of a block transition is controlled by a positive cost C, with low values encouraging transitions. Zhang and coworkers (20) describe a dynamic programming algorithm to find the block structure that minimizes the numbers of SNPs needed to determine haplotypes. Here, our aim is different: to make the pattern of haplotype sharing between the inbred strains nearly constant within each block.

We show the block structure found by using C = 8 in Fig. 1b and C = 0 in Fig. 1c. When C = 8, although the smallest number of blocks is found, the resulting block structure fails to capture much of the variation among the SDPs. Within each of the 13 blocks in Fig. 1b, the fidelity, defined as the percentage of variants with the most common SDP of the block, varied from 56% to 94%, with an average of 78%. Overall, 80% of the variants shared the major SDP for their block. Physical block length varied from 40 kb to 1.51 Mb. The number of blocks can be increased by reducing C but at the cost of losing much of the larger scale structure. Moreover, the average fidelity does not exceed 80% until C = 3 with 22 blocks. Perfect fidelity occurs when C = 0, with 374 blocks (Fig. 1c, Table 5, and Fig. 3, which is published as supporting information on the PNAS web site).
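
Fidelity as defined above is straightforward to compute once each variant has been assigned a block label. The sketch below assumes two parallel lists, the observed SDP of each variant and the SDP label of the block it was assigned to; the names and toy values are illustrative only.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def block_fidelities(
    sdps: Sequence[str], labels: Sequence[str]
) -> List[Tuple[str, int, float]]:
    """Return (block SDP, number of variants, fidelity %) for each block.

    A block is a maximal run of consecutive variants sharing a label;
    fidelity is the percentage of its variants whose SDP equals that label.
    """
    results = []
    start = 0
    for label, run in groupby(labels):
        size = len(list(run))
        matches = sum(1 for s in sdps[start:start + size] if s == label)
        results.append((label, size, 100.0 * matches / size))
        start += size
    return results


# Hypothetical toy input: one "b" variant sits inside an "a" block.
observed = list("aabaaccc")
assigned = list("aaaaaccc")
for label, size, fidelity in block_fidelities(observed, assigned):
    print(label, size, f"{fidelity:.0f}%")
overall = 100.0 * sum(o == a for o, a in zip(observed, assigned)) / len(observed)
print(f"overall: {overall:.1f}%")
```

Reporting both the per-block values and the overall percentage matches the two summaries given in the text (average block fidelity and the share of all variants matching their block's major SDP).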

This analysis shows the difficulty of defining a simple block structure when many strains are considered simultaneously. However, the block structure in Fig. 1b can be explained in terms of a mosaic of phylogenetic trees connecting the strains; the great majority of variants within a block are consistent with the same tree, indicating that Fig. 1b has biological validity. Some blocks share the same tree, with five distinct trees occurring across the region (Fig. 4, which is published as supporting information on the PNAS web site). Gene conversion events might explain some of the alternating tree patterns observed.

Discussion

We report here the most detailed study to date of local polymorphism distribution in multiple inbred mouse strains. We analyzed a 4.8-Mb region on chromosome 1, selected because it contains a QTL influencing anxiety in mice, but we see no reason why the conclusions we draw should not be applicable to other genomic regions. Some of our findings are reminiscent of observations on the distribution of haplotype blocks in the human genome, where a number of mechanisms could explain the observation of long tracts of linkage disequilibrium (21-23). Our data confirm that the distribution of sequence variants in the genomes of common inbred mouse strains is not random, but the data also indicate that there are important limitations to exploiting the haplotype structure for QTL mapping.

First, SNP deserts (regions that have very few SNPs in a comparison between two strains) vary considerably in the frequency of variants, complicating their use to exclude regions from containing QTL. In some cases the strategy may work: We found only 2 SNPs that distinguish A/J and C3H in the entire 4.8 Mb, so a QTL mapped in a cross between A/J and C3H could be excluded from this region. By contrast, there are 38 SNPs that differentiate C3H and C57BL/6 in a 1.3-Mb region (3.3-4.6 Mb); a similar density was found in comparisons between I and C57BL/6. Although it is reasonable to describe SNP distribution as bimodal with some regions of high density and some of low density, there is considerable variation within the high-density regions, with an excess of microdeserts.

To what extent is the current picture of the distribution of SNP deserts based on how densely the genome has been sampled? Our data are for a SNP-dense region, but 64% of the contigs we sequenced contained no variants. Consequently, if M 1-kb segments were sampled at random across a SNP-dense region in the genome, the probability that none contained a SNP in all eight strains would be $0.64^{M}$. If SNPs were 200 kb apart and M = 5, then 10% of the time, a 1-Mb desert would be reported incorrectly. Very long deserts are more likely to be genuine, as the A/J versus C3H comparison shows.
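
The false-desert probability quoted above follows directly from the fraction of variant-free contigs. A minimal sketch, assuming the sampled segments are independent (the function name is ours):

```python
def prob_false_desert(p_empty: float, n_segments: int) -> float:
    """Probability that every one of n independently sampled segments
    contains no variant, given that a fraction p_empty of segments is empty."""
    return p_empty ** n_segments


# Values from the text: 64% of contigs had no variants, M = 5 segments.
print(f"{prob_false_desert(0.64, 5):.3f}")
```

For M = 5 this gives roughly 0.11, in line with the figure of about 10% given in the text; the probability falls quickly as more segments are sampled.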

A second concern is the difficulty of defining an accurate haplotype block structure. Our analysis indicates it is unsafe to assume that a high-fidelity haplotype block exists between markers that share the same SDP. Although variants with the same SDP tend to be clustered together, they do not generally occur in simple blocks. This point is critical for in silico mapping strategies that attempt to correlate phenotypic variation to haplotypes: The presence of a QTL is indicated by finding an SDP block (or haplotype) common to diverse inbred strains that also share a phenotype. If we insist on perfect agreement of SDPs to define a block, then the 5-Mb region contains 374 distinct blocks (Table 5); if the region were fully resequenced, the blocks would likely be further fragmented. Consequently, in silico haplotype mapping based on a sparse marker density will have an unacceptably high false-negative rate for QTL detection.

Table 5. SDP block structure

Furthermore, our results indicate that haplotype analysis may not provide as high a resolution for mapping as some have predicted. Wade and colleagues (1) estimate in a comparison between C57BL/6 and 129 that >90% of the genome can be classified as either high (45 per 10 kb) or low (1.0 per 10 kb) SNP content occurring in segments with an average size of 1.2 Mb. Consequently, a QTL could be mapped into a region of about 2 Mb by using sequence variant information, and combining mapping information from additional strains could further reduce the interval (3, 5).

However, our data argue that increasing the number of strains for QTL mapping would not increase resolution to the expected extent. Assuming that block boundaries occur randomly with a mean block length of L bp and that each strain is independent, the SDP pattern among N strains would be expected to change every L/(N - 1) bp on average. Consequently, if L = 1.2 Mb, mapping resolution with eight strains should be 1.2/(8 - 1) = 0.17 Mb. In fact, all of our 13 blocks are much larger, with a mean length of 4.8/12 = 0.4 Mb, over twice that of Wade and colleagues (1) (the 13 blocks in Fig. 1b were treated as 12 because the first and last blocks were unbounded).
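
The arithmetic behind this comparison can be laid out explicitly. The sketch below simply restates the L/(N - 1) expectation and the observed mean block length from the text; the function name is an illustrative assumption.

```python
def expected_sdp_change_interval(mean_block_length_mb: float, n_strains: int) -> float:
    """Expected distance (Mb) between SDP changes, L / (N - 1), assuming
    random block boundaries and independent strains."""
    return mean_block_length_mb / (n_strains - 1)


expected_mb = expected_sdp_change_interval(1.2, 8)  # predicted resolution
observed_mb = 4.8 / 12                              # mean length of the 12 bounded blocks
print(f"expected {expected_mb:.2f} Mb, observed {observed_mb:.2f} Mb, "
      f"ratio {observed_mb / expected_mb:.1f}")
```

The observed mean block length is a little more than twice the interval predicted under the independence assumption.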

Our observations do not invalidate attempts to map QTL by using the mosaic structure of sequence variation in inbred mouse strains, but they do impose some restrictions on the methods. We argue that successful QTL mapping requires complete sequence information, so that we can avoid using blocks altogether by characterizing any region by the distribution of its SDP frequencies and mapping QTL by trait-SDP association. A QTL would correspond to any region dense in SDPs associated with the trait. Alternatively, by interpreting the block structure as a phylogenetic mosaic, it might be possible to map QTL by using a block-based strategy, constraining any functional variant within a block to be consistent with the block's phylogenetic tree (24).

It might be thought that using complete sequence information from multiple strains would impose an intolerably high significance threshold for detecting QTL, but this is not the case. Statistical power to detect QTL will be affected by the number of independent tests to be performed, which depends on the number of SDPs or trees across the genome rather than the number of variants. Our analysis suggests that only a limited number of SDPs will occur. Although theoretically the number of SDPs is $2^{N-1}$, it is more likely that there will be far fewer, perhaps of the order N, if the strains can be fitted on to a small number of phylogenetic trees. However, we do not yet know how the number of SDPs across the genome depends on the number of strains. Should many SDPs occur, higher mapping resolution would be possible, but at the cost of lower power or more false positives.

The picture of the laboratory mouse genome as a mosaic of internally consistent haplotype blocks might not be the best view from the standpoint of QTL mapping experiments. If a QTL is caused by a single diallelic variant, then all nearby variants with the same SDP will appear to be functional candidates as well. It will be more fruitful for QTL mapping to treat the SDP distribution across the genome in a probabilistic manner, in which regions are characterized by their SDP profiles. The consequences of the haplotype structure presented here for mapping the behavioral QTL in the region are discussed elsewhere. We require a method that can establish the probability that any variant is the QTL and then test that likelihood against others, thereby providing a ranking of QTL sequences for functional investigation.

Acknowledgments

We thank Andrew Morris and Elizabeth Fisher for helpful comments. D.A.K. is funded by the Christopher Welch Trust. This work was supported by a grant from the Wellcome Trust.

Footnotes

* To whom correspondence may be addressed. E-mail: richard.mott{at}well.ox.ac.uk or jf{at}well.ox.ac.uk.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: QTL, quantitative trait locus; SDP, strain distribution pattern; SNP, single-nucleotide polymorphism; CNS, conserved noncoding sequence; Mb, megabase; BAC, bacterial artificial chromosome.

Copyright © 2004, The National Academy of Sciences

References

1. Wade, C. M., Kulbokas, E. J., III, Kirby, A. W., Zody, M. C., Mullikin, J. C., Lander, E. S., Lindblad-Toh, K. & Daly, M. J. (2002) Nature 420, 574-578. pmid:12466852
2. Lindblad-Toh, K., Winchester, E., Daly, M. J., Wang, D. G., Hirschhorn, J. N., Laviolette, J. P., Ardlie, K., Reich, D. E., Robinson, E., Sklar, P., et al. (2000) Nat. Genet. 24, 381-386. pmid:10742102
3. Wiltshire, T., Pletcher, M. T., Batalov, S., Barnes, S. W., Tarantino, L. M., Cooke, M. P., Wu, H., Smylie, K., Santrosyan, A., Copeland, N. G., et al. (2003) Proc. Natl. Acad. Sci. USA 100, 3380-3385. pmid:12612341
4. Beck, J. A., Lloyd, S., Hafezparast, M., Lennon-Pierce, M., Eppig, J. T., Festing, M. F. & Fisher, E. M. (2000) Nat. Genet. 24, 23-25. pmid:10615122
5. Hitzemann, R., Malmanger, B., Cooper, S., Coulombe, S., Reed, C., Demarest, K., Koyner, J., Cipp, L., Flint, J., Talbot, C., et al. (2002) Genes Brain Behav. 1, 214-222. pmid:12882366
6. Flint, J. & Mott, R. (2001) Nat. Rev. Genet. 2, 438-445.
7. Darvasi, A. & Soller, M. (1997) Behav. Genet. 27, 125-132. pmid:9145551
8. Grupe, A., Germer, S., Usuka, J., Aud, D., Belknap, J. K., Klein, R. F., Ahluwalia, M. K., Higuchi, R. & Peltz, G. (2001) Science 292, 1915-1918. pmid:11397946
9. Wall, J. D. & Pritchard, J. K. (2003) Nat. Rev. Genet. 4, 587-597. pmid:12897771
10. Mural, R. J., Adams, M. D., Myers, E. W., Smith, H. O., Miklos, G. L., Wides, R., Halpern, A., Li, P. W., Sutton, G. G., Nadeau, J., et al. (2002) Science 296, 1661-1671. pmid:12040188
11. Gregory, S. G., Sekhon, M., Schein, J., Zhao, S., Osoegawa, K., Scott, C. E., Evans, R. S., Burridge, P. W., Cox, T. V., Fox, C. A., et al. (2002) Nature 418, 743-750. pmid:12181558
12. Osoegawa, K., Tateno, M., Woon, P. Y., Frengen, E., Mammoser, A. G., Catanese, J. J., Hayashizaki, Y. & de Jong, P. J. (2000) Genome Res. 10, 116-128. pmid:10645956
13. Talbot, C. J., Nicod, A., Cherny, S. S., Fulker, D. W., Collins, A. C. & Flint, J. (1999) Nat. Genet. 21, 305-308. pmid:10080185
14. Bochukova, E. G., Jefferson, A., Francis, M. J. & Monaco, A. P. (2003) Genomics 81, 531-542. pmid:12782122
15. Flint, J., Thomas, K., Micklem, G., Raynham, H., Clark, K., Doggett, N. A., King, A. & Higgs, D. R. (1997) Nat. Genet. 15, 252-257. pmid:9054936
16. Ewing, B., Hillier, L., Wendl, M. C. & Green, P. (1998) Genome Res. 8, 175-185. pmid:9521921
17. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410. pmid:2231712
18. Okazaki, Y., Furuno, M., Kasukawa, T., Adachi, J., Bono, H., Kondo, S., Nikaido, I., Osato, N., Saito, R., Suzuki, H., et al. (2002) Nature 420, 563-573. pmid:12466851
19. Mott, R., Talbot, C. J., Turri, M. G., Collins, A. C. & Flint, J. (2000) Proc. Natl. Acad. Sci. USA 97, 12649-12654. pmid:11050180
20. Zhang, K., Deng, M., Chen, T., Waterman, M. S. & Sun, F. (2002) Proc. Natl. Acad. Sci. USA 99, 7335-7339. pmid:12032283
21. Cardon, L. R. & Abecasis, G. R. (2003) Trends Genet. 19, 135-140. pmid:12615007
22. Phillips, M. S., Lawrence, R., Sachidanandam, R., Morris, A. P., Balding, D. J., Donaldson, M. A., Studebaker, J. F., Ankener, W. M., Alfisi, S. V., Kuo, F. S., et al. (2003) Nat. Genet. 33, 382-387. pmid:12590262
23. Gabriel, S. B., Schaffner, S. F., Nguyen, H., Moore, J. M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., et al. (2002) Science 296, 2225-2229. pmid:12029063
24. Templeton, A. R., Weiss, K. M., Nickerson, D. A., Boerwinkle, E. & Sing, C. F. (2000) Genetics 156, 1259-1275. pmid:11063700