Mutations that add, subtract, rearrange, or otherwise refashion genome structure often affect phenotypes, although the fragmented nature of most contemporary assemblies obscures them. To discover such mutations, we assembled the first new reference-quality genome of Drosophila melanogaster since its initial sequencing. By comparing this new genome to the existing D. melanogaster assembly, we created a structural variant map of unprecedented resolution and identified extensive genetic variation that has remained hidden until now. Many of these variants constitute candidates underlying phenotypic variation, including tandem duplications and a transposable element insertion that amplifies the expression of detoxification-related genes associated with nicotine resistance. The abundance of important genetic variation that still evades discovery highlights how crucial high-quality reference genomes are to deciphering phenotypes.
Mutations underlying phenotypic variation remain elusive in trait-mapping studies1 despite the exponential accumulation of genomic data, suggesting that many causal variants are invisible to current genotyping approaches2-5. In fact, mutations such as duplications, deletions, and transpositions6,7 are systematically under-represented by standard methods7, even as a consensus emerges that such structural variants (SVs) are important factors in the genetics of complex traits2. Addressing this problem requires compiling an accurate and complete catalog of the genomic features that are relevant to phenotypic variation, a goal most readily achieved by comparing nearly complete high-quality genomes7. Although the development of high-throughput short-read sequencing led to a steep drop in cost and a commensurate increase in the pace of sequencing8, it also led to a focus on single-nucleotide changes and small indels3,9. Paradoxically, this has also resulted in a deterioration of the contiguity and completeness of new genome assemblies, due primarily to read-length limitations10.
Here we present a reference-quality assembly of a second D. melanogaster strain, A4, and introduce a comprehensive map of SVs, which identifies a large amount of hidden variation exceeding that due to SNPs and small indels, and which includes strong candidates to explain complex traits. The A4 strain is part of the Drosophila Synthetic Population Resource (DSPR)11, a resource for mapping phenotypically relevant variants. We assembled the new A4 genome using high-coverage (147x) long reads generated by single-molecule real-time sequencing of DNA extracted from females (Supplementary Fig. 1), following an approach that has been shown to yield complete and contiguous assemblies12. The A4 assembly is more...