About the Authors:
Daniel R. Schrider
Roles Conceptualization, Investigation, Methodology, Software, Writing - original draft, Writing - review & editing
* E-mail: [email protected]
Current address: Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
Affiliations Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America
ORCID http://orcid.org/0000-0001-5249-4151
Julien Ayroles
Roles Data curation, Writing - review & editing
Affiliations Ecology and Evolutionary Biology Department, Princeton University, Princeton, New Jersey, United States of America, Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
Daniel R. Matute
Roles Data curation, Writing - original draft, Writing - review & editing
Affiliation Biology Department, University of North Carolina, Chapel Hill, North Carolina, United States of America
ORCID http://orcid.org/0000-0002-7597-602X
Andrew D. Kern
Roles Conceptualization, Investigation, Methodology, Software, Writing - original draft, Writing - review & editing
Affiliations Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America, Human Genetics Institute of New Jersey, Rutgers University, Piscataway, New Jersey, United States of America
Abstract
Hybridization and gene flow between species appear to be common. Even though it is clear that hybridization is widespread across all surveyed taxonomic groups, the magnitude and consequences of introgression are still largely unknown. Thus, it is crucial to develop the statistical machinery required to uncover which genomic regions have recently acquired haplotypes via introgression from a sister population. We developed a novel machine learning framework called FILET (Finding Introgressed Loci via Extra-Trees), capable of revealing genomic introgression with far greater power than competing methods. FILET works by combining information from a number of population genetic summary statistics, including several new statistics that we introduce, that capture patterns of variation across two populations. We show that FILET is able to identify loci that have experienced gene flow between related species with high accuracy, and in most situations can correctly infer which population was the donor and which was the recipient. Here we describe a data set of outbred diploid Drosophila sechellia genomes, and combine them with data from D. simulans to examine recent introgression between these species using FILET. Although we find that these populations may have split...