About the Authors:
Robert J. Elshire
Affiliation: Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
Jeffrey C. Glaubitz
Affiliation: Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
Qi Sun
Affiliation: Computational Biology Service Unit, Cornell University, Ithaca, New York, United States of America
Jesse A. Poland
Affiliation: Hard Winter Wheat Genetics Research Unit, United States Department of Agriculture/Agricultural Research Service, Manhattan, Kansas, United States of America
Ken Kawamoto
Affiliation: Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
Edward S. Buckler
Affiliations: Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America; Plant, Soil and Nutrition Research Unit, United States Department of Agriculture/Agricultural Research Service, Ithaca, New York, United States of America
Sharon E. Mitchell
* E-mail: [email protected]
Affiliation: Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
Introduction
During the last decade, extensive public resources were dedicated to genotyping humans, a species with relatively low genetic diversity (about one substitution per thousand nucleotides) [1]–[3]. However, many species, including maize [4], [5], Drosophila [6], and some bacteria [7], are at least 10 times more diverse than humans (more than one substitution per hundred nucleotides). In addition to its high nucleotide diversity, the maize genome exhibits frequent transposon-mediated rearrangements that produce extensive presence/absence variation, often encompassing genic regions [8]–[10]. Standard fixed-sequence approaches, such as single base extension assays or microarrays, require invariant primer binding sites to give consistent results, and such invariant regions are often difficult to find in maize [11]. Furthermore, this large-scale structural variation complicates DNA sequence alignment, resulting in a maize “reference” genome that contains only 70% or less of the species-wide genome space [12].
Although abundant diversity is a challenge to assays that rely on scoring fixed positions, it is advantageous to direct sequencing approaches because sequencing efficiency for genotyping scales directly with genetic diversity. We have developed a technically simple, highly multiplexed, genotyping-by-sequencing (GBS) approach that is suitable for population studies, germplasm characterization, breeding, and trait mapping in diverse organisms. This procedure, which can be generalized to any species at a low per-sample cost, is based on high-throughput, next-generation sequencing of genomic subsets targeted by restriction enzymes...
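The claim that sequencing efficiency scales with genetic diversity can be illustrated with a rough back-of-the-envelope calculation. The sketch below is only an illustration, not part of the published method: it assumes a hypothetical usable read length of 64 bp and treats the diversity figures quoted above (about one substitution per thousand nucleotides for humans, and one per hundred as a lower bound for maize) as uniform per-base rates. The function name `expected_variant_sites` is invented for this example.

```python
def expected_variant_sites(read_length_bp: int, substitutions_per_bp: float) -> float:
    """Expected number of variable sites covered by one read.

    Simplifying assumption: substitutions are spread uniformly along the
    genome, so the expectation is read length times the per-base rate.
    """
    return read_length_bp * substitutions_per_bp


# Hypothetical read length, chosen only for illustration.
READ_LENGTH_BP = 64

# Diversity figures taken from the text above.
HUMAN_DIVERSITY = 1 / 1000  # about one substitution per thousand nucleotides
MAIZE_DIVERSITY = 1 / 100   # lower bound: more than one per hundred nucleotides

human_sites = expected_variant_sites(READ_LENGTH_BP, HUMAN_DIVERSITY)
maize_sites = expected_variant_sites(READ_LENGTH_BP, MAIZE_DIVERSITY)

# Under these assumptions, each read is about ten times more informative
# in maize than in humans, whatever read length is chosen.
informativeness_ratio = maize_sites / human_sites

print(f"human: {human_sites:.3f} expected variable sites per read")
print(f"maize: {maize_sites:.3f} expected variable sites per read")
print(f"ratio: {informativeness_ratio:.1f}")
```

Because the ratio does not depend on the assumed read length, the same sequencing effort would be expected to yield roughly ten times as many variable sites in a species that is ten times more diverse.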