Abstract

As ever-larger cohorts of human genomes are collected in pursuit of genotype/phenotype associations, sequencing informatics must scale up to yield complete and accurate genotypes from vast raw datasets. Joint variant calling, a data processing step entailing simultaneous analysis of all participants sequenced, exhibits this scaling challenge acutely. We present GLnexus (GL, Genotype Likelihood), a system for joint variant calling designed to scale up to the largest foreseeable human cohorts. GLnexus combines scalable joint calling algorithms with a persistent database that grows efficiently as additional participants are sequenced. We validate GLnexus on 50,000 exomes, showing that it produces results comparable to or better than those of existing methods at a fraction of the computational cost and with better scaling. We provide a standalone open-source version of GLnexus and a DNAnexus cloud-native deployment supporting very large projects, which has been employed for cohorts of >240,000 exomes and >22,000 whole genomes.

Details

Title
GLnexus: joint variant calling for large cohort sequencing
Author
Lin, Michael F; Rodeh, Ohad; Penn, John; Bai, Xiaodong; Krasheninina, Olga; Salerno, William J; Reid, Jeffrey G
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2018
Publication date
Jun 11, 2018
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
2068561954
Copyright
© 2018. This article is published under http://creativecommons.org/licenses/by/4.0/ ("the License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.