Abstract

As ever-larger cohorts of human genomes are collected in pursuit of genotype/phenotype associations, sequencing informatics must scale up to yield complete and accurate genotypes from vast raw datasets. Joint variant calling, a data processing step that entails simultaneous analysis of all sequenced participants, exhibits this scaling challenge acutely. We present GLnexus (GL, Genotype Likelihood), a system for joint variant calling designed to scale up to the largest foreseeable human cohorts. GLnexus combines scalable joint calling algorithms with a persistent database that grows efficiently as additional participants are sequenced. We validate GLnexus on 50,000 exomes, showing that it produces results comparable to or better than those of existing methods at a fraction of the computational cost and with better scaling. We provide a standalone open-source version of GLnexus and a DNAnexus cloud-native deployment supporting very large projects, which has been used for cohorts of more than 240,000 exomes and more than 22,000 whole genomes.

Details

Title
GLnexus: joint variant calling for large cohort sequencing
Author
Lin, Michael F; Rodeh, Ohad; Penn, John; Bai, Xiaodong; Krasheninina, Olga; Salerno, William J; Reid, Jeffrey G
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2018
Publication date
Jun 11, 2018
Publisher
Cold Spring Harbor Laboratory Press
Source type
Working Paper
Language of publication
English
ProQuest document ID
2068561954
Copyright
© 2018. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.