Abstract

The Genome Analysis Toolkit (GATK) is commonly used for variant calling of single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) from short-read sequencing data aligned against a reference genome. There have been a number of variant calling comparisons against GATK, but an equally comprehensive comparison for VarScan not yet been performed. More specifically, we compared four lists of variants called by GATK (using the UnifiedGenotyper and the HaplotypeCaller algorithms, with and without filtering low quality variants) and three lists of variants called using VarScan (with varying sets of parameters). Variant calling was performed on three datasets (1 targeted exon dataset and 2 exome datasets), each with approximately a dozen subjects. We found that running VarScan with a conservative set of parameters (referred to as “VarScan-Cons”) resulted in a high quality gene list, with high concordance (>97%) when compared to high quality variants called by the GATK UnifiedGenotyper and HaplotypeCaller. These conservative parameters result in decreased sensitivity, but the VarScan-Cons variant list could still recover 84-88% of the high-quality GATK SNPs in the exome datasets. We also assessed the impact of pre-processing (e.g., indel realignment and quality score base recalibration using GATK). In most cases, these pre-processing steps had only a modest impact on the variant calls, but the importance of the pre-processing steps varied between datasets and variant callers. More broadly, we believe the metrics used for comparison in this study can be useful in assessing the quality of variant calls in the context of a specific experimental design. As an example, a limited number of variant calling comparisons are also performed on two additional variant callers.

Details

Title
Detailed comparison of two popular variant calling packages for exome and targeted exon studies
Author
Warden, Charles D; Adamson, Aaron W; Neuhausen, Susan L; Wu, Xiwei
Publication year
2014
Publication date
Jun 12, 2014
Publisher
PeerJ, Inc.
e-ISSN
21679843
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1961296477
Copyright
© 2014 Warden et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ PrePrints) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.