Abstract

The quality control of variants from whole-genome sequencing data is vital in clinical diagnosis and human genetics research. However, current filtering methods (Frequency, Hard-Filter, VQSR, GARFIELD, and VEF) were developed for particular variant callers and have certain limitations. In particular, with these methods the number of true variants eliminated far exceeds the number of false variants removed. Here, we present FVC, an adaptive method for quality control of genetic variants from different analysis pipelines, and validate it on variants generated by four popular variant callers (GATK HaplotypeCaller, Mutect2, VarScan2, and DeepVariant). FVC consistently exhibited the best performance. It removed far more false variants than the current state-of-the-art filtering methods and recalled ~51-99% of the true variants filtered out by the other methods. Once trained, FVC can be conveniently integrated into a user-specific variant calling pipeline.

FVC is a method for filtering variants called from whole-genome sequencing data, for potential use in clinical diagnosis and human genetics research.

Details

Title
FVC as an adaptive and accurate method for filtering variants from popular NGS analysis pipelines
Author
Ren, Yongyong 1; Kong, Yan 1; Zhou, Xiaocheng 2; Genchev, Georgi Z. 3; Zhou, Chao 1; Zhao, Hongyu 4; Lu, Hui 5

1 Shanghai Jiao Tong University, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293); Shanghai Jiao Tong University, SJTU-Yale Joint Center for Biostatistics and Data Science, National Center for Translational Medicine, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293)
2 Shanghai Jiao Tong University, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293)
3 Chulalongkorn University, Research Affairs, Faculty of Medicine, Bangkok, Thailand (GRID:grid.7922.e) (ISNI:0000 0001 0244 7875); Yale University, Department of Biostatistics, New Haven, USA (GRID:grid.47100.32) (ISNI:0000000419368710); Shanghai Children’s Hospital, Center for Biomedical Informatics, Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai, China (GRID:grid.415625.1) (ISNI:0000 0004 0467 3069)
4 Yale University, Department of Biostatistics, New Haven, USA (GRID:grid.47100.32) (ISNI:0000000419368710)
5 Shanghai Jiao Tong University, State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293); Shanghai Jiao Tong University, SJTU-Yale Joint Center for Biostatistics and Data Science, National Center for Translational Medicine, Shanghai, China (GRID:grid.16821.3c) (ISNI:0000 0004 0368 8293); Shanghai Children’s Hospital, Center for Biomedical Informatics, Engineering Research Center for Big Data in Pediatric Precision Medicine, Shanghai, China (GRID:grid.415625.1) (ISNI:0000 0004 0467 3069)
Publication year
2022
Publication date
2022
Publisher
Nature Publishing Group
e-ISSN
23993642
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2715005994
Copyright
© The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.