© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Simple Summary

Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. We reviewed 50 popular CNV calling tools and included 11 of them for benchmarking in a reference cohort of 39 whole genome sequencing (WGS) samples paired with the current clinical standard, SNP-array-based CNV calling. For nine samples we also performed whole exome sequencing (WES) to address the effect of sequencing protocol on CNV calling. Furthermore, we included the Gold Standard reference sample NA12878 and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Tool performance varied greatly in the number of called CNVs and in bias for CNV lengths. Some tools had near-perfect recall of CNVs from arrays for some samples, but poor precision. We suggest combining the best-performing tools that are also based on different methodologies: GATK gCNV, Lumpy, DELLY, and cn.MOPS.

Abstract

Copy-number variations (CNVs) have important clinical implications for several diseases and cancers. Relevant CNVs are hard to detect because common structural variations define large parts of the human genome. CNV calling from short-read sequencing would allow full genomic profiling with a single protocol. We reviewed 50 popular CNV calling tools and included 11 of them for benchmarking in a reference cohort of 39 whole genome sequencing (WGS) samples paired with the current clinical standard, SNP-array-based CNV calling. For nine samples we also performed whole exome sequencing (WES) to address the effect of sequencing protocol on CNV calling. Furthermore, we included the Gold Standard reference sample NA12878 and tested 12 samples with CNVs confirmed by multiplex ligation-dependent probe amplification (MLPA). Tool performance varied greatly in the number of called CNVs and in bias for CNV lengths. Some tools had near-perfect recall of CNVs from arrays for some samples, but poor precision. Several tools performed better on NA12878, which could be a result of overfitting. We suggest combining the best-performing tools that are also based on different methodologies: GATK gCNV, Lumpy, DELLY, and cn.MOPS. The total number of called variants could potentially be reduced by using background panels to filter out frequently called variants.

Details

Title
A Comparison of Tools for Copy-Number Variation Detection in Germline Whole Exome and Whole Genome Sequencing Data
Author
Gabrielaite, Migle 1; Mathias Husted Torp 1; Malthe Sebro Rasmussen 1; Andreu-Sánchez, Sergio 1; Filipe Garrett Vieira 1; Pedersen, Christina Bligaard 2; Kinalis, Savvas 1; Majbritt Busk Madsen 1; Kodama, Miyako 1; Gül Sude Demircan 1; Simonyan, Arman 1; Yde, Christina Westmose 1; Olsen, Lars Rønn 2; Marvig, Rasmus L 1; Østrup, Olga 1; Rossing, Maria 3; Nielsen, Finn Cilius 1; Winther, Ole 4; Frederik Otzen Bagger 5

1 Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark
2 Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark; Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Ørsteds Pl. 345C, 2800 Kgs. Lyngby, Denmark
3 Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark; Department of Clinical Medicine, University of Copenhagen, 2200 Copenhagen, Denmark
4 Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark; Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark; Section for Cognitive Systems, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Matematiktorvet 303B, 2800 Kgs. Lyngby, Denmark
5 Center for Genomic Medicine, Rigshospitalet, University of Copenhagen, Blegdamsvej 9, 2100 Copenhagen, Denmark; Department of Biomedicine, UKBB Universitats-Kinderspital Basel, 4031 Basel, Switzerland; Swiss Institute of Bioinformatics, Hebelstrasse 20, 4031 Basel, Switzerland
First page
6283
Publication year
2021
Publication date
2021
Publisher
MDPI AG
e-ISSN
20726694
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2612739182