Abstract

Long-read sequencing allows the analysis of single nucleic-acid molecules and produces sequences on the order of tens to hundreds of kilobases. Its application to whole-genome analyses allows identification of complex genomic structural variants (SVs) with unprecedented resolution. SV identification, however, requires complex computational methods, based on either read-depth or intra- and inter-alignment signature approaches, which are limited by the size or type of SVs they can detect. Moreover, most currently available tools only detect germline variants, thus requiring separate computation of sample pairs for comparative analyses. To overcome these limitations, we developed a novel tool (Germline And SOmatic structuraL varIants detectioN and gEnotyping; GASOLINE) that groups SV signatures using a clustering procedure based on a modified reciprocal overlap criterion, and is designed to identify germline SVs from single samples and somatic SVs from paired test and control samples. GASOLINE is a collection of Perl, R and Fortran codes; it analyzes aligned data in BAM format and produces VCF files with statistically significant somatic SVs. Germline or somatic analysis of 30× sequencing coverage experiments requires 4–5 h with 20 threads. GASOLINE outperformed currently available methods in the detection of both germline and somatic SVs in synthetic and real long-read datasets. Notably, when applied to a paired metastatic melanoma and matched-normal sample, GASOLINE identified five genuine somatic SVs that were missed using five different sequencing technologies and state-of-the-art SV calling approaches. Thus, GASOLINE identifies germline and somatic SVs with unprecedented accuracy and resolution, outperforming currently available state-of-the-art whole-genome sequencing (WGS) long-read computational methods.
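
As an illustration of the clustering idea mentioned in the abstract, the following minimal sketch groups SV signatures by the standard reciprocal overlap criterion. The abstract does not specify how GASOLINE modifies this criterion, so the unmodified form is used here; the function names, the greedy grouping strategy and the 0.5 threshold are assumptions made for illustration and are not taken from GASOLINE itself.

```python
from typing import List, Tuple

# A signature is modeled as a half-open interval (start, end) with end > start.
# This representation is an assumption for illustration only.
Interval = Tuple[int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions (overlap/len(a), overlap/len(b))."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[1] - a[0]), overlap / (b[1] - b[0]))


def cluster_signatures(
    signatures: List[Interval], min_ro: float = 0.5
) -> List[List[Interval]]:
    """Greedily assign each signature to the first cluster whose first member
    reaches the reciprocal-overlap threshold; otherwise start a new cluster."""
    clusters: List[List[Interval]] = []
    for sig in sorted(signatures):
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], sig) >= min_ro:
                cluster.append(sig)
                break
        else:
            clusters.append([sig])
    return clusters


if __name__ == "__main__":
    # Hypothetical signatures from reads supporting two distinct events.
    example = [(100, 200), (110, 210), (500, 900), (105, 195), (520, 880)]
    for group in cluster_signatures(example):
        print(group)
```

Requiring the overlap to be large relative to both intervals prevents a short signature from being merged into a much longer one that merely contains it, which is the usual reason reciprocal overlap is preferred over simple overlap when grouping SV evidence.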

Details

Title
GASOLINE: detecting germline and somatic structural variants from long-reads data
Author
Magi, Alberto 1 ; Mattei, Gianluca 2 ; Mingrino, Alessandra 3 ; Caprioli, Chiara 4 ; Ronchini, Chiara 5 ; Frigè, Gianmaria 4 ; Semeraro, Roberto 3 ; Baragli, Marta 2 ; Bolognini, Davide 3 ; Colombo, Emanuela 4 ; Mazzarella, Luca 5 ; Pelicci, Pier Giuseppe 4 

1 University of Florence, Department of Information Engineering, Florence, Italy (GRID:grid.8404.8) (ISNI:0000 0004 1757 2304); National Research Council, Institute for Biomedical Technologies, Milan, Italy (GRID:grid.5326.2) (ISNI:0000 0001 1940 4177)
2 University of Florence, Department of Information Engineering, Florence, Italy (GRID:grid.8404.8) (ISNI:0000 0004 1757 2304)
3 University of Florence, Department of Experimental and Clinical Medicine, Florence, Italy (GRID:grid.8404.8) (ISNI:0000 0004 1757 2304)
4 IEO European Institute of Oncology IRCCS, Department of Experimental Oncology, Milan, Italy (GRID:grid.15667.33) (ISNI:0000 0004 1757 0843); University of Milan, Department of Oncology and Hemato-Oncology, Milan, Italy (GRID:grid.4708.b) (ISNI:0000 0004 1757 2822)
5 IEO European Institute of Oncology IRCCS, Department of Experimental Oncology, Milan, Italy (GRID:grid.15667.33) (ISNI:0000 0004 1757 0843)
Pages
20817
Publication year
2023
Publication date
2023
Publisher
Nature Publishing Group
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2894183846
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.