Abstract

Cellular genetic heterogeneity is common in many biological conditions, including cancer, microbiomes, and co-infection by multiple pathogens. Detecting and phasing minor variants is instrumental in deciphering this heterogeneity, but both tasks remain difficult because of technological limitations. Recently, long-read sequencing technologies, including those from Pacific Biosciences and Oxford Nanopore, have provided an opportunity to tackle these challenges. However, their high error rates make it difficult to take full advantage of them. To fill this gap, we introduce iGDA, an open-source tool that can accurately detect and phase minor single-nucleotide variants (SNVs) with frequencies as low as 0.2% from raw long-read sequencing data. We also demonstrate that iGDA can accurately reconstruct haplotypes of closely related strains of the same species (divergence ≥0.011%) from long-read metagenomic data.

Cellular genetic heterogeneity is common across biological conditions, yet application of long-read sequencing to this subject is limited by error rates. Here, the authors present iGDA, a tool for detection and phasing of minor variants from long-read sequencing data, allowing accurate reconstruction of haplotypes.

Details

Title
Detecting and phasing minor single-nucleotide variants from long-read sequencing data
Author
Feng, Zhixing 1; Clemente, Jose C 1; Wong, Brandon 2; Schadt, Eric E 3

1 Icahn School of Medicine at Mount Sinai, Icahn Institute for Data Science and Genomic Technology, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351); Icahn School of Medicine at Mount Sinai, Department of Genetics and Genomic Sciences, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351)
2 Johns Hopkins University, Department of Biomedical Engineering, Baltimore, USA (GRID:grid.21107.35) (ISNI:0000 0001 2171 9311)
3 Icahn School of Medicine at Mount Sinai, Icahn Institute for Data Science and Genomic Technology, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351); Icahn School of Medicine at Mount Sinai, Department of Genetics and Genomic Sciences, New York, USA (GRID:grid.59734.3c) (ISNI:0000 0001 0670 2351); Sema4, Stamford, USA (GRID:grid.59734.3c)
Publication year
2021
Publication date
2021
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2531421045
Copyright
© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.