Abstract

Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, potentially spanning entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based, and to date there has been little focus on reference-free transcriptome assembly. We introduce RNA-Bloom2 (https://github.com/bcgsc/RNA-Bloom), a reference-free assembly method for long-read transcriptome sequencing data. Using simulated datasets and spike-in control data, we show that the transcriptome assembly quality of RNA-Bloom2 is competitive with that of reference-based methods. Furthermore, we find that RNA-Bloom2 requires 27.0 to 80.6% of the peak memory and 3.6 to 10.8% of the total wall-clock runtime of a competing reference-free method. Finally, we showcase RNA-Bloom2 in assembling a transcriptome sample of Picea sitchensis (Sitka spruce). Since our method does not rely on a reference, it further sets the groundwork for large-scale comparative transcriptomics where high-quality draft genome assemblies are not readily available.

Most existing long-read transcriptome assembly methods rely on reference genomes and transcript annotations, while reference-free methods remain scarce. Here, Nip et al. introduce RNA-Bloom2, a reference-free method that requires substantially less memory and runtime than other reference-free methods.

Details

Title
Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2
Author
Nip, Ka Ming 1; Hafezqorani, Saber 1; Gagalova, Kristina K. 1; Chiu, Readman 2; Yang, Chen 1; Warren, René L. 2; Birol, Inanc 3

1 Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada (GRID:grid.434706.2) (ISNI:0000 0004 0410 5424); University of British Columbia, Bioinformatics Graduate Program, Vancouver, Canada (GRID:grid.17091.3e) (ISNI:0000 0001 2288 9830)
2 Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada (GRID:grid.434706.2) (ISNI:0000 0004 0410 5424)
3 Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada (GRID:grid.434706.2) (ISNI:0000 0004 0410 5424); University of British Columbia, Department of Medical Genetics, Vancouver, Canada (GRID:grid.17091.3e) (ISNI:0000 0001 2288 9830)
Pages
2940
Publication year
2023
Publication date
2023
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2817278062
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.