Abstract

The advent of high-throughput sequencing technologies has made it possible to obtain large volumes of genetic information quickly and inexpensively. As a result, many efforts are devoted to unveiling the biological roles of genomic elements, and distinguishing protein-coding from long non-coding RNAs is one of the most important of these tasks. We describe RNAsamba, a tool that predicts the coding potential of RNA molecules from sequence information using a neural network that models both the whole sequence and the ORF to identify patterns that distinguish coding from non-coding transcripts. We evaluated RNAsamba’s classification performance using transcripts from humans and several other model organisms and show that it consistently outperforms other state-of-the-art methods. Our results also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, indicating that its algorithm does not depend on complete transcript sequences. Furthermore, RNAsamba can also predict small ORFs, which are traditionally identified with ribosome profiling experiments. We believe that RNAsamba will enable faster and more accurate biological findings from genomic data of species that are being sequenced for the first time. A user-friendly web interface, documentation with instructions for local installation and usage, and the source code of RNAsamba can be found at https://rnasamba.lge.ibi.unicamp.br/.
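
To make the notion of "the ORF" concrete, the following is a minimal sketch of how a longest open reading frame could be extracted from a transcript before being passed to a classifier. It is an illustration only and is not taken from RNAsamba: the function name longest_orf, the restriction to the three forward frames, and the requirement of an ATG start followed by an in-frame stop codon are all assumptions made for this example. In particular, it ignores the partial-length ORFs that the abstract says RNAsamba can handle.

```python
# Hypothetical helper, not part of RNAsamba: finds the longest complete ORF
# (ATG ... in-frame stop) in the three forward reading frames.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG-to-stop ORF, or an empty string if none exists."""
    # Accept RNA or DNA input by normalising to an uppercase DNA alphabet.
    seq = seq.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None  # index of the currently open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None  # close this ORF and look for the next start
    return best


if __name__ == "__main__":
    print(longest_orf("CCAUGAAAUAGGG"))
```

Under these assumptions, the transcript and the extracted ORF would form the two inputs that the abstract describes as being modelled together. A real pipeline would also need to decide how to treat transcripts with no complete ORF, which this sketch simply reports as an empty string.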

Details

Title
RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences
Author
Camargo, Antonio P (1); Sourkov, Vsevolod (2); Pereira, Gonçalo A G (1); Carazzolle, Marcelo F (1)

(1) Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, University of Campinas, Campinas, SP, 13083-862, Brazil
(2) Department of Computer Science, ReDNA Labs, Pattaya, Chonburi, 20150, Thailand
Publication year
2020
Publication date
Mar 2020
Publisher
Oxford University Press
e-ISSN
2631-9268
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3170915430
Copyright
© The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.