© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Long noncoding RNAs (lncRNAs) play critical regulatory roles in human development and disease. Although there are over 100,000 samples with available RNA sequencing (RNA-seq) data, many lncRNAs have yet to be annotated. The conventional approach to identifying novel lncRNAs from RNA-seq data is to find transcripts without coding potential, but this approach has a false discovery rate of 30–75%. Other existing methods either identify only multi-exon lncRNAs, missing single-exon lncRNAs, or require transcriptional initiation profiling data (such as H3K4me3 ChIP-seq data), which are unavailable for many samples with RNA-seq data. Because of these limitations, current methods cannot accurately identify novel lncRNAs from existing RNA-seq data. To address this problem, we have developed software, Flnc, to accurately identify both novel and annotated full-length lncRNAs, including single-exon lncRNAs, directly from RNA-seq data without requiring transcriptional initiation profiles. Flnc integrates machine learning models built by incorporating four types of features: transcript length, promoter signature, multiple exons, and genomic location. Flnc achieves state-of-the-art prediction power with an AUROC score over 0.92. Flnc significantly improves prediction accuracy, from less than 50% with the conventional approach to over 85%. Flnc is available on GitHub.

Details

Title
Flnc: Machine Learning Improves the Identification of Novel Long Noncoding RNAs from Stand-Alone RNA-Seq Data
Author
Li, Zixiu 1 ; Zhou, Peng 1 ; Kwon, Euijin 2 ; Fitzgerald, Katherine A 3 ; Weng, Zhiping 4 ; Chan, Zhou 5

1 Division of Biostatistics and Health Services Research, Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
2 Division of Biostatistics and Health Services Research, Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA; Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
3 Program in Innate Immunity, Division of Infectious Disease and Immunology, Department of Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
4 Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
5 Division of Biostatistics and Health Services Research, Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA; Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA; The RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA; UMass Cancer Center, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
First page
70
Publication year
2022
Publication date
2022
Publisher
MDPI AG
e-ISSN
2311-553X
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2728501902