© 2012 He et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Background

MicroRNAs are generated from primary transcripts mainly through sequential cleavage by two enzymes, Drosha and Dicer. The sequence of a mature microRNA, especially its ‘seed sequence’, largely determines its binding ability and specificity for target mRNAs. Methods that predict mature microRNA sequences with high accuracy will therefore aid the identification and characterization of novel microRNAs and their targets, and contribute to inferring the post-transcriptional regulatory network at genome scale.

Methodology/Principal Findings

We have developed a method, MiRmat, to predict the mature microRNA sequence. MiRmat consists of two parts: prediction of the Drosha processing site and identification of the Dicer processing site. Based on an analysis of microRNAs from 12 species, we found that the patterns of free energy profiles are conserved among vertebrate microRNA hairpins. We therefore combined the free energy distribution pattern of the downstream part of the pri-microRNA secondary structure with the Random Forest algorithm to predict the mature microRNA sequence. In an evaluation on an independent test dataset from 10 vertebrates, MiRmat identified 77.8% of the Drosha processing sites and 92.8% of the Dicer sites within a deviation of 2 nt. In a more stringent evaluation that excluded microRNAs whose family was shared between the training set and the test set, MiRmat maintained good performance, with identification rates of 71.9% and 87.2% for the Drosha and Dicer sites respectively, indicating that it can handle novel microRNA families. MiRmat outperforms other state-of-the-art methods and is highly effective at predicting the mature microRNA sequences of vertebrates.
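
The evaluation criterion described above (a site counts as identified when the prediction falls within 2 nt of the annotated position, optionally after removing test microRNAs whose family also appears in the training set) can be illustrated with a minimal sketch. This is not MiRmat's code: the function names, the family labels, and all positions below are hypothetical and serve only to show how such an identification rate could be computed.

```python
from typing import List, Sequence


def identification_rate(predicted: Sequence[int],
                        annotated: Sequence[int],
                        max_dev: int = 2) -> float:
    """Fraction of sites predicted within max_dev nt of the annotated site."""
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not predicted:
        raise ValueError("at least one site is required")
    hits = sum(1 for p, a in zip(predicted, annotated) if abs(p - a) <= max_dev)
    return hits / len(predicted)


def exclude_shared_families(test_families: Sequence[str],
                            training_families: Sequence[str]) -> List[int]:
    """Indices of test hairpins whose family does not occur in the training set."""
    seen = set(training_families)
    return [i for i, fam in enumerate(test_families) if fam not in seen]


# Made-up cleavage positions (nt from the 5' end of a hairpin), not real data.
predicted = [22, 25, 19, 30, 24]
annotated = [22, 23, 23, 29, 24]
rate = identification_rate(predicted, annotated)

# Hypothetical family labels for the stringent, family-disjoint evaluation.
test_families = ["mir-1", "mir-21", "let-7", "mir-999", "mir-21"]
training_families = ["mir-21", "let-7"]
kept = exclude_shared_families(test_families, training_families)
stringent_rate = identification_rate([predicted[i] for i in kept],
                                     [annotated[i] for i in kept])
```

On this toy input, four of the five predictions lie within 2 nt of the annotation, and only the first and fourth hairpins survive the family filter. The percentages reported in the abstract come from the authors' own datasets and cannot be reproduced from this sketch.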

Conclusion

MiRmat was developed to identify mature microRNA sequence(s) by combining the free energy distribution of the RNA stem-loop structure with the Random Forest algorithm. We show that MiRmat performs better than existing tools and is applicable across vertebrates. MiRmat is freely available at http://mcube.nju.edu.cn/jwang/lab/soft/MiRmat/.

Details

Title
MiRmat: Mature microRNA Sequence Prediction
Author
He, Chenfeng; Li, Ying-Xin; Zhang, Guangxin; Gu, Zuguang; Yang, Rong; Li, Jie; Lu, Zhi John; Zhou, Zhi-Hua; Zhang, Chenyu; Wang, Jin
First page
e51673
Section
Research Article
Publication year
2012
Publication date
Dec 2012
Publisher
Public Library of Science
e-ISSN
1932-6203
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1327217706