Abstract

Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza measurements enabled solicitation, training, and prospective evaluation of diverse deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine-tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.

Competing Interest Statement

Stanford University is filing patent applications based on concepts described in this paper. R.D. is a cofounder of Inceptive.

Details

Title
Ribonanza: deep learning of RNA structure through dual crowdsourcing
Author
He, Shujun; Huang, Rui; Townley, Jill; Kretsch, Rachael C; Karagianes, Thomas G; Cox, David BT; Blair, Hamish; Penzar, Dmitry; Vyaltsev, Valeriy; Aristova, Elizaveta; Zinkevich, Arsenii; Bakulin, Artemy; Sohn, Hoyeol; Krstevski, Daniel; Fukui, Takaaki; Tatematsu, Fumiya; Uchida, Yusuke; Jang, Donghoon; Lee, Jun Seong; Shieh, Roger; Ma, Tom; Martynov, Eduard; Shugaev, Maxim V; Bukhari, Habib ST; Fujikawa, Kazuki; Onodera, Kazuki; Henkel, Christof; Shlomo, Ron; Romano, Jonathan; Nicol, John J; Nye, Grace P; Wu, Yuan; Choe, Christian; Reade, Walter; Eterna Participants; Das, Rhiju
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2024
Publication date
Feb 27, 2024
Publisher
Cold Spring Harbor Laboratory Press
Source type
Working Paper
Language of publication
English
ProQuest document ID
2932307765
Copyright
© 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.