Abstract

RNAs constitute a vast reservoir of mostly untapped drug targets. Structure-based virtual screening (VS) methods screen large compound libraries to identify promising candidate molecules by conditioning on binding site information. The classical approach relies on molecular docking simulations. However, this strategy does not scale well with the size of small molecule databases and the number of potential RNA targets. Machine learning has emerged as a promising technology to resolve this bottleneck. Efficient data-driven VS methods have already been introduced for proteins, but these techniques have not yet been developed for RNAs due to limited dataset sizes and a lack of practical use-case evaluation. We propose a data-driven VS pipeline that addresses the unique challenges of RNA molecules through coarse-grained modeling of 3D structures and heterogeneous training regimes using synthetic data augmentation and RNA-centric self-supervision. We report strong prediction and generalizability of our framework, ranking active compounds among inactives in the top 2.8% on average on a structurally distinct drug-like test set. These predictions are sensitive but robust to pocket alterations, opening the door to the model's use on the outputs of binding site detection methods. Our model results in a ten-thousand-fold speedup over docking techniques while obtaining higher performance. Finally, we deploy our model on a recently published in vitro small molecule microarray experiment with 20,000 compounds and report a mean enrichment factor at 1% of 2.93 on four unseen RNA riboswitches. To our knowledge, this is the first experimental evidence of success for structure-based deep learning methods in RNA virtual screening. Our source code and data, as well as a Google Colab notebook for inference, are available on GitHub.
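
For readers unfamiliar with the enrichment factor reported above, the following is a minimal sketch of how an enrichment factor at a given fraction is commonly computed. It assumes the standard definition (hit rate among the top-ranked fraction divided by the hit rate in the whole library); the function name `enrichment_factor` and its parameters are hypothetical and are not taken from the RNAmigos2 code base, whose exact definition may differ.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at `fraction` under the common definition (an assumption):
    (actives in top fraction / size of top fraction) / (all actives / library size).

    scores    -- predicted score per compound, higher means ranked earlier
    is_active -- truthy for experimentally active compounds
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    if n_total == 0:
        raise ValueError("empty library")

    # Number of compounds in the top fraction (at least one).
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compounds from best to worst predicted score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)

    hits_top = sum(1 for _, active in ranked[:n_top] if active)
    hits_total = sum(1 for active in is_active if active)
    if hits_total == 0:
        raise ValueError("no active compounds in the library")

    return (hits_top / n_top) / (hits_total / n_total)
```

Under this definition, a random ranking gives a value close to 1 in expectation, so a value above 1 means actives are concentrated at the top of the ranked list more than chance would predict.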

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* Corrected a normalization error in Fig 5. Added dock6 to the docking benchmark and included the pocket perturbation experiment (Fig 3c, d).

* https://github.com/cgoliver/rnamigos2

Details

Title
RNAmigos2: Fast and accurate structure-based RNA virtual screening with semi-supervised graph learning and large-scale docking data
Author
Carvajal-Patino, Juan G; Mallet, Vincent; Becerra, David; Nino, L Fernando, V; Oliver, Carlos; Waldispuhl, Jerome
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2024
Publication date
Nov 23, 2024
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
2892798172
Copyright
© 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.