Abstract

To begin deciphering the structure of the cell, we need to integrate multiple types of data on subcellular organization, collected at different scales. Such integration requires handling multiple data modalities as well as missing data. To this end, we developed MIRAGE, a multi-modal generative model for integrating protein sequence, protein-protein interaction, and protein localization data. Our approach learns a joint embedding space that captures the relationships between these modalities. We evaluate the model against existing methods and obtain superior performance in several key tasks, including protein function prediction and module detection. MIRAGE source code is available at https://github.com/raminass/MIRAGE.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* School name updated from "Computer Science" to "Computer Science and AI".

Details

Title
An adversarial scheme for integrating multi-modal data on protein function
Publication title
bioRxiv; Cold Spring Harbor
Publication year
2025
Publication date
Jan 21, 2025
Section
New Results
Publisher
Cold Spring Harbor Laboratory Press
Source
BioRxiv
Place of publication
Cold Spring Harbor
Country of publication
United States
University/institution
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
Document type
Working Paper
Publication history
Milestone dates
2025-01-20 (Version 1)
ProQuest document ID
3157267878
Document URL
https://www.proquest.com/working-papers/adversarial-scheme-integrating-multi-modal-data/docview/3157267878/se-2?accountid=208611
Copyright
© 2025. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-01-22
Database
ProQuest One Academic