To begin deciphering the structure of the cell, we need to integrate multiple types of data on subcellular organization that span different scales. Such integration requires handling multiple data modalities as well as missing data. To this end, we developed MIRAGE, a multi-modal generative model for integrating protein sequence, protein-protein interaction, and protein localization data. Our approach learns a joint embedding space that captures the complex relationships between these diverse modalities. We evaluate our model against existing methods and obtain superior performance on several key tasks, including protein function prediction and module detection. MIRAGE source code is available at https://github.com/raminass/MIRAGE.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
* School name updated: Computer Science -> Computer Science and AI