Abstract

The increase in available high-throughput molecular data creates computational challenges for the identification of cancer genes. Genetic as well as non-genetic causes contribute to tumorigenesis, and this necessitates the development of predictive models to effectively integrate different data modalities while being interpretable. We introduce EMOGI, an explainable machine learning method based on graph convolutional networks to predict cancer genes by combining multiomics pan-cancer data—such as mutations, copy number changes, DNA methylation and gene expression—together with protein–protein interaction (PPI) networks. EMOGI was on average more accurate than other methods across different PPI networks and datasets. We used layer-wise relevance propagation to stratify genes according to whether their classification was driven by the interactome or any of the omics levels, and to identify important modules in the PPI network. We propose 165 novel cancer genes that do not necessarily harbour recurrent alterations but interact with known cancer genes, and we show that they correspond to essential genes from loss-of-function screens. We believe that our method can open new avenues in precision oncology and be applied to predict biomarkers for other complex diseases.

Identifying cancer driver genes from high-throughput genomic data is important for understanding the molecular basis of cancer and for developing new treatments, including precision medicine. To tackle this challenge, EMOGI, a new deep learning method based on graph convolutional networks, was developed; it combines protein–protein interaction networks with multiomics datasets.
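
As an illustration of the general idea of a graph convolution over a PPI network with per-gene omics features, the following minimal sketch may help. It is not the EMOGI implementation: the symmetric normalization with self-loops, the two-layer layout, the random weights, and the toy 4-gene network are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Add self-loops and symmetrically normalize: D^-1/2 (A + I) D^-1/2.
    This normalization is a common GCN choice and an assumption here."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, features, weights):
    """One graph convolution: mix each gene's features with its neighbours', then ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)

def predict_scores(adj, features, w1, w2):
    """Two-layer sketch returning one score in [0, 1] per gene (hypothetical layout)."""
    a_norm = normalize_adjacency(adj)
    hidden = gcn_layer(a_norm, features, w1)
    logits = (a_norm @ hidden @ w2).ravel()
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

# Toy PPI network: 4 genes connected in a chain (made-up data).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Toy feature matrix: one row per gene; columns stand in for mutation rate,
# copy number change, DNA methylation and gene expression (random values).
rng = np.random.default_rng(0)
features = rng.random((4, 4))

# Untrained random weights, for shape illustration only.
w1 = rng.normal(size=(4, 8))
w2 = rng.normal(size=(8, 1))

scores = predict_scores(adj, features, w1, w2)
print(scores.shape)
```

In this sketch, each gene's score depends on its own omics features and on those of its interaction partners, which is the property that lets a network-based model flag genes that interact with known cancer genes without carrying recurrent alterations themselves.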

Details

Title
Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms
Author
Schulte-Sasse, Roman 1; Budach, Stefan 1; Hnisz, Denes 1; Marsico, Annalisa 2

1 Max Planck Institute for Molecular Genetics, Berlin, Germany (GRID:grid.419538.2) (ISNI:0000 0000 9071 0620)
2 Max Planck Institute for Molecular Genetics, Berlin, Germany (GRID:grid.419538.2) (ISNI:0000 0000 9071 0620); Helmholtz Zentrum Munich, German Research Centre for Environmental Health, Institute for Computational Biology, Munich, Germany (GRID:grid.4567.0) (ISNI:0000 0004 0483 2525)
Pages
513-526
Publication year
2021
Publication date
Jun 2021
Publisher
Nature Publishing Group
e-ISSN
2522-5839
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2622642646
Copyright
© The Author(s), under exclusive licence to Springer Nature Limited 2021.