Abstract

Disease modules in molecular interaction maps have been useful for characterizing diseases. Yet the biological networks that commonly define such modules are incomplete and biased toward well-studied disease genes. Here we ask whether disease-relevant modules of genes can be discovered without prior knowledge of a biological network, by instead training a deep autoencoder on large transcriptional data. We hypothesize that modules can be discovered within the autoencoder representations. We find a statistically significant enrichment of genes relevant to genome-wide association studies (GWAS) in the last layer, and successively weaker enrichment in the middle and first layers. In contrast, we find an opposite gradient for the modular protein–protein interaction signal, which is strongest in the first layer and then vanishes smoothly deeper in the network. We conclude that a data-driven discovery approach is sufficient to discover groups of disease-related genes.

The study of disease modules facilitates insight into complex diseases, but their identification relies on knowledge of molecular networks. Here, the authors show that disease modules and genes can also be discovered in deep autoencoder representations of large human gene expression datasets.
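
The abstract reports a statistically significant enrichment of GWAS-relevant genes in autoencoder-derived gene groups, but does not say here which test was used. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for gene-set enrichment. The function names, gene identifiers, and set sizes are all hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_disease, n_module, n_overlap):
    """One-sided hypergeometric tail P(X >= n_overlap).

    Models drawing n_module genes without replacement from n_universe genes,
    of which n_disease are disease-associated. This test is an assumption
    made for illustration, not the method confirmed by the abstract.
    """
    max_overlap = min(n_disease, n_module)
    total = comb(n_universe, n_module)
    tail = sum(
        comb(n_disease, k) * comb(n_universe - n_disease, n_module - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def module_enrichment(universe, disease_genes, module):
    """Return (overlap size, p-value) for one candidate module."""
    universe = set(universe)
    disease = set(disease_genes) & universe  # ignore genes outside the universe
    members = set(module) & universe
    overlap = len(disease & members)
    p_value = hypergeom_enrichment_p(
        len(universe), len(disease), len(members), overlap
    )
    return overlap, p_value


# Toy data (hypothetical): 20 genes, 5 disease-associated, a 5-gene module.
universe = [f"G{i}" for i in range(20)]
disease_genes = ["G0", "G1", "G2", "G3", "G4"]
module = ["G0", "G1", "G2", "G3", "G10"]

overlap, p_value = module_enrichment(universe, disease_genes, module)
print(overlap, p_value)
```

In practice one such test would be run per candidate gene group, with a multiple-testing correction across groups; that step is omitted here.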

Details

Title
Deriving disease modules from the compressed transcriptional space embedded in a deep autoencoder
Author
Dwivedi, Sanjiv K 1; Tjärnberg, Andreas 2; Tegnér, Jesper 3; Gustafsson, Mika 1

1 Linköping University, Bioinformatics, Department of Physics, Chemistry and Biology, Linköping, Sweden (GRID:grid.5640.7) (ISNI:0000 0001 2162 9922)
2 Linköping University, Bioinformatics, Department of Physics, Chemistry and Biology, Linköping, Sweden (GRID:grid.5640.7) (ISNI:0000 0001 2162 9922); New York University, Department of Biology, Center For Genomics and Systems Biology, New York, USA (GRID:grid.137628.9) (ISNI:0000 0004 1936 8753); New York University, Center for Developmental Genetics, Department of Biology, New York, USA (GRID:grid.137628.9) (ISNI:0000 0004 1936 8753)
3 King Abdullah University of Science and Technology (KAUST), Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, Thuwal, Saudi Arabia (GRID:grid.45672.32) (ISNI:0000 0001 1926 5090); Karolinska Institutet, Unit of Computational Medicine, Department of Medicine, Solna, Center for Molecular Medicine, Stockholm, Sweden (GRID:grid.4714.6) (ISNI:0000 0004 1937 0626); Science for Life Laboratory, Solna, Sweden (GRID:grid.452834.c)
Publication year
2020
Publication date
2020
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2354096011
Copyright
This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.