Abstract

Deep generative models, such as variational autoencoders (VAEs), have gained increasing attention in computational biology due to their ability to capture complex data manifolds that can subsequently be used to achieve better performance in downstream tasks, such as cancer type prediction or cancer subtyping. However, these models are difficult to train due to the large number of hyperparameters that need to be tuned. To better understand the importance of the different hyperparameters, we examined six different VAE models trained on TCGA transcriptomics data and evaluated them on the downstream task of cluster agreement with cancer subtypes. We studied the effect of latent space dimensionality, learning rate, optimizer, and initialization on the quality of subsequent clustering of the TCGA samples. We found β-TCVAE and DIP-VAE to be sensitive to hyperparameter selection. Based on these experiments, we derived recommendations for selecting hyperparameter settings. In addition, we examined whether the learned latent spaces capture biologically relevant information. To this end, we correlated the different representations with various data characteristics such as age, days to metastasis, immune infiltration, and mutation signatures. We found that, for all models, the latent factors in general do not uniquely correlate with one of the data characteristics, even for models specifically designed for disentanglement.

Competing Interest Statement

The authors have declared no competing interest.

Details

Title
Benchmarking Variational AutoEncoders on cancer transcriptomics data
Author
Eltager, Mostafa; Abdelaal, Tamim; Charrout, Mohammed; Mahfouz, Ahmed; Reinders, Marcel; Makrodimitris, Stavros
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2023
Publication date
Feb 10, 2023
Publisher
Cold Spring Harbor Laboratory Press
Source type
Working Paper
Language of publication
English
ProQuest document ID
2775128214
Copyright
© 2023. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.