Benchmarking Variational AutoEncoders on cancer transcriptomics data

Abstract

Deep generative models, such as variational autoencoders (VAEs), have gained increasing attention in computational biology due to their ability to capture complex data manifolds, which can subsequently be used to achieve better performance in downstream tasks such as cancer type prediction or cancer subtyping. However, these models are difficult to train due to the large number of hyperparameters that need to be tuned. To better understand the importance of the different hyperparameters, we examined six different VAE models trained on TCGA transcriptomics data and evaluated them on the downstream task of cluster agreement with cancer subtypes. We studied the effect of the latent space dimensionality, learning rate, optimizer, and initialization on the quality of the subsequent clustering of the TCGA samples. We found β-TCVAE and DIP-VAE to be sensitive to hyperparameter selection. Based on these experiments, we derived recommendations for selecting the different hyperparameter settings. In addition, we examined whether the learned latent spaces capture biologically relevant information. To this end, we correlated the different representations with various data characteristics such as age, days to metastasis, immune infiltration, and mutation signatures. We found that, for all models, the latent factors in general do not uniquely correlate with any one of the data characteristics, even for models specifically designed for disentanglement.
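As an illustration of the downstream evaluation described above, the following is a minimal sketch of how agreement between a clustering of latent embeddings and known cancer subtypes could be scored. The abstract does not state which agreement measure was used; the adjusted Rand index is assumed here purely for illustration, and the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two partitions of the same samples.

    Assumption: ARI stands in for "cluster agreement"; the abstract does not
    name the metric actually used.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two samples: the partitions trivially agree.
        return 1.0

    # Contingency counts: samples falling in each (subtype, cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of sample pairs grouped together in both, and in each, partition.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Degenerate case, e.g. both partitions place all samples together.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical toy data: subtype annotations and cluster ids obtained from
# clustering the VAE latent space (cluster ids are arbitrary names).
subtypes = ["LumA", "LumA", "Basal", "Basal"]
clusters = [1, 1, 0, 0]
score = adjusted_rand_index(subtypes, clusters)
```

The score depends only on which samples are grouped together, not on the label names, so a clustering that recovers the subtypes under different cluster ids still receives the maximum score. This makes such a measure convenient for comparing models across hyperparameter settings such as latent dimensionality or learning rate.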

Competing Interest Statement

The authors have declared no competing interest.

Details

Title

Benchmarking Variational AutoEncoders on cancer transcriptomics data

Author

Eltager, Mostafa; Abdelaal, Tamim; Charrout, Mohammed; Mahfouz, Ahmed; Reinders, Marcel; Makrodimitris, Stavros

University/institution

Cold Spring Harbor Laboratory Press

Section

New Results

Publication year

2023

Publication date

Feb 10, 2023

Publisher

Cold Spring Harbor Laboratory Press

ISSN

2692-8205

Source type

Working Paper

Language of publication

English

DOI

https://doi.org/10.1101/2023.02.09.527832

ProQuest document ID

2775128214

Full text outside of ProQuest

https://www.biorxiv.org/content/10.1101/2023.02.09.527832v1

© 2023. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
