Abstract

Cancer is a complex disease that deregulates cellular functions at various molecular levels (e.g., DNA, RNA, and proteins). Integrated multi-omics analysis of data from these levels is necessary to understand the aberrant cellular functions accountable for cancer and its development. In recent years, Deep Learning (DL) approaches have become a useful tool in integrated multi-omics analysis of cancer data. However, high dimensional multi-omics data are generally imbalanced with too many molecular features and relatively few patient samples. This imbalance makes a DL based integrated multi-omics analysis difficult. DL-based dimensionality reduction technique, including variational autoencoder (VAE), is a potential solution to balance high dimensional multi-omics data. However, there are few VAE-based integrated multi-omics analyses, and they are limited to pancancer. In this work, we did an integrated multi-omics analysis of ovarian cancer using the compressed features learned through VAE and an improved version of VAE, namely Maximum Mean Discrepancy VAE (MMD-VAE). First, we designed and developed a DL architecture for VAE and MMD-VAE. Then we used the architecture for mono-omics, integrated di-omics and tri-omics data analysis of ovarian cancer through cancer samples identification, molecular subtypes clustering and classification, and survival analysis. The results show that MMD-VAE and VAE-based compressed features can respectively classify the transcriptional subtypes of the TCGA datasets with an accuracy in the range of 93.2-95.5% and 87.1-95.7%. Also, survival analysis results show that VAE and MMD-VAE based compressed representation of omics data can be used in cancer prognosis. Based on the results, we can conclude that (i) VAE and MMD-VAE outperform existing dimensionality reduction techniques, (ii) integrated multi-omics analyses perform better or similar compared to their mono-omics counterparts, and (iii) MMD-VAE performs better than VAE in most omics dataset.

Details

Title
Integrated multi-omics analysis of ovarian cancer using variational autoencoders
Author
Hira Muta Tah 1 ; Razzaque, M A 2 ; Angione Claudio 2 ; Scrivens, James 1 ; Sawan Saladin 3 ; Sarker Mosharraf 1 

 Teesside University, School of Health and Life Sciences, Middlesbrough, UK (GRID:grid.26597.3f) (ISNI:0000 0001 2325 1783) 
 Teesside University, School of Computing, Eng. & Digital Tech., Middlesbrough, UK (GRID:grid.26597.3f) (ISNI:0000 0001 2325 1783) 
 The James Cook University Hospital, Middlesbrough, UK (GRID:grid.411812.f) (ISNI:0000 0004 0400 2812) 
Publication year
2021
Publication date
2021
Publisher
Nature Publishing Group
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2560160278
Copyright
© The Author(s) 2021. corrected publication 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.