Abstract

Background

Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial to elucidate molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promises in uncovering intricate relationships among molecular features. However, due to the “big p, small n” problem (i.e., small sample sizes with high-dimensional features), training a large-scale generalizable deep learning model with multi-omics data alone is very challenging.

Results

We developed a method called Multi-view Factorization AutoEncoder (MAE) with network constraints that can seamlessly integrate multi-omics data and domain knowledge such as molecular interaction networks. Our method learns feature and patient embeddings simultaneously with deep representation learning. Both feature representations and patient representations are subject to certain constraints specified as regularization terms in the training objective. By incorporating domain knowledge into the training objective, we implicitly introduced a good inductive bias into the machine learning model, which helps improve model generalizability. We performed extensive experiments on the TCGA datasets and demonstrated the power of integrating multi-omics data and biological interaction networks using our proposed method for predicting target clinical variables.

Conclusions

To alleviate the overfitting problem in deep learning on multi-omics data with the “big p, small n” problem, it is helpful to incorporate biological domain knowledge into the model as inductive biases. It is very promising to design machine learning models that facilitate the seamless integration of large-scale multi-omics data and biomedical domain knowledge for uncovering intricate relationships among molecular features and clinical features.

Details

Title
Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE)
Author
Ma, Tianle; Zhang, Aidong
Pages
1-11
Section
Research
Publication year
2019
Publication date
2019
Publisher
BioMed Central
e-ISSN
14712164
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2340837505
Copyright
© 2019. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.