Abstract

Motivation

Machine learning (ML) methods are frequently used in omics research to examine associations between molecular data and, for example, exposures and health conditions. ML is also used for feature selection to facilitate biological interpretation. Our previous MUVR algorithm was shown to generate predictions and variable selections at state-of-the-art performance. However, a general framework for assessing modeling fitness is still lacking. In addition, the ability to adjust for covariates is highly desired but largely lacking in ML. We aimed to address these issues in the new MUVR2 framework.

Results

The MUVR2 algorithm was developed to include the regularized regression framework elastic net in addition to partial least squares and random forest modeling. Compared with other cross-validation strategies, MUVR2 consistently showed state-of-the-art performance, including in variable selection, while minimizing overfitting. Using simulated and real-world data, we also showed that MUVR2 allows adjustment for covariates with elastic net modeling, but not with partial least squares or random forest.

Availability and implementation

Algorithms, data, scripts, and a tutorial are open source under the GPL-3 license and available in the MUVR2 R package at https://github.com/MetaboComp/MUVR2.

Details

Title
Adjusting for covariates and assessing modeling fitness in machine learning using MUVR2
Author
Yan, Yingxiao 1; Schillemans, Tessa 2; Skantze, Viktor 3; Brunius, Carl 1

1 Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden
2 Cardiovascular and Nutritional Epidemiology, Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
3 Fraunhofer-Chalmers Research Centre for Industrial Mathematics, Gothenburg, Sweden
Publication year
2024
Publication date
2024
Publisher
Oxford University Press
e-ISSN
2635-0041
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3192320233
Copyright
© The Author(s) 2024. Published by Oxford University Press. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.