Abstract

Although variable selection is one of the most relevant tasks in microbiome analysis, e.g., for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The uptake of compositional data analysis methods has been hampered by the limited availability of software and by the difficulty of interpreting their results. This work focuses on three variable selection methods that acknowledge the compositional structure of microbiome data: selbal, a forward selection approach for the identification of compositional balances, and clr-lasso and coda-lasso, two penalized regression models for compositional data analysis. This study highlights the link between these methods and brings out some limitations of the centered log-ratio (clr) transformation for variable selection. In particular, because the clr transformation is not subcompositionally consistent, the microbial signatures obtained from clr-lasso are not readily transferable. Coda-lasso is computationally efficient and suitable when the focus is the identification of the most associated microbial taxa. Selbal stands out when the goal is to obtain a parsimonious model with optimal prediction performance, but it is computationally expensive. We provide a reproducible vignette for the application of these methods that will enable researchers to fully leverage their potential in microbiome studies.
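
The lack of subcompositional consistency mentioned above can be illustrated with a small sketch. It is not taken from the authors' vignette: the `clr` helper and the four abundance values are hypothetical, chosen only to show that a clr coordinate depends on which taxa are included, while a log-ratio between two retained taxa does not.

```python
import math

def clr(composition):
    """Centered log-ratio: log of each part minus the mean of all the logs."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical relative abundances for four taxa (illustrative values only).
full = [0.1, 0.2, 0.3, 0.4]

# Subcomposition: drop the last taxon and re-close so the parts sum to 1.
kept = full[:3]
total = sum(kept)
sub = [x / total for x in kept]

clr_full = clr(full)
clr_sub = clr(sub)

# The clr coordinate of the first taxon changes once a taxon is removed ...
print("clr of taxon 1:", clr_full[0], "vs", clr_sub[0])

# ... whereas the log-ratio between two retained taxa is unchanged.
print("log(t1/t2):", math.log(full[0] / full[1]), "vs", math.log(sub[0] / sub[1]))
```

Since the clr value of a taxon depends on the geometric mean of every taxon in the table, coefficients fitted on clr-transformed data would be tied to the particular set of taxa measured, which is one way to read the transferability concern raised in the abstract.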

Details

Title
Variable selection in microbiome compositional data analysis
Author
Susin, Antoni 1; Wang, Yiwen 2; Lê Cao, Kim-Anh 2; Calle, M Luz 3

1 Mathematical Department, UPC-Barcelona Tech, 08028 Barcelona, Spain
2 Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Australia
3 Biosciences Department, Faculty of Sciences and Technology, University of Vic - Central University of Catalonia, Carrer de la Laura, 13, 08500 Vic, Spain
Publication year
2020
Publication date
Jun 2020
Publisher
Oxford University Press
e-ISSN
2631-9268
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3170919066
Copyright
© The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This work is published under http://creativecommons.org/licenses/by-nc/4.0/.