Abstract

Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, for which data acquisition runs for several weeks or even years. This inevitably introduces unwanted intra- and inter-batch variations over time that can overshadow true biological signals and thus hinder potential biological discoveries. To date, normalisation approaches have struggled to mitigate the variability introduced by technical factors whilst preserving biological variance, especially for protracted acquisitions. Here, we propose a study design framework with an arrangement for embedding biological sample replicates to quantify variance within and between batches, and a workflow that uses these replicates to remove unwanted variation in a hierarchical manner (hRUV). We use this design to produce a dataset of more than 1000 human plasma samples run over an extended period of time. We demonstrate significant improvement of hRUV over existing methods in preserving biological signals whilst removing unwanted variation for large-scale metabolomics studies. Our tools not only provide a strategy for large-scale data normalisation but also provide guidance on the design strategy for large omics studies.

Mass spectrometry-based metabolomics is a powerful method for profiling large clinical cohorts, but batch variations can obscure biologically meaningful differences. Here, the authors develop a computational workflow that removes unwanted data variation while preserving biologically relevant information.

Details

Title
A hierarchical approach to removal of unwanted variation for large-scale metabolomics data
Author
Kim, Taiyun 1 ; Tang, Owen 2 ; Vernon, Stephen T 2 ; Kott, Katharine A 2 ; Koay, Yen Chin 3 ; Park, John 2 ; James, David E 4 ; Grieve, Stuart M 5 ; Speed, Terence P 6 ; Yang, Pengyi 7 ; Figtree, Gemma A 2 ; O’Sullivan, John F 8 ; Yang, Jean Yee Hwa 9

1 The University of Sydney, Charles Perkins Centre, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); The University of Sydney, School of Mathematics and Statistics, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); Children’s Medical Research Institute, Computational Systems Biology Group, Westmead, Australia (GRID:grid.414235.5) (ISNI:0000 0004 0619 2154)
2 The University of Sydney, Charles Perkins Centre, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); Royal North Shore Hospital, Department of Cardiology, Sydney, Australia (GRID:grid.412703.3) (ISNI:0000 0004 0587 9093); The University of Sydney, Cardiovascular Discovery Group, Kolling Institute of Medical Research, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); The University of Sydney, Faculty of Medicine and Health, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X)
3 The University of Sydney, Charles Perkins Centre, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); The University of Sydney, Faculty of Medicine and Health, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); Heart Research Institute, Sydney, Australia (GRID:grid.1076.0) (ISNI:0000 0004 0626 1885)
4 The University of Sydney, Charles Perkins Centre, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); The University of Sydney, School of Life and Environmental Sciences, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); University of Sydney, School of Medical Sciences, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X)
5 The University of Sydney, Charles Perkins Centre, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); University of Sydney, Imaging and Phenotyping Laboratory, Charles Perkins Centre, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); Royal Prince Alfred Hospital, Department of Radiology, Camperdown, Australia (GRID:grid.413249.9) (ISNI:0000 0004 0385 0051)
6 Walter and Eliza Hall Institute, Bioinformatics Division, Parkville, Australia (GRID:grid.1042.7); University of Melbourne, School of Mathematics and Statistics, Parkville, Australia (GRID:grid.1008.9) (ISNI:0000 0001 2179 088X)
7 The University of Sydney, Charles Perkins Centre, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); The University of Sydney, School of Mathematics and Statistics, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); Children’s Medical Research Institute, Computational Systems Biology Group, Westmead, Australia (GRID:grid.414235.5) (ISNI:0000 0004 0619 2154); The University of Sydney, Faculty of Medicine and Health, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X)
8 The University of Sydney, Charles Perkins Centre, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); The University of Sydney, Faculty of Medicine and Health, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); Heart Research Institute, Sydney, Australia (GRID:grid.1076.0) (ISNI:0000 0004 0626 1885); Royal Prince Alfred Hospital, Department of Cardiology, Sydney, Australia (GRID:grid.413249.9) (ISNI:0000 0004 0385 0051); Faculty of Medicine, TU Dresden, Germany (GRID:grid.4488.0) (ISNI:0000 0001 2111 7257)
9 The University of Sydney, Charles Perkins Centre, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X); The University of Sydney, School of Mathematics and Statistics, Sydney, Australia (GRID:grid.1013.3) (ISNI:0000 0004 1936 834X)
Publication year
2021
Publication date
2021
Publisher
Nature Publishing Group
e-ISSN
2041-1723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2562073384
Copyright
© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.