Abstract

Metabolomics-driven discoveries of biological samples remain hampered by the grand challenge of metabolite annotation and identification. Only a few metabolites have an annotated spectrum in spectral libraries; hence, searching only for exact library matches generally returns few hits. An attractive alternative is searching for so-called analogues as a starting point for structural annotations; analogues are library molecules which are not exact matches but display a high chemical similarity. However, current analogue search implementations are not yet very reliable and are relatively slow. Here, we present MS2Query, a machine learning-based tool that integrates mass spectral embedding-based chemical similarity predictors (Spec2Vec and MS2Deepscore) as well as detected precursor masses to rank potential analogues and exact matches. Benchmarking MS2Query on reference mass spectra and experimental case studies demonstrates improved reliability and scalability. Thereby, MS2Query offers exciting opportunities to further increase the annotation rate of metabolomics profiles of complex metabolite mixtures and to discover new biology.

The authors develop a machine learning approach to find structurally related chemicals in mass spectral libraries. Their method boosts the annotation rate and aids in assessing novelty in metabolomics datasets.
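
The ranking idea described in the abstract (combining an embedding-based similarity with the precursor mass) can be illustrated with a minimal Python sketch. This is not MS2Query's implementation or API: the `LibraryEntry` class, the `rank_candidates` function, the toy two-dimensional embeddings, and the `weight` and `mass_scale` parameters are all hypothetical, and a fixed weighted sum stands in for the trained machine learning model that the tool itself uses.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class LibraryEntry:
    """Hypothetical library record: a name, a precursor m/z, and a toy embedding."""
    name: str
    precursor_mz: float
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(
    query_mz: float,
    query_embedding: Sequence[float],
    library: Sequence[LibraryEntry],
    weight: float = 0.8,       # assumed share given to embedding similarity
    mass_scale: float = 50.0,  # assumed decay scale (in m/z units) for the mass term
) -> List[Tuple[float, str]]:
    """Return (score, name) pairs sorted from best to worst candidate."""
    scored = []
    for entry in library:
        similarity = cosine(query_embedding, entry.embedding)
        # The mass term is 1.0 for identical precursor m/z and decays with the difference.
        mass_term = math.exp(-abs(query_mz - entry.precursor_mz) / mass_scale)
        scored.append((weight * similarity + (1 - weight) * mass_term, entry.name))
    return sorted(scored, reverse=True)


# Toy library: an exact match, a close analogue, and an unrelated compound.
library = [
    LibraryEntry("exact_match", 300.0, [1.0, 0.0]),
    LibraryEntry("analogue", 314.0, [0.9, 0.1]),
    LibraryEntry("unrelated", 300.0, [0.0, 1.0]),
]

for score, name in rank_candidates(300.0, [1.0, 0.0], library):
    print(f"{name}: {score:.3f}")
```

In this toy setup the exact match ranks first, the analogue with a similar embedding but a shifted precursor mass ranks second, and the compound that shares only the precursor mass ranks last. This mirrors why mass alone is not enough to find structurally related molecules.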

Details

Title
MS2Query: reliable and scalable MS2 mass spectra-based analogue search
Author
de Jonge, Niek F. 1; Louwen, Joris J. R. 1; Chekmeneva, Elena 2; Camuzeaux, Stephane 2; Vermeir, Femke J. 3; Jansen, Robert S. 3; Huber, Florian 4; van der Hooft, Justin J. J. 5

1 Wageningen University & Research, Bioinformatics Group, Wageningen, the Netherlands (GRID:grid.4818.5) (ISNI:0000 0001 0791 5666)
2 Digestion and Reproduction, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, National Phenome Centre, Section of Bioanalytical Chemistry, Division of Systems Medicine, Department of Metabolism, London, UK (GRID:grid.7445.2) (ISNI:0000 0001 2113 8111)
3 Radboud University, Department of Microbiology, Radboud Institute for Biological and Environmental Sciences, Nijmegen, the Netherlands (GRID:grid.5590.9) (ISNI:0000000122931605)
4 University of Applied Sciences Düsseldorf, Centre for Digitalization and Digitality (ZDD), Düsseldorf, Germany (GRID:grid.440973.d) (ISNI:0000 0001 0729 0889)
5 Wageningen University & Research, Bioinformatics Group, Wageningen, the Netherlands (GRID:grid.4818.5) (ISNI:0000 0001 0791 5666); University of Johannesburg, Auckland Park, Department of Biochemistry, Johannesburg, South Africa (GRID:grid.412988.e) (ISNI:0000 0001 0109 131X)
Pages
1752
Publication year
2023
Publication date
2023
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2792195660
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.