Abstract

Spectrum matching is the most common method for compound identification in mass spectrometry (MS). However, its effectiveness is limited by the coverage of spectral libraries and by the accuracy and speed of matching. In this study, a million-scale in-silico electron ionization mass spectrum (EI-MS) library is established. Furthermore, an ultra-fast and accurate spectrum matching method (FastEI) is proposed, which uses Word2vec spectral embedding to substantially improve accuracy and a hierarchical navigable small-world graph (HNSW) to boost speed. It achieves 80.4% recall@10 accuracy (88.3% with a 5 Da mass filter) with a speedup of two orders of magnitude compared with the weighted cosine similarity (WCS) method. When FastEI is applied to identify molecules beyond the NIST 2017 library, it achieves 50% recall@1 accuracy. FastEI is packaged as standalone, user-friendly software for users with limited computational backgrounds. Overall, FastEI combined with a million-scale in-silico library provides an accurate and ultra-fast tool for compound identification.

Accuracy loss and slow speed affect the identification of compounds through matching of mass spectra against a large-scale spectral library. Here, the authors use Word2vec spectral embedding and a hierarchical navigable small-world graph to improve the accuracy and speed of spectral matching on their own million-scale in-silico library.
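
To make the reported metrics concrete, the following is a minimal sketch of the baseline that FastEI is compared against (weighted cosine similarity with an exhaustive library scan) and of the recall@k measure. It is not the authors' implementation and does not include the Word2vec embedding or the HNSW index. The function names, the dictionary representation of a spectrum, and the weighting exponents are all assumptions chosen for illustration.

```python
import math


def weighted_cosine(spec_a, spec_b, mz_power=1.0, intensity_power=0.5):
    """Cosine similarity between two spectra given as {integer m/z: intensity}.

    Each peak is weighted as (m/z ** mz_power) * (intensity ** intensity_power).
    The default exponents are illustrative placeholders, not values from the study.
    """
    def weigh(spec):
        return {mz: (mz ** mz_power) * (inten ** intensity_power)
                for mz, inten in spec.items()}

    wa, wb = weigh(spec_a), weigh(spec_b)
    # Only peaks present in both spectra contribute to the dot product.
    dot = sum(w * wb[mz] for mz, w in wa.items() if mz in wb)
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(query, library):
    """Brute-force ranking: score the query against every library spectrum.

    `library` maps a (hypothetical) compound id to its spectrum. The cost grows
    linearly with library size, which is the step a graph index aims to avoid.
    """
    scores = {cid: weighted_cosine(query, spec) for cid, spec in library.items()}
    return sorted(scores, key=scores.get, reverse=True)


def recall_at_k(ranked_ids_per_query, true_ids, k):
    """Fraction of queries whose true compound appears among the top-k candidates."""
    hits = sum(1 for ranked, truth in zip(ranked_ids_per_query, true_ids)
               if truth in ranked[:k])
    return hits / len(true_ids)
```

Under this reading, a recall@10 of 80.4% means the correct compound appears among the ten highest-ranked candidates for 80.4% of the query spectra, and recall@1 counts only the top-ranked candidate.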

Details

Title
Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library
Author
Yang, Qiong 1; Ji, Hongchao 2; Xu, Zhenbo 1; Li, Yiming 1; Wang, Pingshan 1; Sun, Jinyu 1; Fan, Xiaqiong 1; Zhang, Hailiang 1; Lu, Hongmei 1; Zhang, Zhimin 1

1 Central South University, College of Chemistry and Chemical Engineering, Changsha, China (GRID:grid.216417.7) (ISNI:0000 0001 0379 7164)
2 Chinese Academy of Agricultural Sciences, Agricultural Genomics Institute at Shenzhen, Shenzhen, China (GRID:grid.410727.7) (ISNI:0000 0001 0526 1937)
Pages
3722
Publication year
2023
Publication date
2023
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2828554525
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”).