Abstract

Spectrum clustering is a powerful strategy to minimize redundant mass spectral data by grouping highly similar mass spectra corresponding to repeatedly measured analytes. Based on spectrum similarity, near-identical spectra are grouped in clusters, after which each cluster can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We analyzed diverse public datasets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The best-identified spectrum (BEST) and spectrum binning (BIN) methods were found to be the most reliable methods for consensus spectrum generation, including for datasets with post-translational modifications (PTMs) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
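
To illustrate what a binning-based consensus spectrum can look like, the following is a minimal sketch and not the implementation benchmarked in the study. The function name `bin_consensus`, the bin width of 0.02 m/z, and the rule that a peak must occur in at least half of the cluster's spectra are all assumptions chosen for illustration. Each spectrum is assumed to be a list of `(mz, intensity)` tuples.

```python
from collections import defaultdict


def bin_consensus(spectra, bin_width=0.02, min_fraction=0.5):
    """Build a consensus spectrum for one cluster by m/z binning.

    spectra: list of spectra, each a list of (mz, intensity) tuples.
    bin_width and min_fraction are illustrative assumptions.
    """
    # Per bin: [intensity-weighted m/z sum, intensity sum, number of spectra with a peak]
    bins = defaultdict(lambda: [0.0, 0.0, 0])
    for peaks in spectra:
        seen = set()  # bins already counted for this spectrum
        for mz, intensity in peaks:
            index = int(mz / bin_width)
            acc = bins[index]
            acc[0] += mz * intensity
            acc[1] += intensity
            if index not in seen:
                acc[2] += 1
                seen.add(index)

    n_spectra = len(spectra)
    consensus = []
    for index in sorted(bins):
        weighted_mz, total_intensity, count = bins[index]
        # Keep only peaks that recur in enough cluster members.
        if total_intensity > 0 and count / n_spectra >= min_fraction:
            consensus.append((weighted_mz / total_intensity,
                              total_intensity / n_spectra))
    return consensus
```

In this sketch, the consensus m/z of a bin is the intensity-weighted mean of the contributing peaks, and the consensus intensity is the summed intensity divided by the number of spectra in the cluster. Peaks seen in only a minority of spectra are dropped as likely noise.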

Competing Interest Statement

The authors have declared no competing interest.

Details

Title
A comprehensive evaluation of consensus spectrum generation methods in proteomics
Author
Luo, Xiyang; Bittremieux, Wout; Griss, Johannes; Deutsch, Eric W; Sachsenberg, Timo; Levitsky, Lev I; Ivanov, Mark V; Bubis, Julia A; Gabriels, Ralf; Webel, Henry; Sanchez, Aniel; Bai, Mingze; Käll, Lukas; Perez-Riverol, Yasset
University/institution
Cold Spring Harbor Laboratory Press
Section
Confirmatory Results
Publication year
2022
Publication date
Jan 27, 2022
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
2623196184
Copyright
© 2022. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.