© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Metaproteomics offers a powerful window into the active functions of microbial communities, but accurately identifying peptides remains challenging due to the size and incompleteness of protein databases derived from metagenomes. These databases often contain vastly more sequences than those from single organisms, creating a computational bottleneck in peptide-spectrum match (PSM) filtering. Here we present WinnowNet, a deep learning-based method for PSM filtering, available in two versions: one using transformers and the other convolutional neural networks. Both variants are designed to handle the unordered nature of PSM data and are trained using a curriculum learning strategy that moves from simple to complex examples. WinnowNet consistently achieves more true identifications at equivalent false discovery rates compared to leading tools, including Percolator, MS2Rescore, and DeepFilter, and outperforms filters integrated into popular analysis pipelines. It also uncovers more gut microbiome biomarkers related to diet and health, highlighting its potential to support advances in personalized medicine.
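The curriculum learning strategy mentioned above, training that moves from simple to complex examples, can be illustrated with a minimal sketch. The abstract does not say how WinnowNet scores example difficulty or schedules its stages, so the `difficulty` values, the cumulative staging, and the function name `curriculum_stages` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    examples: Sequence[T],
    difficulty: Sequence[float],
    n_stages: int = 3,
) -> List[List[T]]:
    """Split examples into cumulative training stages, easiest first.

    `difficulty` is a hypothetical per-example score (lower = simpler).
    Stage k contains the easiest k/n_stages fraction of the data, so the
    final stage covers every example.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    if len(examples) != len(difficulty):
        raise ValueError("examples and difficulty must have equal length")

    # Indices sorted from the simplest to the most complex example.
    order = sorted(range(len(examples)), key=lambda i: difficulty[i])

    stages: List[List[T]] = []
    for stage in range(1, n_stages + 1):
        # Ceiling division so that the last stage always includes everything.
        cutoff = (len(order) * stage + n_stages - 1) // n_stages
        stages.append([examples[i] for i in order[:cutoff]])
    return stages


if __name__ == "__main__":
    # Toy PSM labels with made-up difficulty scores.
    psms = ["psm_a", "psm_b", "psm_c", "psm_d"]
    scores = [0.9, 0.1, 0.5, 0.3]
    for k, subset in enumerate(curriculum_stages(psms, scores, n_stages=2), 1):
        print(f"stage {k}: {subset}")
```

A model would then be trained on each stage in turn, carrying its weights forward. Whether stages should be cumulative, as here, or disjoint is a design choice the abstract leaves open.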

This study introduces WinnowNet, a deep learning-based method incorporating curriculum learning to enhance peptide identification in metaproteomics. WinnowNet consistently outperforms existing tools across diverse datasets, offering improved insights into microbial communities.

Details

Title
Enhancing peptide identification in metaproteomics through curriculum learning in deep learning
Author
Feng, Shichao 1 ; Zhang, Bailu 1 ; Wang, Huan 2 ; Xiong, Yi 3 ; Tian, Athena 4 ; Yuan, Xiaohui 1 ; Pan, Chongle 5 ; Guo, Xuan 1

1 Department of Computer Science and Engineering, University of North Texas, Denton, TX, USA (ROR: https://ror.org/00v97ad02) (GRID: grid.266869.5) (ISNI: 0000 0001 1008 957X)
2 College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China (ROR: https://ror.org/023b72294) (GRID: grid.35155.37) (ISNI: 0000 0004 1790 4137)
3 School of Biological Sciences, University of Oklahoma, Norman, OK, USA (ROR: https://ror.org/02aqsxs83) (GRID: grid.266900.b) (ISNI: 0000 0004 0447 0018)
4 Department of Mathematics, Emory University, Atlanta, GA, USA (ROR: https://ror.org/03czfpz43) (GRID: grid.189967.8) (ISNI: 0000 0004 1936 7398)
5 School of Biological Sciences, University of Oklahoma, Norman, OK, USA (ROR: https://ror.org/02aqsxs83) (GRID: grid.266900.b) (ISNI: 0000 0004 0447 0018); School of Computer Science, University of Oklahoma, Norman, OK, USA (ROR: https://ror.org/02aqsxs83) (GRID: grid.266900.b) (ISNI: 0000 0004 0447 0018)
Pages
8934
Section
Article
Publication year
2025
Publication date
2025
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3258809688