© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Audio tagging, a fundamental task in acoustic signal processing, has seen significant advances and broad application in recent years. Spiking Neural Networks (SNNs), inspired by biological neural systems, exploit event-driven computing paradigms and temporal information processing, enabling superior energy efficiency. Despite the increasing adoption of SNNs, the potential of event-driven encoding mechanisms for audio tagging remains largely unexplored. This work investigates event-driven encoding strategies for SNN-based audio tagging. We propose the SATRN (Spiking Audio Tagging Robust Network), a novel architecture that integrates temporal–spatial attention mechanisms with membrane potential residual connections. The network employs a dual-stream structure that combines global feature fusion with local feature extraction through inverted bottleneck blocks, designed specifically for efficient audio processing. Furthermore, we introduce an event-based encoding approach that makes SNNs more resilient to disturbances while maintaining performance. Our experimental results on the UrbanSound8K and FSD50K datasets demonstrate that the SATRN achieves performance comparable to traditional Convolutional Neural Networks (CNNs) while requiring significantly less computation time and showing superior robustness against noise perturbations, making it particularly suitable for edge computing scenarios and real-time audio processing applications.

Details

Title
SATRN: Spiking Audio Tagging Robust Network
Author
Gao, Shouwei; Deng, Xingyang; Fan, Xiangyu; Yu, Pengliang; Zhou, Hao; Zhu, Zihao
First page
761
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
2079-9292
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3171006522