Deep Learning-Based Speech Enhancement for Robust Sound Classification in Security Systems

Abstract

Deep learning has emerged as a powerful technique for speech enhancement, particularly in security systems where audio signals are often degraded by non-stationary noise. Traditional signal processing methods struggle in such conditions, making it difficult to detect critical sounds like gunshots, alarms, and unauthorized speech. This study investigates a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs) to enhance speech quality and improve sound classification accuracy in noisy security environments. The proposed model is trained and validated using real-world datasets containing diverse noise distortions, including VoxCeleb for benchmarking speech enhancement and UrbanSound8K and ESC-50 for sound classification. Performance is evaluated using industry-standard metrics such as Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Signal-to-Noise Ratio (SNR). The architecture includes multi-layered neural networks, residual connections, and dropout regularization to ensure robustness and generalizability. Additionally, the paper addresses key challenges in deploying deep learning models for security applications, such as computational complexity, latency, and vulnerability to adversarial attacks. Experimental results demonstrate that the proposed DNN + GAN-based approach significantly improves speech intelligibility and classification performance in high-interference scenarios, offering a scalable solution for enhancing the reliability of audio-based security systems.
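
Of the three metrics named in the abstract, SNR is the simplest to state: it is the ratio of reference-signal power to residual-noise power, expressed in decibels. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `snr_db`, the toy 440 Hz tone, and the 16 kHz sampling rate are illustrative assumptions; PESQ and STOI require dedicated implementations and are not shown.

```python
import math


def snr_db(clean, estimate):
    """Return the SNR in dB of `estimate` measured against the reference `clean`.

    Hypothetical helper for illustration: SNR = 10 * log10(P_signal / P_noise),
    where the noise is the sample-wise difference between the two signals.
    """
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    if noise_power == 0:
        return math.inf   # identical signals: no residual noise
    if signal_power == 0:
        return -math.inf  # silent reference: nothing but noise
    return 10.0 * math.log10(signal_power / noise_power)


# Toy data (assumed, not from the paper): 0.1 s of a 440 Hz tone at 16 kHz,
# plus a 3 kHz disturbance at one tenth of the tone's amplitude.
SAMPLE_RATE = 16000
clean = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(1600)]
noisy = [
    s + 0.1 * math.cos(2 * math.pi * 3000 * n / SAMPLE_RATE)
    for n, s in enumerate(clean)
]

print(f"SNR of the noisy tone: {snr_db(clean, noisy):.2f} dB")
```

An enhancement model would be judged by how far it raises this figure when its output, rather than the noisy input, is passed as `estimate`.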

Details

Business indexing term

Subject:

Artificial intelligence

Identifier / keyword

speech enhancement; deep learning; security systems; GANs; CNNs

Title

Deep Learning-Based Speech Enhancement for Robust Sound Classification in Security Systems

Author

Mensah, Samuel Yaw¹; Zhang, Tao²; Mahmud, Nahid AI³; Geng, Yanzhang²

¹ School of Information Engineering, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China
² Digital Signal Processing Laboratory, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China; [email protected] (T.Z.); [email protected] (Y.G.)
³ School of Electrical & Information Engineering, Tianjin University, 92 Weijin Road, Nankai District, Tianjin 300072, China; [email protected]

Publication title

Electronics; Basel

Volume

Issue

First page

2643

Number of pages

Publication year

2025

Publication date

2025

Publisher

MDPI AG

Place of publication

Basel

Country of publication

Switzerland

Publication subject

Electronics

e-ISSN

20799292

Source type

Scholarly Journal

Language of publication

English

Document type

Journal Article

Publication history

Online publication date

2025-06-30

Milestone dates

2025-04-08 (Received); 2025-06-11 (Accepted)

First posting date

30 Jun 2025

DOI

https://doi.org/10.3390/electronics14132643

ProQuest document ID

3229142959

Document URL

https://www.proquest.com/scholarly-journals/deep-learning-based-speech-enhancement-robust/docview/3229142959/se-2?accountid=208611

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Last updated

2025-07-11

Database

ProQuest One Academic
