
Abstract

Speech emotion recognition (SER) performance has improved steadily as a range of deep learning architectures have been adapted to the task. In particular, convolutional neural network (CNN) models with spectrogram preprocessing are the most popular approach in SER. However, how to design an effective and efficient preprocessing method and CNN-based model for SER remains unclear, so more concrete preprocessing methods and CNN-based models need to be explored. First, to find a suitable frequency-time resolution for SER, we prepare eight datasets with different preprocessing settings. Furthermore, to compensate for the lack of emotional feature resolution, we propose a multiple short-time Fourier transform (STFT) data augmentation that augments the training data with spectrograms computed using all of the different window sizes. Next, because a CNN's channel filters are central to detecting hidden input features, we focus on the effectiveness of the channel filters for SER. To do so, we design several architectures built around a 6-layer CNN model. We also find that efficient channel attention (ECA), which is known to improve channel feature representation with only a few parameters, trains the channel filters for SER more efficiently. On two SER datasets (Interactive Emotional Dyadic Motion Capture and the Berlin Emotional Speech Database), we show that increasing the frequency resolution when preprocessing emotional speech can improve emotion recognition performance. Consequently, the CNN-based model with only two ECA blocks can exceed the performance of previous SER models. In particular, with STFT data augmentation, our proposed model achieves the highest performance on SER.
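
The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the window lengths (256, 512, 1024 samples), the hop length, the FFT size, and the fixed averaging kernel that stands in for ECA's learned 1D convolution are all assumptions made for illustration. The first part computes one spectrogram per window length from the same utterance, so each utterance yields several training samples with different frequency-time trade-offs. The second part applies channel attention as global average pooling, a small 1D convolution across neighbouring channels, and a sigmoid gate.

```python
import numpy as np


def stft_magnitude(signal, win_length, hop_length, n_fft):
    """Log-magnitude spectrogram from a Hann-windowed STFT (NumPy only)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    # rfft zero-pads each frame to n_fft, giving n_fft // 2 + 1 frequency bins.
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)
    return np.log1p(np.abs(spectrum)).T  # shape: (freq_bins, n_frames)


def multi_window_augment(signal, label, win_lengths, hop_length=128, n_fft=1024):
    """Return one (spectrogram, label) pair per window length.

    Assumes every window length is <= n_fft; the values are hypothetical.
    """
    return [
        (stft_magnitude(signal, w, hop_length, n_fft), label)
        for w in win_lengths
    ]


def eca(feature_map, kernel_size=3):
    """Channel attention on a (channels, height, width) feature map.

    A fixed averaging kernel stands in for the learned 1D convolution
    weights (assumption); kernel_size is expected to be odd.
    """
    pooled = feature_map.mean(axis=(1, 2))              # global average pool -> (C,)
    weights = np.full(kernel_size, 1.0 / kernel_size)   # placeholder for learned weights
    padded = np.pad(pooled, kernel_size // 2)           # zero-pad so the output keeps C entries
    mixed = np.convolve(padded, weights, mode="valid")  # local cross-channel interaction
    gate = 1.0 / (1.0 + np.exp(-mixed))                 # sigmoid gate per channel
    return feature_map * gate[:, None, None]            # rescale each channel
```

For a one-second signal sampled at 16 kHz, calling `multi_window_augment(signal, "angry", [256, 512, 1024])` returns three spectrograms that share the same number of frequency bins but differ in frame count. How the paper aligns these shapes before feeding them to the CNN is not stated in the abstract.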

Details

1009240
Title
Searching for effective preprocessing method and CNN based architecture with efficient channel attention on speech emotion recognition
Author
Kim, Byunggun 1; Kwon, Younghun 2

1 Department of Applied Artificial Intelligence, Hanyang University(ERICA), 425-791, Ansan, Kyunggi-Do, Republic of Korea (ROR: https://ror.org/046865y68) (GRID: grid.49606.3d) (ISNI: 0000 0001 1364 9317)
2 Department of Applied Artificial Intelligence, Hanyang University(ERICA), 425-791, Ansan, Kyunggi-Do, Republic of Korea (ROR: https://ror.org/046865y68) (GRID: grid.49606.3d) (ISNI: 0000 0001 1364 9317); Department of Applied Physics, Hanyang University(ERICA), 425-791, Ansan, Kyunggi-Do, Republic of Korea (ROR: https://ror.org/046865y68) (GRID: grid.49606.3d) (ISNI: 0000 0001 1364 9317)
Volume
15
Issue
1
Pages
32689
Number of pages
21
Publication year
2025
Publication date
2025
Section
Article
Publisher
Nature Publishing Group
Place of publication
London
Country of publication
United States
Publication subject
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-09-24
Milestone dates
2025-09-11 (Registration); 2024-11-22 (Received); 2025-09-10 (Accepted)
First posting date
2025-09-24
ProQuest document ID
3253952784
Document URL
https://www.proquest.com/scholarly-journals/searching-effective-preprocessing-method-cnn/docview/3253952784/se-2?accountid=208611
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-09-26
Database
2 databases
  • Coronavirus Research Database
  • ProQuest One Academic