Abstract
Speech emotion recognition (SER) plays a vital role in enhancing human–computer interaction (HCI) and has applications in affective computing, virtual assistance, and healthcare. This research presents a high-performance SER framework based on a lightweight 1D Convolutional Neural Network (1D-CNN) and a multi-feature fusion technique. Rather than feeding spectrograms to the network as images, frame-level features (Mel-Frequency Cepstral Coefficients, Mel-Spectrograms, and Chroma vectors) are computed across each sequence, preserving temporal information while reducing computational cost. The model attained classification accuracies of 94.0% on MELD (multi-party conversations) and 91.9% on RAVDESS (acted speech). Ablation experiments demonstrate that fusing complementary features significantly outperforms any single-feature baseline. Data augmentation techniques, including Gaussian noise injection and time shifting, further enhance model generalisation. The proposed method demonstrates significant potential for real-time, audio-only emotion recognition on embedded or resource-constrained devices.
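The feature-fusion and augmentation steps described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the feature matrices would in practice come from an audio library such as librosa, and the dimensions, noise level, and shift amount below are hypothetical choices.

```python
import numpy as np

def add_gaussian_noise(signal, noise_std=0.005, seed=0):
    """Additive white Gaussian noise; noise_std is a hypothetical setting."""
    rng = np.random.default_rng(seed)
    return signal + rng.normal(0.0, noise_std, size=signal.shape)

def time_shift(signal, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(signal, shift)

def fuse_frame_features(mfcc, mel, chroma):
    """Concatenate frame-level features along the feature axis.

    Each input has shape (n_frames, d_i); the fused output has shape
    (n_frames, d_mfcc + d_mel + d_chroma), giving the 1D-CNN one
    feature vector per frame rather than a spectrogram image.
    """
    return np.concatenate([mfcc, mel, chroma], axis=1)

# Toy example: a 1-second 440 Hz tone at 16 kHz, plus dummy feature matrices.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
wave = np.sin(2 * np.pi * 440 * t)

noisy = add_gaussian_noise(wave)          # same length, perturbed samples
shifted = time_shift(wave, sr // 10)      # shifted by 100 ms

# Hypothetical per-frame dimensions: 13 MFCCs, 64 Mel bands, 12 chroma bins.
fused = fuse_frame_features(np.zeros((10, 13)),
                            np.zeros((10, 64)),
                            np.zeros((10, 12)))
```

The fused matrix (here 10 frames × 89 features) is the kind of sequence input a lightweight 1D-CNN can convolve over along the time axis.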
Details
Accuracy; Embedded systems; Deep learning; Datasets; Wavelet transforms; Affective computing; Artificial intelligence; Human-computer interface; Emotion recognition; Spectrograms; Artificial neural networks; Neural networks; Ablation; Support vector machines; Random noise; Emotions; Methods; Acoustics; Real time; Speech; Speech recognition