Abstract

Accurate emotion recognition remains a challenging problem in human–machine interaction. This paper explores whether feature abstraction and fusion can both be carried out by a homogeneous network component, and proposes a dual-modal emotion recognition framework composed of a parallel convolution (Pconv) module and an attention-based bidirectional long short-term memory (BLSTM) module. The Pconv module extracts multidimensional social features in parallel, which provides a more effective representation capacity. The attention-based BLSTM module strengthens the extraction of key information and maintains the relevance between pieces of information. Experiments on the CH-SIMS dataset show that recognition accuracy reaches 74.70% on audio data and 77.13% on text, while the dual-modal fusion model reaches 90.02%. The experiments demonstrate that heterogeneous information can be processed within a homogeneous network component, and that the attention-based BLSTM module achieves the best coordination with the feature fusion realized by the Pconv module. This offers considerable flexibility for modality expansion and architecture design.
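
The general idea of parallel convolution branches followed by attention can be illustrated with a toy example. The sketch below is not the authors' implementation: the abstract does not specify kernel sizes, layer widths, or the attention formulation, so the kernels, the query vector, and the input signal are all hypothetical, and the BLSTM layer is omitted for brevity. It only shows several 1-D convolutions with different kernel widths applied to the same sequence, their outputs stacked per time step, and a softmax attention pooling over the time steps.

```python
import math

def conv1d_same(seq, kernel):
    """1-D cross-correlation with zero padding; output length equals input length."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * (k - 1 - pad)
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq))]

def parallel_conv(seq, kernels):
    """Apply every kernel to the same input; return one feature vector per time step."""
    branches = [conv1d_same(seq, k) for k in kernels]
    return [list(step) for step in zip(*branches)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, query):
    """Score each time step by a dot product with `query`, then take the weighted sum."""
    scores = [sum(f * q for f, q in zip(step, query)) for step in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * step[d] for w, step in zip(weights, features))
              for d in range(dim)]
    return pooled, weights

# Hypothetical toy input and kernels (assumptions, not values from the paper).
signal = [0.0, 1.0, 0.0, 2.0, 0.0]
kernels = [[1.0], [0.5, 0.5], [1.0, 0.0, -1.0]]

features = parallel_conv(signal, kernels)          # 5 time steps x 3 branches
pooled, weights = attention_pool(features, query=[1.0, 0.0, 0.0])
```

In this toy setting the attention weights sum to one and concentrate on the time step with the largest score, which is the sense in which an attention layer emphasizes key information in a sequence.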

Details

Title
A novel dual-modal emotion recognition algorithm with fusing hybrid features of audio signal and speech context
Author
Xu, Yurui 1 ; Su, Hang 2 ; Ma, Guijin 1 ; Liu, Xiaorui 1

1 Automation School of Qingdao University, Institute of Future, Qingdao, China (GRID:grid.410645.2) (ISNI:0000 0001 0455 0905)
2 Politecnico di Milano, Department of Electronics, Information and Bioengineering, Milan, Italy (GRID:grid.4643.5) (ISNI:0000 0004 1937 0327)
Pages
951-963
Publication year
2023
Publication date
Feb 2023
Publisher
Springer Nature B.V.
ISSN
21994536
e-ISSN
21986053
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2778776714
Copyright
© The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.