Abstract

The rapid development of social media has significantly impacted sentiment analysis, which is essential for understanding public opinion and predicting social trends. However, modality fusion in sentiment analysis can introduce substantial noise because of differences in semantic representation across modalities, ultimately degrading classification accuracy. This paper therefore presents a Semantic Enhancement and Cross-Modal Interaction Fusion (SECIF) model for sentiment analysis to address these issues. First, BERT and ResNet extract feature representations from text and images, respectively. Second, a GMHA mechanism is proposed to aggregate important semantic information and mitigate the influence of noise. Then, an ICN module is constructed to capture complex contextual dependencies and strengthen the text feature representations. Finally, a cross-modal interaction fusion module is implemented: text features are treated as primary and image features as auxiliary, enabling deep integration of textual and visual features. The model's performance is optimized by combining cross-entropy and KL-divergence losses. Experiments are conducted on a dataset collected from public opinion events on Sina Weibo, and the results demonstrate that the proposed model outperforms the comparison models: SECIF improves on the average accuracy of the text-only, image-only, and multimodal models by 11.19%, 82.27%, and 4.83%, respectively. SECIF is also compared with ten baseline models on publicly available datasets, where it improves accuracy by 4.70% and F1 score by 6.56%. Through multimodal sentiment analysis, governments can better understand public emotions and opinion trends, facilitating more targeted and effective management strategies.
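
The abstract describes the joint objective (cross-entropy plus KL divergence) only at a high level, so the following is a minimal PyTorch sketch of one plausible reading, not the authors' implementation. The idea illustrated: the fused prediction is trained with cross-entropy while a KL term pulls it toward the text branch, reflecting the abstract's text-primary design. The class name SECIFLoss, the kl_weight hyperparameter, the detaching of the text branch, and the direction of the KL term are all illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SECIFLoss(nn.Module):
        # Hypothetical joint objective: cross-entropy on the fused logits
        # plus a KL term aligning the fused distribution with the text-only
        # branch (the abstract treats text as the primary modality).
        def __init__(self, kl_weight: float = 0.5):
            super().__init__()
            self.kl_weight = kl_weight  # assumed trade-off hyperparameter

        def forward(self, fused_logits, text_logits, labels):
            # Standard classification loss on the fused prediction.
            ce = F.cross_entropy(fused_logits, labels)
            # F.kl_div expects log-probabilities as input and probabilities
            # as target, computing KL(target || input); here that is
            # KL(text || fused), with the text branch detached so the KL
            # term only updates the fused prediction (an assumption).
            kl = F.kl_div(
                F.log_softmax(fused_logits, dim=-1),
                F.softmax(text_logits.detach(), dim=-1),
                reduction="batchmean",
            )
            return ce + self.kl_weight * kl

    # Usage with dummy tensors: a batch of 4 posts, 3 sentiment classes.
    loss_fn = SECIFLoss(kl_weight=0.5)
    fused_logits = torch.randn(4, 3)
    text_logits = torch.randn(4, 3)
    labels = torch.tensor([0, 2, 1, 1])
    print(loss_fn(fused_logits, text_logits, labels))

How the trade-off between the two terms is set, and whether the KL term runs in this direction, would need to be checked against the paper itself.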

Details

Title
Semantic enhancement and cross-modal interaction fusion for sentiment analysis in social media
Author
Mu, Guangyu; Chen, Ying; Li, Xiurong; Li, Dai; Dai, Jiaxiu
First page
e0321011
Section
Research Article
Publication year
2025
Publication date
Apr 2025
Publisher
Public Library of Science
e-ISSN
1932-6203
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3196108677
Copyright
© 2025 Mu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.