
Abstract

This study explores the growing field of Multimodal Sentiment Analysis (MSA), focusing on how advanced fusion techniques can improve sentiment prediction in social media contexts. As platforms like X and TikTok continue to expand and facilitate the sharing of sentiment through digital media, there is an increasing need for neural network architectures that can accurately interpret sentiment across modalities. We implement a model that uses BERT for textual features and ResNet for visual features; a cross-attention fusion module aligns the two modalities into a joint representation. We conduct experiments on the MVSA-Single and MVSA-Multiple datasets, which contain over 5,000 and 17,000 labeled text-image pairs, respectively. Our research explores the interactions between modalities and proposes a sentiment classifier that builds upon and outperforms current baselines, while quantifying the contribution of each modality through a modality-utilization analysis.
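The full thesis is not reproduced in this record, but the architecture the abstract describes can be illustrated with a minimal sketch. The following is an assumption-laden illustration, not the author's implementation: it pairs Hugging Face's bert-base-uncased with torchvision's ResNet-50 and fuses the two streams with a single multi-head cross-attention layer in which text tokens attend to image regions. All configuration choices here (model variants, hidden sizes, eight attention heads, three sentiment classes, mean pooling) are placeholders standing in for whatever the thesis actually uses.

    # Hypothetical sketch of a BERT-ResNet cross-attention fusion classifier.
    # Model variants, dimensions, and head/class counts are assumptions,
    # not the thesis's reported configuration.
    import torch
    import torch.nn as nn
    from transformers import BertModel
    from torchvision.models import resnet50, ResNet50_Weights


    class CrossAttentionFusionClassifier(nn.Module):
        def __init__(self, num_classes: int = 3, num_heads: int = 8):
            super().__init__()
            self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
            d = self.text_encoder.config.hidden_size  # 768 for bert-base

            # Visual backbone: keep the spatial feature map, drop avgpool + fc.
            backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
            self.visual_encoder = nn.Sequential(*list(backbone.children())[:-2])
            self.visual_proj = nn.Linear(2048, d)  # align ResNet channels to BERT dim

            # Cross-attention: text tokens (queries) attend to image regions.
            self.cross_attn = nn.MultiheadAttention(d, num_heads, batch_first=True)
            self.classifier = nn.Linear(d, num_classes)

        def forward(self, input_ids, attention_mask, pixel_values):
            text = self.text_encoder(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state                        # (B, T, d)

            fmap = self.visual_encoder(pixel_values)   # (B, 2048, 7, 7) for 224x224 input
            regions = fmap.flatten(2).transpose(1, 2)  # (B, 49, 2048): one row per region
            regions = self.visual_proj(regions)        # (B, 49, d)

            # Joint representation: each token gathers visual evidence.
            fused, _ = self.cross_attn(query=text, key=regions, value=regions)
            pooled = fused.mean(dim=1)                 # (B, d)
            return self.classifier(pooled)             # sentiment logits

The text-queries-image direction shown here is one common choice for text-image cross-attention; the thesis may use the symmetric direction, or both, and its modality-utilization analysis would additionally involve ablating each encoder's contribution, which this sketch does not cover.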

Details

Title
A BERT-ResNet Cross-Attention Fusion Network and Modality Utilization Assessment for Multimodal Sentiment Classification
Number of pages
107
Publication year
2025
Degree date
2025
School code
0465
Source
MAI 86/11(E), Masters Abstracts International
ISBN
9798314880944
Committee member
Mikhalevich, Irina; Yu, Zhe
University/institution
Rochester Institute of Technology
Department
Computer Science
University location
United States -- New York
Degree
M.S.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31999187
ProQuest document ID
3203040236
Document URL
https://www.proquest.com/dissertations-theses/bert-resnet-cross-attention-fusion-network/docview/3203040236/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic