© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Multimodal named entity recognition (MNER) is a natural language processing technique that integrates text and visual modalities to detect entity boundaries and identify entity types in unstructured multimodal data. Although existing methods alleviate semantic deficiencies by optimizing image and text feature extraction and fusion, a fundamental challenge remains: fine-grained alignment is lacking because of cross-modal semantic deviations and image noise interference. To address these issues, this paper proposes a dual-enhanced hierarchical alignment (DEHA) framework that achieves dual semantic and spatial enhancement via global–local cooperative alignment optimization. The proposed framework incorporates a dual enhancement strategy comprising Semantic-Augmented Global Contrast (SAGC) and Multi-scale Spatial Local Contrast (MS-SLC), which reinforce the alignment of image and text modalities at the global sample level and the local feature level, respectively, thereby reducing the influence of image noise. Additionally, a cross-modal feature fusion and vision-constrained CRF prediction layer is designed to achieve adaptive aggregation of global and local features. Experiments on the Twitter-2015 and Twitter-2017 datasets yield F1 scores of 77.42% and 88.79%, respectively, outperforming baseline models. These results demonstrate that the global–local complementary mechanism effectively balances alignment precision and noise robustness, thereby enhancing entity recognition accuracy on social media data and advancing multimodal semantic understanding.
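
As a rough illustration of what sample-level image-text contrastive alignment generally looks like, the following is a minimal sketch of a generic symmetric InfoNCE-style loss. It is not the authors' SAGC formulation, which the abstract does not specify. The function names, the temperature value of 0.07, and the toy embeddings are all assumptions made for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Generic symmetric InfoNCE loss over a batch of paired embeddings.

    text_embs[i] and image_embs[i] are treated as a matched pair; every
    other combination in the batch acts as a negative. The temperature
    is a hypothetical default, not a value taken from the paper.
    """
    t = [l2_normalize(v) for v in text_embs]
    im = [l2_normalize(v) for v in image_embs]
    # Cosine similarity matrix scaled by temperature: rows index texts.
    sim = [[sum(a * b for a, b in zip(tv, iv)) / temperature for iv in im]
           for tv in t]
    # Transpose so that rows index images for the image-to-text direction.
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


# Toy batch of two text/image pairs (made-up 2-D embeddings).
text = [[1.0, 0.0], [0.0, 1.0]]
image_aligned = [[1.0, 0.0], [0.0, 1.0]]
image_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_contrastive_loss(text, image_aligned)
loss_swapped = symmetric_contrastive_loss(text, image_swapped)
```

In this toy batch the loss is close to zero when each text embedding matches its paired image embedding and large when the pairs are swapped, which is the behavior a global contrastive objective relies on to pull matched samples together.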

Details

Title
A Dual-Enhanced Hierarchical Alignment Framework for Multimodal Named Entity Recognition
Author
Wang, Jian; Zhou, Yanan; He, Qi; Zhang, Wenbo
First page
6034
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
2076-3417
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3217723442