Abstract

Medical report generation has made significant progress in recent years, but generated reports still suffer from poor readability, incomplete and inaccurate lesion descriptions, and difficulty in capturing fine-grained abnormalities. The primary obstacles are low image resolution, poor contrast, and substantial cross-modal discrepancies between visual and textual features. To address these challenges, we propose an Anomaly-Driven Cross-Modal Contrastive Network (ADCNet), which improves the quality and accuracy of medical report generation through effective cross-modal feature fusion and alignment. First, we design an anomaly-aware cross-modal feature fusion (ACFF) module that introduces an anomaly embedding vector to guide the extraction of anomaly-related features from visual representations, strengthening the ability of visual features to capture lesion-related abnormalities and improving feature fusion. Second, we propose a fine-grained regional feature alignment (FRFA) module that dynamically filters visual and textual features to suppress irrelevant information and background noise, then computes cross-modal relevance to align fine-grained regional features, improving semantic consistency between images and generated reports. Experimental results on the IU X-Ray and MIMIC-CXR datasets demonstrate that ADCNet significantly outperforms existing approaches, achieving notable improvements in natural language generation metrics as well as in the accuracy, completeness, and fluency of the generated reports.
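
The record contains no code, so the following is a minimal PyTorch sketch of how the two modules named in the abstract might be realized. The module names ACFF and FRFA come from the abstract; everything else (the use of a learnable anomaly embedding as cross-attention queries, sigmoid gating for the dynamic filtering, the similarity-based alignment loss, and all dimensions) is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch of the ACFF and FRFA modules from the abstract.
# All internal design choices and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ACFF(nn.Module):
    """Anomaly-aware cross-modal feature fusion (sketch).

    A learnable anomaly embedding queries the visual features via
    cross-attention so the fused output emphasizes lesion-related
    regions (assumed mechanism)."""

    def __init__(self, dim: int = 512, num_anomaly_tokens: int = 8, heads: int = 8):
        super().__init__()
        # Learnable anomaly embedding vectors that act as queries.
        self.anomaly_embed = nn.Parameter(torch.randn(num_anomaly_tokens, dim))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, visual: torch.Tensor) -> torch.Tensor:
        # visual: (B, N_patches, dim) patch features from the image encoder.
        b = visual.size(0)
        queries = self.anomaly_embed.unsqueeze(0).expand(b, -1, -1)
        # Anomaly tokens attend over visual patches to pool lesion cues.
        anomaly_feat, _ = self.cross_attn(queries, visual, visual)
        # Broadcast the pooled anomaly context onto each patch and fuse.
        ctx = anomaly_feat.mean(dim=1, keepdim=True).expand_as(visual)
        return self.fuse(torch.cat([visual, ctx], dim=-1))


class FRFA(nn.Module):
    """Fine-grained regional feature alignment (sketch).

    Gates out low-relevance visual/textual features, then scores the
    survivors with a cross-modal relevance (similarity) matrix."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.visual_gate = nn.Linear(dim, 1)
        self.text_gate = nn.Linear(dim, 1)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # Dynamic filtering: down-weight background patches / filler tokens.
        v = visual * torch.sigmoid(self.visual_gate(visual))
        t = text * torch.sigmoid(self.text_gate(text))
        # Cross-modal relevance between every image region and every word.
        sim = F.normalize(v, dim=-1) @ F.normalize(t, dim=-1).transpose(1, 2)
        # Simplified contrastive-style alignment loss: each region should
        # match its most relevant word strongly relative to the rest.
        return -F.log_softmax(sim, dim=-1).max(dim=-1).values.mean()
```

In training, the FRFA term would presumably be added to the standard report-generation (cross-entropy) objective with some weighting; that weighting, like the rest of the sketch, is an assumption for illustration.
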

Details

Title
ADCNet: Anomaly-Driven Cross-Modal Contrastive Network for Medical Report Generation
Author
Liu, Yuxue 1; Zhang, Junsan 1; Liu, Kai 2; Tan, Lizhuang 3

1 Shandong Province Key Laboratory of Intelligent Oil & Gas Industrial Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China; [email protected]
2 State Key Laboratory of Space Network and Communications, Tsinghua University, Beijing 100084, China; [email protected]; Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
3 Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China; [email protected]; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan 250014, China
First page
532
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
2079-9292
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3165774501
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.