© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Transformer-based approaches have shown good results in image captioning tasks. However, current approaches are limited in that they generate text from the global features of an entire image. We therefore propose two methods for generating better image captions: (1) the Global-Local Visual Extractor (GLVE), which captures both global and local features, and (2) the Cross Encoder-Decoder Transformer (CEDT), which injects multi-level encoder features into the decoding process. GLVE extracts not only global visual features obtained from the entire image, such as organ size or bone structure, but also local visual features obtained from a local region, such as a lesion area. Given an image, CEDT can create a detailed description of the overall features by injecting both low-level and high-level encoder outputs into the decoder. Each method contributes to the performance improvement and helps generate descriptions of features such as organ size and bone structure. The proposed model was evaluated on the IU X-ray dataset and outperformed the transformer-based baseline by 5.6% in BLEU score, 0.56% in METEOR, and 1.98% in ROUGE-L.
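
The two ideas in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: mean-pooling over a grid stands in for a visual backbone, the toy encoder layer stands in for a transformer layer, and every name here (grid_tokens, crop, global_local_tokens, encode, multi_level_memory) is hypothetical. The sketch only shows the data flow: tokens from the whole image and from one local region are concatenated, and the decoder memory is built from both a low-level and a high-level encoder output.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # a gray-scale image, or a sequence of feature vectors


def grid_tokens(image: Matrix, grid: int) -> Matrix:
    """Mean-pool the image over a grid x grid layout, one 1-D token per cell.

    Assumption: this replaces a CNN backbone, which the abstract does not name.
    """
    h, w = len(image), len(image[0])
    tokens = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * h // grid, (gy + 1) * h // grid)
            cols = range(gx * w // grid, (gx + 1) * w // grid)
            cell = [image[r][c] for r in rows for c in cols]
            tokens.append([sum(cell) / len(cell)])
    return tokens


def crop(image: Matrix, top: int, left: int, size: int) -> Matrix:
    """Cut a square local region (for example, around a lesion) out of the image."""
    return [row[left:left + size] for row in image[top:top + size]]


def global_local_tokens(image: Matrix, region: Tuple[int, int, int], grid: int = 2) -> Matrix:
    """Concatenate tokens from the whole image with tokens from one local region."""
    top, left, size = region
    return grid_tokens(image, grid) + grid_tokens(crop(image, top, left, size), grid)


def encode(tokens: Matrix, num_layers: int = 3) -> List[Matrix]:
    """Toy encoder: each layer mixes every token with the sequence mean.

    Returns the output of every layer, lowest level first.
    """
    outputs, current = [], tokens
    for _ in range(num_layers):
        dim = len(current[0])
        mean = [sum(t[d] for t in current) / len(current) for d in range(dim)]
        current = [[0.5 * t[d] + 0.5 * mean[d] for d in range(dim)] for t in current]
        outputs.append(current)
    return outputs


def multi_level_memory(encoder_outputs: List[Matrix]) -> Matrix:
    """Decoder memory built from the lowest-level and highest-level encoder outputs.

    Assumption: simple concatenation; the abstract does not say how levels are combined.
    """
    return encoder_outputs[0] + encoder_outputs[-1]


image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tokens = global_local_tokens(image, region=(2, 2, 4))
memory = multi_level_memory(encode(tokens))
print(len(tokens), len(memory))  # 8 16
```

In this toy setup the decoder would attend over 16 memory vectors instead of the 8 it would see from the final encoder layer alone, which is the sense in which multi-level encoder features are made available during decoding.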

Details

Title
Cross Encoder-Decoder Transformer with Global-Local Visual Extractor for Medical Image Captioning
Author
Lee, Hojun 1; Cho, Hyunjun 1; Park, Jieun 1; Chae, Jinyeong 2; Kim, Jihie 3

1 Department of Computer Science and Engineering, Dongguk University, Seoul 04620, Korea; [email protected] (H.L.); [email protected] (H.C.); [email protected] (J.P.)
2 Department of Artificial Intelligence, Dongguk University, Seoul 04620, Korea; [email protected]; Okestro Ltd., Seoul 07326, Korea
3 Department of Artificial Intelligence, Dongguk University, Seoul 04620, Korea; [email protected]
First page
1429
Publication year
2022
Publication date
2022
Publisher
MDPI AG
e-ISSN
1424-8220
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2633329575