Abstract

The Transformer-based approach represents the state of the art in image captioning. However, existing studies have shown that the Transformer suffers from a problem in which irrelevant tokens with overlapping neighbors incorrectly attend to each other with relatively large attention scores. We believe this limitation stems from the incompleteness of the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). To address it, we present the Full-Memory Transformer for image captioning, which improves both image encoding and language decoding. In the image encoding step, we propose the Full-LN symmetric structure, which enables stable training and better model generalization by symmetrically embedding Layer Normalization on both sides of the SAN and the FFN. In the language decoding step, we propose the Memory Attention Network (MAN), which extends the traditional attention mechanism to measure the correlation between the attention result and the input sequence, guiding the model to focus on the words that need to be attended to. Evaluated on the MS COCO dataset, our method performs well, improving the BLEU-4 score from 38.4 to 39.3.
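
The abstract gives only a high-level description of the two components, so a minimal PyTorch sketch may help make them concrete. Everything below is inferred from the abstract alone: the class names, the gating form used for the Memory Attention Network, and all hyperparameters are hypothetical assumptions, not the authors' code.

```python
# Hypothetical sketch of the two components described in the abstract.
# FullLNEncoderLayer, MemoryAttention, and all sizes are illustrative
# assumptions; the paper's exact formulation may differ.
import torch
import torch.nn as nn


class FullLNEncoderLayer(nn.Module):
    """Encoder layer with Layer Normalization placed symmetrically on
    both sides of the self-attention network (SAN) and the feed-forward
    network (FFN), as the Full-LN structure in the abstract describes."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        # One LayerNorm before and one after each sub-layer (four in total).
        self.ln = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(4)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre- and post-normalized self-attention with a residual connection.
        h = self.ln[0](x)
        h, _ = self.attn(h, h, h)
        x = x + self.ln[1](h)
        # Pre- and post-normalized FFN with a residual connection.
        h = self.ln[2](x)
        return x + self.ln[3](self.ffn(h))


class MemoryAttention(nn.Module):
    """One plausible reading of the MAN: standard cross-attention followed
    by a learned gate that scores how well the attention result correlates
    with the input query, down-weighting words that should not be attended
    to. The gating form here is an assumption."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, query: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(query, memory, memory)
        # Correlation gate between the attention result and the input query.
        g = torch.sigmoid(self.gate(torch.cat([out, query], dim=-1)))
        return g * out + (1.0 - g) * query


if __name__ == "__main__":
    feats = torch.randn(2, 49, 512)       # a batch of image grid features
    enc = FullLNEncoderLayer()(feats)     # encoded features, (2, 49, 512)
    words = torch.randn(2, 12, 512)       # partial caption embeddings
    out = MemoryAttention()(words, enc)   # gated attention output, (2, 12, 512)
```

In this reading, Full-LN simply wraps each sub-layer in a pre- and post-normalization pair, while the sigmoid gate is one way to realize "determining the correlation between attention results and input sequences"; the authors' actual mechanism may differ.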

Details

Title
Full-Memory Transformer for Image Captioning
Author
Lu, Tongwei 1; Wang, Jiarong 1; Min, Fen 1

1 School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China; Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
First page
190
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
2073-8994
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2767291467
Copyright
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.