Abstract

Image captioning is widely used in many real-world applications. The task involves first understanding an image with computer vision methods and then generating a natural-language description of it with natural language processing methods. Many approaches have been proposed for this task, and deep learning attention-based models have proven to be the state of the art. This paper presents a survey of attention-based models for image captioning, including categories not covered in previous survey papers. The attention-based approaches are classified into four main categories, which are further divided into subcategories, and all categories and subcategories are discussed in detail. Furthermore, the state-of-the-art approaches are compared and their accuracy improvements are reported, particularly for transformer-based models, and a summary of the benchmark datasets and the main performance metrics is presented.
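To make the "attention-based" idea concrete, the following is a minimal, generic sketch of scaled dot-product attention over image region features, the core operation underlying the captioning decoders the survey covers. It is an illustration only, not code from the paper; all names (region_features, decoder_state, the dimensions) are assumptions chosen for the example.

```python
# Generic sketch of scaled dot-product attention over image region features.
# Names and dimensions are illustrative, not taken from the survey.
import numpy as np

def scaled_dot_product_attention(query, keys, values):
    """Return a context vector: a weighted sum of region features (values),
    with weights given by the similarity of the decoder query to each key."""
    d_k = keys.shape[-1]
    scores = query @ keys.T / np.sqrt(d_k)   # one score per image region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over regions
    context = weights @ values               # weighted sum of region features
    return context, weights

# Toy example: 4 image regions with 8-dim features, one decoder hidden state.
rng = np.random.default_rng(0)
region_features = rng.standard_normal((4, 8))   # stand-in for CNN/transformer features
decoder_state = rng.standard_normal(8)          # stand-in for the caption decoder state
context, weights = scaled_dot_product_attention(decoder_state,
                                                region_features,
                                                region_features)
print("attention weights over regions:", np.round(weights, 3))
```

At each decoding step, a captioning model re-computes these weights so that the word being generated can focus on the most relevant image regions.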

Details

Title
A Survey on Attention-Based Models for Image Captioning
Author
Osman, Asmaa A E; Wahby Shalaby, Mohamed A; Soliman, Mona M; Elsayed, Khaled M
Publication year
2023
Publication date
2023
Publisher
Science and Information (SAI) Organization Limited
ISSN
2158-107X
e-ISSN
2156-5570
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2791786135
Copyright
© 2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.