A Survey on Attention-Based Models for Image Captioning

Abstract

Image captioning is widely used in many real-world applications. The task first interprets the image using computer vision methods, then uses natural language processing methods to produce a description of it. Different approaches have been proposed to solve this task, and deep learning attention-based models have proven to be the state of the art. This paper presents a survey of attention-based models for image captioning, including new categories that were not covered in other survey papers. The attention-based approaches are classified into four main categories, which are further divided into subcategories. All categories and subcategories are discussed in detail. Furthermore, the state-of-the-art approaches are compared and their accuracy improvements are stated, especially for the transformer-based models, and a summary of the benchmark datasets and the main performance metrics is presented.

Details

Title

A Survey on Attention-Based Models for Image Captioning

Author

Osman, Asmaa A E; Wahby Shalaby, Mohamed A; Soliman, Mona M; Elsayed, Khaled M

Publication year

2023

Publication date

2023

Publisher

Science and Information (SAI) Organization Limited

ISSN

2158-107X

e-ISSN

2156-5570

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.14569/IJACSA.2023.0140249

ProQuest document ID

2791786135

© 2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
