Full text

Turn on search term navigation

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

In this paper, we provide external image features and use the internal attention mechanism to solve the VQA problem given a dataset of textual questions and related images. Most previous models for VQA use a pair of images and questions as input. In addition, the model adopts a question-oriented attention mechanism to extract the features of the entire image and then perform feature fusion. However, the shortcoming of these models is that they cannot effectively eliminate the irrelevant features of the image. In addition, the problem-oriented attention mechanism lacks in the mining of image features, which will bring in redundant image features. In this paper, we propose a VQA model based on adversarial learning and bidirectional attention. We exploit external image features that are not related to the question to form an adversarial mechanism to boost the accuracy of the model. Target detection is performed on the image—that is, the image-oriented attention mechanism. The bidirectional attention mechanism is conducive to promoting model attention and eliminating interference. Experimental results are evaluated on benchmark datasets, and our model performs better than other models based on attention methods. In addition, the qualitative results show the attention maps on the images and leads to predicting correct answers.

Details

Title
Adversarial Learning with Bidirectional Attention for Visual Question Answering
Author
Li, Qifeng 1 ; Tang, Xinyi 2 ; Yi, Jian 2 

 Shanghai Institute of Technical Physics of the Chinese Academy of Sciences, Shanghai 200083, China; [email protected] (X.T.); [email protected] (Y.J.); School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Infrared System Detection and Imaging Technology, Chinese Academy of Sciences, Shanghai 200083, China 
 Shanghai Institute of Technical Physics of the Chinese Academy of Sciences, Shanghai 200083, China; [email protected] (X.T.); [email protected] (Y.J.); Key Laboratory of Infrared System Detection and Imaging Technology, Chinese Academy of Sciences, Shanghai 200083, China 
First page
7164
Publication year
2021
Publication date
2021
Publisher
MDPI AG
e-ISSN
14248220
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2596067618
Copyright
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.