Erasing-based Attention Learning for Visual Question AnsweringOpen Website

2019 (modified: 10 Nov 2022)ACM Multimedia 2019Readers: Everyone
Abstract: Attention learning for visual question answering remains a challenging task, where most existing methods treat the attention and the non-attention parts in isolation. In this paper, we propose to enforce the correlation between the attention and the non-attention parts as a constraint for attention learning. We first adopt an attention-guided erasing scheme to obtain the attention and the non-attention parts respectively, and then learn to separate the attention and the non-attention parts by an appropriate distance margin in a feature embedding space. Furthermore, we associate a typical classification loss with the above distance constraint to learn a more discriminative attention map for answer prediction. The proposed approach does not introduce extra model parameters or inference complexity, and can be combined with any attention-based models. Extensive ablation experiments validate the effectiveness of our method, and new state-of-the-art or competitive results on four publicly available datasets are achieved.
0 Replies

Loading