Debiased Visual Question Answering from Feature and Sample Perspectives

Zhiquan Wen; Guanghui Xu; Mingkui Tan; Qingyao Wu; Qi Wu

Published: 09 Nov 2021, Last Modified: 05 May 2023

Venue: NeurIPS 2021 Poster

Readers: Everyone

Keywords: Visual Question Answering, Debiased, Bias Detection

TL;DR: For the Visual Question Answering (VQA) task, we propose a novel method named D-VQA that alleviates the negative effects of biases in the language and vision modalities and improves model performance on out-of-distribution data.

Abstract: Visual question answering (VQA) is designed to examine the visual-textual reasoning ability of an intelligent agent. However, recent observations show that many VQA models may only capture the biases between questions and answers in a dataset rather than showing real reasoning abilities. For example, given a question, some VQA models tend to output the answer that occurs frequently in the dataset and ignore the images. To reduce this tendency, existing methods focus on weakening the language bias, and only a few works also consider vision bias, and then only implicitly. However, these methods either introduce additional annotations or show unsatisfactory performance. Moreover, not all biases are harmful to the models. Some “biases” learnt from datasets represent natural rules of the world and can help limit the range of answers. Thus, how to filter and remove the truly negative biases in the language and vision modalities remains a major challenge. In this paper, we propose a method named D-VQA to address the above challenges from the feature and sample perspectives. Specifically, from the feature perspective, we build a question-to-answer branch and a vision-to-answer branch to capture the language and vision biases, respectively. Next, we apply two unimodal bias detection modules to explicitly recognise and remove the negative biases. From the sample perspective, we construct two types of negative samples to assist the training of the models, without introducing additional annotations. Extensive experiments on the VQA-CP v2 and VQA v2 datasets demonstrate the effectiveness of our D-VQA method.
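The abstract does not say how the two types of negative samples are built. As an illustration only, the sketch below assumes one plausible reading: within a training batch, each question is re-paired with an image from a different sample, and each image is re-paired with a question from a different sample, so no extra annotation is needed. The function names `derangement` and `make_negative_samples`, the `(image_id, question)` tuple layout, and the in-batch shuffling strategy are all assumptions made for this sketch and are not taken from the D-VQA code.

```python
import random


def derangement(n, rng):
    """Return a permutation of range(n) in which no index maps to itself."""
    if n < 2:
        raise ValueError("need at least two samples to build mismatched pairs")
    perm = list(range(n))
    while True:
        rng.shuffle(perm)
        if all(i != p for i, p in enumerate(perm)):
            return perm


def make_negative_samples(batch, seed=0):
    """Build two hypothetical kinds of negative samples from a batch.

    batch: list of (image_id, question) tuples (assumed layout).
    Returns (mismatched_image, mismatched_question), each a list of
    (image_id, question) tuples with the same length as the batch.
    """
    rng = random.Random(seed)
    n = len(batch)
    img_perm = derangement(n, rng)
    q_perm = derangement(n, rng)
    # Type 1 (assumed): keep each question, swap in another sample's image.
    mismatched_image = [(batch[img_perm[i]][0], batch[i][1]) for i in range(n)]
    # Type 2 (assumed): keep each image, swap in another sample's question.
    mismatched_question = [(batch[i][0], batch[q_perm[i]][1]) for i in range(n)]
    return mismatched_image, mismatched_question


if __name__ == "__main__":
    demo = [
        ("img_a", "What color is the cat?"),
        ("img_b", "How many dogs are there?"),
        ("img_c", "Is it raining?"),
    ]
    neg_img, neg_q = make_negative_samples(demo)
    for pair in neg_img + neg_q:
        print(pair)
```

Using a derangement rather than a plain shuffle guarantees that every negative pair is actually mismatched, which matters for small batches where a random shuffle often leaves some pairs unchanged. How such pairs enter the training objective is left open here, since the abstract does not specify it.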

Supplementary Material: pdf

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Code: https://github.com/Zhiquan-Wen/D-VQA
