On the role of question encoder sequence model in robust visual question answering

Published: 01 Jan 2022, Last Modified: 05 Mar 2025, Pattern Recognit. 2022, CC BY-SA 4.0
Abstract: Highlights
• The question-encoder sequence model plays a significant role in overfitting VQA models to training-set language biases and in reducing performance on Out-of-Distribution test sets.
• A comprehensive study of how existing RNN-based and Transformer-based question encoders affect Out-of-Distribution performance in VQA.
• Proposal of a novel question encoder, GAT-QE, for VQA that shows better resilience to language biases and improves Out-of-Distribution performance even without additional bias-mitigation approaches.