In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering

Published: 05 Jul 2021 · Last Modified: 17 Dec 2023 · ACL 2021 · Everyone · CC BY-SA 4.0
Abstract: Visual Question Answering (VQA) methods aim at leveraging visual input to answer questions that may require complex reasoning over entities. Current models are trained on labelled data that may be insufficient to learn complex knowledge representations. In this paper, we propose a new method to enhance the reasoning capabilities of a multi-modal pretrained model (Vision+Language BERT) by integrating facts extracted from an external knowledge base. Evaluation on the KVQA benchmark demonstrates that our method outperforms competitive baselines by 19%, achieving new state-of-the-art results. We also perform an extensive ablation study highlighting the limitations of our best-performing model.
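To make the integration idea concrete, here is a minimal sketch of one common way to feed retrieved knowledge-base facts into a text encoder: serialize each fact triple and concatenate it with the question into a single input sequence. The function name, the `[CLS]`/`[SEP]` sequence layout, and the triple format are illustrative assumptions, not the paper's actual mechanism.

```python
def build_fact_augmented_input(question, facts, max_facts=3):
    """Hypothetical sketch: serialize retrieved KB fact triples and
    concatenate them with the question, producing a single text sequence
    that a Vision+Language BERT-style encoder could consume alongside
    the image features. The paper's actual integration may differ.

    facts: list of (subject, relation, object) string triples,
           assumed pre-ranked by relevance to the question.
    """
    # Keep only the top-ranked facts to respect the encoder's length budget.
    selected = facts[:max_facts]
    fact_text = " ".join(f"{s} {r} {o}." for s, r, o in selected)
    # Standard BERT-style segment layout: question, then evidence.
    return f"[CLS] {question} [SEP] {fact_text} [SEP]"


example = build_fact_augmented_input(
    "Who is the person on the left?",
    [("Marie Curie", "occupation", "physicist"),
     ("Marie Curie", "born in", "Warsaw")],
)
```

The `max_facts` cap stands in for whatever relevance-based selection the method uses; injecting too many facts would crowd out the question within the encoder's fixed input length.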