Breaking Down Questions for Outside-Knowledge Visual Question Answering

Anonymous

16 Nov 2021 (modified: 05 May 2023)
ACL ARR 2021 November Blind Submission
Abstract: Knowledge-Based VQA (KB-VQA) has recently attracted growing interest. In this task, different parts of a question can require different sources of knowledge, including the image's visual content and external knowledge such as commonsense concepts and factual information. To address this challenge, we propose a novel approach that passes knowledge from these various sources between the different semantic segments of the question. Each question is first segmented into several chunks, and each chunk is used to generate queries that retrieve knowledge from ConceptNet and Wikipedia. A graph neural network that exploits the question's syntactic structure then integrates the knowledge retrieved for the different segments to jointly predict the answer. Experiments on the OK-VQA dataset show that our approach achieves new state-of-the-art results.
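The abstract does not specify the graph neural network's architecture. As a rough illustration only, the Python sketch below shows generic mean-aggregation message passing over a graph whose nodes are question segments and whose edges are assumed to come from the question's syntactic parse, followed by pooling and answer scoring. The segment names, the toy two-dimensional vectors standing in for retrieved ConceptNet/Wikipedia knowledge, the edge list, the `self_weight` parameter, and the dot-product answer scorer are all hypothetical assumptions; this is not the authors' model.

```python
# Hypothetical sketch of message passing between question segments.
# NOT the authors' implementation: weights are fixed, vectors are toy values.
from typing import Dict, List, Tuple

Vector = List[float]


def message_passing(
    features: Dict[str, Vector],
    edges: List[Tuple[str, str]],
    rounds: int = 1,
    self_weight: float = 0.5,
) -> Dict[str, Vector]:
    """Synchronous mean-aggregation message passing on an undirected graph.

    Each round, a node's vector becomes
    self_weight * own_vector + (1 - self_weight) * mean(neighbour vectors).
    Nodes with no neighbours keep their own vector.
    """
    neighbours: Dict[str, List[str]] = {node: [] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    h = {node: list(vec) for node, vec in features.items()}
    for _ in range(rounds):
        updated: Dict[str, Vector] = {}
        for node, vec in h.items():
            nbrs = neighbours[node]
            if not nbrs:
                updated[node] = list(vec)
                continue
            # Average the previous-round vectors of all neighbours.
            mean = [sum(h[n][d] for n in nbrs) / len(nbrs) for d in range(len(vec))]
            updated[node] = [
                self_weight * v + (1 - self_weight) * m for v, m in zip(vec, mean)
            ]
        h = updated
    return h


def pool(h: Dict[str, Vector]) -> Vector:
    """Average all segment vectors into one question representation."""
    vecs = list(h.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def predict(question_vec: Vector, answers: Dict[str, Vector]) -> str:
    """Pick the candidate answer whose embedding has the largest dot product."""
    return max(answers, key=lambda a: sum(q * x for q, x in zip(question_vec, answers[a])))


# Toy usage: three hypothetical segments of "What sport can you use this for?"
segments = {
    "what sport": [1.0, 0.0],
    "can you use": [0.0, 0.0],
    "this for": [0.0, 1.0],
}
# Hypothetical edges standing in for links derived from a dependency parse.
edges = [("can you use", "what sport"), ("can you use", "this for")]

h = message_passing(segments, edges, rounds=1)
question_vec = pool(h)
answer = predict(question_vec, {"surfing": [1.0, 0.0], "kitchen": [0.0, 0.5]})
print(h["can you use"], question_vec, answer)
```

A trained model would presumably learn the aggregation weights and use encoded knowledge rather than hand-set vectors; the sketch only shows how information can flow between syntactically connected segments before a joint prediction.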