Inductive Biases for Low Data VQA: A Data Augmentation Approach

Narjes Askarian, Ehsan Abbasnejad, Ingrid Zukerman, Wray L. Buntine, Gholamreza Haffari

Published: 2022, Last Modified: 18 May 2023, WACV (Workshops) 2022, Readers: Everyone

Abstract: Visual question answering (VQA) is the problem of understanding rich image contexts and answering complex natural language questions about them. VQA models have recently achieved remarkable results when trained on large-scale labeled datasets. However, annotating large amounts of data is not feasible in many domains. In this paper, we address the problem of VQA in the low-labeled-data regime, which is under-explored in the literature. We take a data augmentation approach, enlarging the initial small labeled dataset in order to inject proper inductive biases into the VQA model. We encode the additional inductive biases in the questions by generating new ones from the image annotations. Our results show accuracy improvements of up to 34% compared to baselines trained on only the initial labeled data.
