All You May Need for VQA are Image Captions

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission
Abstract: Visual Question Answering (VQA) is a challenge that has benefited tremendously from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation. We propose a method that automatically derives VQA examples at volume, by leveraging the abundance of existing image-caption annotations combined with neural models for question generation. We show that the resulting data is powerful enough to boost state-of-the-art zero-shot results on VQA by double digits, and exhibits a level of robustness that is lacking in models with the same architecture trained on human-annotated data.
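
As a rough illustration of the pipeline the abstract describes, the sketch below turns a single image caption into candidate VQA triples. The specifics here are assumptions, not the paper's actual setup: spaCy noun chunks stand in for answer-candidate extraction, a community T5 question-generation checkpoint (mrm8488/t5-base-finetuned-question-generation-ap) stands in for the paper's question-generation model, and the paper's filtering and quality controls are omitted.

# Minimal sketch of a caption -> VQA-example pipeline.
# Assumptions: spaCy noun chunks as candidate answers; an off-the-shelf
# T5 question-generation checkpoint from the Hugging Face Hub. Neither is
# confirmed by the paper; this only illustrates the general idea.

import spacy
from transformers import T5ForConditionalGeneration, T5Tokenizer

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

# Hypothetical stand-in for the paper's question-generation model.
QG_NAME = "mrm8488/t5-base-finetuned-question-generation-ap"
tokenizer = T5Tokenizer.from_pretrained(QG_NAME)
qg_model = T5ForConditionalGeneration.from_pretrained(QG_NAME)


def caption_to_vqa_examples(image_id: str, caption: str):
    """Turn one caption into (image, question, answer) training triples."""
    examples = []
    for chunk in nlp(caption).noun_chunks:  # candidate answer spans
        answer = chunk.text
        # This checkpoint expects "answer: ... context: ..." as its input format.
        prompt = f"answer: {answer} context: {caption}"
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
        output_ids = qg_model.generate(**inputs, max_length=32)
        question = tokenizer.decode(output_ids[0], skip_special_tokens=True)
        # The checkpoint prefixes its output with "question:"; strip it.
        question = question.removeprefix("question:").strip()
        examples.append({"image_id": image_id, "question": question, "answer": answer})
    return examples


if __name__ == "__main__":
    for ex in caption_to_vqa_examples("img_001", "A brown dog catches a frisbee in the park."):
        print(ex)

In practice, a pipeline like this would be run over an existing large image-caption corpus, with the generated question-answer pairs paired back to the source images to form VQA training examples.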
Paper Type: long