All You May Need for VQA are Image Captions

Anonymous

08 Mar 2022 (modified: 05 May 2023), NAACL 2022 Conference Blind Submission
Paper Link: https://openreview.net/forum?id=kp6qIdP7Nt
Paper Type: Long paper (up to eight pages of content + unlimited references and appendices)
Abstract: Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not seen comparable investment in data creation. In this paper, we propose a method that automatically derives VQA examples at scale, by leveraging the abundance of existing image-caption annotations combined with neural models for textual question generation. We show that the resulting data is of high quality. VQA models trained on our data improve state-of-the-art zero-shot accuracy by double digits and achieve a level of robustness that is lacking in the same model trained on human-annotated VQA data.
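The caption-to-QA pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `t5-base` checkpoint, the answer-conditioned prompt format, and the `caption_to_vqa_examples` helper are placeholders chosen for demonstration, and the answer-extraction step is stubbed out; the paper's actual models are described in the full text.

```python
# Minimal sketch: turn image captions into (image, question, answer) triples
# for VQA training. Model name and prompt format are illustrative assumptions.
from transformers import pipeline

# Hypothetical question-generation model; in practice one would use a
# text2text model fine-tuned for answer-conditioned question generation.
qgen = pipeline("text2text-generation", model="t5-base")  # placeholder checkpoint

def caption_to_vqa_examples(image_id: str, caption: str, candidate_answers):
    """Derive VQA examples from one caption.

    candidate_answers: answer spans extracted from the caption, e.g. by a
    noun-phrase chunker or NER model (extraction step omitted here).
    """
    examples = []
    for answer in candidate_answers:
        # Answer-conditioned prompt; real QG models define their own template.
        prompt = f"generate question: answer: {answer} context: {caption}"
        question = qgen(prompt, max_length=32)[0]["generated_text"]
        examples.append(
            {"image_id": image_id, "question": question, "answer": answer}
        )
    return examples

# Usage: one COCO-style caption yields several QA pairs.
print(caption_to_vqa_examples(
    "img_001",
    "A brown dog catches a frisbee in the park.",
    ["a brown dog", "a frisbee", "the park"],
))
```

Because each caption typically contains several extractable answer spans, a corpus of image-caption pairs fans out into a much larger set of VQA examples, which is what makes generation "at scale" possible without new human annotation.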
Presentation Mode: This paper will be presented in person in Seattle