Visual Question Answering with Textual Representations for Images

2021 (modified: 28 Oct 2022), ICCVW 2021, Readers: Everyone
Abstract: How far can we go with textual representations for understanding pictures? Deep visual features extracted by object recognition models are widely used in multiple tasks, especially in visual question answering (VQA). However, conventional deep visual features may struggle to convey all the details in an image as we humans do. Meanwhile, with the recent progress of language models, descriptive text may offer an alternative that addresses this problem. This paper delves into the effectiveness of textual representations for image understanding in the specific context of VQA.