Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
Interpretable Counting for Visual Question Answering
Alexander Trott, Caiming Xiong, Richard Socher
Feb 15, 2018 (modified: Feb 23, 2018)ICLR 2018 Conference Blind Submissionreaders: everyoneShow Bibtex
Abstract:Questions that require counting a variety of objects in images remain a major challenge in visual question answering (VQA). The most common approaches to VQA involve either classifying answers based on fixed length representations of both the image and question or summing fractional counts estimated from each section of the image. In contrast, we treat counting as a sequential decision process and force our model to make discrete choices of what to count. Specifically, the model sequentially selects from detected objects and learns interactions between objects that influence subsequent selections. A distinction of our approach is its intuitive and interpretable output, as discrete counts are automatically grounded in the image. Furthermore, our method outperforms the state of the art architecture for VQA on multiple metrics that evaluate counting.
TL;DR:We perform counting for visual question answering; our model produces interpretable outputs by counting directly from detected objects.
Keywords:Counting, VQA, Object detection
Enter your feedback below and we'll get back to you as soon as possible.