TextREC: A Dataset for Referring Expression Comprehension with Reading Comprehension

Published: 01 Jan 2023, Last Modified: 18 Nov 2023, ICDAR (3) 2023, Readers: Everyone
Abstract: Referring expression comprehension (REC) aims to locate a specific object within a scene given a natural language expression. Although REC has made tremendous progress, most of today’s REC models ignore the scene text in images. Scene text is ubiquitous in our society and is frequently critical to understanding the visual scene. To study how to comprehend scene text in the REC task, we collect a novel dataset, termed TextREC, in which most of the referring expressions are related to scene text. Our TextREC dataset challenges a model to recognize scene text, relate it to the referring expressions, and select the most relevant visual object. We also propose a text-guided adaptive modular network (TAMN) to comprehend scene text associated with objects in images. Experimental results reveal that current state-of-the-art REC methods fall short on the TextREC dataset, while our TAMN achieves encouraging results by integrating scene text.