Towards Complex-query Referring Image Segmentation: A Novel Benchmark

Wei Ji; Li Li; Hao Fei; Xiangyan Liu; Xun Yang; Juncheng Li; Roger Zimmermann

17 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Referring Image Segmentation; Complex Language Query; Dual-Modality Alignment

TL;DR: We propose a novel benchmark dataset, RIS-CQ, which challenges existing RIS methods with complex queries, and a novel SOTA method for RIS tasks called dual-modality alignment with graph learning (DuMoGa).

Abstract: Referring Image Segmentation (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms. However, little research has investigated how existing algorithms should be benchmarked with complex language queries, which include more informative descriptions of surrounding objects and backgrounds (e.g., "the black car." vs. "the black car is parking on the road and beside the bus."). Given the significant improvement in the semantic understanding capability of large pre-trained models, it is crucial to take a step further in RIS by incorporating complex language that resembles real-world applications. To close this gap, building upon the existing RefCOCO and Visual Genome datasets, we propose a new RIS benchmark with complex queries, namely RIS-CQ. The RIS-CQ dataset is of high quality and large scale; it challenges existing RIS methods with enriched, specific, and informative queries and enables a more realistic scenario for RIS research. In addition, we present a niche-targeting method to better tackle RIS-CQ, called the dual-modality graph alignment model (DuMoGa), which outperforms a series of RIS methods. To provide a valuable foundation for future advancements in the field of RIS with complex queries, we release the datasets, preprocessing and synthetic scripts, and the algorithm implementations.

Supplementary Material: pdf

Primary Area: datasets and benchmarks

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 866
