Keywords: Visual Grounding, Referring Expression Comprehension, Referring Image Segmentation, Multi-Modality
Abstract: The recently proposed Generalized Referring Expression Segmentation (GRES) and Generalized Referring Expression Comprehension (GREC) tasks extend the traditional RES/REC paradigm to multi-target and non-target scenarios. However, existing approaches address these tasks individually, leaving unified generalized multi-task visual grounding unexplored. Moreover, current GRES methods are limited to global segmentation and lack fine-grained instance-level awareness. To address these gaps, this paper introduces a novel $\textbf{I}$nstance-aware $\textbf{G}$eneralized multi-task $\textbf{V}$isual $\textbf{G}$rounding ($\textbf{IGVG}$) framework. IGVG is the first to integrate GREC and GRES, establishing a consistent correspondence between detection and segmentation via query guidance. Additionally, IGVG introduces instance-level awareness, enabling precise and fine-grained instance recognition. Furthermore, we present a Point-guided Instance-aware Perception Head (PIPH), which employs attention-based query generation to identify coarse reference points. These points guide the correspondence between queries, objects, and instances, enhancing the directivity and interpretability of the queries.
Experimental results on the gRefCOCO (GREC/GRES), Ref-ZOM, and R-RefCOCO/+/g benchmarks demonstrate that IGVG outperforms state-of-the-art methods.
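The generalized setting described in the abstract (zero, one, or many targets per expression, with each query tied to both a box and an instance mask) can be illustrated with a minimal sketch. This is not the IGVG implementation: the names `QueryPrediction`, `select_targets`, and `merge_masks`, the score threshold of 0.5, and the use of a mask union to form the global segmentation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class QueryPrediction:
    """Hypothetical per-query output: one box, one score, one instance mask."""
    box: Box
    score: float
    mask: List[List[int]]  # binary H x W mask for this instance


def select_targets(preds: List[QueryPrediction],
                   threshold: float = 0.5) -> List[QueryPrediction]:
    """Keep queries whose score reaches the threshold.

    An empty result stands for the non-target case; several kept
    queries stand for the multi-target case. The threshold value is
    an assumption, not a number taken from the paper.
    """
    return [p for p in preds if p.score >= threshold]


def merge_masks(selected: List[QueryPrediction],
                height: int, width: int) -> List[List[int]]:
    """Union the kept instance masks into one global mask (assumed rule)."""
    merged = [[0] * width for _ in range(height)]
    for p in selected:
        for y in range(height):
            for x in range(width):
                if p.mask[y][x]:
                    merged[y][x] = 1
    return merged


# Toy example on a 2 x 2 image: two confident queries, one rejected query.
preds = [
    QueryPrediction((0, 0, 1, 1), 0.9, [[1, 0], [0, 0]]),
    QueryPrediction((1, 1, 2, 2), 0.8, [[0, 0], [0, 1]]),
    QueryPrediction((0, 1, 1, 2), 0.1, [[0, 1], [0, 0]]),
]
kept = select_targets(preds)
global_mask = merge_masks(kept, height=2, width=2)
```

Because the boxes and the masks come from the same kept queries, the detection and segmentation outputs stay in one-to-one correspondence in this sketch, and an expression with no referent simply yields an empty list and an all-zero mask.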
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Submission Number: 1048