VIRO: Efficient and Robust Neuro-Symbolic Reasoning with Verification for Referring Expression Comprehension

ICLR 2026 Conference Submission18070 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Referring Expression Comprehension (REC), Visual Grounding, Compositional Reasoning, Neuro-symbolic Reasoning
TL;DR: We propose Verification-Integrated Reasoning Operators (VIRO), a neuro-symbolic framework for REC that explicitly handles no-target scenarios and improves computational eff
Abstract: Referring Expression Comprehension (REC) aims to localize the image region corresponding to a natural language query. To handle complex queries, recent work has focused on compositional reasoning, with advances in Large Language Models (LLMs) and Vision Language Models (VLMs) enabling the decomposition of queries into executable programs within reasoning pipelines. However, existing approaches implicitly assume the target is always present, forcing the model to output a result even when no valid referent exists. Moreover, multi-step reasoning often incurs high computational cost, limiting its use in real-time scenarios. To address these limitations, we propose Verification-Integrated Reasoning Operators (VIRO), which integrate operator-level verification into a neuro-symbolic pipeline, enabling abstention and the explicit handling of no-target cases. Each operator performs a reasoning step and verifies its own execution, using a lightweight CLIP-based filter with minimal computational overhead and logical verification for spatial and relational constraints. Experimental results demonstrate that our framework is robust in no-target cases, reaching 61.1% balanced accuracy, while showing state-of-the-art accuracy on standard REC benchmarks compared to compositional baselines. Our neuro-symbolic pipeline also shows superior computational efficiency, high reliability with a program failure rate of just 0.3%, and scalability, which is achieved by decoupling program generation from execution.
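The idea of operators that verify their own output and abstain when verification fails can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Box`, `Abstain`, `find`, `left_of`, and `run`, the `score` field standing in for a CLIP-style similarity, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical candidate region: a label, a left-edge x coordinate,
    and a score standing in for a CLIP-style image-text similarity."""
    label: str
    x: float
    score: float


class Abstain(Exception):
    """Raised by an operator whose verification step finds no valid output."""


def find(candidates: List[Box], label: str, min_score: float = 0.5) -> List[Box]:
    # Reasoning step: select candidates with the requested label.
    # Verification step (assumed): keep only those above a similarity threshold.
    kept = [b for b in candidates if b.label == label and b.score >= min_score]
    if not kept:
        raise Abstain(f"no verified '{label}' in the image")
    return kept


def left_of(targets: List[Box], anchors: List[Box]) -> List[Box]:
    # Logical verification of a spatial constraint:
    # a target survives only if it lies left of at least one anchor.
    kept = [t for t in targets if any(t.x < a.x for a in anchors)]
    if not kept:
        raise Abstain("spatial constraint 'left of' is not satisfied")
    return kept


def run(program: Callable[[], List[Box]]) -> Optional[Box]:
    # Execute a program; return None (no target) if any operator abstains.
    try:
        result = program()
    except Abstain:
        return None
    return max(result, key=lambda b: b.score)


# Toy scene: a dog on the left, a cat on the right.
scene = [Box("dog", 10.0, 0.9), Box("cat", 50.0, 0.8)]

# "the dog left of the cat": both operators verify, so a box is returned.
hit = run(lambda: left_of(find(scene, "dog"), find(scene, "cat")))

# "the cat left of the dog": the spatial check fails, so the pipeline abstains.
miss = run(lambda: left_of(find(scene, "cat"), find(scene, "dog")))
```

In this sketch, abstention propagates as an exception, so a failure at any step short-circuits the rest of the program and yields a no-target answer rather than a forced prediction.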
Supplementary Material: pdf
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 18070