Keywords: Visual Reasoning, Active Reasoning, Uncertainty, LLM
Abstract: Real-world reasoning rarely reduces to static question answering: agents must actively gather information from tools and sensors whose outputs are often noisy or incorrect. However, most existing active reasoning benchmarks either focus on environments where feedback is largely reliable or inject noise without providing an explicit, calibrated uncertainty signal about tool outputs, making it difficult to analyze how LLMs should reason with uncertain evidence. We introduce VAR, a benchmark for active reasoning under noisy visual feedback that is explicitly designed to evaluate text-only LLM reasoners: a fixed, off-the-shelf VLM is treated as a stochastic visual sensor, and the LLM must solve VQA problems solely by querying this sensor. For each sensor query, we draw multiple samples and expose a coarse uncertainty signal via self-consistency, enabling the reasoner to probe from different angles and decide what to ask next and when to stop. Our construction is automatic and scalable: starting from diverse VQA sources and two modern VLMs, we select instances on which the sensor is inconsistent but that remain human-solvable. VAR thus provides a controlled playground to study how different LLMs exploit uncertainty signals for robust reasoning.
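The self-consistency signal described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; the function name and the agreement-fraction formulation are assumptions, showing one common way to turn repeated samples from a stochastic sensor into a coarse uncertainty estimate:

```python
from collections import Counter

def self_consistency(samples):
    """Majority answer and agreement rate over repeated sensor samples.

    A high agreement fraction suggests the sensor is consistent on this
    query; a low fraction flags an answer the reasoner may want to
    re-probe from a different angle before committing.
    """
    counts = Counter(samples)
    answer, n = counts.most_common(1)[0]
    return answer, n / len(samples)

# Hypothetical example: a stochastic VLM sensor queried 5 times
answer, agreement = self_consistency(["red", "red", "blue", "red", "red"])
# majority answer "red" with agreement 4/5 = 0.8
```

Under this sketch, the reasoner would receive both the majority answer and the agreement score, and could use the latter to decide what to ask next and when to stop.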
Paper Type: Short
Research Area: AI/LLM Agents
Research Area Keywords: AI/LLM Agents, Dialogue and Interactive Systems, Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 9926