Beyond Pixels: Introspective and Interactive Grounding for Visualization Agents

ACL ARR 2026 January Submission 6020 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: visualization agents, visual grounding, chart question answering, chart understanding and reasoning, tool use, benchmark
Abstract: Coding agents rely on Vision-Language Models (VLMs) to verify the charts they generate, but VLMs frequently hallucinate data values from rendered pixels and fail to detect fine-grained errors. We propose Introspective and Interactive Visual Grounding (IVG), a framework with two complementary mechanisms: (1) spec-grounded introspection, which provides direct access to the agent's own chart specification for deterministic value verification; and (2) view-grounded interaction, which allows agents to manipulate the view (zoom, pan, toggle traces) to resolve ambiguity in overlapping regions. To evaluate this framework, we introduce iPlotBench, a benchmark of 500 interactive Plotly figures with 6,706 binary questions and released ground-truth specifications. Our experiments show that spec-grounded introspection improves data reconstruction fidelity in chart recreation, while view-grounded interaction yields the largest gains on questions requiring spatial reasoning over overlapping geometries. The combination achieves the highest overall QA accuracy, demonstrating that these two capabilities are complementary for building trustworthy visualization agents.
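The spec-grounded introspection mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the spec shape below mirrors Plotly's JSON figure format, and the `lookup` helper is a hypothetical stand-in for deterministic value verification, i.e. reading a data value directly from the agent's own chart specification rather than estimating it from rendered pixels.

```python
import json

# Assumed: a chart specification in Plotly's JSON figure format,
# as an agent that generated the chart would already hold it.
spec_json = """
{
  "data": [
    {"type": "bar", "name": "sales",
     "x": ["A", "B", "C"], "y": [3, 7, 5]}
  ],
  "layout": {"title": {"text": "Example"}}
}
"""
spec = json.loads(spec_json)

def lookup(spec, trace_name, category):
    """Hypothetical introspection helper: return the y-value of the
    named trace at the given x category, read straight from the spec.
    No rendering or vision model is involved, so the answer is exact."""
    for trace in spec["data"]:
        if trace.get("name") == trace_name:
            return trace["y"][trace["x"].index(category)]
    raise KeyError(f"no trace named {trace_name!r}")

print(lookup(spec, "sales", "B"))  # -> 7
```

The contrast this sketch is meant to convey: a VLM reading the rendered bar for "B" might answer 6 or 8, while the spec lookup is deterministic by construction.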
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: LLM/AI agents, vision question answering, evaluation methodologies, benchmarking
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 6020