VizAgentBench: Benchmarking Multimodal Agent Reasoning on Coordinated Multi-View Visual Analytics Tasks
Keywords: visual question answering, multimodality
TL;DR: VizAgentBench is a new visual question answering benchmark for LLM agents that evaluates their ability to perceive interactive visualizations, interact with them, and answer complex analytical questions.
Abstract: Multimodal Large Language Models (MLLMs) can now act as full-fledged desktop agents, yet their visual reasoning skills are still evaluated largely on single, static charts. Real decision making, however, happens in dashboards that combine multiple coordinated views (MCVs) and rely on rich interactions such as brushing, filtering, and drilling down. We introduce VizAgentBench, the first benchmark that challenges agents to perceive screenshots of a live MCV dashboard, issue declarative interaction commands, and answer analytical questions whose solutions may be hidden behind dynamic tooltips or axis changes. VizAgentBench is constructed by (1) surveying 14 visualization research papers to derive a design space of chart-and-interaction templates; (2) mining 10 public Kaggle datasets spanning finance, healthcare, sports, and socio-economics; and (3) generating 192 dashboards, each paired with a question–answer task, using a large language model followed by manual validation by graduate students in data science. On our benchmark, state-of-the-art LLM agents achieve only 40% accuracy, revealing substantial headroom. We release the dashboards, data, and an open-source API that separates perception from action, lowering the barrier to agent research on interactive visualization.
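A minimal sketch of what an agent loop against a perception/action API of the kind the abstract describes might look like; every name here (VizAgentEnv, perceive, act, decide) is a hypothetical illustration, not the released interface:

class VizAgentEnv:
    """Hypothetical wrapper around a live MCV dashboard; keeps perception and action separate."""

    def perceive(self) -> bytes:
        """Return a screenshot (e.g. PNG bytes) of the current dashboard state."""
        raise NotImplementedError

    def act(self, command: dict) -> None:
        """Apply a declarative interaction command, e.g. a brush or a filter."""
        raise NotImplementedError

def answer_task(env: VizAgentEnv, agent, question: str, max_steps: int = 10) -> str:
    """Iteratively perceive, interact, and finally answer one QA task."""
    for _ in range(max_steps):
        screenshot = env.perceive()
        step = agent.decide(screenshot, question)  # MLLM call; interface assumed
        if step["type"] == "answer":
            return step["text"]
        # e.g. {"type": "interact",
        #       "command": {"view": "scatter", "action": "brush", "x": [0, 5]}}
        env.act(step["command"])
    # step budget exhausted: force a final answer from the last view
    return agent.decide(env.perceive(), question).get("text", "")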
Primary Area: datasets and benchmarks
Submission Number: 23509