ChartAlignBench: Benchmark for Multi-Chart Grounding & Dense Alignment

ACL ARR 2025 May Submission 5662 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Charts play an important role in visualization, reasoning, and communication during data analysis and the exchange of ideas between humans. However, vision-language models (VLMs) still lack an accurate understanding of chart details and struggle to extract fine-grained structural information from charts. Such limitations in chart grounding also hinder their ability to compare multiple charts and reason about their differences. In this paper, we develop a novel ChartAlign Benchmark (ChartAlignBench) to provide a full-spectrum evaluation of VLMs on chart grounding tasks, i.e., extracting tabular data, localizing visualization elements, and recognizing various attributes from charts of diverse types and complexities. We develop a JSON template to facilitate the calculation of evaluation metrics specifically designed for each grounding task. By applying a novel two-stage inference workflow, the benchmark can further evaluate VLMs' capability to align and compare elements and attributes across two charts. Our analysis of evaluations of several recent VLMs offers new insights into their perception biases, weaknesses, robustness, and hallucinations in chart understanding. These observations expose fine-grained discrepancies among VLMs in chart understanding tasks and indicate specific skills that need to be strengthened in existing VLMs.
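The abstract describes grounding each chart into a JSON template and then, in a second stage, aligning two grounded charts to surface their differences. A minimal sketch of that idea is below; the schema (`data`, `attributes` fields) and the `align_and_diff` helper are hypothetical illustrations, not the paper's actual template or metrics.

```python
import json

# Hypothetical JSON grounding template (stage 1 output): each chart is
# flattened into tabular data plus per-element visual attributes, so that
# task-specific metrics can be computed field by field.
chart_a = {
    "type": "bar",
    "data": {"Q1": 120, "Q2": 135, "Q3": 150},
    "attributes": {"Q1": {"color": "blue"}, "Q2": {"color": "blue"}, "Q3": {"color": "red"}},
}
chart_b = {
    "type": "bar",
    "data": {"Q1": 120, "Q2": 140, "Q3": 150},
    "attributes": {"Q1": {"color": "blue"}, "Q2": {"color": "blue"}, "Q3": {"color": "blue"}},
}

def align_and_diff(a, b):
    """Stage 2 sketch: align entries of two grounded charts by shared key
    and report value/attribute differences."""
    diffs = {}
    for key in a["data"].keys() & b["data"].keys():
        if a["data"][key] != b["data"][key]:
            diffs.setdefault(key, {})["value"] = (a["data"][key], b["data"][key])
        if a["attributes"].get(key) != b["attributes"].get(key):
            diffs.setdefault(key, {})["attributes"] = (
                a["attributes"].get(key),
                b["attributes"].get(key),
            )
    return diffs

print(json.dumps(align_and_diff(chart_a, chart_b), indent=2, default=str))
```

Fixing a shared structured template before comparison is what makes dense alignment measurable: each field mismatch between the model's output and the ground truth (or between two charts) can be scored independently.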
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: dense alignment, chart understanding
Languages Studied: N/A
Submission Number: 5662