Keywords: VLM, evaluation
TL;DR: We present Chart2CSV, a benchmark designed to evaluate the ability of VLMs to accurately and comprehensively extract data points from complex charts and convert them into CSVs.
Abstract: Charts are widely used to visualize data and communicate research findings, and many applications require extracting the underlying data from charts and converting it into structured tables for large-scale processing and analysis.
While vision-language models (VLMs) have shown promising results on chart digitization and understanding tasks, their effectiveness in fully automating this process remains unclear. Existing benchmarks fall short because (1) they contain overly simplified charts that do not reflect real-world complexity, (2) they fail to comprehensively evaluate critical model capabilities, including perception, reasoning, planning, and long-form output generation, and (3) they lack evaluations on both the completeness and accuracy of the structured outputs.
To systematically evaluate the performance of VLMs in extracting and structuring data from charts, we introduce Chart2CSV, a benchmark comprising 812 charts sourced from research papers across 5 scientific domains, paired with expert-validated ground-truth CSVs. In Chart2CSV, VLMs are tasked with extracting data points from these charts and converting them into CSVs. We evaluate 16 VLMs on Chart2CSV and find that even the best-performing model, Claude 3.5 Sonnet, misinterprets nearly half of the data points, underscoring the deficiency of existing VLMs in automating chart data extraction and structuring.
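To make the evaluation criteria concrete, the sketch below shows one way completeness and accuracy over extracted CSV cells could be scored. The function `score_extraction`, its tolerance-based numeric matching, and the cell-level granularity are illustrative assumptions for exposition, not the benchmark's actual metric.

```python
import csv
import io


def score_extraction(pred_csv: str, gold_csv: str, tol: float = 0.05):
    """Compare a predicted CSV against a ground-truth CSV, cell by cell.

    Returns (accuracy, completeness):
      - accuracy: fraction of predicted cells that match some gold cell
      - completeness: fraction of gold cells recovered by the prediction
    Numeric cells match within relative tolerance `tol`; text cells must
    match exactly. This is a hypothetical scoring scheme, not the one
    used by Chart2CSV.
    """
    def cells(text):
        out = []
        for row in csv.reader(io.StringIO(text.strip())):
            out.extend(c.strip() for c in row if c.strip())
        return out

    def match(a, b):
        try:
            x, y = float(a), float(b)
            return abs(x - y) <= tol * max(abs(y), 1e-9)
        except ValueError:
            return a == b

    pred, gold = cells(pred_csv), cells(gold_csv)
    unused = list(gold)  # each gold cell may be matched at most once
    hits = 0
    for p in pred:
        for i, g in enumerate(unused):
            if match(p, g):
                hits += 1
                del unused[i]
                break
    accuracy = hits / len(pred) if pred else 0.0
    completeness = hits / len(gold) if gold else 0.0
    return accuracy, completeness
```

Separating the two scores matters because a model can be accurate but incomplete (every emitted cell is right, but many data points are missing) or complete but inaccurate (all points attempted, many misread), and the abstract's finding that strong VLMs misinterpret nearly half of the data points reflects exactly this kind of cell-level comparison.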
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 14065