Abstract: Existing benchmarks for chart-based visual question answering (VQA) rarely evaluate how vision-language models (VLMs) handle variations in visual cognitive load, and they offer little support for structured reasoning over multiple visual contexts. We introduce \textit{InterChart}, a novel benchmark designed to assess multi-visual context reasoning across varying levels of cognitive complexity. \textit{InterChart} comprises 5,214 carefully crafted QA pairs spanning 983 multi-chart visual contexts, organized into three subsets that progress from breadth to depth in cognitive context load. The dataset covers a spectrum of reasoning tasks, including question decomposition, numerical analysis, and entity inference, among others. We conduct a comprehensive baseline evaluation across multiple VLMs, exploring different prompting strategies as well as a chart-to-table, multi-table reasoning paradigm. Our results underscore the importance of structured cognitive decomposition in enhancing chart-based reasoning and highlight critical gaps in current VLM capabilities.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Visual Question Answering, Multi-Modal Reasoning, Vision-Language Models, Cognitive Load in AI, Chart-to-Table Representation, ChartVQA
Contribution Types: Model analysis & interpretability, Data resources, Data analysis, Position papers
Languages Studied: English
Submission Number: 8197