Abstract: We introduce InterChart, a diagnostic benchmark that evaluates how well vision-language models (VLMs) reason across multiple related charts, a task central to real-world applications such as scientific reporting, financial analysis, and public policy dashboards. Unlike prior benchmarks focusing on isolated, visually uniform charts, InterChart challenges models with diverse question types ranging from entity inference and trend correlation to numerical estimation and abstract multi-step reasoning grounded in 2–3 thematically or structurally related charts. We organize the benchmark into three tiers of increasing difficulty: (1) factual reasoning over individual charts, (2) integrative analysis across synthetically aligned chart sets, and (3) semantic inference over visually complex, real-world chart pairs. Our evaluation of state-of-the-art open- and closed-source VLMs reveals consistent and steep accuracy declines as chart complexity increases. We find that models perform better when we decompose multi-entity charts into simpler visual units, underscoring their struggles with cross-chart integration. By exposing these systematic limitations, InterChart provides a rigorous framework for advancing multimodal reasoning in complex, multi-visual environments.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: multimodal representation learning, visual question answering, evaluation benchmarks, visual grounding, semantic parsing, vision-language models, chart understanding, reasoning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Previous URL: https://openreview.net/forum?id=wFgbH9IOlA
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability).
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 2
B2 Discuss The License For Artifacts: No
B2 Elaboration: We will release a datasheet that will include license information.
B3 Artifact Use Consistent With Intended Use: No
B3 Elaboration: The dataset is sourced from public, open-source datasets.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: Ethics Statement (the paper contains no personal or sensitive data, as it is sourced from public datasets).
B5 Documentation Of Artifacts: No
B5 Elaboration: We will release a datasheet that will contain all detailed information.
B6 Statistics For Data: Yes
B6 Elaboration: Section 2
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Appendix D: Model and Compute Details
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Appendix A: Prompt Templates; Appendix B: Flowcharts (evaluation pipeline overview)
C3 Descriptive Statistics: Yes
C3 Elaboration: Appendix E: Individual Evaluation Results
C4 Parameters For Packages: Yes
C4 Elaboration: Section 3
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: Section 2.4, Ethics Statement
D2 Recruitment And Payment: Yes
D2 Elaboration: Section 2.4
D3 Data Consent: Yes
D3 Elaboration: Section 2, Appendix B, Appendix C
D4 Ethics Review Board Approval: N/A
D4 Elaboration: Ethics Statement; the dataset is not sensitive and does not involve human subjects.
D5 Characteristics Of Annotators: Yes
D5 Elaboration: Ethics Statement (annotators were volunteers and the authors).
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Ethics Statement
Author Submission Checklist: Yes
Submission Number: 1390