Keywords: Multimodal causal reasoning, Infographic understanding, Benchmark and evaluation, Vision–language models
Abstract: Recent advances in Vision-Language Models (VLMs) have shown strong performance in perception and reasoning, yet their ability to perform causal inference—an essential aspect of human cognition—remains underexplored in multimodal settings. We introduce InfoCausalQA, a benchmark for evaluating causal reasoning grounded in infographics that integrate structured visual data with textual context. InfoCausalQA consists of two tasks: Task 1 evaluates quantitative causal reasoning based on inferred numerical trends, while Task 2 targets semantic causal reasoning across five relation types—cause, effect, intervention, counterfactual, and temporal. We collect 494 infographic–text pairs from four public sources and generate 1,482 multiple-choice QA pairs using GPT-4o, followed by systematic human revision to ensure that questions require genuine visual grounding rather than surface-level cues. Experimental results show that current VLMs struggle with both quantitative and semantic causal reasoning, with particularly pronounced limitations in the latter. A human evaluation on 100 Task 2 samples further reveals a substantial gap between humans and models, with humans achieving 77% accuracy. These findings highlight the need to advance causal reasoning capabilities in multimodal AI systems.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: vision question answering, cross-modal information extraction, image–text matching
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 8469