Beyond Accuracy: Unveiling the Cognitive Mechanisms of MLLM Failure in Misleading Visualizations

ACL ARR 2026 January Submission 6479 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Misleading Visualizations; Multimodal Large Language Models
Abstract: While Multimodal Large Language Models (MLLMs) demonstrate exceptional proficiency in chart understanding, their robustness against misleading visualizations remains a critical bottleneck. Existing benchmarks predominantly employ flat taxonomies that conflate visual noise with semantic manipulation, obscuring the deeper cognitive states behind model failures. To address this, we propose CoVis, a benchmark grounded in a four-layer cognitive taxonomy (Perception-Mapping-Reasoning-Logic) that ensures comprehensive coverage of the adversarial landscape. Using our proposed Knowledge-Level Uncertainty (KLU) metric, we identify a systematic Cognitive Bifurcation: model failures collapse into either Cognitive Denial (confusion caused by visual obstructions) or Cognitive Hijacking (delusion driven by semantic inducements). Evaluations across 10 state-of-the-art models demonstrate striking cross-model consistency in these failure mechanisms. Notably, we observe a pronounced Textual Dominance effect, in which MLLMs often prioritize deceptive text labels over conflicting visual evidence, leading to high-confidence hijacking. Overall, CoVis establishes a mechanism-based evaluation paradigm, shifting the focus from surface-level error correction to the governance of internal cognitive states.
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: safety and alignment; security and privacy
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 6479