Chart-MRAG: Benchmarking Multimodal Retrieval Augmented Generation on Chart-based Documents

Published: 03 May 2026, Last Modified: 03 May 2026 · ACL 2026 main · CC BY 4.0
Abstract: Multimodal Retrieval-Augmented Generation (MRAG) enhances reasoning capabilities by integrating external knowledge. However, existing benchmarks focus primarily on simple image-text interactions, overlooking complex visual formats such as charts that are prevalent in real-world applications. In this work, we introduce a novel task, $\textbf{Chart-based MRAG}$, to address this limitation. To generate high-quality evaluation samples, we propose $\textbf{CHARGE}$ ($\textbf{CHAR}$t-based document question-answering $\textbf{GE}$neration), a semi-automatic framework that produces evaluation samples through multi-modal keypoint extraction, knowledge graph construction, and QA pair synthesis. By combining CHARGE with expert validation, we construct $\textbf{Chart-MRAG Bench}$, a comprehensive benchmark for chart-based MRAG evaluation featuring 4,738 question-answering pairs across 8 domains drawn from real-world documents. Our experiments reveal three critical limitations of current approaches: (1) unified multimodal embedding retrieval methods struggle in chart-based scenarios, (2) even with ground-truth retrieval, state-of-the-art Multimodal Large Language Models (MLLMs) achieve only 71.15\% Correctness and 80.74\% Coverage, and (3) widely used MLLMs exhibit a consistent text-over-visual modality bias. These findings highlight the substantial challenges of processing information-dense visual formats.
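The abstract describes CHARGE as a three-stage pipeline: multi-modal keypoint extraction, knowledge graph construction, and QA pair synthesis. The sketch below is a minimal, hypothetical illustration of how those stages could compose; all function names, data shapes, and the placeholder logic are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the three CHARGE stages named in the abstract.
# Every function body here is a stand-in heuristic, not the real system.

def extract_keypoints(chart_image, caption_text):
    """Stage 1 - multi-modal keypoint extraction: pull salient facts
    from a chart image and its surrounding text (placeholder output)."""
    return [{"entity": "revenue", "value": 4.2, "unit": "B USD"}]

def build_knowledge_graph(keypoints):
    """Stage 2 - knowledge graph construction: link keypoints into
    (subject, relation, object) triples."""
    return [(kp["entity"], "has_value", f'{kp["value"]} {kp["unit"]}')
            for kp in keypoints]

def synthesize_qa_pairs(triples):
    """Stage 3 - QA pair synthesis: turn each triple into a
    question/answer pair for the benchmark."""
    return [{"question": f"What is the {subj}?", "answer": obj}
            for (subj, _, obj) in triples]

keypoints = extract_keypoints(chart_image=None, caption_text="Q3 revenue")
qa_pairs = synthesize_qa_pairs(build_knowledge_graph(keypoints))
print(qa_pairs[0])  # {'question': 'What is the revenue?', 'answer': '4.2 B USD'}
```

In the actual framework these stages would be backed by vision-language models and followed by the expert validation step the abstract mentions; the sketch only shows the data flow between stages.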