MMGraphRAG: Multi-modal Graph Retrieval-Augmented Generation for Document-level Question Answering

ACL ARR 2025 May Submission 1976 Authors

18 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Retrieval-Augmented Generation (RAG) has been widely used to integrate external knowledge into Large Language Models (LLMs) to enhance performance on question answering tasks. Recently, graph-augmented RAG approaches have demonstrated stronger support for accurate, context-aware responses. However, most existing methods incorporate only textual information, resulting in suboptimal performance in multi-modal scenarios. To address this issue, we propose MMGraphRAG, a novel graph-augmented RAG framework consisting of two stages: Multi-modal Graph Construction and Cross-modal Unified Retrieval. The construction stage integrates visual content alongside textual content into a knowledge graph; the retrieval stage then employs a unified mechanism to aggregate evidence from both modalities for answer generation. Experiments on three benchmarks across different modalities indicate that MMGraphRAG effectively enhances the question answering capabilities of LLMs when processing visually rich content. Measured by F1 score, our framework outperforms all baselines, improving answer quality and generalizing across modalities. The code will be available soon.
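To make the two-stage pipeline described in the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' released code): all names such as build_multimodal_graph and unified_retrieve are illustrative assumptions, and the naive keyword-overlap scoring stands in for whatever graph-based retrieval the paper actually uses.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    nodes: list = field(default_factory=list)   # text chunks and image-derived entries
    edges: list = field(default_factory=list)   # cross-modal / intra-modal relations

def build_multimodal_graph(text_chunks, image_descriptions):
    """Stage 1 (hypothetical): merge textual and visual content into one graph."""
    kg = KnowledgeGraph()
    for i, chunk in enumerate(text_chunks):
        kg.nodes.append({"id": f"t{i}", "modality": "text", "content": chunk})
    for j, desc in enumerate(image_descriptions):
        kg.nodes.append({"id": f"v{j}", "modality": "image", "content": desc})
    # A real system would link nodes via extracted entities and relations;
    # here we connect every text node to every image node as a placeholder.
    for t in range(len(text_chunks)):
        for v in range(len(image_descriptions)):
            kg.edges.append((f"t{t}", f"v{v}", "co-occurs"))
    return kg

def unified_retrieve(kg, question, top_k=3):
    """Stage 2 (hypothetical): score nodes of both modalities with one criterion."""
    q_tokens = set(question.lower().split())
    def overlap(node):
        return len(q_tokens & set(node["content"].lower().split()))
    return sorted(kg.nodes, key=overlap, reverse=True)[:top_k]

if __name__ == "__main__":
    kg = build_multimodal_graph(
        ["The chart reports quarterly revenue."],
        ["Bar chart: revenue rises from Q1 to Q4."],
    )
    evidence = unified_retrieve(kg, "How did revenue change across quarters?")
    print(evidence)  # retrieved evidence would then be passed to an LLM for answer generation
```

In a usage scenario, the retrieved nodes (regardless of modality) would be formatted into the LLM prompt as supporting evidence; the key design point the abstract emphasizes is that one retrieval mechanism serves both textual and visual nodes.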
Paper Type: Long
Research Area: Generation
Research Area Keywords: retrieval-augmented generation, text-to-text generation, inference methods
Languages Studied: English
Keywords: retrieval-augmented generation, text-to-text generation, inference methods
Submission Number: 1976