iG²RAG: Information Gain Graph-based Retrieval-Augmented Generation

ACL ARR 2026 January Submission1827 Authors

31 Dec 2025 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | CC BY 4.0

Keywords: Retrieval-Augmented Generation, Graph, Information Gain, Personalized PageRank

Abstract: Existing Retrieval-Augmented Generation (RAG) methods predominantly retrieve documents by relying on surface-level similarity to the query or by aligning entities and relations to a graph. However, in multi-hop scenarios, such strategies frequently yield insufficient or redundant retrieval because they fail to account for document complementarity and potential information gain, both of which are critical for effectively reducing query uncertainty. To address this issue, we introduce iG$^2$RAG, which constructs a novel information Gain Graph as the foundation of the retrieval process. In the offline phase, we treat documents as nodes, mine similar neighbors to form subgraphs, and use a Large Language Model (LLM) to evaluate these neighbors, creating an information gain graph. During the online query phase, seed nodes are identified based on the query, and the Personalized PageRank (PPR) algorithm is applied to iteratively retrieve the optimal set of documents with high information gain. This process simulates foraging behavior, where the information gain graph acts as a map and PPR mimics the search for better food. Our experiments show that iG$^2$RAG outperforms baselines on multi-hop datasets, achieving state-of-the-art results and validating the framework's effectiveness.
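
The online phase described in the abstract can be illustrated with a minimal sketch of Personalized PageRank over a weighted document graph. This is not the authors' implementation. The graph layout, the edge weights (standing in for LLM-assigned information gain scores), the restart probability `alpha`, the iteration count, and the names `personalized_pagerank` and `retrieve` are all assumptions made for illustration. The sketch runs a single PPR pass, whereas the abstract describes an iterative retrieval procedure.

```python
from typing import Dict, List

# Hypothetical information gain graph: graph[u][v] is an assumed score for
# how much document v adds once document u is known. Every edge target must
# also appear as a key of the graph.
Graph = Dict[str, Dict[str, float]]


def personalized_pagerank(
    graph: Graph,
    seeds: Dict[str, float],
    alpha: float = 0.15,   # restart probability (assumed value)
    iters: int = 50,       # fixed number of power-iteration steps (assumed)
) -> Dict[str, float]:
    """Power-iteration PPR that restarts at the query's seed documents."""
    nodes = list(graph)
    total = sum(seeds.values())
    # Restart distribution: seed weights normalized to sum to 1.
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in nodes}
        for u in nodes:
            out = graph[u]
            weight_sum = sum(out.values())
            if weight_sum == 0:
                # Dangling node: send its mass back to the seed documents.
                for n in nodes:
                    nxt[n] += (1 - alpha) * rank[u] * restart[n]
            else:
                # Spread mass along edges in proportion to their weights.
                for v, w in out.items():
                    nxt[v] += (1 - alpha) * rank[u] * w / weight_sum
        rank = nxt
    return rank


def retrieve(graph: Graph, seeds: Dict[str, float], k: int = 2) -> List[str]:
    """Return the k documents with the highest PPR score."""
    rank = personalized_pagerank(graph, seeds)
    return sorted(rank, key=rank.get, reverse=True)[:k]


# Toy example: d1 is the seed; d2 offers high gain after d1, d3 offers little.
toy_graph: Graph = {
    "d1": {"d2": 0.9, "d3": 0.1},
    "d2": {"d4": 0.8},
    "d3": {},
    "d4": {},
}
top_docs = retrieve(toy_graph, {"d1": 1.0}, k=3)
```

In this toy graph, documents reached through high-weight edges from the seed accumulate more score than low-weight neighbors, which is the intuition behind following information gain rather than similarity to the query alone.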

Paper Type: Long

Research Area: Information Extraction and Retrieval

Research Area Keywords: passage retrieval; dense retrieval; document representation

Contribution Types: NLP engineering experiment, Data analysis

Languages Studied: English

Submission Number: 1827
