Keywords: Retrieval-Augmented Generation, Graph, Information Gain, Personalized PageRank
Abstract: Existing Retrieval-Augmented Generation (RAG) methods predominantly retrieve documents either by surface-level similarity to the query or by aligning entities and relations to a graph. In multi-hop scenarios, however, such strategies frequently yield insufficient or redundant retrieval because they fail to account for document complementarity and potential information gain, both of which are critical for reducing query uncertainty. To address this issue, we introduce iG$^2$RAG, which constructs a novel information Gain Graph as the foundation of the retrieval process. In the offline phase, we treat documents as nodes, mine similar neighbors to form subgraphs, and use a Large Language Model (LLM) to evaluate these neighbors, producing the information gain graph. In the online query phase, seed nodes are identified from the query, and the Personalized PageRank (PPR) algorithm is applied iteratively to retrieve the optimal set of documents with high information gain. This process resembles foraging behavior: the information gain graph acts as a map, and PPR mimics the search for better food. Our experiments show that iG$^2$RAG outperforms baselines on multi-hop datasets, achieving state-of-the-art results and validating the framework's effectiveness.
Paper Type: Long
Research Area: Information Extraction and Retrieval
Research Area Keywords: passage retrieval; dense retrieval; document representation
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 1827
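The online phase described in the abstract (seed nodes, then Personalized PageRank over the information gain graph) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the damping factor `alpha`, the toy graph, and the reading of edge weights as LLM-judged gain scores are all assumptions made for illustration, and the sketch runs a single PPR pass rather than the iterative retrieval the abstract mentions.

```python
from typing import Dict, List

# Hypothetical representation: document id -> {neighbor id: gain weight}.
Graph = Dict[str, Dict[str, float]]


def personalized_pagerank(graph: Graph, seeds: Dict[str, float],
                          alpha: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power-iteration PPR; the walk restarts at the seed documents."""
    nodes = set(graph) | set(seeds)
    for nbrs in graph.values():
        nodes.update(nbrs)
    total = sum(seeds.values())
    # Restart distribution concentrated on the query's seed nodes.
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for u in nodes:
            out = graph.get(u, {})
            weight_sum = sum(out.values())
            if weight_sum == 0.0:
                # Dangling node: send its mass back to the seeds.
                for n in nodes:
                    nxt[n] += alpha * rank[u] * restart[n]
            else:
                for v, w in out.items():
                    nxt[v] += alpha * rank[u] * w / weight_sum
        rank = nxt
    return rank


def retrieve(graph: Graph, seeds: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k documents with the highest PPR score."""
    scores = personalized_pagerank(graph, seeds)
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Toy graph: d1 is the seed; the d1 -> d2 edge carries most of the gain.
toy_graph: Graph = {
    "d1": {"d2": 0.9, "d3": 0.1},
    "d2": {"d4": 1.0},
    "d3": {},
    "d4": {},
}
toy_scores = personalized_pagerank(toy_graph, {"d1": 1.0})
top_docs = retrieve(toy_graph, {"d1": 1.0}, k=2)
```

In this toy setting, documents reached through high-weight edges from the seed accumulate more score than those reached through low-weight edges, which is the intuition behind using gain-weighted edges as the "map" for retrieval.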