GLARE: Towards Graph-less Retrieval for Retrieval Augmented Generation on Million-scale Knowledge Graphs

18 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | Readers: Everyone | License: CC BY 4.0
Keywords: RAG, graph learning, LLMs
Abstract: Retrieval-augmented generation (RAG) has emerged as an effective way to mitigate hallucinations in Large Language Models (LLMs) by retrieving from an external knowledge base. Recent work has explored KG-based RAG, which leverages knowledge graphs (KGs) to incorporate rich relational information. However, existing methods suffer from high retrieval latency because the retriever must operate directly over the graph, which hinders their scalability to large-scale KGs. We propose GLARE, a scalable KG-based RAG framework that enables fast and accurate information processing over million-scale KGs. Specifically, GLARE compresses a large KG into a compact, knowledge-intensive vector memory, enabling efficient retrieval without searching over an exponentially large graph space. To preserve critical information, we further design a non-parametric, importance-aware graph pooling strategy and a VAE-style projector that reconstructs relational structures from the vector memory. At inference time, GLARE retrieves from the vector memory in linear time, significantly accelerating KG-based question answering (QA) while maintaining high response quality. Evaluations on the STaRK benchmark across multiple domains demonstrate that GLARE achieves a retrieval speedup of over $30{,}000\times$ with improved question-answering performance.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13314
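
The abstract's idea of linear-time retrieval from a vector memory can be illustrated with a minimal sketch. This is not the GLARE implementation: the memory here is a plain list of vectors, scoring by cosine similarity is an assumption, and the names `cosine` and `retrieve_top_k` are hypothetical. The graph pooling and VAE-style projector that would build the memory are not shown.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_top_k(
    query: Sequence[float],
    memory: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[float, int]]:
    """Score every memory slot once and keep the k best.

    One pass over N slots of dimension d costs O(N * d) for scoring,
    plus O(N log k) for the bounded heap, so the cost grows linearly
    in N for fixed k and d. No graph traversal is involved.
    Returns (score, slot_index) pairs, best first.
    """
    scored = ((cosine(query, vec), idx) for idx, vec in enumerate(memory))
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Toy memory with four 2-d slots (illustrative values only).
    toy_memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    for score, idx in retrieve_top_k([1.0, 0.1], toy_memory, k=2):
        print(idx, round(score, 3))
```

The point of the sketch is only the cost profile: each query touches each memory slot once, in contrast to retrievers that expand paths or neighborhoods in the graph at query time.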