Keywords: Optimal Transport, Large Language Model, Retrieval Augmented Generation, Graphs
TL;DR: Kurisu-G² is a document-structure–aware retrieval algorithm that leverages Fused Gromov–Wasserstein distance to build coherent, efficient, and logically consistent contexts, outperforming RAG and graph-based baselines.
Abstract: Retrieval-Augmented Generation (RAG) has become a standard paradigm for enriching
large language models with external knowledge, yet it often treats retrieved chunks
independently and overlooks their semantic and logical dependencies, leading to incoherent
or incomplete answers. GraphRAG addresses this by introducing graph-based context
representations, but it remains limited by the quality of the constructed graph, its heavy
reliance on LLMs for graph generation, and its lack of global logical consistency. In this
work, we propose an alternative perspective: leveraging principled graph-based similarity
measures, such as the Gromov–Wasserstein distance, to guide the retrieval, selection, and
unification of knowledge units. This approach preserves both the structural and relational
properties of the knowledge base, while enabling the enrichment of missing links that are
crucial for semantic integrity. We show that this perspective yields more coherent and
interpretable retrieval contexts than LLM-driven graph construction. Our results
highlight a promising path toward robust and logically consistent retrieval mechanisms in
RAG-based systems, with strong implications for high-stakes domains such as medicine
and law.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 18441