KnowTrace: Explicit Knowledge Tracing for Structured Retrieval-Augmented Generation

Rui Li; Quanyu Dai; Zeyu Zhang; Xu Chen; Zhenhua Dong; Ji-Rong Wen

14 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0

Keywords: Knowledge Graph, Retrieval-Augmented Generation, Multi-Hop Question Answering, Multi-Step Reasoning

TL;DR: We introduce a structured RAG paradigm (KnowTrace) that seamlessly integrates knowledge structuring and multi-step reasoning to improve multi-hop question answering (MHQA) performance.

Abstract: Recent advances in retrieval-augmented generation (RAG) furnish large language models (LLMs) with iterative retrievals of relevant information to strengthen their capabilities in addressing complex multi-hop questions. However, these methods typically accumulate the retrieved natural language text into LLM prompts, imposing an increasing burden on the LLM to grasp the underlying knowledge structure for high-quality multi-step reasoning. A few attempts have sought to reduce this burden by restructuring all retrieved passages or even entire external corpora, but these efforts suffer from significant restructuring overhead and potential knowledge loss. To tackle this challenge, we introduce a new structured paradigm (KnowTrace) from the perspective of explicit knowledge tracing, which treats the LLM as an agent that progressively acquires the desired knowledge triplets during iterative retrievals and ultimately traces out a specific knowledge graph conditioned on the input question. This paradigm explicitly reveals the logical relationships behind the unstructured text and can thus directly facilitate the LLM’s inference. Notably, it also naturally inspires a reflective mechanism of knowledge backtracing, which identifies supportive evidence and filters out useless retrievals in the correct trajectories, thus offering an effective way to stimulate the LLM’s self-taught finetuning. Extensive experiments on three standard multi-hop question answering benchmarks demonstrate the superiority of our paradigm. Our code is available at https://github.com/xxrep/SRAG.
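The abstract describes knowledge backtracing only at a high level. As a rough illustration, and not the authors' implementation, the sketch below assumes that each retrieval step in a trajectory yields a list of (head, relation, tail) triplets, and that backtracing can be approximated by a backward traversal of the traced graph starting from the answer entity. Triplets reached this way are treated as supportive evidence; steps that contribute none are filtered out as useless retrievals. The function names `backtrace` and `filter_steps` and all example entities are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# A knowledge triplet: (head entity, relation, tail entity).
Triplet = Tuple[str, str, str]


def backtrace(steps: List[List[Triplet]], answer: str) -> Set[Triplet]:
    """Hypothetical backtracing: collect every triplet from which `answer`
    can be reached by following head -> tail edges (backward BFS)."""
    # Index all traced triplets by tail entity for backward lookups.
    by_tail: Dict[str, List[Triplet]] = {}
    for step in steps:
        for trip in step:
            by_tail.setdefault(trip[2], []).append(trip)

    supportive: Set[Triplet] = set()
    visited = {answer}  # guards against cycles in the traced graph
    queue = deque([answer])
    while queue:
        entity = queue.popleft()
        for trip in by_tail.get(entity, []):
            supportive.add(trip)
            head = trip[0]
            if head not in visited:
                visited.add(head)
                queue.append(head)
    return supportive


def filter_steps(steps: List[List[Triplet]], answer: str) -> List[List[Triplet]]:
    """Keep only retrieval steps with at least one supportive triplet,
    and within each kept step only the supportive triplets."""
    supportive = backtrace(steps, answer)
    return [
        [t for t in step if t in supportive]
        for step in steps
        if any(t in supportive for t in step)
    ]


if __name__ == "__main__":
    # Hypothetical trajectory for "Where was the director of Film X born?"
    trajectory = [
        [("Film X", "directed by", "Director Y"), ("Film X", "released in", "1999")],
        [("Director Y", "born in", "City Z")],
        [("City Z", "located in", "Country W")],  # not needed for the answer
    ]
    for step in filter_steps(trajectory, "City Z"):
        print(step)
```

In this toy trajectory the release-year triplet and the third retrieval step are discarded, leaving only the two-hop chain that links the question entity to the answer. A filtered trajectory of this kind is the sort of cleaned example one might reuse for self-taught finetuning, although the paper's actual procedure may differ.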

Primary Area: learning on graphs and other geometries & topologies

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 654
