SMART: Structural Entropy-Based Multi-perspective Abstraction for Retrieval in RAG Systems

ACL ARR 2025 May Submission7268 Authors

20 May 2025 (modified: 03 Jul 2025) | ACL ARR 2025 May Submission | Readers: Everyone | License: CC BY 4.0
Abstract: Retrieval-Augmented Generation (RAG) systems mitigate the factual inaccuracies of Large Language Models (LLMs). However, because principled methodologies for extracting facts are lacking, current RAG systems produce uninformative fact abstractions. To address this problem, we propose two principles for fact abstraction in RAG systems: the Information-Maximization Principle and the Multi-Perspective Principle. These principles reformulate fact extraction as an optimization problem over information-theoretic quantities, providing a reliable framework for fact abstraction. Building on these principles, we introduce the Structural Entropy-Based Multi-Perspective Abstraction for Retrieval Technique (SMART). Extensive experiments on three real-world datasets demonstrate that SMART significantly improves RAG systems, achieving notable gains on multi-hop and multi-perspective question-answering tasks. The source code is provided at https://anonymous.4open.science/r/SMART_CODES/.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: open information extraction; knowledge base construction
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Theory
Languages Studied: English
Submission Number: 7268