Abstract: Retrieval-augmented generation (RAG) has been widely applied to enhance LLMs' integration of external knowledge. Attributing RAG-generated content, i.e., providing citations that support the response, has attracted considerable research interest. However, most existing studies focus on coarse-grained attribution that links claims to passages or documents, which still requires non-trivial effort to verify. Meanwhile, existing fine-grained attribution methods rely on fine-tuned LLMs to generate citations alongside the content, which is expensive and hard to control. In this work, we introduce a simple yet effective Linguistic Aligned Matching (LAM) approach for sentence-level attribution, which follows a two-step process: refinement and matching. The refinement step uses LLMs to align the expression of each claim with the expressions found in the retrieved documents. The matching step then combines the claims with their refined expressions to identify supporting sentences via vector-based matching.
Unlike traditional fine-grained attribution methods, LAM is training-free and can be seamlessly integrated into existing RAG systems.
Experiments across diverse domains and tasks demonstrate significant improvements, achieving an average 7.87% ROUGE-F1 gain on both short- and long-context datasets.
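To make the two-step process concrete, the following is a minimal sketch of the refine-then-match idea under stated assumptions: the embedding model, the function names `refine_claim` and `attribute_claim`, and the identity-function placeholder for the LLM refinement prompt are all illustrative choices, not the paper's implementation.

```python
# Sketch of sentence-level attribution via refinement + vector matching.
# Assumes a generic sentence-embedding model; `refine_claim` is a hypothetical
# placeholder for the LLM-based refinement step described in the abstract.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def refine_claim(claim: str, doc_sentences: list[str]) -> str:
    """Hypothetical stand-in for the LLM refinement step, which rewrites the
    claim so its wording aligns with the retrieved documents. Here it simply
    returns the claim unchanged; in practice this would be an LLM call."""
    return claim


def attribute_claim(claim: str, doc_sentences: list[str], top_k: int = 1) -> list[str]:
    """Match the claim and its refined expression against document sentences
    by cosine similarity and return the best-supporting sentences."""
    refined = refine_claim(claim, doc_sentences)
    queries = [claim, refined]  # combine the claim with its refined expression
    q_emb = embedder.encode(queries, convert_to_tensor=True)
    s_emb = embedder.encode(doc_sentences, convert_to_tensor=True)
    # For each document sentence, keep the best score over the two queries.
    scores = util.cos_sim(q_emb, s_emb).max(dim=0).values
    ranked = scores.argsort(descending=True)[:top_k]
    return [doc_sentences[int(i)] for i in ranked]
```

Because the matching step is plain vector similarity over sentences, a sketch like this can be bolted onto an existing RAG pipeline after generation, without any fine-tuning.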
Paper Type: Short
Research Area: Generation
Research Area Keywords: retrieval-augmented generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 3839