Abstract: Retrieval-augmented generation (RAG) has been widely applied to enhance LLMs' integration of external knowledge. Attributing RAG-generated content, i.e., providing citations that support the response, has attracted considerable research interest. However, most existing studies focus on coarse-grained attribution that links claims to passages or documents, which still requires non-trivial effort to verify. Meanwhile, existing fine-grained attribution methods rely on fine-tuned LLMs to generate citations alongside the content, which is expensive and hard to control. In this work, we introduce a simple yet effective Linguistic Aligned Matching (LAM) approach for sentence-level attribution, which follows a two-step process: refinement and matching. The refinement step uses LLMs to align the expression of each claim with the expressions found in the retrieved documents. The matching step then combines the claims with their refined expressions to identify supporting sentences via vector-based matching.
Unlike traditional fine-grained attribution methods, LAM is training-free and can be seamlessly integrated into existing RAG systems.
Experiments across diverse domains and tasks demonstrate significant improvements, achieving an average 7.87% ROUGE-F1 gain on both short- and long-context datasets.
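To make the two-step process concrete, the following is a minimal sketch of the refine-then-match idea under stated assumptions: the embedding model, the function names `refine_claim` and `attribute_claim`, and the identity-function placeholder for the LLM refinement prompt are all illustrative choices, not the paper's implementation.

```python
# Sketch of sentence-level attribution via refinement + vector matching.
# Assumes a generic sentence-embedding model; `refine_claim` is a hypothetical
# placeholder for the LLM-based refinement step described in the abstract.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def refine_claim(claim: str, doc_sentences: list[str]) -> str:
    """Hypothetical stand-in for the LLM refinement step, which rewrites the
    claim so its wording aligns with the retrieved documents. Here it simply
    returns the claim unchanged; in practice this would be an LLM call."""
    return claim


def attribute_claim(claim: str, doc_sentences: list[str], top_k: int = 1) -> list[str]:
    """Match the claim and its refined expression against document sentences
    by cosine similarity and return the best-supporting sentences."""
    refined = refine_claim(claim, doc_sentences)
    queries = [claim, refined]  # combine the claim with its refined expression
    q_emb = embedder.encode(queries, convert_to_tensor=True)
    s_emb = embedder.encode(doc_sentences, convert_to_tensor=True)
    # For each document sentence, keep the best score over the two queries.
    scores = util.cos_sim(q_emb, s_emb).max(dim=0).values
    ranked = scores.argsort(descending=True)[:top_k]
    return [doc_sentences[int(i)] for i in ranked]
```

Because the matching step is plain vector similarity over sentences, a sketch like this can be bolted onto an existing RAG pipeline after generation, without any fine-tuning.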
Paper Type: Short
Research Area: Generation
Research Area Keywords: retrieval-augmented generation
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 3839