Abstract: Despite their impressive performance on information-seeking tasks, large language models (LLMs) still struggle with hallucinations. Attributed LLMs, which augment generated text with in-line citations, show promise in mitigating hallucinations and improving verifiability. Nonetheless, current attributed LLMs suffer from suboptimal citation quality because they rely on in-context learning or post-hoc retrieval and lack a built-in attribution mechanism. Moreover, merely citing document identifiers does little to help users pinpoint the specific supporting evidence. To bridge these gaps, this work introduces FRONT, a training framework that advances the verification process in attributed LLMs through Fine-grained grounded citations. It equips LLMs with the ability to first anchor in fine-grained supporting quotes, which then guide the generation of attributed answers. Grounded quotes not only elevate LLM attribution quality but also serve as a mechanism for fine-grained verification, significantly enhancing information traceability. Experiments on the ALCE benchmark demonstrate the efficacy of FRONT in generating superior grounded responses and highly supportive citations. With LLaMA-2-7B, the framework significantly outperforms all baselines, even surpassing ChatGPT, with an average improvement of 14.21% across all datasets. Notably, FRONT relies on an automated data-construction procedure and generalizes across models and data scales, enabling continuous performance improvements.
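Based on the abstract's description of FRONT's two-stage pipeline (first grounding in fine-grained supporting quotes, then generating a quote-guided, attributed answer), the following is a minimal sketch of what such inference might look like. The prompt templates and the `generate` callable are illustrative assumptions, not the authors' implementation; the paper fine-tunes the model to internalize this behavior rather than relying on prompting alone.

```python
from typing import Callable, List

def front_style_inference(
    question: str,
    documents: List[str],
    generate: Callable[[str], str],  # hypothetical LLM text-generation callable
) -> str:
    """Sketch of two-stage attributed generation in the style of FRONT.

    Stage 1: anchor the model in fine-grained supporting quotes.
    Stage 2: generate an answer with in-line citations guided by those quotes.
    All prompt wording below is an assumption for illustration.
    """
    # Number the retrieved documents so citations can reference them, e.g. [1].
    ctx = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))

    # Stage 1: extract exact supporting quotes (fine-grained grounding).
    grounding_prompt = (
        f"Documents:\n{ctx}\n\n"
        f"Question: {question}\n"
        "Select exact quotes from the documents that support an answer, "
        "each prefixed with its document identifier."
    )
    quotes = generate(grounding_prompt)

    # Stage 2: produce the attributed answer, anchored in the grounded quotes.
    answer_prompt = (
        f"Documents:\n{ctx}\n\n"
        f"Supporting quotes:\n{quotes}\n\n"
        f"Question: {question}\n"
        "Answer using only the quotes above, citing each claim with the "
        "corresponding document identifier, e.g. [1]."
    )
    return generate(answer_prompt)
```

In this sketch, `generate` could wrap any instruction-following backbone (the paper reports results with LLaMA-2-7B); the grounded quotes from stage 1 double as the fine-grained evidence users can verify, which is the traceability benefit the abstract highlights.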
Paper Type: long
Research Area: Question Answering
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English