TL;DR: We design a self-supervised reward that aligns LLMs to generate better citations attributing their answers to the context, without human supervision.
Abstract: We introduce SelfCite, a novel self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for the statements in their generated responses.
Instead of only relying on costly and labor-intensive annotations, SelfCite leverages a reward signal provided by the LLM itself through *context ablation*: If a citation is necessary, removing the cited text from the context should prevent the same response; if sufficient, retaining the cited text alone should preserve the same response.
This reward can guide an inference-time best-of-N sampling strategy to significantly improve citation quality, and it can also serve as a training signal in preference optimization to directly fine-tune models to generate better citations.
The effectiveness of SelfCite is demonstrated by citation F1 gains of up to 5.3 points on the LongBench-Cite benchmark across five long-form question answering tasks. The source code is available at https://github.com/facebookresearch/SelfCite.
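To make the context-ablation reward concrete, here is a minimal sketch of how the necessity and sufficiency checks could be scored with a Hugging Face-style causal LM. The function names (`response_logprob`, `context_ablation_reward`), the prompt template, and the sentence-level ablation are illustrative assumptions, not the paper's exact implementation; see the linked repository for the actual code.

```python
import torch

def response_logprob(model, tokenizer, context, question, response):
    """Sum of log-probabilities the model assigns to `response`,
    teacher-forced given `context` and `question`. (Prompt format
    is an assumption for illustration.)"""
    prompt = f"{context}\n\nQuestion: {question}\nAnswer: "
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    resp_ids = tokenizer(response, return_tensors="pt",
                         add_special_tokens=False).input_ids
    input_ids = torch.cat([prompt_ids, resp_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i+1, so the response tokens
    # are scored by the slice starting one position earlier.
    resp_logits = logits[0, prompt_ids.shape[1] - 1 : -1]
    log_probs = resp_logits.log_softmax(-1)
    token_lp = log_probs.gather(1, resp_ids[0].unsqueeze(1)).squeeze(1)
    return token_lp.sum().item()

def context_ablation_reward(model, tokenizer, sentences, cited_idx,
                            question, response):
    """Necessity: log-prob drop when the cited sentences are removed.
    Sufficiency: log-prob gain when only the cited sentences remain.
    Their sum compares `cited only` against `context without cited`."""
    full = " ".join(sentences)
    without = " ".join(s for i, s in enumerate(sentences) if i not in cited_idx)
    only = " ".join(s for i, s in enumerate(sentences) if i in cited_idx)
    lp_full = response_logprob(model, tokenizer, full, question, response)
    lp_without = response_logprob(model, tokenizer, without, question, response)
    lp_only = response_logprob(model, tokenizer, only, question, response)
    necessity = lp_full - lp_without   # large if removing the citation hurts
    sufficiency = lp_only - lp_full    # large if the citation alone suffices
    return necessity + sufficiency
```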
Lay Summary: LLMs can support their answers with citations, but these citations are usually not accurate or fine-grained enough.
Current methods rely on human-labeled data to train LLMs to generate citations, which is costly and time-consuming.
We built SelfCite, a self-rewarding system that enables LLMs to judge and improve their own citations without additional human labeling.
The idea is simple: if removing cited sentences from the source documents changes the LLM’s answer, those sentences were necessary; if keeping only those sentences still yields the same answer, they were sufficient.
By rewarding the LLM based on these two checks, SelfCite helps it choose better citations—ones that are both necessary and sufficient—as it generates responses.
The LLM drafts several candidate citations and selects the one with the highest self-score, as sketched below. We can also train the model to directly produce better candidates.
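As a sketch of that best-of-N step, reusing the hypothetical `context_ablation_reward` from the example above: the model proposes several candidate citation sets (each a collection of sentence indices), and we keep the one the self-reward ranks highest.

```python
def best_of_n_citation(model, tokenizer, sentences, question, response,
                       candidates):
    """Score each candidate citation set with the self-supervised
    reward and return the highest-scoring one."""
    scored = [
        (context_ablation_reward(model, tokenizer, sentences, set(c),
                                 question, response), c)
        for c in candidates
    ]
    return max(scored, key=lambda pair: pair[0])[1]
```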
On a challenging benchmark of long, open-ended questions, this approach improved citation quality by up to five points.
Clearer citations make it easier for journalists, educators, and everyday readers to trust or critically assess AI-generated content, bringing us closer to transparent and verifiable AI systems.
Link To Code: https://github.com/facebookresearch/SelfCite
Primary Area: Deep Learning->Large Language Models
Keywords: Large Language Models, LLMs, Alignment, Preference Optimization, Context Attribution, Citation
Submission Number: 8479