Aligning Large Language Model Behavior with Human Citation Preferences

ICLR 2026 Conference Submission 25371 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: LLM, Citation, Credibility
TL;DR: Across 8 content types, LLMs over-cite text flagged “Citation needed” (up to +27%) and under-cite numeric (−22.6%) and person-name (−20.1%) sentences vs. humans; DPO improves alignment by ~5.76%. Data/code will be released upon publication.
Abstract: Most services built on powerful large language models (LLMs) add citations to their output to enhance credibility. Recent research has paid increasing attention to which reference documents should be linked to model outputs. However, how LLMs recognize cite-worthiness, and how this behavior should be controlled, remain insufficiently explored. In this study, we examine what kinds of content LLMs currently tend to cite and how well that behavior aligns with human preferences. We construct a dataset that characterizes the relationship between human citation preferences and LLM behavior: web-derived texts are categorized into eight citation-motivation types, and pairwise citation preferences are exhaustively evaluated across all type combinations to capture fine-grained contrasts. Our results show that humans most frequently seek citations for medical text, and that stronger models display a similar tendency. We also find that current models are as much as 27% more likely than humans to add citations to text explicitly marked as needing citations on sources such as Wikipedia, and that this overemphasis reduces alignment accuracy. Conversely, models systematically under-select numeric sentences (−22.6% relative to humans) and sentences containing personal names (−20.1%), categories for which humans typically demand citations. Furthermore, experiments with fine-tuning and Direct Preference Optimization (DPO) demonstrate that model behavior can be calibrated to better match human citation preferences. We expect this study to provide a foundation for more fine-grained investigations into LLM citation preferences. Our dataset and code will be released upon publication.
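For context, DPO fits a policy directly to pairwise preference data. The sketch below is the standard objective from Rafailov et al. (2023), not the paper's own formulation; how the authors map citation preferences onto preferred/dispreferred pairs, and their choice of β, are not specified on this page.

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $y_w$ would plausibly be the citation decision preferred by human annotators and $y_l$ the dispreferred one, with $\beta$ controlling how far the tuned policy $\pi_\theta$ may drift from the reference policy $\pi_{\mathrm{ref}}$.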
Primary Area: foundation or frontier models, including LLMs
Submission Number: 25371