Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning

ACL ARR 2025 May Submission 6621 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Instruction fine-tuning (IFT) can increase the informativeness of large language models (LLMs), but may reduce their truthfulness. This trade-off arises because IFT steers LLMs to generate responses containing long-tail knowledge that was not well covered during pre-training. As a result, models become more informative but less accurate when generalizing to unseen tasks. In this paper, we empirically demonstrate how unfamiliar knowledge in IFT datasets can negatively affect the truthfulness of LLMs, and we introduce two new IFT paradigms, UNIT_cut and UNIT_ref, to address this issue. UNIT_cut identifies and removes unfamiliar knowledge from IFT datasets to mitigate its impact on model truthfulness, whereas UNIT_ref trains LLMs to recognize their uncertainty and explicitly indicate it at the end of their responses. Our experiments show that UNIT_cut substantially improves LLM truthfulness, while UNIT_ref maintains high informativeness and reduces hallucinations by distinguishing between confident and uncertain statements.
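The abstract does not provide implementation details, so the following is only a minimal illustrative sketch of the two paradigms as described above. The `familiarity_score` helper and the `0.5` threshold are hypothetical placeholders for however the paper actually measures whether the knowledge in a response is unfamiliar to the base model.

```python
# Sketch (not the authors' code) of the two IFT data-preparation paradigms
# described in the abstract. familiarity_score() and the 0.5 threshold are
# assumptions standing in for the paper's actual familiarity estimate.

from typing import Callable


def unit_cut(dataset: list[dict], familiarity_score: Callable[[str], float],
             threshold: float = 0.5) -> list[dict]:
    """UNIT_cut: drop IFT examples whose responses rely on knowledge the base
    model is unfamiliar with, so fine-tuning does not push the model to assert
    facts it cannot support."""
    return [ex for ex in dataset
            if familiarity_score(ex["response"]) >= threshold]


def unit_ref(dataset: list[dict], familiarity_score: Callable[[str], float],
             threshold: float = 0.5) -> list[dict]:
    """UNIT_ref: keep every example, but append an explicit uncertainty note to
    responses built on unfamiliar knowledge, teaching the model to flag
    low-confidence content instead of dropping it."""
    out = []
    for ex in dataset:
        response = ex["response"]
        if familiarity_score(response) < threshold:
            response += "\n\nNote: I am uncertain about some of the statements above."
        out.append({**ex, "response": response})
    return out
```

In this reading, UNIT_cut trades informativeness for truthfulness by filtering the training data, while UNIT_ref preserves informativeness and instead teaches the model to mark its uncertain statements.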
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: fine-tuning, generalization
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 6621