Enhancing Trust in Large Language Models via Uncertainty-Calibrated Fine-tuning
Keywords: Uncertainty Estimation, Trustworthy AI, Large Language Models (LLMs), Vision-Language Models, Uncertainty-aware Fine-tuning
TL;DR: This paper proposes an uncertainty-calibrated fine-tuning method for LLMs, introducing a novel uncertainty-aware causal language modeling loss to enhance the quality of uncertainty estimates in LLMs and VLMs.
Abstract: Large language models (LLMs) have achieved remarkable success in natural language generation, yet they remain prone to hallucinations and often exhibit miscalibrated overconfidence, even when producing incorrect outputs. Reliable uncertainty estimation is a key requirement for deploying LLMs safely, as it enables users and downstream systems to assess confidence, detect hallucinations, and identify out-of-domain prompts. In this work, we propose an uncertainty-calibrated fine-tuning approach that improves the reliability of LLMs in open-ended, free-form generation settings. Our method introduces a novel uncertainty-aware causal language modeling loss, grounded in decision-theoretic principles, that explicitly encourages well-calibrated uncertainty estimates. We conduct extensive empirical evaluations across multiple free-form question-answering datasets and model architectures, demonstrating that our approach consistently yields better uncertainty calibration compared to standard fine-tuning. Furthermore, the experimental results show that the proposed method substantially enhances the model’s ability to detect hallucinations and identify out-of-domain prompts.
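The abstract does not specify the exact form of the uncertainty-aware loss, so the following is only an illustrative sketch of the general idea: a standard next-token negative log-likelihood augmented with a calibration penalty (here a token-level Brier score), so that overconfident wrong predictions are discouraged. The function name, the Brier-score choice, and the weight `lam` are assumptions for illustration, not the paper's actual objective.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def calibration_regularized_clm_loss(logits, targets, lam=0.5):
    """Illustrative uncertainty-aware CLM loss (NOT the paper's loss).

    logits:  (T, V) next-token logits for a sequence of length T.
    targets: (T,)   integer token ids.
    Combines the usual NLL with a Brier-score term that penalizes
    confident probability mass placed on incorrect tokens, nudging
    the token distribution toward better calibration.
    """
    probs = softmax(logits)
    idx = np.arange(len(targets))
    nll = -np.log(probs[idx, targets] + 1e-12).mean()
    onehot = np.zeros_like(probs)
    onehot[idx, targets] = 1.0
    brier = ((probs - onehot) ** 2).sum(axis=-1).mean()
    return nll + lam * brier
```

Under this sketch, a model that is confidently correct incurs a low loss, while one that is confidently wrong is penalized by both terms; the hypothetical `lam` trades off accuracy pressure against calibration pressure.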
Submission Number: 232