Calibrating the Confidence of Large Language Models by Eliciting Fidelity

ACL ARR 2024 June Submission 2227 Authors

15 Jun 2024 (modified: 30 Jul 2024) · ACL ARR 2024 June Submission · Readers: Everyone · License: CC BY 4.0
Abstract: Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, after alignment, these language models often exhibit overconfidence: their expressed confidence is not well calibrated with their actual correctness rate. In this paper, we decompose language model confidence into the \textit{Uncertainty} about the question and the \textit{Fidelity} to the answer generated by the language model. We then propose a plug-and-play method, \textit{UF Calibration}, to estimate the confidence of language models. Our method achieves good calibration performance in experiments with six RLHF-LMs on four MCQA datasets. Moreover, we propose two novel metrics, IPR and CE, to evaluate model calibration, and we provide a detailed discussion of \textit{Truly Well-Calibrated Confidence} for large language models. Our method can serve as a strong baseline, and we hope this work offers some insights into confidence calibration for language models.
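The abstract describes confidence as built from two components, \textit{Uncertainty} about the question and \textit{Fidelity} to the generated answer, but it does not state how the two are computed or combined. The sketch below is a minimal illustration only, assuming a sampling-based MCQA setup; the function name, the entropy-based uncertainty estimate, the agreement-based fidelity, and the multiplicative combination are all illustrative assumptions, not the paper's actual UF Calibration procedure.

```python
# Illustrative sketch only -- not the authors' UF Calibration formula.
# Assumes k independent samples of the model's answer to one MCQA question.
import math
from collections import Counter

def uf_confidence_sketch(sampled_answers: list[str], num_options: int) -> float:
    """Combine a sampling-based uncertainty estimate with answer fidelity.

    sampled_answers: answers (e.g., "A".."D") from k independent samples.
    num_options: number of MCQA options, used to normalize the entropy.
    """
    counts = Counter(sampled_answers)
    k = len(sampled_answers)

    # Uncertainty: normalized entropy of the empirical answer distribution.
    probs = [c / k for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    uncertainty = entropy / math.log(num_options)  # in [0, 1]

    # Fidelity: agreement rate between the samples and the majority answer.
    final_answer, final_count = counts.most_common(1)[0]
    fidelity = final_count / k

    # One simple (hypothetical) combination: fidelity discounted by uncertainty.
    return (1.0 - uncertainty) * fidelity

# Example: 10 samples over a 4-option question, mostly answering "B".
print(uf_confidence_sketch(list("BBBBBBBACB"), num_options=4))
```

Under these assumptions, a model that answers consistently on a question it finds easy (low uncertainty, high fidelity) receives a confidence near 1, while scattered answers drive both factors, and hence the confidence, toward 0.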
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Calibration, Uncertainty
Contribution Types: Model analysis & interpretability, Reproduction study
Languages Studied: English
Submission Number: 2227