Calibrating the Confidence of Large Language Models by Eliciting Fidelity

ACL ARR 2024 June Submission 2227 Authors

15 Jun 2024 (modified: 30 Jul 2024) · ACL ARR 2024 June Submission · Readers: Everyone · License: CC BY 4.0
Abstract: Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, after alignment, these language models often exhibit overconfidence: their expressed confidence is not well calibrated with their actual correctness rate. In this paper, we decompose language model confidence into the \textit{Uncertainty} about the question and the \textit{Fidelity} to the answer generated by the language model. We then propose a plug-and-play method, \textit{UF Calibration}, to estimate the confidence of language models. Our method achieves good calibration performance in experiments with six RLHF-LMs on four MCQA datasets. Moreover, we propose two novel metrics, IPR and CE, to evaluate model calibration, and we provide a detailed discussion of \textit{Truly Well-Calibrated Confidence} for large language models. Our method can serve as a strong baseline, and we hope this work offers some insights into confidence calibration for language models.
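The abstract describes confidence as built from two components, \textit{Uncertainty} about the question and \textit{Fidelity} to the generated answer, but it does not state how the two are computed or combined. The sketch below is a minimal illustration only, assuming a sampling-based MCQA setup; the function name, the entropy-based uncertainty estimate, the agreement-based fidelity, and the multiplicative combination are all illustrative assumptions, not the paper's actual UF Calibration procedure.

```python
# Illustrative sketch only -- not the authors' UF Calibration formula.
# Assumes k independent samples of the model's answer to one MCQA question.
import math
from collections import Counter

def uf_confidence_sketch(sampled_answers: list[str], num_options: int) -> float:
    """Combine a sampling-based uncertainty estimate with answer fidelity.

    sampled_answers: answers (e.g., "A".."D") from k independent samples.
    num_options: number of MCQA options, used to normalize the entropy.
    """
    counts = Counter(sampled_answers)
    k = len(sampled_answers)

    # Uncertainty: normalized entropy of the empirical answer distribution.
    probs = [c / k for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    uncertainty = entropy / math.log(num_options)  # in [0, 1]

    # Fidelity: agreement rate between the samples and the majority answer.
    final_answer, final_count = counts.most_common(1)[0]
    fidelity = final_count / k

    # One simple (hypothetical) combination: fidelity discounted by uncertainty.
    return (1.0 - uncertainty) * fidelity

# Example: 10 samples over a 4-option question, mostly answering "B".
print(uf_confidence_sketch(list("BBBBBBBACB"), num_options=4))
```

Under these assumptions, a model that answers consistently on a question it finds easy (low uncertainty, high fidelity) receives a confidence near 1, while scattered answers drive both factors, and hence the confidence, toward 0.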
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Calibration, Uncertainty
Contribution Types: Model analysis & interpretability, Reproduction study
Languages Studied: English
Submission Number: 2227