Learning from Past Experience: Confidence Expression Calibration in Language Models via Historical Evaluation

ACL ARR 2025 May Submission 3040 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) exhibit remarkable performance across various downstream tasks, yet they still generate inaccurate or false information in a confident tone. One possible remedy is to endow LLMs with the capability to express confidence, such that the expressed confidence scores align well with the true probability that a generated answer is correct. However, relying on the intrinsic ability of LLMs or on signals from the output logits of answers has proven insufficient for capturing response uncertainty. Therefore, drawing inspiration from cognitive diagnostics, we propose Learning from Past experience (LePe) to enhance the capability for confidence expression. We first identify three key problems: (1) How to capture the inherent confidence of the LLM? (2) How to teach the LLM to express confidence? (3) How to verify the confidence expression of the LLM? We then devise three phases in LePe to address these problems. In addition, to accurately capture the confidence of an LLM when constructing the training data, we design a complete pipeline that includes question preparation and answer sampling. Experimental results across multiple datasets demonstrate that our proposed method enables LLMs to provide reliable confidence scores.
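To make the answer-sampling idea in the abstract concrete, here is a minimal sketch of one plausible way to estimate a model's inherent confidence on a question: sample several answers and take the empirical fraction that are correct. The function names and the `generate` interface are illustrative assumptions, not the paper's actual pipeline or API.

```python
from typing import Callable, List

def empirical_confidence(
    generate: Callable[[str], str],  # hypothetical: samples one answer per call
    question: str,
    reference: str,
    n_samples: int = 20,
) -> float:
    """Sample the model n_samples times on the same question and return
    the fraction of answers matching the reference answer. This fraction
    serves as a proxy for the model's inherent confidence on the question."""
    answers: List[str] = [generate(question) for _ in range(n_samples)]
    hits = sum(a.strip().lower() == reference.strip().lower() for a in answers)
    return hits / n_samples
```

Under this reading, the resulting (question, answer, confidence) triples could serve as training data for teaching the model to verbalize calibrated confidence scores, which calibration metrics such as expected calibration error could then verify.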
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: Question Answering, Generation
Languages Studied: English
Submission Number: 3040