Assessing Confidence in Large Language Models by Classifying Task Correctness using Similarity Features

Published: 05 Mar 2025, Last Modified: 05 Mar 2025 · QUESTION Poster · CC BY 4.0
Keywords: uncertainty quantification, confidence estimation, large language models, similarity
TL;DR: We propose a UQ approach that treats confidence estimation as a probabilistic classification task, where one predicts the correctness of a generation using similarities with other generations for the same query as features.
Abstract: Uncertainty quantification (UQ) provides measures of uncertainty, such as a confidence score for an LLM's generated output, and is therefore increasingly recognized as a crucial component of trusted AI systems. Black-box UQ methods do not require access to internal model information from the generating LLM and therefore offer numerous real-world advantages: robustness to system changes, adaptability to the choice of LLM (including those behind commercial APIs), reduced costs, and substantial computational tractability. In this paper, we propose a simple yet powerful UQ approach that treats confidence estimation as a probabilistic classification task, in which one predicts the correctness of a generation using its similarities to other generations for the same query as features. This approach requires only a small labeled dataset and can be either black-box or white-box, depending on the choice of additional features for the classifier beyond the similarities. We conduct an empirical study using 6 datasets across question answering and summarization tasks, demonstrating that features based on pairwise similarities generally yield confidence estimates that are better calibrated and more predictive of correctness than the closest baselines.
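To make the approach concrete, below is a minimal sketch of the classification-based confidence pipeline described in the abstract. It assumes a fixed number of sampled generations per query, cosine similarity over sentence embeddings (via the sentence-transformers model "all-MiniLM-L6-v2"), and logistic regression as the probabilistic classifier; these specific tools and the helper names (similarity_features, train_confidence_classifier, confidence) are illustrative choices, not details prescribed by the paper.

```python
# Sketch: confidence estimation as probabilistic classification over
# pairwise-similarity features (illustrative libraries and model choices).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def similarity_features(target: str, others: list[str]) -> np.ndarray:
    """Cosine similarities between a target generation and the other
    generations sampled for the same query, sorted descending so every
    example yields a fixed-length, order-invariant feature vector."""
    embs = embedder.encode([target] + others, normalize_embeddings=True)
    sims = embs[1:] @ embs[0]      # cosine similarity of unit-norm vectors
    return np.sort(sims)[::-1]

def train_confidence_classifier(examples):
    """Train on a small labeled set. Each example is a tuple:
    (generation, other_generations_for_same_query, correctness_label)."""
    X = np.stack([similarity_features(g, others) for g, others, _ in examples])
    y = np.array([label for _, _, label in examples])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf

def confidence(clf, generation: str, other_generations: list[str]) -> float:
    """Predicted probability of correctness, used as the confidence score."""
    x = similarity_features(generation, other_generations).reshape(1, -1)
    return float(clf.predict_proba(x)[0, 1])
```

In this reading, the black-box variant uses only the similarity features shown above, while a white-box variant would append model-internal features (e.g., token-level probabilities) to the same classifier input.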
Submission Number: 28