Large Language Model Confidence Estimation via Black-Box Access

TMLR Paper 4371 Authors

27 Feb 2025 (modified: 26 Jun 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: Estimating uncertainty or confidence in a model's responses can be significant in evaluating trust not only in the responses, but also in the model as a whole. In this paper, we explore the problem of estimating confidence for responses of large language models (LLMs) with only black-box or query access to them. We propose a simple and extensible framework in which we engineer novel features and train an interpretable model (viz., logistic regression) on these features to estimate the confidence. We empirically demonstrate that our simple framework is effective in estimating the confidence of Flan-UL2, Llama-13B, Mistral-7B, and GPT-4 on four benchmark Q&A tasks, as well as of Pegasus-large and BART-large on two benchmark summarization tasks, surpassing baselines by more than 10% (in AUROC) in some cases. Additionally, our interpretable approach provides insight into which features are predictive of confidence, leading to the interesting and useful discovery that our confidence models built for one LLM generalize zero-shot to others on a given dataset.
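To make the framework described in the abstract concrete, the following is a minimal illustrative sketch, not the paper's exact pipeline: it assumes the engineered black-box features have already been computed and stored, trains a logistic regression confidence estimator on them, and evaluates with AUROC. The file names and feature semantics are hypothetical placeholders.

```python
# Illustrative sketch: fit an interpretable confidence estimator (logistic
# regression) on pre-computed features derived from black-box LLM responses,
# then score it with AUROC. File names and features are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# X: engineered features per (prompt, response) pair (hypothetical file);
# y: 1 if the LLM's response was judged correct, else 0 (hypothetical file).
X = np.load("features.npy")        # shape (n_samples, n_features)
y = np.load("correctness.npy")     # shape (n_samples,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# The predicted probability of correctness serves as the confidence estimate.
confidence = clf.predict_proba(X_test)[:, 1]
print("AUROC:", roc_auc_score(y_test, confidence))

# Because the model is linear, its coefficients indicate which engineered
# features are most predictive of confidence.
print("feature weights:", clf.coef_.ravel())
```

Because the estimator is a simple linear model over named features, the same fitted weights can be inspected directly or applied zero-shot to features computed from a different LLM's responses.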
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=ZKdDLhshDA&nesting=2&sort=date-desc
Changes Since Last Submission: Changes made for the camera-ready version by incorporating the AC's and other reviewers' suggestions:
1. We modified Figure 1 into a flowchart to provide an overview of the end-to-end algorithm.
2. We changed the listings and the algorithm from Pythonic style to pseudocode for better readability.
3. We added a paragraph to the discussion detailing the rationale behind choosing models such as BART, Flan-UL2, etc., and describing why our results apply to the latest models.
Assigned Action Editor: ~Xingchen_Wan1
Submission Number: 4371