Llamas Know What GPTs Don't Show: Surrogate Models for Selective Classification

Vaishnavi Shrivastava; Percy Liang; Ananya Kumar

24 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: calibration, uncertainty estimation, large language models

TL;DR: To estimate the uncertainty of closed models that do not provide log probabilities, we propose ensembling their linguistic confidences with probabilities from open models.

Abstract: To maintain user trust, large language models (LLMs) should signal low confidence on examples they get wrong instead of misleading the user. The standard approach to estimating confidence is to use the softmax probabilities of these models, but state-of-the-art LLMs such as GPT-4 and Claude do not provide access to these probabilities. We first study eliciting confidence linguistically, that is, asking an LLM for its confidence in its answer, but we find that this leaves a lot of room for improvement (79% AUC on GPT-4 averaged across 12 question-answering datasets, only 5% above a random baseline). We then explore using a surrogate confidence model: a model for which we do have probabilities is used to evaluate the original model's confidence in a given question. Surprisingly, even though these probabilities come from a different model, this method leads to higher AUC than linguistic confidences on 10 out of 12 datasets. Our best method, which mixes linguistic confidences and surrogate model probabilities, gives state-of-the-art performance on all 12 datasets (85% average AUC on GPT-4).
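As a rough illustration of the idea in the abstract, the sketch below mixes a linguistic confidence with a surrogate model probability and scores the result for selective classification. It rests on several assumptions not stated in the abstract: the mixture is taken to be a convex combination with a hypothetical weight `alpha`, AUC is taken to mean the area under the selective accuracy versus coverage curve, and the function names and toy numbers are invented for illustration only.

```python
from typing import List, Sequence


def mix_confidences(
    linguistic: Sequence[float],
    surrogate: Sequence[float],
    alpha: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> List[float]:
    """Convex combination (an assumed form) of two per-example confidences."""
    if len(linguistic) != len(surrogate):
        raise ValueError("confidence lists must have the same length")
    return [(1 - alpha) * l + alpha * s for l, s in zip(linguistic, surrogate)]


def selective_auc(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Average accuracy over the top-k most confident examples, for k = 1..n.

    This approximates the area under the selective accuracy vs. coverage
    curve. Ties keep their original order because Python's sort is stable.
    """
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    hits = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        hits += correct[i]
        total += hits / k  # accuracy when answering only the k most confident
    return total / len(order)


# Toy data (made up): 1 means the closed model answered correctly.
correct = [1, 1, 0, 1, 0]
linguistic = [0.9, 0.9, 0.9, 0.8, 0.8]  # coarse verbalized confidences
surrogate = [0.7, 0.6, 0.2, 0.9, 0.1]   # open-model probabilities

mixed = mix_confidences(linguistic, surrogate, alpha=0.5)
auc_linguistic = selective_auc(linguistic, correct)
auc_mixed = selective_auc(mixed, correct)
print(auc_linguistic, auc_mixed)
```

In this toy setup the verbalized confidences take only two distinct values, so they cannot rank examples finely. Adding the surrogate probabilities breaks those ties, which is one plausible reading of why a mixture might help.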

Primary Area: general machine learning (i.e., none of the above)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 8590
