Teaching Models to Express Their Uncertainty in Words

Stephanie Lin; Jacob Hilton; Owain Evans

Published: 17 Oct 2022, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: We show that a GPT-3 model can learn to express uncertainty about its own answers in natural language -- without use of model logits. When given a question, the model generates both an answer and a level of confidence (e.g. "90% confidence" or "high confidence"). These levels map to probabilities that are well calibrated. The model also remains moderately calibrated under distribution shift, and is sensitive to uncertainty in its own answers, rather than imitating human examples. For testing calibration, we introduce the CalibratedMath suite of tasks. We compare the calibration of uncertainty expressed in words ("verbalized probability") to uncertainty extracted from model logits. Both kinds of uncertainty are capable of generalizing calibration under distribution shift. We also provide evidence that GPT-3's ability to generalize calibration depends on pre-trained latent representations that correlate with epistemic uncertainty over its answers.
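To make the notion of calibration in the abstract concrete, here is a minimal sketch of how verbalized confidence could be scored against answer correctness. It is not the authors' evaluation code. The `WORD_TO_PROB` mapping, the function names, and the choice of metrics (mean squared error and a binned absolute calibration error) are assumptions made for illustration; the abstract does not specify the mapping or the metrics used.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping from confidence words to probabilities (an assumption,
# not taken from the paper).
WORD_TO_PROB: Dict[str, float] = {
    "lowest": 0.1, "low": 0.3, "medium": 0.5, "high": 0.7, "highest": 0.9,
}


def parse_confidence(text: str) -> float:
    """Turn '90% confidence' or 'high confidence' into a probability in [0, 1]."""
    token = text.strip().lower().split()[0]
    if token.endswith("%"):
        return float(token[:-1]) / 100.0
    return WORD_TO_PROB[token]


def mean_squared_error(probs: Sequence[float], correct: Sequence[bool]) -> float:
    """Brier-style score: mean of (stated probability - correctness)^2."""
    return sum((p - float(c)) ** 2 for p, c in zip(probs, correct)) / len(probs)


def binned_calibration_error(
    probs: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Weighted mean of |mean stated probability - accuracy| over equal-width bins."""
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for p, c in zip(probs, correct):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, float(c)))
    error = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_p = sum(p for p, _ in members) / len(members)
        accuracy = sum(c for _, c in members) / len(members)
        error += len(members) / len(probs) * abs(mean_p - accuracy)
    return error


if __name__ == "__main__":
    stated = ["90% confidence", "high confidence", "low confidence", "50% confidence"]
    was_correct = [True, True, False, False]
    probs = [parse_confidence(s) for s in stated]
    print("MSE:", mean_squared_error(probs, was_correct))
    print("Binned calibration error:", binned_calibration_error(probs, was_correct))
```

Under both metrics, lower is better: a model that says "90% confidence" should be right about nine times out of ten on the answers it labels that way.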

Submission Length: Regular submission (no more than 12 pages of main content)

Code: https://github.com/sylinrl/CalibratedMath

Assigned Action Editor: ~Yonatan_Bisk1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 188
