Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Published: 05 Mar 2024, Last Modified: 08 May 2024 · ICLR 2024 R2-FM Workshop Poster · CC BY 4.0
Keywords: epistemic uncertainty, aleatoric uncertainty, uncertainty quantification, distribution-free, calibration, confidence intervals, reliability, misspecification, underfitting, hallucination, grouping loss, nonparametric inference, generative models, language models
TL;DR: Training generative models with paired outputs yields provably correct estimates of epistemic uncertainty without assumptions on the data-generating process.
Abstract: Identifying how much a model $\hat{p}\_{\scriptscriptstyle{Y|X}}^{\theta}$ knows about the stochastic real-world process $p\_{\scriptscriptstyle{Y|X}}$ it was trained on is important to ensure it avoids producing "hallucinated" answers or taking unsafe actions, but this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty). We propose a general strategy for decomposing these: train a model to predict *pairs* of independent responses drawn from the true distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. We prove that this strategy incentivizes models to become *second-order calibrated*, which allows us to both accurately estimate the gaps between $\hat{p}\_{\scriptscriptstyle{Y|X}}^{\theta}$ and $p\_{\scriptscriptstyle{Y|X}}$ and construct decoding algorithms with bounded probability of generating an incorrect statement. Empirically, we show that our strategy outperforms other filtering methods on a synthetic language modeling task (describing digits of $\pi$).
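
The pair-prediction idea in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' code: it assumes a hypothetical `pair_model(x, y1)` that returns logits over the second response $y_2$, where passing `y1=None` withholds the first response and passing an observed `y1` lets the model "cheat" by conditioning on it. The amount of cheating is measured here as the total-variation distance between the two resulting predictions; under the abstract's framing, a model that gains little from cheating at an input has little remaining epistemic uncertainty there.

```python
# Minimal sketch (assumed interface, not the paper's API) of measuring
# "how much the model cheats" when predicting the second of a pair of
# responses, as an epistemic-uncertainty proxy.

import torch


def cheating_gap(pair_model, x, y1):
    r"""Total-variation distance between \hat{p}(y2 | x) and \hat{p}(y2 | x, y1).

    A gap near zero means observing y1 did not change the model's prediction
    for y2, i.e. it gains nothing by cheating at this x; a large gap flags
    inputs where the model still lacks knowledge of p(Y|X).
    """
    with torch.no_grad():
        p_marginal = torch.softmax(pair_model(x, y1=None), dim=-1)  # \hat{p}(y2 | x)
        p_cheat = torch.softmax(pair_model(x, y1=y1), dim=-1)       # \hat{p}(y2 | x, y1)
    return 0.5 * (p_cheat - p_marginal).abs().sum(dim=-1)
```

Such a gap score could then be thresholded to filter responses on inputs where the model appears to be relying on the observed response rather than its own knowledge, in the spirit of the filtering comparison described in the abstract.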
Submission Number: 24