Calibrating LLMs for Selective Prediction: Balancing Coverage and Risk

Published: 08 Nov 2025, Last Modified: 08 Nov 2025 · ResponsibleFM @ NeurIPS 2025 · CC BY 4.0
Keywords: Trustworthy LLM; Selective prediction; Uncertainty control
Abstract: Despite the impressive capabilities of large language models (LLMs), their outputs often exhibit inconsistent correctness and unreliable factual accuracy. In high-stakes domains, overconfident yet incorrect predictions can lead to serious consequences, highlighting the need for robust uncertainty estimation. To address this, we introduce SelectLLM, an end-to-end method designed to enhance the ability of LLMs to recognize and express uncertainty effectively. By integrating selective prediction into fine-tuning, SelectLLM optimizes model performance on the inputs it chooses to answer, achieving a more balanced trade-off between predictive coverage and utility. Experimental results on TriviaQA, CommonsenseQA, and MedConceptsQA show that SelectLLM significantly outperforms standard baselines, improving abstention behavior while maintaining high accuracy.
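The coverage-risk trade-off the abstract refers to can be made concrete with a simple post-hoc abstention rule: answer only when confidence exceeds a threshold, then measure the fraction of inputs answered (coverage) and the error rate on that answered subset (selective risk). The sketch below is a minimal illustration of that generic baseline, not the SelectLLM training objective; the function name `coverage_and_risk` and the synthetic calibration data are invented for the example.

```python
import numpy as np

def coverage_and_risk(confidences, correct, threshold):
    """Answer only when confidence >= threshold; abstain otherwise.

    Returns (coverage, selective risk): the fraction of inputs answered,
    and the error rate restricted to the answered subset.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    answered = confidences >= threshold           # predictions the model keeps
    coverage = answered.mean()                    # fraction of inputs answered
    if coverage == 0.0:
        return 0.0, 0.0                           # no answers -> no selective risk
    risk = (~correct[answered]).mean()            # error rate on answered subset
    return coverage, risk

# Sweeping the threshold traces a coverage-risk curve: higher thresholds
# lower coverage but (ideally) lower risk on the questions still answered.
rng = np.random.default_rng(0)
conf = rng.uniform(size=1000)
corr = rng.uniform(size=1000) < conf              # toy, well-calibrated model
for t in (0.5, 0.7, 0.9):
    cov, risk = coverage_and_risk(conf, corr, t)
    print(f"threshold={t:.1f}  coverage={cov:.2f}  risk={risk:.2f}")
```

Unlike this post-hoc thresholding baseline, the paper's method folds the abstention decision into fine-tuning itself, so the model learns when to abstain rather than relying on a fixed confidence cutoff.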
Submission Number: 59