Slice-Specific Few-Shot Recalibration of Language Models

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Recent work has uncovered promising ways to extract well-calibrated confidence estimates from language models (LMs), in which the model’s confidence score reflects its prediction accuracy. However, while an LM may be well-calibrated on multiple domains combined, it can be significantly miscalibrated within each domain (e.g., overconfidence in math balances out underconfidence in history). To attain well-calibrated confidence estimates for each slice of the distribution, we propose a new framework for few-shot slice-specific recalibration. Specifically, we train a recalibration model that takes in a few unlabeled examples from a given slice and predicts the slice-specific precision scores at various confidence thresholds. Our trained model can recalibrate for new slices without using any labeled data from those slices. This lets us identify domain-specific confidence thresholds above which the LM’s predictions can be trusted, and below which it should abstain. Experiments show that our few-shot recalibrator consistently outperforms existing calibration methods, for instance reducing calibration error for PaLM2-Large on MMLU by 16% relative to temperature scaling.
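To make the core quantity concrete, the sketch below computes precision at a confidence threshold for a single slice. This is an illustrative helper only, not the paper's implementation: the paper's recalibrator *predicts* such precision curves from unlabeled examples, whereas this sketch computes them from labeled toy data to show what is being predicted. All names and the toy data are assumptions for illustration.

```python
import numpy as np

def precision_at_threshold(confidences, correct, threshold):
    """Precision of the predictions whose confidence is >= threshold.

    Illustrative helper (hypothetical, not from the paper): among the
    predictions the LM is confident enough to keep, what fraction are
    correct? Predictions below the threshold are abstained on.
    """
    keep = confidences >= threshold
    if not keep.any():
        return 1.0  # vacuously precise: everything is abstained on
    return correct[keep].mean()

# Toy slice: an LM that is somewhat overconfident on these examples.
conf = np.array([0.9, 0.85, 0.8, 0.6, 0.55, 0.4])
corr = np.array([1, 0, 1, 0, 1, 0], dtype=float)

# Precision curve at a few thresholds; a slice-specific recalibrator
# would predict such a curve from unlabeled examples of the slice.
for t in (0.5, 0.7, 0.9):
    print(f"threshold {t}: precision {precision_at_threshold(conf, corr, t):.3f}")
```

Raising the threshold trades coverage for precision (here 0.6 → 0.667 → 1.0), which is why a per-slice threshold, rather than one global one, determines where the LM should abstain.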
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English, French, Spanish, German, Greek, Bulgarian, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, Hindi, Swahili, and Urdu