Calibrating Language Models via Augmented Prompt Ensembles

Published: 23 Jun 2023, Last Modified: 05 Jul 2023
Venue: DeployableGenerativeAI
Keywords: Large Language Model; Uncertainty Estimation; Ensemble
Abstract: Large Language Models (LLMs) have achieved remarkable success, but often exhibit overconfidence and poor calibration, particularly after instruction-finetuning, which limits their reliability and applicability. To address this, we investigate ensembles, a technique known to enhance neural network calibration but underexplored in LLMs, possibly due to the computational cost of training and evaluating multiple LLMs. We introduce Calibration via Augmented Prompt Ensembles (CAPE), a practical approach to LLM ensembles that leverages the inherent prompt sensitivity of LLMs by augmenting prompts, e.g., by template paraphrasing or option permutation. Our method requires no additional training and can be efficiently evaluated in batch mode, yielding significant calibration improvements for instruction-tuned LLMs.
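To make the idea concrete, below is a minimal sketch of one prompt-augmentation strategy the abstract mentions, option permutation: the same multiple-choice question is posed several times with the answer options reordered, and the per-option probabilities are mapped back to the original options and averaged. This is an illustrative reading of the abstract, not the authors' code; the scoring function, prompt template, and parameter names are assumptions.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer: returns one probability per listed option for a given prompt.
# In practice this would query an instruction-tuned LLM and read per-option likelihoods.
ScoreFn = Callable[[str, Sequence[str]], List[float]]


def cape_option_permutation(
    question: str,
    options: Sequence[str],
    score_fn: ScoreFn,
    n_permutations: int = 8,
    seed: int = 0,
) -> Dict[str, float]:
    """Ensemble predictions over permuted answer orderings (one form of prompt augmentation).

    Each augmented prompt presents the same options in a different order; the
    resulting per-option probabilities are accumulated against the original
    options and averaged across the ensemble.
    """
    rng = random.Random(seed)
    totals = {opt: 0.0 for opt in options}

    for _ in range(n_permutations):
        perm = list(options)
        rng.shuffle(perm)
        letters = [chr(ord("A") + i) for i in range(len(perm))]
        prompt = question + "\n" + "\n".join(f"{l}. {o}" for l, o in zip(letters, perm))
        probs = score_fn(prompt, perm)  # one probability per permuted option
        for opt, p in zip(perm, probs):
            totals[opt] += p

    # Average over the ensemble and renormalize to keep a valid distribution.
    avg = {opt: v / n_permutations for opt, v in totals.items()}
    z = sum(avg.values()) or 1.0
    return {opt: v / z for opt, v in avg.items()}
```

Since every augmented prompt is an independent forward pass over the same model, the ensemble members can be scored in a single batch, which is what keeps the approach cheap relative to training multiple LLMs.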
Submission Number: 45