FLUE: Streamlined Uncertainty Estimation for Large Language Models

Published: 01 Jan 2025 · Last Modified: 16 May 2025 · AAAI 2025 · CC BY-SA 4.0
Abstract: Uncertainty estimation is essential for practical applications such as decision-making, risk assessment, and human-AI collaboration. However, uncertainty estimation in open-ended question-answering (QA) tasks presents unique challenges. The output space for open-ended QA is vast and discrete, and the autoregressive nature of LLMs, combined with the rapid growth in model parameters, makes sampling-based inference significantly costly. An ideal uncertainty estimation method for LLMs should meet two criteria: 1) incur no additional inference cost and 2) capture the semantic dependencies of token-level uncertainty within sequences. We propose FLUE, which converts the redundancy in the extensive parameters of LLMs into randomness to quantify knowledge uncertainty. By introducing randomness during a single forward pass, we obtain token-level Monte Carlo samples without multiple inference runs. We theoretically analyze the FLUE sampling method and employ a post-processing step to learn the state transitions from token uncertainty to sequence uncertainty. On open-ended QA tasks, we demonstrate that FLUE achieves competitive performance in estimating the uncertainty of generated sentences without adding extra inference overhead.
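The abstract itself contains no code, so the following minimal PyTorch sketch only illustrates the general idea of single-pass token-level Monte Carlo uncertainty, under the assumption that randomness is injected as dropout on the final hidden states before the LM head: the expensive transformer pass runs once, and only the cheap output projection is resampled. The names `noisy_token_logits` and `token_entropy` are illustrative, not from the paper.

```python
# Minimal sketch (not the paper's exact algorithm): draw several logit
# samples per token from ONE transformer forward pass by applying fresh
# dropout masks to the final hidden states before the LM head.

import torch
import torch.nn.functional as F

def noisy_token_logits(hidden, lm_head, n_samples=8, p_drop=0.1):
    """Produce n_samples logit draws per token from a single forward pass.

    hidden:  (seq_len, d_model) final hidden states of the generated sequence
    lm_head: (vocab, d_model) output projection weights
    The dropout masks stand in for the paper's idea of turning parameter
    redundancy into randomness.
    """
    draws = []
    for _ in range(n_samples):
        h = F.dropout(hidden, p=p_drop, training=True)  # fresh random mask
        draws.append(h @ lm_head.T)                     # (seq_len, vocab)
    return torch.stack(draws)                           # (n, seq_len, vocab)

def token_entropy(logits_draws):
    """Predictive entropy per token across the Monte Carlo draws."""
    probs = logits_draws.softmax(dim=-1)                # (n, seq_len, vocab)
    mean_probs = probs.mean(dim=0)                      # (seq_len, vocab)
    return -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)

if __name__ == "__main__":
    torch.manual_seed(0)
    seq_len, d_model, vocab = 5, 64, 100
    hidden = torch.randn(seq_len, d_model)
    lm_head = torch.randn(vocab, d_model)
    draws = noisy_token_logits(hidden, lm_head)
    print(token_entropy(draws))  # one uncertainty score per token
```

A post-processing model, as described in the abstract, would then aggregate these per-token scores into a sequence-level uncertainty estimate.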