How many Opinions does your LLM have? Improving Uncertainty Estimation in NLG

Published: 05 Mar 2024, Last Modified: 08 May 2024. ICLR 2024 R2-FM Workshop Poster. License: CC BY 4.0
Keywords: llm, generative models, language modelling, nlg, uncertainty estimation, aleatoric uncertainty, semantic entropy, importance sampling
TL;DR: We introduce SDLG, an efficient technique for accurately estimating aleatoric semantic uncertainty to detect LLM hallucinations.
Abstract: Large language models (LLMs) suffer from hallucinations: they generate text that is not factual. Hallucinations impede many applications of LLMs in society and industry because they make LLMs untrustworthy. It has been suggested that hallucinations result from predictive uncertainty: if an LLM is uncertain about the semantic meaning it should generate next, it is likely to start hallucinating. We introduce Semantic-Diverse Language Generation (SDLG) to quantify the predictive uncertainty of LLMs. Our method detects whether a generated text is hallucinated by offering a precise measure of aleatoric semantic uncertainty. Experiments demonstrate that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in NLG.
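To make the idea of semantic uncertainty concrete, here is a minimal sketch (not the authors' SDLG implementation): sample several generations for the same prompt, group them into semantic-equivalence clusters, and compute the entropy over those clusters. The `semantic_id` equivalence check below is a toy stand-in (normalized exact match); the paper's keywords suggest semantic clustering and importance sampling instead, so all names and thresholds here are illustrative assumptions.

```python
# Illustrative sketch of semantic-entropy-style uncertainty estimation.
# NOT the SDLG method from the paper; clustering here is a toy placeholder.
import math
from collections import defaultdict

def semantic_id(text: str) -> str:
    """Toy semantic-equivalence key; a real system would use e.g. an NLI model."""
    return " ".join(text.lower().strip().rstrip(".").split())

def semantic_entropy(samples: list[str]) -> float:
    """Entropy (in nats) over semantic clusters of sampled generations."""
    clusters: dict[str, int] = defaultdict(int)
    for s in samples:
        clusters[semantic_id(s)] += 1
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())

# Usage: high entropy across sampled answers signals semantic uncertainty,
# which can be thresholded to flag likely hallucinations.
samples = [
    "Paris is the capital of France.",
    "Paris is the capital of France",
    "Lyon is the capital of France.",
]
print(f"semantic entropy: {semantic_entropy(samples):.3f}")
```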
Submission Number: 58