Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings

TMLR Paper5768 Authors

29 Aug 2025 (modified: 03 Oct 2025) · Under review for TMLR · CC BY 4.0
Abstract: Hallucinations remain a major safety bottleneck for large language models (LLMs), necessitating effective detection methods, for example by quantifying uncertainty in the model's generations. While traditional uncertainty measures based on token likelihoods fail to capture semantic uncertainty, recent approaches like Semantic Entropy (SE) and Kernel Language Entropy (KLE) focus on isolating the underlying semantic uncertainty of the LLM. However, these methods impose significant computational overhead beyond generating samples: they require numerous natural language inference (NLI) calls to compare outputs, limiting their use in latency-sensitive applications. We introduce \textbf{Semantic Embedding Uncertainty (SEU)}, a lightweight metric that directly measures semantic disagreement in embedding space. Like SE and KLE, SEU requires multiple model outputs, but it crucially simplifies the subsequent analysis. SEU computes uncertainty as the average pairwise cosine distance between sentence embeddings---requiring only $M$ embedding model forward passes followed by $O(M^2)$ dot products, instead of $O(M^2)$ NLI forward passes. SEU thus facilitates real-time semantic uncertainty quantification in applications where latency is paramount. Experiments on question answering and reasoning tasks demonstrate that SEU achieves comparable or superior accuracy to SE and KLE while reducing inference latency by up to 100x, enabling its deployment in resource-constrained settings.
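As a minimal sketch of the computation the abstract describes---the average pairwise cosine distance over $M$ sampled generations---the snippet below assumes the $M$ answers have already been embedded by some sentence encoder (the specific encoder, here illustrated with a generic `encode` call, is an assumption and is not specified by the abstract):

```python
import numpy as np

def semantic_embedding_uncertainty(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance between M sentence embeddings.

    embeddings: array of shape (M, d), one embedding per sampled generation.
    Returns a scalar uncertainty score (higher = more semantic disagreement).
    """
    # Normalize rows so that dot products equal cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                 # (M, M) cosine similarity matrix
    m = sims.shape[0]
    i, j = np.triu_indices(m, k=1)           # unique unordered pairs (i < j)
    return float(np.mean(1.0 - sims[i, j]))  # average pairwise cosine distance

# Hypothetical usage: `encoder.encode(answers)` stands in for whichever
# sentence-embedding model is used to embed the M sampled answers.
# embeddings = encoder.encode(answers)       # shape (M, d)
# score = semantic_embedding_uncertainty(embeddings)
```

Only the $M$ encoder forward passes touch a neural network; the remaining work is the $O(M^2)$ dot products of the similarity matrix, which is where the latency saving over NLI-based comparisons would come from.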
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Vinay_P_Namboodiri1
Submission Number: 5768