Keywords: Uncertainty quantification, Large Language Models, Large Vision-Language Models, position paper
Abstract: This position paper argues that the reliability of LLMs and LVLMs should be assessed beyond hallucinations and should integrate uncertainty.
We further argue that the commonly used token-level uncertainty is insufficient and that semantic-level uncertainty is key.
Token-based criteria, such as next-token entropy or maximum probability, work well in closed-world tasks where the output space is predefined and bounded. However, foundation models increasingly operate in open-world settings, where the space of possible answers is unbounded and queries may involve unseen entities, ambiguous phrasing, or complex reasoning. In such cases, token-level confidence can be misleading: outputs with high probability may be semantically wrong, irrelevant, or hallucinatory.
We advocate shifting toward \textbf{semantic-level uncertainty} to capture uncertainty in the meaning of generated outputs.
By doing so, we can better characterize phenomena such as ambiguity, reasoning failures, and hallucination. We further argue that semantic uncertainty should become the primary lens through which we assess the reliability of foundation models in high-stakes applications, enabling more faithful, trustworthy, and transparent AI systems.
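A minimal sketch (not taken from the paper) contrasting the token-level criteria named above, next-token entropy and maximum probability, with one commonly assumed route to semantic-level uncertainty: sampling several answers, grouping them into meaning-equivalent clusters, and computing entropy over the clusters. The `same_meaning` predicate and the toy answers are illustrative placeholders; in practice an entailment or paraphrase model would play that role.

```python
import math

def token_level_uncertainty(next_token_probs):
    """Token-level criteria: entropy and maximum probability of the
    (finite) next-token distribution."""
    entropy = -sum(p * math.log(p) for p in next_token_probs if p > 0.0)
    max_prob = max(next_token_probs)
    return entropy, max_prob

def semantic_entropy(sampled_answers, same_meaning):
    """Semantic-level uncertainty (one assumed operationalization):
    cluster sampled answers by meaning, then take the entropy of the
    empirical distribution over clusters."""
    clusters = []                      # each cluster is a list of answers
    for ans in sampled_answers:
        for cluster in clusters:
            if same_meaning(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    n = len(sampled_answers)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)

# Toy usage: a sharply peaked next-token distribution looks "confident" ...
print(token_level_uncertainty([0.9, 0.05, 0.05]))

# ... yet repeated samples may still disagree in meaning, which the
# semantic-level measure exposes.  `naive_match` stands in for a real
# semantic-equivalence check.
naive_match = lambda a, b: a.strip().lower() == b.strip().lower()
answers = ["Paris", "paris", "Lyon", "Marseille", "Paris"]
print(semantic_entropy(answers, naive_match))
```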
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 16915