Enhancing Uncertainty Quantification in Large Language Models through Semantic Graph Density

Published: 07 May 2025, Last Modified: 13 Jun 2025, UAI 2025 Poster, CC BY 4.0
Keywords: uncertainty quantification, large language models, trustworthy AI
Abstract: Large Language Models (LLMs) excel in language understanding but are susceptible to "confabulation," where they generate arbitrary, factually incorrect responses to uncertain questions. Detecting confabulation in question answering often relies on Uncertainty Quantification (UQ), which measures semantic entropy or consistency among sampled answers. While several methods have been proposed for UQ in LLMs, they suffer from key limitations, such as overlooking fine-grained semantic relationships among answers and neglecting answer probabilities. To address these issues, we propose Semantic Graph Density (SGD). SGD quantifies semantic consistency by evaluating the density of a semantic graph that captures fine-grained semantic relationships among answers. Additionally, it integrates answer probabilities to adjust the contribution of each edge to the overall uncertainty score. We theoretically prove that SGD generalizes the previous state-of-the-art method, Deg, and empirically demonstrate its superior performance across four LLMs and four free-form question-answering datasets. In particular, in experiments with Llama3.1-8B, SGD outperformed the best baseline by 1.52% in AUROC on the CoQA dataset and by 1.22% in AUARC on the TriviaQA dataset.
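The abstract describes scoring uncertainty via the density of a semantic graph over sampled answers, with edges weighted by answer probabilities. A minimal sketch of that general idea follows, assuming pairwise semantic similarities (e.g. entailment scores from an NLI model) are already available; the specific edge-weighting formula here is illustrative and not the paper's exact SGD definition.

```python
import itertools

def semantic_graph_density(similarities, probs):
    """Illustrative probability-weighted graph-density score.

    similarities: dict mapping answer-index pairs (i, j) to a semantic
        similarity in [0, 1] (e.g. from an NLI entailment model).
    probs: list of answer probabilities, one per sampled answer.

    Returns a consistency score in [0, 1]; low values suggest the
    sampled answers disagree semantically (possible confabulation).
    NOTE: the averaging of endpoint probabilities below is a stand-in
    for the paper's edge-contribution adjustment, not its exact formula.
    """
    n = len(probs)
    if n < 2:
        return 1.0  # a single answer gives no disagreement signal
    total = 0.0
    for i, j in itertools.combinations(range(n), 2):
        sim = similarities.get((i, j), similarities.get((j, i), 0.0))
        # weight each edge by the probabilities of its endpoint answers
        total += sim * (probs[i] + probs[j]) / 2.0
    # normalize by the number of possible edges to obtain a density
    return total / (n * (n - 1) / 2)
```

With this sketch, a fully consistent set of high-probability answers (all pairwise similarities 1.0, all probabilities 1.0) yields a density of 1.0, while mutually contradictory answers drive the score toward 0, which matches the intuition that low semantic consistency signals confabulation.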
Submission Number: 273