Calibrating the Voice of Doubt: How LLMs Diverge from Humans in Verbal Uncertainty

17 Sept 2025 (modified: 16 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Trustworthy LLMs, Uncertainty Quantification, Verbal Uncertainty
Abstract: Humans naturally express uncertainty through verbal cues known as uncertainty markers (e.g., “possible”, “likely”), yet existing Large Language Model (LLM) uncertainty quantification (UQ) methods primarily rely on response likelihood or semantic consistency, which are often computationally costly. Despite increasing interest in LLM reliability, how LLMs diverge from humans in verbal uncertainty expression remains underexplored: Do LLMs associate the same confidence levels with uncertainty markers as humans do? Can we quantify LLM uncertainty verbally? To address this gap, we study the divergence between humans and LLMs in verbal uncertainty expression. Specifically, we first collect a corpus of human uncertainty markers from the literature and systematically examine their alignment with LLMs. Our extensive experiments reveal that LLMs may encode verbal uncertainty with confidence levels that differ substantially from those of humans. To bridge this mismatch, we introduce VOCAL, a novel optimization-based algorithm that learns the confidence level of each uncertainty marker for LLMs. VOCAL performs on par with state-of-the-art sampling-based UQ methods across extensive experimental settings, at significantly reduced computational cost. Moreover, VOCAL disentangles the calibration mismatch and pinpoints the confidence disparity between human and LLM verbal expressions. This work opens a new perspective on LLM UQ by grounding it in the verbal dimension of uncertainty expression, and offers insights into both model alignment and human–AI communication.
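For intuition, here is a minimal, hypothetical sketch of the general idea behind optimization-based per-marker confidence calibration: assign each uncertainty marker a learnable confidence in (0, 1) and fit it against observed answer correctness with a Brier-style objective. This is not the paper's VOCAL algorithm; the toy markers, data, and loss below are assumptions made purely for illustration.

```python
# Hypothetical illustration of optimization-based per-marker confidence
# calibration (NOT the paper's VOCAL algorithm): each uncertainty marker
# gets a learnable confidence in (0, 1), fit against observed correctness.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic sigmoid keeps confidences in (0, 1)

# Toy data (assumed): the marker the LLM used in each response, and whether
# that response was actually correct (1) or incorrect (0).
markers = ["likely", "possible", "certainly", "likely", "possible", "certainly"]
correct = np.array([1, 0, 1, 1, 1, 1], dtype=float)

vocab = sorted(set(markers))                        # unique markers
idx = np.array([vocab.index(m) for m in markers])   # marker index per response

def brier_loss(theta):
    """Mean squared gap between per-marker confidence and observed correctness."""
    conf = expit(theta)  # unconstrained parameters -> (0, 1) confidences
    return np.mean((conf[idx] - correct) ** 2)

result = minimize(brier_loss, x0=np.zeros(len(vocab)), method="L-BFGS-B")
learned_confidence = dict(zip(vocab, expit(result.x)))
print(learned_confidence)  # e.g. "possible" -> ~0.5, "likely"/"certainly" -> ~1.0
```

Under this toy setup, markers that more often accompany correct answers receive higher learned confidence, yielding the kind of per-marker mapping that could then be compared against human-assigned confidence levels.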
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 9350