TracVC: Tracing Verbalized Confidence of LLMs Back to Training Data

ACL ARR 2025 May Submission 4742 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large language models (LLMs) can increase users' perceived trust by verbalizing confidence in their outputs. However, prior work shows that LLMs often express overconfidence that is misaligned with their factual accuracy. To better understand the sources of this behavior, we propose TracVC, a method for Tracing Verbalized Confidence back to specific training data. We conduct experiments on OLMo models in a question answering setting, defining a model as truthful when content-related training data (i.e., data relevant to the question and answer) has greater influence than confidence-related data. Our analysis reveals that OLMo2-13B is often influenced by confidence-related data that is semantically unrelated to the query, suggesting that it may mimic linguistic markers of certainty. This finding highlights a fundamental limitation of current training regimes: LLMs may learn how to sound confident without understanding when confidence is warranted. Our analysis provides a foundation for improving the trustworthiness of LLMs in expressing more truthful confidence.
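A minimal sketch of the truthfulness criterion stated in the abstract: a model's verbalized confidence counts as truthful when content-related training examples exert more influence than confidence-related ones. This is not the authors' code; the influence scores, the content/confidence labels, and all names (TracedExample, is_truthful) are hypothetical assumptions, since the paper's actual attribution method is not described on this page.

```python
# Illustrative sketch (hypothetical, not from the paper): apply the
# abstract's criterion to a set of traced training examples, each with
# an assumed influence score and a content-vs-confidence label.

from dataclasses import dataclass


@dataclass
class TracedExample:
    """A training example with its (assumed) influence on the model output."""
    text: str
    influence: float          # score from some data-attribution method
    is_content_related: bool  # relevant to the question/answer, vs. confidence phrasing


def is_truthful(traced: list[TracedExample]) -> bool:
    """Criterion from the abstract: total content-related influence
    must exceed total confidence-related influence."""
    content = sum(t.influence for t in traced if t.is_content_related)
    confidence = sum(t.influence for t in traced if not t.is_content_related)
    return content > confidence


# Hypothetical usage: two traced examples for one QA instance.
examples = [
    TracedExample("Paris is the capital of France.", 0.62, True),
    TracedExample("I am absolutely certain that ...", 0.31, False),
]
print(is_truthful(examples))  # True: content-related influence dominates
```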
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: calibration/uncertainty, data influence
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 4742