TracVC: Tracing Verbalized Confidence of LLMs Back to Training Data

ACL ARR 2025 May Submission 4742 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large language models (LLMs) can increase users' perceived trust by verbalizing confidence in their outputs. However, prior work shows that LLMs often express overconfidence that is misaligned with their factual accuracy. To better understand the sources of this behavior, we propose TracVC, a method for Tracing Verbalized Confidence back to specific training data. We conduct experiments on OLMo models in a question answering setting, defining a model as truthful when content-related training data (i.e., data relevant to the question and answer) has greater influence than confidence-related data. Our analysis reveals that OLMo2-13B is often influenced by confidence-related data that is semantically unrelated to the query, suggesting that it may mimic linguistic markers of certainty. This finding highlights a fundamental limitation of current training regimes: LLMs may learn how to sound confident without understanding when confidence is warranted. Our analysis provides a foundation for improving the trustworthiness of LLMs in expressing more truthful confidence.
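A minimal sketch of the truthfulness criterion stated in the abstract: a model's verbalized confidence counts as truthful when content-related training examples exert more influence than confidence-related ones. This is not the authors' code; the influence scores, the content/confidence labels, and all names (TracedExample, is_truthful) are hypothetical assumptions, since the paper's actual attribution method is not described on this page.

```python
# Illustrative sketch (hypothetical, not from the paper): apply the
# abstract's criterion to a set of traced training examples, each with
# an assumed influence score and a content-vs-confidence label.

from dataclasses import dataclass


@dataclass
class TracedExample:
    """A training example with its (assumed) influence on the model output."""
    text: str
    influence: float          # score from some data-attribution method
    is_content_related: bool  # relevant to the question/answer, vs. confidence phrasing


def is_truthful(traced: list[TracedExample]) -> bool:
    """Criterion from the abstract: total content-related influence
    must exceed total confidence-related influence."""
    content = sum(t.influence for t in traced if t.is_content_related)
    confidence = sum(t.influence for t in traced if not t.is_content_related)
    return content > confidence


# Hypothetical usage: two traced examples for one QA instance.
examples = [
    TracedExample("Paris is the capital of France.", 0.62, True),
    TracedExample("I am absolutely certain that ...", 0.31, False),
]
print(is_truthful(examples))  # True: content-related influence dominates
```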
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: calibration/uncertainty, data influence
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 4742