Verbalized Confidence Triggers Self-Verification: Emergent Behavior Without Explicit Reasoning Supervision
Keywords: calibration, reasoning, LLMs, self-verification
TL;DR: CSFT trains LLMs to verbalize confidence, yielding better calibration, accuracy, self-verification, and cross-task generalization.
Abstract: Large language models (LLMs) are increasingly used as reasoning partners in domains such as mathematics, coding, and decision support, where reliable expression of confidence is essential for human–AI interaction. However, current LLMs often generate incorrect answers with high confidence, posing significant risks in downstream decision-making. Prior approaches for inducing verbalized confidence, including reinforcement learning and external probing, have shown limited generalization across complex reasoning and unseen tasks. We introduce Confidence-Supervised Fine-Tuning (CSFT), a simple method that trains models to output both an answer and an explicit confidence statement. CSFT substantially reduces calibration errors while also improving accuracy, induces emergent self-verification behaviors such as self-checking under low confidence, and reshapes token distributions into a locally smooth structure around correct answers. Furthermore, although trained only on reasoning tasks, CSFT generalizes to non-reasoning benchmarks such as MMLU. These results establish verbalized confidence as a scalable mechanism for improving calibration, reasoning, and generalization in LLMs.
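To make the supervision format concrete, below is a minimal sketch of how CSFT-style training targets pairing an answer with a verbalized confidence statement might be constructed. The field names, confidence phrasing, and binning rule are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of building CSFT-style supervised fine-tuning targets.
# Assumptions (not from the paper): each record has a question, a reference
# answer, and an estimated correctness probability used to choose the
# verbalized confidence; the exact wording and bins are illustrative.

def confidence_phrase(p_correct: float) -> str:
    """Map an estimated correctness probability to a verbalized confidence."""
    if p_correct >= 0.9:
        return "Confidence: high (~90%)"
    if p_correct >= 0.6:
        return "Confidence: medium (~70%)"
    return "Confidence: low (~40%)"

def build_csft_example(question: str, answer: str, p_correct: float) -> dict:
    """Create one SFT example whose target contains the answer plus confidence."""
    target = f"Answer: {answer}\n{confidence_phrase(p_correct)}"
    return {"prompt": question, "completion": target}

# Example usage
example = build_csft_example(
    question="What is 17 * 24?",
    answer="408",
    p_correct=0.95,
)
print(example["completion"])
# Answer: 408
# Confidence: high (~90%)
```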
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 23342