Keywords: Confidence Attribution; Large Language Models; Graph Information Bottleneck
Abstract: Large Language Models (LLMs) have achieved strong performance on complex reasoning tasks by generating step-by-step solution traces, but diagnosing where a reasoning trace might fail remains difficult. Confidence estimation (CE) provides reliability signals but is usually restricted to the final answer, offering only coarse diagnostics. While recent studies have explored stepwise diagnostics, existing methods rely on white-box access, such as token-level logits or fine-tuned models, which are infeasible for closed-source LLMs.
We introduce Stepwise Confidence Attribution, a black-box framework for diagnosing errors, requiring only access to generated reasoning traces.
Stepwise Confidence Attribution applies the Information Bottleneck (IB) principle to assign confidence scores at the step level, treating consensus structures shared across correct solutions as anchors of reliable, high-confidence reasoning. Steps that deviate from these consensus patterns are assigned lower confidence.
We propose two complementary methods: (1) a non-parametric overlap-based approach (NIBS) that measures consistency without graph context, and (2) a Graph-based IB model (GIBS) that learns subgraphs through a differentiable mask to capture structural variability.
Through extensive experiments on mathematical reasoning and multi-hop question answering, we show that our framework reliably identifies low-confidence steps that are strongly correlated with reasoning errors. Moreover, incorporating step-level CE improves overall reasoning accuracy, yielding up to a 12.3% accuracy gain. Our framework provides a practical diagnostic tool for enhancing the reliability of LLM reasoning. Code can be found at https://anonymous.4open.science/r/ICLR_2026_-2801.
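To make the non-parametric overlap idea concrete, here is a minimal hypothetical sketch of scoring a reasoning step by its lexical overlap with steps from sampled consensus (correct) traces. The function names, tokenization, and Jaccard similarity are illustrative assumptions, not the paper's actual NIBS implementation.

```python
# Hypothetical sketch: overlap-based step confidence.
# Assumption: a step is more trustworthy when it overlaps heavily with
# content that recurs across independently sampled correct solution traces.

def token_set(step: str) -> set:
    """Lowercased token set for a reasoning step (illustrative tokenizer)."""
    return set(step.lower().split())

def step_confidence(step: str, consensus_traces: list) -> float:
    """Max Jaccard overlap between a step and any step in the consensus traces."""
    s = token_set(step)
    best = 0.0
    for trace in consensus_traces:
        for other in trace:
            o = token_set(other)
            if s | o:
                best = max(best, len(s & o) / len(s | o))
    return best

# A step that matches consensus content scores high; an off-consensus step scores low.
consensus = [["add 3 and 4 to get 7", "multiply 7 by 2 to get 14"]]
print(step_confidence("add 3 and 4 to get 7", consensus))  # 1.0
print(step_confidence("subtract 5 from 9", consensus))     # 0.0
```

The graph-based variant (GIBS) would replace this set overlap with a learned differentiable mask over a step graph, but the diagnostic interpretation is the same: low scores flag candidate error steps.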
Supplementary Material: zip
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 10543