Keywords: chain-of-thought, latent variable models, hidden Markov model, sequential representations, uncertainty quantification, posterior inference, calibration
TL;DR: We model LLM reasoning chains as an HMM over latent step correctness, using exact forward-backward inference to propagate uncertainty and derive a principled reflection policy that outperforms heuristic baselines.
Abstract: Chain-of-thought prompting elicits multi-step reasoning from large language
models, yet existing approaches treat confidence at each step as an
independent signal. This independence assumption contradicts the autoregressive generation process: errors at early steps propagate forward and corrupt downstream outputs, creating epistemic blind spots where a model appears locally certain but is globally unreliable. This motivates sequence-level probabilistic inference over latent reasoning correctness. We introduce \emph{Probabilistic
Chain-of-Thought} (PCoT), which models a reasoning chain as a Hidden Markov
Model over latent step correctness and performs exact posterior inference via
the forward-backward algorithm. PCoT yields a principled answer confidence
$C_{\mathrm{final}}$ and a posterior-driven reflection policy that, under the model, dominates raw-score threshold rules. On MATH and GSM8K, PCoT reduces
Expected Calibration Error by $\mathbf{76\%}$ over the best heuristic
baseline and improves accuracy by $\mathbf{14.7}$ percentage points at a
$2\times$ token budget, while remaining robust across three confidence
estimators. Our analysis of \emph{sequential contamination}---whereby a
single upstream error suppresses posteriors of all downstream steps---
provides a formal explanation for why point-wise step scoring is
insufficient for reliable reasoning evaluation.
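The abstract's core mechanism, exact forward-backward inference over binary latent step correctness, admits a compact sketch. The code below is our illustration, not the paper's implementation: the emission model (reading each per-step confidence as $P(\text{obs} \mid \text{correct})$), the transition parameters `p_stay` and `p_recover`, the threshold `tau` in the reflection rule, and the reading of $C_{\mathrm{final}}$ as the final step's posterior marginal are all assumptions.

```python
import numpy as np

def forward_backward(confidences, p_stay=0.95, p_recover=0.05):
    """Exact posterior inference over latent step correctness.

    A minimal sketch under assumed parameters, not the authors' code:
      - latent state z_t in {0: incorrect, 1: correct};
      - confidences[t] in (0, 1) is treated as the emission likelihood
        P(obs_t | z_t = correct), with its complement used for z_t = incorrect;
      - a correct step stays correct w.p. p_stay, and an incorrect step
        recovers only w.p. p_recover, so errors are sticky, which is one
        way to model the sequential contamination the abstract describes.
    """
    conf = np.asarray(confidences, dtype=float)
    T = len(conf)
    # Transition matrix A[i, j] = P(z_{t+1} = j | z_t = i); rows: incorrect, correct.
    A = np.array([[1.0 - p_recover, p_recover],
                  [1.0 - p_stay,    p_stay]])
    # Emission likelihoods e[t, j] = P(obs_t | z_t = j).
    e = np.stack([1.0 - conf, conf], axis=1)

    # Forward pass, normalized at each step to avoid underflow.
    alpha = np.zeros((T, 2))
    alpha[0] = np.array([0.5, 0.5]) * e[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * e[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass.
    beta = np.ones((T, 2))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (e[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Smoothed posterior marginals gamma[t, j] = P(z_t = j | all observations).
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

def reflection_policy(gamma, tau=0.5):
    """Flag steps whose posterior correctness falls below tau
    (a hypothetical threshold) as candidates for reflection."""
    return [t for t, p in enumerate(gamma[:, 1]) if p < tau]

# Example: one low raw confidence at step 1 suppresses the posteriors of
# the later steps even though their raw confidences are high.
gamma = forward_backward([0.9, 0.3, 0.8, 0.85])
c_final = gamma[-1, 1]  # one plausible reading of C_final (an assumption)
print(gamma[:, 1], c_final, reflection_policy(gamma))
```

In this toy run the sticky incorrect state is what produces contamination: once the posterior puts mass on an upstream error, downstream steps inherit it through the transition matrix, which a point-wise score on each step would miss.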
Submission Number: 9