Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
Keywords: Large Language Models, Inference-Time Compute, Error Correction, Mechanistic Interpretability, Mathematical Reasoning, KV-Cache Steering
TL;DR: We introduce a training-free, inference-time method that corrects mid-generation LLM reasoning errors via residual stream monitoring and KV-cache rollback, outperforming standard autoregressive decoding on MATH-500 and beating Best-of-16 at a lower token cost.
Abstract: Large language models frequently commit unrecoverable reasoning errors
mid-generation: once a wrong step is taken, subsequent tokens compound
the mistake rather than correct it.
We introduce $\textbf{Latent Phase-Shift Rollback}$ (LPSR): at each
generation step, we monitor the residual stream at a critical layer
$l_{\text{crit}}$, detect abrupt directional reversals (phase shifts) with
a dual gate that combines cosine similarity and entropy, and respond by rolling back
the KV-cache and injecting a pre-computed steering vector.
No fine-tuning, gradient computation, or additional forward passes are
required.
LPSR achieves $\mathbf{44.0\%}$ on MATH-500 with an 8B model versus
$28.8\%$ for standard autoregressive (AR) decoding ($+15.2$ pp; McNemar $\chi^2 = 66.96$,
$p < 10^{-15}$).
Critically, prompted self-correction, the most natural inference-time
baseline, scores only $19.8\%$, below standard AR; LPSR exceeds
it by $+24.2$ pp ($\chi^2 = 89.4$, $p \approx 0$).
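
For readers unfamiliar with the paired test quoted above, the following is a minimal sketch of McNemar's statistic for two systems scored on the same problems. The function name `mcnemar` and the discordant counts in the usage note are hypothetical and are not taken from the paper; the abstract does not state whether a continuity correction was applied, so the sketch exposes it as an option.

```python
import math

def mcnemar(b, c, continuity=True):
    """McNemar's chi-square test on paired binary outcomes.

    b: problems solved by system A only (hypothetical count)
    c: problems solved by system B only (hypothetical count)
    Returns (chi2, p) with 1 degree of freedom.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    diff = abs(b - c)
    if continuity:
        # Edwards' continuity correction
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / (b + c)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

As an illustration only, `mcnemar(30, 10)` gives a statistic of 9.025 with the correction and 10.0 without it. Only the discordant pairs enter the statistic, which is why the test suits comparisons of two decoding methods on a shared benchmark.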
LPSR also outperforms Best-of-16 ($+7.8$ pp) at $5.4\times$ lower token
cost, and surpasses a standard 70B model ($35.2\%$) with $8.75\times$
fewer parameters at ${\sim}3\times$ the token budget.
A 32-layer sweep reveals a novel $\textbf{detection-correction dissociation}$:
error-detection AUC peaks at layer 14 ($0.718$) but
task accuracy peaks at layer 16 ($44.0\%$ vs. $29.2\%$),
demonstrating that optimal monitoring depth differs for detection and
correction.
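
The detect-then-rollback loop described in the abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the choice to compare consecutive residual vectors, the threshold values, the list-based stand-in for the KV-cache, and every function and parameter name (`phase_shift_detected`, `rollback_and_steer`, `cos_thresh`, `ent_thresh`, `n_back`, `alpha`).

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (nats) of the softmax distribution over logits."""
    z = logits - logits.max()          # stabilise the exponentials
    p = np.exp(z)
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def cosine(u, v):
    """Cosine similarity between two residual-stream vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def phase_shift_detected(prev_h, curr_h, logits,
                         cos_thresh=0.0, ent_thresh=2.0):
    """Dual gate: fire only if the residual direction reverses AND the
    next-token distribution is uncertain. Thresholds are hypothetical."""
    reversed_direction = cosine(prev_h, curr_h) < cos_thresh
    uncertain = token_entropy(logits) > ent_thresh
    return reversed_direction and uncertain

def rollback_and_steer(kv_cache, hidden, steering_vec, n_back=1, alpha=1.0):
    """Drop the last n_back cache entries and add a pre-computed steering
    vector to the hidden state. kv_cache is a plain list standing in for
    per-position key/value tensors."""
    keep = max(len(kv_cache) - n_back, 0)
    return kv_cache[:keep], hidden + alpha * steering_vec
```

In this sketch the gate requires both conditions, so a confident direction change (low entropy) or an uncertain but directionally stable step does not trigger a rollback. How the real method sets $l_{\text{crit}}$, the thresholds, and the rollback depth is specified in the paper, not here.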
Paper Type: Long (8 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 42