Learning Multi-step Reasoning via Persistent Latent State Propagation

Published: 02 Mar 2026, Last Modified: 18 Mar 2026, LIT Workshop @ ICLR 2026, License: CC BY 4.0
Track: long paper (up to 10 pages)
Keywords: Latent reasoning, iterative architectures, multi-step reasoning
Abstract: Large language models (LLMs) typically rely on chain-of-thought (CoT) prompting for multi-step reasoning. While effective, this paradigm is brittle, data-intensive, and tightly constrained by token-level generation. Latent reasoning has recently emerged as an alternative: representative methods such as the Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model (TRM) perform complex inference entirely within a continuous hidden state space. However, existing studies remain largely confined to direct prediction tasks and operate over restricted representational vocabularies. In this paper, we propose the Step-wise Persistent Latent Reasoner (SPLR), a lightweight model that performs explicit multi-step reasoning while propagating a persistent hidden state across steps. At each step, SPLR outputs an intermediate hypothesis, optionally consumes an external observation, and refines a compact latent state via hierarchical latent dynamics, rather than growing a long token-level trace. In controlled experiments where all architectures are trained from scratch on GSM8k-Aug, SPLR achieves 79.4% on MultiArith, substantially higher than the GPT-2 baseline (18.9%). This result indicates strong robustness under distribution shift with only 89M parameters.
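The abstract does not give SPLR's update equations, so the following is only a minimal, hypothetical sketch of the step-wise loop it describes. It assumes an HRM-style two-timescale recurrence: a fast low-level state updated several times per reasoning step, followed by one update of a slow high-level state. Random, untrained tanh layers stand in for learned modules. All names (`ToySPLR`, `TanhCell`, `inner_steps`, `observations`) and all dimensions are illustrative assumptions, not the authors' code.

```python
import math
import random
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.3) -> Matrix:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TanhCell:
    """Hypothetical stand-in for a learned module: z' = tanh(W [concatenated inputs])."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random):
        self.W = rand_matrix(out_dim, in_dim, rng)

    def __call__(self, *parts: Vector) -> Vector:
        x = [v for p in parts for v in p]
        return [math.tanh(a) for a in matvec(self.W, x)]


class ToySPLR:
    """Illustrative step-wise reasoner with a persistent two-level latent state (assumed design)."""

    def __init__(self, dim: int = 8, vocab: int = 5, inner_steps: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.dim, self.vocab, self.inner_steps = dim, vocab, inner_steps
        self.embed = rand_matrix(vocab, dim, rng)   # token id -> input vector (row lookup)
        self.low = TanhCell(3 * dim, dim, rng)      # z_L <- f_L(z_L, z_H, x)
        self.high = TanhCell(2 * dim, dim, rng)     # z_H <- f_H(z_H, z_L)
        self.readout = rand_matrix(vocab, dim, rng) # z_H -> hypothesis logits

    def step(self, z_high: Vector, z_low: Vector, x: Vector):
        # Fast inner loop: refine the low-level state several times.
        for _ in range(self.inner_steps):
            z_low = self.low(z_low, z_high, x)
        # Slow update: the high-level state absorbs the refined low-level state.
        z_high = self.high(z_high, z_low)
        # Emit an intermediate hypothesis (greedy argmax over a toy vocabulary).
        logits = matvec(self.readout, z_high)
        hypothesis = max(range(self.vocab), key=lambda i: logits[i])
        return z_high, z_low, hypothesis

    def rollout(self, first_token: int, num_steps: int,
                observations: Optional[Dict[int, int]] = None) -> List[int]:
        # The latent state is initialized once and never reset between steps.
        z_high, z_low = [0.0] * self.dim, [0.0] * self.dim
        x = self.embed[first_token]
        hypotheses = []
        for t in range(num_steps):
            z_high, z_low, h = self.step(z_high, z_low, x)
            hypotheses.append(h)
            # Optionally consume an external observation; otherwise feed back the hypothesis.
            obs = observations.get(t) if observations else None
            x = self.embed[obs if obs is not None else h]
        return hypotheses


model = ToySPLR(dim=8, vocab=5, inner_steps=3, seed=0)
print(model.rollout(first_token=1, num_steps=4))
print(model.rollout(first_token=1, num_steps=4, observations={0: 2}))
```

The point of the sketch is the shape of the computation: the state carried between steps is always `2 * dim` floats, however many steps are run, whereas a CoT trace grows by at least one token per step. How SPLR actually parameterizes and trains these modules is not stated in the abstract.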
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 92