Learning Efficient Latent Reasoning with Abstract Chain-of-Thought
Track: long paper (up to 10 pages)
Keywords: latent reasoning, reinforcement learning, abstract vocabulary, chain-of-thought, efficient reasoning
TL;DR: We train language models to reason using a reserved vocabulary through an iterative warm-up and reinforcement learning, achieving inference-time efficiency gains of up to 12x fewer tokens.
Abstract: While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate at inference time. Recent latent reasoning methods leverage continuous representations, yet have primarily targeted pre-training and have not demonstrated efficacy at scale, lagging behind verbalized CoT. We propose $\textbf{Abstract Chain-of-Thought}$, a discrete latent reasoning post-training mechanism in which the language model produces a short sequence of tokens from a reserved codebook in lieu of a natural language CoT, prior to generating a response. To make previously unseen “abstract” tokens useful, we introduce a policy iteration-style warm-up loop that alternates between (i) bottlenecking a verbal CoT via masking and performing supervised fine-tuning, and (ii) self-distillation, training the model to generate abstract tokens from the prompt alone via decoding constrained to the codebook. After warm-up, we optimize the generation of abstract sequences with warm-started reinforcement learning under constrained decoding. For the Qwen3-8B and Granite-3.3-8B language models, Abstract-CoT uses up to $12\times$ fewer reasoning tokens while maintaining comparable performance across mathematical reasoning, instruction-following, and multi-hop reasoning in natural language. We also find emergent non-uniform usage over the codebook, resembling power-law dynamics seen in pre-training. Our findings highlight the potential of post-training recipes that facilitate latent reasoning for inference efficiency while inducing new representational dynamics through the introduction of a new “reasoning language”.
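The constrained decoding described in the abstract — restricting sampling to a reserved codebook of abstract tokens — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the vocabulary size, codebook size, and function names are all hypothetical.

```python
import math
import random

# Hypothetical toy setup: the last 8 ids of a 32-token vocabulary are
# reserved as the "abstract" codebook (sizes are illustrative only).
VOCAB_SIZE = 32
CODEBOOK = set(range(VOCAB_SIZE - 8, VOCAB_SIZE))

def constrained_sample(logits, allowed, rng):
    """Sample a token id after masking all ids outside `allowed` to -inf."""
    masked = [l if i in allowed else float("-inf") for i, l in enumerate(logits)]
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]  # exp(-inf) underflows to 0.0
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the renormalized distribution.
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return max(allowed)

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
tok = constrained_sample(logits, CODEBOOK, rng)
assert tok in CODEBOOK  # the model can only emit abstract tokens here
```

During the warm-up loop and subsequent reinforcement learning, a mask of this kind would be applied at each step of the abstract-sequence generation, so that gradient updates only ever reinforce codebook tokens.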
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Keshav_Ramji1
Format: Yes, the presenting author will attend in person if this work is accepted to the workshop.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 86