Stoic Reasoner: Dual-Mode Transformers that Compress to Think and Decompress to Speak

Published: 17 Oct 2025 · Last Modified: 21 Nov 2025 · MATH-AI 2025 Poster · CC BY 4.0
Keywords: LLMs, latent reasoning, inference
TL;DR: Training paradigm for transformers to reason in both latent and language space.
Abstract: Latent reasoning has emerged as an alternative to reasoning in natural language: the last layer's representation (a soft token) is fed back to the input of the transformer. This idea is promising because soft tokens have higher representational capacity than tokens from the vocabulary, $\textit{i.e.}$, hard tokens. Existing approaches to training transformers with soft tokens often suffer from performance loss, and in some cases sampling diverse outputs from the model is challenging. We propose a training paradigm for transformers that use soft tokens, called $\textbf{Stoic Reasoner}$ ($\underline{\text{S}}$oft $\underline{\text{TO}}$ken $\underline{\text{I}}$mplicit $\underline{\text{C}}$ontext $\underline{\text{Reasoner}}$), in which the model learns to operate in two modes: one that processes the soft tokens (latent thinking mode) and one that decompresses the soft tokens into a few reasoning steps with hard tokens from the vocabulary (local decoding mode). We focus on logical and mathematical reasoning tasks and fine-tune pretrained models of different sizes. Our method achieves similar or better performance than supervised fine-tuning with Chain-of-Thought data across all tasks, while requiring a smaller KV cache and allowing different reasoning traces to be sampled at inference.
Submission Number: 140
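To make the mechanism in the abstract concrete, here is a minimal sketch (not the authors' code or training procedure) of the soft-token feedback loop: in a latent thinking phase the last layer's hidden state is appended back to the input embeddings, and in a local decoding phase the model emits ordinary hard tokens from the vocabulary. The model name, prompt, and step counts are placeholders chosen only for illustration.

```python
# Hypothetical sketch of dual-mode inference with soft tokens, assuming a
# standard HuggingFace causal LM whose hidden size matches its embedding size.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper fine-tunes pretrained models of different sizes
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
embed = model.get_input_embeddings()

prompt = "2 + 3 * 4 ="
inputs_embeds = embed(tok(prompt, return_tensors="pt").input_ids)

with torch.no_grad():
    # Latent thinking mode: feed the last layer's representation (soft token)
    # back to the input instead of decoding a vocabulary token.
    for _ in range(4):  # number of latent steps is a free choice here
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        soft_token = out.hidden_states[-1][:, -1:, :]  # last layer, last position
        inputs_embeds = torch.cat([inputs_embeds, soft_token], dim=1)

    # Local decoding mode: decompress into a few hard tokens from the vocabulary.
    decoded = []
    for _ in range(8):
        out = model(inputs_embeds=inputs_embeds)
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        decoded.append(next_id)
        inputs_embeds = torch.cat([inputs_embeds, embed(next_id)], dim=1)

print(tok.decode(torch.cat(decoded, dim=1)[0]))
```

This sketch uses greedy decoding and an off-the-shelf model for simplicity; the paper's contribution is the training paradigm that teaches a fine-tuned model to alternate between the two modes, which this snippet does not reproduce.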