LATTS: LAtent space Test Time Scaling for diffusion language models

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Diffusion Large Language Model, Test Time Scaling, Latent Reasoning
TL;DR: We introduce LATTS, a latent space test time scaling method to enhance the performance of masked diffusion language models.
Abstract: Test-time scaling (TTS) improves the performance of autoregressive (AR) large language models by adding computation at inference. While prominent sequential TTS methods enhance accuracy by inducing models to generate longer chain-of-thought (CoT) reasoning, their computational overhead is a drawback. Meanwhile, diffusion large language models (DLLMs) have emerged as a promising alternative that offers parallel decoding and self-correction capabilities. However, existing sequential TTS methods are incompatible with modern masked DLLMs. This incompatibility arises from two fundamental constraints: (1) standard DLLMs operate holistically on fixed-length sequences, preventing the dynamic token-level expansion required for CoT without specific training, and (2) refinement (i.e., denoising) steps are intrinsically coupled to sequence length in standard DLLM formulations, which restricts effective extension without careful design. We introduce LATTS, a novel sequential TTS method for DLLMs that addresses these challenges by operating in the latent embedding space. LATTS reframes CoT reasoning from a spatial process of extending sequence length to a temporal process that uses additional computation to extend the iterative self-refinement steps over the entire sequence's latent representation. Our evaluation on the LLaDA-Instruct model shows that, with a brief post-training phase, LATTS achieves notable improvements over SFT baselines on reasoning and code generation benchmarks, with gains of +4.1% on GSM8K, +4.8% on MATH, +3.2% on MBPP, and an average of +4.6% on commonsense reasoning tasks, at minimal additional inference cost. These results establish sequential TTS as a promising technique for optimizing DLLMs.
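The core reframing in the abstract, trading sequence-length growth for extra refinement iterations over a fixed-length latent, can be sketched in a few lines. This is a toy illustration only: `refine_step`, `latts_decode`, and the zero-valued "clean" latent target are hypothetical stand-ins (the paper's actual method uses a masked diffusion transformer), chosen solely to show that compute scales with iteration count while the sequence shape stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

def refine_step(latents: np.ndarray) -> np.ndarray:
    """Toy stand-in for one denoising pass over the whole sequence's
    latent representation (hypothetical; a real DLLM would apply a
    masked diffusion transformer here)."""
    target = np.zeros_like(latents)            # pretend the "clean" latent is 0
    return latents + 0.5 * (target - latents)  # move latents halfway toward it

def latts_decode(latents: np.ndarray, extra_steps: int) -> np.ndarray:
    """Sequential TTS in latent space: the sequence length never changes;
    extra compute buys more refinement iterations, not more tokens."""
    for _ in range(extra_steps):
        latents = refine_step(latents)
    return latents

seq_len, dim = 8, 4                  # fixed-length sequence of latent embeddings
z = rng.normal(size=(seq_len, dim))

base = latts_decode(z.copy(), extra_steps=2)
scaled = latts_decode(z.copy(), extra_steps=8)

# More refinement steps drive the latents closer to the (toy) clean target,
# while the sequence shape is unchanged throughout.
assert scaled.shape == z.shape
assert np.linalg.norm(scaled) < np.linalg.norm(base)
```

The contrast with sequential TTS for AR models is that the loop above iterates in time (refinement steps) rather than in space (appending CoT tokens), which is what makes it compatible with fixed-length masked DLLM decoding.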
Primary Area: foundation or frontier models, including LLMs
Submission Number: 7519