Latent Adaptation with Masked Policy for Diffusion Language Models

ICLR 2026 Conference Submission14739 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: diffusion language model, test-time scaling
TL;DR: Test Time Policy Gradient for Diffusion Language Models
Abstract: Diffusion large language models (dLLMs) have emerged as promising alternatives to autoregressive generation, yet their ability to refine reasoning at test time remains underexplored. We present $\textbf{LAMP}$ (Latent Adaptation with Masked Policy), a training-free framework for reward-guided \emph{latent policy optimization} in masked diffusion models. LAMP treats hidden token states as optimizable latents and adapts them per instance via policy-gradient updates, enabling direct reward feedback to shape the reasoning process without altering model parameters. To accommodate diffusion’s non-sequential decoding, we adopt a masked-policy strategy that selectively reopens and edits uncertain positions while preserving global consistency through re-inpainting. This design allows targeted latent edits to propagate coherently across the diffusion trajectory. Experiments on GSM8K, MATH-500, and AIME show consistent improvements over strong dLLM baselines. Our results establish reward-guided latent adaptation as a practical and effective axis for enhancing reasoning in diffusion-based language models.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 14739
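
Below is a minimal, hedged sketch of the general idea described in the abstract: reward-guided, test-time policy-gradient adaptation of per-instance latents at low-confidence positions of a masked diffusion LM. It is not the authors' implementation; the model stand-in (`toy_dlm_logits`), the reward (`reward_fn`), and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch only: a toy stand-in model and reward, not LAMP itself.
import torch
import torch.nn.functional as F

VOCAB, SEQ_LEN, MASK_ID = 100, 16, 0
torch.manual_seed(0)

# Fixed random projection acting as a stand-in for a masked diffusion LM.
W = torch.randn(VOCAB, VOCAB) * 0.1

def toy_dlm_logits(tokens: torch.Tensor) -> torch.Tensor:
    """Return per-position vocabulary logits for a token sequence (toy model)."""
    return F.one_hot(tokens, VOCAB).float() @ W

def reward_fn(tokens: torch.Tensor) -> torch.Tensor:
    """Hypothetical scalar reward (e.g., a verifier score); here a toy target count."""
    return (tokens == 7).float().mean()

# Start from a decoded draft and its per-position model confidences.
draft = torch.randint(1, VOCAB, (SEQ_LEN,))
with torch.no_grad():
    conf = toy_dlm_logits(draft).softmax(-1).max(-1).values

# Masked policy: reopen the k least-confident positions for editing.
k = 4
reopen = conf.topk(k, largest=False).indices
masked = draft.clone()
masked[reopen] = MASK_ID

# Per-instance latent logits at the reopened positions, optimized by policy gradient.
latent = torch.zeros(k, VOCAB, requires_grad=True)
opt = torch.optim.Adam([latent], lr=0.1)

for step in range(50):
    # Combine the model's proposal for the masked slots with the adaptable latent
    # (a crude analogue of re-inpainting the reopened positions).
    with torch.no_grad():
        base = toy_dlm_logits(masked)[reopen]
    dist = torch.distributions.Categorical(logits=base + latent)
    sample = dist.sample()
    candidate = draft.clone()
    candidate[reopen] = sample
    r = reward_fn(candidate)
    # REINFORCE-style update: increase the probability of samples with higher reward.
    # A real implementation would also subtract a baseline to reduce variance.
    loss = -(r.detach() * dist.log_prob(sample).sum())
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final reward:", reward_fn(candidate).item())
```

Because only the latent logits at reopened positions are optimized and the model weights are never touched, this kind of adaptation is training-free in the sense the abstract describes: all reward feedback is applied per instance at test time.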