Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: discrete diffusion models; inference-time scaling
Abstract: Discrete diffusion models have recently emerged as strong alternatives to autoregressive language models, matching their performance through large-scale training. However, inference-time control remains relatively underexplored. In this work, we study how to steer generation toward desired rewards without retraining the models. Prior work typically focuses on resampling or filtering within a single denoising trajectory, optimizing rewards step by step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time sampling algorithm that performs trajectory-level refinement, allowing it to preserve generation perplexity under reward optimization. PG-DLM constructs a Markov chain over full denoising trajectories and applies a conditional Sequential Monte Carlo kernel to resample them. Within this framework, we further analyze trade-offs across four key axes for inference-time scaling under fixed compute budgets: particle Gibbs iterations, sample count, denoising steps, and reward estimation cost. Our analysis shows that scaling particle Gibbs iterations achieves the best reward–perplexity trade-off. Empirically, PG-DLM consistently outperforms prior methods with both MDLM and LLaDA-8B as base models across a wide range of compute budgets for reward-guided tasks, including toxicity and sentiment control as well as linguistic acceptability.
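As a rough illustration of the scheme described in the abstract (a Markov chain over full denoising trajectories refined by a conditional Sequential Monte Carlo kernel), the sketch below shows a generic particle Gibbs loop. It is a minimal toy example, not the paper's PG-DLM implementation: the `denoise_step` model, `reward` function, masking schedule, and all constants (`VOCAB`, `MASK`, `LENGTH`, `T`, particle and sweep counts) are hypothetical stand-ins. The reference trajectory is pinned as particle 0 in each sweep, and a new reference is drawn from the reward-weighted particles.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, MASK, LENGTH = 8, -1, 12    # hypothetical toy vocabulary size, mask token, sequence length
T = 6                              # number of denoising steps

def denoise_step(x, t):
    """Toy reverse step: unmask a fraction of the remaining MASK positions uniformly at random."""
    x = x.copy()
    masked = np.flatnonzero(x == MASK)
    if len(masked) == 0:
        return x
    k = max(1, len(masked) // (T - t))
    for i in rng.choice(masked, size=min(k, len(masked)), replace=False):
        x[i] = rng.integers(VOCAB)
    return x

def reward(x):
    """Toy reward: fraction of unmasked tokens that are even."""
    filled = x[x != MASK]
    return float(np.mean(filled % 2 == 0)) if len(filled) else 0.0

def csmc_sweep(ref_traj, n_particles):
    """One conditional SMC sweep: particle 0 replays the reference trajectory and survives every resampling."""
    trajs = [[np.full(LENGTH, MASK)] for _ in range(n_particles)]
    for t in range(T):
        # Propagate: particle 0 is pinned to the reference, the others sample the toy denoiser.
        for i in range(n_particles):
            nxt = ref_traj[t + 1] if i == 0 else denoise_step(trajs[i][-1], t)
            trajs[i].append(nxt.copy())
        # Weight by (toy) intermediate reward and resample ancestors, keeping the reference at index 0.
        w = np.exp([reward(tr[-1]) for tr in trajs])
        w = w / w.sum()
        idx = [0] + list(rng.choice(n_particles, size=n_particles - 1, p=w))
        trajs = [[s.copy() for s in trajs[j]] for j in idx]
    # Draw the next reference trajectory from the final reward-weighted particles.
    w = np.exp([reward(tr[-1]) for tr in trajs])
    return trajs[rng.choice(n_particles, p=w / w.sum())]

# Particle Gibbs: initialize a reference by plain ancestral denoising, then iterate conditional SMC sweeps.
ref = [np.full(LENGTH, MASK)]
for t in range(T):
    ref.append(denoise_step(ref[-1], t))
for sweep in range(4):
    ref = csmc_sweep(ref, n_particles=8)
    print(f"sweep {sweep}: final-sample reward = {reward(ref[-1]):.3f}")
```

In this toy loop, the sweep count, particle count, and number of denoising steps correspond to three of the four scaling axes the abstract analyzes (the fourth, reward estimation cost, is trivial here because the toy reward is exact).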
Primary Area: generative models
Submission Number: 14085