Keywords: generative models, diffusion language models
Abstract: Test-time scaling through reward-guided generation remains largely unexplored for discrete diffusion models, despite its promise as an alternative. In this work, we introduce Iterative Reward-Guided Refinement (IterRef), a novel test-time scaling method tailored to discrete diffusion that leverages reward-guided noising-denoising transitions to progressively refine misaligned intermediate states. We formalize this process within a Multiple-Try Metropolis (MTM) framework, proving convergence to the reward-aligned distribution. Unlike prior methods that assume the current state is already aligned with the reward distribution and only guide the subsequent transition, our approach explicitly refines each state in situ, progressively steering it toward the optimal intermediate distribution. Across both text and image domains, we evaluate IterRef on diverse discrete diffusion models and observe consistent improvements in reward-guided generation quality. In particular, IterRef achieves striking gains under low compute budgets, far surpassing prior state-of-the-art baselines. Code will be publicly released.
Primary Area: generative models
Submission Number: 2625
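To make the MTM formalization concrete, below is a minimal sketch of one reward-guided refinement step in the style the abstract describes: noise the current state, draw several denoised candidates, select one with reward-tilted weights, and apply an MTM accept/reject test. This is not the paper's released code; the `propose` and `reward` functions, the vocabulary size, and the inverse temperature `beta` are all hypothetical stand-ins (a real system would denoise with the trained diffusion network and score with a learned reward model).

```python
# Sketch of one Multiple-Try Metropolis (MTM) refinement step on a discrete
# sequence, assuming a symmetric toy noise-denoise proposal. All names here
# are illustrative placeholders, not the paper's actual API.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH = 16, 8

def reward(x):
    # Hypothetical reward: prefer sequences with many even tokens.
    return float(np.mean(x % 2 == 0))

def propose(x, noise_frac=0.5):
    # Toy noising-denoising transition: corrupt a random fraction of
    # positions and resample them uniformly. A real model would instead
    # remask and denoise with the discrete diffusion network.
    y = x.copy()
    mask = rng.random(LENGTH) < noise_frac
    y[mask] = rng.integers(0, VOCAB, mask.sum())
    return y

def mtm_refine_step(x, k=4, beta=5.0):
    # Draw k candidate refinements of the current state x.
    cands = [propose(x) for _ in range(k)]
    w = np.exp(beta * np.array([reward(c) for c in cands]))
    y = cands[rng.choice(k, p=w / w.sum())]  # reward-weighted selection
    # Reference set for MTM: k-1 fresh proposals from y, plus x itself.
    refs = [propose(y) for _ in range(k - 1)] + [x]
    w_ref = np.exp(beta * np.array([reward(r) for r in refs]))
    # The acceptance ratio keeps the chain targeting the reward-tilted
    # distribution pi(x) proportional to exp(beta * reward(x)).
    if rng.random() < min(1.0, w.sum() / w_ref.sum()):
        return y
    return x

x = rng.integers(0, VOCAB, LENGTH)
for _ in range(20):
    x = mtm_refine_step(x)
print(x, reward(x))
```

Because the toy proposal is symmetric, the MTM weights reduce to the target density itself; iterating this step refines the state in place rather than only guiding the next transition, which is the distinction the abstract draws against prior methods.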