Scaling with Recursion in Masked Discrete Diffusion Models

Published: 29 May 2026, Last Modified: 29 May 2026
Venue: HiLD at ICML 2026 Poster
License: CC BY 4.0
Keywords: discrete diffusion, recursion, looped models
Abstract: Masked diffusion models (MDMs) have emerged as a promising paradigm for language generation. However, current architectures typically apply the denoising transformer only once per diffusion step and scale performance primarily through larger parameter counts. This approach can be inefficient in memory-constrained settings and may underutilize computation as a source of improved performance. We introduce recursive masked diffusion models, which are trained to repeatedly apply the same transformer block within each denoising step, enabling iterative refinement of generated tokens through parameter reuse. Empirically, we show that recursive MDMs achieve substantially improved parameter efficiency: on structured generation tasks, a model with $L$ recursive loops approaches the performance of a single-pass baseline containing roughly $L \times$ more parameters. Moreover, recursive computation within a denoising step can partially substitute for additional diffusion steps, as recursive models often require fewer denoising iterations to match the quality of single-pass baselines. These findings identify recursive depth as a distinct and principled scaling axis for masked diffusion models, complementary to both model size and the number of denoising steps.
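The parameter-reuse idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a single weight matrix with a residual tanh update stands in for a full transformer block, and all names (`make_block`, `apply_block`, `recursive_forward`, `stacked_forward`, `count_params`) are hypothetical. The sketch leaves out masking, training, and sampling, and only contrasts looping one block $L$ times with stacking $L$ distinct blocks.

```python
import math
import random


def make_block(dim, rng):
    # Toy assumption: one dim x dim weight matrix stands in for a transformer block.
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_block(weights, hidden):
    # Residual update h <- h + tanh(W h), applied to a single hidden vector.
    return [
        h_i + math.tanh(sum(w * h_j for w, h_j in zip(row, hidden)))
        for h_i, row in zip(hidden, weights)
    ]


def recursive_forward(weights, hidden, num_loops):
    # The same weights are reused on every loop, so effective depth grows
    # with num_loops while the parameter count stays fixed.
    for _ in range(num_loops):
        hidden = apply_block(weights, hidden)
    return hidden


def stacked_forward(layers, hidden):
    # Single-pass baseline: each layer has its own weight matrix.
    for weights in layers:
        hidden = apply_block(weights, hidden)
    return hidden


def count_params(matrices):
    return sum(len(row) for matrix in matrices for row in matrix)


rng = random.Random(0)
dim, num_loops = 8, 4
shared = make_block(dim, rng)
stack = [make_block(dim, rng) for _ in range(num_loops)]
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

out_recursive = recursive_forward(shared, x, num_loops)
out_stacked = stacked_forward(stack, x)
recursive_params = count_params([shared])
stacked_params = count_params(stack)
print(recursive_params, stacked_params)  # prints: 64 256
```

Both forward passes perform the same number of block applications, but the stacked version stores `num_loops` times as many weights. That is the parameter ratio the abstract's efficiency claim refers to; whether the looped model actually matches the larger one is an empirical question this toy does not address.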
Submission Number: 10