DPRM: A Plug-in Token-Ordering Module for Diffusion Language Models

Published: 26 May 2026, Last Modified: 26 May 2026. ICML 2026 FoGen Workshop Poster. License: CC BY 4.0
Keywords: Diffusion Language Models
TL;DR: A Plug-in Token-Ordering Module for Diffusion Language Models
Abstract: Diffusion language models generate without a fixed left-to-right order, which makes token ordering a central algorithmic choice. Existing systems mainly use random masking or confidence-driven ordering, which suffer from train-test mismatch and myopic exploration, respectively. We introduce **DPRM** (Doob $h$-transform Process Reward Model), a plug-in token-ordering module that leaves the host architecture, denoising objective and supervision unchanged and modifies only the ordering policy. DPRM starts from confidence-driven ordering and gradually shifts to process-reward-guided ordering using online estimates. We characterize the exact DPRM policy as a reward-tilted Gibbs reveal law, prove $O(1/N)$ convergence of its stagewise Soft-BoN (soft best-of-$N$) approximation, show that the online bucketized controller tracks the exact DPRM score at empirical-Bernstein rates, and establish a sample-complexity advantage under tractable optimization assumptions. We evaluate DPRM as a matched plug-in intervention across *diffusion pretraining, reasoning post-training, test-time scaling, multimodal protein diffusion, single-cell gene-expression diffusion, molecular drug-design diffusion, and DNA regulatory-sequence diffusion*. DPRM improves pretraining, post-training, test-time scaling and single-cell masked diffusion, with strong gains on harder reasoning subsets. In protein, molecular and DNA generation, ordering-aware variants improve selected structural, fragment-constrained or reward-specific metrics, though not every quality metric. These results identify token ordering as a fundamental control axis in diffusion language models and adjacent masked discrete diffusion systems. An anonymized code snapshot is available at: https://anonymous.4open.science/r/DPRM-DLLM-CBCA/README.md.
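To make the idea of a "reward-tilted Gibbs reveal law" concrete, here is a toy sketch. It is not the authors' implementation (see the linked code snapshot for that). It assumes, for illustration only, that the reveal probability of a masked position is proportional to exp((log confidence + λ·r̂) / temperature), where r̂ is a given per-position process-reward estimate and λ follows a hypothetical linear schedule from 0 (pure confidence-driven ordering) upward. The helper `soft_best_of_n` is a generic soft best-of-N selector (pick one of N base samples with probability proportional to exp(reward / β)), included only to illustrate the stagewise approximation idea; the paper's bucketized controller and its empirical-Bernstein tracking are not reproduced. All function names and parameters here are hypothetical.

```python
import math
import random


def gibbs_reveal_probs(confidence, reward_est, lam, temperature=1.0):
    """Hypothetical reveal law over masked positions:
    p(i) proportional to exp((log confidence[i] + lam * reward_est[i]) / temperature).
    With lam = 0 and temperature = 1 this reduces to p(i) proportional to confidence[i].
    Confidences must be strictly positive (log is taken)."""
    scores = [(math.log(c) + lam * r) / temperature for c, r in zip(confidence, reward_est)]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def annealed_lambda(step, total_steps, lam_max=1.0):
    """Hypothetical linear schedule: 0 (confidence only) rising to lam_max (reward-guided)."""
    return lam_max * min(1.0, step / max(1, total_steps))


def sample_reveal_position(masked_positions, confidence, reward_est, step, total_steps, rng):
    """Sample the next position to unmask under the annealed, reward-tilted law."""
    lam = annealed_lambda(step, total_steps)
    probs = gibbs_reveal_probs(confidence, reward_est, lam)
    return rng.choices(masked_positions, weights=probs, k=1)[0]


def soft_best_of_n(candidates, rewards, beta, rng):
    """Generic soft best-of-N: choose one of N candidates with
    probability proportional to exp(reward / beta). Small beta approaches argmax."""
    m = max(rewards)
    weights = [math.exp((r - m) / beta) for r in rewards]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three masked positions; the reward estimate favors position 9 more as lam grows.
    pos = sample_reveal_position([3, 7, 9], [0.3, 0.4, 0.3], [0.0, 0.5, 1.0],
                                 step=5, total_steps=10, rng=rng)
    print("reveal position:", pos)
```

The max-subtraction in both softmax-style helpers keeps `math.exp` from overflowing when rewards or λ are large; with λ fixed at 0 the sampler behaves like plain confidence-proportional ordering, which matches the abstract's description of DPRM starting from confidence-driven ordering.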
Submission Number: 27