Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Diffusion Language Models
TL;DR: We develop a training framework to convert pretrained autoregressive language models into diffusion language models that excel in speed.
Abstract: The token-by-token decoding nature of autoregressive (AR) language models limits their generation throughput, especially in common memory-constrained scenarios. To address this, diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation for higher throughput. However, existing dLMs have either failed to deliver faster speeds than AR models or have been restricted to small model scales due to high training costs, resulting in limited capability. To address these limitations, we build on pretrained AR models and develop a training framework that converts them into dLMs that excel in speed. First, we introduce a continuous pretraining scheme with a block-wise attention pattern that remains causal across blocks while enabling bidirectional modeling within each block, which we find to better preserve pretrained models' abilities than the fully bidirectional modeling used in prior work such as Dream. Second, to mitigate the training–test gap in mask token distributions, we propose a position-dependent token masking strategy that assigns higher masking probabilities to later tokens. Leveraging this framework, we conduct extensive studies of dLMs' attention patterns, training dynamics, and other design choices, providing actionable insights into scalable AR-to-dLM conversion. We also deliver the Efficient-DLM model family, which outperforms state-of-the-art AR models and dLMs with better accuracy–throughput trade-offs, e.g., Efficient-DLM 4B achieves +1.88% higher accuracy with 4.63× the throughput of Dream 7B, and +7.79% higher accuracy with 1.82× the throughput of Qwen3 1.7B.
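The two training-time ingredients described in the abstract can be sketched concretely. Below is a minimal, illustrative Python/NumPy sketch (not the authors' implementation): a block-wise attention mask that is causal across blocks but bidirectional within each block, and a position-dependent masking schedule that assigns higher masking probabilities to later tokens. The linear form of the schedule and the function names are assumptions for illustration only.

```python
import numpy as np

def block_causal_mask(seq_len: int, block_size: int) -> np.ndarray:
    """Boolean attention mask: token i may attend to token j iff j's block
    index is <= i's block index. Within a block the mask is fully
    bidirectional; across blocks it remains causal."""
    blocks = np.arange(seq_len) // block_size          # block index per position
    return blocks[None, :] <= blocks[:, None]          # (seq_len, seq_len)

def position_dependent_mask_probs(seq_len: int,
                                  p_min: float = 0.1,
                                  p_max: float = 0.9) -> np.ndarray:
    """Masking probability per position, increasing with position so that
    later tokens are masked more often. A linear ramp is assumed here;
    the paper only specifies that later tokens get higher probabilities."""
    pos = np.arange(seq_len) / max(seq_len - 1, 1)     # normalized position in [0, 1]
    return p_min + (p_max - p_min) * pos
```

For example, with `block_size=2`, token 0 can attend to token 1 (same block, bidirectional) but not to token 2 (a future block), while token 5 can attend to all earlier blocks.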
Primary Area: foundation or frontier models, including LLMs
Submission Number: 7978