Elastic-MDM: Efficient Masked Diffusion Models with Variable Sequence Lengths

Seungwoo Lyu; Sangwon Jung; Taehoon Kim; Dohyun Kim; Paul Hongsuck Seo

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0

Keywords: NLP, diffusion, LLM

Abstract: Discrete masked diffusion models (MDMs) enable parallel denoising with bidirectional context, but they waste compute by encoding the entire sequence at every step, and they assume a fixed output length. We propose Elastic-MDM, a Removed–Masked Diffusion Model that redesigns the state space by making REMOVED an absorbing token state and excluding REMOVED tokens from the Transformer input. At each step, a single reverse pass couples token denoising with a lightweight gap-count head that predicts how many removed tokens to (re)activate between consecutive unmasked tokens, which enables variable-length decoding. We derive a model-aligned objective without timestep weights and train with schedule randomization. Empirically, Elastic-MDM delivers substantial wall-time savings at similar quality on benchmark datasets, closely tracks the training length distribution without preset length caps, and improves structured (JSON) synthesis. These results show that Elastic-MDM offers a simple, practical path to efficient, variable-length discrete diffusion.
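The abstract names two mechanics of the reverse step: REMOVED tokens are left out of the Transformer input, and a gap-count head decides how many removed tokens to reactivate between consecutive unmasked tokens. The following is a minimal Python sketch of that sequence bookkeeping only, under stated assumptions: tokens are plain strings, the special-token names `[MASK]` and `[REMOVED]` and the helpers `compact` and `expand_gaps` are hypothetical, the gap counts are passed in as integers standing in for the head's predictions, and reactivated tokens are assumed to re-enter as MASK tokens placed just before the next unmasked token. It does not model the Transformer, the denoising distribution, or the training objective.

```python
from typing import List, Sequence

# Hypothetical special-token names (assumption; the paper's vocabulary is not given here).
MASK = "[MASK]"
REMOVED = "[REMOVED]"


def compact(seq: Sequence[str]) -> List[str]:
    """Drop absorbing REMOVED tokens so only MASK and unmasked tokens
    would be fed to the Transformer (shorter input, less compute)."""
    return [tok for tok in seq if tok != REMOVED]


def is_unmasked(tok: str) -> bool:
    """A token is 'unmasked' if it is neither MASK nor REMOVED."""
    return tok not in (MASK, REMOVED)


def expand_gaps(visible: Sequence[str], gap_counts: Sequence[int]) -> List[str]:
    """Reactivate gap_counts[g] tokens as new MASKs in the gap between the
    g-th and (g+1)-th unmasked tokens of a compacted sequence.

    Assumption: new MASKs go immediately before the (g+1)-th unmasked token,
    after any MASKs already in that gap. MASKs before the first or after the
    last unmasked token lie in no gap and are kept unchanged.
    """
    n_unmasked = sum(1 for tok in visible if is_unmasked(tok))
    n_gaps = max(n_unmasked - 1, 0)
    if len(gap_counts) != n_gaps:
        raise ValueError(f"expected {n_gaps} gap counts, got {len(gap_counts)}")
    if any(c < 0 for c in gap_counts):
        raise ValueError("gap counts must be non-negative")

    out: List[str] = []
    seen = 0  # number of unmasked tokens emitted so far
    for tok in visible:
        if is_unmasked(tok):
            if seen > 0:
                # Close the gap that ends at this unmasked token.
                out.extend([MASK] * gap_counts[seen - 1])
            seen += 1
        out.append(tok)
    return out


if __name__ == "__main__":
    state = ["The", REMOVED, MASK, "sat", REMOVED, "mat"]
    visible = compact(state)              # ['The', '[MASK]', 'sat', 'mat']
    grown = expand_gaps(visible, [0, 2])  # 2 gaps: (The, sat) and (sat, mat)
    print(grown)  # ['The', '[MASK]', 'sat', '[MASK]', '[MASK]', 'mat']
```

In this sketch the visible sequence grows by exactly `sum(gap_counts)` per step, so the output length is decided during decoding rather than fixed in advance; the new MASK positions would then be filled by the denoising side of the same reverse pass.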

Primary Area: generative models

Submission Number: 9175