Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas

ICLR 2026 Conference Submission 19236 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Diffusion Language Models
TL;DR: We introduce a simple yet effective method for diffusion language models to perform variable-length generation.
Abstract: Diffusion Language Models (DLMs) present a compelling alternative to autoregressive models, offering flexible, any-order infilling without specialized prompt design. However, their practical utility is hindered by a critical limitation: generation requires a masked sequence of fixed length. This constraint severely degrades code infilling performance when the predefined mask size mismatches the ideal completion length. To address this, we propose \textsc{DreamOn}, a novel diffusion framework that enables dynamic, variable-length generation. \textsc{DreamOn} augments the diffusion process with two length-control states, allowing the model to autonomously expand or contract the output length based solely on its own predictions. We integrate this mechanism into existing DLMs with minimal modifications to the training objective and no architectural changes. Built upon Dream-Coder-7B and DiffuCoder-7B, \textsc{DreamOn} achieves infilling performance on par with state-of-the-art autoregressive models on HumanEval-Infilling and SantaCoder-FIM, and matches the oracle performance obtained with ground-truth completion lengths. Our work removes a fundamental barrier to the practical deployment of DLMs, significantly advancing their flexibility and applicability for variable-length generation. Our code and models will be made publicly available.
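The core idea described in the abstract (two length-control states that let a masked-diffusion model grow or shrink its canvas during denoising) can be illustrated with a minimal sketch. The token names `<expand>` and `<delete>`, the toy `fake_model_predict` stand-in, and the step mechanics below are assumptions made for illustration only, not the authors' implementation.

```python
# Illustrative sketch of variable-length masked denoising with two
# hypothetical length-control tokens. Not the DreamOn implementation.
import random

MASK = "<mask>"
EXPAND = "<expand>"   # assumed: one masked slot becomes two masked slots
DELETE = "<delete>"   # assumed: this slot is removed from the canvas

def fake_model_predict(token: str) -> str:
    """Stand-in for a DLM's per-position prediction at a masked slot."""
    if token != MASK:
        return token  # already-unmasked positions are kept
    # A real model would score vocabulary plus control tokens; we sample here.
    return random.choice(["foo", "bar", MASK, EXPAND, DELETE])

def denoise_step(canvas: list[str]) -> list[str]:
    """One denoising step that may change the canvas length."""
    new_canvas: list[str] = []
    for tok in canvas:
        pred = fake_model_predict(tok)
        if pred == EXPAND:
            new_canvas.extend([MASK, MASK])  # grow: re-mask as two slots
        elif pred == DELETE:
            continue                         # shrink: drop this slot
        else:
            new_canvas.append(pred)
    return new_canvas

if __name__ == "__main__":
    canvas = ["def", "f", "(", MASK, MASK, MASK, ")", ":"]
    for _ in range(10):
        canvas = denoise_step(canvas)
        if MASK not in canvas:
            break
    print(canvas)
```

In this toy version, the completion length is never fixed in advance: each step, masked slots may resolve to ordinary tokens, duplicate themselves, or disappear, so the canvas converges to whatever length the model's own predictions imply.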
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19236