Abstract: Diffusion-based language models (DLLMs) offer non-sequential, block-wise generation and richer
data reuse compared to autoregressive (AR) models, but existing code DLLMs still lag behind strong
AR baselines under comparable budgets. We revisit this setting in a controlled study and introduce
Stable-DiffCoder, a block diffusion code model that reuses the Seed-Coder architecture, data,
and training pipeline. To enable efficient knowledge learning and stable training, we incorporate
a block diffusion continual pretraining (CPT) stage enhanced by a tailored warmup and block-
wise clipped noise schedule. Under the same data and architecture, Stable-DiffCoder outperforms
its AR counterpart overall on a broad suite of code benchmarks. Moreover, relying only on the
CPT and supervised fine-tuning stages, Stable-DiffCoder achieves stronger performance than
a wide range of ∼8B AR models and DLLMs, demonstrating that diffusion-based training can improve
code modeling quality beyond AR training alone. Furthermore, diffusion-based any-order modeling
improves structured code modeling for editing and reasoning and, through data augmentation,
benefits low-resource coding languages.