Track: long paper (up to 8 pages)
Keywords: diffusion language models, block diffusion, discrete diffusion
TL;DR: BlockGen trains a single denoiser over multiple block sizes using uniform-state diffusion, and introduces AR-guided diffusion sampling to improve quality while maintaining throughput.
Abstract: Block Diffusion Models (BDMs) accelerate discrete diffusion by generating token blocks in parallel while supporting KV caching. However, existing BDMs are typically trained with a single, \emph{fixed} block size, which restricts the trade-offs available at inference time. Moreover, most BDMs use masked diffusion, in which tokens cannot be revised once generated; this limits quality in parallel decoding scenarios. We introduce \emph{BlockGen}, a general framework for blockwise sequence modeling that trains a single model over a \emph{set} of block sizes and is compatible with arbitrary block conditionals. In this work, we instantiate BlockGen with \emph{uniform-state} discrete diffusion within each block. BlockGen achieves improved likelihood compared to fixed block-size training and higher sample quality with fewer denoising steps. Training on multiple block sizes also enables hybrid samplers that combine autoregressive and diffusion predictions, substantially improving over pure block-by-block generation while preserving KV caching.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 24