BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers

Justin Deschenaux; Caglar Gulcehre

Published: 02 Mar 2026, Last Modified: 12 Mar 2026

Venue: ICLR 2026 Workshop MM Intelligence Poster

License: CC BY 4.0

Track: long paper (up to 8 pages)

Keywords: diffusion language models, block diffusion, discrete diffusion

TL;DR: BlockGen trains a single denoiser over multiple block sizes with uniform-state diffusion and introduces AR-guided diffusion sampling to improve quality while maintaining throughput.

Abstract: Block Diffusion Models (BDMs) accelerate discrete diffusion by generating token blocks in parallel while supporting KV caching. However, existing BDMs are typically trained with a single, fixed block size, limiting the trade-offs at inference time. Moreover, most BDMs use masked diffusion, where tokens cannot be revised once generated, limiting quality in parallel decoding scenarios. We introduce BlockGen, a general framework for blockwise sequence modeling that trains a single model over a set of block sizes and is compatible with arbitrary block conditionals. In this work, we instantiate BlockGen with uniform-state discrete diffusion within each block. BlockGen achieves improved likelihood compared to fixed block-size training and higher sample quality with fewer denoising steps. Training on multiple block sizes enables hybrid samplers that combine autoregressive and diffusion predictions, substantially improving over pure block-by-block generation while preserving KV caching.
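To make the role of the block size concrete, the following is a minimal sketch of a generic block-causal visibility pattern, in which each position sees its own block and all earlier blocks. It is not the authors' implementation: the function names `block_causal_mask` and `sample_block_size` are hypothetical, the uniform choice over block sizes is an assumption, and the attention pattern actually used during BlockGen training may differ.

```python
import random


def block_causal_mask(seq_len, block_size):
    """Return mask[i][j] == True when position i may attend to position j.

    Positions are grouped into consecutive blocks of `block_size` tokens.
    A position sees every token in its own block and in all earlier blocks.
    """
    if block_size < 1:
        raise ValueError("block_size must be positive")
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def sample_block_size(block_sizes, rng=random):
    """Pick one block size from a set (uniform choice is an assumption)."""
    return rng.choice(list(block_sizes))


# Block size 1 gives the ordinary causal (lower-triangular) mask.
mask_ar = block_causal_mask(4, 1)

# Block size 2 lets the two tokens inside each block see each other.
mask_b2 = block_causal_mask(4, 2)

# Block size equal to the sequence length gives full bidirectional visibility.
mask_full = block_causal_mask(4, 4)

# Hypothetical set of block sizes used for illustration only.
chosen = sample_block_size([1, 2, 4])
```

Under this pattern, varying the block size moves between a token-by-token causal structure and fully parallel visibility, which is one way to picture why a model trained over several block sizes could serve both kinds of prediction.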

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 24
