BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers

Published: 02 Mar 2026, Last Modified: 12 Mar 2026
Venue: ReALM-GEN 2026 - ICLR 2026 Workshop
Readers: Everyone
License: CC BY 4.0
Keywords: diffusion language models, block diffusion, discrete diffusion
TL;DR: BlockGen trains a single denoiser over multiple block sizes with uniform-state diffusion, and introduces AR-guided diffusion sampling to improve quality while maintaining throughput.
Abstract: Block Diffusion Models (BDMs) accelerate discrete diffusion by generating blocks of tokens in parallel while still supporting KV caching. However, existing BDMs are typically trained with a single, fixed block size, which restricts the trade-offs available at inference time. Moreover, most BDMs use masked diffusion, in which a token cannot be revised once it has been generated; this hurts quality when many tokens are decoded in parallel. We introduce BlockGen, a general framework for blockwise sequence modeling that trains a single model over a set of block sizes and is compatible with arbitrary block conditionals. In this work, we instantiate BlockGen with uniform-state discrete diffusion within each block. BlockGen achieves better likelihood than fixed block-size training and higher sample quality with fewer denoising steps. Training on multiple block sizes also enables hybrid samplers that combine autoregressive and diffusion predictions, substantially improving over pure block-by-block generation while preserving KV caching.
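To make the training setup more concrete, here is a minimal sketch, under stated assumptions, of how one training example for a multi-block-size, uniform-state block diffusion model might be built. The uniform-state forward process shown is the standard one: each token is kept with probability alpha_t and otherwise resampled uniformly from the vocabulary. Everything else is an assumption, not the paper's procedure: the helper names `uniform_corrupt` and `make_training_example` are hypothetical, one block size is drawn per sequence from the set, each block gets its own independently sampled noise level, and the schedule alpha_t = 1 - t is a placeholder. The denoiser (not shown) would be asked to recover each clean block from its noisy version, conditioned on the clean tokens before it.

```python
import random
from typing import List, Sequence, Tuple

# (start index of block, noisy block, clean block)
BlockExample = Tuple[int, List[int], List[int]]


def uniform_corrupt(block: Sequence[int], alpha_t: float,
                    vocab_size: int, rng: random.Random) -> List[int]:
    """Uniform-state forward process: keep each token with probability alpha_t,
    otherwise replace it with a token drawn uniformly from the vocabulary
    (the draw may coincide with the original token)."""
    noisy = []
    for tok in block:
        if rng.random() < alpha_t:
            noisy.append(tok)
        else:
            noisy.append(rng.randrange(vocab_size))
    return noisy


def make_training_example(tokens: Sequence[int], block_sizes: Sequence[int],
                          vocab_size: int,
                          rng: random.Random) -> Tuple[int, List[BlockExample]]:
    """Hypothetical helper: sample one block size from the set, split the
    sequence into blocks of that size, and noise each block independently.
    The clean prefix tokens[:start] is what the denoiser would condition on."""
    b = rng.choice(list(block_sizes))
    examples: List[BlockExample] = []
    for start in range(0, len(tokens), b):
        clean = list(tokens[start:start + b])
        t = rng.random()        # per-block diffusion time in [0, 1) (assumption)
        alpha_t = 1.0 - t       # placeholder linear schedule (assumption)
        noisy = uniform_corrupt(clean, alpha_t, vocab_size, rng)
        examples.append((start, noisy, clean))
    return b, examples


if __name__ == "__main__":
    rng = random.Random(0)
    seq = list(range(10))
    b, blocks = make_training_example(seq, block_sizes=[1, 2, 4, 8],
                                      vocab_size=50, rng=rng)
    print("block size:", b)
    for start, noisy, clean in blocks:
        print(f"prefix={seq[:start]} noisy={noisy} target={clean}")
```

One intuition for why a shared model can support hybrid sampling, offered as our reading rather than the paper's stated mechanism: when the sampled block size is 1 and the token is fully noised, the denoising target reduces to predicting the next token from a clean prefix, which is the autoregressive objective.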
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 15