Diffusion with Truncated Blocks: Towards Fast and High-Quality Text Generation using Truncated Block Generation

16 Sept 2025 (modified: 25 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Diffusion Large Language Models
Abstract: Diffusion-based Large Language Models (dLLMs) are emerging as a powerful alternative to traditional autoregressive models. These models learn to generate text by iteratively denoising masked sequences. In this work, we identify a critical problem in dLLMs that use token-level noise: the model's attention is wastefully expended on uninformative mask tokens, diluting its focus on meaningful context. We term this phenomenon "attention dilution". We further show that it is an artifact of token-level noising, whereas models trained with sentence-level noise do not exhibit this phenomenon. To resolve this problem, we introduce Truncated Block Generation, a novel sampling algorithm that not only mitigates attention dilution but also enables faster inference and flexible-length sequence generation. Extensive experiments validate our analysis and demonstrate the marked effectiveness of our proposed method in enhancing both the performance and efficiency of dLLMs.
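The abstract only names the sampling procedure, so the following is a minimal, hedged sketch of what block-wise masked-denoising generation with early truncation could look like; the model interface, special-token ids, block size, confidence-based unmasking schedule, and EOS-based stopping rule are all assumptions made for illustration, not the paper's published algorithm.

```python
# Illustrative sketch only: block-wise masked denoising with truncation.
# `model`, MASK_ID, EOS_ID, BLOCK_LEN, and STEPS are hypothetical stand-ins.
import torch

MASK_ID, EOS_ID = 0, 1          # hypothetical special-token ids
BLOCK_LEN, STEPS = 32, 8        # hypothetical block size / denoising steps


def denoise_block(model, prefix, steps=STEPS):
    """Iteratively unmask one block conditioned on the generated prefix."""
    block = torch.full((BLOCK_LEN,), MASK_ID, dtype=torch.long)
    for _ in range(steps):
        still_masked = block == MASK_ID
        if not still_masked.any():
            break
        # Assumed model call: returns per-position logits over the vocabulary.
        logits = model(torch.cat([prefix, block]))[-BLOCK_LEN:]
        probs, preds = logits.softmax(-1).max(-1)
        # Unmask roughly half of the remaining masked positions,
        # choosing the most confident predictions at this step.
        k = max(1, int(still_masked.sum()) // 2)
        conf = torch.where(still_masked, probs, torch.full_like(probs, -1.0))
        idx = conf.topk(k).indices
        block[idx] = preds[idx]
    return block


def truncated_block_generate(model, prompt, max_blocks=16):
    """Generate block by block; truncate at the first EOS so later steps
    never attend to trailing, uninformative mask tokens."""
    seq = prompt.clone()
    for _ in range(max_blocks):
        block = denoise_block(model, seq)
        if (block == EOS_ID).any():  # flexible-length stopping (assumed rule)
            cut = int((block == EOS_ID).nonzero()[0]) + 1
            return torch.cat([seq, block[:cut]])
        seq = torch.cat([seq, block])
    return seq
```

Under these assumptions, the benefit sketched here is that each denoising step only ever sees the committed prefix plus one active block of masks, rather than a long tail of padding masks, which is one plausible way the described speed and attention-dilution gains could arise.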
Primary Area: foundation or frontier models, including LLMs
Submission Number: 7346