Diffusion with Truncated Blocks: Towards Fast and High-Quality Text Generation using Truncated Block Generation

16 Sept 2025 (modified: 25 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Diffusion Large Language Models
Abstract: Diffusion-based Large Language Models (dLLMs) are emerging as a powerful alternative to traditional autoregressive models. These models learn to generate text by iteratively denoising masked sequences. In this work, we identify a critical problem in dLLMs that use token-level noise: the model's attention is wastefully expended on uninformative mask tokens, diluting its focus on meaningful context. We term this phenomenon "attention dilution". We further show that it is an artifact of token-level noising, whereas models trained with sentence-level noise do not exhibit this phenomenon. To resolve this problem, we introduce Truncated Block Generation, a novel sampling algorithm that not only mitigates attention dilution but also enables faster inference and flexible-length sequence generation. Extensive experiments validate our analysis and demonstrate the marked effectiveness of our proposed method in enhancing both the performance and efficiency of dLLMs.
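The abstract only names the sampling procedure, so the following is a minimal, hedged sketch of what block-wise masked-denoising generation with early truncation could look like; the model interface, special-token ids, block size, confidence-based unmasking schedule, and EOS-based stopping rule are all assumptions made for illustration, not the paper's published algorithm.

```python
# Illustrative sketch only: block-wise masked denoising with truncation.
# `model`, MASK_ID, EOS_ID, BLOCK_LEN, and STEPS are hypothetical stand-ins.
import torch

MASK_ID, EOS_ID = 0, 1          # hypothetical special-token ids
BLOCK_LEN, STEPS = 32, 8        # hypothetical block size / denoising steps


def denoise_block(model, prefix, steps=STEPS):
    """Iteratively unmask one block conditioned on the generated prefix."""
    block = torch.full((BLOCK_LEN,), MASK_ID, dtype=torch.long)
    for _ in range(steps):
        still_masked = block == MASK_ID
        if not still_masked.any():
            break
        # Assumed model call: returns per-position logits over the vocabulary.
        logits = model(torch.cat([prefix, block]))[-BLOCK_LEN:]
        probs, preds = logits.softmax(-1).max(-1)
        # Unmask roughly half of the remaining masked positions,
        # choosing the most confident predictions at this step.
        k = max(1, int(still_masked.sum()) // 2)
        conf = torch.where(still_masked, probs, torch.full_like(probs, -1.0))
        idx = conf.topk(k).indices
        block[idx] = preds[idx]
    return block


def truncated_block_generate(model, prompt, max_blocks=16):
    """Generate block by block; truncate at the first EOS so later steps
    never attend to trailing, uninformative mask tokens."""
    seq = prompt.clone()
    for _ in range(max_blocks):
        block = denoise_block(model, seq)
        if (block == EOS_ID).any():  # flexible-length stopping (assumed rule)
            cut = int((block == EOS_ID).nonzero()[0]) + 1
            return torch.cat([seq, block[:cut]])
        seq = torch.cat([seq, block])
    return seq
```

Under these assumptions, the benefit sketched here is that each denoising step only ever sees the committed prefix plus one active block of masks, rather than a long tail of padding masks, which is one plausible way the described speed and attention-dilution gains could arise.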
Primary Area: foundation or frontier models, including LLMs
Submission Number: 7346