BlockSpec: Blockwise Speculative Decoding for Diffusion LLMs

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: speculative decoding, diffusion LLMs
TL;DR: This paper proposes BlockSpec, a novel trajectory speculative decoding framework tailored for diffusion-based LLMs, which significantly reduces inference steps and accelerates parallel decoding.
Abstract: In diffusion-based Large Language Models (dLLMs), parallel decoding is usually realized through threshold-based or top-k strategies. While effective for high-confidence tokens, these strategies often collapse on low-confidence tokens, forcing the model into inefficient single-token decoding. To address this limitation, we propose Block Speculation (BlockSpec), a novel training-free blockwise speculative decoding method that explores multiple future decoding trajectories in parallel. Our method introduces a new tree-based trajectory generation strategy and a blockwise parallel verification module, in which candidate tokens are organized into tree exploration paths so that multiple decoding trajectories can be verified simultaneously. Unlike traditional speculative decoding, which focuses only on fixed-order left-to-right token speculation, our approach is the first attempt to introduce block-level speculation, jointly exploring both token choices and decoding trajectories for dLLMs. We also design two complementary speculation formulations—intra-block and inter-block speculation—that jointly accelerate dLLMs within and across blocks. Extensive experiments show that BlockSpec reduces iteration steps by up to 40\%, accelerating over 80\% of decoding steps. As a result, our method achieves up to 7–14× speedup over vanilla dLLMs, together with an additional 1.3× improvement over state-of-the-art methods.
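The stall the abstract describes—and the gain from speculating on blocks of low-confidence tokens—can be illustrated with a toy step-count simulation. This is a minimal sketch, not the paper's algorithm: the function names, the confidence list, and the optimistic assumption that every drafted trajectory passes verification are all hypothetical, and real BlockSpec verifies tree-structured drafts with the dLLM itself.

```python
def threshold_decode(confidences, tau=0.9):
    """Baseline dLLM parallel decoding (toy model): each step commits every
    still-masked token whose confidence exceeds tau; when none qualifies,
    it falls back to committing only the single most confident token.
    Returns the number of decoding steps used."""
    remaining = list(confidences)
    steps = 0
    while remaining:
        steps += 1
        keep = [c for c in remaining if c < tau]
        if len(keep) == len(remaining):      # no token cleared the threshold
            keep.remove(max(keep))           # single-token fallback: commit one
        remaining = keep
    return steps

def blockwise_speculate(confidences, tau=0.9, max_depth=3):
    """Toy blockwise speculation: when threshold decoding would stall,
    draft up to max_depth low-confidence tokens as one trajectory and
    verify them together, committing the accepted block in a single step.
    (Hypothetically assumes every drafted block passes verification.)"""
    remaining = sorted(confidences, reverse=True)
    steps = 0
    while remaining:
        steps += 1
        high = [c for c in remaining if c >= tau]
        if high:
            remaining = remaining[len(high):]    # normal parallel commit
        else:
            remaining = remaining[max_depth:]    # verified speculative block
    return steps

# Three confident tokens followed by a run of low-confidence ones:
confs = [0.99, 0.97, 0.95, 0.6, 0.55, 0.5, 0.45, 0.4, 0.3]
print(threshold_decode(confs))     # baseline stalls into one-token-per-step
print(blockwise_speculate(confs))  # speculation clears the low-confidence run in blocks
```

Under these assumptions the baseline needs 7 steps (one parallel commit, then six single-token fallbacks), while blockwise speculation finishes in 3, mirroring the step-reduction mechanism the abstract claims.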
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6926