PRESTO: Prefix-Aligned Tree Drafting for Diffusion Speculative Decoding

Published: 01 Jun 2026, Last Modified: 01 Jun 2026, Venue: AdaptFM Poster, Readers: Everyone, License: CC BY 4.0
Keywords: Speculative Decoding, Diffusion Language Models, Tree Drafting
Abstract: Diffusion Large Language Models (dLLMs) have recently emerged as a promising alternative to autoregressive (AR) LLMs, offering parallel token generation. Recent work has shown that dLLMs are particularly effective as draft models in speculative decoding (SD), where they can efficiently propose multiple candidate tokens in parallel. However, existing diffusion-based drafting methods rely primarily on linear (single-path) drafting, even though a diffusion model simultaneously produces multiple candidate tokens at multiple positions, which can be combined into many possible paths. A natural solution is to extend tree-based drafting to diffusion models, enabling the exploration of diverse candidate paths. We find, however, that naive tree-based drafting is suboptimal because of a fundamental mismatch between diffusion draft confidence and prefix-based AR verification: diffusion marginals are inherently prefix-blind, which leads to unreliable path ranking. To address this, we propose PRESTO, a principled framework for tree-based diffusion drafting via prefix-faithful scoring and priority-based tree search. The key principle behind our framework is that candidate ranking should align with the prefix-based nature of AR verification. Guided by this principle, we design a prefix-faithful surrogate score that prioritizes high-quality candidate paths during tree expansion for the diffusion drafter. PRESTO is a general tree drafting framework applicable to both individual diffusion drafter SD and self-speculative dLLMs. Extensive experiments show that PRESTO achieves an average speedup of up to $1.5\times$ on the state-of-the-art individual diffusion drafter SD and an average speedup of $1.95\times$ on self-speculative dLLMs across diverse benchmarks, thereby unlocking the full potential of tree-based speculative decoding for diffusion drafting.
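To make the idea of priority-based tree expansion concrete, the following is a minimal Python sketch of best-first tree drafting over per-position candidate tokens. It is not PRESTO's algorithm: the abstract does not specify the prefix-faithful surrogate score, so the default score here is the plain product of per-position marginals (in log space), which is exactly the prefix-blind ranking the abstract describes as unreliable. The function name `build_draft_tree`, the `marginals` input layout, and the `step_score` hook (where a prefix-aware score could be plugged in) are all assumptions made for illustration.

```python
import heapq
import math


def build_draft_tree(marginals, budget, step_score=None):
    """Best-first expansion of a draft tree from per-position candidates.

    marginals: list where marginals[i] maps token -> probability (> 0)
               at draft position i. Hypothetical input layout.
    budget:    maximum number of tree nodes (root-to-node paths) to keep.
    step_score: hypothetical hook (prefix, position, token, prob) -> log-score.
               The default ignores the prefix, i.e. it is prefix-blind.
    Returns a list of (path, cumulative_log_score) in expansion order.
    """
    if step_score is None:
        step_score = lambda prefix, pos, tok, p: math.log(p)

    # Min-heap keyed on negated cumulative log-score, so the best path pops first.
    heap = []
    for tok, p in marginals[0].items():
        heapq.heappush(heap, (-step_score((), 0, tok, p), (tok,)))

    tree = []
    while heap and len(tree) < budget:
        neg_score, path = heapq.heappop(heap)
        tree.append((path, -neg_score))
        pos = len(path)
        if pos < len(marginals):
            # Children are pushed only after their parent is accepted,
            # so every kept path has its full prefix in the tree.
            for tok, p in marginals[pos].items():
                child = neg_score - step_score(path, pos, tok, p)
                heapq.heappush(heap, (child, path + (tok,)))
    return tree


if __name__ == "__main__":
    toy = [{"a": 0.6, "b": 0.4}, {"c": 0.9, "d": 0.1}]
    for path, score in build_draft_tree(toy, budget=3):
        print(path, round(score, 3))
```

Because the default score never looks at `prefix`, the token "c" receives the same score under "a" and under "b". A prefix-aware `step_score` would be the place to break that symmetry so that the ranking better matches what a prefix-conditioned AR verifier would accept.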
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 173