Keywords: diffusion language model, dynamic thresholding, efficient decoding, confidence calibration, large language models
TL;DR: We introduce One-Shot Dynamic Thresholding (OSDT), which calibrates confidence thresholds on a single sequence and achieves up to 50% faster diffusion language model decoding with comparable accuracy.
Abstract: Masked Diffusion Language Models (MDLMs) are becoming competitive with their autoregressive counterparts but commonly decode with fixed steps and sequential unmasking. To accelerate decoding, recent works such as Fast-dLLM enable parallel decoding via a static global confidence threshold, yet we observe strong block/step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs, as indicated by cosine similarity. Motivated by these two observations, we introduce \textbf{One-Shot Dynamic Thresholding (OSDT)}, which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy–throughput trade-offs (\textbf{+24\%} tokens/s on GSM8K at the \textbf{best} accuracy, \textbf{+45\%} on GPQA with comparable accuracy, and \textbf{+50\%} on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to leverage reusable task-level confidence signatures for more general-purpose algorithmic and systems innovations in diffusion decoding.
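The one-shot calibration idea in the abstract can be illustrated with a minimal sketch: record per-step confidence statistics from a single calibration sequence, derive one threshold per decoding step, and reuse those thresholds to decide which masked tokens to unmask in parallel on later inputs. The function names, the `margin` parameter, and the max-minus-margin calibration rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def calibrate_thresholds(step_confidences, margin=0.05):
    """Derive per-step unmasking thresholds from one calibration sequence.

    step_confidences: list of 1-D arrays, the model's token confidences at
    each decoding step of a single sequence. The max-minus-margin rule is a
    hypothetical stand-in for the paper's calibration procedure.
    """
    # One threshold per step: slightly below that step's top confidence,
    # so tokens near the step's confidence level are unmasked in parallel.
    return [float(np.max(c)) - margin for c in step_confidences]

def parallel_unmask(confidences, threshold):
    """Return indices of masked positions whose confidence clears the
    step's calibrated threshold; always decode at least the best token."""
    idx = np.flatnonzero(confidences >= threshold)
    if idx.size == 0:
        # Fall back to sequential decoding of the single most confident token.
        idx = np.array([int(np.argmax(confidences))])
    return idx
```

Because the per-dataset confidence trajectories are nearly identical across inputs (the cosine-similarity observation), a threshold schedule calibrated once can plausibly be amortized over the whole dataset.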
Submission Number: 79