Keywords: diffusion language model, dynamic thresholding, efficient decoding, confidence calibration, large language models
TL;DR: We introduce One-Shot Dynamic Thresholding (OSDT), which calibrates confidence thresholds on a single sequence and achieves up to 50% faster diffusion language model decoding with comparable accuracy.
Abstract: Masked Diffusion Language Models (MDLMs) are becoming competitive with their autoregressive counterparts but commonly decode with fixed steps and sequential unmasking. To accelerate decoding, recent works such as Fast-dLLM enable parallel decoding via a static global confidence threshold, yet we observe strong block/step-wise confidence fluctuations and, within a dataset, near-identical confidence trajectories across inputs, as indicated by cosine similarity. Motivated by these two observations, we introduce \textbf{One-Shot Dynamic Thresholding (OSDT)}, which calibrates thresholds on a single sequence and applies them to subsequent inputs with negligible overhead. On GPQA, GSM8K, and HumanEval, OSDT attains superior accuracy–throughput trade-offs (\textbf{+24\%} tokens/s on GSM8K at the \textbf{best} accuracy, \textbf{+45\%} on GPQA with comparable accuracy, and \textbf{+50\%} on HumanEval with a modest accuracy gap). Beyond these results, our findings suggest broader opportunities to leverage reusable task-level confidence signatures for more general-purpose algorithmic and systems innovations in diffusion decoding.
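The one-shot calibration idea in the abstract can be illustrated with a minimal sketch: record per-step confidence statistics from a single calibration sequence, derive one threshold per decoding step, and reuse those thresholds to decide which masked tokens to unmask in parallel on later inputs. The function names, the `margin` parameter, and the max-minus-margin calibration rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def calibrate_thresholds(step_confidences, margin=0.05):
    """Derive per-step unmasking thresholds from one calibration sequence.

    step_confidences: list of 1-D arrays, the model's token confidences at
    each decoding step of a single sequence. The max-minus-margin rule is a
    hypothetical stand-in for the paper's calibration procedure.
    """
    # One threshold per step: slightly below that step's top confidence,
    # so tokens near the step's confidence level are unmasked in parallel.
    return [float(np.max(c)) - margin for c in step_confidences]

def parallel_unmask(confidences, threshold):
    """Return indices of masked positions whose confidence clears the
    step's calibrated threshold; always decode at least the best token."""
    idx = np.flatnonzero(confidences >= threshold)
    if idx.size == 0:
        # Fall back to sequential decoding of the single most confident token.
        idx = np.array([int(np.argmax(confidences))])
    return idx
```

Because the per-dataset confidence trajectories are nearly identical across inputs (the cosine-similarity observation), a threshold schedule calibrated once can plausibly be amortized over the whole dataset.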
Submission Number: 79