Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules

ACL ARR 2026 January Submission 6523 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM Efficiency, NLP in resource-constrained settings, text-to-text generation, inference methods
Abstract: Diffusion large language models (dLLMs) offer a promising alternative to autoregressive models, but their practical utility is severely hampered by slow, iterative sampling. We present *SchED*, a training-free, model-agnostic early-exit algorithm that terminates diffusion decoding using a progress-aware confidence threshold. We evaluate *SchED* across multiple diffusion model families and a diverse set of benchmarks spanning multiple-choice, math, long-form QA, and translation. *SchED* delivers substantial acceleration: on instruction-tuned models, it achieves roughly $4\times$ speedups while retaining baseline performance on average. On base models, it yields consistent speedups with $99.1\%$--$100\%$ performance retention, reaching up to $2.34\times$ under more aggressive settings. Under a conservative quality-penalized speed metric, *SchED* consistently outperforms prior confidence-based early-exit methods, including on long-form generation, where existing approaches tend to break down. An entropy analysis of the model's token predictions shows that instruction tuning accelerates the decay of predictive entropy. By exploiting this inherent confidence stabilization as a stopping signal, *SchED* provides a robust framework for efficient dLLM inference.
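
To make the early-exit mechanism concrete, here is a minimal sketch of a progress-aware confidence schedule. It is an illustration under assumptions, not the paper's implementation: the `decode_step` callback, the linear schedule shape, and the names `sched_decode`, `tau_start`, and `tau_end` are all hypothetical placeholders.

```python
import numpy as np

def sched_decode(decode_step, x, num_steps=128, tau_start=0.99, tau_end=0.85):
    """Iterative dLLM decoding with a progress-aware confidence early exit.

    `decode_step(x, t)` stands in for one denoising/unmasking step and must
    return (updated_sequence, per_token_confidences).
    """
    for t in range(num_steps):
        x, conf = decode_step(x, t)
        progress = t / max(num_steps - 1, 1)   # 0.0 at the first step, 1.0 at the last
        # Linear schedule from tau_start down to tau_end (an assumed shape).
        tau = tau_start + (tau_end - tau_start) * progress
        if np.min(conf) >= tau:                # every token clears the threshold
            return x, t + 1                    # early exit: report steps actually used
    return x, num_steps

# Toy usage: confidence rises over steps, so decoding exits early.
if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def toy_step(x, t):
        conf = np.clip(0.5 + 0.02 * t + 0.01 * rng.random(len(x)), 0.0, 1.0)
        return x, conf

    _, steps_used = sched_decode(toy_step, np.zeros(16))
    print(f"exited after {steps_used} of 128 steps")
```

Whether the threshold should relax (as here) or tighten with decoding progress is a tuning choice; this sketch shows only one plausible instantiation of a progress-aware schedule, as opposed to the fixed thresholds typical of prior confidence-based exits.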
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: LLM Efficiency, NLP in resource-constrained settings
Contribution Types: Model analysis & interpretability, Approaches for low compute settings-efficiency, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English, French, German
Submission Number: 6523