Keywords: Diffusion-based Large Language Models, Model Optimization and Efficiency, Token Pruning, Model Explainability
TL;DR: DPad is a training-free inference method that optimizes diffusion-based large language models (dLLMs) by refining their inherent Scratchpad Mechanism; it drops out redundant suffix tokens to yield significant speedups while maintaining model accuracy.
Abstract: Diffusion-based Large Language Models (dLLMs) parallelize text generation by framing decoding as a denoising process, but suffer from high computational overhead since they predict all future suffix tokens at each step while retaining only a small fraction. We propose $\textbf{Diffusion Scratchpad} (\textbf{\textit{DPad}})$, a training-free method that restricts attention to a structured subset of suffix tokens, preserving fidelity while eliminating redundancy. $\textit{DPad}$ integrates two strategies: (i) a $\textit{sliding window}$, which maintains a fixed-length suffix window, and (ii) $\textit{distance-decay dropout}$, which deterministically removes distant suffix tokens before attention computation. This concise design is compatible with existing optimizations such as parallel decoding and prefix caching, and lends itself to a lightweight implementation. Comprehensive evaluations across multiple benchmarks on $\texttt{LLaDA}$ and $\texttt{Dream}$ models demonstrate that $\textit{DPad}$ delivers up to $\mathbf{61.4\times}$ speedup over vanilla dLLMs while maintaining comparable accuracy, highlighting its potential for efficient and scalable long-sequence inference.
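Illustrative sketch: the abstract describes restricting attention to a structured subset of suffix tokens via (i) a fixed-length sliding window over the near suffix and (ii) deterministic distance-decay dropout of distant suffix tokens. The snippet below is a minimal sketch of how such a keep-mask over positions could be built; the function name `suffix_keep_mask`, the `window` and `base` parameters, and the exponential-stride decay schedule are illustrative assumptions, not the authors' implementation.

```python
import torch

def suffix_keep_mask(prefix_len: int, total_len: int,
                     window: int = 32, base: float = 1.5) -> torch.Tensor:
    """Illustrative sketch (not the paper's code): boolean mask over all positions
    that keeps the prefix, a fixed-length window of nearby suffix tokens, and a
    deterministically thinned set of distant suffix tokens whose density decays
    with distance."""
    mask = torch.zeros(total_len, dtype=torch.bool)
    mask[:prefix_len] = True                 # prefix tokens are always attended
    near_end = min(prefix_len + window, total_len)
    mask[prefix_len:near_end] = True         # (i) sliding window over the near suffix

    # (ii) distance-decay dropout (deterministic): beyond the window, keep suffix
    # positions at exponentially growing strides, so fewer distant tokens survive
    # before attention is computed.  The stride schedule here is a hypothetical
    # stand-in for whatever decay the method actually uses.
    pos, step = float(near_end), 1.0
    while int(pos) < total_len:
        mask[int(pos)] = True
        step *= base
        pos += step
    return mask

# Example: with a 100-token prefix and a 1024-token sequence, only a small,
# structured fraction of suffix positions remains in the attention computation.
mask = suffix_keep_mask(prefix_len=100, total_len=1024)
print(int(mask.sum()), "of", 1024, "positions kept")
```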
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 20859