Keywords: diffusion language models, early termination, adaptive inference, training metadata, parameter importance through AdamW trajectory, LoRA, reasoning benchmarks
TL;DR: EDIT uses training-time metadata to enable early termination during inference in diffusion language models, reducing cost while maintaining or improving accuracy.
Abstract: Diffusion-based large language models (dLLMs) generate tokens through iterative denoising, but answers often stabilize before all denoising steps are completed.
We introduce EDIT (Early Diffusion Inference Termination), an inference-time method that adaptively stops the denoising process once reasoning stability relative to training behavior is detected.
EDIT builds on training-gradient dynamics that are typically discarded after training: during fine-tuning, AdamW-aggregated LoRA updates encode parameter-importance signals, which we retain as compact reasoning maps.
During inference, EDIT measures alignment between token activations and these maps, detecting convergence when KL divergence across consecutive steps on unmasked (visible) tokens falls below a threshold.
On reasoning benchmarks, EDIT reduces diffusion steps by 11.8–68.3\% while preserving or improving accuracy in most cases, with negligible storage overhead ($\sim$0.02\%, about 1.5–2 MB for all QKV modules across a 32-block, 8 GB model).
These results establish a principled mechanism for transforming training-gradient dynamics into practical test-time benefits such as reduced reasoning time.
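The convergence criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the mean-over-tokens aggregation, and the threshold value are all illustrative assumptions; the abstract specifies only that KL divergence across consecutive denoising steps on unmasked (visible) tokens must fall below a threshold.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # Per-token KL(p || q), summed over the vocabulary axis.
    # eps-clipping avoids log(0) on sparse distributions.
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def should_stop(probs_prev, probs_curr, visible_mask, threshold=1e-3):
    """Illustrative early-termination check (names and threshold are
    assumptions, not from the paper): stop denoising when the mean KL
    divergence between consecutive steps, measured only on unmasked
    (visible) tokens, falls below `threshold`."""
    kl = kl_divergence(probs_curr[visible_mask], probs_prev[visible_mask])
    return bool(kl.mean() < threshold)
```

A caller would evaluate `should_stop` after each denoising step, restricting the comparison to tokens already revealed, and terminate the iteration once it returns True.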
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 3184