Abstract: Diffusion Language Models decouple inference cost from sequence length, achieving linear computational complexity, in contrast to the quadratic complexity of autoregressive models, while providing self-correction via iterative decoding.
However, few-step sampling in text diffusion suffers from a training–inference mismatch in the self-conditioning mechanism, akin to exposure bias, and from underexposure to high-noise regimes under uniform scheduling.
We propose FastDiSS, a unified framework that boosts sampling efficiency by bridging the training–inference mismatch in self-conditioning and improves training through adaptive, model-aware noise scaling.
Extensive experiments show that FastDiSS achieves higher BLEU scores on conditional generation benchmarks while delivering a fourfold speed-up over standard diffusion models.
FastDiSS considerably narrows the performance gap between few-step and many-step diffusion, improving both quality and inference speed over current state-of-the-art methods.
Paper Type: Long
Research Area: Generation
Research Area Keywords: efficient models, few-step generation, text-to-text generation
Contribution Types: Model analysis & interpretability, Reproduction study, Approaches low compute settings-efficiency
Languages Studied: English
Keywords: diffusion language models, few-step generation
Submission Number: 111