Keywords: Diffusion Language Models, Controllable Text Generation
TL;DR: We propose TTA-Diffusion (Token Timestep Allocation), which stabilizes the control process of diffusion language models by dynamically allocating timesteps per token, addressing the instability that disrupts fluency and control.
Abstract: Classifier guidance is a widely adopted technique in diffusion language models, used to steer generation toward desired attributes. However, such guidance often introduces instability during the generation process, where token-level updates fluctuate across timesteps. We identify and formally characterize this phenomenon as update-forgetting. This instability disrupts the refinement process by overwriting semantic edits, ultimately degrading fluency and coherence, which is particularly problematic in tasks such as controllable text generation. To address this, we propose TTA-Diffusion, a novel inference-time approach that dynamically allocates timesteps per token based on refinement needs. Unlike conventional diffusion models that apply uniform updates, TTA-Diffusion employs structured timestep allocation, preserving stable tokens while allowing uncertain tokens to undergo progressive adjustment. Experimental results across diverse tasks demonstrate that TTA-Diffusion significantly outperforms both diffusion-based and autoregressive baselines in fluency and control accuracy while improving computational efficiency by reducing the number of required timesteps. On the sentiment control task, TTA-Diffusion achieves over 20% higher accuracy and nearly half the perplexity of prior diffusion models, using less than one-fifth of the denoising steps. This work highlights the importance of mitigating fluctuations in token updates and promoting a balanced refinement process, thereby enhancing stability and controllability in language modeling.
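The core idea, allocating a per-token timestep according to how much refinement each token still needs, can be illustrated with a minimal sketch. All names here (`allocate_token_timesteps`, the use of prediction confidence, and the linear confidence-to-timestep mapping) are illustrative assumptions, not the paper's actual implementation:

```python
import torch

def allocate_token_timesteps(confidence, t_max, t_min=0):
    """Hypothetical sketch of per-token timestep allocation.

    Tokens the model is already confident about receive small (late)
    timesteps so they are barely perturbed in the next denoising pass,
    while uncertain tokens receive large (early) timesteps and continue
    to be refined.

    confidence: (batch, seq_len) tensor in [0, 1], e.g. the max softmax
        probability of the current denoising prediction per token.
    Returns an integer timestep per token in [t_min, t_max].
    """
    # Map high confidence -> small timestep, low confidence -> large timestep.
    t = t_min + (1.0 - confidence) * (t_max - t_min)
    return t.round().long()

# Toy usage: three tokens with varying confidence, t_max = 2000.
conf = torch.tensor([[0.95, 0.40, 0.10]])
print(allocate_token_timesteps(conf, t_max=2000))
# tensor([[ 100, 1200, 1800]]): the stable token stays near t_min,
# while uncertain tokens are pushed back for further refinement.
```

Under this reading, non-uniform timesteps keep classifier-guided updates from repeatedly overwriting tokens that have already converged, which is the update-forgetting behavior the abstract describes.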
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 16161