Keywords: Diffusion Models, Diffusion Guidance, Guidance
TL;DR: Token Perturbation Guidance (TPG) is a novel framework that applies perturbations directly in token space to guide the diffusion sampling process.
Abstract: Classifier-free guidance (CFG) has become an essential component of modern diffusion models, enhancing both generation quality and alignment with input conditions. However, CFG requires specific training procedures and is limited to conditional generation. To address these limitations, we propose Token Perturbation Guidance (TPG), a novel method that applies perturbation matrices directly to intermediate token representations within the diffusion network. TPG employs a norm-preserving shuffling operation to provide effective and stable guidance signals that improve generation quality without architectural changes. As a result, TPG is training-free and agnostic to input conditions, making it readily applicable to both conditional and unconditional generation. We also analyze the guidance term provided by TPG and show that its effect on sampling resembles that of CFG more closely than existing training-free guidance techniques do. We extensively evaluate TPG on SDXL and Stable Diffusion 2.1, demonstrating nearly a 2x improvement in FID for unconditional generation over the SDXL baseline and showing that TPG closely matches CFG in prompt alignment. Thus, TPG represents a general, condition-agnostic guidance method that extends CFG-like benefits to a broader class of diffusion models.
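The norm-preserving shuffling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`shuffle_tokens`, `frobenius_norm`, `guided_output`) are hypothetical, the token matrix is a plain list of rows, and the CFG-style extrapolation in `guided_output` is an assumption, since the abstract does not state the exact guidance formula or which layers are perturbed.

```python
import math
import random


def shuffle_tokens(tokens, rng):
    """Reorder the token rows with a random permutation.

    A row permutation only changes the order of tokens, so the set of
    rows (and hence the overall norm) is unchanged.
    """
    perm = list(range(len(tokens)))
    rng.shuffle(perm)
    return [list(tokens[i]) for i in perm]


def frobenius_norm(tokens):
    """Frobenius norm of a token matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in tokens for x in row))


def guided_output(clean, perturbed, scale):
    """Assumed CFG-style extrapolation: clean + scale * (clean - perturbed).

    `clean` and `perturbed` stand in for denoiser outputs computed without
    and with token shuffling inside the network (an assumption here).
    """
    return [
        [c + scale * (c - p) for c, p in zip(clean_row, pert_row)]
        for clean_row, pert_row in zip(clean, perturbed)
    ]


rng = random.Random(0)
# Toy "intermediate representation": 6 tokens, 4 channels each.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
shuffled = shuffle_tokens(tokens, rng)

# The two norms agree up to floating-point summation order.
print(frobenius_norm(tokens), frobenius_norm(shuffled))
```

In an actual model the shuffled tokens would be fed through the remaining layers to produce the perturbed prediction; here the two helper functions are shown separately only to make each step visible.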
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 18856