TL;DR: Despite its empirical success, the practical implementation of guidance diverges significantly from its theoretical motivation. Our work reconciles this discrepancy by developing a correct guidance theory for conditional diffusion models.
Abstract: Guidance techniques are simple yet effective for improving conditional generation in diffusion models. Despite their empirical success, the practical implementation of guidance diverges significantly from its theoretical motivation. In this paper, we reconcile this discrepancy by replacing the scaled marginal distribution target, which we prove to be theoretically invalid, with a valid scaled joint distribution objective. Additionally, we show that established guidance implementations are approximations to the intractable optimal solution under a no-future-foresight constraint. Building on these theoretical insights, we propose rectified gradient guidance (REG), a versatile enhancement designed to boost the performance of existing guidance methods. Experiments in 1D and 2D settings demonstrate that REG provides a better approximation to the optimal solution than prior guidance techniques, validating the proposed theoretical framework. Extensive experiments on class-conditional ImageNet and text-to-image generation tasks show that incorporating REG consistently improves FID and Inception/CLIP scores across various settings compared to the same settings without REG.
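For context, a minimal sketch of the conventional classifier-free guidance formulation, i.e., the "scaled marginal distribution target" the abstract critiques, is given below. The guidance weight w and the null condition symbol are standard conventions assumed here; the paper's scaled joint objective and the exact REG rectification are not spelled out in this abstract and are therefore not shown.

```latex
% Conventional classifier-free guidance: the sampler is motivated as targeting
% a marginal distribution sharpened by a classifier term (w is the guidance
% weight, \varnothing the null/unconditional input). Standard background only.
\tilde{p}_w(x \mid c) \;\propto\; p(x \mid c)\, p(c \mid x)^{\,w-1},
\qquad
\nabla_{x_t} \log \tilde{p}_w(x_t \mid c)
  = \nabla_{x_t} \log p_t(x_t \mid c)
  + (w-1)\bigl[\nabla_{x_t} \log p_t(x_t \mid c) - \nabla_{x_t} \log p_t(x_t)\bigr].
% In practice this score combination is applied to the noise predictions:
\tilde{\epsilon}_\theta(x_t, c)
  = \epsilon_\theta(x_t, c)
  + (w-1)\bigl[\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr].
```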
Lay Summary: Diffusion models can create realistic images from random noise. To guide these models toward specific goals—like generating images of a certain class—techniques called “guidance” are used. However, current guidance methods don’t fully align with their theoretical foundations. Our work resolves this mismatch by proposing a new, more accurate theory and a method called Rectified Gradient Guidance (REG). REG improves the quality of generated images across multiple tasks while remaining compatible with existing systems, helping make diffusion models more reliable and effective.
Link To Code: https://github.com/zhengqigao/REG
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Diffusion models, classifier-free guidance, conditional generation
Submission Number: 1655