Keywords: Diffusion Models; Conditional Diffusion Models; Classifier Guidance; Manifold Hypothesis.
Abstract: Classifier-guided diffusion models have advanced conditional image generation by training a **time-dependent** classifier on noisy data from every diffusion timestep to guide the denoising process. We revisit this paradigm and show that such dense guidance is unnecessary: a small set of **time-independent** classifiers, trained on data from selected timesteps, suffices to produce high-quality, class-consistent samples. Theoretically, we first analyze the feasibility of using a single time-independent classifier trained on clean data to guide generation, under conditions that are unrealistic in practice. To address the limitations of real-world image data, we then extend this approach to a small set of classifiers trained on noisy data from selected timesteps and derive a convergence bound that depends on the number of classifiers employed. Experiments on both synthetic and real-world datasets demonstrate that guiding an unconditional diffusion model with only a few time-independent classifiers achieves performance comparable to models guided by a fully time-dependent classifier.
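To make the guidance mechanism concrete, the sketch below illustrates one possible reverse-diffusion step in which the gradient of a time-independent classifier, chosen by nearest anchor timestep, stands in for the time-dependent classifier of standard classifier guidance. This is a minimal sketch under stated assumptions, not the paper's implementation; all names (`guided_ddpm_step`, `unet`, `classifiers`, `anchor_timesteps`, `guidance_scale`) are illustrative.

```python
# Hypothetical sketch: classifier-guided DDPM sampling where the guidance gradient
# at timestep t comes from whichever of a small set of time-independent classifiers
# was trained on noisy data nearest to t, rather than from a single time-dependent
# classifier. Assumed components: `unet(x, t)` is a noise-prediction network,
# `classifiers[k](x)` returns class logits, `anchor_timesteps[k]` is the timestep
# that classifier k was trained on.
import torch
import torch.nn.functional as F

def guided_ddpm_step(unet, classifiers, anchor_timesteps, x_t, t, y,
                     alphas, alphas_cumprod, betas, guidance_scale=1.0):
    """One reverse-diffusion step guided by the nearest anchor classifier."""
    # Select the classifier whose training timestep is closest to the current t.
    k = torch.argmin((anchor_timesteps - t).abs())
    classifier = classifiers[k]

    # Classifier-guidance gradient: grad_x log p(y | x_t).
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = F.log_softmax(classifier(x_in), dim=-1)
        selected = log_probs[torch.arange(x_in.shape[0]), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]

    # Standard DDPM posterior mean from the predicted noise, shifted by the
    # guidance term (here scaled by beta_t as a stand-in for the posterior
    # variance, a common simplification).
    eps = unet(x_t, t)
    alpha_t, alpha_bar_t, beta_t = alphas[t], alphas_cumprod[t], betas[t]
    mean = (x_t - beta_t / torch.sqrt(1.0 - alpha_bar_t) * eps) / torch.sqrt(alpha_t)
    mean = mean + guidance_scale * beta_t * grad

    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + torch.sqrt(beta_t) * noise
```

In this sketch, replacing the time-dependent classifier only changes how `classifier` is selected; the rest of the guided sampling step is unchanged, which is why a handful of anchor classifiers can stand in for dense, per-timestep guidance.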
Primary Area: generative models
Submission Number: 17706