Keywords: Diffusion model, Inference time scaling, Image synthesis
TL;DR: We propose a method that reinterprets diffusion models as distribution prediction models to estimate sample likelihood, leveraging this as inference-time scaling guidance to improve image generation quality.
Abstract: To enhance sample quality beyond their standard outputs, diffusion models typically rely on inference-time scaling, a process that necessitates external verifiers.
We challenge this dependency by proposing a framework that reframes the generative model itself as an intrinsic distribution estimator.
We provide a theoretical basis and empirical evidence for this view, showing that the distance between independently sampled noise and the diffusion model's output serves as a proxy for a sample's distributional conformity.
This insight enables our proposed Self-Verifying inference-time scaling method to assess samples directly at intermediate denoising steps, eliminating the need for external modules.
Experimental results demonstrate that our scaling method achieves consistent improvements across diverse benchmarks in fidelity, preference, and compositionality.
Our study establishes that the generative process of diffusion models is also an evaluative process, opening new avenues toward more resource-efficient and intrinsically aware generative models.
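As an illustration only, the idea of scoring candidates by the distance between independently sampled noise and the model's output can be sketched as a best-of-N selection loop. This is not the submission's algorithm: the function names (`self_verification_score`, `select_best`, `toy_predict_noise`), the noise-prediction parameterization, the fixed `alpha` and `sigma`, and the closed-form Gaussian denoiser standing in for a trained network are all assumptions made for this sketch.

```python
import numpy as np


def self_verification_score(x, predict_noise, alpha, sigma, rng, n_draws=8):
    """Hypothetical score: negative mean squared distance between freshly
    sampled noise and the model's noise prediction. Higher means the
    candidate looks more consistent with the model's distribution."""
    dists = []
    for _ in range(n_draws):
        eps = rng.standard_normal(x.shape)   # independent noise
        x_t = alpha * x + sigma * eps        # re-noise the candidate
        eps_hat = predict_noise(x_t)         # model output
        dists.append(np.mean((eps - eps_hat) ** 2))
    return -float(np.mean(dists))


def select_best(candidates, predict_noise, alpha, sigma, rng):
    """Best-of-N selection with the score above; no external verifier."""
    scores = [
        self_verification_score(c, predict_noise, alpha, sigma, rng)
        for c in candidates
    ]
    return int(np.argmax(scores)), scores


def toy_predict_noise(x_t, mu, data_std, alpha, sigma):
    """Stand-in for a trained network (assumption): the exact posterior mean
    of the noise when the data distribution is N(mu, data_std^2 I)."""
    denom = alpha**2 * data_std**2 + sigma**2
    return sigma * (x_t - alpha * mu) / denom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = sigma = np.sqrt(0.5)
    mu = np.zeros(16)
    candidates = [mu.copy(), mu + 3.0]  # in-distribution vs. far from the data
    model = lambda x_t: toy_predict_noise(x_t, mu, 0.1, alpha, sigma)
    best, scores = select_best(candidates, model, alpha, sigma, rng)
    print("selected candidate:", best, "scores:", scores)
```

In this toy setting the candidate close to the data mean yields a much smaller noise-prediction distance than the shifted one, so it is selected. With a real diffusion model, `predict_noise` would wrap the network at a chosen timestep, and the same score could in principle be applied to intermediate denoising states rather than final samples.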
Primary Area: generative models
Submission Number: 11147