Abstract: Diffusion models, which iteratively denoise data samples to synthesize high-quality outputs, have achieved empirical success across domains. However, optimizing these models for downstream tasks often involves nested bilevel structures, such as tuning hyperparameters for fine-tuning tasks or noise schedules in training dynamics, where traditional bilevel methods fail due to the infinite-dimensional probability space and prohibitive sampling costs. We formalize this challenge as a generative bilevel optimization problem and address two key scenarios: (1) fine-tuning pre-trained models via an inference-only lower-level solver paired with a sample-efficient gradient estimator for the upper level, and (2) training a diffusion model from scratch with noise schedule optimization by reparameterizing the lower-level problem and designing a computationally tractable gradient estimator. Our first-order bilevel framework overcomes the incompatibility of conventional bilevel methods with diffusion processes, offering theoretical grounding and computational practicality. Experiments demonstrate that our method outperforms existing fine-tuning and hyperparameter search baselines.
Lay Summary: Diffusion models create images by gradually removing noise until a realistic sample appears. They already work well, but tweaking their hyperparameters can still squeeze out extra performance, yielding sharper images or faster convergence. That tuning job is naturally a bilevel optimization problem: an inner loop solves the generative task, while an outer loop adjusts the hyperparameter to make the final output as good as possible. Classic bilevel methods stumble here because a diffusion model lives in an infinite-dimensional probability space and every evaluation step requires a lot of expensive sampling.
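As a rough sketch (the notation here is ours for illustration, not the paper's exact formulation), the bilevel structure pairs an outer hyperparameter $\lambda$ with an inner generative distribution $p$:

$$
\min_{\lambda}\; F\bigl(\lambda,\; p^{*}(\lambda)\bigr)
\quad \text{s.t.} \quad
p^{*}(\lambda) \in \arg\min_{p}\; G(p;\, \lambda),
$$

where $F$ scores the generated samples and $G$ stands for the diffusion training or inference objective. The difficulty is that $p$ ranges over an infinite-dimensional space of distributions, and each evaluation of $F$ or $G$ requires expensive sampling.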
This paper reformulates the hyperparameter search problem for diffusion models as a generative bilevel optimization task and introduces an efficient first-order framework to solve it. We tackle two practical cases:
1. **Fine-tuning a pre-trained diffusion model with entropy regularization.** We treat inference (the denoising steps) as the inner problem and craft a sample-efficient gradient estimator so the outer loop can cheaply update the entropy regularization strength.
2. **Training a diffusion model from scratch while also learning the best noise schedule.** By reparameterizing the inner training dynamics and using zeroth-order estimation (see the sketch after this list), we can update the hyperparameter in the outer loop without prohibitive Monte Carlo cost.
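To make the zeroth-order idea in case 2 concrete, here is a minimal sketch of a generic two-point gradient estimator for a noise-schedule hyperparameter. The function `outer_loss` and the variable names are placeholders of our own, not the paper's actual interfaces; the paper's estimator may differ in detail.

```python
import numpy as np

def zeroth_order_grad(outer_loss, lam, mu=1e-2, num_samples=8, rng=None):
    """Generic two-point zeroth-order estimate of the gradient of outer_loss at lam.

    outer_loss: callable mapping a hyperparameter vector to a scalar loss
                (e.g., sample quality of a model trained under that noise schedule).
    lam:        current hyperparameter vector (np.ndarray).
    mu:         smoothing radius for the finite-difference perturbation.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(lam)
    base = outer_loss(lam)
    for _ in range(num_samples):
        u = rng.standard_normal(lam.shape)            # random search direction
        grad += (outer_loss(lam + mu * u) - base) / mu * u
    return grad / num_samples

# Toy usage: descend a quadratic stand-in for the outer objective.
if __name__ == "__main__":
    f = lambda x: float(np.sum((x - 0.3) ** 2))       # placeholder outer loss
    lam = np.array([1.0])
    for _ in range(200):
        lam -= 0.1 * zeroth_order_grad(f, lam)
    print(lam)  # approaches 0.3
```

The appeal of such an estimator is that it only needs loss evaluations, never gradients through the inner diffusion training, which is what keeps the outer-loop update tractable.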
The resulting bilevel method is backed by theory, fits neatly into standard diffusion training and fine-tuning pipelines, and, in experiments, beats popular grid/random/Bayesian search baselines on both fine-tuning and full training tasks.
Primary Area: Deep Learning->Algorithms
Keywords: Bilevel Optimization; diffusion model; hyperparameter optimization; fine-tuning; noise scheduling
Submission Number: 8094