Variance Reduction for Expectations with Diffusion Teachers

Published: 30 May 2026, Last Modified: 01 Jun 2026; SPIGM @ ICML Poster; Everyone; CC BY 4.0
Keywords: variance reduction, Monte Carlo, gradient estimation, scalable inference, frozen teacher, diffusion models, score distillation, importance sampling, stratified sampling, probabilistic inference, generative modeling
TL;DR: We reduce the cost of diffusion-guided tasks by identifying and eliminating dominant sources of gradient variance through principled sampling and compute reuse.
Abstract: Pretrained diffusion models increasingly serve as frozen teachers feeding downstream pipelines such as text-to-3D, single-step distillation, and data attribution. The teacher gradients these pipelines consume are Monte Carlo expectations over noise levels and Gaussian noise; their estimator variance dominates compute cost because each draw requires expensive upstream work, such as rendering, simulation, or encoding. We introduce CARV, a compute-aware variance-accounting framework that motivates a hierarchical Monte Carlo estimator: it amortizes the expensive upstream computation over cheap diffusion-noise resamples and is sharpened by timestep importance sampling and a stratified inverse-CDF construction. Across diffusion-guided workloads, we obtain 2–3× effective compute multipliers, most of which come from amortized reuse, with approximately 25% additional gain from importance sampling plus stratification, all without changing the objective. We also map the regimes where these gains translate into improved downstream metrics and the regimes where they do not, such as DMD.
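The three ingredients named in the abstract (amortized reuse of the upstream computation, timestep importance sampling, and stratified inverse-CDF draws) can be illustrated with a small NumPy sketch. This is not the CARV implementation: the function names, the discrete uniform target over timesteps, the proposal `q`, and the callables `upstream` and `teacher_term` are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def stratified_timesteps(q, k, rng):
    """Draw k timestep indices from proposal q by stratified inverse-CDF sampling.

    One uniform is drawn inside each of k equal-width strata of [0, 1),
    then mapped through the inverse CDF of q.
    """
    cdf = np.cumsum(q)
    cdf = cdf / cdf[-1]                        # normalize so the last entry is 1
    u = (np.arange(k) + rng.random(k)) / k     # u[j] lies in [j/k, (j+1)/k)
    idx = np.searchsorted(cdf, u, side="right")
    return np.minimum(idx, len(q) - 1)         # guard against round-off at the top


def hierarchical_estimate(upstream, teacher_term, n_outer, k_inner, q, rng):
    """Hierarchical Monte Carlo estimate of E_{t ~ uniform, eps}[teacher_term].

    upstream(rng)             -> array x (assumed expensive: render, simulate, encode)
    teacher_term(x, t, eps)   -> array of shape (k_inner,) + x.shape (assumed cheap)
    Each upstream result is reused for k_inner (t, eps) resamples.
    """
    q = np.asarray(q, dtype=float)
    q = q / q.sum()
    p = np.full(len(q), 1.0 / len(q))          # assumed target: uniform over timesteps
    estimates = []
    for _ in range(n_outer):
        x = np.asarray(upstream(rng), dtype=float)   # expensive step, done once
        t = stratified_timesteps(q, k_inner, rng)
        eps = rng.standard_normal((k_inner,) + x.shape)
        w = (p[t] / q[t]).reshape((-1,) + (1,) * x.ndim)  # importance weights
        estimates.append(np.mean(w * teacher_term(x, t, eps), axis=0))
    return np.mean(estimates, axis=0)
```

Because every stratum contributes exactly one uniform draw, the reweighted inner average stays unbiased for the uniform-timestep target while the timestep draws are spread evenly across the proposal's mass. In this toy form, a proposal proportional to the magnitude of the per-timestep term removes the timestep contribution to the variance entirely; in practice the proposal would only approximate that.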
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 101