SHAPE: SCHEDULE HESSIAN ADAPTIVE PARAMETER ESTIMATION FOR SMOOTHER DIFFUSION OPTIMIZATION

Ritika Lamba; Jing Ma

Published: 03 Mar 2026, Last Modified: 10 Apr 2026 | ICLR 2026 DeLTa Workshop Poster | CC BY 4.0

Keywords: Diffusion Models, Noise Scheduling, Bayesian Optimization, Hessian Analysis, Optimization Geometry, Proxy-based Search

TL;DR: SHAPE uses proxy-based Bayesian optimization to discover noise schedules that act as implicit preconditioners, smoothing the Hessian landscape and improving diffusion training efficacy.

Abstract: Noise schedules control how information is destroyed in diffusion models, yet practice relies on hand-crafted designs (Linear, Cosine) or fixed analytic forms. We introduce SHAPE, a Bayesian optimization framework that discovers schedules by minimizing validation loss on 2M-parameter proxy models. On CIFAR-10, our learned schedule achieves a 57\% relative FID improvement over the Linear baseline (35.50 vs 82.50) with 50M-parameter U-Nets. Through Hessian analysis, we demonstrate that these gains stem from superior geometric conditioning: SHAPE achieves a spectral anisotropy proxy of $\kappa_{\text{sap}}=3.12$ versus $\kappa_{\text{sap}}=79.44$ for the Linear schedule, a 25-fold reduction. We provide two explanations for this result: (1) SNR Uniformity: optimal schedules spontaneously maintain near-linear log-SNR ($R^2=0.987$), rediscovering prior information-theoretic principles through pure optimization; (2) Hessian Conditioning: schedules act as implicit preconditioners that smooth the loss landscape. While absolute performance remains below state-of-the-art methods that employ 5--15$\times$ larger models, our work provides evidence that noise schedule design is fundamentally a problem of geometric conditioning rather than signal-processing intuition. We validate that schedule quality rankings transfer reliably across scales (Spearman $\rho=0.90$), enabling efficient proxy-based optimization with only 15.7\% computational overhead. We conclude by discussing how our spectral measures can be adapted to indefinite Hessians and the potential for combining SHAPE with modern loss-weighting baselines.
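As a reading aid for the "SNR Uniformity" claim, the sketch below shows one way a log-SNR linearity score of this kind could be computed. It is not the authors' code: it assumes the $R^2$ comes from an ordinary least-squares line fitted to log-SNR against the timestep index, and it uses the common DDPM linear-beta defaults ($T=1000$, $\beta \in [10^{-4}, 0.02]$) only as a stand-in for the Linear baseline. All function names are hypothetical.

```python
import numpy as np

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level for a linear-beta schedule (assumed DDPM defaults)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def linear_logsnr_alpha_bar(T=1000, logsnr_max=10.0, logsnr_min=-10.0):
    """Reference schedule built so that log-SNR is exactly linear in t."""
    logsnr = np.linspace(logsnr_max, logsnr_min, T)
    return 1.0 / (1.0 + np.exp(-logsnr))  # alpha_bar = sigmoid(log-SNR)

def log_snr(alpha_bar):
    """log-SNR(t) = log(alpha_bar_t / (1 - alpha_bar_t))."""
    return np.log(alpha_bar) - np.log1p(-alpha_bar)

def linearity_r2(y):
    """R^2 of a least-squares straight-line fit of y against the step index."""
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    residual = y - (slope * t + intercept)
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Assumed measurement: how close each schedule's log-SNR is to a straight line.
r2_linear_beta = linearity_r2(log_snr(linear_alpha_bar()))
r2_linear_logsnr = linearity_r2(log_snr(linear_logsnr_alpha_bar()))
print(f"linear-beta schedule:   R^2 = {r2_linear_beta:.4f}")
print(f"linear-log-SNR schedule: R^2 = {r2_linear_logsnr:.4f}")
```

Under these assumptions, a schedule with exactly linear log-SNR scores an $R^2$ of essentially 1, so the reported $R^2=0.987$ would indicate that the learned schedule lies close to that reference without being constrained to it.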

Submission Number: 119
