When Guidance Breaks: A Schrödinger Bridge Perspective on Inference-Time Alignment in Diffusion Models
Keywords: diffusion models, inference-time guidance, Schrödinger bridge, mode collapse, adaptive guidance, controlled generation, score-based generative modeling
TL;DR: We explain why strong inference-time guidance causes mode collapse via Schrödinger bridge theory and introduce a training-free adaptive scheme that stabilizes sampling while preserving diversity.
Abstract: Inference-time guidance aligns diffusion models with downstream constraints without retraining, yet excessive guidance induces mode collapse, reduced diversity, and sampling instability. We give a theoretical account of this failure through Schrödinger bridge (SB) theory. Viewing diffusion sampling as entropy-regularized optimal transport, we show that guidance corresponds to an exponential tilting of the terminal marginal. As the guidance scale increases, the associated optimal control energy grows rapidly, and the bridge dynamics become ill-conditioned under finite diffusion noise and discrete solvers. Motivated by the SB dual formulation, we propose a training-free adaptive guidance scheme that normalizes the guidance term by the local gradient magnitude, which stabilizes inference. Experiments on 2D mixtures and CIFAR-10 show that adaptive guidance preserves diversity (LPIPS $0.56$ vs. $0.28$ for fixed high guidance) while maintaining strong alignment. These results support both the theoretical mechanism and the practical benefit.
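To make the idea of normalizing guidance by local gradient magnitude concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact normalization rule, so the per-sample unit-norm rescaling, the function names `fixed_guidance` and `adaptive_guidance`, the `eps` constant, and the toy gradients are all assumptions made for illustration.

```python
import numpy as np

def fixed_guidance(grad, scale):
    """Standard guidance term: the guidance scale times the raw gradient."""
    return scale * grad

def adaptive_guidance(grad, scale, eps=1e-8):
    """Hypothetical normalized variant (assumed form, not from the paper):
    rescale each sample's gradient to unit norm before applying the scale."""
    norm = np.linalg.norm(grad, axis=-1, keepdims=True)
    return scale * grad / (norm + eps)

# Toy batch of 2D guidance gradients with very different magnitudes.
grads = np.array([[0.1, 0.0], [3.0, 4.0], [30.0, 40.0]])
scale = 5.0

# Per-sample magnitude of the guidance term under each scheme.
fixed_norms = np.linalg.norm(fixed_guidance(grads, scale), axis=-1)
adaptive_norms = np.linalg.norm(adaptive_guidance(grads, scale), axis=-1)
```

Under this assumed rule, the fixed guidance term scales with the raw gradient norm, so a few samples with steep gradients can dominate an update step, while the normalized term has magnitude close to `scale` for every sample regardless of how steep the local gradient is.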
Submission Number: 6