SafeGuide: Adaptive Inference-Time Safety Control for Diffusion Models

Published: 02 Mar 2026, Last Modified: 06 Mar 2026
Venue: ICLR 2026 Trustworthy AI
License: CC BY 4.0
Keywords: Safe Generation, Diffusion Models, Dynamic Guidance
Abstract: Ensuring safety in text-to-image diffusion models is a prerequisite for their responsible deployment, yet current inference-time interventions rely on static guidance schedules that apply a fixed correction strength to every input. We demonstrate that this approach is systematically limited because it assumes semantic risk is stationary. In practice, unsafe semantics emerge in highly non-uniform, prompt-dependent, and phase-specific patterns along the diffusion trajectory. Consequently, static methods face a dilemma: they either overshoot on benign prompts, degrading image fidelity through unnecessary intervention, or undershoot on malicious prompts, failing to suppress unsafe content. To address this, we introduce SafeGuide, an adaptive control framework that learns to adjust safety guidance dynamically based on prompt semantics and the generation phase. SafeGuide parameterizes the safety intervention as a bell-shaped guidance schedule and uses risk-aware reinforcement learning to optimize a policy network that decides the timing and strength of the intervention. Experiments show that SafeGuide effectively decouples safety from quality, outperforming existing inference-time guidance-based safety methods. This work establishes adaptive safety control as a principled alternative to static guidance, showing that diffusion models can achieve robust safety guarantees without sacrificing generation quality.
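To make the idea of a bell-shaped guidance schedule concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the functional form, so a Gaussian bump is assumed here, and the names `bell_schedule`, `guidance_weights`, `center`, `width`, and `peak` are hypothetical. In SafeGuide these parameters would be chosen per prompt by the learned policy network; below they are hand-picked for illustration only.

```python
import math


def bell_schedule(t, center, width, peak):
    """Safety-guidance strength at normalized time t in [0, 1].

    Assumed Gaussian form (not confirmed by the source):
    `center` sets when the intervention is strongest,
    `width` sets how long it lasts, `peak` sets its maximum strength.
    """
    return peak * math.exp(-0.5 * ((t - center) / width) ** 2)


def guidance_weights(num_steps, center, width, peak):
    """Evaluate the schedule on an evenly spaced grid of denoising steps."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [
        bell_schedule(i / (num_steps - 1), center, width, peak)
        for i in range(num_steps)
    ]


# Hand-picked, illustrative parameters (a learned policy would output these):
# a benign prompt gets a weak, brief intervention,
# a risky prompt gets a strong, longer one.
benign = guidance_weights(50, center=0.3, width=0.1, peak=0.5)
risky = guidance_weights(50, center=0.3, width=0.2, peak=7.5)
```

A static method corresponds to fixing `center`, `width`, and `peak` for all prompts; the adaptive setting described in the abstract lets them vary with the prompt, which is what allows a weak intervention on benign inputs and a strong one on malicious inputs.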
Submission Number: 134