Abstract: Text-to-image diffusion models have transformed visual content generation, yet their safety mechanisms enforce rigid, uniform standards that fail to reflect diverse user preferences shaped by age, mental health, or personal beliefs. To address this limitation, we propose Personalized Safety Alignment (PSA), a framework for user-specific control over generative safety behavior. We also introduce Sage, a large-scale dataset capturing diverse user-specific safety boundaries to support this task. The PSA framework integrates user profiles via a lightweight cross-attention mechanism, efficiently steering generation to align with individual preferences. Experiments demonstrate that PSA substantially outperforms static approaches in user-specific alignment. Crucially, PSA achieves a calibrated safety-quality trade-off: under permissive profiles, it relaxes constraints to enhance visual quality, while under restrictive profiles, it intensifies suppression to maintain safety compliance. By moving beyond rigid, one-size-fits-all solutions, this work establishes personalized safety alignment as a promising new direction toward generative systems that are safer, more adaptive, and genuinely user-centered.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ning_Yu2
Submission Number: 6439