Personalized Safety Alignment for Text-to-Image Diffusion Models

TMLR Paper6439 Authors

08 Nov 2025 (modified: 13 Jan 2026) | Rejected by TMLR | CC BY 4.0

Abstract: Text-to-image diffusion models have revolutionized visual content generation, yet their deployment is hindered by a fundamental limitation: safety mechanisms enforce rigid, uniform standards that fail to reflect diverse user preferences shaped by age, culture, or personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that transitions generative safety from static filtration to user-conditioned adaptation. We introduce Sage, a large-scale dataset capturing diverse safety boundaries across 1,000 simulated user profiles, covering complex risks often missed by traditional datasets. By integrating these profiles via a parameter-efficient cross-attention adapter, PSA dynamically modulates generation to align with individual sensitivities. Extensive experiments demonstrate that PSA achieves a calibrated safety-quality trade-off: under permissive profiles, it relaxes over-cautious constraints to enhance visual fidelity, while under restrictive profiles, it enforces state-of-the-art suppression, significantly outperforming static baselines. Furthermore, PSA exhibits superior instruction adherence compared to prompt-engineering methods, establishing personalization as a vital direction for creating adaptive, user-centered, and responsible generative AI.

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Ning_Yu2

Submission Number: 6439
