Keywords: T2I Generative models, Fairness, Safe generation, Stable Diffusion
TL;DR: We introduce Concept Denoising Score Matching (CoDSMa), a novel score-matching objective designed to learn responsible concept representations in the h-space to enable responsible T2I generation in diffusion models.
Abstract: Diffusion models excel at generating diverse, high-quality images, but they also risk producing unfair and harmful content. Existing methods that update text embeddings or model weights either fail to address biases within diffusion models or are computationally expensive. We tackle responsible (fair and safe) text-to-image (T2I) generation in diffusion models as an interpretable concept discovery problem, introducing Concept Denoising Score Matching (CoDSMa) -- a novel objective that learns responsible concept representations in the bottleneck feature activation (\textit{h-space}). Our approach builds on the observation that, at any timestep, aligning the neutral prompt with the target prompt directs the predicted score of the denoised latent towards the target concept. We empirically demonstrate that our method enables responsible T2I generation by addressing two key challenges: mitigating gender and racial biases (fairness) and eliminating harmful content (safety). Our approach reduces biased and harmful generation by nearly 50% compared to state-of-the-art methods. Remarkably, it outperforms other techniques in debiasing gender and racial attributes without requiring profession-specific data. Furthermore, it successfully filters inappropriate content, such as depictions of illegal activities or harassment, without training on such data. Additionally, our method effectively handles intersectional biases without any further training.
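The abstract does not give the exact CoDSMa formulation, so the following is only a minimal, illustrative sketch of the idea it describes: a learnable h-space direction is optimized so that the model's prediction under a neutral prompt, shifted by that direction at the bottleneck, matches its prediction under the target prompt. The toy UNet, prompt embeddings, and hyperparameters below are all assumptions, not the authors' implementation.

```python
# Illustrative sketch of a CoDSMa-style objective (assumed formulation; the toy
# model and prompt embeddings below are placeholders, not the paper's code).
import torch
import torch.nn as nn

class ToyUNet(nn.Module):
    """Stand-in for a diffusion UNet whose bottleneck ("h-space") activation
    can be shifted by a learnable concept direction."""
    def __init__(self, dim=64, cond_dim=16):
        super().__init__()
        self.encoder = nn.Linear(dim + cond_dim, dim)
        self.decoder = nn.Linear(dim, dim)

    def forward(self, x_t, cond, h_shift=None):
        h = torch.relu(self.encoder(torch.cat([x_t, cond], dim=-1)))  # bottleneck (h-space)
        if h_shift is not None:
            h = h + h_shift            # inject the learned concept direction
        return self.decoder(h)         # predicted noise / score

dim, cond_dim = 64, 16
unet = ToyUNet(dim, cond_dim)
for p in unet.parameters():            # the diffusion model stays frozen;
    p.requires_grad_(False)            # only the concept direction is learned
concept_dir = nn.Parameter(torch.zeros(dim))
opt = torch.optim.Adam([concept_dir], lr=1e-3)

# Hypothetical prompt embeddings: a neutral prompt vs. a fair/safe target prompt.
neutral_cond = torch.randn(8, cond_dim)
target_cond = torch.randn(8, cond_dim)

for step in range(100):
    x_t = torch.randn(8, dim)                       # noised latent at some timestep
    with torch.no_grad():
        target_score = unet(x_t, target_cond)       # prediction under the target prompt
    shifted_score = unet(x_t, neutral_cond, h_shift=concept_dir)
    # Match the neutral-prompt prediction (shifted in h-space) to the target-prompt
    # prediction, so the learned direction steers denoising toward the target concept.
    loss = ((shifted_score - target_score) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference time, one would presumably add the learned `concept_dir` to the bottleneck activation when generating from the neutral prompt, steering outputs toward the fair or safe target concept without retraining the diffusion model.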
Submission Number: 107