Concept Denoising Score Matching for Responsible Text-to-Image Generation

23 Sept 2024 (modified: 21 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Diffusion models, Stable Diffusion, Responsible text-to-image generation, Fairness, Safe generation, Debiasing
TL;DR: We introduce Concept Denoising Score Matching (CoDSMa), a novel score-matching objective that learns responsible concept representations in the $h$-space, enabling responsible T2I generation with diffusion models.
Abstract: Diffusion models excel at generating diverse, high-quality images, but they also risk producing unfair and harmful content. Existing methods that update text embeddings or model weights either fail to address biases within diffusion models or are computationally expensive. We tackle responsible (fair and safe) text-to-image (T2I) generation in diffusion models as an interpretable concept discovery problem, introducing Concept Denoising Score Matching (CoDSMa) -- a novel objective that learns responsible concept representations in the bottleneck feature activation (\textit{h-space}). Our approach builds on the observation that, at any timestep, aligning the neutral prompt with the target prompt directs the predicted score of the denoised latent toward the target concept. We empirically demonstrate that our method enables responsible T2I generation by addressing two key challenges: mitigating gender and racial biases (fairness) and eliminating harmful content (safety). Our approach reduces biased and harmful generation by nearly 50\% compared to state-of-the-art methods. Remarkably, it outperforms other techniques in debiasing gender and racial attributes without requiring profession-specific data. Furthermore, it filters inappropriate content, such as depictions of illegal activities or harassment, without training on such data. Additionally, our method handles intersectional biases without any further training.
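
To make the objective described in the abstract concrete, below is a minimal sketch of what one such h-space score-matching step could look like. It assumes a diffusers-style Stable Diffusion UNet and noise scheduler; the names `codsma_step`, `add_h_shift`, and `delta_h` are illustrative assumptions, not the authors' API, and the exact loss used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def codsma_step(unet, scheduler, text_enc, add_h_shift, delta_h,
                x0, t, neutral_prompt, target_prompt):
    """One training step (sketch): learn an h-space shift `delta_h` so that the
    noise predicted under the neutral prompt (with the shift applied) matches
    the noise predicted under the target prompt."""
    noise = torch.randn_like(x0)
    x_t = scheduler.add_noise(x0, noise, t)      # forward-diffuse the latent to timestep t

    c_neutral = text_enc(neutral_prompt)         # text conditioning embeddings
    c_target = text_enc(target_prompt)

    with torch.no_grad():                        # "teacher" score under the target prompt
        eps_target = unet(x_t, t, encoder_hidden_states=c_target).sample

    # "Student" score: neutral prompt with the learnable bottleneck (h-space) shift applied.
    # `add_h_shift` is a hypothetical context manager that adds `delta_h` to the
    # UNet mid-block activation via a forward hook.
    with add_h_shift(unet, delta_h):
        eps_shifted = unet(x_t, t, encoder_hidden_states=c_neutral).sample

    # Denoising score-matching loss between the shifted-neutral and target predictions
    return F.mse_loss(eps_shifted, eps_target)
```

Under these assumptions, only `delta_h` would be optimized while the UNet stays frozen; the learned vector would then be added in h-space at inference time to steer generation from the neutral prompt toward the responsible target concept.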
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3082