Abstract: Diffusion models' unprecedented success in image generation can largely be attributed to large-scale pretraining on massive datasets. Yet, the necessity of forgetting specific concepts for regulatory or copyright compliance poses a critical challenge. Existing approaches to concept forgetting, although reasonably successful at forgetting a given concept, frequently fail to preserve generation quality or demand extensive domain expertise for preservation. To alleviate these issues, we introduce Concept Siever, an end-to-end framework for targeted concept removal in pre-trained text-to-image diffusion models. The foundation of Concept Siever rests on \textit{two key innovations}: First, an automatic technique for creating a paired dataset of a target concept and its negations by utilizing the diffusion model's latent space. A key property of these pairs is that they differ only in the target concept, enabling forgetting with \textit{minimal side effects} and \textit{without requiring domain expertise}. Second, we present Concept Sieve, a localization method for identifying and isolating the model components most responsible for the target concept. By retraining only these localized components on our paired dataset for a target concept, Concept Siever accurately removes the concept with \textit{negligible side effects, preserving neighboring and unrelated concepts}. Moreover, given the subjective nature of forgetting a concept like nudity, Concept Sieve provides \textit{fine-grained control over the forgetting strength at inference time}, catering to diverse deployment needs without any need for fine-tuning. We report state-of-the-art performance on the I2P benchmark, surpassing previous domain-agnostic methods by over $33\%$ while showing superior structure preservation. We validate our results through extensive quantitative and qualitative evaluation along with a user study.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We sincerely thank the reviewers and the Action Editor for their valuable feedback and constructive suggestions, which have helped us improve the quality of our work. We are pleased to see the positive decision on our submission. As promised during the rebuttal, we have carefully incorporated the agreed-upon revisions into the final camera-ready manuscript, with a summary provided below for reference:
**Changes in the main paper:**
- **Content Disclaimer:** Added a warning regarding the presence of NSFW content on page 1.
- **Clarification on the usage of the word "automated":** Added a footnote on page 6 to clarify the use of the term "automated" in the context of our data curation pipeline.
- **Notation clarity:** We have created a separate equation (now Eq. 6) to formalize the model linearization operation. We have extended the corresponding footnote to explicitly state that $\nabla_{\theta}$ is a forward Jacobian operation (page 7).
- **Baseline reporting:** Added a footnote to Table 3 (page 10) to explain why we could not learn a reasonable model for the celebrity identity task using the AdvUnlearn baseline.
- **Re-organization:** The "Method Insights and Analysis" section has been restructured. The "Concept Localization" subsection has been moved to Appendix-F for better readability (page 11).
- **New Sections:** Added a "Broader Impact Statement" and "Acknowledgments" on page 12.
---
**Changes in the supplementary:**
- **New Sections:** Added new sections for "Limitations and Future Work" (Appendix A) and a discussion on "Computational complexity" (Appendix B).
- **Expanded discussions:**
  - Significantly extended the discussion on data curation (Appendix D) to include an analysis of the perturbation location and a detailed quantitative validation of the curation method.
  - Added a discussion of the model linearization hypothesis and its applicability in this work in Appendix G.
  - Added the formal algorithm for Structure LPIPS in Appendix E.
---
All the aforementioned revisions have been incorporated into the camera-ready manuscript. We kindly invite the reviewers and the Action Editor to review the updated version and would be glad to make any further refinements if needed.
Supplementary Material: pdf
Assigned Action Editor: ~Zhangyang_Wang1
Submission Number: 5606