Side Effects of Erasing Concepts from Diffusion Models

Published: 24 Sept 2025, Last Modified: 07 Nov 2025
Venue: NeurIPS 2025 Workshop GenProCC
License: CC BY 4.0
Track: Regular paper
Keywords: Side Effect Evaluation, Concept Erasure, Responsible Content Creation
Abstract: Text-to-image (T2I) generative models have opened new possibilities for creative content generation in fields such as art, science, and entertainment. Despite remarkable progress, concerns about privacy, copyright, and safety have led to the development of Concept Erasure Techniques (CETs). The goal of an effective CET is to prevent the generation of undesired "target" concepts specified by the user while preserving the ability to synthesize high-quality images of the remaining concepts. In this work, we demonstrate that CETs can be easily circumvented and present several side effects of concept erasure. To rigorously evaluate robustness and support safer use of T2I models in creative workflows, we present Side Effect Evaluation (SEE), a benchmark of hierarchical and compositional prompts describing objects and their attributes. The dataset and our automated evaluation pipeline quantify side effects of CETs across three aspects: impact on neighboring concepts, evasion of targets, and attribute leakage. Our experiments reveal that CETs can be circumvented by exploiting the superclass-subclass hierarchy and by using semantically similar prompts, such as compositional variants of the target. Additionally, we show that CETs suffer from attribute leakage and from counterintuitive phenomena of attention concentration or dispersal. We release our dataset, code, and evaluation tools to advance robust erasure methods that safeguard creators’ rights without trading off image quality.
Submission Number: 28
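Illustration: the abstract describes hierarchical and compositional prompts grouped by three evaluation aspects. The sketch below shows one way such prompts could be organized for a single erased target. It is not the SEE dataset or pipeline: the toy hierarchy, the color list, the prompt templates, the function name `build_prompts`, and the reading of each aspect (subclasses and attribute variants for evasion, concepts under other superclasses as neighbors, attribute-plus-other-object prompts for leakage) are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy hierarchy: superclass -> subclasses (not taken from SEE).
HIERARCHY = {
    "vehicle": ["car", "bus"],
    "animal": ["dog", "cat"],
}
# Hypothetical attribute values used to form compositional prompts.
COLORS = ["red", "blue"]


def build_prompts(target, hierarchy, colors):
    """Group illustrative prompts by aspect for an erased superclass `target`."""
    subclasses = hierarchy[target]
    # Concepts under every other superclass; assumed here to be the "neighbors".
    others = [c for sup, subs in hierarchy.items() if sup != target for c in subs]
    return {
        # Evasion probes: subclasses of the target and compositional variants.
        "evasion": [f"a photo of a {s}" for s in subclasses]
        + [f"a photo of a {c} {target}" for c in colors],
        # Neighbor probes: concepts that should remain unaffected by erasure.
        "neighbors": [f"a photo of a {o}" for o in others],
        # Leakage probes: an attribute on one object, with a second object present.
        "leakage": [
            f"a photo of a {c} {s} next to a {o}"
            for c, s, o in product(colors, subclasses[:1], others[:1])
        ],
    }


prompts = build_prompts("vehicle", HIERARCHY, COLORS)
```

Each prompt group would then be passed to the erased model and scored separately, so that a failure on one aspect (for example, a subclass still being generated) is not hidden by good results on another.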