One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework
TL;DR: A text-image collaborative framework to erase concepts from diffusion models
Abstract: Concept erasing has recently emerged as an effective paradigm to prevent text-to-image diffusion models from generating visually undesirable or even harmful content. However, current removal methods rely heavily on manually crafted text prompts, making it challenging to achieve high erasure performance (**efficacy**) while minimizing the impact on other benign concepts (**usability**), as illustrated in Fig.1. In this paper, we attribute these limitations to the inherent gap between the text and image modalities, which makes it hard to transfer the intricately entangled concept knowledge from text prompts to the image generation process. To address this, we propose a novel solution that directly integrates visual supervision into the erasure process, introducing the first text-image Collaborative Concept Erasing (**Co-Erasing**) framework. Specifically, Co-Erasing describes the concept jointly by text prompts and the corresponding undesirable images induced by those prompts, and then reduces the generation probability of the target concept through negative guidance. This approach effectively bypasses the knowledge gap between text and image, significantly enhancing erasure efficacy. Additionally, we design a text-guided image concept refinement strategy that directs the model to focus on the visual features most relevant to the specified text concept, minimizing disruption to other benign concepts. Finally, comprehensive experiments suggest that Co-Erasing significantly outperforms state-of-the-art erasure approaches, achieving a better trade-off between efficacy and usability.
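The negative guidance mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows the generic idea of steering a denoising prediction *away* from a concept-conditioned prediction, with `scale` as a hypothetical guidance-strength knob.

```python
import torch

def negative_guided_noise(eps_uncond, eps_concept, scale=1.0):
    """Steer the denoising prediction away from a target concept.

    eps_uncond:  noise prediction with no conditioning
    eps_concept: noise prediction conditioned on the concept to erase
    scale:       guidance strength (hypothetical parameter, not from the paper)
    """
    # Classifier-free guidance normally ADDS (eps_cond - eps_uncond);
    # negative guidance SUBTRACTS it, lowering the probability of the concept.
    return eps_uncond - scale * (eps_concept - eps_uncond)

# Toy shapes: a batch of two latent noise maps.
eps_u = torch.zeros(2, 4, 8, 8)
eps_c = torch.ones(2, 4, 8, 8)
guided = negative_guided_noise(eps_u, eps_c, scale=0.5)
```

In Co-Erasing, the concept-conditioned term would be informed jointly by the text prompt and the undesirable images, rather than by text alone.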
Lay Summary: We aim to improve how text-to-image diffusion models "forget" specific harmful or unwanted concepts. Current methods often struggle because they rely solely on written text prompts to describe what should be erased. This approach overlooks a key challenge: diffusion models interpret text and images differently, making it hard to fully remove the unwanted concept without affecting other useful ones.
To overcome this, we introduce Co-Erasing, a new framework that combines both text and images to guide the erasure process. Instead of only describing what to erase with words, we also provide the model with actual images representing the concept. This joint representation helps the model better understand what should be removed and why.
We also design a text-guided image refinement mechanism that fine-tunes the process, ensuring that only the targeted concept is affected while leaving others intact. This improves both the efficacy (removing the unwanted concept) and usability (keeping other concepts intact).
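One plausible way to picture the refinement step above is selecting only the image features that align with the text concept before using them for erasure. The sketch below is an illustrative assumption, not the paper's method: `patch_feats`, `text_emb`, and `top_k` are hypothetical names, and cosine-similarity top-k selection stands in for whatever relevance measure the actual framework uses.

```python
import torch
import torch.nn.functional as F

def refine_concept_features(patch_feats, text_emb, top_k=16):
    """Keep only the image patches most aligned with the text concept.

    patch_feats: (N, D) patch embeddings from an undesirable image
    text_emb:    (D,)   embedding of the concept text prompt
    top_k:       number of patches to keep (hypothetical parameter)
    """
    # Cosine similarity between each patch and the concept text embedding.
    sims = F.cosine_similarity(patch_feats, text_emb.unsqueeze(0), dim=-1)
    keep = sims.topk(top_k).indices
    mask = torch.zeros_like(sims)
    mask[keep] = 1.0
    # Zero out patches unrelated to the concept so erasure
    # does not disturb benign visual content.
    return patch_feats * mask.unsqueeze(-1)
```

Filtering out concept-irrelevant features is what would protect usability: only the targeted visual evidence drives the erasure signal.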
Our experiments show that Co-Erasing improves on previous approaches, offering a more accurate and reliable way to align image generation models with human intent.
Link To Code: https://github.com/Ferry-Li/Co-Erasing
Primary Area: Social Aspects->Security
Keywords: Diffusion Model, Concept Erasing, Image Generation
Submission Number: 8608