Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, Venue: NeurIPS 2025 poster, License: CC BY 4.0
Keywords: Concept Erasure, Text-to-Image Diffusion Model, Safe generation, Text-to-image generation
TL;DR: Our paper introduces Semantic Surgery, a novel training-free method that dynamically manipulates text embeddings before diffusion to achieve concept erasure in text-to-image models while preserving image quality and locality.
Abstract: With the growing power of text-to-image diffusion models, their potential to generate harmful or biased content has become a pressing concern, motivating the development of concept erasure techniques. Existing approaches, whether or not they rely on retraining, frequently compromise the generative capabilities of the target model in order to achieve concept erasure. Here, we introduce **Semantic Surgery**, a novel training-free framework for zero-shot concept erasure. Semantic Surgery operates directly on text embeddings *before* the diffusion process, aiming to neutralize undesired concepts at their semantic origin and adapting to each prompt to enhance both erasure completeness and the locality of generation. Specifically, it dynamically estimates the presence of target concepts in an input prompt and, based on that estimate, performs a calibrated, scaled vector subtraction to neutralize their influence at the source. The overall framework also includes a Co-Occurrence Encoding module for robust multi-concept erasure and a visual feedback loop that addresses latent concept persistence, reinforcing erasure throughout the subsequent denoising process. Semantic Surgery requires no model retraining and adapts dynamically to the specific concepts, and their intensity, detected in each input prompt, ensuring precise and context-aware interventions. Extensive experiments on object, explicit content, artistic style, and multi-celebrity erasure tasks demonstrate that our method significantly outperforms state-of-the-art approaches: it achieves superior completeness and robustness while preserving locality and general image quality (e.g., achieving a 93.58 H-score in object erasure, reducing explicit content to just 1 instance with a 12.2 FID, and attaining an 8.09 H_a in style erasure with no MS-COCO FID/CLIP degradation).
Crucially, this robustness enables our framework to function as a built-in threat detection system by monitoring concept presence scores, offering a highly effective and practical solution for safer text-to-image generation. Our code is publicly available at: https://github.com/Lexiang-Xiong/Semantic-Surgery
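The abstract describes the core operation only in words: estimate how strongly a target concept is present in a prompt, then subtract a scaled concept vector from the text embedding. The following is a minimal sketch of that idea and should not be read as the paper's formulation. The cosine-based presence score, the sigmoid calibration, the parameters `tau` and `beta`, the projection-based scaling, and the function names are all illustrative assumptions; the toy 3-dimensional vectors stand in for real text-encoder embeddings.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def presence_score(prompt_emb, concept_dir, tau=0.3, beta=20.0):
    """Hypothetical presence estimate in [0, 1].

    Assumption: cosine similarity between the prompt embedding and a
    concept direction, passed through a sigmoid centred at `tau` with
    sharpness `beta`. The abstract does not specify the estimator.
    """
    denom = math.sqrt(dot(prompt_emb, prompt_emb) * dot(concept_dir, concept_dir))
    cos = dot(prompt_emb, concept_dir) / denom if denom > 0 else 0.0
    return 1.0 / (1.0 + math.exp(-beta * (cos - tau)))

def erase_concept(prompt_emb, concept_dir, tau=0.3, beta=20.0):
    """Scaled vector subtraction, weighted by the presence estimate.

    Assumption: the subtracted amount is the prompt's projection onto the
    concept direction, multiplied by the presence score, so prompts that
    do not contain the concept are left (almost) untouched.
    """
    cc = dot(concept_dir, concept_dir)
    if cc == 0:
        return list(prompt_emb), 0.0
    rho = presence_score(prompt_emb, concept_dir, tau, beta)
    scale = rho * dot(prompt_emb, concept_dir) / cc
    edited = [p - scale * c for p, c in zip(prompt_emb, concept_dir)]
    return edited, rho

# Toy example: the concept lies along the first axis.
concept = [1.0, 0.0, 0.0]
with_concept = [0.9, 0.1, 0.0]   # prompt that strongly expresses the concept
unrelated = [0.0, 1.0, 0.0]      # prompt orthogonal to the concept

edited_a, rho_a = erase_concept(with_concept, concept)
edited_b, rho_b = erase_concept(unrelated, concept)

# The presence score can double as a simple monitoring signal
# (the 0.5 cut-off is an arbitrary illustrative choice).
flagged = [rho > 0.5 for rho in (rho_a, rho_b)]
print(edited_a, rho_a)
print(edited_b, rho_b)
print(flagged)
```

In this toy setting the concept-bearing prompt loses nearly all of its component along the concept direction, while the unrelated prompt is returned unchanged, which mirrors the completeness and locality goals stated in the abstract. The `flagged` list illustrates, under the same assumptions, how monitoring presence scores could act as a detection signal.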
Supplementary Material: zip
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 13111