Abstract: Removing unwanted concepts from large-scale text-to-image (T2I) diffusion models while maintaining their overall generative quality remains an open challenge. This difficulty is especially pronounced in emerging paradigms, such as Stable Diffusion (SD) v3 and Flux, which incorporate flow matching and transformer-based architectures. These advancements limit the transferability of existing concept-erasure techniques that were originally designed for the previous T2I paradigm (e.g., SD v1.4). In this work, we introduce EraseAnything, the first method specifically developed to address concept erasure within the latest flow-based T2I framework. We formulate concept erasure as a bi-level optimization problem, employing LoRA-based parameter tuning and an attention map regularizer to selectively suppress undesirable activations. Furthermore, we propose a self-contrastive learning strategy to ensure that removing unwanted concepts does not inadvertently harm performance on unrelated ones. Experimental results demonstrate that EraseAnything successfully fills the research gap left by earlier methods in this new T2I paradigm, achieving state-of-the-art performance across a wide range of concept erasure tasks.
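The abstract names two generic building blocks, LoRA-based parameter tuning and an attention-map regularizer. The sketch below illustrates only those two generic ideas under stated assumptions. It is not the authors' implementation: it omits the bi-level optimization and the self-contrastive term. The functions `lora_weight`, `cross_attention`, and `attention_regularizer` are hypothetical names. Likewise, the choice to measure "undesirable activations" as the attention mass on the erased concept's text tokens is an assumption made for illustration. Plain Python is used so the example runs without a deep-learning framework.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain-Python matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Effective weight W + (alpha / r) * B @ A for a rank-r LoRA adapter.

    w: frozen base weight (d_out x d_in); b: (d_out x r); a: (r x d_in).
    Only a and b would be trained. With b initialised to zeros, the adapted
    layer starts out identical to the frozen one.
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(q: Matrix, k: Matrix) -> Matrix:
    """Row-wise softmax(Q K^T / sqrt(d)).

    Rows are image queries, columns are text-token keys.
    """
    d = len(q[0])
    scores = [[sum(qi[t] * kj[t] for t in range(d)) / math.sqrt(d) for kj in k]
              for qi in q]
    return [softmax(row) for row in scores]


def attention_regularizer(attn: Matrix, concept_tokens: List[int]) -> float:
    """Hypothetical regularizer.

    Computes the mean attention mass that image queries place on the text
    tokens of the concept to be erased. Minimising it during LoRA tuning
    would push that mass toward the remaining tokens.
    """
    return sum(sum(row[j] for j in concept_tokens) for row in attn) / len(attn)


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    a = [[0.5, 0.5]]                                            # rank r = 1
    print(lora_weight(base, a, [[0.0], [0.0]], alpha=1.0))      # zero-init B: unchanged
    print(lora_weight(base, a, [[1.0], [0.0]], alpha=1.0))
    attn = cross_attention([[2.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
    print(attention_regularizer(attn, concept_tokens=[0]))
```

Initialising B to zero follows the usual LoRA convention, so tuning starts from the original model. Any penalty like `attention_regularizer` would then be added to the training loss alongside whatever terms preserve unrelated concepts.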
Lay Summary: Removing unwanted concepts from text-to-image models without degrading everything else they can generate is difficult, especially for the latest flow-based models such as Stable Diffusion v3 and Flux. This paper introduces a new method called EraseAnything, the first tool specifically designed to tackle this problem in these models. EraseAnything works by carefully adjusting the model's internal settings using LoRA and by guiding the model's "attention" to suppress the unwanted concept. It also uses a self-contrastive learning strategy to ensure that removing an unwanted concept doesn't accidentally make the model worse at generating other, unrelated things.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/tomguluson92/eraseanything
Primary Area: Social Aspects->Safety
Keywords: Concept Erasing, Text2Image, Safety, Unlearning
Submission Number: 297