Keywords: unlearning, diffusion, concept erasure, safety
TL;DR: We introduce CARE (Co-occurring Associated Retained concepts) and propose ReCARE, a framework that preserves CARE during diffusion model unlearning, achieving robust erasure without sacrificing benign co-occurring concepts.
Abstract: Unlearning has emerged as a key technique to mitigate harmful content generation in diffusion models. However, existing methods often remove not only the target concept but also benign concepts that co-occur with it. For example, unlearning nudity can unintentionally suppress the concept of a person, preventing the model from generating images of people. We define these undesirably suppressed co-occurring concepts, which must be preserved, as $\textbf{CARE}$ ($\textbf{C}$o-occurring $\textbf{A}$ssociated $\textbf{RE}$tained concepts). We then introduce the $\textbf{CARE score}$, a general metric that directly quantifies their preservation across unlearning tasks. Building on this foundation, we propose $\textbf{ReCARE}$ ($\textbf{R}$obust $\textbf{e}$rasure for $\textbf{CARE}$), a framework that explicitly safeguards CARE while erasing only the target concept. ReCARE automatically constructs the CARE-set, a curated vocabulary of benign co-occurring tokens extracted from target images, and leverages this vocabulary during training for stable unlearning. Extensive experiments across various target concepts ($\textit{Nudity}$, $\textit{Van Gogh}$ style, and $\textit{Tench}$ object) demonstrate that ReCARE achieves state-of-the-art performance in balancing robust concept erasure, overall utility, and CARE preservation.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 16776