Keywords: Text-to-Image Diffusion-based Foundation Models, Erasing Concepts, Generative model
TL;DR: A novel method for removing entire subcategories from text-to-image diffusion models
Abstract: The emergence of large-scale text-to-image diffusion (T2ID) models has led to significant advancements in generating high-quality visual content from textual prompts. However, these powerful capabilities have also raised growing concerns about the generation of harmful and copyrighted material. While existing concept erasure techniques can effectively block the production of specific unwanted concepts from prompts, they often fall short when it comes to erasing an entire category (including subcategories) and are typically limited to handling only a few concepts at a time. In this paper, we introduce Subcategorical Unlearning via Regularized Erasure (SURE), a novel method for removing entire subcategories from text-to-image diffusion models using only a single parent category as the target. Unlike prior approaches, SURE does not rely on sets of synonyms. Instead, it employs concept space to discover and eliminate the target category while preserving the model's overall utility. To further enhance erasure, SURE incorporates Lipschitz regularization, which encourages smoother model responses to perturbations around the target category. Specifically, the regularization promotes consistent behavior in the model's latent space when exposed to slight variations of the category to be forgotten. This smoothness constraint aids in erasure while maintaining the model's ability to generate unrelated content. Extensive experiments conducted across three tasks (object removal, suppression of explicit content, and elimination of artistic styles) demonstrate that SURE achieves balanced performance in both effective category erasure and preservation of non-target concepts.
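The abstract describes a Lipschitz regularizer that encourages consistent latent responses to slight perturbations of the target category. A minimal sketch of such a penalty is below; the names (`model`, `target_emb`, `sigma`) and the exact formulation are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def lipschitz_penalty(model, target_emb, sigma=0.01, n_samples=4):
    """Penalize the ratio of output change to input perturbation size.

    Minimizing this ratio (a local Lipschitz estimate) smooths the
    model's response around the target concept embedding, which is the
    behavior the regularization described in the abstract promotes.
    """
    base = model(target_emb)
    penalty = 0.0
    for _ in range(n_samples):
        noise = sigma * torch.randn_like(target_emb)  # slight variation of the concept
        perturbed = model(target_emb + noise)
        penalty = penalty + (perturbed - base).norm() / noise.norm()
    return penalty / n_samples

# Toy usage with a stand-in linear "model" in place of a diffusion backbone:
model = torch.nn.Linear(8, 8)
emb = torch.randn(1, 8)
loss = lipschitz_penalty(model, emb)
```

In practice this term would be added to the erasure objective so that only the neighborhood of the target category is smoothed, leaving unrelated concepts untouched.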
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 6156