ARMOR: Conceptual Augmentation for Robust Multi-Concept Erasure in Stable Diffusion via Model Retrieval

05 Sept 2025 (modified: 19 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: Multi-Concept Erasure, Stable Diffusion, Model Retrieval
Abstract: Stable Diffusion enables high-quality synthesis but raises risks around copyright, misinformation, and explicit content. Concept erasure helps mitigate these risks by fine-tuning model weights, yet existing methods face two key challenges: (1) **Robustness**: erased concepts can be reconstructed via synonymous representations or adversarial attacks, and (2) **Multi-concept erasure**: training a single model to erase multiple concepts often strongly perturbs the weights, degrading general-purpose generation. To address these challenges, we introduce **ARMOR**, a novel framework that integrates **conceptual augmentation** with a **model retrieval** approach for robust multi-concept erasure. Our method introduces two key innovations. First, we propose a conceptual augmentation technique that distils visual concepts into the text modality for more effective and robust fine-tuning. Second, for each concept, we fine-tune the cross-attention key/value projection layers to obtain a dedicated eraser, and employ a retrieval mechanism that dynamically selects the appropriate erasers at inference, achieving a superior removal–generation trade-off. Extensive experiments demonstrate that ARMOR outperforms prior work on challenging multi-concept erasure tasks, resists red-team attacks, and achieves the best CLIPScore gaps, with **at least 10\% gains over the second-best method** across four tasks.
Primary Area: generative models
Submission Number: 2391