Dormant Memories Undermine Safety: Initial Latent Variable Optimization for Attacking Unlearned Diffusion
Keywords: Unlearned Diffusion Models, Adversarial Attack, NSFW Content Generation, Latent Optimization
TL;DR: This paper proposes IVO, a latent space attack framework that bypasses internal defenses of unlearned diffusion models to generate NSFW content with semantic consistency.
Abstract: Although diffusion models (DMs) have advanced image synthesis, they pose risks of generating Not-Safe-For-Work (NSFW) content. Recent unlearning-based defenses claim to eliminate NSFW concepts and show promise in defending against traditional attacks. However, we analyze unlearned models from a new perspective and reveal a key insight: unlearning does not actually erase unsafe concepts; it only disrupts the mapping between linguistic symbols and the corresponding knowledge. The knowledge itself remains intact, preserved as **dormant memories**. We further show that the distributional discrepancy in the denoising process serves as a measurable indicator of how much of the mapping is retained, reflecting the strength of unlearning. Inspired by this, we propose **IVO** (**I**nitial Latent **V**ariable **O**ptimization), a concise yet powerful attack framework that reactivates these dormant memories by reconstructing the broken mappings. IVO uses optimized initial latent variables as triggers to align the noise distribution of unlearned models with that of standard DMs while steering it toward NSFW content. It operates in three simple stages: *Image Inversion*, *Adversarial Optimization*, and *Reused Attack*. Extensive experiments across 6 widely used unlearning techniques demonstrate that IVO achieves the highest attack success rates while maintaining strong semantic consistency, indicating that dormant memories remain exploitable and exposing fundamental flaws in current defenses. The code is available at anonymous.4open.science/r/IVO/. **Warning**: This paper contains unsafe images that may offend some readers.
Primary Area: generative models
Submission Number: 6494