UnGuide: Learning to Forget with LoRA-Guided Diffusion Models

15 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, Readers: Everyone, CC BY 4.0
Keywords: Unlearning, Diffusion Models, LoRA
TL;DR: UnGuide is an adaptive guidance-based method for machine unlearning in diffusion models that selectively removes targeted concepts while preserving image quality by balancing outputs from a base model and a LoRA adapter.
Abstract: Recent advances in large-scale text-to-image diffusion models have heightened concerns about their potential misuse, especially in generating harmful or misleading content. This underscores the urgent need for effective machine unlearning, i.e., removing specific knowledge or concepts from pretrained models without compromising overall performance. One possible approach is Low-Rank Adaptation (LoRA), which offers an efficient means to fine-tune models for targeted unlearning. However, LoRA often inadvertently alters unrelated content, diminishing image fidelity and realism. To address this limitation, we introduce UnGuide, a novel LoRA-guided model that controls the unlearning process. UnGuide modulates the guidance scale based on the stability of the first few denoising steps. For high-variance denoising trajectories, negative guidance is applied to stabilize sampling along the data manifold, while low-variance trajectories receive positive guidance to maintain fidelity. Empirical results demonstrate that UnGuide achieves controlled concept removal while retaining the expressive power of diffusion models, outperforming existing LoRA-based methods in both object erasure and explicit content removal tasks.
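The variance-dependent guidance described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the stability measure, the threshold, the scale values, or the exact combination rule, so the function names, the mean per-element variance statistic, `threshold`, `w_pos`, `w_neg`, and the linear blend of base and LoRA noise predictions are all hypothetical.

```python
import numpy as np

def trajectory_variance(early_preds):
    """Mean per-element variance across the first few noise predictions.

    Assumption: stability is measured as variance over early denoising steps.
    """
    stacked = np.stack(early_preds, axis=0)  # shape: (steps, ...)
    return float(stacked.var(axis=0).mean())

def select_guidance_scale(variance, threshold=0.05, w_pos=1.0, w_neg=-1.0):
    """Pick a negative scale for unstable trajectories, positive otherwise.

    Assumption: a single hard threshold; all default values are illustrative.
    """
    return w_neg if variance > threshold else w_pos

def guided_prediction(eps_base, eps_lora, w):
    """Blend base and LoRA-adapted predictions with guidance scale w.

    Assumption: a classifier-free-guidance-style linear combination.
    """
    return eps_base + w * (eps_lora - eps_base)

# Toy usage: identical early predictions (stable) vs. alternating ones (unstable).
stable = [np.ones((2, 2)) for _ in range(4)]
unstable = [np.full((2, 2), float(i % 2)) for i in range(4)]

w_stable = select_guidance_scale(trajectory_variance(stable))
w_unstable = select_guidance_scale(trajectory_variance(unstable))
```

In this toy setup the stable trajectory receives the positive scale and the unstable one the negative scale, mirroring the behaviour the abstract describes; a real sampler would apply `guided_prediction` at each denoising step with predictions from the base model and the LoRA adapter.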
Primary Area: generative models
Submission Number: 5959