TL;DR: We propose a highly efficient machine unlearning method for foundation models that requires only a one-time gradient computation and a partial model modification (a single layer).
Abstract: Machine unlearning methods aim to remove sensitive or unwanted content from trained models, but they typically demand extensive model updates at significant computational cost and may degrade performance on both related and unrelated tasks. We propose Single Layer Unlearning Gradient (SLUG), an efficient method that unlearns targeted information by updating a single critical layer using a one-time gradient computation. SLUG uses layer importance and gradient alignment metrics to identify the optimal layer for targeted information removal while preserving model utility. We demonstrate the effectiveness of SLUG for CLIP, Stable Diffusion, and vision-language models (VLMs) in removing concrete concepts (e.g., identities and objects) and abstract ones (e.g., artistic styles). On the UnlearnCanvas benchmark, SLUG achieves unlearning performance comparable to existing methods while requiring significantly fewer computational resources. Our approach offers a practical solution for targeted unlearning that is both computationally efficient and precise. Our code is available at https://github.com/CSIPlab/SLUG.
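To make the core idea concrete, below is a minimal sketch of the single-layer unlearning procedure described in the abstract. The specific importance and alignment formulas (relative gradient magnitude and forget/retain gradient cosine similarity), the layer-selection rule, and the step size are illustrative assumptions, not the exact definitions from the paper; it also assumes every trainable parameter contributes to both losses.

```python
# Hypothetical sketch of single-layer unlearning via a one-time gradient computation.
# Formulas and the selection heuristic below are assumptions for illustration only.
import torch
import torch.nn.functional as F


def compute_layer_stats(model, forget_loss, retain_loss):
    """One-time gradient computation: score each layer for targeted unlearning."""
    params = {n: p for n, p in model.named_parameters() if p.requires_grad}
    g_forget = torch.autograd.grad(forget_loss, list(params.values()), retain_graph=True)
    g_retain = torch.autograd.grad(retain_loss, list(params.values()))

    stats = {}
    for (name, p), gf, gr in zip(params.items(), g_forget, g_retain):
        importance = gf.norm() / (p.norm() + 1e-12)  # assumed: relative forget-gradient magnitude
        alignment = F.cosine_similarity(gf.flatten(), gr.flatten(), dim=0)  # assumed: forget/retain overlap
        stats[name] = (importance.item(), alignment.item(), gf)
    return stats


def single_layer_update(model, stats, step_size=1.0):
    """Modify only the selected layer with a single gradient step."""
    # Assumed rule: prefer layers with high importance for forgetting
    # and low alignment with the retain gradient (to preserve utility).
    target = max(stats, key=lambda n: stats[n][0] - abs(stats[n][1]))
    with torch.no_grad():
        dict(model.named_parameters())[target].sub_(step_size * stats[target][2])
    return target
```

A usage example would compute `forget_loss` on the data to be unlearned and `retain_loss` on held-out data, call `compute_layer_stats` once, and then apply `single_layer_update`; no further gradient computations or training iterations are needed under this sketch.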
Lay Summary: Modern generative models can be misused to create misinformation through celebrity impersonation, unauthorized copying of artwork, or imitation of artistic styles. To tackle this, researchers use machine unlearning, which removes specific knowledge from a model while keeping its original capabilities. However, existing methods typically require many update steps and repeated gradient calculations, which are computationally costly.
We propose Single Layer Unlearning Gradient (SLUG), a highly efficient alternative. Instead of retraining the whole model, SLUG finds one key layer of the model to update, using metrics that identify where the targeted information is stored. With just a single gradient calculation and a one-step update, SLUG removes the unwanted information while preserving the model's ability to perform other tasks.
We show that SLUG works well on popular models such as CLIP and Stable Diffusion, effectively forgetting specific identities, objects, or styles with far less computation than traditional methods. Our code is publicly available at https://github.com/CSIPlab/SLUG.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/CSIPlab/SLUG
Primary Area: Deep Learning->Robustness
Keywords: Machine unlearning, foundation models, CLIP, vision-language model (VLM), Stable Diffusion, privacy protection, copyright protection, trustworthy and safe machine learning
Submission Number: 7262