Targeted Unlearning with Single Layer Unlearning Gradient

Published: 12 Oct 2024, Last Modified: 14 Nov 2024, Venue: SafeGenAi Poster, License: CC BY 4.0
Keywords: Generative AI, machine unlearning, efficient unlearning
TL;DR: We propose a highly efficient machine unlearning method for generative models that requires only a one-time gradient calculation.
Abstract: The unauthorized generation of privacy-related and copyright-infringing content using generative AI is becoming a significant concern for society, raising ethical, legal, and privacy issues that demand urgent attention. Recent machine unlearning techniques attempt to eliminate the influence of sensitive content used during model training, but they often require extensive updates to the model, reduce the model's utility on unrelated content, or incur substantial computational costs. In this work, we propose a novel and efficient method called Single Layer Unlearning Gradient (SLUG), which can unlearn targeted information by updating a single targeted layer of a model using a one-time gradient computation. We introduce two metrics, layer importance and gradient alignment, to identify the appropriate layers for unlearning targeted information. Our method is highly modular and enables selective removal of multiple concepts from the generated outputs of widely used foundation models (e.g., CLIP), generative models (e.g., Stable Diffusion), and vision-language models. Our method is effective on a broad spectrum of concepts, ranging from concrete (e.g., celebrity names, intellectual property figures, and objects) to abstract (e.g., novel concepts and artistic styles). Our code is available at https://github.com/CSIPlab/SLUG.
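The single-layer, one-time-gradient idea in the abstract can be illustrated with a small toy sketch. The abstract names the two metrics but does not define them, so the definitions below are assumptions made for illustration only: layer importance is taken as the gradient norm relative to the parameter norm, and gradient alignment as the cosine similarity between a forget-set gradient and a retain-set gradient. The selection rule in `pick_layer`, the step size, and all function and variable names are likewise hypothetical and are not taken from the SLUG code base.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    denom = norm(u) * norm(v)
    return sum(a * b for a, b in zip(u, v)) / denom if denom else 0.0

def layer_scores(params, forget_grads, retain_grads):
    """Return {layer: (importance, alignment)} under the ASSUMED definitions."""
    scores = {}
    for name, theta in params.items():
        g_f, g_r = forget_grads[name], retain_grads[name]
        importance = norm(g_f) / norm(theta)  # assumed: relative gradient size
        alignment = cosine(g_f, g_r)          # assumed: forget/retain cosine
        scores[name] = (importance, alignment)
    return scores

def pick_layer(scores):
    # Hypothetical rule: favor high importance and low |alignment|, so the
    # update matters for the forget set but interferes little with the rest.
    return max(scores, key=lambda n: scores[n][0] * (1.0 - abs(scores[n][1])))

def single_layer_update(params, forget_grads, layer, step):
    """Step one layer along the precomputed gradient; other layers are untouched.

    Assumes forget_grads is the gradient of an unlearning objective that is
    minimized, hence the minus sign.
    """
    updated = dict(params)
    updated[layer] = [t - step * g
                      for t, g in zip(params[layer], forget_grads[layer])]
    return updated

# Toy data: two "layers", each flattened to a two-element parameter vector.
params = {"layer_a": [1.0, 0.0], "layer_b": [1.0, 1.0]}
forget_grads = {"layer_a": [0.0, 2.0], "layer_b": [0.1, 0.1]}  # computed once
retain_grads = {"layer_a": [1.0, 0.0], "layer_b": [1.0, 1.0]}

scores = layer_scores(params, forget_grads, retain_grads)
chosen = pick_layer(scores)
updated = single_layer_update(params, forget_grads, chosen, step=0.5)
```

In this toy setting `layer_a` is chosen because its forget gradient is large relative to its weights and orthogonal to its retain gradient. The gradients are computed once and reused, so trying a different step size costs only a vector addition on the chosen layer.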
Submission Number: 76