Targeted Unlearning via Single Layer Unlearning Gradient

Zikui Cai; Yaoteng Tan; M. Salman Asif

Targeted Unlearning via Single Layer Unlearning Gradient

Zikui Cai, Yaoteng Tan, M. Salman Asif

18 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Machine unlearning, multi-modality, CLIP, vision-language model (VLM), stable diffusion, privacy protection, copyright protection, trustworthy and safe machine learning

TL;DR: We propose an efficient unlearning method for targeted information removal from multi-modal foundation models using single layer update with one-time gradient calculation.

Abstract: The unauthorized generation of privacy-related and copyright-infringing content using generative-AI is becoming a significant concern for society, raising ethical, legal, and privacy issues that demand urgent attention. Recently, machine unlearning techniques have arisen that attempt to eliminate the influence of sensitive content used during model training, but they often require extensive updates in the model, reduce the utility of the models for unrelated content, and/or incur substantial computational costs. In this work, we propose a novel and efficient method called Single Layer Unlearning Gradient (SLUG), that can unlearn targeted information by updating a single targeted layer of a model using a one-time gradient computation. We introduce two metrics: layer importance and gradient alignment, to identify the appropriate layers for unlearning targeted information. Our method is highly modular and enables selective removal of multiple concepts from the generated outputs of widely used foundation models (e.g., CLIP), generative models (e.g., Stable Diffusion) and Vision-Language models. Our method shows effectiveness on a broad spectrum of concepts ranging from concrete (e.g., celebrity name, intellectual property figure, and object) to abstract (e.g., novel concept and artistic style). Our method also exhibits state-of-the-art efficiency with effective unlearning and retention on the comprehensive benchmark UnlearnCanvas. Our code is available at https://anonymous.4open.science/r/SLUG-6CDF

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 1663

Loading