TL;DR: We investigate the practical challenge of LLM continual unlearning and propose the ALKN algorithm, which mitigates both cumulative decline and cascading degradation in model utility by adaptively refining gradient updates.
Abstract: As large language models (LLMs) are deployed across increasingly diverse domains, concerns regarding their safety have intensified.
LLM unlearning has emerged as a pivotal approach to removing harmful or unlawful content while maintaining utility.
Despite growing interest, the challenges of continual unlearning, which is common in real-world scenarios, remain underexplored: successive unlearning tasks often compound utility degradation.
To unlearn targeted knowledge effectively while preserving LLM utility, parameter changes should be kept minimal: only the parameters linked to the target knowledge should be updated, leaving other knowledge unaffected.
Building on the task vector framework, we propose a new method named ALKN (Adaptive Localization of Knowledge Negation), which sparsifies training gradients with dynamic masking and adaptively adjusts unlearning intensity according to inter-task relationships.
Comprehensive experiments across three well-established LLM unlearning datasets demonstrate that our approach consistently outperforms baseline methods in both unlearning effectiveness and utility retention under continual unlearning settings.
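To make the two ingredients of the abstract concrete, the following is a minimal, hypothetical sketch of masked gradient ascent for unlearning: a magnitude-based dynamic mask sparsifies the forget-loss gradient, and the step size is damped when the current mask overlaps masks from earlier unlearning tasks. The specific choices here (the `top_frac` threshold, the overlap-based damping rule, and the bookkeeping of previous masks) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: sparsified gradient ascent for continual unlearning.
# The masking rule and intensity schedule below are assumptions for illustration,
# not the ALKN algorithm as described in the paper.
import torch

def dynamic_mask(grad: torch.Tensor, top_frac: float = 0.05) -> torch.Tensor:
    """Keep only the largest-magnitude gradient entries (assumed sparsification rule)."""
    k = max(1, int(grad.numel() * top_frac))
    threshold = grad.abs().flatten().topk(k).values.min()
    return (grad.abs() >= threshold).float()

def adaptive_intensity(mask: torch.Tensor, prev_masks: list[torch.Tensor],
                       base_lr: float = 1e-5) -> float:
    """Damp the unlearning step when the current mask overlaps earlier tasks
    (one possible way to model inter-task relationships)."""
    if not prev_masks:
        return base_lr
    overlap = max(float((mask * m).sum() / mask.sum().clamp(min=1)) for m in prev_masks)
    return base_lr * (1.0 - 0.5 * overlap)

def unlearning_step(model: torch.nn.Module, forget_loss: torch.Tensor,
                    prev_masks: dict[str, list[torch.Tensor]]) -> None:
    """Ascend on the forget loss, updating only the masked (sparsified) entries."""
    model.zero_grad()
    forget_loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            mask = dynamic_mask(p.grad)
            lr = adaptive_intensity(mask, prev_masks.get(name, []))
            p.add_(lr * mask * p.grad)  # gradient ascent on the forget objective
            prev_masks.setdefault(name, []).append(mask)
```

In this sketch, sparsifying the update confines changes to parameters implicated by the forget-loss gradient, and the overlap-based damping is one plausible way to avoid repeatedly perturbing parameters already modified by earlier unlearning tasks.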
Lay Summary: Large language models (LLMs), like those powering chatbots, can sometimes store sensitive or unwanted information, raising safety and privacy concerns. This research introduces a new method called Adaptive Localization of Knowledge Negation (ALKN) to help these models "forget" specific information, such as personal data or harmful content, while keeping their overall usefulness intact. Unlike previous approaches that can harm a model’s performance when repeatedly asked to forget things, ALKN carefully targets only the relevant parts of the model to update, avoiding unnecessary changes. It also adjusts how strongly it forgets based on whether the information is already partially forgotten, preventing excessive loss of the model’s abilities. Tested on real-world scenarios, ALKN successfully removes unwanted information while preserving over 95% of the model’s performance, outperforming other methods. This makes it a promising tool for keeping LLMs safe and effective as they handle ongoing requests to forget specific data.
Primary Area: Deep Learning->Large Language Models
Keywords: Large language model, Continual learning, Unlearning, Robustness
Submission Number: 3125