GRU: Mitigating the Trade-off between Unlearning and Retention for LLMs

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
Abstract: Large language model (LLM) unlearning has demonstrated its essential role in removing privacy- and copyright-related responses, which is crucial for the legal and safe application of LLMs. However, the pursuit of complete unlearning often comes at a substantial cost, compromising the model's general functionality and leading to a notorious trade-off between unlearning and retention. This motivates us to explore enhanced unlearning schemes that can mitigate this trade-off. Specifically, we propose Gradient Rectified Unlearning (GRU), an improved framework that regulates the directions of gradient updates during the unlearning procedure so that their side effects on other, unrelated responses are minimized. GRU is simple and general to implement, and it demonstrates practical effectiveness across a variety of well-established unlearning benchmarks.
Lay Summary: Large language models (LLMs) are typically trained on web-scale corpora, which carry the risk of learning private and harmful knowledge. This knowledge can then be exposed to users, raising many legal and policy concerns. These issues have motivated recent studies on LLM unlearning, which aim to remove such harmful knowledge from the model's parameters, a pursuit of notable academic and industrial significance. However, in practice, unlearning often severely degrades performance on unrelated responses, diminishing the overall utility of the model. This highlights a notorious trade-off between removing harmful knowledge and retaining overall performance. To address this issue, we propose Gradient Rectified Unlearning (GRU), a method that controls model updates by eliminating gradient directions that negatively impact overall performance. This insight is formalized as a constrained optimization problem, which admits a closed-form solution that can be implemented efficiently. We conduct extensive experiments on several widely used benchmarks, demonstrating that GRU can effectively remove targeted knowledge while preserving overall model capabilities. Our results show the reliability of GRU in mitigating the trade-off between unlearning and retention, making it a promising method that warrants further study.
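To make the idea concrete, below is a minimal sketch of a projection-style gradient rectification consistent with the description above: when the gradient of the unlearning objective conflicts with the gradient of the retention objective, the conflicting component is removed before the parameter update. This is an illustrative assumption in the spirit of gradient-projection methods, not necessarily the paper's exact closed-form rule; the function name `rectified_gradient` and the tensors `g_forget` and `g_retain` are hypothetical.

```python
import torch

def rectified_gradient(g_forget: torch.Tensor, g_retain: torch.Tensor) -> torch.Tensor:
    """Illustrative gradient rectification (assumed sketch, not the authors' exact rule).

    g_forget: flattened gradient of the unlearning (forget) objective.
    g_retain: flattened gradient of the retention objective.

    If the two gradients conflict (negative inner product), project out the
    component of g_forget along g_retain, so the unlearning step no longer
    pushes the parameters in a direction that harms retained behavior.
    """
    dot = torch.dot(g_forget, g_retain)
    if dot < 0:  # conflict: this unlearning step would degrade retention
        g_forget = g_forget - (dot / (g_retain.norm() ** 2 + 1e-12)) * g_retain
    return g_forget

# Hypothetical usage inside an unlearning loop: compute both gradients on the
# forget and retain data, rectify the unlearning gradient as above, then write
# the result back into the model's .grad buffers before optimizer.step().
```

In this sketch the rectified direction is, by construction, non-negative in inner product with the retention gradient, which is one simple way to realize the constrained-optimization view described in the summary.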
Primary Area: Deep Learning->Large Language Models
Keywords: Large Language Model Unlearning
Flagged For Ethics Review: true
Submission Number: 915