Keywords: machine unlearning, second-order unlearning
Abstract: Machine unlearning enables AI practitioners to comply with data owners' ``Right to be Forgotten'' and to filter sensitive, noisy, or malicious data from trained models post hoc. As a theoretically justified algorithm, Newton unlearning has been used in prior work to rigorously unlearn data from selected models, eliminating the need for expensive retraining. However, we find that Newton unlearning is highly sensitive to the Hessian degeneracy phenomenon in trained neural networks, including large language models (LLMs), which degrades unlearning performance. To address this challenge, we propose two new unlearning algorithms, CuReNU and CuReNUS, that tackle Hessian degeneracy in a principled manner via cubic regularization, and we discuss their convergence guarantees. As a stochastic variant of CuReNU, CuReNUS offers an efficient second-order unlearning algorithm that is applicable even at the scale of LLMs. We demonstrate that CuReNUS achieves unlearning performance comparable to state-of-the-art empirical algorithms across diverse settings, including batch and the more challenging sequential unlearning.
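To illustrate why cubic regularization helps with Hessian degeneracy, consider a one-dimensional sketch (illustrative only; the function names and the regularization weight `sigma` are hypothetical, and the paper's actual CuReNU/CuReNUS updates operate on the full model Hessian). The plain Newton step `-g/h` explodes as the curvature `h` approaches zero, whereas the minimizer of the cubic-regularized model stays bounded:

```python
import math

def newton_step(g, h):
    # Plain Newton step on a 1-D quadratic model: diverges as h -> 0
    # (the scalar analogue of a degenerate Hessian).
    return -g / h

def cubic_reg_step(g, h, sigma):
    # Closed-form minimizer of the 1-D cubic-regularized model
    #   m(s) = g*s + (h/2)*s**2 + (sigma/6)*|s|**3,
    # written in a rationalized form that is stable for small h.
    # As sigma -> 0 with h > 0, this recovers the Newton step -g/h.
    return -2.0 * g / (h + math.sqrt(h * h + 2.0 * sigma * abs(g)))

g, h = 0.5, 1e-8  # gradient with near-degenerate curvature
print(abs(newton_step(g, h)))              # enormous update
print(abs(cubic_reg_step(g, h, sigma=1.0)))  # bounded update
```

The cubic term penalizes large steps, so the update magnitude is controlled by `sigma` rather than by the (possibly vanishing) curvature, which is the core idea behind cubic-regularized Newton methods.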
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 16110