Abstract: Growing concerns over user privacy and data security have drawn attention to machine unlearning (MU), which aims to remove the influence of specific data from a well-trained model both effectively and efficiently. A naive unlearning method finetunes the pretrained model on the remaining data, relying on “catastrophic forgetting” to erase the influence of the forgetting data. However, such unlearning often proves inefficient. For effective and efficient unlearning, it is crucial to stimulate catastrophic forgetting, ideally by directly localizing the model’s knowledge of the class-wise features associated with the forgetting data. In this paper, we highlight that a targeted universal adversarial perturbation (UAP) implicitly encodes class-wise information. In light of this, we propose Unlearning by UAP (U\(^{2}\)AP). By adding the perturbation to clean remaining data during finetuning, we shift the model’s attention away from the forgetting class directly, stimulating faster and more efficient catastrophic forgetting. Extensive experiments demonstrate that U\(^{2}\)AP enables quicker and more accurate forgetting while maintaining model performance on the remaining data.
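The abstract's mechanism — craft a targeted UAP toward the forgetting class, then finetune on perturbed remaining data with their true labels — can be sketched on a toy softmax classifier. This is a minimal illustrative sketch, not the paper's implementation: the synthetic data, the softmax model, and hyperparameters such as the perturbation budget `eps` and step sizes are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3-class Gaussian data; class 2 plays the "forgetting" class.
n, d, k = 150, 2, 3
means = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
X = np.vstack([rng.normal(means[c], 0.5, size=(n, d)) for c in range(k)])
y = np.repeat(np.arange(k), n)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grad_step(W, b, X, y, lr=0.1):
    # One cross-entropy gradient step for a linear softmax classifier.
    p = softmax(X @ W + b)
    p[np.arange(len(y)), y] -= 1.0  # dL/dlogits
    W -= lr * X.T @ p / len(y)
    b -= lr * p.mean(axis=0)

# Pretrain on all classes.
W, b = np.zeros((d, k)), np.zeros(k)
for _ in range(300):
    grad_step(W, b, X, y)

forget_cls = 2
remain = y != forget_cls
Xr, yr = X[remain], y[remain]

# Craft a targeted UAP: one shared perturbation delta that pushes the
# remaining inputs toward the forgetting class (eps budget is illustrative).
eps, delta = 1.0, np.zeros(d)
for _ in range(100):
    p = softmax((Xr + delta) @ W + b)
    # Gradient of mean log-probability of the forgetting class w.r.t. input.
    g = ((np.eye(k)[forget_cls] - p) @ W.T).mean(axis=0)
    delta += 0.05 * g
    norm = np.linalg.norm(delta)
    if norm > eps:
        delta *= eps / norm  # project back onto the L2 ball

# Unlearning finetune: perturbed remaining data, true remaining labels,
# steering the model's attention away from forgetting-class features.
for _ in range(300):
    grad_step(W, b, Xr + delta, yr)

acc_remain = (softmax(Xr @ W + b).argmax(1) == yr).mean()
forget_rate = (softmax(X[~remain] @ W + b).argmax(1) != forget_cls).mean()
print(acc_remain, forget_rate)
```

In this toy setup, accuracy on the remaining classes stays high while the forgetting-class samples are no longer predicted as their original class, mirroring the forget/retain trade-off the abstract describes.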
External IDs: dblp:conf/pkdd/ZhouCWYH25