Keywords: machine unlearning, conformal prediction, uncertainty quantification, data privacy
Abstract: Machine unlearning seeks to remove the influence of specified data from a trained model. While metrics such as unlearning accuracy (UA) and membership inference attack (MIA) success rate provide baselines for assessing unlearning performance, they fall short of evaluating the reliability of forgetting. In this paper, we find that, from an uncertainty quantification perspective, forget-set points that UA and MIA deem successfully forgotten (i.e., misclassified) often still have their ground-truth labels included in the conformal prediction set, which raises the issue of fake forgetting. To address this issue, we propose two novel metrics inspired by conformal prediction that provide a more reliable evaluation of forgetting quality. Building on these insights, we further propose an unlearning framework that integrates conformal prediction into the Carlini & Wagner adversarial attack loss, effectively pushing the ground-truth label out of the conformal prediction set. Through extensive experiments on image classification tasks, we demonstrate both the effectiveness of our proposed metrics and the superiority of our framework. Code is available at https://anonymous.4open.science/r/MUCP-60E4.
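The abstract only summarizes the idea, but its core mechanics admit a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the paper's actual formulation: the function names (`calibrate_threshold`, `true_label_in_set`, `cp_cw_unlearning_loss`), the split-conformal nonconformity score `1 - softmax`, and the hinge margin `kappa` are all our assumptions. It shows (i) split-conformal calibration of a prediction-set threshold, (ii) a coverage-style check of whether a forget-set point's ground-truth label remains in the set (the "fake forgetting" signal), and (iii) a C&W-style hinge loss that pushes the true label's score below the threshold.

```python
import torch
import torch.nn.functional as F

def calibrate_threshold(cal_logits, cal_labels, alpha=0.1):
    """Split-conformal calibration with nonconformity score
    1 - softmax(true label) on a held-out calibration set.
    Returns the softmax-probability threshold defining the prediction set."""
    probs = F.softmax(cal_logits, dim=-1)
    scores = 1.0 - probs[torch.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    # Finite-sample-adjusted (1 - alpha) conformal quantile.
    q_level = min(1.0, (n + 1) * (1.0 - alpha) / n)
    q_hat = torch.quantile(scores, q_level)
    return 1.0 - q_hat  # include every label with softmax prob >= threshold

def true_label_in_set(logits, labels, threshold):
    """Metric sketch: is the ground-truth label still covered by the
    conformal prediction set? On the forget set, coverage despite
    misclassification indicates fake forgetting; lower coverage is better."""
    probs = F.softmax(logits, dim=-1)
    return probs[torch.arange(len(labels)), labels] >= threshold

def cp_cw_unlearning_loss(logits, labels, threshold, kappa=0.0):
    """C&W-style hinge that pushes the true label's softmax score below
    the conformal threshold, i.e., out of the prediction set."""
    probs = F.softmax(logits, dim=-1)
    true_score = probs[torch.arange(len(labels)), labels]
    return torch.clamp(true_score - threshold + kappa, min=0.0).mean()
```

In an unlearning loop, one would presumably minimize `cp_cw_unlearning_loss` on the forget set, possibly alongside a retention objective on the retain set; the paper's actual score function, set construction, and loss weighting may differ.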
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 12799