Machine unlearning aims to eliminate the influence of specific data on a trained model. Although metrics such as unlearning accuracy (UA) and membership inference attack (MIA) are commonly used to evaluate forgetting quality, they fall short of capturing the reliability of forgetting. In this work, we observe that even when forgotten data are misclassified according to UA and MIA, their ground-truth labels can still remain within the predictive set from an uncertainty-quantification perspective, revealing a "fake unlearning" issue. To better assess forgetting quality, we propose two novel metrics inspired by conformal prediction that offer a more faithful evaluation of forgetting reliability. Building on these insights, we further introduce a conformal prediction-guided unlearning framework that integrates the Carlini & Wagner adversarial loss and effectively encourages the exclusion of ground-truth labels from the conformal prediction set. Extensive experiments on image classification tasks demonstrate the effectiveness of our proposed metrics, and by incorporating the tailored loss term, our unlearning framework improves the UA of existing unlearning methods by an average of 6.6%.
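As a rough illustration of the "fake unlearning" check described above, the sketch below uses standard split conformal prediction with the common 1 − p(y) nonconformity score: a threshold is calibrated on held-out data, prediction sets are formed for forget-set samples, and we measure how often the ground-truth label survives in the set even if the top-1 prediction is wrong. The function names, score choice, and `alpha` are our own assumptions for exposition, not the paper's exact construction.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from calibration softmax outputs.

    Nonconformity score: 1 - p(true label). The threshold is the
    ceil((n + 1) * (1 - alpha)) / n empirical quantile of the scores.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, q_level, method="higher")

def prediction_sets(probs, qhat):
    """Boolean mask of classes whose score 1 - p(y) is within the threshold."""
    return (1.0 - probs) <= qhat

def fake_unlearning_rate(forget_probs, forget_labels, qhat):
    """Fraction of forget-set samples whose ground-truth label remains
    inside the conformal prediction set, despite top-1 misclassification
    counted by UA. High values suggest the forgetting is unreliable."""
    sets = prediction_sets(forget_probs, qhat)
    covered = sets[np.arange(len(forget_labels)), forget_labels]
    return covered.mean()

# Toy usage with random softmax outputs (stand-ins for a real model).
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(10), size=500)
cal_labels = rng.integers(0, 10, size=500)
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)

forget_probs = rng.dirichlet(np.ones(10), size=100)
forget_labels = rng.integers(0, 10, size=100)
print(f"fake-unlearning rate: {fake_unlearning_rate(forget_probs, forget_labels, qhat):.2f}")
```

Under this reading, a sample that UA counts as forgotten (misclassified) can still have its true label covered by the prediction set, which is the gap the proposed conformal metrics and loss term target.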