MIAU: Membership Inference Attack Unlearning Score for Quantifying the Forgetting Quality of Unlearning Methods

ICLR 2026 Conference Submission7936 Authors

16 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: Machine Unlearning, Privacy Evaluation, Forgetting Quality, Membership Inference Attacks

TL;DR: We propose the Membership Inference Attack Unlearning Score (MIAU), a principled metric that quantifies how closely an unlearning method approximates the behavior of a fully retrained model.

Abstract: Machine unlearning aims to adapt a model's internal representations as if the forget set had never been part of the training set. A central challenge is accurately evaluating whether forgetting has actually occurred. Membership Inference Attacks (MIAs) are commonly used for this purpose; however, existing approaches are limited, often relying on a single comparison and lacking reference points such as the performance of the baseline and retrained models. We propose the Membership Inference Attack Unlearning Score (MIAU), a systematic metric that quantifies how closely an unlearning method mirrors the behavior of a fully retrained model. MIAU evaluates the unlearned model by measuring how easily an attack can separate three pairs of data: forgotten samples versus test samples, forgotten samples versus retained samples, and retained samples versus test samples. These comparisons are then normalized between the performance of the original model and that of the fully retrained model, yielding an interpretable and balanced score of unlearning quality. MIAU is intended as an offline auditing benchmark for selecting the most suitable unlearning method for a given model setup and application setting; once this choice is made, the method can be applied in practice without any additional retraining. Extensive experiments demonstrate that MIAU consistently distinguishes effective unlearning methods across various image classification benchmarks and model architectures. Further statistical tests and empirical evaluations on retrained models (trained on 25%, 50%, and 75% of the forget set) highlight inherent limitations of MIAs in capturing gradual forgetting, underscoring the need for complementary evaluation methods in unlearning assessment.
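
The abstract gives the ingredients of the score (three pairwise separability measurements, normalized between the original and retrained models) but not its exact formula. The sketch below is one plausible reading, not the authors' definition. It assumes that separability is measured as an AUC over per-sample attack scores, that each pair is normalized linearly and clipped to [0, 1], and that the three pairs are averaged with equal weight. The names `pairwise_auc`, `normalized_gap`, and `miau_like_score` are hypothetical.

```python
from statistics import mean


def pairwise_auc(scores_a, scores_b):
    """Probability that a random attack score from A exceeds one from B (ties count 0.5)."""
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in scores_a
        for b in scores_b
    )
    return wins / (len(scores_a) * len(scores_b))


def normalized_gap(unlearned, original, retrained, eps=1e-12):
    """Assumed normalization: 1.0 if the unlearned model matches the retrained
    reference, 0.0 if it is as far from it as the original model (or farther)."""
    span = abs(original - retrained)
    if span < eps:
        # Original and retrained agree, so there is no gap to close on this pair.
        return 1.0 if abs(unlearned - retrained) < eps else 0.0
    return min(1.0, max(0.0, 1.0 - abs(unlearned - retrained) / span))


def miau_like_score(separability):
    """Equal-weight mean over data pairs (an assumption, not the paper's formula).

    separability maps a pair name to the separability measured on the
    "original", "retrained", and "unlearned" models.
    """
    return mean(
        normalized_gap(v["unlearned"], v["original"], v["retrained"])
        for v in separability.values()
    )


# Illustrative, made-up separability values for the three pairs in the abstract.
example = {
    "forget_vs_test": {"original": 0.80, "retrained": 0.50, "unlearned": 0.56},
    "forget_vs_retain": {"original": 0.50, "retrained": 0.70, "unlearned": 0.65},
    "retain_vs_test": {"original": 0.80, "retrained": 0.80, "unlearned": 0.80},
}
score = miau_like_score(example)
```

Under these assumptions, a score near 1 means the unlearned model is about as attackable as the retrained model on every pair, while a score near 0 means it still behaves like the original model.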

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 7936
