A Multi-domain Benchmark for Machine Unlearning in Classification Tasks

ICLR 2026 Conference Submission 17181 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Machine Unlearning, Machine Learning, Benchmarking
TL;DR: Largest Machine Unlearning (MU) benchmark for classification to date; introduction of a new unified MU metric.
Abstract: Machine unlearning (MU), the process of removing the influence of specific data from trained machine learning models, is critical for regulatory compliance (e.g., GDPR's right to be forgotten) and for addressing copyright and privacy concerns in large-scale models. While a wide range of methods and metrics have been proposed, systematic evaluations remain fragmented, typically limited in scope by modality, metric coverage, or the number of methods considered. In this work, we present the most comprehensive MU benchmark to date, evaluating 12 unlearning methods on 8 datasets and models across four modalities (images, text, tabular data, and graphs). We assess the three key aspects of an unlearning outcome: utility (the overall performance of the model after unlearning), efficacy (how well the data is forgotten), and efficiency (the computational cost of unlearning). We also introduce LUMA (Laplacian Unlearning Multidimensional Assessment), a unified metric that consolidates these three dimensions into a single score. Unlike prior metrics, LUMA can flexibly incorporate multiple measures within each dimension (e.g., F1 on the test and forget sets for utility, UMIA for efficacy, runtime and GPU memory for efficiency), enabling more accurate and extensible comparisons. We release reproducible and extensible code to serve as a benchmark for MU research.
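The abstract describes LUMA as aggregating several measures per dimension (utility, efficacy, efficiency) into one score, but does not give its formula here. The following is a purely illustrative sketch of that structure, assuming measures are pre-normalized to [0, 1] with higher meaning better; the function name, the simple mean-of-means aggregation, and the example numbers are all assumptions, not the paper's actual metric.

```python
from statistics import mean

def unified_unlearning_score(utility, efficacy, efficiency):
    """Combine per-dimension measures into a single score.

    Each argument is a dict of measures already normalized to [0, 1],
    higher = better, e.g.:
      utility    -> {"f1_test": ..., "f1_forget": ...}
      efficacy   -> {"umia": ...}
      efficiency -> {"runtime": ..., "gpu_memory": ...}
    """
    # Average measures within each dimension, then across the three dimensions.
    per_dimension = [mean(d.values()) for d in (utility, efficacy, efficiency)]
    return mean(per_dimension)

# Example usage with made-up numbers (for illustration only):
score = unified_unlearning_score(
    utility={"f1_test": 0.91, "f1_forget": 0.88},
    efficacy={"umia": 0.95},
    efficiency={"runtime": 0.80, "gpu_memory": 0.70},
)
print(f"unified score: {score:.3f}")
```

The point of this sketch is only to show how multiple measures can be folded into each dimension before consolidation; the paper's LUMA definition should be consulted for the actual weighting and normalization.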
Primary Area: datasets and benchmarks
Submission Number: 17181