A Universal Source-Free Class Unlearning Framework via Synthetic Embeddings

ICLR 2026 Conference Submission15940 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Machine unlearning, Class unlearning, Source-free unlearning
TL;DR: We propose a source-free class unlearning framework that removes target classes from neural classifiers using only intermediate embeddings—no data or input-space generation required—while preserving accuracy on retain classes.
Abstract: Class unlearning in neural classifiers refers to selectively removing a model’s ability to recognize a target (forget) class by reshaping its decision boundaries. This is essential when taxonomies change, labels are corrected, or legal or ethical requirements mandate class removal. The objective is to preserve performance on the remaining (retain) classes while avoiding costly full retraining. Existing methods generally require access to the source, i.e., forget/retain data or a relevant surrogate dataset, which limits their applicability in scenarios where source data is restricted or unavailable. Even recent source-free class unlearning methods rely on generating samples in the input (data) space, which is computationally expensive and not necessary for class unlearning. In this work, we propose a novel source-free class unlearning framework that enables existing unlearning methods to operate using only the deployed model. We show that, under weak assumptions on the forget loss with respect to the logits, class unlearning can be performed source-free for any given neural classifier by using randomly generated samples in the classifier’s intermediate (embedding) space. Specifically, randomly generated embeddings that the model classifies as belonging to the forget or retain classes are sufficient for effective unlearning, regardless of their marginal distribution. We validate our framework on four backbone architectures (ResNet-18, ResNet-50, ViT-B-16, and Swin-T) across three benchmark datasets (CIFAR-10, CIFAR-100, and TinyImageNet). Our experimental results show that existing class unlearning methods can operate within our source-free framework with minimal impact on their forgetting efficacy and retain-class accuracy.
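The core mechanism described in the abstract, labeling random embeddings with the deployed model and handing them to an existing unlearning loss, can be illustrated with a small NumPy sketch. Everything below is an assumption made for illustration, not the authors' implementation. The classifier is assumed to be split at its penultimate layer, so the head is a single linear layer `(W, b)`. The helper names `head_logits`, `synthetic_embeddings`, and `unlearn_head` are hypothetical. Embeddings are drawn from a standard Gaussian, an arbitrary choice, since the abstract states their marginal distribution does not matter. The unlearning loss, cross-entropy with forget-class samples relabeled to their highest-scoring retain class, is a simple stand-in for whichever existing method is plugged in. Only the head is updated, which is also an assumption of this sketch.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def head_logits(W, b, h):
    """Hypothetical classifier head: one linear layer applied to embeddings h."""
    return h @ W + b


def synthetic_embeddings(rng, n, dim):
    """Randomly generated embeddings in the head's input space.

    The standard Gaussian is an arbitrary choice for this sketch; no real
    inputs are used to produce these samples.
    """
    return rng.standard_normal((n, dim))


def unlearn_head(W, b, forget_class, rng, n=4000, steps=1000, lr=0.5):
    """Source-free unlearning of one class, touching only the head (W, b).

    1. Sample random embeddings and label them with the ORIGINAL head.
    2. Embeddings predicted as the forget class are relabeled to their
       highest-scoring retain class; all others keep their prediction.
       (Assumed stand-in for an existing unlearning loss.)
    3. Fine-tune the head with full-batch gradient descent on cross-entropy.
    """
    h = synthetic_embeddings(rng, n, W.shape[0])
    masked = head_logits(W, b, h)
    masked[:, forget_class] = -np.inf        # forget class can never be a target
    targets = masked.argmax(axis=1)          # same as original prediction on retain samples
    onehot = np.eye(W.shape[1])[targets]

    W, b = W.copy(), b.copy()                # leave the deployed head untouched
    for _ in range(steps):
        p = softmax(head_logits(W, b, h))
        grad_z = (p - onehot) / n            # d(mean cross-entropy) / d(logits)
        W -= lr * (h.T @ grad_z)
        b -= lr * grad_z.sum(axis=0)
    return W, b


# Usage: a random 5-class linear head over 16-dim embeddings; forget class 2.
rng = np.random.default_rng(0)
dim, num_classes, forget = 16, 5, 2
W0 = rng.standard_normal((dim, num_classes))
b0 = np.zeros(num_classes)
W1, b1 = unlearn_head(W0, b0, forget, rng)

# Evaluate on fresh synthetic embeddings, not the ones used for unlearning.
h_test = synthetic_embeddings(rng, 5000, dim)
pred_before = head_logits(W0, b0, h_test).argmax(axis=1)
pred_after = head_logits(W1, b1, h_test).argmax(axis=1)
forget_rate_before = np.mean(pred_before == forget)
forget_rate_after = np.mean(pred_after == forget)
retain = pred_before != forget
retain_agreement = np.mean(pred_after[retain] == pred_before[retain])
print(f"forget rate: {forget_rate_before:.3f} -> {forget_rate_after:.3f}")
print(f"retain-class agreement: {retain_agreement:.3f}")
```

Because the forget class is never a target, the derivative of the cross-entropy with respect to its logit is always its softmax probability, which is positive, so every step lowers `b[forget_class]`. The retain targets come from the original head itself, so no real forget or retain data is consulted at any point.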
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15940