Keywords: Machine Unlearning, Approximate Unlearning, Distribution Matching, Knowledge Distillation, Unlearning-Utility Trade-off, Membership Inference Attacks
TL;DR: We propose Reference-Guided Unlearning (ReGUn), a machine unlearning framework that uses a held-out dataset to distill forget-set predictions toward “unseen” behavior, improving the forgetting–utility trade-off across datasets and architectures.
Abstract: Machine unlearning aims to remove the influence of specific data from trained models while preserving general utility. Existing approximate unlearning methods often rely on performance-degradation heuristics, such as loss maximization or random labeling. However, these signals can be poorly conditioned, leading to unstable optimization and harming the model's generalization. We argue that unlearning should instead prioritize distributional indistinguishability, aligning the model's behavior on forget data with its behavior on truly unseen data. Motivated by this, we propose Reference-Guided Unlearning (ReGUn), a framework that leverages a disjoint held-out dataset to provide a principled, class-conditioned reference for distillation. We demonstrate across various model architectures, natural image datasets, and a range of forget fractions that ReGUn consistently outperforms standard approximate baselines, achieving a superior forgetting–utility trade-off.
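The class-conditioned reference distillation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the reference for each class is the average predicted distribution on held-out examples of that class, and that the forget-set predictions are distilled toward it with a KL-divergence objective (function and variable names here are hypothetical).

```python
import numpy as np

def class_conditioned_references(heldout_probs, heldout_labels, num_classes):
    """Average predicted distribution per class over the held-out set.

    These per-class averages serve as the "unseen data" reference
    distributions (one illustrative choice of reference; the paper's
    exact construction may differ).
    """
    refs = np.zeros((num_classes, heldout_probs.shape[1]))
    for c in range(num_classes):
        refs[c] = heldout_probs[heldout_labels == c].mean(axis=0)
    return refs

def unlearning_kl_loss(forget_probs, forget_labels, refs, eps=1e-12):
    """KL(reference || model) averaged over the forget set.

    Minimizing this pushes the model's forget-set predictions toward
    the class-conditioned unseen-data behavior.
    """
    target = refs[forget_labels]
    kl = np.sum(target * (np.log(target + eps) - np.log(forget_probs + eps)), axis=1)
    return float(np.mean(kl))
```

In a full pipeline this loss term would typically be combined with a standard utility-preserving objective on the retain set; the sketch above only shows the forgetting side of the trade-off.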
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 195