Rethinking the Definition of Unlearning: Suppressive Machine Unlearning

ICLR 2026 Conference Submission 11700 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Machine Unlearning, Knowledge Suppression
Abstract: Machine unlearning, an emerging privacy concern in the deep learning era, is motivated in practice by two goals: *removing* data from training and *suppressing* the model's knowledge of, i.e., its utility on, that data. Unfortunately, retraining after data removal, widely regarded as the gold standard, does not reveal how much the model's knowledge of the target is actually suppressed. Existing definitions cover *exact* or *approximate* unlearning only from the removal perspective and fail to capture the knowledge suppression that unlearning incurs. Moreover, suppression is tightly entangled with removal: stronger knowledge suppression necessarily drives the model further from exact and approximate unlearning, which motivates us to rethink the definition of machine unlearning. We formally introduce *Suppressive Machine Unlearning*, a definition that captures both how far the unlearned model is from retraining, i.e., $(\varepsilon,\delta)$-approximate unlearning, and how much the model's utility is suppressed, i.e., $\kappa$. To illuminate the formal dynamics between removal and suppression, we reveal a trade-off between the removal guarantees $(\varepsilon, \delta)$, which quantify the deviation from idealized retraining, and the requested suppression level $\kappa^*$.
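For orientation, here is a minimal sketch of how such a definition might be formalized, assuming the standard $(\varepsilon,\delta)$-indistinguishability formulation of approximate unlearning; the symbols $A$ (learning algorithm), $U$ (unlearning algorithm), $D_f$ (forget set), $S$ (output event), and the utility functional $\mathrm{util}$ are illustrative placeholders, not notation taken from the submission:

```latex
% (epsilon, delta)-approximate unlearning: for all measurable output sets S,
% the unlearned model is distributionally close to retraining without D_f.
\Pr\big[U(A(D), D_f) \in S\big]
  \le e^{\varepsilon}\,\Pr\big[A(D \setminus D_f) \in S\big] + \delta,
\qquad
\Pr\big[A(D \setminus D_f) \in S\big]
  \le e^{\varepsilon}\,\Pr\big[U(A(D), D_f) \in S\big] + \delta.

% A kappa-suppression constraint might additionally bound the unlearned
% model's remaining utility on the forget set (smaller = more suppressed):
\mathrm{util}\big(U(A(D), D_f);\, D_f\big) \le \kappa.
```

Under such a reading, the trade-off the abstract announces would say that demanding a small requested suppression level $\kappa^*$ pushes the unlearned model's behavior on $D_f$ below what retraining alone would produce, and hence inflates the achievable $(\varepsilon, \delta)$.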
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 11700