Should We Forget About Certified Unlearning? Evaluating the Pitfalls of Noisy Methods

ICLR 2026 Conference Submission 22617 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: unlearning, differential privacy
TL;DR: We critically evaluate certifiable unlearning and find retraining from scratch often offers better compute-utility-unlearning tradeoffs than current methods based on noisy training, which suffer from suboptimal convergence and high compute costs.
Abstract: Removing the influence of certain training data points from trained models ("unlearning") is a critical need driven by data privacy regulations. A straightforward way to achieve this "exactly" is to retrain from scratch on only the permissible data (the "retain set"), but that approach is computationally prohibitive. A promising alternative involves first training a model on the full dataset with differential privacy (DP) and then fine-tuning it, with or without noise, on only the retain set. This offers certifiable unlearning: although the unlearning is approximate, the method comes with theoretical guarantees on the quality of that approximation, built on the DP guarantees. Recent papers claim that this approach makes favourable tradeoffs relative to retraining: while DP-unlearning offers a weaker guarantee and may degrade model utility, it is more efficient. However, the practical viability of this approach has not been rigorously assessed in realistic settings. We conduct a systematic evaluation across both vision and language tasks, revealing that, contrary to prevailing claims, DP-unlearning methods fail to offer a compelling advantage over retraining from scratch, even after applying several improvements to maximize their potential, and even when allowing them a weaker guarantee than would be necessary in some practical scenarios. We identify two key failure modes explaining this result. First, when starting from a random initialization, DP guides models to suboptimal solutions from which they cannot easily escape, costing too much in terms of utility. On the other hand, starting the training from a pretrained model does not pay off either: simply "re-finetuning" that pretrained model is already quite fast, while also providing the strongest unlearning guarantee. Overall, we failed to find a scenario where certified unlearning is worthwhile. This important negative result highlights the need to explore alternative techniques.
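For concreteness, the two-phase pipeline the abstract critiques (DP training on the full dataset, then fine-tuning on the retain set only) can be sketched as below. This is an illustrative reconstruction, not the authors' code: it assumes a PyTorch/Opacus setup, and the function names, dataloaders, and hyperparameters (noise_multiplier, max_grad_norm, learning rates, epochs) are placeholders.

```python
# Sketch of certified unlearning via "DP-train then fine-tune on the retain set".
# Assumes Opacus for DP-SGD; all hyperparameters below are illustrative only.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from opacus import PrivacyEngine


def dp_pretrain(model, full_loader: DataLoader, epochs=1,
                noise_multiplier=1.0, max_grad_norm=1.0, lr=0.1):
    """Phase 1: train on the FULL dataset (retain + forget) with DP-SGD,
    i.e. per-sample gradient clipping plus Gaussian noise."""
    optimizer = optim.SGD(model.parameters(), lr=lr)
    engine = PrivacyEngine()
    model, optimizer, full_loader = engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=full_loader,
        noise_multiplier=noise_multiplier,
        max_grad_norm=max_grad_norm,
    )
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in full_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model


def finetune_on_retain(model, retain_loader: DataLoader, epochs=1, lr=0.01):
    """Phase 2 ("unlearning"): continue training on the retain set only.
    The forget set is never touched again, so the (approximate) unlearning
    certificate rests entirely on the DP guarantee from phase 1."""
    optimizer = optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in retain_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model
```

A hypothetical call sequence would be `model = dp_pretrain(model, full_loader)` followed by `model = finetune_on_retain(model, retain_loader)` (in practice one may unwrap the Opacus wrapper before phase 2). The baseline the paper compares against, exact unlearning, would instead run standard non-noisy training on `retain_loader` from a fresh (or pretrained) initialization.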
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 22617