Keywords: Machine unlearning, stochastic optimization, domain adaptation, Langevin dynamics
TL;DR: Incorporating public data in Langevin unlearning improves the unlearning-utility trade-off
Abstract: Achieving certified data erasure in machine unlearning faces a fundamental trade-off: preserving model utility requires less noise, but formal privacy guarantees demand more. This tension typically degrades model performance. In this work, we study this challenge in Langevin Unlearning, a noisy variant of SGD that is uniquely amenable to theoretical analysis.
We introduce an asymmetric unlearning setting in which datasets contain both private data (subject to unlearning) and public data (permanently retained). Our framework demonstrates that incorporating public data enables better unlearning-utility trade-offs without additional noise or restrictive differential privacy assumptions. We prove that the volume of public data quadratically reduces the Rényi divergence between the unlearning and retraining distributions, allowing unlearning guarantees to be controlled through data composition rather than noise amplification. The framework also provides a fine-grained analysis of how distributional alignment between public and private data affects performance preservation. Empirical validation using variational Rényi divergence estimation confirms our theoretical predictions, showing that strategic public data injection achieves comparable unlearning efficacy while significantly improving model performance and computational efficiency.
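For intuition, the noisy-SGD dynamics that Langevin Unlearning builds on can be sketched as an unadjusted Langevin update: a gradient step plus injected Gaussian noise. The function name, step size, noise scale, and toy quadratic loss below are illustrative assumptions, not the paper's actual algorithm or hyperparameters.

```python
import numpy as np

def langevin_step(theta, grad_fn, eta=0.05, sigma=0.1, rng=None):
    """One unadjusted Langevin step: gradient descent plus Gaussian noise.

    theta   : current parameter vector
    grad_fn : callable returning the loss gradient at theta
    eta     : step size (assumed value, for illustration only)
    sigma   : noise scale controlling the privacy/utility trade-off
    """
    rng = rng or np.random.default_rng()
    noise = rng.normal(size=theta.shape)
    return theta - eta * grad_fn(theta) + np.sqrt(2.0 * eta) * sigma * noise

# Toy example: minimize 0.5 * ||theta||^2, whose gradient is theta itself.
rng = np.random.default_rng(0)
theta = np.ones(3)
for _ in range(200):
    theta = langevin_step(theta, lambda t: t, rng=rng)
# theta now fluctuates near the minimum at 0, at a scale set by sigma.
```

The added noise is what makes the iterates a distribution rather than a point estimate, which is why the paper can compare unlearning and retraining via Rényi divergence between those distributions.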
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 21301