Keywords: Machine unlearning, stochastic optimization, domain adaptation, Langevin dynamics
TL;DR: Incorporating public data in Langevin unlearning improves the unlearning-utility trade-off
Abstract: Achieving certified data erasure in machine unlearning faces a fundamental trade-off: preserving model utility requires less noise, but formal privacy guarantees demand more. This tension typically degrades model performance. In this work, we study this challenge in Langevin Unlearning, a noisy variant of SGD that is uniquely amenable to theoretical analysis.
We introduce an asymmetric unlearning setting in which datasets contain both private data (subject to unlearning) and public data (permanently retained). Our framework demonstrates that incorporating public data enables better unlearning-utility trade-offs without additional noise or restrictive differential privacy assumptions. We prove that the volume of public data quadratically reduces the Rényi divergence between the unlearning and retraining distributions, allowing unlearning guarantees to be controlled through data composition rather than noise amplification. The framework also provides a fine-grained analysis of how distributional alignment between public and private data affects performance preservation. Empirical validation using variational Rényi divergence estimation confirms our theoretical predictions, showing that strategic public data injection achieves comparable unlearning efficacy while significantly improving model performance and computational efficiency.
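For intuition, the noisy-SGD dynamics that Langevin Unlearning builds on can be sketched as an unadjusted Langevin update: a gradient step plus injected Gaussian noise. The function name, step size, noise scale, and toy quadratic loss below are illustrative assumptions, not the paper's actual algorithm or hyperparameters.

```python
import numpy as np

def langevin_step(theta, grad_fn, eta=0.05, sigma=0.1, rng=None):
    """One unadjusted Langevin step: gradient descent plus Gaussian noise.

    theta   : current parameter vector
    grad_fn : callable returning the loss gradient at theta
    eta     : step size (assumed value, for illustration only)
    sigma   : noise scale controlling the privacy/utility trade-off
    """
    rng = rng or np.random.default_rng()
    noise = rng.normal(size=theta.shape)
    return theta - eta * grad_fn(theta) + np.sqrt(2.0 * eta) * sigma * noise

# Toy example: minimize 0.5 * ||theta||^2, whose gradient is theta itself.
rng = np.random.default_rng(0)
theta = np.ones(3)
for _ in range(200):
    theta = langevin_step(theta, lambda t: t, rng=rng)
# theta now fluctuates near the minimum at 0, at a scale set by sigma.
```

The added noise is what makes the iterates a distribution rather than a point estimate, which is why the paper can compare unlearning and retraining via Rényi divergence between those distributions.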
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 21301