TL;DR: A flexible, efficient, and stable adversarial attack against machine unlearning that processes multiple arbitrary attack targets at a time.
Abstract: Machine unlearning (MU) aims to remove the influence of specific data points from trained models, enhancing compliance with privacy regulations. However, the vulnerability of basic MU models to malicious unlearning requests in adversarial learning environments has been largely overlooked. Existing adversarial MU attacks suffer from three key limitations: inflexibility due to pre-defined attack targets, inefficiency in handling multiple attack requests, and instability caused by non-convex loss functions. To address these challenges, we propose a Flexible, Efficient, and Stable Attack (DDPA). First, leveraging Carathéodory's theorem, we introduce a convex polyhedral approximation to identify points in the loss landscape where convexity approximately holds, ensuring stable attack performance. Second, inspired by simplex theory and John's theorem, we develop a regular simplex detection technique that maximizes coverage of the parameter space, improving attack flexibility and efficiency. We theoretically derive the proportion of the effective parameter space occupied by the constructed simplex. We evaluate the attack success rate of our DDPA method on real datasets and compare it with state-of-the-art machine unlearning attack methods. Our source code is available at https://github.com/zzz0134/DDPA.
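To give a concrete picture of the geometry behind the abstract, the sketch below is a minimal NumPy illustration, not the released DDPA code; the helper names `regular_simplex` and `convex_coefficients` and the radius value are ours for illustration. It shows the two ingredients named above: constructing a regular d-simplex around a reference point in parameter space, and representing a point inside it as a convex combination of at most d+1 vertices in the spirit of Carathéodory's theorem.

```python
import numpy as np

def regular_simplex(d, radius=1.0):
    """Vertices (shape (d+1, d)) of a regular d-simplex centered at the origin.

    Construction: center the d+1 standard basis vectors of R^{d+1}, then express
    them in an orthonormal basis of the d-dimensional subspace they span.
    """
    pts = np.eye(d + 1) - 1.0 / (d + 1)            # centered basis vectors, rank d
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    verts = pts @ vt[:d].T                          # same points, coordinates in R^d
    verts *= radius / np.linalg.norm(verts[0])      # place vertices on a sphere of the given radius
    return verts

def convex_coefficients(verts, x):
    """Barycentric weights w (w >= 0, sum w = 1) with verts.T @ w = x for a point x
    inside the simplex: a Caratheodory-style representation by at most d+1 vertices."""
    d = verts.shape[1]
    A = np.vstack([verts.T, np.ones((1, d + 1))])   # affine system: reproduce x and sum to 1
    b = np.concatenate([x, [1.0]])
    return np.linalg.solve(A, b)                    # nondegenerate simplex -> unique solution

# Toy usage on a 3-dimensional slice of parameter space (values are illustrative only).
d = 3
V = regular_simplex(d, radius=0.5)                  # hypothetical perturbation radius
x = 0.1 * V[0]                                      # a point inside the simplex (centroid is the origin)
w = convex_coefficients(V, x)
assert np.isclose(w.sum(), 1.0) and np.all(w >= -1e-9)
assert np.allclose(V.T @ w, x)
```

The regular simplex spreads its vertices evenly around the reference point, which is the coverage property John's theorem speaks to, while the convex-combination step is the local convex approximation that keeps the attack objective well behaved.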
Lay Summary: Modern privacy regulations, such as the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), enforce "the right to be forgotten", i.e., individuals can request that their personal data be deleted.
Machine unlearning techniques aim to enable data owners to proactively remove their data and eliminate its influence from an already trained machine learning model upon request.
For example, Stability AI announced that it would allow artists to remove their work from the training data used for the Stable Diffusion 3.0 release.
However, these unlearning models are vulnerable to malicious requests in adversarial environments, where attackers try to exploit the system.
In this work, we present a method that attacks unlearning models in a flexible, efficient, and stable way. Our framework can inspire new defensive techniques for privacy-critical applications where even minimal data leaks are unacceptable, such as financial and health data analysis.
Link To Code: https://github.com/zzz0134/DDPA
Primary Area: General Machine Learning->Everything Else
Keywords: Machine unlearning, poisoning attack, thrust vector control theory, John's Theorem, polyhedral approximation
Submission Number: 15747