Turning a Curse Into a Blessing: Enabling Data-Free Backdoor Unlearning via Stabilized Model Inversion

Si Chen; Yi Zeng; Won Park; Tianhao Wang; Xun Chen; Lingjuan Lyu; Zhuoqing Mao; Ruoxi Jia

Turning a Curse Into a Blessing: Enabling Data-Free Backdoor Unlearning via Stabilized Model Inversion

Si Chen, Yi Zeng, Won Park, Tianhao Wang, Xun Chen, Lingjuan Lyu, Zhuoqing Mao, Ruoxi Jia

22 Sept 2022 (modified: 13 Feb 2023)ICLR 2023 Conference Withdrawn SubmissionReaders: Everyone

Keywords: Backdoor Defenses

Abstract: Effectiveness of many existing backdoor removal techniques crucially rely on access to clean in-distribution data. However, as model is often trained on sensitive or proprietary datasets, it might not be practical to assume the availability of in-distribution samples. To address this problem, we propose a novel approach to reconstruct samples from a backdoored model and then use the reconstructed samples as a proxy for clean in-distribution data needed by the defenses. We observe an interesting phenomenon that ensuring perceptual similarity between the synthesized samples and the clean training data is \emph{not} adequate to enable effective defenses. We show that the model predictions at such synthesized samples can be unstable to small input perturbations, which misleads downstream backdoor removal techniques to remove these perturbations instead of underlying backdoor triggers. Moreover, unlike clean samples, the predictions at the synthesized samples can also be unstable to small model parameter changes. To tackle these issues, we design an optimization-based data reconstruction technique that ensures visual quality while promoting the stability to perturbations in both data and parameter space. We also observe that while reconstructed from a backdoored model, the synthesized samples do not contain backdoors, and further provide a theoretical analysis that sheds light on this observation. Our evaluation shows that our data synthesis technique can lead to state-of-the-art backdoor removal performance without clean in-distribution data access and the performance is on par with or sometimes even better than using the same amount of clean samples.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (eg, AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)

5 Replies

Loading