Fisher Information Guided Backdoor Purification Via Naive Exploitation of Smoothness

21 Sept 2023 (modified: 11 Feb 2024) — Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Backdoor Attack, AI Security, Fisher Information, DNN Smoothness
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A novel DNN backdoor purification technique that exploits the Fisher Information Matrix
Abstract: Backdoor attacks during deep neural network (DNN) training have become increasingly prevalent, since they can easily compromise the safety of high-value models, e.g., large language or vision models. Our study shows that a backdoor model converges to a *bad local minimum*, i.e., a sharper minimum than that of a benign model. Intuitively, the backdoor can be purified by re-optimizing the model toward a smoother minimum. To achieve such re-optimization, we propose *Smooth Fine-Tuning (SFT)*, a novel backdoor purification framework that exploits the knowledge of the *Fisher Information Matrix (FIM)*. However, purification in this manner can degrade clean test-time performance due to drastic changes in the original backdoor model parameters. To preserve the original test accuracy, we design a novel regularizer that explicitly remembers the learned clean data distribution. In addition, we introduce an efficient variant of SFT, dubbed *Fast SFT*, which significantly reduces the number of tunable parameters and obtains an impressive runtime gain of almost $5\times$. Extensive experiments show that the proposed method achieves state-of-the-art performance on a wide range of backdoor defense benchmarks: *four different tasks---Image Recognition, Object Detection, Video Action Recognition, 3D Point Cloud; 10 different datasets including ImageNet, PASCAL VOC, UCF101; diverse model architectures spanning both CNNs and vision transformers; 14 different backdoor attacks, e.g., Dynamic, WaNet, ISSBA, etc.*
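The abstract's core idea---penalizing sharp minima via the FIM during fine-tuning on clean data---can be sketched as below. The actual SFT objective, regularizer, and FIM estimator are not specified in this page, so this diagonal empirical-Fisher penalty on a toy logistic-regression model is an illustrative assumption, not the paper's method.

```python
import numpy as np

def empirical_fisher_diag(w, X, y):
    """Diagonal of the empirical Fisher for logistic regression:
    mean of per-sample squared gradients of the negative log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-X @ w))           # sigmoid predictions
    per_sample_grads = (p - y)[:, None] * X     # d(NLL_i)/dw for each sample
    return np.mean(per_sample_grads ** 2, axis=0)

def smooth_finetune_step(w, X_clean, y_clean, lr=0.1, lam=0.5):
    """One fine-tuning step on clean data with a Fisher-trace smoothness
    penalty: L = NLL + lam * tr(F).  Hypothetical objective for illustration;
    the penalty gradient is taken by central finite differences."""
    p = 1.0 / (1.0 + np.exp(-X_clean @ w))
    grad_nll = X_clean.T @ (p - y_clean) / len(y_clean)
    eps = 1e-4
    grad_pen = np.zeros_like(w)
    for j in range(len(w)):
        e = np.zeros_like(w)
        e[j] = eps
        f_plus = empirical_fisher_diag(w + e, X_clean, y_clean).sum()
        f_minus = empirical_fisher_diag(w - e, X_clean, y_clean).sum()
        grad_pen[j] = (f_plus - f_minus) / (2 * eps)
    return w - lr * (grad_nll + lam * grad_pen)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)
w = rng.normal(size=5)
for _ in range(50):
    w = smooth_finetune_step(w, X, y)
print(empirical_fisher_diag(w, X, y).sum())  # Fisher trace after smoothing
```

The trace of the (diagonal) empirical Fisher acts here as a proxy for local sharpness: driving it down while fitting clean data mimics re-optimization toward a smoother minimum.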
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4205