Keywords: AI Security, Backdoor or Trojan Attacks on Deep Networks, Safe and Robust AI
Abstract: The success of a deep neural network (DNN) heavily relies on the details of the
training scheme; e.g., training data, architectures, hyper-parameters, etc. Recent
backdoor attacks suggest that an adversary can take advantage of such training
details and compromise the integrity of the DNN. Our studies show that a backdoor
model is usually optimized to a bad local minimum, i.e., a sharper minimum compared
to a benign model. Intuitively, the backdoor can be purified by re-optimizing the
model to a smoother minimum through fine-tuning on a small set of clean validation data.
However, fine-tuning all DNN parameters often incurs high computational costs and
yields sub-par clean test performance. To address this concern, we propose a novel
backdoor purification technique, Natural Gradient Fine-tuning (NGF), which
focuses on removing the backdoor by fine-tuning only one layer. Specifically, NGF
utilizes a loss surface geometry-aware optimizer that can successfully overcome
the challenge of reaching a smooth minimum under the one-layer optimization scenario.
To enhance the generalization performance of our proposed method, we introduce
a clean data distribution-aware regularizer based on the loss-surface curvature
matrix, i.e., the Fisher Information Matrix. To validate the effectiveness of
our method, we conduct extensive experiments with four different datasets (CIFAR10,
GTSRB, Tiny-ImageNet, and ImageNet) and 11 recent backdoor attacks (e.g., Blend,
Dynamic, and Clean Label). NGF achieves state-of-the-art
performance in most of these benchmarks.
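As a minimal illustration of the natural-gradient idea described in the abstract (not the authors' implementation), the sketch below fine-tunes a single layer using a diagonal empirical-Fisher approximation of the Fisher Information Matrix, together with a Fisher-weighted penalty that keeps the layer close to its initial weights. The diagonal approximation, the EWC-style penalty, and all names and hyper-parameters are assumptions made for illustration only.

```python
# Sketch: one-layer fine-tuning with a natural-gradient-style update on clean data.
# Assumptions (not from the paper): diagonal empirical Fisher, Fisher-weighted
# pull toward the initial weights as a stand-in for the clean distribution-aware
# regularizer, and illustrative names (`model`, `clean_loader`, `last_layer`).
import torch
import torch.nn.functional as F

def natural_gradient_finetune(model, clean_loader, last_layer,
                              lr=1e-2, reg_lambda=1.0, damping=1e-3,
                              epochs=5, device="cpu"):
    model.to(device).train()
    # Freeze all parameters except those of the chosen layer (e.g., the classifier head).
    for p in model.parameters():
        p.requires_grad_(False)
    params = [p.requires_grad_(True) for p in last_layer.parameters()]
    init = [p.detach().clone() for p in params]

    for _ in range(epochs):
        for x, y in clean_loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            grads = torch.autograd.grad(loss, params)

            with torch.no_grad():
                # Diagonal empirical Fisher estimate from this batch: E[g^2] + damping.
                fisher = [g.pow(2) + damping for g in grads]
                for p, g, f, p0 in zip(params, grads, fisher, init):
                    # Fisher-weighted penalty keeping the layer near its starting point.
                    reg_grad = reg_lambda * f * (p - p0)
                    # Natural-gradient-style step: precondition by the inverse Fisher diagonal.
                    p -= lr * (g + reg_grad) / f
    return model
```

In this sketch, dividing the gradient by the Fisher diagonal plays the role of the loss-geometry-aware preconditioning, while the Fisher-weighted pull toward the initial weights stands in for the clean data distribution-aware regularizer; it would be called, for example, as `natural_gradient_finetune(model, clean_loader, model.fc)` for a ResNet-style classifier.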
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning