Reverse Backdoor Distillation: Towards Online Backdoor Attack Detection for Deep Neural Network Models
Abstract: A backdoor attack on a deep neural network model implants malicious data patterns in the model to induce attacker-desired behaviors. Existing defense methods fall into online and offline categories: offline methods achieve state-of-the-art detection rates but incur heavy computational overhead, whereas their more deployable online counterparts lack the means to detect source-specific backdoors with large triggers. This work proposes a new online backdoor detection method, Reverse Backdoor Distillation (RBD), to handle both source-specific and source-agnostic backdoor attacks.
RBD, designed from the novel perspective of distilling rather than erasing backdoor knowledge, is a complementary detection methodology that can be used in conjunction with other online backdoor defenses. Exploiting the observation that trigger data causes overwhelming neuron activation while clean data does not, RBD distills backdoor attack pattern knowledge from a suspicious model into a shadow model, which is then deployed online alongside the original model to flag backdoor attacks. We
extensively evaluate RBD on several datasets (MNIST, GTSRB, CIFAR-10) with diverse model architectures and trigger patterns.
RBD outperforms online benchmarks in all experimental settings. Notably, RBD detects source-specific attacks, where the compared online methods fail, underscoring the effectiveness of the proposed technique. Moreover, RBD achieves computational savings of at least 97%.
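To make the deployment step concrete, the following is a minimal conceptual sketch of the online detection described above, assuming a PyTorch setting. The function name, the confidence threshold, and the shadow-model interface are hypothetical illustrations for exposition, not the authors' implementation.

    import torch

    def online_detect(x, original_model, shadow_model, threshold=0.9):
        """Hypothetical sketch: the shadow model, distilled to retain only
        backdoor knowledge, should respond strongly to trigger inputs and
        weakly to clean ones."""
        with torch.no_grad():
            prediction = original_model(x).argmax(dim=1)  # normal inference path
            shadow_conf = torch.softmax(shadow_model(x), dim=1).max(dim=1).values
        is_trigger = shadow_conf > threshold  # flag inputs the shadow model recognizes
        return prediction, is_trigger

Under this sketch, both models run on each incoming input; the shadow model's confidence acts as the backdoor alarm while the original model's prediction is served as usual.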