Abstract: The increasing computational demand of Deep Neural Networks (DNNs) motivates companies and organizations to outsource the training process. However, outsourcing training makes DNNs vulnerable to backdoor attacks. It is therefore necessary to defend against such attacks, i.e., to design a training strategy or to post-process a trained suspicious model so that its backdoor behavior is mitigated while its normal prediction performance on clean inputs is preserved. To remove the abnormal backdoor behavior, existing methods mostly rely on additional labeled clean samples. However, such samples are usually unavailable in the real world, rendering these methods inapplicable. In this paper, we argue that, to mitigate backdoors, (1) labels may not be necessary and (2) in-distribution data may not be needed. Through carefully designed layer-wise weight re-initialization and knowledge distillation, our method can effectively remove the backdoor behavior of a suspicious network with negligible compromise to its normal behavior. In experiments, we compare our framework with six backdoor defense methods that require labeled data, evaluated against six state-of-the-art backdoor attacks. The results show that our framework achieves comparable performance even when only out-of-distribution data is available.
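To make the general recipe concrete, below is a minimal sketch of the kind of pipeline the abstract describes: copy the suspicious model, re-initialize selected layers of the copy, then distill the copy from the original on unlabeled (possibly out-of-distribution) data. This is an illustrative assumption, not the authors' exact algorithm; names such as `suspicious_model`, `reinit_layer_names`, and `unlabeled_loader` are hypothetical placeholders.

```python
# Illustrative sketch (not the paper's exact method): layer-wise weight
# re-initialization followed by knowledge distillation on unlabeled data.
import copy
import torch
import torch.nn.functional as F

def reinit_and_distill(suspicious_model, reinit_layer_names, unlabeled_loader,
                       epochs=10, lr=1e-3, temperature=2.0, device="cpu"):
    teacher = suspicious_model.to(device).eval()          # frozen, possibly backdoored
    student = copy.deepcopy(suspicious_model).to(device)  # copy to be cleaned

    # Re-initialize the selected layers of the student, discarding the
    # (potentially backdoor-carrying) weights stored there.
    for name, module in student.named_modules():
        if name in reinit_layer_names and hasattr(module, "reset_parameters"):
            module.reset_parameters()

    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    student.train()
    for _ in range(epochs):
        for inputs in unlabeled_loader:                   # no labels required
            inputs = inputs.to(device)
            with torch.no_grad():
                teacher_logits = teacher(inputs)
            student_logits = student(inputs)
            # Soft-label distillation loss (temperature-scaled KL divergence):
            # the student matches the teacher's clean predictive behavior
            # while the re-initialized layers no longer carry the trigger.
            loss = F.kl_div(
                F.log_softmax(student_logits / temperature, dim=1),
                F.softmax(teacher_logits / temperature, dim=1),
                reduction="batchmean",
            ) * temperature ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

The key design point the abstract highlights is that neither labels nor in-distribution samples are assumed here: the teacher's own soft predictions supply the supervision signal, so any unlabeled image stream can drive the distillation step.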