Abstract: Deep neural networks have achieved outstanding performance on decision-making tasks. However, because they lack transparency, they are vulnerable to backdoor attacks, which cause unexpected outputs for inputs that contain a trigger. Backdoors can remain hidden in a neural network for a long time, and a network implanted with a backdoor still makes correct judgments on normal inputs. The most common way to implant a backdoor is to insert triggers into the training dataset and modify the corresponding labels, so that the trained model naturally carries the backdoor. This paper proposes a novel defense against neural network backdoor attacks, referred to as the backdoor filter. The defense is applied to the dataset used for model training: it first blurs the dataset images and then sharpens the resulting intermediate images. The processed dataset is then used to train the neural network. We validate our approach against three different types of backdoor attacks, evaluating the prediction accuracy and the attack success rate after our processing. Experiments show that our approach significantly reduces the success rate of the corresponding attacks and effectively defends against backdoor attacks.
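The blur-then-sharpen preprocessing described in the abstract can be illustrated with a minimal sketch. The abstract does not state which blur or sharpening operators are used, so the 3x3 box blur, the 3x3 Laplacian-style sharpening kernel, the edge padding, and all function names below (`convolve2d_same`, `blur`, `sharpen`, `backdoor_filter`) are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np


def convolve2d_same(image, kernel):
    """Filter a 2-D grayscale image with a small odd-sized kernel.

    The image is edge-padded so the output has the same shape as the input.
    Both kernels used below are symmetric, so correlation equals convolution.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    # Accumulate shifted copies of the image, weighted by each kernel entry.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out


def blur(image, size=3):
    """Box blur (assumed operator): average over a size x size window."""
    kernel = np.full((size, size), 1.0 / (size * size))
    return convolve2d_same(image, kernel)


def sharpen(image):
    """Sharpen with a common 3x3 Laplacian-style kernel (assumed operator)."""
    kernel = np.array([[0, -1, 0],
                       [-1, 5, -1],
                       [0, -1, 0]], dtype=np.float64)
    return convolve2d_same(image, kernel)


def backdoor_filter(image, blur_size=3):
    """Blur, then sharpen the blurred intermediate image, then clip to [0, 255]."""
    img = np.asarray(image, dtype=np.float64)
    return np.clip(sharpen(blur(img, blur_size)), 0.0, 255.0)


# Hypothetical example: a single bright pixel stands in for a small trigger.
poisoned = np.zeros((9, 9))
poisoned[4, 4] = 255.0
filtered = backdoor_filter(poisoned)
```

In this sketch, `backdoor_filter` would be applied to every training image before the model is trained. Because both kernels sum to one, uniform regions are left unchanged, while a small high-contrast patch such as the single-pixel example is attenuated by the blur and only partly restored by the sharpening step.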