Abstract: As deep neural networks (DNNs) become more widely used in various safety-critical applications, protecting their security has been an urgent and important task. Recently, one critical security issue is proposed that DNN models are vulnerable to targeted bit-flip attacks. This kind of sophisticated attack tries to inject backdoors into models via flipping only a few bits of carefully chosen model parameters. In this paper, we propose a gradient obfuscation-based data augmentation method to mitigate these targeted bit-flip attacks as an empirical study. Particularly, we mitigate such targeted bit-flip attacks by preprocessing only input samples to break the link between the features carried by triggers of input samples with the modified model parameters. Moreover, our method can keep an acceptable accuracy on benign samples. We show that our method is effective against two targeted bit-flip attacks by experiments on two widely-used structures (ResNet-20 and VGG-16) with one famous dataset (CIFAR-10).
Loading