Keywords: Randomized Feature Squeezing, Unseen $l_p$ Attacks
Abstract: Deep learning has made tremendous progress in recent decades; however, it remains vulnerable to adversarial attacks. The most effective defense is perhaps adversarial training, but it is impractical because it requires prior knowledge of the attacker's strategy and incurs high computational costs. In this paper, we propose a novel approach that trains a robust network through standard training on clean images alone, without awareness of the attacker's strategy. We add a specially designed network input layer that performs randomized feature squeezing to reduce malicious perturbations. With only 100/50 epochs of standard training on clean CIFAR-10/ImageNet images, the network simultaneously achieves excellent robustness against unseen $l_0$, $l_1$, $l_2$, and $l_\infty$ attacks, measured in terms of the attacker's computational cost versus the defender's. Thorough experimental results validate this performance. Moreover, our approach also defends against unlearnable examples generated by One-Pixel Shortcut, which breaks down the adversarial training approach.
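The abstract does not specify how the randomized input layer is built. As a rough illustration of the general idea of randomized feature squeezing, a minimal sketch follows: classical feature squeezing reduces the bit depth of each input pixel, and randomizing the chosen bit depth per forward pass makes the exact quantization grid unpredictable to an attacker. The function name, parameters, and bit-depth range below are hypothetical, not the paper's actual layer.

```python
import numpy as np

def randomized_feature_squeeze(x, min_bits=3, max_bits=5, rng=None):
    """Quantize an image to a randomly chosen bit depth.

    x: float array with values in [0, 1] (e.g. a CIFAR-10 image).
    A bit depth b is drawn uniformly from [min_bits, max_bits]; the
    image is then rounded onto the 2**b - 1 level grid. Both the
    signature and the default range are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    bits = int(rng.integers(min_bits, max_bits + 1))
    levels = 2 ** bits - 1
    # Round each pixel to the nearest of the `levels + 1` grid points,
    # discarding small (potentially adversarial) perturbations.
    return np.round(x * levels) / levels

# Example: squeeze a random 3x32x32 "image".
x = np.random.default_rng(1).random((3, 32, 32)).astype(np.float32)
sq = randomized_feature_squeeze(x, rng=np.random.default_rng(0))
```

In this sketch the squeezing error per pixel is at most half a quantization step, i.e. $0.5/(2^b - 1)$, so perturbations smaller than that scale are largely absorbed by the rounding.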
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 14939