Combating Adversarial Attacks Using Sparse Representations

Soorya Gopalakrishnan, Zhinus Marzi, Upamanyu Madhow, Ramtin Pedarsani

Feb 12, 2018 (modified: Jun 04, 2018) ICLR 2018 Workshop Submission
  • Abstract: It is by now well-known that small adversarial perturbations can induce classification errors in deep neural networks (DNNs). In this paper, we make the case that sparse representations of the input data are a crucial tool for combating such attacks. For linear classifiers, we show that a sparsifying front end is provably effective against ℓ∞-bounded attacks, reducing output distortion due to the attack by a factor of roughly K/N, where N is the data dimension and K is the sparsity level. We then extend this concept to DNNs, showing that a “locally linear” model can be used to develop a theoretical foundation for crafting attacks and defenses. Experimental results on the MNIST dataset show the efficacy of the proposed sparsifying front end.
  • Keywords: Adversarial examples, sparse representations, robust machine learning
  • TL;DR: We show via a theoretically grounded framework that sparsity in natural data can be exploited to combat adversarial attacks.
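To make the abstract's claim concrete, here is a minimal, hedged sketch of a sparsifying front end protecting a linear classifier against an ℓ∞-bounded attack. It is not the paper's implementation: the paper sparsifies in a basis where natural data is approximately sparse (e.g. wavelets for images), whereas this toy assumes the input is K-sparse in the standard basis, and all names, dimensions, and parameters below are illustrative choices.

```python
import numpy as np

def sparse_frontend(x, K):
    """Keep the K largest-magnitude entries of x and zero the rest.

    Illustrative stand-in for a sparsifying front end: the paper operates in a
    sparsifying basis (e.g. wavelets); here the signal is assumed sparse in the
    standard basis so the projection is a simple top-K hard threshold.
    """
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-K:]
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(0)
N, K, eps = 784, 20, 0.1                     # data dimension, sparsity level, attack budget
w = rng.normal(size=N)                       # weights of a toy linear classifier
x = np.zeros(N)
support = rng.choice(N, K, replace=False)
x[support] = rng.normal(size=K)              # K-sparse input
e = eps * np.sign(w)                         # worst-case ℓ∞-bounded perturbation against w

# Output distortion |w·x_adv − w·x| with and without the front end.
print("no defense:     ", abs(w @ (x + e) - w @ x))              # ≈ eps * ||w||_1
print("with front end: ", abs(w @ sparse_frontend(x + e, K)
                              - w @ sparse_frontend(x, K)))       # roughly K/N of the above
```

Running this sketch, the undefended distortion scales with eps · ||w||_1 (all N coordinates of the attack pass through), while the front end restricts the attack to roughly the K retained coordinates, illustrating the K/N reduction stated in the abstract.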