Replacing Loss Functions And Target Representations For Adversarial Defense

Sean Saito, Sujoy Roy

Feb 12, 2018 (modified: Jun 04, 2018) ICLR 2018 Workshop Submission
  • Abstract: Recent works have shown that neural networks are susceptible to adversarial data despite their high performance across various tasks. Because neural networks are increasingly deployed in real-life use cases, there is a growing need for techniques that make them more robust against attacks. In this work, we propose simple techniques for adversarial defense, namely: (1) changing the loss function from cross entropy to mean-squared error, (2) representing targets as codewords generated from random codebooks, and (3) using an autoencoder to filter noisy logits before the final activation layer. Our experiments on CIFAR-10 using the DenseNet model show that these techniques can help prevent targeted attacks and improve classification accuracy on adversarial data generated in a white-box or black-box setting.
  • Keywords: adversarial attacks, target representation, loss function
  • TL;DR: Changing the loss function and target representation along with adding an autoencoder layer can significantly improve resistance to adversarial attacks
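
Techniques (1) and (2) from the abstract can be illustrated with a minimal sketch. The abstract does not specify how the codebook is built or how outputs are mapped back to classes, so everything concrete here is an assumption made for illustration: codeword entries drawn from {-1, +1}, a code length of 64, nearest-codeword decoding, and the helper names `make_codebook`, `mse`, and `decode`. The autoencoder filter from technique (3) is not covered, and the simulated output stands in for a real network.

```python
import random

def make_codebook(num_classes, code_len, seed=0):
    """Draw one random codeword per class (assumed entries: -1.0 or +1.0)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(code_len)]
            for _ in range(num_classes)]

def mse(output, target):
    """Mean-squared error between a network output and a target codeword."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)

def decode(output, codebook):
    """Assumed decoding rule: pick the class whose codeword is nearest in MSE."""
    return min(range(len(codebook)), key=lambda k: mse(output, codebook[k]))

# CIFAR-10 has 10 classes; the code length of 64 is an arbitrary choice.
codebook = make_codebook(num_classes=10, code_len=64)

# Stand-in for a network output: the true codeword plus bounded noise.
label = 3
noise_rng = random.Random(1)
noisy_output = [c + noise_rng.uniform(-0.3, 0.3) for c in codebook[label]]

loss = mse(noisy_output, codebook[label])   # training signal replacing cross entropy
predicted = decode(noisy_output, codebook)  # class recovered from the codeword space
```

In this sketch the training target for class `label` is `codebook[label]` rather than a one-hot vector, and the loss is the mean-squared error against that codeword. As long as the codewords are distinct and the per-coordinate perturbation stays below 1, the nearest-codeword rule recovers the original label.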