- Abstract: Recent works have shown that neural networks are susceptible to adversarial examples, despite demonstrating high performance across various tasks. Hence, there is a growing need for techniques that make neural networks more robust against attacks, given their increasing deployment in real-world applications. In this work, we propose simple techniques for adversarial defense, namely: (1) changing the loss function from cross entropy to mean-squared error, (2) representing targets as codewords generated from random codebooks, and (3) using an autoencoder to filter noisy logits before the final activation layer. Our experiments on CIFAR-10 with a DenseNet model show that these techniques help prevent targeted attacks and improve classification accuracy on adversarial data generated in both white-box and black-box settings.
- Keywords: adversarial attacks, target representation, loss function
- TL;DR: Changing the loss function and target representation along with adding an autoencoder layer can significantly improve resistance to adversarial attacks
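Techniques (1) and (2) can be illustrated with a minimal sketch: each class is assigned a random codeword as its training target, the network is trained with MSE loss against that codeword, and prediction is done by nearest-codeword decoding. The sizes, the decoding rule, and the use of Euclidean distance here are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 10 classes (as in CIFAR-10), 32-dimensional codewords.
NUM_CLASSES, CODE_DIM = 10, 32

# (2) Random codebook: each class is represented by a random codeword
# instead of a one-hot vector.
codebook = rng.standard_normal((NUM_CLASSES, CODE_DIM))

def mse_loss(pred, target):
    # (1) Mean-squared error between network output and the target codeword,
    # replacing the usual cross-entropy objective.
    return np.mean((pred - target) ** 2)

def decode(pred):
    # Classify by nearest codeword in Euclidean distance (one plausible
    # decoding rule; the paper may use a different one).
    dists = np.linalg.norm(codebook - pred, axis=1)
    return int(np.argmin(dists))

# A mildly perturbed version of class 3's codeword still decodes to class 3,
# since random high-dimensional codewords are far apart.
noisy = codebook[3] + 0.1 * rng.standard_normal(CODE_DIM)
print(decode(noisy))  # 3
```

The intuition is that randomly drawn high-dimensional codewords are nearly orthogonal, so a small adversarial perturbation of the output is unlikely to move it closer to another class's codeword than to the true one.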