Explainable Adversarial Learning: Implicit Generative Modeling of Random Noise during Training for Adversarial Robustness

Priyadarshini Panda; Kaushik Roy

27 Sept 2018 (modified: 05 May 2023) | ICLR 2019 Conference Withdrawn Submission | Readers: Everyone

Abstract: We introduce Explainable Adversarial Learning (ExL), an approach for training neural networks that are intrinsically robust to adversarial attacks. We find that implicit generative modeling of random noise, using the same loss function as posterior maximization, improves a model's understanding of the data manifold and thereby its adversarial robustness. We demonstrate the efficacy of our approach and provide a simple visualization tool, based on Principal Component Analysis (PCA), for understanding adversarial data. Our analysis reveals that adversarial robustness generally manifests in models with higher variance along the high-ranked principal components. We show that models trained with our approach perform remarkably well against a wide range of attacks. Furthermore, combining ExL with state-of-the-art adversarial training extends a model's robustness beyond the attacks it was adversarially trained on, in both white-box and black-box attack scenarios.
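The PCA observation in the abstract can be made concrete with a small sketch. The abstract does not say which activations are analyzed or how the variance measure is defined, so everything below is an assumption for illustration: the helper names `explained_variance_ratio` and `top_k_variance` are hypothetical, and the two random matrices merely stand in for layer activations from two models.

```python
import numpy as np

def explained_variance_ratio(activations):
    """Share of total variance captured by each principal component, largest first."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Singular values of the centered data give the PCA variances: s**2 / (n - 1).
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2 / (centered.shape[0] - 1)
    return variances / variances.sum()

def top_k_variance(activations, k):
    """Fraction of variance lying along the k highest-ranked principal components."""
    return float(explained_variance_ratio(activations)[:k].sum())

rng = np.random.default_rng(0)
# Hypothetical stand-ins for activations (200 samples, 10 features) from two models.
# "concentrated" puts most of its variance in two directions; "spread" does not.
concentrated = rng.normal(size=(200, 10)) * np.array([5.0, 3.0] + [0.1] * 8)
spread = rng.normal(size=(200, 10))

concentrated_share = top_k_variance(concentrated, k=2)
spread_share = top_k_variance(spread, k=2)
```

Comparing `concentrated_share` with `spread_share` is one plausible way to quantify "higher variance along the high-ranked principal components"; the paper's actual metric may differ.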

Keywords: Adversarial Robustness, PCA variance, PCA subspace, Generative Noise Modeling, Adversarial attack, Adversarial Robustness Metric

TL;DR: Noise modeling at the input during discriminative training improves adversarial robustness. We propose a PCA-based evaluation metric for adversarial robustness.
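One possible reading of "noise modeling at the input" is a noise tensor that is added to every input and updated by gradient descent on the same loss as the model weights. The sketch below illustrates that reading on a toy logistic-regression classifier in NumPy. It is not the authors' implementation: the single shared noise vector, the learning rate, the synthetic data, and the function name `train_with_learned_noise` are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_with_learned_noise(x, y, steps=200, lr=0.1, seed=0):
    """Logistic regression on x + noise, where noise is trained with the same loss."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    noise = rng.normal(scale=0.1, size=d)  # assumption: one noise vector shared by all inputs
    losses = []
    for _ in range(steps):
        x_noisy = x + noise
        p = sigmoid(x_noisy @ w + b)
        # Binary cross-entropy; the small constant guards against log(0).
        loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        losses.append(float(loss))
        err = (p - y) / n              # gradient of the loss w.r.t. each logit
        grad_w = x_noisy.T @ err
        grad_b = err.sum()
        grad_noise = err.sum() * w     # every logit depends on noise through w
        w -= lr * grad_w
        b -= lr * grad_b
        noise -= lr * grad_noise       # noise is updated by the same loss as the weights
    return w, b, noise, losses

# Synthetic two-class data (hypothetical): two Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)), rng.normal(1.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b, noise, losses = train_with_learned_noise(x, y)
```

At inference time the learned `noise` would either be kept or dropped; the TL;DR does not specify which, so the sketch leaves that choice open.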
