TL;DR: We propose a novel method that suppresses the vulnerability of the latent feature space to obtain robust and compact networks.
Abstract: Despite the remarkable performance of deep neural networks (DNNs) on various tasks, they are susceptible to adversarial perturbations, which makes them difficult to deploy in real-world safety-critical applications. In this paper, we aim to obtain robust networks by sparsifying the latent features of a DNN that are sensitive to adversarial perturbation. Specifically, we define vulnerability at the latent feature space and then propose a Bayesian framework to prioritize or prune features based on their contribution to both the original and the adversarial loss. We also propose regularizing the vulnerability of the features during training to further improve robustness. While network sparsification has primarily been studied in the literature for its computational efficiency and regularization effects, we confirm through quantitative evaluation and qualitative analysis that it is also useful for designing a defense mechanism. We validate our method, \emph{Adversarial Neural Pruning (ANP)}, on multiple benchmark datasets, where it improves test accuracy and achieves state-of-the-art robustness. ANP also tackles the practical problem of obtaining networks that are sparse and robust at the same time, which is crucial for ensuring adversarial robustness in lightweight networks deployed to computation- and memory-limited devices.
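The abstract describes measuring vulnerability as the distortion of latent features under adversarial perturbation and penalizing it during training. Below is a minimal PyTorch-style sketch of that idea only; it assumes a hypothetical `model.features` method returning a convolutional feature map, and it omits the paper's Bayesian pruning masks entirely, so it is an illustration of the vulnerability regularizer rather than the actual ANP implementation.

```python
import torch
import torch.nn.functional as F

def feature_vulnerability(model, x_clean, x_adv):
    """Per-channel distortion of latent features under adversarial perturbation.

    `model.features` is an assumed interface mapping inputs to a 4D feature
    map (batch, channels, height, width); it is not from the paper's code.
    """
    z_clean = model.features(x_clean)
    z_adv = model.features(x_adv)
    # Mean absolute feature distortion, one scalar per feature channel.
    return (z_adv - z_clean).abs().mean(dim=(0, 2, 3))

def vulnerability_regularized_loss(model, x_clean, x_adv, y, lam=0.1):
    """Adversarial classification loss plus a penalty on feature vulnerability."""
    logits_adv = model(x_adv)
    task_loss = F.cross_entropy(logits_adv, y)
    vuln_penalty = feature_vulnerability(model, x_clean, x_adv).sum()
    return task_loss + lam * vuln_penalty
```

In this sketch, `x_adv` would come from any attack (e.g., PGD on `x_clean`), and `lam` trades off adversarial accuracy against feature-level vulnerability; the per-channel vulnerability scores could also serve as a pruning criterion, which is the role the Bayesian framework plays in the paper.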
Keywords: adversarial examples, robust machine learning, robust optimization, network pruning
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:1908.04355/code)