Gradient-Free Adversarial Attacks for Bayesian Neural Networks

Published: 21 Dec 2020, Last Modified: 12 Mar 2024, AABI 2020
Keywords: Bayesian Neural Networks, Robustness, Adversarial Examples
TL;DR: We explore the use of gradient-free methods for attacking BNNs, where the gradient information with respect to an input point is known to vanish.
Abstract: The existence of adversarial examples, i.e., small input perturbations that cause a misclassification, underscores the importance of understanding the robustness of machine learning models. Bayesian neural networks (BNNs), due to their calibrated uncertainty, have been shown to possess favorable adversarial robustness properties. However, when approximate Bayesian inference methods are employed, the adversarial robustness of BNNs is still not well understood. In this work, we employ gradient-free optimization methods in order to find adversarial examples for BNNs. In particular, we consider genetic algorithms, surrogate models, as well as zeroth order optimization methods, and adapt them to the goal of finding adversarial examples for BNNs. In an empirical evaluation on the MNIST and Fashion MNIST datasets, we show that for various approximate Bayesian inference methods the use of gradient-free algorithms can greatly improve the rate of finding adversarial examples compared to state-of-the-art gradient-based methods.
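To make the zeroth-order strategy mentioned in the abstract concrete, below is a minimal sketch of a finite-difference (SPSA-style) attack against a BNN's posterior-predictive output. Everything here is illustrative and not the authors' implementation: `bnn_predict` is a toy stand-in for a posterior-averaged classifier, and all hyperparameters (`eps`, `step`, `sigma`, `n_dirs`, `iters`) are assumptions chosen for readability.

```python
# Minimal sketch of a zeroth-order (finite-difference) adversarial attack on a BNN.
# Assumptions: `bnn_predict` is a placeholder for a real approximate-inference model
# (e.g., MC dropout or variational inference); hyperparameters are illustrative.
import numpy as np

def bnn_predict(x, n_samples=10):
    """Toy stand-in for a BNN posterior predictive: averages softmax outputs
    over `n_samples` sampled weight matrices. Replace with a real BNN."""
    rng = np.random.default_rng(0)  # fixed seed -> a fixed set of "posterior samples"
    logits = np.stack([x @ rng.normal(size=(x.size, 10)) for _ in range(n_samples)])
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return (e / e.sum(axis=-1, keepdims=True)).mean(axis=0)

def zeroth_order_attack(x, label, eps=0.3, step=0.05, sigma=1e-2, n_dirs=50, iters=100):
    """Estimate the gradient of the true-class probability with random
    finite differences and take signed descent steps, projected onto an
    L_inf ball of radius `eps` around the original input."""
    x_adv = x.copy()
    rng = np.random.default_rng(1)
    for _ in range(iters):
        grad = np.zeros_like(x_adv)
        for _ in range(n_dirs):
            u = rng.choice([-1.0, 1.0], size=x_adv.shape)    # Rademacher direction
            f_plus = bnn_predict(x_adv + sigma * u)[label]
            f_minus = bnn_predict(x_adv - sigma * u)[label]
            grad += (f_plus - f_minus) / (2 * sigma) * u      # two-point estimate
        grad /= n_dirs
        x_adv = x_adv - step * np.sign(grad)                  # decrease true-class prob
        x_adv = np.clip(x_adv, x - eps, x + eps)              # stay in the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                      # keep valid pixel range
        if bnn_predict(x_adv).argmax() != label:              # misclassified -> done
            break
    return x_adv

# Usage: attack a random 28x28 "image" (flattened) with true label 3.
x0 = np.random.default_rng(2).random(784)
adv = zeroth_order_attack(x0, label=3)
```

The same loop structure applies to the other gradient-free candidates named in the abstract: a genetic algorithm would replace the gradient estimate with mutation and selection over a population of perturbations, and a surrogate-model approach would fit a cheap differentiable model to queries of `bnn_predict` and attack that instead.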
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/arxiv:2012.12640/code)