Keywords: Bayesian Inference, Bayesian Neural Networks, Adversarial Attacks, Adversarial Examples, Uncertainty Quantification, Selective Prediction
TL;DR: We investigate whether Bayesian neural networks are inherently adversarially robust and identify errors in prior studies of Bayesian neural network robustness.
Abstract: This work examines the claim, made in recent studies, that Bayesian neural networks (BNNs) are inherently robust to adversarial perturbations. To study this question, we investigate whether state-of-the-art BNN inference methods and prediction pipelines can be broken with even relatively unsophisticated attacks on three tasks: (1) label prediction under the posterior predictive mean, (2) adversarial example detection with Bayesian predictive uncertainty, and (3) semantic shift detection. We find that BNNs trained with state-of-the-art approximate inference methods, and even with Hamiltonian Monte Carlo (HMC) inference, are highly susceptible to adversarial attacks, and we identify various conceptual and experimental errors in previous works that claimed inherent adversarial robustness of BNNs. We conclusively demonstrate that BNNs and uncertainty-aware Bayesian prediction pipelines are not inherently robust against adversarial attacks, and we open up avenues for the development of Bayesian defenses for Bayesian prediction pipelines.