Original Pdf: pdf
TL;DR: We argue theoretically that by simply assuming the weights of a ReLU network to be Gaussian distributed (without even a Bayesian formalism) could fix this issue; for a more calibrated uncertainty, a simple Bayesian method could already be sufficient.
Abstract: The point estimates of ReLU classification networks, arguably the most widely used neural network architecture, have recently been shown to have arbitrarily high confidence far away from the training data. This architecture is thus not robust, e.g., against out-of-distribution data. Approximate Bayesian posteriors on the weight space have been empirically demonstrated to improve predictive uncertainty in deep learning. The theoretical analysis of such Bayesian approximations is limited, including for ReLU classification networks. We present an analysis of approximate Gaussian posterior distributions on the weights of ReLU networks. We show that even a simplistic (thus cheap), non-Bayesian Gaussian distribution fixes the asymptotic overconfidence issue. Furthermore, when a Bayesian method, even if a simple one, is employed to obtain the Gaussian, the confidence becomes better calibrated. This theoretical result motivates a range of Laplace approximations along a fidelity-cost trade-off. We validate these findings empirically via experiments using common deep ReLU networks.
Keywords: uncertainty quantification, overconfidence, Bayesian inference