Keywords: Deep neural networks, curvature, spectral norm, Lipschitz constant, robustness
TL;DR: We propose a practical method to train neural networks so that they have low curvature, without losing predictive accuracy.
Abstract: Standard deep neural networks often have excess non-linearity, making them susceptible to issues
such as poor adversarial robustness and gradient instability. Common methods for addressing these
downstream issues, such as adversarial training, are expensive and often sacrifice predictive accuracy.
In this work, we address the core issue of excess non-linearity by targeting model curvature directly, and
demonstrate low-curvature neural networks (LCNNs) that obtain drastically lower curvature
than standard models while exhibiting comparable predictive performance. This leads to improved
robustness and stable gradients, at a fraction of the cost of standard adversarial training.
To achieve this, we decompose overall model curvature in terms of the curvatures and slopes of
its constituent layers. To enable efficient curvature minimization of these layers,
we introduce two novel architectural components: first, a non-linearity called centered-softplus,
a stable variant of the softplus non-linearity, and second, a Lipschitz-constrained
batch normalization layer.
Our experiments show that LCNNs have lower curvature, more stable gradients, and increased
off-the-shelf adversarial robustness compared to standard neural networks, all without
affecting predictive performance. Our approach is easy to use and can be readily incorporated
into existing neural network architectures.
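To make the layer-wise decomposition mentioned above concrete, one standard form of such a bound (a sketch only; the exact decomposition used in the paper may differ) is the following: for a composition $f = g \circ h$, writing $C(\cdot)$ for a bound on the Hessian spectral norm (curvature) and $L(\cdot)$ for the Lipschitz constant (slope), the chain rule gives
$$ C(g \circ h) \;\le\; C(g)\,L(h)^2 \;+\; L(g)\,C(h), $$
so keeping each layer's curvature and Lipschitz constant small controls the curvature of the whole network.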
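As an illustration of the first architectural component, here is a minimal sketch of a centered-softplus activation, assuming "centered" means the softplus is shifted so that it passes through the origin; the exact parameterization in the paper may differ.

```python
import math

import torch
import torch.nn.functional as F


class CenteredSoftplus(torch.nn.Module):
    """Sketch of a centered softplus: softplus shifted so that sigma(0) = 0.

    Assumes the "centering" simply subtracts softplus(0) = log(2)/beta. The
    beta parameter controls how sharply the activation bends, and hence its
    curvature: larger beta approaches ReLU, smaller beta is flatter.
    """

    def __init__(self, beta: float = 1.0):
        super().__init__()
        self.beta = beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # softplus_beta(x) = (1/beta) * log(1 + exp(beta * x)); subtract its
        # value at zero so the activation is centered at the origin.
        return F.softplus(x, beta=self.beta) - math.log(2.0) / self.beta
```

Used as a drop-in replacement for ReLU, this activation stays 1-Lipschitz (its derivative is a sigmoid) while having a bounded second derivative, which is the kind of per-layer control the curvature decomposition above relies on.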
Supplementary Material: pdf