Abstract: Lipschitz constraints on deep neural networks have been used to train invertible networks, obtain provable adversarial robustness bounds, improve training stability, and compute tight Wasserstein distance estimates. However, several theoretical hurdles make training expressive Lipschitz networks challenging. Anil et al. recently introduced an effective approach to tightly constraining the Lipschitz constant of fully connected networks. In this work, we follow their gradient-norm-preserving design philosophy to train scalable, expressive, provably Lipschitz convolutional networks. We identify several limitations of commonly used approaches to constraining the Lipschitz constant of convolutional networks. Guided by a thorough theoretical analysis of the space of orthogonal convolutions, we introduce an expressive parameterization of the space of orthogonal kernels that remedies these shortcomings. This parameterization allows us to train large convolutional networks that outperform other commonly used approaches. We verify our approach on two challenging problems: (1) adversarial robustness, where we achieve state-of-the-art provable deterministic robustness under the L2 metric on MNIST and CIFAR-10; and (2) Wasserstein distance estimation between high-dimensional distributions, where our gradient-norm-preserving architectures achieve better performance than competing approaches.
Code Link: https://github.com/ColinQiyangLi/LConvNet
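The sketch below is a minimal NumPy illustration (not taken from the linked repository) of the gradient-norm-preserving ingredients the abstract refers to: an orthogonal linear map stands in for the paper's orthogonal-convolution parameterization, and the GroupSort/MaxMin activation of Anil et al. replaces ReLU. Both preserve the Euclidean norm of forward activations and of backpropagated gradients; the matrix size and random seed are arbitrary choices for illustration.

```python
# Minimal sketch of gradient norm preservation (assumed illustrative example,
# not the paper's implementation).
import numpy as np

rng = np.random.default_rng(0)

# Random orthogonal matrix via QR decomposition; a stand-in for the paper's
# orthogonal-convolution parameterization, which is more involved.
W, _ = np.linalg.qr(rng.normal(size=(64, 64)))

def maxmin(x):
    """GroupSort with group size 2: sort each consecutive pair of entries.
    This only permutes coordinates within pairs, hence it is norm preserving."""
    pairs = x.reshape(-1, 2)
    return np.sort(pairs, axis=1)[:, ::-1].reshape(-1)

x = rng.normal(size=64)
y = maxmin(W @ x)

# Forward pass preserves the Euclidean norm (the layer is 1-Lipschitz and tight).
print(np.allclose(np.linalg.norm(y), np.linalg.norm(x)))   # True

# Backward pass: the Jacobian of maxmin is a permutation matrix and W is
# orthogonal, so a backpropagated gradient keeps its norm as well.
g = rng.normal(size=64)
print(np.allclose(np.linalg.norm(W.T @ g), np.linalg.norm(g)))  # True
```

Stacking such layers keeps the end-to-end Jacobian orthogonal, which is what allows deep Lipschitz-constrained networks to avoid gradient attenuation.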