Keywords: nonconvex optimization, path norm, neural networks, deep learning, robustness, generalization
TL;DR: We develop a new regularization method for deep neural networks that improves accuracy and robustness compared to weight decay.
Abstract: The so-called path-norm measure is considered one of the best indicators of good generalization in neural networks. This paper introduces a proximal gradient framework for training deep neural networks via 1-path-norm regularization, applicable to general deep architectures. We address the resulting nonconvex, nonsmooth optimization model by transforming the intractable induced proximal operation into an equivalent differentiable proximal operation. We compare automatic differentiation (backpropagation) algorithms with the proximal gradient framework in numerical experiments on FashionMNIST and CIFAR10. We show that 1-path-norm regularization is a better choice than weight decay for fully connected architectures, and that it improves robustness in the presence of noisy labels. In this latter setting, the proximal gradient methods have an advantage over automatic differentiation.
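For illustration of the regularized quantity (a minimal sketch, not taken from the submission's code): for a fully connected network with weight matrices W_1, ..., W_L, the 1-path-norm is the sum over all input-output paths of the products of absolute weight values along each path, which can be computed as 1^T |W_L| ... |W_1| 1. The function name and the use of PyTorch below are illustrative assumptions; biases are omitted for simplicity.

```python
# Hedged sketch: computing the 1-path-norm of a fully connected network.
# This is an illustrative assumption, not the submission's implementation.
import torch
import torch.nn as nn

def one_path_norm(linear_layers):
    """Sum over all input-output paths of the product of |weights| along the path.

    Equivalent to 1^T |W_L| ... |W_1| 1 for weight matrices W_1, ..., W_L
    (biases ignored in this sketch).
    """
    acc = None
    for layer in linear_layers:            # layers in forward order W_1, ..., W_L
        W = layer.weight.abs()             # entrywise absolute value |W_l|
        acc = W if acc is None else W @ acc  # accumulate |W_l| ... |W_1|
    return acc.sum()                       # 1^T (|W_L| ... |W_1|) 1

# Usage: a small fully connected network; the penalty could be added to the
# training loss with a regularization weight lambda.
net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(),
                    nn.Linear(128, 64), nn.ReLU(),
                    nn.Linear(64, 10))
linear_layers = [m for m in net if isinstance(m, nn.Linear)]
penalty = one_path_norm(linear_layers)
```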
Submission Type: Non-Archival
Submission Number: 13