Achieving Strong Regularization for Deep Neural Networks


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: L1 and L2 regularizers are critical tools in machine learning due to their ability to simplify solutions. However, imposing strong L1 or L2 regularization with gradient descent methods easily fails, and this limits the generalization ability of the underlying neural networks. To understand this phenomenon, we investigate how and why training fails under strong regularization. Specifically, we examine how gradients change over time for different regularization strengths and provide an analysis of why the gradients diminish so fast. We find that there exists a tolerance level of regularization strength beyond which training completely fails. We propose a time-dependent regularization schedule in order to moderate the tolerance level. Experiments show that our proposed approach indeed achieves strong regularization for both L1 and L2 regularizers and improves both accuracy and sparsity on public data sets. Our source code is published.
  • TL;DR: We investigate how and why strong L1/L2 regularization fails and propose a method that can achieve strong regularization.
  • Keywords: deep learning, regularization
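
The abstract proposes a time-dependent regularization schedule but does not specify its form. Below is a minimal sketch of the general idea, assuming a simple linear warm-up of the L2 strength from zero to a strong target value, applied to plain gradient descent on a toy one-dimensional loss; the schedule shape, function names, and hyperparameters are illustrative assumptions, not the paper's actual method.

```python
def l2_warmup_schedule(step, total_steps, lam_target):
    """Linearly ramp the L2 strength to lam_target (an assumed schedule)."""
    return lam_target * min(1.0, step / max(1, total_steps // 2))

def train(w0, grad_loss, lam_target, lr=0.1, steps=200):
    """Gradient descent on loss(w) + (lam/2) * w**2 with a ramped lam."""
    w = w0
    for t in range(steps):
        lam = l2_warmup_schedule(t, steps, lam_target)
        # The L2 penalty (lam/2) * w**2 contributes lam * w to the gradient.
        g = grad_loss(w) + lam * w
        w -= lr * g
    return w

if __name__ == "__main__":
    # Toy loss (w - 3)^2 / 2 with gradient w - 3.
    w_none = train(0.0, lambda w: w - 3.0, lam_target=0.0)    # ≈ 3.0
    w_strong = train(0.0, lambda w: w - 3.0, lam_target=4.0)  # ≈ 0.6
    print(round(w_none, 3), round(w_strong, 3))
```

With no regularization the iterate converges to the unregularized minimum (3.0); with a strong ramped L2 penalty it converges to the shrunken fixed point (w - 3 + 4w = 0, i.e. 0.6) instead of collapsing early, which is the failure mode the abstract says a constant strong penalty can trigger.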