Regularizing activations in neural networks via distribution matching with the Wassertein metric


Sep 25, 2019 ICLR 2020 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: Regularization and normalization have become an indispensable component in deep learning because it enables faster training and improved generalization performance. We propose the projected error function regularization loss (PER) that encourages activations to follow the standard normal distribution. PER randomly projects activations to one dimensional space and computes the regularization in the projected space. PER acts like the Pseudo-Huber loss in the projected space, enabling robust regularization for training deep neural networks. In addition, PER can capture interaction between hidden units by projection vector drawn from unit sphere. By doing so, PER minimizes the upper bound of the Wasserstein distance of order one between an empirical distribution of activations and the standard normal distribution. To the best of the authors' knowledge, this is the first work to regularize activations concerning the target distribution in the probability distribution space. We evaluate the proposed method on image classification task and word-level language modeling task.
0 Replies