ZeroLiers: Diminishing Large Outliers in ReLU-like Activations

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission · Readers: Everyone
Keywords: Deep Neural Network, Rectified Linear Units, Generalization, Regularization, Dropout
Abstract: As the number of learnable parameters in deep neural networks continues to grow, overfitting remains one of the main challenges in training DNNs. Even though DNNs with billions or even hundreds of billions of parameters are now proposed and used, it is still hard to determine the training set size needed to prevent overfitting. In this work, we propose a new activation function, called ZeroLiers, to prevent overfitting. It eliminates the need for Dropout and leads to better generalization when training DNNs with fully connected layers. ZeroLiers can be implemented simply by replacing large outliers in ReLU-like activations with zeros. We empirically evaluate the regularization effect of ZeroLiers against Dropout. Interestingly, the validation loss decreases much faster with ZeroLiers than with Dropout, and generalization performance improves. Moreover, we train several recent DNNs with fully connected layers and investigate the effect of ZeroLiers. Specifically, we find that ZeroLiers accelerates the convergence of both their training and validation losses.
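For illustration only, below is a minimal PyTorch-style sketch of the mechanism the abstract describes (zeroing large outliers in ReLU-like activations). The abstract does not specify how "large outliers" are identified; the criterion used here (activations more than `k` standard deviations above the per-feature batch mean) and the module name `ZeroLiers` with parameter `k` are assumptions for illustration, not the authors' definition.

```python
import torch
import torch.nn as nn


class ZeroLiers(nn.Module):
    """Illustrative sketch: ReLU followed by zeroing of large outlier activations.

    Assumption: an activation counts as a "large outlier" if it exceeds the
    per-feature batch mean by more than `k` standard deviations. The paper's
    actual outlier criterion is not given in the abstract.
    """

    def __init__(self, k: float = 3.0):
        super().__init__()
        self.k = k  # assumed outlier threshold (in standard deviations)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(x)  # ReLU-like activation
        if self.training:
            # Per-feature statistics over the batch dimension.
            mean = out.mean(dim=0, keepdim=True)
            std = out.std(dim=0, keepdim=True)
            threshold = mean + self.k * std
            # Replace large outliers with zeros (the core ZeroLiers idea).
            out = torch.where(out > threshold, torch.zeros_like(out), out)
        return out


# Example usage in a fully connected block (hypothetical):
block = nn.Sequential(nn.Linear(512, 256), ZeroLiers(k=3.0))
```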
One-sentence Summary: The paper proposes a new activation function, called ZeroLiers, that prevents overfitting and eliminates the need for Dropout.
Supplementary Material: zip