Abstract: Neural networks with saturating activations are often avoided because of vanishing gradients. This problem is frequently tackled with Batch Normalization, but we propose a different approach: Auto-Rotation (AR). An existing AR-based method is the Auto-Rotating Perceptron (ARP), which enhances Rosenblatt's perceptron and alleviates vanishing gradients by confining the pre-activation to a region where the neuron does not saturate. However, the ARP is defined only for dense layers and requires additional hyperparameter tuning. In this paper, we extend the ARP concept to Auto-Rotating Neural Networks (ARNN), which support convolutional layers and learnable pre-activation saturation regions. In all of our experiments, AR outperforms the Batch Normalization approach at preventing vanishing gradients. Our results also show that AR improves the performance of convolutional networks that use saturating activations, even allowing them to slightly outperform ReLU-activated models. In addition, enabling AR yields faster convergence and, because less hyperparameter tuning is needed, greater ease of use. Furthermore, our method experimentally produces gradients that are much more uniform across layers and more stable across epochs. We expect our Auto-Rotating layers to be used in deeper models with both saturating and non-saturating activations, since our approach prevents vanishing gradients as well as gradient-continuity issues such as those caused by ReLUs.
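The sketch below illustrates the core idea stated in the abstract, namely keeping each neuron's pre-activation inside a learnable region where a saturating activation still has a usable gradient. It is not the authors' ARNN formulation: the layer name `AutoRotatingDense`, the softsign-style squashing rule, and the per-neuron learnable bound `L` are illustrative assumptions.

```python
import tensorflow as tf


class AutoRotatingDense(tf.keras.layers.Layer):
    """Dense layer whose pre-activation is kept inside a non-saturating region (sketch)."""

    def __init__(self, units, initial_bound=2.0, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.initial_bound = initial_bound

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        self.w = self.add_weight(name="w", shape=(in_dim, self.units),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(name="b", shape=(self.units,),
                                 initializer="zeros", trainable=True)
        # Learnable saturation bound L (one per neuron), kept positive via softplus.
        self.bound_raw = self.add_weight(
            name="bound_raw", shape=(self.units,),
            initializer=tf.keras.initializers.Constant(self.initial_bound),
            trainable=True)

    def call(self, x):
        z = tf.matmul(x, self.w) + self.b       # raw pre-activation
        L = tf.nn.softplus(self.bound_raw)      # positive saturation bound
        # Softsign-style rescaling confines z to (-L, L), so tanh is never
        # driven into its flat, vanishing-gradient regions.
        z_bounded = z * L / (tf.abs(z) + L)
        return tf.tanh(z_bounded)


# Example usage: a small stack of these layers with saturating activations.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    AutoRotatingDense(64),
    AutoRotatingDense(10),
])
model(tf.random.normal((8, 32)))  # forward pass sanity check
```

Unlike Batch Normalization, this sketch uses no batch statistics; the bound on the pre-activation is a per-neuron learnable parameter, which is one reading of the "learnable pre-activation saturation regions" mentioned in the abstract.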
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Gintare_Karolina_Dziugaite1
Submission Number: 1873