Optimizing Neural Network Effectiveness via Non-monotonicity Refinement
Abstract: Activation functions play a crucial role in artificial neural networks by introducing the non-linearities that enable networks to learn complex patterns in data. The choice of activation function strongly influences the training dynamics of a neural network and can boost performance significantly. The Rectified Linear Unit (ReLU) and its variants, such as leaky ReLU and parametric ReLU, have emerged as the most popular activations because they enable faster training and better generalization in deep neural networks, despite significant issues such as zero gradients for negative inputs (the dying ReLU problem). In this paper, we propose a family of smooth functions, which we call the AMSU family, constructed as smooth approximations of the maximum function. We derive three activations from this family, namely AMSU-1, AMSU-2, and AMSU-3, and demonstrate their effectiveness on different deep learning problems. Simply replacing ReLU with AMSU-1, AMSU-2, and AMSU-3 improves Top-1 accuracy by 5.88%, 5.96%, and 5.32%, respectively, on the CIFAR100 dataset with the ShuffleNet V2 model. Under an FGSM attack on the same dataset and model, replacing ReLU with AMSU-1, AMSU-2, and AMSU-3 improves Top-1 accuracy by 8.50%, 8.29%, and 7.70%, respectively. On the ImageNet-1K dataset, replacing ReLU with AMSU-1, AMSU-2, and AMSU-3 yields a 3%-5% improvement on ShuffleNet and MobileNet models. The source code is publicly available at https://github.com/koushik313/AMSU.
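To illustrate the smooth-maximum idea behind the AMSU family, the PyTorch sketch below implements a generic smooth approximation of max(alpha*x, x) that can be dropped in wherever ReLU is used. The exact AMSU-1/2/3 formulas are not given in this abstract, so the functional form, the parameter names `alpha` and `mu`, and the sqrt-based smoothing are illustrative assumptions rather than the authors' definitions.

```python
# Minimal sketch of a "smooth maximum"-style activation (NOT the actual AMSU
# definition, which is not given in the abstract; form and parameters are assumed).
import torch
import torch.nn as nn


class SmoothMaxActivation(nn.Module):
    """Smoothly approximates max(alpha * x, x), a leaky-ReLU-like maximum."""

    def __init__(self, alpha: float = 0.01, mu: float = 1.0, learnable: bool = True):
        super().__init__()
        self.alpha = alpha
        if learnable:
            # Treat the smoothing parameter as trainable, as smooth activations often do.
            self.mu = nn.Parameter(torch.tensor(float(mu)))
        else:
            self.register_buffer("mu", torch.tensor(float(mu)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = self.alpha * x, x
        # max(a, b) = (a + b + |a - b|) / 2, with |t| smoothed as sqrt(t^2 + mu^2);
        # the approximation approaches the exact maximum as mu -> 0.
        return 0.5 * (a + b + torch.sqrt((a - b) ** 2 + self.mu ** 2))


if __name__ == "__main__":
    # Drop-in replacement for ReLU inside any model, e.g. a small conv block.
    block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), SmoothMaxActivation())
    out = block(torch.randn(1, 3, 32, 32))
    print(out.shape)  # torch.Size([1, 16, 32, 32])
```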