- Keywords: Activation functions, Deep Learning
- TL;DR: We propose polynomial as activation functions.
- Abstract: Activation is a nonlinearity function that plays a predominant role in the convergence and performance of deep neural networks. While Rectified Linear Unit (ReLU) is the most successful activation function, its derivatives have shown superior performance on benchmark datasets. In this work, we explore the polynomials as activation functions (order ≥ 2) that can approximate continuous real valued function within a given interval. Leveraging this property, the main idea is to learn the nonlinearity, accepting that the ensuing function may not be monotonic. While having the ability to learn more suitable nonlinearity, we cannot ignore the fact that it is a challenge to achieve stable performance due to exploding gradients - which is prominent with the increase in order. To handle this issue, we introduce dynamic input scaling, output scaling, and lower learning rate for the polynomial weights. Moreover, lower learning rate will control the abrupt fluctuations of the polynomials between weight updates. In experiments on three public datasets, our proposed method matches the performance of prior activation functions, thus providing insight into a network’s nonlinearity preference.