Discovering Parametric Activation Functions

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: activation function, parametric, evolution
Abstract: Recent studies have shown that the choice of activation function can significantly affect the performance of deep learning networks. However, the benefits of novel activation functions have been inconsistent and task-dependent, and therefore the rectified linear unit (ReLU) is still the most commonly used. This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance. Evolutionary search is used to discover the general form of the function, and gradient descent to optimize its parameters for different parts of the network and over the learning process. Experiments with three different neural network architectures on the CIFAR-100 image classification dataset show that this approach is effective. It discovers different activation functions for different architectures, and consistently improves accuracy over ReLU and other recently proposed activation functions by significant margins. The approach can therefore be used as an automated optimization step in applying deep learning to new tasks.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: Evolutionary search discovers the general form of novel activation functions, and gradient descent fine-tunes the shape for different parts of the network and over the learning process.
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=c4wRLAmFxq
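
The abstract describes a two-level optimization: an outer search over the general functional form of the activation, and inner gradient descent that tunes learnable shape parameters separately for different parts of the network and over the course of training. The sketch below is illustrative only and is not the authors' implementation: the names ParametricAct, CANDIDATE_FORMS, and train_and_score are hypothetical, the candidate forms are arbitrary examples, and a simple comparison over a small fixed candidate set stands in for the paper's evolutionary search over function forms.

# Illustrative sketch only (assumptions): a parametric activation whose shape is
# controlled by learnable per-layer parameters trained by gradient descent, plus a
# toy outer search over a few candidate functional forms. All names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

# A few candidate "general forms"; each maps (x, alpha, beta) -> tensor.
CANDIDATE_FORMS = {
    "scaled_swish": lambda x, a, b: a * x * torch.sigmoid(b * x),
    "soft_relu":    lambda x, a, b: a * F.softplus(b * x),
    "tanh_gate":    lambda x, a, b: x * torch.tanh(a * x + b),
}

class ParametricAct(nn.Module):
    """Activation with learnable shape parameters; one instance per layer."""
    def __init__(self, form):
        super().__init__()
        self.form = form
        self.alpha = nn.Parameter(torch.ones(1))  # tuned by gradient descent
        self.beta = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return CANDIDATE_FORMS[self.form](x, self.alpha, self.beta)

def make_mlp(form):
    # Each layer gets its own activation instance, so the learned parameters can
    # differ across the network and drift over the course of training.
    return nn.Sequential(
        nn.Linear(32, 64), ParametricAct(form),
        nn.Linear(64, 64), ParametricAct(form),
        nn.Linear(64, 10),
    )

def train_and_score(form, steps=200):
    """Train briefly on synthetic data; the resulting accuracy serves as a toy fitness."""
    torch.manual_seed(0)
    x = torch.randn(512, 32)
    y = torch.randint(0, 10, (512,))
    model = make_mlp(form)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

if __name__ == "__main__":
    # Outer loop: keep the functional form with the best fitness; the inner loop
    # (gradient descent above) tunes each layer's alpha/beta for that form.
    scores = {form: train_and_score(form) for form in CANDIDATE_FORMS}
    best = max(scores, key=scores.get)
    print("best form:", best, "scores:", scores)

Because every layer holds its own ParametricAct instance, gradient descent can drive alpha and beta to different values in different parts of the network and adjust them as training progresses, mirroring the per-location, per-time adaptation described in the abstract; the paper's evolutionary search over function forms would replace the fixed candidate dictionary used here.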