Towards Efficient Search for Customized Activation Functions With Gradient Descent

Published: 12 Jul 2024, Last Modified: 12 Aug 2024, AutoML 2024 Workshop, CC BY 4.0
Keywords: Machine Learning, Deep Learning, Activation Functions, Neural Architecture Search
TL;DR: We leverage gradient-based bi-level optimization techniques to identify high-performing customized activation functions for deep neural networks.
Abstract: We leverage recent advancements in gradient-based search techniques for neural architectures to efficiently identify high-performing activation functions for a given application. We propose a fine-grained search cell that combines basic mathematical operations to model activation functions, allowing for the exploration of novel activations. Our approach enables the identification of specialized activations, leading to improved performance across every architecture we evaluated, from image classification to language modeling. Moreover, the identified activations exhibit strong transferability to larger models of the same type, as well as to new datasets. Importantly, our automated process is orders of magnitude more efficient than previous approaches. It can easily be applied on top of arbitrary deep learning pipelines and thus offers a promising practical avenue for enhancing deep learning architectures.
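To make the idea of a gradient-searchable activation concrete, below is a minimal sketch (not the authors' implementation) of how a DARTS-style continuous relaxation over basic unary operations might be expressed in PyTorch. The candidate operation set, class names, and optimizer split are illustrative assumptions; the paper's actual search cell and bi-level optimization setup may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Candidate unary operations the search cell may combine (assumed set).
CANDIDATE_OPS = [
    lambda x: x,                     # identity
    torch.relu,
    torch.tanh,
    torch.sigmoid,
    lambda x: x * torch.sigmoid(x),  # Swish / SiLU
    torch.square,
]


class SearchableActivation(nn.Module):
    """Continuous relaxation of an activation function:
    a softmax-weighted mixture of candidate operations, where the
    mixture weights (architecture parameters) are learned by gradient
    descent in the outer level of a bi-level optimization."""

    def __init__(self):
        super().__init__()
        # One architecture parameter per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(CANDIDATE_OPS)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS))


# Usage sketch: drop the searchable activation into a small model and
# separate architecture parameters (alpha) from network weights, so the
# two can be updated alternately (e.g. alpha on validation batches,
# weights on training batches), as in bi-level NAS methods.
model = nn.Sequential(nn.Linear(32, 64), SearchableActivation(), nn.Linear(64, 10))
arch_params = [p for n, p in model.named_parameters() if n.endswith("alpha")]
weight_params = [p for n, p in model.named_parameters() if not n.endswith("alpha")]
arch_opt = torch.optim.Adam(arch_params, lr=3e-4)
weight_opt = torch.optim.SGD(weight_params, lr=0.01, momentum=0.9)
```

After the search converges, the mixture can be discretized by keeping the highest-weighted operation (or a small composition of them) as the final, fixed activation function.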
Submission Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Submission Number: 22