Keywords: activation function, quantization, nonlinearity
TL;DR: We propose a highly flexible trainable activation function that works well across a wide range of DNN architectures and tasks.
Abstract: Activation functions (AFs) are a cornerstone of deep learning, providing the crucial nonlinearity needed for network expressiveness. However, widely used AFs like ReLU and GELU are fixed and non-adaptive, offering limited nonlinearity and often necessitating larger, more complex architectures to capture intricate functions. This paper introduces a new family of trainable, architecture-agnostic AFs called Soft Quantization Activation Functions (SQUAFs). We show theoretically that SQUAFs can approximate any continuous nonlinear one-dimensional function with arbitrary precision. Our extensive experiments demonstrate that networks equipped with SQUAFs consistently outperform their counterparts using existing AFs across diverse tasks. Specifically, we achieve orders-of-magnitude error reduction in function fitting, up to 25.27 dB gain in image fitting, and significant accuracy improvements in image classification and large language model (LLM) fine-tuning. Moreover, SQUAFs (1) enable smaller models to surpass larger ones trained with conventional AFs, and (2) can reduce the inter-device communication cost in model-parallel settings by up to 9-fold while still improving accuracy. These results highlight SQUAFs as a simple yet powerful drop-in replacement for standard AFs, offering both theoretical expressiveness and practical performance gains.
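The abstract does not spell out the functional form of a SQUAF, but one natural reading of "soft quantization" is a smooth staircase: a sum of shifted sigmoids whose step heights and step locations are trainable, so that with enough steps the unit can approximate any continuous one-dimensional function. The sketch below is a hypothetical illustration under that assumption; the function name and parameters (`deltas`, `thresholds`, `tau`) are not from the paper.

```python
import math

def soft_quant_activation(x, deltas, thresholds, tau=1.0):
    """Hypothetical 'soft quantization' activation: a smooth staircase
    built from shifted sigmoids. In a network, deltas (step heights)
    and thresholds (step locations) would be trainable parameters;
    tau controls how sharp each step is (smaller = closer to a hard
    quantizer). This sketch just evaluates the function at a scalar x.
    """
    y = 0.0
    for d, t in zip(deltas, thresholds):
        y += d / (1.0 + math.exp(-(x - t) / tau))
    return y

# With two unit steps at x=0 and x=2 and a small temperature, the
# function plateaus near 0, 1, and 2 between the steps.
f = lambda x: soft_quant_activation(x, deltas=[1.0, 1.0],
                                    thresholds=[0.0, 2.0], tau=0.1)
```

Because a staircase with sufficiently many, sufficiently fine steps can uniformly approximate any continuous function on a compact interval, a trainable construction of this kind is consistent with the universal-approximation claim made in the abstract.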
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 9182