Keywords: Deep neural network, Activation function, Attention mechanism
Abstract: Deep Neural Networks (DNNs) rely on activation functions to introduce non-linearity, which significantly impacts performance across various tasks. To enhance neural network expressivity, we propose \textbf{Attention-based Dynamic ReLU (ADReLU)}—a novel activation function that replaces ReLU’s fixed zero threshold with a dynamic, input-dependent threshold computed via an attention mechanism. To balance expressivity and computational efficiency, ADReLU employs grouped convolution and depth-wise projection for image data, mitigating the computational cost typically associated with attention operations. Extensive experiments on CIFAR-10, CIFAR-100, SVHN, and ImageNet demonstrate that ADReLU consistently outperforms both predefined activation functions (such as ReLU and LReLU) and trainable ones (such as PReLU, GCLU, GELU, Maxout, and Dynamic ReLU) in terms of accuracy. Furthermore, we empirically analyze ADReLU’s attention subspace dimension, sparsity patterns, and computational complexity, highlighting its balanced efficacy in feature representation and resource efficiency.
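The abstract describes replacing ReLU's fixed zero threshold with an input-dependent threshold produced by a lightweight attention branch built from grouped and depth-wise convolutions. Below is a minimal PyTorch sketch of that idea; the module name `ADReLUSketch`, the squeeze-and-excite style pooling, the reduction ratio, and the group count are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ADReLUSketch(nn.Module):
    """Illustrative sketch of an attention-based dynamic ReLU.

    A depth-wise convolution summarizes spatial context, grouped 1x1
    convolutions act as a cheap attention branch, and the result is a
    per-channel, input-dependent threshold replacing ReLU's fixed zero.
    The specific layer choices here are assumptions for illustration.
    """

    def __init__(self, channels: int, groups: int = 4, reduction: int = 8):
        super().__init__()
        assert channels % groups == 0, "channels must be divisible by groups"
        # Hidden width of the attention branch, rounded up to a multiple
        # of `groups` so grouped convolutions are valid.
        hidden = max(channels // reduction, groups)
        hidden = ((hidden + groups - 1) // groups) * groups
        # Depth-wise projection capturing local spatial context per channel.
        self.dw = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                            groups=channels, bias=False)
        # Grouped 1x1 convolutions mapping pooled context to thresholds.
        self.fc1 = nn.Conv2d(channels, hidden, kernel_size=1, groups=groups)
        self.fc2 = nn.Conv2d(hidden, channels, kernel_size=1, groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention branch: depth-wise context -> global pooling -> thresholds.
        ctx = F.adaptive_avg_pool2d(self.dw(x), 1)      # (N, C, 1, 1)
        thr = self.fc2(F.relu(self.fc1(ctx)))           # per-channel thresholds
        # Dynamic ReLU: pass values above the input-dependent threshold.
        return torch.where(x > thr, x, torch.zeros_like(x))


# Usage: drop-in replacement for ReLU in a convolutional block.
act = ADReLUSketch(channels=64)
y = act(torch.randn(2, 64, 32, 32))
```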
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 16604