Keywords: Deep neural network, Activation function, Attention mechanism
Abstract: Deep Neural Networks (DNNs) rely on activation functions to introduce non-linearity, which significantly impacts performance across various tasks. To enhance neural network expressivity, we propose \textbf{Attention-based Dynamic ReLU (ADReLU)}—a novel activation function that replaces ReLU’s fixed zero threshold with a dynamic, input-dependent threshold computed via an attention mechanism. To balance expressivity and computational efficiency, ADReLU employs grouped convolution and depth-wise projection for image data, mitigating the computational cost typically associated with attention operations. Extensive experiments on CIFAR-10, CIFAR-100, SVHN, and ImageNet demonstrate that ADReLU consistently outperforms both predefined activation functions (such as ReLU and LReLU) and trainable ones (such as PReLU, GCLU, GELU, Maxout, and Dynamic ReLU) in terms of accuracy. Furthermore, we empirically analyze ADReLU’s attention subspace dimension, sparsity patterns, and computational complexity, highlighting its balanced efficacy in feature representation and resource efficiency.
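The abstract describes replacing ReLU's fixed zero threshold with an input-dependent threshold produced by a lightweight attention branch built from grouped and depth-wise convolutions. Below is a minimal PyTorch sketch of that idea; the module name `ADReLUSketch`, the squeeze-and-excite style pooling, the reduction ratio, and the group count are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ADReLUSketch(nn.Module):
    """Illustrative sketch of an attention-based dynamic ReLU.

    A depth-wise convolution summarizes spatial context, grouped 1x1
    convolutions act as a cheap attention branch, and the result is a
    per-channel, input-dependent threshold replacing ReLU's fixed zero.
    The specific layer choices here are assumptions for illustration.
    """

    def __init__(self, channels: int, groups: int = 4, reduction: int = 8):
        super().__init__()
        assert channels % groups == 0, "channels must be divisible by groups"
        # Hidden width of the attention branch, rounded up to a multiple
        # of `groups` so grouped convolutions are valid.
        hidden = max(channels // reduction, groups)
        hidden = ((hidden + groups - 1) // groups) * groups
        # Depth-wise projection capturing local spatial context per channel.
        self.dw = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                            groups=channels, bias=False)
        # Grouped 1x1 convolutions mapping pooled context to thresholds.
        self.fc1 = nn.Conv2d(channels, hidden, kernel_size=1, groups=groups)
        self.fc2 = nn.Conv2d(hidden, channels, kernel_size=1, groups=groups)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention branch: depth-wise context -> global pooling -> thresholds.
        ctx = F.adaptive_avg_pool2d(self.dw(x), 1)      # (N, C, 1, 1)
        thr = self.fc2(F.relu(self.fc1(ctx)))           # per-channel thresholds
        # Dynamic ReLU: pass values above the input-dependent threshold.
        return torch.where(x > thr, x, torch.zeros_like(x))


# Usage: drop-in replacement for ReLU in a convolutional block.
act = ADReLUSketch(channels=64)
y = act(torch.randn(2, 64, 32, 32))
```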
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 16604