Keywords: Neural Activation Functions, Principled Neural Activation Modeling, Neural Activation Interpretation, Non-local Information Modeling
TL;DR: We identify, elucidate, and address the underexplored non-local tension problem and introduce FleS, a self-gated activation function that enhances discriminative visual recognition through adaptive scaling.
Abstract: Neural networks require nonlinearities to achieve universal approximation.
Traditional activation functions introduce nonlinearities through rigid feature rectifications.
Recent self-gated variants improve on the fitting flexibility of traditional methods by incorporating learnable, content-aware factors and non-local dependencies, enabling dynamic adjustment of activation curves via adaptive translation and scaling.
While state-of-the-art approaches achieve notable gains in conventional CNN layers, they struggle to improve Transformer layers, where fine-grained context is already modeled inherently, which severely reduces the effectiveness of the non-local dependencies leveraged during activation.
We refer to this critical yet unexplored challenge as the non-local tension of activation.
Drawing on a decision-making perspective, we systematically analyze the origins of the non-local tension problem and explore an initial solution that fosters a more discriminative and generalizable neural activation methodology.
This is achieved by rethinking how non-local cues are encoded and transformed into adaptive scaling coefficients, which in turn recalibrate the contributions of features to filter updates through neural activation.
Grounded in these insights, we present FleS, a novel self-gated activation model for discriminative pattern recognition.
Extensive experiments on various popular benchmarks validate our interpretable methodology for improving neural activation modeling.
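The abstract does not specify the exact form of FleS, so the following is only a minimal illustrative sketch of the general mechanism it describes: a self-gated activation whose scaling coefficient is derived from non-local (globally pooled) context. The class name, bottleneck design, and Swish-like gating form are assumptions for illustration, not the authors' method.

```python
import torch
import torch.nn as nn


class SelfGatedActivation(nn.Module):
    """Swish-like gate x * sigmoid(beta * x), where beta is an adaptive
    per-channel scaling coefficient predicted from a non-local summary
    of the feature map (an assumption; FleS itself is not specified here)."""

    def __init__(self, channels: int):
        super().__init__()
        # Small bottleneck that maps global (non-local) channel statistics
        # to per-channel scaling coefficients.
        self.to_scale = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels),
            nn.Softplus(),  # keep the scaling coefficient positive
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        context = x.mean(dim=(2, 3))             # non-local cue via global pooling
        beta = self.to_scale(context)             # adaptive scaling coefficient
        beta = beta.unsqueeze(-1).unsqueeze(-1)   # broadcast over spatial dims
        return x * torch.sigmoid(beta * x)        # self-gated activation


if __name__ == "__main__":
    act = SelfGatedActivation(channels=64)
    features = torch.randn(2, 64, 32, 32)
    print(act(features).shape)  # torch.Size([2, 64, 32, 32])
```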
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18129