Keywords: Neural Activation Functions, Principled Neural Activation Modeling, Neural Activation Interpretation, Non-local Information Modeling
TL;DR: We identify, elucidate, and address the underexplored non-local tension problem and introduce FleS, a self-gated activation function that enhances discriminative visual recognition through adaptive scaling.
Abstract: Neural networks require nonlinearities to achieve universal approximation.
Traditional activation functions introduce nonlinearities through rigid feature rectifications.
Recent self-gated variants improve on the fitting flexibility of traditional methods by incorporating learnable, content-aware factors and non-local dependencies, enabling dynamic adjustment of activation curves via adaptive translation and scaling.
While state-of-the-art approaches achieve notable gains in conventional CNN layers, they struggle to improve Transformer layers, where fine-grained context is already modeled inherently, which severely reduces the effectiveness of the non-local dependencies leveraged during activation.
We refer to this critical yet unexplored challenge as the non-local tension of activation.
Drawing on a decision-making perspective, we systematically analyze the origins of the non-local tension problem and explore an initial solution that fosters a more discriminative and generalizable neural activation methodology.
This is achieved by rethinking how non-local cues are encoded and transformed into adaptive scaling coefficients, which in turn recalibrate the contributions of features to filter updates through neural activation.
Grounded in these insights, we present FleS, a novel self-gated activation model for discriminative pattern recognition.
Extensive experiments on various popular benchmarks validate our interpretable methodology for improving neural activation modeling.
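The abstract does not specify the exact form of FleS, so the following is only a minimal illustrative sketch of the general mechanism it describes: a self-gated activation whose scaling coefficient is derived from non-local (globally pooled) context. The class name, bottleneck design, and Swish-like gating form are assumptions for illustration, not the authors' method.

```python
import torch
import torch.nn as nn


class SelfGatedActivation(nn.Module):
    """Swish-like gate x * sigmoid(beta * x), where beta is an adaptive
    per-channel scaling coefficient predicted from a non-local summary
    of the feature map (an assumption; FleS itself is not specified here)."""

    def __init__(self, channels: int):
        super().__init__()
        # Small bottleneck that maps global (non-local) channel statistics
        # to per-channel scaling coefficients.
        self.to_scale = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels),
            nn.Softplus(),  # keep the scaling coefficient positive
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        context = x.mean(dim=(2, 3))             # non-local cue via global pooling
        beta = self.to_scale(context)             # adaptive scaling coefficient
        beta = beta.unsqueeze(-1).unsqueeze(-1)   # broadcast over spatial dims
        return x * torch.sigmoid(beta * x)        # self-gated activation


if __name__ == "__main__":
    act = SelfGatedActivation(channels=64)
    features = torch.randn(2, 64, 32, 32)
    print(act(features).shape)  # torch.Size([2, 64, 32, 32])
```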
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18129