Abstract: Standard ML models fail to infer the context distribution of their inputs and to adapt accordingly. For instance, learning fails when the underlying distribution is in fact a mixture of distributions with contradictory labels, and it also fails when there is a shift between the train and test distributions. Standard neural network architectures such as MLPs and CNNs are not equipped to handle this.
In this article, we propose a simple activation function, quantile activation (QAct), that addresses this problem without significantly increasing computational cost. The core idea is to "adapt" the output of each neuron to its context distribution: rather than producing the direct numerical outputs common in traditional networks, QAct outputs the relative quantile position of a neuron's activation within that distribution.
A specific case of the above failure mode is an inherent distribution shift, i.e. the test distribution differs slightly from the train distribution. We validate the proposed activation function under covariate shift, using datasets designed to test robustness against distortions. Our results demonstrate significantly better generalisation across distortions than conventional classifiers and other adaptive methods, across various architectures. Although this paper presents a proof of concept, the approach unexpectedly outperforms DINOv2 (small), despite DINOv2 being trained with a much larger network and dataset.
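To make the mechanism described above concrete, the following is a minimal sketch of a quantile activation, assuming each neuron's context distribution is approximated by the current mini-batch. The function name, the rank-based empirical CDF, and the rescaling to [-1, 1] are illustrative assumptions, not the paper's exact method; see the linked repository for the authors' implementation.

```python
import torch

def quantile_activation(x: torch.Tensor) -> torch.Tensor:
    """Sketch of a quantile activation (batch-as-context assumption).

    For each neuron (column), every pre-activation is mapped to its
    empirical quantile within the mini-batch, then rescaled to [-1, 1].
    The output encodes *where* an activation sits relative to its
    context distribution, not its raw numerical value.
    """
    n = x.shape[0]
    # Double argsort yields the rank of each value within its column,
    # taking values in {0, ..., n-1}.
    ranks = x.argsort(dim=0).argsort(dim=0).float()
    q = (ranks + 0.5) / n  # empirical quantile in (0, 1)
    return 2.0 * q - 1.0   # centre the output around zero

# Usage: a batch of 4 samples passing through 2 neurons.
x = torch.tensor([[0.1, -1.0],
                  [2.0,  0.5],
                  [-0.3, 0.0],
                  [1.2,  3.0]])
print(quantile_activation(x))
```

Note that under this batch-as-context assumption the output of each neuron is, by construction, approximately uniformly distributed regardless of the input scale, which is one intuition for why such a layer could adapt under covariate shift.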
Submission Length: Regular submission (no more than 12 pages of main content)
Code: https://github.com/adityac20/quantile_activation
Assigned Action Editor: ~Guillermo_Ortiz-Jimenez1
Submission Number: 4063