Keywords: Activation function, Ordinal regression, Ranking problem, Parameterized activation, Trainable activation, Mutual information
TL;DR: We propose trainable staircase activations that learn ordered intervals preserving ordinal structure. Combined with noise injection, a monotonic ascending term, adaptive piecewise-linear functions, and a mutual-information-regularized loss, they stabilize training and outperform baselines.
Abstract: Ordinal labels are discrete and ordered but lack calibrated spacing, a structure that most deep networks ignore by treating them as nominal classes or as real values. We introduce trainable staircase activations, which partition the output space into learnable, ordered intervals so that predictions align with the ordinal labels. Direct parameterization reveals a degeneration–saturation dilemma in which gradients vanish and intervals collapse; we analyze its cause and propose three remedies: (i) stochastic noise injection to de-saturate plateaus, (ii) a monotonically ascending term to enforce order, and (iii) adaptive piecewise-linear functions that adjust thresholds end-to-end. Paired with a mutual-information-regularized absolute-error loss, our design stabilizes optimization and preserves ordinal structure. The modules are drop-in replacements for standard final layers and require no other architectural changes. Across diverse benchmarks, they consistently outperform softmax/logistic baselines and prior ordinal methods, demonstrating that staircase activations are an effective and principled building block for end-to-end learning with ordinal targets.
Supplementary Material: pdf
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 4266
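To make the construction concrete, below is a minimal PyTorch sketch of one plausible parameterization, not the paper's actual implementation: a staircase formed from a sum of shifted sigmoids, with ordered thresholds enforced by a cumulative softplus (in the spirit of the monotonic ascending term) and Gaussian noise injection to de-saturate plateaus during training. The class name `StaircaseActivation` and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StaircaseActivation(nn.Module):
    """Hypothetical sketch: a trainable staircase built from a sum of
    shifted sigmoids. Plateaus sit near the integer levels 0..K-1, and
    learnable thresholds set the interval boundaries."""

    def __init__(self, num_levels: int, temperature: float = 0.1,
                 noise_std: float = 0.05):
        super().__init__()
        # Unconstrained offsets; a cumulative softplus keeps the thresholds
        # strictly increasing, preserving the order of the intervals.
        self.deltas = nn.Parameter(torch.zeros(num_levels - 1))
        self.temperature = temperature
        self.noise_std = noise_std  # de-saturates plateaus during training

    def thresholds(self) -> torch.Tensor:
        # b_1 < b_2 < ... via a cumulative sum of positive increments.
        return torch.cumsum(F.softplus(self.deltas) + 1e-3, dim=0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: scalar score per example, shape (N,).
        if self.training and self.noise_std > 0:
            # Stochastic noise injection keeps gradients alive on flat steps.
            x = x + torch.randn_like(x) * self.noise_std
        b = self.thresholds()                                   # (K-1,)
        steps = torch.sigmoid((x.unsqueeze(-1) - b) / self.temperature)
        return steps.sum(dim=-1)                                # in [0, K-1]
```

At inference, `torch.round` maps the continuous output to a discrete rank; during training, the smooth sigmoids let an absolute-error loss backpropagate through the thresholds end-to-end.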