Keywords: Activation function, Ordinal regression, Ranking problem, Parameterized activation, Trainable activation, Mutual information
TL;DR: We propose trainable staircase activations that learn ordered intervals preserving ordinal structure. Combined with noise injection, a monotonic ascending term, adaptive piecewise-linear functions, and a mutual-information-regularized loss, they stabilize training and outperform baselines.
Abstract: Ordinal labels are discrete and ordered but lack calibrated spacing, a structure that most deep networks ignore by treating them as nominal classes or as real values. We introduce trainable staircase activations, which partition the output space into learnable, ordered intervals so that predictions align with the ordinal labels. Direct parameterization reveals a degeneration–saturation dilemma in which gradients vanish and intervals collapse; we analyze its cause and propose three remedies: (i) stochastic noise injection to de-saturate plateaus, (ii) a monotonically ascending term to enforce order, and (iii) adaptive piecewise-linear functions that adjust thresholds end-to-end. Paired with a mutual-information-regularized absolute-error loss, our design stabilizes optimization and preserves ordinal structure. The modules are drop-in replacements for standard final layers and require no other architectural changes. Across diverse benchmarks, they consistently outperform softmax/logistic baselines and prior ordinal methods, demonstrating that staircase activations are an effective and principled building block for end-to-end learning with ordinal targets.
Supplementary Material: pdf
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 4266
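To make the construction concrete, below is a minimal PyTorch sketch of one plausible parameterization, not the paper's actual implementation: a staircase formed from a sum of shifted sigmoids, with ordered thresholds enforced by a cumulative softplus (in the spirit of the monotonic ascending term) and Gaussian noise injection to de-saturate plateaus during training. The class name `StaircaseActivation` and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StaircaseActivation(nn.Module):
    """Hypothetical sketch: a trainable staircase built from a sum of
    shifted sigmoids. Plateaus sit near the integer levels 0..K-1, and
    learnable thresholds set the interval boundaries."""

    def __init__(self, num_levels: int, temperature: float = 0.1,
                 noise_std: float = 0.05):
        super().__init__()
        # Unconstrained offsets; a cumulative softplus keeps the thresholds
        # strictly increasing, preserving the order of the intervals.
        self.deltas = nn.Parameter(torch.zeros(num_levels - 1))
        self.temperature = temperature
        self.noise_std = noise_std  # de-saturates plateaus during training

    def thresholds(self) -> torch.Tensor:
        # b_1 < b_2 < ... via a cumulative sum of positive increments.
        return torch.cumsum(F.softplus(self.deltas) + 1e-3, dim=0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: scalar score per example, shape (N,).
        if self.training and self.noise_std > 0:
            # Stochastic noise injection keeps gradients alive on flat steps.
            x = x + torch.randn_like(x) * self.noise_std
        b = self.thresholds()                                   # (K-1,)
        steps = torch.sigmoid((x.unsqueeze(-1) - b) / self.temperature)
        return steps.sum(dim=-1)                                # in [0, K-1]
```

At inference, `torch.round` maps the continuous output to a discrete rank; during training, the smooth sigmoids let an absolute-error loss backpropagate through the thresholds end-to-end.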