Shave Peaks, Don't Fill Valleys: Upper-Tail Risk Balancing Improves Robustness without Accuracy Loss
Keywords: Attribution Concentration, Risk-Balanced Representation Learning, Robustness Training, Upper-Tail Control, Gradient-Based Attribution, Time Series, Sequence Models
TL;DR: RBRL trims the upper tail of activation×gradient attribution to diffuse peak reliance in sequence models, boosting robustness to targeted occlusions and noise without degrading accuracy or inference cost.
Abstract: Many sequence models achieve strong average performance yet exhibit **concentrated internal dependencies**: removing just a few "critical units/time positions" causes disproportionate degradation. We propose **RBRL (Risk-Balanced Representation Learning)**, which applies financial risk-allocation principles to neural network training by constraining **attribution concentration** through adaptive risk budgets. RBRL uses a stable attribution signal (AEC: activation × gradient with EMA normalization) and imposes upper-tail constraints via quantile budgets and soft-Top-K penalties. A dual-only training scheme preserves backbone gradients, enabling "peak-shaving" without compromising the main objective.
Across S\&P 500 and ETT datasets, RBRL **improves robustness with tunable computational overhead while maintaining baseline-level accuracy on S\&P 500**. On ETT, RMSE changes are mixed across subsets; on S\&P 500, accuracy differences are small and not statistically significant (RMSE *p* = 0.216; MAE *p* = 0.201; directional accuracy unchanged). A comprehensive evaluation across 68 configurations demonstrates architecture-agnostic applicability to LSTM, iTransformer, and other sequence models. We position this as a **robust-reliance training paradigm**: proactively dispersing dependencies during training rather than addressing brittleness post hoc.
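The abstract's mechanism (an EMA-normalized activation × gradient attribution, penalized above an upper-tail quantile) can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function names `aec_attribution` and `upper_tail_penalty`, the EMA decay `beta`, and the quantile `q` are all hypothetical choices, and the actual RBRL formulation (including the soft-Top-K term and dual-only gradient routing) is not specified in the abstract.

```python
import numpy as np

def aec_attribution(activations, gradients, ema_state, beta=0.9, eps=1e-8):
    """AEC-style signal: |activation * gradient|, normalized by an EMA of its scale.

    `ema_state` is a dict persisting the running scale across training steps.
    All names/defaults here are illustrative assumptions, not the paper's.
    """
    raw = np.abs(activations * gradients)          # per-unit attribution magnitude
    batch_scale = raw.mean()                       # current batch's average scale
    # Exponential moving average stabilizes the normalizer across batches.
    ema_state["scale"] = beta * ema_state.get("scale", batch_scale) + (1 - beta) * batch_scale
    return raw / (ema_state["scale"] + eps)

def upper_tail_penalty(attr, q=0.9):
    """Peak-shaving: penalize only attribution mass above the q-quantile.

    Units below the threshold are untouched (shave peaks, don't fill valleys).
    """
    threshold = np.quantile(attr, q)               # quantile "risk budget"
    excess = np.maximum(attr - threshold, 0.0)     # hinge: only the upper tail
    return excess.mean()

# Example usage: attributions for a batch of 4 sequences with 8 units each.
rng = np.random.default_rng(0)
acts, grads = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
state = {}
attr = aec_attribution(acts, grads, state)
penalty = upper_tail_penalty(attr, q=0.9)
```

In an actual training loop, this penalty would be added to the task loss; the abstract's "dual-only" training implies the penalty's gradients are routed so the backbone's task gradients are preserved, a detail this sketch omits.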
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 3763