Shave Peaks, Don't Fill Valleys: Upper-Tail Risk Balancing Improves Robustness without Accuracy Loss
Keywords: Attribution Concentration, Risk-Balanced Representation Learning, Robustness Training, Upper-Tail Control, Gradient-Based Attribution, Time Series, Sequence Models
TL;DR: RBRL trims the upper tail of activation×gradient attribution to diffuse peak reliance in sequence models, boosting robustness to targeted occlusions and noise without degrading accuracy or inference cost.
Abstract: Many sequence models achieve strong average performance yet exhibit **concentrated internal dependencies**: removing just a few "critical units/time positions" causes disproportionate degradation. We propose **RBRL (Risk-Balanced Representation Learning)**, which applies financial risk-allocation principles to neural network training by constraining **attribution concentration** through adaptive risk budgets. RBRL uses a stable attribution signal (AEC: activation × gradient with EMA normalization) and imposes upper-tail constraints via quantile budgets and soft-Top-K penalties. A dual-only training scheme preserves backbone gradients, enabling "peak-shaving" without compromising the main objective.
Across S\&P 500 and ETT datasets, RBRL **improves robustness with tunable computational overhead while maintaining baseline-level accuracy on S\&P 500**. On ETT, RMSE changes are mixed across subsets; on S\&P 500, accuracy differences are small and not statistically significant (RMSE *p* = 0.216; MAE *p* = 0.201; directional accuracy unchanged). A comprehensive evaluation across 68 configurations demonstrates architecture-agnostic applicability to LSTM, iTransformer, and other sequence models. We position this as a **robust-reliance training paradigm**: proactively dispersing dependencies during training rather than addressing brittleness post hoc.
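The abstract's mechanism (an EMA-normalized activation × gradient attribution, penalized above an upper-tail quantile) can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function names `aec_attribution` and `upper_tail_penalty`, the EMA decay `beta`, and the quantile `q` are all hypothetical choices, and the actual RBRL formulation (including the soft-Top-K term and dual-only gradient routing) is not specified in the abstract.

```python
import numpy as np

def aec_attribution(activations, gradients, ema_state, beta=0.9, eps=1e-8):
    """AEC-style signal: |activation * gradient|, normalized by an EMA of its scale.

    `ema_state` is a dict persisting the running scale across training steps.
    All names/defaults here are illustrative assumptions, not the paper's.
    """
    raw = np.abs(activations * gradients)          # per-unit attribution magnitude
    batch_scale = raw.mean()                       # current batch's average scale
    # Exponential moving average stabilizes the normalizer across batches.
    ema_state["scale"] = beta * ema_state.get("scale", batch_scale) + (1 - beta) * batch_scale
    return raw / (ema_state["scale"] + eps)

def upper_tail_penalty(attr, q=0.9):
    """Peak-shaving: penalize only attribution mass above the q-quantile.

    Units below the threshold are untouched (shave peaks, don't fill valleys).
    """
    threshold = np.quantile(attr, q)               # quantile "risk budget"
    excess = np.maximum(attr - threshold, 0.0)     # hinge: only the upper tail
    return excess.mean()

# Example usage: attributions for a batch of 4 sequences with 8 units each.
rng = np.random.default_rng(0)
acts, grads = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
state = {}
attr = aec_attribution(acts, grads, state)
penalty = upper_tail_penalty(attr, q=0.9)
```

In an actual training loop, this penalty would be added to the task loss; the abstract's "dual-only" training implies the penalty's gradients are routed so the backbone's task gradients are preserved, a detail this sketch omits.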
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 3763