Abstract: Long-horizon multivariate forecasting is often brittle under regime changes, rare high-impact windows, and
error accumulation.
Standard training samples windows uniformly and optimizes mean loss. Existing curricula typically rank
windows by difficulty alone, and robustness objectives (e.g., CVaR, IRM/REx, GroupDRO) act only after windows
have entered the optimization stream.
We propose \method{}, a \emph{model-agnostic} training wrapper that reallocates gradient budget by coupling
(i) self-paced window admission, (ii) shift-aware importance weights over context- or feature-defined
environments, and (iii) tail- and environment-robust outer objectives.
The wrapper leaves the forecasting backbone unchanged and adds no inference-time cost.
At the population level, we formalize the induced target as a trimmed, shift-corrected robust risk. We show
that the differentiable quantile gate is an $O(1/\gamma)$ approximation to its hard admitted-set counterpart,
quantify the bias introduced by label-adaptive difficulty signals via an explicit adaptive-gap term, and
derive a deterministic upper bound on worst-environment risk from the environment-variance penalty.
Empirically, on six long-horizon benchmarks (ETTh1/2, ETTm1/2, Weather, Electricity) and four backbones
(RLinear, DLinear, RMLP, iTransformer), \method{} lowers MSE in 82 of 96 backbone--dataset--horizon cells,
with 65 cells improving by more than 1\%, and yields positive average gains in every backbone--horizon
aggregate.
On a scoped robustness battery (ETTh1 with DLinear), \method{} lowers mean MSE by 5.1--9.0\% across temporal
shift levels and cuts worst-environment MSE by up to 30\% in the hardest stress setting.
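
As an illustration of the differentiable quantile gate mentioned above, the following minimal sketch compares a hard admitted-set indicator with a smooth surrogate. The sigmoid form, the reading of $\gamma$ as a sharpness parameter, and all function names are assumptions made for this example; the abstract does not specify the gate's exact form, so this should not be read as the \method{} implementation.

```python
import math


def empirical_quantile(values, q):
    """Nearest-rank q-quantile of a list of per-window losses (0 < q <= 1)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def hard_gate(loss, threshold):
    """Hard admission: 1.0 if the window's loss is at or below the threshold."""
    return 1.0 if loss <= threshold else 0.0


def soft_gate(loss, threshold, gamma):
    """Assumed smooth surrogate: sigmoid(gamma * (threshold - loss)).

    Larger gamma gives a sharper gate that is closer to hard_gate.
    """
    z = gamma * (threshold - loss)
    # Numerically stable sigmoid for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_gate_gap(losses, threshold, gamma):
    """Average absolute difference between the soft and hard gates."""
    total = sum(
        abs(soft_gate(l, threshold, gamma) - hard_gate(l, threshold))
        for l in losses
    )
    return total / len(losses)


if __name__ == "__main__":
    # Hypothetical per-window losses on an evenly spaced grid.
    losses = [i / 100 for i in range(1, 101)]
    # Admit roughly the easiest 80% of windows; the small offset avoids a tie
    # exactly at the threshold, where the sigmoid equals 0.5 for every gamma.
    threshold = empirical_quantile(losses, 0.8) + 0.005
    for gamma in (10, 100, 1000):
        print(gamma, mean_gate_gap(losses, threshold, gamma))
```

In this toy setting the average gap between the two gates shrinks as `gamma` grows, which is the qualitative behavior an $O(1/\gamma)$ approximation guarantee describes. The sketch does not reproduce the paper's bound or its constants.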
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Olivier_Cappé2
Submission Number: 7797