TRACE: Theoretical Risk Attribution under Covariate-shift Effects

ICLR 2026 Conference Submission 20588 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Distribution Shift, Risk Attribution, Optimal Transport, Covariate Shift

TL;DR: We introduce TRACE, a framework to attribute the risk of model updates to data shift and model instability, providing both a practical diagnostic and a powerful automated deployment gate score.

Abstract: When a source-trained model $Q$ is replaced by a model $\tilde{Q}$ trained on shifted data, performance on the source domain can change unpredictably. To address this, we study the two-model risk change, $\Delta R := R_P(Q) - R_P(\tilde{Q})$, under covariate shift. We introduce TRACE (Theoretical Risk Attribution under Covariate-shift Effects), a framework that upper-bounds $|\Delta R|$ with an interpretable decomposition. The bound separates the risk change into four actionable factors: two generalization gaps, a model change penalty, and a covariate shift penalty, so it also serves as a diagnostic for why performance has changed. To make TRACE fully computable, we instantiate each term. The covariate shift penalty is estimated as the product of a model sensitivity factor (from high-quantile input gradients) and a data-shift measure; we use feature-space Optimal Transport (OT) by default and provide a robust alternative using Maximum Mean Discrepancy (MMD). The model change penalty is controlled by the average output distance between the two models on the target sample. Generalization gaps are estimated on held-out data. We validate the framework in an idealized linear regression setting, showing that the TRACE bound correctly captures how the true risk difference scales with the magnitude of the shift. Across synthetic and vision benchmarks, TRACE diagnostics are valid and maintain a strong monotonic relationship with the true performance degradation. Crucially, we derive a deployment gate score from the model change and covariate shift terms that strongly correlates with $|\Delta R|$ and achieves high AUROC/AUPRC for gating decisions, enabling safe, label-efficient model replacement.
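The two label-free terms described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses the MMD alternative rather than OT, approximates input gradients by finite differences, takes the sensitivity of the new model, and combines the terms by a plain sum. All of these choices, and every function name and parameter (`rbf_mmd2`, `grad_norm_quantile`, `gate_score`, `q`, `bandwidth`), are assumptions made for illustration.

```python
import numpy as np

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    def gram(a, b):
        sq = (np.sum(a**2, axis=1)[:, None]
              + np.sum(b**2, axis=1)[None, :]
              - 2.0 * a @ b.T)
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def grad_norm_quantile(f, x, q=0.95, eps=1e-4):
    """q-quantile of input-gradient norms, via central finite differences."""
    n, d = x.shape
    grads = np.zeros((n, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        grads[:, j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return np.quantile(np.linalg.norm(grads, axis=1), q)

def gate_score(f_old, f_new, x_source, x_target, q=0.95, bandwidth=1.0):
    """Assumed combination: model change + sensitivity * data shift."""
    # Model change: average output distance on the target sample.
    model_change = np.mean(np.abs(f_old(x_target) - f_new(x_target)))
    # Sensitivity: high quantile of the new model's input-gradient norms
    # (which model to use here is an assumption of this sketch).
    sensitivity = grad_norm_quantile(f_new, x_target, q)
    # Data shift: MMD between source and target inputs (OT is not shown).
    shift = np.sqrt(max(rbf_mmd2(x_source, x_target, bandwidth), 0.0))
    return model_change + sensitivity * shift

# Toy usage with two linear models and a mean-shifted target sample.
rng = np.random.default_rng(0)
x_src = rng.normal(size=(200, 3))
x_tgt = x_src + 1.0
w_old = np.array([1.0, -2.0, 0.5])
w_new = np.array([1.2, -1.5, 0.4])
f_old = lambda x: x @ w_old
f_new = lambda x: x @ w_new

score_no_change = gate_score(f_old, f_old, x_src, x_src)
score_shifted = gate_score(f_old, f_new, x_src, x_tgt)
```

Neither term requires target labels, which is what makes such a score usable as a gate before deployment. In this toy setup the score is zero when nothing changes and grows once the model or the inputs move.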

Supplementary Material: zip

Primary Area: transfer learning, meta learning, and lifelong learning

Submission Number: 20588
