Explanation Shift: How Did the Distribution Shift Impact the Model?

Published: 30 Jan 2025 · Last Modified: 30 Jan 2025 · Accepted by TMLR · License: CC BY 4.0
Abstract: The performance of machine learning models on new data is critical for their success in real-world applications. Current methods for detecting shifts in the input or output data distributions are limited in their ability to identify changes in model behavior when no labeled data is available. In this paper, we define \emph{explanation shift} as the statistical comparison between how predictions from training data are explained and how predictions on new data are explained. We propose explanation shift as a key indicator for investigating the interaction between distribution shifts and learned models. We introduce an Explanation Shift Detector that operates on explanation distributions and provides a more sensitive and explainable signal of changes in the interaction between distribution shifts and learned models. We compare explanation shift with methods based on input or output distribution shifts, showing that monitoring for explanation shift yields more sensitive indicators of varying model behavior. We provide theoretical and experimental evidence and demonstrate the effectiveness of our approach on synthetic and real data. Additionally, we release an open-source Python package, \texttt{skshift}, which implements our method and provides usage tutorials for reproducibility.
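The detection procedure described in the abstract can be read as a classifier two-sample test on explanation distributions: compute feature attributions of the fitted model on source and new data, then train a discriminator to tell the two attribution samples apart. Below is a minimal sketch of this idea, assuming SHAP values as the explanations and a logistic-regression discriminator; the synthetic data, variable names, and model choices are illustrative and do not reflect the actual `skshift` API.

```python
# Minimal sketch: explanation-shift detection as a classifier two-sample
# test on SHAP explanation distributions. Illustrative only; not the
# skshift API. Requires: numpy, scikit-learn, shap.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Source data: the model f is trained on this distribution.
X_tr = rng.normal(size=(2000, 3))
y_tr = X_tr[:, 0] + X_tr[:, 1] + rng.normal(scale=0.1, size=2000)
f = GradientBoostingRegressor().fit(X_tr, y_tr)

# New data: a covariate shift on a feature the model relies on.
X_new = rng.normal(size=(2000, 3))
X_new[:, 0] += 1.5

# Explanation distributions: SHAP values of f on source vs. new data.
explainer = shap.TreeExplainer(f)
S_tr = explainer.shap_values(X_tr)
S_new = explainer.shap_values(X_new)

# Explanation Shift Detector: a classifier trained to distinguish the two
# explanation distributions; its held-out AUC measures the shift.
S = np.vstack([S_tr, S_new])
z = np.concatenate([np.zeros(len(S_tr)), np.ones(len(S_new))])
S_fit, S_eval, z_fit, z_eval = train_test_split(S, z, random_state=0)
g = LogisticRegression(max_iter=1000).fit(S_fit, z_fit)
auc = roc_auc_score(z_eval, g.predict_proba(S_eval)[:, 1])
print(f"explanation-shift AUC: {auc:.3f}")
```

An AUC close to 0.5 suggests the model's explanations are distributed alike on both samples; an AUC well above 0.5 flags an explanation shift, and inspecting the discriminator (e.g., its coefficients) indicates which features drive the change.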
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:

## Changes

In the Appendix, we have added an experiment similar to the one in Figure 5, but varying the depth (complexity) of the Explanation Shift Detector. We have also run a full grammar check on the paper and implemented the following requests:

> This claim is not supported with clear evidence in this paper. I suggest toning down the sentence: "as such domain assumptions may be available in the domain of images but are rarely or never available for tabular data." [Check]

> Please define dom(). [Check]

> In Definition 2.1, D should be D_X? [Check]

> The following part seems overly complicated if we use empirical distributions. [Check]

> In Definition 2.3, the first has a different font from the other ones. [Check]

> Please define all the acronyms (OOD, KS, etc.). [Check]

> In and above Eq. (1), val should be val_{f,x}? [Check]

> I'm not sure if this information is already somewhere in the paper, but if not, it would be nice to add some more details on the task and the dataset used in Section 5.4. [Check]

> In Figure 3, are the legends for "Input, f = XGB" and "Input, f = Log" correct? Their lines and circles seem to be showing the same plot.

Yes, it is the same plot. Distribution shifts on input data are independent of the estimator $f_\theta$ used, so the results are identical.

> Typos and formatting errors from reviewer 'Crub'. [Check]

> Gradient boosting decision tree / gradient-boosting decision tree ⇒ should be consistently referred to as gradient-boosted decision tree. [Check]

> Avoid redundancy: repeating information multiple times should be avoided. For instance, table caption text such as "Displayed results are the one-tailed p-values of the Kolmogorov-Smirnov test comparison between two underlying distributions" or "Novel Covariate Group Shift for the 'Asian' group with a fraction ratio of 0.5 as described in Section 5" only needs to be mentioned once in the main text. [Check]

> Equation formatting from reviewer 'Crub' and Action Editor 'hxK6'. [Check]

> I don't think the following argument is very accurate. The optimal classification rule for dog images may change if the species evolves and they start to look different. Furthermore, for the second example, we could think about a task of predicting buying behaviour based on images, so there would be a concept shift for image classification by the same argument. In any case, giving a few examples like these will not serve as proof of the authors' statement about the relationships between variables based only on the type of data. It is fine to focus on tabular data, but please try to avoid making claims without solid evidence. [Check]

## Changes 2.0

Updated Definition 2.5 to $\frac{P(\mathcal{D}^{tr}_Y, \mathcal{D}^{tr}_X)}{P(\mathcal{D}^{tr}_X)}$.

## Changes 2.1

On p. 8, updated to $\frac{P(\mathcal{D}^{new}_Y, \mathcal{D}^{new}_X)}{P(\mathcal{D}^{new}_X)}$. Updated to "for $j \in \{1, 2\}$" in Example 4.1. Removed "for $i \in \{1, 2\}$" in Example 4.3. Added missing end-of-paragraph punctuation.
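For readers cross-checking the two updates above: assuming Definition 2.5 characterizes concept shift through the conditional distribution $P(Y \mid X)$, both ratios are instances of the same identity, $P(\mathcal{D}_Y \mid \mathcal{D}_X) = \frac{P(\mathcal{D}_Y, \mathcal{D}_X)}{P(\mathcal{D}_X)}$, evaluated on the training ($tr$) and new ($new$) distributions respectively.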
Assigned Action Editor: ~Ikko_Yamane1
Submission Number: 3065