Keywords: Shapley values, Time Series, Drift detection
TL;DR: Using Shapley values for distributional shift detection and visualization
Abstract: In streaming data, distributional shifts can appear both in the univariate dimensions and in the joint distributions with the labels. However, in many real-time scenarios labels are missing or delayed, so unsupervised drift detection methods are needed. We design slidSHAPs, a novel representation method for unlabelled data streams. Shapley values, commonly used to explain machine learning models, offer a way to exploit correlation dependencies among random variables. We develop an unsupervised sliding Shapley value series for categorical time series that represents the data stream in a newly defined latent space and tracks changes in the feature correlations. Transforming the original time series into slidSHAPs allows us to track how distributional shifts affect the correlations among the input variables, independently of any labeling. We show how abrupt distributional shifts in the input variables are transformed into smoother changes in the slidSHAPs. Moreover, slidSHAPs allow intuitive visualization of shifts even when they are not observable in the original data.
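The abstract does not state which value function drives slidSHAPs. The sketch below is a rough illustration under a labelled assumption only, not the authors' method. It uses a hypothetical label-free value function, the total correlation (multi-information) of a coalition of categorical features within a window, and computes exact Shapley values for each sliding window. Because of the Shapley efficiency property, the values in each window sum to the total correlation of all features, so each coordinate reads as a feature's share of the dependence structure. The names `total_correlation`, `shapley_values` and `sliding_shapley` are illustrative. Exact enumeration is exponential in the number of features, so this only suits small feature sets.

```python
from collections import Counter
from itertools import combinations
from math import factorial, log2


def entropy(values: list) -> float:
    """Empirical Shannon entropy (in bits) of a list of hashable outcomes."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def total_correlation(window: list[tuple], subset: tuple[int, ...]) -> float:
    """ASSUMED label-free value function: multi-information of the features in `subset`.

    TC(S) = sum_{i in S} H(X_i) - H(X_S); it is 0 for the empty set and for singletons.
    """
    if len(subset) < 2:
        return 0.0
    marginals = sum(entropy([row[i] for row in window]) for i in subset)
    joint = entropy([tuple(row[i] for i in subset) for row in window])
    return marginals - joint


def shapley_values(window: list[tuple], n_features: int) -> list[float]:
    """Exact Shapley values of each feature for the total-correlation game on one window."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of size k.
            weight = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
            for subset in combinations(others, k):
                gain = total_correlation(window, subset + (i,)) - total_correlation(window, subset)
                phi[i] += weight * gain
    return phi


def sliding_shapley(stream: list[tuple], window_size: int, step: int = 1) -> list[list[float]]:
    """Map a categorical stream to a series of per-window Shapley vectors (hypothetical helper)."""
    n_features = len(stream[0])
    series = []
    for start in range(0, len(stream) - window_size + 1, step):
        window = stream[start:start + window_size]
        series.append(shapley_values(window, n_features))
    return series


if __name__ == "__main__":
    # Hypothetical toy stream with three categorical features.
    # First 8 rows: feature 1 copies feature 0; feature 2 is independent of both.
    before = [(i % 2, i % 2, (i // 2) % 2) for i in range(8)]
    # Next 8 rows: all three features are mutually independent (the simulated shift).
    after = [(i % 2, (i // 2) % 2, (i // 4) % 2) for i in range(8)]
    for phi in sliding_shapley(before + after, window_size=8, step=8):
        print([round(p, 3) for p in phi])
    # prints [0.5, 0.5, 0.0] then [0.0, 0.0, 0.0]
```

In this toy run the dependence between features 0 and 1 disappears after the shift, and their Shapley values drop from 0.5 to 0. Marginal distributions are identical in both halves, so the change is visible in the Shapley series but not in the per-feature frequencies. With `step=1`, overlapping windows would produce the gradual transition between the two regimes instead of a single jump.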