Keywords: optimal transport, Wasserstein distance, kernel methods, MMD
TL;DR: A kernel-based two-sample test for detecting covariate shift that selects the best landmark point at which to compare witness function evaluations.
Abstract: Training an effective predictive model with empirical risk minimization requires that the distribution of the input training data match that of the testing data. Covariate shift can occur when the testing cases are not class-balanced but the training cases are. To detect when class imbalance is present in an unlabeled test sample, we propose to use a statistical divergence based on the Wasserstein distance and optimal transport. Recently, slicing techniques have been proposed that provide computational and statistical advantages for the Wasserstein distance in high-dimensional spaces. In this work, we present a computationally simple approach to perform generalized slicing of the kernel-based Wasserstein distance and apply it as a two-sample test. The proposed landmark-based slicing chooses a single point to be the sole support vector representing the witness function. We run pseudo-real experiments using the MNIST dataset and compare our method with maximum mean discrepancy (MMD). We show that the proposed method performs better than MMD on these synthetic simulations of covariate shift.
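
The landmark-based slicing idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth, the use of every pooled sample point as a candidate landmark, the quantile-based approximation of the one-dimensional Wasserstein distance, and the permutation test are all assumptions chosen for illustration, and the function names are hypothetical. Each candidate landmark z defines a one-dimensional slice x -> k(x, z); the statistic is the largest one-dimensional Wasserstein distance between the two samples over these slices.

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth):
    # Pairwise Gaussian kernel between rows of X (n, d) and rows of Z (m, d).
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))

def w1_1d(u, v, n_quantiles=100):
    # Approximate 1D Wasserstein-1 distance as the mean absolute difference
    # between empirical quantile functions on a fixed grid (an assumption).
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.mean(np.abs(np.quantile(u, q) - np.quantile(v, q)))

def landmark_sliced_statistic(X, Y, bandwidth=1.0):
    # Candidate landmarks: every point in the pooled sample (an assumption).
    landmarks = np.vstack([X, Y])
    KX = gaussian_kernel(X, landmarks, bandwidth)  # slices of X, one column per landmark
    KY = gaussian_kernel(Y, landmarks, bandwidth)  # slices of Y, one column per landmark
    dists = np.array([w1_1d(KX[:, j], KY[:, j])
                      for j in range(landmarks.shape[0])])
    best = int(np.argmax(dists))  # index of the most discriminating landmark
    return dists[best], best

def permutation_pvalue(X, Y, n_perm=100, bandwidth=1.0, seed=0):
    # Two-sample test: compare the observed statistic with its distribution
    # under random relabeling of the pooled sample.
    rng = np.random.default_rng(seed)
    observed, _ = landmark_sliced_statistic(X, Y, bandwidth)
    pooled = np.vstack([X, Y])
    n = X.shape[0]
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled.shape[0])
        stat, _ = landmark_sliced_statistic(pooled[perm[:n]], pooled[perm[n:]], bandwidth)
        count += int(stat >= observed)
    return (count + 1) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(0.0, 1.0, size=(40, 2))  # reference sample
    Y = rng.normal(3.0, 1.0, size=(40, 2))  # shifted sample
    print(permutation_pvalue(X, Y))
```

Because the statistic is maximized over landmarks, the permutation step recomputes the maximum for every relabeling so that the selection is accounted for in the null distribution. In practice the bandwidth and the candidate landmark set would need to be chosen for the data at hand.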