Keywords: maximum mean discrepancy, data shift, covariate shift, missing value, robust model, representation learning
TL;DR: This paper develops novel approaches to training more robust machine learning models when distribution shift and/or missingness shift exists between training and test data.
Abstract: Covariate shift is a common problem in real-world predictive modeling. This paper proposes addressing it by minimizing Maximum Mean Discrepancy (MMD) statistics between the training and test sets in the feature input space, the feature representation space, or both. We design three techniques, which we call MMD Representation, MMD Mask, and MMD Hybrid, to handle the scenarios where only a distribution shift exists, only a missingness shift exists, or both types of shift exist, respectively. We find that integrating an MMD loss component helps models select the features that generalize best and, for each test sample, avoid dangerous extrapolation as much as possible. Models trained with this MMD approach show better performance, calibration, and extrapolation on the test set. Further details, including related work, additional experiments, and a discussion of future work, are in the appendix. An extended version of this paper can be found at https://arxiv.org/abs/2111.10344.
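The MMD statistic that the abstract refers to can be estimated from two samples with a kernel. The sketch below, which is an illustration rather than the paper's implementation, computes a biased RBF-kernel MMD² estimate between a training sample and two test samples, one drawn from the same distribution and one with a mean shift (the function names, the fixed bandwidth `gamma=1.0`, and the synthetic data are all assumptions for this example):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
    # between the rows of X and the rows of Y.
    sq_dists = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq_dists)

def mmd2_biased(X, Y, gamma=1.0):
    # Biased estimate of MMD^2:
    #   E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
    # Small when X and Y come from the same distribution,
    # larger under a covariate shift.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))       # "training" sample
test_same = rng.normal(0.0, 1.0, size=(500, 2))   # no shift
test_shift = rng.normal(1.5, 1.0, size=(500, 2))  # mean-shifted sample

print(mmd2_biased(train, test_same))
print(mmd2_biased(train, test_shift))
```

In the paper's setting this quantity would be added as a differentiable loss term (on the inputs, the learned representations, or both) so that training also pulls the two distributions together, rather than only being used as a two-sample diagnostic as in this sketch.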