Diagnostic Tool for Out-of-Sample Model Evaluation
Abstract: Assessment of model fitness is a key part of machine learning. The standard paradigm of model evaluation is analysis of the average loss over future data. This is often explicit in model fitting, where we select models that minimize the average loss over training data as a surrogate, but comes with limited theoretical guarantees. In this paper, we consider the problem of characterizing a batch of out-of-sample losses of a model using a calibration data set. We provide finite-sample limits on the out-of-sample losses that are statistically valid under quite general conditions and propose a diagnostic tool that is simple to compute and interpret. Several numerical experiments are presented to show how the proposed method quantifies the impact of distribution shifts, aids the analysis of regression, and enables model selection as well as hyperparameter tuning.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=VIBw3IoWDk
Changes Since Last Submission: The proof of Theorem 1 has been made more explicit to improve readability. The introduction to Section 4 (Experiments) now includes a passage about the scale of the experiments and their limitations.
Assigned Action Editor: ~Alain_Durmus1
Submission Number: 1259