Keywords: distribution shift types, natural distribution shifts, tabular datasets
TL;DR: We build a tabular benchmark with specified distribution shift patterns, which highlights the importance of future research that builds an understanding of how distributions differ.
Abstract: Different distribution shifts require different algorithmic and operational
interventions. Methodological research must be grounded by the specific
shifts they address. Although nascent benchmarks provide a promising
empirical foundation, they \emph{implicitly} focus on covariate
shifts, and the validity of empirical findings depends on the type of shift,
e.g., previous observations on algorithmic performance can fail to be valid when
the $Y|X$ distribution changes. We conduct a thorough investigation of
natural shifts in 5 tabular datasets over 86,000 model configurations, and
find that $Y|X$-shifts are most prevalent. To encourage researchers to
develop a refined language for distribution shifts, we build
``WhyShift``, an empirical testbed of curated real-world shifts where
we characterize the type of shift we benchmark performance over. Since
$Y|X$-shifts are prevalent in tabular settings, we \emph{identify covariate
regions} that suffer the biggest $Y|X$-shifts and discuss implications for
algorithmic and data-based interventions. Our testbed highlights the
importance of future research that builds an understanding of why
distributions differ.
Supplementary Material: pdf
Submission Number: 633
Loading