Understanding Subpopulation Shifts through a Unified Lens of Separability

18 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Subpopulation shift, distribution shift, spurious correlation
Abstract: Subpopulation shifts are a major challenge for deploying machine learning algorithms. A shift in subgroup proportions between training and test data often leads to a significant overall performance drop or to suboptimal performance on certain groups, which limits the broader and more reliable use of machine learning methods. We present a unified theoretical framework that characterizes a broad range of subpopulation shifts, including but not limited to well-studied shifts such as spurious correlation, under-representation, and class imbalance. Within this framework, we derive the performance of the Bayes-optimal classifier fitted on skewed data. A thorough evaluation across subpopulation shifts provides a quantitative tool to guide dataset collection. Our analysis further highlights the critical role of the feature separability assumption in our modeling, which explains the effectiveness of recent shift-mitigation methods and enables a principled comparison of encoders. Overall, this framework offers a unified perspective on evaluating subpopulation shifts and provides practical guidance for future work on both data collection and training strategies.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 12139