Keywords: concept shift, distribution shift, ridge regression, thermodynamic limit, high-dimensional learning, solvable model, out-of-distribution generalization
TL;DR: We study concept shift in ridge regression in the thermodynamic limit and find a nonmonotonic data dependence of test performance that generalizes to realistic machine learning tasks (MNIST classification and transformer in-context learning (ICL) regression).
Abstract: Machine learning models are often brittle under distribution shift, i.e., when the data distribution at test time differs from that during training. Understanding this failure mode is central to identifying and mitigating the safety risks posed by the mass adoption of machine learning. Here we analyze ridge regression under concept shift—a form of distribution shift in which the input-label relationship changes at test time. We derive an exact expression for the prediction risk in the thermodynamic limit. Our results reveal nontrivial effects of concept shift on generalization performance, including a phase transition between weak and strong concept shift regimes and a nonmonotonic data dependence of test performance even when double descent is absent. Our theoretical results are in good agreement with experiments on transformers pretrained to solve linear regression; under concept shift, an overly long context can be detrimental to the generalization performance of next-token prediction. Finally, experiments on MNIST and FashionMNIST further validate our theoretical predictions, suggesting that these phenomena represent a fundamental aspect of learning under distribution shift.
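As a concrete illustration of the setting the abstract describes (a minimal sketch, not the paper's implementation): ridge regression is trained on labels generated by one teacher vector and evaluated on labels from a rotated teacher, modeling a change in the input-label relationship; sweeping the sample count n traces the data dependence of the test risk. All parameter values and names below (d, lam, sigma, theta) are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of ridge regression under concept
# shift: training labels come from teacher w_train, but test labels come
# from a rotated teacher w_test. Parameter values are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
d = 200        # input dimension
lam = 1e-2     # ridge regularization strength
sigma = 0.1    # label noise standard deviation
theta = 0.5    # concept-shift angle between the two teachers (radians)

# Unit-norm training teacher and a test teacher rotated by theta.
w_train = rng.standard_normal(d)
w_train /= np.linalg.norm(w_train)
u = rng.standard_normal(d)
u -= (u @ w_train) * w_train          # orthogonalize against w_train
u /= np.linalg.norm(u)
w_test = np.cos(theta) * w_train + np.sin(theta) * u

for n in [50, 100, 200, 400, 800, 1600]:
    X = rng.standard_normal((n, d))
    y = X @ w_train + sigma * rng.standard_normal(n)
    # Closed-form ridge estimator: (X^T X + lam * I)^{-1} X^T y.
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    # For isotropic Gaussian inputs, the prediction risk under the shifted
    # teacher reduces to ||w_hat - w_test||^2 + sigma^2.
    risk = np.sum((w_hat - w_test) ** 2) + sigma**2
    print(f"n/d = {n / d:.1f}   test risk = {risk:.4f}")
```

Printing the risk across the sweep of n shows how test performance under the shifted teacher depends on the amount of training data, the quantity whose nonmonotonic behavior the paper characterizes exactly in the thermodynamic limit.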
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 25892