Abstract: Understanding the performance of machine learning models across diverse data distributions is critically important for reliable applications. Recent empirically works find that there is a strong linear relationship between in-distribution (ID) and out-of-distribution (OOD) performance, but we show that this is not necessarily true if there are subpopulation shifts. In this paper, we empirically show that out-of-distribution performance often has nonlinear correlation with in-distribution performance under subpopulation shifts. To understand this phenomenon, we decompose the model's performance into performance on each subpopulation. We show that there is a "moon shape" correlation (parabolic uptrend curve) between the test performance on the majority subpopulation and the minority subpopulation. This nonlinear correlations hold across model architectures, training durations and hyperparameters, and the imbalance between subpopulations. Moreover, we show that the nonlinearity increases in the presence of spurious correlations in the training data. We provide complementary theoretical and experimental analyses for this interesting phenomenon of nonlinear performance correlation across subpopulations. Finally, we discuss the implications of our findings for ML reliability and fairness.