On the Potential of the Four-Point Model for Studying the Role of Optimization in Robustness to Spurious Correlations
Keywords: Stochastic Gradient Descent, Spurious Correlation, Four-Point Data Model
TL;DR: We use the four-point model to study how smaller SGD batch sizes accelerate the learning of invariant features and affect reliance on spurious correlations.
Abstract: Theoretical progress has recently been made in understanding how machine learning models develop reliance on spurious correlations. While empirical findings highlight the influence of stochastic gradient descent (SGD) and its optimization hyperparameters on this behavior, existing theories offer little justification for these phenomena, and a grounded theoretical explanation remains lacking. In this work, we revisit the four-point framework, a widely used theoretical tool for analyzing spurious correlations, to investigate how batch size affects the learning speed of invariant features in the presence of spurious correlations. Our results show that the framework can account for the faster acquisition of invariant features under small-batch regimes, offering a principled perspective on the role of SGD and its hyperparameters in shaping reliance on spurious correlations. This analysis contributes to a deeper theoretical understanding of the mechanisms underlying robustness and generalization in machine learning.
Submission Number: 171