On the Potential of the Four-Point Model for Studying the Role of Optimization in Robustness to Spurious Correlations
Keywords: Stochastic Gradient Descent, Spurious Correlation, Four-Point Data Model
TL;DR: We use the four-point model to study how smaller SGD batch sizes accelerate the learning of invariant features and affect reliance on spurious correlations.
Abstract: Theoretical progress has recently been made in understanding how machine learning models develop reliance on spurious correlations. While empirical findings highlight the influence of stochastic gradient descent (SGD) and its optimization hyperparameters on this behavior, existing theories offer little justification for these phenomena, and a grounded theoretical explanation remains lacking. In this work, we revisit the four-point framework, a widely used theoretical tool for analyzing spurious correlations, to investigate how batch size affects the learning speed of invariant features in the presence of spurious correlations. Our results show that the framework can account for the faster acquisition of invariant features under small-batch regimes, offering a principled perspective on the role of SGD and its hyperparameters in shaping reliance on spurious correlations. This analysis contributes to a deeper theoretical understanding of the mechanisms underlying robustness and generalization in machine learning.
Submission Number: 171