Keywords: Implicit Regularization, Stochastic Gradient Descent, Group Robustness
TL;DR: We investigate how the implicit regularization of stochastic gradient descent affects robustness to group shifts and feature learning.
Abstract: The implicit regularization effect of Stochastic Gradient Descent (SGD) is known to enhance the generalization of deep neural networks and becomes stronger with higher learning rates and smaller batch sizes. However, its role in improving group robustness, defined as a model's ability to perform well on underrepresented subpopulations, remains underexplored. In this work, we study the impact of SGD's implicit regularization under group imbalance characterized by spurious correlations. Through extensive experiments on various datasets, we show that increasing the strength of implicit regularization improves worst-group accuracy (WGA). Crucially, this improvement is not merely a byproduct of better overall generalization, but a targeted enhancement in robustness to spurious features. Moreover, our analysis reveals that this phenomenon also contributes to improved feature learning in deep networks. These findings offer a new perspective on the role of SGD's implicit regularization, showing that it not only supports generalization but also plays a central role in achieving robustness to spurious correlations.
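For readers unfamiliar with the metric, the following is a minimal sketch (not taken from the paper) of how worst-group accuracy can be computed in PyTorch; the tensor values and the group encoding are illustrative assumptions:

```python
import torch

def worst_group_accuracy(preds: torch.Tensor,
                         labels: torch.Tensor,
                         groups: torch.Tensor) -> float:
    """Return the minimum per-group accuracy over all groups present.

    Groups are typically defined by (class label, spurious attribute) pairs,
    so a low WGA indicates the model fails on an underrepresented subpopulation.
    """
    accs = []
    for g in groups.unique():
        mask = groups == g
        accs.append((preds[mask] == labels[mask]).float().mean().item())
    return min(accs)

# Hypothetical example: 2 classes x 2 spurious attributes = 4 groups.
preds  = torch.tensor([0, 1, 1, 0, 1, 0])
labels = torch.tensor([0, 1, 0, 0, 1, 1])
groups = torch.tensor([0, 1, 2, 3, 0, 2])
print(worst_group_accuracy(preds, labels, groups))  # accuracy of the hardest group
```

Under the abstract's claim, strengthening SGD's implicit regularization (e.g., a higher learning rate or a smaller batch size) would raise the value this function reports on held-out data.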
Student Paper: Yes
Submission Number: 110