Keywords: Implicit Regularization, Stochastic Gradient Descent, Group Robustness
TL;DR: We investigate how the implicit regularization of stochastic gradient descent affects robustness to group shifts and feature learning.
Abstract: The implicit regularization effect of Stochastic Gradient Descent (SGD) is known to enhance the generalization of deep neural networks and becomes stronger with higher learning rates and smaller batch sizes. However, its role in improving group robustness, defined as a model's ability to perform well on underrepresented subpopulations, remains underexplored. In this work, we study the impact of SGD's implicit regularization under group imbalance characterized by spurious correlations. Through extensive experiments on various datasets, we show that increasing the strength of implicit regularization improves worst-group accuracy (WGA). Crucially, this improvement is not merely a byproduct of better overall generalization, but a targeted enhancement in robustness to spurious features. Moreover, our analysis reveals that this phenomenon also contributes to improved feature learning in deep networks. These findings offer a new perspective on the role of SGD's implicit regularization, showing that it not only supports generalization but also plays a central role in achieving robustness to spurious correlations.
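For readers unfamiliar with the metric, the following is a minimal sketch (not taken from the paper) of how worst-group accuracy can be computed in PyTorch; the tensor values and the group encoding are illustrative assumptions:

```python
import torch

def worst_group_accuracy(preds: torch.Tensor,
                         labels: torch.Tensor,
                         groups: torch.Tensor) -> float:
    """Return the minimum per-group accuracy over all groups present.

    Groups are typically defined by (class label, spurious attribute) pairs,
    so a low WGA indicates the model fails on an underrepresented subpopulation.
    """
    accs = []
    for g in groups.unique():
        mask = groups == g
        accs.append((preds[mask] == labels[mask]).float().mean().item())
    return min(accs)

# Hypothetical example: 2 classes x 2 spurious attributes = 4 groups.
preds  = torch.tensor([0, 1, 1, 0, 1, 0])
labels = torch.tensor([0, 1, 0, 0, 1, 1])
groups = torch.tensor([0, 1, 2, 3, 0, 2])
print(worst_group_accuracy(preds, labels, groups))  # accuracy of the hardest group
```

Under the abstract's claim, strengthening SGD's implicit regularization (e.g., a higher learning rate or a smaller batch size) would raise the value this function reports on held-out data.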
Student Paper: Yes
Submission Number: 110