The Effect of Model Size on Worst-Group Generalization

Alan Le Pham; Eunice Chan; Vikranth Srivatsa; Dhruba Ghosh; Yaoqing Yang; Yaodong Yu; Ruiqi Zhong; Joseph E. Gonzalez; Jacob Steinhardt

Published: 02 Dec 2021, Last Modified: 05 May 2023. NeurIPS 2021 Workshop DistShift (Poster).

Abstract: Prior work has shown that overparameterization hurts test accuracy on rare subgroups under a fixed reweighting regime. To gain a more complete picture, we consider the case where subgroup information is unknown. We investigate the effect of model size on worst-group generalization under empirical risk minimization (ERM) across a wide range of settings, varying: 1) architecture (ResNet, VGG, or BERT), 2) domain (vision or natural language processing), 3) model size (width or depth), and 4) initialization (pre-trained or random weights). Our systematic evaluation shows that, across all of these setups, increasing model size does not hurt, and may help, worst-group test performance under ERM. In particular, increasing the size of pre-trained models consistently improves worst-group performance on Waterbirds and MultiNLI. We therefore advise practitioners to use larger pre-trained models when subgroup labels are unknown.
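To make the worst-group metric concrete, here is a minimal sketch. It assumes each test example carries a group identifier (for Waterbirds, typically a (label, background) pair). Worst-group accuracy is then the minimum of the per-group accuracies. Under ERM these group labels are used only for evaluation, not for training. The function name worst_group_accuracy and the toy data are hypothetical and for illustration only; this is not the authors' evaluation code.

```python
from collections import defaultdict


def worst_group_accuracy(preds, labels, groups):
    """Return (worst-group accuracy, dict of per-group accuracies).

    Hypothetical helper: `groups` holds one hashable group id per example,
    e.g. a (label, background) tuple as in Waterbirds.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    per_group = {g: correct[g] / total[g] for g in total}
    # The worst group is the one with the lowest accuracy.
    return min(per_group.values()), per_group


# Toy data (hypothetical): two majority groups and one minority group.
preds = [1, 1, 1, 0, 0, 0, 1, 0]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
groups = [(1, "water")] * 3 + [(0, "land")] * 3 + [(0, "water")] * 2

worst, per_group = worst_group_accuracy(preds, labels, groups)
average = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(average, worst)  # 0.875 0.5
```

The toy example shows why the metric matters: average accuracy is 0.875, but the small (0, "water") group reaches only 0.5. This is the kind of gap that average-accuracy reporting hides.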
