Theoretical Characterization of Neural Network Generalization with Group Imbalance

Hongkang Li; Shuai Zhang; Meng Wang; Yihua Zhang; Pin-Yu Chen; Sijia Liu

Published: 01 Feb 2023, Last Modified: 13 Feb 2023, Submitted to: ICLR 2023, Readers: Everyone

Keywords: Group imbalance, Sample complexity, Generalization analysis, Gaussian mixture model, Empirical risk minimization

TL;DR: A theoretical characterization of generalization and sample complexity of training neural networks with group imbalance

Abstract: Group imbalance is a known problem in empirical risk minimization (ERM): a high average accuracy can be accompanied by low accuracy on a minority group. Despite various algorithmic efforts to improve minority-group accuracy, a theoretical study of the generalization performance of ERM on individual groups remains elusive. By formulating the group imbalance problem with a Gaussian Mixture Model, this paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level test performance. Although our theoretical framework is centered on binary classification using a one-hidden-layer neural network, to the best of our knowledge we provide the first theoretical analysis of the group-level generalization of ERM, in addition to the commonly studied average generalization performance. Some insights from our theoretical results are that when all group-level covariances are in the medium regime and all group means are close to zero, the learning performance is most desirable, in the sense of a small sample complexity, a fast training rate, and high average and group-level test accuracy. Moreover, we show that increasing the fraction of the minority group in the training data does not necessarily improve the generalization performance of the minority group. Our theoretical results are validated on both synthetic and real-world datasets, such as CelebA and CIFAR-10 in image classification.
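The abstract's opening point, that a high average accuracy can hide a low minority-group accuracy, can be made concrete with a small simulation. What follows is a minimal sketch under our own assumptions, not the authors' code or their exact model: two groups drawn from a Gaussian mixture with isotropic covariances, binary labels produced by a hypothetical linear teacher `w_star`, and a one-hidden-layer network whose output averages `k` sigmoid units, trained by full-batch gradient descent on the empirical cross-entropy risk (ERM). All function names (`sample_gmm`, `train_erm`, `accuracies`) and all constants (dimension, group means, standard deviations, group fractions, learning rate) are hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z)).
    return np.exp(-np.logaddexp(0.0, -z))


def sample_gmm(n, means, stds, weights, rng):
    """Draw n points from a Gaussian mixture and return (features, group ids).

    Group l has mean means[l], isotropic covariance stds[l]**2 * I, and
    mixing weight weights[l], i.e. its expected fraction of the samples.
    """
    means = np.asarray(means, dtype=float)
    groups = rng.choice(len(weights), size=n, p=weights)
    noise = rng.standard_normal((n, means.shape[1]))
    x = means[groups] + np.asarray(stds, dtype=float)[groups, None] * noise
    return x, groups


def network_output(w, x):
    # One-hidden-layer network (assumed form): average of k sigmoid units, in (0, 1).
    return sigmoid(x @ w.T).mean(axis=1)


def train_erm(x, y, k=4, lr=0.5, steps=500, seed=0):
    """Full-batch gradient descent on the empirical cross-entropy risk."""
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.standard_normal((k, x.shape[1]))
    for _ in range(steps):
        s = sigmoid(x @ w.T)                          # (n, k) hidden activations
        f = np.clip(s.mean(axis=1), 1e-7, 1 - 1e-7)   # network output per sample
        dloss_df = (f - y) / (f * (1 - f))            # d(cross-entropy)/df
        coef = dloss_df[:, None] * s * (1 - s) / k    # chain rule through each sigmoid
        w -= lr * (coef.T @ x) / len(y)               # average-gradient step
    return w


def accuracies(w, x, y, groups, n_groups):
    """Return (average accuracy, array of per-group accuracies)."""
    correct = (network_output(w, x) > 0.5) == (y > 0.5)
    per_group = np.array([correct[groups == l].mean() for l in range(n_groups)])
    return correct.mean(), per_group


# Hypothetical two-group setup: a large majority group and a small minority
# group with a shifted mean and a larger variance.
rng = np.random.default_rng(0)
d = 10
means = [np.zeros(d), np.full(d, 1.0)]
stds = [1.0, 2.0]
weights = [0.9, 0.1]
w_star = rng.standard_normal(d)  # hypothetical linear teacher that labels the data

x_train, g_train = sample_gmm(2000, means, stds, weights, rng)
y_train = (x_train @ w_star > 0).astype(float)
w_hat = train_erm(x_train, y_train)

x_test, g_test = sample_gmm(5000, means, stds, weights, rng)
y_test = (x_test @ w_star > 0).astype(float)
avg_acc, per_group = accuracies(w_hat, x_test, y_test, g_test, len(weights))
frac = np.bincount(g_test, minlength=len(weights)) / len(g_test)
print(f"average accuracy: {avg_acc:.3f}")
print("per-group accuracy:", np.round(per_group, 3))
```

Because the average accuracy equals the sample-weighted sum of the group accuracies (`frac @ per_group`), a minority group holding roughly 10% of the test data carries only about 10% of the weight, so the average can stay high even when that group is poorly classified. Changing `weights`, `means`, and `stds` in this sketch is one informal way to probe the dependence on group fraction, mean, and covariance that the paper analyzes theoretically.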

Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
