Abstract: A fundamental challenge in machine learning is training models that generalize well to distributions different from the training distribution. Empirical Risk Minimization (ERM), the predominant learning principle, is known to underperform on minority sub-populations and to generalize poorly to unseen test domains. In this work, we propose a novel learning principle called Uniform Risk Minimization (URM) to alleviate these issues. We first show theoretically that uniform training data distributions and feature representations support robustness to unknown distribution shifts. Motivated by this result, we propose an empirical method that trains deep neural networks to learn a uniformly distributed feature representation in their final activation layer for improved robustness. Our experiments on multiple datasets for sub-population shifts and domain generalization show that URM improves the generalization of deep neural networks without requiring knowledge of groups or domains during training. URM is competitive with the best existing methods designed for these tasks and can also be easily combined with them for further gains. Our work sheds light on the importance of the distribution of learned feature representations for downstream robustness and fairness.
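The abstract does not specify how uniformity of the final-layer features is enforced, so the following is only a minimal, hypothetical sketch of one plausible regularizer, not the paper's actual URM objective. It squashes each feature dimension into [0, 1] and penalizes the 1-D Wasserstein-1 distance between the sorted batch of activations and the quantiles of Uniform(0, 1); the function name `uniformity_penalty` and the trade-off weight `lam` are assumptions introduced for illustration.

```python
import torch

def uniformity_penalty(features: torch.Tensor) -> torch.Tensor:
    """features: (batch, dim) final-layer activations. Returns a scalar penalty
    that is small when each feature dimension looks uniformly distributed."""
    z = torch.sigmoid(features)                # squash each dimension into [0, 1]
    z_sorted, _ = torch.sort(z, dim=0)         # empirical quantiles per dimension
    n = z.shape[0]
    # Quantiles of Uniform(0, 1) at the plotting positions (i + 0.5) / n.
    grid = (torch.arange(n, device=z.device, dtype=z.dtype) + 0.5) / n
    # Mean absolute gap between empirical and uniform quantiles (1-D Wasserstein-1).
    return (z_sorted - grid.unsqueeze(1)).abs().mean()

# Hypothetical usage: add the penalty to the task loss, where phi is the network
# body producing final-layer features and lam is an assumed trade-off weight.
#   total_loss = task_loss + lam * uniformity_penalty(phi(x))
```

Under these assumptions, the penalty needs no group or domain labels, which is consistent with the abstract's claim that URM works without such knowledge during training.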
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Revision for rebuttal
Assigned Action Editor: ~Hanie_Sedghi1
Submission Number: 3204