Keywords: robustness, worst-group error, distribution shift, accuracy estimation
TL;DR: Improve worst-group accuracy without any group annotations
Abstract: Despite having good average test accuracy, classification models can perform poorly on subpopulations that are not well represented in the training data. In this work, we introduce a criterion to estimate the accuracy on these subpopulations. This allows us to design a procedure that achieves good worst-group performance and, unlike previous procedures, requires no group labels. We evaluate our procedure empirically and show that it recovers the worst-group performance of methods that use oracle group annotations.