For Robust Worst-Group Accuracy, Ignore Group Annotations

Published: 12 Dec 2024 · Last Modified: 12 Dec 2024 · Accepted by TMLR · CC BY 4.0
Abstract: Existing methods for last-layer retraining that aim to optimize worst-group accuracy (WGA) rely heavily on well-annotated groups in the training data. We show, both in theory and in practice, that annotation-based data augmentations using either downsampling or upweighting for WGA are susceptible to domain annotation noise. The WGA gap is exacerbated in high-noise regimes for models trained with vanilla empirical risk minimization (ERM). To address this, we introduce Regularized Annotation of Domains (RAD), which trains robust last-layer classifiers without needing explicit domain annotations. Our results show that RAD is competitive with other recently proposed domain annotation-free techniques. Most importantly, RAD outperforms state-of-the-art annotation-reliant methods even with only 5% noise in the training data, across several publicly available datasets.
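To make the annotation-free idea in the abstract concrete, below is a minimal two-stage sketch in Python using scikit-learn on frozen ERM embeddings. It is only an illustration of the approach the abstract describes, not the authors' implementation (see the linked repository for that); the l1 penalty, the regularization strength `reg_C`, and the `upweight` factor are assumed placeholders rather than the paper's actual configuration.

```python
# Sketch: annotation-free last-layer retraining in the spirit of RAD.
# Assumptions (not from the paper): l1-penalized logistic regression as the
# pseudo-annotator, and the specific reg_C / upweight values below.
import numpy as np
from sklearn.linear_model import LogisticRegression

def rad_style_last_layer(embeddings, labels, reg_C=0.01, upweight=20.0):
    """Retrain a last-layer classifier without group annotations.

    Stage 1 (regularized annotation): fit a heavily regularized linear
    classifier on the frozen embeddings; the points it misclassifies are
    treated as a pseudo-minority group, on the intuition that a
    low-capacity, biased model fits mainly the majority pattern.
    Stage 2 (upweighted retraining): refit the last layer with the
    pseudo-minority points upweighted.
    """
    # Stage 1: heavily regularized classifier as a pseudo-annotator.
    annotator = LogisticRegression(penalty="l1", C=reg_C, solver="liblinear")
    annotator.fit(embeddings, labels)
    pseudo_minority = annotator.predict(embeddings) != labels

    # Stage 2: upweight pseudo-minority points and retrain the last layer.
    weights = np.where(pseudo_minority, upweight, 1.0)
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(embeddings, labels, sample_weight=weights)
    return classifier
```

Because both stages operate only on embeddings and class labels, noisy or missing group annotations never enter the pipeline, which is the property the abstract's noise results turn on.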
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Camera ready - Added requested references - Added noisy-embedding results - Added a section contrasting M-SELF and RAD-UW - Added acknowledgements - Clarified ERM
Code: https://github.com/SankarLab/regularized-annotation-of-domains/tree/main
Supplementary Material: zip
Assigned Action Editor: ~Yu_Yao3
Submission Number: 2901