An Analysis of Model Robustness across Concurrent Distribution Shifts

TMLR Paper 3428 Authors

03 Oct 2024 (modified: 23 Nov 2024) · Under review for TMLR · CC BY 4.0
Abstract: Machine learning models, meticulously optimized on source data, often fail on target data under distribution shifts (DSs). Previous benchmarking studies, though extensive, have mainly focused on simple DSs. Recognizing that DSs often take more complex forms in real-world scenarios, we broaden our study to multiple concurrent shifts, such as unseen domain shifts combined with spurious correlations. We evaluate 26 algorithms, ranging from simple heuristic augmentations to zero-shot inference with foundation models, across 168 source-target pairs from eight datasets. Our analysis of over 56K models reveals that (i) concurrent DSs typically degrade performance more than a single shift, with certain exceptions, (ii) a model that improves generalization under one type of shift tends to be effective under others, and (iii) heuristic data augmentations achieve the best overall performance on both synthetic and real-world datasets.
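Finding (iii) concerns heuristic data augmentations. As a hedged illustration only, not the paper's actual configuration, a typical heuristic augmentation pipeline of this kind might be built with torchvision transforms; the specific operations and parameters below are assumptions for the sketch.

```python
# A minimal sketch of a "heuristic data augmentation" pipeline of the kind
# the abstract refers to. Illustrative only; the paper's exact recipes,
# transforms, and parameters are not specified here and are assumed.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # random crop and rescale
    transforms.RandomHorizontalFlip(p=0.5),               # mirror images at random
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),           # perturb color statistics
    transforms.RandAugment(num_ops=2, magnitude=9),       # composed heuristic ops
    transforms.ToTensor(),                                # PIL image -> float tensor
])

# Usage: applied per-sample to a PIL image during training, e.g.
#   x = augment(pil_image)
# yielding a randomly perturbed view of the source image on each epoch.
```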
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=vZYLN1TJaU
Changes Since Last Submission: Updated in response to reviewers.
Assigned Action Editor: ~Eleni_Triantafillou1
Submission Number: 3428