BREEDS: Benchmarks for Subpopulation Shift

Shibani Santurkar; Dimitris Tsipras; Aleksander Madry

Published: 12 Jan 2021, Last Modified: 22 Jun 2025, ICLR 2021 Poster, Readers: Everyone

Keywords: benchmarks, distribution shift, hierarchy, robustness

Abstract: We develop a methodology for assessing the robustness of models to subpopulation shift, specifically their ability to generalize to novel data subpopulations that were not observed during training. Our approach leverages the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions. This enables us to synthesize realistic distribution shifts whose sources can be precisely controlled and characterized, within existing large-scale datasets. Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity. We then validate that the corresponding shifts are tractable by obtaining human baselines. Finally, we utilize these benchmarks to measure the sensitivity of standard model architectures as well as the effectiveness of existing train-time robustness interventions.
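The core idea in the abstract, using class structure to decide which subpopulations appear at training time and which appear only at test time, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy hierarchy, the function name `split_subpopulations`, and the random half-and-half split are all assumptions made for illustration. Each superclass keeps the same label in both domains, but the subclasses behind that label are disjoint.

```python
import random


def split_subpopulations(hierarchy, seed=0):
    """Split each superclass's subclasses into disjoint source/target sets.

    `hierarchy` maps a superclass name to a list of its subclass names.
    Hypothetical helper for illustration; not part of any released library.
    """
    rng = random.Random(seed)
    source, target = {}, {}
    for superclass, subclasses in hierarchy.items():
        shuffled = sorted(subclasses)  # sort first so the result depends only on the seed
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        # Equal-sized halves; an odd leftover subclass is dropped from both domains.
        source[superclass] = shuffled[:half]
        target[superclass] = shuffled[half:2 * half]
    return source, target


# Toy hierarchy (assumed for illustration; not taken from ImageNet).
toy_hierarchy = {
    "dog": ["beagle", "poodle", "husky", "pug"],
    "cat": ["tabby", "siamese", "persian", "sphynx"],
}

source, target = split_subpopulations(toy_hierarchy)
for superclass in toy_hierarchy:
    print(superclass, "train on:", source[superclass], "test on:", target[superclass])
```

A model trained on the source subclasses and evaluated on the target subclasses faces the same superclass-level task in both cases, so any accuracy gap can be attributed to the change in subpopulations.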

One-sentence Summary: We develop a methodology for constructing large-scale subpopulation shift benchmarks and use them to assess model robustness as well as the effectiveness of existing robustness interventions.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Code: [MadryLab/robustness](https://github.com/MadryLab/robustness) + [1 community implementation](https://paperswithcode.com/paper/?openreview=mQPBmvyAuk)

Data: [ImageNet](https://paperswithcode.com/dataset/imagenet)

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/breeds-benchmarks-for-subpopulation-shift/code)
