Extending the WILDS Benchmark for Unsupervised Adaptation

Shiori Sagawa; Pang Wei Koh; Tony Lee; Irena Gao; Sang Michael Xie; Kendrick Shen; Ananya Kumar; Weihua Hu; Michihiro Yasunaga; Henrik Marklund; Sara Beery; Etienne David; Ian Stavness; Wei Guo; Jure Leskovec; Kate Saenko; Tatsunori Hashimoto; Sergey Levine; Chelsea Finn; Percy Liang

Published: 02 Dec 2021, Last Modified: 04 May 2025

Venue: NeurIPS 2021 Workshop DistShift Poster

Readers: Everyone

Keywords: distribution shifts, adaptation, unlabeled data

TL;DR: We introduce U-WILDS, which augments the WILDS distribution shift benchmark with realistic unlabeled data, and benchmark existing methods for unlabeled data on these in-the-wild distribution shifts.

Abstract: Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data. However, existing distribution shift benchmarks for unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. To maintain consistency, the labeled training, validation, and test sets, as well as the evaluation metrics, are exactly the same as in the original WILDS benchmark. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.
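As a rough illustration of the self-training family mentioned in the abstract, the sketch below shows one common variant, pseudo-labeling with a confidence threshold: a model trained on labeled source data predicts on unlabeled target data, and only confident predictions are kept as training labels. This is a toy example, not the paper's implementation or the WILDS package API; the function name `select_pseudo_labels`, the threshold value, and the probabilities are all hypothetical.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(
    probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (example_index, predicted_class) pairs for confident predictions.

    `probs` holds one row of class probabilities per unlabeled example.
    Hypothetical helper for illustration only.
    """
    selected = []
    for i, p in enumerate(probs):
        # Index of the most probable class for this example.
        best_class = max(range(len(p)), key=lambda k: p[k])
        # Keep the prediction only if the model is confident enough.
        if p[best_class] >= threshold:
            selected.append((i, best_class))
    return selected


# Made-up softmax outputs of a source-trained model on unlabeled target data.
target_probs = [
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.92, 0.03],
]

pseudo_labeled = select_pseudo_labels(target_probs, threshold=0.9)
print(pseudo_labeled)
```

In a full self-training loop, the selected pairs would be added to the labeled training set and the model retrained. The threshold trades off the number of pseudo-labels against their expected accuracy.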
