MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts

04 Jun 2022 (modified: 05 May 2023) · Shift Happens 2022 Contributed Talk
Abstract: Understanding the performance of machine learning models across diverse data distributions is critically important for reliable applications. Motivated by this, there is a growing focus on curating benchmark datasets that capture distribution shifts. In this work, we present MetaShift, a collection of 12,868 sets of natural images across 410 classes, to address this challenge. We leverage the natural heterogeneity of Visual Genome and its annotations to construct MetaShift. The key construction idea is to cluster images using their metadata, which provides context for each image (e.g., cats with cars or cats in a bathroom), so that each cluster represents a distinct data distribution. MetaShift has two important benefits: first, it contains orders of magnitude more natural data shifts than previously available; second, it provides explicit explanations of what is unique about each of its datasets, along with a distance score that measures the amount of distribution shift between any two of them. Importantly, MetaShift can be readily used to evaluate any ImageNet pre-trained vision model, as we have matched MetaShift with the ImageNet hierarchy. The matched version covers 867 of the 1,000 classes in ImageNet-1k. Each class in the ImageNet-matched MetaShift contains 2,301.6 images on average and 19.3 subsets capturing images in different contexts. We also propose methods to construct either binary or multiclass classification tasks, enabling evaluation of a model's robustness across diverse distribution shifts.
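
The evaluation workflow the abstract describes, scoring an ImageNet pre-trained model on an individual context subset, can be sketched as below. This is a minimal illustration only: the subset directory path and the matched ImageNet class indices are hypothetical placeholders, not MetaShift's actual file layout or class mapping, which should be taken from the dataset release.

```python
# Minimal sketch: top-1 accuracy of an ImageNet-pretrained model on one
# MetaShift context subset (e.g., images of cats co-occurring with sofas).
from pathlib import Path

import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

subset_dir = Path("metashift/cat(sofa)")  # hypothetical subset directory
matched_ids = {281, 282, 283, 284, 285}   # hypothetical matched ImageNet cat ids

correct = total = 0
with torch.no_grad():
    for img_path in subset_dir.glob("*.jpg"):
        x = preprocess(Image.open(img_path).convert("RGB")).unsqueeze(0)
        pred = model(x).argmax(dim=1).item()
        correct += pred in matched_ids  # count a hit if any matched class wins
        total += 1

print(f"top-1 accuracy on this context subset: {correct / total:.3f}")
```

Repeating this loop over all context subsets of a class yields the per-context accuracy profile that MetaShift is designed to expose.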
Submission Type: Full submission (technical report + code/data)
Co-Submission: No, I am not submitting to the dataset and benchmark track and will complete my submission by June 3.