# MetaShift: A Dataset of Datasets for Evaluating Distribution Shifts and Training Conflicts 

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](./LICENSE)


**[New]** Check out our website, https://MetaShift.readthedocs.io/, which provides detailed documentation and installation guidelines.


This repo provides the scripts for generating MetaShift and the PyTorch source code for the experiments on evaluating distribution shifts and training conflicts.

## Abstract
*Understanding the performance of machine learning models across diverse data distributions is critically important for reliable applications. Motivated by this, there is a growing focus on curating benchmark datasets that capture distribution shifts. While valuable, the existing benchmarks are limited in that many of them only contain a small number of shifts and they lack systematic annotation about what is different across different shifts. We present MetaShift, a collection of 12,868 sets of natural images across 410 classes, to address this challenge. We leverage the natural heterogeneity of Visual Genome and its annotations to construct MetaShift. The key construction idea is to cluster images using their metadata, which provides context for each image (e.g. cats with cars or cats in bathroom) that represent distinct data distributions. MetaShift has two important benefits: first, it contains orders of magnitude more natural data shifts than previously available. Second, it provides explicit explanations of what is unique about each of its data sets and a distance score that measures the amount of distribution shift between any two of its data sets. We demonstrate the utility of MetaShift in benchmarking several recent proposals for training models to be robust to data shifts. We find that the simple empirical risk minimization performs the best when shifts are moderate and no method had a systematic advantage for large shifts. We also show how MetaShift can help to visualize conflicts between data subsets during model training.*


## Code Structure
The `generate_dataset` folder provides the script for generating MetaShift. 
The `experiments` folder provides the experiments on MetaShift from the paper. Please refer to the `README.md` in the corresponding folder for more details.


## Introducing MetaShift
What is MetaShift? MetaShift is a collection of subsets of data together with an annotation graph that explains the similarity/distance between two subsets (edge weight) as well as what is unique about each subset (node metadata). For each class, say “cat”, we have many subsets of cats, and we can think of each subset as a node in the graph. Each subset corresponds to “cat” in a different context: e.g. “cat with sink” or “cat with fence”. The context of each subset is the node metadata. The “cat with sink” subset is more similar to the “cat with faucet” subset because many images contain both a sink and a faucet. This similarity is the weight of the edge between the two nodes; a higher weight means the contexts of the two nodes tend to co-occur in the same images.
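
The grouping described above can be sketched in a few lines of Python. This is a minimal illustration rather than the repository's generation script: the `image_objects` dictionary, the image IDs, and the `build_subsets` helper are hypothetical names invented for this example, and the `subject(context)` naming simply mirrors the captions used in the figures.

```python
from collections import defaultdict

# Hypothetical toy metadata: image ID -> set of object labels found in that image.
image_objects = {
    "img1": {"cat", "sink", "faucet"},
    "img2": {"cat", "sink"},
    "img3": {"cat", "fence"},
    "img4": {"dog", "fence"},
}


def build_subsets(image_objects, subject):
    """Group the images that contain `subject` by each co-occurring object (the context)."""
    subsets = defaultdict(set)
    for image_id, objects in image_objects.items():
        if subject not in objects:
            continue  # this image does not belong to the class of interest
        for context in objects - {subject}:
            subsets[f"{subject}({context})"].add(image_id)
    return dict(subsets)


cat_subsets = build_subsets(image_objects, "cat")
for name in sorted(cat_subsets):
    print(name, sorted(cat_subsets[name]))
```

Note that an image can land in several subsets at once (here `img1` is in both `cat(sink)` and `cat(faucet)`); that shared membership is what makes two nodes similar.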

<p align='center'>
  <img width='100%' src='./docs/figures/MetaShift Examples.jpg'/>
<b>Figure 1: Example Cat vs. Dog Images from MetaShift. </b> For each class, MetaShift provides many subsets of data, each of which corresponds to a different context (the context is stated in parentheses). 
</p>




How can we use MetaShift? It is a flexible framework to generate a large number of real-world distribution shifts that are well-annotated and controlled. For each class of interest, say “cats”, we can use the meta-graph of cats to identify a collection of cat nodes for training (e.g. cats in bathroom-related contexts) and a collection of cat nodes for out-of-domain evaluation (e.g. cats in outdoor contexts). Our meta-graph tells us exactly what is different between the train and test domains (e.g. bathroom vs. outdoor contexts), and it also specifies the similarity between the two contexts via graph distance. That makes it easy to carefully modulate the amount of distribution shift. For example, if we instead use cats-in-living-room as the test set, the distribution shift is smaller than with outdoor contexts.
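
As a rough sketch of how graph distance could be used to rank candidate test subsets, the snippet below runs Dijkstra's algorithm over a toy meta-graph. Everything here is an assumption made for illustration: the node names, the edge weights, and in particular the choice of `1 - weight` as the edge length, which is only one simple way to turn a similarity into a distance and is not necessarily the distance used in the paper.

```python
import heapq

# Hypothetical toy meta-graph: undirected edges with similarity weights in (0, 1].
edges = {
    ("cat(sink)", "cat(faucet)"): 0.8,
    ("cat(faucet)", "cat(sofa)"): 0.4,
    ("cat(sofa)", "cat(grass)"): 0.2,
}


def build_adjacency(edges):
    """Turn similarity-weighted edges into an adjacency list of (neighbor, length)."""
    adjacency = {}
    for (u, v), weight in edges.items():
        length = 1.0 - weight  # assumption: more similar subsets are closer together
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    return adjacency


def graph_distances(adjacency, source):
    """Dijkstra shortest-path distances from `source` to every reachable node."""
    distances = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length in adjacency.get(node, []):
            candidate = dist + length
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances


# Rank candidate test subsets from smallest to largest shift relative to the training node.
distances = graph_distances(build_adjacency(edges), "cat(sink)")
test_candidates = ["cat(grass)", "cat(sofa)"]
ranked = sorted(test_candidates, key=distances.get)
print(ranked)
```

In this toy graph the indoor `cat(sofa)` node ends up closer to the bathroom-related training node than `cat(grass)`, so choosing it as the test set would correspond to the milder shift.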


<p align='center'>
  <img width='100%' src='./docs/figures/MetaShift InfoGraphic.jpg'/>
<b>Figure 2: Infographics of MetaShift. </b> 
MetaShift covers a wide range of 410 classes and 12,868 sets of natural images in total. 
For each class, we have 31.4 subsets on average together with an annotation graph (i.e., meta-graph) that explains the similarity/distance between two subsets (edge weight) as well as what is unique about each subset (node metadata). 
More concretely, the subsets are characterized by a diverse collection of 1,853 distinct contexts, which cover 1,702 object-presence contexts, 37 general contexts, and 114 object attributes.  
</p>



### MetaGraph 
<p align='center'>
  <img width='100%' src='./docs/figures/Cat-MetaGraph.jpg'/>
<b>Figure 3: Meta-graph: visualizing the diverse data distributions within the “cat” class.  </b> 
MetaShift splits the data points of each class (e.g., Cat) into many subsets based on visual contexts. 
Each node in the meta-graph represents one subset. The weight of each edge is the overlap coefficient between the corresponding two subsets. Node colors indicate the graph-based community detection results. Inter-community edges are colored. Intra-community edges are grayed out for better visualization. The border color of each example image indicates its community in the meta-graph. We have one such meta-graph for each of the 410 classes in MetaShift.
</p>
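
The overlap coefficient mentioned in the Figure 3 caption has the standard definition |A ∩ B| / min(|A|, |B|). The sketch below computes it for every pair of subsets to obtain edge weights. The subset contents and the `build_meta_graph` helper are hypothetical and serve only to illustrate the computation; the repository's own scripts may organize this differently.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """Overlap coefficient of two sets: |A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0  # avoid dividing by zero for empty subsets
    return len(a & b) / min(len(a), len(b))


def build_meta_graph(subsets):
    """Return {(node_u, node_v): weight} for every pair of subsets that share images."""
    edges = {}
    for u, v in combinations(sorted(subsets), 2):
        weight = overlap_coefficient(subsets[u], subsets[v])
        if weight > 0:
            edges[(u, v)] = weight
    return edges


# Hypothetical toy subsets: node name -> set of image IDs.
subsets = {
    "cat(sink)": {1, 2, 3, 4},
    "cat(faucet)": {3, 4, 5},
    "cat(fence)": {6, 7},
}

meta_graph = build_meta_graph(subsets)
for (u, v), weight in meta_graph.items():
    print(f"{u} -- {v}: {weight:.2f}")
```

Because the denominator is the size of the smaller subset, a small subset that is fully contained in a larger one gets weight 1 regardless of how large the other subset is.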


<p align='center'>
  <img width='100%' src='./docs/figures/Dog-MetaGraph.jpg'/>
<b>Figure 4: Meta-graph for the “Dog” class, which captures meaningful semantics of the multi-modal data distribution of “Dog”. </b> 
</p>



## Base Dataset: Visual Genome
We leverage the natural heterogeneity of [Visual Genome](https://visualgenome.org) and its annotations to construct MetaShift. Visual Genome contains over 100k images across 1,702 object classes. For each image, Visual Genome annotates the class labels of the objects that occur in it, and we use these labels as the metadata. We use the pre-processed and cleaned version of Visual Genome by [Hudson and Manning](https://arxiv.org/pdf/1902.09506.pdf). 

## Citation
If you use this library in your research, please cite it as follows *(under submission)*:
```
@inproceedings{
  anonymous2022metashift,
  title={MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts},
  author={Anonymous},
  booktitle={Submitted to The Tenth International Conference on Learning Representations },
  year={2022},
  url={https://openreview.net/forum?id=MTex8qKavoS},
  note={under review}
}
```