# TopTwo
This repository contains the code and the real-world datasets for our ICLR 2023 paper, "Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing".

## Datasets

We provide six publicly accessible datasets (**Adult2**, **Web**, **Dog**, **Flag**, **Food**, **Plot**) and the **Color** dataset, which we created. The datasets are stored in the ./dataset folders. Each dataset comes with a crowd_data.txt file and a ground_truth.txt file. Each line of crowd_data.txt consists of three numbers corresponding to (worker, task, answer). Each line of ground_truth.txt consists of two numbers corresponding to (task, ground_truth). For the **Color** dataset, we also provide the most confusing answer in most_confusing_answer.txt.
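The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not code from this repository: the helper names `parse_rows` and `answer_counts` are hypothetical, and it assumes that the numbers on each line are integers separated by whitespace and/or commas (check the actual files before relying on this).

```python
import re
from collections import Counter, defaultdict


def parse_rows(lines, width):
    """Parse lines holding `width` integers each (hypothetical helper).

    Assumption: fields are separated by whitespace and/or commas.
    """
    rows = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if not fields:
            continue  # skip blank lines
        if len(fields) != width:
            raise ValueError(f"expected {width} fields, got {len(fields)}: {line!r}")
        rows.append(tuple(int(f) for f in fields))
    return rows


def answer_counts(crowd_rows):
    """Map each task to a Counter of the answers it received."""
    counts = defaultdict(Counter)
    for worker, task, answer in crowd_rows:
        counts[task][answer] += 1
    return counts


# Tiny in-memory example; for real data, pass an open file object instead,
# e.g. parse_rows(open(path_to_crowd_data), width=3).
crowd = parse_rows(["1 1 2", "2 1 2", "3 1 3", "1 2 1"], width=3)  # (worker, task, answer)
truth = dict(parse_rows(["1 2", "2 1"], width=2))                  # task -> ground_truth
counts = answer_counts(crowd)
```

Here `counts[task]` only tallies the raw answers per task; it is a convenience for inspecting the data and is not the estimation algorithm from the paper.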

## How to run the code
We provide three MATLAB scripts in this repository: **RealExperiment.m**, **SyntheticExperiment.m**, and **DrawDistribution.m**.

For the experiments on the real-world datasets, change the variable "dataset" at the top of **RealExperiment.m** to obtain the prediction error for each dataset. You can also obtain the distribution of a real-world dataset using **DrawDistribution.m**.

For the synthetic experiments, change the variables in **SyntheticExperiment.m** to obtain the prediction error curves of our algorithms and the baseline methods in various scenarios. Implementations of the baselines can be found at https://github.com/maqqbu/MMSR.

## CIFAR10H

We provide simple Python code for evaluating neural network training with hard, top2, or full labels.
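To illustrate how the three label types differ, here is a minimal sketch. It is not taken from main.py: the function `make_label` is hypothetical, and the construction is an assumption, namely that a hard label is a one-hot vector on the most-voted class, a top2 label keeps the two most-voted classes and renormalizes them, and a full label is the whole empirical vote distribution.

```python
def make_label(counts, kind):
    """Turn per-class annotator vote counts into a training target.

    Hypothetical helper; `kind` is one of "hard", "top2", or "full".
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one vote")
    probs = [c / total for c in counts]
    if kind == "full":
        return probs  # full empirical distribution over classes
    # Class indices sorted by descending probability (ties go to the lower index).
    order = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    keep = {"hard": 1, "top2": 2}[kind]
    kept = set(order[:keep])
    mass = sum(probs[i] for i in kept)
    # Zero out the dropped classes and renormalize the kept ones to sum to 1.
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


votes = [30, 15, 5, 0]  # made-up vote counts for a 4-class toy example
hard = make_label(votes, "hard")
top2 = make_label(votes, "top2")
full = make_label(votes, "full")
```

In this toy example the hard label puts all mass on class 0, the top2 label splits the mass between classes 0 and 1 in a 2:1 ratio, and the full label keeps the small share of votes for class 2 as well.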

### Prerequisites
- Python 3.6
- PyTorch 1.12.1
- CUDA 11.6

### Training Examples
- training ResNet with full label:
```
python main.py --lr 0.1 --type full --model resnet
```
- training VGG with top2 label:
```
python main.py --lr 0.1 --type top2 --model vgg
```

