# Source code of SPL for Weakly-supervised Image Classification Experiments
See the paper for algorithm details.

**This code accompanies a paper submission and has been anonymized to comply with conference policy.**

## Environment
```bash
pip install virtualenv
virtualenv -p python3 --system-site-packages spl_env
. spl_env/bin/activate
sudo apt-get install libsnappy-dev
pip install -r requirements.txt # you may need sudo to run this pip command
```
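Before moving on, it can help to confirm that the environment is actually active in the current shell. The sketch below relies on the `VIRTUAL_ENV` variable that virtualenv's activate script exports; the `venv_status` helper name is hypothetical and not part of this repository.

```bash
# Hypothetical helper: report whether a virtualenv is active in this shell.
# VIRTUAL_ENV is exported by the activate script sourced above.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: ${VIRTUAL_ENV}"
  else
    echo "inactive"
  fi
}

venv_status
```

If this prints `inactive`, source `spl_env/bin/activate` again before installing the requirements.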

## Download data
We provide the Clothing1M dataset as tfrecords. Download it with the following commands:
```bash
mkdir data
cd data
wget https://storage.googleapis.com/zz-sharing/SHARED/clothing1m.zip
unzip clothing1m.zip
cd ..  # return to the repository root; the commands below are run from there
```
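A quick sanity check after unpacking is to count the noisy training shards that Step 2 will read. This is a minimal sketch, assuming it is run from the repository root and that the archive unpacks to the path used in Step 2; the `count_matches` helper is hypothetical.

```bash
# Hypothetical helper: print how many existing files match a glob pattern.
count_matches() {
  n=0
  # $1 is intentionally unquoted so the shell expands the glob.
  for f in $1; do
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Pattern taken from Step 2 of this README.
NOISE_DATA_PATH='data/clothing1m/tfrecord/train/noisy_train*'
echo "noisy training shards found: $(count_matches "$NOISE_DATA_PATH")"
```

A count of 0 suggests the archive was unpacked elsewhere or the command was not run from the repository root.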

## Training
### Step 1: Train the teacher model on clean data

```bash
# use 1 GPU to train
MODEL_DIR=experiments/spl_train_teacher
CUDA_VISIBLE_DEVICES=0 python spl.py --gin_config=configs/clothing1m/supervised_clothing1m.gin \
--model_dir=${MODEL_DIR}
```

### Step 2: Assign sub-pseudo labels to the noisy data
The results are saved as new tfrecords. We use Apache Beam to speed up inference and tfrecord writing.

This step implements the core of the proposed SPL method. See `spl_utls.label_map_assignment` for implementation details of the different mapping functions.

```bash
# use CPU; MODEL_DIR must still point to the teacher directory from Step 1
NOISE_DATA_PATH='data/clothing1m/tfrecord/train/noisy_train*'
OUTPUT_GRAPH=${MODEL_DIR}/checkpoint-50 # all data and metadata are saved to the last checkpoint's subfolder
DATA_WITH_INFERENCE_PATH=$OUTPUT_GRAPH
GIN=supervised_clothing1m.gin

mkdir -p $OUTPUT_GRAPH

CUDA_VISIBLE_DEVICES= python spl.py \
  --input_path=$NOISE_DATA_PATH \
  --model_path=$OUTPUT_GRAPH \
  --output_path=$DATA_WITH_INFERENCE_PATH \
  --model_dir=${MODEL_DIR} \
  --beam_inference=True \
  --gin_config=configs/clothing1m/${GIN} \
  --num_worker=50 # set this to the number of CPU cores on your machine
```
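The `--num_worker` value above is meant to match the number of CPU cores. A minimal sketch for deriving it instead of hard-coding 50, assuming either `nproc` (GNU coreutils) or `getconf` is available; the `NUM_WORKER` variable name is an assumption made for illustration.

```bash
# Assumption: one worker per online CPU core.
# nproc is tried first; getconf is the fallback where nproc is missing.
NUM_WORKER=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "--num_worker=${NUM_WORKER}"
```

The printed flag can then replace `--num_worker=50` in the command above.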


### Step 3: Pretrain models using the generated pseudo-labeled data
- Set `Clothing1M.external_data` in `configs/clothing1m/pretrain_clothing1m.gin` to point to the generated tfrecords
- Run the following:
  ```bash
  # use 8 GPUs to train
  MODEL_DIR=experiments/spl_pretraining
  CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python spl.py --gin_config=configs/clothing1m/pretrain_clothing1m.gin \
  --model_dir=${MODEL_DIR}
  ```


### Step 4: Finetune using the pretraining checkpoint
- Set `hparams.finetune.ckpt` in `configs/clothing1m/finetune_clothing1m.gin` to point to the pretrained checkpoint
- Run the following:
  ```bash
  # use 1 GPU to train
  MODEL_DIR=experiments/spl_finetune
  CUDA_VISIBLE_DEVICES=0 python spl.py --gin_config=configs/clothing1m/finetune_clothing1m.gin --model_dir=${MODEL_DIR}
  ```


## Pretrained models
Finetuned checkpoints can be downloaded with the following commands:
```bash
mkdir pretrained_models
cd pretrained_models
wget https://storage.googleapis.com/zz-sharing/SHARED/pretrained_models.zip
unzip pretrained_models.zip
cd ..  # return to the repository root before running evaluation
```

## Evaluation
With the pretrained models downloaded, run:
  ```bash
  # evaluation_clothing1m.gin poin
  MODEL_DIR=pretrained_models/spl_evaluation
  CUDA_VISIBLE_DEVICES=0 python spl.py --gin_config=configs/clothing1m/evaluation_clothing1m.gin --model_dir=${MODEL_DIR}
  ```

## Visualize with TensorBoard
```bash
# look at tag: test/accuracy 
tensorboard --logdir=./experiments/
```
