The code base is built on the r

This repo was tested with Ubuntu 16.04.5 LTS, Python 3.5, PyTorch 0.4.0, and CUDA 9.0. It should also run with more recent PyTorch versions (>=0.4.0).

## Running

1. Fetch the pretrained teacher models released with the earlier "Contrastive Representation Distillation" paper (https://arxiv.org/abs/1910.10699):

    ```
    sh scripts/fetch_pretrained_teachers.sh
    ```
   This downloads the models and saves them to `save/models`.
   
2. To run the experiment with prokd, for example with a ResNet50 teacher and a MobileNetV2 student:

    ```
    python pkd.py --path_t ./save/models/ResNet50_vanilla/ckpt_epoch_240.pth --distill mirror --model_s MobileNetV2  -p 0.6 -c teacher --trial 1
    ```
    The flags are:
    - `--path_t`: path to the teacher model checkpoint.
    - `--model_s`: the student model; see `models/__init__.py` for the available model types.
    - `--distill`: the distillation method; use `mirror` for pkd.
    - `-p`: the weight that balances the cross-entropy loss against the KL constraint when training the teacher model.
    - `-c`: `teacher` trains the teacher towards a converged teacher model, while `label` trains it towards the ground-truth labels.
    - `--trial`: an experiment id used to tell multiple runs apart.

3. To run the experiment with crd+prokd, for example with a ResNet50 teacher and a MobileNetV2 student:

    ```
    python new_crd_pkd.py --path_t ./save/models/ResNet50_vanilla/ckpt_epoch_240.pth --distill mirror --model_s MobileNetV2  -p 0.6 -c teacher -b 0.8 --trial 1
    ```
    The flags are:

   - `--path_t`: path to the teacher model checkpoint.
   - `--model_s`: the student model; see `models/__init__.py` for the available model types.
   - `--distill`: the distillation method; use `mirror` for pkd_crd.
   - `-p`: the weight that balances the cross-entropy loss against the KL constraint when training the teacher model.
   - `-b`: the weight that balances the crd loss against the KL loss for the student model.
   - `-c`: `teacher` trains the teacher towards a converged teacher model, while `label` trains it towards the ground-truth labels.
   - `--trial`: an experiment id used to tell multiple runs apart.

   

4. (Optional) Train teacher networks from scratch. Example commands are in `scripts/run_cifar_vanilla.sh`.
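
The flags described in steps 2 and 3 can be pictured as a small argument parser. The sketch below is hypothetical: the function name `build_parser`, the types, the defaults, and the help strings are assumptions made for illustration, and the real parsers in `pkd.py` and `new_crd_pkd.py` may define these options differently. Only the flag names and the two `-c` values come from the descriptions above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented in this README.

    Types and defaults are assumptions, not copied from pkd.py.
    """
    parser = argparse.ArgumentParser(description="Sketch of the documented flags")
    parser.add_argument("--path_t", type=str, required=True,
                        help="path to the teacher model checkpoint")
    parser.add_argument("--model_s", type=str, default="MobileNetV2",
                        help="student model name")
    parser.add_argument("--distill", type=str, default="mirror",
                        help="distillation method")
    parser.add_argument("-p", type=float, default=0.6,
                        help="weight between cross-entropy and KL for the teacher")
    parser.add_argument("-b", type=float, default=0.8,
                        help="weight between crd loss and KL loss for the student")
    parser.add_argument("-c", type=str, choices=["teacher", "label"],
                        default="teacher",
                        help="target the teacher is trained towards")
    parser.add_argument("--trial", type=str, default="1",
                        help="experiment id to tell runs apart")
    return parser


# Parse the example command line from step 3 (passed as an explicit list,
# so nothing is read from sys.argv).
example_args = build_parser().parse_args([
    "--path_t", "./save/models/ResNet50_vanilla/ckpt_epoch_240.pth",
    "--distill", "mirror",
    "--model_s", "MobileNetV2",
    "-p", "0.6",
    "-c", "teacher",
    "-b", "0.8",
    "--trial", "1",
])
print(example_args.model_s, example_args.p, example_args.b, example_args.c)
```

Restricting `-c` with `choices` is a design choice of this sketch: a typo such as `-c teachr` would then fail at parse time rather than partway through training.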


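
To give an intuition for what a balancing weight such as `-p` does, here is a minimal pure-Python sketch. It assumes the weight forms a convex combination, `p * cross_entropy + (1 - p) * KL`; this form, the direction of the KL term, and all function names are assumptions for illustration, and the formula actually used in `pkd.py` may differ. The `-b` weight could be read analogously under the same assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of the softmax output against an integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p_probs, q_probs):
    """KL(p || q) for two discrete distributions with positive entries in q."""
    return sum(p * math.log(p / q) for p, q in zip(p_probs, q_probs) if p > 0)


def weighted_loss(logits, label, reference_probs, p):
    """Assumed form: p * CE + (1 - p) * KL(reference || prediction)."""
    ce = cross_entropy(logits, label)
    kl = kl_divergence(reference_probs, softmax(logits))
    return p * ce + (1.0 - p) * kl


# Toy values, chosen only for illustration.
logits = [2.0, 1.0, 0.1]
reference = [0.7, 0.2, 0.1]
for p in (0.0, 0.6, 1.0):
    print(p, weighted_loss(logits, 0, reference, p))
```

Under this assumed form, `p = 1.0` reduces to plain cross-entropy and `p = 0.0` leaves only the KL term, so values in between trade one off against the other.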