# Understanding Knowledge Distillation

This repository is the official implementation of "Understanding Knowledge Distillation" for image classification tasks.


## Training
The experiments in our paper can be reproduced with the following `.sh` scripts.

1. To obtain a teacher model, first run:

```
./run_vanilla.sh
```

You can obtain other baseline models by changing the `--student` argument. You can also train the baseline model on CIFAR-10 by setting the `--dataset` argument to `cifar10`.


2. To obtain a distilled student model, run:
```
./run_vanilla_kd.sh
```
Here, you can control the strength of distillation by changing `--alpha` and `--temperature`. If `--temperature` is larger than 100, we replace the gradient with respect to the logit vector by the gradient it converges to as the temperature goes to infinity. You can also train the model with MSE distillation by adding the argument `--logit=l2_logit`.
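For orientation, here is a minimal pure-Python sketch of the quantities these flags refer to. It assumes the standard Hinton-style formulation, which may differ from what `main.py` actually does: `alpha` weights a temperature-scaled KL term (multiplied by T²) against cross-entropy on the label, and `l2_logit` is read as the mean squared error between student and teacher logits. All function names are hypothetical. The last two helpers illustrate the large-temperature case: the gradient of T²·KL with respect to the student logits is T·(softmax(z_s/T) − softmax(z_t/T)), and as T grows this tends to the mean-centered logit difference divided by the number of classes K.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of `logits / temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, alpha, temperature):
    """Hypothetical Hinton-style KD loss for one example (an assumption, not main.py):
    (1 - alpha) * CE(student, label) + alpha * T^2 * KL(p_teacher^T || p_student^T)."""
    ce = -math.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


def l2_logit_loss(student_logits, teacher_logits):
    """Assumed reading of `--logit=l2_logit`: MSE between student and teacher logits."""
    sq = [(s - t) ** 2 for s, t in zip(student_logits, teacher_logits)]
    return sum(sq) / len(sq)


def kd_logit_grad(student_logits, teacher_logits, temperature):
    """Gradient of T^2 * KL(p_t^T || p_s^T) w.r.t. the student logits: T * (p_s - p_t)."""
    p_s = softmax(student_logits, temperature)
    p_t = softmax(teacher_logits, temperature)
    return [temperature * (ps - pt) for ps, pt in zip(p_s, p_t)]


def kd_logit_grad_limit(student_logits, teacher_logits):
    """Limit of kd_logit_grad as T -> infinity: mean-centered logit difference / K."""
    k = len(student_logits)
    mean_s = sum(student_logits) / k
    mean_t = sum(teacher_logits) / k
    return [((s - mean_s) - (t - mean_t)) / k
            for s, t in zip(student_logits, teacher_logits)]


student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.0, -2.0]
print(kd_loss(student, teacher, label=0, alpha=0.9, temperature=4.0))
print(l2_logit_loss(student, teacher))
# At a very large temperature the finite-T gradient and its limit nearly coincide.
print(kd_logit_grad(student, teacher, temperature=1e4))
print(kd_logit_grad_limit(student, teacher))
```

The limit follows from the first-order expansion softmax(z/T)_i ≈ (1 + (z_i − mean(z))/T)/K, so computing it directly avoids the numerical cancellation of T·(p_s − p_t) at very large T.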

3. To control the amount of training data, run:
```
./run_vanilla_kd_data.sh
```
Our code uses the following hyperparameters:
 - `cls_acq`, `sample_acq`: selection functions for class-imbalanced and class-balanced sampling, respectively. The available options are `random` and `tld`.
 - `cls_lower_qnt`, `cls_upper_qnt`: the lower and upper quantiles used to construct the data for class-imbalanced sampling.
 - `sample_lower_qnt`, `sample_upper_qnt`: the lower and upper quantiles used to construct the data for class-balanced sampling.

This script runs both the class-imbalanced and class-balanced sampling experiments, with `cls_acq` and `sample_acq` set to `random` and each lower and upper quantile set to `0.0` and `0.1`, respectively. See `main.py` for detailed explanations.
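The sketch below shows one plausible reading of the quantile window, under our own assumptions rather than the selection code in `main.py`: every sample gets a score from the acquisition function (only `random` is shown, since `tld` is not defined in this README), class-balanced sampling keeps the samples whose score rank lies in [lower, upper) within each class, and class-imbalanced sampling applies the same window to the whole pool, so classes may end up unevenly represented. The helper names and the toy pool are hypothetical.

```python
import random
from collections import defaultdict


def quantile_window(indices, scores, lower_qnt, upper_qnt):
    """Keep the indices whose score rank falls in [lower_qnt, upper_qnt)."""
    ranked = sorted(indices, key=lambda i: scores[i])
    start = int(round(lower_qnt * len(ranked)))
    end = int(round(upper_qnt * len(ranked)))
    return ranked[start:end]


def class_balanced_subset(labels, scores, lower_qnt, upper_qnt):
    """Assumed class-balanced sampling: apply the window separately inside each class."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    subset = []
    for y in sorted(by_class):
        subset.extend(quantile_window(by_class[y], scores, lower_qnt, upper_qnt))
    return sorted(subset)


def class_imbalanced_subset(labels, scores, lower_qnt, upper_qnt):
    """Assumed class-imbalanced sampling: apply the window to the whole pool at once."""
    return sorted(quantile_window(range(len(labels)), scores, lower_qnt, upper_qnt))


rng = random.Random(0)
labels = [i % 10 for i in range(1000)]   # hypothetical pool: 10 classes x 100 samples
scores = [rng.random() for _ in labels]  # `random` acquisition: i.i.d. uniform scores
balanced = class_balanced_subset(labels, scores, 0.0, 0.1)
imbalanced = class_imbalanced_subset(labels, scores, 0.0, 0.1)
print(len(balanced), len(imbalanced))
```

With the window `0.0` to `0.1`, both variants keep 10% of the pool, but only the class-balanced one guarantees an equal count per class.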

## Evaluation

To evaluate the model(s) and visualize the results in terms of entropy or TLD, please refer to `grid_plot.ipynb` and `pdf_plot.ipynb`.
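As a rough illustration of the entropy measure, this sketch computes the Shannon entropy (in nats) of a model's softmax output for a single example. It assumes the standard definition; the notebooks may differ in detail (for example, the log base), and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax of a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(logits):
    """Shannon entropy (nats) of softmax(logits): near 0 when confident, log(K) when uniform."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)


confident = [10.0, 0.0, 0.0, 0.0]  # one dominant class
uncertain = [0.0, 0.0, 0.0, 0.0]   # uniform over K = 4 classes
print(predictive_entropy(confident), predictive_entropy(uncertain))
```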


## Results
All of our results can be reproduced with this code, and all of them are reported in the paper, including the Appendix. There are too many results to post them all here. However, by running the `.sh` scripts several times with different hyperparameter values, you can reproduce all of our analyses.


## Dependencies
Our code is tested under the following environment:
1. Python 3.7.3
2. PyTorch 1.1.0
3. torchvision 0.3.0
4. NumPy 1.16.2
5. tqdm