# Teacher-Guided Student Self-Knowledge Distillation Using Diffusion Model
Overview of our proposed DSKD:
![picture](./Figure/main.jpg)
## Installation
### Requirements
Ubuntu 20.04 LTS

Python 3.9

CUDA 11.8
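
A quick way to confirm that the local environment matches the versions listed above is sketched below. This is a hypothetical helper and not part of this repository: it assumes `nvcc` is on `PATH` and that its `--version` output contains a `release X.Y` string. The names `parse_cuda_release` and `installed_cuda` are illustrative.

```python
import re
import shutil
import subprocess
import sys
from typing import Optional, Tuple

# Versions taken from the Requirements section of this README.
REQUIRED_PYTHON = (3, 9)
REQUIRED_CUDA = (11, 8)


def parse_cuda_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda() -> Optional[Tuple[int, int]]:
    """Return the CUDA toolkit version reported by nvcc, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    python_ok = tuple(sys.version_info[:2]) == REQUIRED_PYTHON
    cuda_ok = installed_cuda() == REQUIRED_CUDA
    print(f"Python matches {REQUIRED_PYTHON}: {python_ok}")
    print(f"CUDA matches {REQUIRED_CUDA}: {cuda_ok}")
```

Other Python or CUDA versions may also work; the check only reports whether the environment matches the one listed here.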

Please install the required Python packages:


## Experiment

### Supported datasets

| Dataset | Train Size | Test Size | Classes |
| -- | -- | -- | -- |
| CIFAR-100 | 50,000 | 10,000 | 100 |
| ImageNet | 1,281,167 | 100,000 | 1000 |

### Training teacher networks

```
CUDA_VISIBLE_DEVICES=0 bash dist_train_teacher.sh 1 29500 ./cifar.yaml cifar_resnet56 \
    --experiment cifar100_resnet56
```

```
CUDA_VISIBLE_DEVICES=0 bash dist_train_teacher.sh 1 29500 ./cifar.yaml cifar_wrn_40_2 \
    --experiment cifar100_wrn_40_2
```

### Training student networks
```
CUDA_VISIBLE_DEVICES=0 bash dist_train.sh 1 29512 dskd_cifar.yaml cifar_wrn_40_1 \
    --teacher-model cifar_wrn_40_2 \
    --experiment dskd_new \
    --teacher-ckpt "path to teacher_ckpt"
```
```
CUDA_VISIBLE_DEVICES=0,1,2,3 bash dist_train.sh 4 29512 ./dskd_b1.yaml resnet18 \
    --teacher-model resnet34 \
    --experiment dskd_new \
    --teacher-ckpt "path to teacher_ckpt"
```

## Visualization
### Visualize attention heatmaps
Attention heatmaps on ImageNet generated by the teacher and by students trained with the baseline, DiffKD, and our DSKD.

![picture](./Figure/Heatmap.jpg)\
We produced the heatmap visualizations using github.com/jacobgil/pytorch-grad-cam.
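
For readers unfamiliar with class activation maps, the following is a minimal NumPy sketch of the basic Grad-CAM computation. It is an illustration only, not the script used for the figure above: this README does not state which CAM variant or target layer was used, and the function name `grad_cam` and the toy inputs are assumptions. In practice the activations and gradients would come from a hooked convolutional layer of the teacher or student network.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Basic Grad-CAM heatmap for a single image.

    activations: feature maps of one layer, shape (C, H, W).
    gradients:   gradient of the target class score with respect to
                 those feature maps, same shape.
    Returns an (H, W) map scaled to [0, 1].
    """
    if activations.ndim != 3 or activations.shape != gradients.shape:
        raise ValueError("activations and gradients must both have shape (C, H, W)")

    # Channel weights: global average of the gradients over spatial positions.
    weights = gradients.mean(axis=(1, 2))  # shape (C,)

    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    cam = np.maximum(cam, 0.0)

    # Scale by the peak value so the map can be rendered as a heatmap.
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam


if __name__ == "__main__":
    # Toy example: two 2x2 feature maps with hand-picked gradients.
    acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 0.0], [0.0, 2.0]]])
    grads = np.array([np.ones((2, 2)), -np.ones((2, 2))])
    print(grad_cam(acts, grads))
```

The resulting map is typically resized to the input resolution and blended with the image to obtain an overlay like the one shown in the figure.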