# Differentiable JPEG-based Input Perturbation for Knowledge Distillation Amplification via Conditional Mutual Information Maximization (DJIP)

This repository contains the official code for reproducing the experimental results presented in our paper, *Differentiable JPEG-based Input Perturbation for Knowledge Distillation Amplification via Conditional Mutual Information Maximization*, under review as a conference paper at ICLR 2026.

<p align="center">
<img src="./diagrams/framework.png" width="80%" height="80%" class="center">
</p>

# Installation

The code has been tested with Python 3.9.19 and CUDA 12.0/12.4. Please refer to the `environment.yml` file for the list of conda and pip dependencies. You can use `conda env create -f environment.yml` to create the conda environment. The code may also run with other versions of the listed packages, although compatibility is not guaranteed.

# Differentiable JPEG Codec

The implementation of the differentiable JPEG codec is provided in [JPEG_layer.py](./helper/JPEG_layer.py). Additionally, the centroid helper and the loss function used in the paper are located in [centroids.py](./helper/centroids.py) and [cmi.py](./helper/cmi.py), respectively.
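For readers unfamiliar with the term, the following is a minimal sketch of one common empirical estimate of conditional mutual information: the average KL divergence between each sample's predicted distribution and the centroid (mean prediction) of its class. It is an illustration only and is not taken from `cmi.py` or `centroids.py`; the function names `class_centroids`, `kl_divergence`, and `empirical_cmi` are hypothetical, and the loss used in the paper may differ.

```python
import math
from collections import defaultdict


def class_centroids(probs, labels):
    """Return, per label, the mean of the predicted distributions of its samples."""
    sums = {}
    counts = defaultdict(int)
    for p, y in zip(probs, labels):
        if y not in sums:
            sums[y] = [0.0] * len(p)
        for k, v in enumerate(p):
            sums[y][k] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of floats."""
    return sum(
        pk * math.log((pk + eps) / (qk + eps))
        for pk, qk in zip(p, q)
        if pk > 0
    )


def empirical_cmi(probs, labels):
    """Average KL between each prediction and its class centroid (assumed estimator)."""
    centroids = class_centroids(probs, labels)
    total = sum(kl_divergence(p, centroids[y]) for p, y in zip(probs, labels))
    return total / len(probs)
```

Under this estimate, predictions that are identical within each class give a value of zero, while predictions that spread out around their class centroid give a larger value.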

# How to Run

Example bash scripts used in our experiments are available in the [bash folder](./bash). Model checkpoints and training logs are saved under `./save/` by default. You may modify the output paths in the corresponding Python files. We have provided the main experimental results on ImageNet under `./save/imagenet`.

## Dataset and Model Preparation

Please prepare the ImageNet dataset and update the `--data_path` parameter in the bash scripts accordingly. Also, modify the following line in [imagenet.py](./mdistiller/dataset/imagenet.py) to point to your dataset directory:

```python
data_folder = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'your/path/imagenet/')
```
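When editing this line, it may help to know how `os.path.join` treats its second argument: a relative path is resolved against the directory containing the module, while an absolute path replaces that directory entirely. The sketch below illustrates this and adds a quick sanity check. The helper names `resolve_data_folder` and `missing_splits` are hypothetical and not part of the repository, and the `train`/`val` subfolder names are an assumption based on the usual ImageNet folder layout.

```python
import os


def resolve_data_folder(module_file, user_path):
    """Mirror the expression above: join the module's directory with user_path.

    If user_path is absolute, os.path.join discards the module directory
    and returns user_path unchanged.
    """
    base = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(base, user_path)


def missing_splits(data_folder, splits=("train", "val")):
    """Return the expected split subfolders (assumed names) that do not exist."""
    return [s for s in splits if not os.path.isdir(os.path.join(data_folder, s))]
```

Calling `missing_splits(data_folder)` before launching a long training run is a cheap way to catch a mistyped dataset path early.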

Pre-trained CIFAR-100 models used in the paper can be downloaded via [download_cifar100_model.sh](./bash/download_cifar100_model.sh). For ImageNet-1k pre-trained models, we utilize those provided by PyTorch.

## Training DJIP Teacher

To train the DJIP teacher on ImageNet-1k, please use [teacher_imagenet.sh](./bash/teacher_imagenet.sh).

For CIFAR-100 teacher training, use [teacher_cifar100.sh](./bash/teacher_cifar100.sh). 

The hyperparameters in these scripts are set according to the configurations described in our paper.

## Distillation

For ImageNet-1k distillation, please use [student_imagenet.sh](./bash/student_imagenet.sh). 

For CIFAR-100 distillation, please use the following bash scripts: [student_cifar100.sh](./bash/student_cifar100.sh), [student_cifar100_1.sh](./bash/student_cifar100_1.sh), [student_cifar100_2.sh](./bash/student_cifar100_2.sh), [student_cifar100_3.sh](./bash/student_cifar100_3.sh), and [student_cifar100_4.sh](./bash/student_cifar100_4.sh).

We provide implementations of several knowledge distillation methods for CIFAR-100, including:  
(1) Logit-based methods: KD, DKD, DIST, WTTM;  
(2) Relation-based methods: CC, RKD;  
(3) Feature-based methods: AT, FitNet, FT, SP, ITRM, CRD, and LSKD.

For ImageNet-1k, we provide KD, AT, DKD, LSKD, WSLD, and ReviewKD.

<p align="center">
<img src="./diagrams/cifar100_same_results.png" width="80%" height="80%" class="center">
</p>

<p align="center">
<img src="./diagrams/cifar100_different_results.png" width="80%" height="80%" class="center">
</p>

<p align="center">
<img src="./diagrams/imagenet_results.png" width="80%" height="80%" class="center">
</p>
