# Knowledge Distillation for Vision Foundation Models

Implementation of "SiNGER: A Clearer Voice Distills Vision Transformers Further".

## Installation

Tested environment:

- Python 3.11.11
- PyTorch 2.6.0
- torchvision 0.21.0

Create and activate the conda environment:

```bash
conda env create -f environment.yaml
conda activate distill
```

## Usage

0. Set up Weights & Biases (wandb) logging

- Register at <https://wandb.ai/home>.
- To disable wandb logging, set `CFG.LOG.WANDB` to `False` in `mdistiller/engine/cfg.py`.

1. DDP setup

    ```bash
    # for instance, FitNet method.
    # get the number of active devices and set the number of OpenMP threads.
    export CUDA_DEVICE_COUNT=$(python -c "import torch; print(torch.cuda.device_count())")
    export OMP_NUM_THREADS=4
    ```
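
When launched with `torchrun`, each worker process receives its position in the job through the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. The sketch below shows one way a script might read them, falling back to single-process defaults when they are unset. `DistEnv` and `read_dist_env` are hypothetical names used for illustration only; they are not taken from `tools/train.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistEnv:
    """Hypothetical container for the values torchrun exports to each worker."""
    rank: int        # global process index across all nodes
    local_rank: int  # process index on this node (commonly used as the GPU index)
    world_size: int  # total number of processes in the job


def read_dist_env(env=None) -> DistEnv:
    """Read RANK, LOCAL_RANK and WORLD_SIZE, defaulting to a single process."""
    env = os.environ if env is None else env
    return DistEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


if __name__ == "__main__":
    # Under `torchrun --nproc-per-node=N`, each of the N workers prints its own values.
    print(read_dist_env())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without launching a distributed job.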

2. Training on ImageNet

- Download the dataset from <https://image-net.org/> and place it in `./data/imagenet`.

  ```bash
  torchrun --nproc-per-node=$CUDA_DEVICE_COUNT tools/train.py --cfg configs/imagenet/vit/amd-sner-v3.yaml ./configs/imagenet/optim/adamw.yaml
  ```

- Example experiment configs are in `configs/imagenet/vit`. To customize a config, modify `mdistiller/engine/cfg.py` and `mdistiller/distillers/....py`.

  ```yaml
  AMD:
    M_LAYERS: [17] # layer(s) used for distillation
    ALIGN_TYPE: 'mse'
    INPUT_SIZE: [224, 224]
    LOSS:
      FEAT_WEIGHT: 1.0
    SNER: # SiNGER parameters
      RANK: 64
      OUTLIER_Q: 0.97
      METHOD: 'sner'
  ```
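
To show how a nested override such as changing `AMD.SNER.RANK` can leave the remaining keys untouched, here is a minimal sketch using plain dictionaries. It assumes nothing about the repository's actual config loader: `merge_cfg` and `DEFAULTS` are hypothetical names, and the values are copied from the YAML example above.

```python
import copy

# Hypothetical defaults mirroring the YAML example (not the repository's real defaults).
DEFAULTS = {
    "AMD": {
        "M_LAYERS": [17],
        "ALIGN_TYPE": "mse",
        "INPUT_SIZE": [224, 224],
        "LOSS": {"FEAT_WEIGHT": 1.0},
        "SNER": {"RANK": 64, "OUTLIER_Q": 0.97, "METHOD": "sner"},
    }
}


def merge_cfg(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: descend instead of replacing the whole subtree.
            merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Override a single nested key; all other settings keep their default values.
custom = merge_cfg(DEFAULTS, {"AMD": {"SNER": {"RANK": 32}}})
print(custom["AMD"]["SNER"])
```

Because the merge works on a deep copy, `DEFAULTS` itself is not modified, so several experiment variants can be derived from the same base.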
