# Spatial Entropy Regularization for Vision Transformers
We provide the code used to run the experiments reported in the main paper and supplementary material.
Unless stated otherwise, all experiments were performed on 8 V100 GPUs.

Our spatial entropy loss is implemented in `attention_losses.py`.


### Training
Training can be launched with the following command (shown here for ImageNet-1k):

```
python -u -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --batch-size 128 \
    --epochs 300 \
    --data-set imagenet \
    --data-path <path_to_imagenet> \
    --no-repeated-aug \
    --use-blob \
    --use-nosn \
    --blob-weight 0.01 
```

To train on another dataset, change the `--data-set` and `--data-path` arguments.

### Evaluation
To evaluate a pretrained model, run:

```python main.py --eval --resume <path_of_pretrained_model> --data-path <path_to_imagenet>```

Add ```--use-nosn``` to remove the last skip connection and layer normalization.



### Fine-tuning
To fine-tune a pretrained model, run:

```
python -u -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --batch-size 64 \
    --epochs 100 \
    --lr 5e-4 \
    --no-pin-mem  \
    --warmup-epochs 3 \
    --data-set cifar10 \
    --no-repeated-aug \
    --resume <path_of_pretrained_model>
```

Add ```--use-blob --blob-weight 0.01``` to use the spatial entropy loss alongside the cross-entropy loss during fine-tuning.
Add ```--use-nosn``` to remove the last skip connection and layer normalization.
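
As a rough illustration of how the two loss terms might be combined, the sketch below assumes that `--blob-weight` acts as a scalar multiplier on the spatial entropy term, which is then added to the cross-entropy term. This is an assumption made for illustration only; the function `total_loss` and its argument names are hypothetical and do not come from this repository. See `attention_losses.py` and `main.py` for the actual implementation.

```python
def total_loss(cross_entropy: float, spatial_entropy: float,
               blob_weight: float = 0.01) -> float:
    """Hypothetical helper: weighted sum of the two loss terms.

    Assumes (not confirmed by the repository) that the objective is
    cross_entropy + blob_weight * spatial_entropy.
    """
    return cross_entropy + blob_weight * spatial_entropy


if __name__ == "__main__":
    # Made-up values, used only to show the call.
    print(total_loss(cross_entropy=2.0, spatial_entropy=1.0, blob_weight=0.01))
```

Under this assumption, setting the weight to zero recovers plain cross-entropy training, which matches the behavior described when `--use-blob` is omitted.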



### Segmentation on PASCAL-VOC
To evaluate the segmentation properties of the attention maps of a pretrained ViT on PASCAL-VOC 2012, run:

```python evaluate_segmentation.py --pretraining supervised --pretrained_weights <path_of_pretrained_model> --voc_path <path_pascal-voc_dataset>```


### Attention Maps Visualization
To visualize the attention map of the last transformer block of a pretrained ViT, run:

```python visualization.py --pretraining supervised --pretrained_weights <path_of_pretrained_model> --test-dir <path_of_test_images>```

### ImageNet-A and ImageNet-C
We refer the reader to the corresponding source code for evaluation on [ImageNet-A](https://github.com/hendrycks/natural-adv-examples) and [ImageNet-C](https://github.com/hendrycks/robustness).


## Acknowledgement
Our code borrows from [DeiT](https://github.com/facebookresearch/deit), [DINO](https://github.com/facebookresearch/dino), and [timm](https://github.com/rwightman/pytorch-image-models). We thank the authors for making their code publicly available.
