
# GTA: Guided Transfer of Spatial Attention from Self-supervised Models

PyTorch implementation of GTA: Guided Transfer of Spatial Attention from Self-supervised Models 

* For code anonymization, we removed all author names, institution names, licenses, and URLs.  
* Once the process is finished, we will upload the non-anonymized version.  

# Abstract  
Recently, self-supervised learning has enabled the pre-training of vision transformers (ViT) using vast amounts of unlabeled data to obtain rich representations. Using well-trained representations in transfer learning can lead to better performance and faster convergence compared to training from scratch. However, even if such good representations are transferred, a model can easily overfit the limited training dataset and lose the characteristics of the transferred representations. This phenomenon is more severe in ViT, which has low inductive bias. Through experimental analysis using attention maps in ViT, we observe that the rich representations deteriorate when trained on a small dataset. Motivated by this finding, we propose a novel and simple regularization method for ViT called guided transfer of spatial attention (GTA). Our proposed method regularizes the self-attention maps between source and target models. Through this explicit regularization, a target model can fully exploit the knowledge related to object localization properties. Our experimental results show that the proposed GTA consistently improves accuracy across five benchmark datasets, especially when the amount of training data is small. As far as we know, there has been no previous study that improves transfer learning performance while specifically considering the ViT architecture.   

# Method  
![Method](./sub/method_figure.png)
 
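The abstract describes GTA as a regularizer on the self-attention maps of the source and target models, and the scripts below sweep a weight called lambda. A minimal sketch of that idea follows. It assumes a mean squared difference as the distance between attention maps and a simple weighted sum with the task loss; the README does not state either choice, and the function names are hypothetical. It uses plain Python lists so it runs without PyTorch.

```python
def attention_guidance_loss(source_attn, target_attn):
    """Mean squared difference between two attention maps.

    Both arguments are nested lists shaped [heads][queries][keys].
    ASSUMPTION: squared difference is an illustrative distance only;
    the distance used by GTA is not specified in this README.
    """
    total, count = 0.0, 0
    for src_head, tgt_head in zip(source_attn, target_attn):
        for src_row, tgt_row in zip(src_head, tgt_head):
            for s, t in zip(src_row, tgt_row):
                total += (s - t) ** 2
                count += 1
    return total / count


def total_loss(task_loss, source_attn, target_attn, lam):
    """Task loss plus the lambda-weighted attention guidance term (assumed form)."""
    return task_loss + lam * attention_guidance_loss(source_attn, target_attn)


# One head, two query tokens, two key tokens.
source = [[[0.9, 0.1], [0.2, 0.8]]]  # frozen pre-trained (source) model
target = [[[0.5, 0.5], [0.5, 0.5]]]  # model being fine-tuned (target)

print(attention_guidance_loss(source, target))
print(total_loss(1.0, source, target, lam=0.0))  # lam=0 reduces to the task loss
```

With `lam=0` the objective reduces to the plain task loss, which corresponds to the baseline runs; larger values pull the target attention maps toward those of the source model.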
# Installation  

```
conda create -n gta python=3.8
source activate gta
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
```

# Preparation
### Datasets  
We will upload the URLs for downloading the datasets later.   
Download the datasets into the `data` folder.   

| Dataset | Link |
| --- | --- |
| CUB200 | |
| StanfordDogs | |
| StanfordCars | |
| FGVC Aircraft | |
| OxfordIIITPets | |

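Before training, it can help to confirm that the datasets are where the code expects them. The sketch below is a hypothetical helper, not part of this repository. It assumes each dataset lives in a subfolder of `data` named after the dataset; the folder names the training code actually expects may differ.

```python
from pathlib import Path


def missing_datasets(data_root, names):
    """Return the dataset names that have no folder under data_root."""
    root = Path(data_root)
    return [name for name in names if not (root / name).is_dir()]


# ASSUMPTION: the folder name matches the dataset name passed to -d.
print(missing_datasets("data", ["StanfordCars"]))
```

An empty list means every listed folder was found.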
# How to run
1. Download the SSL weights (example with iBOT)  
We will upload the URLs for downloading the weights later.  
```
# 1.1) Download the weights and rename them
    # Example iBOT small
wget {iBOT pretrained weights url}
mv checkpoint_teacher.pth ibot_small.pth
```

2. Make bash files for the experiments  
For convenience, we provide a script that generates bash files for running the experiments.   
With the generated bash file, you can run the baseline experiments and the Guide experiments with different lambda values.  
To test other options such as l2sp, attn_only, or bss, add the corresponding flag (--l2sp, --attn_only, or --bss) to the baseline command.  
```
# 2) Make Bash  
    # bash _make_scripts.sh -d {dataset you want to train on} -g {GPU ids, e.g. 0,1} -p {master port for multi-node runs}
    # example
bash _make_scripts.sh -d StanfordCars -g 0,1 -p 20010
```
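To generate bash files for several datasets in one go, the command above can be built in a loop. The sketch below is a hypothetical helper that only prints the commands. It uses the `-d`, `-g`, and `-p` flags exactly as shown above; giving each dataset its own port (base port plus an offset) is an assumption intended to avoid clashes between concurrent runs, and any dataset name other than `StanfordCars` would need to be checked against the script.

```python
import shlex


def make_script_commands(datasets, gpus, base_port):
    """Build one `_make_scripts.sh` invocation per dataset."""
    commands = []
    for offset, dataset in enumerate(datasets):
        # ASSUMPTION: a distinct master port per dataset avoids port clashes.
        port = str(base_port + offset)
        commands.append(
            ["bash", "_make_scripts.sh", "-d", dataset, "-g", gpus, "-p", port]
        )
    return commands


for cmd in make_script_commands(["StanfordCars"], gpus="0,1", base_port=20010):
    print(shlex.join(cmd))
```

Each entry is a plain argument list, so it can be passed to `subprocess.run` once the printed commands look right.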

3. Run    
```
# 3) Run Experiments
    # run the bash files generated in step 2
    # example: ibot_small_~.sh
bash ibot_small_StanfordCars_1222.sh
```

4. Check the results  
```
# 4) Check the results with mlflow
    # open localhost on port 5000 in a browser
    # you can monitor the results while training is running
mlflow server
```

5. Visualization   
```
# 5) Visualize the attention heads
    # After training completes, the model is saved under a run-specific path.
    # Use that path to visualize the attention heads of the ViT.
python visualize_head.py --path {the specific path from mlflow}
```

# Reproduce
Following the example commands above, you can reproduce the StanfordCars baseline performance of 55.33% validation top-1 accuracy with seed 1222.  
The results presented in our paper can be reproduced with this code.
