# Relaxed Attention for Swin Transformer with Dense Relative Localization Loss

This code implements `relaxed self-attention` for the `Swin Transformer` as an extension to the [Efficient Training of Visual Transformers with Small Datasets (Drloc) Repository](https://github.com/yhlleo/VTs-Drloc), which itself is based on the official [Swin Transformer Repository](https://github.com/microsoft/Swin-Transformer).

- For setting up an environment, we refer to the official ["get started" instructions](https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md).
- To configure relaxed self-attention, set the following parameter:

**Activate relaxed self-attention** by setting the relaxation coefficient per Swin block: \
--relax "0.1" "0.1" --relax "0.1" "0.1" "0.1" "0.1" "0.1" "0.1" --relax "0.1" "0.1" --relax "0.1" "0.1"
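
To illustrate what a relaxation coefficient does, here is a minimal sketch in plain Python. It assumes the formulation from the relaxed attention literature, in which the softmax attention weights are blended with a uniform distribution over the attended positions. The function names `softmax` and `relax_attention` are illustrative and are not taken from this repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relax_attention(weights, gamma):
    """Blend attention weights with a uniform distribution.

    Assumed formulation: (1 - gamma) * weights + gamma / T,
    where T is the number of attended positions.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    t = len(weights)
    return [(1.0 - gamma) * w + gamma / t for w in weights]


# Raw attention logits of one query over four positions (made-up values).
scores = [2.0, 0.5, -1.0, 0.0]
weights = softmax(scores)
relaxed = relax_attention(weights, gamma=0.1)

print("original:", [round(w, 4) for w in weights])
print("relaxed: ", [round(w, 4) for w in relaxed])
```

With `gamma=0` the weights are unchanged. Larger values pull every weight toward `1 / T`, while the relaxed weights still sum to one.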


- For experiments that use a pre-trained model, the pre-trained weights can be downloaded from the official Swin Transformer repository.
- Start an example training run from a pre-trained model: \
python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345  /main.py \\\
--cfg swin_tiny_patch4_window7_224.yaml \\\
--data-path data/cifar100 \\\
--batch-size 64 \\\
--output output/ \\\
--tag example1 \\\
--pretrained swin_tiny_patch4_window7_224.pth \\\
--opts TRAIN.ACCUMULATION_STEPS 2 TRAIN.EPOCHS 100 TRAIN.WARMUP_EPOCHS 20 TRAIN.BASE_LR 5e-4 DATA.DATASET 'cifar100' PRINT_FREQ '100'
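
As a rough check of what this command configures, the sketch below computes the effective batch size and the scaled learning rate. It assumes that this repository keeps the linear learning-rate scaling of the upstream Swin Transformer `main.py` (base rate multiplied by the global batch size over 512, then by the number of accumulation steps). The function name `effective_settings` is illustrative.

```python
def effective_settings(base_lr, batch_size, world_size, accumulation_steps):
    """Return (effective batch size, scaled learning rate).

    Assumption: linear scaling relative to a reference batch size of 512,
    as in the upstream Swin Transformer training script.
    """
    effective_batch = batch_size * world_size * accumulation_steps
    scaled_lr = base_lr * effective_batch / 512.0
    return effective_batch, scaled_lr


# Values from the example command: --batch-size 64, --nproc_per_node 1,
# TRAIN.ACCUMULATION_STEPS 2, TRAIN.BASE_LR 5e-4.
batch, lr = effective_settings(5e-4, 64, 1, 2)
print(batch, lr)
```

Under that assumption, the example trains with 128 samples per optimizer step and a learning rate of about 1.25e-4.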