# TINA: Tiny Adapters for Vision Transformers Finetuning

The code is organized into two folders:
 - Vanilla Adapters: creates adapter baselines with different parameter budgets.
 - TINA: applies neuron selection.

Each folder has a README file describing how to run the experiments. Enjoy!

### Abstract:
Vision Transformers (ViTs) have become one of the dominant architectures in computer vision as a powerful alternative to convolutional neural networks (CNNs), and pretrained ViT models are commonly adapted to new tasks by fine-tuning their parameters. Recent work in NLP has proposed a variety of parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive storage cost of fine-tuning.

In this work, we start from the observation that adapters perform poorly when their dimension is small, and we propose a training algorithm that addresses this issue. Our approach starts from large adapters, which can be trained easily, and iteratively reduces the size of every adapter. We introduce a scoring function that can be used to compare neuron importance across layers and consequently allows automatic estimation of the hidden dimension of every adapter. Our method outperforms similar approaches in terms of the trade-off between accuracy and number of trained parameters across domain adaptation benchmarks. We release our code publicly to promote further applications of our approach.
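
To illustrate how comparing neuron scores across layers can yield a different hidden dimension for each adapter, here is a minimal sketch of a single pruning step. It is not the implementation in this repository: the function name `prune_globally`, the `keep_fraction` parameter, and the example scores are all hypothetical, and the scores are taken as given rather than computed with the scoring function from the paper.

```python
def prune_globally(scores_per_layer, keep_fraction):
    """Keep the highest-scoring adapter neurons across all layers.

    scores_per_layer: one list of importance scores per adapter layer
                      (hypothetical inputs; the scoring function is not shown).
    keep_fraction:    fraction of all neurons to keep, between 0 and 1.
    Returns one sorted list of kept neuron indices per layer.
    """
    # Flatten to (score, layer, neuron) so neurons compete across layers.
    ranked = sorted(
        (
            (score, layer, neuron)
            for layer, scores in enumerate(scores_per_layer)
            for neuron, score in enumerate(scores)
        ),
        reverse=True,
    )
    n_keep = max(1, int(keep_fraction * len(ranked)))

    kept = [[] for _ in scores_per_layer]
    for _, layer, neuron in ranked[:n_keep]:
        kept[layer].append(neuron)
    # A layer may end up with no neurons if all of its scores are low.
    return [sorted(indices) for indices in kept]


# Two adapters with three hidden neurons each; keep half of all neurons.
kept = prune_globally([[0.9, 0.1, 0.4], [0.8, 0.7, 0.05]], keep_fraction=0.5)
hidden_dims = [len(indices) for indices in kept]
print(kept)         # [[0], [0, 1]]
print(hidden_dims)  # [1, 2]
```

In an iterative scheme, a step like this would be repeated with training in between, so the per-layer hidden dimensions shrink gradually instead of being fixed by hand.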



### Environment:
Install the required packages from `environment.yaml` using Anaconda (for example, `conda env create -f environment.yaml`).

### Data preparation

|Dataset|Download Link|
|:-----|:-----|
|[ImageNet](https://www.image-net.org/)|[train](http://www.image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar),[val](http://www.image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar)|
|[CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)|[all](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz)|
|[CIFAR-100](https://www.cs.toronto.edu/~kriz/cifar.html)|[all](https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz)|
|[SVHN](http://ufldl.stanford.edu/housenumbers/)|[train](http://ufldl.stanford.edu/housenumbers/train_32x32.mat),[test](http://ufldl.stanford.edu/housenumbers/test_32x32.mat), [extra](http://ufldl.stanford.edu/housenumbers/extra_32x32.mat)|
|[Oxford-Flower102](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/)|[images](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz), [labels](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat), [splits](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/setid.mat)|
|[Clipart](http://ai.bu.edu/M3SDA/)|[images](http://csr.bu.edu/ftp/visda/2019/multi-source/groundtruth/clipart.zip), [train_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/clipart_train.txt), [test_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/clipart_test.txt)|
|[Infograph](http://ai.bu.edu/M3SDA/)|[images](http://csr.bu.edu/ftp/visda/2019/multi-source/infograph.zip), [train_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/infograph_train.txt), [test_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/infograph_test.txt)|
|[Painting](http://ai.bu.edu/M3SDA/)|[images](http://csr.bu.edu/ftp/visda/2019/multi-source/groundtruth/painting.zip), [train_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/painting_train.txt), [test_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/painting_test.txt)|
|[Quickdraw](http://ai.bu.edu/M3SDA/)|[images](http://csr.bu.edu/ftp/visda/2019/multi-source/quickdraw.zip), [train_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/quickdraw_train.txt), [test_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/quickdraw_test.txt)|
|[Real](http://ai.bu.edu/M3SDA/)|[images](http://csr.bu.edu/ftp/visda/2019/multi-source/real.zip), [train_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/real_train.txt), [test_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/real_test.txt)|
|[Sketch](http://ai.bu.edu/M3SDA/)|[images](http://csr.bu.edu/ftp/visda/2019/multi-source/sketch.zip), [train_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/sketch_train.txt), [test_list](http://csr.bu.edu/ftp/visda/2019/multi-source/domainnet/txt/sketch_test.txt)|

 - Download the datasets and pre-process some of them (i.e., ImageNet and [DomainNet](http://ai.bu.edu/M3SDA/)) using the code in the `scripts` folder.
 - The datasets should be prepared with the following structure (except CIFAR-10/100 and SVHN):

```
dataset_name
  |__train
  |    |__category1
  |    |    |__xxx.jpg
  |    |    |__...
  |    |__category2
  |    |    |__xxx.jpg
  |    |    |__...
  |    |__...
  |__val
       |__category1
       |    |__xxx.jpg
       |    |__...
       |__category2
       |    |__xxx.jpg
       |    |__...
       |__...
```
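
As a quick sanity check of the layout above, the following sketch indexes one split folder into `(image_path, label)` pairs using only the standard library. The helper name `index_split` and the set of accepted extensions are assumptions for illustration, not part of this repository. The same `split/category/image` layout is what torchvision's `ImageFolder` reads, although whether the training code here uses it is not stated.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the actual dataset files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_split(split_dir):
    """Index a folder laid out as split_dir/<category>/<image>.

    Returns (samples, class_to_idx), where samples is a list of
    (image_path, label) pairs and labels follow sorted category names.
    """
    split_dir = Path(split_dir)
    classes = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}

    samples = []
    for name in classes:
        for image_path in sorted((split_dir / name).iterdir()):
            # Skip anything that is not an image file.
            if image_path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((image_path, class_to_idx[name]))
    return samples, class_to_idx
```

For example, `index_split("dataset_name/train")` returns the training samples together with the category-to-label mapping; an empty `samples` list usually means the folder does not follow the expected structure.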


### Related Work:

 - [Swin-Transformer](https://github.com/microsoft/Swin-Transformer)
 - [CvT](https://github.com/microsoft/CvT)
 - [T2T-ViT](https://github.com/yitu-opensource/T2T-ViT)
 - [ViT](https://github.com/lucidrains/vit-pytorch)


### Acknowledgments 

This code is heavily based on [VTs-Drloc](https://github.com/yhlleo/VTs-Drloc) and [Swin-Transformer](https://github.com/microsoft/Swin-Transformer). Thanks to the contributors of both projects.
