# Transforming Transformers for Resilient Lifelong Learning
## PyTorch Implementation

# Data Preparation
## VDD
Download the VDD data (```decathlon-1.0-devkit.tar.gz```, ```decathlon-1.0-data.tar.gz```, and ```decathlon-1.0-data-imagenet.tar```) by following the instructions at [https://www.robots.ox.ac.uk/~vgg/decathlon/](https://www.robots.ox.ac.uk/~vgg/decathlon/). Place the archives in a directory of your choice, which we refer to as ```<DATA_DIR>``` (e.g. ```~/data/vdd/```). Then run the preparation script, which converts the data into the format the data loaders expect:
```sh
./scripts/vdd/prepare.sh <DATA_DIR>
```
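Before running the preparation script, you can confirm that all three archives are in place. The sketch below is a hypothetical helper (```check_vdd_archives``` is not part of this repository). It only checks for the three file names listed above and does not validate their contents.

```sh
# Hypothetical helper (not part of the repository): report any of the three
# VDD archives that are missing from the given directory.
check_vdd_archives() {
  dir="$1"
  missing=0
  for f in decathlon-1.0-devkit.tar.gz decathlon-1.0-data.tar.gz decathlon-1.0-data-imagenet.tar; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Non-zero exit status if at least one archive is absent
  return "$missing"
}
```

For example, ```check_vdd_archives ~/data/vdd && ./scripts/vdd/prepare.sh ~/data/vdd``` runs the preparation script only when nothing is missing.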

## 5-Datasets
- CIFAR10:
  Download [https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz](https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz) and
  extract it to ```<DATA_DIR>``` (e.g. ```~/data/5-datasets```)
- SVHN:
  Download the train data from [http://ufldl.stanford.edu/housenumbers/train_32x32.mat](http://ufldl.stanford.edu/housenumbers/train_32x32.mat) and the test data from [http://ufldl.stanford.edu/housenumbers/test_32x32.mat](http://ufldl.stanford.edu/housenumbers/test_32x32.mat)
  ```sh
  mkdir <DATA_DIR>/svhn
  mv train_32x32.mat <DATA_DIR>/svhn
  mv test_32x32.mat <DATA_DIR>/svhn
  ```
- MNIST:
  Download [https://data.deepai.org/mnist.zip](https://data.deepai.org/mnist.zip), unzip it, and decompress all the gzip archives it contains:
  ```sh
  mv mnist <DATA_DIR>/
  ```
- not-MNIST:
  Download ```notMNIST_small.tar.gz``` from [http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html)
  ```sh
  tar -xvzf notMNIST_small.tar.gz
  mv notMNIST_small <DATA_DIR>/not-mnist
  ```
  The following files in ```notMNIST_small``` are known to be corrupt: ```A/RGVtb2NyYXRpY2FCb2xkT2xkc3R5bGUgQm9sZC50dGY=.png```, ```F/Q3Jvc3NvdmVyIEJvbGRPYmxpcXVlLnR0Zg==.png```

- Fashion MNIST:
  Download the four gzip archives from [https://github.com/zalandoresearch/fashion-mnist](https://github.com/zalandoresearch/fashion-mnist), place them in ```<DATA_DIR>/fashion-mnist```, and decompress them:
  ```sh
  cd <DATA_DIR>/fashion-mnist
  # These are plain gzip files, not tar archives, so use gunzip
  gunzip t10k-images-idx3-ubyte.gz
  gunzip t10k-labels-idx1-ubyte.gz
  gunzip train-images-idx3-ubyte.gz
  gunzip train-labels-idx1-ubyte.gz
  ```
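A quick sanity check of ```<DATA_DIR>``` after the steps above can catch a missed step. The following minimal sketch defines two hypothetical helpers, neither of which ships with this repository. ```decompress_gz_in``` decompresses every ```.gz``` file under a directory, which covers the MNIST step. ```check_five_datasets``` reports missing paths. The expected paths are an assumption derived from the instructions above (the CIFAR-10 archive extracts to a ```cifar-10-batches-py``` folder), and the data loaders may require additional files.

```sh
# Hypothetical helpers for finishing the 5-Datasets setup; not part of the repository.

# Decompress every .gz file under a directory in place (e.g. <DATA_DIR>/mnist).
decompress_gz_in() {
  find "$1" -type f -name '*.gz' -exec gunzip -f {} +
}

# Report any expected 5-Datasets path that is missing under <DATA_DIR>.
# The list mirrors the steps above; the data loaders may expect more.
check_five_datasets() {
  root="$1"
  status=0
  for p in cifar-10-batches-py svhn/train_32x32.mat svhn/test_32x32.mat mnist not-mnist \
           fashion-mnist/train-images-idx3-ubyte fashion-mnist/train-labels-idx1-ubyte \
           fashion-mnist/t10k-images-idx3-ubyte fashion-mnist/t10k-labels-idx1-ubyte; do
    if [ ! -e "$root/$p" ]; then
      echo "missing: $root/$p" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example: ```decompress_gz_in ~/data/5-datasets/mnist && check_five_datasets ~/data/5-datasets```.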

> In all the following experiments, ```<ARTIHIPPO_COMPONENT>``` can take values from the following list:
> ```attn_proj```, ```value```, ```query```, ```key```, ```ffn```

To run the Exploration-Exploitation experiments, first compute the mean class tokens for ```imagenet12```:
```sh
./scripts/vdd/evaluate.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE>
```
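Most of the scripts below take ```<ARTIHIPPO_COMPONENT>``` as their third argument, so a typo there is easy to miss. The following is a minimal guard. It assumes that only the five values listed in the note above are valid, and ```is_artihippo_component``` is a hypothetical helper, not part of the repository.

```sh
# Hypothetical helper: succeed only if $1 is one of the documented ArtiHippo components.
is_artihippo_component() {
  case "$1" in
    attn_proj|value|query|key|ffn) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, ```is_artihippo_component ffn && ./scripts/vdd/evaluate.sh <DATA_ROOT> vit_base_patch8_224 ffn <CUDA_VISIBLE_DEVICE>``` skips the run if the component name is misspelled.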

# Training on the VDD dataset
All the experiments on the VDD Benchmark have been run with 3 seeds: ```42```, ```4242```, ```424242```
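To cover all three seeds for one method, you can generate the commands in a loop. The sketch below is a hypothetical helper that only prints the Exploration-Exploitation commands and does not run them. The data root, component, GPU, and epoch count are whatever you pass in.

```sh
# Hypothetical helper: print (without running) one Exploration-Exploitation
# command per seed used in the VDD experiments.
# Arguments: <DATA_ROOT> <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <EPOCHS>
vdd_seed_commands() {
  data_root="$1"; component="$2"; gpu="$3"; epochs="$4"
  for seed in 42 4242 424242; do
    echo ./scripts/vdd/run.sh "$data_root" vit_base_patch8_224 "$component" "$gpu" "$seed" "$epochs"
  done
}
```

For example, ```vdd_seed_commands ~/data/vdd ffn 0 150``` prints three commands, and piping the output to ```sh``` runs them one after another.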
### Exploration-Exploitation
```sh
./scripts/vdd/run.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEED> <EPOCHS>
```

### Exploration
```sh
./scripts/vdd/run_exp.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEED> <EPOCHS>
```

### Exploration-Exploitation with Task Tokens
```sh
./scripts/vdd/run_w_prompt.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEED> <EPOCHS>
```

### Exploration-Exploitation in Task-to-Task setting
```sh
./scripts/vdd/run_t2t.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEED> <EPOCHS>
```

### S-Prompts
```sh
./scripts/vdd/run_prompts.sh <DATA_ROOT> vit_base_patch8_224 <METHOD> <CUDA_VISIBLE_DEVICE> <SEED> s-prompts
```

### Learn to Prompt
```sh
./scripts/vdd/run_prompts.sh <DATA_ROOT> vit_base_patch8_224 <METHOD> <CUDA_VISIBLE_DEVICE> <SEED> l2p
```

### Learn to Grow with DARTS
```sh
./scripts/vdd/run_darts.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEED> <EPOCHS>
```

### Learn to Grow with Beta-DARTS
```sh
./scripts/vdd/run_beta-darts.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEED> <EPOCHS>
```

### SupSup
```sh
./scripts/vdd/train_supsup.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEED>
```

### Efficient Feature Transformation (EFT)
```sh
./scripts/vdd/train_scale_shift.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEED> --scale-artihippo
```

### Lightweight Learner (LL)
```sh
./scripts/vdd/train_scale_shift.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEED> --shift-artihippo
```

### EWC
```sh
./scripts/vdd/train_ewc.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE>
```

### L2 Parameter Regularization
```sh
./scripts/vdd/train_l2.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE>
```

# Training on the 5-Datasets benchmark
All the experiments on the 5-Datasets benchmark were run with seed ```42``` on each of the following 5 task orders:
- fashion-mnist, svhn, cifar10, mnist, not-mnist
- cifar10, svhn, mnist, not-mnist, fashion-mnist
- mnist, cifar10, svhn, fashion-mnist, not-mnist
- svhn, fashion-mnist, mnist, not-mnist, cifar10
- not-mnist, cifar10, svhn, fashion-mnist, mnist
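For bookkeeping, such as naming output folders, you can keep the five orders in a small lookup function. This is a hypothetical sketch. The 1-based numbering is chosen here for illustration only, and this README does not state how ```<SEQ>``` indexes the orders.

```sh
# Hypothetical lookup table for the five 5-Datasets task orders listed above.
# The 1-based numbering is an illustrative assumption, not the meaning of <SEQ>.
task_order() {
  case "$1" in
    1) echo "fashion-mnist svhn cifar10 mnist not-mnist" ;;
    2) echo "cifar10 svhn mnist not-mnist fashion-mnist" ;;
    3) echo "mnist cifar10 svhn fashion-mnist not-mnist" ;;
    4) echo "svhn fashion-mnist mnist not-mnist cifar10" ;;
    5) echo "not-mnist cifar10 svhn fashion-mnist mnist" ;;
    *) echo "unknown task order: $1" >&2; return 1 ;;
  esac
}
```

For example, ```for task in $(task_order 2); do echo "$task"; done``` prints the tasks of the second order, one per line.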

### Exploration
```sh
./scripts/5-datasets/run.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEQ> <IMNET_PATH>
```

### Learn to Grow-DARTS
```sh
./scripts/5-datasets/run_darts.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEQ> <IMNET_PATH>
```

### Learn to Grow-Beta-DARTS
```sh
./scripts/5-datasets/run_beta-darts.sh <DATA_ROOT> vit_base_patch8_224 <ARTIHIPPO_COMPONENT> <CUDA_VISIBLE_DEVICE> <SEQ> <IMNET_PATH>
```

### S-Prompts
```sh
./run.sh <DATA_ROOT> vit_base_patch8_224 s-prompts <CUDA_VISIBLE_DEVICE> <SEQ> <LEN> <IMNET_PATH>
```

### Learn to Prompt
```sh
./run.sh <DATA_ROOT> vit_base_patch8_224 l2p <CUDA_VISIBLE_DEVICE> <SEQ> <LEN> <IMNET_PATH>
```

For all the experiments on the 5-Datasets benchmark, ```<IMNET_PATH>``` is the ```<DATA_ROOT>``` of the VDD benchmark.

# Training Logs
All the training logs for the VDD and 5-Datasets benchmarks are available in ```artifacts/<BENCHMARK>/<EXPERIMENT>/logs```. The folder names encode the experiment settings. For example, ```iclr-42-ee-w-prompt-150``` denotes the Exploration-Exploitation method with a prompt of length 1, with the supernet trained for 150 epochs using random seed 42.
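Only one example is given above, so the following is an assumption: if other folder names follow the same ```iclr-<SEED>-<METHOD>-<EPOCHS>``` pattern, you can split the fields with POSIX parameter expansion. ```parse_log_name``` is a hypothetical helper, not part of the repository.

```sh
# Hypothetical helper: split a log folder name, assuming the pattern
# iclr-<SEED>-<METHOD>-<EPOCHS>, where <METHOD> may itself contain hyphens.
parse_log_name() {
  name="$1"
  rest="${name#iclr-}"     # drop the "iclr-" prefix, e.g. 42-ee-w-prompt-150
  seed="${rest%%-*}"       # text before the first hyphen, e.g. 42
  rest="${rest#*-}"        # drop the seed, e.g. ee-w-prompt-150
  epochs="${rest##*-}"     # text after the last hyphen, e.g. 150
  method="${rest%-*}"      # everything before the last hyphen, e.g. ee-w-prompt
  echo "seed=$seed method=$method epochs=$epochs"
}
```

For example, ```parse_log_name iclr-42-ee-w-prompt-150``` prints ```seed=42 method=ee-w-prompt epochs=150```.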
