# Soft-masking of Parameter-level Gradient flow (SPG)

## Datasets Preparation

The paper uses five datasets. To reproduce the results, prepare each of them as described below.

### CIFAR100-based (C-10 and C-20)

You do not need to do anything for these datasets, as the code downloads them automatically via the `torchvision` package.

### TinyImageNet-based (T-10 and T-20)

You can download the datasets from [the official site](https://image-net.org/).

1. Download the Tiny ImageNet file.
2. Extract the file and place its contents as follows.

<pre>
data/tiny-imagenet-200/
|- train/
|  |- n01443537/
|  |- n01629819/
|  +- ...
+- val/
   |- val_annotations.txt
   +- images/
      |- val_0.JPEG
      |- val_1.JPEG
      +- ...
</pre>

3. Run `prep_tinyimagenet.py` to reorganise files so that `torchvision.datasets.ImageFolder` can read them.
4. Make sure the directory structure looks as follows.

<pre>
data/tiny-imagenet-200/
|- test/
|  |- n01443537/
|  |- n01629819/
|  +- ...
|- train/
|  |- n01443537/
|  |- n01629819/
|  +- ...
+- val/
   +- # These files are not used any more. 
</pre>
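
The reorganisation in step 3 can be pictured with the sketch below. It is not the repository's `prep_tinyimagenet.py`, whose exact behaviour is not shown here. It assumes that each line of `val_annotations.txt` is tab-separated, with the image file name first and the class ID (e.g. `n01443537`) second, and that images are copied rather than moved. The function name `reorganise_val` is hypothetical.

```python
import shutil
from pathlib import Path


def reorganise_val(root: str) -> int:
    """Copy val images into test/<class_id>/ so ImageFolder can read them.

    Hypothetical helper, not the repository's prep_tinyimagenet.py.
    Returns the number of images copied.
    """
    root_dir = Path(root)
    val_dir = root_dir / "val"
    test_dir = root_dir / "test"
    copied = 0
    with open(val_dir / "val_annotations.txt") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # Assumed column order: file name, class ID, then bounding box.
            file_name, class_id = fields[0], fields[1]
            class_dir = test_dir / class_id
            class_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(val_dir / "images" / file_name, class_dir / file_name)
            copied += 1
    return copied
```

Calling `reorganise_val("data/tiny-imagenet-200")` under these assumptions would produce one sub-directory per class under `test/`, which is the layout `torchvision.datasets.ImageFolder` expects.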

### ImageNet-based (I-100)

You can download the datasets from [the official site](https://image-net.org/).

1. Download the downsampled image data (32x32).
2. Extract the files, and place the extracted files under `./data/imagenet/`.
3. Make sure the directory structure looks as follows.

<pre>
data/imagenet/
|- test/
|  +- val_data
+- train/
   |- train_data_batch_1
   |- ...
   +- train_data_batch_10
</pre>
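
To check that a batch file was extracted correctly, a quick load like the one below may help. This is a sketch under stated assumptions, not the repository's loader: it assumes each batch file is a Python pickle holding a dict with a `data` entry (one flattened 32x32x3 image per row) and a `labels` entry with 1-based class IDs. The function name `load_batch` is hypothetical. The real files are expected to hold NumPy arrays, so NumPy would need to be installed to unpickle them.

```python
import pickle


def load_batch(path):
    """Load one downsampled-ImageNet batch file (assumed pickle format).

    Hypothetical helper. Returns (data, labels) with labels shifted to 0-based.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f)
    data = batch["data"]  # assumed: one row of 3072 values per image
    labels = [label - 1 for label in batch["labels"]]  # assumed 1-based input
    return data, labels
```

For example, `load_batch("data/imagenet/train/train_data_batch_1")` should return as many labels as there are rows in `data` if the assumed format holds.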

### Federated CelebA-based (FC-10 and FC-20)

1. Follow [the instructions](https://github.com/TalwalkarLab/leaf/tree/master/data/celeba) to create the data.
2. Place the raw images under `data/fceleba/raw/img_align_celeba/`.
3. Make sure the directory structure looks as follows.

<pre>
data/fceleba/
|- iid/
|  |- test/
|  |  +- all_data_iid_01_0_keep_5_test_9.json
|  +- train/ 
|     +- all_data_iid_01_0_keep_5_train_9.json
+- raw/
   +- img_align_celeba/
      |- 000001.jpg
      |- 000002.jpg
      +- ...
</pre>

### Federated EMNIST-based (FE-10 and FE-20)

1. Follow [the instructions](https://github.com/TalwalkarLab/leaf/tree/master/data/femnist) to create the data.
2. Place the raw images under `data/femnist/raw/train/` and `data/femnist/raw/test/`.
3. Make sure the directory structure looks as follows.

<pre>
data/femnist/
+- raw/
   |- test
   |  |- all_data_0_iid_01_0_keep_0_test_9.json
   |  |- ...
   |  +- all_data_34_iid_01_0_keep_0_test_9.json
   +- train
      |- all_data_0_iid_01_0_keep_0_train_9.json
      |- ...
      +- all_data_34_iid_01_0_keep_0_train_9.json
</pre>
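
The JSON files generated by LEAF for the federated datasets can be sanity-checked with a short script. The sketch below assumes the usual LEAF layout, in which each file has a `users` list and a `user_data` mapping from user ID to a dict with `x` and `y` lists. The function name `samples_per_user` is hypothetical and is not part of this repository.

```python
import json


def samples_per_user(path):
    """Return {user_id: number_of_samples} for one LEAF JSON file.

    Hypothetical helper; assumes the keys "users" and "user_data".
    """
    with open(path) as f:
        blob = json.load(f)
    return {user: len(blob["user_data"][user]["y"]) for user in blob["users"]}
```

An empty result, or a `KeyError`, would suggest the file was not produced with the expected LEAF settings.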

## Experiments

Experiments can be run with:

```shell
python3 main.py appr=<appr> seq=<seq>
```

where `<appr>` selects the approach and `<seq>` selects the dataset.

For `<appr>`, select one of the following approaches.

- `mtl` for MTL (Multi-Task Learning)
- `one` for ONE (One Task Learning)
- `ncl` for NCL (Naive Continual Learning)
- `agem` for A-GEM from [Efficient Lifelong Learning with A-GEM](https://arxiv.org/abs/1812.00420)
- `pgn` for PGN from [Progressive Neural Networks](https://arxiv.org/abs/1606.04671)
- `pathnet` for PathNet from [PathNet: Evolution Channels Gradient Descent in Super Neural Networks](https://arxiv.org/abs/1701.08734)
- `hat` for HAT from [Overcoming catastrophic forgetting with hard attention to the task](https://arxiv.org/abs/1801.01423)
- `cat` for CAT from [Continual Learning of a Mixed Sequence of Similar and Dissimilar Tasks](https://arxiv.org/abs/2112.10017)
- `supsup` for SupSup from [Supermasks in Superposition](https://arxiv.org/abs/2006.14769)
- `ucl` for UCL from [Uncertainty-based Continual Learning with Adaptive Regularization](https://arxiv.org/abs/1905.11614)
- `si` for SI from [Continual Learning Through Synaptic Intelligence](https://arxiv.org/abs/1703.04200)
- `tag` for TAG from [TAG: Task-based Accumulated Gradients for Lifelong learning](https://arxiv.org/abs/2105.05155)
- `ewc` for EWC from [Overcoming catastrophic forgetting in neural networks](https://arxiv.org/abs/1612.00796)
- `ewcgi` for EWC-GI (regularization using gradient-based importance)
- `spgfi` for SPG-FI (soft-masking using the Fisher information-based importance)
- `spg` for the proposed SPG (*S*oft-masking of *P*arameter-level *G*radient flow)

For `<seq>`, select one of the following datasets.

- `cifar100_10` for C-10 (CIFAR100 with 10 tasks)
- `cifar100_20` for C-20
- `tinyimagenet_10` for T-10 (TinyImageNet with 10 tasks)
- `tinyimagenet_20` for T-20
- `imagenet_100` for I-100 (ImageNet with 100 tasks)
- `fceleba_10` for FC-10 (Federated CelebA with 10 tasks)
- `fceleba_20` for FC-20
- `femnist_10` for FE-10 (Federated EMNIST with 10 tasks)
- `femnist_20` for FE-20
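
To run several approach and dataset combinations in a row, the command above can be generated programmatically. The following is a minimal sketch, not a script shipped with the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes `main.py` is in the current working directory.

```python
import itertools
import subprocess


def build_commands(approaches, sequences):
    """Return one `python3 main.py appr=... seq=...` command per combination."""
    return [
        ["python3", "main.py", f"appr={appr}", f"seq={seq}"]
        for appr, seq in itertools.product(approaches, sequences)
    ]


def run_all(approaches, sequences):
    """Run every combination sequentially, stopping at the first failure."""
    for cmd in build_commands(approaches, sequences):
        subprocess.run(cmd, check=True)
```

For instance, `run_all(["ewc", "spg"], ["cifar100_10", "cifar100_20"])` would launch four runs one after another.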

## Other available options

Besides `<appr>` and `<seq>`, the following options are available.

- `seed_pt` (choices: `psearch`, `random`; default: `random`)
    - `psearch` is used for hyper-parameter search.
    - `random` is used for the final experiments, which run over different seeds with the best hyper-parameters found.
- `n_trials` (default: `5` if `seed_pt=random`, `20` if `seed_pt=psearch`)
    - The number of trials.
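
How these options combine with the base command can be sketched as follows. This is an illustration only: it assumes the extra options use the same `key=value` syntax as `appr` and `seq`, and the names `effective_n_trials` and `build_command` are hypothetical rather than taken from the repository.

```python
# Defaults for n_trials as documented above, keyed by seed_pt.
N_TRIALS_DEFAULT = {"random": 5, "psearch": 20}


def effective_n_trials(seed_pt="random", n_trials=None):
    """Return the number of trials that applies for the given options."""
    if seed_pt not in N_TRIALS_DEFAULT:
        raise ValueError(f"seed_pt must be one of {sorted(N_TRIALS_DEFAULT)}")
    return N_TRIALS_DEFAULT[seed_pt] if n_trials is None else n_trials


def build_command(appr, seq, seed_pt="random", n_trials=None):
    """Build the command line, assuming key=value syntax for all options."""
    effective_n_trials(seed_pt, n_trials)  # validates seed_pt
    cmd = ["python3", "main.py", f"appr={appr}", f"seq={seq}", f"seed_pt={seed_pt}"]
    if n_trials is not None:
        cmd.append(f"n_trials={n_trials}")
    return cmd
```

A typical workflow under these assumptions would first build a command with `seed_pt="psearch"` to search hyper-parameters, then one with `seed_pt="random"` for the final runs.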