# Code for the paper "Partially Relaxed Masks for Lightweight Knowledge Transfer without Forgetting in Continual Learning"

## Environment

The evaluation was conducted under the following environment.

* Python 3.9.2

* The libraries listed in `requirements.txt`

  

## Directories

* `approaches/`: Implementations of the proposed and baseline methods.

* `conf/`: Experiment configurations in YAML format, interpreted by Hydra.

* `data/`: All datasets should be placed or downloaded here. See the "Task Sequences" section below.

  

## Experiments

The following arguments are available. (`x`) means that `x` is the name of the argument.

* Approach (`appr`; required)

* Task sequences (`seq`; required)

* Seed (`seed`; required)

* Ablation (`ablation`)

* Device (`device`)

  

### Approach (required)

The following methods are available. (`x`) means that `x` can be passed as the value of `appr`, like `appr=x`.

* STL (`stl`): Single Task Learning.

* NCL (`ncl`): Naive Continual Learning.

* ACL (`acl`): The implementation of [Adversarial Continual Learning](https://arxiv.org/abs/2003.09553).

* PathNet (`pathnet`): The implementation of [PathNet: Evolution Channels Gradient Descent in Super Neural Networks](https://arxiv.org/abs/1701.08734).

* SupSup (`supsup`): The implementation of [Supermasks in SuperPosition](https://mitchellnw.github.io/blog/2020/supsup/).

* HAT (`hat`): The implementation of [Overcoming Catastrophic Forgetting with Hard Attention to the Task](http://proceedings.mlr.press/v80/serra18a.html).

* EHAT (`hatewc`): The implementation of EWC, [Overcoming Catastrophic Forgetting in Neural Networks](https://arxiv.org/abs/1612.00796), in the HAT package.

* CAT (`cat`): The implementation of [Continual Learning of a Mixed Sequence of Similar and Dissimilar Tasks](https://proceedings.neurips.cc/paper/2020/file/d7488039246a405baf6a7cbc3613a56f-Paper.pdf).

* __PRM__ (`prm`): The proposed approach.

* PRM w/o 2SO (`prmwo2so`): Another proposed approach.

  

### Task Sequences (required)

The following task sequences are available. (`x`) means that `x` can be passed as the value of `seq`, like `seq=x`.

* (#1) EMNIST-10T (`emnistsmall_10`)

  * You do not have to do anything; the program downloads the data.

* (#2) CIFAR100-10T (`cifar100_10`)

  * You do not have to do anything; the program downloads the data.

* (#3) F-EMNIST-10T (`femnistsmall_10`)

  * Follow [the instructions](https://github.com/TalwalkarLab/leaf/tree/master/data/femnist) to place the raw images under `data/femnist/raw/train/` and `data/femnist/raw/test/`.

* (#4) F-CelebA-10T (`fceleba_10`)

  * Follow [the instructions](https://github.com/TalwalkarLab/leaf/tree/master/data/celeba) to place the raw images under `data/fceleba/raw/img_align_celeba/`.

* (#5) EMNIST-10T & F-EMNIST-10T (`emnistsmall_10__femnistsmall_10`)

  * See #1 and #3.

* (#6) CIFAR100-10T & F-CelebA-10T (`cifar100_10__fceleba_10`)

  * See #2 and #4.
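
Sequences #3 to #6 depend on manually placed raw images, so it can help to check the directory layout before launching a run. The following is a minimal sketch and not part of the repository: `REQUIRED_DIRS` and `missing_dirs` are hypothetical names, and the paths are simply copied from the list above.

```python
from pathlib import Path

# Raw-data directories that must exist before a run, keyed by `seq` value.
# Paths are copied from the list above; sequences whose data is downloaded
# automatically (EMNIST, CIFAR100) need nothing.
FEMNIST = ["data/femnist/raw/train", "data/femnist/raw/test"]
FCELEBA = ["data/fceleba/raw/img_align_celeba"]
REQUIRED_DIRS = {
    "emnistsmall_10": [],
    "cifar100_10": [],
    "femnistsmall_10": FEMNIST,
    "fceleba_10": FCELEBA,
    "emnistsmall_10__femnistsmall_10": FEMNIST,
    "cifar100_10__fceleba_10": FCELEBA,
}


def missing_dirs(seq, root="."):
    """Return the required directories for `seq` that do not exist under `root`."""
    return [d for d in REQUIRED_DIRS[seq] if not (Path(root) / d).is_dir()]


if __name__ == "__main__":
    # Report the status of every sequence relative to the current directory.
    for seq in REQUIRED_DIRS:
        missing = missing_dirs(seq)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{seq}: {status}")
```

Run from the repository root, this prints one status line per sequence; an empty result from `missing_dirs` means the documented directories are in place, not that their contents are valid.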

    

### Seed (required)

This selects the seed for the datasets. Currently, only the following options are available.

* `seed_1` (for hyperparameter search)
* `seed_2`
* `seed_3`

When `seed_1` is given, the program searches for hyperparameters using Optuna. With the other seeds, the program loads the hyperparameter values specified in `conf/appr/${appr}.yaml` and runs once, without hyperparameter search.
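
The seed-dependent behavior described above can be summarized as a small dispatch. This is only an illustrative sketch: `plan_run` is a hypothetical helper and does not reflect how `main.py` is actually organized. Only the `seed_1` rule and the `conf/appr/${appr}.yaml` path come from the text.

```python
SEARCH_SEED = "seed_1"
FIXED_SEEDS = ("seed_2", "seed_3")


def plan_run(appr, seed):
    """Describe what a run with the given approach and seed would do."""
    if seed == SEARCH_SEED:
        # seed_1: hyperparameters are searched with Optuna.
        return {"mode": "search", "config": None}
    if seed in FIXED_SEEDS:
        # Other seeds: a single run using the values stored for this approach.
        return {"mode": "single_run", "config": f"conf/appr/{appr}.yaml"}
    raise ValueError(f"unknown seed: {seed!r}")


print(plan_run("prm", "seed_1"))
print(plan_run("hatewc", "seed_2"))
```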



### Ablation (optional)

The following ablation options are available only for "PRM" (`prm`) and "PRM w/o 2SO" (`prmwo2so`); they are ignored for other approaches.

(`x`) means that `x` can be passed as the value of `ablation`, like `ablation=allsimilar`.

* S (`allsimilar`): corresponds to "PRM(S)" in the paper.
* D (`alldissimilar`): corresponds to "PRM(D)".
* T (`typegiven`): corresponds to "PRM(T)".
* None (default): corresponds to pure "PRM".



### Device (optional)

If `device` is not given, it defaults to `cuda:0` if CUDA is available, and to `cpu` otherwise.

To use a specific GPU, pass it explicitly, like `device=cuda:1`.
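
The fallback rule above amounts to the following sketch. `resolve_device` is a hypothetical name, not a function from this repository. Assuming a PyTorch-based program, `cuda_available` would typically come from `torch.cuda.is_available()`; it is a plain parameter here so that the sketch runs without PyTorch or a GPU.

```python
def resolve_device(device=None, cuda_available=False):
    """Return the explicit device if one is given, else the documented default."""
    if device:
        # An explicit choice such as "cuda:1" always wins.
        return device
    # No device given: prefer the first GPU, otherwise fall back to the CPU.
    return "cuda:0" if cuda_available else "cpu"


print(resolve_device(None, cuda_available=True))
print(resolve_device(None, cuda_available=False))
print(resolve_device("cuda:1", cuda_available=True))
```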



### Examples

To run PRM on sequence #6 with `seed_1` (hyperparameter search):

`$ python3 main.py appr=prm seq=cifar100_10__fceleba_10 seed=seed_1`

For an ablation study, the `ablation` option can be used like:

`$ python3 main.py appr=prm seq=emnistsmall_10 seed=seed_1 ablation=allsimilar`

The baselines can be run in the same way:

`$ python3 main.py appr=cat seq=fceleba_10 seed=seed_1 device=cuda:1`

`$ python3 main.py appr=hatewc seq=cifar100_10 seed=seed_2` (no hyperparameter search, since the seed is not `seed_1`)
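
When several approaches, sequences, and seeds have to be run, the command lines can be generated rather than typed by hand. The sketch below is an assumption about a convenient workflow, not a script shipped with this repository: `build_commands` is a hypothetical helper that only assembles strings in the `key=value` form used in the examples above.

```python
from itertools import product


def build_commands(apprs, seqs, seeds, device=None):
    """Build one `main.py` command line per (approach, sequence, seed) combination."""
    commands = []
    for appr, seq, seed in product(apprs, seqs, seeds):
        parts = ["python3", "main.py", f"appr={appr}", f"seq={seq}", f"seed={seed}"]
        if device is not None:
            # The device argument is optional, so it is added only on request.
            parts.append(f"device={device}")
        commands.append(" ".join(parts))
    return commands


for cmd in build_commands(["prm", "hat"], ["cifar100_10"], ["seed_2", "seed_3"]):
    print(cmd)
```

The printed lines can be pasted into a shell or a job script; the helper does not check that the argument values are valid.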