# Mammoth - An Extendible (General) Continual Learning Framework for Pytorch
We use the Mammoth framework to run all experiments.

## Setup

+ Use `./utils/main.py` to run experiments.
+ Use `--noise` to select the type of label noise to inject (`sym` or `asym`).
+ Use `--noise_rate` to set the fraction of noisy labels (in `[0,1]`).
+ Use `--buffer_size` to set the buffer size (for methods that use a buffer). See the supplementary material for the size used with each dataset.
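As a rough illustration of what `--noise` and `--noise_rate` control, the sketch below corrupts a `noise_rate` fraction of the labels. It assumes common definitions of the two noise types: `sym` flips a label to a uniformly random *other* class, while `asym` flips it to a fixed "similar" class (here simply the next class index, a stand-in for a dataset-specific mapping). `inject_label_noise` is a hypothetical helper, not the framework's data loader, whose exact procedure may differ.

```python
import random


def inject_label_noise(labels, num_classes, noise="sym", noise_rate=0.4, seed=0):
    """Return a noisy copy of `labels` (ints in [0, num_classes)).

    Hypothetical helper illustrating common sym/asym definitions;
    not necessarily the procedure used by the framework.
    """
    if noise not in ("sym", "asym"):
        raise ValueError(f"unknown noise type: {noise}")
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to flip labels")
    rng = random.Random(seed)
    noisy = list(labels)
    # Corrupt round(noise_rate * N) randomly chosen samples.
    n_noisy = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_noisy):
        y = labels[i]
        if noise == "sym":
            # Symmetric: uniformly random class different from the true one.
            noisy[i] = rng.choice([c for c in range(num_classes) if c != y])
        else:
            # Asymmetric: fixed mapping; "next class" stands in for a
            # dataset-specific pairing of similar classes (assumption).
            noisy[i] = (y + 1) % num_classes
    return noisy


labels = [i % 10 for i in range(1000)]
noisy = inject_label_noise(labels, 10, noise="sym", noise_rate=0.4)
print(sum(a != b for a, b in zip(labels, noisy)))
```

With these definitions every selected sample really changes class, so the number of differing labels equals `round(noise_rate * N)`.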

### Models

Select the model with `--model=<model_name>`, where `<model_name>` is one of:
+ `ours`: Our proposal, with AER and ABS combined. Additional arguments (for ablations):
    - `--aer` (default `1`): enable or disable AER
    - `--abs` (default `1`): enable or disable ABS
+ `puridiver`: Supports both the standard `PuriDivER` and our `PuriDivER.ME` (default `PuriDivER`). Additional arguments:
    - `--freeze_buffer_after_first` (default `0`): freeze the buffer after the first epoch. If enabled, the model becomes `PuriDivER.ME`.
    - `--use_bn_classifier` (default `1`): add a final batch normalization layer after the classifier (as in the original code).
    - `--initial_alpha` (default `0.5`): initial value of `alpha`, which weights purity against diversity.
    - `--disable_train_aug` (default `0`): disable data augmentation during training, as in the original `PuriDivER` (we found that training WITH data augmentation gives better results in the offline setting).
+ `dividermix`: Our custom `DividERMix` baseline (requires `--lnl_mode=dividemix`)
+ `er`: Baseline ER (Experience Replay) with reservoir sampling. Combined with buffer fitting in our experiments.
+ `gdumb`: GDumb. Combined with buffer fitting in our experiments.

NOTE: `spr` and `cnll` are not included, as they are not part of the main comparison; they can be found in their respective original repositories.
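The `er` baseline fills its buffer with reservoir sampling. A minimal sketch of that policy (Vitter's Algorithm R) follows; `ReservoirBuffer` and `add` are hypothetical names, not the framework's buffer API, which also stores labels, logits and other metadata.

```python
import random


class ReservoirBuffer:
    """Fixed-size memory filled with reservoir sampling (Algorithm R).

    Hypothetical sketch, not the framework's buffer class.
    """

    def __init__(self, buffer_size, seed=0):
        self.buffer_size = buffer_size
        self.examples = []
        self.num_seen = 0  # stream examples observed before the current one
        self.rng = random.Random(seed)

    def add(self, example):
        if len(self.examples) < self.buffer_size:
            # Buffer not full yet: always store the example.
            self.examples.append(example)
        else:
            # Draw j uniformly from [0, num_seen]; overwrite slot j if it exists.
            # The new example is kept with probability buffer_size / (num_seen + 1),
            # so every example seen so far is equally likely to be in the buffer.
            j = self.rng.randint(0, self.num_seen)
            if j < self.buffer_size:
                self.examples[j] = example
        self.num_seen += 1


buf = ReservoirBuffer(buffer_size=500)
for x in range(10000):
    buf.add(x)
print(len(buf.examples), buf.num_seen)
```

Because the acceptance probability shrinks as the stream grows, the buffer stays a uniform sample of the whole stream without knowing its length in advance.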

### Buffer consolidation

Requires `--buffer_fitting_epochs` (set to `255` in our experiments) and `--buffer_fitting_lr`.  
Supports `--warmup_buffer_fitting_epochs` for methods that require a warm-up (a number of initial fully-supervised epochs).  
Use `--lnl_mode` to select the fitting technique:
+ `<none>`: (default) fully-supervised fitting
+ `coteaching`
+ `dividemix`
+ `mixmatch`
+ `puridiver`
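For intuition on one of these modes: `coteaching` refers to Co-teaching, where two networks each pick the small-loss samples of a batch (treated as probably clean) and use them to update the *other* network. The sketch below shows only that selection step on plain Python lists; `small_loss_indices`, `coteaching_exchange` and `forget_rate` are hypothetical names, and the framework's implementation (tensors, a forget rate scheduled over epochs) may differ.

```python
def small_loss_indices(losses, keep_ratio):
    """Indices of the `keep_ratio` fraction of samples with the smallest loss."""
    n_keep = int(round(keep_ratio * len(losses)))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def coteaching_exchange(losses_a, losses_b, forget_rate):
    """Co-teaching selection step (hypothetical helper).

    Each network selects its small-loss samples; those samples are then
    used to update the peer network rather than the selecting one.
    """
    keep = 1.0 - forget_rate
    idx_a = small_loss_indices(losses_a, keep)  # selected by network A
    idx_b = small_loss_indices(losses_b, keep)  # selected by network B
    # A is trained on B's selection and vice versa.
    return {"update_a_on": idx_b, "update_b_on": idx_a}


losses_a = [0.1, 2.0, 0.3, 1.5]
losses_b = [0.9, 0.2, 0.4, 3.0]
print(coteaching_exchange(losses_a, losses_b, forget_rate=0.5))
```

Exchanging the selections is the design choice that distinguishes Co-teaching from plain small-loss filtering: each network's selection errors are not fed back into itself.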

```
# e.g., run our model on seq-cifar10 with 40% symmetric noise
python ./utils/main.py --model=ours --dataset=seq-cifar10 --buffer_size=500 --lr=0.03 --noise=sym --noise_rate=0.4

# e.g., run our model on seq-cifar100 with 60% asymmetric noise and buffer consolidation
python ./utils/main.py --model=ours --dataset=seq-cifar100 --buffer_size=2000 --noise=asym --noise_rate=0.6 --lr=0.03 --buffer_fitting_epochs=255 --buffer_fitting_lr=0.1 --mixmatch_lambda_buffer_fitting=0.01
```
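Commands like the ones above can also be generated from a script, e.g. to sweep noise rates or run the `--aer`/`--abs` ablations. A minimal sketch that uses only flags documented in this README; `build_command` is a hypothetical helper, and the noise rates in the loop are arbitrary examples, not the settings of our experiments.

```python
import shlex


def build_command(dataset, buffer_size, noise, noise_rate, lr=0.03, extra=None):
    """Build the argument list for one run of ./utils/main.py with --model=ours."""
    cmd = [
        "python", "./utils/main.py",
        "--model=ours",
        f"--dataset={dataset}",
        f"--buffer_size={buffer_size}",
        f"--lr={lr}",
        f"--noise={noise}",
        f"--noise_rate={noise_rate}",
    ]
    # Optional extra flags, e.g. {"abs": 0} to disable ABS for an ablation.
    for key, value in (extra or {}).items():
        cmd.append(f"--{key}={value}")
    return cmd


# Dry run: print the commands of a small sweep instead of launching them.
for rate in (0.2, 0.4, 0.6):
    print(shlex.join(build_command("seq-cifar10", 500, "sym", rate, extra={"abs": 0})))
```

To actually launch a run, pass the list to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell-quoting issues.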

### Datasets

Select the dataset with `--dataset=<dataset_name>`:
+ `seq-cifar10`: Sequential CIFAR-10
+ `seq-cifar100`: Sequential CIFAR-100
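+ `seq-webvision`: Sequential WebVision. Must be downloaded from [https://data.vision.ee.ethz.ch/cvl/webvision/download.html](https://data.vision.ee.ethz.ch/cvl/webvision/download.html)
+ `seq-ntu60`: Sequential NTU RGB+D 60. *Not included: the dataset is not freely distributed, but it is available to researchers at* [https://rose1.ntu.edu.sg/dataset/actionRecognition/](https://rose1.ntu.edu.sg/dataset/actionRecognition/)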

#### WebVision

We use the resized images (small version) of WebVision Dataset 1.0.
The label list is provided in `seq_webvision.py`.

Organize the data folder as follows:

```
data/
    miniWEBVISION/
        google/
            q0096/
                filename.jpg
            q0172/
                ...
        info/
        val_images_256/
```
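Before launching `seq-webvision` runs, it can help to check that the folders match the layout above. A minimal sketch using only the standard library; `check_webvision_layout` is a hypothetical helper that checks just the folders shown in the layout (not the contents of `info/` or `val_images_256/`), and it is not part of the framework.

```python
from pathlib import Path


def check_webvision_layout(root="data"):
    """Return the expected miniWEBVISION entries missing under `root`.

    Hypothetical helper; an empty list means the layout looks right.
    """
    base = Path(root) / "miniWEBVISION"
    expected = [base / "google", base / "info", base / "val_images_256"]
    missing = [str(p) for p in expected if not p.is_dir()]
    google = base / "google"
    # Each query folder (e.g. q0096) under google/ should contain .jpg images.
    if google.is_dir():
        for query_dir in sorted(google.glob("q*")):
            if query_dir.is_dir() and not any(query_dir.glob("*.jpg")):
                missing.append(f"{query_dir} (no .jpg files)")
    return missing


problems = check_webvision_layout("data")
print("layout OK" if not problems else "\n".join(problems))
```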
