# experiments.csv explained

First some bookkeeping stuff.

* `exp_id`: The unique id of the experiment. Used to define which experiment you run. 
* `run`: the number of times you want to run an experiment if it is launched through `condor_launcher.py`. When manually starting with `main_exp.py` this is ignored.
* `condor_job`: `0` if the job needs to be run by condor. With any other value (I use `1`), `condor_launcher.py` will ignore this experiment. If condor schedules a job, it will put the run_id in this column.
* `exp_name`: Name for an experiment. It doesn't matter much and doesn't need to be unique. Used as legend in plots.
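
To make the interplay of `run` and `condor_job` concrete, here is a minimal sketch of how a launcher could pick the experiments to schedule. It is not the code in `condor_launcher.py`: the helper name `pending_jobs` and the example rows are made up for illustration, and it assumes that `run` simply repeats an experiment that many times.

```python
import csv
import io

# Made-up rows containing only the bookkeeping columns.
EXAMPLE_CSV = """exp_id,run,condor_job,exp_name
1,3,0,baseline
2,1,1,already handled
3,2,0,shrink and perturb
"""


def pending_jobs(csv_text):
    """Return (exp_id, run_index) pairs for every row with condor_job == 0.

    Hypothetical helper, assuming `run` means "repeat this many times".
    """
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Any value other than "0" means the launcher skips the experiment.
        if row["condor_job"].strip() != "0":
            continue
        for run_index in range(int(row["run"])):
            jobs.append((row["exp_id"], run_index))
    return jobs


for exp_id, run_index in pending_jobs(EXAMPLE_CSV):
    print(f"would schedule exp_id={exp_id}, run {run_index}")
```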

These define the data and tasks that are used. 

* `base_data`: The dataset of the first task.
* `num_classes_base`: the number of classes that is used in the first task. This should always be just one task.
* `new_data`: The dataset for the other tasks. Currently, this is always the same as the `base_data` and I don't think the code works if this is not the same. 
* `num_classes_new`: the number of classes in the new task(s). If more than one task, they need to be separated by a `,`. These values are used in `get_classes_per_task`, for instance when balancing the loss or automatically calculating batch sizes.
* `task_file`: the file inside the folder `task_sequences` where the order is defined. These are generated by `generate_task_seq.py`.
* `task`: This allows training a specific task when there are multiple new tasks. This doesn't make sense in the continuous case, so it shouldn't be used there. When training from scratch with multiple tasks, it allows splitting up the jobs. The value specifies which task to train, see `run_exp`.
* `seed`: the name of the `task_file` is followed by a random seed. In theory, this allows multiple orders of the same benchmark. I never used it, but it should work.

These two define which model to use when training on new data. `init_model` loads the weights, `init_method` can apply a method to improve plasticity. Set both to `scratch` for the baseline.

* `init_method`: this defines how a model should be initialized for new tasks after the old data is trained. The value is mostly used in `before_training`, but also once in `train_new_data` (only relevant if this value is `scratch`). It can have a couple of values:
  * `scratch`: start from a random initialization (also use `scratch` in `init_model` in this case)
  * `finetune`: don't do anything and continue training from the model that was trained on the old data
  * `shrink_perturb`: use shrink and perturb, values are defined in the `shrink` and `perturb` columns.
  * `interpolate`: this will interpolate between the random initialization of this model and the model trained on the old data. It uses `shrink` as the alpha in the interpolation.
* `init_model`: this defines which model weights to use when starting to train on the new data. So `init_method` is applied to this `init_model`. Its values are either:
  * `scratch`: random initialization.
  * `base`: this will train a model on the old data.
  * any other `exp_id`: this will get the model that was trained on the old data of the experiment defined by `exp_id`, so it should refer to a `base` experiment in this case. Mostly used in `get_initialization`, but also in some other places.
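
The two non-trivial `init_method` options can be sketched on plain lists of weights. This is only an illustration of the arithmetic described above, not the code in `before_training`: the function names are made up, and it assumes that the "random model" used for perturbing is a fresh random initialization.

```python
import random


def shrink_perturb(trained, random_init, shrink, perturb):
    """Scale the trained weights by `shrink` and add `perturb` times a random model."""
    return [shrink * w + perturb * r for w, r in zip(trained, random_init)]


def interpolate(trained, random_init, alpha):
    """Interpolate so that alpha=1.0 keeps the trained model and alpha=0.0 gives the random one."""
    return [alpha * w + (1.0 - alpha) * r for w, r in zip(trained, random_init)]


# Toy weights; a real model would apply this to every parameter tensor.
trained = [1.0, -2.0, 0.5]
rng = random.Random(0)
random_init = [rng.gauss(0.0, 0.1) for _ in trained]

print(shrink_perturb(trained, random_init, shrink=0.5, perturb=0.1))
print(interpolate(trained, random_init, alpha=0.5))
```

With `shrink=1.0` and `perturb=0.0` the first function returns the trained weights unchanged, which corresponds to `finetune`.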

The columns below are only used when training a base model, i.e. when `init_model` is equal to `base`.

* `init_optim`: the optimizer to use when training the old data. 
* `init_sched`: the scheduler to use when training the old data.
* `init_aug`: the augmentations to use when training the old data.
* `init_epochs`: the number of epochs to train on the old data. I've always used 50 here, maybe not the best possible choice.
* `init_regularizer`: the regularizer to use when training old data. Either `na`, `l2` or `l2_init`.  
* `init_reg_strength`: the strength of the regularizer. I only really tried `0.01` here.

The columns below are relevant for training on the new data.

* `batch_size_old`: how many of the samples in a batch come from the old data.
* `batch_size_new`: how many come from the new data. The total batch size is the sum of `batch_size_old` and this value, with one exception: either of these values can also be `-1`. In that case, the total batch size is fixed at `128` and the split is made proportional to the amount of new and old data in `get_batch_sizes`, which is most importantly called in `run_exp`. This is useful when using multiple tasks, because then the optimal batch sizes change when more new data becomes available. Batch sizes should be the same when comparing experiments, because we compare compute cost based on the number of iterations.
* `shrink`: float value that determines the amount of shrinking (or interpolation). `0.0` would lead to zero weights, `1.0` would do no shrinking at all. If `init_method` is `interpolate`, `0.0` would give a random model and `1.0` the original one.
* `perturb`: the amount of random model to add to the shrunk one.
* `epochs`: deprecated, use `max_iters` instead.
* `max_iters`: the number of iterations to train on the new data. By default, I use `39100`, which is just a bit more than `100` epochs of CIFAR100.
* `aug`: the type of augmentations to use. See `get_transforms` for more details.
* `optim`: optimizer to use
* `sched`: scheduler to use. See `get_optim_and_scheduler` for more details on the optim and sched.
* `lr`: learning rate (both for old and new data!)
* `regularizer`: regularizer to use when training the new data. Either `l2` or `l2-init` (only for new_data)
* `reg_strength`: the amount of regularization loss to add. 
* `balance`: whether to balance the loss of old and new samples or not. If `True`, the loss is calculated as `loss_ratio_old_new * old_loss + (1 - loss_ratio_old_new) * new_loss`. This is not so useful anymore, as we now always balance the number of samples in the batch anyway. See `train_new_data`. It's not strictly necessary, but it is better to set this to `False` when training from scratch.
* `loss_ratio_old_new`: see above for when this value is a float. Can also be `balanced` and then this value will be calculated to be equal to what you would expect in a random batch. See `reweigh_loss`, which gets called from `train_new_data` when `balance` is `True`.
* `easy_cutoff`: the percentage of easy samples that gets sampled with probability `cutoff_prob`. The scores are stored inside the `model` folder when a `base` model is trained. 
* `hard_cutoff`: same as `easy_cutoff`, but for hard samples.
* `cutoff_prob`: the relative probability with which hard and easy samples are used compared to the other samples. `0.1` means it is 10 times less likely that an easy or hard sample is in a batch, `0.0` would not use them at all. 
* `notes`: not used in the experiments.
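
The `-1` batch-size rule and the `balance` formula from the list above can be sketched as follows. Both functions are illustrations with made-up names, not the code in `get_batch_sizes` or `reweigh_loss`. The rounding and the clamping to at least one sample per source are assumptions, and the dataset sizes in the example are arbitrary.

```python
def proportional_batch_sizes(num_old, num_new, total=128):
    """Split a fixed total batch size proportionally to the amount of old and new data.

    Returns (batch_size_old, batch_size_new). Rounding and clamping are assumptions.
    """
    batch_new = round(total * num_new / (num_old + num_new))
    # Keep at least one sample from each source in every batch (assumption).
    batch_new = min(max(batch_new, 1), total - 1)
    return total - batch_new, batch_new


def weighted_loss(old_loss, new_loss, loss_ratio_old_new):
    """Combine the old and new loss as described for `balance`."""
    return loss_ratio_old_new * old_loss + (1.0 - loss_ratio_old_new) * new_loss


# Arbitrary example: 25000 old samples and 5000 new samples.
batch_old, batch_new = proportional_batch_sizes(25000, 5000)
print(batch_old, batch_new)
print(weighted_loss(old_loss=2.0, new_loss=4.0, loss_ratio_old_new=0.5))
```

Because the total stays at `128` regardless of the split, the number of iterations remains comparable between experiments.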