# Description
This is the code repository for the paper "An Empirical Investigation of the Role of
Pre-training in Lifelong Learning". It contains all code necessary to replicate the 
experiments and figures discussed in the paper.

# Installation

### Requirements
Python 3.6, PyTorch 1.7.0, transformers 2.9.0


### Setting up a virtual environment

[Conda](https://conda.io/) can be used to set up a virtual environment
with Python 3.6 in which you can sandbox the dependencies required for
our implementation:

1.  [Download and install Conda](https://conda.io/docs/download.html).

2.  Create a Conda environment with Python 3.6

    ```
    conda create -n lll python=3.6
    ```

3.  Activate the Conda environment. (You will need to activate the Conda environment in each terminal in which you want to run our implementation.)

    ```
    conda activate lll
    ```

### Setting up our environment

1. Visit http://pytorch.org/ and install the PyTorch 1.7.0 package for your system. For example, with CUDA 11.0:

    ```
    conda install pytorch==1.7.0 cudatoolkit=11.0 -c pytorch
    ```

2. Install the other requirements:

   ```
   pip install -r requirements.txt
   ```

That's it! You're now ready to reproduce our results.

# Running Vision & NLP Experiments

## 0. Setting up datasets

1. First create the data directory:
    ```
    mkdir data
    ```
2. To download the data for <b>Split CIFAR-100</b> and <b>Split CIFAR-50</b> experiments, run:
    ```
    ./scripts/download_vision_data.sh cifar100
    ```
3. To download data for <b>5-dataset</b> experiments, run:
    ```
    ./scripts/download_vision_data.sh 5data
    ```
4. To download data for <b>Split YahooQA</b> experiments, run:
    ```
    ./scripts/download_splityahooqa_data.sh
    ```
5. To download data for <b>5-dataset-NLP</b> experiments, run:
    ```
    ./scripts/download_5dataset_nlp_data.sh
    ```

 
## 1. Running Lifelong Learning Experiments

#### A. Vision

To run the vision experiments and create the necessary model checkpoints for random initialization, run:
```
./scripts/run_vision.sh \
    {DATASET} \
    {METHOD} \
    ./data \
    ./output/{DATASET}/random/run_1 1
```
where `{DATASET}` is one of `"5data", "cifar50", "cifar100"`, and `{METHOD}` is 
one of `"sgd", "er", "ewc"`.

Similarly, to run and create the necessary model checkpoints for pre-trained initialization, run:
```
./scripts/run_vision.sh \
    {DATASET} \
    {METHOD} \
    ./data \
    ./output/{DATASET}/pt/run_1 1 pt 
```
where `{DATASET}` is one of `"5data", "cifar50", "cifar100"`, and `{METHOD}` is 
one of `"sgd", "er", "ewc"`.

The above run commands will create a folder called `output` with all of the relevant data for the 
run as well as the model checkpoints. In our experiments, we run this with 5 different random seeds. The data in `Table 1` for vision experiments is generated based on the `log.json` files in each run folder.
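
The five-seed protocol can be scripted. The sketch below only prints the commands so that they can be reviewed first. It assumes that the trailing positional argument of `run_vision.sh` (the `1` in the examples above) is the random seed and that each seed gets its own `run_{SEED}` folder; neither is confirmed here. The function name `print_vision_runs` is hypothetical.

```shell
#!/usr/bin/env bash
# Print (not execute) the run_vision.sh commands for seeds 1..5.
# ASSUMPTION: the last numeric argument of run_vision.sh is the seed,
# and each seed writes to its own run_{SEED} folder.
print_vision_runs() {
    local dataset="$1"   # one of: 5data, cifar50, cifar100
    local method="$2"    # one of: sgd, er, ewc
    local init="$3"      # "random" or "pt"
    local seed
    local extra
    for seed in 1 2 3 4 5; do
        extra=""
        # The pre-trained variant takes an additional "pt" argument.
        if [ "$init" = "pt" ]; then
            extra=" pt"
        fi
        echo "./scripts/run_vision.sh $dataset $method ./data ./output/$dataset/$init/run_$seed $seed$extra"
    done
}

print_vision_runs cifar100 er random
```

Piping the output to `bash` (for example `print_vision_runs cifar100 er pt | bash`) would execute the runs one after another.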

#### B. NLP

To run the text experiments and create the necessary model checkpoints for
random initialization, run:
```
./scripts/run_text.sh {CUDA_DEVICE} \
    ./data \
    ./output \
    "{TASK_SEQUENCE}" \
    "distilbert" \
    "distilbert-base-uncased" \
    {METHOD} \
    "no_pt" \
    ".cache/torch/transformers" \
    "--lll_mode"
```
where `{TASK_SEQUENCE}` is one of `"seq71", "seq72", "seq73", "seq74", "seq75"` (for `"splityahooqa"`), `"seq51", "seq52", "seq53", "seq54", "seq55"` (for `"5datasetnlp"`), or `"seq41", "seq42", "seq43", "seq44", "seq45"` (for `"15datasetnlp"`), and `{METHOD}` is one of `"sgd", "er", "ewc"`.

Similarly, to run and create the necessary model checkpoints for pre-trained initialization, run:
```
./scripts/run_text.sh {CUDA_DEVICE} \
    ./data \
    ./output \
    "{TASK_SEQUENCE}" \
    "distilbert" \
    "distilbert-base-uncased" \
    {METHOD} \
    "pt" \
    ".cache/torch/transformers" \
    "--lll_mode"
```
where `{TASK_SEQUENCE}` is one of `"seq71", "seq72", "seq73", "seq74", "seq75"` (for `"splityahooqa"`), `"seq51", "seq52", "seq53", "seq54", "seq55"` (for `"5datasetnlp"`), or `"seq41", "seq42", "seq43", "seq44", "seq45"` (for `"15datasetnlp"`), and `{METHOD}` is one of `"sgd", "er", "ewc"`.

This will create a sub-folder under `output/distilbert/{TASK_SEQUENCE}/` for each method, along with the relevant data for the run as well as the model checkpoints. In our experiments, we run this with 5 different task sequences as mentioned above. The data in `Table 1` for text experiments is generated based on the `eval_results/results.json` files in each run folder.
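
Covering all five task sequences of a dataset can be scripted in the same spirit. The sketch below prints one `run_text.sh` command per task sequence, using the dataset-to-sequence mapping listed above. The function name `print_text_runs` and its argument order are hypothetical, and the commands are printed rather than executed so that they can be checked first.

```shell
#!/usr/bin/env bash
# Print (not execute) one run_text.sh command per task sequence of a dataset.
print_text_runs() {
    local cuda_device="$1"   # GPU index passed as {CUDA_DEVICE}
    local dataset="$2"       # splityahooqa, 5datasetnlp or 15datasetnlp
    local method="$3"        # one of: sgd, er, ewc
    local init="$4"          # "no_pt" (random init) or "pt" (pre-trained)
    local prefix
    local i
    # Sequence names follow the mapping given in this README.
    case "$dataset" in
        splityahooqa) prefix=seq7 ;;
        5datasetnlp)  prefix=seq5 ;;
        15datasetnlp) prefix=seq4 ;;
        *) echo "unknown dataset: $dataset" >&2; return 1 ;;
    esac
    for i in 1 2 3 4 5; do
        echo "./scripts/run_text.sh $cuda_device ./data ./output \"$prefix$i\" \"distilbert\" \"distilbert-base-uncased\" $method \"$init\" \".cache/torch/transformers\" \"--lll_mode\""
    done
}

print_text_runs 0 splityahooqa er no_pt
```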


## 2. Running the analysis

### I) Loss Contour
For this analysis experiment, first we create the folders where all the results will be stored:

```
mkdir -p results/analysis/contour/{DATASET}/random
mkdir -p results/analysis/contour/{DATASET}/pt
```
for each dataset of interest (`5data, cifar50, cifar100, splityahooqa, 5datasetnlp`).
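
Rather than typing the two `mkdir` commands once per dataset, a small loop can create every folder at once. This is a minimal sketch; the function name `make_analysis_dirs` and its optional second argument (an alternative results root) are hypothetical.

```shell
#!/usr/bin/env bash
# Create {ROOT}/analysis/{ANALYSIS}/{DATASET}/{random,pt} for every dataset.
make_analysis_dirs() {
    local analysis="$1"          # name of the analysis, e.g. contour
    local root="${2:-results}"   # results root; defaults to ./results
    local dataset
    local init
    for dataset in 5data cifar50 cifar100 splityahooqa 5datasetnlp; do
        for init in random pt; do
            mkdir -p "$root/analysis/$analysis/$dataset/$init"
        done
    done
}

make_analysis_dirs contour
```

The same function can be reused for any other analysis by passing a different name as the first argument.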

#### A. Vision

To run the loss contour analysis, pass the data folder, the run folder containing the checkpoints, and an output file where the analysis data will be stored. For example:

```
./scripts/run_vision_analysis.sh \
    {DATASET} \
    ./data \
    ./output/{DATASET}/random/run_1 \
    ./results/analysis/contour/{DATASET}/random/run_1.json \
    contour
```

#### B. NLP
```
./scripts/run_text_analysis.sh {CUDA_DEVICE} \
     ./data \
     ./output \
     "{TASK_SEQUENCE}" \
     "distilbert" \
     "distilbert-base-uncased" \
     "{METHOD}" \
     "no_pt" \
     "{CACHE_DIR}" \
     "--lll_mode" \
     "{DATASPLIT}" \
     "contour" \
     ./results/analysis/contour/{DATASET}/random/run_1.json \
     {START_TASK_IDX}
```
Run a similar command for the pre-trained models.

### II) Linear Model Interpolation (LMI)
Similar to the above analysis experiment, first we create the folders where all the results will be stored:
```
mkdir -p results/analysis/lmi/{DATASET}/random
mkdir -p results/analysis/lmi/{DATASET}/pt
```
for each dataset of interest (`5data, cifar50, cifar100, splityahooqa, 5datasetnlp`).

#### A. Vision

To linearly interpolate the model checkpoints for loss analysis, run a command similar to the one above:
```
./scripts/run_vision_analysis.sh \
    {DATASET} \
    ./data \
    ./output/{DATASET}/random/run_1 \
    ./results/analysis/lmi/{DATASET}/random/run_1.json \
    lmi
```

#### B. NLP

```
./scripts/run_text_analysis.sh {CUDA_DEVICE} \
    ./data \
    ./output \
    "{TASK_SEQUENCE}" \
    "distilbert" \
    "distilbert-base-uncased" \
    "{METHOD}" \
    "no_pt" \
    "{CACHE_DIR}" \
    "--lll_mode" \
    "{DATASPLIT}" \
    "lmi" \
    ./results/analysis/lmi/{DATASET}/random/run_1.json \
    {START_TASK_IDX}
```
Run a similar command for the pre-trained models.


### III) Sharpness
This follows a structure similar to the above two analyses. Create the folders:
```
mkdir -p results/analysis/sharpness/{DATASET}/random
mkdir -p results/analysis/sharpness/{DATASET}/pt
```
for each dataset of interest (`5data, cifar50, cifar100, splityahooqa, 5datasetnlp`).

#### A. Vision

To run the sharpness analysis, we run a similar command as above:
```
./scripts/run_vision_analysis.sh \
    {DATASET} \
    ./data \
    ./output/{DATASET}/random/run_1 \
    ./results/analysis/sharpness/{DATASET}/random/run_1.json \
    sharpness
```

#### B. NLP
```
./scripts/run_text_analysis.sh {CUDA_DEVICE} \
    ./data \
    ./output \
    "{TASK_SEQUENCE}" \
    "distilbert" \
    "distilbert-base-uncased" \
    "{METHOD}" \
    "no_pt" \
    "{CACHE_DIR}" \
    "--lll_mode" \
    "{DATASPLIT}" \
    "sharpness" \
    ./results/analysis/sharpness/{DATASET}/random/run_1.json 0
```

Run a similar command for the pre-trained models.

## 3. Visualizing the analysis
Once the analysis for each dataset is complete, we should have a directory 
structure similar to this:
```
results
|---analysis
|   |---lmi
|   |   |---5data
|   |   |   |---random
|   |   |   |   |   run_1.json
|   |   |   |   |   run_2.json
|   |   |   |   |   ...
|   |   |   |---pt
|   |   |   |   |   run_1.json
|   |   |   |   |   run_2.json
|   |   |   |   |   ...
|   |   |---cifar50
|   |   |   | ...
|   |---sharpness
|   |   | ...
|   |---contour
|   |   | ...

```
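
Before creating the visualizations, it can help to confirm that every folder holds the expected number of run files. The sketch below walks the layout shown above and prints, for each `{ANALYSIS}/{DATASET}/{INIT}` folder, how many `run_*.json` files it contains. The function name `count_analysis_runs` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "<folder> <count>" for each analysis/dataset/init folder,
# where <count> is the number of run_*.json files directly inside it.
count_analysis_runs() {
    local root="${1:-results/analysis}"
    local dir
    local n
    for dir in "$root"/*/*/*/; do
        [ -d "$dir" ] || continue   # skip when the glob matches nothing
        n=$(find "$dir" -maxdepth 1 -type f -name 'run_*.json' | wc -l | tr -d ' ')
        echo "${dir%/} $n"
    done
}

count_analysis_runs
```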

Once we have this, we can create the visualizations. 
First, we create the folder where the visualizations will be stored.
```
mkdir -p results/figs/contour
mkdir -p results/figs/lmi
mkdir -p results/figs/sharpness
```

### I) Contours
To get the contour plots in Figure 2, simply run 
```
mkdir -p results/figs/contour/{DATASET}/random
./scripts/run_visualization.sh \
    contour \
    ./results/analysis/contour/{DATASET}/random/run_1.json \
    ./results/figs/contour/{DATASET}/random/run_1 \
    {MAX} \
    {STEP}
```
for each run that we want to create a contour plot for.
`{MAX}` and `{STEP}` are parameters that you can use to tweak the output visualization: `{MAX}` is
the maximum loss value shown in the visualization, and `{STEP}` is the spacing between adjacent loss
levels in the contour.
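
When several runs exist for a dataset, the per-run command can be generated for each `run_*.json` file. The sketch below prints one visualization command per file found, preceded by the `mkdir` for the output folder. The function name `print_contour_viz`, its argument order, and the optional fifth argument (an alternative results root) are hypothetical, and paths are assumed to contain no whitespace.

```shell
#!/usr/bin/env bash
# Print (not execute) the contour visualization command for every run file
# found under {ROOT}/analysis/contour/{DATASET}/{INIT}.
print_contour_viz() {
    local dataset="$1"             # e.g. cifar100
    local init="$2"                # "random" or "pt"
    local max="$3"                 # value passed as {MAX}
    local step="$4"                # value passed as {STEP}
    local root="${5:-./results}"   # results root; defaults to ./results
    local in_dir="$root/analysis/contour/$dataset/$init"
    local out_dir="$root/figs/contour/$dataset/$init"
    local f
    local name
    echo "mkdir -p $out_dir"
    for f in "$in_dir"/run_*.json; do
        [ -e "$f" ] || continue    # skip when no run file exists
        name="$(basename "$f" .json)"
        echo "./scripts/run_visualization.sh contour $f $out_dir/$name $max $step"
    done
}
```

Calling `print_contour_viz {DATASET} random {MAX} {STEP}` with concrete values lists the commands; piping that output to `bash` would execute them.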


### II) Linear Model Interpolation (LMI)
To get the linear model interpolation plots in Figure 3(b), simply run 
```
./scripts/run_visualization.sh \
    lmi \
    ./results/analysis/lmi/{DATASET} \
    ./results/figs/lmi/{DATASET}
```
for each dataset. One LMI plot can be created for each dataset.

### III) Sharpness
To get the sharpness plots in Figure 3(a), simply run 
```
./scripts/run_visualization.sh \
    sharpness \
    ./results/analysis/sharpness \
    ./results/figs/sharpness/sharpness
```
One sharpness plot is created for all present datasets.
