## Note to ICLR Reviewer

Below you will find instructions for reproducing our experimental results. In addition to supporting the reproduction of all results reported in the paper, we are curating a Google Colab notebook tutorial to accompany the paper for didactic purposes.


## Hardware/Software Requirements

### Operating System

* **Officially supported Operating Systems:** Linux (tested on Ubuntu 22.04 LTS)
* **Unofficially supported Operating Systems:** Windows (plans to officially support in camera-ready version)
* **Unsupported Operating Systems:** macOS (unsupported due to JAX accelerator requirements)

### GPU Hardware

The experiments that reproduce the results of our paper require significant GPU memory to complete in a reasonable amount of time.
A workstation with an NVIDIA GPU and/or Google TPU with more than 12 GB of memory should suffice.

* **Officially supported and tested:** NVIDIA RTX 3090 (24 GB)
* **Officially supported but not tested:** other NVIDIA GPUs, Google TPUs
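
Before launching any jobs, it can help to confirm which accelerators JAX actually sees. The snippet below is a minimal sketch, assuming JAX is installed in the active environment; the helper name `describe_accelerators` is ours for illustration and is not part of this repository.

```python
def describe_accelerators() -> list:
    """Return one 'platform:device_kind' string per device visible to JAX.

    Returns an empty list when JAX cannot be imported, so the check
    degrades gracefully outside the project environment.
    """
    try:
        import jax  # imported lazily so the function works without JAX
    except ImportError:
        return []
    return [f"{d.platform}:{d.device_kind}" for d in jax.devices()]


devices = describe_accelerators()
if not devices:
    print("JAX is not installed in this environment.")
elif all(name.startswith("cpu") for name in devices):
    print("JAX only sees CPU devices; experiments will be slow.")
else:
    print("Accelerators visible to JAX:", devices)
```

If only CPU devices are listed on a machine with a GPU, the installed JAX build most likely lacks accelerator support.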

### CPU Hardware

* **Officially supported and tested:** AMD Ryzen 9 5900X 12-Core Processor

An equivalent or even less powerful CPU should suffice for running experiments within this repository.

### Miscellaneous

Our workstation has 64 GB of RAM and a 4 TB disk. This is more than the experiments require; however, there is a certain
amount of data preprocessing, so experiments should be run on machines with at least 16 GB of RAM and reasonable storage.

## Getting Started

The repository has the following file structure:

    ├──── config                             # hydra configs for individual experiments
    ├──── config_hyperparameter_sweep        # hydra configs for hyperparameter sweeps
    ├──── data_preprocessing                 # utilities for preprocessing raw datasets
    ├──── dynamax                            # package for implementations of dynamical system neural network layers
    ├──── evaluate                           # model evaluation scripts
    ├──── huggingface                        # utilities for interfacing with huggingface
    ├──── model_architectures                # flax model definitions
    ├──── train                              # model training scripts
    ├── run_eval.ipynb                       # notebook for inspecting and evaluating trained models

Before we start interacting with these files we need to:

1. Install the Python virtual environment associated with this project
2. Create a Weights and Biases account for tracking train/eval jobs
3. Update Hydra config files to be compatible with our filesystem 

### Installing the Python Virtual Environment
We currently manage Python package dependencies using [poetry](https://python-poetry.org/). If you do not already have poetry installed, please follow
the installation instructions provided on the poetry project webpage.

Once poetry is installed, install and activate the virtual environment by running the following commands from the root of this project:

```bash
poetry install
poetry shell
```
### Creating a Weights and Biases Account
We use Weights and Biases to track our training and evaluation metrics, and we intend to release our Weights and Biases dashboard and report to the public alongside our paper. Reproducing our results requires a Weights and Biases account, as the codebase is written to integrate with this experiment tracking software. To sign up and create an account, please follow the [official guides](https://docs.wandb.ai/guides) provided by the Weights and Biases team.

### Updating Configuration Files for Filesystem Compatibility
Two updates to the Hydra configuration files are needed before our experiments can be run:

* **`entity`:** in every yaml file in the top level of the directory `./config/wandb`, set the `entity` parameter to your Weights and Biases username.
* **`base_path`:** in every yaml file in the top level of the directory `./config`, set the `base_path` parameter to the root directory of this project.

In future, we intend to include a bash script that automates this configuration update, and to remove the dependency on Weights and Biases for experiment tracking.
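
Until an automated script is available, the two configuration updates can be sketched in a few lines of Python. This is an illustrative sketch, not part of the repository: it assumes `entity` and `base_path` each appear as a plain `key: value` line in the yaml files, and the helper names `set_yaml_key` and `update_configs` are hypothetical. Any trailing comment on a matched line is overwritten.

```python
import re
from pathlib import Path


def set_yaml_key(text: str, key: str, value: str) -> str:
    """Replace the value on every `key: ...` line, keeping indentation."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*).*$", flags=re.MULTILINE
    )
    # A function replacement avoids backslash escapes in `value`
    # (for example Windows paths) being interpreted by `re`.
    return pattern.sub(lambda match: match.group(1) + value, text)


def update_configs(directory: Path, key: str, value: str) -> list:
    """Rewrite `key` in each top-level *.yaml file; return the files changed."""
    changed = []
    for path in sorted(directory.glob("*.yaml")):
        original = path.read_text()
        updated = set_yaml_key(original, key, value)
        if updated != original:
            path.write_text(updated)
            changed.append(path)
    return changed


# Example usage from the project root (placeholder values, edit before running):
# update_configs(Path("config/wandb"), "entity", "your-wandb-username")
# update_configs(Path("config"), "base_path", str(Path.cwd()))
```

A line-based replacement is used here rather than a YAML parser so that comments and key ordering elsewhere in the files are left untouched. Review the resulting diff before committing to the changes.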

With the above steps completed, you are now ready to start training and evaluation jobs that replicate our results. The `scripts` folder, located in the same directory as this `README.md`, contains bash scripts that can be used to reproduce the results presented in our paper. The following sections provide further instructions for running these jobs.

## Running Training Experiments

Python files for running individual training jobs can be found in `./train`. Each Python file relies on an experiment configuration defined in `./config`, which is read in with [hydra](https://hydra.cc/docs/intro/). An example of running a training job for the feedforward architecture using `feedforward.yaml` and overriding the default dataset shape is given below:

```bash
python ../train/train_feedforward.py +config=feedforward config.dataset.shape=Sshape
```

This example assumes you are running the command from the `scripts` directory. Rather than running each command individually, you can use the bash scripts we have created within the `scripts` directory to rerun our experiments. For the single-task experiments, you can start training jobs for all models by running:

```bash
./train_single_task.sh
```

For the multi-task experiments you can start training jobs for all models by running:

```bash
./train_multi_task.sh
```
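
If you prefer to launch individual jobs from Python rather than bash, the Hydra command line shown earlier in this section can be assembled programmatically. The sketch below is illustrative only: the helper `build_train_command` is hypothetical, and the `train_<model>.py` / `+config=<model>` naming pattern is an assumption extrapolated from the feedforward example, so check `./train` and `./config` for the actual file names.

```python
import shlex
from typing import Mapping, Optional


def build_train_command(
    model: str, overrides: Optional[Mapping[str, str]] = None
) -> list:
    """Build the argv for one training job, run from the `scripts` directory."""
    # Assumed naming pattern: ../train/train_<model>.py with +config=<model>.
    command = ["python", f"../train/train_{model}.py", f"+config={model}"]
    # Each Hydra override is passed as a dotted `key=value` argument.
    for key, value in (overrides or {}).items():
        command.append(f"{key}={value}")
    return command


command = build_train_command("feedforward", {"config.dataset.shape": "Sshape"})
print(shlex.join(command))
# To actually launch the job, pass the list to subprocess.run(command, check=True).
```

For the feedforward model this prints the same command as the example given earlier in this section.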

## Running Evaluation Experiments

To run evaluation experiments, you need to specify the Weights and Biases `run_id` parameter in the yaml files under the folder `./config/evaluation`. The name of each yaml file corresponds to the type of model being evaluated.

As we wanted to rapidly explore the evaluation data associated with this project, we chose to use a Jupyter notebook to generate figures and display results.
With the Python virtual environment for this project activated, you can run the following command to start a Jupyter notebook session in your browser:

```bash
python -m notebook
```

Once the session is live, open the `run_eval.ipynb` notebook to begin exploring the evaluation results of trained models. We highly recommend examining model
evaluation results in this way, as this is how we generated the results in our paper.

Alternatively, Python files for running individual evaluation jobs can be found in `./evaluate`. Each Python file relies on an experiment configuration defined in `./config`, which is read in with [hydra](https://hydra.cc/docs/intro/). An example of running an evaluation job for the feedforward architecture using `feedforward.yaml` is given below:

```bash
python ../evaluate/evaluate_feedforward.py +config=feedforward
```

This example assumes you are running the command from the `scripts` directory. Rather than running each command individually, you can use the bash scripts we have created within the `scripts` directory to rerun our evaluation experiments. For the single-task experiments, you can start evaluation jobs for all models by running:

```bash
./eval_single_task.sh
```

For the multi-task experiments you can start evaluation jobs for all models by running:

```bash
./eval_multi_task.sh
```
