# Code for "Graph Neural Modeling of Network Flows"

This is the code for the paper _Graph Neural Modeling of Network Flows_.

## License
MIT.

## Prerequisites
Currently tested on Linux and macOS (specifically, CentOS 7.4.1708 and macOS Big Sur 11.2.3). It can also be adapted to Windows through [WSL](https://docs.microsoft.com/en-us/windows/wsl/about).
The code makes heavy use of Docker; see e.g. [here](https://docs.docker.com/engine/install) for installation instructions. Tested with Docker 19.03. Using Docker largely does away with dependency and setup headaches, making it significantly easier to reproduce the reported results.

## Configuration
Create a file `relnet.env` at the root of the project (see `relnet_example.env`) and adjust the paths within: this is where some data generated by the container will be stored. 

Add the following lines to your `.bashrc`, replacing `/home/jane/git/relnet` with the path where the repository is cloned. 

```bash
export RN_SOURCE_DIR='/home/jane/git/relnet'
# export every variable defined in relnet.env
set -a
. "$RN_SOURCE_DIR/relnet.env"
set +a

# make the helper scripts callable by name
export PATH="$PATH:$RN_SOURCE_DIR/scripts"
```

After cloning the repository for the first time, make the scripts executable (e.g. `chmod u+x scripts/*`).
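
To confirm that the configuration was picked up by a new shell, a small check along the following lines can help. This is only a sketch: `check_relnet_env` is a hypothetical helper that is not part of the repository, and it relies on the variables being exported, as the `.bashrc` snippet above does.

```bash
# Report whether each named environment variable is exported and non-empty.
# The exit status is the number of missing variables (0 means all are set).
check_relnet_env() {
  local missing=0 var value
  for var in "$@"; do
    # printenv only sees exported variables and fails if the name is unset
    if value=$(printenv "$var") && [ -n "$value" ]; then
      echo "ok: $var=$value"
    else
      echo "missing: $var" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, `check_relnet_env RN_SOURCE_DIR RN_EXPERIMENT_DATA_DIR` checks the two variables named in this README.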

## Managing the containers
Some scripts are provided for convenience. To build the containers (note, this will take a few minutes):
```bash
update_container.sh
```
To start them:
```bash
manage_container.sh up
```
To stop them:
```bash
manage_container.sh stop
```
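
To check whether the containers are actually up, one option is to filter the output of `docker ps`. The sketch below assumes that the container names contain `relnet`, as the `relnet-manager` container referenced elsewhere in this README does; `relnet_containers` is a hypothetical helper, not one of the provided scripts.

```bash
# Print the names of running containers that contain the given fragment.
# ASSUMPTION: container names contain "relnet" (e.g. relnet-manager).
# Returns non-zero if no matching container is running.
relnet_containers() {
  docker ps --format '{{.Names}}' | grep -- "${1:-relnet}"
}
```

Running `relnet_containers` after `manage_container.sh up` should then list the matching containers, and print nothing after `manage_container.sh stop`.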

## Setting up graph topology data
Copy the provided `topologies.zip` file to `$RN_EXPERIMENT_DATA_DIR`, unzip it there, remove the archive, and create the directory for the demand matrices:

```bash
cp Downloads/topologies.zip "$RN_EXPERIMENT_DATA_DIR"
# unzip inside the data directory, not the current one
cd "$RN_EXPERIMENT_DATA_DIR"
unzip topologies.zip
rm topologies.zip

mkdir -p "$RN_EXPERIMENT_DATA_DIR/demand_matrices"
```
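
As a quick sanity check after unpacking, the sketch below counts the files under the `topologies` subdirectory of the data directory. It assumes that the archive extracts into `$RN_EXPERIMENT_DATA_DIR/topologies`, the location this README uses for topology files; `count_topologies` is a hypothetical helper, not part of the repository.

```bash
# Count regular files under <data dir>/topologies.
# ASSUMPTION: topologies.zip extracts into a "topologies" subdirectory.
count_topologies() {
  local dir="${1:-$RN_EXPERIMENT_DATA_DIR}/topologies"
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # tr strips the padding that some wc implementations add
  find "$dir" -type f | wc -l | tr -d ' '
}
```

Calling `count_topologies` with no argument uses `$RN_EXPERIMENT_DATA_DIR`; a count of zero or an error suggests the archive was unpacked somewhere else.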

If you would like to use different topologies, you can place other topology files from Repetita under `$RN_EXPERIMENT_DATA_DIR/topologies` and use the `relnet/experiment_launchers/select_topologies.py` script to select them automatically based on different criteria.
Beware, however, that some topologies have data-quality issues: some have overall delays that are too small for tm-gen to run, and others have unclean or irregular data.

## Setting up demand matrices

Once the topologies are in place, run the following to generate the demand matrices:
```bash
# generate data for main experiments
$RN_SOURCE_DIR/scripts/data_gen_mult.sh main

# generate data for topology variation experiments
$RN_SOURCE_DIR/scripts/data_gen.sh topvar ssp topvarfinal1d 1.0
$RN_SOURCE_DIR/scripts/data_gen.sh topvar ecmp topvarfinal1d 1.0

```

## Running experiments
The file `relnet/evaluation/experiment_conditions.py` contains the configuration for the experiments reported in the paper, but you may modify the models, objective functions, hyperparameters, and so on to suit your needs.

Then, you can launch all the experiments as follows:

```bash
run_everything.sh hyperopt
# [wait until completion]
run_everything.sh eval
```

Note that, as reported in the paper, running all the computations will take 35 days on a cluster with 8 CPU machines.

The tasks are trivially parallelizable and the `run_prod.sh` script can be easily modified to e.g. submit a job to a cluster instead, or to use OS-level parallelism (e.g., by running jobs in the background with `&`).
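
A minimal sketch of the OS-level approach is shown below: it starts each command in the background with `&` and then waits for all of them. `run_in_background` is a hypothetical helper, not part of the repository, and the commands passed to it are placeholders; how `run_prod.sh` is invoked is not shown here.

```bash
# Run each argument as a separate shell command in the background,
# then wait for all of them to finish.
# The exit status is the number of commands that failed.
run_in_background() {
  local pids="" failed=0 cmd pid
  for cmd in "$@"; do
    sh -c "$cmd" &          # start the job without blocking
    pids="$pids $!"         # remember its process ID
  done
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  return "$failed"
}
```

Each argument is one complete command line, so every task would be passed as its own quoted string. Keep the number of simultaneous jobs in line with the available CPU cores, since this sketch starts all of them at once.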

Note also that `hyps_chunk_size` and `seeds_chunk_size` in `experiment_conditions.py` control how many tasks are packed together; adjust them to suit your needs.
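
To get a feel for how the chunk sizes affect the number of jobs, the arithmetic can be sketched as below. This rests on an assumption that is not confirmed by this README: that one job is created per pair of hyperparameter chunk and seed chunk. The helper names and the numbers in the example are hypothetical.

```bash
# Ceiling division: chunks needed to hold <items> in groups of <chunk_size>.
num_chunks() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# ASSUMPTION: one job per (hyperparameter chunk, seed chunk) pair.
# Arguments: <num_hyps> <hyps_chunk_size> <num_seeds> <seeds_chunk_size>
num_jobs() {
  echo $(( $(num_chunks "$1" "$2") * $(num_chunks "$3" "$4") ))
}
```

Under that assumption, 10 hyperparameter combinations in chunks of 4 and 5 seeds in chunks of 5 give `num_jobs 10 4 5 5`, which prints `3`; chunk sizes of 1 (`num_jobs 10 1 5 1`) give `50`.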

If your computational budget is constrained, consider e.g. selecting only the smallest graphs, such as `Aconet`, and running with less synthetic training data.  

## Bundled notebook service
There is a Jupyter notebook server running on the `manager` node at `http://localhost:8888`.

The first time Jupyter is accessed, it will prompt for a token to enable password configuration. The token can be obtained by running `docker exec -it relnet-manager /bin/bash -c "jupyter notebook list"`.
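
If only the token itself is wanted, it can be extracted from that output. The sketch below assumes the usual `jupyter notebook list` format, in which each server is listed as a URL containing `?token=...`; `extract_jupyter_token` is a hypothetical helper, not part of the repository.

```bash
# Print the token from the first server URL found on standard input.
# ASSUMPTION: the URL has the form http://host:port/?token=<token>
extract_jupyter_token() {
  sed -n 's/.*[?&]token=\([^ &]*\).*/\1/p' | head -n 1
}
```

It can be combined with the command above by dropping the `-it` flags, which are not needed when the output is piped: `docker exec relnet-manager /bin/bash -c "jupyter notebook list" | extract_jupyter_token`.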

## Accessing experiment data
Experiment data and results are under your configured `$RN_EXPERIMENT_DATA_DIR`.
Some functionality is provided in `relnet/evaluation/storage.py` to insert and retrieve data; you can use it in e.g. analysis notebooks.

## Reproducing the results
Jupyter notebooks are used to perform the data analysis and produce tables and figures. Navigate to `http://localhost:8888`, then open the `notebooks` folder.

All tables and result figures can be obtained by opening the `FlowGNN_Evaluation.ipynb` and `FlowGNN_Topology_vs_Predictability.ipynb` notebooks, selecting the `py3-relnet` kernel, and running all cells. The resulting .pdf figures and .tex tables can be found at `$RN_EXPERIMENT_DIR/aggregate`.

There is an additional notebook (`FlowGNN_Hyperparam_Optimisation.ipynb`) provided for analyzing the results of hyperparameter optimization.
 
### Problems with the Jupyter kernel
If the `py3-relnet` kernel is not found, try reinstalling it by running `docker exec -it -u 0 relnet-manager /bin/bash -c "source activate relnet-cenv; python -m ipykernel install --user --name relnet --display-name py3-relnet"`.