# Installation

Install the dependencies with conda by creating the environment defined in `environment.yml`:

```shell
conda env create -f environment.yml
```

# Running

To run on a cluster managed by the Slurm Workload Manager, use `submit_job.sh`, which uses Ray for multi-node execution:
```text
$ ./submit_job.sh -h
Usage: submit_job.sh [OPTION]...

Submit a slurm batch job for the given task and other options.

  -h, --help                     Print this message
  -s, --single-objective         Single-objective search (task performance only)
  -t, --task TASK                The task/dataset name for which to run the
                                 experiment (required)
  -r, --reservation RESERVATION  The reservation name, if applicable
  -n, --reservation-nodes RESERVATION_NODES
                                 The reserved number of nodes, if applicable
```
Before submitting a job, set the `ACTIVATE_PYTHON_ENV` variable in the script to the
name of your conda environment. If you are not using conda, the script needs minor
changes to activate your environment instead.

To run on a single node:
```shell
TASK=mnist
SINGLE_OBJECTIVE=false
EXPLAINABILITY_TYPE=activations
CPUS_PER_TASK=16
GPUS_PER_TASK=1
./deephyper_xnas nas "nsga2" \
    --problem "experiments.$TASK.problem.Problem" \
    --run "xnas.nas_deephyper.nas_run.run" \
    --multiobjective-explainability "true" \
    --record-mo-xai-only "$SINGLE_OBJECTIVE" \
    --explainability-type "$EXPLAINABILITY_TYPE" \
    --max-evals 16000 \
    --evaluator "ray" \
    --ray-address "auto" \
    --num-cpus-per-task $CPUS_PER_TASK \
    --num-gpus-per-task $GPUS_PER_TASK
```

## Running Introspectability as a Regularizer

The introspectability metric is implemented as a loss/regularizer. To enable it
during training, set the environment variable `INTROSPECTABILITY_AS_REGULARIZER`.
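
As a minimal sketch, the variable can be exported in the shell before launching a run. The value `1` is an assumption: the text above only says the variable must be set, so check the code if a specific value is expected.

```shell
# Assumption: any non-empty value enables the regularizer; "1" is a placeholder.
export INTROSPECTABILITY_AS_REGULARIZER=1

# Confirm that the variable is exported, i.e. visible to child processes
# such as the training run started from this shell.
printenv INTROSPECTABILITY_AS_REGULARIZER
```

Run `unset INTROSPECTABILITY_AS_REGULARIZER` to remove the variable again in the same shell session.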


# Results
When run through the `submit_job.sh` script, results are stored in the following tree structure:
```text
results/
|-- <task>/
|   |-- <timestamp>/
|   |   |-- deephyper.log (DeepHyper logging output)
|   |   |-- init_infos.json (DeepHyper problem definition)
|   |   |-- results.csv (pertinent results you are likely interested in)
|   |   |-- activations/ (activations for intra-gen Pareto-optimal solutions)
|   |   |   |-- <UID>.npz
|   |   |   `-- ...
|   |   `-- save/
|   |       |-- config/ (configuration for each instantiated model)
|   |       |   |-- <UUID>.json
|   |       |   `-- ...
|   |       |-- history/
|   |       |   |-- <UUID>.json
|   |       `-- model/ (empty)
|   |-- ...
|-- ...
```
Logs are stored in `logs_batch/`.
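
Given the layout above, the most recent run for a task can be located with a small helper. This is a sketch, not part of the repository: `latest_run` is a hypothetical function, and it assumes that the `<timestamp>` directory names sort chronologically when sorted by name.

```shell
# Print the most recent run directory under <root>/<task>/<timestamp>/.
# Hypothetical helper; assumes timestamp names sort chronologically by name.
latest_run() {
    task="$1"
    root="${2:-results}"
    # List the run directories of the task, sort by name, keep the last one.
    find "$root/$task" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | sort | tail -n 1
}

# Example: preview results.csv from the latest mnist run, if one exists.
run_dir="$(latest_run mnist)"
if [ -n "$run_dir" ]; then
    head -n 5 "$run_dir/results.csv"
fi
```

The optional second argument lets the helper point at a different results root.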

## Processing Raw Results
Raw results are provided in the `raw-results` directory and can be analyzed with
the `plotting-code.ipynb` notebook. In the first cell of the notebook, point to
the directory in which the results are stored (the file of particular interest
is `results.csv`); the cell contains examples that you can follow. Then run all
cells in the notebook. In sequence, the notebook processes the results and
produces the following analyses and figures (cell by cell):

1. Statistics about the number of evaluations and metrics
2. The phylogenetic trees of NAS evolution, if applicable
3. The objectives plotted over time
4. The Pareto fronts grouped by generation, cumulative generation, and overall
5. The objectives plotted over each generation
6. The Pareto front plotted for the results
7. The hypervolume+Pareto front plots as shown in the paper (Figure 2)
8. Pairplots of the objectives
9. The highest-accuracy solution discovered
10. Correlation plot of the objectives
11. Plots showing the generalization error, number of parameters, and training
times as a function of the objectives
12. Analysis of the motifs discovered in the Pareto front (frequencies of
varying sizes and correlation statistics with objectives, including along
the Pareto front)
13. Activation heat maps across blocks of architectures
