# Hyperparameters and Experiment Launching

Given the same hyperparameters, the experiments should be reasonably deterministic. This has not been fully tested, however: although random seeds are set, other sources of non-determinism remain, such as cuDNN and scatter operations on the GPU. Regardless, we supply the hyperparameters used in our experiments.
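
As a rough illustration of the kind of seeding discussed above, here is a minimal sketch. The helper name `seed_everything` is hypothetical and is not taken from this repository. NumPy and PyTorch are treated as optional so the snippet runs without them. Even with these settings, GPU scatter operations may still be non-deterministic.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed whichever RNGs are available (hypothetical helper, not repo code)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the PyTorch random number generators
        # Ask cuDNN for deterministic kernels and disable its autotuner.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


seed_everything(0)
first = random.random()
seed_everything(0)
second = random.random()
print(first == second)
```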

## Launching Runs

Runs are launched by invoking `main.py`, which takes three positional arguments:

1. Experiment output directory
2. Model
3. Dataset

The most useful optional arguments are:

1. `--hparams` - trains final models using the dictionary of hyperparameters supplied on the command line; the string you pass is `eval`-ed into a dictionary.
2. `--final-runs` - sets the number of final training runs; defaults to the per-experiment default.
3. `--hidden` - sets the hidden dimension.
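
Since the `--hparams` string is evaluated into a dictionary, it must be a valid Python dictionary literal. The sketch below shows one way to produce such a string and check that it round-trips. It uses `ast.literal_eval` purely as a safe stand-in for the `eval` call described above; it is not a statement about how `main.py` is implemented.

```python
import ast

# Values copied from the ZINC EGC-S row in the tables below.
hparams = {"lr": 0.00278434576243951, "batch_size": 64, "wd": 0.00015614444389379077}

# repr() yields a string that evaluates back to the same dictionary.
hparams_arg = repr(hparams)

# literal_eval only accepts Python literals, so it cannot run arbitrary code.
parsed = ast.literal_eval(hparams_arg)
print(parsed == hparams)
```

The resulting string uses single quotes internally, so wrapping it in double quotes is usually the simplest way to pass it through a shell.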

For EGC experiments, use the `--egc-num-bases` and `--egc-num-heads` options to set the number of bases and heads. The `aggrs` option selects the aggregators, passed as a comma-separated list; choices include `symadd`, `add`, `mean`, `min`, `max`, `std` and `var`.
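
The arguments described above can be assembled programmatically. The following is a sketch only: `build_command` is a hypothetical helper, `MODEL_NAME` and `DATASET_NAME` are placeholders because the accepted values are not listed here, and spelling the aggregator option as `--aggrs` is an assumption based on the other flags.

```python
import shlex


def build_command(out_dir, model, dataset, *, hparams=None, hidden=None,
                  final_runs=None, num_bases=None, num_heads=None, aggrs=None):
    """Return an argv list for main.py (hypothetical helper)."""
    cmd = ["python", "main.py", out_dir, model, dataset]
    if hidden is not None:
        cmd += ["--hidden", str(hidden)]
    if num_bases is not None:
        cmd += ["--egc-num-bases", str(num_bases)]
    if num_heads is not None:
        cmd += ["--egc-num-heads", str(num_heads)]
    if aggrs:
        # ASSUMPTION: the aggregator option is spelled --aggrs.
        cmd += ["--aggrs", ",".join(aggrs)]
    if final_runs is not None:
        cmd += ["--final-runs", str(final_runs)]
    if hparams is not None:
        # The dictionary is passed as its string representation.
        cmd += ["--hparams", repr(hparams)]
    return cmd


# Settings copied from the ZINC EGC-M row; model and dataset are placeholders.
cmd = build_command(
    "runs/example", "MODEL_NAME", "DATASET_NAME",
    hidden=124, num_bases=4, num_heads=4, aggrs=["add", "std", "max"],
    hparams={"lr": 0.0019099809690277627, "batch_size": 64,
             "wd": 0.00020407622034162426},
)
print(shlex.join(cmd))  # shell-quoted command line
```

Building the command as a list and quoting it with `shlex.join` avoids manual escaping of the dictionary string.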


## Infrastructure Used

We ran our experiments on NVIDIA GPUs, primarily 1080 Tis and 2080 Tis, with additional experiments on V100s and an RTX 8000.
Each machine had an Intel Xeon CPU (models varied).

## Hyperparameters

### Zinc

Tune over:

1. Learning rate [0.0001, 0.01]
2. Batch size (64, 128)
3. Weight decay [0.0001, 0.001]

Random search (50 samples) + hyperband pruning
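
To make the search space concrete, here is a minimal sketch of drawing 50 candidates from the ranges listed above. The sampling distributions are not stated in this document, so log-uniform sampling for the learning rate and weight decay is an assumption, and `sample_config` is a hypothetical helper. Hyperband pruning is not shown.

```python
import math
import random


def sample_config(rng):
    """Draw one candidate from the search space above (hypothetical helper)."""
    def log_uniform(low, high):
        # ASSUMPTION: values are sampled uniformly in log space.
        return math.exp(rng.uniform(math.log(low), math.log(high)))

    return {
        "lr": log_uniform(1e-4, 1e-2),
        "batch_size": rng.choice([64, 128]),
        "wd": log_uniform(1e-4, 1e-3),
    }


rng = random.Random(0)  # seeded here only so the sketch is repeatable
candidates = [sample_config(rng) for _ in range(50)]
print(len(candidates))
```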

| Model | Hidden | Bases | Heads | Aggregators |                           Hparams                            |
| :---: | :----: | :---: | :---: | :---------: | :----------------------------------------------------------: |
| EGC-S |  168   |   4   |   8   |   symadd    | {'lr': 0.00278434576243951, 'batch_size': 64, 'wd': 0.00015614444389379077} |
| EGC-M |  124   |   4   |   4   | add,std,max | {'lr': 0.0019099809690277627, 'batch_size': 64, 'wd': 0.00020407622034162426} |

### CIFAR

Tune over:

1. Learning rate [0.0001, 0.01]
2. Batch size (32, 64)
3. Weight decay [0.0001, 0.001]
4. Dropout [0.0, 0.5]

Random search (50 samples) + hyperband pruning

| Model | Hidden | Bases | Heads | Aggregators    | Hparams                                                      |
| ----- | ------ | ----- | ----- | -------------- | ------------------------------------------------------------ |
| EGC-S | 168    | 4     | 8     | symadd         | {'lr': 0.0012354800908953303, 'batch_size': 32, 'wd': 0.000453476392621599, 'dropout': 0.13094687106367725}  |
| EGC-M | 128    | 4     | 4     | symadd,std,max | {'lr': 0.0009263869626947979, 'batch_size': 32, 'wd': 0.0007592290244995363, 'dropout': 0.08118925150158363} |

### MolHIV

Tune over:

1. Learning rate [0.0001, 0.01]
2. Batch size (32, 64)
3. Weight decay [0.0001, 0.001]
4. Dropout [0.0, 0.2]

Grid search (5 LR, 2 BS, 2 weight decay, 2 dropout) + hyperband to prune trials
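
The grid described above can be enumerated as follows. This is a sketch under assumptions: the exact grid points are not stated, so the five learning rates are taken to be log-spaced across the range and the two-valued settings are taken to be the range endpoints. Both choices are consistent with the values that appear in the table below.

```python
import itertools

# ASSUMPTION: five log-spaced learning rates from 1e-4 to 1e-2.
learning_rates = [10 ** (-4 + 0.5 * i) for i in range(5)]
batch_sizes = [32, 64]
# ASSUMPTION: the endpoints of each range are the two grid values.
weight_decays = [1e-4, 1e-3]
dropouts = [0.0, 0.2]

grid = [
    {"lr": lr, "batch_size": bs, "wd": wd, "dropout": p}
    for lr, bs, wd, p in itertools.product(
        learning_rates, batch_sizes, weight_decays, dropouts
    )
]
print(len(grid))  # 5 * 2 * 2 * 2 = 40 configurations
```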

Dataset version was 1.2.3

| Model    | Hidden | Bases | Heads | Aggregators  | Hparams                                                      |
| -------- | ------ | ----- | ----- | ------------ | ------------------------------------------------------------ |
| GCN      | 240    |       |       |              | {'lr': 0.00031622776601683794, 'batch_size': 32, 'wd': 0.0001, 'dropout': 0.2} |
| GAT      | 240    |       |       |              | {'lr': 0.00031622776601683794, 'batch_size': 32, 'wd': 0.0001, 'dropout': 0.0} |
| GIN-eps  | 240    |       |       |              | {'lr': 0.00031622776601683794, 'batch_size': 32, 'wd': 0.0001, 'dropout': 0.2} |
| SAGE     | 180    |       |       |              | {'lr': 0.00031622776601683794, 'batch_size': 64, 'wd': 0.001, 'dropout': 0.2} |
| MPNN-Sum | 180    |       |       |              | {'lr': 0.0001, 'batch_size': 32, 'wd': 0.001, 'dropout': 0.2} |
| MPNN-Max | 180    |       |       |              | {'lr': 0.00031622776601683794, 'batch_size': 64, 'wd': 0.0001, 'dropout': 0.0} |
| EGC-S    | 296    | 4     | 8     | symadd       | {'lr': 0.0001, 'batch_size': 32, 'wd': 0.0001, 'dropout': 0.2} |
| EGC-M    | 224    | 4     | 4     | add,mean,max | {'lr': 0.0001, 'batch_size': 32, 'wd': 0.001, 'dropout': 0.2} |

PNA result taken from Corso et al.

### Arxiv

Tune over:

1. Learning rate [0.001, 0.05]
2. Weight decay [0.0001, 0.001]
3. Dropout [0.0, 0.2]

Grid search (5 LR, 2 weight decay, 2 dropout). *No pruning with hyperband*

Dataset version was 1.2.3

| Model    | Hidden | Bases | Heads | Aggregators     | Hparams                                                     |
| -------- | ------ | ----- | ----- | --------------- | ----------------------------------------------------------- |
| GCN      | 156    |       |       |                 | {'lr': 0.0023853323044733007, 'wd': 0.0001, 'dropout': 0.2} |
| GAT      | 152    |       |       |                 | {'lr': 0.0087876393444041, 'wd': 0.0001, 'dropout': 0.2}    |
| GIN-eps  | 156    |       |       |                 | {'lr': 0.0087876393444041, 'wd': 0.0001, 'dropout': 0.2}    |
| SAGE     | 115    |       |       |                 | {'lr': 0.0023853323044733007, 'wd': 0.001, 'dropout': 0.2}  |
| MPNN-Sum | 116    |       |       |                 | {'lr': 0.03237394014347626, 'wd': 0.0001, 'dropout': 0.2}   |
| MPNN-Max | 116    |       |       |                 | {'lr': 0.001, 'wd': 0.001, 'dropout': 0.2}                  |
| PNA      | 76     |       |       |                 | {'lr': 0.0036840314986403863, 'wd': 0.001, 'dropout': 0.2}  |
| EGC-S    | 184    | 4     | 8     | symadd          | {'lr': 0.005689810202763908, 'wd': 0.001, 'dropout': 0.2}   |
| EGC-M    | 136    | 4     | 4     | symadd,max,mean | {'lr': 0.0036840314986403863, 'wd': 0.001, 'dropout': 0.2}  |

### Code-V1

Tune over:

1. Learning rate [0.0001, 0.01]

Grid search over 6 learning rates.
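
The six grid points are not listed explicitly. A log-spaced grid over the stated range, sketched below, reproduces the learning rates that appear in the table up to floating-point rounding, so it is offered as a plausible reconstruction rather than the confirmed search code.

```python
# ASSUMPTION: six log-spaced learning rates from 1e-4 to 1e-2.
learning_rates = [10 ** (-4 + 2 * i / 5) for i in range(6)]

for lr in learning_rates:
    print(f"{lr:.6g}")
```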

Dataset version was 1.2.3

| Model    | Hidden | Bases | Heads | Aggregators    | Hparams                      |
| -------- | ------ | ----- | ----- | -------------- | ---------------------------- |
| GCN      | 304    |       |       |                | {'lr': 0.001584893192461114} |
| GAT      | 304    |       |       |                | {'lr': 0.000630957344480193} |
| GIN-eps  | 304    |       |       |                | {'lr': 0.003981071705534973} |
| MPNN-Sum | 292    |       |       |                | {'lr': 0.001584893192461114} |
| MPNN-Max | 292    |       |       |                | {'lr': 0.000630957344480193} |
| PNA      | 272    |       |       |                | {'lr': 0.00063096}           |
| EGC-S    | 304    | 8     | 8     | symadd         | {'lr': 0.001584893192461114} |
| EGC-M    | 300    | 4     | 4     | symadd,min,max | {'lr': 0.001584893192461114} |

### Code-V2

A new version of the code dataset was released in version 1.2.5 after label leakage was discovered in the previous version. We re-ran our experiments on this new version.

Tune over:

1. Learning rate [0.0001, 0.01]

Grid search over 6 learning rates.

Dataset version was 1.2.5

| Model    | Hidden | Bases | Heads | Aggregators    | Hparams                        |
| -------- | ------ | ----- | ----- | -------------- | ------------------------------ |
| GCN      | 304    |       |       |                | {'lr': 0.001584893192461114}   |
| GAT      | 304    |       |       |                | {'lr': 0.00025118864315095795} |
| GIN-eps  | 304    |       |       |                | {'lr': 0.001584893192461114}   |
| SAGE     | 393    |       |       |                | {'lr': 0.000630957344480193}   |
| MPNN-Sum | 292    |       |       |                | {'lr': 0.00025118864315095795} |
| MPNN-Max | 292    |       |       |                | {'lr': 0.000630957344480193}   |
| PNA      | 272    |       |       |                | {'lr': 0.00063096}             |
| EGC-S    | 304    | 8     | 8     | symadd         | {'lr': 0.000630957344480193}   |
| EGC-M    | 300    | 4     | 4     | symadd,min,max | {'lr': 0.001584893192461114}   |

## Ablation Study

**Note**: results vary between runs, and we encountered some outlier experiments where the test loss was 0.42 or higher. These were clear outliers, so we re-ran those experiments. Expect some variance if you re-run the search: during the search, neither the proposed hyperparameters nor the weight initializations are seeded. **Only the final runs are seeded**. In hindsight, this is something we would avoid in future work.

### Parameter Count (100K)

| Heads | Bases | Hidden | Result | Std   | Hyperparameters                                                                |
|-------|-------|--------|--------|-------|--------------------------------------------------------------------------------|
| 1     | 1     | 142    | 0.438  | 0.011 | {'lr': 0.003059716468710177, 'batch_size': 64, 'wd': 0.00024932487158610574}   |
| 1     | 2     | 105    | 0.387  | 0.022 | {'lr': 0.0051846898355138405, 'batch_size': 128, 'wd': 0.00010398947101972665} |
| 1     | 4     | 76     | 0.402  | 0.029 | {'lr': 0.004082569111303931, 'batch_size': 128, 'wd': 0.00021020533452335589}  |
| 1     | 8     | 54     | 0.396  | 0.02  | {'lr': 0.0015950000229818808, 'batch_size': 64, 'wd': 0.00018432566293718268}  |
| 1     | 16    | 39     | 0.397  | 0.017 | {'lr': 0.0016254924999467638, 'batch_size': 64, 'wd': 0.00012255974207700643}  |
|       |       |        |        |       |                                                                                |
| 2     | 1     | 186    | 0.425  | 0.026 | {'lr': 0.004595766114338344, 'batch_size': 128, 'wd': 0.0003406841019850422}   |
| 2     | 2     | 140    | 0.389  | 0.022 | {'lr': 0.006758330977425682, 'batch_size': 128, 'wd': 0.00011116155956381182}  |
| 2     | 4     | 104    | 0.399  | 0.019 | {'lr': 0.0012725522058221876, 'batch_size': 64, 'wd': 0.0003932087276656274}   |
| 2     | 8     | 74     | 0.391  | 0.024 | {'lr': 0.0022763496559145965, 'batch_size': 64, 'wd': 0.00030972140779290556}  |
| 2     | 16    | 52     | 0.419  | 0.011 | {'lr': 0.0009133748989945809, 'batch_size': 64, 'wd': 0.0004191994746174878}   |
|       |       |        |        |       |                                                                                |
| 4     | 1     | 232    | 0.435  | 0.008 | {'lr': 0.0012809553738776893, 'batch_size': 64, 'wd': 0.00023599032344650205}  |
| 4     | 2     | 180    | 0.378  | 0.014 | {'lr': 0.0023503165020786554, 'batch_size': 64, 'wd': 0.00021326255261347772}  |
| 4     | 4     | 136    | 0.405  | 0.019 | {'lr': 0.002280874077442256, 'batch_size': 128, 'wd': 0.00016983733932965093}  |
| 4     | 8     | 100    | 0.397  | 0.023 | {'lr': 0.002157056978525518, 'batch_size': 64, 'wd': 0.0006505102634046523}    |
| 4     | 16    | 68     | 0.4    | 0.028 | {'lr': 0.007475759228858606, 'batch_size': 128, 'wd': 0.00018261215555152014}  |
|       |       |        |        |       |                                                                                |
| 8     | 1     | 272    | 0.408  | 0.014 | {'lr': 0.003237561847876019, 'batch_size': 64, 'wd': 0.0004029760918683308}    |
| 8     | 2     | 216    | 0.388  | 0.022 | {'lr': 0.00565064174990118, 'batch_size': 64, 'wd': 0.00011144780738870096}    |
| 8     | 4     | 168    | 0.364  | 0.02  | {'lr': 0.00278434576243951, 'batch_size': 64, 'wd': 0.00015614444389379077}    |
| 8     | 8     | 120    | 0.396  | 0.029 | {'lr': 0.001406514441546532, 'batch_size': 64, 'wd': 0.00029544213504303457}   |
| 8     | 16    | 80     | 0.391  | 0.022 | {'lr': 0.0036797253125154775, 'batch_size': 128, 'wd': 0.00027104079055315436} |
|       |       |        |        |       |                                                                                |
| 16    | 1     | 288    | 0.391  | 0.015 | {'lr': 0.0061087185167932115, 'batch_size': 128, 'wd': 0.0002273184198789297}  |
| 16    | 2     | 224    | 0.378  | 0.014 | {'lr': 0.00428840518208411, 'batch_size': 64, 'wd': 0.0001061508219486589}     |
| 16    | 4     | 176    | 0.393  | 0.015 | {'lr': 0.002931923031986728, 'batch_size': 128, 'wd': 0.00013034058509380351}  |
| 16    | 8     | 112    | 0.369  | 0.023 | {'lr': 0.003643084029023136, 'batch_size': 128, 'wd': 0.00014767545119931004}  |
| 16    | 16    | 64     | 0.383  | 0.027 | {'lr': 0.0022159422474374592, 'batch_size': 64, 'wd': 0.0001148037568072897}   |

### Hidden Dimension (128)

| Heads | Bases | Result | Std   | Hyperparameters                                                                |
|-------|-------|--------|-------|--------------------------------------------------------------------------------|
| 1     | 1     | 0.457  | 0.026 | {'lr': 0.0009905628560898867, 'batch_size': 64, 'wd': 0.0003791379814624699}   |
| 1     | 2     | 0.402  | 0.029 | {'lr': 0.0033203547862751016, 'batch_size': 128, 'wd': 0.000129368683798121}   |
| 1     | 4     | 0.372  | 0.013 | {'lr': 0.0028185855054290348, 'batch_size': 64, 'wd': 0.000110241337219995}    |
| 1     | 8     | 0.395  | 0.015 | {'lr': 0.0020643416889042335, 'batch_size': 64, 'wd': 0.00024402240952518204}  |
| 1     | 16    | 0.397  | 0.023 | {'lr': 0.0004963552869961618, 'batch_size': 64, 'wd': 0.0003160754920947726}   |
| 2     | 1     | 0.436  | 0.013 | {'lr': 0.0024391466860373976, 'batch_size': 128, 'wd': 0.0003764051687247347}  |
| 2     | 2     | 0.429  | 0.009 | {'lr': 0.0016602428864439917, 'batch_size': 128, 'wd': 0.0002168869706564395}  |
| 2     | 4     | 0.413  | 0.015 | {'lr': 0.0009803643724230493, 'batch_size': 64, 'wd': 0.00011250422859596219}  |
| 2     | 8     | 0.361  | 0.012 | {'lr': 0.0033278618751334138, 'batch_size': 64, 'wd': 0.00010366478664849056}  |
| 2     | 16    | 0.389  | 0.025 | {'lr': 0.002082475573321234, 'batch_size': 128, 'wd': 0.00031036620463648827}  |
| 4     | 1     | 0.431  | 0.009 | {'lr': 0.004700535543388525, 'batch_size': 128, 'wd': 0.0001002933008418741}   |
| 4     | 2     | 0.42   | 0.015 | {'lr': 0.003702073953063512, 'batch_size': 64, 'wd': 0.0006869206027948232}    |
| 4     | 4     | 0.395  | 0.017 | {'lr': 0.003148181818571187, 'batch_size': 128, 'wd': 0.0006299893259191312}   |
| 4     | 8     | 0.388  | 0.016 | {'lr': 0.003723003115072577, 'batch_size': 64, 'wd': 0.000187218003891752}     |
| 4     | 16    | 0.396  | 0.023 | {'lr': 0.0012547863658416598, 'batch_size': 128, 'wd': 0.00018530057376373087} |
| 8     | 1     | 0.421  | 0.025 | {'lr': 0.008110031010018465, 'batch_size': 64, 'wd': 0.00010421914838623193}   |
| 8     | 2     | 0.4    | 0.021 | {'lr': 0.007920837940362318, 'batch_size': 64, 'wd': 0.000113779504056895}     |
| 8     | 4     | 0.395  | 0.006 | {'lr': 0.008610092880667053, 'batch_size': 128, 'wd': 0.00010134943833468606}  |
| 8     | 8     | 0.383  | 0.009 | {'lr': 0.006047352685362815, 'batch_size': 128, 'wd': 0.00021038377802130008}  |
| 8     | 16    | 0.383  | 0.025 | {'lr': 0.0013731703674031866, 'batch_size': 64, 'wd': 0.0002327872787400411}   |
| 16    | 1     | 0.419  | 0.01  | {'lr': 0.005305930382865731, 'batch_size': 64, 'wd': 0.00043968261225748716}   |
| 16    | 2     | 0.401  | 0.019 | {'lr': 0.00257005942283963, 'batch_size': 64, 'wd': 0.0006895449848152606}     |
| 16    | 4     | 0.395  | 0.018 | {'lr': 0.00213094731168947, 'batch_size': 64, 'wd': 0.000862134262819252}      |
| 16    | 8     | 0.382  | 0.019 | {'lr': 0.004052115476867187, 'batch_size': 64, 'wd': 0.00014627932774578965}   |
| 16    | 16    | 0.359  | 0.02  | {'lr': 0.002873665274991742, 'batch_size': 64, 'wd': 0.00013305870959268287}   |
