Welcome to our codebase, anonymous ICLR reviewer! We hope you enjoy your stay (:

To get started, we need to create a conda environment with our dependencies.

```
cd iclr_code_release
conda env create -f context_modeling.yml
conda activate fsmol
```

We now need to download the FS-Mol dataset. Thankfully, this is quite easy.

* Navigate to the FS-Mol github: https://github.com/microsoft/FS-Mol
* Find the Data section and download the FS-Mol Data `.zip` file.
* Move it to `iclr_code_release` (this folder), unzip it, and rename the extracted folder to `fsmol_datasets`.

Running `du -h fsmol_datasets/` should return something like this:

```
18M	fsmol_datasets/valid
53M	fsmol_datasets/test
4.7G	fsmol_datasets/train
4.7G	fsmol_datasets/
```
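
If you want to check the layout from Python before training, here is a minimal sketch. `missing_splits` is a hypothetical helper, not part of our release. It only checks that `train`, `valid`, and `test` exist under the data directory and are non-empty, and it makes no assumptions about the files inside them.

```python
from pathlib import Path

EXPECTED_SPLITS = ("train", "valid", "test")


def missing_splits(data_dir):
    """Return the expected split directories that are absent or empty (hypothetical helper)."""
    root = Path(data_dir)
    missing = []
    for split in EXPECTED_SPLITS:
        split_dir = root / split
        # A split counts as present only if it is a directory with at least one entry.
        if not split_dir.is_dir() or not any(split_dir.iterdir()):
            missing.append(split)
    return missing


# Usage: print(missing_splits("fsmol_datasets"))  # an empty list means all three splits are present
```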

To train the context model:

* `cd context_model`
* ```
  python context_model_pretrain.py ../fsmol_datasets --batch_sizes 256 256 256 256 256 \
    --context_lengths 256 128 64 32 16 --model_size base --save-dir v2_train --num_epochs 100 \
    --task-list-file datasets/fsmol-0.1.json --model_type ContextTransformer_v2 --cuda 0 --attention_dropout 0.2
  ```
* The first argument is the path of the data directory.
* `--save-dir v2_train` tells the code to make a directory in `context_model/` called `v2_train` and save the training
  log and the trained model there.
* We are unable to release our model weights due to the 100 MB upload limit on OpenReview; however, we will release them after the review process.
* `--cuda 0` specifies the GPU your model should run on. You can see which GPUs are available with the
  command `nvidia-htop.py` or `nvidia-smi`.
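
If you are unsure which index to pass to `--cuda`, one heuristic is to pick the GPU with the least memory in use. The sketch below is our own hypothetical helper, not part of the release. It parses the CSV output of `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits`, where each line looks like `0, 10500` (index, MiB used).

```python
import subprocess

# Real nvidia-smi query flags; output is one "index, memory.used" line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=index,memory.used", "--format=csv,noheader,nounits"]


def least_used_gpu(csv_text):
    """Return the index of the GPU with the smallest memory.used value (ties go to the first listed)."""
    best_index, best_used = None, None
    for line in csv_text.strip().splitlines():
        index_str, used_str = (field.strip() for field in line.split(","))
        used = int(used_str)
        if best_used is None or used < best_used:
            best_index, best_used = int(index_str), used
    if best_index is None:
        raise ValueError("no GPUs listed in nvidia-smi output")
    return best_index


def query_nvidia_smi():
    """Run nvidia-smi and return its CSV output (requires an NVIDIA driver)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


# Usage: print(f"--cuda {least_used_gpu(query_nvidia_smi())}")
```

Low memory use is only a rough signal of availability, since another job may start on that GPU at any time.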

To evaluate the context model on FS-Mol, run the following from the same `context_model/` directory:

```
python fs_mol/context_modeling_test.py . ../fsmol_datasets --save-dir v2_eval \
--model_type ContextTransformer_v2 --model_path 'v2_train/model_name/best_model.pt' --train-sizes [128]
```

* Unlike the training command, the data directory is the second positional argument (`../fsmol_datasets`); leave the
  first argument as `.`.
* `--save-dir v2_eval` writes the test log files to a new folder called `v2_eval`.
* `--model_path` tells the code where your trained model is.
    * You **will** need to change the path for the argument `--model_path 'v2_train/model_name/best_model.pt'`
    * Your train run **will not** be called `model_name`, but rather something else. Replace `model_name` with the name
      of the folder that your training run created in `v2_train`.
* `--train-sizes [128]` sets the context length. You can set this argument to `[8]`, `[16]`, `[32]`, `[64]`,
  or `[128]`.
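
Since the run folder inside `v2_train` gets a name you don't know in advance, you may find it convenient to look it up and generate one evaluation command per context length. The sketch below is a hypothetical helper, not part of our release. It assumes you run it from the same directory as the evaluation command, picks the most recently modified `*/best_model.pt` under `v2_train`, and reuses the flags shown above. Giving each size its own `--save-dir` (`v2_eval_8`, `v2_eval_16`, ...) is our own choice to keep the logs apart.

```python
import shlex
from pathlib import Path

TRAIN_SIZES = (8, 16, 32, 64, 128)  # the context lengths listed above


def find_best_model(save_dir="v2_train"):
    """Return the most recently modified best_model.pt one level below save_dir."""
    candidates = sorted(Path(save_dir).glob("*/best_model.pt"), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no */best_model.pt found under {save_dir}")
    return candidates[-1]


def eval_commands(model_path, data_dir="../fsmol_datasets", save_dir="v2_eval"):
    """Build one shell-quoted evaluation command per context length (hypothetical helper)."""
    commands = []
    for size in TRAIN_SIZES:
        args = [
            "python", "fs_mol/context_modeling_test.py", ".", data_dir,
            "--save-dir", f"{save_dir}_{size}",  # separate log folder per size (our choice)
            "--model_type", "ContextTransformer_v2",
            "--model_path", str(model_path),
            "--train-sizes", f"[{size}]",
        ]
        # shlex.join quotes "[8]" etc., so the shell cannot treat them as glob patterns.
        commands.append(shlex.join(args))
    return commands


# Usage:
# for cmd in eval_commands(find_best_model()):
#     print(cmd)
```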

This completes training and evaluation of the Context Model on FS-Mol, as described in our submission.

The code for training on the MoleculeNet datasets is actually less involved. However, releasing the MoleculeNet
datasets, featurized and reformulated as few-shot learning benchmarks, is tricky while preserving anonymity, since they
would need to be hosted for download on an anonymous server. Releasing these datasets is significantly easier once
anonymity no longer needs to be maintained, and we will do so after the ICLR review cycle.