### Env Setup

Set up Anaconda:
```bash
wget https://repo.continuum.io/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b
# In batch mode (-b) the installer defaults to $HOME/anaconda3 unless -p is given.
export PATH="$HOME/anaconda3/bin:$PATH"
```
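Before creating the environment, it can help to confirm that the `conda` executable is actually found on `PATH`. The sketch below is one way to check; `require_command` is a hypothetical helper name and is not part of Anaconda or this repository.

```bash
# Hypothetical helper: succeed only if the given command is found on PATH.
require_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (check the PATH export)" >&2
        return 1
    fi
}
```

Calling `require_command conda` right after the `export PATH=...` line prints `found: conda` when the install location matches; a `missing` message suggests the installer used a different prefix.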

Create a conda environment and install the packages in it:
```bash
conda create -n PC3D python=3.7
conda activate PC3D
conda install -y -c rdkit rdkit
conda install -y -c pytorch pytorch=1.7.0
conda install -y numpy networkx scikit-learn

pip install e3fp==1.2.1 msgpack==1.0.0 ipykernel==5.3.0 einops
pip install git+https://github.com/bp-kelley/descriptastorus

export TORCH=1.7.0
export CUDA=cu110
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install torch-geometric
```
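The four extension packages all pull from the same wheel index, whose URL depends on `TORCH` and `CUDA`. A minimal sketch that builds the URL once and loops over the packages is shown below. The function names are hypothetical, and the `cu110` tag only fits machines that need the CUDA 11.0 wheels; other setups would need a different tag.

```bash
# Hypothetical helper: build the wheel index URL from a torch version and a CUDA tag.
pyg_wheel_url() {
    local torch_version="$1" cuda_tag="$2"
    printf 'https://pytorch-geometric.com/whl/torch-%s+%s.html\n' "$torch_version" "$cuda_tag"
}

# Hypothetical helper: install the four extension packages, then torch-geometric.
install_pyg() {
    local url pkg
    url="$(pyg_wheel_url "$1" "$2")"
    for pkg in torch-scatter torch-sparse torch-cluster torch-spline-conv; do
        pip install "$pkg" -f "$url" || return 1
    done
    pip install torch-geometric
}
```

With these defined, `install_pyg 1.7.0 cu110` runs the same five `pip install` commands as above, stopping at the first failure.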


### Dataset Download

First, download the GEOM dataset. Start by creating its directories:

```bash
cd datasets
mkdir -p GEOM/raw
mkdir -p GEOM/processed
```

+ GEOM: [Paper](https://arxiv.org/pdf/2006.05531v3.pdf), [GitHub](https://github.com/learningmatter-mit/geom)
+ Data Download:
    + [Not Used] [Drug Crude](https://dataverse.harvard.edu/api/access/datafile/4360331),
      [Drug Featurized](https://dataverse.harvard.edu/api/access/datafile/4327295),
      [QM9 Crude](https://dataverse.harvard.edu/api/access/datafile/4327190),
      [QM9 Featurized](https://dataverse.harvard.edu/api/access/datafile/4327191)

    + [Mainly Used] [RDKit Folder](https://dataverse.harvard.edu/api/access/datafile/4327252)
    ```bash
    wget https://dataverse.harvard.edu/api/access/datafile/4327252
    mv 4327252 rdkit_folder.tar.gz
    tar -xvf rdkit_folder.tar.gz
    ```

+ Chem Dataset
```bash
wget http://snap.stanford.edu/gnn-pretrain/data/chem_dataset.zip
unzip chem_dataset.zip
mv dataset molecule_datasets
```
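Since the later steps depend on these downloads, a quick existence check can catch a failed `wget` or `unzip` early. The following is only a sketch: `check_paths` is a hypothetical helper, and the paths you pass to it depend on the directory in which you ran the commands above.

```bash
# Hypothetical helper: report, for each argument, whether the path exists.
# Returns non-zero if at least one path is missing.
check_paths() {
    local missing=0 path
    for path in "$@"; do
        if [ -e "$path" ]; then
            echo "ok       $path"
        else
            echo "MISSING  $path"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_paths GEOM/raw GEOM/processed rdkit_folder.tar.gz molecule_datasets` lists which of the items created above are present relative to the current directory.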


### Dataset Generation

```bash
cd src
python dataset_preparation.py --n_mol=50000 --n_conf=5 --n_upper=1000
```

### Run GraphMVP
```bash
cd src
python pretrain_GraphMVP.py --dataset=GEOM_3D_nmol50000_nconf1_nupper1000
```
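The value passed to `--dataset` looks as if it encodes the three generation parameters (`n_mol`, `n_conf`, `n_upper`). That pattern is inferred from the name alone and is not confirmed here, so treat the hypothetical `geom_dataset_name` helper below as a sketch and compare its output with the folder that `dataset_preparation.py` actually creates.

```bash
# Hypothetical helper: assemble a dataset name from the three generation parameters.
# Assumes the GEOM_3D_nmol<N>_nconf<C>_nupper<U> pattern seen in the command above.
geom_dataset_name() {
    local n_mol="$1" n_conf="$2" n_upper="$3"
    printf 'GEOM_3D_nmol%s_nconf%s_nupper%s\n' "$n_mol" "$n_conf" "$n_upper"
}
```

For example, `geom_dataset_name 50000 1 1000` prints `GEOM_3D_nmol50000_nconf1_nupper1000`, the name used in the pre-training command.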

### Run Fine-tuning Models
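```bash
cd src
# Placeholder: set this to the name of your pre-training run (a folder under ../output/).
# The original "$pre-trained_mode" would expand the unset variable $pre and yield "-trained_mode".
mode="pre-trained_mode"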
export dataset_list=(tox21 toxcast clintox bbbp sider muv hiv bace)
export seed=0

for dataset in "${dataset_list[@]}"; do
    export folder="$mode"/"$seed"
    mkdir -p ../output/"$folder"
    mkdir -p ../output/"$folder"/"$dataset"

    export output_path=../output/"$folder"/"$dataset".out
    export input_model_file=../output/"$mode"/pretraining_model.pth

    python molecule_finetune.py \
    --dataset="$dataset" --runseed="$seed" --eval_train \
    --input_model_file="$input_model_file" > "$output_path"

done
```
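The loop above uses a single seed. If you want to repeat the runs over several seeds, it may help to list the planned runs and their log paths before launching anything. The sketch below does only that; both function names are hypothetical, and the path layout simply mirrors the `output_path` variable from the loop.

```bash
# Hypothetical helper: log path for one run, mirroring the layout used in the loop above.
finetune_log_path() {
    local mode="$1" seed="$2" dataset="$3"
    printf '../output/%s/%s/%s.out\n' "$mode" "$seed" "$dataset"
}

# Print one "dataset seed log_path" line per planned run, without launching anything.
# Usage: list_finetune_runs <mode> <seed> [<seed> ...]
list_finetune_runs() {
    local mode="$1" seed dataset
    shift
    for seed in "$@"; do
        for dataset in tox21 toxcast clintox bbbp sider muv hiv bace; do
            echo "$dataset $seed $(finetune_log_path "$mode" "$seed" "$dataset")"
        done
    done
}
```

For instance, `list_finetune_runs "$mode" 0 1 2` prints one line for each of the eight datasets under each of the three seeds; wrapping the original loop in an outer `for seed in 0 1 2; do ... done` would then launch those runs.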
