# Contrastive Learning of Molecular Representation with Fragmented Views

**ICLR 2023 submission**

<br>

## Reference
Our code is largely based on [GraphMVP](https://github.com/chao1224/GraphMVP#pre-training-molecular-graph-representation-with-3d-geometry).

## Environments
Install the required packages in a conda environment:
```bash
conda create -n GraphMVP python=3.7
conda activate GraphMVP

conda install -y -c rdkit rdkit
conda install -y -c pytorch pytorch=1.9.1
conda install -y numpy networkx scikit-learn
pip install ase
pip install git+https://github.com/bp-kelley/descriptastorus
pip install ogb
export TORCH=1.9.0
export CUDA=cu102  # cu102, cu110

wget https://data.pyg.org/whl/torch-${TORCH}%2B${CUDA}/torch_cluster-1.5.9-cp37-cp37m-linux_x86_64.whl
pip install torch_cluster-1.5.9-cp37-cp37m-linux_x86_64.whl
wget https://data.pyg.org/whl/torch-${TORCH}%2B${CUDA}/torch_scatter-2.0.9-cp37-cp37m-linux_x86_64.whl
pip install torch_scatter-2.0.9-cp37-cp37m-linux_x86_64.whl
wget https://data.pyg.org/whl/torch-${TORCH}%2B${CUDA}/torch_sparse-0.6.12-cp37-cp37m-linux_x86_64.whl
pip install torch_sparse-0.6.12-cp37-cp37m-linux_x86_64.whl
pip install torch-geometric==1.7.2
```
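
After installation, it can help to confirm that every package imports inside the activated environment. The snippet below is a minimal sketch, not part of the repository: `check_env` is a hypothetical helper name, and the module list is assumed from the install commands above.

```bash
# Hypothetical helper (not shipped with this repository): report which of the
# packages installed above can be imported by the current `python`.
check_env() {
    for mod in torch torch_geometric torch_scatter torch_sparse torch_cluster \
               rdkit ogb ase descriptastorus; do
        if python -c "import $mod" 2>/dev/null; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
        fi
    done
}

check_env
```

Any line marked `MISSING` points at an install step that should be repeated before pre-training.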

## Dataset Preprocessing

To download the datasets, please follow the instructions on [this webpage](https://github.com/chao1224/GraphMVP/datasets).

For data preprocessing (GEOM), please use the following commands:
```bash
cd src_classification
# $Your_folder is a placeholder for the directory that holds the downloaded datasets
python FragCL_preparation.py --n_mol 50000 --data_folder $Your_folder
cd ..
```

## Pre-training

Please check the following script:
- `scripts_classification/submit_pre_training_FragCL.sh`

## Fine-tuning

Please check the following script:
- `scripts_classification/submit_fine_tuning_FragCL.sh`
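
The pre-training and fine-tuning scripts listed above can be launched in sequence once preprocessing is done. The sketch below is an assumption rather than documented usage: `run_from_own_dir` is a hypothetical helper, and it assumes the scripts are plain bash scripts meant to be started from their own directory. Inspect each script first, since it may need to be adapted to your cluster or scheduler.

```bash
# Hypothetical helper (not part of the repository): run a script from the
# directory it lives in, failing early if the file does not exist.
run_from_own_dir() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "script not found: $script" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged.
    (cd "$(dirname "$script")" && bash "$(basename "$script")")
}

# Usage from the repository root (pre-train first, then fine-tune):
# run_from_own_dir scripts_classification/submit_pre_training_FragCL.sh
# run_from_own_dir scripts_classification/submit_fine_tuning_FragCL.sh
```

Running the script in a subshell means a `cd` inside it does not affect the shell you launched it from.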

## Analysis

Please check the following script:
- `output/run_analysis.py`
