# Note on anonymization
**We guarantee that all author-identifying information has been removed from this repository to ensure anonymity.** Please note that this repository *includes* code copied from existing public repositories (e.g., EquiformerV2, E3NN, OCP models) for user convenience, and these public repositories do contain information about their respective authors (e.g., in their licenses). **Such information is not to be interpreted as relating to the authors of this new submission.**

# Overview
This repository contains all the training and inference code for ShEPhERD. 

To facilitate reproducibility, we also include the conditioning structures used in our natural product, bioactive hit diversification, and fragment merging experiments in `conformers/`.

Model checkpoints trained on ShEPhERD-MOSES-aq are in `shepherd_chkpts/`. To stay under the file size limit, we have not included our GDB17 model checkpoints in this initial code submission. These checkpoints are available upon request and will all be released after the review period.

`RUNME_unconditional_generation_MOSESaq.ipynb` is a Jupyter notebook that demonstrates how to run ShEPhERD for unconditional generation, using our ShEPhERD model trained on the ShEPhERD-MOSES-aq dataset.

`RUNME_conditional_generation_MOSESaq.ipynb` is a Jupyter notebook that demonstrates how to run ShEPhERD on conditional tasks via inpainting, using our ShEPhERD model trained on the ShEPhERD-MOSES-aq dataset.
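
You may want to run the demos without an interactive session, for example on a remote GPU node. In that case, a notebook can be executed top to bottom with `jupyter nbconvert`, which is normally installed as a dependency of `jupyterlab`. The minimal sketch below wraps that call in a helper. The `run_notebook` function name and the `JUPYTER` override variable are illustrative choices of ours and are not part of this repository.

```shell
# Hypothetical helper (not part of this repository): execute a notebook
# non-interactively and write the resulting outputs back into the same file.
run_notebook() {
  # JUPYTER can be overridden to point at a specific jupyter executable.
  # --ExecutePreprocessor.timeout=-1 disables the per-cell timeout, so
  # long-running sampling cells are not interrupted.
  "${JUPYTER:-jupyter}" nbconvert --to notebook --execute --inplace \
    --ExecutePreprocessor.timeout=-1 "$1"
}

# Example usage (from the repository root, inside the activated environment):
# run_notebook RUNME_unconditional_generation_MOSESaq.ipynb
# run_notebook RUNME_conditional_generation_MOSESaq.ipynb
```

Because `--inplace` overwrites the notebook with its executed version, you may prefer to run the helper on a copy of the notebook.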

`train.py` is our training script. It is run from the command line by specifying a parameter file and a seed; all of our parameter files are stored in `parameters/`. For example, to re-train the P(x1,x3,x4) model on ShEPhERD-MOSES-aq, run:

`python train.py params_x1x3x4_diffusion_mosesaq_20240824 0`
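
The seed is the second command-line argument, so re-training the same configuration under several seeds only requires a loop. The sketch below assumes it is run from the repository root, where `train.py` lives, and that runs are launched one after another. `train_seeds` and the `PYTHON` override variable are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper (not part of this repository): run train.py once per
# seed for a single parameter file, stopping at the first failed run.
train_seeds() {
  local params="$1"
  shift
  for seed in "$@"; do
    # Progress messages go to stderr so stdout only carries training output.
    echo "=== training $params with seed $seed ===" >&2
    "${PYTHON:-python}" train.py "$params" "$seed" || return 1
  done
}

# Example usage (three seeds of the P(x1,x3,x4) model on ShEPhERD-MOSES-aq):
# train_seeds params_x1x3x4_diffusion_mosesaq_20240824 0 1 2
```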

Note that for this initial code submission, we have included only a small subset of the training data so that the repository is self-contained (i.e., it requires no external downloads) and stays under the 100 MB limit. These samples are in `conformers/gdb/` and `conformers/moses_aq/`. We can provide reviewers with access to our full datasets upon request, and all data will be released upon de-anonymization of this submission.


# Environment Setup

`environment.yml` specifies the conda environment used for training and running ShEPhERD. We created it with the following steps. Note that this setup may depend on your system, particularly your CUDA version.

```
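# Create the environment and set PYTHONNOUSERSITE in it, so packages in the
# user site-packages directory (~/.local) are not picked up
conda create --name shepherd python=3.8.13
source activate shepherd
conda install merv::envvar-pythonnousersite-true
source deactivate

# Re-activate so the new environment variable takes effect
source activate shepherd

# Add conda-forge as a lower-priority fallback channel (e.g., for xtb below)
conda config --append channels conda-forge

# Clear cached wheels and put temporary files in /var/tmp
pip cache purge
pip3 cache purge
export TMPDIR='/var/tmp'

# PyTorch 1.12.1 built against CUDA 11.3; adjust cudatoolkit to match your system
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit==11.3.1 -c pytorch
conda install pyg=2.2.0 -c pyg

pip install e3nn

pip install jupyterlab

# Pinned versions used in our setup
pip install pip==24.0
pip install pytorch-lightning==1.6.3
pip install setuptools==59.5.0

# Chemistry, geometry, and HDF5 I/O dependencies
pip install rdkit
conda install xtb
pip install open3d
conda install h5py

pip install numpy --upgrade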
```
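
After the steps above, a quick sanity check can confirm that the key packages import and that PyTorch can see a GPU. The sketch below is our own illustration and is not a script shipped with the repository. The `check_module` and `check_shepherd_env` names are hypothetical. The module list mirrors the packages installed above: `pyg` imports as `torch_geometric`, and `pytorch-lightning` imports as `pytorch_lightning`. Set `PYTHON` if your interpreter is not called `python`.

```shell
# Hypothetical helpers (not part of this repository) for checking the
# environment created above. Run them inside the activated environment.

# Print "<module> <version>" if the module imports, else "<module> MISSING".
check_module() {
  local py="${PYTHON:-python}"
  if "$py" -c "import $1" >/dev/null 2>&1; then
    printf '%s %s\n' "$1" "$("$py" -c "import $1; print(getattr($1, '__version__', 'unknown'))")"
  else
    printf '%s MISSING\n' "$1"
  fi
}

check_shepherd_env() {
  # Python import names of the packages installed above.
  for mod in torch torchvision torchaudio torch_geometric e3nn pytorch_lightning rdkit open3d h5py numpy; do
    check_module "$mod"
  done
  # Report whether the installed PyTorch build can use CUDA.
  "${PYTHON:-python}" -c "import torch; print('CUDA available:', torch.cuda.is_available())" 2>/dev/null \
    || echo "CUDA available: unknown (torch not importable)"
  # xtb is a command-line program, so only check that it is on PATH.
  if command -v xtb >/dev/null 2>&1; then
    echo "xtb $(command -v xtb)"
  else
    echo "xtb MISSING"
  fi
}

# Example usage:
# check_shepherd_env
```

Each line of output reports a module name followed by its `__version__` (or `MISSING`), which makes it straightforward to compare against the pinned versions above.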