# One-Shot Ligand Scoring (OSLS)

This repository contains code for the proposed one-shot ligand scoring (OSLS) algorithm. Unfortunately, no pretrained models could be provided due to upload size constraints.

## Requirements
[RDKit](https://www.rdkit.org/docs/Install.html) is required, along with the following Python packages, which can be installed with pip:
```
torch
scipy
scikit-learn
numpy
tqdm
SmilesPE
```

## Preprocessing
`preprocess_bindingdb.py` extracts and preprocesses data from BindingDB. Running `python preprocess_bindingdb.py <filename>` loads data from `<filename>`, which must be in the BindingDB TSV format, and produces a `data.pickle` file that is ready for training. For the paper, we used `BindingDB_All.tsv`, which is listed on BindingDB's [Download](https://www.bindingdb.org/rwd/bind/chemsearch/marvin/SDFdownload.jsp?all_download=yes) page and available [here](https://www.bindingdb.org/bind/downloads/BindingDB_All_2022m8.tsv.zip).
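Before starting a long training run, it can help to confirm that `data.pickle` was written and is non-empty. The sketch below is a generic sanity check, not part of this repository: `summarize_pickle` is a hypothetical helper, and it makes no assumption about the internal layout of the pickled object beyond reporting its type and, where available, its length and first few keys. Only unpickle files you produced yourself, since loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path


def summarize_pickle(path="data.pickle"):
    """Return a small summary of a pickled object (hypothetical helper).

    The structure of data.pickle is not documented here, so this only
    reports generic properties of whatever object is stored in the file.
    """
    with Path(path).open("rb") as fh:
        obj = pickle.load(fh)

    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        # Show at most ten keys so the output stays readable.
        summary["keys"] = list(obj)[:10]
    return summary


# Only inspect the file if preprocessing has already produced it.
if Path("data.pickle").exists():
    print(summarize_pickle("data.pickle"))
```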

## Training
`train.py` is the main script for training OSLS. The following parameters may be supplied: `--context_affinity_min` (default -50) and `--context_affinity_max` (default 50) set the minimum and maximum context affinity used to train the model. Both must be given as log10 of the affinity in nanomoles/liter (nM). The defaults lie beyond the lowest and highest affinities in the data, so training with the defaults covers the whole context affinity range.
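Because the training bounds are expressed in log10 of nM, it is easy to mix them up with raw nM values. The following is a minimal sketch of the conversion, not code from this repository: `nm_to_log10` and `in_training_range` are hypothetical helper names, and treating both bounds as inclusive is an assumption.

```python
import math

# Defaults documented for --context_affinity_min / --context_affinity_max.
DEFAULT_MIN_LOG10_NM = -50.0
DEFAULT_MAX_LOG10_NM = 50.0


def nm_to_log10(affinity_nm):
    """Convert an affinity in nM to log10(nM), the unit the bounds use."""
    if affinity_nm <= 0:
        raise ValueError("affinity must be a positive concentration in nM")
    return math.log10(affinity_nm)


def in_training_range(affinity_nm,
                      lo=DEFAULT_MIN_LOG10_NM,
                      hi=DEFAULT_MAX_LOG10_NM):
    """Check whether an affinity in nM falls inside [lo, hi] in log10(nM).

    Inclusive bounds are an assumption made for this sketch.
    """
    return lo <= nm_to_log10(affinity_nm) <= hi


# Example: restrict training to contexts between 1 nM and 10,000 nM.
lo = nm_to_log10(1.0)
hi = nm_to_log10(10_000.0)
print(f"python train.py --context_affinity_min {lo:g} --context_affinity_max {hi:g}")
```

With these bounds, a context affinity of 500 nM is in range, while 100,000 nM is not.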

`--run_name` optionally specifies a name for TensorBoard logging, and `--checkpoint_file` saves model checkpoints to the provided filename (default `model.pt`). TensorBoard logs from training are saved to the `./runs/` folder and include both training and testing statistics.

## Inference
`score_ligands.py` uses a trained model to perform inference on a given context and query. The following parameters must be supplied: `--context_smiles` specifies the SMILES string of the context molecule, `--context_affinity` specifies its affinity to the target in nanomoles/liter (nM), `--query_smiles` specifies the SMILES string of the query molecule, and `--model_file` specifies the path to the trained model. The script prints the predicted affinity of the query molecule to stdout, also in nM. The supplied context affinity should fall within the range that was used to train the model (note that the training bounds are given in log10 of nM, whereas this value is given in nM); otherwise the model will not function as expected.
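As an illustration, the sketch below assembles a `score_ligands.py` command line from Python using the four flags described above. `build_score_command` is a hypothetical helper, the SMILES strings (aspirin and paracetamol) and the 250 nM context affinity are placeholder values rather than real measurements, and `model.pt` assumes the default checkpoint filename from training.

```python
import shlex
import sys


def build_score_command(context_smiles, context_affinity_nm,
                        query_smiles, model_file="model.pt"):
    """Build the argument list for score_ligands.py (hypothetical helper).

    context_affinity_nm is passed in nM, as the script expects.
    """
    return [
        sys.executable, "score_ligands.py",
        "--context_smiles", context_smiles,
        "--context_affinity", str(context_affinity_nm),
        "--query_smiles", query_smiles,
        "--model_file", model_file,
    ]


# Placeholder inputs: aspirin as context, paracetamol as query.
cmd = build_score_command(
    context_smiles="CC(=O)Oc1ccccc1C(=O)O",
    context_affinity_nm=250.0,  # illustrative value, not a measurement
    query_smiles="CC(=O)Nc1ccc(O)cc1",
)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))

# To run it from Python instead (requires a trained model file):
# import subprocess
# result = subprocess.run(cmd, check=True, capture_output=True, text=True)
# print(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with SMILES strings, which often contain parentheses and other characters that shells treat specially.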