# 3D_MolGNN_RL

<a href="1"><img align="center" src="3d-scaffold-gnn-rl-architecture-v3.png" width="700" height="400"></a>

3D_MolGNN_RL is a novel framework that couples reinforcement learning (RL) to a deep generative model based on 3D-Scaffold. It generates candidates specific to a protein pocket by placing atoms one at a time on a core scaffold. The RL framework provides an efficient way to optimize key features within a protein target, using a parallel graph neural network model as the critic.




<!-- ```math
R_1(s_t) = \alpha.C_{BP}(B,C_t) + \beta.C_{EA}(B,C_t) + (1 -\gamma.C_{SA}(C_t))\\


R_2(s_t) = \alpha.C_{BP}(B,C_t) + (1-\beta.C_{SA}(C_t))
``` -->

**Requirements**
* schnetpack 0.3
* pytorch >= 1.2
* python >= 3.7
* ASE >= 3.17.0
* Open Babel 2.41
* rdkit >= 2019.03.4.0

Detailed requirements are listed in `requirement.txt` and can be installed with:

```bash
pip install -r requirement.txt
```

**Generate a database:**

From inside the `dbs_script` directory, run the following command to generate the database. The input molecules must be in xyz format, and each molecule must be associated with a functional group. A sample file is provided as `sample.xyz`. It contains the molecule's xyz coordinates, and its last two lines hold the SMILES string of the molecule and of the associated functional group, respectively. The script that creates the database reads the SMILES string and the functional group from these last two lines, but it can be altered to read the inputs in whatever form is convenient for the user.

```bash
python generate_data.py <path_to_xyz_files>
```
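
The input layout described above can be sketched as a small parser. This is only an illustration, not the code used by `generate_data.py`: the function name `parse_scaffold_xyz`, the `ScaffoldRecord` type, and the example molecule with its coordinates are hypothetical, and the sketch assumes that atom lines have the form `<symbol> <x> <y> <z>` and that any header lines can be skipped.

```python
from typing import List, NamedTuple, Tuple


class ScaffoldRecord(NamedTuple):
    """Hypothetical container for one input molecule."""
    symbols: List[str]
    positions: List[Tuple[float, float, float]]
    smiles: str
    functional_group: str


def parse_scaffold_xyz(text: str) -> ScaffoldRecord:
    """Parse xyz-style text whose last two lines are SMILES strings."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3:
        raise ValueError("expected atom lines followed by two SMILES lines")

    # The last two non-empty lines: molecule SMILES, then functional group.
    *body, smiles, functional_group = lines

    symbols, positions = [], []
    for ln in body:
        parts = ln.split()
        if len(parts) < 4:
            continue  # assumed header line (atom count or comment)
        try:
            x, y, z = (float(v) for v in parts[1:4])
        except ValueError:
            continue  # not an atom line
        symbols.append(parts[0])
        positions.append((x, y, z))
    return ScaffoldRecord(symbols, positions, smiles, functional_group)


# Illustrative input only; the coordinates are not a real optimized geometry.
example = """6
hypothetical methanol
C  0.000  0.000  0.000
O  1.400  0.000  0.000
H -0.360  1.030  0.000
H -0.360 -0.510  0.890
H -0.360 -0.510 -0.890
H  1.730  0.900  0.000
CO
O
"""

record = parse_scaffold_xyz(example)
```

Here `record.smiles` holds the molecule's SMILES string and `record.functional_group` holds the functional group read from the final line.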

Move the generated database files `scaffold3D.db` and `scaffold3Dgen.db` to the `./dataset/` folder.

**Training the 3D_MolGNN agent:**


The target against which the molecules are optimized must be kept in the `pocket_dir` directory, in a sub-folder corresponding to the target of interest that contains the protein pocket. The weights of the reward terms are passed as command-line arguments to the script `train_3D_MolGNN.py`. Here we use 0.50, 0.25, and 0.25 for the target-compound binding probability, the target-compound binding affinity, and the compound's synthetic accessibility (SA) score, respectively. If `experimental_affinity_weight` is set to zero, the agent optimizes reward_2; otherwise it optimizes reward_1 (refer to figure-[1](#1)).

```bash
python ./train_3D_MolGNN.py train 3D_MolGNN ./dbs_script/dataset/ ./model --split 2000 1000 --bp_weight 0.50 --experimental_affinity_weight 0.25 --sa_weight 0.25 --logger csv --log_every_n_epochs 1 --cuda --target <target_name> --batch_size 1 --draw_random_samples 5 --features 64 --interactions 6 --max_epochs 150
```
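
How the three weights and the switch between reward_1 and reward_2 interact can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in `train_3D_MolGNN.py`: it assumes the rewards take the form `reward_1 = bp_weight * BP + experimental_affinity_weight * EA + (1 - sa_weight * SA)` and `reward_2 = bp_weight * BP + (1 - sa_weight * SA)`, and the function name `combined_reward` and the input scores are hypothetical.

```python
def combined_reward(
    bp: float,
    sa: float,
    ea: float = 0.0,
    bp_weight: float = 0.50,
    experimental_affinity_weight: float = 0.25,
    sa_weight: float = 0.25,
) -> float:
    """Combine critic scores into a scalar reward (assumed functional form).

    bp: target-compound binding probability
    ea: target-compound binding affinity
    sa: SA score of the compound (scaling is an assumption)
    """
    # Assumed SA term: a larger SA score lowers the reward.
    sa_term = 1.0 - sa_weight * sa

    if experimental_affinity_weight == 0:
        # reward_2: binding probability and SA score only.
        return bp_weight * bp + sa_term

    # reward_1: binding probability, binding affinity, and SA score.
    return bp_weight * bp + experimental_affinity_weight * ea + sa_term


# Hypothetical critic outputs for one generated compound.
reward_1 = combined_reward(bp=0.8, sa=0.4, ea=0.6)
reward_2 = combined_reward(bp=0.8, sa=0.4, ea=0.6, experimental_affinity_weight=0.0)
```

With the affinity weight set to zero, the affinity score is ignored entirely, which mirrors the reward_2 behavior described above.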


**Generate molecules:**


Once the model is trained, it can be queried to generate molecules. The model needs a functional group as input, on top of which it builds the molecules.
The following example uses a piperazine scaffold as input.

```bash
python ./train_3D_MolGNN.py generate 3D_MolGNN  ./model/ 100 --functional_group 'C1CNCCN1' --chunk_size 100 --max_length 65 --file_name scaffold
```

**Filter the generated molecules:**


After generation, invalid and redundant structures can be filtered out by running the `filter_generated.py` script as follows.

```bash
python filter_generated.py ./model/generated/
```

**Write generated molecules into xyz files:**

The generated molecules are initially stored in an ASE database. They can be converted to xyz format with the script `write_xyz_files.py`.

```bash
python write_xyz_files.py model/generated/
```

**Assess the generated molecules with the critic:**

Once the molecules are generated, they can be further evaluated with the pretrained critics to estimate the binding affinity and binding probability of the generated compounds towards the target of interest. The reward_mode must be set to either 1 or 2, matching the reward chosen for training.

```bash
python assess_generated_molecules_RL.py model/generated/xyz_files/ <out_filename> <reward_mode>
```


**References:**
1. Gebauer, N.; Gastegger, M.; Schütt, K. Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules. Advances in Neural Information Processing Systems. 2019; pp 7566–7578.

2. Joshi, R. P.; Gebauer, N.; Kumar, N.; Bontha, M. 3D-Scaffold: Deep learning framework to generate 3d coordinates of drug-like molecules with desired scaffolds. bioRxiv, 2021.

