# Welcome to CLaSMO!

This repository contains the implementation of CLaSMO, along with an executable example.
The repository contents are:

- lsbo_cvae_clasmo_tdc_oracles.py -> LSBO code for the TDC oracles
- lsbo_cvae_clasmo_run.sh -> runs lsbo_cvae_clasmo_tdc_oracles.py for a given oracle name
- clasmo_only_qed.py -> implements the proposed CLaSMO algorithm and runs the QED optimization task
- clasmo_input_data.csv -> input scaffolds used by the CLaSMO algorithm
- clasmo_results.csv -> example results obtained by running clasmo_qed.py on clasmo_input_data.csv
- settings_cvae.yml -> model architecture settings
- clasmo_inputs/
    - E_ld_2_beta_1e-06.pt & D_ld_2_beta_1e-06.pt -> trained encoder and decoder of the CVAE
    - embeddings_encoder.pt -> trained encoder for the conditional embeddings
    - minmaxscaler.joblib -> scaler
    - gp_cvae_y_sorted_ld_2_beta_1e-06_target_property_qed.csv -> GP training data (y_delta, i.e. the reward)
    - gp_cvae_selected_atom_ids_ld_2_beta_1e-06_target_property_qed.csv -> GP training data (selected atom ids)
    - gp_cvae_x_selected_ld_2_beta_1e-06_target_property_qed.pt -> GP training data (latent vectors)
    - ring_fragments_neighbors_qm9_enriched.txt -> dataset prepared using our BRICS-based approach
- clasmo_cvae_data_preparation_notebook.ipynb -> demonstrates the data preparation step, using the 0SelectedSMILES_QM9.txt file (QM9 dataset) as input


## Environment Setup
To run the experiment, set up a Conda environment with Python 3.7.16 and install the required dependencies from requirements.txt.

Create and activate the Conda environment:

> conda create -n clasmo python=3.7.16

> conda activate clasmo

Install dependencies:

> pip install -r requirements.txt

## Running the Experiment

After setting up the environment, use the following command to run the experiment:

> python clasmo_qed.py

This runs the scaffold optimization experiments for the QED task. The script prints each improvement in QED as it is found and saves the results to clasmo_results_new_run.csv. The printed output looks like this:

```
CLaSMO is running for input scaffold number 0
**** LSBO STEP 0 ****
**** LSBO STEP 1 ****
**** LSBO STEP 2 ****
y_delta is 0.027989904785057695 at CLaSMO step 2 for input molecule 0, QED is improved to 0.8495 from 0.8215.
**** LSBO STEP 3 ****
**** LSBO STEP 4 ****
**** LSBO STEP 5 ****
**** LSBO STEP 6 ****
y_delta is 0.02996707160949008 at CLaSMO step 6 for input molecule 0, QED is improved to 0.8795 from 0.8215.
**** LSBO STEP 7 ****
**** LSBO STEP 8 ****
**** LSBO STEP 9 ****
**** LSBO STEP 10 ****
**** LSBO STEP 11 ****
y_delta is 0.015820974854680925 at CLaSMO step 11 for input molecule 0, QED is improved to 0.8953 from 0.8215.
**** LSBO STEP 12 ****
**** LSBO STEP 13 ****
**** LSBO STEP 14 ****
**** LSBO STEP 15 ****
**** LSBO STEP 16 ****
y_delta is 0.01535360578812428 at CLaSMO step 16 for input molecule 0, QED is improved to 0.9106 from 0.8215.
**** LSBO STEP 17 ****
**** LSBO STEP 18 ****
**** LSBO STEP 19 ****
y_delta is 0.006981167071848482 at CLaSMO step 19 for input molecule 0, QED is improved to 0.9176 from 0.8215.
```
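If you redirect the printed output to a file, the improvement lines can be summarized with a few lines of Python. The snippet below is a minimal sketch that uses only the standard library. The regular expression is inferred from the sample output above and is an assumption, not part of the repository; `parse_improvements` is a hypothetical helper name. Adjust the pattern if your messages differ.

```python
import re

# Pattern inferred from the sample output above (an assumption, not taken
# from the CLaSMO source code).
IMPROVEMENT_RE = re.compile(
    r"y_delta is (?P<y_delta>-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?) "
    r"at CLaSMO step (?P<step>\d+) "
    r"for input molecule (?P<mol>\d+), "
    r"QED is improved to (?P<new>\d+\.\d+) from (?P<start>\d+\.\d+)\.?$"
)


def parse_improvements(log_text):
    """Return one dict per improvement line found in log_text."""
    records = []
    for line in log_text.splitlines():
        match = IMPROVEMENT_RE.search(line.strip())
        if match:
            records.append({
                "molecule": int(match["mol"]),
                "step": int(match["step"]),
                "y_delta": float(match["y_delta"]),
                "qed": float(match["new"]),
                "qed_start": float(match["start"]),
            })
    return records


# Two improvement lines copied from the sample output above.
sample = """\
CLaSMO is running for input scaffold number 0
**** LSBO STEP 2 ****
y_delta is 0.027989904785057695 at CLaSMO step 2 for input molecule 0, QED is improved to 0.8495 from 0.8215.
**** LSBO STEP 6 ****
y_delta is 0.02996707160949008 at CLaSMO step 6 for input molecule 0, QED is improved to 0.8795 from 0.8215.
"""

records = parse_improvements(sample)
last = records[-1]
total_gain = last["qed"] - last["qed_start"]
print(f"molecule {last['molecule']}: {len(records)} improvements, "
      f"total QED gain {total_gain:.4f}")
```

In the sample output, each y_delta appears to be the gain over the previous best QED rather than over the starting value, so the y_delta values add up to roughly the total gain. This reading is based on the printed numbers only.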


