# Molecular Generation based on **Mi**ned **C**onnection-**a**ware **M**otifs (**MiCaM**)

## Note

### 1.
We previously called our model **C**onnection **Q**uery VAE (CQ-VAE), and some names in the code have not yet been updated.

### 2.
The code is not yet well organized. We will update the repository once it has been cleaned up.

## Workflow

### 1. Mining connection-aware motifs

This step consists of two phases: merging operation learning and motif vocabulary construction.

For merging operation learning, run a command of the form

```
python src/merging_operation_learning.py \
    --train_file debug/train.smiles \
    --preprocess_dir preprocessed/debug \
    --operation_path operation.txt \
    --num_iters 100 \
    --min_frequency 1 \
    --num_workers 6
```

For motif vocabulary construction, run a command of the form

```
python src/motif_vocab_construction.py \
    --train_file debug/train.smiles \
    --preprocess_dir preprocessed/debug \
    --vocab_path vocab.txt \
    --operation_path operation.txt \
    --num_operations 50
```
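
Both mining commands share `--train_file`, `--preprocess_dir`, and `--operation_path`, so it can help to build them from a single set of values. The following is a minimal sketch and not part of this repository: `build_mining_commands` and `run_mining` are hypothetical helper names, the defaults simply mirror the example values above, and the script is assumed to be launched from the repository root.

```python
import subprocess
import sys


def build_mining_commands(train_file, preprocess_dir, operation_path, vocab_path,
                          num_iters=100, min_frequency=1, num_workers=6,
                          num_operations=50):
    """Return the argument lists for the two mining phases (hypothetical helper)."""
    # Flags common to both scripts, so the paths cannot drift apart.
    shared = [
        "--train_file", train_file,
        "--preprocess_dir", preprocess_dir,
        "--operation_path", operation_path,
    ]
    learn = [
        sys.executable, "src/merging_operation_learning.py", *shared,
        "--num_iters", str(num_iters),
        "--min_frequency", str(min_frequency),
        "--num_workers", str(num_workers),
    ]
    vocab = [
        sys.executable, "src/motif_vocab_construction.py", *shared,
        "--vocab_path", vocab_path,
        "--num_operations", str(num_operations),
    ]
    return [learn, vocab]


def run_mining(commands):
    """Run the phases in order; check=True stops at the first failing phase."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Dry run: print the commands instead of executing them.
commands = build_mining_commands(
    "debug/train.smiles", "preprocessed/debug", "operation.txt", "vocab.txt"
)
for cmd in commands:
    print(" ".join(cmd))
```

Calling `run_mining(commands)` would then execute merging operation learning followed by motif vocabulary construction.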

### 2. Preprocess

To generate training data using a given motif vocabulary, run a command of the form

```
python src/generate_training_data.py \
    --train_file debug/train.smiles \
    --valid_file debug/valid.smiles \
    --preprocess_dir preprocessed/debug \
    --operation_path operation.txt \
    --vocab_path vocab.txt \
    --batch_size 128
```
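
Since the same SMILES paths are repeated across every step, a mistyped path is easy to introduce. The sketch below is an optional pre-flight check and not part of this repository: `missing_inputs` is a hypothetical helper. It only checks `--train_file` and `--valid_file`, because this README does not state how `operation.txt` and `vocab.txt` are resolved relative to `--preprocess_dir`.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the paths that do not point to an existing regular file."""
    return [p for p in paths if not Path(p).is_file()]


# Example values taken from the command above; adjust to your dataset.
missing = missing_inputs(["debug/train.smiles", "debug/valid.smiles"])
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```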

Alternatively, to run the entire preprocessing workflow, which includes GFE and training data generation, run a command of the form

```
python src/preprocess.py \
    --train_file debug/train.smiles \
    --valid_file debug/valid.smiles \
    --preprocess_dir preprocessed/debug \
    --operation_path operation.txt \
    --num_iters 100 \
    --min_frequency 1 \
    --vocab_path vocab.txt \
    --num_operations 50 \
    --batch_size 128
```

### 3. Training **MiCaM**

To train the MiCaM model, run a command of the form

```
python src/train.py \
    --job_name train \
    --preprocess_dir preprocessed/debug \
    --model_save_dir ckpt/debug \
    --train_file debug/train.smiles \
    --valid_file debug/valid.smiles \
    --batch_size 128 \
    --operation_path operation.txt \
    --vocab_path vocab.txt \
    --epoch 5 \
    --lr 0.005 \
    --lr_anneal_iter 10 \
    --lr_anneal_rate 0.99 \
    --beta_warmup 10 \
    --beta_min 0.001 \
    --beta_max 0.1 \
    --beta_anneal_period 100 \
    --prop_weight 0.2 \
    --greedy
```

Benchmarking is conducted automatically during training.
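
The flag names `--lr`, `--lr_anneal_iter`, and `--lr_anneal_rate` suggest a step-wise exponential decay of the learning rate. The sketch below shows one plausible reading, in which the rate is multiplied by `lr_anneal_rate` once every `lr_anneal_iter` training steps. This interpretation is an assumption and is not taken from `src/train.py`; `annealed_lr` is a hypothetical function used only for illustration.

```python
def annealed_lr(step, base_lr=0.005, anneal_iter=10, anneal_rate=0.99):
    """Learning rate at a given step under the ASSUMED step-wise decay."""
    # Integer division counts how many decay events have happened so far.
    return base_lr * anneal_rate ** (step // anneal_iter)


# Print the assumed learning rate at a few steps for the example settings.
for step in (0, 10, 100, 1000):
    print(step, annealed_lr(step))
```

If the training script behaves differently, treat its source as authoritative.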



