# Improving Diffusion Models for Scene Text Editing with Dual Encoders
This repo contains the code for submission 11351 [Improving Diffusion Models for Scene Text Editing with Dual Encoders].

The basic structure follows [stable-diffusion](https://github.com/CompVis/stable-diffusion), which uses `pytorch-lightning` as the backbone training framework. We use `diffusers` for loading the model.

## Prepare Environment
Install the required packages listed in `requirements.txt`, for example with `pip install -r requirements.txt`.

## Prepare Data 
1. Real-world dataset: download it by running
```bash
sh down_data.sh
```
2. Synthetic dataset: install the requirements in the `synthtigergenerator` folder, then run
```bash
sh gen_synth.sh
```
Note that you may first need to download the fonts from the Google Fonts library. The font names we used are listed in `synthgenerator/resources/100fonts`.

The generated data will be written to the `ocr-dataset` folder.

## Train the model
The main script is `train.py`. You can train the model by running
```bash
python train.py --base ${config_paths} --stage fit --name ${run_name} --project ${project_name} --base_logdir ${log_directory}
```
Logs will be saved in `${log_directory}/${project_name}/${time}_${run_name}`.
An example config file is provided in the `configs` folder; it defines the hyperparameters and other settings required for training.

An example:
```bash
python train.py --base configs/config_charinpaint.yaml --stage fit --name trialrun --project DiffSTE --base_logdir logs/
```
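To make the placeholder substitution concrete, the sketch below fills in the variables with the values from the example above and prints where the logs should land. It is a dry run that only echoes the command. The `timestamp` variable stands in for `${time}`, and its format is an assumption, since the exact format used by `train.py` is not stated here.

```bash
# Values taken from the example command above.
config_paths="configs/config_charinpaint.yaml"
run_name="trialrun"
project_name="DiffSTE"
log_directory="logs"

# Stand-in for ${time}; the real format used by train.py is an assumption.
timestamp="$(date +%Y-%m-%dT%H-%M-%S)"

# Compose the log directory following ${log_directory}/${project_name}/${time}_${run_name}.
run_dir="${log_directory}/${project_name}/${timestamp}_${run_name}"
echo "Logs are expected under: ${run_dir}"

# Print the training command instead of executing it (dry run).
echo python train.py --base "${config_paths}" --stage fit \
  --name "${run_name}" --project "${project_name}" --base_logdir "${log_directory}"
```

Removing the final `echo` turns the dry run into an actual training launch.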

## Test the model
```bash
python test.py --log_dir evaluate_result/ --base configs/config_charinpaint.yaml --path ${model_path} --todo ${dataset_name}
```
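If several datasets need to be evaluated with the same checkpoint, the command can be wrapped in a loop. The following is a minimal sketch that only prints the commands; `path/to/checkpoint.ckpt`, `dataset_a`, and `dataset_b` are hypothetical placeholders and not names defined by this repo.

```bash
# Hypothetical placeholders: replace with your checkpoint and dataset names.
model_path="path/to/checkpoint.ckpt"
dataset_names="dataset_a dataset_b"

count=0
for dataset_name in ${dataset_names}; do
  # Print the evaluation command for each dataset (dry run).
  echo python test.py --log_dir evaluate_result/ \
    --base configs/config_charinpaint.yaml \
    --path "${model_path}" --todo "${dataset_name}"
  count=$((count + 1))
done
echo "Prepared ${count} evaluation command(s)."
```

Dropping the `echo` in front of `python` runs the evaluations one after another.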