# Data Curation for Image Captioning with Text-to-Image Generative Models

## Requirements
To install the requirements, run `conda env create -f env.yml`.

## Finetune captioning with curation
```
cd DCIC 

OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=4 --master_port=10068 \
train_caption_with_curation.py \
    --config <config_file> \
    --output_dir <output_dir> \
    --curation_option "replace_img" \
    --curation_ratio 0.05 --change_caption "loss" 
```
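- `curation_option` can be set to `remove`, `replace_cap`, or `replace_img` (the value `sd`, used in the online generation section below, synthesizes replacement images on the fly).
- `curation_ratio` is the curation ratio applied by the selected option (e.g. `0.05`).
- `change_caption` can be set to `loss` or `length`; it selects the criterion used to also change the captions of the replaced images at the same time.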
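
To make the role of `curation_ratio` more concrete, here is a minimal sketch of loss-based selection. It assumes (this README does not confirm it) that curation targets the fraction of training samples with the highest captioning loss. `select_for_curation` is a hypothetical helper and is not part of `train_caption_with_curation.py`.

```python
from typing import List, Sequence


def select_for_curation(losses: Sequence[float], curation_ratio: float) -> List[int]:
    """Return indices of the highest-loss samples to curate (hypothetical helper).

    Assumes curation_ratio is a fraction in [0, 1]; the number of selected
    samples is rounded to the nearest integer, e.g. 0.05 of 100 samples -> 5.
    """
    if not 0.0 <= curation_ratio <= 1.0:
        raise ValueError("curation_ratio must be in [0, 1]")
    k = int(round(len(losses) * curation_ratio))
    # Rank sample indices by loss, largest first, and keep the top-k.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    # Return the selected indices in dataset order.
    return sorted(ranked[:k])


if __name__ == "__main__":
    losses = [0.2, 3.1, 0.5, 2.7, 0.1, 0.9, 4.0, 0.3, 0.4, 0.6]
    # With 10 samples and ratio 0.2, the two highest-loss samples are selected.
    print(select_for_curation(losses, 0.2))  # [1, 6]
```

Whatever criterion the real script uses, the ratio only fixes *how many* samples are touched; the option (`remove`, `replace_cap`, `replace_img`) decides *what* happens to them.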

For Flickr30K and COCO, download the datasets from their original websites and set `image_root` in the corresponding config files, `/configs/caption_coco_dc.yaml` and `/configs/caption_flickr30k_dc.yaml`. We will release the synthesized images used for `replace_img` later, following community guidelines.



## Evaluate
Evaluation is run by adding the `--evaluate` argument to the finetuning command:
```
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=4 --master_port=10068 \
train_caption_with_curation.py \
    --config <config_file> \
    --output_dir <output_dir> \
    --evaluate
```
For evaluation, set the `pretrained` model path in the config file to the finetuned checkpoint.
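
If you script several evaluation runs, a small Python wrapper can assemble the launch command. This is a hypothetical convenience sketch, not part of the repository: `build_eval_command` is an illustrative name, and it only reuses the `torchrun` options and script flags that appear in this README.

```python
import shlex
from typing import List


def build_eval_command(config: str, output_dir: str,
                       nproc_per_node: int = 4, master_port: int = 10068) -> List[str]:
    """Assemble the evaluation launch as an argument list (hypothetical helper).

    Mirrors the finetuning launch with `--evaluate` appended; the checkpoint
    itself is read from the `pretrained` path in the config file.
    """
    return [
        "torchrun", "--standalone", "--nnodes=1",
        f"--nproc_per_node={nproc_per_node}",
        f"--master_port={master_port}",
        "train_caption_with_curation.py",
        "--config", config,
        "--output_dir", output_dir,
        "--evaluate",
    ]


if __name__ == "__main__":
    cmd = build_eval_command("<config_file>", "<output_dir>")
    # Print a shell-quoted version. To launch it, pass `cmd` to subprocess.run
    # with OMP_NUM_THREADS=1 set in the environment.
    print(shlex.join(cmd))
```

Building an argument list rather than a single string avoids shell-quoting problems when config or output paths contain spaces.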

## Generate synthesized images for curation online

### Set up
Clone the Stable Diffusion [repo](https://github.com/Stability-AI/stablediffusion.git). We provide a script, `sd_gen.py`, that supports batched image generation; place it in the `stablediffusion/scripts/` folder. Follow the repo guidelines to install the requirements for the SD model.
Our finetuned Stable Diffusion v1.5 model can be downloaded [here](https://drive.google.com/file/d/1O3-G60VVt9Z5qpIWKXdbv1B1J3ZEjTL-/view?usp=sharing). 
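
`sd_gen.py` itself is not reproduced here. As a rough illustration of what batched generation involves, the sketch below groups caption prompts into fixed-size batches so each generator call handles several prompts at once. `batched`, `generate_all`, and `generate_fn` are hypothetical names; `generate_fn` stands in for whatever sampling call the Stable Diffusion scripts expose and is not a real API from that repository.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def generate_all(prompts: Sequence[str], batch_size: int,
                 generate_fn: Callable[[List[str]], List[R]]) -> List[R]:
    """Run a (hypothetical) batched generator over all prompts, preserving order."""
    outputs: List[R] = []
    for batch in batched(prompts, batch_size):
        results = generate_fn(batch)
        if len(results) != len(batch):
            raise RuntimeError("generator must return one output per prompt")
        outputs.extend(results)
    return outputs


if __name__ == "__main__":
    prompts = [f"a photo of object {i}" for i in range(5)]
    # Stand-in generator: returns a placeholder string per prompt instead of an image.
    fake_generate = lambda batch: [f"<image for: {p}>" for p in batch]
    print([len(b) for b in batched(prompts, 2)])  # [2, 2, 1]
    print(len(generate_all(prompts, 2, fake_generate)))  # 5
```

Keeping outputs in prompt order makes it straightforward to pair each synthesized image with the caption it was generated from.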


### Run
```
OMP_NUM_THREADS=1 torchrun --standalone --nnodes=1 --nproc_per_node=1 --master_port=10068 \
train_caption_with_curation.py \
    --config <config_file> \
    --output_dir <output_dir> \
    --curation_option "sd" \
    --curation_ratio 0.05 \
    --change_caption "loss" \
    --distributed False \
    --concat_prompt True --add_styler True
```


## Performance
| Dataset   | Method      | B        | M        | R        | C         | S        | CS       | RCS      |
|-----------|-------------|----------|----------|----------|-----------|----------|----------|----------|
| Flickr30K | BLIP        | 37.6     | 27.2     | 57.1     | 92.8      | 20.1     | 78.6     | 81.1     |
|           | +Remove     | 38.6     | **27.4** | **57.5** | **95.8**  | 21.0     | **79.2** | 81.9     |
|           | +ReplaceCap | 37.9     | **27.4** | 57.4     | 94.5      | **21.1** | 78.9     | 81.5     |
|           | +ReplaceImg | **39.0** | 27.3     | 57.4     | 95.7      | 20.7     | 79.1     | **82.0** |
| COCO      | BLIP        | 39.9     | 30.8     | 59.9     | 132.0     | 23.8     | 77.3     | 82.8     |
|           | +Remove     | 40.1     | 30.9     | 60.0     | 132.5     | 23.6     | 77.3     | 82.8     |
|           | +ReplaceCap | **40.2** | 30.9     | **60.1** | 132.7     | **23.9** | 77.3     | 82.8     |
|           | +ReplaceImg | **40.2** | **31.0** | **60.1** | **133.1** | **23.9** | 77.3     | 82.8     |

B: BLEU-4, M: METEOR, R: ROUGE-L, C: CIDEr, S: SPICE, CS: CLIPScore, RCS: RefCLIPScore.
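
The gains over the BLIP baseline can be read directly from the table. As a small worked example, the sketch below computes the CIDEr (C column) differences using only the numbers reported above; `gains_over_baseline` is an illustrative helper, not part of the repository.

```python
# CIDEr (C column) values copied from the table above.
cider = {
    "Flickr30K": {"BLIP": 92.8, "+Remove": 95.8, "+ReplaceCap": 94.5, "+ReplaceImg": 95.7},
    "COCO": {"BLIP": 132.0, "+Remove": 132.5, "+ReplaceCap": 132.7, "+ReplaceImg": 133.1},
}


def gains_over_baseline(scores: dict, baseline: str = "BLIP") -> dict:
    """Return each method's score minus the baseline score, rounded to one decimal."""
    base = scores[baseline]
    return {m: round(v - base, 1) for m, v in scores.items() if m != baseline}


for dataset, scores in cider.items():
    print(dataset, gains_over_baseline(scores))
# Flickr30K {'+Remove': 3.0, '+ReplaceCap': 1.7, '+ReplaceImg': 2.9}
# COCO {'+Remove': 0.5, '+ReplaceCap': 0.7, '+ReplaceImg': 1.1}
```

Rounding to one decimal matches the precision of the reported scores and hides floating-point noise from the subtraction.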
