# SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data


## Stage 1: Prompt Generation with an LLM

1. Follow the instructions [here](https://github.com/tatsu-lab/stanford_alpaca/tree/main) to set up the environment for prompt generation.

2. Generate prompts:

```
cd llm
CUDA_VISIBLE_DEVICES=0 python prompt_generation.py --output_dir "YOUR OUTPUT DIR" --skills "Whoops"
```

* `output_dir`: Output directory for the generated prompts.

* `skills`: Skill(s) to generate prompts for. Choose from [`Whoops`, `Localized Narrative`, `Coco`, `CountBench`, `DiffusionDB`].

* Set your OpenAI API key in `utils.py`: `client = OpenAI(api_key="")`.
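This README does not describe the internals of the generation script. As an illustration only, the sketch below shows one post-processing step such a pipeline typically needs: turning a raw LLM completion into a clean, deduplicated prompt list. It assumes the model answers with a numbered list (`1. ...` or `1) ...`); `parse_prompts` and its `min_words` filter are hypothetical and not part of this repository.

```python
import re
from typing import List

# Matches lines such as "1. A cat ..." or "12) A dog ..." (assumed LLM output format).
_NUMBERED = re.compile(r"^\s*\d+[.)]\s*(.+?)\s*$")


def parse_prompts(completion: str, min_words: int = 3) -> List[str]:
    """Extract numbered prompts from an LLM completion (hypothetical helper).

    Lines that are not numbered items are ignored; prompts shorter than
    `min_words` words and case-insensitive duplicates are dropped.
    """
    seen = set()
    prompts = []
    for line in completion.splitlines():
        match = _NUMBERED.match(line)
        if not match:
            continue  # skip preambles, blank lines, etc.
        prompt = match.group(1).strip().strip('"')
        key = prompt.lower()
        if len(prompt.split()) < min_words or key in seen:
            continue
        seen.add(key)
        prompts.append(prompt)
    return prompts


if __name__ == "__main__":
    raw = """Here are some prompts:
1. A penguin sunbathing in the desert
2. A penguin sunbathing in the Desert
3) Two red apples on a wooden table
4. Cat
"""
    print(parse_prompts(raw))
    # -> ['A penguin sunbathing in the desert', 'Two red apples on a wooden table']
```

Deduplicating on the lowercased text is a cheap guard against the near-identical prompts LLMs often repeat when asked for long lists.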


## Stage 2: Image Generation with Stable Diffusion

1. Follow the instructions [here](https://github.com/huggingface/diffusers) to set up the environment for Stable Diffusion.

2. Generate images given prompts:

```
cd image-generation
CUDA_VISIBLE_DEVICES=0 python img_gen_gpt.py --prompt_path "YOUR PROMPT PATH" --output_path "YOUR OUTPUT PATH"
```

* `prompt_path`: Path to prompts generated in Stage 1.

* `output_path`: Where to save the generated images.

* `model_id`: Which Stable Diffusion model to use. Choose from [`stabilityai/stable-diffusion-2`, `CompVis/stable-diffusion-v1-4`, `stabilityai/stable-diffusion-xl-base-1.0`]. Defaults to `stabilityai/stable-diffusion-2`.

* `cache_dir`: Cache directory for the downloaded pre-trained Stable Diffusion model.

* `batch_size`: Batch size for image generation. Defaults to 1.
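Here is a minimal sketch of the batching that `batch_size` controls. It assumes the script walks over the prompts in fixed-size chunks and saves one image per prompt. `pipe` is assumed to behave like a diffusers text-to-image pipeline, where `pipe(prompt=[...]).images` returns one PIL image per prompt. `batched`, `generate_images`, and the zero-padded file naming are hypothetical and are not taken from `img_gen_gpt.py`.

```python
import os
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def generate_images(pipe: Callable, prompts: Sequence[str], output_path: str,
                    batch_size: int = 1) -> List[str]:
    """Run `pipe` over the prompts in batches and save one image per prompt.

    Assumes diffusers-style output: `pipe(prompt=batch).images` is a list with
    one image (any object with a `.save(path)` method) per prompt in `batch`.
    """
    os.makedirs(output_path, exist_ok=True)
    saved: List[str] = []
    index = 0
    for batch in batched(prompts, batch_size):
        for image in pipe(prompt=batch).images:
            # Hypothetical naming scheme: zero-padded global prompt index.
            path = os.path.join(output_path, f"{index:06d}.png")
            image.save(path)
            saved.append(path)
            index += 1
    return saved

# Real usage (requires diffusers, torch, and a GPU):
# from diffusers import StableDiffusionPipeline
# import torch
# pipe = StableDiffusionPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-2", cache_dir=cache_dir, torch_dtype=torch.float16
# ).to("cuda")
# generate_images(pipe, prompts, output_path, batch_size=4)
```

Larger batches generally improve GPU throughput at the cost of higher peak memory.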

## Stage 3: LoRA Expert Fine-tuning

1. Follow the instructions [here](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image) to set up the environment for fine-tuning SD and SDXL.

2. Fine-tune a LoRA expert:

```
cd fine-tune
# Create metadata file for LoRA fine-tuning
python create_metadata_file.py --img_path "YOUR IMG PATH" --prompt_path "YOUR PROMPT PATH"
# LoRA fine-tuning: run the script that matches your base model
CUDA_VISIBLE_DEVICES=0 bash ft_sd_lora.sh        # Stable Diffusion (v1.4 / v2)
CUDA_VISIBLE_DEVICES=0 bash ft_sd_lora_sdxl.sh   # SDXL
```

Before running, set `MODEL_NAME`, `TRAIN_DIR`, and `OUTPUT_PATH` in the bash script to match your setup.
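Given `--img_path` and `--prompt_path`, `create_metadata_file.py` presumably pairs each image with its prompt. The sketch below shows one way to do that pairing. It assumes the script targets the Hugging Face `imagefolder` layout that the diffusers text-to-image examples accept via `--train_data_dir`: a `metadata.jsonl` file next to the images, with a `file_name` column and a `text` caption column (`text` is the default `--caption_column` of those scripts). The function `write_metadata` and the index-based pairing are hypothetical.

```python
import json
import os
from typing import Sequence


def write_metadata(img_dir: str, image_files: Sequence[str], prompts: Sequence[str]) -> str:
    """Write `<img_dir>/metadata.jsonl`, pairing image i with prompt i (hypothetical helper).

    Each line is {"file_name": ..., "text": ...}; `file_name` is relative to
    `img_dir`, as the `imagefolder` loader expects.
    """
    if len(image_files) != len(prompts):
        raise ValueError("need exactly one prompt per image")
    path = os.path.join(img_dir, "metadata.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for file_name, prompt in zip(image_files, prompts):
            f.write(json.dumps({"file_name": file_name, "text": prompt}) + "\n")
    return path
```

Failing loudly on a count mismatch is deliberate: a silent off-by-one between images and prompts would train the expert on wrong captions.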



## Stage 4: LoRA Merging at Inference Time

1. Generate images with merged LoRA experts. Evaluation prompts come from [DSG](https://github.com/j-min/DSG) and [TIFA](https://github.com/Yushi-Hu/tifa).

```
cd inference
CUDA_VISIBLE_DEVICES=0 python img_gen_merge.py --ckpt "YOUR CKPT HERE"
```

* `ckpt`: Path to the fine-tuned LoRA experts.

* `model_id`: Which Stable Diffusion model to use. Choose from [`stabilityai/stable-diffusion-2`, `CompVis/stable-diffusion-v1-4`, `stabilityai/stable-diffusion-xl-base-1.0`]. Defaults to `stabilityai/stable-diffusion-2`.

* `output_path`: Where to save inference images.

* `steps`: Which fine-tuning step's checkpoint to load.

* `cache_dir`: Cache directory for the downloaded pre-trained Stable Diffusion model.

* `eval_benchmark`: Which evaluation benchmark to generate images for. Choose from [`DSG`, `TIFA`].
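This README does not spell out the exact merging rule used by `img_gen_merge.py`. One common way to merge LoRA experts, shown here purely as an assumption, is a weighted sum of their low-rank updates. For a frozen weight $W$ and $K$ experts with factors $B_k, A_k$, the merged weight is $W' = W + \sum_k w_k B_k A_k$ with $\sum_k w_k = 1$, where the default is uniform $w_k = 1/K$. Averaging the products $B_k A_k$ is not the same as averaging $A_k$ and $B_k$ separately. The pure-Python sketch uses nested lists as matrices so that it needs no dependencies. `matmul` and `merge_lora` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x r) and an (r x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(base: Matrix, experts: Sequence[Tuple[Matrix, Matrix]],
               weights: Optional[Sequence[float]] = None) -> Matrix:
    """Return base + sum_k w_k * (B_k @ A_k); uniform weights if none are given.

    Each expert is a (B, A) pair with B of shape (out x r) and A of shape (r x in).
    """
    if weights is None:
        weights = [1.0 / len(experts)] * len(experts)
    if len(weights) != len(experts):
        raise ValueError("one weight per expert is required")
    merged = [row[:] for row in base]  # copy so the frozen base weight is untouched
    for w, (b, a) in zip(weights, experts):
        delta = matmul(b, a)  # this expert's low-rank update
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged


if __name__ == "__main__":
    base = [[0.0, 0.0], [0.0, 0.0]]
    expert_1 = ([[1.0], [0.0]], [[2.0, 0.0]])  # rank-1 update touching W[0][0]
    expert_2 = ([[0.0], [1.0]], [[0.0, 4.0]])  # rank-1 update touching W[1][1]
    print(merge_lora(base, [expert_1, expert_2]))  # -> [[1.0, 0.0], [0.0, 2.0]]
```

Recent diffusers releases with the PEFT backend can also blend several LoRAs at inference via `pipe.load_lora_weights(path, adapter_name=...)` followed by `pipe.set_adapters([...], adapter_weights=[...])`. This README does not state whether `img_gen_merge.py` takes that route.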
