# RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models


## Installation

```shell
conda create -n RealCompo python==3.8.10
conda activate RealCompo
pip install -r requirements.txt
```

## Download models

We provide the code for RealCompo v1, which combines Stable Diffusion v1.5 and GLIGEN.

Download the GLIGEN checkpoint from Hugging Face (`gligen/gligen-generation-text-box/diffusion_pytorch_model.bin`) and set its path in `inference.py`.
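
If you prefer to fetch the checkpoint from a script, the following is a minimal sketch rather than part of this repository. It assumes the path above is a Hugging Face Hub repo id followed by a filename, and that the `huggingface_hub` package is installed (it may not be listed in `requirements.txt`). The helper names `split_checkpoint` and `download_gligen` and the `gligen_checkpoints` directory are hypothetical.

```python
# Path as given in this README; assumed to be "<org>/<repo>/<filename>" on the Hub.
CHECKPOINT = "gligen/gligen-generation-text-box/diffusion_pytorch_model.bin"


def split_checkpoint(path):
    """Split 'org/repo/filename' into (repo_id, filename)."""
    org, repo, filename = path.split("/", 2)
    return f"{org}/{repo}", filename


def download_gligen(target_dir="gligen_checkpoints"):
    """Download the checkpoint and return its local path (requires network access)."""
    # Imported here so split_checkpoint works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    repo_id, filename = split_checkpoint(CHECKPOINT)
    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=target_dir)
```

Calling `download_gligen()` returns the local file path, which you can then set in `inference.py`.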

## Generating images with RealCompo

### Option 1: Use LLMs to reason out the layout

To generate images, run:

```bash
python inference.py --user_prompt 'Two cute small corgi sitting in a movie theater with two popcorns in front of them.' --api_key 'put your api_key here' 
```

**--user_prompt** is the original prompt used to generate an image.

**--api_key** is needed if you use GPT-4.

**You can also use local LLMs to reason out layouts**; see `inference.py` for details on the interface. Generated samples are saved in `generation_samples`:

```
generation_samples
├── generation_realcompo_v1_sd_gligen_two_cute_small_corgi_sitting_in
│   ├── 0.png
│   ├── 1.png
│   └── ...
└── ...
```

### Option 2: Set the layout manually

If you already have a layout for every object, you can run inference directly:

```bash
python inference.py  --no_gpt --user_prompt 'Two cute small corgi sitting in a movie theater with two popcorns in front of them.' --object "['a cute small corgi', 'a cute small corgi', 'a movie theater', 'popcorn', 'popcorn']" --boundingbox "[[0.05, 0.05, 0.52, 0.58], [0.52, 0.05, 1.0, 0.58], [0.0, 0.0, 1, 1], [0.0, 0.6, 0.48, 0.95], [0.52, 0.6, 1, 0.95]]" --token_location "[4, 4, 9, 12, 12]"
```

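**--no_gpt** tells the script to use the layout you supply; use it when you have already obtained the layout.

**--object** is the list of objects mentioned in the prompt.

**--boundingbox** is the list of bounding boxes, one per object and in the same order as `--object`. In the example above, each box is four values in the range [0, 1].

**--token_location** is the list of positions at which each object appears in the prompt, again one per object. In the example above, 4 points to "corgi", 9 to "theater", and 12 to "popcorns", counting words from 1.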
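
Because the three list arguments must line up entry by entry, it can help to check them before launching a run. The sketch below is not part of `inference.py`; `parse_layout` is a hypothetical helper. It assumes boxes are ordered `[x_min, y_min, x_max, y_max]` and that token locations are 1-based word positions, which matches the example above but may differ from how `inference.py` counts tokens.

```python
import ast


def parse_layout(prompt, objects_arg, boxes_arg, locations_arg):
    """Parse the three list arguments and check that they are consistent."""
    objects = ast.literal_eval(objects_arg)
    boxes = ast.literal_eval(boxes_arg)
    locations = ast.literal_eval(locations_arg)
    if not (len(objects) == len(boxes) == len(locations)):
        raise ValueError("--object, --boundingbox and --token_location differ in length")

    words = prompt.split()
    layout = []
    for name, box, loc in zip(objects, boxes, locations):
        if len(box) != 4 or not all(0.0 <= v <= 1.0 for v in box):
            raise ValueError(f"{name}: expected four values in [0, 1], got {box}")
        x0, y0, x1, y1 = box  # assumption: [x_min, y_min, x_max, y_max]
        if x0 >= x1 or y0 >= y1:
            raise ValueError(f"{name}: box {box} has no area")
        if not 1 <= loc <= len(words):  # assumption: 1-based word position
            raise ValueError(f"{name}: token location {loc} is outside the prompt")
        layout.append((name, tuple(box), words[loc - 1]))
    return layout


prompt = "Two cute small corgi sitting in a movie theater with two popcorns in front of them."
layout = parse_layout(
    prompt,
    "['a cute small corgi', 'a cute small corgi', 'a movie theater', 'popcorn', 'popcorn']",
    "[[0.05, 0.05, 0.52, 0.58], [0.52, 0.05, 1.0, 0.58], [0.0, 0.0, 1, 1], "
    "[0.0, 0.6, 0.48, 0.95], [0.52, 0.6, 1, 0.95]]",
    "[4, 4, 9, 12, 12]",
)
for name, box, word in layout:
    print(f"{name!r} -> {box}, prompt word: {word}")
```

Each returned entry pairs an object with its box and the prompt word its location points to, so a misaligned index is easy to spot.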



You can swap the text-to-image (T2I) backbone for Stable Diffusion v1.4, TokenCompose, or other (stylized) T2I models.

The core code for updating the models' coefficients is located in `ldm/models/diffusion/plms.py`. With slight modifications to this code, you can also replace the layout-to-image (L2I) model with another one.
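
To illustrate what swapping a model involves, here is a toy sketch of blending per-model noise predictions with one coefficient per model. It is an assumption for illustration only and is not taken from `plms.py`: the softmax normalization, the flat lists of floats standing in for tensors, and the names `softmax` and `blend_noise` are all hypothetical.

```python
import math


def softmax(values):
    """Normalize raw coefficients into weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def blend_noise(predictions, coefficients):
    """Weighted sum of per-model noise predictions (flat lists of floats)."""
    if len(predictions) != len(coefficients):
        raise ValueError("need exactly one coefficient per model")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same size")
    weights = softmax(coefficients)
    return [
        sum(w * p[i] for w, p in zip(weights, predictions))
        for i in range(length)
    ]


t2i_noise = [0.0, 2.0]  # stand-in for the T2I model's prediction
l2i_noise = [2.0, 4.0]  # stand-in for the L2I model's prediction
blended = blend_noise([t2i_noise, l2i_noise], [0.0, 0.0])  # equal coefficients
```

Under this assumption, a replacement L2I model only needs to produce a prediction of the same size as the T2I model's.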


