# Controllable Image Generation with Composed Parallel Token Prediction

## Environment Setup
First ensure conda is installed. Then initialise a conda environment using:

```
conda env create --name controllable --file requirements.yml
```

Then activate the environment with `conda activate controllable` before running any of the scripts below.

## Dataset Downloads and Setup

FFHQ can be obtained [here](https://github.com/NVlabs/ffhq-dataset) and FFHQ annotations can be obtained [here](https://github.com/DCGM/ffhq-features-dataset).

Positional CLEVR and Relational CLEVR, along with the subsets needed for evaluation, can be obtained [here](https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/tree/main/classifier).

After downloading the relevant files, place the following in `/datasets/` (alternatively, modify `./utils/data_utils.py` to use the directory of choice).


* `clevr_pos_data_128_30000.npz`
* `clevr_training_data_128.npz`
* `clevr_pos_5000_1.npz`
* `clevr_pos_5000_2.npz`
* `clevr_pos_5000_3.npz`
* `clevr_generation_1_relations.npz`
* `clevr_generation_2_relations.npz`
* `clevr_generation_3_relations.npz`
* `FFHQ/`
* `ffhq-features-dataset/`
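
Before training, it can help to confirm that every entry listed above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_entries` is hypothetical, and it assumes the default `/datasets/` location (pass a different directory if `./utils/data_utils.py` has been modified).

```python
from pathlib import Path

# Entries copied from the list in this README.
EXPECTED_ENTRIES = [
    "clevr_pos_data_128_30000.npz",
    "clevr_training_data_128.npz",
    "clevr_pos_5000_1.npz",
    "clevr_pos_5000_2.npz",
    "clevr_pos_5000_3.npz",
    "clevr_generation_1_relations.npz",
    "clevr_generation_2_relations.npz",
    "clevr_generation_3_relations.npz",
    "FFHQ",
    "ffhq-features-dataset",
]


def missing_dataset_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected files/directories that are absent from `root`.

    Hypothetical helper for illustration; it only checks for existence and
    does not validate file contents.
    """
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # "/datasets" is the default location named in this README.
    missing = missing_dataset_entries("/datasets")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected dataset entries found.")
```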

The data partitions (image IDs) used for evaluation on FFHQ are in `ffhq_<N_COMPONENTS>_partition.txt`. The corresponding images, chosen at random from the FFHQ dataset, are used for computing accuracy and FID.
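
To inspect a partition, the IDs can be read with a few lines of Python. This is a sketch under an assumption: the file layout is not documented here, so the code assumes the IDs are separated by whitespace (for example, one per line). The function name `read_partition_ids` is hypothetical, and IDs are kept as strings so that any zero-padding is preserved.

```python
from pathlib import Path


def read_partition_ids(path):
    """Return the image IDs in a partition file as a list of strings.

    Assumption: IDs are whitespace-separated (e.g. one per line). Adjust the
    parsing if the actual file uses a different layout.
    """
    return Path(path).read_text().split()
```

For example, `read_partition_ids("ffhq_1_partition.txt")` would return the IDs for the partition with `<N_COMPONENTS>` set to 1, if that file exists and follows the assumed layout.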


## VQ-VAE/VQ-GAN Training
The following scripts train a VQ-VAE (for `clevr_pos` and `clevr_rel`) or VQ-GAN (for `FFHQ`). Settings are those used for results reported in the paper.

```
./train_vqgan_clevr_pos.sh
```

```
./train_vqgan_clevr_rel.sh
```

```
./train_vqgan_ffhq.sh
```

## Sampler Training
The following scripts train conditional samplers for each dataset of interest. Settings are those used for results reported in the paper.

```
./train_sampler_clevr_pos.sh
```

```
./train_sampler_clevr_rel.sh
```

```
./train_sampler_ffhq.sh
```

## Accuracy and FID Evaluation
Before running evaluations on FFHQ specifically, first run `python3 utils/prepare_ffhq_npz.py` to convert the images into the correct format.

Evaluation relies on classifiers that must be trained or obtained beforehand (`clevr_pos` and `clevr_rel` classifiers can be obtained from https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch/tree/main/classifier).

The following scripts evaluate compositional generation (accuracy and FID).

```
./eval_clevr_pos.sh
```
```
./eval_clevr_rel.sh
```
```
./eval_ffhq.sh
```

STDOUT is written to `./logs/experiments_<DATASET>/acc_<N_COMPONENTS>.txt` and `./logs/experiments_<DATASET>/FID_<N_COMPONENTS>.txt`; the results are contained in these files.
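
To gather these logs in one place, the paths can be built from the naming pattern above. The sketch below is not part of the repository: the helper names are hypothetical, and it prints the raw log text because the log format is not specified here.

```python
from pathlib import Path

LOG_ROOT = Path("./logs")


def result_log_paths(dataset, n_components):
    """Build the accuracy and FID log paths for one dataset/component count."""
    exp_dir = LOG_ROOT / f"experiments_{dataset}"
    return {
        "acc": exp_dir / f"acc_{n_components}.txt",
        "FID": exp_dir / f"FID_{n_components}.txt",
    }


def print_results(dataset, component_counts):
    """Print each log file's raw contents, or note that it is missing."""
    for n in component_counts:
        for metric, path in result_log_paths(dataset, n).items():
            if path.exists():
                print(f"== {metric}, {n} component(s): {path}")
                print(path.read_text().rstrip())
            else:
                print(f"== {metric}, {n} component(s): {path} not found")
```

A call such as `print_results("clevr_pos", [1, 2, 3])` would then show all available logs for one dataset; the dataset name and component counts used here are assumptions and should match whatever the evaluation scripts actually write.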

## Sample Time Evaluation

```
./run_time_batch.sh
```

STDOUT is written to `./logs/experiments_time/time_<BATCH_SIZE>_<N_COMPONENTS>.txt`; the results are contained in these files.
