# RPO

## How to run the code
### Dataset
Make sure to download the DreamBench dataset first:
```
git clone https://github.com/google/dreambooth
```
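After cloning, each subject is expected in its own folder under `dreambooth/dataset` (the training command below points `--reference_data_dir` at `../dreambooth/dataset/dog`). The sketch below lists the subjects that could be passed as `--reference_data_dir`. It assumes that one-folder-per-subject layout and a fixed set of image extensions. `list_subjects` is a hypothetical helper and is not part of this repository.

```python
from pathlib import Path

# Extensions treated as reference images (an assumption; adjust if needed).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def list_subjects(dataset_dir):
    """Return the sorted names of sub-folders that contain at least one image.

    Hypothetical helper: assumes one folder per subject, e.g. dataset/dog/*.jpg.
    Loose files directly under dataset_dir are ignored.
    """
    root = Path(dataset_dir)
    subjects = []
    for child in root.iterdir():
        if not child.is_dir():
            continue
        has_image = any(
            f.is_file() and f.suffix.lower() in IMAGE_EXTS for f in child.iterdir()
        )
        if has_image:
            subjects.append(child.name)
    return sorted(subjects)
```

For example, `list_subjects("dreambooth/dataset")` should include `dog`, the subject used throughout this README.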
### Dependencies
Run the following to create the `rpo` conda environment, which installs a subset of the Python packages needed by our code:
```
conda env create -f environment.yml
conda activate rpo
```
### Usage

#### Training
Train a Stable Diffusion model with `train.py`. We use the `dog` subject as an example.
```
export SAVE_DIR="path-to-save-model/rpo/"
export REFERENCE_DIR="../dreambooth/dataset/dog"
export GENERATE_DIR="generate/dog"

python train.py \
  --reference_data_dir=$REFERENCE_DIR \
  --generated_data_dir=$GENERATE_DIR \
  --savepath=$SAVE_DIR \
  --beta=1.0 \
  --prompt="a photo of [V] dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --max_train_steps=400 \
  --eval_steps=40 \
  --bucket="rpo-nips-bucket" \
  --subject="dog" \
  --class_token="dog" \
  --reward_lambda=0.3
```
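The Results table reports runs with $\lambda_{\text{val}} \in \{0.3, 0.5, 0.7\}$. The sketch below builds the `train.py` command for one subject and one $\lambda$, reusing exactly the flags from the command above. The helper name `build_train_command` and the per-$\lambda$ `--savepath` layout are hypothetical, chosen only so that runs do not overwrite each other.

```python
import shlex


def build_train_command(subject, class_token, reward_lambda,
                        reference_root="../dreambooth/dataset",
                        generate_root="generate",
                        save_root="path-to-save-model/rpo"):
    """Return the train.py argument list for one subject and one lambda.

    Flags mirror the example command above; the lambda=<value>/<subject>
    save path is a hypothetical layout, not something train.py requires.
    """
    return [
        "python", "train.py",
        f"--reference_data_dir={reference_root}/{subject}",
        f"--generated_data_dir={generate_root}/{subject}",
        f"--savepath={save_root}/lambda={reward_lambda}/{subject}/",
        "--beta=1.0",
        f"--prompt=a photo of [V] {class_token}",
        "--resolution=512",
        "--train_batch_size=1",
        "--learning_rate=5e-6",
        "--max_train_steps=400",
        "--eval_steps=40",
        "--bucket=rpo-nips-bucket",
        f"--subject={subject}",
        f"--class_token={class_token}",
        f"--reward_lambda={reward_lambda}",
    ]


# Print one shell-ready command per lambda value in the Results table.
for lam in (0.3, 0.5, 0.7):
    print(shlex.join(build_train_command("dog", "dog", lam)))
```

To launch a run instead of printing it, pass the list to `subprocess.run(cmd, check=True)`. Building an argument list rather than a single string avoids quoting problems with the space-containing `--prompt` value.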
#### Evaluation
Evaluate the RPO-trained model with `eval.py`.
```
export SAVE_DIR="path-to-save-model/rpo/"
python eval.py \
    --reference_data_dir="../dreambooth/dataset/dog" \
    --savepath=$SAVE_DIR \
    --resolution=512 \
    --class_token="dog" \
    --subject="dog" \
    --bucket="rpo-nips-bucket"
```
To compute the DINO, CLIP-I, and CLIP-T scores and upload the results to a Google Cloud Storage bucket:
```
python report.py --bucket="rpo-nips-bucket" \
    --prefix="lambda=0.3/experiments/rpo" \
    --local_folder="logs/results/rpo" \
    --latency_prefix="lambda=0.3/validation_rewards/rpo"
```
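For reference, the common DreamBench definitions of these scores are as follows. DINO and CLIP-I are the average pairwise cosine similarity between embeddings of generated and reference images, computed with a DINO or a CLIP image encoder. CLIP-T is the average cosine similarity between each generated image and its prompt in CLIP space. The NumPy sketch below computes these averages from precomputed embeddings. It is an illustration of those definitions, not the code in `report.py`, and the encoders themselves are not included.

```python
import numpy as np


def _normalize(x):
    """L2-normalize each row so that dot products become cosine similarities."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def mean_pairwise_cosine(gen_emb, ref_emb):
    """Average cosine similarity over all (generated, reference) image pairs.

    With DINO embeddings this corresponds to the DINO score; with CLIP image
    embeddings it corresponds to CLIP-I.
    """
    sims = _normalize(gen_emb) @ _normalize(ref_emb).T  # shape (n_gen, n_ref)
    return float(sims.mean())


def clip_t_score(gen_img_emb, prompt_emb):
    """Average cosine similarity between each generated image and its prompt.

    Row i of gen_img_emb and row i of prompt_emb are the CLIP embeddings of
    generated image i and of the prompt used to generate it.
    """
    g = _normalize(gen_img_emb)
    p = _normalize(prompt_emb)
    return float(np.sum(g * p, axis=-1).mean())
```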

#### Configuration
- `reference_data_dir`: the directory of the reference images of a subject, e.g., dog, backpack, or cat.
- `generated_data_dir`: the directory of the images generated by the pre-trained base model.
- `savepath`: the directory for saving checkpoints.
- `beta`: the $\beta$ in our paper, i.e., the weight of the regularizer.
- `prompt`: the prompt $\mathbf{c}$ in our paper.
- `resolution`: the resolution of the generated images.
- `train_batch_size`: the per-device batch size; the total batch size is `num_devices * train_batch_size`.
- `learning_rate`: the learning rate used for fine-tuning.
- `max_train_steps`: the maximum number of training steps.
- `eval_steps`: the evaluation interval; every `eval_steps` training steps, the model is evaluated with the $\lambda$-Harmonic reward function.
- `bucket`: the Google Cloud bucket used to store results, e.g., the validation rewards during training.
- `subject`: the name of the folder under which results are uploaded.
- `class_token`: the class token of the subject; more detailed class tokens can be found in `class_mappings.txt`.
- `reward_lambda`: the $\lambda_{\text{val}}$ used in our paper.
- `prefix`: the prefix of the bucket path (used by `report.py`).
- `local_folder`: the local folder for saving the report results.
- `latency_prefix`: the bucket path prefix for the latency report.
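The $\lambda$-Harmonic reward used for evaluation trades off subject fidelity against prompt fidelity. The sketch below *assumes* it is the $\lambda$-weighted harmonic mean of an image-similarity score $s_I$ and a text-alignment score $s_T$, i.e. $r_\lambda = 1 / \left(\lambda / s_I + (1 - \lambda) / s_T\right)$. Please check the paper for the exact definition and for which similarity models are used. Under this assumption, a larger $\lambda$ favors image similarity, which matches the Results table: larger $\lambda_{\text{val}}$ gives higher DINO and CLIP-I but lower CLIP-T. The function name is hypothetical.

```python
def lambda_harmonic_reward(image_sim, text_sim, lam):
    """Weighted harmonic mean of image and text similarity (assumed form).

    lam weights the image-similarity term and (1 - lam) weights the
    text-alignment term. Both similarities must be positive for the
    harmonic mean to be defined.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if image_sim <= 0 or text_sim <= 0:
        raise ValueError("similarities must be positive")
    return 1.0 / (lam / image_sim + (1.0 - lam) / text_sim)


# Same image/text similarities, two different lambda values.
print(lambda_harmonic_reward(0.8, 0.3, 0.3))
print(lambda_harmonic_reward(0.8, 0.3, 0.7))
```

With a high image similarity (0.8) and a low text similarity (0.3), the reward rises from about 0.37 at `lam=0.3` to about 0.53 at `lam=0.7`. The harmonic mean is dominated by the weaker of the two scores, so a sample cannot earn a high reward on one criterion alone.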

### Results
RPO achieves the following performance on [DreamBench](https://github.com/google/dreambooth) (higher is better for all three metrics):

| $\lambda_{\text{val}}$ | DINO  | CLIP-I | CLIP-T |
|:----:|:-----:|:------:|:------:|
| 0.3  | 0.581 | 0.798  | 0.329  |
| 0.5  | 0.652 | 0.833  | 0.314  |
| 0.7  | 0.679 | 0.850  | 0.304  |