# Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback

This is the codebase for **Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback**.

This repository is based on [openai/guided-diffusion](https://github.com/openai/guided-diffusion) and includes the following features:
- Implementation of the human feedback framework
- Integration of the `latent_guided_diffusion` module, which is derived from [CompVis/latent-diffusion](https://github.com/CompVis/latent-diffusion) with modifications to perform guided sampling.
- Most guidance-related components used in our experiments are based on [arpitbansal297/Universal-Guided-Diffusion](https://github.com/arpitbansal297/Universal-Guided-Diffusion), with some necessary modifications.


## Contents
- [Installation](#installation)
- [Experiment 1. MNIST 7 (censoring crossed 7s)](#experiment-1-mnist-7)
- [Experiment 2. LSUN Church (censoring stock watermarks)](#experiment-2-lsun-church)
- [Experiment 3. ImageNet Tench (censoring human faces)](#experiment-3-imagenet-tench)
- [Experiment 4. LSUN Bedroom (censoring broken images)](#experiment-4-lsun-bedroom)


# Installation

Make sure you have Python version **3.9** installed.

To install the required dependencies, run the following command:

```sh
pip install -e .
```

This will install the `guided_diffusion` Python package that the scripts depend on.

You also need to separately install `torchvision`. Run the following command to install it.

```sh
pip install torchvision
```



# Experiment 1. MNIST 7
## 1.1 Prepare training data

Run the following command:
```sh
python datasets/mnist_7.py
```

The script will download the MNIST train dataset, select only the images of the digit 7, resize the images to 32x32, and save them into the `mnist_7` directory.
<!-- 
If you want to use your own dataset, place your images (with the extension ".jpg", ".jpeg", or ".png") in a directory and pass that directory via the `--data_dir` argument when running the training script. -->

## 1.2 Train diffusion model

Run the following shell script to train the DDPM model on MNIST 7s:
```sh
MODEL_FLAGS="--image_size 32 --image_channels 1 --num_channels 128 --num_res_blocks 3"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --batch_size 256 --save_interval 100000"
LOG_DIR="path/to/log" # The diffusion model will be saved in .pt format within the directory specified by this path.
NUM_GPUS="1" # The number of GPUs used in parallel computing. If this is larger than 1, adjust the batch_size argument accordingly.

echo $(mpiexec -n $NUM_GPUS python scripts/image_train.py --log_dir=$LOG_DIR --data_dir=mnist_7 --rgb=False --random_flip=False $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS)
```
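As a rough guide for `NUM_GPUS`: in upstream guided-diffusion, `--batch_size` is the per-process (per-GPU) batch size, so the effective batch is `batch_size * NUM_GPUS`. Assuming this fork keeps that behavior, the per-GPU value that preserves the global batch of 256 used above can be computed with the hypothetical helper below.

```python
def per_gpu_batch_size(global_batch_size: int, num_gpus: int) -> int:
    """Per-process --batch_size that keeps the global batch size fixed.

    Assumes (as in upstream guided-diffusion) that each MPI rank draws its
    own batch of --batch_size images, so the effective batch size is
    batch_size * num_gpus.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size should be divisible by num_gpus")
    return global_batch_size // num_gpus
```

For example, `per_gpu_batch_size(256, 4)` returns 64, i.e. pass `--batch_size 64` together with `NUM_GPUS="4"`.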

## 1.3 Prepare human feedback data for reward model training

### 1.3.1 Generate and save baseline samples
Run the following shell script to generate baseline samples:
```sh
MODEL_FLAGS="--image_size 32 --image_channels 1 --num_channels 128 --num_res_blocks 3"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
SAMPLE_FLAGS="--batch_size 250 --num_samples 2000"
MODEL_PATH="path/to/diffusion/model.pt"
LOG_DIR="path/to/log" # The generated images will be saved in .npz format within this directory.
NUM_GPUS="8"

echo $(mpiexec -n $NUM_GPUS python scripts/image_sample.py --log_dir $LOG_DIR --model_path $MODEL_PATH $MODEL_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS)
```
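Assuming this fork keeps the sampling loop of upstream guided-diffusion's `image_sample.py` (each iteration draws one batch per MPI rank, the batches are gathered, and the result is truncated to `--num_samples`), the number of sampling iterations and the number of images generated before truncation can be estimated as follows. The function name is hypothetical.

```python
import math
from typing import Tuple


def sampling_plan(num_samples: int, batch_size: int, num_gpus: int) -> Tuple[int, int]:
    """Return (iterations, images generated before truncation).

    Assumes every iteration produces batch_size images on each of the
    num_gpus ranks and that sampling stops as soon as at least
    num_samples images have been collected.
    """
    if min(num_samples, batch_size, num_gpus) < 1:
        raise ValueError("all arguments must be positive")
    per_iteration = batch_size * num_gpus
    iterations = math.ceil(num_samples / per_iteration)
    return iterations, iterations * per_iteration
```

With the flags above (`--batch_size 250`, `--num_samples 2000`, `NUM_GPUS="8"`), `sampling_plan(2000, 250, 8)` returns `(1, 2000)`: a single iteration already yields exactly 2000 images.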

Run the following command to convert the NPZ sample file into PNG images and save them into a designated directory:

```sh
python scripts/save_samples_as_files.py \
    --sample_path path/to/sample.npz \
    --save_dir path/to/baseline/sample/dir # Each sample image will be saved in .png format within this directory.
```


### 1.3.2 Provide human feedback on baseline samples using GUI
Run our GUI-based human feedback collector with the following command and keep labeling until you have found the desired number of malign images (10, to reproduce our experiments). The provided labels will comprise the training data for the **reward model**:
```sh
# --resolution: size at which each image is displayed (default: 150)
# --grid_row / --grid_col: number of rows / columns of the displayed image grid
python hf_data/collect_feedback_v2.py \
    --data_dir path/to/baseline/sample/dir \
    --feedback_path path/to/total_feedback.pkl \
    --resolution some-integer-value \
    --grid_row some-integer-value \
    --grid_col some-integer-value
```

The provided human labels will be saved to the file given by `--feedback_path` (`total_feedback.pkl` above).
The .pkl file stores a dictionary whose keys are the paths of the generated image files from baseline sampling and whose values are labels: 0 indicates malign and 1 indicates benign (when you are not sure, a `None` label can be provided).
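To sanity-check a feedback file before training (for instance, to confirm that enough malign images were labeled), the dictionary described above can be loaded and tallied. The helper names below are hypothetical; the sketch only assumes the pickle holds a plain `dict` mapping image paths to `0`, `1`, or `None`.

```python
import pickle
from collections import Counter
from typing import Dict, Optional


def load_feedback(path: str) -> Dict[str, Optional[int]]:
    """Load {image path: 0 (malign) | 1 (benign) | None (unsure)}."""
    with open(path, "rb") as f:
        feedback = pickle.load(f)
    if not isinstance(feedback, dict):
        raise TypeError(f"expected a dict in {path}, got {type(feedback).__name__}")
    return feedback


def summarize_feedback(feedback: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Count malign (0), benign (1), and unsure (None) labels."""
    counts = Counter(feedback.values())
    return {
        "malign": counts.get(0, 0),
        "benign": counts.get(1, 0),
        "unsure": counts.get(None, 0),
    }
```

For example, `summarize_feedback(load_feedback("path/to/total_feedback.pkl"))` returns the three counts as a dictionary.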


### 1.3.3 Create partial data for ensemble training
Run the following command to create the partial .pkl files for training reward ensemble:
```sh
python hf_data/select_partial_feedback.py \
    --all_feedback_path path/to/total_feedback.pkl \
    --out_feedback_path path/to/partial_feedback.pkl \
    --num_malign_samles 10 \
    --num_benign_samles 10
```
**Note**: If `partial_feedback.pkl` already exists at the `out_feedback_path`, the new feedback information will be appended to it instead of overwriting the existing data.
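The selection and append behavior described above can be illustrated with the following sketch. It is a hypothetical re-implementation for explanation only, not the code of `hf_data/select_partial_feedback.py`; it assumes the feedback dictionary format from Section 1.3.2 and skips `None` (unsure) labels.

```python
import os
import pickle
import random
from typing import Dict, Optional


def select_partial_feedback(
    all_feedback: Dict[str, Optional[int]],
    num_malign: int,
    num_benign: int,
    seed: Optional[int] = None,
) -> Dict[str, Optional[int]]:
    """Randomly pick num_malign label-0 and num_benign label-1 entries.

    Unsure (None) entries are never selected. random.sample raises
    ValueError if fewer labeled images exist than requested.
    """
    rng = random.Random(seed)
    malign = sorted(k for k, v in all_feedback.items() if v == 0)
    benign = sorted(k for k, v in all_feedback.items() if v == 1)
    picked = rng.sample(malign, num_malign) + rng.sample(benign, num_benign)
    return {k: all_feedback[k] for k in picked}


def save_partial_feedback(partial: Dict[str, Optional[int]], out_path: str) -> Dict[str, Optional[int]]:
    """Merge into an existing file at out_path (append, not overwrite), then save."""
    merged: Dict[str, Optional[int]] = {}
    if os.path.exists(out_path):
        with open(out_path, "rb") as f:
            merged.update(pickle.load(f))
    merged.update(partial)
    with open(out_path, "wb") as f:
        pickle.dump(merged, f)
    return merged
```

Calling the selection with different seeds and writing each result to a different output path would give one partial set per ensemble member.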

<!-- To create 5 datasets for ensemble, you can start by selecting 10 malign samples and creating 5 copies of each. Then, select an additional 10 benign samples for each subset. -->

To reproduce the ablation study (the **Union** model case), run the following shell script to merge (union) multiple feedback files:
```sh
FEEDBACK_PATH_1="path/to/partial/feedback/for/training_1.pkl"
FEEDBACK_PATH_2="path/to/partial/feedback/for/training_2.pkl"
FEEDBACK_PATH_3="path/to/partial/feedback/for/training_3.pkl"
FEEDBACK_PATH_4="path/to/partial/feedback/for/training_4.pkl"
FEEDBACK_PATH_5="path/to/partial/feedback/for/training_5.pkl"
OUT_DIR="path/to/save/union_feedback.pkl"

echo $(python hf_data/union_feedback.py --feedback_paths $FEEDBACK_PATH_1 $FEEDBACK_PATH_2 $FEEDBACK_PATH_3 $FEEDBACK_PATH_4 $FEEDBACK_PATH_5 --out_union_feedback_dir $OUT_DIR)
```
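Conceptually, the union merges the partial dictionaries into one. The sketch below is hypothetical (it is not the code of `hf_data/union_feedback.py`) and assumes every file holds the dictionary format from Section 1.3.2. Since keys are image paths, an image that appears in several partial files is kept once; this sketch additionally raises an error if two files disagree on an image's label.

```python
import pickle
from typing import Dict, List, Optional


def union_feedback(feedbacks: List[Dict[str, Optional[int]]]) -> Dict[str, Optional[int]]:
    """Merge several feedback dicts, refusing conflicting labels."""
    merged: Dict[str, Optional[int]] = {}
    for fb in feedbacks:
        for path, label in fb.items():
            if path in merged and merged[path] != label:
                raise ValueError(f"conflicting labels for {path}: {merged[path]} vs {label}")
            merged[path] = label
    return merged


def union_feedback_files(paths: List[str], out_path: str) -> None:
    """Load each pickled feedback dict, merge them, and save the union."""
    feedbacks = []
    for p in paths:
        with open(p, "rb") as f:
            feedbacks.append(pickle.load(f))
    with open(out_path, "wb") as f:
        pickle.dump(union_feedback(feedbacks), f)
```

Usage would look like `union_feedback_files(["path/to/partial/feedback/for/training_1.pkl", ...], "path/to/save/union_feedback.pkl")`.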

## 1.4 Train reward model
Run the following shell script to train reward models:
```sh
REWARD_FLAGS="--image_size 32 --image_channels 1 --classifier_attention_resolutions 16,8,4 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --output_dim 1"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
TRAIN_FLAGS="--augment_mnist True --iterations 1001 --anneal_lr True --lr 3e-4 --batch_size 128 --save_interval 1000 --weight_decay 0.05"
POS_WEIGHT="0.02" # Change this to 0.005 for the 'Union' case
NUM_AUGMENT="10"
FEEDBACK_PATH="path/to/partial_feedback.pkl"
AUGMENT_DATA_DIR="path/to/save/temporary/augmented/images"
LOG_DIR="path/to/log" # The reward model will be saved in .pt format within this directory.
NUM_GPUS="1"

echo $(mpiexec -n $NUM_GPUS python scripts/feedback_reward_train.py --log_dir=$LOG_DIR --pos_weight=$POS_WEIGHT --augment_data_dir=$AUGMENT_DATA_DIR --num_augment=$NUM_AUGMENT --feedback_path=$FEEDBACK_PATH $REWARD_FLAGS $TRAIN_FLAGS $DIFFUSION_FLAGS)   
```
The `POS_WEIGHT` parameter corresponds to $\alpha$ within the weighted BCE loss $BCE_{\alpha}$.

To train the **Union** model, change `POS_WEIGHT` argument to 0.005 and `FEEDBACK_PATH` to `path/to/union_feedback.pkl`.
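Assuming `POS_WEIGHT` enters the loss like the `pos_weight` argument of PyTorch's `torch.nn.BCEWithLogitsLoss` (only the label-1 term is scaled), the per-sample loss on a reward logit $z$ with label $y \in \{0, 1\}$ is $\mathrm{BCE}_{\alpha}(z, y) = -\alpha\, y \log \sigma(z) - (1 - y) \log(1 - \sigma(z))$. Because label 1 means benign in this repository, $\alpha < 1$ makes misclassifying a benign image cheaper than misclassifying a malign one. A dependency-free, numerically stable sketch (function names are hypothetical):

```python
import math


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logit: float, label: int, alpha: float) -> float:
    """Per-sample BCE_alpha on a raw reward logit.

    label 1 = benign, label 0 = malign. alpha scales only the label-1 term,
    mirroring the pos_weight convention of torch.nn.BCEWithLogitsLoss.
    Uses -log(sigmoid(z)) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z).
    """
    return alpha * label * softplus(-logit) + (1 - label) * softplus(logit)
```

For a batch, the per-sample values would typically be averaged, as with PyTorch's default `reduction="mean"`.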

## 1.5 Perform censored sampling
### 1.5.1 Single & Union
For guided sampling using the **Single** and **Union** models, run:
```sh
MODEL_FLAGS="--image_size 32 --image_channels 1 --num_channels 128 --num_res_blocks 3"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
REWARD_FLAGS="--output_dim 1 --classifier_attention_resolutions 16,8,4 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"
SAMPLE_FLAGS="--sampling_type ddpm --num_recurrences 1 --classifier_scale 0 --backward_steps 0 --optim_lr 0.0 --use_forward False --original_guidance True --original_guidance_wt 5.0 --batch_size 200 --num_samples 1000"
MODEL_PATH="path/to/diffusion/model.pt"
REWARD_PATH="path/to/reward/model.pt"
LOG_DIR="path/to/log" # The generated samples will be saved in .npz format within this directory.
NUM_GPUS="1"

echo $(mpiexec -n $NUM_GPUS python scripts/feedback_reward_universal_guidance.py --log_dir $LOG_DIR --model_path $MODEL_PATH --reward_path $REWARD_PATH $MODEL_FLAGS $REWARD_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS)
```

### 1.5.2 Reward ensemble & universal guidance (backward and recurrence)
For guided sampling using the ensemble model (with universal guidance components), run:
```sh
MODEL_FLAGS="--image_size 32 --image_channels 1 --num_channels 128 --num_res_blocks 3"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
CLASSIFIER_FLAGS="--output_dim 1 --classifier_attention_resolutions 16,8,4 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"
SAMPLE_FLAGS="--sampling_type ddpm --num_recurrences 1 --backward_steps 0 --optim_lr 0.001 --use_forward False --original_guidance True --original_guidance_wt 5.0 --batch_size 200 --num_samples 1000"
MODEL_PATH="path/to/diffusion/model.pt"
REWARD_PATHS_1="path/to/reward/model_1.pt"
REWARD_PATHS_2="path/to/reward/model_2.pt"
REWARD_PATHS_3="path/to/reward/model_3.pt"
REWARD_PATHS_4="path/to/reward/model_4.pt"
REWARD_PATHS_5="path/to/reward/model_5.pt"

REWARD_TYPES="0 0 0 0 0" # Do not adjust this argument unless you are newly implementing ensemble with multiple architectures.
LOG_DIR="path/to/log" # The generated samples are saved in .npz format within this directory.
NUM_GPUS="1" # When backward/recurrence is used, set this to 1.

echo $(mpiexec -n $NUM_GPUS python scripts/feedback_reward_universal_guidance_ensemble.py --log_dir $LOG_DIR --model_path $MODEL_PATH --reward_paths $REWARD_PATHS_1 $REWARD_PATHS_2 $REWARD_PATHS_3 $REWARD_PATHS_4 $REWARD_PATHS_5 --reward_types $REWARD_TYPES $MODEL_FLAGS $CLASSIFIER_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS)
```

Change `backward_steps` ($B$ in the paper), `optim_lr` (the backward guidance learning rate), and `num_recurrences` ($R$ in the paper) as desired.
When using a reward ensemble, `original_guidance_wt` equals $K$ times $\omega$ (in time-dependent guidance) from the paper, where $K$ is the number of reward models in the ensemble (5 in the command above).
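One reading of this relation, assuming the ensemble reward is the product of the $K$ individual rewards $r_1, \dots, r_K$ and the script averages their log-gradients before scaling them, is:

```math
\omega \sum_{k=1}^{K} \nabla_{x_t} \log r_k(x_t, t) = (K\omega) \cdot \frac{1}{K} \sum_{k=1}^{K} \nabla_{x_t} \log r_k(x_t, t)
```

Under that assumption, `--original_guidance_wt 5.0` with $K = 5$ reward models corresponds to $\omega = 1$.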

<!-- To enable the recurrence and backward options, set the `--num_recurrences` to 4, the `--backward_steps` to 5, and the `--original_guidance_wt` to 1. -->


# Experiment 2. LSUN Church

## 2.1 Environment Setup

A separate environment is required for the experiments in this section, which rely on code from the latent diffusion model repository.

Set up a conda environment `diffusion_hf` using the provided YAML file and activate it:
```sh
conda env create -f enviroment.yaml
conda activate diffusion_hf
```
Alternatively, you can try running:
```sh
cd latent_guided_diffusion
conda env create -f enviroment.yaml
conda activate diffusion_hf
```
and then repeat the [Installation](#installation) steps.


## 2.2 Download pretrained latent diffusion model
Download the pretrained LDM components from [CompVis/latent-diffusion](https://github.com/CompVis/latent-diffusion).

### Pretrained Autoencoding Models


| Model   | rFID vs val | train steps | PSNR           | PSIM          | Link                                                   | Comments |
|---------|-------------|-------------|----------------|---------------|--------------------------------------------------------|----------|
| f=8, KL | 0.90        | 246803      | 24.19 +/- 4.19 | 1.02 +/- 0.35 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip |          |

Download the model:
```sh
cd latent_guided_diffusion
sh scripts/download_first_stages_kl-f8.sh
```
The first stage models can then be found in `models/first_stage_models/kl-f8`.

### Pretrained LDM
| Dataset       | Task                          | Model                             | FID         | IS   | Prec | Recall | Link                                                           | Comments |
|---------------|-------------------------------|-----------------------------------|-------------|------|------|--------|----------------------------------------------------------------|----------|
| LSUN-Churches | Unconditional Image Synthesis | LDM-KL-8 (400 DDIM steps, eta=0)  | 4.02 (4.02) | 2.72 | 0.64 | 0.52   | https://ommer-lab.com/files/latent-diffusion/lsun_churches.zip |          |

Download the model:
```sh
sh scripts/download_church_model.sh
```
The models can then be found in `models/ldm/lsun_churches256`.

## 2.3 Prepare human feedback data for reward model training

### 2.3.1 Generate and save baseline samples
Sample generation:
```sh
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 256 --image_channels 3 --num_channels 256 --learn_sigma True --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
SAMPLE_FLAGS="--num_recurrences 1 --backward_steps 0 --use_forward False --batch_size 8 --num_samples 1000"
LOG_DIR="path/to/log"
NUM_GPUS="1"

echo $(mpiexec -n $NUM_GPUS python scripts/LSUN_ldm_feedback_reward_universal_guidance.py --log_dir $LOG_DIR $MODEL_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS --use_ldm True)
```

Convert the NPZ sample file into PNG images and save them (same as in [Section 1.3.1](#131-generate-and-save-baseline-samples)).

### 2.3.2 Labeling with GUI

Same as in [Section 1.3.2](#132-provide-human-feedback-on-baseline-samples-using-gui).

### 2.3.3 Create partial data for ensemble training

Same as in [Section 1.3.3](#133-create-partial-data-for-ensemble-training) except that `num_malign_samples` and `num_benign_samples` should be set to 30, instead of 10.

## 2.4 Train Reward Model
Run the following shell script to train reward models:
```sh
POS_WEIGHT="0.1" # Change this to 0.01 for the 'Union' case
TRAIN_FLAGS="--isaugment False --image_size 256  --iterations 2001 --anneal_lr True --lr 3e-4 --batch_size 128 --save_interval 200 --weight_decay 0.05"
FEEDBACK_PATH="path/to/partial_feedback.pkl"
NUM_GPUS="1"
LOG_DIR="path/to/log" 

echo $(mpiexec -n $NUM_GPUS python scripts/LSUN_binary_universal_partial_classifier_train_aug.py --log_dir=$LOG_DIR --rgb=True --feedback_path=$FEEDBACK_PATH --pos_weight=$POS_WEIGHT $TRAIN_FLAGS)
```

<!-- You can adjust the `--isaugment` flag to enable or disable data augmentation during training.
If you set `--isaugment True`, make sure to adjust the `--p_malign_transform` and `--p_benign_transform` values based on the type of data augmentation you want to apply, as in the MNIST and Tench experiments. -->

<!-- You can adjust the `pos_weight` parameter to modify the $BCE_{\alpha}$ value and control the balance between positive and negative samples during training.

To obtain the union reward model, you can change the `feedback_path` to the `union_feedback` pickle file. -->

## 2.5 Perform censored sampling
### 2.5.1 Single & Union
For guided sampling using the **Single** and **Union** models, run:
```sh
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 256 --image_channels 3 --num_channels 256 --learn_sigma True --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True  --use_scale_shift_norm True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
SAMPLE_FLAGS="--num_recurrences 1 --use_forward True --forward_guidance_wt 10.0 --batch_size 8 --num_samples 500"
LOG_DIR="path/to/log"
REWARD_PATHS="path/to/reward/model.pt"
NUM_GPUS="4"

echo $(mpiexec -n $NUM_GPUS python scripts/LSUN_ldm_feedback_reward_universal_guidance.py --log_dir $LOG_DIR $MODEL_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS --reward_path $REWARD_PATHS --use_ldm True)

```

<!-- To obtain the union samples, change `REWARD_PATHS` to the union reward model checkpoint. -->

### 2.5.2 Reward ensemble & universal guidance (backward and recurrence)
For guided sampling using the reward ensemble (optionally with backward guidance and recurrence), run:
```sh
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 256 --image_channels 3 --num_channels 256 --learn_sigma True --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
SAMPLE_FLAGS="--num_recurrences 1 --use_forward True --forward_guidance_wt 10.0 --batch_size 8 --num_samples 500"
REWARD_PATHS_1="path/to/reward/model_1.pt"
REWARD_PATHS_2="path/to/reward/model_2.pt"
REWARD_PATHS_3="path/to/reward/model_3.pt"
REWARD_PATHS_4="path/to/reward/model_4.pt"
REWARD_PATHS_5="path/to/reward/model_5.pt"
LOG_DIR="path/to/log"
NUM_GPUS="1"

echo $(mpiexec -n $NUM_GPUS python scripts/LSUN_ldm_feedback_reward_universal_guidance_ensemble.py --log_dir $LOG_DIR --use_ldm True --reward_paths $REWARD_PATHS_1 $REWARD_PATHS_2 $REWARD_PATHS_3 $REWARD_PATHS_4 $REWARD_PATHS_5 $MODEL_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS)
```

The `forward_guidance_wt` is $K$ times $\omega$ (in time-independent guidance) from the paper.


# Experiment 3. ImageNet Tench

## 3.1 Download pretrained diffusion model and classifier
Download the following checkpoints provided in OpenAI's guided-diffusion repository:
 * 128x128 classifier: [128x128_classifier.pt](https://openaipublic.blob.core.windows.net/diffusion/jul-2021/128x128_classifier.pt)
 * 128x128 diffusion: [128x128_diffusion.pt](https://openaipublic.blob.core.windows.net/diffusion/jul-2021/128x128_diffusion.pt)

## 3.2 Prepare human feedback data for reward model training
### 3.2.1 Generate and save baseline samples

Sample generation:
```sh
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 128 --image_channels 3 --num_channels 256 --learn_sigma True --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
CLASSIFIER_FLAGS="--classifier_scale 0.5 --output_dim 1000 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"
SAMPLE_FLAGS="--batch_size 128 --num_samples 1000 --target_class 0"
MODEL_PATH="path/to/diffusion.pt"
CLASSIFIER_PATH="path/to/classifier.pt"
LOG_DIR="path/to/log"
NUM_GPUS="4"

echo $(mpiexec -n $NUM_GPUS python scripts/classifier_sample.py --log_dir $LOG_DIR --model_path $MODEL_PATH --classifier_path $CLASSIFIER_PATH $MODEL_FLAGS $CLASSIFIER_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS)
```

Convert the NPZ sample file into PNG images and save them (same as in [Section 1.3.1](#131-generate-and-save-baseline-samples)).

### 3.2.2 Provide human feedback on baseline samples using GUI
Same as in [Section 1.3.2](#132-provide-human-feedback-on-baseline-samples-using-gui).

### 3.2.3 Create partial data for round 1 of imitation learning
Run:
```sh
python hf_data/select_partial_feedback.py \
    --all_feedback_path path/to/total_feedback.pkl \
    --out_feedback_path path/to/partial_feedback.pkl \
    --num_malign_samles 10 \
    --num_benign_samles 10
```

By adjusting `num_malign_samples` and `num_benign_samples` to 20 or 30, one can also prepare data for training non-imitation learning models used in the ablation study.

## 3.3 Train Reward Model
Run the following shell script to train reward models:
```sh
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
CLASSIFIER_FLAGS="--image_size 128 --output_dim 1 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"
TRAIN_FLAGS="--augment_imgnet True --p_benign_transform 1 1 1 0 --p_malign_transform 1 1 1 0 --rgb True --iterations 501 --save_interval 500 --anneal_lr True --lr 3e-4 --batch_size 32 --weight_decay 0.05" # Change 'iterations' when performing imitation learning
FEEDBACK_PATH="path/to/partial_feedback.pkl"
AUGMENT_DATA_DIR="path/to/save/temporary/augmented/images"
POS_WEIGHT="0.1"
NUM_AUGMENT="20"
LOG_DIR="path/to/log"
NUM_GPUS="4"
RESUME_CHECKPOINT="" # Use this for imitation learning 

echo $(mpiexec -n $NUM_GPUS python scripts/feedback_reward_train.py --resume_checkpoint=$RESUME_CHECKPOINT --log_dir=$LOG_DIR --pos_weight=$POS_WEIGHT --augment_data_dir=$AUGMENT_DATA_DIR --num_augment=$NUM_AUGMENT --feedback_path=$FEEDBACK_PATH $CLASSIFIER_FLAGS $TRAIN_FLAGS $DIFFUSION_FLAGS)   
```

### 3.3.1 Imitation learning
To perform imitation learning, first follow [Section 3.4](#34-perform-censored-sampling) to collect samples from the previous round of censoring. 
Then, repeat the procedure of [Section 3.2.2](#322-provide-human-feedback-on-baseline-samples-using-gui) and [Section 3.2.3](#323-create-partial-data-for-round-1-of-imitation-learning).
Next, run the following to merge feedback .pkl files from the previous round and the newly incoming feedback data:

```sh
FEEDBACK_PATH_1="path/to/feedback_from_last_round.pkl"
FEEDBACK_PATH_2="path/to/new_partial_feedback.pkl"
OUT_DIR="path/to/save/feedback_for_this_round.pkl"

echo $(python hf_data/union_feedback.py --feedback_paths $FEEDBACK_PATH_1 $FEEDBACK_PATH_2 --out_union_feedback_dir $OUT_DIR)
```

Change the `FEEDBACK_PATH` argument to `path/to/save/feedback_for_this_round.pkl` and the `RESUME_CHECKPOINT` argument to `path/to/model_from_last_round.pt` in the reward training script.
For round 2, set `iterations` to 1500. For round 3, set `iterations` to 3000.
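The per-round settings above can be summarized in a small hypothetical helper (its name and return format are illustrative only). Round 1 uses the `--iterations 501` value from `TRAIN_FLAGS` with an empty `RESUME_CHECKPOINT`; later rounds resume from the previous round's reward model and train on the merged feedback file.

```python
from typing import Dict, Optional

# Reward-training iterations per imitation-learning round, as stated above.
ITERATIONS_PER_ROUND = {1: 501, 2: 1500, 3: 3000}


def reward_train_overrides(
    round_idx: int, feedback_path: str, last_checkpoint: Optional[str]
) -> Dict[str, str]:
    """Settings of the reward training script that change between rounds."""
    if round_idx not in ITERATIONS_PER_ROUND:
        raise ValueError(f"no iteration count given for round {round_idx}")
    if round_idx > 1 and not last_checkpoint:
        raise ValueError("rounds after the first resume from the previous reward model")
    return {
        "FEEDBACK_PATH": feedback_path,
        "RESUME_CHECKPOINT": last_checkpoint or "",
        "iterations": str(ITERATIONS_PER_ROUND[round_idx]),
    }
```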


## 3.4 Perform censored sampling
Run:
```sh
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 128 --image_channels 3 --num_channels 256 --learn_sigma True --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
CLASSIFIER_FLAGS="--classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"
SAMPLE_FLAGS="--sampling_type ddpm --num_recurrences 1 --classifier_scale 0.5 --backward_steps 0 --optim_lr 0.01 --use_forward False --original_guidance True --original_guidance_wt 5.0 --batch_size 50 --num_samples 200 --target_class 0"
MODEL_PATH="path/to/diffusion/model.pt"
CLASSIFIER_PATH="path/to/classifier/model.pt"
REWARD_PATHS="path/to/reward/model.pt"
LOG_DIR="path/to/log"
NUM_GPUS="4"

echo $(mpiexec -n $NUM_GPUS python scripts/feedback_reward_universal_guidance.py --log_dir $LOG_DIR --model_path $MODEL_PATH --classifier_path $CLASSIFIER_PATH --reward_path $REWARD_PATHS $MODEL_FLAGS $CLASSIFIER_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS)
```

For backward guidance and recurrence (to be combined with round 3), adjust the `backward_steps` ($B$ in the paper) and `num_recurrences` ($R$ in the paper) arguments. The `original_guidance_wt` here coincides with $\omega$ in the paper (**unlike the cases using ensemble**).
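`num_recurrences` follows the self-recurrence idea of Universal Guided Diffusion: after a guided reverse step from $x_t$ to $x_{t-1}$, the sample is re-noised back to level $t$ and the step is repeated. The framework-agnostic sketch below shows only that loop structure; `denoise_step` (one guided reverse step) and `renoise` (forward noising from $t-1$ back to $t$) are hypothetical callables, and backward guidance (`backward_steps`, `optim_lr`) is not covered.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def step_with_recurrence(
    x_t: T,
    t: int,
    denoise_step: Callable[[T, int], T],
    renoise: Callable[[T, int], T],
    num_recurrences: int,
) -> T:
    """One reverse-diffusion step repeated num_recurrences (R) times.

    denoise_step(x_t, t) -> x_{t-1}  (a guided reverse step)
    renoise(x_{t-1}, t)  -> x_t      (forward noising back to level t)
    With num_recurrences == 1 this reduces to a single guided step.
    """
    if num_recurrences < 1:
        raise ValueError("num_recurrences must be >= 1")
    x_prev = denoise_step(x_t, t)
    for _ in range(num_recurrences - 1):
        x_t = renoise(x_prev, t)
        x_prev = denoise_step(x_t, t)
    return x_prev
```

This is why `--num_recurrences 1` in the commands above means "no recurrence".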

# Experiment 4. LSUN Bedroom

## 4.1 Download pretrained model
Download the following checkpoint provided in OpenAI's guided-diffusion repository:
 * LSUN bedroom: [lsun_bedroom.pt](https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_bedroom.pt)

## 4.2 Prepare human feedback data for reward model training
### 4.2.1 Generate and save baseline samples
Sample generation:
```sh
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond False --diffusion_steps 1000 --dropout 0.1 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
SAMPLE_FLAGS="--batch_size 16 --num_samples 5000 --timestep_respacing 1000"
MODEL_PATH="path/to/diffusion/model.pt"
LOG_DIR="path/to/log"

echo $(mpiexec -n 4 python scripts/image_sample.py --log_dir $LOG_DIR $MODEL_FLAGS --model_path $MODEL_PATH $SAMPLE_FLAGS)
```

Convert the NPZ sample file into PNG images and save them (same as in [Section 1.3.1](#131-generate-and-save-baseline-samples)).

### 4.2.2 Provide human feedback on baseline samples using GUI
Same as in [Section 1.3.2](#132-provide-human-feedback-on-baseline-samples-using-gui).

### 4.2.3 Create partial data for ensemble training

Same as in [Section 1.3.3](#133-create-partial-data-for-ensemble-training) except that `num_malign_samples` and `num_benign_samples` should be set to 100, instead of 10.

## 4.3 Train Reward Model
Run:
```sh
POS_WEIGHT=0.1 # Change this to 0.02 for the 'Union' case
TRAIN_FLAGS="--isaugment False --image_size 256  --iterations 2001 --anneal_lr True --lr 3e-4 --batch_size 128 --save_interval 200 --weight_decay 0.05"
FEEDBACK_PATH="path/to/partial_feedback.pkl"
LOG_DIR="path/to/log"
NUM_GPUS="1"

echo $(mpiexec -n $NUM_GPUS python scripts/LSUN_binary_universal_partial_classifier_train_aug.py --log_dir=$LOG_DIR --rgb=True --feedback_path=$FEEDBACK_PATH --pos_weight=$POS_WEIGHT $TRAIN_FLAGS)
```
<!-- You can adjust the `--isaugment` flag to enable or disable data augmentation during training.
If you set `--isaugment True`, make sure to adjust the `--p_malign_transform` and `--p_benign_transform` values based on the type of data augmentation you want to apply, as in the MNIST and Tench experiments.

You can adjust the `pos_weight` parameter to modify the $BCE_{\alpha}$ value and control the balance between positive and negative samples during training.

To obtain the union reward model, you can change the `feedback_path` to the `union_feedback` pickle file. -->

## 4.4 Perform censored sampling
### 4.4.1 Single & Union
For guided sampling using the **Single** and **Union** models, run:
```sh
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond False --diffusion_steps 1000 --dropout 0.1 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
MODEL_PATH="path/to/diffusion/model.pt"
SAMPLE_FLAGS="--num_recurrences 1 --use_forward True --forward_guidance_wt 10.0 --batch_size 5 --num_samples 50"
REWARD_PATH="path/to/reward/model.pt"
LOG_DIR="path/to/log" 
NUM_GPUS="1"

echo $(mpiexec -n $NUM_GPUS python scripts/LSUN_feedback_reward_universal_guidance.py --log_dir $LOG_DIR --model_path $MODEL_PATH --reward_path $REWARD_PATH $MODEL_FLAGS $SAMPLE_FLAGS)
```
<!-- To obtain the union samples, change `REWARD_PATH` to the union reward model checkpoint. -->

### 4.4.2 Reward ensemble & universal guidance (backward and recurrence)
For guided sampling using the ensemble model (with universal guidance components), run:
```sh
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear"
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond False --dropout 0.1 --image_size 256 --learn_sigma True --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
SAMPLE_FLAGS="--num_recurrences 1 --backward_steps 0 --optim_lr 0.01 --use_forward True --forward_guidance_wt 10 --batch_size 8 --num_samples 500"
MODEL_PATH="path/to/diffusion/model.pt"
REWARD_PATHS_1="path/to/reward/model_1.pt"
REWARD_PATHS_2="path/to/reward/model_2.pt"
REWARD_PATHS_3="path/to/reward/model_3.pt"
REWARD_PATHS_4="path/to/reward/model_4.pt"
REWARD_PATHS_5="path/to/reward/model_5.pt"
LOG_DIR="path/to/log"
NUM_GPUS="1"

echo $(mpiexec -n $NUM_GPUS python scripts/LSUN_feedback_reward_universal_guidance_ensemble.py --log_dir $LOG_DIR --model_path $MODEL_PATH --reward_paths $REWARD_PATHS_1 $REWARD_PATHS_2 $REWARD_PATHS_3 $REWARD_PATHS_4 $REWARD_PATHS_5 $MODEL_FLAGS $DIFFUSION_FLAGS $SAMPLE_FLAGS)
```
<!-- To enable the recurrence and backward options, set the `--num_recurrences` to 4, the `--backward_steps` to 5. -->

The `forward_guidance_wt` is $K$ times $\omega$ (in time-independent guidance) from the paper.

# 5. Appendix

## Errors related to `mpi4py` during environment setup

1. If you encounter missing-package errors, install the missing packages with `pip` until the errors related to `mpi4py` no longer appear.

   If installing `mpi4py` using `pip` doesn't work, try:
   ```sh
   conda install -c conda-forge mpi4py mpich
   ```

2. Once the necessary packages are installed, your environment should be ready to use. Avoid loading the `cuda/1X.X` module when running code from this repository. If the `module list` command shows `cuda/1X.X`, unload it with:
   ```sh
   module unload cuda/1X.X
   ```