# Diffusing DeBias: Synthetic Bias Amplification for Model Debiasing

# Abstract

**DISCLAIMER**: Due to the conference upload size limits, 

## Getting Started

### Requirements

+ Python 3.10+
+ PyTorch 2.0+ (with torchvision)
+ An NVIDIA GPU

### Datasets

The benchmark datasets analyzed in this study are downloaded automatically, so there is no need to add them manually. For the UrbanCars and ImageNet-9 datasets, please refer to the [Whac-A-Mole](https://github.com/facebookresearch/Whac-A-Mole/blob/main/README.md#urbancars-experiments) and [ReBias](https://github.com/clovaai/rebias) repositories, respectively.

### Set Up the Python Environment

To set up your Python environment, you can use venv and pip with the provided dependency file ```requirements.txt```:

```
python3.10 -m venv <env_path>
source <env_path>/bin/activate
pip install -r requirements.txt
```
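After installation, it can be useful to confirm that the environment matches the requirements listed above. The following is a minimal sketch, not part of the repository: it assumes the virtual environment is active so that `python` resolves to its interpreter, and the `ENV_STATUS` variable is a hypothetical name used only for illustration.

```shell
# Report the interpreter version (the requirements ask for Python 3.10+).
python --version

# Try to import PyTorch and torchvision and report whether CUDA is visible.
# If the import fails, print a hint instead of stopping.
if python -c '
import torch, torchvision
print("torch", torch.__version__)
print("torchvision", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
'; then
  ENV_STATUS=ok
else
  echo "torch/torchvision not importable: activate the environment and rerun pip install"
  ENV_STATUS=missing
fi
```

If `CUDA available:` prints `False`, the `--device cuda:0` commands in this README will likely fail, since they expect an NVIDIA GPU.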

# Running DDB Experiments

### Synthetic Image Generation

To streamline experimentation, we have included the synthetic images required to run the Debiasing Recipes. These images are located in the ```Debiasing/data/synthetic``` directory. Specifically:

- ```w_1/imagenet```: Contains the synthetic images used for the main results.
- Other folders: Contain synthetic images for the ablation studies.

With these pre-generated synthetic images, reviewers can run the full DDB pipeline without training the CDPM or generating the images themselves.

## Diffusing the Bias

To run the components in this part, change your current working directory to ```DiffuseBias```. You can then launch CDPM training and image generation as follows:

- Launch CDPM model training
  
  ```
  python runCDPM.py --state train --iterations 100000 --batch_size 32 --dataset waterbirds --img_size 64 --device cuda:0
  ```

- Generate synthetic images
  
  ```
  python runCDPM.py --state eval --load_weights path/to/checkpoint.pt --batch_size 100 --dataset waterbirds --img_size 64 --device cuda:0
  ```
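The two commands above share several settings. The sketch below keeps them in shell variables so that training and generation stay consistent. It is only an illustration: it assumes that generation should reuse the dataset, image size, and device used for training, the variable names are hypothetical, and `path/to/checkpoint.pt` remains a placeholder for the checkpoint produced by training.

```shell
# Shared settings, copied from the two commands above.
DATASET=waterbirds
IMG_SIZE=64
DEVICE=cuda:0
# Placeholder: replace with the checkpoint written by the training run.
CKPT=path/to/checkpoint.pt

TRAIN_CMD="python runCDPM.py --state train --iterations 100000 --batch_size 32 --dataset $DATASET --img_size $IMG_SIZE --device $DEVICE"
EVAL_CMD="python runCDPM.py --state eval --load_weights $CKPT --batch_size 100 --dataset $DATASET --img_size $IMG_SIZE --device $DEVICE"

# Print both commands for review before launching them.
echo "$TRAIN_CMD"
echo "$EVAL_CMD"
```

Once the printed commands look right, they can be launched from the ```DiffuseBias``` directory with `eval "$TRAIN_CMD"` followed by `eval "$EVAL_CMD"`.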

Captions for the generated images, used to quantitatively validate the identified biases, can be obtained by running:

```
python captions_generator.py /path/to/synthetic/images.npy/directory/ --device cpu
```

## Debiasing Recipes

To run the debiasing recipes, change your current working directory to ```Debiasing``` and create the directories ```outputs``` and ```saved_models```. Recipe I and Recipe II can then be launched as described in the sections below.
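The directory setup described above can be done in a single step. A minimal sketch, assuming the current working directory is already ```Debiasing```:

```shell
# Run from inside the Debiasing directory (cd Debiasing first).
# -p makes the command safe to repeat: no error if the directories already exist.
mkdir -p outputs saved_models
```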

### Recipe I: two-step debiasing

For example, to execute DDB Recipe I with three runs on different seeds, run:

```
bash scripts/waterbirds_seeds_recipe_one.sh
```

- Available ablation studies on Waterbirds include: 
  
  - Ablation on classifier-free guidance strength $w$:
    
    ```
    bash scripts/waterbirds_ablation_guidance.sh
    ```
  
  - Ablation on the number of synthetic images used to train the Bias Amplifier:
    
    ```
    bash scripts/waterbirds_ablation_trimages.sh
    ```
    
- An ablation on an unbiased dataset (CIFAR-10) is also available:
    
    ```
    bash scripts/ablation_unbiased_cifar10.sh
    ```

### Recipe II: end-to-end debiasing

For example, to execute DDB Recipe II with three runs on different seeds, run:

```
bash scripts/waterbirds_seeds_recipe_two.sh
```
