# CPSample: Classifier Protected Sampling for Guarding Private Data During Diffusion

This repo contains the PyTorch implementation for the paper "CPSample: Classifier Protected Sampling for Guarding Private Data During Diffusion".

Abstract: *Despite surpassing GANs with respect to quality metrics, diffusion models and GANs share the risk of replicating training data during inference. While prior works have focused on mitigating this risk through various training techniques, their methods have resulted in significant degradation of image quality.
We present CPSample, a modification to the denoising process that safeguards against replication without compromising generation quality. Moreover, our technique provides diffusion models with greater robustness against inference attacks. Our key idea is to use a classifier trained on random 0-1 labels to steer the denoising process away from points in the training dataset. We explain the theory behind this sampling technique and present its empirical successes on the CIFAR-10, CelebA, and LSUN Church datasets.*

<div align="center">
  <img src="images/CelebASim.png" alt="CelebASim" title="Generated image and most similar training image pairs for DDIM sampling (left) and CPSample (right) on CelebA." width="400" height="400"/>
</div>
<h6>
Generated image and most similar training image pairs for DDIM sampling (left) and CPSample with alpha = 0.001, s = 1000 (right). Shown are the four of 100 generated images with the highest similarity scores to their nearest training neighbor. The model was heavily fine-tuned on a subset of 1000 images.
</h6>

## Installation

We recommend using a conda environment to install the required packages.

```
conda create -n dp python=3.8
conda activate dp
```

Then, install the required packages using pip.
```
pip install -r requirements.txt
```

Download the pretrained denoiser model and classifier from the following links:
- [Denoiser](https://drive.google.com/file/d/1-1Z)  # TODO
- [Classifier](https://drive.google.com/file/d/1-1Z)  # TODO

## Experiment

### Prepare a subset of the training data

TODO: Add instructions for preparing a subset of the training data.
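
Until those instructions are added, the sketch below shows one possible way to draw a random subset and write its indices to a file that could be passed to `--indices`. The file layout (one integer index per line) and the helper name `write_subset_indices` are assumptions made for illustration, not a format confirmed by this repo; check how `main.py` parses the indices file before relying on it.

```python
import random
from pathlib import Path


def write_subset_indices(dataset_size, subset_size, out_path, seed=0):
    """Sample `subset_size` distinct indices and write them one per line.

    Hypothetical helper: the one-index-per-line layout is an assumption,
    not the format documented by this repo.
    """
    if subset_size > dataset_size:
        raise ValueError("subset_size cannot exceed dataset_size")
    rng = random.Random(seed)  # fixed seed so the same subset can be redrawn
    indices = sorted(rng.sample(range(dataset_size), subset_size))
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text("\n".join(str(i) for i in indices) + "\n")
    return indices
```

For example, `write_subset_indices(50000, 1000, "subset_indices.txt")` would pick 1000 of the 50,000 CIFAR-10 training images. Reusing the same seed reproduces the same subset, which matters because the classifier and the fine-tuned denoiser need to see the same images.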

### Train a classifier on a subset
```
python main.py --config <CONFIG_FILE_PATH> --exp <WORKING_DIRECTORY> --doc <MODEL_DIR> --doc_classifier <CLASSIFIER_DIR> --train_classifier --subset --resume_training --indices <SUBSET_INDICES_FILE>
```

### Fine-tune the denoiser model on the subset
Fine-tune the pretrained denoiser model on the subset until it begins to produce images that look very similar to the training set (but not so much that it completely collapses).

```
python main.py --config <CONFIG_FILE_PATH> --exp <WORKING_DIRECTORY> --doc <MODEL_DIR> --ni --doc_classifier <CLASSIFIER_DIR> --subset --indices=<SUBSET_INDICES_FILE> --resume_training
```
For example:
```
python main.py --config celeba_test.yml --exp DiffDP --doc celeba_finetune --ni --doc_classifier celeba_classifier --subset --indices celeba_subset_images/subset_indices.txt --resume_training
```
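
One way to judge when fine-tuning has gone far enough is to compare generated samples against their nearest training images, as in the figure at the top of this README. The sketch below is only an illustration in plain Python: it uses cosine similarity on flattened pixel vectors, which may not be the similarity score used in the paper, and `nearest_neighbor_similarity` is a hypothetical helper that is not part of this repo.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_neighbor_similarity(generated, training):
    """For each generated vector, return (best_score, index_of_best_training_vector)."""
    results = []
    for g in generated:
        scores = [cosine_similarity(g, t) for t in training]
        best = max(range(len(training)), key=scores.__getitem__)
        results.append((scores[best], best))
    return results
```

Sorting the returned scores in descending order and inspecting the top few pairs by eye gives a quick check of whether the model has started to replicate the subset.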


### Generating samples from the fine-tuned denoiser model
To generate **protected** images from the fine-tuned denoiser model, run the following command:
```
python main.py --config <CONFIG_FILE_PATH> --exp <WORKING_DIRECTORY> --sample --fid --timesteps 1000 --eta 0 --ni --sample_type generalized_guided --tolerance 0.05 --scale 100 --doc_classifier <CLASSIFIER_DIR> --image_folder=<OUTPUT_IMAGE_FOLDER>
```
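
The `--tolerance` and `--scale` flags presumably play the roles of alpha and s from the figure caption. To illustrate the idea described in the abstract (a classifier trained on random 0-1 labels steers denoising away from training points), the toy sketch below applies such a gate to a scalar sample in plain Python. Everything in it is an assumption made for illustration: the function names, the logistic stand-in for the classifier, and the exact update rule are not taken from this repo's code, which may differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def guided_noise(x, eps, w, b, tolerance, scale, sqrt_one_minus_alpha_bar=1.0):
    """Return a (possibly) adjusted noise estimate for a scalar sample x.

    Toy stand-in for the real classifier: p(x) = sigmoid(w * x + b).
    The gate and update are assumptions, not this repo's implementation.
    """
    p = sigmoid(w * x + b)
    if tolerance < p < 1.0 - tolerance:
        return eps  # classifier is unsure: leave the denoiser output untouched
    if p >= 1.0 - tolerance:
        grad = -p * w         # d/dx log(1 - p): steer toward label 0
    else:
        grad = (1.0 - p) * w  # d/dx log(p): steer toward label 1
    # Classifier-guidance style update: subtract the scaled gradient from eps.
    return eps - scale * sqrt_one_minus_alpha_bar * grad
```

In this toy, a confident prediction (probability within `tolerance` of 0 or 1) is treated as a sign that the sample is close to a training point, so the noise estimate is nudged toward the opposite label. An unsure prediction leaves the estimate unchanged.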

To generate **raw** images from the fine-tuned denoiser model, run the following command:
```
python main.py --config <CONFIG_FILE_PATH> --exp <WORKING_DIRECTORY> --sample --fid --timesteps 1000 --eta 0 --ni --sample_type generalized_guided --tolerance 0.0 --scale 0 --doc_classifier <CLASSIFIER_DIR> --image_folder=<OUTPUT_IMAGE_FOLDER>
```
  

### Run inference attacks
```
python main.py --config <CONFIG_FILE_PATH> --exp <WORKING_DIRECTORY> --doc <MODEL_DIR> --doc_classifier <CLASSIFIER_DIR> --subset --indices <SUBSET_INDICES_FILE> --inference_attack
```

## CPSample for guided generation
For Stable Diffusion guided generation with CPSample, see the Jupyter notebook `jupyter_scripts/CPSample_Guided_Generation.ipynb`, which is adapted from "Stable Diffusion Deep Dive".