# rectified-ema-guidance

This repository contains the implementation of the method described in `Rectifying Diffusion Guidance with Exponential Moving Average`.  
It builds upon the [EDM2](https://github.com/NVlabs/edm2), [Diffusers](https://github.com/huggingface/diffusers), and [torch-fidelity](https://github.com/toshas/torch-fidelity) frameworks and provides the code and scripts needed for evaluation.

## 📦 Dependencies

We use the same environment configuration as [EDM2](https://github.com/NVlabs/edm2) and [Diffusers](https://github.com/huggingface/diffusers).

To set up the environment:

```bash
python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Make sure you are using Python 3.10. If you encounter environment issues, refer to the EDM2 setup instructions.
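
As a quick sanity check after activating the virtual environment, the interpreter version can be verified with a few lines of Python. This is a minimal sketch and not part of the repository; the helper name `is_supported` is hypothetical.

```python
import sys

# Version named in the setup instructions above.
REQUIRED = (3, 10)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if is_supported():
        print("Python version OK")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Expected Python 3.10, found {found}")
```

If the check fails, recreating the virtual environment with a `python3.10` interpreter is likely the simplest fix.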

## 🔧 Training
Not applicable – this repository does not include training scripts.
We rely on pre-trained models provided by [EDM2](https://github.com/NVlabs/edm2) and [Diffusers](https://github.com/huggingface/diffusers).

## 📊 Evaluation

Our method is implemented in `eval_edm2.py`, `eval_sdxl.py`, and `eval_sd3.py`. Each one is launched through its wrapper script: `eval_edm2.sh`, `eval_sdxl.sh`, and `eval_sd3.sh`, respectively.

First, download the ImageNet 512 validation set and place it in a directory named `imagenet_val` (this is required for evaluating ImageNet metrics). Then run the script for the model you want to evaluate:

```bash
bash eval_edm2.sh
```

```bash
bash eval_edm2_xl.sh
```

```bash
bash eval_sdxl.sh
```

```bash
bash eval_sd3.sh
```
Each script will run the evaluation pipeline and generate the results reported in the paper.
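
Before launching the ImageNet evaluation, it can help to confirm that `imagenet_val` exists and actually contains images. The snippet below is a hedged sketch, not part of the repository: the helper `count_images` is hypothetical, and both the set of file extensions and the recursive directory search are assumptions about how the validation set is stored.

```python
from pathlib import Path

# Assumed image extensions; adjust if your copy of the dataset differs.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def count_images(root):
    """Count image files under `root`, including subdirectories."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"Directory not found: {root}")
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    try:
        print(f"Found {count_images('imagenet_val')} images in imagenet_val")
    except FileNotFoundError as err:
        print(err)
```

A count of zero usually means the archive was extracted to a different location or uses an extension not listed above.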

## 🧠 Pre-trained Models
We use pre-trained models from the [EDM2](https://github.com/NVlabs/edm2) and [Diffusers](https://github.com/huggingface/diffusers) repositories.
Please follow their instructions for downloading the necessary checkpoints.


## 📁 Repository Structure
```text
├── torch_fidelity_utils                    # Utility functions for FID, IS, Precision, Recall, etc.
├── eval_edm2.py                            # Generates class-conditional ImageNet images using EDM2 and
│                                           # computes FID, FD (DINOv2), Precision, and Recall
├── eval_edm2.sh                            # Shell script to run eval_edm2.py
├── eval_sdxl.py                            # Generates text-to-image samples using SDXL and
│                                           # computes FID, CLIPScore, Precision, and Recall
├── eval_sdxl.sh                            # Shell script to run eval_sdxl.py
├── eval_sd3.py                             # Generates text-to-image samples using SD3 and
│                                           # computes FID, CLIPScore, Precision, and Recall
├── eval_sd3.sh                             # Shell script to run eval_sd3.py
├── generate_images_custom.py               # Implements REG method for image generation with EDM2
├── pipeline_stable_diffusion_3_reg.py      # Implements REG pipeline using Stable Diffusion 3 (SD3)
├── pipeline_stable_diffusion_xl_reg.py     # Implements REG pipeline using Stable Diffusion XL (SDXL)
├── toy_example_custom.py                   # REG implementation on a 2D toy distribution
├── run_toy.sh                              # Shell script to run toy_example_custom.py
├── utils.py                                # Utility functions for CLIPScore, MS COCO, etc.
├── requirements.txt                        # Python dependencies
└── README.md                               # Project overview and usage instructions
```

## 📄 License
This codebase follows the licenses of [EDM2](https://github.com/NVlabs/edm2), [Diffusers](https://github.com/huggingface/diffusers), and [torch-fidelity](https://github.com/toshas/torch-fidelity), in accordance with their licensing terms.
```
Copyright © 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

All material, including source code and pre-trained models, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
```
```bibtex
@misc{obukhov2020torchfidelity,
  author={Anton Obukhov and Maximilian Seitzer and Po-Wei Wu and Semen Zhydenko and Jonathan Kyl and Elvis Yu-Jing Lin},
  year=2020,
  title={High-fidelity performance metrics for generative models in PyTorch},
  url={https://github.com/toshas/torch-fidelity},
  publisher={Zenodo},
  version={v0.3.0},
  doi={10.5281/zenodo.4957738},
  note={Version: 0.3.0, DOI: 10.5281/zenodo.4957738}
}
```
```bibtex
@misc{von-platen-etal-2022-diffusers,
  author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Dhruv Nair and Sayak Paul and William Berman and Yiyi Xu and Steven Liu and Thomas Wolf},
  title = {Diffusers: State-of-the-art diffusion models},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/diffusers}}
}
```

## 🙌 Acknowledgements
We thank the authors of [EDM2](https://github.com/NVlabs/edm2), [Diffusers](https://github.com/huggingface/diffusers), and [torch-fidelity](https://github.com/toshas/torch-fidelity) for open-sourcing their frameworks, which our method builds upon.
