# DDIM-GMM for HuggingFace Diffusers

DDIM-GMM scheduler for the HuggingFace Diffusers library.

## Overview

This is a custom scheduler that extends the standard `DDIMScheduler` with DDIM-GMM. DDIM-GMM replaces DDIM's unimodal Gaussian kernel with a multimodal Gaussian mixture kernel. The mixture parameters are constrained so that the DDIM-GMM forward marginals have the same first- and second-order moments as the DDPM forward marginals.
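
To make the moment-matching constraint concrete, here is a minimal one-dimensional sketch in plain Python. It is not the repository's code. It assumes, for illustration only, that the component offsets are centered under the mixture weights and that all components share one variance, reduced by the weighted spread of the offsets. The helper names `center_offsets` and `mixture_moments` are hypothetical.

```python
def center_offsets(offsets, weights):
    """Shift offsets so that their weighted mean is zero."""
    mean = sum(w * d for w, d in zip(weights, offsets))
    return [d - mean for d in offsets]


def mixture_moments(mu, offsets, weights, component_var):
    """Mean and variance of the mixture sum_k w_k * N(mu + d_k, component_var)."""
    mean = sum(w * (mu + d) for w, d in zip(weights, offsets))
    second = sum(w * (component_var + (mu + d) ** 2) for w, d in zip(weights, offsets))
    return mean, second - mean ** 2


# Target: a single Gaussian N(mu, sigma2), standing in for a DDPM marginal.
mu, sigma2 = 0.7, 1.0
weights = [0.25, 0.25, 0.5]
offsets = center_offsets([0.3, -0.5, 0.4], weights)

# Shrink each component's variance by the weighted spread of the offsets.
spread = sum(w * d * d for w, d in zip(weights, offsets))
component_var = sigma2 - spread

gmm_mean, gmm_var = mixture_moments(mu, offsets, weights, component_var)
print(gmm_mean, gmm_var)  # agrees with (mu, sigma2) up to floating-point error
```

In this toy setting the component variance must stay positive, which limits how large the offsets can be relative to `sigma2`.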

## Installation

### Step 1: Install diffusers

```bash
pip install diffusers torch pillow
```

### Step 2: Integrate into diffusers

The scheduler uses relative imports, so it must be placed inside the diffusers package. It has the same filename as the stock `scheduling_ddim.py` and takes its place:

```bash
# Option 1: Copy into your diffusers installation (overwrites the stock scheduling_ddim.py)
cp scheduling_ddim.py "$(python -c 'import diffusers, os; print(os.path.dirname(diffusers.__file__))')/schedulers/"

# Option 2: Clone diffusers and add to source
git clone https://github.com/huggingface/diffusers.git
cd diffusers
cp /path/to/scheduling_ddim.py src/diffusers/schedulers/
```

### Step 3: Register the scheduler (optional)

Edit `src/diffusers/schedulers/__init__.py` to add:

```python
from .scheduling_ddim import DDIMScheduler, GMM
```

## Usage

### Basic Example

```python
import torch
from PIL import Image

from diffusers import UNet2DModel
from diffusers.schedulers.scheduling_ddim import DDIMScheduler, GMM

# Use one device for the model, the GMM parameters, and the noise
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load pretrained model
model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to(device)
scheduler = DDIMScheduler.from_pretrained("google/ddpm-cat-256")

# Initialize GMM parameters
gmm_params = GMM(device=device)
gmm_params.initialize(
    dim=3 * 256 * 256,      # image_channels × height × width
    n_components=16,         # number of mixture components
    n_steps=50,              # number of inference steps
    scale=1.0,               # scale factor
    uniform_priors=True,     # uniform mixture weights
    orthonormal=True,        # orthonormalize offsets
    upper_bound_vars=True    # use VUB approximation
)

# Set GMM parameters in scheduler
scheduler.set_gmm_params(gmm_params=gmm_params)
scheduler.set_timesteps(50)

# Generate, starting from pure noise
sample_size = model.config.sample_size
sample = torch.randn((1, 3, sample_size, sample_size), device=device)

for t in scheduler.timesteps:
    with torch.no_grad():
        noisy_residual = model(sample, t).sample
    sample = scheduler.step(noisy_residual, t, sample).prev_sample

# Convert to image
image = (sample / 2 + 0.5).clamp(0, 1)
image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
image = Image.fromarray((image * 255).round().astype("uint8"))
image.save('output.png')
```

### With Stable Diffusion Pipeline

```python
from diffusers import StableDiffusionPipeline
from diffusers.schedulers.scheduling_ddim import DDIMScheduler, GMM
import torch

# Load pipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

# Replace scheduler with DDIM-GMM version
scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# Initialize GMM
z_resolution = pipe.unet.config.sample_size
z_channels = pipe.unet.config.in_channels
gmm_params = GMM(device='cuda')
gmm_params.initialize(
    dim=z_channels * z_resolution * z_resolution,
    n_components=16,
    n_steps=50,
    scale=0.1,
    uniform_priors=True,
    orthonormal=True,
    upper_bound_vars=True
)

scheduler.set_gmm_params(gmm_params=gmm_params)
pipe.scheduler = scheduler

# Generate
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("astronaut.png")
```

## Parameters

### GMM.initialize()

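- **dim** (int): Flattened sample dimension (channels × height × width of the image or latent)
- **n_components** (int): Number of mixture components (8-32 recommended)
- **n_steps** (int): Number of inference steps; the examples above use the same value for `set_timesteps()` and `num_inference_steps`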
- **scale** (float): Scale factor for mean offsets
- **uniform_priors** (bool): Use uniform mixture weights
- **orthonormal** (bool): Orthonormalize mean offsets
- **upper_bound_vars** (bool): Use VUB approximation
  - True: Faster, diagonal approximation
  - False: More accurate, full covariance
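
As a quick illustration of how `dim` relates to sample shapes, the sketch below computes the flattened dimension for the two usage examples. The helper `flattened_dim` is hypothetical, and the `(4, 64, 64)` latent shape is an assumption for Stable Diffusion v1.5 at its default resolution; in practice, read the values from `pipe.unet.config` as the pipeline example does.

```python
from math import prod


def flattened_dim(shape):
    """Product of the per-sample dimensions, e.g. (channels, height, width)."""
    return prod(shape)


pixel_dim = flattened_dim((3, 256, 256))  # image-space model from the basic example
latent_dim = flattened_dim((4, 64, 64))   # assumed latent shape, see note above
print(pixel_dim, latent_dim)
```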

## Implementation Notes

### Key Differences from Standard DDIM

1. **GMM Component Sampling**: At each timestep, a mixture component is sampled from the categorical distribution over the mixture weights
2. **Mean Offset**: The sampled component's mean offset is added to the prediction direction (line 556 of `scheduling_ddim.py`)
3. **Covariance Offset**: The variance is reduced by the GMM covariance offset (lines 563-574 of `scheduling_ddim.py`)
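
The three steps above can be sketched in plain Python as follows. This is a simplified illustration, not the code in `scheduling_ddim.py`: it treats the variance and covariance offset as scalars, clamps the reduced variance at zero (an assumption), and the function name `gmm_adjust` and its arguments are hypothetical.

```python
import random


def gmm_adjust(direction, variance, weights, mean_offsets, cov_offset, rng):
    """Pick a component, shift the direction, and shrink the variance.

    direction, mean_offsets[k]: lists of floats with equal length.
    variance, cov_offset: scalars (an isotropic simplification).
    """
    # 1. Sample a component index from the categorical distribution.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # 2. Add that component's mean offset to the prediction direction.
    shifted = [d + o for d, o in zip(direction, mean_offsets[k])]
    # 3. Reduce the variance by the covariance offset (clamping is an assumption).
    reduced = max(variance - cov_offset, 0.0)
    return k, shifted, reduced


rng = random.Random(0)
k, shifted, reduced = gmm_adjust(
    direction=[1.0, -2.0],
    variance=0.5,
    weights=[0.5, 0.5],
    mean_offsets=[[0.1, 0.0], [-0.1, 0.0]],
    cov_offset=0.01,
    rng=rng,
)
print(k, shifted, reduced)
```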

### Code Locations

Line numbers refer to the modified `scheduling_ddim.py`:

- **GMM Class**: Lines 34-69
- **set_gmm_params()**: Lines 647-648
- **GMM Integration in step()**: Lines 437-494 (sampling), 555-574 (application)

## Testing

A test script is provided (`test_generation.py`). To use it:

1. Update imports to match your diffusers installation path
2. Run: `python test_generation.py`

**Note**: Before running the test script, uncomment line 28 so that the GMM parameters are set on the scheduler:

```python
scheduler.set_gmm_params(gmm_params=gmm_params)  # IMPORTANT: Uncomment this!
```

## Compatibility

- **Tested with**: diffusers >= 0.21.0
- **Compatible with**: UNet2DModel, StableDiffusionPipeline, and other diffusers pipelines
- **Devices**: CPU, CUDA
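
To check an environment against the tested version range, a small sketch like the following may help. It only compares the leading numeric components of the version string, which is a simplification; the helper names `release_tuple` and `meets_minimum` are hypothetical.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Leading numeric components of a version string, e.g. '0.25.0.dev0' -> (0, 25, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="0.21.0"):
    """True if the installed release is at least the minimum release."""
    return release_tuple(installed) >= release_tuple(minimum)


try:
    installed = metadata.version("diffusers")
    print(installed, "OK" if meets_minimum(installed) else "older than 0.21.0")
except metadata.PackageNotFoundError:
    print("diffusers is not installed")
```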

## Known Issues

1. **Import paths**: The scheduler uses relative imports and must be placed within the diffusers source tree
2. **Test script**: Imports from `src.diffusers` which assumes a specific directory structure

## License

Apache License 2.0 (inherited from HuggingFace Diffusers)

See LICENSE file for full details.

## Troubleshooting

### "ImportError: cannot import name 'GMM'"
- Make sure `scheduling_ddim.py` is in the correct location
- Check that you're importing from the right module path
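
One way to see which file Python actually loads, and whether it defines `GMM`, is a short check like the one below. The helper `describe_module` is hypothetical; it reports `(None, False)` when the module cannot be imported at all.

```python
import importlib


def describe_module(module_name, attribute):
    """Return (file_path, has_attribute), or (None, False) if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None, False
    return getattr(module, "__file__", None), hasattr(module, attribute)


path, has_gmm = describe_module("diffusers.schedulers.scheduling_ddim", "GMM")
print("loaded from:", path)
print("GMM available:", has_gmm)
```

If the printed path points at a different diffusers installation than the one you modified, the copy in Step 2 went to the wrong place.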

### "AttributeError: 'DDIMScheduler' object has no attribute 'gmm_params'"
- Make sure you're using the modified `scheduling_ddim.py`, not the standard one
- Call `scheduler.set_gmm_params(gmm_params=gmm_params)` before sampling

### Generated images look wrong
- Ensure `dim` passed to `GMM.initialize()` equals channels × height × width of the sample (image or latent)
- Try using `upper_bound_vars=True` (VUB) first
- Check that `set_gmm_params()` is called before `set_timesteps()`
