<!-- TODO: Add SiD-DiT teaser image here -->
![SiD-DiT](Examples/000002_1.png "Four-[NAME] [NAME]-to-[NAME] [NAME]")
![SiD-DiT](Examples/000001_1.png "Four-[NAME] [NAME]-to-[NAME] [NAME]")

# SiD-DiT: Distillation of DiT-based flow-matching models

This repository enables **fast, few-step text-to-image generation** via scalable, generalizable distillation techniques. The same set of hyperparameters works across **all major DiT-based models**, including:

- **SANA** (Rectified Flow and TrigFlow; 0.6B and 1.6B)
- **SD 3-Medium**
- **SD 3.5-Medium**
- **SD 3.5-Large**
- **FLUX.1-dev** (512×512 and 1024×1024)

---

## 🧪 

Two FSDP training variants are supported:

- **AMP (Autocast + bf16) + FSDP**
- **Pure BF16 + FSDP**

Example: In `run_sid_dit_sd3.sh`, use the following flags:

**AMP + FSDP**
```bash
--fp16 0 \
--bf16 0 \
--autocast_bf16 1
```
- Uses: `lr = glr = 1e-6`, Adam `eps = 1e-8`

**Pure BF16 + FSDP**
```bash
--fp16 0 \
--bf16 1 \
--autocast_bf16 1
```
- Uses: `lr = glr = 1e-5`, Adam `eps = 1e-4`

> These hyperparameters are plug-and-play across all supported models.
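
As a rough illustration of how the two modes pair precision flags with optimizer settings, the sketch below wraps the values listed above in a small shell function. The function name `precision_settings` and the `amp`/`bf16` mode labels are hypothetical and not part of the repository; only the flag names and numeric values come from this README.

```bash
# Hypothetical helper (not part of the repository): print the precision
# flags and the matching optimizer settings for one of the two modes.
precision_settings() {
  case "$1" in
    amp)
      # AMP (autocast + bf16) + FSDP
      echo "--fp16 0 --bf16 0 --autocast_bf16 1"
      echo "lr=1e-6 glr=1e-6 adam_eps=1e-8"
      ;;
    bf16)
      # Pure BF16 + FSDP
      echo "--fp16 0 --bf16 1 --autocast_bf16 1"
      echo "lr=1e-5 glr=1e-5 adam_eps=1e-4"
      ;;
    *)
      echo "unknown mode: $1 (expected amp or bf16)" >&2
      return 1
      ;;
  esac
}
```

Calling `precision_settings bf16` prints the flags on the first line and the learning-rate and epsilon values on the second, which can then be copied into the launch script.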

All models—**except FLUX.1-dev at 1024×1024 resolution**—can be trained on a **single node with 8×80GB A100 or H100 GPUs**, with **rapid convergence** typically achieved within a few hours. Longer training can yield incremental gains, but improvements taper off after the initial convergence phase.

For **FLUX.1-dev at 1024×1024**, we **recommend using B100 GPUs**. Although it is technically possible to fit the model into eight 80GB GPUs using `cpu_offloading`, we’ve observed inconsistencies in FSDP gradient updates when `cpu_offloading` is enabled—leading to behavior that diverges from the non-offloaded baseline. While `cpu_offloading` is supported in the codebase, it **has not been fully debugged** and should be used with caution.

---

## 🚀 Features
**SiD-DiT** compresses large DiT-based models into **fast**, **few-step generators** with high visual fidelity and broad applicability.

Key features include:

- **Few-step generation** (default: 4 steps)
- **Flexible noise scheduling**:
  - `fresh` (default)
  - `fixed` and `ddim` — equally effective in practice, and often preferred in tasks requiring deterministic latent inputs
- **Configurable loss weighting schemes**:
  - Default: `1_minus_sigma`
  - Alternatives: `sid_default`, `1_over_sigma`, and other variants
  - Each weighting function biases the output differently (e.g., toward higher contrast or saturation). Choose based on aesthetic preference.
    - We favor `1_minus_sigma` for brighter, more "sunny" visuals.
- **Distributed training** via FSDP 
- **Support for AMP and BF16** training modes
- **Automatic FID & CLIP evaluation** on COCO-2014 for checkpoint selection  
  - These metrics are useful for tracking progress **within the same teacher**, but may not be reliable when comparing across different teachers or at higher resolutions.

> **Note**: SiD is **data-free by default**, requiring only **text prompts** for distillation. In this repository, the default configuration uses the [midjourney-v6-llava] dataset, which provides synthetic text–image pairs. However, **only the prompts** are used under data-free settings.

For training with **adversarial losses**, the corresponding images are also utilized. Be aware that the synthetic images in midjourney-v6-llava are often of **lower quality than the outputs of SiD-distilled models** (e.g., from SD3, SD3.5, FLUX). As such, **we do not recommend enabling Diffusion GAN training (setting `--train_diffusiongan 1`) unless your provided image data is of demonstrably higher quality than your distilled model outputs**.

---

## 🛠️ Installation

### Step 1: 

```bash
# Clone repo
git clone [URL]
cd sid-dit

# Set up environment
conda env create -f sid_dit_environment.yml
conda init
source ~/.bashrc
conda activate sid_dit
```

### Step 2:

##  

```bash
mkdir -p /data/{datasets,image_experiment}

cd /data/datasets

# Download MS-COCO and other assets (needed for evaluation and text prompts)

cp /data/MS-COCO-256.tar.gz .
tar -xzf MS-COCO-256.tar.gz

# Create the destination directories before copying into them
mkdir -p aesthetics_6_plus data
cp /data/aesthetics_6_plus.txt aesthetics_6_plus/aesthetics_6_plus.txt
cp --recursive /data/midjourney-v6-llava data/
cp /data/clipvitg14.pkl .
```
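
Before launching training, it can help to confirm that the copied assets are where the scripts expect them. The following is a minimal sketch, assuming the destination layout implied by the copy commands above; `check_assets` is a hypothetical helper, not something the repository provides.

```bash
# Hypothetical helper (not part of the repository): report which of the
# expected assets are missing under a dataset root directory.
# The relative paths mirror the destinations of the copy commands above.
check_assets() {
  local root="$1"
  local missing=0
  local rel
  for rel in \
    MS-COCO-256.tar.gz \
    aesthetics_6_plus/aesthetics_6_plus.txt \
    data/midjourney-v6-llava \
    clipvitg14.pkl
  do
    if [ ! -e "$root/$rel" ]; then
      echo "missing: $root/$rel" >&2
      missing=1
    fi
  done
  # Non-zero exit status if at least one asset is missing
  return "$missing"
}
```

For example, `check_assets /data/datasets && echo "all assets present"` prints the confirmation only when every path exists.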

---

## 🔐 Log in to Hugging Face using your token:

```bash
huggingface-cli login --token <YOUR_HF_TOKEN>
```

Ensure your token has access to **SD3** and **SD3.5** models. This is not needed for using SANA.

---

## 🚦 

Run with:

```bash
sh run_sid_dit_sd3_ANON.sh sd3-medium 1_minus_sigma 8
```

- Check `run_sid_dit_sd3_ANON.sh` for all available model options and configurations.
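
To compare the loss weighting schemes listed under Features, one option is to generate one launch command per scheme and run them one at a time. The sketch below only prints the commands and does not start any training. `print_sweep` is a hypothetical helper, and the third positional argument is passed through unchanged because this README does not spell out its meaning.

```bash
# Hypothetical helper (not part of the repository): print, without running,
# one launch command per loss weighting scheme named in this README.
print_sweep() {
  local launcher="$1"   # launch script, e.g. run_sid_dit_sd3_ANON.sh
  local model="$2"      # model option, e.g. sd3-medium
  local third_arg="$3"  # third positional argument, passed through as-is
  local weighting
  for weighting in 1_minus_sigma sid_default 1_over_sigma; do
    echo "sh $launcher $model $weighting $third_arg"
  done
}
```

For instance, `print_sweep run_sid_dit_sd3_ANON.sh sd3-medium 8` prints three commands that differ only in the weighting argument.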

---
