# EC$^2$Face: Explicit Conditional Consistency Diffusion for Multimodal Face Generation

## 1. Environment

```bash
conda create -y -n ec2face python=3.10
conda activate ec2face
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install diffusers==0.36.0 accelerate transformers pillow tqdm datasets einops safetensors bitsandbytes
```

Optional packages (monitoring / extra tooling): wandb, tensorboard, clean-fid.

## 2. Base Model

Download FLUX.1-dev from Hugging Face (or use an existing local copy) and pass its path via:

```text
--pretrained_model_name_or_path /path/to/FLUX.1-dev
```

## 3. Dataset Layout

Expected root (example):

```text
<DATA_ROOT>/
  mmcelebahq/
    train_data/
      face/   *.jpg (ground-truth faces)
      mask/   *.png (semantic mask)
      text/   *.txt (text prompt)
    test_data/
      face/
      mask/
      text/
  mmffhq/
    face/
    mask/
    text/
```
File names (without extension) must match across face / mask / text, e.g. `000123.jpg`, `000123.png` and `000123.txt`.
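
Before training, it can help to confirm that every sample has all three files. The sketch below is not part of the repository; `find_misaligned` is a hypothetical helper that assumes the extensions shown in the layout above (`.jpg` faces, `.png` masks, `.txt` prompts).

```python
from pathlib import Path


def find_misaligned(split_dir):
    """Return {stem: [subfolders missing that stem]} for incomplete samples."""
    split_dir = Path(split_dir)
    # Assumed extensions, taken from the layout above.
    expected = {"face": ".jpg", "mask": ".png", "text": ".txt"}
    # Collect the file stems found in each subfolder.
    stems = {
        sub: {p.stem for p in (split_dir / sub).glob(f"*{ext}")}
        for sub, ext in expected.items()
    }
    all_stems = set().union(*stems.values())
    # Keep only stems that are absent from at least one subfolder.
    return {
        stem: [sub for sub in expected if stem not in stems[sub]]
        for stem in sorted(all_stems)
        if any(stem not in stems[sub] for sub in expected)
    }
```

Calling `find_misaligned("<DATA_ROOT>/mmcelebahq/train_data")` returns an empty dict when the split is consistent.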

## 4. Training

Quick start:

```bash
bash train.sh
```
`train.sh` internally launches multi-GPU training with torchrun. Key flags:

| Argument | Purpose |
|----------|---------|
| `--pretrained_model_name_or_path` | FLUX.1-dev base model path or hub id |
| `--output_dir` | Output folder for LoRA checkpoints & samples |
| `--mixed_precision` | bf16 / fp16 / no |
| `--drop_text_prob`, `--drop_condition_image_pro` | Modality dropout for robustness |
| `--enable_LTAL` | Long-tail adaptive learning |
| `--enable_ECCG` | Explicit conditional consistency guidance |
| `--enable_t_modulated` | Time-dependent modulation of consistency loss |
| `--alpha_1`, `--alpha_2` | Weights for mask/text consistency losses |
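
The two dropout flags control how often a modality is hidden during training. The repository's exact logic is not shown here, so the following is only a minimal sketch of one common scheme: each modality is dropped independently and replaced by a blank placeholder. The function name, its parameters, and the independence of the two draws are assumptions.

```python
import random


def apply_modality_dropout(prompt, mask, drop_text_prob, drop_image_prob,
                           blank_mask, rng=random):
    """Independently replace the prompt and/or mask with a blank placeholder."""
    # Hypothetical scheme: one random draw per modality.
    if rng.random() < drop_text_prob:
        prompt = ""        # blank prompt: only the mask conditions generation
    if rng.random() < drop_image_prob:
        mask = blank_mask  # blank mask: only the text conditions generation
    return prompt, mask
```

With both probabilities at 0 the sample passes through unchanged; with both at 1 the model sees neither condition.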

Outputs:
 
* LoRA checkpoints under `mmface/` (e.g. `mmface/checkpoint-5000/`).
* Periodic validation images in the same folder.

Resume (latest checkpoint auto-detected):

```text
--resume_from_checkpoint latest
```
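
For reference, "latest" detection in training scripts of this kind usually means picking the `checkpoint-<step>` folder with the largest step. The sketch below illustrates that idea; `find_latest_checkpoint` is a hypothetical helper, and the training script may resolve the checkpoint differently.

```python
import re
from pathlib import Path


def find_latest_checkpoint(output_dir):
    """Return the checkpoint-<step> directory with the largest step, or None."""
    pattern = re.compile(r"checkpoint-(\d+)$")
    candidates = []
    for path in Path(output_dir).iterdir():
        match = pattern.match(path.name)
        if path.is_dir() and match:
            # Compare steps numerically so checkpoint-5000 beats checkpoint-900.
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```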

## 5. Inference (Multimodal)

Single-GPU batch (edit index ranges inside `test.py`):

```bash
python test.py
```

Multi-GPU parallel (masks + text prompts):

```bash
python test_mgpu.py mmface/checkpoint-5000/ output/mmface/
```
This script splits the test mask list across all visible GPUs and saves generated `.jpg` files.
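
How `test_mgpu.py` partitions the list is not documented here. One simple way to split work across GPUs is a strided split, sketched below; `shard_for_rank` and its parameters are hypothetical names used only for illustration.

```python
def shard_for_rank(items, rank, world_size):
    """Return the items assigned to worker `rank` out of `world_size` workers."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    # Sort first so every worker sees the same order,
    # then take every world_size-th item starting at `rank`.
    return sorted(items)[rank::world_size]
```

Each file lands in exactly one shard, and shard sizes differ by at most one.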

## 6. Diversity & Unimodal Modes

| Script | Mode | Description |
|--------|------|-------------|
| `test_diversity.py` | Diversity | Generate with multiple seeds or sampling strategies; adapt as needed. |
| `mask2face.py` | Mask → Face | Generate several face variants from a mask (blank prompt). Saves a 1x4 grid (mask + 3 samples). |
| `txt2face.py` | Text → Face | Generate variants from text only (uses a blank mask). Saves a 1x3 grid. |

Adjust index ranges, paths and output folders inside each script for your data.
 
## 7. Evaluation

- Text Consistency (CLIP Score): https://github.com/Taited/clip-score
- Mask Consistency (Mean Accuracy): https://github.com/kartik-3004/segface
- LPIPS (Perceptual Similarity): https://github.com/richzhang/PerceptualSimilarity
- NIQE: https://github.com/chaofengc/IQA-PyTorch
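
To make the mask-consistency metric concrete: one common definition of mean accuracy in semantic segmentation averages the per-class pixel accuracy over the classes present in the reference mask. The sketch below implements that definition on plain nested lists; whether the linked repository computes exactly this variant is an assumption, so use its own script for reported numbers.

```python
from collections import defaultdict


def mean_class_accuracy(target, predicted):
    """Average per-class pixel accuracy over the classes present in `target`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for target_row, predicted_row in zip(target, predicted):
        for t, p in zip(target_row, predicted_row):
            total[t] += 1
            correct[t] += int(t == p)
    if not total:
        raise ValueError("target mask is empty")
    # Each class contributes equally, regardless of how many pixels it covers.
    return sum(correct[c] / total[c] for c in total) / len(total)
```

Here `target` would be the input semantic mask and `predicted` the mask parsed from the generated face.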
