# Directional Textual Inversion (DTI)

This repository contains the implementation of "Directional Textual Inversion for Personalized Text-to-Image Generation", a method for learning personalized concepts in text-to-image diffusion models.

## Installation

### Prerequisites
- Python 3.9-3.12 (tested on Python 3.12)
- CUDA-compatible GPU with sufficient memory (tested on an NVIDIA RTX A6000, 48 GB)

### Setup

1. Clone the repository:
```bash
git clone [repository-url]
cd dti
```

2. Install the package and dependencies:
```bash
pip install -e .
```

### Dependencies

The main dependencies include:
- PyTorch 2.7.1
- Diffusers 0.35.1
- Transformers 4.56.0
- Accelerate 1.10.1
- PEFT 0.17.1

All dependencies are installed automatically when you run `pip install -e .`.
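
To confirm that an environment matches the versions listed above, a check along the following lines can help. This is a sketch and not part of the repository: the helper names `installed_version` and `find_version_mismatches` are hypothetical, and the distribution names (`torch`, `diffusers`, and so on) are the usual PyPI names, assumed here rather than read from the project's packaging metadata.

```python
from importlib import metadata

# Versions listed in this README; the distribution names are the usual
# PyPI names (an assumption, not taken from the project's metadata).
EXPECTED = {
    "torch": "2.7.1",
    "diffusers": "0.35.1",
    "transformers": "4.56.0",
    "accelerate": "1.10.1",
    "peft": "0.17.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_version_mismatches(expected, lookup=installed_version):
    """Map each package that differs from `expected` to (wanted, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Ignore local build suffixes such as "+cu126" when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every listed package is present at the listed version. A different version is not necessarily a problem, only a deviation from the tested setup.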

## Data Preparation

### Dataset Structure

The training data should be organized as follows:
```
data/
├── dreambooth.json          # Dataset configuration
├── dreambooth/              # Training images
│   ├── subject1/
│   │   ├── 00.jpg
│   │   ├── 01.jpg
│   │   └── ...
│   └── subject2/
├── dreambooth_mask/         # Optional: masks for subjects
└── styledrop/               # Style transfer data
```

### Dataset Configuration

The dataset configuration is stored in JSON files (e.g., `data/dreambooth.json`). Each subject should have:

```json
{
  "subject_name": {
    "path": "data/dreambooth/subject_name",
    "images": ["00.jpg", "01.jpg", "02.jpg", "03.jpg", "04.jpg"],
    "class": "object_class",
    "initialization": "descriptive text",
    "template": ["prompt template 1", "prompt template 2", ...]
  }
}
```
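
Before training, it can be useful to check that every entry has the fields shown above and that the listed images exist on disk. The following is a minimal sketch under stated assumptions: `validate_config` is a hypothetical helper (not part of the repository), the required keys are taken from the example entry, and `path` is assumed to be relative to the directory the scripts are run from.

```python
import json
from pathlib import Path

# Keys shown in the example entry above.
REQUIRED_KEYS = ("path", "images", "class", "initialization", "template")


def validate_config(config, root="."):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for subject, entry in config.items():
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems.append(f"{subject}: missing keys {missing}")
            continue
        if not entry["images"]:
            problems.append(f"{subject}: no images listed")
        if not entry["template"]:
            problems.append(f"{subject}: no prompt templates listed")
        # Assumption: "path" is relative to the working directory (`root`).
        folder = Path(root) / entry["path"]
        for image in entry["images"]:
            if not (folder / image).is_file():
                problems.append(f"{subject}: image not found: {folder / image}")
    return problems


if __name__ == "__main__":
    config_path = Path("data/dreambooth.json")
    if config_path.is_file():
        for problem in validate_config(json.loads(config_path.read_text())):
            print(problem)
```

The function only inspects the configuration and the file system. It does not open the images or check the template contents.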

### Download Sample Data

You can download DreamBooth and StyleDrop data using:
```bash
python scripts/download_datasets.py
```

## Running Single-Subject Experiments

### Basic Textual Inversion with DTI

To train a single subject with DTI on SDXL:

```bash
python run.py -g 0 --instances subject_name --max_train_steps 500 --kappa 0.1
```

Key parameters:
- `-g`: GPU ID to use
- `--instances`: Subjects to train, given by their keys in the dataset JSON file (e.g., `data/dreambooth.json`)
- `--max_train_steps`: Number of training steps
- `--kappa`: DTI regularization strength
- `--lr`: Learning rate (default: 2e-2)

### Training with Different Models

**SANA 1.6B:**
```bash
python run_sana.py -g 0 -m sana1.5_1.6b --instances subject_name --max_train_steps 1000 --kappa 0.05
```

**SANA 4.8B:**
```bash
python run_sana.py -g 0 -m sana1.5_4.8b --instances subject_name --max_train_steps 1000 --kappa 0.05
```

**LoRA + DTI:**
```bash
python run_lora.py -g 0 --instances subject_name --max_train_steps 300 --kappa 0.1 --rank 4
```

**Textual Inversion (TI) only:**
```bash
python run_ti.py -g 0 --instances subject_name --max_train_steps 500
```

### Advanced Options

- `--reparam`: Enable DTI reparameterization (default: true)
- `--scale`: Scaling method for embeddings ("max", "norm", or float value)
- `--init_method`: Initialization method ("token", "random", "mean")
- `--train_magnitude`: Train embedding magnitude separately
- `--adamw`: Use AdamW optimizer instead of SGD
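
Several of the options above refer to an embedding's scale or magnitude. As a generic illustration only, the sketch below splits a vector into a unit-length direction and a scalar magnitude and recombines them. Whether the repository's reparameterization takes exactly this form is an assumption, since this README does not spell it out. The function names are hypothetical.

```python
import math


def split_direction_magnitude(embedding):
    """Split a vector into (unit-length direction, magnitude)."""
    magnitude = math.sqrt(sum(x * x for x in embedding))
    if magnitude == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / magnitude for x in embedding], magnitude


def recombine(direction, magnitude):
    """Rebuild the embedding from its direction and magnitude."""
    return [magnitude * x for x in direction]


if __name__ == "__main__":
    direction, magnitude = split_direction_magnitude([3.0, 4.0])
    print(direction, magnitude)
    print(recombine(direction, magnitude))
```

With this split, the direction and the magnitude can be updated independently, which is the general idea behind training a magnitude "separately".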

### Evaluation

After training, evaluate the results:

```bash
python scripts/evaluate.py -e output/experiment_name --checkpoint 500
```

Evaluation options:
- `--prompt_set`: Choose prompt set ("simple", "complex", "style")
- `--seeds`: Random seeds for generation (default: [0,1,2,3])
- `--skip_gen`: Skip image generation (for metrics-only evaluation)

## Batch Experiments for Paper Reproduction

### Core Experiments

**1. DTI vs Baselines on SDXL:**
```bash
# DTI (our method)
python run.py --desc "dti" --kappa 0.1 --max_train_steps 500

# Textual Inversion baseline
python run_ti.py --desc "ti-baseline" --max_train_steps 500

# LoRA baseline
python run_lora.py --desc "lora-baseline" --max_train_steps 300 --rank 4
```

**2. DTI on Different Models:**
```bash
# SANA 1.6B
python run_sana.py -m sana1.5_1.6b --desc "sana1.6b-dti" --kappa 0.05 --max_train_steps 1000

# SANA 4.8B
python run_sana.py -m sana1.5_4.8b --desc "sana4.8b-dti" --kappa 0.05 --max_train_steps 1000
```

**3. Hyperparameter Ablations:**

Kappa (DTI strength) ablation:
```bash
for kappa in 0.0 0.05 0.1 0.2 0.5; do
    python run.py --desc "kappa${kappa}" --kappa $kappa --max_train_steps 500
done
```

Learning rate ablation:
```bash
for lr in 1e-2 2e-2 5e-2 1e-1; do
    python run.py --desc "lr${lr}" --lr $lr --max_train_steps 500
done
```
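
The shell loops above can also be driven from Python, which makes it easier to log or filter runs. This is a sketch, not a script shipped with the repository: `build_command`, `kappa_sweep`, and `run_all` are hypothetical helpers, and only the flags already shown above (`--desc`, `--kappa`, `--max_train_steps`) are used.

```python
import shlex
import subprocess


def build_command(script="run.py", **options):
    """Build an argument list such as ['python', 'run.py', '--kappa', '0.1']."""
    command = ["python", script]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


def kappa_sweep(kappas, max_train_steps=500):
    """One command per kappa value, mirroring the kappa shell loop above."""
    return [
        build_command(desc=f"kappa{kappa}", kappa=kappa,
                      max_train_steps=max_train_steps)
        for kappa in kappas
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Values are passed as strings so that "0.0" keeps its formatting.
    run_all(kappa_sweep(["0.0", "0.05", "0.1", "0.2", "0.5"]))
```

By default the script only prints the commands. Passing `dry_run=False` to `run_all` launches them one after another. The learning rate ablation can reuse `build_command` with `lr=...` in place of `kappa=...`.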

**4. Initialization Methods:**
```bash
# Token initialization (default)
python run.py --desc "init-token" --init_method token

# Random initialization
python run.py --desc "init-random" --init_method random

# Mean initialization
python run.py --desc "init-mean" --init_method mean
```

### Batch Evaluation

After running experiments, evaluate all results:

```bash
# Simple prompts
python scripts/evaluate.py -e output/experiment_dir --prompt_set simple

# Complex prompts
python scripts/evaluate.py -e output/experiment_dir --prompt_set complex --out_dir images_complex

# Style prompts
python scripts/evaluate.py -e output/experiment_dir --prompt_set style
```
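
When several experiments have been run, the evaluation commands above can be generated for each of them. The sketch below assumes that every experiment is a direct subdirectory of `output/`, which this README does not state explicitly. `evaluation_commands` is a hypothetical helper, and it uses only the `-e`, `--prompt_set`, and `--out_dir` flags shown above.

```python
from pathlib import Path

# Output folder per prompt set, as in the commands above.
# None means "do not pass --out_dir" and keep the script's default.
OUT_DIRS = {"simple": None, "complex": "images_complex", "style": None}


def evaluation_commands(output_root="output",
                        prompt_sets=("simple", "complex", "style")):
    """Return one evaluate.py command per (experiment, prompt set) pair."""
    root = Path(output_root)
    if not root.is_dir():
        return []
    commands = []
    # Assumption: each experiment lives in its own subdirectory of `root`.
    for experiment in sorted(p for p in root.iterdir() if p.is_dir()):
        for prompt_set in prompt_sets:
            command = ["python", "scripts/evaluate.py",
                       "-e", str(experiment), "--prompt_set", prompt_set]
            out_dir = OUT_DIRS.get(prompt_set)
            if out_dir:
                command += ["--out_dir", out_dir]
            commands.append(command)
    return commands


if __name__ == "__main__":
    for command in evaluation_commands():
        print(" ".join(command))
```

The script prints the commands rather than running them, so the list can be reviewed before anything is launched.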

### Results Analysis

Generated images are saved in:
- `images/` (simple prompts)
- `images_complex/` (complex prompts)

Quantitative metrics include:
- **Image Similarity**: DINOv2-based similarity to reference images
- **Text Alignment**: CLIP/SigLIP scores for prompt adherence
- **Subject Preservation**: Identity preservation metrics
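
Embedding-based similarity scores of this kind are typically cosine similarities between feature vectors. The sketch below shows that computation on precomputed vectors. It is an illustration under assumptions: feature extraction (for example with DINOv2) is omitted, the function names are hypothetical, and the repository's metrics code may aggregate scores differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(generated, references):
    """Average cosine similarity over every (generated, reference) pair."""
    scores = [cosine_similarity(g, r) for g in generated for r in references]
    if not scores:
        raise ValueError("need at least one generated and one reference vector")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    generated = [[1.0, 0.0]]
    references = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_pairwise_similarity(generated, references))
```

Averaging over all pairs is one common choice. Other aggregations, such as taking the best-matching reference per generated image, are equally plausible.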

## Project Structure

```
dti/
├── src/dti/                 # Main package
│   ├── datasets/            # Dataset loading utilities
│   ├── training/            # Training utilities and DTI implementation
│   ├── metrics/             # Evaluation metrics
│   └── utils/               # Utility functions
├── scripts/                 # Training and evaluation scripts
├── data/                    # Dataset configurations and images
├── run*.py                  # Experiment runners for different setups
└── output/                  # Training outputs and checkpoints
```

## Key Features

- **Directional Textual Inversion (DTI)**: A regularized variant of textual inversion for learning personalized concepts, with regularization strength set by `--kappa`
- **Multi-model Support**: SDXL, SANA, SD3, FLUX implementations
- **Flexible Training**: Support for TI, LoRA, and combined approaches
- **Comprehensive Evaluation**: Multiple metrics and prompt sets
- **Reproducible Experiments**: Batch scripts for paper results

## Citation

```bibtex
@article{anonymous2025directional,
  title={Directional Textual Inversion for Personalized Text-to-Image Generation},
  author={Anonymous},
  year={2025}
}
```

## License

This project is licensed under the terms specified in the LICENSE file.
