# Diffusion from scratch

## Installation
```bash
pip install -e .
```

## Training the baseline DM
```bash
python scripts/train_edm.py --config.training.n_iters 30000
```

## Generating the synthetic dataset
```bash
python scripts/generate_dataset.py --config.checkpoint <BASELINE_DM_CHECKPOINT> --config.n_samples <N_SAMPLES> --config.out <OUT_NAME>
```
This saves a positive dataset of `<N_SAMPLES>` samples at `results/neg_dataset/<OUT_NAME>_pos.npy` and a negative dataset of the same size at `results/neg_dataset/<OUT_NAME>_neg.npy`.
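
As a quick sanity check after generation, the two files can be loaded and compared. This is a minimal sketch, assuming both `.npy` files hold plain NumPy arrays with samples along the first axis; the helper name `load_dataset_pair` is hypothetical and not part of this repository.

```python
import numpy as np


def load_dataset_pair(pos_path, neg_path):
    """Load both datasets and verify they contain the same number of samples.

    Assumption: each file stores a plain array with samples on axis 0.
    """
    pos = np.load(pos_path)
    neg = np.load(neg_path)
    if len(pos) != len(neg):
        raise ValueError(
            f"size mismatch: {len(pos)} positive vs {len(neg)} negative samples"
        )
    return pos, neg
```

Call it with the two paths reported by `generate_dataset.py`; a `ValueError` indicates that the positive and negative sets differ in size.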

## Training the classifier
```bash
python scripts/train_classifier.py --config.data.dataset <POS_DATASET_PATH> --config.data.neg_dataset <NEG_DATASET_PATH> --config.training.alpha <ALPHA>
```
In this command, `<POS_DATASET_PATH>` and `<NEG_DATASET_PATH>` are the paths to the generated positive and negative datasets, and `<ALPHA>` is 1 minus the infraction rate of the baseline DM.
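
For illustration, `<ALPHA>` can be derived from a measured infraction rate as below. This is a small sketch; the function name is hypothetical, and the infraction rate is assumed to be a fraction in `[0, 1]` (for example, as reported by `scripts/test_edm.py`).

```python
def alpha_from_infraction_rate(infraction_rate: float) -> float:
    """Return alpha = 1 - infraction rate.

    Assumption: infraction_rate is a fraction in [0, 1], not a percentage.
    """
    if not 0.0 <= infraction_rate <= 1.0:
        raise ValueError("infraction_rate must lie in [0, 1]")
    return 1.0 - infraction_rate


# Example: a baseline DM with a 25% infraction rate
print(alpha_from_infraction_rate(0.25))  # 0.75
```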

## Running the pipeline
To run the full pipeline, which starts from a baseline DM and repeatedly trains a classifier and stacks it onto the model, run the following command:
```bash
python scripts/train_edm_iterative.py --baseline_checkpoint <BASELINE_DM_CHECKPOINT> --synth_dataset_size 50000 --n_train_iters 20000
```

## Distillation
```bash
python scripts/distill_edm.py --config.training.n_iters 250000 --config.distill.checkpoint <BASELINE_DM_CHECKPOINT> --config.distill.classifier <CLASSIFIER_CHECKPOINTS>
```
Here, `<CLASSIFIER_CHECKPOINTS>` is a comma-separated list of paths to classifier checkpoints.
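
When there are several classifiers, the comma-separated value can be assembled programmatically. The sketch below is only an illustration: the helper name and the checkpoint paths in the example are hypothetical, and it rejects paths that themselves contain a comma, since those would be ambiguous in a comma-separated list.

```python
def join_classifier_checkpoints(paths):
    """Join checkpoint paths into one comma-separated argument string."""
    paths = [str(p) for p in paths]
    for p in paths:
        if "," in p:
            raise ValueError(f"path contains a comma: {p!r}")
    return ",".join(paths)


# Hypothetical checkpoint paths, for illustration only
print(join_classifier_checkpoints(["ckpt/classifier_0.pt", "ckpt/classifier_1.pt"]))
# ckpt/classifier_0.pt,ckpt/classifier_1.pt
```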

## Testing the models
```bash
python scripts/test_edm.py --checkpoint <CHECKPOINT_PATH>
```
The script computes the ELBO and the infraction rate multiple times and reports the mean and standard deviation of the results.
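
The aggregation over repeated runs can be sketched as follows. This is not the code in `test_edm.py`; it assumes the sample standard deviation (dividing by `n - 1`), which may differ from what the script uses, and the `summarize` helper is hypothetical.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) of repeated measurements."""
    mean = statistics.fmean(values)
    # stdev needs at least two values; report 0.0 for a single run
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std


# Made-up infraction rates from three hypothetical evaluation runs
mean, std = summarize([0.10, 0.12, 0.11])
print(f"infraction rate: {mean:.3f} +/- {std:.3f}")
```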