# README

This repository accompanies our anonymous NeurIPS 2025 main‑track submission. It provides a self‑contained implementation for training, evaluating and visualising diffusion‑based generative models (including our Lévy–process extension).

## Repository overview

The core entry‑points are:

* **`run.py`** for standard training;
* **`eval.py`** for evaluation and checkpoint picking;
* **`display.py`** for visualising generated samples;
* **`read_results.py`** for aggregating logs into publication‑quality plots;
* **`topological.py`** for topological‑data‑analysis experiments;
* **`ddpm_init.py`** plus the helpers in **`script_utils.py`** for model, optimiser and scheduler initialisation.

All other modules are imported from the `manage`, `data`, `models`, and `evaluate` packages and do not require modification for a standard run.

## Installation

Create a fresh Python (≥3.9) environment, then install the dependencies:

```bash
conda create -n neurips25 python=3.10
conda activate neurips25
pip install -r requirements.txt   # Contains PyTorch ≥2.1, torchvision, numpy, matplotlib, etc.
```

If you prefer, you can install PyTorch manually to match your CUDA toolkit before running the remaining `pip` command.

## Data and configuration

Datasets, preprocessing choices, model hyper‑parameters and run schedules are specified via YAML configuration files (one per experiment).
To download the butterflies dataset, obtain a Kaggle API key and place it in a `.kaggle` folder.
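
Before launching a run that needs the butterflies dataset, it can help to check that the Kaggle credentials are in place. The sketch below is not part of the repository. It assumes the standard Kaggle convention of a `kaggle.json` token (with `username` and `key` entries) stored in `~/.kaggle`, and the helper name `check_kaggle_credentials` is hypothetical.

```python
import json
import os
import stat
from pathlib import Path
from typing import List, Optional


def check_kaggle_credentials(kaggle_dir: Optional[Path] = None) -> List[str]:
    """Return a list of problems with the Kaggle token (an empty list means OK)."""
    # Assumption: the token lives at <kaggle_dir>/kaggle.json, defaulting to ~/.kaggle.
    kaggle_dir = Path(kaggle_dir) if kaggle_dir is not None else Path.home() / ".kaggle"
    token = kaggle_dir / "kaggle.json"
    if not token.is_file():
        return [f"missing token file: {token}"]

    try:
        content = json.loads(token.read_text())
    except json.JSONDecodeError:
        return [f"{token} is not valid JSON"]
    if not isinstance(content, dict):
        return [f"{token} should contain a JSON object"]

    problems = []
    for field in ("username", "key"):
        if not content.get(field):
            problems.append(f"{token} has no '{field}' entry")

    # On POSIX systems, flag tokens that group or other users can access.
    if os.name == "posix" and stat.S_IMODE(token.stat().st_mode) & 0o077:
        problems.append(f"{token} is accessible to other users; consider chmod 600")
    return problems
```

Calling `check_kaggle_credentials()` with no argument inspects the default location and returns an empty list when nothing looks wrong.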

## Training

A typical training session is launched with:

```bash
python run.py --config=<CONFIG_NAME> --name=<RUN_TAG>
```

Optional flags allow you to resume or fine‑tune from a checkpoint (`--resume`, `--from_pretrained`), change the number of diffusion steps (`--train_reverse_steps`), or enable logging (`--log`; to use a custom logger, implement it in `manager/logger.py`). All arguments are documented in `script_utils.py`.

## Evaluation

To compute quantitative scores on saved checkpoints, run:

```bash
python eval.py --config=<CONFIG_NAME> --name=<RUN_TAG> --latest_checkpoint
```

Setting `--force_ema_eval` restricts evaluation to the exponential‑moving‑average weights; setting `--generate <N>` draws `N` samples and computes all implemented metrics from them.
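
For readers unfamiliar with exponential‑moving‑average (EMA) weights, the sketch below shows the usual update rule on plain Python floats. It is an illustration only, not the repository's implementation: the function name `ema_update` and the decay value are assumptions, and a real training loop would apply the same rule to the model's tensors.

```python
from typing import Dict


def ema_update(ema_params: Dict[str, float],
               params: Dict[str, float],
               decay: float = 0.999) -> Dict[str, float]:
    """Update EMA weights in place: ema <- decay * ema + (1 - decay) * param."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Toy usage: the EMA copy trails the raw weights and smooths out their jumps.
raw = {"w": 0.0}
ema = dict(raw)            # the EMA copy starts from the initial weights
for step in range(3):
    raw["w"] += 1.0        # stand-in for an optimiser step
    ema_update(ema, raw, decay=0.9)
```

Evaluating with the EMA copy rather than the raw weights is a common choice for diffusion models because the averaged weights fluctuate less between checkpoints.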

## Visualisation and analysis

* **Qualitative samples** – `display.py` loads checkpoints and either shows or saves panel images.
* **Metric plots** – `read_results.py` converts logged evaluation metrics into concise figures (loss curves, etc.).
* **Topology‑aware runs** – `topological.py` offers a specialised training loop that alternates between parameter updates and topological measurements; see the main paper appendix.

A separate zip file contains more involved logic for loading and visualising run results and for computing the topological quantities from saved trajectories.

## Reproducibility

The code uses `--set_seed` for deterministic data loading and model initialisation. Most runs use a random seed (the seed is set to `null` in the YAML configuration file). The topological runs are the exception: there we set the seed to 0 for every configuration so that all losses are comparable (same timesteps, same Gaussian noise).
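
The effect of a fixed seed can be illustrated with a short, standard-library-only sketch. It is not the repository's seeding code: the names `resolve_seed` and `sample_timesteps_and_noise` and the default of 1000 steps are hypothetical, and a PyTorch code base would typically seed through `torch.manual_seed` instead.

```python
import random
from typing import List, Optional, Tuple


def resolve_seed(seed: Optional[int]) -> int:
    """Map a `null` (None) seed to a fresh random one; keep a fixed seed as is."""
    if seed is None:
        return random.SystemRandom().randrange(2**32)
    return seed


def sample_timesteps_and_noise(seed: int, n: int,
                               num_steps: int = 1000) -> Tuple[List[int], List[float]]:
    """Draw n diffusion timesteps and n Gaussian noise values from a seeded generator."""
    rng = random.Random(seed)
    timesteps = [rng.randrange(num_steps) for _ in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return timesteps, noise


# Two configurations sharing seed 0 see identical timesteps and noise,
# so differences in their losses come from the configurations themselves.
run_a = sample_timesteps_and_noise(resolve_seed(0), n=4)
run_b = sample_timesteps_and_noise(resolve_seed(0), n=4)
assert run_a == run_b
```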
