# Deep Generative Clustering with Multimodal Diffusion Variational Autoencoders

This repo is based on the paper "Deep Generative Clustering with Multimodal Diffusion Variational Autoencoders" (ICLR 2024). Our models are Holder+ (CHolderplus) and Holder++ (CHolderplus_disen).

## Dataset
### CUBICC
We introduce a variation of the CUB Image-Captions dataset, based on the Caltech-UCSD Birds (CUB) dataset. Sub-species are grouped into eight species.

To download the CUBICC dataset, run:
```
curl -L -o CUBICC.zip https://polybox.ethz.ch/index.php/s/LRkTC2oa6YHHlUj/download
unzip CUBICC.zip
```

## Experiments
Run on the CUBICC dataset (CHolderplus):
```
bash src/commands/run_CUBICC_experiment_CHolderplus.sh
```

Run on the CUBICC dataset (CMVAE):
```
bash src/commands/run_CUBICC_experiment_CMVAE.sh
```

## Disentanglement metrics
CHolderplus:
```
bash src/commands/run_CUBICC_eval_disentanglement_metrics_CHolderplus.sh
```

CMVAE:
```
bash src/commands/run_CUBICC_eval_disentanglement_metrics_CMVAE.sh
```

If no classifier checkpoint exists yet, run once with `--train-classifier` to create it. The checkpoint is saved to `pretrained/cubicc_image_classifier.pt` and shared across subsequent runs.
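
The checkpoint check can be automated. The sketch below is only an illustration: `classifier_flag` is a hypothetical helper that is not part of this repo, and the usage line assumes the evaluation scripts forward extra arguments to the underlying entry point, which may not hold without editing the script.

```shell
# Hypothetical helper (not part of the repo): prints --train-classifier
# when the classifier checkpoint is missing, and nothing otherwise.
classifier_flag() {
  ckpt="${1:-pretrained/cubicc_image_classifier.pt}"
  if [ -f "$ckpt" ]; then
    printf ''
  else
    printf '%s' '--train-classifier'
  fi
}

# Usage (assumes the script passes extra arguments through):
#   bash src/commands/run_CUBICC_eval_disentanglement_metrics_CHolderplus.sh $(classifier_flag)
```

If the scripts do not forward arguments, add the flag to the command inside the script instead.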

## Citing
```
@inproceedings{
palumbo2024deep,
title={Deep Generative Clustering with Multimodal Diffusion Variational Autoencoders},
author={Emanuele Palumbo and Laura Manduchi and Sonia Laguna and Daphn{\'e} Chopard and Julia E Vogt},
booktitle={International Conference on Learning Representations},
year={2024},
}
```

## Acknowledgements
The codebase is based on the MMVAE+ repo.
