# Autoencoder for Synthetic to Real Generalization: From Simple to More Complex Scenes

This repository is the official PyTorch implementation of [Autoencoder for Synthetic to Real Generalization: From Simple to More Complex Scenes](https://iclr.cc/). 

![Architecture overview](graphics/architecture.png "Architecture overview")


If you want to cite our work, please use the following BibTeX entry:

```
@article{
  TBC
}
```

## Requirements

To install requirements:

```setup
pip3 install -r requirements.txt
```

To plot the t-SNE projection of the latent space, you additionally need Anaconda and `cuml`, which provides (among other things) GPU-accelerated t-SNE.
For example, you could use the following command, but please check the RAPIDS [website](https://rapids.ai/start.html) to make sure you install the correct version for your setup.

```setup
conda create -n rapids-0.19 -c rapidsai -c nvidia -c conda-forge cuml=0.19 python=3.8 cudatoolkit=11.0
```

## Downloads

The datasets need to be downloaded manually from the following locations:

| Dataset                       | Website                                                                              | License         |
| ------------------------------|--------------------------------------------------------------------------------------|-----------------| 
| SEATS_AND_PEOPLE_NOCAR (Ours) | https://drive.google.com/file/d/1D0ni-kIbSCsPLAuZDOxT-wv4YPNjr2hb/view?usp=sharing   | CC BY-NC-SA 4.0 |
| SVIRO                         | https://sviro.kl.dfki.de/                                                            | CC BY-NC-SA 4.0 |
| SVIRO-Illumination            | https://sviro.kl.dfki.de/                                                            | CC BY-NC-SA 4.0 |
| TICaM                         | https://vizta-tof.kl.dfki.de/                                                        | CC BY-NC-SA 4.0 |
| MPI3D                         | https://github.com/rr-learning/disentanglement_dataset                               | CC BY 4.0       |

Place the datasets inside a folder of your choice. Then define the root path to all the datasets inside `dataset.py`:
```
ROOT_DATA_DIR = Path("")
```
You may need to rename the downloaded dataset folders to match the names used inside `dataset.py` (i.e. the names from the table above).
The first time you run a training script for a dataset, the script pre-processes the images: it center-crops them, resizes them to 128 pixels and saves the results alongside the original images.
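
The pre-processing step can be pictured with the following minimal sketch. It is not the code from `dataset.py`: the function names `center_crop_box` and `preprocess_image` are hypothetical, and it assumes that Pillow is installed, that the crop is the largest centered square and that the target size is 128x128 pixels.

```python
from pathlib import Path


def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def preprocess_image(src, dst, size=128):
    """Center-crop the image at `src`, resize it to size x size, save it to `dst`.

    Hypothetical helper: Pillow is imported lazily, so the box computation
    above can be used without it.
    """
    from PIL import Image  # assumption: Pillow is available

    with Image.open(Path(src)) as img:
        cropped = img.crop(center_crop_box(*img.size))
        cropped.resize((size, size)).save(Path(dst))
```

For a 640x480 image, `center_crop_box(640, 480)` selects the central 480x480 region before resizing.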

## Training

The hyperparameters for the different training approaches and models are defined in config files located in the `cfg` folder.
Modify the hyperparameters according to the approach you want to use during training.
Then, inside `train_ae.py`, `repeat_train_ae.py`, `train_classifier.py` or `repeat_train_classifier.py`, modify which config file to load, e.g.:
```
config = toml.load("cfg/extractor_ae.toml")
```
Finally, to train the model using the chosen config file, run one of the following commands:

```
python3 train_ae.py
```
```
python3 train_classifier.py
```
```
python3 repeat_train_ae.py
```
```
python3 repeat_train_classifier.py
```

`repeat_train_ae.py` and `repeat_train_classifier.py` repeat the same experiment with the same hyperparameters for each of the seeds defined inside the respective file.
The config files are self-explanatory and provide the necessary information to reproduce the results of our paper. We provide config files for the most important experiments in the `cfg` folder.
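
The idea behind the `repeat_*` scripts can be sketched as a loop over seeds. This is only an illustration under assumptions, not the repository's code: `repeat_experiment`, the `train_fn` callback and the `"seed"` config key are hypothetical names, and a real run would also need to seed NumPy and PyTorch.

```python
import copy
import random


def repeat_experiment(config, seeds, train_fn):
    """Run `train_fn` once per seed with otherwise identical hyperparameters.

    `train_fn` is a hypothetical callback that takes a config dict and
    returns the result of one training run.
    """
    results = {}
    for seed in seeds:
        run_config = copy.deepcopy(config)  # keep the shared config untouched
        run_config["seed"] = seed           # assumed key name
        random.seed(seed)                   # a real run would also seed NumPy / PyTorch
        results[seed] = train_fn(run_config)
    return results
```

Copying the config for each run keeps the hyperparameters identical across seeds, so differences between runs come only from the random initialization.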

## Evaluation

Evaluation is performed by running a few scripts one after another.
All the experiments that you want to evaluate need to be located inside the results folder.

If you want to evaluate your model on all (or a subset) of the datasets, you can use `eval_ae.py` or `eval_classifier.py`, depending on whether you want to evaluate an autoencoder approach or a classifier.
In either case, the accuracy will be saved inside the experiment folder so that `calculate_mean_and_var_accuracy.py` can use it later.
Put the experiment folder names inside the script and run

```
python3 eval_ae.py
```
```
python3 eval_classifier.py
```

You can then use `calculate_mean_and_var_accuracy.py` to compute the mean, standard deviation and max performance on all datasets.
Again, specify inside the script which experiments to consider and run

```
python3 calculate_mean_and_var_accuracy.py
```
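
As a rough illustration of the statistics involved, the sketch below summarizes the accuracies of several runs. It is not the code of `calculate_mean_and_var_accuracy.py`: the function name `summarize_accuracies` is hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption.

```python
import statistics


def summarize_accuracies(accuracies):
    """Return mean, standard deviation and maximum of per-run accuracies."""
    return {
        "mean": statistics.mean(accuracies),
        "std": statistics.stdev(accuracies),  # assumption: sample standard deviation
        "max": max(accuracies),
    }
```

Here `accuracies` would hold one value per seed for a given model variant and dataset.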

If you just want to plot the training performance distribution on TICaM for each checkpoint across several runs, you can use `plot_accuracy_over_time.py`.
In that case, you only need to group the experiments that should be averaged together, specify them inside the script and run

```
python3 plot_accuracy_over_time.py
```
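
Averaging over runs at each checkpoint could look like the following sketch. It assumes that every run stores one accuracy value per checkpoint and that all runs have the same number of checkpoints; `mean_accuracy_per_checkpoint` is a hypothetical name and not a function of this repository.

```python
from statistics import mean


def mean_accuracy_per_checkpoint(runs):
    """Average the accuracy at each checkpoint across several runs.

    `runs` is a list of lists: runs[i][j] is the accuracy of run i
    at checkpoint j.
    """
    if len({len(run) for run in runs}) != 1:
        raise ValueError("all runs must have the same number of checkpoints")
    # zip(*runs) groups the values of all runs checkpoint by checkpoint
    return [mean(values) for values in zip(*runs)]
```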

If you want to reconstruct examples on different datasets and/or extract and save the latent space for visualization, then you can use `recon_and_save_latent_space.py`.
The latent space and reconstructions will be saved inside the experiment folder. As before, specify the experiments to consider and simply call
```
python3 recon_and_save_latent_space.py 
```

The extracted latent space can then be plotted using `plot_extracted_and_saved_latent_space.py`. You will have to use conda and cuml to run this script.
Specify the directory of the experiment to consider inside the script and run
```
python3 plot_extracted_and_saved_latent_space.py 
```

If you want to measure the reconstruction quality on the MPI3D dataset, you can use the `compute_mpi3d_score.py` script.
You only need to specify the experiments inside the script and select which norm to use for the comparison.
```
python3 compute_mpi3d_score.py 
```
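
To illustrate what selecting a norm means here, the sketch below compares an image with its reconstruction under the L1 or L2 norm. It is an assumption-laden example, not the scoring used by `compute_mpi3d_score.py`: the function `reconstruction_error` is hypothetical, the images are assumed to be flattened into lists of pixel values, and the actual script may define its score differently.

```python
import math


def reconstruction_error(original, reconstruction, norm="l2"):
    """Distance between two flattened images under the chosen norm."""
    if len(original) != len(reconstruction):
        raise ValueError("images must have the same number of pixels")
    diffs = [abs(a - b) for a, b in zip(original, reconstruction)]
    if norm == "l1":
        return sum(diffs)                             # sum of absolute differences
    if norm == "l2":
        return math.sqrt(sum(d * d for d in diffs))   # Euclidean distance
    raise ValueError(f"unknown norm: {norm}")
```

The L1 norm weights all pixel differences equally, whereas the L2 norm penalizes large differences more strongly.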

## Results - Classification accuracies from paper

Mean accuracy of the different methods as reported in the paper. For more detailed results (standard deviation and maximum) please consult our paper. 
We also provide the pre-trained weights of the models used for the paper results, i.e. the 10 experiments for different seeds for all model variations, as well as the corresponding log and config files.
Due to Google Drive space limitations, we were not able to upload the results for the VGG-11 classifier when fine-tuned or trained from scratch.
The two currently missing results will be made publicly available if the paper is accepted.


| Model              | Variant         | TICaM            | SVIRO           | Download link                                                                              |
| ------------------ |---------------- | ---------------- |---------------- |------------------------------------------------------------------------------------------- |
| VGG-11             | Scratch         | 58.5%            | 65.6%           | Not available at the moment                                                                |
| Resnet-50          | Scratch         | 53.3%            | 56.4%           | [link](https://drive.google.com/file/d/1BkKC03YZG5_gwdXxFOt1HSqx3UwLTiw8/view?usp=sharing) |
| Densenet-121       | Scratch         | 56.3%            | 68.8%           | [link](https://drive.google.com/file/d/1neMMqq_J9CHK03ab3KK_AiuH81esW8FS/view?usp=sharing) |
| VGG-11             | Pre-trained     | 75.5%            | 78.7%           | Not available at the moment                                                                |
| Resnet-50          | Pre-trained     | 78.1%            | 83.5%           | [link](https://drive.google.com/file/d/1xfAEZrXube0iloYToc_3MScJ40Eyev96/view?usp=sharing) |
| Densenet-121       | Pre-trained     | 72.2%            | 85.0%           | [link](https://drive.google.com/file/d/1OxRiUVuyOGnpcHecCqZ7aYhbxyJeRPsy/view?usp=sharing) |
| VGG-11             | E-TAE           | 76.7%            | 78.6%           | [link](https://drive.google.com/file/d/10oF9Kt2RFyGA_v4AQBPgwzfUNzm40UYi/view?usp=sharing) |
| Resnet-50          | E-TAE           | 83.8%            | 85.8%           | [link](https://drive.google.com/file/d/1fRsCJ2_-zp59ruLKHWTKvlXbGeDRKibF/view?usp=sharing) |
| Densenet-121       | E-TAE           | 78.5%            | 86.7%           | [link](https://drive.google.com/file/d/1WBJx96GJ5j2CNHY-hJZS-ZQwDdLtzMqz/view?usp=sharing) |
| VGG-11             | I-E-TAE         | 79.7%            | 80.9%           | [link](https://drive.google.com/file/d/1TPkaVw8yxRt16fQXG6Kmcd3yPC6lTV-5/view?usp=sharing) |
| Resnet-50          | I-E-TAE         | 83.5%            | 89.2%           | [link](https://drive.google.com/file/d/1QOTvo23VWWXSNvkCfSQKRvp_f-2maZRc/view?usp=sharing) |
| Densenet-121       | I-E-TAE         | 77.2%            | 90.4%           | [link](https://drive.google.com/file/d/1cIEkB-QNJfryhUf8T1R9KVd0aLbehE7d/view?usp=sharing) |
| VGG-11             | II-E-TAE        | 81.0%            | 79.1%           | [link](https://drive.google.com/file/d/17VKzqZ-DeTeSUxyDfDyfHiTVEmkN1Rr3/view?usp=sharing) |
| Resnet-50          | II-E-TAE        | 83.7%            | 93.0%           | [link](https://drive.google.com/file/d/1XiUlXn9VS0VCimlG6OTyNk5uyulkH_-c/view?usp=sharing) |
| Densenet-121       | II-E-TAE        | 79.3%            | 89.9%           | [link](https://drive.google.com/file/d/1WJamcRwN2EDjsQyTUObgSkOrWSp8_eHu/view?usp=sharing) |

## Results - Reconstructions from paper

Further, we provide pre-trained weights for models trained on other datasets. These were used for the reconstruction visualizations in our paper and the quantitative reconstruction metrics reported on MPI3D.

| Variant                      | Training dataset             | Download link                                                                              |
|----------------------------- | -----------------------------| ------------------------------------------------------------------------------------------ |
| AE, E-AE, VAE, $\beta$-VAE and FactorVAE               | MPI3D                        | [link](https://drive.google.com/file/d/1INm6mZCt7ABda5XDi0MmtLqHmyCcwTmu/view?usp=sharing) |
| E-AE, I-AE                   | SVIRO and SVIRO-Illumination | [link](https://drive.google.com/file/d/1Lv-OLVuTS1rGDIy5wzzI6Y79FAath8uh/view?usp=sharing) |


## Miscellaneous

For the remaining scripts inside this repository, we provide brief explanations:

| Script                        | Description                                                                                    |
|------------------------------ | -----------------------------------------------------------------------------------------------|
| compute_complexity_dataset.py | Used to calculate Shannon entropy and mean gray level co-occurrence matrix for MPI3D and TICaM |
| compute_mpi3d_score.py        | Calculate the quantitative reconstruction results on the entire dataset                        |
| model.py                      | Autoencoder model architecture definitions                                                     |
| pretrained_model.py           | Classification model architecture definitions                                                  |
| dataset.py                    | Dataloader for the different datasets                                                          |
| utils.py                      | A few helper functions                                                                         |

## Contributing

All contributions welcome! All content in this repository is licensed under the MIT license.

## Acknowledgment

This work was supported by TO BE COMPLETED