# Holographic-(V)AE

NOTE: We originally intended to share our code as an anonymous GitHub repository, which would have let us upload large data files via Git Large File Storage. However, in the hours before submission the Anonymous GitHub server was not working, so we chose to provide our code as a zip file upload in case the server does not come back online in time. Unfortunately, most of the data is missing from the zip file due to size constraints (only MNIST NR/R and the test neighborhoods remain). We therefore invite reviewers to check the Anonymous GitHub link at https://anonymous.4open.science/r/holographic_vae-51E9 for a version of this repository that also contains the data, if the server is working. We sincerely apologize for the inconvenience.


Code for the paper "Holographic-(V)AE: an end-to-end SO(3)-Equivariant (Variational) Autoencoder in Fourier Space"

This is mostly development code, and as such it has redundancies as well as naming conventions that precede the formulation of the paper. If accepted, we will release a cleaner and more user-friendly repository.


## Content of folders:

> `cg_coefficients`: saved Clebsch-Gordan coefficients

> `cgnet_fibers`: H-(V)AE architecture (in `cgvae_symmetric_simple_flexible.py`) and its components (in `blocks.py`, `linearity.py`, `nonlinearity.py` and `normalization.py`)

> `classifiers`: simple classifier classes. Only contains a Linear Classifier (single dense layer trained via cross-entropy loss)

> `loss_functions`: code with MSE loss and Cosine loss used in our experiments

> `utils`: various utility code

> `coordinates`: code used to manipulate protein structures, collect structural information (e.g. SASA), and extract neighborhoods from full protein structures.

> `projections`: code to perform Fourier projections of both point clouds and spherical images.


> `data`: data used for our experiments, and source code to generate and manipulate it. Contains the pre-processed tensors for every dataset. We also provide "raw" data (before Fourier transform and normalization) for MNIST-on-the-sphere and for Toy amino acids. We do not provide raw data for Shrec17 and the Protein neighborhoods due to size limits.

> `runs`: trained models and evaluation results. We provide the pre-trained models used in our experiments on MNIST-on-the-sphere, Shrec17, and Protein neighborhoods. We do not provide the Toy amino acids models because there are too many of them to fit within the size limits.


> `mnist_experiments`: source code for the experiments on the MNIST-on-the-sphere dataset

> `shrec17_experiments`: source code for the experiments on the Shrec17 dataset

> `toy_aminoacids_experiments`: source code for the experiments on the Toy amino acids dataset

> `neighborhoods_experiments`: source code for the experiments on the Protein neighborhoods dataset
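
As a point of reference for the `loss_functions` entry above, here is a minimal sketch of an MSE loss and a cosine loss on plain Python sequences. The function names and the exact form of the cosine loss (one minus cosine similarity) are assumptions made for illustration; the implementations in `loss_functions` operate on tensors and may differ in detail.

```python
import math


def mse_loss(x, y):
    """Mean squared error between two equal-length sequences of floats."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def cosine_loss(x, y, eps=1e-12):
    """Assumed form: 1 - cosine similarity.

    0 for parallel vectors, 1 for orthogonal vectors, 2 for anti-parallel ones.
    """
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # eps guards against division by zero for all-zero inputs.
    return 1.0 - dot / max(norm, eps)
```

Unlike MSE, the cosine loss ignores the overall scale of the two vectors and only penalizes differences in direction.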



## Running our code

We provide here a run-down of the content of each `*_experiments` folder.

Each folder has a similar structure. The `train_and_evaluate.py` (or `train_and_evaluate_zernicke.py`) script trains a model with the architecture and hyperparameters specified by its command-line (argparse) arguments, then runs inference and a series of evaluations on the desired data splits. It does both by calling other scripts within the folder.

Each dataset also has scripts for dataset-specific experiments. Two scripts are common across datasets: one for classification in the latent space, and one that generates 2D visualizations of the invariant latent space, colored by categorical or other structural features of the data. The visualization script uses UMAP when the latent space has more than two dimensions.
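
To illustrate what classification in the latent space means, the sketch below implements a bare-bones k-nearest-neighbors vote over invariant latent vectors. It is not the code in `latent_space_classification.py`; the function name `knn_predict`, the Euclidean metric, and the default `k` are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_z, train_labels, query_z, k=3):
    """Predict a label for query_z by majority vote among its k nearest
    training latent vectors (Euclidean distance, an assumption here)."""
    if len(train_z) != len(train_labels):
        raise ValueError("train_z and train_labels must have the same length")
    # Sort training points by distance to the query.
    ranked = sorted(
        zip(train_z, train_labels),
        key=lambda pair: math.dist(pair[0], query_z),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy latent vectors forming two well-separated clusters.
train_z = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_labels = [0, 0, 1, 1]
prediction = knn_predict(train_z, train_labels, (0.2, 0.1), k=3)
```

Because the latent vectors are rotation-invariant, a classifier of this kind can be applied directly without augmenting the data with rotated copies.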

Results are always generated in the `runs` folder.

### Installation via conda

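Run the following:<br>
`conda env create --file h_vae-env.txt`<br>
`conda activate h_vae-env`

The second command assumes the environment is created under the name `h_vae-env`; if activation fails, check the actual name with `conda env list`.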


### MNIST-on-the-sphere

The default parameters of `train_and_evaluate.py` are set to the model with specifications: VAE, z = 16, NR/R dataset.<br>
To run an example training and evaluation pipeline, use the following commands:<br>
`cd mnist_experiments`<br>
`python train_and_evaluate.py --hash TEST_HVAE`<br>
`python latent_space_classification.py --hash TEST_HVAE --classifier KNN`<br>
`python latent_space_classification.py --hash TEST_HVAE --classifier LC`<br>
`python interpolation.py --hash TEST_HVAE --start_label 3 --end_label 7`<br>
`python interpolation.py --hash TEST_HVAE --start_label 0 --end_label 5`
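
For intuition about the interpolation commands above, the following is a minimal sketch of linear interpolation between two latent vectors, for example the codes of one image with label 3 and one with label 7. Whether `interpolation.py` interpolates linearly, and which parts of the latent representation it interpolates, is not stated here, so treat `interpolate` and its arguments as hypothetical.

```python
def interpolate(z_start, z_end, n_steps=5):
    """Return n_steps latent vectors evenly spaced on the straight line
    from z_start to z_end (both endpoints included)."""
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same length")
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    path = []
    for i in range(n_steps):
        t = i / (n_steps - 1)  # goes from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


path = interpolate([0.0, 0.0], [1.0, 2.0], n_steps=3)
```

Each intermediate vector would then be passed through the decoder to obtain the corresponding reconstruction.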

For a shorter, proof-of-principle training run, just limit the maximum number of epochs:<br>
`python train_and_evaluate.py --hash TEST_HVAE --n_epochs 4 --no_kl_epochs 1 --warmup_kl_epochs 1`
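
The `--no_kl_epochs` and `--warmup_kl_epochs` arguments suggest a KL annealing schedule. The sketch below shows one plausible reading: the KL weight is zero for the first `no_kl_epochs` epochs and then ramps up linearly to `beta` over `warmup_kl_epochs` epochs. The function `kl_weight`, the linear ramp, and the zero-based epoch index are all assumptions; the schedule in the training script may differ.

```python
def kl_weight(epoch, beta, no_kl_epochs, warmup_kl_epochs):
    """Weight of the KL term at a given zero-based epoch (assumed schedule)."""
    if epoch < no_kl_epochs:
        # Pure reconstruction phase: no KL penalty.
        return 0.0
    if warmup_kl_epochs <= 0:
        # No warm-up requested: jump straight to the full weight.
        return beta
    # Fraction of the warm-up completed by the end of this epoch, capped at 1.
    progress = (epoch - no_kl_epochs + 1) / warmup_kl_epochs
    return beta * min(1.0, progress)


# Schedule for the short run above, with an illustrative beta of 0.025.
schedule = [kl_weight(e, 0.025, no_kl_epochs=1, warmup_kl_epochs=1) for e in range(4)]
```

Under these assumptions, the short run spends one epoch on reconstruction only and reaches the full KL weight from the second epoch onward.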

To run evaluations on pre-trained models, start by running `evaluation_pipeline.py` with the desired arguments: the model hash, the input type (`NRR-avg_sqrt_power` or `RR-avg_sqrt_power`), and the comma-separated dataset splits (`train,valid,test`).


### Shrec17

The default parameters of `train_and_evaluate.py` are set to the model with specifications: AE.<br>
To run an example training and evaluation pipeline, use the following commands:<br>
`cd shrec17_experiments`<br>
`python train_and_evaluate.py --hash TEST_HVAE`<br>
`python latent_space_classification.py --hash TEST_HVAE --classifier KNN`<br>
`python latent_space_classification.py --hash TEST_HVAE --classifier LC`

For a shorter, proof-of-principle training run, just limit the maximum number of epochs:<br>
`python train_and_evaluate.py --hash TEST_HVAE --n_epochs 4 --no_kl_epochs 1 --warmup_kl_epochs 1`

To run evaluations on pre-trained models, start by running `evaluation_pipeline.py` with the desired arguments: the model hash and the comma-separated dataset splits (`train,valid,test`).


### Toy amino acids

The default parameters of `train_and_evaluate_zernicke.py` are set to the model with specifications: VAE, 1000 training datapoints, beta = 0.025.<br>
To run an example training and evaluation pipeline, use the following commands:<br>
`cd toy_aminoacids_experiments`<br>
`python train_and_evaluate_zernicke.py --hash TEST_HVAE`<br>
`python latent_space_classification_cross_val_on_test.py --hash TEST_HVAE --classifier KNN`<br>
`python latent_space_classification_cross_val_on_test.py --hash TEST_HVAE --classifier LC`

For a shorter, proof-of-principle training run, just limit the maximum number of epochs:<br>
`python train_and_evaluate_zernicke.py --hash TEST_HVAE --n_epochs 4 --no_kl_epochs 1 --warmup_kl_epochs 1`

To run evaluations on pre-trained models, start by running `evaluation_pipeline_zernicke.py` with the desired arguments: the model hash and the comma-separated dataset splits (`train,valid,test`).


### Protein neighborhoods

The default parameters of `train_and_evaluate_zernicke.py` are set to the model with specifications: AE.<br>
To run an example training and evaluation pipeline, use the following commands:<br>
`cd neighborhoods_experiments`<br>
`python train_and_evaluate_zernicke.py --hash TEST_HVAE`<br>
`python neighborhoods_latent_space_classification.py --experiment_dir ../runs/neighborhoods/local_equiv_fibers/TEST_HVAE --classifier KNN`<br>
`python neighborhoods_latent_space_classification.py --experiment_dir ../runs/neighborhoods/local_equiv_fibers/TEST_HVAE --classifier LC`<br>
`python pretty_NB_umaps.py --experiment_dir ../runs/neighborhoods/local_equiv_fibers/TEST_HVAE`

For a shorter, proof-of-principle training run, just limit the maximum number of epochs:<br>
`python train_and_evaluate_zernicke.py --hash TEST_HVAE --n_epochs 4 --no_kl_epochs 1 --warmup_kl_epochs 1`

To run evaluations on pre-trained models, start by running `evaluation_pipeline_zernicke.py` with the desired arguments: the model hash and the comma-separated dataset splits (`train,valid,test`). We do not recommend running the evaluation on the training split, as it is very large.




