# Hyperparameters

All hyperparameters are specified in YAML files.
Parameters that vary across hierarchical levels are given as lists, always ordered from the lowest to the highest hierarchical level.


<details><summary>Training (click me)</summary>

## General
- **epochs**, int
- **lr**, float: learning rate
- **sizes**, Dict[str, int]:
  - 'bs': batch size
  - 'k': number of importance samples per datapoint
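
With `k` importance samples per datapoint, a common way to combine the per-sample log importance weights is an importance-weighted (IWAE-style) bound, computed with a numerically stable log-mean-exp. The sketch below assumes such an estimator. The helper name `log_mean_exp` is hypothetical and does not come from this codebase.

```python
import math
from typing import Sequence


def log_mean_exp(log_weights: Sequence[float]) -> float:
    """Stable log((1/k) * sum_i exp(w_i)) over k importance samples (hypothetical helper)."""
    k = len(log_weights)
    if k == 0:
        raise ValueError("need at least one importance sample")
    m = max(log_weights)  # subtract the max so exp() cannot overflow
    return m + math.log(sum(math.exp(w - m) for w in log_weights) / k)


# Log weights log p(x, z_i) - log q(z_i | x) for k = 3 samples of one datapoint
print(log_mean_exp([-10.0, -11.0, -12.0]))
```

With `k = 1` this reduces to the single-sample weight. Larger `k` tightens the bound but costs `k` forward passes per datapoint.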

## Loss
- **kl_factor**, int: weighting factor for KL-regularization
- **kl_end_warmup**, int: epoch in which the KL weight reaches `kl_factor`
- **crossmodal_regularization**, int: whether to include the cross-modal term p(x_2|x_1) in the loss
  - This regularization can be useful for the HMVAE with a PoE posterior. We use this hyperparameter only in the synthetic experiments.
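
One way to realize `kl_end_warmup` is a linear warm-up: the KL weight grows from 0 to `kl_factor` and stays constant afterwards. The sketch below assumes this linear shape, epochs counted from 0, and a hypothetical helper name `kl_weight`. The schedule in the actual training code may differ.

```python
def kl_weight(epoch: int, kl_factor: float, kl_end_warmup: int) -> float:
    """KL weight at a given epoch, assuming a linear warm-up (hypothetical schedule)."""
    if kl_end_warmup <= 0:
        return float(kl_factor)  # no warm-up: use the full weight from the start
    progress = min(epoch / kl_end_warmup, 1.0)  # fraction of the warm-up completed
    return float(kl_factor) * progress


# Weights for kl_factor = 1 and kl_end_warmup = 4
print([kl_weight(e, 1, 4) for e in range(6)])  # [0.0, 0.25, 0.5, 0.75, 1.0, 1.0]
```
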
</details>



<details><summary>Joint model (mostly modality-agnostic model specifications; click me)</summary>

## General

- **learn_prior**, bool: whether to learn the variance of the prior
  - The per-dimension variances are regularized to equal 1 on average
- **stoc_dim**, Dict[str, List[int]]: latent space sizes for each modality
  - Vector dimension for dense layers and channel size for convolutional layers
- **stoc_dist_type**, str: distribution type across all latent spaces
  - 'normal'
  - 'laplace'
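
As a hypothetical illustration of these keys, the sketch below writes `stoc_dim` for two modalities named `x1` and `x2` (names and sizes invented for the example). Each list is ordered from the lowest to the highest hierarchical level. The sketch then draws a reparameterized sample for either `stoc_dist_type`. Laplace noise is generated as the difference of two exponential variates. That is one standard construction and not necessarily what the codebase does.

```python
import random
from typing import List

# Hypothetical joint-model settings; each list runs from the lowest to the highest level
stoc_dim = {"x1": [32, 16], "x2": [32, 16]}
stoc_dist_type = "normal"


def sample_latent(mu: List[float], scale: List[float], dist_type: str,
                  rng: random.Random) -> List[float]:
    """Reparameterized sample z = mu + scale * eps for one latent vector."""
    if dist_type not in ("normal", "laplace"):
        raise ValueError(f"unknown stoc_dist_type: {dist_type!r}")
    z = []
    for m, s in zip(mu, scale):
        if dist_type == "normal":
            eps = rng.gauss(0.0, 1.0)  # standard normal noise
        else:
            # the difference of two Exp(1) variates is standard Laplace noise
            eps = rng.expovariate(1.0) - rng.expovariate(1.0)
        z.append(m + s * eps)
    return z


rng = random.Random(0)
dim = stoc_dim["x1"][0]  # lowest latent level of modality x1
z = sample_latent([0.0] * dim, [1.0] * dim, stoc_dist_type, rng)
print(len(z))
```
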
</details>



<details><summary>VAE: [Modality] (modality-specific model specifications; click me)</summary>

## General

- **[mod]_input_shape**, List[int]: shape of input tensor
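
The `[mod]` placeholder stands for a modality name. The sketch below expands it for a hypothetical modality `x1` and reports which modality-specific keys documented on this page are missing from a loaded config. The helper `missing_mod_keys` is illustrative and not part of the codebase. `merge_layer` is left out because it is documented as optional.

```python
from typing import Dict, Iterable, List

# Modality-specific key suffixes documented on this page (merge_layer is optional, so not checked)
MOD_KEYS = [
    "input_shape", "det_specs_bu", "det_specs_td", "stoc_specs",
    "stoc_upsampling", "rec_specs", "rec_dist", "rec_factor", "nonlin",
]


def missing_mod_keys(cfg: Dict[str, object], modalities: Iterable[str]) -> List[str]:
    """Return the '[mod]_<suffix>' keys absent from cfg, e.g. 'x1_rec_dist'."""
    return [f"{mod}_{key}" for mod in modalities for key in MOD_KEYS
            if f"{mod}_{key}" not in cfg]


cfg = {"x1_input_shape": [1, 28, 28], "x1_nonlin": "swish"}
print(missing_mod_keys(cfg, ["x1"]))
```
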


## Deterministic Layers

- **[mod]_det_specs_bu**, List[List[dict]]: bottom-up layers, where the key 't' describes the layer type:
  - t: 'dense': single dense layer
    - 'out': output dimension (int)
    - 'reshape': how to reshape the output (Tuple[int])
  - t: 'conv': single convolutional layer
    - 'c': output channels (int)
    - 'k': kernel size (int)
    - 's': stride (int)
    - 'p': padding (int)
  - t: 'dconv': multiple convolutional layers
    - 'c': output channels (int)
    - 'downsample': whether to downsample by factor of two (bool)
    - 'residual': whether to use residual connections (bool)
    - 'f': factor by which the channel dimension is downscaled after the input layer and before the output layer (int)
  - t: 'usd': upsampling layer for spatial dimensions (nearest-neighbor interpolation by factor of two)
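
The sketch below writes a hypothetical bottom-up spec for a 28×28 input in Python form and tracks the spatial size through it. The outer list is taken to be the hierarchical levels (lowest first), following the list convention stated at the top of this page. The layer semantics are inferred from the descriptions above, not from the code:

- 'conv' follows standard convolution arithmetic with dilation 1.
- 'dconv' with `downsample` halves the size.
- 'usd' doubles the size.
- 'dense' with 'reshape' sets the output shape directly, assumed here to be (channels, height, width).

```python
from typing import Dict

# Hypothetical [mod]_det_specs_bu: outer list = hierarchical levels (lowest first)
det_specs_bu = [
    [{"t": "conv", "c": 32, "k": 4, "s": 2, "p": 1},       # 28 -> 14
     {"t": "dconv", "c": 64, "downsample": True,
      "residual": True, "f": 2}],                          # 14 -> 7
    [{"t": "dense", "out": 256, "reshape": (16, 4, 4)}],  # -> 4 x 4
]


def spatial_size(size: int, layer: Dict) -> int:
    """Spatial side length after one layer spec (assumed semantics)."""
    t = layer["t"]
    if t == "conv":
        return (size + 2 * layer["p"] - layer["k"]) // layer["s"] + 1
    if t == "dconv":
        return size // 2 if layer["downsample"] else size
    if t == "usd":
        return size * 2
    if t == "dense":
        reshape = layer.get("reshape")
        return reshape[-1] if reshape else 1  # a flat vector has no spatial extent
    raise ValueError(f"unknown layer type: {t!r}")


size = 28
for level in det_specs_bu:
    for layer in level:
        size = spatial_size(size, layer)
    print(size)  # spatial size at the end of each level
```
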

- **[mod]_det_specs_td**, List[List[dict]]: top-down layers, which are specified analogously to the bottom-up layers, except for the following:
  - t: 'convt': single transposed convolutional layer
    - 'op': output padding (int)
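
Under standard transposed-convolution arithmetic (as in PyTorch's `ConvTranspose2d` with dilation 1), the output side length is `(size - 1) * s - 2 * p + k + op`. Output padding is needed because a strided convolution maps several input sizes to the same output size. 'op' selects which of those sizes the transposed layer reproduces. Assuming the codebase follows this convention, the arithmetic is:

```python
def convt_size(size: int, k: int, s: int, p: int, op: int = 0) -> int:
    """Output side length of a transposed convolution (dilation 1)."""
    return (size - 1) * s - 2 * p + k + op


# Inverts a k=4, s=2, p=1 convolution: 7 -> 14
print(convt_size(7, k=4, s=2, p=1))
# A k=3, s=2, p=1 convolution maps both 13 and 14 to 7; op chooses 14 over 13
print(convt_size(7, k=3, s=2, p=1), convt_size(7, k=3, s=2, p=1, op=1))
```
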



## Stochastic Layer

- **[mod]_stoc_specs**, List[dict]: stochastic layers, which parameterize latent distributions, where the key 't' describes the layer type:
  - t: 'dense'
  - t: 'conv'

- **[mod]_stoc_upsampling**, List[dict]: upsampling of latent distribution samples, where the key 't' describes the layer type:
  - t: 'dense'
    - 'reshape': output shape (Tuple[int])
  - t: 'conv': convolutional layer that upsamples the channel dimension

- **[mod]_merge_layer**, Optional[dict]: layer that merges bottom-up and top-down information, where the key 't' describes the layer type:
  - t: 'dense'
  - t: 'conv'
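
The stochastic layers output the parameters of a latent distribution. A common parameterization, assumed here rather than taken from the code, works as follows:

- The layer emits `2 * d` values for a `d`-dimensional latent.
- The first half is the location.
- The second half is mapped to a positive scale with softplus.

The helper names are hypothetical.

```python
import math
from typing import List, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def split_params(h: List[float]) -> Tuple[List[float], List[float]]:
    """Split a stochastic-layer output of length 2*d into (loc, scale)."""
    if len(h) % 2:
        raise ValueError("expected an even number of outputs")
    d = len(h) // 2
    loc = h[:d]
    scale = [softplus(v) for v in h[d:]]  # softplus keeps every scale strictly positive
    return loc, scale


loc, scale = split_params([0.5, -1.0, 0.0, 2.0])
print(loc, scale)
```

For convolutional stochastic layers, the same split would apply along the channel dimension.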


## Reconstruction Layer
- **[mod]_rec_specs**, dict: layer that parameterizes the reconstruction distribution, where the key 't' describes the layer type:
  - t: 'dense'
  - t: 'conv'
    - 'k': kernel size (int)

- **[mod]_rec_dist**, str: type of reconstruction distribution:
  - 'sigmoid': the layer outputs sigmoid-activated values for use with a binary cross-entropy (BCE) loss
  - 'normal': Gaussian distribution
  - 'categorical': categorical distribution

- **[mod]_rec_factor**, int: factor by which the reconstruction loss is multiplied during training
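
The sketch below pairs each `[mod]_rec_dist` value with a typical per-element negative log-likelihood. It also shows how `[mod]_rec_factor` could scale the summed loss. The unit variance for 'normal', the clamping epsilon, and all helper names are assumptions for illustration.

```python
import math
from typing import List, Sequence


def bce(p: float, x: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a sigmoid output p and a target x in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(x * math.log(p) + (1.0 - x) * math.log(1.0 - p))


def gaussian_nll(mu: float, x: float, sigma: float = 1.0) -> float:
    """Negative log-density of x under N(mu, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2.0 * sigma ** 2)


def categorical_nll(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of an integer label under softmax(logits)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_norm - logits[label]


def rec_loss(rec_dist: str, preds: List, targets: List, rec_factor: float) -> float:
    """Summed reconstruction loss, scaled by rec_factor."""
    if rec_dist == "sigmoid":
        terms = [bce(p, x) for p, x in zip(preds, targets)]
    elif rec_dist == "normal":
        terms = [gaussian_nll(mu, x) for mu, x in zip(preds, targets)]
    elif rec_dist == "categorical":
        terms = [categorical_nll(l, y) for l, y in zip(preds, targets)]
    else:
        raise ValueError(f"unknown rec_dist: {rec_dist!r}")
    return rec_factor * sum(terms)


print(rec_loss("sigmoid", [0.9, 0.2], [1.0, 0.0], rec_factor=1))
```
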

## Generic Layer
- **[mod]_nonlin**, str: activation function used throughout the modality-specific network
  - 'leaky_relu'
  - 'swish'
  - 'gelu'
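
For reference, the three `[mod]_nonlin` options correspond to the scalar functions below. Two choices here are assumptions:

- The leaky-ReLU slope of 0.01 matches PyTorch's default; the slope used in this codebase is not stated on this page.
- GELU is written in its exact erf form rather than the tanh approximation.

```python
import math


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    return x if x >= 0 else negative_slope * x


def swish(x: float) -> float:
    # x * sigmoid(x), also known as SiLU; branch on the sign so exp() cannot overflow
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def gelu(x: float) -> float:
    # x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


NONLIN = {"leaky_relu": leaky_relu, "swish": swish, "gelu": gelu}

print([round(NONLIN[name](1.0), 4) for name in ("leaky_relu", "swish", "gelu")])
```
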
</details>



<details><summary>Misc (click me)</summary>

- **n_modalities**, int: number of modalities
- **eval_bs**, int: batch size for evaluation
- **exp_name**, str: experiment name
- **trial**, str: trial name
- **seed**, int: random seed
</details>
