When Encoders Should Stay Simple: An Empirical Analysis of Architectures for Variational Autoencoders
Keywords: Variational Autoencoders, Encoder–Decoder Architectures, Generative modelling, Representation learning
Abstract: Variational Autoencoders (VAEs) offer efficient training and inference without relying on computationally expensive Markov chain Monte Carlo (MCMC) methods. Despite their foundational importance, the architectural choices for encoders and decoders in VAEs remain underexplored, particularly their impact on the learned latent representations and on generative quality. This study investigates the influence of encoder and decoder architectures by systematically varying their configurations across dense and convolutional network-based models. Experiments were conducted across different latent space sizes to assess the models' compressive capacities and their performance on reconstructive and generative losses.
The results reveal that small dense networks are more effective for encoding, while decoding benefits from architectures with structural processing capabilities, such as convolutional networks with multiple blocks. Stronger latent-space compression degrades representation quality, although separability is preserved at moderate compression levels. Notably, models with a non-zero Kullback–Leibler divergence (KLD) loss outperform models with collapsed latent spaces, emphasizing the importance of balancing reconstruction against generative regularization. These findings provide insights into the architectural considerations necessary for designing efficient VAEs and improving their generative and representational capabilities.
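The reconstruction/regularization balance referred to in the abstract can be illustrated with the standard VAE objective: a reconstruction term plus a closed-form Gaussian KLD term. The sketch below is a minimal NumPy illustration, not the paper's implementation; the MSE reconstruction term and the `beta` weight are assumptions for exposition (a KLD of exactly zero corresponds to the collapsed latent space case the abstract warns against).

```python
import numpy as np

def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Illustrative VAE objective: reconstruction + beta-weighted Gaussian KLD."""
    # Reconstruction term: mean squared error between input and reconstruction
    # (choice of MSE is an assumption; BCE is also common).
    recon = np.mean((x - x_recon) ** 2)
    # Closed-form KL divergence between q(z|x) = N(mu, exp(logvar)) and the
    # standard normal prior N(0, I), summed over latent dims, averaged over batch.
    kld = -0.5 * np.mean(np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1))
    return recon + beta * kld, recon, kld
```

With `mu = 0` and `logvar = 0` the KLD term vanishes, i.e. the approximate posterior has collapsed onto the prior; a healthy VAE keeps this term non-zero while still reconstructing well.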
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4216