When Encoders Should Stay Simple: An Empirical Analysis of Architectures for Variational Autoencoders

12 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Variational Autoencoders, Encoder–Decoder Architectures, Generative modelling, Representation learning
Abstract: Variational Autoencoders (VAEs) offer efficient training and inference without relying on computationally expensive Markov Chain Monte Carlo (MCMC) methods. Despite their foundational importance, the architectural choices for encoders and decoders in VAEs remain underexplored, particularly their impact on the learned latent representations and generative quality. This study investigates the influence of encoder and decoder architectures by systematically varying their configurations across dense and convolutional network-based models. Experiments were conducted over a range of latent space sizes to assess the models' compressive capacity and performance on reconstruction and generative losses. The results show that small dense networks are more effective for encoding, while decoding benefits from architectures with structural processing capabilities, such as convolutional networks with multiple blocks. Stronger latent-space compression degrades representation quality, although separability is preserved at moderate compression levels. Notably, models with non-zero Kullback–Leibler divergence (KLD) loss outperform models with collapsed latent spaces, underscoring the importance of balancing reconstruction against generative regularization. These findings provide insight into the architectural considerations necessary for designing efficient VAEs and improving their generative and representational capabilities.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 4216
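As an illustration of the abstract's point about collapsed latent spaces (not code from the paper): for the diagonal-Gaussian posteriors standard in VAEs, the KLD regularizer has a closed form, and a posterior that collapses onto the prior drives it exactly to zero, so the latent code carries no information. A minimal sketch:

```python
import numpy as np

def kld_diag_gaussian(mu, log_var):
    """Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I),
    the regularization term in the standard VAE objective."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# A "collapsed" posterior matches the prior exactly: mu = 0, log-variance = 0.
collapsed = kld_diag_gaussian(np.zeros(8), np.zeros(8))

# A non-collapsed posterior deviates from the prior and keeps information
# in the latent code, at the cost of a non-zero KLD penalty.
active = kld_diag_gaussian(np.full(8, 0.5), np.full(8, -0.2))

print(collapsed)  # 0.0
print(active)     # strictly positive
```

The trade-off the paper highlights is visible here: pushing the KLD term to zero maximizes regularization but collapses the representation, whereas a non-zero KLD reflects a latent space that still encodes the input.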