Abstract: While variational autoencoders (VAEs) provide a theoretical basis for deep generative models, they often produce "blurry" images, a shortcoming linked to their training objective. In this paper, we propose the Sharpened Adversarial Variational Auto-Encoder (AVAE-S), which uses an adversarial training mechanism to fine-tune the VAE's learned latent code with a specialized objective function. The loss function is designed to capture global structure as well as local, high-frequency features, leading to a smaller variance in the aggregated posterior and hence less blurry generated samples. AVAE-S steers the learned representations toward meaningful latent features by enforcing feature consistency between the model distribution and the target distribution, yielding sharpened output with better perceptual quality. AVAE-S then trains a GAN, whose generator is collapsed onto the VAE's decoder, on this learned latent code. Moreover, we augment the standard VAE evidence lower bound objective with additional element-wise similarity measures. Our experiments show that AVAE-S achieves state-of-the-art sample quality on the common MNIST and CelebA datasets. AVAE-S retains many of the desirable properties of the VAE (stable training, an encoder-decoder architecture, a well-structured latent manifold) while generating more realistic images, as measured by the sharpness score.