SSC-VAE: Structured Sparse Coding Based Variational Autoencoder for Detail Preserved Image Reconstruction

Published: 01 Jan 2025, Last Modified: 01 Aug 2025, AAAI 2025, CC BY-SA 4.0
Abstract: Discrete latent representation techniques, such as Vector Quantization (VQ) and Sparse Coding (SC), have achieved better image reconstruction and generation quality than continuous representation methods in Variational Autoencoders (VAEs). However, existing approaches often treat the latent representations of an image independently in their discrete representation space. They neglect both the structural information within each representation and the correlations among the representations. This oversight leads to coarse representations and suboptimal generated results. In this paper, we address these limitations by using sparse coding to introduce correlations among and within the latent representations of each image in the discrete latent space of a VAE. We impose two-dimensional structural information through adaptive thresholding, which enhances local structure in the image representations, while the parsimonious representation over a learned dictionary suppresses noise. Empirical studies on three real-world benchmark datasets (a clinical ultrasound dataset, BSDS500, and mini-ImageNet) demonstrate that our proposed model preserves fine-grained details in image reconstruction and significantly outperforms the SC-VAE and VQ-VAE baselines on both objective and subjective image quality metrics. The improvements are especially large on the ultrasound dataset, where structural information is crucial. Specifically, relative to SC-VAE and VQ-VAE respectively, we observe gains of 7.68% and 17.03% in SSIM and 3.25 dB and 6.58 dB in PSNR, as well as reductions of 0.15 and 0.24 in LPIPS and 45.38 and 84.05 in FID (lower is better for both). These results indicate the superiority of our method in image reconstruction quality and fidelity.
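The abstract builds on the idea of encoding each latent vector as a sparse combination of atoms from a learned dictionary. As a reference point only, the following sketch shows the classical, unstructured version of that operation. It uses plain ISTA (iterative soft-thresholding) with a single fixed threshold to solve a Lasso problem for one latent vector in isolation. It is not the authors' method. The adaptive, two-dimensional structure-aware thresholding described above is not reproduced here. The random dictionary `D`, the threshold `lam`, the dimensions, and the helper names `soft_threshold` and `ista` are illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, theta):
    """Elementwise soft-thresholding: the proximal operator of theta * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)


def ista(y, D, lam, n_iter=200):
    """Solve min_z 0.5 * ||y - D z||^2 + lam * ||z||_1 with plain ISTA.

    y: (m,) signal (e.g. one latent vector), D: (m, k) dictionary.
    Every coefficient uses the same fixed threshold lam / L, so each
    latent vector is coded independently, with no spatial structure.
    """
    # Step size 1/L, where L = largest singular value of D squared,
    # i.e. the Lipschitz constant of the gradient of the quadratic term.
    L = np.linalg.norm(D, ord=2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)                    # gradient of the data term
        z = soft_threshold(z - grad / L, lam / L)   # gradient step, then shrink
    return z


# Hypothetical toy setup: a 3-sparse code over a random overcomplete dictionary.
rng = np.random.default_rng(0)
m, k = 32, 64
D = rng.standard_normal((m, k))
D /= np.linalg.norm(D, axis=0, keepdims=True)  # normalize atoms to unit norm
z_true = np.zeros(k)
z_true[[3, 17, 40]] = [1.5, -2.0, 0.8]
y = D @ z_true

z_hat = ista(y, D, lam=0.05, n_iter=500)
print("active atoms:", np.count_nonzero(np.abs(z_hat) > 1e-3))
print("residual norm:", np.linalg.norm(y - D @ z_hat))
```

The fixed scalar threshold `lam / L` is the part that a structure-aware scheme would replace. In this plain form, each coefficient of each latent vector is shrunk without regard to its spatial neighbors, which is the kind of independent treatment the abstract criticizes.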