Adaptively Hierarchical Quantization Variational Autoencoder Based on Feature Decoupling and Semantic Consistency for Image Generation
Abstract: The Vector Quantized Variational AutoEncoder (VQ-VAE) has shown great potential in image generation, especially in methods built on hierarchical features. However, the lack of decoupling of structural information between hierarchical features leads to semantic inconsistencies and redundant structural features, resulting in incompatible outputs. In this study, we propose the Adaptively Hierarchical Quantization Variational AutoEncoder (AHQ-VAE) to generate high-fidelity images with a unified structure. To ensure the semantic consistency of the continuous space, we employ a Spatially Consistent Semantic Embedding (SCSE) module to align the hierarchical features while decoupling global structural information from local details. To ensure the consistency of the discrete space, we introduce an Adaptive Bottom Quantizer (ABQ) that generates quantized bottom codes consistent with the quantized top codes, so that the local details adapt to the global semantics. Extensive experiments demonstrate that our approach generates high-quality images with a unified structure.
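For context, the quantization step that both the top and bottom quantizers build on is the standard VQ-VAE operation: each continuous encoder feature is replaced by its nearest codebook entry. The sketch below illustrates only this generic nearest-neighbor quantization, not the paper's SCSE or ABQ modules; all names, shapes, and sizes are illustrative assumptions.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Illustrative VQ-VAE quantization (not the paper's ABQ).

    features: (N, D) continuous encoder vectors.
    codebook: (K, D) learned embedding vectors.
    Returns the quantized vectors (N, D) and the code indices (N,).
    """
    # Squared Euclidean distance from every feature to every codebook entry.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)  # index of the nearest code per feature
    return codebook[idx], idx

# Toy usage: 16 feature vectors quantized against an 8-entry codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # K=8 codes of dimension D=4
features = rng.normal(size=(16, 4))  # N=16 encoder outputs
quantized, codes = vector_quantize(features, codebook)
```

In a hierarchical setup like the one described above, this operation is applied separately at each level; the paper's contribution is to condition the bottom-level quantization on the top-level codes rather than quantizing each level independently.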