QARV++: An improved hierarchical VAE for learned image compression

Yichi Zhang, Yuning Huang, Fengqing Zhu

Published: 21 Jan 2026, Last Modified: 15 Apr 2026. Venue: IEEE Transactions on Circuits and Systems for Video Technology. Readers: Everyone. License: arXiv.org perpetual, non-exclusive license.

Abstract: Hierarchical Variational Autoencoder (HVAE)-based Learned Image Compression (LIC) has shown great promise, but its performance still lags behind autoregressive models. We identify three key limitations: (1) shared latent mappings that lead to accumulated posterior collapse; (2) reliance on static convolutions, which limits adaptability; and (3) gradient imbalance during variable-rate optimization, which causes unbalanced performance across bit rates. To overcome these challenges, we propose QARV++, an improved HVAE-based LIC method. First, we introduce a disentangled latent mapping mechanism that assigns a separate transformation to each latent variable, preventing posterior collapse from propagating. Second, we integrate deformable convolutions into the network through the DCNNeXt block, which enables dynamic feature adaptation while maintaining computational efficiency. Third, we reformulate variable-rate optimization to ensure balanced gradient updates across different λ values, stabilizing variable-rate training. Extensive experiments demonstrate that QARV++ achieves superior rate-distortion (R-D) performance among HVAE-based LIC models, with BD-Rates of -12.20%, -16.34%, and -15.23% against VVC Intra mode on the Kodak, Tecnick, and CLIC2020 test datasets, respectively. Our approach also generalizes effectively to existing LICs, delivering substantial improvements.