Keywords: image compression, diffusion, neural network
TL;DR: ultra-low bitrate image compression
Abstract: While recent VAE-based neural codecs achieve impressive results at low bitrates when optimized for perceptual quality, their performance degrades significantly under ultra-low bitrate conditions. To address this, generative methods that exploit semantic priors from pretrained models have emerged, revolutionizing ultra-low bitrate compression. However, these approaches remain constrained by a fundamental tradeoff between semantic faithfulness and perceptual realism. Methods relying on explicit semantic guidance preserve content accuracy but often lack textural fidelity, while those based on implicit representations can generate convincing details but may suffer from semantic drift. In this work, we introduce a unified framework that bridges this gap by coherently integrating explicit and implicit semantic representations. We condition a diffusion model on explicit high-level semantics while using reverse-channel coding to implicitly encode fine-grained information. In addition, a novel plugin encoder provides flexible control over the distortion-perception balance. Extensive experiments demonstrate that our framework achieves state-of-the-art rate–perception performance, outperforming existing approaches and surpassing DiffC by 23.49%, 12.25%, and 23.09% DISTS-BD-Rate on the Kodak, DIV2K, and CLIC2020 datasets, respectively.
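As a loose intuition for the explicit/implicit split described above (a minimal toy sketch, not the paper's actual codec): an encoder can send a heavily downsampled "semantic" summary explicitly, while a coarsely quantized residual stands in for the implicitly coded fine-grained detail. All names and the quantization scheme here are illustrative assumptions.

```python
import numpy as np

# Toy stand-in for the explicit/implicit split (illustrative only):
# the explicit path keeps coarse structure, the implicit path carries
# a coarsely quantized residual in place of reverse-channel coding.

def encode(x, coarse_len=4, residual_levels=8):
    # Explicit path: heavy downsampling preserves only coarse "semantics".
    coarse = x.reshape(coarse_len, -1).mean(axis=1)
    # Implicit path: quantize the fine residual under a small bit budget.
    up = np.repeat(coarse, len(x) // coarse_len)
    residual = x - up
    q = np.round(residual * residual_levels) / residual_levels
    return coarse, q

def decode(coarse, q):
    # Reconstruction combines coarse structure with fine detail.
    up = np.repeat(coarse, len(q) // len(coarse))
    return up + q

x = np.linspace(0.0, 1.0, 16) ** 2
coarse, q = encode(x)
x_hat = decode(coarse, q)
```

With 8 quantization levels, the per-sample reconstruction error is bounded by half a quantization step (1/16), illustrating how the implicit residual recovers detail the explicit summary discards.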
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6856