Perceptual Image Compression With Conditional Diffusion Transformers

Published: 01 Jan 2024, Last Modified: 05 Mar 2025VCIP 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Generative models have significantly advanced generative AI, particularly in image and video generation. Recognizing their potential, researchers have begun exploring their application in image compression. However, existing methods face two primary challenges: limited performance improvement and high model complexity. In this paper, to address these two challenges, we propose a perceptual image compression solution by introducing a conditional diffusion model. Given that compression performance heavily depends on the decoder’s generative capability, we base our decoder on the diffusion transformer architecture. To address the model complexity problem, we implement the diffusion transformer architecture with Swin transformer. Equipped with enhanced generative capability, we further augment the decoder with informative features using a multi-scale feature fusion module. Experimental results demonstrate that our approach surpasses existing perceptual image compression methods while achieving lower model complexity.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview