Consistency Guided Diffusion Model with Neural Syntax for Perceptual Image Compression

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Diffusion models show impressive performance in image generation with excellent perceptual quality. However, their tendency to introduce additional distortion prevents their direct application to image compression. To address this issue, this paper introduces a Consistency Guided Diffusion Model (CGDM) tailored for perceptual image compression, which integrates an end-to-end image compression model with a diffusion-based post-processing network, aiming to learn richer detail representations with less fidelity loss. Specifically, the compression and post-processing networks are cascaded, and a branch of consistency-guided features is added to constrain deviation in the diffusion process for better reconstruction quality. Furthermore, a Syntax-driven Feature Fusion (SFF) module takes an extra ultra-low bitstream from the encoding end as input, guiding the adaptive fusion of information from the two branches. In addition, we design a globally uniform boundary control strategy with overlapped patches and adopt a continuous online optimization mode to improve both coding efficiency and global consistency. Extensive experiments validate the superiority of our method over existing perceptual compression techniques and the effectiveness of each component of our method.
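The abstract's "globally uniform boundary control strategy with overlapped patches" is not specified in detail here; a minimal sketch of the general idea it builds on — processing an image in overlapping tiles and blending the overlaps with a smooth weight window so patch seams stay consistent — might look as follows. The function names, tile size, and linear blending window are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def blend_overlapping_patches(image, process, patch=64, overlap=16):
    """Apply `process` to overlapping tiles of a 2-D image and blend the seams.

    Toy illustration only: the paper's boundary-control strategy may differ.
    `process` is any per-patch function (e.g. a diffusion post-processing step).
    """
    h, w = image.shape
    stride = patch - overlap
    out = np.zeros((h, w), dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)
    # 1-D linear ramp -> 2-D window: overlapped regions fade smoothly into each other
    ramp = np.minimum(np.arange(1, patch + 1), np.arange(patch, 0, -1)).astype(np.float64)
    win = np.outer(ramp, ramp)
    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            # clamp so the last tile stays inside the image
            y0, x0 = min(y, h - patch), min(x, w - patch)
            tile = process(image[y0:y0 + patch, x0:x0 + patch])
            out[y0:y0 + patch, x0:x0 + patch] += tile * win
            weight[y0:y0 + patch, x0:x0 + patch] += win
    return out / weight  # weighted average where tiles overlap
```

With an identity `process`, the blended reconstruction recovers the input exactly, which confirms that the windowed averaging introduces no seam artifacts of its own.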
Primary Subject Area: [Content] Media Interpretation
Relevance To Conference: This work introduces a Consistency Guided Diffusion Model (CGDM) designed for perceptual compression, enhancing perceptual image quality at equivalent memory consumption and thus facilitating multimedia data transmission, especially in scenarios demanding high-quality image representation. Furthermore, it expands the application scope of diffusion models, originally used for image generation, demonstrating their effectiveness in cross-modal data processing. Additionally, this work introduces a fusion strategy for heterogeneous data, advancing the development and application of multimodal processing techniques.
Supplementary Material: zip
Submission Number: 3472