TL;DR: This paper proposes a residual-guided framework that uses semantic residual coding and a compression-aware diffusion model to achieve high-fidelity, ultra-low-bitrate image compression.
Abstract: Existing multimodal large model-based image compression frameworks often rely on a fragmented integration of semantic retrieval, latent compression, and generative models, resulting in suboptimal performance in both reconstruction fidelity and coding efficiency. To address these challenges, we propose a residual-guided ultra-low-rate image compression framework named ResULIC, which incorporates residual signals into both semantic retrieval and the diffusion-based generation process. Specifically, we introduce Semantic Residual Coding (SRC) to capture the semantic disparity between the original image and its compressed latent representation. A perceptual fidelity optimizer is further applied for superior reconstruction quality. Additionally, we present the Compression-aware Diffusion Model (CDM), which establishes an optimal alignment between bitrates and diffusion time steps, improving compression-reconstruction synergy. Extensive experiments demonstrate the effectiveness of ResULIC, which achieves superior objective and subjective performance compared to state-of-the-art diffusion-based methods, with BD-rate savings of -80.7\% and -66.3\% in terms of LPIPS and FID, respectively.
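For readers unfamiliar with the BD-rate figures quoted above, the following is a minimal sketch of the standard Bjøntegaard delta-rate calculation, not the authors' evaluation code. The function name `bd_rate`, the cubic fit, and all sample numbers are assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def bd_rate(rate_anchor, metric_anchor, rate_test, metric_test):
    """Average bitrate change (%) of a test codec vs. an anchor at equal quality.

    Negative values mean the test codec needs fewer bits for the same
    metric value (e.g. the same LPIPS or FID score).
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(metric_anchor, dtype=float)
    qt = np.asarray(metric_test, dtype=float)

    # Fit log-rate as a polynomial of the quality metric (cubic when
    # at least four rate points are available for each codec).
    deg = min(3, len(qa) - 1, len(qt) - 1)
    poly_a = np.polyfit(qa, log_ra, deg)
    poly_t = np.polyfit(qt, log_rt, deg)

    # Compare the curves only where both codecs have measurements.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    if hi <= lo:
        raise ValueError("rate-quality curves do not overlap")

    # Integrate each fitted curve over the shared quality interval.
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0


# Illustrative (made-up) rate points in bits per pixel and LPIPS scores.
anchor_bpp = [0.01, 0.02, 0.04, 0.08]
lpips = [0.40, 0.30, 0.22, 0.15]
test_bpp = [r / 2 for r in anchor_bpp]  # hypothetical codec using half the bits

saving = bd_rate(anchor_bpp, lpips, test_bpp, lpips)
```

In this toy setup the hypothetical test codec reaches every LPIPS value at half the anchor's bitrate, so `saving` comes out at roughly -50. A reported value such as -80.7\% is read the same way: on average, about 80.7\% fewer bits at matched perceptual quality.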
Lay Summary: Our research addresses a critical challenge in image compression: avoiding incorrect textures (e.g., wrong colors or shapes) when using AI to reconstruct images from tiny files.
Our solution combines two strategies:
1. **Semantic Residual Coding**: AI compares the original and compressed images to identify and restore lost semantic details.
2. **Compression-Aware Diffusion Models**: The AI's image-generation process is dynamically adjusted based on compression levels, ensuring sharp, realistic results even at minimal file sizes.
These advancements are particularly vital for bandwidth-constrained applications—such as emergency communications or satellite imaging—where both accuracy and efficiency are paramount. By combining these strategies, our work bridges the gap between extreme compression and faithful visual reconstruction, delivering a robust solution for resource-limited scenarios.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Computer Vision
Keywords: image compression, diffusion models, large language model
Submission Number: 4270