Combine-ICMH: A Dual-Adapter Co-Tuning Framework in Image Compression for Machine and Human Vision

16 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference · Withdrawn Submission · CC BY 4.0
Keywords: learned image compression, adapter, image compression for machine, task-aware compression
TL;DR: We propose an image compression framework for machine and human vision (ICMH) that jointly optimizes the transform and entropy models and surpasses existing ICMH methods.
Abstract: To reduce the high training overhead of models for Image Compression for Machine and Human Vision (ICMH), the paradigm of fine-tuning pre-trained models has gained increasing attention. Among these approaches, lightweight adapter-based methods have emerged as efficient solutions. However, we argue that this paradigm suffers from two critical yet overlooked flaws. First, existing frequency-domain adapters lack adaptability, often suppressing high-frequency details crucial for machine tasks. Second, fine-tuning the transform module alone introduces a "transform-entropy mismatch": the frozen entropy model cannot adapt to the altered latent distribution. To address these challenges, we propose Combine-ICMH, a novel framework that enables the synergistic co-optimization of both the transform and entropy models. Specifically, we design a Spatial-Wavelet Modulation Adapter (SWMA) to enhance frequency adaptability and introduce a Channel Modulation Adapter (CMA) to directly fine-tune the entropy model, resolving the mismatch. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art approaches on various downstream tasks, including classification, detection, and segmentation, while maintaining comparable parameter efficiency.
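
To make the adapter idea concrete, here is a minimal PyTorch sketch of the two components the abstract names. It is not the paper's implementation: the class names SWMASketch and CMASketch, the single-level Haar wavelet, the residual placement, and all shapes are illustrative assumptions.

```python
# Minimal sketch of the dual-adapter idea from the abstract, assuming PyTorch.
# Names, the Haar wavelet choice, and shapes are illustrative assumptions,
# not the paper's actual implementation.
import torch
import torch.nn as nn


def haar_dwt(x):
    """Single-level 2D Haar DWT via strided slicing: returns (LL, LH, HL, HH)."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh


def haar_idwt(ll, lh, hl, hh):
    """Exact inverse of haar_dwt."""
    a = (ll + lh + hl + hh) / 2
    b = (ll + lh - hl - hh) / 2
    c = (ll - lh + hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    B, C, H, W = ll.shape
    x = ll.new_zeros(B, C, H * 2, W * 2)
    x[..., 0::2, 0::2] = a
    x[..., 0::2, 1::2] = b
    x[..., 1::2, 0::2] = c
    x[..., 1::2, 1::2] = d
    return x


class SWMASketch(nn.Module):
    """Hypothetical spatial-wavelet adapter: a learnable per-subband gate lets
    high-frequency detail be amplified rather than uniformly suppressed."""

    def __init__(self, channels):
        super().__init__()
        # One learnable gate per subband (LL, LH, HL, HH), initialized to ones.
        self.gates = nn.Parameter(torch.ones(4, channels, 1, 1))
        # Zero-initialized spatial branch so the residual adapter starts as identity.
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        nn.init.zeros_(self.spatial.weight)
        nn.init.zeros_(self.spatial.bias)

    def forward(self, x):
        subbands = haar_dwt(x)
        modulated = [g * s for g, s in zip(self.gates, subbands)]
        return x + self.spatial(haar_idwt(*modulated))  # residual adapter


class CMASketch(nn.Module):
    """Hypothetical channel-modulation adapter for the entropy model: a
    lightweight per-channel affine applied to its intermediate features."""

    def __init__(self, channels):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.shift = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, feat):
        return feat * self.scale + self.shift
```

In this sketch, initializing the subband gates to ones, the spatial branch to zero, and the channel affine to identity keeps the frozen codec's behavior unchanged at the start of fine-tuning, the usual convention for adapter modules.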
Primary Area: generative models
Submission Number: 6520