Semantic Communication Using Intent-guided Coarse- and Fine-grained Codec with Pre-trained Diffusion Models
Abstract: In image semantic communication, the granularity of semantic description required for different objects within an image varies with the communication intent and the importance of each object. However, current semantic codecs optimized for global assessment metrics fail to adapt to user intent and cannot provide differentiated semantic granularity for objects of differing importance. Generative semantic codecs built on representations such as edge maps or semantic segmentation maps are insufficient for capturing fine-grained semantic information. This paper proposes dividing the transmitted image semantics into global coarse-grained semantics and key-object fine-grained semantics to better align with sender intent and optimize bandwidth usage. We introduce a novel semantic codec scheme based on a pre-trained text-to-image diffusion model. Global coarse-grained semantics are represented by short textual descriptions, while fine-grained semantic information of key objects is extracted via Denoising Diffusion Implicit Model (DDIM) inversion and compressed in the frequency domain. Experimental results demonstrate that the proposed semantic codec enables high-quality recovery of both coarse- and fine-grained semantics in image transmission while significantly reducing the amount of data transmitted.
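The abstract's fine-grained pathway relies on DDIM inversion, which deterministically maps an image latent back to a noisier latent that the receiver's diffusion model can later denoise. A minimal sketch of one inversion step is shown below, assuming a NumPy latent and an externally supplied noise prediction `eps`; the paper's actual codec components (key-object selection, frequency-domain compression, the pre-trained noise predictor) are not reproduced here.

```python
import numpy as np

def ddim_inversion_step(x_t, eps, alpha_t, alpha_next):
    """One deterministic DDIM step (eta = 0) between noise levels.

    Moves a latent from cumulative noise level alpha_t to alpha_next
    given the model's noise prediction eps. Running this with
    alpha_next < alpha_t is the inversion direction used to extract
    a transmittable latent; this is a generic sketch, not the
    authors' implementation.
    """
    # Estimate the clean latent x0 implied by the current noisy latent.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    # Deterministically re-noise the x0 estimate to the target level.
    return np.sqrt(alpha_next) * x0_pred + np.sqrt(1.0 - alpha_next) * eps
```

Because the update is deterministic and invertible for a fixed `eps`, stepping to a noisier level and back recovers the original latent exactly, which is what makes DDIM inversion usable as a (near-)lossless semantic extraction step.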
External IDs: dblp:conf/icmcs/TangGY25