Multi-Scale Semantic Communication for Object Detection: Single and Cross-Domain Scenarios

Jie Guo, Hang Yin, Bin Song, Yuhao Chi, Zhaoyang Zhang, Chau Yuen, Dusit Niyato

Published: 2025, Last Modified: 07 May 2026. IEEE Trans. Wirel. Commun. 2025. CC BY-SA 4.0

Abstract: With the rapidly growing popularity of vision-driven communication applications, object detection has become one of the fundamental techniques for performing practical vision tasks. In traditional communication systems, images are compressed for transmission, reconstructed at the receiver, and then processed by existing object detection algorithms. However, transmitting large numbers of images consumes significant storage and communication resources. To address this challenge, a semantic communication-based image reconstruction scheme has been proposed for object detection, which transmits only the semantic information relevant to image reconstruction. However, this method is prone to losing key information, such as object position and texture details, leading to degraded object detection performance. Additionally, it is sensitive to environmental factors such as weather and lighting, resulting in poor adaptability across multiple scenarios. To address these issues, we propose a multi-scale semantic communication framework for object detection that transmits only the multi-scale semantic features relevant to the task and employs decoupling at the receiver to separate the positional and classification information of target objects, without requiring image reconstruction. To improve adaptability across multiple scenarios, we introduce a cross-domain object detection technique that ensures reliable object detection in new scenarios by optimizing the framework’s multi-scale semantic encoder through domain adversarial learning. Numerical results demonstrate that the proposed framework achieves mean average precision improvements of $15.4\% \sim 38.5\%$ over the traditional communication framework in the low-to-medium signal-to-noise ratio regime over additive white Gaussian noise and Rayleigh fading channels.

External IDs: dblp:journals/twc/GuoYSCZYN25