Keywords: Semantic image transmission, OFDM sytem, Deep learning
Abstract: Semantic image transmission represents a significant evolution in the design of communication systems, shifting the focus from the transmission of raw bits to the delivery of meaningful semantic information. Nevertheless, many existing semantic communication methods rely on analog transmission schemes, which face fundamental integration issues with contemporary digital communication infrastructures, including widely adopted Orthogonal Frequency Division Multiplexing (OFDM) receivers. This incompatibility often leads to degraded performance in challenging wireless environments, such as urban areas with strong multipath effects or high-speed mobility scenarios. To address these challenges, this paper introduces a novel semantic transmission system that seamlessly integrates a Vision Transformer-based Masked Autoencoder (ViTMAE) with standard digital communication frameworks. The proposed design intentionally leverages existing physical-layer processing modules in base stations, avoiding the need for complex deep learning-based denoising components while utilizing the masked autoencoder's ability to learn robust and predictive feature representations through self-supervised reconstruction. We evaluated the proposed system under 3GPP-standard channel models, where it substantially outperformed conventional JPEG source coding coupled with LDPC channel coding, particularly in low signal-to-noise ratio (SNR) conditions. This study provides valuable insights for the development of practical and scalable semantic image transmission architectures for next-generation networks.
Submission Number: 15
Loading