Abstract: Monitoring multimodal signals provides a more comprehensive understanding of health conditions than single-modal monitoring. Faced with the large volume of multimodal signals, existing IoT health monitoring systems focus primarily on high-fidelity transmission, encoding each modality separately. However, because they ignore the downstream applications and the correlations among modalities, a portion of the bandwidth is wasted on task-irrelevant information and inter-modal redundancy. To address this issue, we propose the Multimodal Semantic Integration Communication (MoSIC) framework, which comprises three levels. At the sensor level, multiple wearable sensors collect signals of different modalities and send them to a mobile terminal. At the mobile terminal level, the terminal applies deep joint source-channel coding to the received multimodal signals, extracting single-modal embedded features with a backbone network and obtaining cross-modal features through a feature fusion network under a contrastive constraint. At the cloud level, a decoding network symmetric to the encoding network reconstructs the multimodal signals, which then serve downstream applications such as human activity recognition. By semantically integrating multimodal signals for downstream applications, MoSIC improves encoding and transmission efficiency and reduces radio-frequency power consumption and bandwidth requirements.
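For concreteness, the sketch below illustrates the terminal-level encoding pipeline the abstract describes: per-modality backbone networks, a fusion network producing cross-modal features, and a contrastive constraint aligning modalities. All module names, layer sizes, input dimensions, and the InfoNCE-style loss form are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a MoSIC-style terminal encoder (assumed architecture).
# Sizes, module names, and the loss form are hypothetical illustrations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityBackbone(nn.Module):
    """Extracts a single-modal embedded feature from one sensor stream."""
    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class FusionEncoder(nn.Module):
    """Fuses per-modality embeddings into a compact cross-modal code."""
    def __init__(self, embed_dim: int = 128, n_modalities: int = 2,
                 code_dim: int = 64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(embed_dim * n_modalities, 128), nn.ReLU(),
            nn.Linear(128, code_dim),  # channel symbols sent to the cloud
        )

    def forward(self, embeddings: list[torch.Tensor]) -> torch.Tensor:
        return self.fuse(torch.cat(embeddings, dim=-1))


def contrastive_loss(za: torch.Tensor, zb: torch.Tensor,
                     tau: float = 0.1) -> torch.Tensor:
    """InfoNCE-style constraint: pull together embeddings of the same
    sample across modalities, push apart those of different samples."""
    za, zb = F.normalize(za, dim=-1), F.normalize(zb, dim=-1)
    logits = za @ zb.t() / tau              # (batch, batch) similarities
    targets = torch.arange(za.size(0))      # positives on the diagonal
    return F.cross_entropy(logits, targets)


# Toy usage with two hypothetical wearable modalities (accelerometer, PPG).
acc = torch.randn(8, 96)    # batch of flattened accelerometer windows
ppg = torch.randn(8, 64)    # batch of flattened PPG windows
backbone_acc, backbone_ppg = ModalityBackbone(96), ModalityBackbone(64)
fusion = FusionEncoder()

z_acc, z_ppg = backbone_acc(acc), backbone_ppg(ppg)
code = fusion([z_acc, z_ppg])          # cross-modal features to transmit
loss = contrastive_loss(z_acc, z_ppg)  # contrastive constraint on fusion
```

A decoder symmetric to `FusionEncoder` at the cloud level would map the received code back to per-modality reconstructions; jointly training both ends with the contrastive term is one way to suppress inter-modal redundancy before transmission.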