VR-PCT: Enhanced VR Semantic Performance via Edge-Client Collaborative Multi-modal Point Cloud Transformers

Luoyu Mei, Shuai Wang, Ruofeng Liu, Yun Cheng, Wenchao Jiang, Zhimeng Yin, Tian He

Published: 01 Jan 2025, Last Modified: 12 Nov 2025. IEEE Transactions on Mobile Computing. License: CC BY-SA 4.0
Abstract: Real-time semantic recognition is crucial for virtual reality (VR) applications, but efficiently fusing multi-modal data is challenging in resource-constrained VR scenarios. While integrating millimeter-wave (mmWave) radar point clouds with vision data offers a promising solution, existing methods often suffer from excessive data overhead and degraded accuracy caused by redundant and noisy information. To address these limitations, this paper presents VR-PCT, a multi-modal transformer for edge-client collaborative VR semantic recognition that fuses mmWave radar point clouds with vision data. VR-PCT introduces a novel collaborative design in which VR clients perform lightweight semantic region detection while the VR edge performs multi-modal semantic recognition. Through this edge-client collaboration, VR-PCT reduces transmission cost by sending only the semantic region of each frame, together with the mmWave point cloud, instead of the entire video. It further incorporates adaptive cross-modal data selection and fusion strategies to achieve real-time semantic recognition while significantly reducing data redundancy. In an evaluation with 22 participants across four experimental scenes, using VR devices from three manufacturers, VR-PCT achieves 97.6% recognition accuracy while reducing transmission overhead by 81.5% compared to existing approaches. These results demonstrate the effectiveness of VR-PCT for efficient and accurate multi-modal semantic recognition in VR applications. The code and data of VR-PCT are released at https://github.com/luoyumei1-a/VR-PCT.
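To make the described split concrete, below is a minimal PyTorch sketch of the edge-client pipeline the abstract outlines: the client runs a lightweight detector and transmits only the cropped semantic region alongside the radar point cloud, and the edge fuses both modalities with a transformer encoder before classifying. All module names (RegionDetector, CrossModalFusion, client_step), shapes, and the concatenation-based fusion are illustrative assumptions, not the authors' released implementation, which is available in the linked repository.

```python
# Minimal sketch of the edge-client split described in the abstract.
# All module names, shapes, and the fusion scheme are assumptions for
# illustration only; see the VR-PCT repository for the real code.
import torch
import torch.nn as nn


class RegionDetector(nn.Module):
    """Client side: lightweight detector predicting one semantic region
    (x1, y1, x2, y2, normalized) per frame, so only that crop is
    transmitted instead of the full video."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 4)

    def forward(self, frame):  # frame: (B, 3, H, W)
        return torch.sigmoid(self.head(self.backbone(frame)))  # (B, 4)


class CrossModalFusion(nn.Module):
    """Edge side: embeds the radar point cloud and the cropped region,
    then fuses both token sets with transformer attention."""

    def __init__(self, d_model=64, num_classes=10):
        super().__init__()
        self.point_embed = nn.Linear(3, d_model)  # (x, y, z) per point
        self.pixel_embed = nn.Conv2d(3, d_model, 8, stride=8)  # patchify
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(d_model, num_classes)

    def forward(self, points, crop):  # (B, N, 3), (B, 3, h, w)
        p = self.point_embed(points)                           # (B, N, d)
        v = self.pixel_embed(crop).flatten(2).transpose(1, 2)  # (B, P, d)
        fused = self.encoder(torch.cat([p, v], dim=1))         # joint tokens
        return self.cls(fused.mean(dim=1))                     # (B, classes)


def client_step(frame, detector):
    """Run detection on the client; return only the crop to transmit."""
    box = detector(frame)[0]  # single-image sketch
    H, W = frame.shape[-2:]
    x1, y1, x2, y2 = (box * torch.tensor([W, H, W, H])).int().tolist()
    x2, y2 = max(x2, x1 + 8), max(y2, y1 + 8)  # keep crop non-empty
    return frame[..., y1:y2, x1:x2]


if __name__ == "__main__":
    frame = torch.rand(1, 3, 224, 224)           # one camera frame
    points = torch.rand(1, 128, 3)               # mmWave radar points
    crop = client_step(frame, RegionDetector())  # region sent to the edge
    crop = nn.functional.interpolate(crop, size=(64, 64))  # fixed size
    logits = CrossModalFusion()(points, crop)
    print(logits.shape)                          # torch.Size([1, 10])
```

In this sketch the bandwidth saving comes entirely from client_step transmitting a small crop rather than the full frame; the adaptive cross-modal data selection the abstract mentions would additionally prune redundant point and patch tokens before fusion.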