Abstract: With advances in AI, smart home gyms are becoming increasingly popular for providing fitness assistance indoors. We propose Wi-Fitness, a layer-by-layer framework that bridges video perception and Wi-Fi sensing for smart fitness. At the data preprocessing layer, a singular value decomposition (SVD)-based channel state information (CSI) denoising mechanism calibrates the Wi-Fi data, and a random quantization-based data augmentation method generates diverse, high-quality training samples. At the bimodal fusion layer, a local attention mechanism and a bimodal feature integration mechanism mitigate the heterogeneity between Wi-Fi and video. For the video modality, an attention-based spatio-temporal graph convolutional network (AST-GCN Net) is proposed to refine spatial information. A spatio-temporal semantic alignment module transfers spatial information from video to Wi-Fi and maintains temporal consistency across modalities. The fitness assessment layer provides exercise visualization. Layer-by-layer collaboration enhances the generalization of Wi-Fitness, which achieves an average F1-score of 92.68% across three typical indoor environments.
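The abstract does not spell out how the random quantization-based augmentation works, so the following is only a minimal sketch of one common form of the idea, not the authors' method. It assumes a 1-D sequence of CSI amplitudes, splits the value range into randomly placed bins, and replaces every value with a random representative drawn from its bin. The function name `random_quantize` and the parameter `num_bins` are hypothetical.

```python
import bisect
import random


def random_quantize(samples, num_bins=8, rng=None):
    """Return a randomly quantized copy of a 1-D sequence of amplitudes.

    Illustrative sketch only; the bin count and sampling scheme are assumptions.
    """
    rng = rng or random.Random()
    lo, hi = min(samples), max(samples)
    if lo == hi:
        # A constant signal has nothing to quantize.
        return list(samples)
    # Draw num_bins - 1 random interior edges, giving non-uniform bins on [lo, hi].
    edges = sorted(rng.uniform(lo, hi) for _ in range(num_bins - 1))
    bounds = [lo] + edges + [hi]
    # One random representative per bin, drawn from inside that bin.
    reps = [rng.uniform(bounds[i], bounds[i + 1]) for i in range(num_bins)]
    # bisect_right returns the index of the bin that contains each sample.
    return [reps[bisect.bisect_right(edges, x)] for x in samples]
```

Calling the function several times with different random generators on the same recording would yield several distinct augmented copies, each preserving the coarse shape of the signal while perturbing its fine amplitude detail.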
External IDs: dblp:journals/iotj/WeiZZWZWFZM25