Abstract: Human-machine interaction has attracted significant interest, leading to the rise of Channel State Information (CSI)-based gesture recognition systems. However, these systems often suffer from limited accuracy due to high noise levels and poor cross-domain generalization. This paper presents a robust approach that enhances CSI-based gesture recognition by addressing these challenges. We introduce key concepts, including the CSI ratio and the phase matrix, and develop a robust data preprocessing method that suppresses environmental noise while retaining the dynamic components crucial for gesture recognition. Our method, WiMT, leverages a robust multi-feature fusion transformer with spatiotemporal partitioning and a multiscale spatiotemporal self-attention mechanism to effectively capture both the local spatial and the global temporal features of gestures. Evaluations on the Widar3 dataset demonstrate that our model surpasses existing methods on both in-domain and cross-domain gesture recognition tasks.