Abstract: Human-machine interaction has attracted significant interest, leading to the rise of Channel State Information (CSI)-based gesture recognition systems. However, these systems often suffer from limited accuracy due to high noise levels and poor cross-domain generalization. This paper presents a robust approach that enhances CSI-based gesture recognition by addressing these challenges. We introduce key concepts, including the CSI ratio and the phase matrix, and develop a robust data preprocessing method that suppresses environmental noise while retaining the dynamic components crucial for gesture recognition. Our method, WiMT, leverages a robust multi-feature fusion transformer with spatiotemporal partitioning and a multiscale spatiotemporal self-attention mechanism to effectively capture both the local spatial and the global temporal features of gestures. Evaluations on the Widar3 dataset demonstrate that our model surpasses existing methods on both in-domain and cross-domain gesture recognition tasks.