Wavelet Transform Embedding and Masked Autoencoder Transformer for Gesture Detection
Abstract: Recent advancements in deep learning have underscored the effectiveness of the Transformer model in complex data analysis. Despite its notable success, applying Transformers to signal processing, particularly gesture detection, poses challenges due to long data sequences and sparse features. In response, we introduce the Wavelet Transform-based Masked Autoencoder Transformer (WT-MAE). This novel architecture incorporates the wavelet transform for signal embedding throughout the model. The wavelet transform, which is learning-free, decomposes signal data into frequency-based sub-series, efficiently reducing noise and enhancing feature extraction. Furthermore, to address information redundancy within these sequences, we develop a Masked Autoencoder structure that integrates a Transformer module, including a high-ratio masking strategy that significantly reduces the input dimensions and enables more efficient processing. We evaluated our model on two distinct Wi-Fi signal datasets for gesture detection. The results indicate that our wavelet transform embedding improves performance by 5 to 10 percent over the baseline Transformer model, matching or surpassing previous state-of-the-art benchmarks across the datasets.
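The two core ideas in the abstract, decomposing a signal into frequency-based sub-series with a learning-free wavelet transform and discarding most input positions with a high-ratio mask, can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: it uses a single-level Haar wavelet (the paper does not specify the wavelet family or decomposition depth) and a hypothetical 75% mask ratio chosen only for the example.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform: splits a
    signal into a low-frequency (approximation) sub-series and a
    high-frequency (detail) sub-series, with no learned parameters."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def mask_tokens(tokens, mask_ratio=0.75, seed=None):
    """High-ratio random masking: keep only (1 - mask_ratio) of the
    positions, shrinking the sequence the encoder must process."""
    rng = np.random.default_rng(seed)
    n = len(tokens)
    n_keep = max(1, int(round(n * (1.0 - mask_ratio))))
    keep = np.sort(rng.choice(n, size=n_keep, replace=False))
    return tokens[keep], keep

# Toy stand-in for a Wi-Fi signal segment.
signal = np.sin(np.linspace(0.0, 8.0 * np.pi, 64))
approx, detail = haar_dwt(signal)          # 32 + 32 coefficients
tokens = np.concatenate([approx, detail])  # frequency-based "embedding"
visible, kept_idx = mask_tokens(tokens, mask_ratio=0.75, seed=0)
print(len(tokens), len(visible))           # 64 visible positions -> 16
```

Because the Haar transform above is orthonormal, it preserves the signal's energy while separating frequency bands, which is what makes it a cheap denoising/feature-extraction front end; the masking step then hands the downstream Transformer only a quarter of the positions.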