Abstract: This paper presents FreqMAE, a novel self-supervised learning framework that synergizes masked autoencoding (MAE) with physics-informed insights to capture feature patterns in multi-modal IoT sensor data. FreqMAE enhances latent-space representations of sensor data, reducing reliance on data labeling and improving accuracy on downstream AI tasks. Unlike data augmentation-based methods such as contrastive learning, FreqMAE eliminates the need for handcrafted transformations. Adapting MAE to IoT sensing signals, we present three contributions informed by frequency-domain insights: first, a Temporal-Shifting Transformer (TS-T) encoder that enables temporal interactions while distinguishing different frequency bands; second, a factorized multi-modal fusion mechanism that leverages cross-modal correlations while preserving unique modality features; third, a hierarchically weighted loss function that emphasizes important frequency components and high Signal-to-Noise Ratio (SNR) samples. Comprehensive evaluations on two sensing applications validate FreqMAE's proficiency in reducing labeling needs and enhancing resilience against domain shifts.
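To make the third contribution concrete, the sketch below illustrates one plausible form of a frequency- and SNR-weighted reconstruction loss. The abstract does not specify the exact formulation, so the tensor shapes, the `freq_weights` vector, and the softmax-based SNR weighting here are illustrative assumptions, not the paper's actual loss.

```python
import torch

def weighted_reconstruction_loss(pred, target, freq_weights, snr):
    """Illustrative frequency- and SNR-weighted MSE over masked patches.

    pred, target: (batch, num_patches, freq_bins) reconstructed vs. true
                  spectrogram patches (shapes assumed for illustration)
    freq_weights: (freq_bins,) per-frequency importance weights
                  (how they are derived is not stated in the abstract)
    snr:          (batch,) per-sample signal-to-noise estimates
    """
    per_bin = (pred - target) ** 2                     # elementwise squared error
    per_patch = (per_bin * freq_weights).mean(dim=-1)  # emphasize important bands
    per_sample = per_patch.mean(dim=-1)                # average over patches
    snr_w = torch.softmax(snr, dim=0)                  # upweight high-SNR samples
    return (snr_w * per_sample).sum()

# Toy usage with random tensors
pred = torch.randn(4, 16, 32)
target = torch.randn(4, 16, 32)
freq_weights = torch.linspace(0.5, 1.5, 32)
snr = torch.rand(4)
loss = weighted_reconstruction_loss(pred, target, freq_weights, snr)
```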