Keywords: Indoor localization, Self-supervised, Pre-training, Channel State Information
Abstract: Indoor localization with WiFi Channel State Information (CSI) requires models that can generalize across diverse deployment conditions, yet collecting large amounts of high-quality labeled data is costly and often impractical. Pre-training offers a promising solution, but conventional masked modeling is not directly suitable for CSI signals: it tends to produce unstable representations in unmasked regions, fails to preserve long-range channel correlations, and remains highly sensitive to variations in access point layouts and propagation environments. To address these issues, we propose an autoregressive-enhanced masked pre-training (AEMP) framework. AEMP employs a hierarchical Transformer architecture in which spatial subnetworks perform masked reconstruction to capture local channel features, while a temporal network enforces consistency through autoregressive prediction. In addition, multi-view fusion and span masking improve robustness under dynamic deployment conditions. Extensive experiments demonstrate that AEMP yields stable and transferable representations, achieving superior performance and strong generalization on downstream indoor localization tasks. To the best of our knowledge, this is the first pre-training framework for wireless sensing that integrates temporal prediction to complement masked reconstruction.
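The span masking mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the mask ratio, the span length, or the axis that is masked, so `span_mask`, `apply_mask`, and all of their parameters below are hypothetical choices made only for illustration, not the authors' implementation. The idea shown is generic: mask contiguous runs of positions rather than isolated ones, so that reconstruction cannot rely on immediate neighbours.

```python
import random


def span_mask(length, mask_ratio=0.3, span_len=4, seed=None):
    """Return a list of booleans; True marks a masked position.

    Contiguous spans of `span_len` positions are masked until at least
    int(length * mask_ratio) positions are covered. Because whole spans
    are added, the final count may exceed the target by up to span_len - 1.
    All parameter values are illustrative assumptions.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    rng = random.Random(seed)
    span_len = max(1, min(span_len, length))
    target = int(length * mask_ratio)
    mask = [False] * length
    while sum(mask) < target:
        # Pick a span start so that the span fits inside the sequence.
        start = rng.randrange(0, length - span_len + 1)
        for i in range(start, start + span_len):
            mask[i] = True
    return mask


def apply_mask(values, mask, fill=0.0):
    """Replace masked entries with `fill` (a stand-in for a mask token)."""
    return [fill if m else v for v, m in zip(values, mask)]


# Example: mask a toy sequence of 64 per-subcarrier amplitudes.
amplitudes = [float(i) for i in range(64)]
mask = span_mask(len(amplitudes), mask_ratio=0.3, span_len=4, seed=0)
masked = apply_mask(amplitudes, mask)
```

In a masked-reconstruction setup, the model would receive `masked` as input and be trained to recover the original values at the positions where `mask` is true.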
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 12760