AEMP: Autoregressive-Enhanced Masked Pre-training for Robust Indoor Localization

18 Sept 2025 (modified: 15 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Indoor localization, Self-supervised, Pre-training, Channel State Information
Abstract: The major obstacle to learning-based Channel State Information (CSI) localization is obtaining a high-quality, large-scale annotated dataset. Unlike visual data, which human workers can annotate easily, CSI is a non-intuitive and non-interpretable RF signal, making annotation both time-consuming and labor-intensive. Given the potential of self-supervised learning to reduce reliance on labeled data, masked reconstruction has emerged as a promising alternative. However, directly applying existing designs to large-scale CSI scenarios faces unique challenges: unstable representations in unmasked regions, an inability to preserve long-range channel correlations, and high sensitivity to variations in access-point layouts and propagation environments. To address these issues, we propose an autoregressive-enhanced masked pre-training (AEMP) framework. AEMP employs a hierarchical Transformer architecture in which spatial subnetworks perform masked reconstruction to capture local channel features, while a temporal network enforces consistency through autoregressive prediction. In addition, multi-view fusion and span masking improve robustness under dynamic deployment conditions. Extensive experiments demonstrate that AEMP yields stable and transferable representations, achieving superior performance and strong generalization on downstream indoor localization tasks. To the best of our knowledge, this is the first pre-training framework for wireless sensing that integrates temporal prediction to complement masked reconstruction.
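The span masking mentioned in the abstract (hiding contiguous blocks of the CSI sequence rather than isolated positions, so the model must recover longer-range channel structure) can be sketched as below. The paper does not give implementation details, so the helper name, mask ratio, and geometric span-length distribution here are illustrative assumptions, not the authors' method:

```python
import numpy as np

def span_mask(seq_len, mask_ratio=0.3, mean_span=4, seed=None):
    """Hypothetical span-masking helper: returns a boolean mask over a
    length-`seq_len` CSI sequence in which roughly `mask_ratio` of the
    positions are hidden as contiguous spans (geometric span lengths),
    rather than as independently sampled single positions."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(seq_len, dtype=bool)
    target = int(seq_len * mask_ratio)
    while mask.sum() < target:
        # Draw a span length (mean ~ mean_span) and a random start position.
        span = min(int(rng.geometric(1.0 / mean_span)), seq_len)
        start = int(rng.integers(0, seq_len))
        mask[start:start + span] = True
    return mask

mask = span_mask(64, mask_ratio=0.3, seed=0)
```

A masked-reconstruction loss would then be computed only on the positions where `mask` is `True`, while the unmasked positions are fed to the encoder unchanged.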
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 12760