Keywords: WiFi-based HPE
TL;DR: This paper tackles two under-explored challenges in WiFi-based HPE by introducing DT-Pose, a novel framework tailored to sparse, continuous WiFi signals that lack explicit pose priors, offering a promising non-invasive HPE solution for the emerging AIoT era.
Abstract: Robust WiFi-based human pose estimation (HPE) is a challenging task that bridges discrete and subtle WiFi signals to human skeletons. We revisit this problem and reveal two critical yet overlooked issues: 1) a cross-domain gap, i.e., significant discrepancies in pose distributions between source and target domains; and 2) a structural fidelity gap, i.e., predicted skeletons exhibit distorted topology, typically with misplaced joints and disproportionate bone lengths. This paper fills these gaps by reformulating the task into a novel two-phase framework dubbed $\textit{\textbf{DT-Pose}}$: $\underline{\textit{\textbf{D}}}$omain-consistent representation learning and $\underline{\textit{\textbf{T}}}$opology-constrained $\underline{\textit{\textbf{Pose}}}$ decoding. Concretely, we first propose a temporal-consistency contrastive learning strategy with uniformity regularization, integrated into a self-supervised masked pretraining paradigm. This design facilitates robust learning of domain-consistent and motion-discriminative WiFi representations while mitigating the mode collapse that signal sparsity can induce. Beyond this, we introduce an effective hybrid decoding architecture that incorporates explicit skeletal topology constraints. By compensating for the inherent absence of spatial priors in WiFi semantic vectors, the decoder enables structured modeling of both adjacent and overarching joint relationships, producing more realistic pose predictions. Extensive experiments on various benchmark datasets demonstrate the superior performance of our method in tackling these fundamental challenges in 2D/3D WiFi-based HPE. The code is available in the supplementary materials.
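As a rough illustration of the pretraining objective named in the abstract (contrastive learning with uniformity regularization), the sketch below combines a standard InfoNCE-style temporal contrastive term with the uniformity regularizer of Wang & Isola (2020). It is a minimal sketch under assumed tensor shapes, temperatures, loss weights, and function names; it is not the authors' implementation.

```python
# Minimal sketch, not the authors' code: temporal-consistency contrastive loss
# plus a uniformity regularizer, as generic instances of the techniques named
# in the abstract. All hyperparameters and names here are assumptions.
import torch
import torch.nn.functional as F


def temporal_contrastive_loss(z_a: torch.Tensor,
                              z_b: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss pulling together embeddings of two temporally
    consistent (e.g., differently masked) views of the same WiFi segment.

    z_a, z_b: (batch, dim) embeddings; matching rows are positive pairs.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature            # (batch, batch) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)


def uniformity_loss(z: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Uniformity regularizer (Wang & Isola, 2020): spreads embeddings over
    the unit hypersphere, which helps counter mode collapse on sparse signals."""
    z = F.normalize(z, dim=-1)
    sq_dists = torch.pdist(z, p=2).pow(2)           # pairwise squared distances
    return sq_dists.mul(-t).exp().mean().log()


def pretrain_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                  lam: float = 0.5) -> torch.Tensor:
    """Combined objective; the weighting `lam` is an illustrative assumption."""
    contrastive = temporal_contrastive_loss(z_a, z_b)
    uniform = 0.5 * (uniformity_loss(z_a) + uniformity_loss(z_b))
    return contrastive + lam * uniform
```

In this reading, the contrastive term encourages domain-consistent, motion-discriminative embeddings across views of the same clip, while the uniformity term penalizes embeddings collapsing to a narrow region of the hypersphere; the exact formulation used by DT-Pose is specified in the paper itself.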
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16291