From Sparse to Dense: Learning to Construct 3D Human Meshes from WiFi

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Wi-Fi, Human Mesh Regression, Transformer
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: In this work, we propose WiMTR, the first method for regressing multi-person meshes from WiFi signals, along with a corresponding CSI-Mesh dataset.
Abstract: Estimating the pose and shape of multiple individuals in a scene is a challenging problem. While significant progress has been made using sensors such as RGB cameras and radars, recent research has shown the potential of WiFi signals for pose estimation tasks. WiFi signals offer advantages such as resilience to obstructions, independence from lighting conditions, and cost-effectiveness. This raises the question of whether the sparse Channel State Information (CSI) of WiFi signals, with its limited size, can be used to regress dense multi-person meshes. In this paper, we introduce WiMTR (WiFi-based Mesh regression TRansformer), a novel end-to-end model for multi-person mesh regression from WiFi signals. WiMTR comprises four key components: a CSI feature extractor, a CSI feature encoder, a coarse decoder, and a refine decoder. The CSI feature extractor captures channel-wise features, while the CSI feature encoder extracts global features through internal interactions. The coarse decoder regresses initial parameters from randomly initialized queries, and the refine decoder further improves the pose and shape parameters through a differentiation-based query generation approach. To facilitate our research, we curate a dataset specifically for multi-person mesh regression from CSI signals. The dataset consists of 171,183 frames, encompassing diverse scenes and multi-person scenarios. WiMTR achieves competitive results, with a Mean Per Joint Position Error (MPJPE) of 71.4 mm, a Procrustes-Aligned MPJPE (PA-MPJPE) of 29.7 mm, and a Procrustes-Aligned Vertex Error (PVE) of 57.3 mm. WiMTR represents the first WiFi-based multi-person mesh regression framework, and we plan to release the code and dataset to facilitate further research in this area.
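The abstract describes a four-stage pipeline (feature extractor → encoder → coarse decoder → refine decoder). The following numpy sketch illustrates only the shape flow of such a design; all dimensions (antenna/subcarrier counts, feature width, query count) and the SMPL-style 72+10 parameter split are assumptions for illustration, not values from the paper, and the attention blocks are untrained stand-ins for the actual transformer layers.

```python
import numpy as np

# Hypothetical dimensions -- the abstract does not specify any of these.
N_RX, N_TX, N_SUB = 3, 3, 30     # receive/transmit antennas, subcarriers per CSI frame
D = 64                           # feature dimension
N_QUERIES = 10                   # maximum number of people queried per scene
POSE_DIM, SHAPE_DIM = 72, 10     # SMPL-style pose/shape parameter sizes (assumed)

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def csi_feature_extractor(csi):
    """Channel-wise features: project each of the N_RX*N_TX channels to D dims."""
    W = rng.standard_normal((N_SUB, D)) * 0.1
    return csi.reshape(N_RX * N_TX, N_SUB) @ W           # (channels, D)

def csi_feature_encoder(feats):
    """Stand-in for self-attention: mix features across channels ("internal interactions")."""
    attn = softmax(feats @ feats.T / np.sqrt(D))         # (channels, channels)
    return attn @ feats                                  # (channels, D)

def coarse_decoder(memory, queries):
    """Cross-attend person queries to encoded CSI, then regress SMPL-style params."""
    attn = softmax(queries @ memory.T / np.sqrt(D))      # (N_QUERIES, channels)
    ctx = attn @ memory                                  # (N_QUERIES, D)
    W_out = rng.standard_normal((D, POSE_DIM + SHAPE_DIM)) * 0.1
    return ctx @ W_out                                   # (N_QUERIES, pose+shape)

def refine_decoder(memory, coarse_params):
    """Generate new queries from the coarse estimates and decode a second time."""
    W_q = rng.standard_normal((POSE_DIM + SHAPE_DIM, D)) * 0.1
    refined_queries = coarse_params @ W_q                # params -> queries
    return coarse_decoder(memory, refined_queries)       # refined (N_QUERIES, pose+shape)

# One forward pass through the sketched pipeline.
csi = rng.standard_normal((N_RX, N_TX, N_SUB))           # one CSI frame
memory = csi_feature_encoder(csi_feature_extractor(csi))
coarse = coarse_decoder(memory, rng.standard_normal((N_QUERIES, D)))
refined = refine_decoder(memory, coarse)
```

In a real DETR-style regressor each decoder stage would also predict a per-query confidence so that empty queries can be discarded; that is omitted here for brevity.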
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3553