From Radar to Depth: Multi-Modality Co-Learning for High-Resolution Human Pose Estimation

Published: 2024 · Last Modified: 25 Jan 2025 · SmartIoT 2024 · CC BY-SA 4.0
Abstract: The rapid development of the Internet of Things (IoT) drives many applications in smart cities, homes, wearables, and healthcare. Human-Centric Sensing (HCS) is a crucial part of IoT systems for improving awareness of humans, and human pose estimation plays a key role in it. However, existing camera-based approaches face limitations under poor lighting, occlusion, and privacy concerns, while radio frequency-based technologies such as mmWave radar often struggle with noisy and sparse data. To overcome these limitations, multi-modal learning that integrates data from multiple sensors can enhance both accuracy and robustness. Challenges remain, however, when one sensor fails or its data degrades. This work introduces a multi-modal co-learning framework that improves human pose estimation by converting sparse radar point clouds into high-resolution depth images, reducing noise and enriching the data. Our contributions include a robust co-learning framework for human pose estimation, a method for generating depth images from radar data, and a new loss function that improves image quality by capturing key motion information.
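The abstract does not detail how radar point clouds are rasterized into depth images; as a rough illustration of the general idea, a minimal sketch (with assumed grid size, coordinate ranges, and a nearest-return policy, none of which are specified in the paper) might look like:

```python
import numpy as np

def radar_points_to_depth_image(points, height=64, width=64,
                                x_range=(-1.0, 1.0), y_range=(0.0, 2.0)):
    """Rasterize a sparse radar point cloud of shape (N, 3), with rows
    (x, y, depth), onto a fixed image grid, keeping the nearest return
    per pixel. All parameters here are illustrative assumptions, not
    the paper's actual method."""
    depth = np.full((height, width), np.inf)
    # Map metric x/y coordinates to integer pixel indices.
    cols = ((points[:, 0] - x_range[0]) / (x_range[1] - x_range[0])
            * (width - 1)).astype(int)
    rows = ((points[:, 1] - y_range[0]) / (y_range[1] - y_range[0])
            * (height - 1)).astype(int)
    # Discard points that fall outside the image bounds.
    valid = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    for r, c, z in zip(rows[valid], cols[valid], points[valid, 2]):
        depth[r, c] = min(depth[r, c], z)  # keep the nearest return
    depth[np.isinf(depth)] = 0.0  # empty pixels become background
    return depth
```

A learned model (as in the paper's framework) would replace this fixed rasterization with a network that fills in dense, high-resolution depth from the sparse returns.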