Can-Cap: Calibration-Free and Noise-Resilient Human Motion Capture via LiDAR-Camera Integration

14 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Motion Capture
Abstract: We propose $\textbf{Can-Cap}$, a $\underline{\textbf{Ca}}$libration-Free and $\underline{\textbf{N}}$oise-Resilient 3D human motion $\textbf{Cap}$ture framework that integrates multi-modal data from LiDAR and camera. While multi-modal sensors provide richer information than single-modal sensors, most existing approaches rely on pre-calibration for cross-sensor alignment, which propagates errors, especially when sensors have varying or dynamically changing perspectives. This reliance also requires fixed sensor placement with highly overlapping views, limiting flexibility and diminishing the benefits of diverse viewpoints for handling occlusions. Furthermore, prior methods often degrade under substantial noise or partial sensor failures, conditions common in real-world scenarios. To address these challenges, $\textbf{Can-Cap}$ introduces a Unified Across-Sensor Motion Estimator that reconstructs local pose and shape in a human-centric space without calibration between sensors, supporting a flexible number of sensors, and a Noise-Resistant Trajectory Tracker that maintains robustness under severe point cloud noise through iterative refinement. These calibration-free and noise-resilient features make $\textbf{Can-Cap}$ more practical for real-world deployment. Notably, operating in real time at 25 FPS, $\textbf{Can-Cap}$ achieves state-of-the-art results on Human-M3 and FreeMotion, as well as strong cross-domain performance on LiDARHuman and RELI11D. This combination of flexibility and robustness opens new opportunities for motion capture in real-world scenarios, e.g., sports analytics, field robotics, and large-scale immersive environments.
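The abstract does not specify how the Noise-Resistant Trajectory Tracker's iterative refinement works, so the sketch below is only a minimal, hypothetical illustration of the general idea of iteratively refining a per-frame root position from a noisy point cloud; the function name `refine_root` and all parameters are assumptions, not Can-Cap's actual method.

```python
# Hypothetical sketch of iterative refinement under point-cloud noise.
# NOT Can-Cap's Noise-Resistant Trajectory Tracker (unspecified in the
# abstract); it only illustrates the general technique the abstract names.
import numpy as np

def refine_root(points: np.ndarray, n_iters: int = 5,
                inlier_radius: float = 0.6) -> np.ndarray:
    """Estimate a robust 3D root position from an (N, 3) point cloud.

    Alternates between (1) selecting inliers within `inlier_radius` of
    the current estimate and (2) re-estimating the root as the median
    of those inliers, so isolated noise points are discarded.
    """
    estimate = np.median(points, axis=0)  # robust initialization
    for _ in range(n_iters):
        dists = np.linalg.norm(points - estimate, axis=1)
        inliers = points[dists < inlier_radius]
        if len(inliers) == 0:  # all points rejected: keep last estimate
            break
        estimate = np.median(inliers, axis=0)
    return estimate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    body = rng.normal(loc=[1.0, 2.0, 0.9], scale=0.2, size=(500, 3))
    noise = rng.uniform(low=-5.0, high=5.0, size=(100, 3))  # severe outliers
    cloud = np.vstack([body, noise])
    print(refine_root(cloud))  # ~[1.0, 2.0, 0.9] despite ~17% outliers
```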
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5080