Abstract: Human motion prediction is crucial for human-centric multimedia understanding and interaction. Current methods typically rely on ground-truth human poses as observed input, which is impractical in real-world scenarios where only raw visual sensor data is available. Deploying these methods in practice therefore requires a preliminary pose estimation stage. However, such a two-stage approach often degrades performance because errors accumulate from one stage to the next.
Moreover, reducing raw visual data to sparse keypoint representations significantly diminishes the density of information, resulting in the loss of fine-grained features. In this paper, we propose LiDAR-HMP, the first single-LiDAR-based 3D human motion prediction approach, which receives the raw LiDAR point cloud as input and forecasts future 3D human poses directly. Building upon our novel structure-aware body feature descriptor, LiDAR-HMP adaptively maps the observed motion manifold to future poses and effectively models the spatial-temporal correlations of human motions for further refinement of prediction results. Extensive experiments show that our method achieves state-of-the-art performance on two public benchmarks and demonstrates remarkable robustness and efficacy in real-world deployments.
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: The International Workshop on Human-Centric Multimedia Analysis has been part of the ACM MM conference since 2020. Recognizing the growing attention to human-centric studies at ACM MM, our paper addresses human motion prediction, a pivotal task for human-centric multimedia understanding and interaction. By accurately predicting future human movements from past motion sequences, our work enables the creation of multimedia content that is more intuitive, responsive, and engaging. By combining the depth and precision of LiDAR point clouds with a new prediction algorithm, our work presents a robust solution that performs well across diverse lighting conditions and environments. This advancement broadens the applicability of multimedia systems in real-world scenarios. Numerous research papers on point cloud processing have been accepted by the ACM MM conference (27 papers in 2023, 15 in 2022, and 14 in 2021), underscoring the community's growing recognition of its value for the field of multimedia. We believe our research contributes to this line of work by advancing the understanding of, and interaction with, human dynamics through multimedia technologies.
Supplementary Material: zip
Submission Number: 1644