Abstract: Fine-tuning directly on user-side data is an effective way to improve the performance of widely adopted data-driven human mesh recovery (HMR) models. However, doing so with existing HMR methods typically requires aggregating user-side data from real-world HMR-deployed devices on a central training server, which poses a significant privacy risk because sensitive human images must be transmitted. How can human mesh recovery be evaluated and improved in privacy-constrained real-world settings? This paper is a benchmark study of this problem. We train state-of-the-art HMR models under federated and secure-aggregation variants that avoid centralizing raw images under explicit threat-model assumptions, and we evaluate their performance across a wide array of realistic client-scale and data-heterogeneity settings. We document that common HMR training pipelines are built around centralized data access and quantify how representative HMR backbones behave when that assumption is removed. Furthermore, to study the data bottleneck of privacy-constrained HMR training, we propose a local annotation and fine-tuning pipeline enhanced with depth foundation models, which allows collaboratively trained HMR models to be tailored locally to each end user's data distribution. We demonstrate its effectiveness on in-the-wild data, noting that the reported personalization gains are measured against DePoser-generated pseudo-ground truth. This benchmark aims to support future work on privacy-constrained HMR models and their real-world deployment and evaluation.
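To make "avoiding raw-image centralization" concrete, here is a minimal sketch of one federated round with FedAvg-style weighting and pairwise additive masking, the core idea behind common secure-aggregation protocols. It is an illustration under stated assumptions, not the paper's protocol: the abstract does not specify the exact aggregation scheme. All function names (`local_update`, `mask_updates`, `server_aggregate`, `fedavg`) are hypothetical, toy 2-D vectors stand in for HMR model parameters, and a single seeded RNG simulates masks that real protocols derive from pairwise key agreement between clients.

```python
import random


def local_update(weights, grad, lr=0.1):
    # Hypothetical client step: one gradient step computed on private data.
    # Only the resulting parameters leave the device, never the images.
    return [w - lr * g for w, g in zip(weights, grad)]


def fedavg(updates, num_samples):
    # Reference (non-private) FedAvg: sample-count-weighted average of updates.
    total = sum(num_samples)
    dim = len(updates[0])
    return [sum(n / total * u[k] for u, n in zip(updates, num_samples))
            for k in range(dim)]


def mask_updates(updates, num_samples, seed=0):
    # Each pair (i, j), i < j, shares a random mask m_ij: client i adds it and
    # client j subtracts it, so every mask cancels in the server-side sum.
    # Assumption: a shared seeded RNG stands in for pairwise key agreement.
    rng = random.Random(seed)
    n_clients, dim = len(updates), len(updates[0])
    total = sum(num_samples)
    masks = [[0.0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            m = [rng.uniform(-1e3, 1e3) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += m[k]
                masks[j][k] -= m[k]
    # Each client pre-scales its update by its data share, then adds its mask.
    return [[n / total * u + mk for u, mk in zip(upd, mask)]
            for upd, n, mask in zip(updates, num_samples, masks)]


def server_aggregate(masked):
    # The server sees only masked vectors; summing them cancels the masks
    # (up to floating-point rounding) and yields the weighted average.
    return [sum(col) for col in zip(*masked)]


if __name__ == "__main__":
    global_w = [0.0, 0.0]
    grads = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]  # toy per-client gradients
    sizes = [100, 50, 50]                            # toy per-client sample counts
    client_ws = [local_update(global_w, g) for g in grads]
    print("plain FedAvg: ", fedavg(client_ws, sizes))
    print("secure FedAvg:", server_aggregate(mask_updates(client_ws, sizes)))
```

Both printed vectors should agree up to floating-point rounding, while each individual masked vector the server receives is dominated by random noise. Real secure-aggregation protocols additionally handle client dropout and use modular integer arithmetic rather than floats; this sketch omits both.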
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: N/A. This is the first submission of this manuscript to TMLR.
Assigned Action Editor: ~Farzan_Farnia1
Submission Number: 9117