Abstract: In this work, we study the ambiguity problem in unsupervised 3D human pose estimation from its 2D counterpart. On one hand, without explicit annotation, the scale of the 3D pose is difficult to capture accurately (scale ambiguity). On the other hand, one 2D pose might correspond to multiple 3D gestures, so the lifting procedure is inherently ambiguous (pose ambiguity). Previous methods generally use temporal constraints (e.g., constant bone length and motion smoothness) to alleviate these issues. However, these methods commonly force the outputs to satisfy multiple training objectives simultaneously, which often leads to sub-optimal results. In contrast to the majority of previous works, we propose to split the whole problem into two sub-tasks, i.e., optimizing the 2D input poses via a scale estimation module and then mapping the optimized 2D poses to their 3D counterparts via a pose lifting module. Furthermore, two temporal constraints are proposed to alleviate the scale and pose ambiguity, respectively. These two modules are optimized via an iterative training scheme with the corresponding temporal constraints, which effectively reduces the learning difficulty and leads to better performance. Results on the Human3.6M dataset demonstrate that our approach improves upon the prior art by 23.1% and also outperforms several weakly supervised approaches that rely on 3D annotations. Our project is available at https://sites.google.com/view/ambiguity-aware-hpe.
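To make the two temporal constraints named above (constant bone length and motion smoothness) concrete, the following is a minimal NumPy sketch of how such penalties are commonly written. It is an illustration under stated assumptions, not the losses used in this work: the function names `bone_length_variance` and `motion_smoothness`, the `(T, J, 3)` array layout, and the `bones` list of `(parent, child)` joint indices are all hypothetical.

```python
import numpy as np


def bone_length_variance(poses_3d, bones):
    """Penalize bone lengths that change over time.

    poses_3d: array of shape (T, J, 3), a sequence of T 3D poses with J joints.
    bones: list of (parent, child) joint-index pairs (hypothetical skeleton).
    """
    parents = np.array([p for p, _ in bones])
    children = np.array([c for _, c in bones])
    # Per-frame length of every bone, shape (T, B).
    lengths = np.linalg.norm(poses_3d[:, children] - poses_3d[:, parents], axis=-1)
    # Variance of each bone's length across time, averaged over bones.
    return float(lengths.var(axis=0).mean())


def motion_smoothness(poses_3d):
    """Penalize large frame-to-frame joint displacement."""
    # Finite-difference velocity between consecutive frames, shape (T-1, J, 3).
    velocity = np.diff(poses_3d, axis=0)
    # Mean squared displacement per joint per frame transition.
    return float((velocity ** 2).sum(axis=-1).mean())
```

A sequence that only translates rigidly gives a bone-length penalty of zero, while a sequence whose scale drifts from frame to frame does not, which is why a constraint of this kind is relevant to scale ambiguity.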