Abstract: We address the problem of 3D human motion estimation from raw optical MoCap markers. Raw markers are noisy, disordered, and unlabeled, so recovering 3D human motion from them is non-trivial. Existing approaches are either time-consuming or assume that marker labels are known. We address these problems with an end-to-end method for 3D human motion estimation that leverages the Transformer's capacity to model long-range dependencies. The method takes raw markers as input and predicts joint poses with a Transformer-like architecture. Experimental results show that our method achieves sub-centimeter errors.
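To make the described pipeline concrete, here is a minimal sketch of how a Transformer-like model could map an unordered set of raw 3D markers directly to joint positions. This is an illustrative assumption in PyTorch, not the authors' actual architecture: the class name, dimensions, and the joint-query decoding scheme are hypothetical. Omitting positional encodings over markers keeps the encoder insensitive to marker order, matching the unlabeled, disordered input setting.

```python
# Illustrative sketch only -- not the paper's architecture.
# All names, layer sizes, and the joint-query design are assumptions.
import torch
import torch.nn as nn

class MarkerPoseTransformer(nn.Module):
    """Maps an unordered set of 3D markers to 3D joint positions."""
    def __init__(self, n_joints=24, d_model=128, n_heads=8, n_layers=4):
        super().__init__()
        # Per-marker embedding; no positional encoding is added, so the
        # encoder treats markers as an unordered (unlabeled) set.
        self.marker_embed = nn.Linear(3, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # One learned query per joint; the decoder cross-attends from
        # joint queries to the encoded marker set.
        self.joint_queries = nn.Parameter(torch.randn(n_joints, d_model))
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.head = nn.Linear(d_model, 3)  # regress a 3D position per joint

    def forward(self, markers):  # markers: (batch, n_markers, 3)
        memory = self.encoder(self.marker_embed(markers))
        queries = self.joint_queries.unsqueeze(0).expand(markers.size(0), -1, -1)
        return self.head(self.decoder(queries, memory))  # (batch, n_joints, 3)

# Usage: a batch of 2 frames, each with 50 noisy, unordered markers.
model = MarkerPoseTransformer()
poses = model(torch.randn(2, 50, 3))
print(poses.shape)  # torch.Size([2, 24, 3])
```

One property worth noting in this sketch: because attention is permutation-equivariant and no marker ordering is injected, the prediction is invariant to how the markers happen to be listed, which is what allows the model to work without marker labels.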