Abstract: Estimating the poses of objects that interact with hands is a key task for tangible user interfaces. It is highly challenging due to the inherent self- and mutual occlusions. Previous approaches often predict 2D object keypoints from features to establish 2D-3D correspondences during object pose estimation. However, the features for the object and hand are usually intermixed, leading to unreliable output keypoints and inaccurate object pose estimation. To address this issue, we propose a novel Boundary-guided Network (BG-Net). This network uses two cooperative branches for the object and hand. It can effectively capture the object region and uses this region as guidance to narrow down the search area for keypoints. Additionally, we introduce an efficient and effective loss function, the min-max boundary distance (MMBD) loss, which restricts the range of estimated keypoint locations and further benefits the 2D-3D mapping. Experiments demonstrate that the proposed model outperforms related state-of-the-art methods for object pose estimation on multiple interactive hand-object benchmarks.
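The abstract does not specify the exact form of the MMBD loss; the sketch below illustrates one plausible interpretation, assuming the loss linearly penalizes predicted 2D keypoints that fall outside the per-axis min-max bounds of the detected object boundary. The function name and signature are hypothetical, not taken from the paper.

```python
def mmbd_loss(keypoints, box_min, box_max):
    """Hypothetical sketch of a min-max boundary distance penalty.

    keypoints : list of (x, y) predicted 2D keypoint locations
    box_min   : (x_min, y_min) of the object boundary region
    box_max   : (x_max, y_max) of the object boundary region

    Keypoints inside the boundary incur zero cost; keypoints outside
    are penalized by their per-axis distance to the nearest bound.
    """
    total = 0.0
    for x, y in keypoints:
        for v, lo, hi in ((x, box_min[0], box_max[0]),
                          (y, box_min[1], box_max[1])):
            # Distance below the lower bound plus distance above the upper bound
            total += max(lo - v, 0.0) + max(v - hi, 0.0)
    return total / max(len(keypoints), 1)
```

Such a term would restrict the estimated keypoints to the object region identified by the boundary branch, which is consistent with the role the abstract assigns to the MMBD loss.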