Introducing Pose Consistency and Warp-Alignment for Self-Supervised 6D Object Pose Estimation in Color Images

Abstract: Most successful approaches to estimating the 6D pose of an object train a neural network by supervising the learning with annotated poses in real-world images. These annotations are generally expensive to obtain, and a common workaround is to generate and train on synthetic scenes, with the drawback of limited generalisation when the model is deployed in the real world. This work proposes a two-stage 6D object pose estimation framework that can be applied on top of existing neural-network-based approaches and does not require pose annotations on real images. The first, self-supervised stage enforces pose consistency between rendered predictions and real input images, narrowing the gap between the two domains. The second stage fine-tunes the previously trained model by enforcing photometric consistency between pairs of different object views, where one image is warped and aligned to match the view of the other, thus enabling their comparison. In the absence of both real image annotations and depth information, applying the proposed framework on top of two recent approaches yields state-of-the-art performance compared to methods trained only on synthetic data, domain adaptation baselines and a concurrent self-supervised approach on the LINEMOD, LINEMOD OCCLUSION and HomebrewedDB datasets.
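The warp-alignment idea in the second stage can be illustrated with a minimal sketch: one view is resampled onto the pixel grid of the other, and a photometric loss compares the two aligned images. The warp grid, the nearest-neighbour sampling and the toy shifted images below are illustrative assumptions, not the paper's actual implementation (which would derive the warp from the predicted 6D poses).

```python
import numpy as np

def warp_image(image, grid):
    """Sample `image` at the (row, col) coordinates in `grid` using
    nearest-neighbour lookup; out-of-bounds coordinates are clamped."""
    h, w = image.shape[:2]
    rows = np.clip(np.round(grid[..., 0]).astype(int), 0, h - 1)
    cols = np.clip(np.round(grid[..., 1]).astype(int), 0, w - 1)
    return image[rows, cols]

def photometric_loss(img_a, img_b_warped, mask=None):
    """Mean absolute intensity difference, optionally restricted to a
    visibility mask (pixels where the warp is valid)."""
    diff = np.abs(img_a.astype(float) - img_b_warped.astype(float))
    if mask is not None:
        diff = diff[mask]
    return diff.mean()

# Toy example: view B is view A shifted right by 2 pixels.
h, w = 8, 8
img_a = np.tile(np.arange(w, dtype=float), (h, 1))
img_b = np.roll(img_a, 2, axis=1)

# Warp grid mapping each pixel of A to its location in B (+2 columns).
rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
grid = np.stack([rows, cols + 2], axis=-1)

img_b_warped = warp_image(img_b, grid)
mask = grid[..., 1] < w          # discard pixels that fall outside B
loss = photometric_loss(img_a, img_b_warped, mask)
```

With a correct warp, the masked loss is zero; a pose error would misalign the resampled view and raise the loss, which is the signal the self-supervised fine-tuning exploits.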