Abstract: The practicality of 3D object pose estimation remains limited for many applications due to the need for prior knowledge of a 3D model and a training period for new objects. To address this limitation, we propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images, without prior knowledge of the object's 3D model and without requiring training time for new objects and categories. We achieve this by training a model to directly predict discriminative embeddings for viewpoints surrounding the object. This prediction is done with a simple U-Net architecture with attention, conditioned on the desired pose, which yields extremely fast inference. We compare our approach to state-of-the-art methods and show that it outperforms them in both accuracy and robustness.
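To make the described architecture concrete, the following is a minimal sketch, assuming a PyTorch implementation: a small U-Net with self-attention at the bottleneck, conditioned on a query-viewpoint code, that maps a single reference image to an embedding map for that viewpoint. The class name, embedding dimensionality, pose encoding, and layer sizes below are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class PoseConditionedUNet(nn.Module):
    """Tiny pose-conditioned U-Net with bottleneck self-attention (illustrative)."""
    def __init__(self, in_ch=3, base=64, pose_dim=4, emb_dim=128):
        super().__init__()
        # Encoder: one full-resolution stage and one downsampling stage
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(base, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        # Pose conditioning: project the viewpoint code and add it to bottleneck features
        self.pose_mlp = nn.Sequential(nn.Linear(pose_dim, base * 2), nn.ReLU(),
                                      nn.Linear(base * 2, base * 2))
        # Self-attention over spatial positions at the bottleneck
        self.attn = nn.MultiheadAttention(embed_dim=base * 2, num_heads=4, batch_first=True)
        # Decoder: upsample and fuse the skip connection
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(base, emb_dim, 1))

    def forward(self, image, pose):
        # image: (B, 3, H, W) reference view; pose: (B, pose_dim) desired-viewpoint code
        s1 = self.enc1(image)                            # (B, base, H, W)
        x = self.enc2(s1)                                # (B, 2*base, H/2, W/2)
        x = x + self.pose_mlp(pose)[:, :, None, None]    # inject the desired pose
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, h*w, C) spatial tokens
        tokens, _ = self.attn(tokens, tokens, tokens)    # bottleneck self-attention
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        x = self.up(x)                                   # back to (B, base, H, W)
        x = torch.cat([x, s1], dim=1)                    # skip connection
        return self.dec(x)                               # (B, emb_dim, H, W) viewpoint embedding
```

In this sketch, embeddings predicted for a set of candidate viewpoints would be compared (for example, by cosine similarity) against the embedding of the query image to retrieve the relative pose, consistent with the abstract's claim of requiring no 3D model and no per-object training.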