Abstract: Recent self-supervised models produce visual features that are effective at encoding not only image-level but also pixel-level semantics. They have been reported to obtain impressive results for dense visual semantic correspondence estimation, even outperforming fully-supervised methods. Nevertheless, these models still fail in the presence of challenging image characteristics such as symmetries and repeated parts. To address these limitations, we propose a new semantic correspondence estimation method that supplements state-of-the-art self-supervised features with 3D understanding via a weak geometric spherical prior. Compared to more involved 3D pipelines, our model provides a simple and effective way of injecting informative geometric priors into the learned representation while requiring only weak viewpoint information. We also propose a new evaluation metric that better accounts for repeated-part and symmetry-induced mistakes. We show that our method succeeds in distinguishing between symmetric views and repeated parts across many object categories in the challenging SPair-71k dataset and also in generalizing to previously unseen classes in the AwA dataset.