Breaking the Frame: Image Retrieval by Visual Overlap Prediction

Published: 01 Jan 2024, Last Modified: 11 Apr 2025CoRR 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Visual place recognition methods struggle with occlusions and partial visual overlaps. We propose a novel visual place recognition approach based on overlap prediction, called VOP, shifting from traditional reliance on global image similarities and local features to image overlap prediction. VOP proceeds co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone and establishing patch-to-patch correspondences without requiring expensive feature detection and matching. Our approach uses a voting mechanism to assess overlap scores for potential database images. It provides a nuanced image retrieval metric in challenging scenarios. Experimental results show that VOP leads to more accurate relative pose estimation and localization results on the retrieved image pairs than state-of-the-art baselines on a number of large-scale, real-world indoor and outdoor benchmarks. The code is available at https://github.com/weitong8591/vop.git.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview