Abstract: Visual relocalization addresses the problem of estimating the camera pose from which a given query image was taken. In this paper, we focus on a scene-independent approach, which maintains scene information in a database; the approach can thus adapt to a new scene simply by switching databases. The standard procedure in this approach is to first retrieve database images similar to the query, then match 2D keypoints of the query image to 3D points visible in the retrieved images, and finally solve the perspective-n-point (PnP) problem to estimate the pose. Recently, convolutional neural networks (CNNs) have been applied to the retrieval and matching steps of this pipeline, demonstrating promising accuracy and robustness. These CNNs, however, are trained separately for the two tasks, which can lead to suboptimal relocalization accuracy. In this paper, we propose the first CNN-based relocalization framework that is both scene-independent and end-to-end trainable. The framework jointly optimizes the retrieval and matching tasks for relocalization accuracy by backpropagating relocalization errors to both. We demonstrate the effectiveness of end-to-end training, robustness to new scenes, and state-of-the-art accuracy on indoor and outdoor datasets, with computation running in real time.
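The final step of the pipeline described above, solving the PnP problem from 2D-3D correspondences, can be illustrated with a minimal sketch. The code below is not from the paper; it is a generic Direct Linear Transform (DLT) solver in plain NumPy, assuming noise-free, pre-normalized (intrinsics-removed) correspondences. A production system would instead use a robust solver such as RANSAC-wrapped P3P.

```python
import numpy as np

def solve_pnp_dlt(pts3d, pts2d_norm):
    """Recover camera pose [R|t] from >= 6 exact 2D-3D correspondences.

    pts3d:      (N, 3) world points.
    pts2d_norm: (N, 2) normalized image points (K^{-1} already applied).
    Returns R (3, 3) and t (3,) with x ~ R X + t up to projection.
    """
    n = pts3d.shape[0]
    A = np.zeros((2 * n, 12))
    for i in range(n):
        X = np.append(pts3d[i], 1.0)   # homogeneous world point
        u, v = pts2d_norm[i]
        A[2 * i, 0:4] = X
        A[2 * i, 8:12] = -u * X
        A[2 * i + 1, 4:8] = X
        A[2 * i + 1, 8:12] = -v * X
    # Null vector of A gives the 3x4 projection matrix up to scale.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    R_raw, t_raw = P[:, :3], P[:, 3]
    # Project R_raw onto SO(3); fix the global sign via det(R) = +1
    # and rescale t with the same factor.
    U, S, Vt2 = np.linalg.svd(R_raw)
    scale = np.mean(S)
    R = U @ Vt2
    if np.linalg.det(R) < 0:
        R, scale = -R, -scale
    t = t_raw / scale
    return R, t

# Usage: synthesize a pose, project points, and recover the pose.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                      # make it a proper rotation
R_gt, t_gt = Q, np.array([0.1, -0.2, 4.0])
pts3d = rng.uniform(-1, 1, size=(10, 3))
cam = (R_gt @ pts3d.T).T + t_gt        # all points in front of camera
pts2d = cam[:, :2] / cam[:, 2:3]       # pinhole projection
R_est, t_est = solve_pnp_dlt(pts3d, pts2d)
```

With exact correspondences the null space of A is one-dimensional, so the DLT recovers the ground-truth pose to numerical precision; with noisy keypoint matches it serves only as an initialization.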