Abstract: Visual Place Recognition (VPR) in indoor environments benefits both humans and robots by improving localization and navigation. It is challenging due to appearance changes occurring at various frequencies and the difficulty of obtaining ground-truth metric trajectories for training and evaluation.
This paper introduces the NYC-Indoor-VPR dataset, a unique and rich collection of over 36,000 images compiled from 13 distinct crowded scenes in New York City, captured under varying lighting conditions and exhibiting appearance changes. Each scene was revisited multiple times across a year.
To establish ground truth for VPR, we propose a semi-automatic annotation approach that computes the positional information of each image. Specifically, our method takes pairs of videos as input and yields matched pairs of images along with their estimated relative locations. The accuracy of this matching is then refined by human annotators, who use our annotation software to correlate selected keyframes. Finally, we present a benchmark evaluation of several state-of-the-art VPR algorithms on our annotated dataset, revealing its challenging nature and thus its value for VPR research.