BoQ: A Place is Worth a Bag of Learnable Queries

Published: 2024 (last modified: 15 Nov 2025) · CVPR 2024 · CC BY-SA 4.0
Abstract: In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a set of global queries designed to capture universal place-specific attributes. Unlike existing techniques that employ self-attention and generate the queries directly from the input, BoQ employs distinct learnable global queries that probe the input features via cross-attention, ensuring consistent information aggregation. In addition, the technique provides an interpretable attention mechanism and integrates with both CNN and Vision Transformer backbones. The performance of BoQ is demonstrated through extensive experiments on 14 large-scale benchmarks. It consistently outperforms current state-of-the-art techniques, including NetVLAD, MixVPR, and EigenPlaces. Moreover, despite being a one-stage (global retrieval) technique, BoQ surpasses two-stage retrieval methods such as Patch-NetVLAD, TransVPR, and R2Former, while being orders of magnitude faster and more efficient. The code and model weights are publicly available at https://github.com/amaralibey/Bag-of-Queries.
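To make the mechanism concrete, below is a minimal PyTorch sketch of the core idea: a fixed set of learnable queries attends to backbone features via cross-attention and is flattened into a global descriptor. The module name, dimensions, and single-block structure are illustrative assumptions, not the authors' exact architecture; see the repository linked above for the official implementation.

```python
import torch
import torch.nn as nn

class BagOfQueries(nn.Module):
    """Sketch of a Bag-of-Queries aggregator (dims are hypothetical)."""

    def __init__(self, dim: int = 256, num_queries: int = 32, num_heads: int = 8):
        super().__init__()
        # Learnable global queries shared across all inputs; unlike
        # self-attention, they are NOT derived from the input features.
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor):
        # feats: (B, N, dim) flattened backbone features (CNN map or ViT tokens).
        q = self.queries.expand(feats.size(0), -1, -1)
        # Each learnable query probes the input via cross-attention; the
        # returned weights can be visualized as per-query attention maps,
        # which is what makes the mechanism interpretable.
        out, attn = self.cross_attn(q, feats, feats)
        out = self.norm(out)
        # Concatenate the attended queries into one global descriptor.
        return out.flatten(1), attn

# Usage: a batch of 2 images with 196 feature tokens of dimension 256.
feats = torch.randn(2, 196, 256)
desc, attn = BagOfQueries()(feats)
print(desc.shape)  # torch.Size([2, 8192])
```

Because the queries are parameters rather than input projections, every image is probed by the same "bag" of queries, which is what yields the consistent aggregation the abstract describes.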