ObjLoc: Indoor Camera Relocalization based on Open-Vocabulary Object-Level Mapping

18 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: visual camera relocalization, object-level mapping, open vocabulary
Abstract: Indoor visual relocalization plays a key role in emerging spatial and embodied AI applications. However, prior research has predominantly focused on methods based on low-level vision. Despite notable progress, these methods inherently struggle to capture scene semantics and compositions, limiting their interpretability and interactivity. To address this limitation, we propose ObjLoc, a camera relocalization system designed to provide an intuition of scene object compositions and accurate pose estimation, which can be seamlessly reused in high-level tasks. Specifically, leveraging recent foundation models, we first introduce a multi-modal strategy to integrate open-vocabulary semantic knowledge for effective 2D-3D object matching. Additionally, we design an object-oriented reference frame and a corresponding retrieval strategy for pose priors, enabling extension to scalable scenes. To ensure robust and accurate pose optimization, we also propose a novel dual-path 2D Iterative Closest Pixel loss guided by object geometry. Experimental results demonstrate that ObjLoc achieves superior relocalization performance across various datasets. Our source code will be released upon acceptance.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 12453
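For readers unfamiliar with ICP-style reprojection objectives such as the 2D Iterative Closest Pixel loss named in the abstract, the following is a minimal NumPy sketch of a generic closest-pixel residual for a candidate camera pose. The pinhole model, brute-force nearest-pixel association, and all function names here are illustrative assumptions; this is not ObjLoc's dual-path, object-geometry-guided formulation.

```python
# Hypothetical sketch of a 2D iterative-closest-pixel (ICP-style) residual.
# Assumptions: pinhole camera with intrinsics K, a candidate pose (R, t),
# and brute-force nearest-pixel association. This does NOT reproduce
# ObjLoc's dual-path, object-guided loss; it only illustrates the idea.
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 world points into the image with pose (R, t) and intrinsics K."""
    cam = points_3d @ R.T + t            # world frame -> camera frame
    cam = cam[cam[:, 2] > 1e-6]          # keep only points in front of the camera
    uv = cam @ K.T                       # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]        # perspective division -> Mx2 pixels

def icp2d_residual(points_3d, observed_pixels, K, R, t):
    """Mean distance from each projected point to its closest observed pixel."""
    proj = project(points_3d, K, R, t)                                   # Mx2
    d = np.linalg.norm(proj[:, None, :] - observed_pixels[None, :, :], axis=-1)
    return d.min(axis=1).mean()                                          # closest-pixel distance

# Usage: evaluate the residual for a candidate pose on toy data.
K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
pts = np.random.rand(100, 3) + np.array([0.0, 0.0, 2.0])   # toy 3D points in front of camera
R, t = np.eye(3), np.zeros(3)
obs = project(pts, K, R, t) + np.random.randn(100, 2)       # noisy pixel observations
print(icp2d_residual(pts, obs, K, R, t))
```

In an actual relocalization pipeline this residual would be minimized over (R, t), typically with a robust kernel and correspondences re-estimated at each iteration; the brute-force association above is only for clarity.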