ObjLoc: Indoor Camera Relocalization based on Open-Vocabulary Object-Level Mapping

18 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: visual camera relocalization, object-level mapping, open vocabulary
Abstract: Indoor visual relocalization plays a key role in emerging spatial and embodied AI applications. However, prior research has predominantly focused on methods based on low-level vision. Despite notable progress, these methods inherently struggle to capture scene semantics and compositions, limiting their interpretability and interactivity. To address this limitation, we propose ObjLoc, a camera relocalization system designed to provide an intuition of scene object compositions and accurate pose estimation, which can be seamlessly reused in high-level tasks. Specifically, leveraging recent foundation models, we first introduce a multi-modal strategy to integrate open-vocabulary semantic knowledge for effective 2D-3D object matching. Additionally, we design an object-oriented reference frame and a corresponding retrieval strategy for pose priors, enabling extension to scalable scenes. To ensure robust and accurate pose optimization, we also propose a novel dual-path 2D Iterative Closest Pixel loss guided by object geometry. Experimental results demonstrate that ObjLoc achieves superior relocalization performance across various datasets. Our source code will be released upon acceptance.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 12453
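For readers unfamiliar with ICP-style reprojection objectives such as the 2D Iterative Closest Pixel loss named in the abstract, the following is a minimal NumPy sketch of a generic closest-pixel residual for a candidate camera pose. The pinhole model, brute-force nearest-pixel association, and all function names here are illustrative assumptions; this is not ObjLoc's dual-path, object-geometry-guided formulation.

```python
# Hypothetical sketch of a 2D iterative-closest-pixel (ICP-style) residual.
# Assumptions: pinhole camera with intrinsics K, a candidate pose (R, t),
# and brute-force nearest-pixel association. This does NOT reproduce
# ObjLoc's dual-path, object-guided loss; it only illustrates the idea.
import numpy as np

def project(points_3d, K, R, t):
    """Project Nx3 world points into the image with pose (R, t) and intrinsics K."""
    cam = points_3d @ R.T + t            # world frame -> camera frame
    cam = cam[cam[:, 2] > 1e-6]          # keep only points in front of the camera
    uv = cam @ K.T                       # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]        # perspective division -> Mx2 pixels

def icp2d_residual(points_3d, observed_pixels, K, R, t):
    """Mean distance from each projected point to its closest observed pixel."""
    proj = project(points_3d, K, R, t)                                   # Mx2
    d = np.linalg.norm(proj[:, None, :] - observed_pixels[None, :, :], axis=-1)
    return d.min(axis=1).mean()                                          # closest-pixel distance

# Usage: evaluate the residual for a candidate pose on toy data.
K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
pts = np.random.rand(100, 3) + np.array([0.0, 0.0, 2.0])   # toy 3D points in front of camera
R, t = np.eye(3), np.zeros(3)
obs = project(pts, K, R, t) + np.random.randn(100, 2)       # noisy pixel observations
print(icp2d_residual(pts, obs, K, R, t))
```

In an actual relocalization pipeline this residual would be minimized over (R, t), typically with a robust kernel and correspondences re-estimated at each iteration; the brute-force association above is only for clarity.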