MUST-Loc: Multi-view Uncertainty-aware Semantic Token Association for Object-level Global Localization
Keywords: Generalizable perception/semantic understanding, Uncertainty estimation for learned/foundation models
TL;DR: MUST-Loc: multi-view uncertainty-aware semantic token association enables robust object-level global localization via mean–variance token descriptors and Wasserstein alignment.
Abstract: Object-level global localization is highly sensitive to semantic uncertainty arising from viewpoint variations in open-set scenarios.
To address this problem, we present MUST-Loc, a multi-view, uncertainty-aware semantic token association framework.
The key idea is to aggregate object-level tokens through online updates in the mapping process to form mean–variance descriptors, capturing viewpoint-induced variability while maintaining semantic consistency.
At query time, we compute an uncertainty-aware semantic similarity that down-weights high-variance token dimensions to establish reliable correspondences under semantic ambiguity.
Finally, the camera pose is estimated by selecting the solution that maximizes the Wasserstein-based alignment score between observed detections and projected landmark hypotheses.
For rigorous validation, we evaluate on challenging TUM RGB-D sequences featuring occlusions, label noise, and diverse object categories, and show consistent improvements over baselines in both association and pose accuracy.
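The abstract does not give the update or weighting formulas, but a minimal sketch of the mean–variance token descriptor and the uncertainty-aware similarity is shown below. It assumes a Welford-style online update and a simple inverse-variance weighting; the class and function names, and the exact weighting rule, are illustrative rather than the authors' implementation.

```python
import numpy as np

class TokenDescriptor:
    """Per-object running mean/variance over semantic tokens.

    Illustrative sketch: a Welford-style online update is assumed here;
    the paper's exact aggregation rule is not specified in the abstract.
    """
    def __init__(self, dim: int):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)  # running sum of squared deviations

    def update(self, token: np.ndarray) -> None:
        # Online update of mean and variance from one new view's token.
        self.n += 1
        delta = token - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (token - self.mean)

    @property
    def var(self) -> np.ndarray:
        return self.m2 / max(self.n - 1, 1)


def uncertainty_aware_similarity(query: np.ndarray, desc: TokenDescriptor,
                                 eps: float = 1e-6) -> float:
    """Cosine-style similarity that down-weights high-variance dimensions
    (one plausible instantiation of the abstract's description)."""
    w = 1.0 / (desc.var + eps)   # inverse-variance weights: noisy dims count less
    w /= w.sum()                 # normalize the weights
    num = float(np.sum(w * query * desc.mean))
    den = np.sqrt(np.sum(w * query ** 2)) * np.sqrt(np.sum(w * desc.mean ** 2)) + eps
    return num / den
```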
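For the final pose selection, the abstract only states that the pose maximizing a Wasserstein-based alignment score between observed detections and projected landmarks is chosen. The sketch below assumes detections and projected landmarks are represented as Gaussians with diagonal covariances, so the squared 2-Wasserstein distance has a simple closed form; `project_fn`, `match_fn`, and `sigma` are hypothetical stand-ins, not the paper's interfaces.

```python
import numpy as np

def w2_gaussian_diag(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between two Gaussians with diagonal
    covariances; s1 and s2 are variance vectors."""
    return float(np.sum((m1 - m2) ** 2) + np.sum((np.sqrt(s1) - np.sqrt(s2)) ** 2))

def alignment_score(observations, projected, matches, sigma=1.0):
    """Aggregate Wasserstein distances over matched (obs, landmark) pairs into
    an alignment score (higher is better). Gaussian kernel is an assumption."""
    score = 0.0
    for i, j in matches:
        d2 = w2_gaussian_diag(observations[i][0], observations[i][1],
                              projected[j][0], projected[j][1])
        score += np.exp(-d2 / (2.0 * sigma ** 2))
    return score

def select_pose(pose_candidates, observations, landmarks, project_fn, match_fn):
    """Pick the candidate pose that maximizes the alignment score between
    observed detections and landmarks projected under that pose.
    `project_fn(pose, lm)` -> (mean, var) and `match_fn` are hypothetical."""
    best_pose, best_score = None, -np.inf
    for pose in pose_candidates:
        projected = [project_fn(pose, lm) for lm in landmarks]
        matches = match_fn(observations, projected)
        s = alignment_score(observations, projected, matches)
        if s > best_score:
            best_pose, best_score = pose, s
    return best_pose, best_score
```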
Project page: https://leekh951.github.io/MUST-Loc.
Submission Number: 8