MUST-Loc: Multi-view Uncertainty-aware Semantic Token Association for Object-level Global Localization
Keywords: Generalizable perception/semantic understanding, Uncertainty estimation for learned/foundation models
TL;DR: MUST-Loc: multi-view uncertainty-aware semantic token association enables robust object-level global localization via mean–variance token descriptors and Wasserstein alignment.
Abstract: Object-level global localization is highly sensitive to semantic uncertainty arising from viewpoint variations in open-set scenarios.
To address this problem, we present MUST-Loc, a multi-view, uncertainty-aware semantic token association framework.
The key idea is to aggregate object-level tokens through online updates in the mapping process to form mean–variance descriptors, capturing viewpoint-induced variability while maintaining semantic consistency.
At query time, we compute an uncertainty-aware semantic similarity that down-weights high-variance token dimensions to establish reliable correspondences under semantic ambiguity.
Finally, the camera pose is estimated by selecting the solution that maximizes the Wasserstein-based alignment score between observed detections and projected landmark hypotheses.
For rigorous validation, we evaluate on challenging TUM RGB-D sequences featuring occlusions, label noise, and diverse object categories, and show consistent improvements over baselines in both association and pose accuracy.
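The abstract does not give the update or weighting formulas, but a minimal sketch of the mean–variance token descriptor and the uncertainty-aware similarity is shown below. It assumes a Welford-style online update and a simple inverse-variance weighting; the class and function names, and the exact weighting rule, are illustrative rather than the authors' implementation.

```python
import numpy as np

class TokenDescriptor:
    """Per-object running mean/variance over semantic tokens.

    Illustrative sketch: a Welford-style online update is assumed here;
    the paper's exact aggregation rule is not specified in the abstract.
    """
    def __init__(self, dim: int):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)  # running sum of squared deviations

    def update(self, token: np.ndarray) -> None:
        # Online update of mean and variance from one new view's token.
        self.n += 1
        delta = token - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (token - self.mean)

    @property
    def var(self) -> np.ndarray:
        return self.m2 / max(self.n - 1, 1)


def uncertainty_aware_similarity(query: np.ndarray, desc: TokenDescriptor,
                                 eps: float = 1e-6) -> float:
    """Cosine-style similarity that down-weights high-variance dimensions
    (one plausible instantiation of the abstract's description)."""
    w = 1.0 / (desc.var + eps)   # inverse-variance weights: noisy dims count less
    w /= w.sum()                 # normalize the weights
    num = float(np.sum(w * query * desc.mean))
    den = np.sqrt(np.sum(w * query ** 2)) * np.sqrt(np.sum(w * desc.mean ** 2)) + eps
    return num / den
```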
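For the final pose selection, the abstract only states that the pose maximizing a Wasserstein-based alignment score between observed detections and projected landmarks is chosen. The sketch below assumes detections and projected landmarks are represented as Gaussians with diagonal covariances, so the squared 2-Wasserstein distance has a simple closed form; `project_fn`, `match_fn`, and `sigma` are hypothetical stand-ins, not the paper's interfaces.

```python
import numpy as np

def w2_gaussian_diag(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between two Gaussians with diagonal
    covariances; s1 and s2 are variance vectors."""
    return float(np.sum((m1 - m2) ** 2) + np.sum((np.sqrt(s1) - np.sqrt(s2)) ** 2))

def alignment_score(observations, projected, matches, sigma=1.0):
    """Aggregate Wasserstein distances over matched (obs, landmark) pairs into
    an alignment score (higher is better). Gaussian kernel is an assumption."""
    score = 0.0
    for i, j in matches:
        d2 = w2_gaussian_diag(observations[i][0], observations[i][1],
                              projected[j][0], projected[j][1])
        score += np.exp(-d2 / (2.0 * sigma ** 2))
    return score

def select_pose(pose_candidates, observations, landmarks, project_fn, match_fn):
    """Pick the candidate pose that maximizes the alignment score between
    observed detections and landmarks projected under that pose.
    `project_fn(pose, lm)` -> (mean, var) and `match_fn` are hypothetical."""
    best_pose, best_score = None, -np.inf
    for pose in pose_candidates:
        projected = [project_fn(pose, lm) for lm in landmarks]
        matches = match_fn(observations, projected)
        s = alignment_score(observations, projected, matches)
        if s > best_score:
            best_pose, best_score = pose, s
    return best_pose, best_score
```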
Project page: https://leekh951.github.io/MUST-Loc.
Submission Number: 8