GeoReasoning: Structured Semantic Reasoning for Image-to-Map Localization

16 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0.
Keywords: Reasoning localization, multimodal large language models, LLM reasoning
Abstract: We introduce *reasoning localization*, a new paradigm for self-localization that leverages multimodal large language models (MLLMs) to interpret spatial context from 2D maps and first-person images. Unlike traditional approaches that depend on LiDAR, odometry, or engineered markers, reasoning localization emulates how humans orient themselves by aligning visual cues with map structure. To address this new self-localization problem, we present **GeoReasoning**, a zero-shot framework that decomposes image-to-map grounding into *structured semantic reasoning* followed by *geometric verification*. Instead of directly predicting coordinates, GeoReasoning (i) identifies map-visible landmarks, (ii) grounds them as anchors via promptable segmentation, (iii) estimates coarse distances to these anchors through language-based reasoning, and (iv) solves a robust triangulation program to recover the pose. This design separates high-level semantic reasoning from metric optimization, yielding interpretable rationales, verifiable intermediate outputs, and resilience to map symmetries. To support this task, we release the first benchmark for reasoning localization, spanning diverse indoor maps, image-map pairs, and candidate poses, along with diagnostic metrics such as rationale consistency, mean/median localization error, and success@$r$ for $r \in \{0.1, 0.5, 1, 3\}$ m. Experiments with state-of-the-art MLLMs demonstrate that GeoReasoning significantly improves localization accuracy over direct-prediction baselines, while exposing open challenges in symmetry disambiguation and monocular scale estimation. Our results highlight the integration of structured reasoning with geometry as a promising path toward scalable, human-like localization in GPS-denied settings.
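Step (iv) of the abstract, recovering position from anchors with coarse distances, can be illustrated with a small robust solver. The following is a minimal sketch under stated assumptions, not the paper's program: each anchor is assumed to be a 2D map point in metres paired with one estimated distance (so, strictly, this is trilateration), the robust loss is a Huber loss chosen here as one common option, and it is minimised by iteratively reweighted Gauss-Newton. The names `Anchor`, `huber_weight`, `triangulate`, and the parameter `delta` are hypothetical, and only 2D position is estimated, not heading.

```python
import math
from dataclasses import dataclass


@dataclass
class Anchor:
    """Hypothetical grounded landmark: map position and coarse distance, in metres."""
    x: float
    y: float
    dist: float  # distance estimated by language-based reasoning (assumed noisy)


def huber_weight(residual: float, delta: float) -> float:
    """IRLS weight for the Huber loss: 1 inside [-delta, delta], delta/|r| outside."""
    a = abs(residual)
    return 1.0 if a <= delta else delta / a


def triangulate(anchors, delta=0.5, iters=100, tol=1e-10):
    """Estimate 2D position by minimising sum_i huber(||p - a_i|| - d_i).

    Uses iteratively reweighted Gauss-Newton. Orientation is NOT estimated.
    """
    if len(anchors) < 3:
        # Two distance circles generally meet in two mirror-image points.
        raise ValueError("need at least 3 anchors")
    # Initialise at the anchor centroid.
    px = sum(a.x for a in anchors) / len(anchors)
    py = sum(a.y for a in anchors) / len(anchors)
    for _ in range(iters):
        # Weighted normal equations H * step = -g, with H = J^T W J and g = J^T W r.
        h11 = h12 = h22 = g1 = g2 = 0.0
        for a in anchors:
            dx, dy = px - a.x, py - a.y
            d = math.hypot(dx, dy)
            if d < 1e-12:
                continue  # gradient of ||p - a|| is undefined at the anchor itself
            r = d - a.dist
            jx, jy = dx / d, dy / d  # derivative of the residual w.r.t. (px, py)
            w = huber_weight(r, delta)
            h11 += w * jx * jx
            h12 += w * jx * jy
            h22 += w * jy * jy
            g1 += w * jx * r
            g2 += w * jy * r
        det = h11 * h22 - h12 * h12
        if abs(det) < 1e-12:
            break  # degenerate geometry: all anchor directions are parallel
        # Closed-form solve of the symmetric 2x2 system.
        sx = -(h22 * g1 - h12 * g2) / det
        sy = -(h11 * g2 - h12 * g1) / det
        px, py = px + sx, py + sy
        if math.hypot(sx, sy) < tol:
            break
    return px, py


# Usage: four corner anchors with exact distances to a position at (3, 4).
truth = (3.0, 4.0)
corners = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
anchors = [Anchor(x, y, math.dist(truth, (x, y))) for x, y in corners]
print(triangulate(anchors))  # close to (3.0, 4.0)
```

The Huber loss caps each anchor's influence once its residual exceeds `delta`, so one grossly wrong distance estimate biases the solution only modestly instead of dominating it as in plain least squares. Requiring at least three anchors reflects a concrete form of the symmetry problem: with two anchors, the two distance circles typically intersect in two mirror-image candidate positions.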
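The localization metrics named in the abstract (mean/median localization error and success@$r$) can be computed along the following lines. This is a sketch under assumptions the abstract does not pin down: error is taken as 2D Euclidean position error in metres, and a prediction counts as a success at threshold $r$ when its error is at most $r$ (inclusive). Rationale consistency is not modelled. The helper name `localization_summary` is hypothetical.

```python
import math
import statistics

# Thresholds r (metres) listed in the abstract for success@r.
THRESHOLDS_M = (0.1, 0.5, 1.0, 3.0)


def localization_summary(preds, gts, thresholds=THRESHOLDS_M):
    """Mean/median 2D position error and success@r rates (hypothetical helper).

    preds, gts: equal-length sequences of (x, y) positions in metres, paired by index.
    """
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally many, non-zero predictions and ground truths")
    errors = [math.dist(p, g) for p, g in zip(preds, gts)]
    summary = {
        "mean_m": statistics.fmean(errors),
        "median_m": statistics.median(errors),
    }
    for r in thresholds:
        # Assumed convention: an error exactly equal to r counts as a success.
        summary[f"success@{r}m"] = sum(e <= r for e in errors) / len(errors)
    return summary


# Usage: four predictions with errors of 0, 1, 2 and 5 m.
preds = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
gts = [(0.0, 0.0)] * 4
print(localization_summary(preds, gts))
# {'mean_m': 2.0, 'median_m': 1.5, 'success@0.1m': 0.25, 'success@0.5m': 0.25, 'success@1.0m': 0.5, 'success@3.0m': 0.75}
```

Reporting the median alongside the mean matters here because a few symmetry-induced failures (a prediction landing in a mirrored room, say) can inflate the mean while leaving the median largely unchanged.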
Primary Area: applications to robotics, autonomy, planning
Submission Number: 7488