Benefit-cost frontier-aware semantic reasoning for zero-shot object navigation

Hanrui Chen, Liqi Yan, Qifan Wang, Jianhui Zhang, Fangli Guan, Pan Li

Published: 2026, Last Modified: 09 May 2026, Appl. Intell. 2026, CC BY-SA 4.0

Abstract: Zero-shot object navigation requires locating target objects in unseen environments and is a fundamental task in embodied intelligence. Existing approaches, particularly recent vision-language navigation models, leverage Large Language Models (LLMs) to perform multimodal reasoning over real-time visual perception. However, two key challenges remain: (1) semantic mismatches arise between the global map and the environment when frontier selection optimizes a single criterion; and (2) general-purpose LLMs are difficult to adapt to navigation tasks, which limits their navigation-specific reasoning capabilities. To address these issues, we propose the Benefit Frontier Semantic Map (BFSMap), which enables human-like semantic exploration through iterative reasoning. First, BFSMap reformulates semantic mapping as an image captioning task and integrates multiple maps to derive an optimal frontier that balances benefit and cost, thereby alleviating the vision–language misalignment of prior end-to-end methods. Second, we introduce a lightweight semantic-aware benefit prediction module (LightSA), trained from scratch with a novel prompt learning strategy for cross-view semantic reasoning, to update the semantic maps. Third, we design a modular object-aware decision-making policy that mimics human reasoning to identify the target object and promptly correct suboptimal paths. Our model achieves state-of-the-art performance (+2.8% success rate (SR) on HM3D and +1.2% SR on MP3D compared with the baseline) without relying on LLMs, demonstrating the promise of efficient navigation models built on human-like reasoning for unknown environments. Our code will be made available.
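The abstract does not state how benefit and cost are combined when choosing a frontier. As a rough illustration only, the sketch below assumes a linear utility, score = benefit − λ·cost, over a 2D occupancy grid (−1 unknown, 0 free, 1 obstacle). Here cost is the breadth-first-search step count from the agent, and a plain dictionary of per-frontier benefit values stands in for the semantic predictions that LightSA would supply. The names `find_frontiers`, `geodesic_costs`, `select_frontier`, and `cost_weight` are hypothetical and do not come from the paper.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connected moves


def find_frontiers(grid: List[List[int]]) -> List[Cell]:
    """Return free cells (0) that touch at least one unknown cell (-1)."""
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 0:
                continue
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == -1:
                    frontiers.append((r, c))
                    break
    return frontiers


def geodesic_costs(grid: List[List[int]], start: Cell) -> Dict[Cell, int]:
    """BFS step counts from start, moving only through free cells."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def select_frontier(grid: List[List[int]], agent: Cell,
                    benefit: Dict[Cell, float],
                    cost_weight: float = 0.1) -> Optional[Cell]:
    """Pick the reachable frontier maximising benefit - cost_weight * cost.

    Hypothetical linear trade-off; frontiers missing from `benefit` score 0.
    Returns None if no frontier is reachable.
    """
    costs = geodesic_costs(grid, agent)
    best, best_score = None, float("-inf")
    for f in find_frontiers(grid):
        if f not in costs:
            continue  # unreachable frontier
        score = benefit.get(f, 0.0) - cost_weight * costs[f]
        if score > best_score:
            best, best_score = f, score
    return best


if __name__ == "__main__":
    grid = [
        [0, 0, 0, -1],
        [0, 1, 0, -1],
        [0, 0, 0, 0],
        [-1, -1, 0, 0],
    ]
    benefit = {(2, 3): 0.9, (0, 2): 0.2}  # stand-in semantic scores
    print(select_frontier(grid, (0, 0), benefit, cost_weight=0.1))  # (2, 3)
    print(select_frontier(grid, (0, 0), benefit, cost_weight=0.5))  # (0, 2)
```

With a small cost weight, the distant but semantically promising frontier (2, 3) wins. A larger weight favours the nearby frontier (0, 2). This shows the kind of benefit-cost balance the abstract describes, as opposed to optimizing a single criterion such as distance alone.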

External IDs: dblp:journals/apin/ChenYWZGL26