When Engineering Outruns Intelligence: Rethinking Instruction-Guided Navigation

Matin Aghaei; Lingfeng Zhang; Mohammad Ali Alomrani; Mahdi Biparva; Yingxue Zhang

20 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Navigation; Embodied AI; Frontier Exploration; Large Language Models

TL;DR: Geometry-only, training-free frontier exploration matches or beats instruction-guided LLM pipelines on ObjectNav under detector-controlled evaluation, indicating that engineering drives the gains, and does so at far lower cost and runtime.

Abstract: Recent ObjectNav systems credit large language models (LLMs) for sizable zero-shot gains, yet it remains unclear how much comes from language versus geometry. We revisit this question by re-evaluating an instruction-guided pipeline, InstructNav, under a detector-controlled setting and introducing two training-free variants that only alter the action value map: a geometry-only Frontier Proximity Explorer (FPE) and a lightweight Semantic-Heuristic Frontier (SHF) that polls the LLM with simple frontier votes. Across HM3D and MP3D, FPE matches or exceeds the detector-controlled instruction follower while using no API calls and running faster; SHF attains comparable accuracy with a smaller, localized language prior. These results suggest that carefully engineered frontier geometry accounts for much of the reported progress, and that language is most reliable as a light heuristic rather than an end-to-end planner.
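
To make the idea of a geometry-only action value map concrete, the following is a minimal sketch of proximity-based frontier scoring on a 2D occupancy grid. It is not the authors' FPE implementation: the grid encoding, the function names (`find_frontiers`, `proximity_value_map`, `pick_frontier`), the 4-connected frontier definition, and the `1 / (1 + distance)` scoring are all assumptions made for illustration, and straight-line distance is used in place of a path-planned distance.

```python
import numpy as np

# Hypothetical cell encoding for this sketch (not taken from the paper).
FREE, OCCUPIED, UNKNOWN = 0, 1, -1


def find_frontiers(grid):
    """Return (row, col) indices of free cells with a 4-connected unknown neighbour."""
    free = grid == FREE
    unknown = grid == UNKNOWN
    # Pad with False so border cells do not treat the outside as unknown.
    padded = np.pad(unknown, 1, constant_values=False)
    near_unknown = (
        padded[:-2, 1:-1]    # neighbour above
        | padded[2:, 1:-1]   # neighbour below
        | padded[1:-1, :-2]  # neighbour to the left
        | padded[1:-1, 2:]   # neighbour to the right
    )
    return np.argwhere(free & near_unknown)


def proximity_value_map(grid, agent_rc):
    """Score each frontier cell by closeness to the agent; all other cells score 0."""
    values = np.zeros(grid.shape, dtype=float)
    frontiers = find_frontiers(grid)
    if len(frontiers) == 0:
        return values
    # Straight-line distance is a simplification; it ignores obstacles.
    dists = np.linalg.norm(frontiers - np.asarray(agent_rc), axis=1)
    values[frontiers[:, 0], frontiers[:, 1]] = 1.0 / (1.0 + dists)
    return values


def pick_frontier(grid, agent_rc):
    """Return the highest-valued frontier cell, or None if no frontier exists."""
    values = proximity_value_map(grid, agent_rc)
    if values.max() == 0.0:
        return None
    r, c = np.unravel_index(np.argmax(values), values.shape)
    return int(r), int(c)


grid = np.array([
    [FREE, FREE,     UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE,     FREE],
])
target = pick_frontier(grid, agent_rc=(2, 0))
```

In this sketch no language model is queried at any point; the choice of the next waypoint depends only on the map and the agent position, which is the property the abstract attributes to a geometry-only explorer.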

Primary Area: applications to robotics, autonomy, planning

Submission Number: 24146
