Prioritized Semantic Learning for Zero-Shot Instance Navigation

Xinyu Sun, Lizhao Liu, Hongyan Zhi, Ronghe Qiu, Junwei Liang

Published: 2024, Last Modified: 02 Mar 2026. ECCV (12) 2024. CC BY-SA 4.0

Abstract: We study zero-shot instance navigation, in which the agent navigates to a specific object without using object annotations for training. Previous object navigation approaches pretrain on the image-goal navigation (\(\texttt{ImageNav}\)) task (go to the location of an image) and then transfer the agent to object goals using a vision-language model. However, these approaches suffer from semantic neglect, where the model fails to learn meaningful semantic alignments. In this paper, we propose a Prioritized Semantic Learning (PSL) method to improve the semantic understanding ability of navigation agents. Specifically, we propose a semantic-enhanced PSL agent and introduce a prioritized semantic training strategy that selects goal images exhibiting clear semantic supervision and relaxes the reward function from strict exact view matching. At inference time, a semantic expansion inference scheme is designed to preserve the same granularity level of goal semantics as in training. Furthermore, for the popular HM3D environment, we present an Instance Navigation (\(\texttt{InstanceNav}\)) task that requires going to a specific object instance given a detailed description, as opposed to the Object Navigation (\(\texttt{ObjectNav}\)) task, where the goal is defined merely by the object category. Our PSL agent outperforms the previous state-of-the-art by 66% on zero-shot \(\texttt{ObjectNav}\) in terms of success rate and is also superior on the new \(\texttt{InstanceNav}\) task. Code will be released at https://github.com/XinyuSun/PSL-InstanceNav.