NavTr: Object-Goal Navigation With Learnable Transformer Queries

Qiuyu Mao, Jikai Wang, Meng Xu, Zonghai Chen

Published: 2024, Last Modified: 11 Apr 2025. IEEE Robotics Autom. Lett. 2024. License: CC BY-SA 4.0

Abstract: This letter introduces Navigation Transformer (NavTr), a novel framework for object-goal navigation that uses learnable Transformer queries to enhance the learning and representation of environment states. By integrating semantic information, object positions, and neighborhood information, NavTr builds a unified, comprehensive, and extensible state representation for the object-goal navigation task. Within the framework, the Transformer queries implicitly learn inter-object relationships, which facilitates a high-level understanding of the environment. In addition, NavTr introduces target-oriented supervisory signals, such as rotation rewards and a spatial loss, which improve exploration efficiency in the reinforcement learning framework. NavTr outperforms popular graph-based and attention-based methods by a large margin in terms of success rate (SR) and success weighted by path length (SPL). Extensive experiments on the AI2-THOR dataset demonstrate the effectiveness of the approach.
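
The two evaluation metrics named in the abstract, SR and SPL, can be illustrated with a short sketch. It assumes the commonly used definition of SPL, in which each successful episode contributes the ratio of the shortest-path length to the longer of the shortest path and the path actually taken, averaged over all episodes. The abstract does not spell out the formula, so the function and argument names below are illustrative assumptions and not the authors' evaluation code.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target object."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by path length, under the common definition:

    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)

    where S_i is the success flag, l_i the shortest-path length from the
    start to the goal, and p_i the length of the path the agent took.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (len(shortest_lengths) == n and len(path_lengths) == n):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            # Assumption of this sketch: episodes start away from the goal.
            raise ValueError("shortest-path lengths must be positive")
        if success:
            # Failed episodes contribute zero; successful ones are
            # discounted by how much longer the taken path was.
            total += shortest / max(taken, shortest)
    return total / n
```

For example, with three episodes where the first succeeds along the shortest path (length 2), the second succeeds with a path twice the shortest length (8 versus 4), and the third fails, `success_rate` gives 2/3 and `spl` gives 0.5. SPL is therefore never higher than SR, and the gap between them reflects how inefficient the successful trajectories were.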