NavFormer: A Transformer Architecture for Robot Target-Driven Navigation in Unknown and Dynamic Environments
Abstract: In unknown, cluttered, and dynamic environments
such as disaster scenes, mobile robots need to perform target-driven
navigation in order to find people or objects of interest,
where the only information provided about these targets is
an image of each individual target. In this paper, we introduce
NavFormer, a novel end-to-end transformer architecture
developed for robot target-driven navigation in unknown and
dynamic environments. NavFormer leverages the strengths of
both 1) transformers for sequential data processing and 2) self-supervised
learning (SSL) for visual representation to reason
about spatial layouts and to perform collision avoidance in
dynamic settings. The architecture uniquely combines dual visual
encoders consisting of 1) a static encoder for extracting invariant
environment features for spatial reasoning, and 2) a general encoder
for dynamic obstacle avoidance. The primary robot navigation
task is decomposed into two sub-tasks for training: single robot
exploration and multi-robot collision avoidance. We perform
cross-task training to enable the transfer of learned skills to the
complex primary navigation task. Simulated experiments
demonstrate that NavFormer can effectively navigate a mobile
robot in diverse unknown environments, outperforming existing
state-of-the-art methods. A comprehensive ablation study is
performed to evaluate the impact of the main design choices of
NavFormer. Furthermore, real-world experiments validate the
generalizability of NavFormer.
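The dual-encoder design described above can be illustrated with a minimal sketch. This is not the paper's implementation: the encoder internals, feature dimensions, and the linear policy head are all placeholder assumptions standing in for the learned SSL backbones and transformer policy; it only shows how features from a static encoder, a general encoder, and a target image might be fused before an action is selected.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image, weights):
    # Flatten the image and project it to a feature vector (a toy
    # stand-in for a learned visual encoder backbone).
    return np.tanh(image.reshape(-1) @ weights)

def navformer_step(obs_image, target_image, w_static, w_general, w_policy):
    # Static encoder: invariant environment features for spatial reasoning.
    static_feat = encode(obs_image, w_static)
    # General encoder: features supporting dynamic obstacle avoidance.
    general_feat = encode(obs_image, w_general)
    # Target image embedded with the static encoder (an assumption of
    # this sketch; the real model may use a separate pathway).
    target_feat = encode(target_image, w_static)
    # Fuse the features; the actual architecture would feed these, plus
    # a history of past observations, through a transformer.
    fused = np.concatenate([static_feat, general_feat, target_feat])
    # Placeholder linear policy head producing action logits.
    return fused @ w_policy

D_IMG, D_FEAT, N_ACTIONS = 16 * 16, 32, 4  # hypothetical sizes
w_static = rng.normal(size=(D_IMG, D_FEAT)) * 0.05
w_general = rng.normal(size=(D_IMG, D_FEAT)) * 0.05
w_policy = rng.normal(size=(3 * D_FEAT, N_ACTIONS)) * 0.05

obs = rng.normal(size=(16, 16))     # current camera observation
goal = rng.normal(size=(16, 16))    # image of the navigation target
logits = navformer_step(obs, goal, w_static, w_general, w_policy)
action = int(np.argmax(logits))
```

The key design point reflected here is that the same observation passes through two separately trained encoders, so spatial-layout features and obstacle-avoidance features are kept distinct until fusion.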