Abstract: Vision-and-Language Navigation (VLN) is an emerging and fast-developing research topic in which an embodied agent must navigate a real-world environment by following natural language instructions. In this article, we present a Direction-guided Navigator Agent (DNA) that integrates direction clues derived from the instructions into the standard encoder-decoder navigation framework. In particular, DNA couples the instruction encoder with an additional direction branch that sequentially encodes the direction clues in the instructions to boost navigation. Furthermore, an Instruction Flipping mechanism is devised to enable fast data augmentation as well as follow-up backtracking that navigates the agent in the backward direction. This design naturally amplifies the grounding of instructions in local visual scenes along both the forward and backward directions, thereby strengthening the alignment between instructions and action sequences. Extensive experiments on the Room-to-Room (R2R) dataset validate our proposal and demonstrate quantitatively compelling results.
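To illustrate the intuition behind Instruction Flipping as a data-augmentation step, the following is a minimal, hypothetical sketch: the steps of an instruction are reversed and direction words are swapped with their opposites, so the flipped instruction roughly describes the backward trajectory. The word list and clause-splitting heuristic here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of Instruction Flipping: reverse step order and
# invert direction words so the result describes the backward path.
# The OPPOSITE table and comma-based step splitting are assumptions
# for illustration only.
OPPOSITE = {"left": "right", "right": "left",
            "forward": "back", "back": "forward",
            "up": "down", "down": "up"}

def flip_instruction(instruction: str) -> str:
    """Return a rough backward-direction version of an instruction."""
    steps = [s.strip() for s in instruction.split(",")]
    flipped_steps = []
    for step in reversed(steps):          # traverse steps in reverse order
        words = [OPPOSITE.get(w, w) for w in step.split()]
        flipped_steps.append(" ".join(words))
    return ", ".join(flipped_steps)

print(flip_instruction("turn left, walk forward, go up the stairs"))
# prints "go down the stairs, walk back, turn right"
```

Each original/flipped instruction pair can then supervise the agent along both traversal directions of the same trajectory, which is the effect the abstract attributes to the flipping mechanism.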