Visual Navigation of Target-Driven Memory-Augmented Reinforcement Learning

Published: 01 Jan 2023 · Last Modified: 12 Apr 2025 · ICONIP (7) 2023 · License: CC BY-SA 4.0
Abstract: Compared to reinforcement-learning-based visual navigation methods that rely on auxiliary information such as depth images, semantic segmentation, object detection, or relational graphs, methods that use only RGB images require no additional sensors and offer greater flexibility. However, these methods often underutilize the information in RGB images, which leads to poor generalization. To address this limitation, we present the Target-Driven Memory-Augmented (TDMA) framework. The framework uses an external memory to store fused Target-Scene features obtained from the observed and target images. To capture and exploit long-term dependencies in this stored history, we process it with a Transformer. Additionally, we introduce a self-attention sub-layer in the Transformer's decoder to strengthen the model's focus on regions shared by the observed and target images. Experiments on the AI2-THOR dataset show that our method improves success rate by 8% and success weighted by path length (SPL) by 16% over methods evaluated in the same experimental setup.
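The abstract describes three components: Target-Scene feature fusion, an external memory over the fused features, and a Transformer decoder with an added self-attention sub-layer. Below is a minimal PyTorch sketch of how these pieces could fit together; the module names, feature dimension, memory size, fusion scheme (concatenate and project), action count, and the placement of the extra attention layer are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class TDMASketch(nn.Module):
    """Illustrative sketch of the TDMA idea: fuse target and observation
    features, store them in an external memory, and attend over that
    history with a Transformer decoder. All sizes are assumptions."""

    def __init__(self, feat_dim=512, mem_size=32, n_heads=8, n_layers=2, n_actions=6):
        super().__init__()
        # Target-Scene fusion (assumed: concatenation followed by a projection).
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)
        # External memory as a FIFO ring buffer; in practice it would be
        # reset at the start of each navigation episode.
        self.mem_size = mem_size
        self.register_buffer("memory", torch.zeros(mem_size, feat_dim))
        self.register_buffer("mem_len", torch.zeros((), dtype=torch.long))
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=n_layers)
        # The paper adds a self-attention sub-layer inside the decoder;
        # appending one after the decoder here only approximates that.
        self.extra_attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.policy = nn.Linear(feat_dim, n_actions)  # action logits

    def forward(self, obs_feat, tgt_feat):
        # obs_feat, tgt_feat: (1, feat_dim) CNN features of the current
        # RGB observation and the target image.
        fused = self.fuse(torch.cat([obs_feat, tgt_feat], dim=-1))
        # Write the fused Target-Scene feature into the external memory.
        idx = self.mem_len % self.mem_size
        self.memory[idx] = fused.squeeze(0).detach()
        self.mem_len += 1
        n = int(min(self.mem_len.item(), self.mem_size))
        mem = self.memory[:n].unsqueeze(0)              # (1, n, feat_dim)
        # The decoder queries the stored history for long-term dependencies.
        h = self.decoder(tgt=fused.unsqueeze(1), memory=mem)
        # Extra attention emphasising observation regions similar to the target.
        h, _ = self.extra_attn(h, tgt_feat.unsqueeze(1), tgt_feat.unsqueeze(1))
        return self.policy(h.squeeze(1))                # (1, n_actions)


if __name__ == "__main__":
    model = TDMASketch()
    obs = torch.randn(1, 512)   # e.g. ResNet features of the current frame
    tgt = torch.randn(1, 512)   # features of the target image
    print(model(obs, tgt).shape)  # torch.Size([1, 6])
```

In a full agent, the logits would feed an RL policy (e.g. A3C over AI2-THOR's discrete action space), and the memory would be detached and cleared between episodes; both details are outside the scope of this sketch.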