Toward Efficient Self-Motion-Based Memory Representation for Visuomotor Navigation of Embodied Robot
Abstract: Learning-based navigation is a prominent topic in robotics and Embodied AI. To enable robots to make well-optimized, long-horizon navigation decisions, an important approach is to incorporate memory mechanisms into policy networks, allowing scene understanding to be built from historical observations. However, existing memory structures typically store and process raw perceptual features directly, which leads to significant inefficiencies in computation and storage because of the many irrelevant areas, redundant details, and noise they contain. This hampers the robot's ability to learn effective scene cues. This paper introduces a novel memory structure that avoids directly storing perceptual features, opting instead to store lightweight historical self-motion data of the robot. Our navigation system consists of three key stages: estimating inter-frame poses using visual odometry, aggregating these into a global memory feature (since individual frame poses lack explicit cognitive meaning), and utilizing the aggregated memory for navigation decision-making. The proposed pipeline is end-to-end, ensuring efficient use of memory information and close integration with the policy network. Quantitative experiments in photorealistic simulation environments demonstrate that our memory structure achieves over a 30% increase in environment coverage compared to competitive baselines, with path redundancy reduced to as low as 60% of the baseline, while computational and storage efficiency are significantly improved. Ablation and interpretability experiments further confirm that our system improves the robot's understanding of its spatial exploration state, thereby enhancing navigation performance and efficiency. We also validate the system in real-world scenarios.
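To make the three-stage pipeline described in the abstract more concrete, the following is a minimal, illustrative sketch (not the authors' implementation): frame-to-frame motions from a visual odometry front end are composed into global poses, the pose history is aggregated into a compact memory feature, and that feature is combined with the current observation feature to score actions. The module name `SelfMotionMemoryPolicy`, the GRU aggregator, and all dimensions are assumptions for illustration only.

```python
# Hypothetical sketch of a self-motion-based memory for navigation.
# Assumes relative SE(2) motions (dx, dy, dtheta) are available from
# some visual odometry module; everything below is illustrative.
import math
import torch
import torch.nn as nn


def compose_se2(poses_rel):
    """Chain relative SE(2) motions (dx, dy, dtheta) into global poses
    expressed in the coordinate frame of the first step."""
    x, y, theta = 0.0, 0.0, 0.0
    out = []
    for dx, dy, dtheta in poses_rel.tolist():
        # Rotate the local displacement into the global frame, then translate.
        x += dx * math.cos(theta) - dy * math.sin(theta)
        y += dx * math.sin(theta) + dy * math.cos(theta)
        theta += dtheta
        out.append([x, y, theta])
    return torch.tensor(out)  # (T, 3)


class SelfMotionMemoryPolicy(nn.Module):
    """Aggregates the robot's pose history into a compact memory feature
    and combines it with the current observation to produce action logits."""

    def __init__(self, obs_dim=512, mem_dim=128, num_actions=4):
        super().__init__()
        # Lightweight aggregator over pose history (an assumption; the paper
        # only states that frame poses are aggregated into a global feature).
        self.aggregator = nn.GRU(input_size=3, hidden_size=mem_dim, batch_first=True)
        self.policy = nn.Sequential(
            nn.Linear(obs_dim + mem_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, obs_feat, poses_rel):
        # obs_feat: (obs_dim,) current visual feature; poses_rel: (T, 3) odometry.
        global_poses = compose_se2(poses_rel)              # (T, 3)
        _, h = self.aggregator(global_poses.unsqueeze(0))  # h: (1, 1, mem_dim)
        memory = h.squeeze(0).squeeze(0)                   # (mem_dim,)
        return self.policy(torch.cat([obs_feat, memory], dim=-1))


# Tiny usage example with random stand-ins for observation and odometry.
policy = SelfMotionMemoryPolicy()
obs = torch.randn(512)
odometry = torch.randn(20, 3) * 0.1
logits = policy(obs, odometry)
print(logits.shape)  # torch.Size([4])
```

The point of the sketch is the storage contrast the abstract emphasizes: the memory holds only a short history of 3-DoF motions rather than raw perceptual feature maps, so its footprint grows with trajectory length alone, not with image resolution.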
External IDs: dblp:journals/tetci/LiuXLW25