Keywords: Embodied Memory, Planning, Reinforcement Learning
TL;DR: We propose a learnable memory that, combined with planners, enables agents to plan long-horizon tasks in large indoor spaces. We also introduce two methods to improve planning: an imitation strategy that uses human-in-the-loop data, and a novel value-free RL training method.
Abstract: We develop a novel memory representation for embodied planning models performing long-horizon mobile manipulation in dynamic, large-scale indoor environments. Prior memory representations fall short in this setting: they struggle with object movements, are computationally inefficient, and often depend on the heuristic integration of multiple models. To overcome these limitations, we present the Embodied Perception Memory (EMP), a learnable memory designed for embodied planning. EMP is implemented as a unified Vision-Language Model (VLM) that uses egocentric vision to maintain and update a textual representation of the environment. We further introduce two complementary methods for training planners to leverage the EMP: an imitation strategy that uses human trajectories for natural exploration and interaction, and a novel reinforcement learning approach, Dynamic Difficulty-Aware Fine-Tuning (DDAFT), which improves planning performance via difficulty-aware exploration. When integrated with our planning training methods, our memory representation yields significant improvements on planning tasks, with up to a 55% increase in success rate on the PARTNR benchmark over strong baselines. Moreover, our planning method outperforms these baselines even when they are given access to ground-truth perception.
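To make the memory design concrete, below is a minimal Python sketch of the update loop the abstract describes: a single unified VLM consumes the current textual memory plus a new egocentric frame and emits a revised memory, which the planner then reads as plain-text context. The interface shown here (`EMPMemory`, `update`, `as_context`, a `vlm(prompt, image) -> str` callable) is a hypothetical illustration under stated assumptions, not the paper's actual implementation.

```python
from typing import Callable

# Placeholder type for an egocentric RGB frame (assumption).
Image = bytes

class EMPMemory:
    """Maintains a textual environment representation, updated from
    egocentric observations by a single unified VLM."""

    def __init__(self, vlm: Callable[[str, Image], str]):
        self.vlm = vlm
        self.state = "No objects observed yet."

    def update(self, frame: Image) -> None:
        # Ask the VLM to revise the textual map given the new view,
        # e.g. relocating entries for objects that have moved.
        prompt = (
            "Current environment memory:\n"
            f"{self.state}\n\n"
            "Update this memory to reflect the new egocentric view. "
            "Keep entries for objects now out of view; revise the "
            "locations of objects that have moved."
        )
        self.state = self.vlm(prompt, frame)

    def as_context(self) -> str:
        # Planners consume the memory as plain-text context.
        return self.state


# Stub VLM for illustration only; a real system would call the
# fine-tuned vision-language model here.
memory = EMPMemory(vlm=lambda prompt, img: "kitchen: apple on counter")
memory.update(frame=b"")
plan_prompt = f"Task: bring me the apple.\nMemory:\n{memory.as_context()}"
print(plan_prompt)
```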
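DDAFT is only named in the abstract, so the following is a speculative sketch of one way difficulty-aware exploration could look: track per-task success rates, sample rollouts from tasks near intermediate difficulty, and fine-tune only on successful trajectories rather than using a learned critic (consistent with the TL;DR's "value-free" framing). The class name, the success-rate difficulty estimate, and the d*(1-d) weighting are all assumptions, not the authors' algorithm.

```python
import random
from collections import defaultdict

class DifficultyAwareSampler:
    """Samples tasks for RL rollouts, favoring those of
    intermediate estimated difficulty (an assumption)."""

    def __init__(self, task_ids):
        self.task_ids = list(task_ids)
        self.successes = defaultdict(int)
        self.attempts = defaultdict(int)

    def _difficulty(self, task):
        # Unseen tasks default to 0.5 (unknown difficulty).
        if self.attempts[task] == 0:
            return 0.5
        return 1.0 - self.successes[task] / self.attempts[task]

    def sample(self):
        # d * (1 - d) peaks at d = 0.5, so weights favor tasks the
        # agent sometimes solves, where exploration is most useful.
        weights = [
            max(1e-3, d * (1.0 - d))
            for d in (self._difficulty(t) for t in self.task_ids)
        ]
        return random.choices(self.task_ids, weights=weights, k=1)[0]

    def record(self, task, success):
        # Update running statistics after each rollout; successful
        # trajectories would be added to the fine-tuning buffer.
        self.attempts[task] += 1
        self.successes[task] += int(success)
```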
Supplementary Material: pdf
Primary Area: applications to robotics, autonomy, planning
Submission Number: 2516