Keywords: Embodied AI, visual navigation
Abstract: Object goal navigation requires an agent to navigate to a specified object in an unseen environment based on visual observations and user-specified goals.
Human decision-making in navigation is sequential: humans plan the most likely sequence of actions toward the goal.
However, existing ObjectNav methods, both end-to-end learning methods and modular methods, rely on single-step planning: they output only the next action given the current model input, which overlooks temporal consistency and leads to myopic planning.
To this end, we aim to learn sequence planning for ObjectNav. Specifically, we propose trajectory diffusion to learn the distribution of trajectory sequences conditioned on the current observation and the goal.
We train the trajectory diffusion model with DDPM on automatically collected optimal trajectory segments.
Once trained, the trajectory diffusion model can generate a temporally coherent sequence of future waypoints for the agent based on its current observation.
Experimental results on the Gibson and MP3D datasets demonstrate that the generated trajectories effectively guide the agent, resulting in more accurate and efficient navigation.
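To make the sampling procedure concrete, the following is a minimal sketch of DDPM reverse diffusion over a short trajectory of 2D waypoints. All names and values here (the horizon `H`, the diffusion schedule, and the `denoiser` stand-in for the learned, observation- and goal-conditioned noise-prediction network) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

T = 50                     # number of diffusion steps (assumed)
H = 8                      # trajectory horizon: future waypoints (assumed)
betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, cond):
    """Placeholder for a learned noise-prediction network
    eps_theta(x_t, t, cond); the real model would be conditioned
    on the current observation and the goal."""
    return 0.1 * x_t * cond  # NOT a trained model, for illustration only

def sample_trajectory(cond, rng):
    """Reverse diffusion: start from Gaussian noise and iteratively
    denoise to obtain an (H, 2) sequence of future 2D waypoints."""
    x = rng.standard_normal((H, 2))
    for t in reversed(range(T)):
        eps = denoiser(x, t, cond)
        # Posterior mean of x_{t-1} given x_t and predicted noise
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal((H, 2))
        else:
            x = mean  # final step is deterministic
    return x

rng = np.random.default_rng(0)
traj = sample_trajectory(cond=1.0, rng=rng)
print(traj.shape)  # (8, 2): one denoised waypoint sequence
```

In a navigation loop, one such sequence would be resampled periodically as observations change, and a low-level controller would track the generated waypoints.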
Supplementary Material: zip
Primary Area: Robotics
Submission Number: 3773