ComposableNav: Instruction-Following Navigation in Dynamic Environments via Composable Diffusion

Zichao Hu; Chen Tang; Michael Joseph Munje; Yifeng Zhu; Alex Liu; Shuijing Liu; Garrett Warnell; Peter Stone; Joydeep Biswas

Published: 08 Aug 2025, Last Modified: 16 Sept 2025

Venue: CoRL 2025 Poster

License: CC BY 4.0

Keywords: Diffusion Motion Planning, Diffusion Model, Compositionality, Instruction-following Navigation in Dynamic Environments

TL;DR: We present ComposableNav, a composable, diffusion-based motion planner that composes motion primitives according to an instruction's specifications to generate instruction-following motion trajectories.

Abstract: This paper considers the problem of enabling robots to navigate dynamic environments while following instructions. The challenge lies in the combinatorial nature of instruction specifications: each instruction can include multiple specifications, and the number of possible specification combinations grows exponentially as the robot’s skill set expands. For example, “overtake the pedestrian while staying on the right side of the road” consists of two specifications: *"overtake the pedestrian"* and *"stay on the right side of the road."* To tackle this challenge, we propose ComposableNav, based on the intuition that following an instruction involves independently satisfying its constituent specifications, each of which corresponds to a distinct motion primitive. Using diffusion models, ComposableNav learns each primitive separately and then composes them in parallel at deployment time to satisfy novel combinations of specifications unseen during training. Additionally, to avoid the onerous need to collect demonstrations for each individual motion primitive, we propose a two-stage training procedure: (1) supervised pre-training to learn a base diffusion model for dynamic navigation, and (2) reinforcement learning fine-tuning that molds the base model into different motion primitives. Through simulation and real-world experiments, we show that ComposableNav enables robots to follow instructions by generating trajectories that satisfy diverse and unseen combinations of specifications, significantly outperforming both non-compositional VLM-based policies and costmap-composing baselines.
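To make "composing primitives in parallel at deployment time" concrete, here is a toy sketch. It is an illustration under stated assumptions, not ComposableNav's implementation. It combines noise predictions with the conjunction rule used in composable diffusion models (Liu et al., 2022), eps = eps_base + sum_i w_i (eps_i - eps_base). In this rule the base model stands in for the pre-trained navigation model and each eps_i for a fine-tuned primitive. The "denoisers" are hypothetical closed-form Gaussian stand-ins, not trained networks. The names `HORIZON`, `overtake`, `stay_right`, the ramp target, and the linear noise schedule are likewise illustrative assumptions.

```python
import numpy as np

HORIZON = 50  # hypothetical number of (x, y) waypoints per trajectory


def gaussian_eps(mean, std):
    """Closed-form noise predictor for data ~ N(mean, std^2), per coordinate.

    Hypothetical stand-in for a trained denoiser: for Gaussian data the
    optimal epsilon-prediction is known exactly.
    """
    def eps_fn(x_t, alpha_bar):
        var_t = alpha_bar * std**2 + (1.0 - alpha_bar)  # variance of the noised data
        return np.sqrt(1.0 - alpha_bar) * (x_t - np.sqrt(alpha_bar) * mean) / var_t
    return eps_fn


def compose(base_eps, primitive_eps, weights):
    """Conjunction rule: eps = eps_base + sum_i w_i * (eps_i - eps_base)."""
    def eps_fn(x_t, alpha_bar):
        e_base = base_eps(x_t, alpha_bar)
        out = e_base.copy()
        for eps_i, w in zip(primitive_eps, weights):
            out += w * (eps_i(x_t, alpha_bar) - e_base)
        return out
    return eps_fn


def sample(eps_fn, n_steps=200, seed=0):
    """DDPM ancestral sampling with a linear beta schedule (sigma_t^2 = beta_t)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.05, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal((HORIZON, 2))  # start from pure noise
    for t in reversed(range(n_steps)):
        eps = eps_fn(x, alpha_bars[t])
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x


ramp = np.linspace(0.0, 5.0, HORIZON)  # hypothetical "overtake": advance 5 m along x
zeros, ones = np.zeros(HORIZON), np.ones(HORIZON)

# Base model: broad prior over both coordinates.
base = gaussian_eps(np.column_stack((zeros, zeros)), np.column_stack((2 * ones, 2 * ones)))
# "Overtake": tight on x (follow the ramp), identical to the base on y.
overtake = gaussian_eps(np.column_stack((ramp, zeros)), np.column_stack((0.1 * ones, 2 * ones)))
# "Stay right": tight on y (y = -1), identical to the base on x.
stay_right = gaussian_eps(np.column_stack((zeros, -ones)), np.column_stack((2 * ones, 0.1 * ones)))

traj = sample(compose(base, [overtake, stay_right], [1.0, 1.0]))
print("mean |x - ramp|:", np.abs(traj[:, 0] - ramp).mean())
print("mean y:", traj[:, 1].mean())
```

Each toy primitive matches the base on the coordinate it does not constrain. As a result, the composed sampler returns a trajectory that both advances along the ramp and keeps y near -1, even though neither primitive alone specifies both. Sampling `stay_right` by itself leaves x unconstrained. Learned primitives will not separate this cleanly, so the example only illustrates the idea of satisfying each specification independently.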

Supplementary Material: zip

Spotlight: mp4

Submission Number: 627
