Keywords: Manipulation, Imitation Learning, Transformers, Diffusion Models
TL;DR: We present ChainedDiffuser, a policy architecture that unifies transformer-based end-effector action prediction and diffusion-based trajectory generation for learning robotic manipulation policies from demonstrations.
Abstract: We present ChainedDiffuser, a policy architecture that unifies action keypose prediction and trajectory diffusion generation for learning robot manipulation from demonstrations. Our main innovation is to use a global transformer-based action predictor to predict actions at keyframes, a task that requires multi- modal semantic scene understanding, and to use a local trajectory diffuser to predict trajectory segments that connect predicted macro-actions. ChainedDiffuser sets a new record on established manipulation benchmarks, and outperforms both state-of-the-art keypose (macro-action) prediction models that use motion plan- ners for trajectory prediction, and trajectory diffusion policies that do not predict keyframe macro-actions. We conduct experiments in both simulated and real-world environments and demonstrate ChainedDiffuser’s ability to solve a wide range of manipulation tasks involving interactions with diverse objects.
Student First Author: yes
Supplementary Material: zip
Instructions: I have read the instructions for authors (https://corl2023.org/instructions-for-authors/)
Publication Agreement: pdf
Poster Spotlight Video: mp4