Keywords: Imitation Learning, Diffusion, Robot Manipulation, Real-World Deployment
Abstract: Diffusion Policies have become a popular framework for robot visuomotor learning because they capture multi-modal distributions well. However, the typical UNet and Transformer backbones for the policy network are computationally expensive, which makes them prohibitively costly to deploy on real robot systems that require real-time decision-making. In this paper, we address the challenge of efficient policy generation through a new architecture design. We explore the use of structured state space models (SSMs), specifically the Mamba architecture, to speed up diffusion policy inference. We further accelerate the diffusion process by starting the sampling from a Gaussian distribution centered on the previous action chunk. We validate our model on the Adroit, MetaWorld, and DexArt environments, and show that it has 95% fewer parameters than the diffusion model with a UNet backbone and uses 75% less computation, yet achieves similar performance on long-horizon tasks.
Submission Number: 22
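The warm-start idea mentioned in the abstract (starting the sampling from a Gaussian around the previous action chunk) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the noise scale `sigma`, and the chunk shape (horizon 8, action dimension 4) are assumptions chosen for illustration, and the abstract does not specify how the noise scale or the subsequent denoising schedule is set.

```python
import numpy as np


def cold_start_sample(shape, rng):
    """Standard diffusion initialization: draw from N(0, I)."""
    return rng.standard_normal(shape)


def warm_start_sample(prev_chunk, sigma, rng):
    """Assumed warm start: draw from N(prev_chunk, sigma^2 I).

    prev_chunk: previous action chunk, shape (horizon, action_dim).
    sigma: hypothetical noise scale; smaller values stay closer to prev_chunk.
    """
    prev_chunk = np.asarray(prev_chunk, dtype=np.float64)
    return prev_chunk + sigma * rng.standard_normal(prev_chunk.shape)


rng = np.random.default_rng(0)

# Hypothetical previous action chunk: horizon 8, action dimension 4.
prev_chunk = np.full((8, 4), 0.5)

x_cold = cold_start_sample(prev_chunk.shape, rng)
x_warm = warm_start_sample(prev_chunk, sigma=0.1, rng=rng)

# The warm-started sample is typically much closer to the previous chunk,
# which is the intuition for needing fewer denoising steps afterwards.
dist_cold = np.linalg.norm(x_cold - prev_chunk)
dist_warm = np.linalg.norm(x_warm - prev_chunk)
```

The denoising loop itself is omitted here; under this sketch, the only change from a standard sampler is the initial sample passed to it.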