Keywords: multi-granularity learning, imitation learning, state-space model
TL;DR: DiSPo is a diffusion-SSM based policy that learns from diverse coarse skills and produces varying control scales of actions.
Abstract: We aim to solve the problem of multi-granularity action learning from demonstrations (LfD). To scale precision, traditional LfD approaches often rely on extensive fine-grained demonstrations to produce fixed-resolution actions. For memory-efficient learning and convenient granularity modulation, we propose a novel diffusion-SSM based policy (DiSPo) that learns from diverse coarse demonstrations and varying action scales using a state-space model, Mamba. Our evaluations show that the adoption of Mamba and the proposed step-scaling method enable DiSPo to outperform baselines in three coarse-to-fine benchmark tests, achieving up to an $81\%$ higher success rate.
In addition, DiSPo improves inference efficiency by generating coarse motions in less critical regions. Finally, we demonstrate scalability through two real-world manipulation tasks.
Submission Number: 10