Keywords: multi-granularity learning, imitation learning, state-space model
TL;DR: DiSPo is a diffusion-SSM based policy that learns from diverse coarse skills and produces varying control scales of actions.
Abstract: We aim to solve the problem of multi-granularity action learning from demonstrations (LfD). To scale precision, traditional LfD approaches often rely on extensive fine-grained demonstrations to produce fixed-resolution actions. For memory-efficient learning and convenient granularity modulation, we propose a novel diffusion-SSM based policy (DiSPo) that learns from diverse coarse demonstrations and varying action scales using a state-space model, Mamba. Our evaluations show that the adoption of Mamba and the proposed step-scaling method enable DiSPo to outperform baselines in three coarse-to-fine benchmark tests, achieving up to an $81\%$ higher success rate.
In addition, DiSPo improves inference efficiency by generating coarse motions in less critical regions. Finally, we demonstrate scalability through two real-world manipulation tasks.
Submission Number: 10