Latent Action Diffusion for Cross-Embodiment Manipulation

Published: 19 Sept 2025, Last Modified: 19 Sept 2025 · CoRL 2025 Workshop on Dexterous Manipulation (Spotlight) · CC BY 4.0
Keywords: Imitation Learning, Cross-Embodiment Learning, Manipulation
TL;DR: Learning cross-embodiment manipulation policies via a contrastively learned latent action space
Abstract: End-to-end learning approaches offer great potential for robotic manipulation, but their impact is constrained by data scarcity and heterogeneity across embodiments. In particular, the diverse action spaces of different end-effectors create barriers to cross-embodiment learning and skill transfer. We address this challenge with diffusion policies learned in a latent action space that unifies diverse end-effector actions. We first show that a semantically aligned latent action space for anthropomorphic robotic hands, a human hand, and a parallel-jaw gripper can be learned via contrastive learning. Second, we show that co-training on manipulation data from different end-effectors in this latent action space yields capable policies that control multiple robotic embodiments and achieve up to 28% higher manipulation success rates through cross-embodiment skill transfer. Latent cross-embodiment policies thus offer a new way to unify action spaces across embodiments, enabling efficient multi-robot control and data sharing across robot setups. This unified representation significantly reduces the need for extensive data collection for each new robot morphology, accelerates generalization across embodiments, and is an important step towards more scalable and efficient robotic learning.
Submission Number: 6
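To make the contrastive alignment idea concrete, here is a minimal sketch of how per-embodiment action encoders could be trained with a symmetric InfoNCE objective so that paired actions (e.g. a multi-DoF hand pose and the corresponding gripper command) map to nearby points in a shared latent space. The encoder form, dimensions, and loss details below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def encode(actions, W):
    """Hypothetical linear encoder into a shared latent space, L2-normalized."""
    z = actions @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss: row i of z_a and row i of z_b are a positive
    pair (same semantic action on two embodiments); all other rows in the
    batch serve as negatives."""
    logits = z_a @ z_b.T / temperature
    # Log-softmax over rows (a -> b) and over columns (b -> a).
    log_p_rows = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_cols = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    return -0.5 * (np.mean(np.diag(log_p_rows)) + np.mean(np.diag(log_p_cols)))

rng = np.random.default_rng(0)
# Made-up dimensions: 16-DoF hand actions vs. 1-DoF gripper width,
# 8 semantically paired samples per batch.
hand_actions = rng.normal(size=(8, 16))
grip_actions = rng.normal(size=(8, 1))
W_hand = rng.normal(size=(16, 32))
W_grip = rng.normal(size=(1, 32))
loss = info_nce(encode(hand_actions, W_hand), encode(grip_actions, W_grip))
```

In practice the encoders would be deeper networks trained by gradient descent on this loss, after which a diffusion policy could be trained directly in the shared latent space and decoded back to each embodiment's native action space.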