Embedding Morphology into Transformers for Cross-Robot Policy Learning

Published: 02 Mar 2026, Last Modified: 06 Mar 2026, ES-Reasoning @ ICLR 2026, CC BY 4.0
Keywords: vision-language-action, robot learning, multi-embodiment
TL;DR: An embodiment-aware transformer policy improves performance across diverse robot embodiments.
Abstract: Transformer-based VLA policies have advanced rapidly as training data scales, yet cross-robot policy learning—training a single policy across multiple embodiments—remains challenging. Such policies are often embodiment-agnostic and must infer kinematics from observations, which can hurt robustness. We propose an embodiment-aware transformer that injects morphology via: (1) kinematic tokens with per-joint temporal chunking; (2) topology-aware attention bias to encourage message passing along kinematic edges; and (3) joint-attribute conditioning using per-joint descriptors. Across multiple embodiments, our method consistently outperforms the vanilla $\pi_{0.5}$ baseline.
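The abstract does not spell out how the topology-aware attention bias is computed. One plausible reading, sketched below under assumptions not confirmed by the paper, is an additive bias on the attention logits that decays with hop distance in the kinematic tree, so joints connected by kinematic edges attend to each other more strongly. The function names (`kinematic_hop_distance`, `topology_bias`) and the decay rate `alpha` are illustrative, not from the paper.

```python
import numpy as np

def kinematic_hop_distance(parents):
    """All-pairs hop distance between joints in a kinematic tree.

    parents[i] is the index of joint i's parent (-1 for the root).
    """
    n = len(parents)
    d = np.full((n, n), np.inf)
    np.fill_diagonal(d, 0.0)
    for j, p in enumerate(parents):
        if p >= 0:  # each parent link is an undirected kinematic edge
            d[j, p] = d[p, j] = 1.0
    # Floyd-Warshall; n is the number of joints, so this is cheap
    for k in range(n):
        d = np.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d

def topology_bias(parents, alpha=1.0):
    """Additive attention bias: 0 on the diagonal, more negative
    the farther apart two joints are along kinematic edges."""
    return -alpha * kinematic_hop_distance(parents)

def attention_with_bias(q, k, v, bias):
    """Single-head scaled dot-product attention with an additive bias
    on the logits (one query/key token per joint)."""
    scores = q @ k.T / np.sqrt(q.shape[-1]) + bias
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

For a 4-joint serial chain (`parents = [-1, 0, 1, 2]`), joints 0 and 3 are three hops apart, so their mutual attention logit is penalized by `3 * alpha` while adjacent joints are penalized by only `alpha`, steering message passing along kinematic edges as the abstract describes.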
Submission Number: 3