Keywords: Offline Reinforcement Learning, Cross-Embodiment Learning, Locomotion
TL;DR: Training offline RL policies on heterogeneous robot datasets that include suboptimal data improves multitask and adaptation performance on a 16-robot locomotion benchmark, but also induces negative transfer driven by inter-robot gradient conflicts.
Abstract: Robot foundation models promise versatile control across diverse embodiments. Training a single policy on heterogeneous robot data can accelerate adaptation, reduce per-platform engineering, and improve sample efficiency. However, realizing this promise is constrained by the high cost of collecting expert demonstrations at scale. We investigate a path forward by combining offline reinforcement learning (offline RL) with cross-embodiment learning to leverage datasets that mix expert and suboptimal trajectories across many morphologies, and we introduce a new locomotion benchmark that spans 16 simulated robots and multiple data-quality tiers. Our study confirms the expected benefits, namely that offline RL can make use of suboptimal data and cross-embodiment pre-training can speed adaptation to unseen robots. The central result is a failure mode. As both morphology diversity and the fraction of suboptimal trajectories grow, performance degrades for specific embodiments, particularly when similar morphologies are underrepresented in the pool. Gradient-level diagnostics trace this negative transfer to inter-robot gradient conflicts, which indicates that naïve joint training can suppress useful updates. These findings position offline RL combined with cross-embodiment learning as a promising route toward scalable robot foundation models while highlighting the need for conflict-aware optimization and embodiment-aware data curation.
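The abstract attributes the observed negative transfer to inter-robot gradient conflicts. As a rough illustration only (not the paper's implementation), one common way to diagnose such conflicts is to compute each embodiment's loss gradient on a shared policy and check pairwise cosine similarities; the `policy` and `per_robot_losses` names below are placeholders.

```python
# Hypothetical sketch of a gradient-conflict diagnostic: pairwise cosine
# similarity between per-embodiment gradients of a shared policy.
# Negative entries indicate conflicting updates, a signature of negative transfer.
import torch


def flat_grad(loss, params):
    """Flatten the gradient of `loss` w.r.t. `params` into a single vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    return torch.cat([
        (g if g is not None else torch.zeros_like(p)).reshape(-1)
        for g, p in zip(grads, params)
    ])


def gradient_conflict_matrix(policy, per_robot_losses):
    """Return robot names and a matrix of pairwise gradient cosine similarities.

    `per_robot_losses` maps a robot identifier to its scalar training loss on
    that robot's batch; both names and losses here are assumed placeholders.
    """
    params = [p for p in policy.parameters() if p.requires_grad]
    names = list(per_robot_losses)
    grads = torch.stack([flat_grad(per_robot_losses[n], params) for n in names])
    grads = torch.nn.functional.normalize(grads, dim=1)
    return names, grads @ grads.T  # entry (i, j) < 0 => robots i and j conflict
```

Under this sketch, a block of strongly negative entries for a given robot would suggest its updates are being suppressed by the rest of the pool, matching the failure mode described in the abstract.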
Lightning Talk Video: mp4
Submission Number: 28