The Embodiment Gap in Robot Foundation Models

TMLR Paper9268 Authors

28 May 2026 (modified: 29 May 2026) · Under review for TMLR · CC BY 4.0
Abstract: Robot foundation models (RFMs), including vision-language-action (VLA) policies, are often read through a familiar scaling story: more data, larger models, and broader benchmarks. Robotics adds a practical follow-up: when a shared model reaches a new body, what work lets it act there? This survey asks what travels across robot bodies and what has to be realized on the target robot. We call the mismatch between reusable structure and target-specific execution the embodiment gap. The gap identifies which structures become reusable, where body-specific work remains, and what evidence should accompany cross-embodiment success claims. We organize this lens around three scaling directions (semantic meaning and perception, physical robot data and interfaces, and embodiment correspondence) and use it to define a reporting agenda for target-body residuals. The goal is to make cross-embodiment progress easier to compare, reproduce, and build on, while encouraging systems that leave new robots with less target-specific work, clearer failure attribution, and safer recovery.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Michele_Caprio1
Submission Number: 9268