Abstract: We introduce UMI-on-Air, a framework for
embodiment-aware deployment of embodiment-agnostic manipulation policies. Our approach leverages diverse, unconstrained
human demonstrations collected with a handheld gripper
(UMI) to train generalizable visuomotor policies. A central
challenge in transferring these policies to constrained robotic
embodiments—such as aerial manipulators—is the mismatch
in control and robot dynamics, which often leads to out-of-distribution behaviors and poor execution. To address this, we
propose Embodiment-Aware Diffusion Policy (EADP), which
couples a high-level UMI policy with a low-level embodiment-specific controller at inference time. By integrating gradient
feedback from the controller’s tracking cost into the diffusion
sampling process, our method steers trajectory generation
towards dynamically feasible modes tailored to the deployment embodiment. This enables plug-and-play, embodiment-aware trajectory adaptation at test time. We validate our
approach on multiple long-horizon and high-precision aerial
manipulation tasks, showing improved success rates, efficiency,
and robustness under disturbances compared to unguided
diffusion baselines. Finally, we demonstrate deployment in
previously unseen environments, using UMI demonstrations
collected in the wild, highlighting a practical pathway for scaling
generalizable manipulation skills across diverse—and even highly
constrained—embodiments. All code, data, checkpoints, and
result videos can be found at umi-on-air.github.io.
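
The guidance idea described above, steering sampling with the gradient of a tracking cost, can be illustrated with a minimal toy sketch. This is not the EADP implementation. It assumes a hypothetical stand-in denoiser that simply pulls a 1-D trajectory toward a demonstrated target, and a hypothetical tracking cost that penalizes waypoint-to-waypoint steps larger than `MAX_STEP`. The names `tracking_cost`, `tracking_cost_grad`, `sample`, `MAX_STEP`, and `guidance_scale` are illustrative and do not come from the paper.

```python
import random

MAX_STEP = 0.1  # assumed per-waypoint step limit of the constrained embodiment


def tracking_cost(traj):
    """Hypothetical tracking cost: penalize steps larger than MAX_STEP."""
    cost = 0.0
    for a, b in zip(traj, traj[1:]):
        excess = max(abs(b - a) - MAX_STEP, 0.0)
        cost += 0.5 * excess ** 2
    return cost


def tracking_cost_grad(traj):
    """Analytic gradient of tracking_cost with respect to each waypoint."""
    grad = [0.0] * len(traj)
    for i in range(len(traj) - 1):
        step = traj[i + 1] - traj[i]
        excess = max(abs(step) - MAX_STEP, 0.0)
        g = excess if step > 0 else -excess  # d(cost)/d(step)
        grad[i + 1] += g
        grad[i] -= g
    return grad


def sample(target, guidance_scale, n_steps=50, seed=0):
    """Toy iterative sampler with optional cost-gradient guidance.

    The 'denoiser' here is a stand-in that contracts toward `target`;
    a real system would use a learned diffusion policy instead.
    """
    rng = random.Random(seed)
    traj = [rng.gauss(0.0, 1.0) for _ in target]  # start from noise
    for _ in range(n_steps):
        # Stand-in denoising step: move toward the demonstrated trajectory.
        traj = [x + 0.2 * (t - x) for x, t in zip(traj, target)]
        # Guidance step: descend the tracking cost to favor feasible motion.
        grad = tracking_cost_grad(traj)
        traj = [x - guidance_scale * g for x, g in zip(traj, grad)]
    return traj


# A demonstrated trajectory with an abrupt jump the constrained robot cannot track.
target = [0.0] * 8 + [1.0] * 8
unguided = sample(target, guidance_scale=0.0)
guided = sample(target, guidance_scale=0.2)
print(tracking_cost(unguided), tracking_cost(guided))
```

In this toy setting the guided sample spreads the abrupt jump over several waypoints, so its tracking cost is lower than that of the unguided sample, at the price of deviating from the demonstrated target. The guidance scale sets that trade-off.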