Keywords: Imitation Learning, Manipulation, Representation Learning
TL;DR: Learn one-shot imitation policies by aligning task-equivalent human-robot video snippets using optimal transport.
Abstract: Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks.
However, translating these demonstrations into robot-executable actions presents significant challenges due to execution mismatches in movement styles and physical capabilities. Existing methods either depend on human-robot paired data, which is infeasible to scale, or rely heavily on frame-level visual similarities that often break down in practice. To address these challenges, we propose RHyME, a novel framework that automatically aligns human and robot task executions using optimal transport costs. Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human videos by retrieving and composing short-horizon human clips. This approach facilitates effective policy training without the need for paired data. RHyME successfully imitates a range of cross-embodiment demonstrators, both in simulation and with a real human hand, achieving a more than 50% increase in task success compared to previous methods. We release our datasets and graphics at https://portal.cs.cornell.edu/rhyme/.
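To make the alignment idea concrete, below is a minimal Python sketch of scoring a robot video segment against a bank of short-horizon human clips with an entropic optimal transport cost over snippet embeddings, and retrieving the lowest-cost clip. The function names (sinkhorn_cost, retrieve_equivalent_clip), the cosine-distance cost, and the Sinkhorn-based scoring are illustrative assumptions, not the paper's exact implementation.

import numpy as np

def sinkhorn_cost(X, Y, eps=0.05, n_iters=200):
    """Entropic OT cost between two sets of L2-normalized snippet embeddings.

    X: (n, d) embeddings of snippets from a robot video segment
    Y: (m, d) embeddings of snippets from a candidate human clip
    """
    # Cosine-distance cost matrix between snippet embeddings (assumed representation).
    C = 1.0 - X @ Y.T                            # (n, m)
    a = np.full(X.shape[0], 1.0 / X.shape[0])    # uniform mass over robot snippets
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])    # uniform mass over human snippets
    K = np.exp(-C / eps)                         # Gibbs kernel for entropic regularization
    u = np.ones_like(a)
    for _ in range(n_iters):                     # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = np.diag(u) @ K @ np.diag(v)              # approximate transport plan
    return float((P * C).sum())                  # OT alignment cost

def retrieve_equivalent_clip(robot_segment_emb, human_clip_bank):
    """Return the index of the human clip with the lowest OT cost to the robot segment."""
    costs = [sinkhorn_cost(robot_segment_emb, clip_emb) for clip_emb in human_clip_bank]
    return int(np.argmin(costs)), costs

In this sketch, retrieved clips for consecutive robot segments would then be concatenated to form the semantically equivalent long-horizon human video used as the policy prompt during training.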
Previous Publication: No
Submission Number: 35