Semi-Supervised One Shot Imitation Learning

Philipp Wu; Kourosh Hakhamaneshi; Yuqing Du; Igor Mordatch; Aravind Rajeswaran; Pieter Abbeel

Published: 15 May 2024, Last Modified: 14 Nov 2024, RLC 2024, CC BY 4.0

Keywords: Imitation Learning, Semi-Supervised Learning, Pre-Training

TL;DR: Learning representations from unlabelled (unpaired) trajectories helps improve data efficiency of one-shot imitation learning.

Abstract: One-shot Imitation Learning (OSIL) aims to imbue AI agents with the ability to learn a new task from a single demonstration. To supervise the learning, OSIL requires a prohibitively large number of paired expert demonstrations: trajectories corresponding to different variations of the same semantic task. To overcome this limitation, we introduce the semi-supervised OSIL problem setting, where the learning agent is presented with a large dataset of tasks with only one demonstration each (unpaired dataset), along with a small dataset of tasks with multiple demonstrations (paired dataset). This presents a more realistic and practical embodiment of few-shot learning and requires the agent to effectively leverage weak supervision. We then develop an algorithm for this semi-supervised OSIL setting. Our approach first learns an embedding space in which trajectories from different tasks form distinct clusters. We then use this embedding space, and the clustering it supports, to self-generate pairings between trajectories in the large unpaired dataset. Empirically, we show that OSIL models trained on such self-generated pairings (labels) are competitive with OSIL models trained with ground-truth labels, presenting a major advancement in the label-efficiency of OSIL.
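The pairing step can be illustrated with a small sketch. It assumes an encoder has already mapped each unpaired trajectory to an embedding vector (the toy vectors below stand in for that encoder's output), and it pairs trajectories that are mutual nearest neighbours under cosine similarity above a hypothetical `threshold`. The function names, the mutual-nearest-neighbour rule, and the threshold value are assumptions for illustration, not the authors' exact procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(embeddings: List[Vector]) -> Dict[int, Tuple[int, float]]:
    """For each trajectory index, return (index of most similar other trajectory, similarity)."""
    result: Dict[int, Tuple[int, float]] = {}
    for i, e_i in enumerate(embeddings):
        best_j, best_sim = -1, -math.inf
        for j, e_j in enumerate(embeddings):
            if i == j:
                continue  # a trajectory cannot be paired with itself
            sim = cosine_similarity(e_i, e_j)
            if sim > best_sim:
                best_j, best_sim = j, sim
        result[i] = (best_j, best_sim)
    return result


def self_generate_pairs(embeddings: List[Vector], threshold: float = 0.9) -> List[Tuple[int, int]]:
    """Pseudo-label pairs (i, j) with i < j that are mutual nearest neighbours above `threshold`.

    HYPOTHETICAL heuristic: the mutual-NN rule and the threshold are illustrative
    assumptions, not necessarily the pairing rule used in the paper.
    """
    nn = nearest_neighbors(embeddings)
    pairs: List[Tuple[int, int]] = []
    for i, (j, sim) in nn.items():
        # Keep (i, j) only once, only if j also picks i, and only if similar enough.
        if j > i and nn[j][0] == i and sim >= threshold:
            pairs.append((i, j))
    return pairs


# Toy embeddings: trajectories 0/1 come from one task, 2/3 from another, 4 is an outlier.
toy = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0], [0.05, 0.99], [0.7, -0.7]]
print(self_generate_pairs(toy))  # [(0, 1), (2, 3)]
```

In this sketch the outlier trajectory stays unpaired because its nearest neighbour does not reciprocate and its similarity falls below the threshold; the resulting pseudo-pairs would then be treated like ground-truth paired demonstrations when training an OSIL model. Requiring mutual agreement trades coverage for precision, which matters when incorrect pseudo-labels could mislead training.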

Submission Number: 328
